Update URLs etc for new infrastructure at GitHub/Googlegroups.

Philip Hazel 2021-08-26 16:10:11 +01:00
parent 876ba431b0
commit 23c16e6ced
3 changed files with 66 additions and 80 deletions

README

@@ -6,25 +6,18 @@ API. Since its initial release in 2015, there has been further development of
 the code and it now differs from PCRE1 in more than just the API. There are new
 features, and the internals have been improved. The original PCRE1 library is
 now obsolete and should not be used in new projects. The latest release of
-PCRE2 is available in three alternative formats from:
-=============================================================================
-This information is still current (21 August 2021), but the PCRE2 project is in
-the process of moving to different infrastructure, so in the near future there
-will be new URLs here. The mailing list will also change.
-https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.gz
-https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.bz2
-https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.zip
-There is a mailing list for discussion about the development of PCRE at
-pcre-dev@exim.org. You can access the archives and subscribe or manage your
-subscription here:
-https://lists.exim.org/mailman/listinfo/pcre-dev
-=============================================================================
+PCRE2 is available in .tar.gz or .zip form from its GitHub repository:
+https://github.com/PhilipHazel/pcre2/releases
+There is a mailing list for discussion about the development of PCRE2 at
+pcre2-dev@googlegroups.com. You can subscribe by sending an email to
+pcre2-dev+subscribe@googlegroups.com.
+You can access the archives and also subscribe or manage your subscription
+here:
+https://groups.google.com/pcre2-dev
 Please read the NEWS file if you are upgrading from a previous release. The
 contents of this README file are:
@@ -387,7 +380,7 @@ library. They are also documented in the pcre2build man page.
 defined and has a value greater than or equal to 199901L (indicating C99).
 However, there is at least one environment that claims to be C99 but does not
 support these modifiers. If --disable-percent-zt is specified, no use is made
-of the z or t modifiers. Instead or %td or %zu, %lu is used, with a cast for
+of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for
 size_t values.
 . There is a special option called --enable-fuzz-support for use by people who
@@ -578,9 +571,9 @@ at build time" for more details.
 Making new tarballs
 -------------------
-The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
-zip formats. The command "make distcheck" does the same, but then does a trial
-build of the new distribution to ensure that it works.
+The command "make dist" creates two PCRE2 tarballs, in tar.gz and zip formats.
+The command "make distcheck" does the same, but then does a trial build of the
+new distribution to ensure that it works.
 If you have modified any of the man page sources in the doc directory, you
 should first run the PrepareRelease script before making a distribution. This
@@ -912,4 +905,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: Philip.Hazel
 Email domain: gmail.com
-Last updated: 28 April 2021
+Last updated: 25 August 2021
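The README hunk above describes the --disable-percent-zt fallback: when the z and t length modifiers cannot be trusted, %lu is used with a cast for size_t values. The C fragment below is a minimal sketch of that idea, not the code used in pcre2test. The macro names SIZE_FMT and SIZE_CAST and the helper format_size() are hypothetical, and DISABLE_PERCENT_ZT is assumed to be the macro that the configure option defines.

```c
/* Sketch only: SIZE_FMT, SIZE_CAST and format_size() are hypothetical
   names, and DISABLE_PERCENT_ZT is assumed to be the macro set by
   --disable-percent-zt. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L && \
    !defined(DISABLE_PERCENT_ZT)
/* C99 or later and not disabled: use the z length modifier directly. */
#define SIZE_FMT "%zu"
#define SIZE_CAST(x) (x)
#else
/* Fallback: print via unsigned long. The cast matters, because passing a
   size_t where %lu expects an unsigned long is undefined behaviour when
   the two types differ in width. */
#define SIZE_FMT "%lu"
#define SIZE_CAST(x) ((unsigned long int)(x))
#endif

/* Write the decimal form of value into buf and return the number of
   characters written. buf must hold the digits plus a terminating zero;
   21 bytes covers any 64-bit value. */
static int format_size(char *buf, size_t value)
{
  assert(buf != NULL);
  return sprintf(buf, SIZE_FMT, SIZE_CAST(value));
}
```

A C99 compiler takes the first branch; compiling with -DDISABLE_PERCENT_ZT (or with a pre-C99 compiler) exercises the %lu path, so both variants can be checked from the same source.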

@@ -1,4 +1,4 @@
-.TH PCRE2 3 "28 April 2021" "PCRE2 10.37"
+.TH PCRE2 3 "25 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH INTRODUCTION
@@ -11,7 +11,8 @@ nearly two decades, the limitations of the original API were making development
 increasingly difficult. The new API is more extensible, and it was simplified
 by abolishing the separate "study" optimizing function; in PCRE2, patterns are
 automatically optimized where possible. Since forking from PCRE1, the code has
-been extensively refactored and new features introduced.
+been extensively refactored and new features introduced. The old library is now
+obsolete and is no longer maintained.
 .P
 As well as Perl-style regular expression patterns, some features that appeared
 in Python and the original PCRE before they appeared in Perl are available
@@ -190,18 +191,18 @@ function, listing its arguments and results.
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .P
 Putting an actual email address here is a spam magnet. If you want to email me,
-use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
+use my two names separated by a dot at google.com.
 .
 .
 .SH REVISION
 .rs
 .sp
 .nf
-Last updated: 28 April 2021
+Last updated: 25 August 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi

@@ -54,8 +54,8 @@ Unicode.tables The files in this directory were downloaded from the Unicode
 ucptest.c A short C program for testing the Unicode property macros
 that do lookups in the pcre2_ucd.c data, mainly useful after
 rebuilding the Unicode property table. Compile and run this in
 the "maint" directory (see comments at its head). This program
 can also be used to find characters with specific properties.
 ucptestdata A directory containing four files, testinput{1,2} and
 testoutput{1,2}, for use in conjunction with the ucptest
@@ -129,7 +129,8 @@ distribution for a new release.
 different configurations, and it also runs some of them with valgrind, all of
 which can take quite some time.
-. Run tests in both 32-bit and 64-bit environments if possible.
+. Run tests in both 32-bit and 64-bit environments if possible. I can no longer
+run 32-bit tests.
 . Run tests with two or more different compilers (e.g. clang and gcc), and
 make use of -fsanitize=address and friends where possible. For gcc,
@@ -140,7 +141,8 @@ distribution for a new release.
 be added when compiling with JIT. Another useful clang option is
 -fsanitize=signed-integer-overflow
-. Do a test build using CMake.
+. Do a test build using CMake. Remove src/config.h first, lest it override the
+version that CMake creates. Do NOT use parallel make.
 . Run perltest.sh on the test data for tests 1 and 4. The output should match
 the PCRE2 test output, apart from the version identification at the start of
@@ -160,8 +162,7 @@ distribution for a new release.
 compiler as a change from gcc. Adding -xarch=v9 to the cc options does a
 64-bit test, but it also needs -S 64 for pcre2test to increase the stack size
 for test 2. Since I retired I can no longer do much of this, but instead I
-rely on putting out release candidates for folks on the pcre-dev list to
-test.
+rely on putting out release candidates for testing by the community.
 . The buildbots at http://buildfarm.opencsw.org/ do some automated testing
 of PCRE2 and should be checked before putting out a release.
@@ -214,27 +215,19 @@ changes in a shared library:
 Making a PCRE2 release
 ======================
-Run PrepareRelease and commit the files that it changes (by removing trailing
-spaces). The first thing this script does is to run CheckMan on the man pages;
-if it finds any markup errors, it reports them and then aborts.
+Run PrepareRelease and commit the files that it changes. The first thing this
+script does is to run CheckMan on the man pages; if it finds any markup errors,
+it reports them and then aborts. Otherwise it removes trailing spaces from
+sources and refreshes the HTML documentation. Update the GitHub repository with
+"git push".
-Once PrepareRelease has run clean, run "make distcheck" to create the tarballs
-and the zipball. Double-check with "svn status", then create an SVN tagged
-copy:
-==============================================================================
-This information is out-of-date: the PCRE2 project is moving to different
-infrastructure (as of 21 August 2021). This file will be updated in due course.
-svn copy svn://vcs.exim.org/pcre2/code/trunk \
-svn://vcs.exim.org/pcre2/code/tags/pcre2-10.xx
+Once PrepareRelease has run clean, run "make distcheck" to create the tarball
+and the zipball. I then sign these files. Double-check with "git status" that
+the repository is fully up-to-date, then create a new tag on GitHub. Upload the
+tarball, zipball, and the signatures as "assets" of the GitHub release.
 When the new release is out, don't forget to tell webmaster@pcre.org and the
-mailing list. Also, update the list of version numbers in Bugzilla
-(administration > products > PCRE > Edit versions).
-==============================================================================
+mailing list.
 Future ideas (wish list)
@@ -242,7 +235,8 @@ Future ideas (wish list)
 This section records a list of ideas so that they do not get forgotten. They
 vary enormously in their usefulness and potential for implementation. Some are
-very sensible; some are rather wacky. Some have been on this list for years.
+very sensible; some are rather wacky. Some have been on this list for many
+years.
 . Optimization
@@ -283,9 +277,6 @@ very sensible; some are rather wacky. Some have been on this list for years.
 . An option to convert results into character offsets and character lengths.
-. An option for pcre2grep to scan only the start of a file. I am not keen -
-this is the job of "head".
 . A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
 preceded by a blank line, instead of adding it to every matched line, and (b)
 support --outputfile=name.
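The wish-list item above about converting results into character offsets and character lengths can be illustrated for the 8-bit library with UTF-8 subjects, where the offsets that pcre2_match() stores in the ovector are counted in code units (bytes). The C fragment below is a minimal sketch that counts the bytes which are not UTF-8 continuation bytes. The helpers utf8_char_offset() and utf8_char_length() are hypothetical, not part of the PCRE2 API, and the sketch assumes valid UTF-8 and offsets that lie on character boundaries.

```c
/* Sketch only: utf8_char_offset() and utf8_char_length() are hypothetical
   helpers, not PCRE2 functions. They assume the subject is valid UTF-8 and
   that every offset passed in lies on a character boundary. */
#include <assert.h>
#include <stddef.h>

/* Return the number of characters in the first byte_offset bytes of
   subject. A UTF-8 continuation byte has the form 10xxxxxx; every other
   byte starts a new character, so only those bytes are counted. */
static size_t utf8_char_offset(const unsigned char *subject,
  size_t byte_offset)
{
  size_t i;
  size_t chars = 0;
  assert(subject != NULL || byte_offset == 0);
  for (i = 0; i < byte_offset; i++)
    {
    if ((subject[i] & 0xc0) != 0x80) chars++;
    }
  return chars;
}

/* Character length of the substring between two byte offsets, for
   example ovector[0] and ovector[1] after a successful match. */
static size_t utf8_char_length(const unsigned char *subject, size_t start,
  size_t end)
{
  assert(start <= end);
  return utf8_char_offset(subject, end) - utf8_char_offset(subject, start);
}
```

Each call rescans from the start of the subject, so a caller converting many offsets for the same subject might prefer a single forward scan that records character counts as it goes.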
@@ -324,10 +315,9 @@ very sensible; some are rather wacky. Some have been on this list for years.
 . PCRE2 cannot at present distinguish between subpatterns with different names,
 but the same number (created by the use of ?|). In order to do so, a way of
-remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
-(*MARK) can perhaps be used as a way round this problem. However, note that
-Perl does not distinguish: like PCRE2, a name is just an alias for a number
-in Perl.
+remembering *which* subpattern numbered n matched is needed. (*MARK) can
+perhaps be used as a way round this problem. However, note that Perl does not
+distinguish: like PCRE2, a name is just an alias for a number in Perl.
 . Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
 "something" and the the #ifdef appears only in one place, in "something".
@@ -355,8 +345,6 @@ very sensible; some are rather wacky. Some have been on this list for years.
 . (?[...]) extended classes: big project.
-. Bugzilla #1694 requests backwards searching.
 . Allow a callout to specify a number of characters to skip. This can be done
 compatibly via an extra callout field.
@@ -368,9 +356,6 @@ very sensible; some are rather wacky. Some have been on this list for years.
 . A limit on substitutions: a user suggested somehow finding a way of making
 match_limit apply to the whole operation instead of each match separately.
-. Redesign handling of class/nclass/xclass because the compile code logic is
-currently very contorted and obscure.
 . Some #defines could be replaced with enums to improve robustness.
 . There was a request for an option for pcre2_match() to return the longest
@@ -387,7 +372,8 @@ very sensible; some are rather wacky. Some have been on this list for years.
 The test function could make use of get_substrings() to cover more code.
 . A neater way of handling recursion file names in pcre2grep, e.g. a single
-buffer that can grow.
+buffer that can grow. See also GitHub issue #2 (recursion looping via
+symlinks).
 . A user suggested that before/after parameters in pcre2grep could have
 negative values, to list lines near to the matched line, but not necessarily
@@ -402,14 +388,7 @@ very sensible; some are rather wacky. Some have been on this list for years.
 . Breaking loops that match an empty string: perhaps find a way of continuing
 if *something* has changed, but this might mean remembering additional data.
 "Something" could be a capture value, but then a list of previous values
-would be needed to avoid a cycle of changes. Bugzilla #2182.
-. The use of \K in assertions is problematic. There was some talk of Perl
-banning this, but it hasn't happened. Some problems could be avoided by
-not allowing it to set a value before the match start; others by not allowing
-it to set a value after the match end. This could be controlled by an option
-such as PCRE2_SANE_BACKSLASH_K, for compatibility (or possibly make the sane
-behaviour the default and implement PCRE2_INSANE_BACKSLASH_K).
+would be needed to avoid a cycle of changes.
 . If a function could be written to find 3-character (or other length) fixed
 strings, at least one of which must be present for a match, efficient
@@ -417,6 +396,8 @@ very sensible; some are rather wacky. Some have been on this list for years.
 . If pcre2grep had --first-line (match only in the first line) it could be
 efficiently used to find files "starting with xxx". What about --last-line?
+There was also the suggestion of an option for pcre2grep to scan only the
+start of a file. I am not keen - this is the job of "head".
 . A user requested a means of determining whether a failed match was failed by
 the start-of-match optimizations, or by running the match engine. Easy enough
@@ -426,12 +407,14 @@ very sensible; some are rather wacky. Some have been on this list for years.
 interpreters? JIT already does some of this, but it may not be worth it for
 the interpreters.
-. There was a request for a way of re-defining \w (and therefore \W, \b, and
-\B). An in-pattern sequence such as (?w=[...]) was suggested. Easiest way
-would be simply to inline the class, with lookarounds for \b and \B. Ideally
-the setting should last till the end of the group, which means remembering
-all previous settings; maybe a fixed amount of stack would do - how deep
-would anyone want to nest these things? Bugzilla #2301.
+. Redesign handling of class/nclass/xclass because the compile code logic is
+currently very contorted and obscure. Also there was a request for a way of
+re-defining \w (and therefore \W, \b, and \B). An in-pattern sequence such as
+(?w=[...]) was suggested. Easiest way would be simply to inline the class,
+with lookarounds for \b and \B. Ideally the setting should last till the end
+of the group, which means remembering all previous settings; maybe a fixed
+amount of stack would do - how deep would anyone want to nest these things?
+See GitHub issue #13 for a compendium of character class issues.
 . Recognize the short script names. They are already listed in maint/
 Multistage2.py because they are needed for scanning the script extensions
@@ -444,7 +427,16 @@ very sensible; some are rather wacky. Some have been on this list for years.
 facility for a length limit in pcre2_config(), and what would be the
 encoding?
+. Quantified groups with a fixed count currently operate by replicating the
+group in the compiled bytecode. This may not really matter in these days of
+gigabyte memory, but perhaps another implementation might be considered.
+Needs coordination between the interpreters and JIT.
+. There are regular requests for variable-length lookbehinds.
+. See also any suggestions in the GitHub issues.
 Philip Hazel
-Email local part: ph10
-Email domain: cam.ac.uk
-Last updated: 01 April 2020
+Email local part: Philip.Hazel
+Email domain: gmail.com
+Last updated: 26 August 2021