Update URLs etc for new infrastructure at GitHub/Googlegroups.
parent 876ba431b0
commit 23c16e6ced
33
README
|
@ -6,25 +6,18 @@ API. Since its initial release in 2015, there has been further development of
|
||||||
the code and it now differs from PCRE1 in more than just the API. There are new
|
the code and it now differs from PCRE1 in more than just the API. There are new
|
||||||
features, and the internals have been improved. The original PCRE1 library is
|
features, and the internals have been improved. The original PCRE1 library is
|
||||||
now obsolete and should not be used in new projects. The latest release of
|
now obsolete and should not be used in new projects. The latest release of
|
||||||
PCRE2 is available in three alternative formats from:
|
PCRE2 is available in .tar.gz or .zip form from its GitHub repository:
|
||||||
|
|
||||||
=============================================================================
|
https://github.com/PhilipHazel/pcre2/releases
|
||||||
This information is still current (21 August 2021), but the PCRE2 project is in
|
|
||||||
the process of moving to different infrastructure, so in the near future there
|
|
||||||
will be new URLs here. The mailing list will also change.
|
|
||||||
|
|
||||||
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.gz
|
There is a mailing list for discussion about the development of PCRE2 at
|
||||||
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.bz2
|
pcre2-dev@googlegroups.com. You can subscribe by sending an email to
|
||||||
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.zip
|
pcre2-dev+subscribe@googlegroups.com.
|
||||||
|
|
||||||
There is a mailing list for discussion about the development of PCRE at
|
You can access the archives and also subscribe or manage your subscription
|
||||||
pcre-dev@exim.org. You can access the archives and subscribe or manage your
|
here:
|
||||||
subscription here:
|
|
||||||
|
|
||||||
https://lists.exim.org/mailman/listinfo/pcre-dev
|
|
||||||
|
|
||||||
=============================================================================
|
|
||||||
|
|
||||||
|
https://groups.google.com/pcre2-dev
|
||||||
|
|
||||||
Please read the NEWS file if you are upgrading from a previous release. The
|
Please read the NEWS file if you are upgrading from a previous release. The
|
||||||
contents of this README file are:
|
contents of this README file are:
|
||||||
|
@ -387,7 +380,7 @@ library. They are also documented in the pcre2build man page.
|
||||||
defined and has a value greater than or equal to 199901L (indicating C99).
|
defined and has a value greater than or equal to 199901L (indicating C99).
|
||||||
However, there is at least one environment that claims to be C99 but does not
|
However, there is at least one environment that claims to be C99 but does not
|
||||||
support these modifiers. If --disable-percent-zt is specified, no use is made
|
support these modifiers. If --disable-percent-zt is specified, no use is made
|
||||||
of the z or t modifiers. Instead or %td or %zu, %lu is used, with a cast for
|
of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for
|
||||||
size_t values.
|
size_t values.
|
||||||
|
|
||||||
. There is a special option called --enable-fuzz-support for use by people who
|
. There is a special option called --enable-fuzz-support for use by people who
|
||||||
|
@ -578,9 +571,9 @@ at build time" for more details.
|
||||||
Making new tarballs
|
Making new tarballs
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
|
The command "make dist" creates two PCRE2 tarballs, in tar.gz and zip formats.
|
||||||
zip formats. The command "make distcheck" does the same, but then does a trial
|
The command "make distcheck" does the same, but then does a trial build of the
|
||||||
build of the new distribution to ensure that it works.
|
new distribution to ensure that it works.
|
||||||
|
|
||||||
If you have modified any of the man page sources in the doc directory, you
|
If you have modified any of the man page sources in the doc directory, you
|
||||||
should first run the PrepareRelease script before making a distribution. This
|
should first run the PrepareRelease script before making a distribution. This
|
||||||
|
@ -912,4 +905,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: Philip.Hazel
|
Email local part: Philip.Hazel
|
||||||
Email domain: gmail.com
|
Email domain: gmail.com
|
||||||
Last updated: 28 April 2021
|
Last updated: 25 August 2021
|
||||||
|
|
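Review note on the --disable-percent-zt paragraph in the README hunk above: the corrected sentence says that %lu, with a cast for size_t values, is used instead of %td or %zu. The following is a minimal sketch of that idea, not PCRE2's actual code. The function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a size_t using the C99 'z' length modifier.
   This is the form that an environment without z/t support cannot handle. */
static int format_size_c99(char *buf, size_t buflen, size_t value)
{
  return snprintf(buf, buflen, "%zu", value);
}

/* Hypothetical helper: the fallback described in the README. The value is
   cast to unsigned long so that it matches the %lu conversion exactly. */
static int format_size_fallback(char *buf, size_t buflen, size_t value)
{
  return snprintf(buf, buflen, "%lu", (unsigned long)value);
}
```

For values that fit in an unsigned long, both helpers write the same digits. The cast matters because passing a size_t directly to %lu is undefined behaviour on platforms where the two types differ in width.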
11
doc/pcre2.3
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2 3 "28 April 2021" "PCRE2 10.37"
|
.TH PCRE2 3 "25 August 2021" "PCRE2 10.38"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH INTRODUCTION
|
.SH INTRODUCTION
|
||||||
|
@ -11,7 +11,8 @@ nearly two decades, the limitations of the original API were making development
|
||||||
increasingly difficult. The new API is more extensible, and it was simplified
|
increasingly difficult. The new API is more extensible, and it was simplified
|
||||||
by abolishing the separate "study" optimizing function; in PCRE2, patterns are
|
by abolishing the separate "study" optimizing function; in PCRE2, patterns are
|
||||||
automatically optimized where possible. Since forking from PCRE1, the code has
|
automatically optimized where possible. Since forking from PCRE1, the code has
|
||||||
been extensively refactored and new features introduced.
|
been extensively refactored and new features introduced. The old library is now
|
||||||
|
obsolete and is no longer maintained.
|
||||||
.P
|
.P
|
||||||
As well as Perl-style regular expression patterns, some features that appeared
|
As well as Perl-style regular expression patterns, some features that appeared
|
||||||
in Python and the original PCRE before they appeared in Perl are available
|
in Python and the original PCRE before they appeared in Perl are available
|
||||||
|
@ -190,18 +191,18 @@ function, listing its arguments and results.
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
.fi
|
.fi
|
||||||
.P
|
.P
|
||||||
Putting an actual email address here is a spam magnet. If you want to email me,
|
Putting an actual email address here is a spam magnet. If you want to email me,
|
||||||
use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
|
use my two names separated by a dot at google.com.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH REVISION
|
.SH REVISION
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 28 April 2021
|
Last updated: 25 August 2021
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2021 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
102
maint/README
|
@ -54,8 +54,8 @@ Unicode.tables The files in this directory were downloaded from the Unicode
|
||||||
ucptest.c A short C program for testing the Unicode property macros
|
ucptest.c A short C program for testing the Unicode property macros
|
||||||
that do lookups in the pcre2_ucd.c data, mainly useful after
|
that do lookups in the pcre2_ucd.c data, mainly useful after
|
||||||
rebuilding the Unicode property table. Compile and run this in
|
rebuilding the Unicode property table. Compile and run this in
|
||||||
the "maint" directory (see comments at its head). This program
|
the "maint" directory (see comments at its head). This program
|
||||||
can also be used to find characters with specific properties.
|
can also be used to find characters with specific properties.
|
||||||
|
|
||||||
ucptestdata A directory containing four files, testinput{1,2} and
|
ucptestdata A directory containing four files, testinput{1,2} and
|
||||||
testoutput{1,2}, for use in conjunction with the ucptest
|
testoutput{1,2}, for use in conjunction with the ucptest
|
||||||
|
@ -129,7 +129,8 @@ distribution for a new release.
|
||||||
different configurations, and it also runs some of them with valgrind, all of
|
different configurations, and it also runs some of them with valgrind, all of
|
||||||
which can take quite some time.
|
which can take quite some time.
|
||||||
|
|
||||||
. Run tests in both 32-bit and 64-bit environments if possible.
|
. Run tests in both 32-bit and 64-bit environments if possible. I can no longer
|
||||||
|
run 32-bit tests.
|
||||||
|
|
||||||
. Run tests with two or more different compilers (e.g. clang and gcc), and
|
. Run tests with two or more different compilers (e.g. clang and gcc), and
|
||||||
make use of -fsanitize=address and friends where possible. For gcc,
|
make use of -fsanitize=address and friends where possible. For gcc,
|
||||||
|
@ -140,7 +141,8 @@ distribution for a new release.
|
||||||
be added when compiling with JIT. Another useful clang option is
|
be added when compiling with JIT. Another useful clang option is
|
||||||
-fsanitize=signed-integer-overflow
|
-fsanitize=signed-integer-overflow
|
||||||
|
|
||||||
. Do a test build using CMake.
|
. Do a test build using CMake. Remove src/config.h first, lest it override the
|
||||||
|
version that CMake creates. Do NOT use parallel make.
|
||||||
|
|
||||||
. Run perltest.sh on the test data for tests 1 and 4. The output should match
|
. Run perltest.sh on the test data for tests 1 and 4. The output should match
|
||||||
the PCRE2 test output, apart from the version identification at the start of
|
the PCRE2 test output, apart from the version identification at the start of
|
||||||
|
@ -160,8 +162,7 @@ distribution for a new release.
|
||||||
compiler as a change from gcc. Adding -xarch=v9 to the cc options does a
|
compiler as a change from gcc. Adding -xarch=v9 to the cc options does a
|
||||||
64-bit test, but it also needs -S 64 for pcre2test to increase the stack size
|
64-bit test, but it also needs -S 64 for pcre2test to increase the stack size
|
||||||
for test 2. Since I retired I can no longer do much of this, but instead I
|
for test 2. Since I retired I can no longer do much of this, but instead I
|
||||||
rely on putting out release candidates for folks on the pcre-dev list to
|
rely on putting out release candidates for testing by the community.
|
||||||
test.
|
|
||||||
|
|
||||||
. The buildbots at http://buildfarm.opencsw.org/ do some automated testing
|
. The buildbots at http://buildfarm.opencsw.org/ do some automated testing
|
||||||
of PCRE2 and should be checked before putting out a release.
|
of PCRE2 and should be checked before putting out a release.
|
||||||
|
@ -214,27 +215,19 @@ changes in a shared library:
|
||||||
Making a PCRE2 release
|
Making a PCRE2 release
|
||||||
======================
|
======================
|
||||||
|
|
||||||
Run PrepareRelease and commit the files that it changes (by removing trailing
|
Run PrepareRelease and commit the files that it changes. The first thing this
|
||||||
spaces). The first thing this script does is to run CheckMan on the man pages;
|
script does is to run CheckMan on the man pages; if it finds any markup errors,
|
||||||
if it finds any markup errors, it reports them and then aborts.
|
it reports them and then aborts. Otherwise it removes trailing spaces from
|
||||||
|
sources and refreshes the HTML documentation. Update the GitHub repository with
|
||||||
|
"git push".
|
||||||
|
|
||||||
Once PrepareRelease has run clean, run "make distcheck" to create the tarballs
|
Once PrepareRelease has run clean, run "make distcheck" to create the tarball
|
||||||
and the zipball. Double-check with "svn status", then create an SVN tagged
|
and the zipball. I then sign these files. Double-check with "git status" that
|
||||||
copy:
|
the repository is fully up-to-date, then create a new tag on GitHub. Upload the
|
||||||
|
tarball, zipball, and the signatures as "assets" of the GitHub release.
|
||||||
==============================================================================
|
|
||||||
This information is out-of-date: the PCRE2 project is moving to different
|
|
||||||
infrastructure (as of 21 August 2021). This file will be updated in due course.
|
|
||||||
|
|
||||||
svn copy svn://vcs.exim.org/pcre2/code/trunk \
|
|
||||||
svn://vcs.exim.org/pcre2/code/tags/pcre2-10.xx
|
|
||||||
|
|
||||||
When the new release is out, don't forget to tell webmaster@pcre.org and the
|
When the new release is out, don't forget to tell webmaster@pcre.org and the
|
||||||
mailing list. Also, update the list of version numbers in Bugzilla
|
mailing list.
|
||||||
(administration > products > PCRE > Edit versions).
|
|
||||||
|
|
||||||
==============================================================================
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Future ideas (wish list)
|
Future ideas (wish list)
|
||||||
|
@ -242,7 +235,8 @@ Future ideas (wish list)
|
||||||
|
|
||||||
This section records a list of ideas so that they do not get forgotten. They
|
This section records a list of ideas so that they do not get forgotten. They
|
||||||
vary enormously in their usefulness and potential for implementation. Some are
|
vary enormously in their usefulness and potential for implementation. Some are
|
||||||
very sensible; some are rather wacky. Some have been on this list for years.
|
very sensible; some are rather wacky. Some have been on this list for many
|
||||||
|
years.
|
||||||
|
|
||||||
. Optimization
|
. Optimization
|
||||||
|
|
||||||
|
@ -283,9 +277,6 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
|
|
||||||
. An option to convert results into character offsets and character lengths.
|
. An option to convert results into character offsets and character lengths.
|
||||||
|
|
||||||
. An option for pcre2grep to scan only the start of a file. I am not keen -
|
|
||||||
this is the job of "head".
|
|
||||||
|
|
||||||
. A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
|
. A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
|
||||||
preceded by a blank line, instead of adding it to every matched line, and (b)
|
preceded by a blank line, instead of adding it to every matched line, and (b)
|
||||||
support --outputfile=name.
|
support --outputfile=name.
|
||||||
|
@ -324,10 +315,9 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
|
|
||||||
. PCRE2 cannot at present distinguish between subpatterns with different names,
|
. PCRE2 cannot at present distinguish between subpatterns with different names,
|
||||||
but the same number (created by the use of ?|). In order to do so, a way of
|
but the same number (created by the use of ?|). In order to do so, a way of
|
||||||
remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
|
remembering *which* subpattern numbered n matched is needed. (*MARK) can
|
||||||
(*MARK) can perhaps be used as a way round this problem. However, note that
|
perhaps be used as a way round this problem. However, note that Perl does not
|
||||||
Perl does not distinguish: like PCRE2, a name is just an alias for a number
|
distinguish: like PCRE2, a name is just an alias for a number in Perl.
|
||||||
in Perl.
|
|
||||||
|
|
||||||
. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
|
. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
|
||||||
"something" and the the #ifdef appears only in one place, in "something".
|
"something" and the the #ifdef appears only in one place, in "something".
|
||||||
|
@ -355,8 +345,6 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
|
|
||||||
. (?[...]) extended classes: big project.
|
. (?[...]) extended classes: big project.
|
||||||
|
|
||||||
. Bugzilla #1694 requests backwards searching.
|
|
||||||
|
|
||||||
. Allow a callout to specify a number of characters to skip. This can be done
|
. Allow a callout to specify a number of characters to skip. This can be done
|
||||||
compatibly via an extra callout field.
|
compatibly via an extra callout field.
|
||||||
|
|
||||||
|
@ -368,9 +356,6 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
. A limit on substitutions: a user suggested somehow finding a way of making
|
. A limit on substitutions: a user suggested somehow finding a way of making
|
||||||
match_limit apply to the whole operation instead of each match separately.
|
match_limit apply to the whole operation instead of each match separately.
|
||||||
|
|
||||||
. Redesign handling of class/nclass/xclass because the compile code logic is
|
|
||||||
currently very contorted and obscure.
|
|
||||||
|
|
||||||
. Some #defines could be replaced with enums to improve robustness.
|
. Some #defines could be replaced with enums to improve robustness.
|
||||||
|
|
||||||
. There was a request for an option for pcre2_match() to return the longest
|
. There was a request for an option for pcre2_match() to return the longest
|
||||||
|
@ -387,7 +372,8 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
The test function could make use of get_substrings() to cover more code.
|
The test function could make use of get_substrings() to cover more code.
|
||||||
|
|
||||||
. A neater way of handling recursion file names in pcre2grep, e.g. a single
|
. A neater way of handling recursion file names in pcre2grep, e.g. a single
|
||||||
buffer that can grow.
|
buffer that can grow. See also GitHub issue #2 (recursion looping via
|
||||||
|
symlinks).
|
||||||
|
|
||||||
. A user suggested that before/after parameters in pcre2grep could have
|
. A user suggested that before/after parameters in pcre2grep could have
|
||||||
negative values, to list lines near to the matched line, but not necessarily
|
negative values, to list lines near to the matched line, but not necessarily
|
||||||
|
@ -402,14 +388,7 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
. Breaking loops that match an empty string: perhaps find a way of continuing
|
. Breaking loops that match an empty string: perhaps find a way of continuing
|
||||||
if *something* has changed, but this might mean remembering additional data.
|
if *something* has changed, but this might mean remembering additional data.
|
||||||
"Something" could be a capture value, but then a list of previous values
|
"Something" could be a capture value, but then a list of previous values
|
||||||
would be needed to avoid a cycle of changes. Bugzilla #2182.
|
would be needed to avoid a cycle of changes.
|
||||||
|
|
||||||
. The use of \K in assertions is problematic. There was some talk of Perl
|
|
||||||
banning this, but it hasn't happened. Some problems could be avoided by
|
|
||||||
not allowing it to set a value before the match start; others by not allowing
|
|
||||||
it to set a value after the match end. This could be controlled by an option
|
|
||||||
such as PCRE2_SANE_BACKSLASH_K, for compatibility (or possibly make the sane
|
|
||||||
behaviour the default and implement PCRE2_INSANE_BACKSLASH_K).
|
|
||||||
|
|
||||||
. If a function could be written to find 3-character (or other length) fixed
|
. If a function could be written to find 3-character (or other length) fixed
|
||||||
strings, at least one of which must be present for a match, efficient
|
strings, at least one of which must be present for a match, efficient
|
||||||
|
@ -417,6 +396,8 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
|
|
||||||
. If pcre2grep had --first-line (match only in the first line) it could be
|
. If pcre2grep had --first-line (match only in the first line) it could be
|
||||||
efficiently used to find files "starting with xxx". What about --last-line?
|
efficiently used to find files "starting with xxx". What about --last-line?
|
||||||
|
There was also the suggestion of an option for pcre2grep to scan only the
|
||||||
|
start of a file. I am not keen - this is the job of "head".
|
||||||
|
|
||||||
. A user requested a means of determining whether a failed match was failed by
|
. A user requested a means of determining whether a failed match was failed by
|
||||||
the start-of-match optimizations, or by running the match engine. Easy enough
|
the start-of-match optimizations, or by running the match engine. Easy enough
|
||||||
|
@ -426,12 +407,14 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
interpreters? JIT already does some of this, but it may not be worth it for
|
interpreters? JIT already does some of this, but it may not be worth it for
|
||||||
the interpreters.
|
the interpreters.
|
||||||
|
|
||||||
. There was a request for a way of re-defining \w (and therefore \W, \b, and
|
. Redesign handling of class/nclass/xclass because the compile code logic is
|
||||||
\B). An in-pattern sequence such as (?w=[...]) was suggested. Easiest way
|
currently very contorted and obscure. Also there was a request for a way of
|
||||||
would be simply to inline the class, with lookarounds for \b and \B. Ideally
|
re-defining \w (and therefore \W, \b, and \B). An in-pattern sequence such as
|
||||||
the setting should last till the end of the group, which means remembering
|
(?w=[...]) was suggested. Easiest way would be simply to inline the class,
|
||||||
all previous settings; maybe a fixed amount of stack would do - how deep
|
with lookarounds for \b and \B. Ideally the setting should last till the end
|
||||||
would anyone want to nest these things? Bugzilla #2301.
|
of the group, which means remembering all previous settings; maybe a fixed
|
||||||
|
amount of stack would do - how deep would anyone want to nest these things?
|
||||||
|
See GitHub issue #13 for a compendium of character class issues.
|
||||||
|
|
||||||
. Recognize the short script names. They are already listed in maint/
|
. Recognize the short script names. They are already listed in maint/
|
||||||
Multistage2.py because they are needed for scanning the script extensions
|
Multistage2.py because they are needed for scanning the script extensions
|
||||||
|
@ -444,7 +427,16 @@ very sensible; some are rather wacky. Some have been on this list for years.
|
||||||
facility for a length limit in pcre2_config(), and what would be the
|
facility for a length limit in pcre2_config(), and what would be the
|
||||||
encoding?
|
encoding?
|
||||||
|
|
||||||
|
. Quantified groups with a fixed count currently operate by replicating the
|
||||||
|
group in the compiled bytecode. This may not really matter in these days of
|
||||||
|
gigabyte memory, but perhaps another implementation might be considered.
|
||||||
|
Needs coordination between the interpreters and JIT.
|
||||||
|
|
||||||
|
. There are regular requests for variable-length lookbehinds.
|
||||||
|
|
||||||
|
. See also any suggestions in the GitHub issues.
|
||||||
|
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: Philip.Hazel
|
||||||
Email domain: cam.ac.uk
|
Email domain: gmail.com
|
||||||
Last updated: 01 April 2020
|
Last updated: 26 August 2021
|
||||||
|
|