Update URLs etc for new infrastructure at GitHub/Googlegroups.
parent 876ba431b0
commit 23c16e6ced
README (33 lines changed)
@@ -6,25 +6,18 @@ API. Since its initial release in 2015, there has been further development of
the code and it now differs from PCRE1 in more than just the API. There are new
features, and the internals have been improved. The original PCRE1 library is
now obsolete and should not be used in new projects. The latest release of
PCRE2 is available in three alternative formats from:
PCRE2 is available in .tar.gz or .zip form from its GitHub repository:

=============================================================================
This information is still current (21 August 2021), but the PCRE2 project is in
the process of moving to different infrastructure, so in the near future there
will be new URLs here. The mailing list will also change.
https://github.com/PhilipHazel/pcre2/releases

https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.gz
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.bz2
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.zip
There is a mailing list for discussion about the development of PCRE2 at
pcre2-dev@googlegroups.com. You can subscribe by sending an email to
pcre2-dev+subscribe@googlegroups.com.

There is a mailing list for discussion about the development of PCRE at
pcre-dev@exim.org. You can access the archives and subscribe or manage your
subscription here:

https://lists.exim.org/mailman/listinfo/pcre-dev

=============================================================================
You can access the archives and also subscribe or manage your subscription
here:

https://groups.google.com/pcre2-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:
@@ -387,7 +380,7 @@ library. They are also documented in the pcre2build man page.
defined and has a value greater than or equal to 199901L (indicating C99).
However, there is at least one environment that claims to be C99 but does not
support these modifiers. If --disable-percent-zt is specified, no use is made
of the z or t modifiers. Instead or %td or %zu, %lu is used, with a cast for
of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for
size_t values.

. There is a special option called --enable-fuzz-support for use by people who
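The --disable-percent-zt text in the hunk above describes printing size_t
values with %lu and a cast instead of %zu. A minimal sketch of that idea
follows. The helper name format_size is hypothetical and is not part of PCRE2;
it only illustrates the cast, not how the PCRE2 sources actually do it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): format a size_t without the
   C99 "z" length modifier by casting to unsigned long and using %lu.
   The caller must supply a buffer large enough for the digits plus the
   terminating NUL. Returns the number of characters written. */
static int format_size(char *buf, size_t value)
{
  assert(buf != NULL);
  return sprintf(buf, "%lu", (unsigned long)value);
}
```

A caller could write "char buf[32]; format_size(buf, strlen(subject));" and
then print buf. Note the trade-off: where unsigned long is narrower than size_t
(for example 64-bit Windows), the cast truncates values above ULONG_MAX.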
@@ -578,9 +571,9 @@ at build time" for more details.
Making new tarballs
-------------------

The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.
The command "make dist" creates two PCRE2 tarballs, in tar.gz and zip formats.
The command "make distcheck" does the same, but then does a trial build of the
new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
@@ -912,4 +905,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 28 April 2021
Last updated: 25 August 2021
doc/pcre2.3 (11 lines changed)
@@ -1,4 +1,4 @@
.TH PCRE2 3 "28 April 2021" "PCRE2 10.37"
.TH PCRE2 3 "25 August 2021" "PCRE2 10.38"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH INTRODUCTION
@@ -11,7 +11,8 @@ nearly two decades, the limitations of the original API were making development
increasingly difficult. The new API is more extensible, and it was simplified
by abolishing the separate "study" optimizing function; in PCRE2, patterns are
automatically optimized where possible. Since forking from PCRE1, the code has
been extensively refactored and new features introduced.
been extensively refactored and new features introduced. The old library is now
obsolete and is no longer maintained.
.P
As well as Perl-style regular expression patterns, some features that appeared
in Python and the original PCRE before they appeared in Perl are available
@@ -190,18 +191,18 @@ function, listing its arguments and results.
.sp
.nf
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
.fi
.P
Putting an actual email address here is a spam magnet. If you want to email me,
use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
use my two names separated by a dot at google.com.
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 28 April 2021
Last updated: 25 August 2021
Copyright (c) 1997-2021 University of Cambridge.
.fi
maint/README (102 lines changed)
@@ -54,8 +54,8 @@ Unicode.tables The files in this directory were downloaded from the Unicode
ucptest.c A short C program for testing the Unicode property macros
that do lookups in the pcre2_ucd.c data, mainly useful after
rebuilding the Unicode property table. Compile and run this in
the "maint" directory (see comments at its head). This program
can also be used to find characters with specific properties.
the "maint" directory (see comments at its head). This program
can also be used to find characters with specific properties.

ucptestdata A directory containing four files, testinput{1,2} and
testoutput{1,2}, for use in conjunction with the ucptest
@@ -129,7 +129,8 @@ distribution for a new release.
different configurations, and it also runs some of them with valgrind, all of
which can take quite some time.

. Run tests in both 32-bit and 64-bit environments if possible.
. Run tests in both 32-bit and 64-bit environments if possible. I can no longer
run 32-bit tests.

. Run tests with two or more different compilers (e.g. clang and gcc), and
make use of -fsanitize=address and friends where possible. For gcc,
@@ -140,7 +141,8 @@ distribution for a new release.
be added when compiling with JIT. Another useful clang option is
-fsanitize=signed-integer-overflow

. Do a test build using CMake.
. Do a test build using CMake. Remove src/config.h first, lest it override the
version that CMake creates. Do NOT use parallel make.

. Run perltest.sh on the test data for tests 1 and 4. The output should match
the PCRE2 test output, apart from the version identification at the start of
@@ -160,8 +162,7 @@ distribution for a new release.
compiler as a change from gcc. Adding -xarch=v9 to the cc options does a
64-bit test, but it also needs -S 64 for pcre2test to increase the stack size
for test 2. Since I retired I can no longer do much of this, but instead I
rely on putting out release candidates for folks on the pcre-dev list to
test.
rely on putting out release candidates for testing by the community.

. The buildbots at http://buildfarm.opencsw.org/ do some automated testing
of PCRE2 and should be checked before putting out a release.
@@ -214,27 +215,19 @@ changes in a shared library:
Making a PCRE2 release
======================

Run PrepareRelease and commit the files that it changes (by removing trailing
spaces). The first thing this script does is to run CheckMan on the man pages;
if it finds any markup errors, it reports them and then aborts.
Run PrepareRelease and commit the files that it changes. The first thing this
script does is to run CheckMan on the man pages; if it finds any markup errors,
it reports them and then aborts. Otherwise it removes trailing spaces from
sources and refreshes the HTML documentation. Update the GitHub repository with
"git push".

Once PrepareRelease has run clean, run "make distcheck" to create the tarballs
and the zipball. Double-check with "svn status", then create an SVN tagged
copy:

==============================================================================
This information is out-of-date: the PCRE2 project is moving to different
infrastructure (as of 21 August 2021). This file will be updated in due course.

svn copy svn://vcs.exim.org/pcre2/code/trunk \
svn://vcs.exim.org/pcre2/code/tags/pcre2-10.xx
Once PrepareRelease has run clean, run "make distcheck" to create the tarball
and the zipball. I then sign these files. Double-check with "git status" that
the repository is fully up-to-date, then create a new tag on GitHub. Upload the
tarball, zipball, and the signatures as "assets" of the GitHub release.

When the new release is out, don't forget to tell webmaster@pcre.org and the
mailing list. Also, update the list of version numbers in Bugzilla
(administration > products > PCRE > Edit versions).

==============================================================================

mailing list.


Future ideas (wish list)
@@ -242,7 +235,8 @@ Future ideas (wish list)

This section records a list of ideas so that they do not get forgotten. They
vary enormously in their usefulness and potential for implementation. Some are
very sensible; some are rather wacky. Some have been on this list for years.
very sensible; some are rather wacky. Some have been on this list for many
years.

. Optimization

@@ -283,9 +277,6 @@ very sensible; some are rather wacky. Some have been on this list for years.

. An option to convert results into character offsets and character lengths.

. An option for pcre2grep to scan only the start of a file. I am not keen -
this is the job of "head".

. A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
preceded by a blank line, instead of adding it to every matched line, and (b)
support --outputfile=name.
@@ -324,10 +315,9 @@ very sensible; some are rather wacky. Some have been on this list for years.

. PCRE2 cannot at present distinguish between subpatterns with different names,
but the same number (created by the use of ?|). In order to do so, a way of
remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
(*MARK) can perhaps be used as a way round this problem. However, note that
Perl does not distinguish: like PCRE2, a name is just an alias for a number
in Perl.
remembering *which* subpattern numbered n matched is needed. (*MARK) can
perhaps be used as a way round this problem. However, note that Perl does not
distinguish: like PCRE2, a name is just an alias for a number in Perl.

. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
"something" and the the #ifdef appears only in one place, in "something".
@@ -355,8 +345,6 @@ very sensible; some are rather wacky. Some have been on this list for years.

. (?[...]) extended classes: big project.

. Bugzilla #1694 requests backwards searching.

. Allow a callout to specify a number of characters to skip. This can be done
compatibly via an extra callout field.

@@ -368,9 +356,6 @@ very sensible; some are rather wacky. Some have been on this list for years.
. A limit on substitutions: a user suggested somehow finding a way of making
match_limit apply to the whole operation instead of each match separately.

. Redesign handling of class/nclass/xclass because the compile code logic is
currently very contorted and obscure.

. Some #defines could be replaced with enums to improve robustness.

. There was a request for an option for pcre2_match() to return the longest
@@ -387,7 +372,8 @@ very sensible; some are rather wacky. Some have been on this list for years.
The test function could make use of get_substrings() to cover more code.

. A neater way of handling recursion file names in pcre2grep, e.g. a single
buffer that can grow.
buffer that can grow. See also GitHub issue #2 (recursion looping via
symlinks).

. A user suggested that before/after parameters in pcre2grep could have
negative values, to list lines near to the matched line, but not necessarily
@@ -402,14 +388,7 @@ very sensible; some are rather wacky. Some have been on this list for years.
. Breaking loops that match an empty string: perhaps find a way of continuing
if *something* has changed, but this might mean remembering additional data.
"Something" could be a capture value, but then a list of previous values
would be needed to avoid a cycle of changes. Bugzilla #2182.

. The use of \K in assertions is problematic. There was some talk of Perl
banning this, but it hasn't happened. Some problems could be avoided by
not allowing it to set a value before the match start; others by not allowing
it to set a value after the match end. This could be controlled by an option
such as PCRE2_SANE_BACKSLASH_K, for compatibility (or possibly make the sane
behaviour the default and implement PCRE2_INSANE_BACKSLASH_K).
would be needed to avoid a cycle of changes.

. If a function could be written to find 3-character (or other length) fixed
strings, at least one of which must be present for a match, efficient
@@ -417,6 +396,8 @@ very sensible; some are rather wacky. Some have been on this list for years.

. If pcre2grep had --first-line (match only in the first line) it could be
efficiently used to find files "starting with xxx". What about --last-line?
There was also the suggestion of an option for pcre2grep to scan only the
start of a file. I am not keen - this is the job of "head".

. A user requested a means of determining whether a failed match was failed by
the start-of-match optimizations, or by running the match engine. Easy enough
@@ -426,12 +407,14 @@ very sensible; some are rather wacky. Some have been on this list for years.
interpreters? JIT already does some of this, but it may not be worth it for
the interpreters.

. There was a request for a way of re-defining \w (and therefore \W, \b, and
\B). An in-pattern sequence such as (?w=[...]) was suggested. Easiest way
would be simply to inline the class, with lookarounds for \b and \B. Ideally
the setting should last till the end of the group, which means remembering
all previous settings; maybe a fixed amount of stack would do - how deep
would anyone want to nest these things? Bugzilla #2301.
. Redesign handling of class/nclass/xclass because the compile code logic is
currently very contorted and obscure. Also there was a request for a way of
re-defining \w (and therefore \W, \b, and \B). An in-pattern sequence such as
(?w=[...]) was suggested. Easiest way would be simply to inline the class,
with lookarounds for \b and \B. Ideally the setting should last till the end
of the group, which means remembering all previous settings; maybe a fixed
amount of stack would do - how deep would anyone want to nest these things?
See GitHub issue #13 for a compendium of character class issues.

. Recognize the short script names. They are already listed in maint/
Multistage2.py because they are needed for scanning the script extensions
@@ -444,7 +427,16 @@ very sensible; some are rather wacky. Some have been on this list for years.
facility for a length limit in pcre2_config(), and what would be the
encoding?

. Quantified groups with a fixed count currently operate by replicating the
group in the compiled bytecode. This may not really matter in these days of
gigabyte memory, but perhaps another implementation might be considered.
Needs coordination between the interpreters and JIT.

. There are regular requests for variable-length lookbehinds.

. See also any suggestions in the GitHub issues.

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 01 April 2020
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 26 August 2021