Documentation update.

Philip.Hazel 2019-06-03 16:39:20 +00:00
parent 16d47a9cb1
commit dea540877b
1 changed files with 72 additions and 64 deletions

@@ -141,8 +141,9 @@ distribution for a new release.
. Run perltest.sh on the test data for tests 1 and 4. The output should match
the PCRE2 test output, apart from the version identification at the start of
-each test. The other tests are not Perl-compatible (they use various
-PCRE2-specific features or options).
+each test. Sometimes there are other differences in test 4 if PCRE2 and Perl
+are using different Unicode releases. The other tests are not Perl-compatible
+(they use various PCRE2-specific features or options).
. It is possible to test with the emulated memmove() function by undefining
HAVE_MEMMOVE and HAVE_BCOPY in config.h, though I do not do this often.
@@ -155,8 +156,9 @@ distribution for a new release.
systems. For example, on Solaris it is helpful to test using Sun's cc
compiler as a change from gcc. Adding -xarch=v9 to the cc options does a
64-bit test, but it also needs -S 64 for pcre2test to increase the stack size
-for test 2. Since I retired I can no longer do this, but instead I rely on
-putting out release candidates for folks on the pcre-dev list to test.
+for test 2. Since I retired I can no longer do much of this, but instead I
+rely on putting out release candidates for folks on the pcre-dev list to
+test.
. The buildbots at http://buildfarm.opencsw.org/ do some automated testing
of PCRE2 and should be checked before putting out a release.
@@ -285,7 +287,7 @@ very sensible; some are rather wacky. Some have been on this list for years.
to switch this dynamically. It would have to be specified when PCRE2 was
compiled. PCRE2 would then call a function every time it wanted a character.
-. pcre2grep: add -rs for a sorted recurse? Having to store file names and sort
+. pcre2grep: add -rs for a sorted recurse. Having to store file names and sort
them will of course slow it down.
. Someone suggested --disable-callout to save code space when callouts are
@@ -325,10 +327,10 @@ very sensible; some are rather wacky. Some have been on this list for years.
. If Perl ever supports the POSIX notation [[.something.]] PCRE2 should try
to follow.
. Bugzilla #554 requested support for invalid UTF-8 strings.
. A user wanted a way of ignoring all Unicode "mark" characters so that, for
-example "a" followed by an accent would, together, match "a".
+example "a" followed by an accent would, together, match "a". This can only
+be done clumsily at present by using a lookahead such as /(?=a)\X/, which
+works for "combining" characters.
. Perl supports [\N{x}-\N{y}] as a Unicode range, even in EBCDIC. PCRE2
supports \N{U+dd..} everywhere, but not in EBCDIC.
@@ -345,9 +347,6 @@ very sensible; some are rather wacky. Some have been on this list for years.
. Bugzilla #1694 requests backwards searching.
. A callout from pcre2_substitute() that happens after (before?) each
substitution (value = 256?).
. Allow a callout to specify a number of characters to skip. This can be done
compatibly via an extra callout field.
@@ -359,15 +358,12 @@ very sensible; some are rather wacky. Some have been on this list for years.
. A limit on substitutions: a user suggested somehow finding a way of making
match_limit apply to the whole operation instead of each match separately.
. There was a suggestion that Perl should lock out \K in lookarounds. If it
does, PCRE2 should follow.
. Redesign handling of class/nclass/xclass because the compile code logic is
currently very contorted and obscure.
. Some #defines could be replaced with enums to improve robustness.
-. There was a request for and option for pcre2_match() to return the longest
+. There was a request for an option for pcre2_match() to return the longest
match. This would mean searching for all possible matches, of course.
. Perl's /a modifier sets Unicode, but restricts \d etc to ASCII characters,
@@ -417,7 +413,8 @@ very sensible; some are rather wacky. Some have been on this list for years.
to define a bit in the match data, but all three matchers would need work.
. Would inlining "simple" recursions provide a useful performance boost for the
-interpreters? JIT already does some of this.
+interpreters? JIT already does some of this, but it may not be worth it for
+the interpreters.
. There was a request for a way of re-defining \w (and therefore \W, \b, and
\B). An in-pattern sequence such as (?w=[...]) was suggested. Easiest way
@@ -426,7 +423,18 @@ very sensible; some are rather wacky. Some have been on this list for years.
all previous settings; maybe a fixed amount of stack would do - how deep
would anyone want to nest these things? Bugzilla #2301.
+. Recognize the short script names. They are already listed in maint/
+Multistage2.py because they are needed for scanning the script extensions
+file.
+. Use script extensions for \p?
+. A user suggested something like --with-build-info to set a build information
+string that could be retrieved by pcre2_config(). However, there's no
+facility for a length limit in pcre2_config(), and what would be the
+encoding?
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 07 October 2018
+Last updated: 03 June 2019