Documentation update.

2019-06-03 16:39:20 +00:00 · dea540877b
parent 16d47a9cb1
commit dea540877b
1 changed file with 72 additions and 64 deletions
--- a/maint/README
+++ b/maint/README
@@ -141,8 +141,9 @@ distribution for a new release.

 . Run perltest.sh on the test data for tests 1 and 4. The output should match
  the PCRE2 test output, apart from the version identification at the start of
-  each test. The other tests are not Perl-compatible (they use various
-  PCRE2-specific features or options).
+  each test. Sometimes there are other differences in test 4 if PCRE2 and Perl
+  are using different Unicode releases. The other tests are not Perl-compatible
+  (they use various PCRE2-specific features or options).

 . It is possible to test with the emulated memmove() function by undefining
  HAVE_MEMMOVE and HAVE_BCOPY in config.h, though I do not do this often.
@@ -155,8 +156,9 @@ distribution for a new release.
  systems. For example, on Solaris it is helpful to test using Sun's cc
  compiler as a change from gcc. Adding -xarch=v9 to the cc options does a
  64-bit test, but it also needs -S 64 for pcre2test to increase the stack size
-  for test 2. Since I retired I can no longer do this, but instead I rely on
-  putting out release candidates for folks on the pcre-dev list to test.
+  for test 2. Since I retired I can no longer do much of this, but instead I
+  rely on putting out release candidates for folks on the pcre-dev list to
+  test.

 . The buildbots at http://buildfarm.opencsw.org/ do some automated testing
  of PCRE2 and should be checked before putting out a release.
@@ -285,7 +287,7 @@ very sensible; some are rather wacky. Some have been on this list for years.
  to switch this dynamically. It would have to be specified when PCRE2 was
  compiled. PCRE2 would then call a function every time it wanted a character.

-. pcre2grep: add -rs for a sorted recurse? Having to store file names and sort
+. pcre2grep: add -rs for a sorted recurse. Having to store file names and sort
  them will of course slow it down.

 . Someone suggested --disable-callout to save code space when callouts are
@@ -325,10 +327,10 @@ very sensible; some are rather wacky. Some have been on this list for years.
 . If Perl ever supports the POSIX notation [[.something.]] PCRE2 should try
  to follow.

-. Bugzilla #554 requested support for invalid UTF-8 strings.
-
 . A user wanted a way of ignoring all Unicode "mark" characters so that, for
-  example "a" followed by an accent would, together, match "a".
+  example "a" followed by an accent would, together, match "a". This can only
+  be done clumsily at present by using a lookahead such as /(?=a)\X/, which
+  works for "combining" characters.

 . Perl supports [\N{x}-\N{y}] as a Unicode range, even in EBCDIC. PCRE2
  supports \N{U+dd..} everywhere, but not in EBCDIC.
@@ -345,9 +347,6 @@ very sensible; some are rather wacky. Some have been on this list for years.

 . Bugzilla #1694 requests backwards searching.

-. A callout from pcre2_substitute() that happens after (before?) each
-  substitution (value = 256?).
-
 . Allow a callout to specify a number of characters to skip. This can be done
  compatibly via an extra callout field.

@@ -359,15 +358,12 @@ very sensible; some are rather wacky. Some have been on this list for years.
 . A limit on substitutions: a user suggested somehow finding a way of making
  match_limit apply to the whole operation instead of each match separately.

-. There was a suggestion that Perl should lock out \K in lookarounds. If it
-  does, PCRE2 should follow.
-
 . Redesign handling of class/nclass/xclass because the compile code logic is
  currently very contorted and obscure.

 . Some #defines could be replaced with enums to improve robustness.

-. There was a request for and option for pcre2_match() to return the longest 
+. There was a request for an option for pcre2_match() to return the longest
  match. This would mean searching for all possible matches, of course.

 . Perl's /a modifier sets Unicode, but restricts \d etc to ASCII characters,
@@ -417,7 +413,8 @@ very sensible; some are rather wacky. Some have been on this list for years.
  to define a bit in the match data, but all three matchers would need work.

 . Would inlining "simple" recursions provide a useful performance boost for the
-  interpreters? JIT already does some of this.
+  interpreters? JIT already does some of this, but it may not be worth it for
+  the interpreters.

 . There was a request for a way of re-defining \w (and therefore \W, \b, and
  \B). An in-pattern sequence such as (?w=[...]) was suggested. Easiest way
@@ -426,7 +423,18 @@ very sensible; some are rather wacky. Some have been on this list for years.
  all previous settings; maybe a fixed amount of stack would do - how deep
  would anyone want to nest these things? Bugzilla #2301.

+. Recognize the short script names. They are already listed in maint/
+  Multistage2.py because they are needed for scanning the script extensions
+  file.
+
+. Use script extensions for \p?
+
+. A user suggested something like --with-build-info to set a build information
+  string that could be retrieved by pcre2_config(). However, there's no
+  facility for a length limit in pcre2_config(), and what would be the
+  encoding?
+
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 07 October 2018
+Last updated: 03 June 2019
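
A note on the wish-list item about ignoring Unicode "mark" characters: the README's workaround, /(?=a)\X/, matches a whole grapheme cluster that begins with "a". The following is only a rough Python sketch of the same idea, not PCRE2 code. It assumes that "mark" means any character whose Unicode general category starts with "M", which is narrower than what \X treats as part of a cluster, and the function name match_base_with_marks is hypothetical.

```python
import unicodedata


def match_base_with_marks(text, base, start=0):
    """Return `base` plus any Unicode mark characters that follow it.

    Approximates the effect of /(?=a)\\X/ for combining characters.
    Returns None if `text` does not contain `base` at position `start`.
    Assumption: a "mark" is any character in general category M*.
    """
    if not text.startswith(base, start):
        return None
    end = start + len(base)
    # Extend the match over every following mark character (Mn, Mc, Me).
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return text[start:end]


# "a" followed by U+0301 COMBINING ACUTE ACCENT, then ordinary letters.
matched = match_base_with_marks("a\u0301bc", "a")
print(repr(matched))
```

Like the lookahead workaround, this handles only decomposed input. A precomposed character such as U+00E1 does not start with "a", so the text would need NFD normalization (unicodedata.normalize("NFD", text)) before matching.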