Final file tidies for 10.32

Philip.Hazel 2018-09-11 14:27:39 +00:00
parent ab30606b01
commit bf3c7c68ec
14 changed files with 680 additions and 641 deletions


@ -155,37 +155,37 @@ bcopy() doesn't return a result. This feature is now refactored always to call
an emulation function when there is no memmove(). The emulation makes use of an emulation function when there is no memmove(). The emulation makes use of
bcopy() when available. bcopy() when available.
34. When serializing a pattern, set the memctl, executable_jit, and tables 34. When serializing a pattern, set the memctl, executable_jit, and tables
fields (that is, all the fields that contain pointers) to zeros so that the fields (that is, all the fields that contain pointers) to zeros so that the
result of serializing is always the same. These fields are re-set when the result of serializing is always the same. These fields are re-set when the
pattern is deserialized. pattern is deserialized.
35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated 35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
negative class with no characters less than 0x100 followed by a positive class negative class with no characters less than 0x100 followed by a positive class
with only characters less than 0x100, the first class was incorrectly being with only characters less than 0x100, the first class was incorrectly being
auto-possessified, causing incorrect match failures. auto-possessified, causing incorrect match failures.
36. Removed the character type bit ctype_meta, which dates from PCRE1 and is 36. Removed the character type bit ctype_meta, which dates from PCRE1 and is
not used in PCRE2. not used in PCRE2.
37. Tidied up unnecessarily complicated macros used in the escapes table. 37. Tidied up unnecessarily complicated macros used in the escapes table.
38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted 38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted
from distribution tarballs, owing to a typo in Makefile.am which had from distribution tarballs, owing to a typo in Makefile.am which had
testoutput8-16-3 twice. Now fixed. testoutput8-16-3 twice. Now fixed.
39. If the only branch in a conditional subpattern was anchored, the whole 39. If the only branch in a conditional subpattern was anchored, the whole
subpattern was treated as anchored, when it should not have been, since the subpattern was treated as anchored, when it should not have been, since the
assumed empty second branch cannot be anchored. Demonstrated by test patterns assumed empty second branch cannot be anchored. Demonstrated by test patterns
such as /(?(1)^())b/ or /(?(?=^))b/. such as /(?(1)^())b/ or /(?(?=^))b/.
40. A repeated conditional subpattern that could match an empty string was 40. A repeated conditional subpattern that could match an empty string was
always assumed to be unanchored. Now it it checked just like any other always assumed to be unanchored. Now it it checked just like any other
repeated conditional subpattern, and can be found to be anchored if the minimum repeated conditional subpattern, and can be found to be anchored if the minimum
quantifier is one or more. I can't see much use for a repeated anchored quantifier is one or more. I can't see much use for a repeated anchored
pattern, but the behaviour is now consistent. pattern, but the behaviour is now consistent.
41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint 41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint
(for an event that could never occur but you had to have external information (for an event that could never occur but you had to have external information
to know that). to know that).
@ -194,7 +194,7 @@ there was a line that was sufficiently long to cause the input buffer to be
expanded, the variable holding the location of the end of the previous match expanded, the variable holding the location of the end of the previous match
was being adjusted incorrectly, and could cause an overflow warning from a code was being adjusted incorrectly, and could cause an overflow warning from a code
sanitizer. However, as the value is used only to print pending "after" lines sanitizer. However, as the value is used only to print pending "after" lines
when the next match is reached (and there are no such lines in this case) this when the next match is reached (and there are no such lines in this case) this
bug could do no damage. bug could do no damage.

LICENCE

@ -4,11 +4,11 @@ PCRE2 LICENCE
 PCRE2 is a library of functions to support regular expressions whose syntax
 and semantics are as close as possible to those of the Perl 5 language.
-Release 10 of PCRE2 is distributed under the terms of the "BSD" licence, as
-specified below, with one exemption for certain binary redistributions. The
-documentation for PCRE2, supplied in the "doc" directory, is distributed under
-the same terms as the software itself. The data in the testdata directory is
-not copyrighted and is in the public domain.
+Releases 10.00 and above of PCRE2 are distributed under the terms of the "BSD"
+licence, as specified below, with one exemption for certain binary
+redistributions. The documentation for PCRE2, supplied in the "doc" directory,
+is distributed under the same terms as the software itself. The data in the
+testdata directory is not copyrighted and is in the public domain.
 The basic library functions are written in C and are freestanding. Also
 included in the distribution is a just-in-time compiler that can be used to

NEWS

@ -1,11 +1,12 @@
 News about PCRE2 releases
 -------------------------
-Version 10.32 13-August-2018
-----------------------------
+Version 10.32 10-September-2018
+-------------------------------
 This is another mainly bugfix and tidying release with a few minor
-enhancements.
+enhancements. These are the main ones:
 1. pcre2grep now supports the inclusion of binary zeros in patterns that are
 read from files via the -f option.
@ -22,7 +23,7 @@ parameter now applies to pcre2_dfa_match().
 5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
-6. Added support for \N{U+dddd}, but not in EBCDIC environments.
+6. Added support for \N{U+dddd}, but only in Unicode mode.
 7. Added support for (?^) to unset all imnsx options.


@ -10,8 +10,8 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre2_major, [10]) m4_define(pcre2_major, [10])
m4_define(pcre2_minor, [32]) m4_define(pcre2_minor, [32])
m4_define(pcre2_prerelease, [-RC1]) m4_define(pcre2_prerelease, [])
m4_define(pcre2_date, [2018-08-13]) m4_define(pcre2_date, [2018-09-10])
# NOTE: The CMakeLists.txt file searches for the above variables in the first # NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved. # 50 lines of this file. Please update that if the variables above are moved.
@ -839,7 +839,7 @@ AC_SUBST(EXTRA_LIBPCRE2_POSIX_LDFLAGS)
# When we run 'make distcheck', use these arguments. Turning off compiler # When we run 'make distcheck', use these arguments. Turning off compiler
# optimization makes it run faster. # optimization makes it run faster.
DISTCHECK_CONFIGURE_FLAGS="CFLAGS='' CXXFLAGS='' --enable-pcre2-16 --enable-pcre2-32 --enable-jit --enable-utf" DISTCHECK_CONFIGURE_FLAGS="CFLAGS='' CXXFLAGS='' --enable-pcre2-16 --enable-pcre2-32 --enable-jit"
AC_SUBST(DISTCHECK_CONFIGURE_FLAGS) AC_SUBST(DISTCHECK_CONFIGURE_FLAGS)
# Check that, if --enable-pcre2grep-libz or --enable-pcre2grep-libbz2 is # Check that, if --enable-pcre2grep-libz or --enable-pcre2grep-libbz2 is


@ -1804,7 +1804,8 @@ Unicode support (which is the default). If Unicode support is not available,
the use of this option provokes an error. Details of how PCRE2_UTF changes the the use of this option provokes an error. Details of how PCRE2_UTF changes the
behaviour of PCRE2 are given in the behaviour of PCRE2 are given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a> <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
page. page. In particular, note that it changes the way PCRE2_CASELESS handles
characters with code points greater than 127.
<a name="extracompileoptions"></a></P> <a name="extracompileoptions"></a></P>
<br><b> <br><b>
Extra compile options Extra compile options
@ -2776,7 +2777,7 @@ Elements in the ovector that do not correspond to capturing parentheses in the
pattern are never changed. That is, if a pattern contains <i>n</i> capturing pattern are never changed. That is, if a pattern contains <i>n</i> capturing
parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
<b>pcre2_match()</b>. The other elements retain whatever values they previously <b>pcre2_match()</b>. The other elements retain whatever values they previously
had. had. After a failed match attempt, the contents of the ovector are unchanged.
<a name="matchotherdata"></a></P> <a name="matchotherdata"></a></P>
<br><a name="SEC30" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br> <br><a name="SEC30" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
<P> <P>
@ -3192,6 +3193,12 @@ functions from the match context, if provided, or else those that were used to
allocate memory for the compiled code. allocate memory for the compiled code.
</P> </P>
<P> <P>
If an external <i>match_data</i> block is provided, its contents afterwards
are those set by the final call to <b>pcre2_match()</b>, which will have
ended in a matching error. The contents of the ovector within the match data
block may or may not have been changed.
</P>
<P>
The <i>outlengthptr</i> argument must point to a variable that contains the The <i>outlengthptr</i> argument must point to a variable that contains the
length, in code units, of the output buffer. If the function is successful, the length, in code units, of the output buffer. If the function is successful, the
value is updated to contain the length of the new string, excluding the value is updated to contain the length of the new string, excluding the
@ -3658,7 +3665,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 03 August 2018 Last updated: 07 September 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -399,14 +399,15 @@ these escapes are as follows:
 \ddd character with octal code ddd, or backreference
 \o{ddd..} character with octal code ddd..
 \xhh character with hex code hh
-\x{hhh..} character with hex code hhh.. (default mode)
-\N{U+hhh..} character with Unicode code point hhh..
+\x{hhh..} character with hex code hhh..
+\N{U+hhh..} character with Unicode hex code point hhh..
 \uhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 </pre>
+The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
+is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
+\N{name} to specify characters by Unicode name; PCRE2 does not support this.
 Note that when \N is not followed by an opening brace (curly bracket) it has
 an entirely different meaning, matching any character that is not a newline.
-Perl also uses \N{name} to specify characters by Unicode name; PCRE2 does not
-support this.
 </P>
 <P>
 The precise effect of \cx on ASCII characters is as follows: if x is a lower
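Reviewer's sketch (not part of the commit): the hunk above documents that \N{U+hhh..} is recognized only in a UTF mode. The fragment below illustrates that rule with a hypothetical helper, parse_n_escape(), which is not PCRE2 code and does not reflect how the library is implemented. The name, signature, and the choice to return 0 on rejection are all assumptions made for illustration; PCRE2 itself reports a compile error in non-UTF modes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not PCRE2 code. "s" points just past "\N".
   In UTF mode, accepts "{U+hhh..}" and stores the code point in *cp,
   returning the number of characters consumed. Returns 0 if the text is
   not a valid \N{U+...} escape or if utf_mode is zero. */
static size_t parse_n_escape(const char *s, int utf_mode, uint32_t *cp)
{
  size_t i = 3;                 /* index of the first hex digit */
  uint32_t value = 0;

  assert(s != NULL && cp != NULL);
  if (!utf_mode) return 0;      /* only meaningful in a Unicode mode */
  if (s[0] != '{' || s[1] != 'U' || s[2] != '+') return 0;
  if (!isxdigit((unsigned char)s[i])) return 0;   /* need at least one digit */

  for (; isxdigit((unsigned char)s[i]); i++)
    {
    int c = tolower((unsigned char)s[i]);
    value = value * 16 + (uint32_t)(c <= '9' ? c - '0' : c - 'a' + 10);
    if (value > 0x10ffff) return 0;               /* beyond Unicode range */
    }

  if (s[i] != '}') return 0;    /* also rejects \N{name}-style input */
  *cp = value;
  return i + 1;
}
```

Calling parse_n_escape("{U+41}", 1, &cp) would store 0x41 in cp, while the same call with utf_mode set to 0 is rejected, mirroring the "Unicode mode only" wording in the documentation.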
@ -530,7 +531,8 @@ limited to certain values, as follows:
Invalid Unicode code points are all those in the range 0xd800 to 0xdfff (the Invalid Unicode code points are all those in the range 0xd800 to 0xdfff (the
so-called "surrogate" code points). The check for these can be disabled by the so-called "surrogate" code points). The check for these can be disabled by the
caller of <b>pcre2_compile()</b> by setting the option caller of <b>pcre2_compile()</b> by setting the option
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. However, this is possible only in UTF-8
and UTF-32 modes, because these values are not representable in UTF-16.
</P> </P>
<br><b> <br><b>
Escape sequences in character classes Escape sequences in character classes
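Reviewer's sketch (not part of the commit): the added sentence says surrogate code points (0xd800 to 0xdfff) cannot be represented in UTF-16. The reason is that UTF-16 reserves exactly that range for the two halves of a surrogate pair. The fragment below is a generic UTF-16 encoder written for illustration; the function names are assumptions, and this is not how PCRE2 handles the option internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero for code points in the surrogate range 0xd800..0xdfff. */
static int is_surrogate(uint32_t cp)
{
  return cp >= 0xd800 && cp <= 0xdfff;
}

/* Encodes cp as UTF-16 into out[0..1]. Returns the number of code units
   written (1 or 2), or 0 if cp has no UTF-16 representation. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
  assert(out != NULL);
  if (is_surrogate(cp) || cp > 0x10ffff) return 0;
  if (cp < 0x10000)
    {
    out[0] = (uint16_t)cp;      /* BMP character: one code unit */
    return 1;
    }
  cp -= 0x10000;                /* 20 bits split across a surrogate pair */
  out[0] = (uint16_t)(0xd800 | (cp >> 10));
  out[1] = (uint16_t)(0xdc00 | (cp & 0x3ff));
  return 2;
}
```

A lone value such as 0xd800 would be indistinguishable from the first half of a pair, which is why the encoder rejects it. UTF-8 and UTF-32 have no such reserved range, so the check can be relaxed there.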
@ -3595,13 +3597,16 @@ verbs in subroutines is different in some cases.
 an immediate backtrack.
 </P>
 <P>
-(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
-the subroutine match to fail.
+(*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail when
+triggered by being backtracked to in a subpattern called as a subroutine. There
+is then a backtrack at the outer level.
 </P>
 <P>
-(*THEN) skips to the next alternative in the innermost enclosing group within
-the subpattern that has alternatives. If there is no such group within the
-subpattern, (*THEN) causes the subroutine match to fail.
+(*THEN), when triggered, skips to the next alternative in the innermost
+enclosing group within the subpattern that has alternatives (its normal
+behaviour). However, if there is no such group within the subroutine
+subpattern, the subroutine match fails and there is a backtrack at the outer
+level.
 </P>
 <br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
 <P>
@ -3619,7 +3624,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br> <br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 03 August 2018 Last updated: 04 September 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -70,7 +70,7 @@ This table applies to ASCII and Unicode environments.
\ddd character with octal code ddd, or backreference \ddd character with octal code ddd, or backreference
\o{ddd..} character with octal code ddd.. \o{ddd..} character with octal code ddd..
\U "U" if PCRE2_ALT_BSUX is set (otherwise is an error) \U "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
\N{U+hh..} character with Unicode code point hh.. \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
\uhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set) \uhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
\xhh character with hex code hh \xhh character with hex code hh
\x{hh..} character with hex code hh.. \x{hh..} character with hex code hh..
@ -634,7 +634,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br> <br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 01 August 2018 Last updated: 02 September 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -26,7 +26,8 @@ you must call
with the PCRE2_UTF option flag, or the pattern must start with the sequence with the PCRE2_UTF option flag, or the pattern must start with the sequence
(*UTF). When either of these is the case, both the pattern and any subject (*UTF). When either of these is the case, both the pattern and any subject
strings that are matched against it are treated as UTF strings instead of strings that are matched against it are treated as UTF strings instead of
strings of individual one-code-unit characters. strings of individual one-code-unit characters. There are also some other
changes to the way characters are handled, as documented below.
</P> </P>
<P> <P>
If you do not need Unicode support you can build PCRE2 without it, in which If you do not need Unicode support you can build PCRE2 without it, in which
@ -59,6 +60,11 @@ values have to use braced sequences. Unbraced octal code points up to \777 are
also recognized; larger ones can be coded using \o{...}. also recognized; larger ones can be coded using \o{...}.
</P> </P>
<P> <P>
The escape sequence \N{U+&#60;hex digits&#62;} is recognized as another way of
specifying a Unicode character by code point in a UTF mode. It is not allowed
in non-UTF modes.
</P>
<P>
In UTF modes, repeat quantifiers apply to complete UTF characters, not to In UTF modes, repeat quantifiers apply to complete UTF characters, not to
individual code units. individual code units.
</P> </P>
@ -294,9 +300,9 @@ Cambridge, England.
REVISION REVISION
</b><br> </b><br>
<P> <P>
Last updated: 17 May 2017 Last updated: 02 September 2018
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

File diff suppressed because it is too large


@ -1756,7 +1756,7 @@ behaviour of PCRE2 are given in the
.\" HREF .\" HREF
\fBpcre2unicode\fP \fBpcre2unicode\fP
.\" .\"
page. In particular, note that it changes the way PCRE2_CASELESS handles page. In particular, note that it changes the way PCRE2_CASELESS handles
characters with code points greater than 127. characters with code points greater than 127.
. .
. .
@ -3200,9 +3200,9 @@ data block is obtained and freed within this function, using memory management
functions from the match context, if provided, or else those that were used to functions from the match context, if provided, or else those that were used to
allocate memory for the compiled code. allocate memory for the compiled code.
.P .P
If an external \fImatch_data\fP block is provided, its contents afterwards If an external \fImatch_data\fP block is provided, its contents afterwards
are those set by the final call to \fBpcre2_match()\fP, which will have are those set by the final call to \fBpcre2_match()\fP, which will have
ended in a matching error. The contents of the ovector within the match data ended in a matching error. The contents of the ovector within the match data
block may or may not have been changed. block may or may not have been changed.
.P .P
The \fIoutlengthptr\fP argument must point to a variable that contains the The \fIoutlengthptr\fP argument must point to a variable that contains the


@ -3630,7 +3630,7 @@ is then a backtrack at the outer level.
(*THEN), when triggered, skips to the next alternative in the innermost (*THEN), when triggered, skips to the next alternative in the innermost
enclosing group within the subpattern that has alternatives (its normal enclosing group within the subpattern that has alternatives (its normal
behaviour). However, if there is no such group within the subroutine behaviour). However, if there is no such group within the subroutine
subpattern, the subroutine match fails and there is a backtrack at the outer subpattern, the subroutine match fails and there is a backtrack at the outer
level. level.
. .
. .


@ -53,7 +53,7 @@ values have to use braced sequences. Unbraced octal code points up to \e777 are
also recognized; larger ones can be coded using \eo{...}. also recognized; larger ones can be coded using \eo{...}.
.P .P
The escape sequence \eN{U+<hex digits>} is recognized as another way of The escape sequence \eN{U+<hex digits>} is recognized as another way of
specifying a Unicode character by code point in a UTF mode. It is not allowed specifying a Unicode character by code point in a UTF mode. It is not allowed
in non-UTF modes. in non-UTF modes.
.P .P
In UTF modes, repeat quantifiers apply to complete UTF characters, not to In UTF modes, repeat quantifiers apply to complete UTF characters, not to


@ -214,7 +214,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_NAME "PCRE2" #define PACKAGE_NAME "PCRE2"
/* Define to the full name and version of this package. */ /* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE2 10.32-RC1" #define PACKAGE_STRING "PCRE2 10.32"
/* Define to the one symbol short name of this package. */ /* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre2" #define PACKAGE_TARNAME "pcre2"
@ -223,7 +223,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_URL "" #define PACKAGE_URL ""
/* Define to the version of this package. */ /* Define to the version of this package. */
#define PACKAGE_VERSION "10.32-RC1" #define PACKAGE_VERSION "10.32"
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested /* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system parentheses (of any kind) in a pattern. This limits the amount of system
@ -343,7 +343,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif #endif
/* Version number of package */ /* Version number of package */
#define VERSION "10.32-RC1" #define VERSION "10.32"
/* Define to 1 if on MINIX. */ /* Define to 1 if on MINIX. */
/* #undef _MINIX */ /* #undef _MINIX */


@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10 #define PCRE2_MAJOR 10
#define PCRE2_MINOR 32 #define PCRE2_MINOR 32
#define PCRE2_PRERELEASE -RC1 #define PCRE2_PRERELEASE
#define PCRE2_DATE 2018-08-13 #define PCRE2_DATE 2018-09-10
/* For the benefit of systems without stdint.h, an alternative is to use /* For the benefit of systems without stdint.h, an alternative is to use
inttypes.h. The existence of these headers is checked by configure or CMake. */ inttypes.h. The existence of these headers is checked by configure or CMake. */
@ -316,7 +316,7 @@ pcre2_pattern_convert(). */
#define PCRE2_ERROR_INTERNAL_BAD_CODE_IN_SKIP 190 #define PCRE2_ERROR_INTERNAL_BAD_CODE_IN_SKIP 190
#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191 #define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192 #define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
#define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193 #define PCRE2_ERROR_SUPPORTED_ONLY_IN_UNICODE 193
#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194 #define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194
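
Reviewer's sketch (not part of the commit): the first hunk of this header empties PCRE2_PRERELEASE and updates PCRE2_DATE. The fragment below shows, with stand-in DEMO_ macros, how an empty prerelease macro drops out of a version string built by preprocessor stringification. The macro and function names are hypothetical, and the string layout is an assumption chosen for illustration rather than the library's own code.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the real macros; values copied from the hunk above. */
#define DEMO_MAJOR      10
#define DEMO_MINOR      32
#define DEMO_PRERELEASE            /* empty for a final release; was -RC1 */
#define DEMO_DATE       2018-09-10

/* Two-level stringification so that macro arguments are expanded first. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Nonzero when the prerelease tag expands to nothing. */
static int demo_is_final_release(void)
{
  return strlen(XSTR(DEMO_PRERELEASE)) == 0;
}

/* Builds "major.minor<prerelease> date" from adjacent string literals. */
static const char *demo_version(void)
{
  static const char version[] =
    XSTR(DEMO_MAJOR) "." XSTR(DEMO_MINOR) XSTR(DEMO_PRERELEASE)
    " " XSTR(DEMO_DATE);
  assert(sizeof(version) > 1);     /* never an empty string */
  return version;
}
```

With the values above, demo_version() yields "10.32 2018-09-10"; defining DEMO_PRERELEASE as -RC1 instead would give "10.32-RC1 2018-09-10".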