Code tidies for 10.30-RC1 release candidate.

This commit is contained in:
Philip.Hazel 2017-07-19 16:04:15 +00:00
parent e3052af6fd
commit 810d9b6da5
65 changed files with 1320 additions and 1152 deletions

View File

@ -2,8 +2,8 @@ Change Log for PCRE2
--------------------
Version 10.30-DEV 09-March-2017
-------------------------------
Version 10.30-RC1 18-July-2017
------------------------------
1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the stack) for

57
NEWS
View File

@ -1,6 +1,63 @@
News about PCRE2 releases
-------------------------
Version 10.30-RC1 18-July-2017
------------------------------
The full list of changes that includes bugfixes and tidies is, as always, in
ChangeLog. These are the most important new features:
1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the system stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
previously hard-to-do issues. For patterns that have a lot of backtracking, the
heap is now used, and there is explicit limit on the amount, settable by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
but is renamed as "depth limit" (though the old names remain for
compatibility).
There is also a change in the way callouts from pcre2_match() are handled. The
offset_vector field in the callout block is no longer a pointer to the
actual ovector that was passed to the matching function in the match data
block. Instead it points to an internal ovector of a size large enough to hold
all possible captured substrings in the pattern.
2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
the end of the subject.
3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
also supported.
4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
5. Additional compile options in the compile context are now available, and the
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
PCRE2_EXTRA_BAD_ESCAPE_IS LITERAL.
6. The newline type PCRE2_NEWLINE_NUL is now available.
7. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply.
8. The option REG_PEND (a GNU extension) is now available for the POSIX
wrapper. Also there is a new option PCRE2_LITERAL which is used to support
REG_NOSPEC.
9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
is tidier and also fixes some bugs.
10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.
11. There are some experimental functions for converting foreign patterns
(globs and POSIX patterns) into PCRE2 patterns.
Version 10.23 14-February-2017
------------------------------

47
README
View File

@ -198,13 +198,14 @@ library. They are also documented in the pcre2build man page.
or starting a pattern with (*UCP).
. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
of the preceding, or any of the Unicode newline sequences, as indicating the
end of a line. Whatever you specify at build time is the default; the caller
of PCRE2 can change the selection at run time. The default newline indicator
is a single LF character (the Unix standard). You can specify the default
newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
--enable-newline-is-crlf, --enable-newline-is-anycrlf, or
--enable-newline-is-any to the "configure" command, respectively.
of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
character as indicating the end of a line. Whatever you specify at build time
is the default; the caller of PCRE2 can change the selection at run time. The
default newline indicator is a single LF character (the Unix standard). You
can specify the default newline indicator by adding --enable-newline-is-cr,
--enable-newline-is-lf, --enable-newline-is-crlf,
--enable-newline-is-anycrlf, --enable-newline-is-any, or
--enable-newline-is-nul to the "configure" command, respectively.
. By default, the sequence \R in a pattern matches any Unicode line ending
sequence. This is independent of the option specifying what PCRE2 considers
@ -227,15 +228,15 @@ library. They are also documented in the pcre2build man page.
--with-parens-nest-limit=500
. PCRE2 has a counter that can be set to limit the amount of computing resource
it uses when matching a pattern with the Perl-compatible matching function.
If the limit is exceeded during a match, the match fails. The default is ten
million. You can change the default by setting, for example,
it uses when matching a pattern. If the limit is exceeded during a match, the
match fails. The default is ten million. You can change the default by
setting, for example,
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
pcre2_match() can supply their own value. There is more discussion in the
pcre2api man page (search for pcre2_set_match_limit).
pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
discussion in the pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of nested backtracking
during a matching process, which indirectly limits the amount of heap memory
@ -659,9 +660,10 @@ with the perltest.sh script, and test 5 checking PCRE2-specific things.
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.
Test 8 checks some internal offsets and code size features; it is run only when
the default "link size" of 2 is set (in other cases the sizes change) and when
Unicode support is enabled.
Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
@ -671,7 +673,7 @@ Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.
Test 14 contains some special UTF and UCP tests that give different output for
the different widths.
different code unit widths.
Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the intepretive
@ -692,6 +694,9 @@ patterns to a file, and then reloading and checking them.
Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.
Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.
Character tables
----------------
@ -710,7 +715,7 @@ specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale which is set for your system will control the contents of these default
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
@ -765,6 +770,7 @@ The distribution should contain the files listed below.
src/pcre2_compile.c )
src/pcre2_config.c )
src/pcre2_context.c )
src/pcre2_convert.c )
src/pcre2_dfa_match.c )
src/pcre2_error.c )
src/pcre2_find_bracket.c )
@ -804,7 +810,6 @@ The distribution should contain the files listed below.
src/pcre2demo.c simple demonstration of coding calls to PCRE2
src/pcre2grep.c source of a grep utility that uses PCRE2
src/pcre2test.c comprehensive test program
src/pcre2_printint.c part of pcre2test
src/pcre2_jit_test.c JIT test program
(C) Auxiliary files:
@ -869,12 +874,12 @@ The distribution should contain the files listed below.
(E) Auxiliary files for building PCRE2 "by hand"
pcre2.h.generic ) a version of the public PCRE2 header file
src/pcre2.h.generic ) a version of the public PCRE2 header file
) for use in non-"configure" environments
config.h.generic ) a version of config.h for use in non-"configure"
src/config.h.generic ) a version of config.h for use in non-"configure"
) environments
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 June 2017
Last updated: 18 July 2017

View File

@ -830,7 +830,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ $supportBSC -ne 0 ] ; then
echo " Skipped because \C is not disabled"
else
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput23 testtry
$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput23 testtry
checkresult $? 23 ""
fi
fi
@ -839,7 +839,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ "$do24" = yes ] ; then
echo $title24
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput24 testtry
$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput24 testtry
checkresult $? 24 ""
fi
@ -850,7 +850,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ $utf -eq 0 ] ; then
echo " Skipped because UTF-$bits support is not available"
else
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput25 testtry
$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput25 testtry
checkresult $? 25 ""
fi
fi

View File

@ -10,17 +10,17 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre2_major, [10])
m4_define(pcre2_minor, [30])
m4_define(pcre2_prerelease, [-DEV])
m4_define(pcre2_date, [2017-03-05])
m4_define(pcre2_prerelease, [-RC1])
m4_define(pcre2_date, [2017-07-18])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre2_8_version, [5:0:5])
m4_define(libpcre2_16_version, [5:0:5])
m4_define(libpcre2_32_version, [5:0:5])
m4_define(libpcre2_posix_version, [1:1:0])
m4_define(libpcre2_8_version, [6:0:6])
m4_define(libpcre2_16_version, [6:0:6])
m4_define(libpcre2_32_version, [6:0:6])
m4_define(libpcre2_posix_version, [2:0:0])
AC_PREREQ(2.57)
AC_INIT(PCRE2, pcre2_major.pcre2_minor[]pcre2_prerelease, , pcre2)

View File

@ -198,13 +198,14 @@ library. They are also documented in the pcre2build man page.
or starting a pattern with (*UCP).
. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
of the preceding, or any of the Unicode newline sequences, as indicating the
end of a line. Whatever you specify at build time is the default; the caller
of PCRE2 can change the selection at run time. The default newline indicator
is a single LF character (the Unix standard). You can specify the default
newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
--enable-newline-is-crlf, --enable-newline-is-anycrlf, or
--enable-newline-is-any to the "configure" command, respectively.
of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
character as indicating the end of a line. Whatever you specify at build time
is the default; the caller of PCRE2 can change the selection at run time. The
default newline indicator is a single LF character (the Unix standard). You
can specify the default newline indicator by adding --enable-newline-is-cr,
--enable-newline-is-lf, --enable-newline-is-crlf,
--enable-newline-is-anycrlf, --enable-newline-is-any, or
--enable-newline-is-nul to the "configure" command, respectively.
. By default, the sequence \R in a pattern matches any Unicode line ending
sequence. This is independent of the option specifying what PCRE2 considers
@ -227,15 +228,15 @@ library. They are also documented in the pcre2build man page.
--with-parens-nest-limit=500
. PCRE2 has a counter that can be set to limit the amount of computing resource
it uses when matching a pattern with the Perl-compatible matching function.
If the limit is exceeded during a match, the match fails. The default is ten
million. You can change the default by setting, for example,
it uses when matching a pattern. If the limit is exceeded during a match, the
match fails. The default is ten million. You can change the default by
setting, for example,
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
pcre2_match() can supply their own value. There is more discussion in the
pcre2api man page (search for pcre2_set_match_limit).
pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
discussion in the pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of nested backtracking
during a matching process, which indirectly limits the amount of heap memory
@ -659,9 +660,10 @@ with the perltest.sh script, and test 5 checking PCRE2-specific things.
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.
Test 8 checks some internal offsets and code size features; it is run only when
the default "link size" of 2 is set (in other cases the sizes change) and when
Unicode support is enabled.
Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
@ -671,7 +673,7 @@ Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.
Test 14 contains some special UTF and UCP tests that give different output for
the different widths.
different code unit widths.
Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the intepretive
@ -692,6 +694,9 @@ patterns to a file, and then reloading and checking them.
Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.
Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.
Character tables
----------------
@ -710,7 +715,7 @@ specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale which is set for your system will control the contents of these default
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
@ -765,6 +770,7 @@ The distribution should contain the files listed below.
src/pcre2_compile.c )
src/pcre2_config.c )
src/pcre2_context.c )
src/pcre2_convert.c )
src/pcre2_dfa_match.c )
src/pcre2_error.c )
src/pcre2_find_bracket.c )
@ -804,7 +810,6 @@ The distribution should contain the files listed below.
src/pcre2demo.c simple demonstration of coding calls to PCRE2
src/pcre2grep.c source of a grep utility that uses PCRE2
src/pcre2test.c comprehensive test program
src/pcre2_printint.c part of pcre2test
src/pcre2_jit_test.c JIT test program
(C) Auxiliary files:
@ -869,12 +874,12 @@ The distribution should contain the files listed below.
(E) Auxiliary files for building PCRE2 "by hand"
pcre2.h.generic ) a version of the public PCRE2 header file
src/pcre2.h.generic ) a version of the public PCRE2 header file
) for use in non-"configure" environments
config.h.generic ) a version of config.h for use in non-"configure"
src/config.h.generic ) a version of config.h for use in non-"configure"
) environments
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 June 2017
Last updated: 18 July 2017

View File

@ -87,10 +87,10 @@ Options that specify values have names that start with --with.
<br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
<P>
By default, a library called <b>libpcre2-8</b> is built, containing functions
that take string arguments contained in vectors of bytes, interpreted either as
that take string arguments contained in arrays of bytes, interpreted either as
single-byte characters, or UTF-8 strings. You can also build two other
libraries, called <b>libpcre2-16</b> and <b>libpcre2-32</b>, which process
strings that are contained in vectors of 16-bit and 32-bit code units,
strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the <b>configure</b> command:
@ -208,19 +208,23 @@ to the <b>configure</b> command. There is a fourth option, specified by
--enable-newline-is-anycrlf
</pre>
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
indicating a line ending. A fifth option, specified by
<pre>
--enable-newline-is-any
</pre>
causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029).
separator, U+2028), and PS (paragraph separator, U+2029). The final option is
<pre>
--enable-newline-is-nul
</pre>
which causes NUL (binary zero) is set as the default line-ending character.
</P>
<P>
Whatever default line ending convention is selected when PCRE2 is built can be
overridden by applications that use the library. At build time it is
conventional to use the standard for your operating system.
recommended to use the standard for your operating system.
</P>
<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
@ -301,7 +305,9 @@ because the size of each backtracking "frame" depends on the number of
capturing parentheses in a pattern, the amount of heap that is used before the
limit is reached varies from pattern to pattern. This limit was more useful in
versions before 10.30, where function recursion was used for backtracking.
However, as well as applying to <b>pcre2_match()</b>, this limit also controls
</P>
<P>
As well as applying to <b>pcre2_match()</b>, the depth limit also controls
the depth of recursive function calls in <b>pcre2_dfa_match()</b>. These are
used for lookaround assertions, atomic groups, and recursion within patterns.
The limit does not apply to JIT matching.
@ -559,7 +565,7 @@ Cambridge, England.
</P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 June 2017
Last updated: 18 July 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

View File

@ -3487,10 +3487,10 @@ PCRE2 BUILD-TIME OPTIONS
BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
By default, a library called libpcre2-8 is built, containing functions
that take string arguments contained in vectors of bytes, interpreted
that take string arguments contained in arrays of bytes, interpreted
either as single-byte characters, or UTF-8 strings. You can also build
two other libraries, called libpcre2-16 and libpcre2-32, which process
strings that are contained in vectors of 16-bit and 32-bit code units,
strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters
or UTF-16/UTF-32 strings. To build these additional libraries, add one
or both of the following to the configure command:
@ -3609,7 +3609,7 @@ NEWLINE RECOGNITION
--enable-newline-is-anycrlf
which causes PCRE2 to recognize any of the three sequences CR, LF, or
CRLF as indicating a line ending. Finally, a fifth option, specified by
CRLF as indicating a line ending. A fifth option, specified by
--enable-newline-is-any
@ -3617,11 +3617,16 @@ NEWLINE RECOGNITION
newline sequences are the three just mentioned, plus the single charac-
ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
U+0085), LS (line separator, U+2028), and PS (paragraph separator,
U+2029).
U+2029). The final option is
--enable-newline-is-nul
which causes NUL (binary zero) is set as the default line-ending char-
acter.
Whatever default line ending convention is selected when PCRE2 is built
can be overridden by applications that use the library. At build time
it is conventional to use the standard for your operating system.
it is recommended to use the standard for your operating system.
WHAT \R MATCHES
@ -3703,11 +3708,12 @@ LIMITING PCRE2 RESOURCE USAGE
number of capturing parentheses in a pattern, the amount of heap that
is used before the limit is reached varies from pattern to pattern.
This limit was more useful in versions before 10.30, where function
recursion was used for backtracking. However, as well as applying to
pcre2_match(), this limit also controls the depth of recursive function
calls in pcre2_dfa_match(). These are used for lookaround assertions,
atomic groups, and recursion within patterns. The limit does not apply
to JIT matching.
recursion was used for backtracking.
As well as applying to pcre2_match(), the depth limit also controls the
depth of recursive function calls in pcre2_dfa_match(). These are used
for lookaround assertions, atomic groups, and recursion within pat-
terns. The limit does not apply to JIT matching.
CREATING CHARACTER TABLES AT BUILD TIME
@ -3969,7 +3975,7 @@ AUTHOR
REVISION
Last updated: 17 June 2017
Last updated: 18 July 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

View File

@ -1,4 +1,4 @@
.TH PCRE2BUILD 3 "17 June 2017" "PCRE2 10.30"
.TH PCRE2BUILD 3 "18 July 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.
@ -66,10 +66,10 @@ Options that specify values have names that start with --with.
.rs
.sp
By default, a library called \fBlibpcre2-8\fP is built, containing functions
that take string arguments contained in vectors of bytes, interpreted either as
that take string arguments contained in arrays of bytes, interpreted either as
single-byte characters, or UTF-8 strings. You can also build two other
libraries, called \fBlibpcre2-16\fP and \fBlibpcre2-32\fP, which process
strings that are contained in vectors of 16-bit and 32-bit code units,
strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the \fBconfigure\fP command:
@ -197,18 +197,22 @@ to the \fBconfigure\fP command. There is a fourth option, specified by
--enable-newline-is-anycrlf
.sp
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
indicating a line ending. A fifth option, specified by
.sp
--enable-newline-is-any
.sp
causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029).
separator, U+2028), and PS (paragraph separator, U+2029). The final option is
.sp
--enable-newline-is-nul
.sp
which causes NUL (binary zero) is set as the default line-ending character.
.P
Whatever default line ending convention is selected when PCRE2 is built can be
overridden by applications that use the library. At build time it is
conventional to use the standard for your operating system.
recommended to use the standard for your operating system.
.
.
.SH "WHAT \eR MATCHES"
@ -297,7 +301,8 @@ because the size of each backtracking "frame" depends on the number of
capturing parentheses in a pattern, the amount of heap that is used before the
limit is reached varies from pattern to pattern. This limit was more useful in
versions before 10.30, where function recursion was used for backtracking.
However, as well as applying to \fBpcre2_match()\fP, this limit also controls
.P
As well as applying to \fBpcre2_match()\fP, the depth limit also controls
the depth of recursive function calls in \fBpcre2_dfa_match()\fP. These are
used for lookaround assertions, atomic groups, and recursion within patterns.
The limit does not apply to JIT matching.
@ -577,6 +582,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 June 2017
Last updated: 18 July 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

View File

@ -132,6 +132,12 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <zlib.h> header file. */
/* #undef HAVE_ZLIB_H */
/* This limits the amount of memory that pcre2_match() may use while matching
a pattern. The value is in kilobytes. */
#ifndef HEAP_LIMIT
#define HEAP_LIMIT 20000000
#endif
/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
@ -148,7 +154,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif
/* The value of MATCH_LIMIT determines the default number of times the
internal match() function can record a backtrack position during a single
pcre2_match() function can record a backtrack position during a single
matching attempt. There is a runtime interface for setting a different
limit. The limit exists in order to catch runaway regular expressions that
take for ever to determine that they do not match. The default is set very
@ -188,8 +194,8 @@ sure both macros are undefined; an emulation function will then be used. */
/* The value of NEWLINE_DEFAULT determines the default newline character
sequence. PCRE2 client programs can override this by selecting other values
at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), and 5
(ANYCRLF). */
at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), 5
(ANYCRLF), and 6 (NUL). */
#ifndef NEWLINE_DEFAULT
#define NEWLINE_DEFAULT 2
#endif
@ -204,7 +210,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_NAME "PCRE2"
/* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE2 10.30-DEV"
#define PACKAGE_STRING "PCRE2 10.30-RC1"
/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre2"
@ -213,7 +219,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_URL ""
/* Define to the version of this package. */
#define PACKAGE_VERSION "10.30-DEV"
#define PACKAGE_VERSION "10.30-RC1"
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system
@ -261,6 +267,11 @@ sure both macros are undefined; an emulation function will then be used. */
your system. */
/* #undef PTHREAD_CREATE_JOINABLE */
/* Define to any non-zero number to enable support for SELinux compatible
executable memory allocator in JIT. Note that this will have no effect
unless SUPPORT_JIT is also defined. */
/* #undef SLJIT_PROT_EXECUTABLE_ALLOCATOR */
/* Define to 1 if you have the ANSI C header files. */
/* #undef STDC_HEADERS */
@ -328,7 +339,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif
/* Version number of package */
#define VERSION "10.30-DEV"
#define VERSION "10.30-RC1"
/* Define to 1 if on MINIX. */
/* #undef _MINIX */

View File

@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10
#define PCRE2_MINOR 30
#define PCRE2_PRERELEASE -DEV
#define PCRE2_DATE 2017-03-05
#define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2017-07-18
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate

View File

@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10
#define PCRE2_MINOR 30
#define PCRE2_PRERELEASE -DEV
#define PCRE2_DATE 2017-03-05
#define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2017-07-18
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate
@ -138,6 +138,14 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */
#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */
#define PCRE2_EXTENDED_MORE 0x01000000u /* C */
#define PCRE2_LITERAL 0x02000000u /* C */
/* An additional compile options word is available in the compile context. */
#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */
#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */
#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */
#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */
/* These are for pcre2_jit_compile(). */
@ -176,6 +184,16 @@ ignored for pcre2_jit_match(). */
#define PCRE2_NO_JIT 0x00002000u
/* Options for pcre2_pattern_convert(). */
#define PCRE2_CONVERT_UTF 0x00000001u
#define PCRE2_CONVERT_NO_UTF_CHECK 0x00000002u
#define PCRE2_CONVERT_POSIX_BASIC 0x00000004u
#define PCRE2_CONVERT_POSIX_EXTENDED 0x00000008u
#define PCRE2_CONVERT_GLOB 0x00000010u
#define PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR 0x00000030u
#define PCRE2_CONVERT_GLOB_NO_STARSTAR 0x00000050u
/* Newline and \R settings, for use in compile contexts. The newline values
must be kept in step with values set in config.h and both sets must all be
greater than zero. */
@ -185,6 +203,7 @@ greater than zero. */
#define PCRE2_NEWLINE_CRLF 3
#define PCRE2_NEWLINE_ANY 4
#define PCRE2_NEWLINE_ANYCRLF 5
#define PCRE2_NEWLINE_NUL 6
#define PCRE2_BSR_UNICODE 1
#define PCRE2_BSR_ANYCRLF 2
@ -270,6 +289,8 @@ numbers must not be changed. */
#define PCRE2_ERROR_TOOMANYREPLACE (-61)
#define PCRE2_ERROR_BADSERIALIZEDDATA (-62)
#define PCRE2_ERROR_HEAPLIMIT (-63)
#define PCRE2_ERROR_CONVERT_SYNTAX (-64)
/* Request types for pcre2_pattern_info() */
@ -351,6 +372,9 @@ typedef struct pcre2_real_compile_context pcre2_compile_context; \
struct pcre2_real_match_context; \
typedef struct pcre2_real_match_context pcre2_match_context; \
\
struct pcre2_real_convert_context; \
typedef struct pcre2_real_convert_context pcre2_convert_context; \
\
struct pcre2_real_code; \
typedef struct pcre2_real_code pcre2_code; \
\
@ -434,6 +458,8 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_bsr(pcre2_compile_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_character_tables(pcre2_compile_context *, const unsigned char *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_compile_extra_options(pcre2_compile_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_max_pattern_length(pcre2_compile_context *, PCRE2_SIZE); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
@ -466,6 +492,18 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_recursion_memory_management(pcre2_match_context *, \
void *(*)(PCRE2_SIZE, void *), void (*)(void *, void *), void *);
#define PCRE2_CONVERT_CONTEXT_FUNCTIONS \
PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \
*pcre2_convert_context_copy(pcre2_convert_context *); \
PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \
*pcre2_convert_context_create(pcre2_general_context *); \
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
pcre2_convert_context_free(pcre2_convert_context *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_glob_escape(pcre2_convert_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_glob_separator(pcre2_convert_context *, uint32_t);
/* Functions concerned with compiling a pattern to PCRE internal code. */
@ -572,6 +610,16 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
PCRE2_SIZE, PCRE2_UCHAR *, PCRE2_SIZE *);
/* Functions for converting pattern source strings. */
#define PCRE2_CONVERT_FUNCTIONS \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_pattern_convert(PCRE2_SPTR, PCRE2_SIZE, uint32_t, PCRE2_UCHAR **, \
PCRE2_SIZE *, pcre2_convert_context *); \
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
pcre2_converted_pattern_free(PCRE2_UCHAR *);
/* Functions for JIT processing */
#define PCRE2_JIT_FUNCTIONS \
@ -623,6 +671,7 @@ pcre2_compile are called by application code. */
#define pcre2_real_code PCRE2_SUFFIX(pcre2_real_code_)
#define pcre2_real_general_context PCRE2_SUFFIX(pcre2_real_general_context_)
#define pcre2_real_compile_context PCRE2_SUFFIX(pcre2_real_compile_context_)
#define pcre2_real_convert_context PCRE2_SUFFIX(pcre2_real_convert_context_)
#define pcre2_real_match_context PCRE2_SUFFIX(pcre2_real_match_context_)
#define pcre2_real_jit_stack PCRE2_SUFFIX(pcre2_real_jit_stack_)
#define pcre2_real_match_data PCRE2_SUFFIX(pcre2_real_match_data_)
@ -634,6 +683,7 @@ pcre2_compile are called by application code. */
#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_)
#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_)
#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_)
#define pcre2_convert_context PCRE2_SUFFIX(pcre2_convert_context_)
#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_)
#define pcre2_match_data PCRE2_SUFFIX(pcre2_match_data_)
@ -649,6 +699,10 @@ pcre2_compile are called by application code. */
#define pcre2_compile_context_create PCRE2_SUFFIX(pcre2_compile_context_create_)
#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_)
#define pcre2_config PCRE2_SUFFIX(pcre2_config_)
#define pcre2_convert_context_copy PCRE2_SUFFIX(pcre2_convert_context_copy_)
#define pcre2_convert_context_create PCRE2_SUFFIX(pcre2_convert_context_create_)
#define pcre2_convert_context_free PCRE2_SUFFIX(pcre2_convert_context_free_)
#define pcre2_converted_pattern_free PCRE2_SUFFIX(pcre2_converted_pattern_free_)
#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_)
#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_)
#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_)
@ -672,6 +726,7 @@ pcre2_compile are called by application code. */
#define pcre2_match_data_create PCRE2_SUFFIX(pcre2_match_data_create_)
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
#define pcre2_pattern_convert PCRE2_SUFFIX(pcre2_pattern_convert_)
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
@ -680,8 +735,11 @@ pcre2_compile are called by application code. */
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
#define pcre2_set_compile_extra_options PCRE2_SUFFIX(pcre2_set_compile_extra_options_)
#define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_)
#define pcre2_set_depth_limit PCRE2_SUFFIX(pcre2_set_depth_limit_)
#define pcre2_set_glob_escape PCRE2_SUFFIX(pcre2_set_glob_escape_)
#define pcre2_set_glob_separator PCRE2_SUFFIX(pcre2_set_glob_separator_)
#define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_)
#define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_)
#define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_)
@ -716,6 +774,8 @@ PCRE2_STRUCTURE_LIST \
PCRE2_GENERAL_INFO_FUNCTIONS \
PCRE2_GENERAL_CONTEXT_FUNCTIONS \
PCRE2_COMPILE_CONTEXT_FUNCTIONS \
PCRE2_CONVERT_CONTEXT_FUNCTIONS \
PCRE2_CONVERT_FUNCTIONS \
PCRE2_MATCH_CONTEXT_FUNCTIONS \
PCRE2_COMPILE_FUNCTIONS \
PCRE2_PATTERN_INFO_FUNCTIONS \
@ -745,6 +805,7 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
#undef PCRE2_GENERAL_INFO_FUNCTIONS
#undef PCRE2_GENERAL_CONTEXT_FUNCTIONS
#undef PCRE2_COMPILE_CONTEXT_FUNCTIONS
#undef PCRE2_CONVERT_CONTEXT_FUNCTIONS
#undef PCRE2_MATCH_CONTEXT_FUNCTIONS
#undef PCRE2_COMPILE_FUNCTIONS
#undef PCRE2_PATTERN_INFO_FUNCTIONS

View File

@ -4351,7 +4351,7 @@ struct sljit_jump *quit;
struct sljit_jump *partial_quit[2];
sljit_u8 instruction[8];
sljit_s32 tmp1_ind = sljit_get_register_index(TMP1);
sljit_s32 tmp2_ind = sljit_get_register_index(TMP2);
// sljit_s32 tmp2_ind = sljit_get_register_index(TMP2);
sljit_s32 str_ptr_ind = sljit_get_register_index(STR_PTR);
sljit_s32 data_ind = 0;
sljit_s32 tmp_ind = 1;
@ -4376,7 +4376,9 @@ if (common->mode == PCRE2_JIT_COMPLETE)
OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1 | bit));
SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1);
// SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1);
SLJIT_ASSERT(tmp1_ind < 8);
/* MOVD xmm, r/m32 */
instruction[0] = 0x66;

View File

@ -4073,7 +4073,8 @@ else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%
* Show compile extra options *
*************************************************/
/* Called for unsupported POSIX options.
/* Called only for unsupported POSIX options at present, and therefore needed
only when the 8-bit library is being compiled.
Arguments:
options an options word
@ -4083,17 +4084,21 @@ Arguments:
Returns: nothing
*/
#ifdef SUPPORT_PCRE2_8
static void
show_compile_extra_options(uint32_t options, const char *before,
const char *after)
{
if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
else fprintf(outfile, "%s%s%s%s",
else fprintf(outfile, "%s%s%s%s%s%s",
before,
((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
after);
}
#endif

View File

@ -1555,6 +1555,7 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_cmov(struct sljit_compile
sljit_s32 dst_reg,
sljit_s32 src, sljit_sw srcw)
{
(void)srcw; /* To stop compiler warning */
#if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS)
CHECK_ARGUMENT(!(type & ~(0xff | SLJIT_I32_OP)));
CHECK_ARGUMENT((type & 0xff) >= SLJIT_EQUAL && (type & 0xff) <= SLJIT_ORDERED_F64);

11
testdata/testinput1 vendored
View File

@ -95,17 +95,6 @@
aaac
abbbbbbbbbbbac
/^(b+|a){1,2}?bc/
bbc
/^(b*|ba){1,2}?bc/
babc
bbabc
bababc
\= Expect no match
bababbc
babababc
/^(ba|b*){1,2}?bc/
babc
bbabc

15
testdata/testinput2 vendored
View File

@ -5350,4 +5350,19 @@ a)"xI
\= Expect no match
Not a whole line
# Perl gets this wrong, failing to capture 'b' in group 1.
/^(b+|a){1,2}?bc/
bbc
# And again here, for the "babc" subject string.
/^(b*|ba){1,2}?bc/
babc
bbabc
bababc
\= Expect no match
bababbc
babababc
# End of testinput2

21
testdata/testoutput1 vendored
View File

@ -183,27 +183,6 @@ No match
abbbbbbbbbbbac
No match
/^(b+|a){1,2}?bc/
bbc
0: bbc
1: b
/^(b*|ba){1,2}?bc/
babc
0: babc
1: ba
bbabc
0: bbabc
1: ba
bababc
0: bababc
1: ba
\= Expect no match
bababbc
No match
babababc
No match
/^(ba|b*){1,2}?bc/
babc
0: babc

25
testdata/testoutput2 vendored
View File

@ -16300,6 +16300,31 @@ No match
Not a whole line
No match
# Perl gets this wrong, failing to capture 'b' in group 1.
/^(b+|a){1,2}?bc/
bbc
0: bbc
1: b
# And again here, for the "babc" subject string.
/^(b*|ba){1,2}?bc/
babc
0: babc
1: ba
bbabc
0: bbabc
1: ba
bababc
0: bababc
1: ba
\= Expect no match
bababbc
No match
babababc
No match
# End of testinput2
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data

View File

@ -853,10 +853,8 @@ Memory allocation (code space): 28
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated

View File

@ -853,10 +853,8 @@ Memory allocation (code space): 28
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated