Code tidies for 10.30-RC1 release candidate.

Philip.Hazel 2017-07-19 16:04:15 +00:00
parent e3052af6fd
commit 810d9b6da5
65 changed files with 1320 additions and 1152 deletions


@ -2,8 +2,8 @@ Change Log for PCRE2
-------------------- --------------------
Version 10.30-DEV 09-March-2017 Version 10.30-RC1 18-July-2017
------------------------------- ------------------------------
1. The main interpreter, pcre2_match(), has been refactored into a new version 1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the stack) for that does not use recursive function calls (and therefore the stack) for

NEWS

@ -1,6 +1,63 @@
News about PCRE2 releases News about PCRE2 releases
------------------------- -------------------------
Version 10.30-RC1 18-July-2017
------------------------------
The full list of changes that includes bugfixes and tidies is, as always, in
ChangeLog. These are the most important new features:
1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the system stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
previously hard-to-do issues. For patterns that have a lot of backtracking, the
heap is now used, and there is an explicit limit on the amount, settable by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
but is renamed as "depth limit" (though the old names remain for
compatibility).
There is also a change in the way callouts from pcre2_match() are handled. The
offset_vector field in the callout block is no longer a pointer to the
actual ovector that was passed to the matching function in the match data
block. Instead it points to an internal ovector of a size large enough to hold
all possible captured substrings in the pattern.
2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
the end of the subject.
3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
also supported.
4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
5. Additional compile options in the compile context are now available, and the
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
6. The newline type PCRE2_NEWLINE_NUL is now available.
7. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply.
8. The option REG_PEND (a GNU extension) is now available for the POSIX
wrapper. Also there is a new option PCRE2_LITERAL which is used to support
REG_NOSPEC.
9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
is tidier and also fixes some bugs.
10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.
11. There are some experimental functions for converting foreign patterns
(globs and POSIX patterns) into PCRE2 patterns.
Version 10.23 14-February-2017 Version 10.23 14-February-2017
------------------------------ ------------------------------

README

@ -198,13 +198,14 @@ library. They are also documented in the pcre2build man page.
or starting a pattern with (*UCP). or starting a pattern with (*UCP).
. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any . You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
of the preceding, or any of the Unicode newline sequences, as indicating the of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
end of a line. Whatever you specify at build time is the default; the caller character as indicating the end of a line. Whatever you specify at build time
of PCRE2 can change the selection at run time. The default newline indicator is the default; the caller of PCRE2 can change the selection at run time. The
is a single LF character (the Unix standard). You can specify the default default newline indicator is a single LF character (the Unix standard). You
newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf, can specify the default newline indicator by adding --enable-newline-is-cr,
--enable-newline-is-crlf, --enable-newline-is-anycrlf, or --enable-newline-is-lf, --enable-newline-is-crlf,
--enable-newline-is-any to the "configure" command, respectively. --enable-newline-is-anycrlf, --enable-newline-is-any, or
--enable-newline-is-nul to the "configure" command, respectively.
. By default, the sequence \R in a pattern matches any Unicode line ending . By default, the sequence \R in a pattern matches any Unicode line ending
sequence. This is independent of the option specifying what PCRE2 considers sequence. This is independent of the option specifying what PCRE2 considers
@ -227,15 +228,15 @@ library. They are also documented in the pcre2build man page.
--with-parens-nest-limit=500 --with-parens-nest-limit=500
. PCRE2 has a counter that can be set to limit the amount of computing resource . PCRE2 has a counter that can be set to limit the amount of computing resource
it uses when matching a pattern with the Perl-compatible matching function. it uses when matching a pattern. If the limit is exceeded during a match, the
If the limit is exceeded during a match, the match fails. The default is ten match fails. The default is ten million. You can change the default by
million. You can change the default by setting, for example, setting, for example,
--with-match-limit=500000 --with-match-limit=500000
on the "configure" command. This is just the default; individual calls to on the "configure" command. This is just the default; individual calls to
pcre2_match() can supply their own value. There is more discussion in the pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
pcre2api man page (search for pcre2_set_match_limit). discussion in the pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of nested backtracking . There is a separate counter that limits the depth of nested backtracking
during a matching process, which indirectly limits the amount of heap memory during a matching process, which indirectly limits the amount of heap memory
@ -659,9 +660,10 @@ with the perltest.sh script, and test 5 checking PCRE2-specific things.
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively. non-UTF mode and UTF-mode with Unicode property support, respectively.
Test 8 checks some internal offsets and code size features; it is run only when Test 8 checks some internal offsets and code size features, but it is run only
the default "link size" of 2 is set (in other cases the sizes change) and when when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
Unicode support is enabled. 32-bit modes and for different link sizes, so there are different output files
for each mode and link size.
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in 16-bit and 32-bit modes. These are tests that generate different output in
@ -671,7 +673,7 @@ Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes. pcre2_dfa_match() in 16-bit and 32-bit modes.
Test 14 contains some special UTF and UCP tests that give different output for Test 14 contains some special UTF and UCP tests that give different output for
the different widths. different code unit widths.
Test 15 contains a number of tests that must not be run with JIT. They check, Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive among other non-JIT things, the match-limiting features of the interpretive
@ -692,6 +694,9 @@ patterns to a file, and then reloading and checking them.
Tests 21 and 22 test \C support when the use of \C is not locked out, without Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out. and with UTF support, respectively. Test 23 tests \C when it is locked out.
Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.
Character tables Character tables
---------------- ----------------
@ -710,7 +715,7 @@ specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(), character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C islower(), etc. to build the table sources. This means that the default C
locale which is set for your system will control the contents of these default locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to file does not get automatically re-generated. The best way to do this is to
@ -765,6 +770,7 @@ The distribution should contain the files listed below.
src/pcre2_compile.c ) src/pcre2_compile.c )
src/pcre2_config.c ) src/pcre2_config.c )
src/pcre2_context.c ) src/pcre2_context.c )
src/pcre2_convert.c )
src/pcre2_dfa_match.c ) src/pcre2_dfa_match.c )
src/pcre2_error.c ) src/pcre2_error.c )
src/pcre2_find_bracket.c ) src/pcre2_find_bracket.c )
@ -804,7 +810,6 @@ The distribution should contain the files listed below.
src/pcre2demo.c simple demonstration of coding calls to PCRE2 src/pcre2demo.c simple demonstration of coding calls to PCRE2
src/pcre2grep.c source of a grep utility that uses PCRE2 src/pcre2grep.c source of a grep utility that uses PCRE2
src/pcre2test.c comprehensive test program src/pcre2test.c comprehensive test program
src/pcre2_printint.c part of pcre2test
src/pcre2_jit_test.c JIT test program src/pcre2_jit_test.c JIT test program
(C) Auxiliary files: (C) Auxiliary files:
@ -869,12 +874,12 @@ The distribution should contain the files listed below.
(E) Auxiliary files for building PCRE2 "by hand" (E) Auxiliary files for building PCRE2 "by hand"
pcre2.h.generic ) a version of the public PCRE2 header file src/pcre2.h.generic ) a version of the public PCRE2 header file
) for use in non-"configure" environments ) for use in non-"configure" environments
config.h.generic ) a version of config.h for use in non-"configure" src/config.h.generic ) a version of config.h for use in non-"configure"
) environments ) environments
Philip Hazel Philip Hazel
Email local part: ph10 Email local part: ph10
Email domain: cam.ac.uk Email domain: cam.ac.uk
Last updated: 17 June 2017 Last updated: 18 July 2017
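
The README hunks above mention two build-time settings by name: the new --enable-newline-is-nul option and the --with-match-limit value. A minimal sketch of combining them follows. It assumes the command is run from the top of an unpacked PCRE2 source tree; the CONFIGURE_ARGS variable name is invented for this illustration, and 500000 is just the example figure used in the README text.

```shell
# Options named in the README; the limit value is the README's own example.
CONFIGURE_ARGS="--enable-newline-is-nul --with-match-limit=500000"

if [ -x ./configure ]; then
  # Only meaningful inside a PCRE2 source tree that ships a configure script.
  ./configure $CONFIGURE_ARGS
else
  # Outside a source tree, just show what would have been run.
  echo "would run: ./configure $CONFIGURE_ARGS"
fi
```

Both settings only change defaults. As the README says, callers can still select a different newline convention or match limit at run time.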


@ -830,7 +830,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ $supportBSC -ne 0 ] ; then if [ $supportBSC -ne 0 ] ; then
echo " Skipped because \C is not disabled" echo " Skipped because \C is not disabled"
else else
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput23 testtry $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput23 testtry
checkresult $? 23 "" checkresult $? 23 ""
fi fi
fi fi
@ -839,7 +839,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ "$do24" = yes ] ; then if [ "$do24" = yes ] ; then
echo $title24 echo $title24
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput24 testtry $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput24 testtry
checkresult $? 24 "" checkresult $? 24 ""
fi fi
@ -850,7 +850,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ $utf -eq 0 ] ; then if [ $utf -eq 0 ] ; then
echo " Skipped because UTF-$bits support is not available" echo " Skipped because UTF-$bits support is not available"
else else
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput25 testtry $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput25 testtry
checkresult $? 25 "" checkresult $? 25 ""
fi fi
fi fi
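
The three hunks above drop both `${opt:+$vjs}` and `$opt` from the commands for tests 23 to 25. The sketch below shows what the removed `${opt:+$vjs}` expansion did: it yields the value of vjs only when opt is set and non-empty. The value given to vjs and the function name old_style_cmd are hypothetical stand-ins chosen for illustration, not taken from the script.

```shell
# Hypothetical stand-in for the extra valgrind argument held in $vjs.
vjs="--suppressions=example.supp"

# Print the command line that the old form would have produced for a given
# value of opt. The expansions are deliberately unquoted so that empty
# values disappear instead of becoming empty arguments.
old_style_cmd() {
  opt=$1
  set -- valgrind ${opt:+$vjs} ./pcre2test -q $opt testinput
  echo "$*"
}

old_style_cmd ""      # opt empty: neither $vjs nor $opt contributes a word
old_style_cmd "-jit"  # opt non-empty: both $vjs and $opt appear
```

With the new form of the commands, neither word can appear, whatever opt is set to in the enclosing loop.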


@ -10,17 +10,17 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre2_major, [10]) m4_define(pcre2_major, [10])
m4_define(pcre2_minor, [30]) m4_define(pcre2_minor, [30])
m4_define(pcre2_prerelease, [-DEV]) m4_define(pcre2_prerelease, [-RC1])
m4_define(pcre2_date, [2017-03-05]) m4_define(pcre2_date, [2017-07-18])
# NOTE: The CMakeLists.txt file searches for the above variables in the first # NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved. # 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age) # Libtool shared library interface versions (current:revision:age)
m4_define(libpcre2_8_version, [5:0:5]) m4_define(libpcre2_8_version, [6:0:6])
m4_define(libpcre2_16_version, [5:0:5]) m4_define(libpcre2_16_version, [6:0:6])
m4_define(libpcre2_32_version, [5:0:5]) m4_define(libpcre2_32_version, [6:0:6])
m4_define(libpcre2_posix_version, [1:1:0]) m4_define(libpcre2_posix_version, [2:0:0])
AC_PREREQ(2.57) AC_PREREQ(2.57)
AC_INIT(PCRE2, pcre2_major.pcre2_minor[]pcre2_prerelease, , pcre2) AC_INIT(PCRE2, pcre2_major.pcre2_minor[]pcre2_prerelease, , pcre2)
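
The libtool triples changed above are in current:revision:age form. As a rough aid to reading them, the sketch below converts a triple into the version suffix that libtool conventionally gives a shared library file on Linux-style platforms, namely (current - age).age.revision. Other platforms use different naming schemes, and the function name libtool_suffix is made up for this example.

```shell
# Convert "current:revision:age" into the Linux-style file version suffix.
libtool_suffix() {
  # Split the argument on colons; IFS is changed only for this read.
  IFS=: read -r current revision age <<EOF
$1
EOF
  echo "$((current - age)).$age.$revision"
}

libtool_suffix 5:0:5   # old value for the 8-, 16- and 32-bit libraries
libtool_suffix 6:0:6   # new value for the 8-, 16- and 32-bit libraries
libtool_suffix 1:1:0   # old value for the POSIX wrapper
libtool_suffix 2:0:0   # new value for the POSIX wrapper
```

Under this convention the first component stays at 0 when 5:0:5 becomes 6:0:6, because current and age rise together. For the POSIX wrapper, the change from 1:1:0 to 2:0:0 raises current while age stays at 0, so the first component moves from 1 to 2.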


@ -87,10 +87,10 @@ Options that specify values have names that start with --with.
<br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br> <br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
<P> <P>
By default, a library called <b>libpcre2-8</b> is built, containing functions By default, a library called <b>libpcre2-8</b> is built, containing functions
that take string arguments contained in vectors of bytes, interpreted either as that take string arguments contained in arrays of bytes, interpreted either as
single-byte characters, or UTF-8 strings. You can also build two other single-byte characters, or UTF-8 strings. You can also build two other
libraries, called <b>libpcre2-16</b> and <b>libpcre2-32</b>, which process libraries, called <b>libpcre2-16</b> and <b>libpcre2-32</b>, which process
strings that are contained in vectors of 16-bit and 32-bit code units, strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters or respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the <b>configure</b> command: the following to the <b>configure</b> command:
@ -208,19 +208,23 @@ to the <b>configure</b> command. There is a fourth option, specified by
--enable-newline-is-anycrlf --enable-newline-is-anycrlf
</pre> </pre>
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by indicating a line ending. A fifth option, specified by
<pre> <pre>
--enable-newline-is-any --enable-newline-is-any
</pre> </pre>
causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical sequences are the three just mentioned, plus the single characters VT (vertical
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029). separator, U+2028), and PS (paragraph separator, U+2029). The final option is
<pre>
--enable-newline-is-nul
</pre>
which causes NUL (binary zero) to be set as the default line-ending character.
</P> </P>
<P> <P>
Whatever default line ending convention is selected when PCRE2 is built can be Whatever default line ending convention is selected when PCRE2 is built can be
overridden by applications that use the library. At build time it is overridden by applications that use the library. At build time it is
conventional to use the standard for your operating system. recommended to use the standard for your operating system.
</P> </P>
<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br> <br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
<P> <P>
@ -301,7 +305,9 @@ because the size of each backtracking "frame" depends on the number of
capturing parentheses in a pattern, the amount of heap that is used before the capturing parentheses in a pattern, the amount of heap that is used before the
limit is reached varies from pattern to pattern. This limit was more useful in limit is reached varies from pattern to pattern. This limit was more useful in
versions before 10.30, where function recursion was used for backtracking. versions before 10.30, where function recursion was used for backtracking.
However, as well as applying to <b>pcre2_match()</b>, this limit also controls </P>
<P>
As well as applying to <b>pcre2_match()</b>, the depth limit also controls
the depth of recursive function calls in <b>pcre2_dfa_match()</b>. These are the depth of recursive function calls in <b>pcre2_dfa_match()</b>. These are
used for lookaround assertions, atomic groups, and recursion within patterns. used for lookaround assertions, atomic groups, and recursion within patterns.
The limit does not apply to JIT matching. The limit does not apply to JIT matching.
@ -559,7 +565,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br> <br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 17 June 2017 Last updated: 18 July 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>


@ -3487,10 +3487,10 @@ PCRE2 BUILD-TIME OPTIONS
BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
By default, a library called libpcre2-8 is built, containing functions By default, a library called libpcre2-8 is built, containing functions
that take string arguments contained in vectors of bytes, interpreted that take string arguments contained in arrays of bytes, interpreted
either as single-byte characters, or UTF-8 strings. You can also build either as single-byte characters, or UTF-8 strings. You can also build
two other libraries, called libpcre2-16 and libpcre2-32, which process two other libraries, called libpcre2-16 and libpcre2-32, which process
strings that are contained in vectors of 16-bit and 32-bit code units, strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters respectively. These can be interpreted either as single-unit characters
or UTF-16/UTF-32 strings. To build these additional libraries, add one or UTF-16/UTF-32 strings. To build these additional libraries, add one
or both of the following to the configure command: or both of the following to the configure command:
@ -3609,7 +3609,7 @@ NEWLINE RECOGNITION
--enable-newline-is-anycrlf --enable-newline-is-anycrlf
which causes PCRE2 to recognize any of the three sequences CR, LF, or which causes PCRE2 to recognize any of the three sequences CR, LF, or
CRLF as indicating a line ending. Finally, a fifth option, specified by CRLF as indicating a line ending. A fifth option, specified by
--enable-newline-is-any --enable-newline-is-any
@ -3617,97 +3617,103 @@ NEWLINE RECOGNITION
newline sequences are the three just mentioned, plus the single charac- newline sequences are the three just mentioned, plus the single charac-
ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+0085), LS (line separator, U+2028), and PS (paragraph separator,
U+2029). U+2029). The final option is
--enable-newline-is-nul
which causes NUL (binary zero) to be set as the default line-ending
character.
Whatever default line ending convention is selected when PCRE2 is built Whatever default line ending convention is selected when PCRE2 is built
can be overridden by applications that use the library. At build time can be overridden by applications that use the library. At build time
it is conventional to use the standard for your operating system. it is recommended to use the standard for your operating system.
WHAT \R MATCHES WHAT \R MATCHES
By default, the sequence \R in a pattern matches any Unicode newline By default, the sequence \R in a pattern matches any Unicode newline
sequence, independently of what has been selected as the line ending sequence, independently of what has been selected as the line ending
sequence. If you specify sequence. If you specify
--enable-bsr-anycrlf --enable-bsr-anycrlf
the default is changed so that \R matches only CR, LF, or CRLF. What- the default is changed so that \R matches only CR, LF, or CRLF. What-
ever is selected when PCRE2 is built can be overridden by applications ever is selected when PCRE2 is built can be overridden by applications
that use the library. that use the library.
HANDLING VERY LARGE PATTERNS HANDLING VERY LARGE PATTERNS
Within a compiled pattern, offset values are used to point from one Within a compiled pattern, offset values are used to point from one
part to another (for example, from an opening parenthesis to an alter- part to another (for example, from an opening parenthesis to an alter-
nation metacharacter). By default, in the 8-bit and 16-bit libraries, nation metacharacter). By default, in the 8-bit and 16-bit libraries,
two-byte values are used for these offsets, leading to a maximum size two-byte values are used for these offsets, leading to a maximum size
for a compiled pattern of around 64K code units. This is sufficient to for a compiled pattern of around 64K code units. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do handle all but the most gigantic patterns. Nevertheless, some people do
want to process truly enormous patterns, so it is possible to compile want to process truly enormous patterns, so it is possible to compile
PCRE2 to use three-byte or four-byte offsets by adding a setting such PCRE2 to use three-byte or four-byte offsets by adding a setting such
as as
--with-link-size=3 --with-link-size=3
to the configure command. The value given must be 2, 3, or 4. For the to the configure command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. In these libraries, 16-bit library, a value of 3 is rounded up to 4. In these libraries,
using longer offsets slows down the operation of PCRE2 because it has using longer offsets slows down the operation of PCRE2 because it has
to load additional data when handling them. For the 32-bit library the to load additional data when handling them. For the 32-bit library the
value is always 4 and cannot be overridden; the value of --with-link- value is always 4 and cannot be overridden; the value of --with-link-
size is ignored. size is ignored.
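The link-size rules just described can be sketched as a small helper. This is only an illustration: effective_link_size() is a hypothetical name, not a PCRE2 function, and returning -1 for a rejected value is an assumption about how one might report the error.

```c
#include <assert.h>

/* Hypothetical helper (not part of PCRE2): returns the link size in bytes
   that would be in effect for a library with the given code unit width,
   following the rules in the text above. Returns -1 for a value outside
   the documented range of 2, 3, or 4. */
static int effective_link_size(int code_unit_bits, int requested)
{
  assert(code_unit_bits == 8 || code_unit_bits == 16 || code_unit_bits == 32);

  /* The 32-bit library always uses 4; --with-link-size is ignored. */
  if (code_unit_bits == 32) return 4;

  /* The value given must be 2, 3, or 4. */
  if (requested < 2 || requested > 4) return -1;

  /* In the 16-bit library, a value of 3 is rounded up to 4. */
  if (code_unit_bits == 16 && requested == 3) return 4;

  return requested;
}
```

For example, effective_link_size(16, 3) gives 4, while effective_link_size(8, 3) stays at 3.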
LIMITING PCRE2 RESOURCE USAGE LIMITING PCRE2 RESOURCE USAGE
The pcre2_match() function increments a counter each time it goes round The pcre2_match() function increments a counter each time it goes round
its main loop. Putting a limit on this counter controls the amount of its main loop. Putting a limit on this counter controls the amount of
computing resource used by a single call to pcre2_match(). The limit computing resource used by a single call to pcre2_match(). The limit
can be changed at run time, as described in the pcre2api documentation. can be changed at run time, as described in the pcre2api documentation.
The default is 10 million, but this can be changed by adding a setting The default is 10 million, but this can be changed by adding a setting
such as such as
--with-match-limit=500000 --with-match-limit=500000
to the configure command. This setting also applies to the to the configure command. This setting also applies to the
pcre2_dfa_match() matching function, and to JIT matching (though the pcre2_dfa_match() matching function, and to JIT matching (though the
counting is done differently). counting is done differently).
The pcre2_match() function starts out using a 20K vector on the system The pcre2_match() function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking stack to record backtracking points. The more nested backtracking
points there are (that is, the deeper the search tree), the more memory points there are (that is, the deeper the search tree), the more memory
is needed. If the initial vector is not large enough, heap memory is is needed. If the initial vector is not large enough, heap memory is
used, up to a certain limit, which is specified in kilobytes. The limit used, up to a certain limit, which is specified in kilobytes. The limit
can be changed at run time, as described in the pcre2api documentation. can be changed at run time, as described in the pcre2api documentation.
The default limit (in effect unlimited) is 20 million. You can change The default limit (in effect unlimited) is 20 million. You can change
this by a setting such as this by a setting such as
--with-heap-limit=500 --with-heap-limit=500
which limits the amount of heap to 500 kilobytes. This limit applies which limits the amount of heap to 500 kilobytes. This limit applies
only to interpretive matching in pcre2_match(). It does not apply when only to interpretive matching in pcre2_match(). It does not apply when
JIT (which has its own memory arrangements) is used, nor does it apply JIT (which has its own memory arrangements) is used, nor does it apply
to pcre2_dfa_match(). to pcre2_dfa_match().
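Because the heap limit is given in kilobytes, the default of 20 million is easy to misread. The arithmetic below is a rough sketch that assumes one kilobyte means 1024 bytes; the helper names and the frame size in the usage note are illustrative only, and the initial 20K vector on the system stack is not counted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a "kilobyte" in the heap limit means 1024 bytes. */
#define BYTES_PER_KILOBYTE 1024u

/* Hypothetical helper: converts a heap limit in kilobytes to bytes. */
static uint64_t heap_limit_in_bytes(uint32_t limit_kilobytes)
{
  return (uint64_t)limit_kilobytes * BYTES_PER_KILOBYTE;
}

/* Hypothetical helper: how many backtracking frames of a given size fit
   within the heap limit. The real frame size varies from pattern to
   pattern. */
static uint64_t frames_within_heap_limit(uint32_t limit_kilobytes,
  uint64_t frame_bytes)
{
  assert(frame_bytes > 0);  /* avoid division by zero */
  return heap_limit_in_bytes(limit_kilobytes) / frame_bytes;
}
```

With --with-heap-limit=500 this gives 512000 bytes, which would hold 4000 frames of an illustrative 128 bytes each. The default of 20000000 comes to 20480000000 bytes, which is why it is in effect unlimited.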
You can also explicitly limit the depth of nested backtracking in the You can also explicitly limit the depth of nested backtracking in the
pcre2_match() interpreter. This limit defaults to the value that is set pcre2_match() interpreter. This limit defaults to the value that is set
for --with-match-limit. You can set a lower default limit by adding, for --with-match-limit. You can set a lower default limit by adding,
for example, for example,
--with-match-limit-depth=10000 --with-match-limit-depth=10000
to the configure command. This value can be overridden at run time. to the configure command. This value can be overridden at run time.
This depth limit indirectly limits the amount of heap memory that is This depth limit indirectly limits the amount of heap memory that is
used, but because the size of each backtracking "frame" depends on the used, but because the size of each backtracking "frame" depends on the
number of capturing parentheses in a pattern, the amount of heap that number of capturing parentheses in a pattern, the amount of heap that
is used before the limit is reached varies from pattern to pattern. is used before the limit is reached varies from pattern to pattern.
This limit was more useful in versions before 10.30, where function This limit was more useful in versions before 10.30, where function
recursion was used for backtracking. However, as well as applying to recursion was used for backtracking.
pcre2_match(), this limit also controls the depth of recursive function
calls in pcre2_dfa_match(). These are used for lookaround assertions, As well as applying to pcre2_match(), the depth limit also controls the
atomic groups, and recursion within patterns. The limit does not apply depth of recursive function calls in pcre2_dfa_match(). These are used
to JIT matching. for lookaround assertions, atomic groups, and recursion within pat-
terns. The limit does not apply to JIT matching.
CREATING CHARACTER TABLES AT BUILD TIME CREATING CHARACTER TABLES AT BUILD TIME
@ -3969,7 +3975,7 @@ AUTHOR
REVISION REVISION
Last updated: 17 June 2017 Last updated: 18 July 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2BUILD 3 "17 June 2017" "PCRE2 10.30" .TH PCRE2BUILD 3 "18 July 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
. .
@ -66,10 +66,10 @@ Options that specify values have names that start with --with.
.rs .rs
.sp .sp
By default, a library called \fBlibpcre2-8\fP is built, containing functions By default, a library called \fBlibpcre2-8\fP is built, containing functions
that take string arguments contained in vectors of bytes, interpreted either as that take string arguments contained in arrays of bytes, interpreted either as
single-byte characters, or UTF-8 strings. You can also build two other single-byte characters, or UTF-8 strings. You can also build two other
libraries, called \fBlibpcre2-16\fP and \fBlibpcre2-32\fP, which process libraries, called \fBlibpcre2-16\fP and \fBlibpcre2-32\fP, which process
strings that are contained in vectors of 16-bit and 32-bit code units, strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters or respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the \fBconfigure\fP command: the following to the \fBconfigure\fP command:
@ -197,18 +197,22 @@ to the \fBconfigure\fP command. There is a fourth option, specified by
--enable-newline-is-anycrlf --enable-newline-is-anycrlf
.sp .sp
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by indicating a line ending. A fifth option, specified by
.sp .sp
--enable-newline-is-any --enable-newline-is-any
.sp .sp
causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical sequences are the three just mentioned, plus the single characters VT (vertical
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029). separator, U+2028), and PS (paragraph separator, U+2029). The final option is
.sp
--enable-newline-is-nul
.sp
which causes NUL (binary zero) to be set as the default line-ending character.
.P .P
Whatever default line ending convention is selected when PCRE2 is built can be Whatever default line ending convention is selected when PCRE2 is built can be
overridden by applications that use the library. At build time it is overridden by applications that use the library. At build time it is
conventional to use the standard for your operating system. recommended to use the standard for your operating system.
. .
. .
.SH "WHAT \eR MATCHES" .SH "WHAT \eR MATCHES"
@ -297,7 +301,8 @@ because the size of each backtracking "frame" depends on the number of
capturing parentheses in a pattern, the amount of heap that is used before the capturing parentheses in a pattern, the amount of heap that is used before the
limit is reached varies from pattern to pattern. This limit was more useful in limit is reached varies from pattern to pattern. This limit was more useful in
versions before 10.30, where function recursion was used for backtracking. versions before 10.30, where function recursion was used for backtracking.
However, as well as applying to \fBpcre2_match()\fP, this limit also controls .P
As well as applying to \fBpcre2_match()\fP, the depth limit also controls
the depth of recursive function calls in \fBpcre2_dfa_match()\fP. These are the depth of recursive function calls in \fBpcre2_dfa_match()\fP. These are
used for lookaround assertions, atomic groups, and recursion within patterns. used for lookaround assertions, atomic groups, and recursion within patterns.
The limit does not apply to JIT matching. The limit does not apply to JIT matching.
@ -577,6 +582,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 17 June 2017 Last updated: 18 July 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi


@ -132,6 +132,12 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <zlib.h> header file. */ /* Define to 1 if you have the <zlib.h> header file. */
/* #undef HAVE_ZLIB_H */ /* #undef HAVE_ZLIB_H */
/* This limits the amount of memory that pcre2_match() may use while matching
a pattern. The value is in kilobytes. */
#ifndef HEAP_LIMIT
#define HEAP_LIMIT 20000000
#endif
/* The value of LINK_SIZE determines the number of bytes used to store links /* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases. compiled patterns up to 64K long. This covers the vast majority of cases.
@ -148,7 +154,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif #endif
/* The value of MATCH_LIMIT determines the default number of times the /* The value of MATCH_LIMIT determines the default number of times the
internal match() function can record a backtrack position during a single pcre2_match() function can record a backtrack position during a single
matching attempt. There is a runtime interface for setting a different matching attempt. There is a runtime interface for setting a different
limit. The limit exists in order to catch runaway regular expressions that limit. The limit exists in order to catch runaway regular expressions that
take for ever to determine that they do not match. The default is set very take for ever to determine that they do not match. The default is set very
@ -188,8 +194,8 @@ sure both macros are undefined; an emulation function will then be used. */
/* The value of NEWLINE_DEFAULT determines the default newline character /* The value of NEWLINE_DEFAULT determines the default newline character
sequence. PCRE2 client programs can override this by selecting other values sequence. PCRE2 client programs can override this by selecting other values
at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), and 5 at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), 5
(ANYCRLF). */ (ANYCRLF), and 6 (NUL). */
#ifndef NEWLINE_DEFAULT #ifndef NEWLINE_DEFAULT
#define NEWLINE_DEFAULT 2 #define NEWLINE_DEFAULT 2
#endif #endif
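The NEWLINE_DEFAULT values listed in the comment above, including the new value 6, can be tabulated as follows. This is a sketch for illustration; both function names are hypothetical and do not appear in PCRE2.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if value is one of the documented
   NEWLINE_DEFAULT settings, 1 (CR) through 6 (NUL). */
static int is_valid_newline_default(int value)
{
  return value >= 1 && value <= 6;
}

/* Hypothetical helper: maps a NEWLINE_DEFAULT value to the short name
   used in the config.h comment. The caller must pass a valid value. */
static const char *newline_default_name(int value)
{
  static const char *const names[] = {
    "CR", "LF", "CRLF", "ANY", "ANYCRLF", "NUL"
  };
  assert(is_valid_newline_default(value));
  return names[value - 1];
}
```

The default setting of 2 maps to LF, and the value 6 added in this commit maps to NUL.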
@ -204,7 +210,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_NAME "PCRE2" #define PACKAGE_NAME "PCRE2"
/* Define to the full name and version of this package. */ /* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE2 10.30-DEV" #define PACKAGE_STRING "PCRE2 10.30-RC1"
/* Define to the one symbol short name of this package. */ /* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre2" #define PACKAGE_TARNAME "pcre2"
@ -213,7 +219,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_URL "" #define PACKAGE_URL ""
/* Define to the version of this package. */ /* Define to the version of this package. */
#define PACKAGE_VERSION "10.30-DEV" #define PACKAGE_VERSION "10.30-RC1"
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested /* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system parentheses (of any kind) in a pattern. This limits the amount of system
@ -261,6 +267,11 @@ sure both macros are undefined; an emulation function will then be used. */
your system. */ your system. */
/* #undef PTHREAD_CREATE_JOINABLE */ /* #undef PTHREAD_CREATE_JOINABLE */
/* Define to any non-zero number to enable support for SELinux compatible
executable memory allocator in JIT. Note that this will have no effect
unless SUPPORT_JIT is also defined. */
/* #undef SLJIT_PROT_EXECUTABLE_ALLOCATOR */
/* Define to 1 if you have the ANSI C header files. */ /* Define to 1 if you have the ANSI C header files. */
/* #undef STDC_HEADERS */ /* #undef STDC_HEADERS */
@ -328,7 +339,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif #endif
/* Version number of package */ /* Version number of package */
#define VERSION "10.30-DEV" #define VERSION "10.30-RC1"
/* Define to 1 if on MINIX. */ /* Define to 1 if on MINIX. */
/* #undef _MINIX */ /* #undef _MINIX */


@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10 #define PCRE2_MAJOR 10
#define PCRE2_MINOR 30 #define PCRE2_MINOR 30
#define PCRE2_PRERELEASE -DEV #define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2017-03-05 #define PCRE2_DATE 2017-07-18
/* When an application links to a PCRE DLL in Windows, the symbols that are /* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate imported have to be identified as such. When building PCRE2, the appropriate


@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10 #define PCRE2_MAJOR 10
#define PCRE2_MINOR 30 #define PCRE2_MINOR 30
#define PCRE2_PRERELEASE -DEV #define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2017-03-05 #define PCRE2_DATE 2017-07-18
/* When an application links to a PCRE DLL in Windows, the symbols that are /* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate imported have to be identified as such. When building PCRE2, the appropriate
@ -138,6 +138,14 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */ #define PCRE2_ALT_VERBNAMES 0x00400000u /* C */
#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */ #define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */
#define PCRE2_EXTENDED_MORE 0x01000000u /* C */ #define PCRE2_EXTENDED_MORE 0x01000000u /* C */
#define PCRE2_LITERAL 0x02000000u /* C */
/* An additional compile options word is available in the compile context. */
#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */
#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */
#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */
#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */
/* These are for pcre2_jit_compile(). */ /* These are for pcre2_jit_compile(). */
@ -176,6 +184,16 @@ ignored for pcre2_jit_match(). */
#define PCRE2_NO_JIT 0x00002000u #define PCRE2_NO_JIT 0x00002000u
/* Options for pcre2_pattern_convert(). */
#define PCRE2_CONVERT_UTF 0x00000001u
#define PCRE2_CONVERT_NO_UTF_CHECK 0x00000002u
#define PCRE2_CONVERT_POSIX_BASIC 0x00000004u
#define PCRE2_CONVERT_POSIX_EXTENDED 0x00000008u
#define PCRE2_CONVERT_GLOB 0x00000010u
#define PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR 0x00000030u
#define PCRE2_CONVERT_GLOB_NO_STARSTAR 0x00000050u
/* Newline and \R settings, for use in compile contexts. The newline values /* Newline and \R settings, for use in compile contexts. The newline values
must be kept in step with values set in config.h and both sets must all be must be kept in step with values set in config.h and both sets must all be
greater than zero. */ greater than zero. */
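One detail of the pcre2_pattern_convert() option values in this hunk is that both glob sub-options (0x30 and 0x50) already contain the PCRE2_CONVERT_GLOB bit (0x10). The sketch below uses local copies of the values with an EX_ prefix so that it stands alone; the helper names are hypothetical, and how the library itself tests these bits is not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the values added to pcre2.h in this commit. */
#define EX_CONVERT_GLOB                    0x00000010u
#define EX_CONVERT_GLOB_NO_WILD_SEPARATOR  0x00000030u
#define EX_CONVERT_GLOB_NO_STARSTAR        0x00000050u

/* Hypothetical helper: nonzero if any form of glob conversion is
   requested, because every glob sub-option includes the base bit. */
static int requests_glob(uint32_t options)
{
  return (options & EX_CONVERT_GLOB) != 0;
}

/* Hypothetical helper: the bits that distinguish the glob sub-options,
   with the shared base bit masked off. */
static uint32_t glob_modifier_bits(uint32_t options)
{
  uint32_t all = EX_CONVERT_GLOB_NO_WILD_SEPARATOR | EX_CONVERT_GLOB_NO_STARSTAR;
  return options & all & ~EX_CONVERT_GLOB;
}

/* Documents the relationship between the three values. */
static void check_glob_bit_layout(void)
{
  assert((EX_CONVERT_GLOB_NO_WILD_SEPARATOR & EX_CONVERT_GLOB) != 0);
  assert((EX_CONVERT_GLOB_NO_STARSTAR & EX_CONVERT_GLOB) != 0);
}
```

A caller that passes only the NO_STARSTAR value therefore still requests glob conversion, with 0x40 as the distinguishing bit.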
@ -185,6 +203,7 @@ greater than zero. */
#define PCRE2_NEWLINE_CRLF 3 #define PCRE2_NEWLINE_CRLF 3
#define PCRE2_NEWLINE_ANY 4 #define PCRE2_NEWLINE_ANY 4
#define PCRE2_NEWLINE_ANYCRLF 5 #define PCRE2_NEWLINE_ANYCRLF 5
#define PCRE2_NEWLINE_NUL 6
#define PCRE2_BSR_UNICODE 1 #define PCRE2_BSR_UNICODE 1
#define PCRE2_BSR_ANYCRLF 2 #define PCRE2_BSR_ANYCRLF 2
@ -270,6 +289,8 @@ numbers must not be changed. */
#define PCRE2_ERROR_TOOMANYREPLACE (-61) #define PCRE2_ERROR_TOOMANYREPLACE (-61)
#define PCRE2_ERROR_BADSERIALIZEDDATA (-62) #define PCRE2_ERROR_BADSERIALIZEDDATA (-62)
#define PCRE2_ERROR_HEAPLIMIT (-63) #define PCRE2_ERROR_HEAPLIMIT (-63)
#define PCRE2_ERROR_CONVERT_SYNTAX (-64)
/* Request types for pcre2_pattern_info() */ /* Request types for pcre2_pattern_info() */
@ -351,6 +372,9 @@ typedef struct pcre2_real_compile_context pcre2_compile_context; \
struct pcre2_real_match_context; \ struct pcre2_real_match_context; \
typedef struct pcre2_real_match_context pcre2_match_context; \ typedef struct pcre2_real_match_context pcre2_match_context; \
\ \
struct pcre2_real_convert_context; \
typedef struct pcre2_real_convert_context pcre2_convert_context; \
\
struct pcre2_real_code; \ struct pcre2_real_code; \
typedef struct pcre2_real_code pcre2_code; \ typedef struct pcre2_real_code pcre2_code; \
\ \
@ -434,6 +458,8 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_bsr(pcre2_compile_context *, uint32_t); \ pcre2_set_bsr(pcre2_compile_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_character_tables(pcre2_compile_context *, const unsigned char *); \ pcre2_set_character_tables(pcre2_compile_context *, const unsigned char *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_compile_extra_options(pcre2_compile_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_max_pattern_length(pcre2_compile_context *, PCRE2_SIZE); \ pcre2_set_max_pattern_length(pcre2_compile_context *, PCRE2_SIZE); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
@ -466,6 +492,18 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_recursion_memory_management(pcre2_match_context *, \ pcre2_set_recursion_memory_management(pcre2_match_context *, \
void *(*)(PCRE2_SIZE, void *), void (*)(void *, void *), void *); void *(*)(PCRE2_SIZE, void *), void (*)(void *, void *), void *);
#define PCRE2_CONVERT_CONTEXT_FUNCTIONS \
PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \
*pcre2_convert_context_copy(pcre2_convert_context *); \
PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \
*pcre2_convert_context_create(pcre2_general_context *); \
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
pcre2_convert_context_free(pcre2_convert_context *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_glob_escape(pcre2_convert_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_glob_separator(pcre2_convert_context *, uint32_t);
/* Functions concerned with compiling a pattern to PCRE internal code. */ /* Functions concerned with compiling a pattern to PCRE internal code. */
@ -572,6 +610,16 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
PCRE2_SIZE, PCRE2_UCHAR *, PCRE2_SIZE *); PCRE2_SIZE, PCRE2_UCHAR *, PCRE2_SIZE *);
/* Functions for converting pattern source strings. */
#define PCRE2_CONVERT_FUNCTIONS \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_pattern_convert(PCRE2_SPTR, PCRE2_SIZE, uint32_t, PCRE2_UCHAR **, \
PCRE2_SIZE *, pcre2_convert_context *); \
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
pcre2_converted_pattern_free(PCRE2_UCHAR *);
/* Functions for JIT processing */ /* Functions for JIT processing */
#define PCRE2_JIT_FUNCTIONS \ #define PCRE2_JIT_FUNCTIONS \
@ -623,6 +671,7 @@ pcre2_compile are called by application code. */
#define pcre2_real_code PCRE2_SUFFIX(pcre2_real_code_) #define pcre2_real_code PCRE2_SUFFIX(pcre2_real_code_)
#define pcre2_real_general_context PCRE2_SUFFIX(pcre2_real_general_context_) #define pcre2_real_general_context PCRE2_SUFFIX(pcre2_real_general_context_)
#define pcre2_real_compile_context PCRE2_SUFFIX(pcre2_real_compile_context_) #define pcre2_real_compile_context PCRE2_SUFFIX(pcre2_real_compile_context_)
#define pcre2_real_convert_context PCRE2_SUFFIX(pcre2_real_convert_context_)
#define pcre2_real_match_context PCRE2_SUFFIX(pcre2_real_match_context_) #define pcre2_real_match_context PCRE2_SUFFIX(pcre2_real_match_context_)
#define pcre2_real_jit_stack PCRE2_SUFFIX(pcre2_real_jit_stack_) #define pcre2_real_jit_stack PCRE2_SUFFIX(pcre2_real_jit_stack_)
#define pcre2_real_match_data PCRE2_SUFFIX(pcre2_real_match_data_) #define pcre2_real_match_data PCRE2_SUFFIX(pcre2_real_match_data_)
@ -634,6 +683,7 @@ pcre2_compile are called by application code. */
#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_) #define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_)
#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_) #define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_)
#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_) #define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_)
#define pcre2_convert_context PCRE2_SUFFIX(pcre2_convert_context_)
#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_) #define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_)
#define pcre2_match_data PCRE2_SUFFIX(pcre2_match_data_) #define pcre2_match_data PCRE2_SUFFIX(pcre2_match_data_)
@ -649,6 +699,10 @@ pcre2_compile are called by application code. */
#define pcre2_compile_context_create PCRE2_SUFFIX(pcre2_compile_context_create_) #define pcre2_compile_context_create PCRE2_SUFFIX(pcre2_compile_context_create_)
#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_) #define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_)
#define pcre2_config PCRE2_SUFFIX(pcre2_config_) #define pcre2_config PCRE2_SUFFIX(pcre2_config_)
#define pcre2_convert_context_copy PCRE2_SUFFIX(pcre2_convert_context_copy_)
#define pcre2_convert_context_create PCRE2_SUFFIX(pcre2_convert_context_create_)
#define pcre2_convert_context_free PCRE2_SUFFIX(pcre2_convert_context_free_)
#define pcre2_converted_pattern_free PCRE2_SUFFIX(pcre2_converted_pattern_free_)
#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_) #define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_)
#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_) #define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_)
#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_) #define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_)
@ -672,6 +726,7 @@ pcre2_compile are called by application code. */
#define pcre2_match_data_create PCRE2_SUFFIX(pcre2_match_data_create_) #define pcre2_match_data_create PCRE2_SUFFIX(pcre2_match_data_create_)
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_) #define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_) #define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
#define pcre2_pattern_convert PCRE2_SUFFIX(pcre2_pattern_convert_)
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_) #define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_) #define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_) #define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
@ -680,8 +735,11 @@ pcre2_compile are called by application code. */
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_) #define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_) #define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_) #define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
#define pcre2_set_compile_extra_options PCRE2_SUFFIX(pcre2_set_compile_extra_options_)
#define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_) #define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_)
#define pcre2_set_depth_limit PCRE2_SUFFIX(pcre2_set_depth_limit_) #define pcre2_set_depth_limit PCRE2_SUFFIX(pcre2_set_depth_limit_)
#define pcre2_set_glob_escape PCRE2_SUFFIX(pcre2_set_glob_escape_)
#define pcre2_set_glob_separator PCRE2_SUFFIX(pcre2_set_glob_separator_)
#define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_) #define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_)
#define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_) #define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_)
#define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_) #define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_)
@ -716,6 +774,8 @@ PCRE2_STRUCTURE_LIST \
PCRE2_GENERAL_INFO_FUNCTIONS \ PCRE2_GENERAL_INFO_FUNCTIONS \
PCRE2_GENERAL_CONTEXT_FUNCTIONS \ PCRE2_GENERAL_CONTEXT_FUNCTIONS \
PCRE2_COMPILE_CONTEXT_FUNCTIONS \ PCRE2_COMPILE_CONTEXT_FUNCTIONS \
PCRE2_CONVERT_CONTEXT_FUNCTIONS \
PCRE2_CONVERT_FUNCTIONS \
PCRE2_MATCH_CONTEXT_FUNCTIONS \ PCRE2_MATCH_CONTEXT_FUNCTIONS \
PCRE2_COMPILE_FUNCTIONS \ PCRE2_COMPILE_FUNCTIONS \
PCRE2_PATTERN_INFO_FUNCTIONS \ PCRE2_PATTERN_INFO_FUNCTIONS \
@ -745,6 +805,7 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
#undef PCRE2_GENERAL_INFO_FUNCTIONS #undef PCRE2_GENERAL_INFO_FUNCTIONS
#undef PCRE2_GENERAL_CONTEXT_FUNCTIONS #undef PCRE2_GENERAL_CONTEXT_FUNCTIONS
#undef PCRE2_COMPILE_CONTEXT_FUNCTIONS #undef PCRE2_COMPILE_CONTEXT_FUNCTIONS
#undef PCRE2_CONVERT_CONTEXT_FUNCTIONS
#undef PCRE2_MATCH_CONTEXT_FUNCTIONS #undef PCRE2_MATCH_CONTEXT_FUNCTIONS
#undef PCRE2_COMPILE_FUNCTIONS #undef PCRE2_COMPILE_FUNCTIONS
#undef PCRE2_PATTERN_INFO_FUNCTIONS #undef PCRE2_PATTERN_INFO_FUNCTIONS


@ -4351,7 +4351,7 @@ struct sljit_jump *quit;
struct sljit_jump *partial_quit[2]; struct sljit_jump *partial_quit[2];
sljit_u8 instruction[8]; sljit_u8 instruction[8];
sljit_s32 tmp1_ind = sljit_get_register_index(TMP1); sljit_s32 tmp1_ind = sljit_get_register_index(TMP1);
sljit_s32 tmp2_ind = sljit_get_register_index(TMP2); // sljit_s32 tmp2_ind = sljit_get_register_index(TMP2);
sljit_s32 str_ptr_ind = sljit_get_register_index(STR_PTR); sljit_s32 str_ptr_ind = sljit_get_register_index(STR_PTR);
sljit_s32 data_ind = 0; sljit_s32 data_ind = 0;
sljit_s32 tmp_ind = 1; sljit_s32 tmp_ind = 1;
@ -4376,7 +4376,9 @@ if (common->mode == PCRE2_JIT_COMPLETE)
OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1 | bit)); OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1 | bit));
SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1); // SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1);
SLJIT_ASSERT(tmp1_ind < 8);
/* MOVD xmm, r/m32 */ /* MOVD xmm, r/m32 */
instruction[0] = 0x66; instruction[0] = 0x66;


@ -4073,7 +4073,8 @@ else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%
* Show compile extra options * * Show compile extra options *
*************************************************/ *************************************************/
/* Called for unsupported POSIX options. /* Called only for unsupported POSIX options at present, and therefore needed
only when the 8-bit library is being compiled.
Arguments: Arguments:
options an options word options an options word
@ -4083,17 +4084,21 @@ Arguments:
Returns: nothing Returns: nothing
*/ */
#ifdef SUPPORT_PCRE2_8
static void static void
show_compile_extra_options(uint32_t options, const char *before, show_compile_extra_options(uint32_t options, const char *before,
const char *after) const char *after)
{ {
if (options == 0) fprintf(outfile, "%s <none>%s", before, after); if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
else fprintf(outfile, "%s%s%s%s", else fprintf(outfile, "%s%s%s%s%s%s",
before, before,
((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "", ((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "", ((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
after); after);
} }
#endif
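The chained conditional expressions in show_compile_extra_options() grow by one "%s" for every new option. A table-driven variant is one possible alternative. The sketch below is not the code in this commit: it uses local EX_ copies of the four extra option bits, writes into a caller-supplied buffer instead of outfile, and format_extra_options() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the extra compile option bits from pcre2.h. */
#define EX_EXTRA_ALLOW_SURROGATE_ESCAPES  0x00000001u
#define EX_EXTRA_BAD_ESCAPE_IS_LITERAL    0x00000002u
#define EX_EXTRA_MATCH_WORD               0x00000004u
#define EX_EXTRA_MATCH_LINE               0x00000008u

struct extra_option_name { uint32_t bit; const char *name; };

static const struct extra_option_name extra_option_names[] = {
  { EX_EXTRA_ALLOW_SURROGATE_ESCAPES, " allow_surrogate_escapes" },
  { EX_EXTRA_BAD_ESCAPE_IS_LITERAL,   " bad_escape_is_literal" },
  { EX_EXTRA_MATCH_WORD,              " match_word" },
  { EX_EXTRA_MATCH_LINE,              " match_line" }
};

/* Writes the names of the set bits into buf, truncating if the buffer is
   too small, and returns buf. The result is always NUL-terminated. */
static char *format_extra_options(uint32_t options, char *buf, size_t size)
{
  size_t i;
  assert(buf != NULL && size > 0);
  buf[0] = '\0';
  if (options == 0)
    {
    snprintf(buf, size, " <none>");
    return buf;
    }
  for (i = 0; i < sizeof(extra_option_names)/sizeof(extra_option_names[0]); i++)
    {
    if ((options & extra_option_names[i].bit) != 0)
      {
      size_t used = strlen(buf);   /* always less than size */
      snprintf(buf + used, size - used, "%s", extra_option_names[i].name);
      }
    }
  return buf;
}
```

Adding a further option then means adding one table row, with no change to the format string.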


@ -124,10 +124,10 @@
/* SLJIT_REWRITABLE_JUMP is 0x1000. */ /* SLJIT_REWRITABLE_JUMP is 0x1000. */
#if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86) #if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86)
# define PATCH_MB 0x4 # define PATCH_MB 0x4
# define PATCH_MW 0x8 # define PATCH_MW 0x8
#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) #if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64)
# define PATCH_MD 0x10 # define PATCH_MD 0x10
#endif #endif
#endif #endif
@ -1555,6 +1555,7 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_cmov(struct sljit_compile
sljit_s32 dst_reg, sljit_s32 dst_reg,
sljit_s32 src, sljit_sw srcw) sljit_s32 src, sljit_sw srcw)
{ {
(void)srcw; /* To stop compiler warning */
#if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS) #if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS)
CHECK_ARGUMENT(!(type & ~(0xff | SLJIT_I32_OP))); CHECK_ARGUMENT(!(type & ~(0xff | SLJIT_I32_OP)));
CHECK_ARGUMENT((type & 0xff) >= SLJIT_EQUAL && (type & 0xff) <= SLJIT_ORDERED_F64); CHECK_ARGUMENT((type & 0xff) >= SLJIT_EQUAL && (type & 0xff) <= SLJIT_ORDERED_F64);

testdata/testinput1

@ -95,17 +95,6 @@
aaac aaac
abbbbbbbbbbbac abbbbbbbbbbbac
/^(b+|a){1,2}?bc/
bbc
/^(b*|ba){1,2}?bc/
babc
bbabc
bababc
\= Expect no match
bababbc
babababc
/^(ba|b*){1,2}?bc/ /^(ba|b*){1,2}?bc/
babc babc
bbabc bbabc

testdata/testinput2

@ -5350,4 +5350,19 @@ a)"xI
\= Expect no match
Not a whole line
# Perl gets this wrong, failing to capture 'b' in group 1.
/^(b+|a){1,2}?bc/
bbc
# And again here, for the "babc" subject string.
/^(b*|ba){1,2}?bc/
babc
bbabc
bababc
\= Expect no match
bababbc
babababc
# End of testinput2

21
testdata/testoutput1 vendored
View File

@ -183,27 +183,6 @@ No match
abbbbbbbbbbbac
No match
/^(b+|a){1,2}?bc/
bbc
0: bbc
1: b
/^(b*|ba){1,2}?bc/
babc
0: babc
1: ba
bbabc
0: bbabc
1: ba
bababc
0: bababc
1: ba
\= Expect no match
bababbc
No match
babababc
No match
/^(ba|b*){1,2}?bc/
babc
0: babc

25
testdata/testoutput2 vendored
View File

@ -16300,6 +16300,31 @@ No match
Not a whole line
No match
# Perl gets this wrong, failing to capture 'b' in group 1.
/^(b+|a){1,2}?bc/
bbc
0: bbc
1: b
# And again here, for the "babc" subject string.
/^(b*|ba){1,2}?bc/
babc
0: babc
1: ba
bbabc
0: bbabc
1: ba
bababc
0: bababc
1: ba
\= Expect no match
bababbc
No match
babababc
No match
# End of testinput2
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data

View File

@ -853,10 +853,8 @@ Memory allocation (code space): 28
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated

View File

@ -853,10 +853,8 @@ Memory allocation (code space): 28
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated