Code tidies for 10.30-RC1 release candidate.
parent e3052af6fd
commit 810d9b6da5
@@ -2,8 +2,8 @@ Change Log for PCRE2
 --------------------


-Version 10.30-DEV 09-March-2017
--------------------------------
+Version 10.30-RC1 18-July-2017
+------------------------------

 1. The main interpreter, pcre2_match(), has been refactored into a new version
 that does not use recursive function calls (and therefore the stack) for
NEWS (57 lines changed)
@@ -1,6 +1,63 @@
 News about PCRE2 releases
 -------------------------

+Version 10.30-RC1 18-July-2017
+------------------------------
+
+The full list of changes that includes bugfixes and tidies is, as always, in
+ChangeLog. These are the most important new features:
+
+1. The main interpreter, pcre2_match(), has been refactored into a new version
+that does not use recursive function calls (and therefore the system stack) for
+remembering backtracking positions. This makes --disable-stack-for-recursion a
+NOOP. The new implementation allows backtracking into recursive group calls in
+patterns, making it more compatible with Perl, and also fixes some other
+previously hard-to-do issues. For patterns that have a lot of backtracking, the
+heap is now used, and there is an explicit limit on the amount, settable by
+pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
+but is renamed as "depth limit" (though the old names remain for
+compatibility).
+
+There is also a change in the way callouts from pcre2_match() are handled. The
+offset_vector field in the callout block is no longer a pointer to the
+actual ovector that was passed to the matching function in the match data
+block. Instead it points to an internal ovector of a size large enough to hold
+all possible captured substrings in the pattern.
+
+2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
+the end of the subject.
+
+3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
+pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
+also supported.
+
+4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
+
+5. Additional compile options in the compile context are now available, and the
+first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
+PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
+
+6. The newline type PCRE2_NEWLINE_NUL is now available.
+
+7. The match limit value now also applies to pcre2_dfa_match() as there are
+patterns that can use up a lot of resources without necessarily recursing very
+deeply.
+
+8. The option REG_PEND (a GNU extension) is now available for the POSIX
+wrapper. Also there is a new option PCRE2_LITERAL which is used to support
+REG_NOSPEC.
+
+9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
+benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
+using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
+is tidier and also fixes some bugs.
+
+10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.
+
+11. There are some experimental functions for converting foreign patterns
+(globs and POSIX patterns) into PCRE2 patterns.
+
+
 Version 10.23 14-February-2017
 ------------------------------

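NEWS item 1 in the hunk above describes keeping backtracking positions on the heap instead of in recursive calls, bounded by a match limit and a heap limit. The sketch below illustrates that general idea on a toy matcher (literals, '.', and a greedy 'x*'). It is written in Python for illustration and is not PCRE2's code; the names parse, match_here, LimitExceeded, and max_frames are invented here, and PCRE2's real heap limit is expressed in kilobytes rather than in frames.

```python
class LimitExceeded(Exception):
    """Hypothetical error raised when a resource limit is hit."""


def parse(pattern):
    """Split 'ab*c' into [('a', False), ('b', True), ('c', False)]."""
    items, i = [], 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items


def match_here(pattern, subject, match_limit=10_000_000, max_frames=1000):
    """Match at offset 0; return the end offset, or None if there is no match."""
    items = parse(pattern)
    stack = [(0, 0)]  # backtracking frames: (pattern index, subject index)
    steps = 0
    while stack:
        pi, si = stack.pop()  # resume from the most recently saved position
        while True:
            steps += 1
            if steps > match_limit:
                raise LimitExceeded("match limit exceeded")
            if pi == len(items):
                return si
            char, starred = items[pi]
            hit = si < len(subject) and char in (".", subject[si])
            if starred and hit:
                if len(stack) >= max_frames:
                    raise LimitExceeded("heap limit exceeded")
                stack.append((pi + 1, si))  # alternative: stop repeating here
                si += 1                     # greedy: consume one more character
            elif starred:
                pi += 1                     # no further repetitions possible
            elif hit:
                pi, si = pi + 1, si + 1
            else:
                break                       # dead end: backtrack
    return None
```

Because the saved positions live in an ordinary list, deep backtracking grows the heap rather than the call stack, and both limits can be checked in one place.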
README (47 lines changed)
@@ -198,13 +198,14 @@ library. They are also documented in the pcre2build man page.
 or starting a pattern with (*UCP).

 . You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
-of the preceding, or any of the Unicode newline sequences, as indicating the
-end of a line. Whatever you specify at build time is the default; the caller
-of PCRE2 can change the selection at run time. The default newline indicator
-is a single LF character (the Unix standard). You can specify the default
-newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
---enable-newline-is-crlf, --enable-newline-is-anycrlf, or
---enable-newline-is-any to the "configure" command, respectively.
+of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
+character as indicating the end of a line. Whatever you specify at build time
+is the default; the caller of PCRE2 can change the selection at run time. The
+default newline indicator is a single LF character (the Unix standard). You
+can specify the default newline indicator by adding --enable-newline-is-cr,
+--enable-newline-is-lf, --enable-newline-is-crlf,
+--enable-newline-is-anycrlf, --enable-newline-is-any, or
+--enable-newline-is-nul to the "configure" command, respectively.

 . By default, the sequence \R in a pattern matches any Unicode line ending
 sequence. This is independent of the option specifying what PCRE2 considers
@@ -227,15 +228,15 @@ library. They are also documented in the pcre2build man page.
 --with-parens-nest-limit=500

 . PCRE2 has a counter that can be set to limit the amount of computing resource
-it uses when matching a pattern with the Perl-compatible matching function.
-If the limit is exceeded during a match, the match fails. The default is ten
-million. You can change the default by setting, for example,
+it uses when matching a pattern. If the limit is exceeded during a match, the
+match fails. The default is ten million. You can change the default by
+setting, for example,

 --with-match-limit=500000

 on the "configure" command. This is just the default; individual calls to
-pcre2_match() can supply their own value. There is more discussion in the
-pcre2api man page (search for pcre2_set_match_limit).
+pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
+discussion in the pcre2api man page (search for pcre2_set_match_limit).

 . There is a separate counter that limits the depth of nested backtracking
 during a matching process, which indirectly limits the amount of heap memory
@@ -659,9 +660,10 @@ with the perltest.sh script, and test 5 checking PCRE2-specific things.
 Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
 non-UTF mode and UTF-mode with Unicode property support, respectively.

-Test 8 checks some internal offsets and code size features; it is run only when
-the default "link size" of 2 is set (in other cases the sizes change) and when
-Unicode support is enabled.
+Test 8 checks some internal offsets and code size features, but it is run only
+when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
+32-bit modes and for different link sizes, so there are different output files
+for each mode and link size.

 Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
 16-bit and 32-bit modes. These are tests that generate different output in
@@ -671,7 +673,7 @@ Test 13 checks the handling of non-UTF characters greater than 255 by
 pcre2_dfa_match() in 16-bit and 32-bit modes.

 Test 14 contains some special UTF and UCP tests that give different output for
-the different widths.
+different code unit widths.

 Test 15 contains a number of tests that must not be run with JIT. They check,
 among other non-JIT things, the match-limiting features of the interpretive
@@ -692,6 +694,9 @@ patterns to a file, and then reloading and checking them.
 Tests 21 and 22 test \C support when the use of \C is not locked out, without
 and with UTF support, respectively. Test 23 tests \C when it is locked out.

+Tests 24 and 25 test the experimental pattern conversion functions, without and
+with UTF support, respectively.
+

 Character tables
 ----------------
@@ -710,7 +715,7 @@ specified for ./configure, a different version of pcre2_chartables.c is built
 by the program dftables (compiled from dftables.c), which uses the ANSI C
 character handling functions such as isalnum(), isalpha(), isupper(),
 islower(), etc. to build the table sources. This means that the default C
-locale which is set for your system will control the contents of these default
+locale that is set for your system will control the contents of these default
 tables. You can change the default tables by editing pcre2_chartables.c and
 then re-building PCRE2. If you do this, you should take care to ensure that the
 file does not get automatically re-generated. The best way to do this is to
@@ -765,6 +770,7 @@ The distribution should contain the files listed below.
 src/pcre2_compile.c )
 src/pcre2_config.c )
 src/pcre2_context.c )
+src/pcre2_convert.c )
 src/pcre2_dfa_match.c )
 src/pcre2_error.c )
 src/pcre2_find_bracket.c )
@@ -804,7 +810,6 @@ The distribution should contain the files listed below.
 src/pcre2demo.c simple demonstration of coding calls to PCRE2
 src/pcre2grep.c source of a grep utility that uses PCRE2
 src/pcre2test.c comprehensive test program
-src/pcre2_printint.c part of pcre2test
 src/pcre2_jit_test.c JIT test program

 (C) Auxiliary files:
@@ -869,12 +874,12 @@ The distribution should contain the files listed below.

 (E) Auxiliary files for building PCRE2 "by hand"

-pcre2.h.generic ) a version of the public PCRE2 header file
+src/pcre2.h.generic ) a version of the public PCRE2 header file
 ) for use in non-"configure" environments
-config.h.generic ) a version of config.h for use in non-"configure"
+src/config.h.generic ) a version of config.h for use in non-"configure"
 ) environments

 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 17 June 2017
+Last updated: 18 July 2017
RunTest (6 lines changed)
@@ -830,7 +830,7 @@ for bmode in "$test8" "$test16" "$test32"; do
 if [ $supportBSC -ne 0 ] ; then
 echo " Skipped because \C is not disabled"
 else
-$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput23 testtry
+$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput23 testtry
 checkresult $? 23 ""
 fi
 fi
@@ -839,7 +839,7 @@ for bmode in "$test8" "$test16" "$test32"; do

 if [ "$do24" = yes ] ; then
 echo $title24
-$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput24 testtry
+$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput24 testtry
 checkresult $? 24 ""
 fi

@@ -850,7 +850,7 @@ for bmode in "$test8" "$test16" "$test32"; do
 if [ $utf -eq 0 ] ; then
 echo " Skipped because UTF-$bits support is not available"
 else
-$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput25 testtry
+$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput25 testtry
 checkresult $? 25 ""
 fi
 fi
configure.ac (12 lines changed)
@@ -10,17 +10,17 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.

 m4_define(pcre2_major, [10])
 m4_define(pcre2_minor, [30])
-m4_define(pcre2_prerelease, [-DEV])
-m4_define(pcre2_date, [2017-03-05])
+m4_define(pcre2_prerelease, [-RC1])
+m4_define(pcre2_date, [2017-07-18])

 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre2_8_version, [5:0:5])
-m4_define(libpcre2_16_version, [5:0:5])
-m4_define(libpcre2_32_version, [5:0:5])
-m4_define(libpcre2_posix_version, [1:1:0])
+m4_define(libpcre2_8_version, [6:0:6])
+m4_define(libpcre2_16_version, [6:0:6])
+m4_define(libpcre2_32_version, [6:0:6])
+m4_define(libpcre2_posix_version, [2:0:0])

 AC_PREREQ(2.57)
 AC_INIT(PCRE2, pcre2_major.pcre2_minor[]pcre2_prerelease, , pcre2)
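The version triples in the configure.ac hunk above use libtool's current:revision:age scheme. As a rough guide, on Linux-style platforms libtool usually derives the shared-library file suffix as (current - age).age.revision. The helper below applies that rule; it is a Python illustration with a hypothetical function name, and other platforms map the triple differently.

```python
def libtool_so_suffix(version_info):
    """Map a libtool 'current:revision:age' triple to a Linux-style suffix."""
    current, revision, age = (int(part) for part in version_info.split(":"))
    if age > current:
        raise ValueError("age must not exceed current")
    # The leading number stays the same while the interface remains
    # backward compatible (current and age grow together).
    return f"{current - age}.{age}.{revision}"


if __name__ == "__main__":
    for triple in ("5:0:5", "6:0:6", "1:1:0", "2:0:0"):
        print(triple, "->", libtool_so_suffix(triple))
```

Under this rule the move from 5:0:5 to 6:0:6 keeps the leading number unchanged, while the change from 1:1:0 to 2:0:0 for the POSIX wrapper raises it.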
@@ -87,10 +87,10 @@ Options that specify values have names that start with --with.
 <br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
 <P>
 By default, a library called <b>libpcre2-8</b> is built, containing functions
-that take string arguments contained in vectors of bytes, interpreted either as
+that take string arguments contained in arrays of bytes, interpreted either as
 single-byte characters, or UTF-8 strings. You can also build two other
 libraries, called <b>libpcre2-16</b> and <b>libpcre2-32</b>, which process
-strings that are contained in vectors of 16-bit and 32-bit code units,
+strings that are contained in arrays of 16-bit and 32-bit code units,
 respectively. These can be interpreted either as single-unit characters or
 UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
 the following to the <b>configure</b> command:
@@ -208,19 +208,23 @@ to the <b>configure</b> command. There is a fourth option, specified by
 --enable-newline-is-anycrlf
 </pre>
 which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
-indicating a line ending. Finally, a fifth option, specified by
+indicating a line ending. A fifth option, specified by
 <pre>
 --enable-newline-is-any
 </pre>
 causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
 sequences are the three just mentioned, plus the single characters VT (vertical
 tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
-separator, U+2028), and PS (paragraph separator, U+2029).
+separator, U+2028), and PS (paragraph separator, U+2029). The final option is
+<pre>
+--enable-newline-is-nul
+</pre>
+which causes NUL (binary zero) to be set as the default line-ending character.
 </P>
 <P>
 Whatever default line ending convention is selected when PCRE2 is built can be
 overridden by applications that use the library. At build time it is
-conventional to use the standard for your operating system.
+recommended to use the standard for your operating system.
 </P>
 <br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
 <P>
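The newline conventions named in the hunk above can be illustrated by splitting the same text under each one. This is a Python sketch using the standard re module, not PCRE2 itself; NEWLINE_PATTERNS and split_lines are names invented for the illustration, and the keys simply mirror the suffixes of the --enable-newline-is-* options.

```python
import re

# One regular expression per convention. CRLF is listed before CR and LF so
# that the two-character sequence is treated as a single line ending.
NEWLINE_PATTERNS = {
    "cr": r"\r",
    "lf": r"\n",
    "crlf": r"\r\n",
    "anycrlf": r"\r\n|\r|\n",
    # CR, LF, CRLF plus VT, FF, NEL, LS and PS, as listed in the text above.
    "any": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
    "nul": r"\x00",
}


def split_lines(text, convention="lf"):
    """Split text into lines under the given newline convention."""
    return re.split(NEWLINE_PATTERNS[convention], text)


if __name__ == "__main__":
    sample = "one\r\ntwo\nthree\x00four"
    for name in NEWLINE_PATTERNS:
        print(name, split_lines(sample, name))
```

Note that NUL is a separate convention: it is not one of the Unicode newline sequences, so the "any" setting does not split on it.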
@@ -301,7 +305,9 @@ because the size of each backtracking "frame" depends on the number of
 capturing parentheses in a pattern, the amount of heap that is used before the
 limit is reached varies from pattern to pattern. This limit was more useful in
 versions before 10.30, where function recursion was used for backtracking.
-However, as well as applying to <b>pcre2_match()</b>, this limit also controls
+</P>
+<P>
+As well as applying to <b>pcre2_match()</b>, the depth limit also controls
 the depth of recursive function calls in <b>pcre2_dfa_match()</b>. These are
 used for lookaround assertions, atomic groups, and recursion within patterns.
 The limit does not apply to JIT matching.
@@ -559,7 +565,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 June 2017
+Last updated: 18 July 2017
 <br>
 Copyright © 1997-2017 University of Cambridge.
 <br>
doc/pcre2.txt (102 lines changed)
@@ -3487,10 +3487,10 @@ PCRE2 BUILD-TIME OPTIONS
 BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES

 By default, a library called libpcre2-8 is built, containing functions
-that take string arguments contained in vectors of bytes, interpreted
+that take string arguments contained in arrays of bytes, interpreted
 either as single-byte characters, or UTF-8 strings. You can also build
 two other libraries, called libpcre2-16 and libpcre2-32, which process
-strings that are contained in vectors of 16-bit and 32-bit code units,
+strings that are contained in arrays of 16-bit and 32-bit code units,
 respectively. These can be interpreted either as single-unit characters
 or UTF-16/UTF-32 strings. To build these additional libraries, add one
 or both of the following to the configure command:
@@ -3609,7 +3609,7 @@ NEWLINE RECOGNITION
 --enable-newline-is-anycrlf

 which causes PCRE2 to recognize any of the three sequences CR, LF, or
-CRLF as indicating a line ending. Finally, a fifth option, specified by
+CRLF as indicating a line ending. A fifth option, specified by

 --enable-newline-is-any

@@ -3617,97 +3617,103 @@ NEWLINE RECOGNITION
 newline sequences are the three just mentioned, plus the single charac-
 ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
 U+0085), LS (line separator, U+2028), and PS (paragraph separator,
-U+2029).
+U+2029). The final option is
+
+--enable-newline-is-nul
+
+which causes NUL (binary zero) to be set as the default line-ending
+character.

 Whatever default line ending convention is selected when PCRE2 is built
-can be overridden by applications that use the library. At build time
-it is conventional to use the standard for your operating system.
+can be overridden by applications that use the library. At build time
+it is recommended to use the standard for your operating system.


 WHAT \R MATCHES

-By default, the sequence \R in a pattern matches any Unicode newline
-sequence, independently of what has been selected as the line ending
+By default, the sequence \R in a pattern matches any Unicode newline
+sequence, independently of what has been selected as the line ending
 sequence. If you specify

 --enable-bsr-anycrlf

-the default is changed so that \R matches only CR, LF, or CRLF. What-
-ever is selected when PCRE2 is built can be overridden by applications
+the default is changed so that \R matches only CR, LF, or CRLF. What-
+ever is selected when PCRE2 is built can be overridden by applications
 that use the library.


 HANDLING VERY LARGE PATTERNS

-Within a compiled pattern, offset values are used to point from one
-part to another (for example, from an opening parenthesis to an alter-
-nation metacharacter). By default, in the 8-bit and 16-bit libraries,
-two-byte values are used for these offsets, leading to a maximum size
-for a compiled pattern of around 64K code units. This is sufficient to
+Within a compiled pattern, offset values are used to point from one
+part to another (for example, from an opening parenthesis to an alter-
+nation metacharacter). By default, in the 8-bit and 16-bit libraries,
+two-byte values are used for these offsets, leading to a maximum size
+for a compiled pattern of around 64K code units. This is sufficient to
 handle all but the most gigantic patterns. Nevertheless, some people do
-want to process truly enormous patterns, so it is possible to compile
-PCRE2 to use three-byte or four-byte offsets by adding a setting such
+want to process truly enormous patterns, so it is possible to compile
+PCRE2 to use three-byte or four-byte offsets by adding a setting such
 as

 --with-link-size=3

-to the configure command. The value given must be 2, 3, or 4. For the
-16-bit library, a value of 3 is rounded up to 4. In these libraries,
-using longer offsets slows down the operation of PCRE2 because it has
-to load additional data when handling them. For the 32-bit library the
-value is always 4 and cannot be overridden; the value of --with-link-
+to the configure command. The value given must be 2, 3, or 4. For the
+16-bit library, a value of 3 is rounded up to 4. In these libraries,
+using longer offsets slows down the operation of PCRE2 because it has
+to load additional data when handling them. For the 32-bit library the
+value is always 4 and cannot be overridden; the value of --with-link-
 size is ignored.


 LIMITING PCRE2 RESOURCE USAGE

 The pcre2_match() function increments a counter each time it goes round
-its main loop. Putting a limit on this counter controls the amount of
-computing resource used by a single call to pcre2_match(). The limit
+its main loop. Putting a limit on this counter controls the amount of
+computing resource used by a single call to pcre2_match(). The limit
 can be changed at run time, as described in the pcre2api documentation.
-The default is 10 million, but this can be changed by adding a setting
+The default is 10 million, but this can be changed by adding a setting
 such as

 --with-match-limit=500000

-to the configure command. This setting also applies to the
-pcre2_dfa_match() matching function, and to JIT matching (though the
+to the configure command. This setting also applies to the
+pcre2_dfa_match() matching function, and to JIT matching (though the
 counting is done differently).

-The pcre2_match() function starts out using a 20K vector on the system
-stack to record backtracking points. The more nested backtracking
+The pcre2_match() function starts out using a 20K vector on the system
+stack to record backtracking points. The more nested backtracking
 points there are (that is, the deeper the search tree), the more memory
-is needed. If the initial vector is not large enough, heap memory is
+is needed. If the initial vector is not large enough, heap memory is
 used, up to a certain limit, which is specified in kilobytes. The limit
 can be changed at run time, as described in the pcre2api documentation.
-The default limit (in effect unlimited) is 20 million. You can change
+The default limit (in effect unlimited) is 20 million. You can change
 this by a setting such as

 --with-heap-limit=500

-which limits the amount of heap to 500 kilobytes. This limit applies
-only to interpretive matching in pcre2_match(). It does not apply when
-JIT (which has its own memory arrangements) is used, nor does it apply
+which limits the amount of heap to 500 kilobytes. This limit applies
+only to interpretive matching in pcre2_match(). It does not apply when
+JIT (which has its own memory arrangements) is used, nor does it apply
 to pcre2_dfa_match().

-You can also explicitly limit the depth of nested backtracking in the
+You can also explicitly limit the depth of nested backtracking in the
 pcre2_match() interpreter. This limit defaults to the value that is set
-for --with-match-limit. You can set a lower default limit by adding,
+for --with-match-limit. You can set a lower default limit by adding,
 for example,

 --with-match-limit-depth=10000

-to the configure command. This value can be overridden at run time.
-This depth limit indirectly limits the amount of heap memory that is
-used, but because the size of each backtracking "frame" depends on the
-number of capturing parentheses in a pattern, the amount of heap that
-is used before the limit is reached varies from pattern to pattern.
-This limit was more useful in versions before 10.30, where function
-recursion was used for backtracking. However, as well as applying to
-pcre2_match(), this limit also controls the depth of recursive function
-calls in pcre2_dfa_match(). These are used for lookaround assertions,
-atomic groups, and recursion within patterns. The limit does not apply
-to JIT matching.
+to the configure command. This value can be overridden at run time.
+This depth limit indirectly limits the amount of heap memory that is
+used, but because the size of each backtracking "frame" depends on the
+number of capturing parentheses in a pattern, the amount of heap that
+is used before the limit is reached varies from pattern to pattern.
+This limit was more useful in versions before 10.30, where function
+recursion was used for backtracking.
+
+As well as applying to pcre2_match(), the depth limit also controls the
+depth of recursive function calls in pcre2_dfa_match(). These are used
+for lookaround assertions, atomic groups, and recursion within pat-
+terns. The limit does not apply to JIT matching.


 CREATING CHARACTER TABLES AT BUILD TIME
@ -3969,7 +3975,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 17 June 2017
|
||||
Last updated: 18 July 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2BUILD 3 "17 June 2017" "PCRE2 10.30"
|
||||
.TH PCRE2BUILD 3 "18 July 2017" "PCRE2 10.30"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.
|
||||
|
@ -66,10 +66,10 @@ Options that specify values have names that start with --with.
|
|||
.rs
|
||||
.sp
|
||||
By default, a library called \fBlibpcre2-8\fP is built, containing functions
|
||||
that take string arguments contained in vectors of bytes, interpreted either as
|
||||
that take string arguments contained in arrays of bytes, interpreted either as
|
||||
single-byte characters, or UTF-8 strings. You can also build two other
|
||||
libraries, called \fBlibpcre2-16\fP and \fBlibpcre2-32\fP, which process
|
||||
strings that are contained in vectors of 16-bit and 32-bit code units,
|
||||
strings that are contained in arrays of 16-bit and 32-bit code units,
|
||||
respectively. These can be interpreted either as single-unit characters or
|
||||
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
|
||||
the following to the \fBconfigure\fP command:
|
||||
|
@ -197,18 +197,22 @@ to the \fBconfigure\fP command. There is a fourth option, specified by
|
|||
--enable-newline-is-anycrlf
|
||||
.sp
|
||||
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
|
||||
indicating a line ending. Finally, a fifth option, specified by
|
||||
indicating a line ending. A fifth option, specified by
|
||||
.sp
|
||||
--enable-newline-is-any
|
||||
.sp
|
||||
causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
|
||||
sequences are the three just mentioned, plus the single characters VT (vertical
|
||||
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
|
||||
separator, U+2028), and PS (paragraph separator, U+2029).
|
||||
separator, U+2028), and PS (paragraph separator, U+2029). The final option is
|
||||
.sp
|
||||
--enable-newline-is-nul
|
||||
.sp
|
||||
which causes NUL (binary zero) to be set as the default line-ending character.
|
||||
.P
|
||||
Whatever default line ending convention is selected when PCRE2 is built can be
|
||||
overridden by applications that use the library. At build time it is
|
||||
conventional to use the standard for your operating system.
|
||||
recommended to use the standard for your operating system.
|
||||
.
|
||||
.
|
||||
.SH "WHAT \eR MATCHES"
|
||||
|
@ -297,7 +301,8 @@ because the size of each backtracking "frame" depends on the number of
|
|||
capturing parentheses in a pattern, the amount of heap that is used before the
|
||||
limit is reached varies from pattern to pattern. This limit was more useful in
|
||||
versions before 10.30, where function recursion was used for backtracking.
|
||||
However, as well as applying to \fBpcre2_match()\fP, this limit also controls
|
||||
.P
|
||||
As well as applying to \fBpcre2_match()\fP, the depth limit also controls
|
||||
the depth of recursive function calls in \fBpcre2_dfa_match()\fP. These are
|
||||
used for lookaround assertions, atomic groups, and recursion within patterns.
|
||||
The limit does not apply to JIT matching.
|
||||
|
@ -577,6 +582,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 17 June 2017
|
||||
Last updated: 18 July 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -132,6 +132,12 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
/* Define to 1 if you have the <zlib.h> header file. */
|
||||
/* #undef HAVE_ZLIB_H */
|
||||
|
||||
/* This limits the amount of memory that pcre2_match() may use while matching
|
||||
a pattern. The value is in kilobytes. */
|
||||
#ifndef HEAP_LIMIT
|
||||
#define HEAP_LIMIT 20000000
|
||||
#endif
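Because HEAP_LIMIT is expressed in kilobytes, the default of 20000000 corresponds to a byte count that does not fit in 32 bits. The following is a minimal sketch of the unit conversion only. The helper name `heap_limit_bytes` is hypothetical and not part of PCRE2, and it assumes that one kilobyte means 1024 bytes.

```c
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2: convert a heap limit given in
   kilobytes (the unit used by HEAP_LIMIT) into bytes. Assumes 1 kilobyte is
   1024 bytes. The result is 64-bit because the default limit of 20000000
   kilobytes does not fit in a 32-bit byte count. */
uint64_t heap_limit_bytes(uint32_t limit_kb)
{
  return (uint64_t)limit_kb * 1024u;
}
```

Under these assumptions, `heap_limit_bytes(500)` corresponds to the `--with-heap-limit=500` example in the documentation hunk.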
|
||||
|
||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||
as offsets within the compiled regex. The default is 2, which allows for
|
||||
compiled patterns up to 64K long. This covers the vast majority of cases.
|
||||
|
@ -148,7 +154,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
#endif
|
||||
|
||||
/* The value of MATCH_LIMIT determines the default number of times the
|
||||
internal match() function can record a backtrack position during a single
|
||||
pcre2_match() function can record a backtrack position during a single
|
||||
matching attempt. There is a runtime interface for setting a different
|
||||
limit. The limit exists in order to catch runaway regular expressions that
|
||||
take for ever to determine that they do not match. The default is set very
|
||||
|
@ -188,8 +194,8 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
|
||||
/* The value of NEWLINE_DEFAULT determines the default newline character
|
||||
sequence. PCRE2 client programs can override this by selecting other values
|
||||
at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), and 5
|
||||
(ANYCRLF). */
|
||||
at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), 5
|
||||
(ANYCRLF), and 6 (NUL). */
|
||||
#ifndef NEWLINE_DEFAULT
|
||||
#define NEWLINE_DEFAULT 2
|
||||
#endif
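The numeric NEWLINE_DEFAULT values listed in the comment can be mapped back to their names as in the sketch below. The function `newline_default_name` is a hypothetical illustration and does not exist in PCRE2. Only the number-to-name pairs are taken from the comment above.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2: return the name that the config.h
   comment gives for each NEWLINE_DEFAULT value, or NULL if the value is not
   one of the six documented ones. */
const char *newline_default_name(int value)
{
  switch (value)
    {
    case 1: return "CR";
    case 2: return "LF";
    case 3: return "CRLF";
    case 4: return "ANY";
    case 5: return "ANYCRLF";
    case 6: return "NUL";
    default: return NULL;   /* all valid values are greater than zero */
    }
}
```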
|
||||
|
@ -204,7 +210,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
#define PACKAGE_NAME "PCRE2"
|
||||
|
||||
/* Define to the full name and version of this package. */
|
||||
#define PACKAGE_STRING "PCRE2 10.30-DEV"
|
||||
#define PACKAGE_STRING "PCRE2 10.30-RC1"
|
||||
|
||||
/* Define to the one symbol short name of this package. */
|
||||
#define PACKAGE_TARNAME "pcre2"
|
||||
|
@ -213,7 +219,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
#define PACKAGE_URL ""
|
||||
|
||||
/* Define to the version of this package. */
|
||||
#define PACKAGE_VERSION "10.30-DEV"
|
||||
#define PACKAGE_VERSION "10.30-RC1"
|
||||
|
||||
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
||||
parentheses (of any kind) in a pattern. This limits the amount of system
|
||||
|
@ -261,6 +267,11 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
your system. */
|
||||
/* #undef PTHREAD_CREATE_JOINABLE */
|
||||
|
||||
/* Define to any non-zero number to enable support for SELinux compatible
|
||||
executable memory allocator in JIT. Note that this will have no effect
|
||||
unless SUPPORT_JIT is also defined. */
|
||||
/* #undef SLJIT_PROT_EXECUTABLE_ALLOCATOR */
|
||||
|
||||
/* Define to 1 if you have the ANSI C header files. */
|
||||
/* #undef STDC_HEADERS */
|
||||
|
||||
|
@ -328,7 +339,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
#endif
|
||||
|
||||
/* Version number of package */
|
||||
#define VERSION "10.30-DEV"
|
||||
#define VERSION "10.30-RC1"
|
||||
|
||||
/* Define to 1 if on MINIX. */
|
||||
/* #undef _MINIX */
|
||||
|
|
|
@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
|
|||
|
||||
#define PCRE2_MAJOR 10
|
||||
#define PCRE2_MINOR 30
|
||||
#define PCRE2_PRERELEASE -DEV
|
||||
#define PCRE2_DATE 2017-03-05
|
||||
#define PCRE2_PRERELEASE -RC1
|
||||
#define PCRE2_DATE 2017-07-18
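Note that PCRE2_PRERELEASE and PCRE2_DATE are bare tokens, not string literals. One way to turn them into a printable version string is two-level preprocessor stringification, sketched below. The macros `STR_` and `XSTR` and the function `version_string` are illustrative names, and this is not necessarily how the library itself assembles its version text.

```c
/* Values copied from the hunk above. */
#define PCRE2_MAJOR      10
#define PCRE2_MINOR      30
#define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE       2017-07-18

/* Two-level stringification: XSTR expands its argument first, then STR_
   turns the expanded tokens into a string literal. */
#define STR_(x) #x
#define XSTR(x) STR_(x)

/* Illustrative function (an assumption, not a PCRE2 API): build the version
   text from adjacent string literals at compile time. */
const char *version_string(void)
{
  return XSTR(PCRE2_MAJOR) "." XSTR(PCRE2_MINOR) XSTR(PCRE2_PRERELEASE)
         " " XSTR(PCRE2_DATE);
}
```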
|
||||
|
||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||
imported have to be identified as such. When building PCRE2, the appropriate
|
||||
|
|
|
@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
|
|||
|
||||
#define PCRE2_MAJOR 10
|
||||
#define PCRE2_MINOR 30
|
||||
#define PCRE2_PRERELEASE -DEV
|
||||
#define PCRE2_DATE 2017-03-05
|
||||
#define PCRE2_PRERELEASE -RC1
|
||||
#define PCRE2_DATE 2017-07-18
|
||||
|
||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||
imported have to be identified as such. When building PCRE2, the appropriate
|
||||
|
@ -138,6 +138,14 @@ D is inspected during pcre2_dfa_match() execution
|
|||
#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */
|
||||
#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */
|
||||
#define PCRE2_EXTENDED_MORE 0x01000000u /* C */
|
||||
#define PCRE2_LITERAL 0x02000000u /* C */
|
||||
|
||||
/* An additional compile options word is available in the compile context. */
|
||||
|
||||
#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */
|
||||
#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */
|
||||
#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */
|
||||
#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */
|
||||
|
||||
/* These are for pcre2_jit_compile(). */
|
||||
|
||||
|
@ -176,6 +184,16 @@ ignored for pcre2_jit_match(). */
|
|||
|
||||
#define PCRE2_NO_JIT 0x00002000u
|
||||
|
||||
/* Options for pcre2_pattern_convert(). */
|
||||
|
||||
#define PCRE2_CONVERT_UTF 0x00000001u
|
||||
#define PCRE2_CONVERT_NO_UTF_CHECK 0x00000002u
|
||||
#define PCRE2_CONVERT_POSIX_BASIC 0x00000004u
|
||||
#define PCRE2_CONVERT_POSIX_EXTENDED 0x00000008u
|
||||
#define PCRE2_CONVERT_GLOB 0x00000010u
|
||||
#define PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR 0x00000030u
|
||||
#define PCRE2_CONVERT_GLOB_NO_STARSTAR 0x00000050u
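The two GLOB_NO_* values each include the 0x10 bit of PCRE2_CONVERT_GLOB, so a single mask test detects any glob conversion request. The sketch below copies the three values from the hunk. The function `is_glob_conversion` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Values copied from the hunk above. */
#define PCRE2_CONVERT_POSIX_BASIC            0x00000004u
#define PCRE2_CONVERT_GLOB                   0x00000010u
#define PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR 0x00000030u
#define PCRE2_CONVERT_GLOB_NO_STARSTAR       0x00000050u

/* Hypothetical helper, not part of PCRE2: returns non-zero if the options
   word requests any kind of glob conversion. This works because both
   GLOB_NO_* values have the PCRE2_CONVERT_GLOB bit set. */
int is_glob_conversion(uint32_t options)
{
  return (options & PCRE2_CONVERT_GLOB) != 0;
}
```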
|
||||
|
||||
/* Newline and \R settings, for use in compile contexts. The newline values
|
||||
must be kept in step with values set in config.h and both sets must all be
|
||||
greater than zero. */
|
||||
|
@ -185,6 +203,7 @@ greater than zero. */
|
|||
#define PCRE2_NEWLINE_CRLF 3
|
||||
#define PCRE2_NEWLINE_ANY 4
|
||||
#define PCRE2_NEWLINE_ANYCRLF 5
|
||||
#define PCRE2_NEWLINE_NUL 6
|
||||
|
||||
#define PCRE2_BSR_UNICODE 1
|
||||
#define PCRE2_BSR_ANYCRLF 2
|
||||
|
@ -270,6 +289,8 @@ numbers must not be changed. */
|
|||
#define PCRE2_ERROR_TOOMANYREPLACE (-61)
|
||||
#define PCRE2_ERROR_BADSERIALIZEDDATA (-62)
|
||||
#define PCRE2_ERROR_HEAPLIMIT (-63)
|
||||
#define PCRE2_ERROR_CONVERT_SYNTAX (-64)
|
||||
|
||||
|
||||
/* Request types for pcre2_pattern_info() */
|
||||
|
||||
|
@ -351,6 +372,9 @@ typedef struct pcre2_real_compile_context pcre2_compile_context; \
|
|||
struct pcre2_real_match_context; \
|
||||
typedef struct pcre2_real_match_context pcre2_match_context; \
|
||||
\
|
||||
struct pcre2_real_convert_context; \
|
||||
typedef struct pcre2_real_convert_context pcre2_convert_context; \
|
||||
\
|
||||
struct pcre2_real_code; \
|
||||
typedef struct pcre2_real_code pcre2_code; \
|
||||
\
|
||||
|
@ -434,6 +458,8 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
|||
pcre2_set_bsr(pcre2_compile_context *, uint32_t); \
|
||||
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
||||
pcre2_set_character_tables(pcre2_compile_context *, const unsigned char *); \
|
||||
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
||||
pcre2_set_compile_extra_options(pcre2_compile_context *, uint32_t); \
|
||||
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
||||
pcre2_set_max_pattern_length(pcre2_compile_context *, PCRE2_SIZE); \
|
||||
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
||||
|
@ -466,6 +492,18 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
|||
pcre2_set_recursion_memory_management(pcre2_match_context *, \
|
||||
void *(*)(PCRE2_SIZE, void *), void (*)(void *, void *), void *);
|
||||
|
||||
#define PCRE2_CONVERT_CONTEXT_FUNCTIONS \
|
||||
PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \
|
||||
*pcre2_convert_context_copy(pcre2_convert_context *); \
|
||||
PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \
|
||||
*pcre2_convert_context_create(pcre2_general_context *); \
|
||||
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
|
||||
pcre2_convert_context_free(pcre2_convert_context *); \
|
||||
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
||||
pcre2_set_glob_escape(pcre2_convert_context *, uint32_t); \
|
||||
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
||||
pcre2_set_glob_separator(pcre2_convert_context *, uint32_t);
|
||||
|
||||
|
||||
/* Functions concerned with compiling a pattern to PCRE internal code. */
|
||||
|
||||
|
@ -572,6 +610,16 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
|||
PCRE2_SIZE, PCRE2_UCHAR *, PCRE2_SIZE *);
|
||||
|
||||
|
||||
/* Functions for converting pattern source strings. */
|
||||
|
||||
#define PCRE2_CONVERT_FUNCTIONS \
|
||||
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
|
||||
pcre2_pattern_convert(PCRE2_SPTR, PCRE2_SIZE, uint32_t, PCRE2_UCHAR **, \
|
||||
PCRE2_SIZE *, pcre2_convert_context *); \
|
||||
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
|
||||
pcre2_converted_pattern_free(PCRE2_UCHAR *);
|
||||
|
||||
|
||||
/* Functions for JIT processing */
|
||||
|
||||
#define PCRE2_JIT_FUNCTIONS \
|
||||
|
@ -623,6 +671,7 @@ pcre2_compile are called by application code. */
|
|||
#define pcre2_real_code PCRE2_SUFFIX(pcre2_real_code_)
|
||||
#define pcre2_real_general_context PCRE2_SUFFIX(pcre2_real_general_context_)
|
||||
#define pcre2_real_compile_context PCRE2_SUFFIX(pcre2_real_compile_context_)
|
||||
#define pcre2_real_convert_context PCRE2_SUFFIX(pcre2_real_convert_context_)
|
||||
#define pcre2_real_match_context PCRE2_SUFFIX(pcre2_real_match_context_)
|
||||
#define pcre2_real_jit_stack PCRE2_SUFFIX(pcre2_real_jit_stack_)
|
||||
#define pcre2_real_match_data PCRE2_SUFFIX(pcre2_real_match_data_)
|
||||
|
@ -634,6 +683,7 @@ pcre2_compile are called by application code. */
|
|||
#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_)
|
||||
#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_)
|
||||
#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_)
|
||||
#define pcre2_convert_context PCRE2_SUFFIX(pcre2_convert_context_)
|
||||
#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_)
|
||||
#define pcre2_match_data PCRE2_SUFFIX(pcre2_match_data_)
|
||||
|
||||
|
@ -649,6 +699,10 @@ pcre2_compile are called by application code. */
|
|||
#define pcre2_compile_context_create PCRE2_SUFFIX(pcre2_compile_context_create_)
|
||||
#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_)
|
||||
#define pcre2_config PCRE2_SUFFIX(pcre2_config_)
|
||||
#define pcre2_convert_context_copy PCRE2_SUFFIX(pcre2_convert_context_copy_)
|
||||
#define pcre2_convert_context_create PCRE2_SUFFIX(pcre2_convert_context_create_)
|
||||
#define pcre2_convert_context_free PCRE2_SUFFIX(pcre2_convert_context_free_)
|
||||
#define pcre2_converted_pattern_free PCRE2_SUFFIX(pcre2_converted_pattern_free_)
|
||||
#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_)
|
||||
#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_)
|
||||
#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_)
|
||||
|
@ -672,6 +726,7 @@ pcre2_compile are called by application code. */
|
|||
#define pcre2_match_data_create PCRE2_SUFFIX(pcre2_match_data_create_)
|
||||
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
||||
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
||||
#define pcre2_pattern_convert PCRE2_SUFFIX(pcre2_pattern_convert_)
|
||||
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
||||
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
|
||||
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
|
||||
|
@ -680,8 +735,11 @@ pcre2_compile are called by application code. */
|
|||
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
||||
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
||||
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
||||
#define pcre2_set_compile_extra_options PCRE2_SUFFIX(pcre2_set_compile_extra_options_)
|
||||
#define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_)
|
||||
#define pcre2_set_depth_limit PCRE2_SUFFIX(pcre2_set_depth_limit_)
|
||||
#define pcre2_set_glob_escape PCRE2_SUFFIX(pcre2_set_glob_escape_)
|
||||
#define pcre2_set_glob_separator PCRE2_SUFFIX(pcre2_set_glob_separator_)
|
||||
#define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_)
|
||||
#define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_)
|
||||
#define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_)
|
||||
|
@ -716,6 +774,8 @@ PCRE2_STRUCTURE_LIST \
|
|||
PCRE2_GENERAL_INFO_FUNCTIONS \
|
||||
PCRE2_GENERAL_CONTEXT_FUNCTIONS \
|
||||
PCRE2_COMPILE_CONTEXT_FUNCTIONS \
|
||||
PCRE2_CONVERT_CONTEXT_FUNCTIONS \
|
||||
PCRE2_CONVERT_FUNCTIONS \
|
||||
PCRE2_MATCH_CONTEXT_FUNCTIONS \
|
||||
PCRE2_COMPILE_FUNCTIONS \
|
||||
PCRE2_PATTERN_INFO_FUNCTIONS \
|
||||
|
@ -745,6 +805,7 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
|||
#undef PCRE2_GENERAL_INFO_FUNCTIONS
|
||||
#undef PCRE2_GENERAL_CONTEXT_FUNCTIONS
|
||||
#undef PCRE2_COMPILE_CONTEXT_FUNCTIONS
|
||||
#undef PCRE2_CONVERT_CONTEXT_FUNCTIONS
|
||||
#undef PCRE2_MATCH_CONTEXT_FUNCTIONS
|
||||
#undef PCRE2_COMPILE_FUNCTIONS
|
||||
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
||||
|
|
|
@ -4351,7 +4351,7 @@ struct sljit_jump *quit;
|
|||
struct sljit_jump *partial_quit[2];
|
||||
sljit_u8 instruction[8];
|
||||
sljit_s32 tmp1_ind = sljit_get_register_index(TMP1);
|
||||
sljit_s32 tmp2_ind = sljit_get_register_index(TMP2);
|
||||
// sljit_s32 tmp2_ind = sljit_get_register_index(TMP2);
|
||||
sljit_s32 str_ptr_ind = sljit_get_register_index(STR_PTR);
|
||||
sljit_s32 data_ind = 0;
|
||||
sljit_s32 tmp_ind = 1;
|
||||
|
@ -4376,7 +4376,9 @@ if (common->mode == PCRE2_JIT_COMPLETE)
|
|||
|
||||
OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1 | bit));
|
||||
|
||||
SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1);
|
||||
// SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1);
|
||||
|
||||
SLJIT_ASSERT(tmp1_ind < 8);
|
||||
|
||||
/* MOVD xmm, r/m32 */
|
||||
instruction[0] = 0x66;
|
||||
|
|
|
@ -4073,7 +4073,8 @@ else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%
|
|||
* Show compile extra options *
|
||||
*************************************************/
|
||||
|
||||
/* Called for unsupported POSIX options.
|
||||
/* Called only for unsupported POSIX options at present, and therefore needed
|
||||
only when the 8-bit library is being compiled.
|
||||
|
||||
Arguments:
|
||||
options an options word
|
||||
|
@ -4083,17 +4084,21 @@ Arguments:
|
|||
Returns: nothing
|
||||
*/
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
static void
|
||||
show_compile_extra_options(uint32_t options, const char *before,
|
||||
const char *after)
|
||||
{
|
||||
if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
|
||||
else fprintf(outfile, "%s%s%s%s",
|
||||
else fprintf(outfile, "%s%s%s%s%s%s",
|
||||
before,
|
||||
((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
|
||||
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
|
||||
((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
|
||||
((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
|
||||
after);
|
||||
}
|
||||
#endif
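The same flag-to-name technique can be tried outside pcre2test with a buffer-based variant, sketched below. The function `format_extra_options` is hypothetical: it drops the `before` and `after` arguments and writes into a caller-supplied buffer instead of `outfile`. The flag values are copied from the pcre2.h hunk earlier in this diff.

```c
#include <stdint.h>
#include <stdio.h>

/* Values copied from the pcre2.h hunk earlier in this diff. */
#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u
#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL   0x00000002u
#define PCRE2_EXTRA_MATCH_WORD              0x00000004u
#define PCRE2_EXTRA_MATCH_LINE              0x00000008u

/* Hypothetical buffer-based variant of show_compile_extra_options(): each
   set bit contributes its name, each clear bit contributes an empty string.
   Returns the value of snprintf(). */
int format_extra_options(char *buf, size_t size, uint32_t options)
{
  if (options == 0) return snprintf(buf, size, " <none>");
  return snprintf(buf, size, "%s%s%s%s",
    ((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
    ((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
    ((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
    ((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "");
}
```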
|
||||
|
||||
|
||||
|
||||
|
|
|
@ -124,10 +124,10 @@
|
|||
/* SLJIT_REWRITABLE_JUMP is 0x1000. */
|
||||
|
||||
#if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86)
|
||||
# define PATCH_MB 0x4
|
||||
# define PATCH_MW 0x8
|
||||
# define PATCH_MB 0x4
|
||||
# define PATCH_MW 0x8
|
||||
#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64)
|
||||
# define PATCH_MD 0x10
|
||||
# define PATCH_MD 0x10
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
@ -1555,6 +1555,7 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_cmov(struct sljit_compile
|
|||
sljit_s32 dst_reg,
|
||||
sljit_s32 src, sljit_sw srcw)
|
||||
{
|
||||
(void)srcw; /* To stop compiler warning */
|
||||
#if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS)
|
||||
CHECK_ARGUMENT(!(type & ~(0xff | SLJIT_I32_OP)));
|
||||
CHECK_ARGUMENT((type & 0xff) >= SLJIT_EQUAL && (type & 0xff) <= SLJIT_ORDERED_F64);
|
||||
|
|
|
@ -95,17 +95,6 @@
|
|||
aaac
|
||||
abbbbbbbbbbbac
|
||||
|
||||
/^(b+|a){1,2}?bc/
|
||||
bbc
|
||||
|
||||
/^(b*|ba){1,2}?bc/
|
||||
babc
|
||||
bbabc
|
||||
bababc
|
||||
\= Expect no match
|
||||
bababbc
|
||||
babababc
|
||||
|
||||
/^(ba|b*){1,2}?bc/
|
||||
babc
|
||||
bbabc
|
||||
|
|
|
@ -5350,4 +5350,19 @@ a)"xI
|
|||
\= Expect no match
|
||||
Not a whole line
|
||||
|
||||
# Perl gets this wrong, failing to capture 'b' in group 1.
|
||||
|
||||
/^(b+|a){1,2}?bc/
|
||||
bbc
|
||||
|
||||
# And again here, for the "babc" subject string.
|
||||
|
||||
/^(b*|ba){1,2}?bc/
|
||||
babc
|
||||
bbabc
|
||||
bababc
|
||||
\= Expect no match
|
||||
bababbc
|
||||
babababc
|
||||
|
||||
# End of testinput2
|
||||
|
|
|
@ -183,27 +183,6 @@ No match
|
|||
abbbbbbbbbbbac
|
||||
No match
|
||||
|
||||
/^(b+|a){1,2}?bc/
|
||||
bbc
|
||||
0: bbc
|
||||
1: b
|
||||
|
||||
/^(b*|ba){1,2}?bc/
|
||||
babc
|
||||
0: babc
|
||||
1: ba
|
||||
bbabc
|
||||
0: bbabc
|
||||
1: ba
|
||||
bababc
|
||||
0: bababc
|
||||
1: ba
|
||||
\= Expect no match
|
||||
bababbc
|
||||
No match
|
||||
babababc
|
||||
No match
|
||||
|
||||
/^(ba|b*){1,2}?bc/
|
||||
babc
|
||||
0: babc
|
||||
|
|
|
@ -16300,6 +16300,31 @@ No match
|
|||
Not a whole line
|
||||
No match
|
||||
|
||||
# Perl gets this wrong, failing to capture 'b' in group 1.
|
||||
|
||||
/^(b+|a){1,2}?bc/
|
||||
bbc
|
||||
0: bbc
|
||||
1: b
|
||||
|
||||
# And again here, for the "babc" subject string.
|
||||
|
||||
/^(b*|ba){1,2}?bc/
|
||||
babc
|
||||
0: babc
|
||||
1: ba
|
||||
bbabc
|
||||
0: bbabc
|
||||
1: ba
|
||||
bababc
|
||||
0: bababc
|
||||
1: ba
|
||||
\= Expect no match
|
||||
bababbc
|
||||
No match
|
||||
babababc
|
||||
No match
|
||||
|
||||
# End of testinput2
|
||||
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
|
||||
Error -62: bad serialized data
|
||||
|
|
|
@ -853,10 +853,8 @@ Memory allocation (code space): 28
|
|||
# with link size - hence multiple tests with different values.
|
||||
|
||||
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||
Failed: error 186 at offset 5813: regular expression is too complicated
|
||||
|
||||
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||
Failed: error 186 at offset 5820: regular expression is too complicated
|
||||
|
||||
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||
|
|
|
@ -853,10 +853,8 @@ Memory allocation (code space): 28
|
|||
# with link size - hence multiple tests with different values.
|
||||
|
||||
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||
Failed: error 186 at offset 5813: regular expression is too complicated
|
||||
|
||||
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||
Failed: error 186 at offset 5820: regular expression is too complicated
|
||||
|
||||
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||
|
|