Typos in documentation and comments noted by Jason Hood.
parent fa58ac6734
commit fabea723cf
|
@ -146,7 +146,7 @@ SET(PCRE2_PARENS_NEST_LIMIT "250" CACHE STRING
|
||||||
"Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.")
|
"Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.")
|
||||||
|
|
||||||
SET(PCRE2_HEAP_LIMIT "20000000" CACHE STRING
|
SET(PCRE2_HEAP_LIMIT "20000000" CACHE STRING
|
||||||
"Default limit on heap memory (kilobytes). See HEAP_LIMIT in config.h.in for details.")
|
"Default limit on heap memory (kibibytes). See HEAP_LIMIT in config.h.in for details.")
|
||||||
|
|
||||||
SET(PCRE2_MATCH_LIMIT "10000000" CACHE STRING
|
SET(PCRE2_MATCH_LIMIT "10000000" CACHE STRING
|
||||||
"Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.")
|
"Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.")
|
||||||
|
|
|
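Review note: the kilobytes-to-kibibytes rewording is more than cosmetic, because the heap limit is counted in units of 1024 bytes. A minimal sketch of the arithmetic follows, using Python purely as a calculator. The helper name `kib_to_bytes` is hypothetical and not part of PCRE2; the value 20000000 is the default from the hunk above, and 500 is the `--with-heap-limit=500` example used in the README.

```python
KIB = 1024  # one kibibyte: the unit in which the heap limit is expressed


def kib_to_bytes(limit_kib: int) -> int:
    """Convert a heap limit given in kibibytes to bytes (hypothetical helper)."""
    return limit_kib * KIB


if __name__ == "__main__":
    default_limit = 20_000_000  # default PCRE2_HEAP_LIMIT
    print(kib_to_bytes(default_limit))     # bytes allowed by the default
    print(kib_to_bytes(500))               # bytes allowed by --with-heap-limit=500
    print(kib_to_bytes(500) - 500 * 1000)  # gap versus reading the unit as 1000 bytes
```

Reading the unit as 1000 bytes would understate the real limit by 2.4 percent, which is why the wording was tightened.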
@ -17,7 +17,7 @@ groups altogether. Now it shows those that come before any actual captures as
|
||||||
3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
|
3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
|
||||||
whatever the build configuration was. It now correctly says "\R matches all
|
whatever the build configuration was. It now correctly says "\R matches all
|
||||||
Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
|
Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
|
||||||
specified. Similarly, running "pcfre2test -C bsr" never produced the result
|
specified. Similarly, running "pcre2test -C bsr" never produced the result
|
||||||
ANY.
|
ANY.
|
||||||
|
|
||||||
4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
|
4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
|
||||||
|
@ -370,7 +370,7 @@ tests to improve coverage.
|
||||||
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
|
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
|
||||||
pcre2test, a crash could occur.
|
pcre2test, a crash could occur.
|
||||||
|
|
||||||
32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that
|
32. Make -bigstack in RunTest allocate a 64MB stack (instead of 16 MB) so that
|
||||||
all the tests can run with clang's sanitizing options.
|
all the tests can run with clang's sanitizing options.
|
||||||
|
|
||||||
33. Implement extra compile options in the compile context and add the first
|
33. Implement extra compile options in the compile context and add the first
|
||||||
|
|
4
HACKING
|
@ -348,7 +348,7 @@ The /i, /m, or /s options (PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and
|
||||||
others) may be changed in the middle of patterns by items such as (?i). Their
|
others) may be changed in the middle of patterns by items such as (?i). Their
|
||||||
processing is handled entirely at compile time by generating different opcodes
|
processing is handled entirely at compile time by generating different opcodes
|
||||||
for the different settings. The runtime functions do not need to keep track of
|
for the different settings. The runtime functions do not need to keep track of
|
||||||
an options state.
|
an option's state.
|
||||||
|
|
||||||
PCRE2_DUPNAMES, PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE
|
PCRE2_DUPNAMES, PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE
|
||||||
are tracked and processed during the parsing pre-pass. The others are handled
|
are tracked and processed during the parsing pre-pass. The others are handled
|
||||||
|
@ -764,7 +764,7 @@ OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
|
||||||
bracket from the start of the whole pattern. OP_RECURSE is also used for
|
bracket from the start of the whole pattern. OP_RECURSE is also used for
|
||||||
"subroutine" calls, even though they are not strictly a recursion. Up till
|
"subroutine" calls, even though they are not strictly a recursion. Up till
|
||||||
release 10.30 recursions were treated as atomic groups, making them
|
release 10.30 recursions were treated as atomic groups, making them
|
||||||
incompatible with Perl (but PCRE had then well before Perl did). From 10.30,
|
incompatible with Perl (but PCRE had them well before Perl did). From 10.30,
|
||||||
backtracking into recursions is supported.
|
backtracking into recursions is supported.
|
||||||
|
|
||||||
Repeated recursions used to be wrapped inside OP_ONCE brackets, which not only
|
Repeated recursions used to be wrapped inside OP_ONCE brackets, which not only
|
||||||
|
|
4
NEWS
|
@ -31,7 +31,7 @@ remembering backtracking positions. This makes --disable-stack-for-recursion a
|
||||||
NOOP. The new implementation allows backtracking into recursive group calls in
|
NOOP. The new implementation allows backtracking into recursive group calls in
|
||||||
patterns, making it more compatible with Perl, and also fixes some other
|
patterns, making it more compatible with Perl, and also fixes some other
|
||||||
previously hard-to-do issues. For patterns that have a lot of backtracking, the
|
previously hard-to-do issues. For patterns that have a lot of backtracking, the
|
||||||
heap is now used, and there is explicit limit on the amount, settable by
|
heap is now used, and there is an explicit limit on the amount, settable by
|
||||||
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
|
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
|
||||||
but is renamed as "depth limit" (though the old names remain for
|
but is renamed as "depth limit" (though the old names remain for
|
||||||
compatibility).
|
compatibility).
|
||||||
|
@ -53,7 +53,7 @@ also supported.
|
||||||
|
|
||||||
5. Additional compile options in the compile context are now available, and the
|
5. Additional compile options in the compile context are now available, and the
|
||||||
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
|
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
|
||||||
PCRE2_EXTRA_BAD_ESCAPE_IS LITERAL.
|
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
|
||||||
|
|
||||||
6. The newline type PCRE2_NEWLINE_NUL is now available.
|
6. The newline type PCRE2_NEWLINE_NUL is now available.
|
||||||
|
|
||||||
|
|
|
@ -127,7 +127,7 @@ can skip ahead to the CMake section.
|
||||||
src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
|
src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
|
||||||
these yourself.
|
these yourself.
|
||||||
|
|
||||||
Not also that the pcre2_fuzzsupport.c file contains special code that is
|
Note also that the pcre2_fuzzsupport.c file contains special code that is
|
||||||
useful to those who want to run fuzzing tests on the PCRE2 library. Unless
|
useful to those who want to run fuzzing tests on the PCRE2 library. Unless
|
||||||
you are doing that, you can ignore it.
|
you are doing that, you can ignore it.
|
||||||
|
|
||||||
|
@ -186,7 +186,7 @@ can skip ahead to the CMake section.
|
||||||
|
|
||||||
STACK SIZE IN WINDOWS ENVIRONMENTS
|
STACK SIZE IN WINDOWS ENVIRONMENTS
|
||||||
|
|
||||||
Prior to release 10.30 the default system stack size of 1Mb in some Windows
|
Prior to release 10.30 the default system stack size of 1MB in some Windows
|
||||||
environments caused issues with some tests. This should no longer be the case
|
environments caused issues with some tests. This should no longer be the case
|
||||||
for 10.30 and later releases.
|
for 10.30 and later releases.
|
||||||
|
|
||||||
|
|
17
README
|
@ -257,9 +257,10 @@ library. They are also documented in the pcre2build man page.
|
||||||
|
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
|
|
||||||
The units are kilobytes. This limit does not apply when the JIT optimization
|
The units are kibibytes (units of 1024 bytes). This limit does not apply when
|
||||||
(which has its own memory control features) is used. There is more discussion
|
the JIT optimization (which has its own memory control features) is used.
|
||||||
on the pcre2api man page (search for pcre2_set_heap_limit).
|
There is more discussion on the pcre2api man page (search for
|
||||||
|
pcre2_set_heap_limit).
|
||||||
|
|
||||||
. In the 8-bit library, the default maximum compiled pattern size is around
|
. In the 8-bit library, the default maximum compiled pattern size is around
|
||||||
64K bytes. You can increase this by adding --with-link-size=3 to the
|
64K bytes. You can increase this by adding --with-link-size=3 to the
|
||||||
|
@ -319,10 +320,10 @@ library. They are also documented in the pcre2build man page.
|
||||||
. When JIT support is enabled, pcre2grep automatically makes use of it, unless
|
. When JIT support is enabled, pcre2grep automatically makes use of it, unless
|
||||||
you add --disable-pcre2grep-jit to the "configure" command.
|
you add --disable-pcre2grep-jit to the "configure" command.
|
||||||
|
|
||||||
. On non-Windows sytems there is support for calling external scripts during
|
. There is support for calling external programs during matching in the
|
||||||
matching in the pcre2grep command via PCRE2's callout facility with string
|
pcre2grep command, using PCRE2's callout facility with string arguments. This
|
||||||
arguments. This support can be disabled by adding --disable-pcre2grep-callout
|
support can be disabled by adding --disable-pcre2grep-callout to the
|
||||||
to the "configure" command.
|
"configure" command.
|
||||||
|
|
||||||
. The pcre2grep program currently supports only 8-bit data files, and so
|
. The pcre2grep program currently supports only 8-bit data files, and so
|
||||||
requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
|
requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
|
||||||
|
@ -887,4 +888,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 27 April 2018
|
Last updated: 17 June 2018
|
||||||
|
|
|
@ -708,7 +708,7 @@ $valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >
|
||||||
printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep
|
printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep
|
||||||
$valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep
|
$valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep
|
||||||
|
|
||||||
# It seems inpossible to handle NUL characters easily in Solaris (aka SunOS).
|
# It seems impossible to handle NUL characters easily in Solaris (aka SunOS).
|
||||||
# The version of sed explicitly doesn't like them. For the moment, we just
|
# The version of sed explicitly doesn't like them. For the moment, we just
|
||||||
# don't run this test under SunOS. Fudge the output so that the comparison
|
# don't run this test under SunOS. Fudge the output so that the comparison
|
||||||
# works. A similar problem has also been reported for MacOS (Darwin).
|
# works. A similar problem has also been reported for MacOS (Darwin).
|
||||||
|
|
2
RunTest
|
@ -843,7 +843,7 @@ for bmode in "$test8" "$test16" "$test32"; do
|
||||||
checkresult $? 24 ""
|
checkresult $? 24 ""
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# UTF pattern converson tests
|
# UTF pattern conversion tests
|
||||||
|
|
||||||
if [ "$do25" = yes ] ; then
|
if [ "$do25" = yes ] ; then
|
||||||
echo $title25
|
echo $title25
|
||||||
|
|
|
@ -288,7 +288,7 @@ AC_ARG_WITH(parens-nest-limit,
|
||||||
# Handle --with-heap-limit
|
# Handle --with-heap-limit
|
||||||
AC_ARG_WITH(heap-limit,
|
AC_ARG_WITH(heap-limit,
|
||||||
AS_HELP_STRING([--with-heap-limit=N],
|
AS_HELP_STRING([--with-heap-limit=N],
|
||||||
[default limit on heap memory (kilobytes, default=20000000)]),
|
[default limit on heap memory (kibibytes, default=20000000)]),
|
||||||
, with_heap_limit=20000000)
|
, with_heap_limit=20000000)
|
||||||
|
|
||||||
# Handle --with-match-limit=N
|
# Handle --with-match-limit=N
|
||||||
|
@ -754,7 +754,7 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
|
||||||
AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
|
AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
|
||||||
This limits the amount of memory that may be used while matching
|
This limits the amount of memory that may be used while matching
|
||||||
a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
|
a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
|
||||||
not apply to JIT matching. The value is in kilobytes.])
|
not apply to JIT matching. The value is in kibibytes (units of 1024 bytes).])
|
||||||
|
|
||||||
AC_DEFINE([MAX_NAME_SIZE], [32], [
|
AC_DEFINE([MAX_NAME_SIZE], [32], [
|
||||||
This limit is parameterized just in case anybody ever wants to
|
This limit is parameterized just in case anybody ever wants to
|
||||||
|
@ -1017,7 +1017,7 @@ $PACKAGE-$VERSION configuration summary:
|
||||||
Rebuild char tables ................ : ${enable_rebuild_chartables}
|
Rebuild char tables ................ : ${enable_rebuild_chartables}
|
||||||
Internal link size ................. : ${with_link_size}
|
Internal link size ................. : ${with_link_size}
|
||||||
Nested parentheses limit ........... : ${with_parens_nest_limit}
|
Nested parentheses limit ........... : ${with_parens_nest_limit}
|
||||||
Heap limit ......................... : ${with_heap_limit} kilobytes
|
Heap limit ......................... : ${with_heap_limit} kibibytes
|
||||||
Match limit ........................ : ${with_match_limit}
|
Match limit ........................ : ${with_match_limit}
|
||||||
Match depth limit .................. : ${with_match_limit_depth}
|
Match depth limit .................. : ${with_match_limit_depth}
|
||||||
Build shared libs .................. : ${enable_shared}
|
Build shared libs .................. : ${enable_shared}
|
||||||
|
|
|
@ -127,7 +127,7 @@ can skip ahead to the CMake section.
|
||||||
src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
|
src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
|
||||||
these yourself.
|
these yourself.
|
||||||
|
|
||||||
Not also that the pcre2_fuzzsupport.c file contains special code that is
|
Note also that the pcre2_fuzzsupport.c file contains special code that is
|
||||||
useful to those who want to run fuzzing tests on the PCRE2 library. Unless
|
useful to those who want to run fuzzing tests on the PCRE2 library. Unless
|
||||||
you are doing that, you can ignore it.
|
you are doing that, you can ignore it.
|
||||||
|
|
||||||
|
@ -186,7 +186,7 @@ can skip ahead to the CMake section.
|
||||||
|
|
||||||
STACK SIZE IN WINDOWS ENVIRONMENTS
|
STACK SIZE IN WINDOWS ENVIRONMENTS
|
||||||
|
|
||||||
Prior to release 10.30 the default system stack size of 1Mb in some Windows
|
Prior to release 10.30 the default system stack size of 1MB in some Windows
|
||||||
environments caused issues with some tests. This should no longer be the case
|
environments caused issues with some tests. This should no longer be the case
|
||||||
for 10.30 and later releases.
|
for 10.30 and later releases.
|
||||||
|
|
||||||
|
|
|
@ -257,9 +257,10 @@ library. They are also documented in the pcre2build man page.
|
||||||
|
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
|
|
||||||
The units are kilobytes. This limit does not apply when the JIT optimization
|
The units are kibibytes (units of 1024 bytes). This limit does not apply when
|
||||||
(which has its own memory control features) is used. There is more discussion
|
the JIT optimization (which has its own memory control features) is used.
|
||||||
on the pcre2api man page (search for pcre2_set_heap_limit).
|
There is more discussion on the pcre2api man page (search for
|
||||||
|
pcre2_set_heap_limit).
|
||||||
|
|
||||||
. In the 8-bit library, the default maximum compiled pattern size is around
|
. In the 8-bit library, the default maximum compiled pattern size is around
|
||||||
64K bytes. You can increase this by adding --with-link-size=3 to the
|
64K bytes. You can increase this by adding --with-link-size=3 to the
|
||||||
|
@ -319,10 +320,10 @@ library. They are also documented in the pcre2build man page.
|
||||||
. When JIT support is enabled, pcre2grep automatically makes use of it, unless
|
. When JIT support is enabled, pcre2grep automatically makes use of it, unless
|
||||||
you add --disable-pcre2grep-jit to the "configure" command.
|
you add --disable-pcre2grep-jit to the "configure" command.
|
||||||
|
|
||||||
. On non-Windows sytems there is support for calling external scripts during
|
. There is support for calling external programs during matching in the
|
||||||
matching in the pcre2grep command via PCRE2's callout facility with string
|
pcre2grep command, using PCRE2's callout facility with string arguments. This
|
||||||
arguments. This support can be disabled by adding --disable-pcre2grep-callout
|
support can be disabled by adding --disable-pcre2grep-callout to the
|
||||||
to the "configure" command.
|
"configure" command.
|
||||||
|
|
||||||
. The pcre2grep program currently supports only 8-bit data files, and so
|
. The pcre2grep program currently supports only 8-bit data files, and so
|
||||||
requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
|
requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
|
||||||
|
@ -887,4 +888,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 27 April 2018
|
Last updated: 17 June 2018
|
||||||
|
|
|
@ -28,7 +28,7 @@ DESCRIPTION
|
||||||
<P>
|
<P>
|
||||||
This function is part of an experimental set of pattern conversion functions.
|
This function is part of an experimental set of pattern conversion functions.
|
||||||
It sets the component separator character that is used when converting globs.
|
It sets the component separator character that is used when converting globs.
|
||||||
The second argument must one of the characters forward slash, backslash, or
|
The second argument must be one of the characters forward slash, backslash, or
|
||||||
dot. The default is backslash when running under Windows, otherwise forward
|
dot. The default is backslash when running under Windows, otherwise forward
|
||||||
slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
|
slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
|
||||||
the second argument is invalid.
|
the second argument is invalid.
|
||||||
|
|
|
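Review note: the documented contract for the glob separator (three permitted characters, a platform-dependent default, zero on success, PCRE2_ERROR_BADDATA otherwise) can be modelled in a few lines. This is only a sketch of the behaviour described in the text, not the library's C implementation. The function names, the dict standing in for the convert context, and the numeric stand-in for the error code are all assumptions made for illustration.

```python
import os

# Stand-in value for illustration only; the real constant is defined in pcre2.h.
PCRE2_ERROR_BADDATA = -1

# The only separators the documentation allows: forward slash, backslash, dot.
VALID_SEPARATORS = ("/", "\\", ".")


def default_glob_separator() -> str:
    """Backslash when running under Windows, otherwise forward slash."""
    return "\\" if os.name == "nt" else "/"


def set_glob_separator(context: dict, separator: str) -> int:
    """Model of the documented behaviour: 0 on success, an error code otherwise."""
    if separator not in VALID_SEPARATORS:
        return PCRE2_ERROR_BADDATA
    context["glob_separator"] = separator
    return 0
```

A caller would check the return value before relying on the new separator, for example `if set_glob_separator(ctx, ".") != 0: ...`.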
@ -562,10 +562,10 @@ U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
|
||||||
<P>
|
<P>
|
||||||
Each of the first three conventions is used by at least one operating system as
|
Each of the first three conventions is used by at least one operating system as
|
||||||
its standard newline sequence. When PCRE2 is built, a default can be specified.
|
its standard newline sequence. When PCRE2 is built, a default can be specified.
|
||||||
The default default is LF, which is the Unix standard. However, the newline
|
If it is not, the default is set to LF, which is the Unix standard. However,
|
||||||
convention can be changed by an application when calling <b>pcre2_compile()</b>,
|
the newline convention can be changed by an application when calling
|
||||||
or it can be specified by special text at the start of the pattern itself; this
|
<b>pcre2_compile()</b>, or it can be specified by special text at the start of
|
||||||
overrides any other settings. See the
|
the pattern itself; this overrides any other settings. See the
|
||||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||||
page for details of the special character sequences.
|
page for details of the special character sequences.
|
||||||
</P>
|
</P>
|
||||||
|
@ -949,17 +949,18 @@ offset limit. In other words, whichever limit comes first is used.
|
||||||
<b> uint32_t <i>value</i>);</b>
|
<b> uint32_t <i>value</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
|
The <i>heap_limit</i> parameter specifies, in units of kibibytes (1024 bytes),
|
||||||
amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
|
the maximum amount of heap memory that <b>pcre2_match()</b> may use to hold
|
||||||
information when running an interpretive match. This limit also applies to
|
backtracking information when running an interpretive match. This limit also
|
||||||
<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a
|
applies to <b>pcre2_dfa_match()</b>, which may use the heap when processing
|
||||||
lot of nested pattern recursion or lookarounds or atomic groups. This limit
|
patterns with a lot of nested pattern recursion or lookarounds or atomic
|
||||||
does not apply to matching with the JIT optimization, which has its own memory
|
groups. This limit does not apply to matching with the JIT optimization, which
|
||||||
control arrangements (see the
|
has its own memory control arrangements (see the
|
||||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||||
documentation for more details). If the limit is reached, the negative error
|
documentation for more details). If the limit is reached, the negative error
|
||||||
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
|
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
|
||||||
built; the default default is very large and is essentially "unlimited".
|
is built; if it is not, the default is set very large and is essentially
|
||||||
|
"unlimited".
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A value for the heap limit may also be supplied by an item at the start of a
|
A value for the heap limit may also be supplied by an item at the start of a
|
||||||
|
@ -1044,7 +1045,7 @@ The depth limit is not relevant, and is ignored, when matching is done using
|
||||||
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
|
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
|
||||||
uses it to limit the depth of nested internal recursive function calls that
|
uses it to limit the depth of nested internal recursive function calls that
|
||||||
implement atomic groups, lookaround assertions, and pattern recursions. This
|
implement atomic groups, lookaround assertions, and pattern recursions. This
|
||||||
limits, indirectly, the amount of system stack this is used. It was more useful
|
limits, indirectly, the amount of system stack that is used. It was more useful
|
||||||
in versions before 10.32, when stack memory was used for local workspace
|
in versions before 10.32, when stack memory was used for local workspace
|
||||||
vectors for recursive function calls. From version 10.32, only local variables
|
vectors for recursive function calls. From version 10.32, only local variables
|
||||||
are allocated on the stack and as each call uses only a few hundred bytes, even
|
are allocated on the stack and as each call uses only a few hundred bytes, even
|
||||||
|
@ -1060,11 +1061,11 @@ probably better to limit heap usage directly by calling
|
||||||
<b>pcre2_set_heap_limit()</b>.
|
<b>pcre2_set_heap_limit()</b>.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The default value for the depth limit can be set when PCRE2 is built; the
|
The default value for the depth limit can be set when PCRE2 is built; if it is
|
||||||
default default is the same value as the default for the match limit. If the
|
not, the default is set to the same value as the default for the match limit.
|
||||||
limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
|
If the limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
|
||||||
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
|
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
|
||||||
item at the start of a pattern of the form
|
supplied by an item at the start of a pattern of the form
|
||||||
<pre>
|
<pre>
|
||||||
(*LIMIT_DEPTH=ddd)
|
(*LIMIT_DEPTH=ddd)
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1120,7 +1121,7 @@ given with <b>pcre2_set_depth_limit()</b> above.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_HEAPLIMIT
|
PCRE2_CONFIG_HEAPLIMIT
|
||||||
</pre>
|
</pre>
|
||||||
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
The output is a uint32_t integer that gives, in kibibytes, the default limit
|
||||||
for the amount of heap memory used by <b>pcre2_match()</b> or
|
for the amount of heap memory used by <b>pcre2_match()</b> or
|
||||||
<b>pcre2_dfa_match()</b>. Further details are given with
|
<b>pcre2_dfa_match()</b>. Further details are given with
|
||||||
<b>pcre2_set_heap_limit()</b> above.
|
<b>pcre2_set_heap_limit()</b> above.
|
||||||
|
@ -1431,7 +1432,7 @@ If this bit is set, letters in the pattern match both upper and lower case
|
||||||
letters in the subject. It is equivalent to Perl's /i option, and it can be
|
letters in the subject. It is equivalent to Perl's /i option, and it can be
|
||||||
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
|
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
|
||||||
properties are used for all characters with more than one other case, and for
|
properties are used for all characters with more than one other case, and for
|
||||||
all characters whose code points are greater than U+007f. For lower valued
|
all characters whose code points are greater than U+007F. For lower valued
|
||||||
characters with only one other case, a lookup table is used for speed. When
|
characters with only one other case, a lookup table is used for speed. When
|
||||||
PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
|
PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
|
||||||
and higher code points (available only in 16-bit or 32-bit mode) are treated as
|
and higher code points (available only in 16-bit or 32-bit mode) are treated as
|
||||||
|
@ -1613,8 +1614,8 @@ If this option is set, it disables the use of numbered capturing parentheses in
|
||||||
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
|
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
|
||||||
were followed by ?: but named parentheses can still be used for capturing (and
|
were followed by ?: but named parentheses can still be used for capturing (and
|
||||||
they acquire numbers in the usual way). This is the same as Perl's /n option.
|
they acquire numbers in the usual way). This is the same as Perl's /n option.
|
||||||
Note that, when this option is set, references to capturing groups (back
|
Note that, when this option is set, references to capturing groups
|
||||||
references or recursion/subroutine calls) may only refer to named groups,
|
(backreferences or recursion/subroutine calls) may only refer to named groups,
|
||||||
though the reference can be by name or by number.
|
though the reference can be by name or by number.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_NO_AUTO_POSSESS
|
PCRE2_NO_AUTO_POSSESS
|
||||||
|
@ -2019,10 +2020,10 @@ returned if there are no back references.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_BSR
|
PCRE2_INFO_BSR
|
||||||
</pre>
|
</pre>
|
||||||
The output is a uint32_t whose value indicates what character sequences the \R
|
The output is a uint32_t integer whose value indicates what character sequences
|
||||||
escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R matches
|
the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R
|
||||||
any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \R
|
matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
|
||||||
matches only CR, LF, or CRLF.
|
that \R matches only CR, LF, or CRLF.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_CAPTURECOUNT
|
PCRE2_INFO_CAPTURECOUNT
|
||||||
</pre>
|
</pre>
|
||||||
|
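Review note: the two PCRE2_INFO_BSR outcomes can be approximated with ordinary alternations. The sketch below uses Python's `re` module, which has no `\R` escape of its own, so the two patterns are hand-written equivalents and the names `BSR_ANYCRLF` and `BSR_UNICODE` are reused here only as labels. The atomic-group behaviour of the real `\R` is not modelled.

```python
import re

# \R restricted to CR, LF, or CRLF (the PCRE2_BSR_ANYCRLF case).
BSR_ANYCRLF = re.compile(r"\r\n|\r|\n")

# \R matching any Unicode line ending (the PCRE2_BSR_UNICODE case):
# CRLF, LF, VT, FF, CR, NEL (U+0085), LS (U+2028), PS (U+2029).
BSR_UNICODE = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")


def is_line_ending(pattern: re.Pattern, text: str) -> bool:
    """True if the whole of `text` is one line ending under `pattern`."""
    return pattern.fullmatch(text) is not None
```

With these definitions, `is_line_ending(BSR_UNICODE, "\u2028")` is true while `is_line_ending(BSR_ANYCRLF, "\u2028")` is false, which is the distinction the corrected "pcre2test -C bsr" output reports.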
@ -2034,10 +2035,10 @@ The third argument should point to an <b>uint32_t</b> variable.
|
||||||
</pre>
|
</pre>
|
||||||
If the pattern set a backtracking depth limit by including an item of the form
|
If the pattern set a backtracking depth limit by including an item of the form
|
||||||
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
|
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_FIRSTBITMAP
|
PCRE2_INFO_FIRSTBITMAP
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2047,7 +2048,7 @@ values for the first code unit in any match. For example, a pattern that starts
|
||||||
with [abc] results in a table with three bits set. When code unit values
|
with [abc] results in a table with three bits set. When code unit values
|
||||||
greater than 255 are supported, the flag bit for 255 means "any code unit of
|
greater than 255 are supported, the flag bit for 255 means "any code unit of
|
||||||
value 255 or above". If such a table was constructed, a pointer to it is
|
value 255 or above". If such a table was constructed, a pointer to it is
|
||||||
returned. Otherwise NULL is returned. The third argument should point to an
|
returned. Otherwise NULL is returned. The third argument should point to a
|
||||||
<b>const uint8_t *</b> variable.
|
<b>const uint8_t *</b> variable.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_FIRSTCODETYPE
|
PCRE2_INFO_FIRSTCODETYPE
|
||||||
|
@ -2074,7 +2075,7 @@ and up to 0xffffffff when not using UTF-32 mode.
|
||||||
</pre>
|
</pre>
|
||||||
Return the size (in bytes) of the data frames that are used to remember
|
Return the size (in bytes) of the data frames that are used to remember
|
||||||
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
|
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
|
||||||
without the use of JIT. The third argument should point to an <b>size_t</b>
|
without the use of JIT. The third argument should point to a <b>size_t</b>
|
||||||
variable. The frame size depends on the number of capturing parentheses in the
|
variable. The frame size depends on the number of capturing parentheses in the
|
||||||
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -2094,10 +2095,10 @@ the equivalent hexadecimal or octal escape sequences.
|
||||||
</pre>
|
</pre>
|
||||||
If the pattern set a heap memory limit by including an item of the form
|
If the pattern set a heap memory limit by including an item of the form
|
||||||
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
|
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_JCHANGED
|
PCRE2_INFO_JCHANGED
|
||||||
</pre>
|
</pre>
|
||||||
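Reviewer note on the PCRE2_INFO_HEAPLIMIT hunk above: the rule that a limit set in the pattern is used only if it is less than the caller's limit can be written as a small function. This is a sketch of the rule as the text states it, not PCRE2's internal code; `effective_limit` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a PCRE2 function: models which heap or match
   limit is in force. pattern_has_limit is 0 when the pattern contains no
   (*LIMIT_...=nnnn) item, the case reported as PCRE2_ERROR_UNSET. */
uint32_t effective_limit(uint32_t caller_limit, int pattern_has_limit,
                         uint32_t pattern_limit)
{
    assert(pattern_has_limit == 0 || pattern_has_limit == 1);
    if (pattern_has_limit && pattern_limit < caller_limit)
        return pattern_limit;   /* the pattern lowered the limit */
    return caller_limit;        /* the pattern cannot raise the limit */
}
```

With a caller limit of 1000, a pattern value of 500 takes effect, while a pattern value of 2000 is ignored.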
|
@ -2141,15 +2142,15 @@ in such cases.
|
||||||
</pre>
|
</pre>
|
||||||
If the pattern set a match limit by including an item of the form
|
If the pattern set a match limit by including an item of the form
|
||||||
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
|
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_MAXLOOKBEHIND
|
PCRE2_INFO_MAXLOOKBEHIND
|
||||||
</pre>
|
</pre>
|
||||||
Return the number of characters (not code units) in the longest lookbehind
|
Return the number of characters (not code units) in the longest lookbehind
|
||||||
assertion in the pattern. The third argument should point to an unsigned 32-bit
|
assertion in the pattern. The third argument should point to a uint32_t
|
||||||
integer. This information is useful when doing multi-segment matching using the
|
integer. This information is useful when doing multi-segment matching using the
|
||||||
partial matching facilities. Note that the simple assertions \b and \B
|
partial matching facilities. Note that the simple assertions \b and \B
|
||||||
require a one-character lookbehind. \A also registers a one-character
|
require a one-character lookbehind. \A also registers a one-character
|
||||||
|
@ -2417,7 +2418,7 @@ zero, the search for a match starts at the beginning of the subject, and this
|
||||||
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
|
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
|
||||||
must point to the start of a character, or to the end of the subject (in UTF-32
|
must point to the start of a character, or to the end of the subject (in UTF-32
|
||||||
mode, one code unit equals one character, so all offsets are valid). Like the
|
mode, one code unit equals one character, so all offsets are valid). Like the
|
||||||
pattern string, the subject may contain binary zeroes.
|
pattern string, the subject may contain binary zeros.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A non-zero starting offset is useful when searching for another match in the
|
A non-zero starting offset is useful when searching for another match in the
|
||||||
|
|
|
@ -227,7 +227,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
|
||||||
<pre>
|
<pre>
|
||||||
--enable-newline-is-nul
|
--enable-newline-is-nul
|
||||||
</pre>
|
</pre>
|
||||||
which causes NUL (binary zero) is set as the default line-ending character.
|
which causes NUL (binary zero) to be set as the default line-ending character.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Whatever default line ending convention is selected when PCRE2 is built can be
|
Whatever default line ending convention is selected when PCRE2 is built can be
|
||||||
|
@ -286,15 +286,15 @@ The <b>pcre2_match()</b> function starts out using a 20K vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
are (that is, the deeper the search tree), the more memory is needed. If the
|
are (that is, the deeper the search tree), the more memory is needed. If the
|
||||||
initial vector is not large enough, heap memory is used, up to a certain limit,
|
initial vector is not large enough, heap memory is used, up to a certain limit,
|
||||||
which is specified in kilobytes. The limit can be changed at run time, as
|
which is specified in kibibytes (units of 1024 bytes). The limit can be changed
|
||||||
described in the
|
at run time, as described in the
|
||||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
documentation. The default limit (in effect unlimited) is 20 million. You can
|
documentation. The default limit (in effect unlimited) is 20 million. You can
|
||||||
change this by a setting such as
|
change this by a setting such as
|
||||||
<pre>
|
<pre>
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
</pre>
|
</pre>
|
||||||
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
which limits the amount of heap to 500 KiB. This limit applies only to
|
||||||
interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
|
interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
|
||||||
may also use the heap for internal workspace when processing complicated
|
may also use the heap for internal workspace when processing complicated
|
||||||
patterns. This limit does not apply when JIT (which has its own memory
|
patterns. This limit does not apply when JIT (which has its own memory
|
||||||
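Reviewer note on the heap-limit hunk above: the change from "kilobytes" to "kibibytes" matters because the unit is 1024 bytes, not 1000. A minimal sketch of the conversion follows; `heap_limit_bytes` is a hypothetical helper, not part of PCRE2.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_KIB UINT64_C(1024)

/* Hypothetical helper: convert a heap limit given in KiB to bytes.
   A 64-bit result is needed because the default limit, expressed in
   bytes, does not fit in 32 bits. */
uint64_t heap_limit_bytes(uint32_t limit_kib)
{
    uint64_t bytes = (uint64_t)limit_kib * BYTES_PER_KIB;
    assert(bytes / BYTES_PER_KIB == limit_kib);  /* no 64-bit overflow */
    return bytes;
}
```

For --with-heap-limit=500 this gives 512,000 bytes rather than 500,000. The default of 20 million KiB is 20,480,000,000 bytes, roughly 19 GiB, which is why the text calls it in effect unlimited.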
|
@ -542,7 +542,7 @@ generated from the string.
|
||||||
Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
|
Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
|
||||||
to be created. This is normally run under valgrind or used when PCRE2 is
|
to be created. This is normally run under valgrind or used when PCRE2 is
|
||||||
compiled with address sanitizing enabled. It calls the fuzzing function and
|
compiled with address sanitizing enabled. It calls the fuzzing function and
|
||||||
outputs information about it is doing. The input strings are specified by
|
outputs information about what it is doing. The input strings are specified by
|
||||||
arguments: if an argument starts with "=" the rest of it is a literal input
|
arguments: if an argument starts with "=" the rest of it is a literal input
|
||||||
string. Otherwise, it is assumed to be a file name, and the contents of the
|
string. Otherwise, it is assumed to be a file name, and the contents of the
|
||||||
file are the test string.
|
file are the test string.
|
||||||
|
|
|
@ -31,7 +31,7 @@ page.
|
||||||
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
||||||
they do not mean what you might think. For example, (?!a){3} does not assert
|
they do not mean what you might think. For example, (?!a){3} does not assert
|
||||||
that the next three characters are not "a". It just asserts that the next
|
that the next three characters are not "a". It just asserts that the next
|
||||||
character is not "a" three times (in principle: PCRE2 optimizes this to run the
|
character is not "a" three times (in principle; PCRE2 optimizes this to run the
|
||||||
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
||||||
for example, \b* (but not \b{3}), but these do not seem to have any use.
|
for example, \b* (but not \b{3}), but these do not seem to have any use.
|
||||||
</P>
|
</P>
|
||||||
|
@ -77,8 +77,8 @@ The \Q...\E sequence is recognized both inside and outside character classes.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
||||||
constructions. However, there is support PCRE2's "callout" feature, which
|
constructions. However, PCRE2 does have a "callout" feature, which allows an
|
||||||
allows an external function to be called during pattern matching. See the
|
external function to be called during pattern matching. See the
|
||||||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||||
documentation for details.
|
documentation for details.
|
||||||
</P>
|
</P>
|
||||||
|
|
|
@ -86,9 +86,10 @@ controlled by parameters that can be set by the <b>--buffer-size</b> and
|
||||||
that is obtained at the start of processing. If an input file contains very
|
that is obtained at the start of processing. If an input file contains very
|
||||||
long lines, a larger buffer may be needed; this is handled by automatically
|
long lines, a larger buffer may be needed; this is handled by automatically
|
||||||
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
||||||
default values for these parameters are specified when <b>pcre2grep</b> is
|
default values for these parameters can be set when <b>pcre2grep</b> is
|
||||||
built, with the default defaults being 20K and 1M respectively. An error occurs
|
built; if nothing is specified, the defaults are set to 20K and 1M
|
||||||
if a line is too long and the buffer can no longer be expanded.
|
respectively. An error occurs if a line is too long and the buffer can no
|
||||||
|
longer be expanded.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The block of memory that is actually used is three times the "buffer size", to
|
The block of memory that is actually used is three times the "buffer size", to
|
||||||
|
@ -500,13 +501,13 @@ short form for this option.
|
||||||
When this option is given, non-compressed input is read and processed line by
|
When this option is given, non-compressed input is read and processed line by
|
||||||
line, and the output is flushed after each write. By default, input is read in
|
line, and the output is flushed after each write. By default, input is read in
|
||||||
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
|
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
|
||||||
terminal (which is currently possible only in Unix-like environments). Output
|
terminal (which is currently possible only in Unix-like environments or
|
||||||
to terminal is normally automatically flushed by the operating system. This
|
Windows). Output to terminal is normally automatically flushed by the operating
|
||||||
option can be useful when the input or output is attached to a pipe and you do
|
system. This option can be useful when the input or output is attached to a
|
||||||
not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
|
pipe and you do not want <b>pcre2grep</b> to buffer up large amounts of data.
|
||||||
will affect performance, and the <b>-M</b> (multiline) option ceases to work.
|
However, its use will affect performance, and the <b>-M</b> (multiline) option
|
||||||
When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
|
ceases to work. When input is from a compressed .gz or .bz2 file,
|
||||||
ignored.
|
<b>--line-buffered</b> is ignored.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>--line-offsets</b>
|
<b>--line-offsets</b>
|
||||||
|
@ -541,11 +542,11 @@ counter that is incremented each time around its main processing loop. If the
|
||||||
value set by <b>--match-limit</b> is reached, an error occurs.
|
value set by <b>--match-limit</b> is reached, an error occurs.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
The <b>--heap-limit</b> option specifies, as a number of kilobytes, the amount
|
The <b>--heap-limit</b> option specifies, as a number of kibibytes (units of
|
||||||
of heap memory that may be used for matching. Heap memory is needed only if
|
1024 bytes), the amount of heap memory that may be used for matching. Heap
|
||||||
matching the pattern requires a significant number of nested backtracking
|
memory is needed only if matching the pattern requires a significant number of
|
||||||
points to be remembered. This parameter can be set to zero to forbid the use of
|
nested backtracking points to be remembered. This parameter can be set to zero
|
||||||
heap memory altogether.
|
to forbid the use of heap memory altogether.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
The <b>--depth-limit</b> option limits the depth of nested backtracking points,
|
The <b>--depth-limit</b> option limits the depth of nested backtracking points,
|
||||||
|
@ -556,9 +557,9 @@ limit acts varies from pattern to pattern. This limit is of use only if it is
|
||||||
set smaller than <b>--match-limit</b>.
|
set smaller than <b>--match-limit</b>.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
There are no short forms for these options. The default settings are specified
|
There are no short forms for these options. The default limits can be set
|
||||||
when the PCRE2 library is compiled, with the default defaults being very large
|
when the PCRE2 library is compiled; if they are not specified, the defaults
|
||||||
and so effectively unlimited.
|
are very large and so effectively unlimited.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>--max-buffer-size=</b><i>number</i>
|
<b>--max-buffer-size=</b><i>number</i>
|
||||||
|
|
|
@ -54,9 +54,9 @@ There is no limit to the number of parenthesized subpatterns, but there can be
|
||||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||||
order to limit the amount of system stack used at compile time. The default
|
order to limit the amount of system stack used at compile time. The default
|
||||||
limit can be specified when PCRE2 is built; the default default is 250. An
|
limit can be specified when PCRE2 is built; if not, the default is set to 250.
|
||||||
application can change this limit by calling pcre2_set_parens_nest_limit() to
|
An application can change this limit by calling pcre2_set_parens_nest_limit()
|
||||||
set the limit in a compile context.
|
to set the limit in a compile context.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The maximum length of name for a named subpattern is 32 code units, and the
|
The maximum length of name for a named subpattern is 32 code units, and the
|
||||||
|
|
|
@ -196,7 +196,7 @@ be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
|
||||||
for it to have any effect. In other words, the pattern writer can lower the
|
for it to have any effect. In other words, the pattern writer can lower the
|
||||||
limits set by the programmer, but not raise them. If there is more than one
|
limits set by the programmer, but not raise them. If there is more than one
|
||||||
setting of one of these limits, the lower value is used. The heap limit is
|
setting of one of these limits, the lower value is used. The heap limit is
|
||||||
specified in kilobytes.
|
specified in kibibytes (units of 1024 bytes).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
||||||
|
@ -549,7 +549,7 @@ Absolute and relative back references
|
||||||
<P>
|
<P>
|
||||||
The sequence \g followed by a signed or unsigned number, optionally enclosed
|
The sequence \g followed by a signed or unsigned number, optionally enclosed
|
||||||
in braces, is an absolute or relative backreference. A named backreference
|
in braces, is an absolute or relative backreference. A named backreference
|
||||||
can be coded as \g{name}. Back references are discussed
|
can be coded as \g{name}. Backreferences are discussed
|
||||||
<a href="#backreferences">later,</a>
|
<a href="#backreferences">later,</a>
|
||||||
following the discussion of
|
following the discussion of
|
||||||
<a href="#subpattern">parenthesized subpatterns.</a>
|
<a href="#subpattern">parenthesized subpatterns.</a>
|
||||||
|
@ -1037,7 +1037,7 @@ joiner" characters. Characters with the "mark" property always have the
|
||||||
modifier). Extending characters are allowed before the modifier.
|
modifier). Extending characters are allowed before the modifier.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
7. Do not break within emoji zwj sequences (zero-width jointer followed by
|
7. Do not break within emoji zwj sequences (zero-width joiner followed by
|
||||||
"glue after ZWJ" or "base glue after ZWJ").
|
"glue after ZWJ" or "base glue after ZWJ").
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -2210,8 +2210,8 @@ after the reference.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
There may be more than one backreference to the same subpattern. If a
|
There may be more than one backreference to the same subpattern. If a
|
||||||
subpattern has not actually been used in a particular match, any back
|
subpattern has not actually been used in a particular match, any backreferences
|
||||||
references to it always fail by default. For example, the pattern
|
to it always fail by default. For example, the pattern
|
||||||
<pre>
|
<pre>
|
||||||
(a|(bc))\2
|
(a|(bc))\2
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2247,7 +2247,7 @@ done using alternation, as in the example above, or by a quantifier with a
|
||||||
minimum of zero.
|
minimum of zero.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Back references of this type cause the group that they reference to be treated
|
Backreferences of this type cause the group that they reference to be treated
|
||||||
as an
|
as an
|
||||||
<a href="#atomicgroup">atomic group.</a>
|
<a href="#atomicgroup">atomic group.</a>
|
||||||
Once the whole group has been matched, a subsequent matching failure cannot
|
Once the whole group has been matched, a subsequent matching failure cannot
|
||||||
|
|
|
@ -139,7 +139,7 @@ because it disables the use of back references.
|
||||||
If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
|
If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
|
||||||
(which has the type const char *) must be set to point to the character beyond
|
(which has the type const char *) must be set to point to the character beyond
|
||||||
the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
|
the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
|
||||||
now contain binary zeroes, which are treated as data characters. Without
|
now contain binary zeros, which are treated as data characters. Without
|
||||||
REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
|
REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
|
||||||
ignored. This is a GNU extension to the POSIX standard and should be used with
|
ignored. This is a GNU extension to the POSIX standard and should be used with
|
||||||
caution in software intended to be portable to other systems.
|
caution in software intended to be portable to other systems.
|
||||||
|
@ -248,10 +248,10 @@ function.
|
||||||
<pre>
|
<pre>
|
||||||
REG_STARTEND
|
REG_STARTEND
|
||||||
</pre>
|
</pre>
|
||||||
When this option is set, the subject string is starts at <i>string</i> +
|
When this option is set, the subject string starts at <i>string</i> +
|
||||||
<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
|
<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
|
||||||
should point to the first character beyond the string. There may be binary
|
should point to the first character beyond the string. There may be binary
|
||||||
zeroes within the subject string, and indeed, using REG_STARTEND is the only
|
zeros within the subject string, and indeed, using REG_STARTEND is the only
|
||||||
way to pass a subject string that contains a binary zero.
|
way to pass a subject string that contains a binary zero.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
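Reviewer note on the REG_STARTEND hunk above: the subject is delimited by two offsets rather than by a terminating zero, which is what allows binary zeros inside it. The sketch below shows only the offset arithmetic; it does not call any regex function, and `subject_span` and `startend_span` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: the part of the buffer that is treated as the subject. */
typedef struct {
    const char *start;   /* string + pmatch[0].rm_so */
    size_t length;       /* pmatch[0].rm_eo - pmatch[0].rm_so */
} subject_span;

/* so and eo stand in for pmatch[0].rm_so and pmatch[0].rm_eo. The end
   offset points to the first character beyond the subject. */
subject_span startend_span(const char *string, ptrdiff_t so, ptrdiff_t eo)
{
    assert(string != NULL && so >= 0 && eo >= so);
    subject_span span = { string + so, (size_t)(eo - so) };
    return span;
}
```

For a five-byte buffer holding 'a', 'b', a binary zero, 'c', 'd', offsets 0 and 5 cover all five bytes, whereas a zero-terminated interpretation would stop after the second byte.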
|
|
|
@ -442,7 +442,7 @@ of the newline or \R options with similar syntax. More than one of them may
|
||||||
appear. For the first three, d is a decimal number.
|
appear. For the first three, d is a decimal number.
|
||||||
<pre>
|
<pre>
|
||||||
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
||||||
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes
|
(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
|
||||||
(*LIMIT_MATCH=d) set the match limit to d
|
(*LIMIT_MATCH=d) set the match limit to d
|
||||||
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
||||||
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
||||||
|
|
|
@ -129,7 +129,7 @@ to occur).
|
||||||
UTF-8 (in its original definition) is not capable of encoding values greater
|
UTF-8 (in its original definition) is not capable of encoding values greater
|
||||||
than 0x7fffffff, but such values can be handled by the 32-bit library. When
|
than 0x7fffffff, but such values can be handled by the 32-bit library. When
|
||||||
testing this library in non-UTF mode with <b>utf8_input</b> set, if any
|
testing this library in non-UTF mode with <b>utf8_input</b> set, if any
|
||||||
character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
|
character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
|
||||||
0x80000000 is added to the character's value. This is the only way of passing
|
0x80000000 is added to the character's value. This is the only way of passing
|
||||||
such code points in a pattern string. For subject strings, using an escape
|
such code points in a pattern string. For subject strings, using an escape
|
||||||
sequence is preferable.
|
sequence is preferable.
|
||||||
|
@ -264,7 +264,7 @@ Do not output the version number of <b>pcre2test</b> at the start of execution.
|
||||||
<P>
|
<P>
|
||||||
<b>-S</b> <i>size</i>
|
<b>-S</b> <i>size</i>
|
||||||
On Unix-like systems, set the size of the run-time stack to <i>size</i>
|
On Unix-like systems, set the size of the run-time stack to <i>size</i>
|
||||||
megabytes.
|
mebibytes (units of 1024*1024 bytes).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-subject</b> <i>modifier-list</i>
|
<b>-subject</b> <i>modifier-list</i>
|
||||||
|
@ -679,8 +679,8 @@ Newline and \R handling
|
||||||
<P>
|
<P>
|
||||||
The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
|
The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
|
||||||
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
|
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
|
||||||
\R matches any Unicode newline sequence. The default is specified when PCRE2
|
\R matches any Unicode newline sequence. The default can be specified when
|
||||||
is built, with the default default being Unicode.
|
PCRE2 is built; if it is not, the default is set to Unicode.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>newline</b> modifier specifies which characters are to be interpreted as
|
The <b>newline</b> modifier specifies which characters are to be interpreted as
|
||||||
|
@ -1418,11 +1418,11 @@ Setting the JIT stack size
|
||||||
<P>
|
<P>
|
||||||
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
|
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if JIT
|
that is used by the just-in-time optimization code. It is ignored if JIT
|
||||||
optimization is not being used. The value is a number of kilobytes. Setting
|
optimization is not being used. The value is a number of kibibytes (units of
|
||||||
zero reverts to the default of 32K. Providing a stack that is larger than the
|
1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
|
||||||
default is necessary only for very complicated patterns. If <b>jitstack</b> is
|
that is larger than the default is necessary only for very complicated
|
||||||
set non-zero on a subject line it overrides any value that was set on the
|
patterns. If <b>jitstack</b> is set non-zero on a subject line it overrides any
|
||||||
pattern.
|
value that was set on the pattern.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting heap, match, and depth limits
|
Setting heap, match, and depth limits
|
||||||
|
@ -1468,10 +1468,10 @@ and non-recursive, to the internal matching function, thus controlling the
|
||||||
overall amount of computing resource that is used.
|
overall amount of computing resource that is used.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes)
|
For both kinds of matching, the <i>heap_limit</i> number, which is in kibibytes
|
||||||
limits the amount of heap memory used for matching. A value of zero disables
|
(units of 1024 bytes), limits the amount of heap memory used for matching. A
|
||||||
the use of any heap memory; many simple pattern matches can be done without
|
value of zero disables the use of any heap memory; many simple pattern matches
|
||||||
using the heap, so this is not an unreasonable setting.
|
can be done without using the heap, so zero is not an unreasonable setting.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
|
249
doc/pcre2.txt
|
@ -619,11 +619,12 @@ NEWLINES
|
||||||
|
|
||||||
Each of the first three conventions is used by at least one operating
|
Each of the first three conventions is used by at least one operating
|
||||||
system as its standard newline sequence. When PCRE2 is built, a default
|
system as its standard newline sequence. When PCRE2 is built, a default
|
||||||
can be specified. The default default is LF, which is the Unix stan-
|
can be specified. If it is not, the default is set to LF, which is the
|
||||||
dard. However, the newline convention can be changed by an application
|
Unix standard. However, the newline convention can be changed by an
|
||||||
when calling pcre2_compile(), or it can be specified by special text at
|
application when calling pcre2_compile(), or it can be specified by
|
||||||
the start of the pattern itself; this overrides any other settings. See
|
special text at the start of the pattern itself; this overrides any
|
||||||
the pcre2pattern page for details of the special character sequences.
|
other settings. See the pcre2pattern page for details of the special
|
||||||
|
character sequences.
|
||||||
|
|
||||||
In the PCRE2 documentation the word "newline" is used to mean "the
|
In the PCRE2 documentation the word "newline" is used to mean "the
|
||||||
character or pair of characters that indicate a line break". The choice
|
character or pair of characters that indicate a line break". The choice
|
||||||
|
@ -957,17 +958,17 @@ PCRE2 CONTEXTS
|
||||||
int pcre2_set_heap_limit(pcre2_match_context *mcontext,
|
int pcre2_set_heap_limit(pcre2_match_context *mcontext,
|
||||||
uint32_t value);
|
uint32_t value);
|
||||||
|
|
||||||
The heap_limit parameter specifies, in units of kilobytes, the maximum
|
The heap_limit parameter specifies, in units of kibibytes (1024 bytes),
|
||||||
amount of heap memory that pcre2_match() may use to hold backtracking
|
the maximum amount of heap memory that pcre2_match() may use to hold
|
||||||
information when running an interpretive match. This limit also applies
|
backtracking information when running an interpretive match. This limit
|
||||||
to pcre2_dfa_match(), which may use the heap when processing patterns
|
also applies to pcre2_dfa_match(), which may use the heap when process-
|
||||||
with a lot of nested pattern recursion or lookarounds or atomic groups.
|
ing patterns with a lot of nested pattern recursion or lookarounds or
|
||||||
This limit does not apply to matching with the JIT optimization, which
|
atomic groups. This limit does not apply to matching with the JIT opti-
|
||||||
has its own memory control arrangements (see the pcre2jit documentation
|
mization, which has its own memory control arrangements (see the
|
||||||
for more details). If the limit is reached, the negative error code
|
pcre2jit documentation for more details). If the limit is reached, the
|
||||||
PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2
|
negative error code PCRE2_ERROR_HEAPLIMIT is returned. The default
|
||||||
is built; the default default is very large and is essentially "unlim-
|
limit can be set when PCRE2 is built; if it is not, the default is set
|
||||||
ited".
|
very large and is essentially "unlimited".
|
||||||
|
|
||||||
A value for the heap limit may also be supplied by an item at the start
|
A value for the heap limit may also be supplied by an item at the start
|
||||||
of a pattern of the form
|
of a pattern of the form
|
||||||
|
@ -1042,7 +1043,7 @@ PCRE2 CONTEXTS
|
||||||
using JIT compiled code. However, it is supported by pcre2_dfa_match(),
|
using JIT compiled code. However, it is supported by pcre2_dfa_match(),
|
||||||
which uses it to limit the depth of nested internal recursive function
|
which uses it to limit the depth of nested internal recursive function
|
||||||
calls that implement atomic groups, lookaround assertions, and pattern
|
calls that implement atomic groups, lookaround assertions, and pattern
|
||||||
recursions. This limits, indirectly, the amount of system stack this is
|
recursions. This limits, indirectly, the amount of system stack that is
|
||||||
used. It was more useful in versions before 10.32, when stack memory
|
used. It was more useful in versions before 10.32, when stack memory
|
||||||
was used for local workspace vectors for recursive function calls. From
|
was used for local workspace vectors for recursive function calls. From
|
||||||
version 10.32, only local variables are allocated on the stack and as
|
version 10.32, only local variables are allocated on the stack and as
|
||||||
|
@ -1058,10 +1059,11 @@ PCRE2 CONTEXTS
|
||||||
directly by calling pcre2_set_heap_limit().
|
directly by calling pcre2_set_heap_limit().
|
||||||
|
|
||||||
The default value for the depth limit can be set when PCRE2 is built;
|
The default value for the depth limit can be set when PCRE2 is built;
|
||||||
the default default is the same value as the default for the match
|
if it is not, the default is set to the same value as the default for
|
||||||
limit. If the limit is exceeded, pcre2_match() or pcre2_dfa_match()
|
the match limit. If the limit is exceeded, pcre2_match() or
|
||||||
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
|
pcre2_dfa_match() returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth
|
||||||
supplied by an item at the start of a pattern of the form
|
limit may also be supplied by an item at the start of a pattern of the
|
||||||
|
form
|
||||||
|
|
||||||
(*LIMIT_DEPTH=ddd)
|
(*LIMIT_DEPTH=ddd)
|
||||||
|
|
||||||
|
@ -1117,7 +1119,7 @@ CHECKING BUILD-TIME OPTIONS
|
||||||
|
|
||||||
PCRE2_CONFIG_HEAPLIMIT
|
PCRE2_CONFIG_HEAPLIMIT
|
||||||
|
|
||||||
The output is a uint32_t integer that gives, in kilobytes, the default
|
The output is a uint32_t integer that gives, in kibibytes, the default
|
||||||
limit for the amount of heap memory used by pcre2_match() or
|
limit for the amount of heap memory used by pcre2_match() or
|
||||||
pcre2_dfa_match(). Further details are given with
|
pcre2_dfa_match(). Further details are given with
|
||||||
pcre2_set_heap_limit() above.
|
pcre2_set_heap_limit() above.
|
||||||
|
@ -1413,7 +1415,7 @@ COMPILING A PATTERN
|
||||||
it can be changed within a pattern by a (?i) option setting. If
|
it can be changed within a pattern by a (?i) option setting. If
|
||||||
PCRE2_UTF is set, Unicode properties are used for all characters with
|
PCRE2_UTF is set, Unicode properties are used for all characters with
|
||||||
more than one other case, and for all characters whose code points are
|
more than one other case, and for all characters whose code points are
|
||||||
greater than U+007f. For lower valued characters with only one other
|
greater than U+007F. For lower valued characters with only one other
|
||||||
case, a lookup table is used for speed. When PCRE2_UTF is not set, a
|
case, a lookup table is used for speed. When PCRE2_UTF is not set, a
|
||||||
lookup table is used for all code points less than 256, and higher code
|
lookup table is used for all code points less than 256, and higher code
|
||||||
points (available only in 16-bit or 32-bit mode) are treated as not
|
points (available only in 16-bit or 32-bit mode) are treated as not
|
||||||
|
@ -1983,18 +1985,17 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
Return the number of the highest backreference in the pattern. The
|
Return the number of the highest backreference in the pattern. The
|
||||||
third argument should point to a uint32_t variable. Named subpatterns
|
third argument should point to a uint32_t variable. Named subpatterns
|
||||||
acquire numbers as well as names, and these count towards the highest
|
acquire numbers as well as names, and these count towards the highest
|
||||||
back reference. Back references such as \4 or \g{12} match the cap-
|
backreference. Backreferences such as \4 or \g{12} match the captured
|
||||||
tured characters of the given group, but in addition, the check that a
|
characters of the given group, but in addition, the check that a cap-
|
||||||
capturing group is set in a conditional subpattern such as (?(3)a|b) is
|
turing group is set in a conditional subpattern such as (?(3)a|b) is
|
||||||
also a back reference. Zero is returned if there are no back refer-
|
also a backreference. Zero is returned if there are no backreferences.
|
||||||
ences.
|
|
||||||
|
|
||||||
PCRE2_INFO_BSR
|
PCRE2_INFO_BSR
|
||||||
|
|
||||||
The output is a uint32_t whose value indicates what character sequences
|
The output is a uint32_t integer whose value indicates what character
|
||||||
the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that
|
sequences the \R escape sequence matches. A value of PCRE2_BSR_UNICODE
|
||||||
\R matches any Unicode line ending sequence; a value of PCRE2_BSR_ANY-
|
means that \R matches any Unicode line ending sequence; a value of
|
||||||
CRLF means that \R matches only CR, LF, or CRLF.
|
PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF.
|
||||||
|
|
||||||
PCRE2_INFO_CAPTURECOUNT
|
PCRE2_INFO_CAPTURECOUNT
|
||||||
|
|
||||||
|
@ -2006,8 +2007,8 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
|
|
||||||
If the pattern set a backtracking depth limit by including an item of
|
If the pattern set a backtracking depth limit by including an item of
|
||||||
the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The
|
the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The
|
||||||
third argument should point to an unsigned 32-bit integer. If no such
|
third argument should point to a uint32_t integer. If no such value has
|
||||||
value has been set, the call to pcre2_pattern_info() returns the error
|
been set, the call to pcre2_pattern_info() returns the error
|
||||||
PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
|
PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
|
||||||
ing if it is less than the limit set or defaulted by the caller of the
|
ing if it is less than the limit set or defaulted by the caller of the
|
||||||
match function.
|
match function.
|
||||||
|
@ -2021,7 +2022,7 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
code unit values greater than 255 are supported, the flag bit for 255
|
code unit values greater than 255 are supported, the flag bit for 255
|
||||||
means "any code unit of value 255 or above". If such a table was con-
|
means "any code unit of value 255 or above". If such a table was con-
|
||||||
structed, a pointer to it is returned. Otherwise NULL is returned. The
|
structed, a pointer to it is returned. Otherwise NULL is returned. The
|
||||||
third argument should point to an const uint8_t * variable.
|
third argument should point to a const uint8_t * variable.
|
||||||
|
|
||||||
PCRE2_INFO_FIRSTCODETYPE
|
PCRE2_INFO_FIRSTCODETYPE
|
||||||
|
|
||||||
|
@ -2048,7 +2049,7 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
|
|
||||||
Return the size (in bytes) of the data frames that are used to remember
|
Return the size (in bytes) of the data frames that are used to remember
|
||||||
backtracking positions when the pattern is processed by pcre2_match()
|
backtracking positions when the pattern is processed by pcre2_match()
|
||||||
without the use of JIT. The third argument should point to an size_t
|
without the use of JIT. The third argument should point to a size_t
|
||||||
variable. The frame size depends on the number of capturing parentheses
|
variable. The frame size depends on the number of capturing parentheses
|
||||||
in the pattern. Each additional capturing group adds two PCRE2_SIZE
|
in the pattern. Each additional capturing group adds two PCRE2_SIZE
|
||||||
variables.
|
variables.
|
||||||
|
@ -2070,11 +2071,10 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
|
|
||||||
If the pattern set a heap memory limit by including an item of the form
|
If the pattern set a heap memory limit by including an item of the form
|
||||||
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argu-
|
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argu-
|
||||||
ment should point to an unsigned 32-bit integer. If no such value has
|
ment should point to a uint32_t integer. If no such value has been set,
|
||||||
been set, the call to pcre2_pattern_info() returns the error
|
the call to pcre2_pattern_info() returns the error PCRE2_ERROR_UNSET.
|
||||||
PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
|
Note that this limit will only be used during matching if it is less
|
||||||
ing if it is less than the limit set or defaulted by the caller of the
|
than the limit set or defaulted by the caller of the match function.
|
||||||
match function.
|
|
||||||
|
|
||||||
PCRE2_INFO_JCHANGED
|
PCRE2_INFO_JCHANGED
|
||||||
|
|
||||||
|
@ -2120,8 +2120,8 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
|
|
||||||
If the pattern set a match limit by including an item of the form
|
If the pattern set a match limit by including an item of the form
|
||||||
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third
|
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third
|
||||||
argument should point to an unsigned 32-bit integer. If no such value
|
argument should point to a uint32_t integer. If no such value has been
|
||||||
has been set, the call to pcre2_pattern_info() returns the error
|
set, the call to pcre2_pattern_info() returns the error
|
||||||
PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
|
PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
|
||||||
ing if it is less than the limit set or defaulted by the caller of the
|
ing if it is less than the limit set or defaulted by the caller of the
|
||||||
match function.
|
match function.
|
||||||
|
@ -2129,15 +2129,15 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
PCRE2_INFO_MAXLOOKBEHIND
|
PCRE2_INFO_MAXLOOKBEHIND
|
||||||
|
|
||||||
Return the number of characters (not code units) in the longest lookbe-
|
Return the number of characters (not code units) in the longest lookbe-
|
||||||
hind assertion in the pattern. The third argument should point to an
|
hind assertion in the pattern. The third argument should point to a
|
||||||
unsigned 32-bit integer. This information is useful when doing multi-
|
uint32_t integer. This information is useful when doing multi-segment
|
||||||
segment matching using the partial matching facilities. Note that the
|
matching using the partial matching facilities. Note that the simple
|
||||||
simple assertions \b and \B require a one-character lookbehind. \A also
|
assertions \b and \B require a one-character lookbehind. \A also regis-
|
||||||
registers a one-character lookbehind, though it does not actually
|
ters a one-character lookbehind, though it does not actually inspect
|
||||||
inspect the previous character. This is to ensure that at least one
|
the previous character. This is to ensure that at least one character
|
||||||
character from the old segment is retained when a new segment is pro-
|
from the old segment is retained when a new segment is processed. Oth-
|
||||||
cessed. Otherwise, if there are no lookbehinds in the pattern, \A might
|
erwise, if there are no lookbehinds in the pattern, \A might match
|
||||||
match incorrectly at the start of a second or subsequent segment.
|
incorrectly at the start of a second or subsequent segment.
|
||||||
|
|
||||||
PCRE2_INFO_MINLENGTH
|
PCRE2_INFO_MINLENGTH
|
||||||
|
|
||||||
|
@ -2378,7 +2378,7 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
|
||||||
set must point to the start of a character, or to the end of the sub-
|
set must point to the start of a character, or to the end of the sub-
|
||||||
ject (in UTF-32 mode, one code unit equals one character, so all off-
|
ject (in UTF-32 mode, one code unit equals one character, so all off-
|
||||||
sets are valid). Like the pattern string, the subject may contain
|
sets are valid). Like the pattern string, the subject may contain
|
||||||
binary zeroes.
|
binary zeros.
|
||||||
|
|
||||||
A non-zero starting offset is useful when searching for another match
|
A non-zero starting offset is useful when searching for another match
|
||||||
in the same subject by calling pcre2_match() again after a previous
|
in the same subject by calling pcre2_match() again after a previous
|
||||||
|
@ -3445,8 +3445,8 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
PCRE2_ERROR_DFA_UCOND
|
PCRE2_ERROR_DFA_UCOND
|
||||||
|
|
||||||
This return is given if pcre2_dfa_match() encounters a condition item
|
This return is given if pcre2_dfa_match() encounters a condition item
|
||||||
that uses a back reference for the condition, or a test for recursion
|
that uses a backreference for the condition, or a test for recursion in
|
||||||
in a specific group. These are not supported.
|
a specific group. These are not supported.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_WSSIZE
|
PCRE2_ERROR_DFA_WSSIZE
|
||||||
|
|
||||||
|
@ -3683,8 +3683,8 @@ NEWLINE RECOGNITION
|
||||||
|
|
||||||
--enable-newline-is-nul
|
--enable-newline-is-nul
|
||||||
|
|
||||||
which causes NUL (binary zero) is set as the default line-ending char-
|
which causes NUL (binary zero) to be set as the default line-ending
|
||||||
acter.
|
character.
|
||||||
|
|
||||||
Whatever default line ending convention is selected when PCRE2 is built
|
Whatever default line ending convention is selected when PCRE2 is built
|
||||||
can be overridden by applications that use the library. At build time
|
can be overridden by applications that use the library. At build time
|
||||||
|
@ -3745,18 +3745,18 @@ LIMITING PCRE2 RESOURCE USAGE
|
||||||
stack to record backtracking points. The more nested backtracking
|
stack to record backtracking points. The more nested backtracking
|
||||||
points there are (that is, the deeper the search tree), the more memory
|
points there are (that is, the deeper the search tree), the more memory
|
||||||
is needed. If the initial vector is not large enough, heap memory is
|
is needed. If the initial vector is not large enough, heap memory is
|
||||||
used, up to a certain limit, which is specified in kilobytes. The limit
|
used, up to a certain limit, which is specified in kibibytes (units of
|
||||||
can be changed at run time, as described in the pcre2api documentation.
|
1024 bytes). The limit can be changed at run time, as described in the
|
||||||
The default limit (in effect unlimited) is 20 million. You can change
|
pcre2api documentation. The default limit (in effect unlimited) is 20
|
||||||
this by a setting such as
|
million. You can change this by a setting such as
|
||||||
|
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
|
|
||||||
which limits the amount of heap to 500 kilobytes. This limit applies
|
which limits the amount of heap to 500 KiB. This limit applies only to
|
||||||
only to interpretive matching in pcre2_match() and pcre2_dfa_match(),
|
interpretive matching in pcre2_match() and pcre2_dfa_match(), which may
|
||||||
which may also use the heap for internal workspace when processing com-
|
also use the heap for internal workspace when processing complicated
|
||||||
plicated patterns. This limit does not apply when JIT (which has its
|
patterns. This limit does not apply when JIT (which has its own memory
|
||||||
own memory arrangements) is used.
|
arrangements) is used.
|
||||||
|
|
||||||
You can also explicitly limit the depth of nested backtracking in the
|
You can also explicitly limit the depth of nested backtracking in the
|
||||||
pcre2_match() interpreter. This limit defaults to the value that is set
|
pcre2_match() interpreter. This limit defaults to the value that is set
|
||||||
|
@ -4005,10 +4005,10 @@ SUPPORT FOR FUZZERS
|
||||||
Setting --enable-fuzz-support also causes a binary called pcre2fuz-
|
Setting --enable-fuzz-support also causes a binary called pcre2fuz-
|
||||||
zcheck to be created. This is normally run under valgrind or used when
|
zcheck to be created. This is normally run under valgrind or used when
|
||||||
PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
|
PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
|
||||||
function and outputs information about it is doing. The input strings
|
function and outputs information about what it is doing. The input
|
||||||
are specified by arguments: if an argument starts with "=" the rest of
|
strings are specified by arguments: if an argument starts with "=" the
|
||||||
it is a literal input string. Otherwise, it is assumed to be a file
|
rest of it is a literal input string. Otherwise, it is assumed to be a
|
||||||
name, and the contents of the file are the test string.
|
file name, and the contents of the file are the test string.
|
||||||
|
|
||||||
|
|
||||||
OBSOLETE OPTION
|
OBSOLETE OPTION
|
||||||
|
@ -4167,9 +4167,9 @@ MISSING CALLOUTS
|
||||||
all branches are anchorable.
|
all branches are anchorable.
|
||||||
|
|
||||||
This optimization is disabled, however, if .* is in an atomic group or
|
This optimization is disabled, however, if .* is in an atomic group or
|
||||||
if there is a back reference to the capturing group in which it
|
if there is a backreference to the capturing group in which it appears.
|
||||||
appears. It is also disabled if the pattern contains (*PRUNE) or
|
It is also disabled if the pattern contains (*PRUNE) or (*SKIP). How-
|
||||||
(*SKIP). However, the presence of callouts does not affect it.
|
ever, the presence of callouts does not affect it.
|
||||||
|
|
||||||
For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT
|
For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT
|
||||||
and applied to the string "aa", the pcre2test output is:
|
and applied to the string "aa", the pcre2test output is:
|
||||||
|
@ -4489,7 +4489,7 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
||||||
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized asser-
|
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized asser-
|
||||||
tions, but they do not mean what you might think. For example, (?!a){3}
|
tions, but they do not mean what you might think. For example, (?!a){3}
|
||||||
does not assert that the next three characters are not "a". It just
|
does not assert that the next three characters are not "a". It just
|
||||||
asserts that the next character is not "a" three times (in principle:
|
asserts that the next character is not "a" three times (in principle;
|
||||||
PCRE2 optimizes this to run the assertion just once). Perl allows some
|
PCRE2 optimizes this to run the assertion just once). Perl allows some
|
||||||
repeat quantifiers on other assertions, for example, \b* (but not
|
repeat quantifiers on other assertions, for example, \b* (but not
|
||||||
\b{3}), but these do not seem to have any use.
|
\b{3}), but these do not seem to have any use.
|
||||||
|
@ -4534,9 +4534,9 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
||||||
classes.
|
classes.
|
||||||
|
|
||||||
7. Fairly obviously, PCRE2 does not support the (?{code}) and
|
7. Fairly obviously, PCRE2 does not support the (?{code}) and
|
||||||
(??{code}) constructions. However, there is support PCRE2's "callout"
|
(??{code}) constructions. However, PCRE2 does have a "callout" feature,
|
||||||
feature, which allows an external function to be called during pattern
|
which allows an external function to be called during pattern matching.
|
||||||
matching. See the pcre2callout documentation for details.
|
See the pcre2callout documentation for details.
|
||||||
|
|
||||||
8. Subroutine calls (whether recursive or not) were treated as atomic
|
8. Subroutine calls (whether recursive or not) were treated as atomic
|
||||||
groups up to PCRE2 release 10.23, but from release 10.30 this changed,
|
groups up to PCRE2 release 10.23, but from release 10.30 this changed,
|
||||||
|
@ -4604,9 +4604,9 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
||||||
different length of string. Perl requires them all to have the same
|
different length of string. Perl requires them all to have the same
|
||||||
length.
|
length.
|
||||||
|
|
||||||
(b) From PCRE2 10.23, back references to groups of fixed length are
|
(b) From PCRE2 10.23, backreferences to groups of fixed length are sup-
|
||||||
supported in lookbehinds, provided that there is no possibility of ref-
|
ported in lookbehinds, provided that there is no possibility of refer-
|
||||||
erencing a non-unique number or name. Perl does not support backrefer-
|
encing a non-unique number or name. Perl does not support backrefer-
|
||||||
ences in lookbehinds.
|
ences in lookbehinds.
|
||||||
|
|
||||||
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
|
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
|
||||||
|
@ -5103,9 +5103,9 @@ SIZE AND OTHER LIMITATIONS
|
||||||
limit to the depth of nesting of parenthesized subpatterns of all
|
limit to the depth of nesting of parenthesized subpatterns of all
|
||||||
kinds. This is imposed in order to limit the amount of system stack
|
kinds. This is imposed in order to limit the amount of system stack
|
||||||
used at compile time. The default limit can be specified when PCRE2 is
|
used at compile time. The default limit can be specified when PCRE2 is
|
||||||
built; the default default is 250. An application can change this limit
|
built; if not, the default is set to 250. An application can change
|
||||||
by calling pcre2_set_parens_nest_limit() to set the limit in a compile
|
this limit by calling pcre2_set_parens_nest_limit() to set the limit in
|
||||||
context.
|
a compile context.
|
||||||
|
|
||||||
The maximum length of name for a named subpattern is 32 code units, and
|
The maximum length of name for a named subpattern is 32 code units, and
|
||||||
the maximum number of named subpatterns is 10000.
|
the maximum number of named subpatterns is 10000.
|
||||||
|
@ -5929,7 +5929,8 @@ SPECIAL START-OF-PATTERN ITEMS
|
||||||
pcre2_match() for it to have any effect. In other words, the pattern
|
pcre2_match() for it to have any effect. In other words, the pattern
|
||||||
writer can lower the limits set by the programmer, but not raise them.
|
writer can lower the limits set by the programmer, but not raise them.
|
||||||
If there is more than one setting of one of these limits, the lower
|
If there is more than one setting of one of these limits, the lower
|
||||||
value is used. The heap limit is specified in kilobytes.
|
value is used. The heap limit is specified in kibibytes (units of 1024
|
||||||
|
bytes).
|
||||||
|
|
||||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This
|
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This
|
||||||
name is still recognized for backwards compatibility.
|
name is still recognized for backwards compatibility.
|
||||||
|
@ -6230,8 +6231,8 @@ BACKSLASH
|
||||||
All UTF modes no greater than 0x10ffff and a valid code point
|
All UTF modes no greater than 0x10ffff and a valid code point
|
||||||
|
|
||||||
Invalid Unicode code points are all those in the range 0xd800 to 0xdfff
|
Invalid Unicode code points are all those in the range 0xd800 to 0xdfff
|
||||||
(the so-called "surrogate" codepoints). The check for these can be dis-
|
(the so-called "surrogate" code points). The check for these can be
|
||||||
abled by the caller of pcre2_compile() by setting the option
|
disabled by the caller of pcre2_compile() by setting the option
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
||||||
|
|
||||||
Escape sequences in character classes
|
Escape sequences in character classes
|
||||||
|
@ -6257,7 +6258,7 @@ BACKSLASH
|
||||||
|
|
||||||
The sequence \g followed by a signed or unsigned number, optionally
|
The sequence \g followed by a signed or unsigned number, optionally
|
||||||
enclosed in braces, is an absolute or relative backreference. A named
|
enclosed in braces, is an absolute or relative backreference. A named
|
||||||
back reference can be coded as \g{name}. Back references are discussed
|
backreference can be coded as \g{name}. Backreferences are discussed
|
||||||
later, following the discussion of parenthesized subpatterns.
|
later, following the discussion of parenthesized subpatterns.
|
||||||
|
|
||||||
Absolute and relative subroutine calls
|
Absolute and relative subroutine calls
|
||||||
|
@ -6266,8 +6267,8 @@ BACKSLASH
|
||||||
name or a number enclosed either in angle brackets or single quotes, is
|
name or a number enclosed either in angle brackets or single quotes, is
|
||||||
an alternative syntax for referencing a subpattern as a "subroutine".
|
an alternative syntax for referencing a subpattern as a "subroutine".
|
||||||
Details are discussed later. Note that \g{...} (Perl syntax) and
|
Details are discussed later. Note that \g{...} (Perl syntax) and
|
||||||
\g<...> (Oniguruma syntax) are not synonymous. The former is a back
|
\g<...> (Oniguruma syntax) are not synonymous. The former is a backref-
|
||||||
reference; the latter is a subroutine call.
|
erence; the latter is a subroutine call.
|
||||||
|
|
||||||
Generic character types
|
Generic character types
|
||||||
|
|
||||||
|
@ -6593,7 +6594,7 @@ BACKSLASH
|
||||||
lowed by a modifier). Extending characters are allowed before the modi-
|
lowed by a modifier). Extending characters are allowed before the modi-
|
||||||
fier.
|
fier.
|
||||||
|
|
||||||
7. Do not break within emoji zwj sequences (zero-width jointer followed
|
7. Do not break within emoji zwj sequences (zero-width joiner followed
|
||||||
by "glue after ZWJ" or "base glue after ZWJ").
|
by "glue after ZWJ" or "base glue after ZWJ").
|
||||||
|
|
||||||
8. Do not break within emoji flag sequences. That is, do not break
|
8. Do not break within emoji flag sequences. That is, do not break
|
||||||
|
@ -7285,7 +7286,7 @@ NAMED SUBPATTERNS
|
||||||
|
|
||||||
In PCRE2, a subpattern can be named in one of three ways: (?<name>...)
|
In PCRE2, a subpattern can be named in one of three ways: (?<name>...)
|
||||||
or (?'name'...) as in Perl, or (?P<name>...) as in Python. References
|
or (?'name'...) as in Perl, or (?P<name>...) as in Python. References
|
||||||
to capturing parentheses from other parts of the pattern, such as back
|
to capturing parentheses from other parts of the pattern, such as back-
|
||||||
references, recursion, and conditions, can be made by name as well as
|
references, recursion, and conditions, can be made by name as well as
|
||||||
by number.
|
by number.
|
||||||
|
|
||||||
|
@ -7321,8 +7322,8 @@ NAMED SUBPATTERNS
|
||||||
that name that matched. This saves searching to find which numbered
|
that name that matched. This saves searching to find which numbered
|
||||||
subpattern it was.
|
subpattern it was.
|
||||||
|
|
||||||
If you make a back reference to a non-unique named subpattern from
|
If you make a backreference to a non-unique named subpattern from else-
|
||||||
elsewhere in the pattern, the subpatterns to which the name refers are
|
where in the pattern, the subpatterns to which the name refers are
|
||||||
checked in the order in which they appear in the overall pattern. The
|
checked in the order in which they appear in the overall pattern. The
|
||||||
first one that is set is used for the reference. For example, this pat-
|
first one that is set is used for the reference. For example, this pat-
|
||||||
tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo":
|
tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo":
|
||||||
|
@ -7481,9 +7482,9 @@ REPETITION
|
||||||
mization, or alternatively, using ^ to indicate anchoring explicitly.
|
mization, or alternatively, using ^ to indicate anchoring explicitly.
|
||||||
|
|
||||||
However, there are some cases where the optimization cannot be used.
|
However, there are some cases where the optimization cannot be used.
|
||||||
When .* is inside capturing parentheses that are the subject of a back
|
When .* is inside capturing parentheses that are the subject of a
|
||||||
reference elsewhere in the pattern, a match at the start may fail where
|
backreference elsewhere in the pattern, a match at the start may fail
|
||||||
a later one succeeds. Consider, for example:
|
where a later one succeeds. Consider, for example:
|
||||||
|
|
||||||
(.*)abc\1
|
(.*)abc\1
|
||||||
|
|
||||||
|
@ -7631,7 +7632,7 @@ BACK REFERENCES
|
||||||
it is always taken as a backreference, and causes an error only if
|
it is always taken as a backreference, and causes an error only if
|
||||||
there are not that many capturing left parentheses in the entire pat-
|
there are not that many capturing left parentheses in the entire pat-
|
||||||
tern. In other words, the parentheses that are referenced need not be
|
tern. In other words, the parentheses that are referenced need not be
|
||||||
to the left of the reference for numbers less than 8. A "forward back
|
to the left of the reference for numbers less than 8. A "forward back-
|
||||||
reference" of this type can make sense when a repetition is involved
|
reference" of this type can make sense when a repetition is involved
|
||||||
and the subpattern to the right has participated in an earlier itera-
|
and the subpattern to the right has participated in an earlier itera-
|
||||||
tion.
|
tion.
|
||||||
|
@ -7671,10 +7672,10 @@ BACK REFERENCES
|
||||||
This kind of forward reference can be useful in patterns that repeat.
|
This kind of forward reference can be useful in patterns that repeat.
|
||||||
Perl does not support the use of + in this way.
|
Perl does not support the use of + in this way.
|
||||||
|
|
||||||
A back reference matches whatever actually matched the capturing sub-
|
A backreference matches whatever actually matched the capturing subpat-
|
||||||
pattern in the current subject string, rather than anything matching
|
tern in the current subject string, rather than anything matching the
|
||||||
the subpattern itself (see "Subpatterns as subroutines" below for a way
|
subpattern itself (see "Subpatterns as subroutines" below for a way of
|
||||||
of doing that). So the pattern
|
doing that). So the pattern
|
||||||
|
|
||||||
(sens|respons)e and \1ibility
|
(sens|respons)e and \1ibility
|
||||||
|
|
||||||
|
@ -7704,14 +7705,14 @@ BACK REFERENCES
|
||||||
before or after the reference.
|
before or after the reference.
|
||||||
|
|
||||||
There may be more than one backreference to the same subpattern. If a
|
There may be more than one backreference to the same subpattern. If a
|
||||||
subpattern has not actually been used in a particular match, any back
|
subpattern has not actually been used in a particular match, any back-
|
||||||
references to it always fail by default. For example, the pattern
|
references to it always fail by default. For example, the pattern
|
||||||
|
|
||||||
(a|(bc))\2
|
(a|(bc))\2
|
||||||
|
|
||||||
always fails if it starts to match "a" rather than "bc". However, if
|
always fails if it starts to match "a" rather than "bc". However, if
|
||||||
the PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a back
|
the PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backref-
|
||||||
reference to an unset value matches an empty string.
|
erence to an unset value matches an empty string.
|
||||||
|
|
||||||
Because there may be many capturing parentheses in a pattern, all dig-
|
Because there may be many capturing parentheses in a pattern, all dig-
|
||||||
its following a backslash are taken as part of a potential backrefer-
|
its following a backslash are taken as part of a potential backrefer-
|
||||||
|
@ -7730,13 +7731,13 @@ BACK REFERENCES
|
||||||
(a|b\1)+
|
(a|b\1)+
|
||||||
|
|
||||||
matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
|
matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
|
||||||
ation of the subpattern, the back reference matches the character
|
ation of the subpattern, the backreference matches the character string
|
||||||
string corresponding to the previous iteration. In order for this to
|
corresponding to the previous iteration. In order for this to work, the
|
||||||
work, the pattern must be such that the first iteration does not need
|
pattern must be such that the first iteration does not need to match
|
||||||
to match the back reference. This can be done using alternation, as in
|
the backreference. This can be done using alternation, as in the exam-
|
||||||
the example above, or by a quantifier with a minimum of zero.
|
ple above, or by a quantifier with a minimum of zero.
|
||||||
|
|
||||||
Back references of this type cause the group that they reference to be
|
Backreferences of this type cause the group that they reference to be
|
||||||
treated as an atomic group. Once the whole group has been matched, a
|
treated as an atomic group. Once the whole group has been matched, a
|
||||||
subsequent matching failure cannot cause backtracking into the middle
|
subsequent matching failure cannot cause backtracking into the middle
|
||||||
of the group.
|
of the group.
|
||||||
|
@ -7871,8 +7872,8 @@ ASSERTIONS
|
||||||
However, recursion, that is, a "subroutine" call into a group that is
|
However, recursion, that is, a "subroutine" call into a group that is
|
||||||
already active, is not supported.
|
already active, is not supported.
|
||||||
|
|
||||||
Perl does not support back references in lookbehinds. PCRE2 does sup-
|
Perl does not support backreferences in lookbehinds. PCRE2 does support
|
||||||
port them, but only if certain conditions are met. The
|
them, but only if certain conditions are met. The
|
||||||
PCRE2_MATCH_UNSET_BACKREF option must not be set, there must be no use
|
PCRE2_MATCH_UNSET_BACKREF option must not be set, there must be no use
|
||||||
of (?| in the pattern (it creates duplicate subpattern numbers), and if
|
of (?| in the pattern (it creates duplicate subpattern numbers), and if
|
||||||
the backreference is by name, the name must be unique. Of course, the
|
the backreference is by name, the name must be unique. Of course, the
|
||||||
|
@ -8332,11 +8333,10 @@ RECURSIVE PATTERNS
|
||||||
^(.)(\1|a(?2))
|
^(.)(\1|a(?2))
|
||||||
|
|
||||||
This pattern matches "bab". The first capturing parentheses match "b",
|
This pattern matches "bab". The first capturing parentheses match "b",
|
||||||
then in the second group, when the back reference \1 fails to match
|
then in the second group, when the backreference \1 fails to match "b",
|
||||||
"b", the second alternative matches "a" and then recurses. In the
|
the second alternative matches "a" and then recurses. In the recursion,
|
||||||
recursion, \1 does now match "b" and so the whole match succeeds. This
|
\1 does now match "b" and so the whole match succeeds. This match used
|
||||||
match used to fail in Perl, but in later versions (I tried 5.024) it
|
to fail in Perl, but in later versions (I tried 5.024) it now works.
|
||||||
now works.
|
|
||||||
|
|
||||||
|
|
||||||
SUBPATTERNS AS SUBROUTINES
|
SUBPATTERNS AS SUBROUTINES
|
||||||
|
@ -9253,11 +9253,10 @@ COMPILING A PATTERN
|
||||||
If this option is set, the reg_endp field in the preg structure (which
|
If this option is set, the reg_endp field in the preg structure (which
|
||||||
has the type const char *) must be set to point to the character beyond
|
has the type const char *) must be set to point to the character beyond
|
||||||
the end of the pattern before calling regcomp(). The pattern itself may
|
the end of the pattern before calling regcomp(). The pattern itself may
|
||||||
now contain binary zeroes, which are treated as data characters. With-
|
now contain binary zeros, which are treated as data characters. Without
|
||||||
out REG_PEND, a binary zero terminates the pattern and the re_endp
|
REG_PEND, a binary zero terminates the pattern and the re_endp field is
|
||||||
field is ignored. This is a GNU extension to the POSIX standard and
|
ignored. This is a GNU extension to the POSIX standard and should be
|
||||||
should be used with caution in software intended to be portable to
|
used with caution in software intended to be portable to other systems.
|
||||||
other systems.
|
|
||||||
|
|
||||||
REG_UCP
|
REG_UCP
|
||||||
|
|
||||||
|
@ -9364,10 +9363,10 @@ MATCHING A PATTERN
|
||||||
|
|
||||||
REG_STARTEND
|
REG_STARTEND
|
||||||
|
|
||||||
When this option is set, the subject string is starts at string +
|
When this option is set, the subject string starts at string +
|
||||||
pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should
|
pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should
|
||||||
point to the first character beyond the string. There may be binary
|
point to the first character beyond the string. There may be binary
|
||||||
zeroes within the subject string, and indeed, using REG_STARTEND is the
|
zeros within the subject string, and indeed, using REG_STARTEND is the
|
||||||
only way to pass a subject string that contains a binary zero.
|
only way to pass a subject string that contains a binary zero.
|
||||||
|
|
||||||
Whatever the value of pmatch[0].rm_so, the offsets of the matched
|
Whatever the value of pmatch[0].rm_so, the offsets of the matched
|
||||||
|
@ -9995,7 +9994,7 @@ OPTION SETTING
|
||||||
one of them may appear. For the first three, d is a decimal number.
|
one of them may appear. For the first three, d is a decimal number.
|
||||||
|
|
||||||
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
||||||
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes
|
(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
|
||||||
(*LIMIT_MATCH=d) set the match limit to d
|
(*LIMIT_MATCH=d) set the match limit to d
|
||||||
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
||||||
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "16 June 2017" "PCRE2 10.30"
|
.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "16 June 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
|
|
@ -16,7 +16,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
This function is part of an experimental set of pattern conversion functions.
|
This function is part of an experimental set of pattern conversion functions.
|
||||||
It sets the component separator character that is used when converting globs.
|
It sets the component separator character that is used when converting globs.
|
||||||
The second argument must one of the characters forward slash, backslash, or
|
The second argument must be one of the characters forward slash, backslash, or
|
||||||
dot. The default is backslash when running under Windows, otherwise forward
|
dot. The default is backslash when running under Windows, otherwise forward
|
||||||
slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
|
slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
|
||||||
the second argument is invalid.
|
the second argument is invalid.
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SET_DEPTH_LIMIT 3 "11 April 2017" "PCRE2 10.30"
|
.TH PCRE2_SET_HEAP_LIMIT 3 "11 April 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
|
|
@ -497,10 +497,10 @@ U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
|
||||||
.P
|
.P
|
||||||
Each of the first three conventions is used by at least one operating system as
|
Each of the first three conventions is used by at least one operating system as
|
||||||
its standard newline sequence. When PCRE2 is built, a default can be specified.
|
its standard newline sequence. When PCRE2 is built, a default can be specified.
|
||||||
The default default is LF, which is the Unix standard. However, the newline
|
If it is not, the default is set to LF, which is the Unix standard. However,
|
||||||
convention can be changed by an application when calling \fBpcre2_compile()\fP,
|
the newline convention can be changed by an application when calling
|
||||||
or it can be specified by special text at the start of the pattern itself; this
|
\fBpcre2_compile()\fP, or it can be specified by special text at the start of
|
||||||
overrides any other settings. See the
|
the pattern itself; this overrides any other settings. See the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2pattern\fP
|
\fBpcre2pattern\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -885,19 +885,20 @@ offset limit. In other words, whichever limit comes first is used.
|
||||||
.B " uint32_t \fIvalue\fP);"
|
.B " uint32_t \fIvalue\fP);"
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
|
The \fIheap_limit\fP parameter specifies, in units of kibibytes (1024 bytes),
|
||||||
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
|
the maximum amount of heap memory that \fBpcre2_match()\fP may use to hold
|
||||||
information when running an interpretive match. This limit also applies to
|
backtracking information when running an interpretive match. This limit also
|
||||||
\fBpcre2_dfa_match()\fP, which may use the heap when processing patterns with a
|
applies to \fBpcre2_dfa_match()\fP, which may use the heap when processing
|
||||||
lot of nested pattern recursion or lookarounds or atomic groups. This limit
|
patterns with a lot of nested pattern recursion or lookarounds or atomic
|
||||||
does not apply to matching with the JIT optimization, which has its own memory
|
groups. This limit does not apply to matching with the JIT optimization, which
|
||||||
control arrangements (see the
|
has its own memory control arrangements (see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
.\"
|
.\"
|
||||||
documentation for more details). If the limit is reached, the negative error
|
documentation for more details). If the limit is reached, the negative error
|
||||||
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
|
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
|
||||||
built; the default default is very large and is essentially "unlimited".
|
is built; if it is not, the default is set very large and is essentially
|
||||||
|
"unlimited".
|
||||||
.P
|
.P
|
||||||
A value for the heap limit may also be supplied by an item at the start of a
|
A value for the heap limit may also be supplied by an item at the start of a
|
||||||
pattern of the form
|
pattern of the form
|
||||||
|
@ -975,7 +976,7 @@ The depth limit is not relevant, and is ignored, when matching is done using
|
||||||
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
|
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
|
||||||
uses it to limit the depth of nested internal recursive function calls that
|
uses it to limit the depth of nested internal recursive function calls that
|
||||||
implement atomic groups, lookaround assertions, and pattern recursions. This
|
implement atomic groups, lookaround assertions, and pattern recursions. This
|
||||||
limits, indirectly, the amount of system stack this is used. It was more useful
|
limits, indirectly, the amount of system stack that is used. It was more useful
|
||||||
in versions before 10.32, when stack memory was used for local workspace
|
in versions before 10.32, when stack memory was used for local workspace
|
||||||
vectors for recursive function calls. From version 10.32, only local variables
|
vectors for recursive function calls. From version 10.32, only local variables
|
||||||
are allocated on the stack and as each call uses only a few hundred bytes, even
|
are allocated on the stack and as each call uses only a few hundred bytes, even
|
||||||
|
@ -989,11 +990,11 @@ using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
|
||||||
probably better to limit heap usage directly by calling
|
probably better to limit heap usage directly by calling
|
||||||
\fBpcre2_set_heap_limit()\fP.
|
\fBpcre2_set_heap_limit()\fP.
|
||||||
.P
|
.P
|
||||||
The default value for the depth limit can be set when PCRE2 is built; the
|
The default value for the depth limit can be set when PCRE2 is built; if it is
|
||||||
default default is the same value as the default for the match limit. If the
|
not, the default is set to the same value as the default for the match limit.
|
||||||
limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
|
If the limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
|
||||||
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
|
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
|
||||||
item at the start of a pattern of the form
|
supplied by an item at the start of a pattern of the form
|
||||||
.sp
|
.sp
|
||||||
(*LIMIT_DEPTH=ddd)
|
(*LIMIT_DEPTH=ddd)
|
||||||
.sp
|
.sp
|
||||||
|
@ -1050,7 +1051,7 @@ given with \fBpcre2_set_depth_limit()\fP above.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_HEAPLIMIT
|
PCRE2_CONFIG_HEAPLIMIT
|
||||||
.sp
|
.sp
|
||||||
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
The output is a uint32_t integer that gives, in kibibytes, the default limit
|
||||||
for the amount of heap memory used by \fBpcre2_match()\fP or
|
for the amount of heap memory used by \fBpcre2_match()\fP or
|
||||||
\fBpcre2_dfa_match()\fP. Further details are given with
|
\fBpcre2_dfa_match()\fP. Further details are given with
|
||||||
\fBpcre2_set_heap_limit()\fP above.
|
\fBpcre2_set_heap_limit()\fP above.
|
||||||
|
@ -1367,7 +1368,7 @@ If this bit is set, letters in the pattern match both upper and lower case
|
||||||
letters in the subject. It is equivalent to Perl's /i option, and it can be
|
letters in the subject. It is equivalent to Perl's /i option, and it can be
|
||||||
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
|
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
|
||||||
properties are used for all characters with more than one other case, and for
|
properties are used for all characters with more than one other case, and for
|
||||||
all characters whose code points are greater than U+007f. For lower valued
|
all characters whose code points are greater than U+007F. For lower valued
|
||||||
characters with only one other case, a lookup table is used for speed. When
|
characters with only one other case, a lookup table is used for speed. When
|
||||||
PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
|
PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
|
||||||
and higher code points (available only in 16-bit or 32-bit mode) are treated as
|
and higher code points (available only in 16-bit or 32-bit mode) are treated as
|
||||||
|
@ -1550,8 +1551,8 @@ If this option is set, it disables the use of numbered capturing parentheses in
|
||||||
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
|
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
|
||||||
were followed by ?: but named parentheses can still be used for capturing (and
|
were followed by ?: but named parentheses can still be used for capturing (and
|
||||||
they acquire numbers in the usual way). This is the same as Perl's /n option.
|
they acquire numbers in the usual way). This is the same as Perl's /n option.
|
||||||
Note that, when this option is set, references to capturing groups (back
|
Note that, when this option is set, references to capturing groups
|
||||||
references or recursion/subroutine calls) may only refer to named groups,
|
(backreferences or recursion/subroutine calls) may only refer to named groups,
|
||||||
though the reference can be by name or by number.
|
though the reference can be by name or by number.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_NO_AUTO_POSSESS
|
PCRE2_NO_AUTO_POSSESS
|
||||||
|
@ -1976,10 +1977,10 @@ returned if there are no back references.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_BSR
|
PCRE2_INFO_BSR
|
||||||
.sp
|
.sp
|
||||||
The output is a uint32_t whose value indicates what character sequences the \eR
|
The output is a uint32_t integer whose value indicates what character sequences
|
||||||
escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR matches
|
the \eR escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR
|
||||||
any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \eR
|
matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
|
||||||
matches only CR, LF, or CRLF.
|
that \eR matches only CR, LF, or CRLF.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_CAPTURECOUNT
|
PCRE2_INFO_CAPTURECOUNT
|
||||||
.sp
|
.sp
|
||||||
|
@ -1991,10 +1992,10 @@ The third argument should point to an \fBuint32_t\fP variable.
|
||||||
.sp
|
.sp
|
||||||
If the pattern set a backtracking depth limit by including an item of the form
|
If the pattern set a backtracking depth limit by including an item of the form
|
||||||
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
|
\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_FIRSTBITMAP
|
PCRE2_INFO_FIRSTBITMAP
|
||||||
.sp
|
.sp
|
||||||
|
@ -2004,7 +2005,7 @@ values for the first code unit in any match. For example, a pattern that starts
|
||||||
with [abc] results in a table with three bits set. When code unit values
|
with [abc] results in a table with three bits set. When code unit values
|
||||||
greater than 255 are supported, the flag bit for 255 means "any code unit of
|
greater than 255 are supported, the flag bit for 255 means "any code unit of
|
||||||
value 255 or above". If such a table was constructed, a pointer to it is
|
value 255 or above". If such a table was constructed, a pointer to it is
|
||||||
returned. Otherwise NULL is returned. The third argument should point to an
|
returned. Otherwise NULL is returned. The third argument should point to a
|
||||||
\fBconst uint8_t *\fP variable.
|
\fBconst uint8_t *\fP variable.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_FIRSTCODETYPE
|
PCRE2_INFO_FIRSTCODETYPE
|
||||||
|
@ -2031,7 +2032,7 @@ and up to 0xffffffff when not using UTF-32 mode.
|
||||||
.sp
|
.sp
|
||||||
Return the size (in bytes) of the data frames that are used to remember
|
Return the size (in bytes) of the data frames that are used to remember
|
||||||
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
|
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
|
||||||
without the use of JIT. The third argument should point to an \fBsize_t\fP
|
without the use of JIT. The third argument should point to a \fBsize_t\fP
|
||||||
variable. The frame size depends on the number of capturing parentheses in the
|
variable. The frame size depends on the number of capturing parentheses in the
|
||||||
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
||||||
.sp
|
.sp
|
||||||
|
@ -2051,10 +2052,10 @@ the equivalent hexadecimal or octal escape sequences.
|
||||||
.sp
|
.sp
|
||||||
If the pattern set a heap memory limit by including an item of the form
|
If the pattern set a heap memory limit by including an item of the form
|
||||||
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
|
\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_JCHANGED
|
PCRE2_INFO_JCHANGED
|
||||||
.sp
|
.sp
|
||||||
|
@ -2098,15 +2099,15 @@ in such cases.
|
||||||
.sp
|
.sp
|
||||||
If the pattern set a match limit by including an item of the form
|
If the pattern set a match limit by including an item of the form
|
||||||
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
|
\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_MAXLOOKBEHIND
|
PCRE2_INFO_MAXLOOKBEHIND
|
||||||
.sp
|
.sp
|
||||||
Return the number of characters (not code units) in the longest lookbehind
|
Return the number of characters (not code units) in the longest lookbehind
|
||||||
assertion in the pattern. The third argument should point to an unsigned 32-bit
|
assertion in the pattern. The third argument should point to a uint32_t
|
||||||
integer. This information is useful when doing multi-segment matching using the
|
integer. This information is useful when doing multi-segment matching using the
|
||||||
partial matching facilities. Note that the simple assertions \eb and \eB
|
partial matching facilities. Note that the simple assertions \eb and \eB
|
||||||
require a one-character lookbehind. \eA also registers a one-character
|
require a one-character lookbehind. \eA also registers a one-character
|
||||||
|
@ -2393,7 +2394,7 @@ zero, the search for a match starts at the beginning of the subject, and this
|
||||||
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
|
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
|
||||||
must point to the start of a character, or to the end of the subject (in UTF-32
|
must point to the start of a character, or to the end of the subject (in UTF-32
|
||||||
mode, one code unit equals one character, so all offsets are valid). Like the
|
mode, one code unit equals one character, so all offsets are valid). Like the
|
||||||
pattern string, the subject may contain binary zeroes.
|
pattern string, the subject may contain binary zeros.
|
||||||
.P
|
.P
|
||||||
A non-zero starting offset is useful when searching for another match in the
|
A non-zero starting offset is useful when searching for another match in the
|
||||||
same subject by calling \fBpcre2_match()\fP again after a previous success.
|
same subject by calling \fBpcre2_match()\fP again after a previous success.
|
||||||
|
|
|
@ -216,7 +216,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
|
||||||
.sp
|
.sp
|
||||||
--enable-newline-is-nul
|
--enable-newline-is-nul
|
||||||
.sp
|
.sp
|
||||||
which causes NUL (binary zero) is set as the default line-ending character.
|
which causes NUL (binary zero) to be set as the default line-ending character.
|
||||||
.P
|
.P
|
||||||
Whatever default line ending convention is selected when PCRE2 is built can be
|
Whatever default line ending convention is selected when PCRE2 is built can be
|
||||||
overridden by applications that use the library. At build time it is
|
overridden by applications that use the library. At build time it is
|
||||||
|
@ -281,8 +281,8 @@ The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
are (that is, the deeper the search tree), the more memory is needed. If the
|
are (that is, the deeper the search tree), the more memory is needed. If the
|
||||||
initial vector is not large enough, heap memory is used, up to a certain limit,
|
initial vector is not large enough, heap memory is used, up to a certain limit,
|
||||||
which is specified in kilobytes. The limit can be changed at run time, as
|
which is specified in kibibytes (units of 1024 bytes). The limit can be changed
|
||||||
described in the
|
at run time, as described in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2api\fP
|
\fBpcre2api\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -291,7 +291,7 @@ change this by a setting such as
|
||||||
.sp
|
.sp
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
.sp
|
.sp
|
||||||
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
which limits the amount of heap to 500 KiB. This limit applies only to
|
||||||
interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
|
interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
|
||||||
may also use the heap for internal workspace when processing complicated
|
may also use the heap for internal workspace when processing complicated
|
||||||
patterns. This limit does not apply when JIT (which has its own memory
|
patterns. This limit does not apply when JIT (which has its own memory
|
||||||
|
@ -552,7 +552,7 @@ generated from the string.
|
||||||
Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP
|
Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP
|
||||||
to be created. This is normally run under valgrind or used when PCRE2 is
|
to be created. This is normally run under valgrind or used when PCRE2 is
|
||||||
compiled with address sanitizing enabled. It calls the fuzzing function and
|
compiled with address sanitizing enabled. It calls the fuzzing function and
|
||||||
outputs information about it is doing. The input strings are specified by
|
outputs information about what it is doing. The input strings are specified by
|
||||||
arguments: if an argument starts with "=" the rest of it is a literal input
|
arguments: if an argument starts with "=" the rest of it is a literal input
|
||||||
string. Otherwise, it is assumed to be a file name, and the contents of the
|
string. Otherwise, it is assumed to be a file name, and the contents of the
|
||||||
file are the test string.
|
file are the test string.
|
||||||
|
|
|
@ -19,7 +19,7 @@ page.
|
||||||
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
||||||
they do not mean what you might think. For example, (?!a){3} does not assert
|
they do not mean what you might think. For example, (?!a){3} does not assert
|
||||||
that the next three characters are not "a". It just asserts that the next
|
that the next three characters are not "a". It just asserts that the next
|
||||||
character is not "a" three times (in principle: PCRE2 optimizes this to run the
|
character is not "a" three times (in principle; PCRE2 optimizes this to run the
|
||||||
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
||||||
for example, \eb* (but not \eb{3}), but these do not seem to have any use.
|
for example, \eb* (but not \eb{3}), but these do not seem to have any use.
|
||||||
.P
|
.P
|
||||||
|
@ -62,8 +62,8 @@ Note the following examples:
|
||||||
The \eQ...\eE sequence is recognized both inside and outside character classes.
|
The \eQ...\eE sequence is recognized both inside and outside character classes.
|
||||||
.P
|
.P
|
||||||
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
||||||
constructions. However, there is support PCRE2's "callout" feature, which
|
constructions. However, PCRE2 does have a "callout" feature, which allows an
|
||||||
allows an external function to be called during pattern matching. See the
|
external function to be called during pattern matching. See the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2callout\fP
|
\fBpcre2callout\fP
|
||||||
.\"
|
.\"
|
||||||
|
|
|
@ -57,9 +57,10 @@ controlled by parameters that can be set by the \fB--buffer-size\fP and
|
||||||
that is obtained at the start of processing. If an input file contains very
|
that is obtained at the start of processing. If an input file contains very
|
||||||
long lines, a larger buffer may be needed; this is handled by automatically
|
long lines, a larger buffer may be needed; this is handled by automatically
|
||||||
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
|
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
|
||||||
default values for these parameters are specified when \fBpcre2grep\fP is
|
default values for these parameters can be set when \fBpcre2grep\fP is
|
||||||
built, with the default defaults being 20K and 1M respectively. An error occurs
|
built; if nothing is specified, the defaults are set to 20K and 1M
|
||||||
if a line is too long and the buffer can no longer be expanded.
|
respectively. An error occurs if a line is too long and the buffer can no
|
||||||
|
longer be expanded.
|
||||||
.P
|
.P
|
||||||
The block of memory that is actually used is three times the "buffer size", to
|
The block of memory that is actually used is three times the "buffer size", to
|
||||||
allow for buffering "before" and "after" lines. If the buffer size is too
|
allow for buffering "before" and "after" lines. If the buffer size is too
|
||||||
|
@ -434,13 +435,13 @@ short form for this option.
|
||||||
When this option is given, non-compressed input is read and processed line by
|
When this option is given, non-compressed input is read and processed line by
|
||||||
line, and the output is flushed after each write. By default, input is read in
|
line, and the output is flushed after each write. By default, input is read in
|
||||||
large chunks, unless \fBpcre2grep\fP can determine that it is reading from a
|
large chunks, unless \fBpcre2grep\fP can determine that it is reading from a
|
||||||
terminal (which is currently possible only in Unix-like environments). Output
|
terminal (which is currently possible only in Unix-like environments or
|
||||||
to terminal is normally automatically flushed by the operating system. This
|
Windows). Output to terminal is normally automatically flushed by the operating
|
||||||
option can be useful when the input or output is attached to a pipe and you do
|
system. This option can be useful when the input or output is attached to a
|
||||||
not want \fBpcre2grep\fP to buffer up large amounts of data. However, its use
|
pipe and you do not want \fBpcre2grep\fP to buffer up large amounts of data.
|
||||||
will affect performance, and the \fB-M\fP (multiline) option ceases to work.
|
However, its use will affect performance, and the \fB-M\fP (multiline) option
|
||||||
When input is from a compressed .gz or .bz2 file, \fB--line-buffered\fP is
|
ceases to work. When input is from a compressed .gz or .bz2 file,
|
||||||
ignored.
|
\fB--line-buffered\fP is ignored.
|
||||||
.TP
|
.TP
|
||||||
\fB--line-offsets\fP
|
\fB--line-offsets\fP
|
||||||
Instead of showing lines or parts of lines that match, show each match as a
|
Instead of showing lines or parts of lines that match, show each match as a
|
||||||
|
@ -470,11 +471,11 @@ is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
|
||||||
counter that is incremented each time around its main processing loop. If the
|
counter that is incremented each time around its main processing loop. If the
|
||||||
value set by \fB--match-limit\fP is reached, an error occurs.
|
value set by \fB--match-limit\fP is reached, an error occurs.
|
||||||
.sp
|
.sp
|
||||||
The \fB--heap-limit\fP option specifies, as a number of kilobytes, the amount
|
The \fB--heap-limit\fP option specifies, as a number of kibibytes (units of
|
||||||
of heap memory that may be used for matching. Heap memory is needed only if
|
1024 bytes), the amount of heap memory that may be used for matching. Heap
|
||||||
matching the pattern requires a significant number of nested backtracking
|
memory is needed only if matching the pattern requires a significant number of
|
||||||
points to be remembered. This parameter can be set to zero to forbid the use of
|
nested backtracking points to be remembered. This parameter can be set to zero
|
||||||
heap memory altogether.
|
to forbid the use of heap memory altogether.
|
||||||
.sp
|
.sp
|
||||||
The \fB--depth-limit\fP option limits the depth of nested backtracking points,
|
The \fB--depth-limit\fP option limits the depth of nested backtracking points,
|
||||||
which indirectly limits the amount of memory that is used. The amount of memory
|
which indirectly limits the amount of memory that is used. The amount of memory
|
||||||
|
@ -483,9 +484,9 @@ parentheses in the pattern, so the amount of memory that is used before this
|
||||||
limit acts varies from pattern to pattern. This limit is of use only if it is
|
limit acts varies from pattern to pattern. This limit is of use only if it is
|
||||||
set smaller than \fB--match-limit\fP.
|
set smaller than \fB--match-limit\fP.
|
||||||
.sp
|
.sp
|
||||||
There are no short forms for these options. The default settings are specified
|
There are no short forms for these options. The default limits can be set
|
||||||
when the PCRE2 library is compiled, with the default defaults being very large
|
when the PCRE2 library is compiled; if they are not specified, the defaults
|
||||||
and so effectively unlimited.
|
are very large and so effectively unlimited.
|
||||||
.TP
|
.TP
|
||||||
\fB--max-buffer-size=\fInumber\fP
|
\fB--max-buffer-size=\fInumber\fP
|
||||||
This limits the expansion of the processing buffer, whose initial size can be
|
This limits the expansion of the processing buffer, whose initial size can be
|
||||||
|
|
|
@ -56,10 +56,10 @@ DESCRIPTION
|
||||||
that is obtained at the start of processing. If an input file contains
|
that is obtained at the start of processing. If an input file contains
|
||||||
very long lines, a larger buffer may be needed; this is handled by
|
very long lines, a larger buffer may be needed; this is handled by
|
||||||
automatically extending the buffer, up to the limit specified by --max-
|
automatically extending the buffer, up to the limit specified by --max-
|
||||||
buffer-size. The default values for these parameters are specified when
|
buffer-size. The default values for these parameters can be set when
|
||||||
pcre2grep is built, with the default defaults being 20K and 1M respec-
|
pcre2grep is built; if nothing is specified, the defaults are set to
|
||||||
tively. An error occurs if a line is too long and the buffer can no
|
20K and 1M respectively. An error occurs if a line is too long and the
|
||||||
longer be expanded.
|
buffer can no longer be expanded.
|
||||||
|
|
||||||
The block of memory that is actually used is three times the "buffer
|
The block of memory that is actually used is three times the "buffer
|
||||||
size", to allow for buffering "before" and "after" lines. If the buffer
|
size", to allow for buffering "before" and "after" lines. If the buffer
|
||||||
|
@ -475,14 +475,14 @@ OPTIONS
|
||||||
processed line by line, and the output is flushed after each
|
processed line by line, and the output is flushed after each
|
||||||
write. By default, input is read in large chunks, unless
|
write. By default, input is read in large chunks, unless
|
||||||
pcre2grep can determine that it is reading from a terminal
|
pcre2grep can determine that it is reading from a terminal
|
||||||
(which is currently possible only in Unix-like environments).
|
(which is currently possible only in Unix-like environments
|
||||||
Output to terminal is normally automatically flushed by the
|
or Windows). Output to terminal is normally automatically
|
||||||
operating system. This option can be useful when the input or
|
flushed by the operating system. This option can be useful
|
||||||
output is attached to a pipe and you do not want pcre2grep to
|
when the input or output is attached to a pipe and you do not
|
||||||
buffer up large amounts of data. However, its use will affect
|
want pcre2grep to buffer up large amounts of data. However,
|
||||||
performance, and the -M (multiline) option ceases to work.
|
its use will affect performance, and the -M (multiline)
|
||||||
When input is from a compressed .gz or .bz2 file, --line-
|
option ceases to work. When input is from a compressed .gz or
|
||||||
buffered is ignored.
|
.bz2 file, --line-buffered is ignored.
|
||||||
|
|
||||||
--line-offsets
|
--line-offsets
|
||||||
Instead of showing lines or parts of lines that match, show
|
Instead of showing lines or parts of lines that match, show
|
||||||
|
@ -517,12 +517,12 @@ OPTIONS
|
||||||
processing loop. If the value set by --match-limit is
|
processing loop. If the value set by --match-limit is
|
||||||
reached, an error occurs.
|
reached, an error occurs.
|
||||||
|
|
||||||
The --heap-limit option specifies, as a number of kilobytes,
|
The --heap-limit option specifies, as a number of kibibytes
|
||||||
the amount of heap memory that may be used for matching. Heap
|
(units of 1024 bytes), the amount of heap memory that may be
|
||||||
memory is needed only if matching the pattern requires a sig-
|
used for matching. Heap memory is needed only if matching the
|
||||||
nificant number of nested backtracking points to be remem-
|
pattern requires a significant number of nested backtracking
|
||||||
bered. This parameter can be set to zero to forbid the use of
|
points to be remembered. This parameter can be set to zero to
|
||||||
heap memory altogether.
|
forbid the use of heap memory altogether.
|
||||||
|
|
||||||
The --depth-limit option limits the depth of nested back-
|
The --depth-limit option limits the depth of nested back-
|
||||||
tracking points, which indirectly limits the amount of memory
|
tracking points, which indirectly limits the amount of memory
|
||||||
|
@ -532,10 +532,10 @@ OPTIONS
|
||||||
limit acts varies from pattern to pattern. This limit is of
|
limit acts varies from pattern to pattern. This limit is of
|
||||||
use only if it is set smaller than --match-limit.
|
use only if it is set smaller than --match-limit.
|
||||||
|
|
||||||
There are no short forms for these options. The default set-
|
There are no short forms for these options. The default lim-
|
||||||
tings are specified when the PCRE2 library is compiled, with
|
its can be set when the PCRE2 library is compiled; if they
|
||||||
the default defaults being very large and so effectively
|
are not specified, the defaults are very large and so effec-
|
||||||
unlimited.
|
tively unlimited.
|
||||||
|
|
||||||
--max-buffer-size=number
|
--max-buffer-size=number
|
||||||
This limits the expansion of the processing buffer, whose
|
This limits the expansion of the processing buffer, whose
|
||||||
|
|
|
@ -38,9 +38,9 @@ There is no limit to the number of parenthesized subpatterns, but there can be
|
||||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||||
order to limit the amount of system stack used at compile time. The default
|
order to limit the amount of system stack used at compile time. The default
|
||||||
limit can be specified when PCRE2 is built; the default default is 250. An
|
limit can be specified when PCRE2 is built; if not, the default is set to 250.
|
||||||
application can change this limit by calling pcre2_set_parens_nest_limit() to
|
An application can change this limit by calling pcre2_set_parens_nest_limit()
|
||||||
set the limit in a compile context.
|
to set the limit in a compile context.
|
||||||
.P
|
.P
|
||||||
The maximum length of name for a named subpattern is 32 code units, and the
|
The maximum length of name for a named subpattern is 32 code units, and the
|
||||||
maximum number of named subpatterns is 10000.
|
maximum number of named subpatterns is 10000.
|
||||||
|
|
|
@ -163,7 +163,7 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
|
||||||
for it to have any effect. In other words, the pattern writer can lower the
|
for it to have any effect. In other words, the pattern writer can lower the
|
||||||
limits set by the programmer, but not raise them. If there is more than one
|
limits set by the programmer, but not raise them. If there is more than one
|
||||||
setting of one of these limits, the lower value is used. The heap limit is
|
setting of one of these limits, the lower value is used. The heap limit is
|
||||||
specified in kilobytes.
|
specified in kibibytes (units of 1024 bytes).
|
||||||
.P
|
.P
|
||||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
||||||
still recognized for backwards compatibility.
|
still recognized for backwards compatibility.
|
||||||
|
@ -528,7 +528,7 @@ by code point, as described above.
|
||||||
.sp
|
.sp
|
||||||
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
||||||
in braces, is an absolute or relative backreference. A named backreference
|
in braces, is an absolute or relative backreference. A named backreference
|
||||||
can be coded as \eg{name}. Back references are discussed
|
can be coded as \eg{name}. Backreferences are discussed
|
||||||
.\" HTML <a href="#backreferences">
|
.\" HTML <a href="#backreferences">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
later,
|
later,
|
||||||
|
@ -1026,7 +1026,7 @@ joiner" characters. Characters with the "mark" property always have the
|
||||||
6. Do not break within emoji modifier sequences (a base character followed by a
|
6. Do not break within emoji modifier sequences (a base character followed by a
|
||||||
modifier). Extending characters are allowed before the modifier.
|
modifier). Extending characters are allowed before the modifier.
|
||||||
.P
|
.P
|
||||||
7. Do not break within emoji zwj sequences (zero-width jointer followed by
|
7. Do not break within emoji zwj sequences (zero-width joiner followed by
|
||||||
"glue after ZWJ" or "base glue after ZWJ").
|
"glue after ZWJ" or "base glue after ZWJ").
|
||||||
.P
|
.P
|
||||||
8. Do not break within emoji flag sequences. That is, do not break between
|
8. Do not break within emoji flag sequences. That is, do not break between
|
||||||
|
@ -2205,8 +2205,8 @@ A subpattern that is referenced by name may appear in the pattern before or
|
||||||
after the reference.
|
after the reference.
|
||||||
.P
|
.P
|
||||||
There may be more than one backreference to the same subpattern. If a
|
There may be more than one backreference to the same subpattern. If a
|
||||||
subpattern has not actually been used in a particular match, any back
|
subpattern has not actually been used in a particular match, any backreferences
|
||||||
references to it always fail by default. For example, the pattern
|
to it always fail by default. For example, the pattern
|
||||||
.sp
|
.sp
|
||||||
(a|(bc))\e2
|
(a|(bc))\e2
|
||||||
.sp
|
.sp
|
||||||
|
@ -2243,7 +2243,7 @@ that the first iteration does not need to match the back reference. This can be
|
||||||
done using alternation, as in the example above, or by a quantifier with a
|
done using alternation, as in the example above, or by a quantifier with a
|
||||||
minimum of zero.
|
minimum of zero.
|
||||||
.P
|
.P
|
||||||
Back references of this type cause the group that they reference to be treated
|
Backreferences of this type cause the group that they reference to be treated
|
||||||
as an
|
as an
|
||||||
.\" HTML <a href="#atomicgroup">
|
.\" HTML <a href="#atomicgroup">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
|
|
|
@ -115,7 +115,7 @@ because it disables the use of back references.
|
||||||
If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
|
If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
|
||||||
(which has the type const char *) must be set to point to the character beyond
|
(which has the type const char *) must be set to point to the character beyond
|
||||||
the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
|
the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
|
||||||
now contain binary zeroes, which are treated as data characters. Without
|
now contain binary zeros, which are treated as data characters. Without
|
||||||
REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
|
REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
|
||||||
ignored. This is a GNU extension to the POSIX standard and should be used with
|
ignored. This is a GNU extension to the POSIX standard and should be used with
|
||||||
caution in software intended to be portable to other systems.
|
caution in software intended to be portable to other systems.
|
||||||
|
@ -224,10 +224,10 @@ function.
|
||||||
.sp
|
.sp
|
||||||
REG_STARTEND
|
REG_STARTEND
|
||||||
.sp
|
.sp
|
||||||
When this option is set, the subject string is starts at \fIstring\fP +
|
When this option is set, the subject string starts at \fIstring\fP +
|
||||||
\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
|
\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
|
||||||
should point to the first character beyond the string. There may be binary
|
should point to the first character beyond the string. There may be binary
|
||||||
zeroes within the subject string, and indeed, using REG_STARTEND is the only
|
zeros within the subject string, and indeed, using REG_STARTEND is the only
|
||||||
way to pass a subject string that contains a binary zero.
|
way to pass a subject string that contains a binary zero.
|
||||||
.P
|
.P
|
||||||
Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string
|
Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string
|
||||||
|
|
|
@ -419,7 +419,7 @@ of the newline or \eR options with similar syntax. More than one of them may
|
||||||
appear. For the first three, d is a decimal number.
|
appear. For the first three, d is a decimal number.
|
||||||
.sp
|
.sp
|
||||||
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
||||||
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes
|
(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
|
||||||
(*LIMIT_MATCH=d) set the match limit to d
|
(*LIMIT_MATCH=d) set the match limit to d
|
||||||
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
||||||
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
||||||
|
|
|
@ -101,7 +101,7 @@ to occur).
|
||||||
UTF-8 (in its original definition) is not capable of encoding values greater
|
UTF-8 (in its original definition) is not capable of encoding values greater
|
||||||
than 0x7fffffff, but such values can be handled by the 32-bit library. When
|
than 0x7fffffff, but such values can be handled by the 32-bit library. When
|
||||||
testing this library in non-UTF mode with \fButf8_input\fP set, if any
|
testing this library in non-UTF mode with \fButf8_input\fP set, if any
|
||||||
character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
|
character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
|
||||||
0x80000000 is added to the character's value. This is the only way of passing
|
0x80000000 is added to the character's value. This is the only way of passing
|
||||||
such code points in a pattern string. For subject strings, using an escape
|
such code points in a pattern string. For subject strings, using an escape
|
||||||
sequence is preferable.
|
sequence is preferable.
|
||||||
|
@ -220,7 +220,7 @@ Do not output the version number of \fBpcre2test\fP at the start of execution.
|
||||||
.TP 10
|
.TP 10
|
||||||
\fB-S\fP \fIsize\fP
|
\fB-S\fP \fIsize\fP
|
||||||
On Unix-like systems, set the size of the run-time stack to \fIsize\fP
|
On Unix-like systems, set the size of the run-time stack to \fIsize\fP
|
||||||
megabytes.
|
mebibytes (units of 1024*1024 bytes).
|
||||||
.TP 10
|
.TP 10
|
||||||
\fB-subject\fP \fImodifier-list\fP
|
\fB-subject\fP \fImodifier-list\fP
|
||||||
Behave as if each subject line contains the given modifiers.
|
Behave as if each subject line contains the given modifiers.
|
||||||
|
@ -639,8 +639,8 @@ The effects of these modifiers are described in the following sections.
|
||||||
.sp
|
.sp
|
||||||
The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is
|
The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is
|
||||||
set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
|
set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
|
||||||
\eR matches any Unicode newline sequence. The default is specified when PCRE2
|
\eR matches any Unicode newline sequence. The default can be specified when
|
||||||
is built, with the default default being Unicode.
|
PCRE2 is built; if it is not, the default is set to Unicode.
|
||||||
.P
|
.P
|
||||||
The \fBnewline\fP modifier specifies which characters are to be interpreted as
|
The \fBnewline\fP modifier specifies which characters are to be interpreted as
|
||||||
newlines, both in the pattern and in subject lines. The type must be one of CR,
|
newlines, both in the pattern and in subject lines. The type must be one of CR,
|
||||||
|
@ -1381,11 +1381,11 @@ matching provokes an error return ("bad option value") from
|
||||||
.sp
|
.sp
|
||||||
The \fBjitstack\fP modifier provides a way of setting the maximum stack size
|
The \fBjitstack\fP modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if JIT
|
that is used by the just-in-time optimization code. It is ignored if JIT
|
||||||
optimization is not being used. The value is a number of kilobytes. Setting
|
optimization is not being used. The value is a number of kibibytes (units of
|
||||||
zero reverts to the default of 32K. Providing a stack that is larger than the
|
1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
|
||||||
default is necessary only for very complicated patterns. If \fBjitstack\fP is
|
that is larger than the default is necessary only for very complicated
|
||||||
set non-zero on a subject line it overrides any value that was set on the
|
patterns. If \fBjitstack\fP is set non-zero on a subject line it overrides any
|
||||||
pattern.
|
value that was set on the pattern.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Setting heap, match, and depth limits"
|
.SS "Setting heap, match, and depth limits"
|
||||||
|
@ -1427,10 +1427,10 @@ matching, \fImatch_limit\fP controls the total number of calls, both recursive
|
||||||
and non-recursive, to the internal matching function, thus controlling the
|
and non-recursive, to the internal matching function, thus controlling the
|
||||||
overall amount of computing resource that is used.
|
overall amount of computing resource that is used.
|
||||||
.P
|
.P
|
||||||
For both kinds of matching, the \fIheap_limit\fP number (which is in kilobytes)
|
For both kinds of matching, the \fIheap_limit\fP number, which is in kibibytes
|
||||||
limits the amount of heap memory used for matching. A value of zero disables
|
(units of 1024 bytes), limits the amount of heap memory used for matching. A
|
||||||
the use of any heap memory; many simple pattern matches can be done without
|
value of zero disables the use of any heap memory; many simple pattern matches
|
||||||
using the heap, so this is not an unreasonable setting.
|
can be done without using the heap, so zero is not an unreasonable setting.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Showing MARK names"
|
.SS "Showing MARK names"
|
||||||
|
|
|
@ -94,7 +94,7 @@ INPUT ENCODING
|
||||||
UTF-8 (in its original definition) is not capable of encoding values
|
UTF-8 (in its original definition) is not capable of encoding values
|
||||||
greater than 0x7fffffff, but such values can be handled by the 32-bit
|
greater than 0x7fffffff, but such values can be handled by the 32-bit
|
||||||
library. When testing this library in non-UTF mode with utf8_input set,
|
library. When testing this library in non-UTF mode with utf8_input set,
|
||||||
if any character is preceded by the byte 0xff (which is an illegal byte
|
if any character is preceded by the byte 0xff (which is an invalid byte
|
||||||
in UTF-8) 0x80000000 is added to the character's value. This is the
|
in UTF-8) 0x80000000 is added to the character's value. This is the
|
||||||
only way of passing such code points in a pattern string. For subject
|
only way of passing such code points in a pattern string. For subject
|
||||||
strings, using an escape sequence is preferable.
|
strings, using an escape sequence is preferable.
|
||||||
|
@ -208,7 +208,7 @@ COMMAND LINE OPTIONS
|
||||||
execution.
|
execution.
|
||||||
|
|
||||||
-S size On Unix-like systems, set the size of the run-time stack to
|
-S size On Unix-like systems, set the size of the run-time stack to
|
||||||
size megabytes.
|
size mebibytes (units of 1024*1024 bytes).
|
||||||
|
|
||||||
-subject modifier-list
|
-subject modifier-list
|
||||||
Behave as if each subject line contains the given modifiers.
|
Behave as if each subject line contains the given modifiers.
|
||||||
|
@ -614,8 +614,9 @@ PATTERN MODIFIERS
|
||||||
|
|
||||||
The bsr modifier specifies what \R in a pattern should match. If it is
|
The bsr modifier specifies what \R in a pattern should match. If it is
|
||||||
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to
|
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to
|
||||||
"unicode", \R matches any Unicode newline sequence. The default is
|
"unicode", \R matches any Unicode newline sequence. The default can be
|
||||||
specified when PCRE2 is built, with the default default being Unicode.
|
specified when PCRE2 is built; if it is not, the default is set to Uni-
|
||||||
|
code.
|
||||||
|
|
||||||
The newline modifier specifies which characters are to be interpreted
|
The newline modifier specifies which characters are to be interpreted
|
||||||
as newlines, both in the pattern and in subject lines. The type must be
|
as newlines, both in the pattern and in subject lines. The type must be
|
||||||
|
@ -1272,11 +1273,11 @@ SUBJECT MODIFIERS
|
||||||
|
|
||||||
The jitstack modifier provides a way of setting the maximum stack size
|
The jitstack modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if
|
that is used by the just-in-time optimization code. It is ignored if
|
||||||
JIT optimization is not being used. The value is a number of kilobytes.
|
JIT optimization is not being used. The value is a number of kibibytes
|
||||||
Setting zero reverts to the default of 32K. Providing a stack that is
|
(units of 1024 bytes). Setting zero reverts to the default of 32KiB.
|
||||||
larger than the default is necessary only for very complicated pat-
|
Providing a stack that is larger than the default is necessary only for
|
||||||
terns. If jitstack is set non-zero on a subject line it overrides any
|
very complicated patterns. If jitstack is set non-zero on a subject
|
||||||
value that was set on the pattern.
|
line it overrides any value that was set on the pattern.
|
||||||
|
|
||||||
Setting heap, match, and depth limits
|
Setting heap, match, and depth limits
|
||||||
|
|
||||||
|
@ -1315,11 +1316,11 @@ SUBJECT MODIFIERS
|
||||||
tion, thus controlling the overall amount of computing resource that is
|
tion, thus controlling the overall amount of computing resource that is
|
||||||
used.
|
used.
|
||||||
|
|
||||||
For both kinds of matching, the heap_limit number (which is in kilo-
|
For both kinds of matching, the heap_limit number, which is in
|
||||||
bytes) limits the amount of heap memory used for matching. A value of
|
kibibytes (units of 1024 bytes), limits the amount of heap memory used
|
||||||
zero disables the use of any heap memory; many simple pattern matches
|
for matching. A value of zero disables the use of any heap memory; many
|
||||||
can be done without using the heap, so this is not an unreasonable set-
|
simple pattern matches can be done without using the heap, so zero is
|
||||||
ting.
|
not an unreasonable setting.
|
||||||
|
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
|
||||||
|
|
|
@ -51,7 +51,7 @@ fi
|
||||||
# utf invoke UTF-8 functionality
|
# utf invoke UTF-8 functionality
|
||||||
#
|
#
|
||||||
# The data lines must not have any pcre2test modifiers. Unless
|
# The data lines must not have any pcre2test modifiers. Unless
|
||||||
# "subject_litersl" is on the pattern, data lines are processed as
|
# "subject_literal" is on the pattern, data lines are processed as
|
||||||
# Perl double-quoted strings, so if they contain " $ or @ characters, these
|
# Perl double-quoted strings, so if they contain " $ or @ characters, these
|
||||||
# have to be escaped. For this reason, all such characters in the
|
# have to be escaped. For this reason, all such characters in the
|
||||||
# Perl-compatible testinput1 and testinput4 files are escaped so that they can
|
# Perl-compatible testinput1 and testinput4 files are escaped so that they can
|
||||||
|
|
|
@ -132,8 +132,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
/* Define to 1 if you have the <zlib.h> header file. */
|
/* Define to 1 if you have the <zlib.h> header file. */
|
||||||
/* #undef HAVE_ZLIB_H */
|
/* #undef HAVE_ZLIB_H */
|
||||||
|
|
||||||
/* This limits the amount of memory that pcre2_match() may use while matching
|
/* This limits the amount of memory that may be used while matching a pattern.
|
||||||
a pattern. The value is in kilobytes. */
|
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
||||||
|
to JIT matching. The value is in kilobytes. */
|
||||||
#ifndef HEAP_LIMIT
|
#ifndef HEAP_LIMIT
|
||||||
#define HEAP_LIMIT 20000000
|
#define HEAP_LIMIT 20000000
|
||||||
#endif
|
#endif
|
||||||
|
@ -155,7 +156,8 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* The value of MATCH_LIMIT determines the default number of times the
|
/* The value of MATCH_LIMIT determines the default number of times the
|
||||||
pcre2_match() function can record a backtrack position during a single
|
pcre2_match() function can record a backtrack position during a single
|
||||||
matching attempt. There is a runtime interface for setting a different
|
matching attempt. The value is also used to limit a loop counter in
|
||||||
|
pcre2_dfa_match(). There is a runtime interface for setting a different
|
||||||
limit. The limit exists in order to catch runaway regular expressions that
|
limit. The limit exists in order to catch runaway regular expressions that
|
||||||
take for ever to determine that they do not match. The default is set very
|
take for ever to determine that they do not match. The default is set very
|
||||||
large so that it does not accidentally catch legitimate cases. */
|
large so that it does not accidentally catch legitimate cases. */
|
||||||
|
@ -170,7 +172,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
|
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
|
||||||
must be less than the value of MATCH_LIMIT. The default is to use the same
|
must be less than the value of MATCH_LIMIT. The default is to use the same
|
||||||
value as MATCH_LIMIT. There is a runtime method for setting a different
|
value as MATCH_LIMIT. There is a runtime method for setting a different
|
||||||
limit. */
|
limit. In the case of pcre2_dfa_match(), this limit controls the depth of
|
||||||
|
the internal nested function calls that are used for pattern recursions,
|
||||||
|
lookarounds, and atomic groups. */
|
||||||
#ifndef MATCH_LIMIT_DEPTH
|
#ifndef MATCH_LIMIT_DEPTH
|
||||||
#define MATCH_LIMIT_DEPTH MATCH_LIMIT
|
#define MATCH_LIMIT_DEPTH MATCH_LIMIT
|
||||||
#endif
|
#endif
|
||||||
|
@ -210,7 +214,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_NAME "PCRE2"
|
#define PACKAGE_NAME "PCRE2"
|
||||||
|
|
||||||
/* Define to the full name and version of this package. */
|
/* Define to the full name and version of this package. */
|
||||||
#define PACKAGE_STRING "PCRE2 10.31"
|
#define PACKAGE_STRING "PCRE2 10.32-RC1"
|
||||||
|
|
||||||
/* Define to the one symbol short name of this package. */
|
/* Define to the one symbol short name of this package. */
|
||||||
#define PACKAGE_TARNAME "pcre2"
|
#define PACKAGE_TARNAME "pcre2"
|
||||||
|
@ -219,7 +223,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_URL ""
|
#define PACKAGE_URL ""
|
||||||
|
|
||||||
/* Define to the version of this package. */
|
/* Define to the version of this package. */
|
||||||
#define PACKAGE_VERSION "10.31"
|
#define PACKAGE_VERSION "10.32-RC1"
|
||||||
|
|
||||||
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
||||||
parentheses (of any kind) in a pattern. This limits the amount of system
|
parentheses (of any kind) in a pattern. This limits the amount of system
|
||||||
|
@ -339,7 +343,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* Version number of package */
|
/* Version number of package */
|
||||||
#define VERSION "10.31"
|
#define VERSION "10.32-RC1"
|
||||||
|
|
||||||
/* Define to 1 if on MINIX. */
|
/* Define to 1 if on MINIX. */
|
||||||
/* #undef _MINIX */
|
/* #undef _MINIX */
|
||||||
|
|
|
@ -134,7 +134,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* This limits the amount of memory that may be used while matching a pattern.
|
/* This limits the amount of memory that may be used while matching a pattern.
|
||||||
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
||||||
to JIT matching. The value is in kilobytes. */
|
to JIT matching. The value is in kibibytes (units of 1024 bytes). */
|
||||||
#undef HEAP_LIMIT
|
#undef HEAP_LIMIT
|
||||||
|
|
||||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||||
|
|
|
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
/* The current PCRE version information. */
|
/* The current PCRE version information. */
|
||||||
|
|
||||||
#define PCRE2_MAJOR 10
|
#define PCRE2_MAJOR 10
|
||||||
#define PCRE2_MINOR 31
|
#define PCRE2_MINOR 32
|
||||||
#define PCRE2_PRERELEASE
|
#define PCRE2_PRERELEASE -RC1
|
||||||
#define PCRE2_DATE 2018-02-12
|
#define PCRE2_DATE 2018-02-19
|
||||||
|
|
||||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||||
imported have to be identified as such. When building PCRE2, the appropriate
|
imported have to be identified as such. When building PCRE2, the appropriate
|
||||||
|
|
|
@ -387,8 +387,8 @@ return (mb->callout)(cb, mb->callout_data);
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* This function is called when internal_dfa_match() is about to be called
|
/* This function is called when internal_dfa_match() is about to be called
|
||||||
recursively and there is insufficient workingspace left in the current work
|
recursively and there is insufficient working space left in the current
|
||||||
space block. If there's an existing next block, use it; otherwise get a new
|
workspace block. If there's an existing next block, use it; otherwise get a new
|
||||||
block unless the heap limit is reached.
|
block unless the heap limit is reached.
|
||||||
|
|
||||||
Arguments:
|
Arguments:
|
||||||
|
|
|
@ -43,7 +43,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
#include "config.h"
|
#include "config.h"
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* These defines enables debugging code */
|
/* These defines enable debugging code */
|
||||||
|
|
||||||
//#define DEBUG_FRAMES_DISPLAY
|
//#define DEBUG_FRAMES_DISPLAY
|
||||||
//#define DEBUG_SHOW_OPS
|
//#define DEBUG_SHOW_OPS
|
||||||
|
@ -2464,7 +2464,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
|
|
||||||
/* ===================================================================== */
|
/* ===================================================================== */
|
||||||
/* Match a single character type repeatedly. Note that the property type
|
/* Match a single character type repeatedly. Note that the property type
|
||||||
does not need to be in a stack frame as it not used within an RMATCH()
|
does not need to be in a stack frame as it is not used within an RMATCH()
|
||||||
loop. */
|
loop. */
|
||||||
|
|
||||||
#define Lstart_eptr F->temp_sptr[0]
|
#define Lstart_eptr F->temp_sptr[0]
|
||||||
|
|
|
@ -492,7 +492,7 @@ so many of them that they are split into two fields. */
|
||||||
|
|
||||||
/* These are the matching controls that may be set either on a pattern or on a
|
/* These are the matching controls that may be set either on a pattern or on a
|
||||||
data line. They are copied from the pattern controls as initial settings for
|
data line. They are copied from the pattern controls as initial settings for
|
||||||
data line controls Note that CTL_MEMORY is not included here, because it does
|
data line controls. Note that CTL_MEMORY is not included here, because it does
|
||||||
different things in the two cases. */
|
different things in the two cases. */
|
||||||
|
|
||||||
#define CTL_ALLPD (CTL_AFTERTEXT|\
|
#define CTL_ALLPD (CTL_AFTERTEXT|\
|
||||||
|
@ -5411,7 +5411,7 @@ switch(errorcode)
|
||||||
|
|
||||||
/* The pattern is now in pbuffer[8|16|32], with the length in code units in
|
/* The pattern is now in pbuffer[8|16|32], with the length in code units in
|
||||||
patlen. If it is to be converted, copy the result back afterwards so that it
|
patlen. If it is to be converted, copy the result back afterwards so that it
|
||||||
it ends up back in the usual place. */
|
ends up back in the usual place. */
|
||||||
|
|
||||||
if (pat_patctl.convert_type != CONVERT_UNSET)
|
if (pat_patctl.convert_type != CONVERT_UNSET)
|
||||||
{
|
{
|
||||||
|
@ -5735,7 +5735,7 @@ return PR_OK;
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* This is used for DFA, normal, and JIT fast matching. For DFA matching it
|
/* This is used for DFA, normal, and JIT fast matching. For DFA matching it
|
||||||
should only called with the third argument set to PCRE2_ERROR_DEPTHLIMIT.
|
should only be called with the third argument set to PCRE2_ERROR_DEPTHLIMIT.
|
||||||
|
|
||||||
Arguments:
|
Arguments:
|
||||||
pp the subject string
|
pp the subject string
|
||||||
|
@ -7766,7 +7766,7 @@ printf(" -LM list pattern and subject modifiers, then exit\n");
|
||||||
printf(" -q quiet: do not output PCRE2 version number at start\n");
|
printf(" -q quiet: do not output PCRE2 version number at start\n");
|
||||||
printf(" -pattern <s> set default pattern modifier fields\n");
|
printf(" -pattern <s> set default pattern modifier fields\n");
|
||||||
printf(" -subject <s> set default subject modifier fields\n");
|
printf(" -subject <s> set default subject modifier fields\n");
|
||||||
printf(" -S <n> set stack size to <n> megabytes\n");
|
printf(" -S <n> set stack size to <n> mebibytes\n");
|
||||||
printf(" -t [<n>] time compilation and execution, repeating <n> times\n");
|
printf(" -t [<n>] time compilation and execution, repeating <n> times\n");
|
||||||
printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n");
|
printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n");
|
||||||
printf(" -T same as -t, but show total times at the end\n");
|
printf(" -T same as -t, but show total times at the end\n");
|
||||||
|
|