Typos in documentation and comments noted by Jason Hood.

Philip.Hazel 2018-06-17 14:13:28 +00:00
parent fa58ac6734
commit fabea723cf
57 changed files with 2128 additions and 2118 deletions

@ -146,7 +146,7 @@ SET(PCRE2_PARENS_NEST_LIMIT "250" CACHE STRING
"Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.") "Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.")
SET(PCRE2_HEAP_LIMIT "20000000" CACHE STRING SET(PCRE2_HEAP_LIMIT "20000000" CACHE STRING
"Default limit on heap memory (kilobytes). See HEAP_LIMIT in config.h.in for details.") "Default limit on heap memory (kibibytes). See HEAP_LIMIT in config.h.in for details.")
SET(PCRE2_MATCH_LIMIT "10000000" CACHE STRING SET(PCRE2_MATCH_LIMIT "10000000" CACHE STRING
"Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.") "Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.")

@ -17,7 +17,7 @@ groups altogether. Now it shows those that come before any actual captures as
3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only", 3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
whatever the build configuration was. It now correctly says "\R matches all whatever the build configuration was. It now correctly says "\R matches all
Unicode newlines" in the default case when --enable-bsr-anycrlf has not been Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
specified. Similarly, running "pcfre2test -C bsr" never produced the result specified. Similarly, running "pcre2test -C bsr" never produced the result
ANY. ANY.
4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing 4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
@ -370,7 +370,7 @@ tests to improve coverage.
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in 31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
pcre2test, a crash could occur. pcre2test, a crash could occur.
32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that 32. Make -bigstack in RunTest allocate a 64MB stack (instead of 16 MB) so that
all the tests can run with clang's sanitizing options. all the tests can run with clang's sanitizing options.
33. Implement extra compile options in the compile context and add the first 33. Implement extra compile options in the compile context and add the first

@ -348,7 +348,7 @@ The /i, /m, or /s options (PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and
others) may be changed in the middle of patterns by items such as (?i). Their others) may be changed in the middle of patterns by items such as (?i). Their
processing is handled entirely at compile time by generating different opcodes processing is handled entirely at compile time by generating different opcodes
for the different settings. The runtime functions do not need to keep track of for the different settings. The runtime functions do not need to keep track of
an options state. an option's state.
PCRE2_DUPNAMES, PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE PCRE2_DUPNAMES, PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE
are tracked and processed during the parsing pre-pass. The others are handled are tracked and processed during the parsing pre-pass. The others are handled
@ -764,7 +764,7 @@ OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
bracket from the start of the whole pattern. OP_RECURSE is also used for bracket from the start of the whole pattern. OP_RECURSE is also used for
"subroutine" calls, even though they are not strictly a recursion. Up till "subroutine" calls, even though they are not strictly a recursion. Up till
release 10.30 recursions were treated as atomic groups, making them release 10.30 recursions were treated as atomic groups, making them
incompatible with Perl (but PCRE had then well before Perl did). From 10.30, incompatible with Perl (but PCRE had them well before Perl did). From 10.30,
backtracking into recursions is supported. backtracking into recursions is supported.
Repeated recursions used to be wrapped inside OP_ONCE brackets, which not only Repeated recursions used to be wrapped inside OP_ONCE brackets, which not only

4
NEWS

@ -31,7 +31,7 @@ remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other patterns, making it more compatible with Perl, and also fixes some other
previously hard-to-do issues. For patterns that have a lot of backtracking, the previously hard-to-do issues. For patterns that have a lot of backtracking, the
heap is now used, and there is explicit limit on the amount, settable by heap is now used, and there is an explicit limit on the amount, settable by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained, pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
but is renamed as "depth limit" (though the old names remain for but is renamed as "depth limit" (though the old names remain for
compatibility). compatibility).
@ -53,7 +53,7 @@ also supported.
5. Additional compile options in the compile context are now available, and the 5. Additional compile options in the compile context are now available, and the
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
PCRE2_EXTRA_BAD_ESCAPE_IS LITERAL. PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
6. The newline type PCRE2_NEWLINE_NUL is now available. 6. The newline type PCRE2_NEWLINE_NUL is now available.

@ -127,7 +127,7 @@ can skip ahead to the CMake section.
src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
these yourself. these yourself.
Not also that the pcre2_fuzzsupport.c file contains special code that is Note also that the pcre2_fuzzsupport.c file contains special code that is
useful to those who want to run fuzzing tests on the PCRE2 library. Unless useful to those who want to run fuzzing tests on the PCRE2 library. Unless
you are doing that, you can ignore it. you are doing that, you can ignore it.
@ -186,7 +186,7 @@ can skip ahead to the CMake section.
STACK SIZE IN WINDOWS ENVIRONMENTS STACK SIZE IN WINDOWS ENVIRONMENTS
Prior to release 10.30 the default system stack size of 1Mb in some Windows Prior to release 10.30 the default system stack size of 1MB in some Windows
environments caused issues with some tests. This should no longer be the case environments caused issues with some tests. This should no longer be the case
for 10.30 and later releases. for 10.30 and later releases.

17
README

@ -257,9 +257,10 @@ library. They are also documented in the pcre2build man page.
--with-heap-limit=500 --with-heap-limit=500
The units are kilobytes. This limit does not apply when the JIT optimization The units are kibibytes (units of 1024 bytes). This limit does not apply when
(which has its own memory control features) is used. There is more discussion the JIT optimization (which has its own memory control features) is used.
on the pcre2api man page (search for pcre2_set_heap_limit). There is more discussion on the pcre2api man page (search for
pcre2_set_heap_limit).
. In the 8-bit library, the default maximum compiled pattern size is around . In the 8-bit library, the default maximum compiled pattern size is around
64K bytes. You can increase this by adding --with-link-size=3 to the 64K bytes. You can increase this by adding --with-link-size=3 to the
@ -319,10 +320,10 @@ library. They are also documented in the pcre2build man page.
. When JIT support is enabled, pcre2grep automatically makes use of it, unless . When JIT support is enabled, pcre2grep automatically makes use of it, unless
you add --disable-pcre2grep-jit to the "configure" command. you add --disable-pcre2grep-jit to the "configure" command.
. On non-Windows sytems there is support for calling external scripts during . There is support for calling external programs during matching in the
matching in the pcre2grep command via PCRE2's callout facility with string pcre2grep command, using PCRE2's callout facility with string arguments. This
arguments. This support can be disabled by adding --disable-pcre2grep-callout support can be disabled by adding --disable-pcre2grep-callout to the
to the "configure" command. "configure" command.
. The pcre2grep program currently supports only 8-bit data files, and so . The pcre2grep program currently supports only 8-bit data files, and so
requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
@ -887,4 +888,4 @@ The distribution should contain the files listed below.
Philip Hazel Philip Hazel
Email local part: ph10 Email local part: ph10
Email domain: cam.ac.uk Email domain: cam.ac.uk
Last updated: 27 April 2018 Last updated: 17 June 2018

@ -708,7 +708,7 @@ $valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >
printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep
$valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep $valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep
# It seems inpossible to handle NUL characters easily in Solaris (aka SunOS). # It seems impossible to handle NUL characters easily in Solaris (aka SunOS).
# The version of sed explicitly doesn't like them. For the moment, we just # The version of sed explicitly doesn't like them. For the moment, we just
# don't run this test under SunOS. Fudge the output so that the comparison # don't run this test under SunOS. Fudge the output so that the comparison
# works. A similar problem has also been reported for MacOS (Darwin). # works. A similar problem has also been reported for MacOS (Darwin).

@ -843,7 +843,7 @@ for bmode in "$test8" "$test16" "$test32"; do
checkresult $? 24 "" checkresult $? 24 ""
fi fi
# UTF pattern converson tests # UTF pattern conversion tests
if [ "$do25" = yes ] ; then if [ "$do25" = yes ] ; then
echo $title25 echo $title25

@ -288,7 +288,7 @@ AC_ARG_WITH(parens-nest-limit,
# Handle --with-heap-limit # Handle --with-heap-limit
AC_ARG_WITH(heap-limit, AC_ARG_WITH(heap-limit,
AS_HELP_STRING([--with-heap-limit=N], AS_HELP_STRING([--with-heap-limit=N],
[default limit on heap memory (kilobytes, default=20000000)]), [default limit on heap memory (kibibytes, default=20000000)]),
, with_heap_limit=20000000) , with_heap_limit=20000000)
# Handle --with-match-limit=N # Handle --with-match-limit=N
@ -754,7 +754,7 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [ AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
This limits the amount of memory that may be used while matching This limits the amount of memory that may be used while matching
a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
not apply to JIT matching. The value is in kilobytes.]) not apply to JIT matching. The value is in kibibytes (units of 1024 bytes).])
AC_DEFINE([MAX_NAME_SIZE], [32], [ AC_DEFINE([MAX_NAME_SIZE], [32], [
This limit is parameterized just in case anybody ever wants to This limit is parameterized just in case anybody ever wants to
@ -1017,7 +1017,7 @@ $PACKAGE-$VERSION configuration summary:
Rebuild char tables ................ : ${enable_rebuild_chartables} Rebuild char tables ................ : ${enable_rebuild_chartables}
Internal link size ................. : ${with_link_size} Internal link size ................. : ${with_link_size}
Nested parentheses limit ........... : ${with_parens_nest_limit} Nested parentheses limit ........... : ${with_parens_nest_limit}
Heap limit ......................... : ${with_heap_limit} kilobytes Heap limit ......................... : ${with_heap_limit} kibibytes
Match limit ........................ : ${with_match_limit} Match limit ........................ : ${with_match_limit}
Match depth limit .................. : ${with_match_limit_depth} Match depth limit .................. : ${with_match_limit_depth}
Build shared libs .................. : ${enable_shared} Build shared libs .................. : ${enable_shared}
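
The change from "kilobytes" to "kibibytes" in the hunks above matters numerically. A minimal C sketch of the conversion, assuming (as the revised text states) that the limit counts units of 1024 bytes; heap_limit_bytes is a hypothetical helper for illustration, not a PCRE2 function:

```c
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2): convert a heap limit given in
   kibibytes (units of 1024 bytes) to bytes. The result is 64-bit because
   the default limit of 20000000 KiB does not fit in a uint32_t byte count. */
static uint64_t heap_limit_bytes(uint32_t limit_kib)
{
  return (uint64_t)limit_kib * 1024u;
}
```

Under this reading, --with-heap-limit=500 corresponds to 512000 bytes rather than 500000, and the default of 20000000 corresponds to 20480000000 bytes.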

@ -127,7 +127,7 @@ can skip ahead to the CMake section.
src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
these yourself. these yourself.
Not also that the pcre2_fuzzsupport.c file contains special code that is Note also that the pcre2_fuzzsupport.c file contains special code that is
useful to those who want to run fuzzing tests on the PCRE2 library. Unless useful to those who want to run fuzzing tests on the PCRE2 library. Unless
you are doing that, you can ignore it. you are doing that, you can ignore it.
@ -186,7 +186,7 @@ can skip ahead to the CMake section.
STACK SIZE IN WINDOWS ENVIRONMENTS STACK SIZE IN WINDOWS ENVIRONMENTS
Prior to release 10.30 the default system stack size of 1Mb in some Windows Prior to release 10.30 the default system stack size of 1MB in some Windows
environments caused issues with some tests. This should no longer be the case environments caused issues with some tests. This should no longer be the case
for 10.30 and later releases. for 10.30 and later releases.

@ -257,9 +257,10 @@ library. They are also documented in the pcre2build man page.
--with-heap-limit=500 --with-heap-limit=500
The units are kilobytes. This limit does not apply when the JIT optimization The units are kibibytes (units of 1024 bytes). This limit does not apply when
(which has its own memory control features) is used. There is more discussion the JIT optimization (which has its own memory control features) is used.
on the pcre2api man page (search for pcre2_set_heap_limit). There is more discussion on the pcre2api man page (search for
pcre2_set_heap_limit).
. In the 8-bit library, the default maximum compiled pattern size is around . In the 8-bit library, the default maximum compiled pattern size is around
64K bytes. You can increase this by adding --with-link-size=3 to the 64K bytes. You can increase this by adding --with-link-size=3 to the
@ -319,10 +320,10 @@ library. They are also documented in the pcre2build man page.
. When JIT support is enabled, pcre2grep automatically makes use of it, unless . When JIT support is enabled, pcre2grep automatically makes use of it, unless
you add --disable-pcre2grep-jit to the "configure" command. you add --disable-pcre2grep-jit to the "configure" command.
. On non-Windows sytems there is support for calling external scripts during . There is support for calling external programs during matching in the
matching in the pcre2grep command via PCRE2's callout facility with string pcre2grep command, using PCRE2's callout facility with string arguments. This
arguments. This support can be disabled by adding --disable-pcre2grep-callout support can be disabled by adding --disable-pcre2grep-callout to the
to the "configure" command. "configure" command.
. The pcre2grep program currently supports only 8-bit data files, and so . The pcre2grep program currently supports only 8-bit data files, and so
requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
@ -887,4 +888,4 @@ The distribution should contain the files listed below.
Philip Hazel Philip Hazel
Email local part: ph10 Email local part: ph10
Email domain: cam.ac.uk Email domain: cam.ac.uk
Last updated: 27 April 2018 Last updated: 17 June 2018

@ -28,7 +28,7 @@ DESCRIPTION
<P> <P>
This function is part of an experimental set of pattern conversion functions. This function is part of an experimental set of pattern conversion functions.
It sets the component separator character that is used when converting globs. It sets the component separator character that is used when converting globs.
The second argument must one of the characters forward slash, backslash, or The second argument must be one of the characters forward slash, backslash, or
dot. The default is backslash when running under Windows, otherwise forward dot. The default is backslash when running under Windows, otherwise forward
slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
the second argument is invalid. the second argument is invalid.

@ -562,10 +562,10 @@ U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
<P> <P>
Each of the first three conventions is used by at least one operating system as Each of the first three conventions is used by at least one operating system as
its standard newline sequence. When PCRE2 is built, a default can be specified. its standard newline sequence. When PCRE2 is built, a default can be specified.
The default default is LF, which is the Unix standard. However, the newline If it is not, the default is set to LF, which is the Unix standard. However,
convention can be changed by an application when calling <b>pcre2_compile()</b>, the newline convention can be changed by an application when calling
or it can be specified by special text at the start of the pattern itself; this <b>pcre2_compile()</b>, or it can be specified by special text at the start of
overrides any other settings. See the the pattern itself; this overrides any other settings. See the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page for details of the special character sequences. page for details of the special character sequences.
</P> </P>
@ -949,17 +949,18 @@ offset limit. In other words, whichever limit comes first is used.
<b> uint32_t <i>value</i>);</b> <b> uint32_t <i>value</i>);</b>
<br> <br>
<br> <br>
The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum The <i>heap_limit</i> parameter specifies, in units of kibibytes (1024 bytes),
amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking the maximum amount of heap memory that <b>pcre2_match()</b> may use to hold
information when running an interpretive match. This limit also applies to backtracking information when running an interpretive match. This limit also
<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a applies to <b>pcre2_dfa_match()</b>, which may use the heap when processing
lot of nested pattern recursion or lookarounds or atomic groups. This limit patterns with a lot of nested pattern recursion or lookarounds or atomic
does not apply to matching with the JIT optimization, which has its own memory groups. This limit does not apply to matching with the JIT optimization, which
control arrangements (see the has its own memory control arrangements (see the
<a href="pcre2jit.html"><b>pcre2jit</b></a> <a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details). If the limit is reached, the negative error documentation for more details). If the limit is reached, the negative error
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
built; the default default is very large and is essentially "unlimited". is built; if it is not, the default is set very large and is essentially
"unlimited".
</P> </P>
<P> <P>
A value for the heap limit may also be supplied by an item at the start of a A value for the heap limit may also be supplied by an item at the start of a
@ -1044,7 +1045,7 @@ The depth limit is not relevant, and is ignored, when matching is done using
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
uses it to limit the depth of nested internal recursive function calls that uses it to limit the depth of nested internal recursive function calls that
implement atomic groups, lookaround assertions, and pattern recursions. This implement atomic groups, lookaround assertions, and pattern recursions. This
limits, indirectly, the amount of system stack this is used. It was more useful limits, indirectly, the amount of system stack that is used. It was more useful
in versions before 10.32, when stack memory was used for local workspace in versions before 10.32, when stack memory was used for local workspace
vectors for recursive function calls. From version 10.32, only local variables vectors for recursive function calls. From version 10.32, only local variables
are allocated on the stack and as each call uses only a few hundred bytes, even are allocated on the stack and as each call uses only a few hundred bytes, even
@ -1060,11 +1061,11 @@ probably better to limit heap usage directly by calling
<b>pcre2_set_heap_limit()</b>. <b>pcre2_set_heap_limit()</b>.
</P> </P>
<P> <P>
The default value for the depth limit can be set when PCRE2 is built; the The default value for the depth limit can be set when PCRE2 is built; if it is
default default is the same value as the default for the match limit. If the not, the default is set to the same value as the default for the match limit.
limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns If the limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
item at the start of a pattern of the form supplied by an item at the start of a pattern of the form
<pre> <pre>
(*LIMIT_DEPTH=ddd) (*LIMIT_DEPTH=ddd)
</pre> </pre>
@ -1120,7 +1121,7 @@ given with <b>pcre2_set_depth_limit()</b> above.
<pre> <pre>
PCRE2_CONFIG_HEAPLIMIT PCRE2_CONFIG_HEAPLIMIT
</pre> </pre>
The output is a uint32_t integer that gives, in kilobytes, the default limit The output is a uint32_t integer that gives, in kibibytes, the default limit
for the amount of heap memory used by <b>pcre2_match()</b> or for the amount of heap memory used by <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b>. Further details are given with <b>pcre2_dfa_match()</b>. Further details are given with
<b>pcre2_set_heap_limit()</b> above. <b>pcre2_set_heap_limit()</b> above.
@ -1431,7 +1432,7 @@ If this bit is set, letters in the pattern match both upper and lower case
letters in the subject. It is equivalent to Perl's /i option, and it can be letters in the subject. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
properties are used for all characters with more than one other case, and for properties are used for all characters with more than one other case, and for
all characters whose code points are greater than U+007f. For lower valued all characters whose code points are greater than U+007F. For lower valued
characters with only one other case, a lookup table is used for speed. When characters with only one other case, a lookup table is used for speed. When
PCRE2_UTF is not set, a lookup table is used for all code points less than 256, PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
and higher code points (available only in 16-bit or 32-bit mode) are treated as and higher code points (available only in 16-bit or 32-bit mode) are treated as
@ -1613,8 +1614,8 @@ If this option is set, it disables the use of numbered capturing parentheses in
the pattern. Any opening parenthesis that is not followed by ? behaves as if it the pattern. Any opening parenthesis that is not followed by ? behaves as if it
were followed by ?: but named parentheses can still be used for capturing (and were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). This is the same as Perl's /n option. they acquire numbers in the usual way). This is the same as Perl's /n option.
Note that, when this option is set, references to capturing groups (back Note that, when this option is set, references to capturing groups
references or recursion/subroutine calls) may only refer to named groups, (backreferences or recursion/subroutine calls) may only refer to named groups,
though the reference can be by name or by number. though the reference can be by name or by number.
<pre> <pre>
PCRE2_NO_AUTO_POSSESS PCRE2_NO_AUTO_POSSESS
@ -2019,10 +2020,10 @@ returned if there are no back references.
<pre> <pre>
PCRE2_INFO_BSR PCRE2_INFO_BSR
</pre> </pre>
The output is a uint32_t whose value indicates what character sequences the \R The output is a uint32_t integer whose value indicates what character sequences
escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R matches the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R
any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
matches only CR, LF, or CRLF. that \R matches only CR, LF, or CRLF.
<pre> <pre>
PCRE2_INFO_CAPTURECOUNT PCRE2_INFO_CAPTURECOUNT
</pre> </pre>
@ -2034,10 +2035,10 @@ The third argument should point to an <b>uint32_t</b> variable.
</pre> </pre>
If the pattern set a backtracking depth limit by including an item of the form If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the should point to a uint32_t integer. If no such value has been set, the call to
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
that this limit will only be used during matching if it is less than the limit limit will only be used during matching if it is less than the limit set or
set or defaulted by the caller of the match function. defaulted by the caller of the match function.
<pre> <pre>
PCRE2_INFO_FIRSTBITMAP PCRE2_INFO_FIRSTBITMAP
</pre> </pre>
@ -2047,7 +2048,7 @@ values for the first code unit in any match. For example, a pattern that starts
with [abc] results in a table with three bits set. When code unit values with [abc] results in a table with three bits set. When code unit values
greater than 255 are supported, the flag bit for 255 means "any code unit of greater than 255 are supported, the flag bit for 255 means "any code unit of
value 255 or above". If such a table was constructed, a pointer to it is value 255 or above". If such a table was constructed, a pointer to it is
returned. Otherwise NULL is returned. The third argument should point to an returned. Otherwise NULL is returned. The third argument should point to a
<b>const uint8_t *</b> variable. <b>const uint8_t *</b> variable.
<pre> <pre>
PCRE2_INFO_FIRSTCODETYPE PCRE2_INFO_FIRSTCODETYPE
@ -2074,7 +2075,7 @@ and up to 0xffffffff when not using UTF-32 mode.
</pre> </pre>
Return the size (in bytes) of the data frames that are used to remember Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by <b>pcre2_match()</b> backtracking positions when the pattern is processed by <b>pcre2_match()</b>
without the use of JIT. The third argument should point to an <b>size_t</b> without the use of JIT. The third argument should point to a <b>size_t</b>
variable. The frame size depends on the number of capturing parentheses in the variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables. pattern. Each additional capturing group adds two PCRE2_SIZE variables.
<pre> <pre>
@ -2094,10 +2095,10 @@ the equivalent hexadecimal or octal escape sequences.
</pre> </pre>
If the pattern set a heap memory limit by including an item of the form If the pattern set a heap memory limit by including an item of the form
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the should point to a uint32_t integer. If no such value has been set, the call to
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
that this limit will only be used during matching if it is less than the limit limit will only be used during matching if it is less than the limit set or
set or defaulted by the caller of the match function. defaulted by the caller of the match function.
<pre> <pre>
PCRE2_INFO_JCHANGED PCRE2_INFO_JCHANGED
</pre> </pre>
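
The rule stated in the hunk above (a limit given in the pattern is used only if it is less than the limit set or defaulted by the caller) can be sketched in C as follows. This is an illustration of the documented behaviour, not PCRE2's internal code; effective_limit and its parameter names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2): choose the limit that applies
   during matching.
     caller_limit   limit set or defaulted by the caller of the match function
     pattern_limit  value from an item such as (*LIMIT_HEAP=nnnn)
     pattern_set    false when the pattern contains no such item */
static uint32_t effective_limit(uint32_t caller_limit, uint32_t pattern_limit,
                                bool pattern_set)
{
  /* A pattern can only lower the limit, never raise it. */
  if (pattern_set && pattern_limit < caller_limit) return pattern_limit;
  return caller_limit;
}
```

With a caller limit of 1000, a pattern value of 500 would take effect, while a pattern value of 5000 would be ignored in favour of the caller's 1000.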
@ -2141,15 +2142,15 @@ in such cases.
</pre> </pre>
If the pattern set a match limit by including an item of the form If the pattern set a match limit by including an item of the form
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the should point to a uint32_t integer. If no such value has been set, the call to
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
that this limit will only be used during matching if it is less than the limit limit will only be used during matching if it is less than the limit set or
set or defaulted by the caller of the match function. defaulted by the caller of the match function.
<pre> <pre>
PCRE2_INFO_MAXLOOKBEHIND PCRE2_INFO_MAXLOOKBEHIND
</pre> </pre>
Return the number of characters (not code units) in the longest lookbehind Return the number of characters (not code units) in the longest lookbehind
assertion in the pattern. The third argument should point to an unsigned 32-bit assertion in the pattern. The third argument should point to a uint32_t
integer. This information is useful when doing multi-segment matching using the integer. This information is useful when doing multi-segment matching using the
partial matching facilities. Note that the simple assertions \b and \B partial matching facilities. Note that the simple assertions \b and \B
require a one-character lookbehind. \A also registers a one-character require a one-character lookbehind. \A also registers a one-character
@ -2417,7 +2418,7 @@ zero, the search for a match starts at the beginning of the subject, and this
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
must point to the start of a character, or to the end of the subject (in UTF-32 must point to the start of a character, or to the end of the subject (in UTF-32
mode, one code unit equals one character, so all offsets are valid). Like the mode, one code unit equals one character, so all offsets are valid). Like the
pattern string, the subject may contain binary zeroes. pattern string, the subject may contain binary zeros.
</P> </P>
<P> <P>
A non-zero starting offset is useful when searching for another match in the A non-zero starting offset is useful when searching for another match in the


@ -227,7 +227,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
<pre> <pre>
--enable-newline-is-nul --enable-newline-is-nul
</pre> </pre>
which causes NUL (binary zero) is set as the default line-ending character. which causes NUL (binary zero) to be set as the default line-ending character.
</P> </P>
<P> <P>
Whatever default line ending convention is selected when PCRE2 is built can be Whatever default line ending convention is selected when PCRE2 is built can be
@ -286,15 +286,15 @@ The <b>pcre2_match()</b> function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking points there stack to record backtracking points. The more nested backtracking points there
are (that is, the deeper the search tree), the more memory is needed. If the are (that is, the deeper the search tree), the more memory is needed. If the
initial vector is not large enough, heap memory is used, up to a certain limit, initial vector is not large enough, heap memory is used, up to a certain limit,
which is specified in kilobytes. The limit can be changed at run time, as which is specified in kibibytes (units of 1024 bytes). The limit can be changed
described in the at run time, as described in the
<a href="pcre2api.html"><b>pcre2api</b></a> <a href="pcre2api.html"><b>pcre2api</b></a>
documentation. The default limit (in effect unlimited) is 20 million. You can documentation. The default limit (in effect unlimited) is 20 million. You can
change this by a setting such as change this by a setting such as
<pre> <pre>
--with-heap-limit=500 --with-heap-limit=500
</pre> </pre>
which limits the amount of heap to 500 kilobytes. This limit applies only to which limits the amount of heap to 500 KiB. This limit applies only to
interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
may also use the heap for internal workspace when processing complicated may also use the heap for internal workspace when processing complicated
patterns. This limit does not apply when JIT (which has its own memory patterns. This limit does not apply when JIT (which has its own memory
@ -542,7 +542,7 @@ generated from the string.
Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b> Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
to be created. This is normally run under valgrind or used when PCRE2 is to be created. This is normally run under valgrind or used when PCRE2 is
compiled with address sanitizing enabled. It calls the fuzzing function and compiled with address sanitizing enabled. It calls the fuzzing function and
outputs information about it is doing. The input strings are specified by outputs information about what it is doing. The input strings are specified by
arguments: if an argument starts with "=" the rest of it is a literal input arguments: if an argument starts with "=" the rest of it is a literal input
string. Otherwise, it is assumed to be a file name, and the contents of the string. Otherwise, it is assumed to be a file name, and the contents of the
file are the test string. file are the test string.


@ -31,7 +31,7 @@ page.
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but 2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
they do not mean what you might think. For example, (?!a){3} does not assert they do not mean what you might think. For example, (?!a){3} does not assert
that the next three characters are not "a". It just asserts that the next that the next three characters are not "a". It just asserts that the next
character is not "a" three times (in principle: PCRE2 optimizes this to run the character is not "a" three times (in principle; PCRE2 optimizes this to run the
assertion just once). Perl allows some repeat quantifiers on other assertions, assertion just once). Perl allows some repeat quantifiers on other assertions,
for example, \b* (but not \b{3}), but these do not seem to have any use. for example, \b* (but not \b{3}), but these do not seem to have any use.
</P> </P>
@ -77,8 +77,8 @@ The \Q...\E sequence is recognized both inside and outside character classes.
</P> </P>
<P> <P>
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code}) 7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
constructions. However, there is support PCRE2's "callout" feature, which constructions. However, PCRE2 does have a "callout" feature, which allows an
allows an external function to be called during pattern matching. See the external function to be called during pattern matching. See the
<a href="pcre2callout.html"><b>pcre2callout</b></a> <a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation for details. documentation for details.
</P> </P>


@ -86,9 +86,10 @@ controlled by parameters that can be set by the <b>--buffer-size</b> and
that is obtained at the start of processing. If an input file contains very that is obtained at the start of processing. If an input file contains very
long lines, a larger buffer may be needed; this is handled by automatically long lines, a larger buffer may be needed; this is handled by automatically
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
default values for these parameters are specified when <b>pcre2grep</b> is default values for these parameters can be set when <b>pcre2grep</b> is
built, with the default defaults being 20K and 1M respectively. An error occurs built; if nothing is specified, the defaults are set to 20K and 1M
if a line is too long and the buffer can no longer be expanded. respectively. An error occurs if a line is too long and the buffer can no
longer be expanded.
</P> </P>
<P> <P>
The block of memory that is actually used is three times the "buffer size", to The block of memory that is actually used is three times the "buffer size", to
@ -500,13 +501,13 @@ short form for this option.
When this option is given, non-compressed input is read and processed line by When this option is given, non-compressed input is read and processed line by
line, and the output is flushed after each write. By default, input is read in line, and the output is flushed after each write. By default, input is read in
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
terminal (which is currently possible only in Unix-like environments). Output terminal (which is currently possible only in Unix-like environments or
to terminal is normally automatically flushed by the operating system. This Windows). Output to terminal is normally automatically flushed by the operating
option can be useful when the input or output is attached to a pipe and you do system. This option can be useful when the input or output is attached to a
not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use pipe and you do not want <b>pcre2grep</b> to buffer up large amounts of data.
will affect performance, and the <b>-M</b> (multiline) option ceases to work. However, its use will affect performance, and the <b>-M</b> (multiline) option
When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is ceases to work. When input is from a compressed .gz or .bz2 file,
ignored. <b>--line-buffered</b> is ignored.
</P> </P>
<P> <P>
<b>--line-offsets</b> <b>--line-offsets</b>
@ -541,11 +542,11 @@ counter that is incremented each time around its main processing loop. If the
value set by <b>--match-limit</b> is reached, an error occurs. value set by <b>--match-limit</b> is reached, an error occurs.
<br> <br>
<br> <br>
The <b>--heap-limit</b> option specifies, as a number of kilobytes, the amount The <b>--heap-limit</b> option specifies, as a number of kibibytes (units of
of heap memory that may be used for matching. Heap memory is needed only if 1024 bytes), the amount of heap memory that may be used for matching. Heap
matching the pattern requires a significant number of nested backtracking memory is needed only if matching the pattern requires a significant number of
points to be remembered. This parameter can be set to zero to forbid the use of nested backtracking points to be remembered. This parameter can be set to zero
heap memory altogether. to forbid the use of heap memory altogether.
<br> <br>
<br> <br>
The <b>--depth-limit</b> option limits the depth of nested backtracking points, The <b>--depth-limit</b> option limits the depth of nested backtracking points,
@ -556,9 +557,9 @@ limit acts varies from pattern to pattern. This limit is of use only if it is
set smaller than <b>--match-limit</b>. set smaller than <b>--match-limit</b>.
<br> <br>
<br> <br>
There are no short forms for these options. The default settings are specified There are no short forms for these options. The default limits can be set
when the PCRE2 library is compiled, with the default defaults being very large when the PCRE2 library is compiled; if they are not specified, the defaults
and so effectively unlimited. are very large and so effectively unlimited.
</P> </P>
<P> <P>
\fB--max-buffer-size=<i>number</i> \fB--max-buffer-size=<i>number</i>


@ -54,9 +54,9 @@ There is no limit to the number of parenthesized subpatterns, but there can be
no more than 65535 capturing subpatterns. There is, however, a limit to the no more than 65535 capturing subpatterns. There is, however, a limit to the
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
order to limit the amount of system stack used at compile time. The default order to limit the amount of system stack used at compile time. The default
limit can be specified when PCRE2 is built; the default default is 250. An limit can be specified when PCRE2 is built; if not, the default is set to 250.
application can change this limit by calling pcre2_set_parens_nest_limit() to An application can change this limit by calling pcre2_set_parens_nest_limit()
set the limit in a compile context. to set the limit in a compile context.
</P> </P>
<P> <P>
The maximum length of name for a named subpattern is 32 code units, and the The maximum length of name for a named subpattern is 32 code units, and the


@ -196,7 +196,7 @@ be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
for it to have any effect. In other words, the pattern writer can lower the for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used. The heap limit is setting of one of these limits, the lower value is used. The heap limit is
specified in kilobytes. specified in kibibytes (units of 1024 bytes).
</P> </P>
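As a rough illustration of the rule described above, the following sketch combines a caller's heap limit with an optional pattern-supplied limit and converts the result from kibibytes to bytes. The function names are hypothetical helpers invented for this example and are not part of the PCRE2 API; the sketch only assumes what the text states, namely that a pattern can lower the limit but not raise it, and that one unit is 1024 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: convert a heap limit given in kibibytes to bytes.
   A 64-bit result is used because the default of 20000000 KiB is
   20480000000 bytes, which does not fit in 32 bits. */
uint64_t heap_limit_bytes(uint32_t limit_kib)
{
    return (uint64_t)limit_kib * 1024u;
}

/* Hypothetical helper: a limit given in the pattern takes effect only if
   it is lower than the limit set (or defaulted) by the caller. */
uint32_t effective_heap_limit_kib(uint32_t caller_kib,
                                  int pattern_has_limit,
                                  uint32_t pattern_kib)
{
    if (pattern_has_limit && pattern_kib < caller_kib)
        return pattern_kib;
    return caller_kib;
}
```

For instance, with a caller limit of 20000000 and a pattern starting with (*LIMIT_HEAP=500), the effective limit under this sketch is 500 KiB, that is, 512000 bytes.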
<P> <P>
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
@ -549,7 +549,7 @@ Absolute and relative back references
<P> <P>
The sequence \g followed by a signed or unsigned number, optionally enclosed The sequence \g followed by a signed or unsigned number, optionally enclosed
in braces, is an absolute or relative backreference. A named backreference in braces, is an absolute or relative backreference. A named backreference
can be coded as \g{name}. Back references are discussed can be coded as \g{name}. Backreferences are discussed
<a href="#backreferences">later,</a> <a href="#backreferences">later,</a>
following the discussion of following the discussion of
<a href="#subpattern">parenthesized subpatterns.</a> <a href="#subpattern">parenthesized subpatterns.</a>
@ -1037,7 +1037,7 @@ joiner" characters. Characters with the "mark" property always have the
modifier). Extending characters are allowed before the modifier. modifier). Extending characters are allowed before the modifier.
</P> </P>
<P> <P>
7. Do not break within emoji zwj sequences (zero-width jointer followed by 7. Do not break within emoji zwj sequences (zero-width joiner followed by
"glue after ZWJ" or "base glue after ZWJ"). "glue after ZWJ" or "base glue after ZWJ").
</P> </P>
<P> <P>
@ -2210,8 +2210,8 @@ after the reference.
</P> </P>
<P> <P>
There may be more than one backreference to the same subpattern. If a There may be more than one backreference to the same subpattern. If a
subpattern has not actually been used in a particular match, any back subpattern has not actually been used in a particular match, any backreferences
references to it always fail by default. For example, the pattern to it always fail by default. For example, the pattern
<pre> <pre>
(a|(bc))\2 (a|(bc))\2
</pre> </pre>
@ -2247,7 +2247,7 @@ done using alternation, as in the example above, or by a quantifier with a
minimum of zero. minimum of zero.
</P> </P>
<P> <P>
Back references of this type cause the group that they reference to be treated Backreferences of this type cause the group that they reference to be treated
as an as an
<a href="#atomicgroup">atomic group.</a> <a href="#atomicgroup">atomic group.</a>
Once the whole group has been matched, a subsequent matching failure cannot Once the whole group has been matched, a subsequent matching failure cannot


@ -139,7 +139,7 @@ because it disables the use of back references.
If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
(which has the type const char *) must be set to point to the character beyond (which has the type const char *) must be set to point to the character beyond
the end of the pattern before calling <b>regcomp()</b>. The pattern itself may the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
now contain binary zeroes, which are treated as data characters. Without now contain binary zeros, which are treated as data characters. Without
REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
ignored. This is a GNU extension to the POSIX standard and should be used with ignored. This is a GNU extension to the POSIX standard and should be used with
caution in software intended to be portable to other systems. caution in software intended to be portable to other systems.
@ -248,10 +248,10 @@ function.
<pre> <pre>
REG_STARTEND REG_STARTEND
</pre> </pre>
When this option is set, the subject string is starts at <i>string</i> + When this option is set, the subject string starts at <i>string</i> +
<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which <i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
should point to the first character beyond the string. There may be binary should point to the first character beyond the string. There may be binary
zeroes within the subject string, and indeed, using REG_STARTEND is the only zeros within the subject string, and indeed, using REG_STARTEND is the only
way to pass a subject string that contains a binary zero. way to pass a subject string that contains a binary zero.
</P> </P>
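A minimal sketch of how REG_STARTEND might be used to search a subject that contains a binary zero. The helper name match_in_span is hypothetical. The sketch includes the system regex.h and assumes that header provides REG_STARTEND (it is an extension, so the code is guarded); to use PCRE2's wrapper instead, one would include pcre2posix.h and link with the PCRE2 POSIX library.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: search for pattern within the bytes
   subject[start] .. subject[end-1], which may include binary zeros.
   Returns 1 on a match, 0 on no match, and -1 on a compile error or
   when REG_STARTEND is not available. */
int match_in_span(const char *pattern, const char *subject,
                  size_t start, size_t end)
{
#ifdef REG_STARTEND
    regex_t re;
    regmatch_t pmatch[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* The span is passed in through pmatch[0] before the call. */
    pmatch[0].rm_so = (regoff_t)start;
    pmatch[0].rm_eo = (regoff_t)end;

    rc = regexec(&re, subject, 1, pmatch, REG_STARTEND);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
#else
    (void)pattern; (void)subject; (void)start; (void)end;
    return -1;
#endif
}
```

Without REG_STARTEND, a call on the three-byte subject "a\0b" would stop at the binary zero and never examine the final "b".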
<P> <P>


@ -442,7 +442,7 @@ of the newline or \R options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number. appear. For the first three, d is a decimal number.
<pre> <pre>
(*LIMIT_DEPTH=d) set the backtracking limit to d (*LIMIT_DEPTH=d) set the backtracking limit to d
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes (*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d) set the match limit to d (*LIMIT_MATCH=d) set the match limit to d
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching (*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching


@ -129,7 +129,7 @@ to occur).
UTF-8 (in its original definition) is not capable of encoding values greater UTF-8 (in its original definition) is not capable of encoding values greater
than 0x7fffffff, but such values can be handled by the 32-bit library. When than 0x7fffffff, but such values can be handled by the 32-bit library. When
testing this library in non-UTF mode with <b>utf8_input</b> set, if any testing this library in non-UTF mode with <b>utf8_input</b> set, if any
character is preceded by the byte 0xff (which is an illegal byte in UTF-8) character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
0x80000000 is added to the character's value. This is the only way of passing 0x80000000 is added to the character's value. This is the only way of passing
such code points in a pattern string. For subject strings, using an escape such code points in a pattern string. For subject strings, using an escape
sequence is preferable. sequence is preferable.
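The 0xff convention described above can be sketched as follows. This is not the pcre2test source: both function names are hypothetical, and the decoder is a simplified one for the original UTF-8 definition (up to six bytes) that does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one character in the original (up to 6-byte) UTF-8 scheme.
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t decode_utf8_ext(const unsigned char *p, size_t len,
                              uint32_t *out)
{
    unsigned char c;
    size_t extra, i;
    uint32_t value;

    if (len == 0) return 0;
    c = p[0];
    if (c < 0x80)      { extra = 0; value = c; }
    else if (c < 0xc0) return 0;            /* stray continuation byte */
    else if (c < 0xe0) { extra = 1; value = c & 0x1f; }
    else if (c < 0xf0) { extra = 2; value = c & 0x0f; }
    else if (c < 0xf8) { extra = 3; value = c & 0x07; }
    else if (c < 0xfc) { extra = 4; value = c & 0x03; }
    else if (c < 0xfe) { extra = 5; value = c & 0x01; }
    else return 0;                          /* 0xfe and 0xff are invalid */

    if (len < extra + 1) return 0;
    for (i = 1; i <= extra; i++) {
        if ((p[i] & 0xc0) != 0x80) return 0;
        value = (value << 6) | (uint32_t)(p[i] & 0x3f);
    }
    *out = value;
    return extra + 1;
}

/* Read one character; a leading 0xff byte adds 0x80000000 to its value.
   Returns the total number of bytes consumed, or 0 on malformed input. */
size_t read_test_char(const unsigned char *p, size_t len, uint32_t *out)
{
    int high = (len > 0 && p[0] == 0xff);
    size_t skip = high ? 1 : 0;
    size_t used = decode_utf8_ext(p + skip, len - skip, out);

    if (used == 0) return 0;
    if (high) *out += 0x80000000u;
    return used + skip;
}
```

Under this sketch, the two bytes 0xff 0x41 yield the value 0x80000041, while 0x41 alone yields 0x41.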
@ -264,7 +264,7 @@ Do not output the version number of <b>pcre2test</b> at the start of execution.
<P> <P>
<b>-S</b> <i>size</i> <b>-S</b> <i>size</i>
On Unix-like systems, set the size of the run-time stack to <i>size</i> On Unix-like systems, set the size of the run-time stack to <i>size</i>
megabytes. mebibytes (units of 1024*1024 bytes).
</P> </P>
<P> <P>
<b>-subject</b> <i>modifier-list</i> <b>-subject</b> <i>modifier-list</i>
@ -679,8 +679,8 @@ Newline and \R handling
<P> <P>
The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode", set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
\R matches any Unicode newline sequence. The default is specified when PCRE2 \R matches any Unicode newline sequence. The default can be specified when
is built, with the default default being Unicode. PCRE2 is built; if it is not, the default is set to Unicode.
</P> </P>
<P> <P>
The <b>newline</b> modifier specifies which characters are to be interpreted as The <b>newline</b> modifier specifies which characters are to be interpreted as
@ -1418,11 +1418,11 @@ Setting the JIT stack size
<P> <P>
The <b>jitstack</b> modifier provides a way of setting the maximum stack size The <b>jitstack</b> modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if JIT that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Setting optimization is not being used. The value is a number of kibibytes (units of
zero reverts to the default of 32K. Providing a stack that is larger than the 1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
default is necessary only for very complicated patterns. If <b>jitstack</b> is that is larger than the default is necessary only for very complicated
set non-zero on a subject line it overrides any value that was set on the patterns. If <b>jitstack</b> is set non-zero on a subject line it overrides any
pattern. value that was set on the pattern.
</P> </P>
<br><b> <br><b>
Setting heap, match, and depth limits Setting heap, match, and depth limits
@ -1468,10 +1468,10 @@ and non-recursive, to the internal matching function, thus controlling the
overall amount of computing resource that is used. overall amount of computing resource that is used.
</P> </P>
<P> <P>
For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes) For both kinds of matching, the <i>heap_limit</i> number, which is in kibibytes
limits the amount of heap memory used for matching. A value of zero disables (units of 1024 bytes), limits the amount of heap memory used for matching. A
the use of any heap memory; many simple pattern matches can be done without value of zero disables the use of any heap memory; many simple pattern matches
using the heap, so this is not an unreasonable setting. can be done without using the heap, so zero is not an unreasonable setting.
</P> </P>
<br><b> <br><b>
Showing MARK names Showing MARK names


@ -619,11 +619,12 @@ NEWLINES
Each of the first three conventions is used by at least one operating Each of the first three conventions is used by at least one operating
system as its standard newline sequence. When PCRE2 is built, a default system as its standard newline sequence. When PCRE2 is built, a default
can be specified. The default default is LF, which is the Unix stan- can be specified. If it is not, the default is set to LF, which is the
dard. However, the newline convention can be changed by an application Unix standard. However, the newline convention can be changed by an
when calling pcre2_compile(), or it can be specified by special text at application when calling pcre2_compile(), or it can be specified by
the start of the pattern itself; this overrides any other settings. See special text at the start of the pattern itself; this overrides any
the pcre2pattern page for details of the special character sequences. other settings. See the pcre2pattern page for details of the special
character sequences.
In the PCRE2 documentation the word "newline" is used to mean "the In the PCRE2 documentation the word "newline" is used to mean "the
character or pair of characters that indicate a line break". The choice character or pair of characters that indicate a line break". The choice
@ -957,17 +958,17 @@ PCRE2 CONTEXTS
int pcre2_set_heap_limit(pcre2_match_context *mcontext, int pcre2_set_heap_limit(pcre2_match_context *mcontext,
uint32_t value); uint32_t value);
The heap_limit parameter specifies, in units of kilobytes, the maximum The heap_limit parameter specifies, in units of kibibytes (1024 bytes),
amount of heap memory that pcre2_match() may use to hold backtracking the maximum amount of heap memory that pcre2_match() may use to hold
information when running an interpretive match. This limit also applies backtracking information when running an interpretive match. This limit
to pcre2_dfa_match(), which may use the heap when processing patterns also applies to pcre2_dfa_match(), which may use the heap when process-
with a lot of nested pattern recursion or lookarounds or atomic groups. ing patterns with a lot of nested pattern recursion or lookarounds or
This limit does not apply to matching with the JIT optimization, which atomic groups. This limit does not apply to matching with the JIT opti-
has its own memory control arrangements (see the pcre2jit documentation mization, which has its own memory control arrangements (see the
for more details). If the limit is reached, the negative error code pcre2jit documentation for more details). If the limit is reached, the
PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 negative error code PCRE2_ERROR_HEAPLIMIT is returned. The default
is built; the default default is very large and is essentially "unlim- limit can be set when PCRE2 is built; if it is not, the default is set
ited". very large and is essentially "unlimited".
A value for the heap limit may also be supplied by an item at the start A value for the heap limit may also be supplied by an item at the start
of a pattern of the form of a pattern of the form
@ -1042,7 +1043,7 @@ PCRE2 CONTEXTS
using JIT compiled code. However, it is supported by pcre2_dfa_match(), using JIT compiled code. However, it is supported by pcre2_dfa_match(),
which uses it to limit the depth of nested internal recursive function which uses it to limit the depth of nested internal recursive function
calls that implement atomic groups, lookaround assertions, and pattern calls that implement atomic groups, lookaround assertions, and pattern
recursions. This limits, indirectly, the amount of system stack this is recursions. This limits, indirectly, the amount of system stack that is
used. It was more useful in versions before 10.32, when stack memory used. It was more useful in versions before 10.32, when stack memory
was used for local workspace vectors for recursive function calls. From was used for local workspace vectors for recursive function calls. From
version 10.32, only local variables are allocated on the stack and as version 10.32, only local variables are allocated on the stack and as
@ -1058,10 +1059,11 @@ PCRE2 CONTEXTS
directly by calling pcre2_set_heap_limit(). directly by calling pcre2_set_heap_limit().
The default value for the depth limit can be set when PCRE2 is built; The default value for the depth limit can be set when PCRE2 is built;
the default default is the same value as the default for the match if it is not, the default is set to the same value as the default for
limit. If the limit is exceeded, pcre2_match() or pcre2_dfa_match() the match limit. If the limit is exceeded, pcre2_match() or
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be pcre2_dfa_match() returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth
supplied by an item at the start of a pattern of the form limit may also be supplied by an item at the start of a pattern of the
form
(*LIMIT_DEPTH=ddd) (*LIMIT_DEPTH=ddd)
@ -1117,7 +1119,7 @@ CHECKING BUILD-TIME OPTIONS
PCRE2_CONFIG_HEAPLIMIT PCRE2_CONFIG_HEAPLIMIT
The output is a uint32_t integer that gives, in kilobytes, the default The output is a uint32_t integer that gives, in kibibytes, the default
limit for the amount of heap memory used by pcre2_match() or limit for the amount of heap memory used by pcre2_match() or
pcre2_dfa_match(). Further details are given with pcre2_dfa_match(). Further details are given with
pcre2_set_heap_limit() above. pcre2_set_heap_limit() above.
@ -1413,7 +1415,7 @@ COMPILING A PATTERN
it can be changed within a pattern by a (?i) option setting. If it can be changed within a pattern by a (?i) option setting. If
PCRE2_UTF is set, Unicode properties are used for all characters with PCRE2_UTF is set, Unicode properties are used for all characters with
more than one other case, and for all characters whose code points are more than one other case, and for all characters whose code points are
greater than U+007f. For lower valued characters with only one other greater than U+007F. For lower valued characters with only one other
case, a lookup table is used for speed. When PCRE2_UTF is not set, a case, a lookup table is used for speed. When PCRE2_UTF is not set, a
lookup table is used for all code points less than 256, and higher code lookup table is used for all code points less than 256, and higher code
points (available only in 16-bit or 32-bit mode) are treated as not points (available only in 16-bit or 32-bit mode) are treated as not
@ -1983,18 +1985,17 @@ INFORMATION ABOUT A COMPILED PATTERN
Return the number of the highest backreference in the pattern. The Return the number of the highest backreference in the pattern. The
third argument should point to a uint32_t variable. Named subpatterns third argument should point to a uint32_t variable. Named subpatterns
acquire numbers as well as names, and these count towards the highest acquire numbers as well as names, and these count towards the highest
back reference. Back references such as \4 or \g{12} match the cap- backreference. Backreferences such as \4 or \g{12} match the captured
tured characters of the given group, but in addition, the check that a characters of the given group, but in addition, the check that a cap-
capturing group is set in a conditional subpattern such as (?(3)a|b) is turing group is set in a conditional subpattern such as (?(3)a|b) is
also a back reference. Zero is returned if there are no back refer- also a backreference. Zero is returned if there are no backreferences.
ences.
PCRE2_INFO_BSR PCRE2_INFO_BSR
The output is a uint32_t whose value indicates what character sequences The output is a uint32_t integer whose value indicates what character
the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that sequences the \R escape sequence matches. A value of PCRE2_BSR_UNICODE
\R matches any Unicode line ending sequence; a value of PCRE2_BSR_ANY- means that \R matches any Unicode line ending sequence; a value of
CRLF means that \R matches only CR, LF, or CRLF. PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF.
PCRE2_INFO_CAPTURECOUNT PCRE2_INFO_CAPTURECOUNT
@ -2006,8 +2007,8 @@ INFORMATION ABOUT A COMPILED PATTERN
If the pattern set a backtracking depth limit by including an item of If the pattern set a backtracking depth limit by including an item of
the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The
third argument should point to an unsigned 32-bit integer. If no such third argument should point to a uint32_t integer. If no such value has
value has been set, the call to pcre2_pattern_info() returns the error been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET. Note that this limit will only be used during match- PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
ing if it is less than the limit set or defaulted by the caller of the ing if it is less than the limit set or defaulted by the caller of the
match function. match function.
@ -2021,7 +2022,7 @@ INFORMATION ABOUT A COMPILED PATTERN
code unit values greater than 255 are supported, the flag bit for 255 code unit values greater than 255 are supported, the flag bit for 255
means "any code unit of value 255 or above". If such a table was con- means "any code unit of value 255 or above". If such a table was con-
structed, a pointer to it is returned. Otherwise NULL is returned. The structed, a pointer to it is returned. Otherwise NULL is returned. The
third argument should point to an const uint8_t * variable. third argument should point to a const uint8_t * variable.
PCRE2_INFO_FIRSTCODETYPE PCRE2_INFO_FIRSTCODETYPE
@ -2048,7 +2049,7 @@ INFORMATION ABOUT A COMPILED PATTERN
Return the size (in bytes) of the data frames that are used to remember Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by pcre2_match() backtracking positions when the pattern is processed by pcre2_match()
without the use of JIT. The third argument should point to an size_t without the use of JIT. The third argument should point to a size_t
variable. The frame size depends on the number of capturing parentheses variable. The frame size depends on the number of capturing parentheses
in the pattern. Each additional capturing group adds two PCRE2_SIZE in the pattern. Each additional capturing group adds two PCRE2_SIZE
variables. variables.
@ -2070,11 +2071,10 @@ INFORMATION ABOUT A COMPILED PATTERN
If the pattern set a heap memory limit by including an item of the form If the pattern set a heap memory limit by including an item of the form
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argu- (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argu-
ment should point to an unsigned 32-bit integer. If no such value has ment should point to a uint32_t integer. If no such value has been set,
been set, the call to pcre2_pattern_info() returns the error the call to pcre2_pattern_info() returns the error PCRE2_ERROR_UNSET.
PCRE2_ERROR_UNSET. Note that this limit will only be used during match- Note that this limit will only be used during matching if it is less
ing if it is less than the limit set or defaulted by the caller of the than the limit set or defaulted by the caller of the match function.
match function.
PCRE2_INFO_JCHANGED PCRE2_INFO_JCHANGED
@ -2120,8 +2120,8 @@ INFORMATION ABOUT A COMPILED PATTERN
If the pattern set a match limit by including an item of the form If the pattern set a match limit by including an item of the form
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value argument should point to a uint32_t integer. If no such value has been
has been set, the call to pcre2_pattern_info() returns the error set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET. Note that this limit will only be used during match- PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
ing if it is less than the limit set or defaulted by the caller of the ing if it is less than the limit set or defaulted by the caller of the
match function. match function.
@ -2129,15 +2129,15 @@ INFORMATION ABOUT A COMPILED PATTERN
PCRE2_INFO_MAXLOOKBEHIND PCRE2_INFO_MAXLOOKBEHIND
Return the number of characters (not code units) in the longest lookbe- Return the number of characters (not code units) in the longest lookbe-
hind assertion in the pattern. The third argument should point to an hind assertion in the pattern. The third argument should point to a
unsigned 32-bit integer. This information is useful when doing multi- uint32_t integer. This information is useful when doing multi-segment
segment matching using the partial matching facilities. Note that the matching using the partial matching facilities. Note that the simple
simple assertions \b and \B require a one-character lookbehind. \A also assertions \b and \B require a one-character lookbehind. \A also regis-
registers a one-character lookbehind, though it does not actually ters a one-character lookbehind, though it does not actually inspect
inspect the previous character. This is to ensure that at least one the previous character. This is to ensure that at least one character
character from the old segment is retained when a new segment is pro- from the old segment is retained when a new segment is processed. Oth-
cessed. Otherwise, if there are no lookbehinds in the pattern, \A might erwise, if there are no lookbehinds in the pattern, \A might match
match incorrectly at the start of a second or subsequent segment. incorrectly at the start of a second or subsequent segment.
PCRE2_INFO_MINLENGTH PCRE2_INFO_MINLENGTH
@ -2378,7 +2378,7 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
set must point to the start of a character, or to the end of the sub- set must point to the start of a character, or to the end of the sub-
ject (in UTF-32 mode, one code unit equals one character, so all off- ject (in UTF-32 mode, one code unit equals one character, so all off-
sets are valid). Like the pattern string, the subject may contain sets are valid). Like the pattern string, the subject may contain
binary zeroes. binary zeros.
A non-zero starting offset is useful when searching for another match A non-zero starting offset is useful when searching for another match
in the same subject by calling pcre2_match() again after a previous in the same subject by calling pcre2_match() again after a previous
@ -3445,8 +3445,8 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
PCRE2_ERROR_DFA_UCOND PCRE2_ERROR_DFA_UCOND
This return is given if pcre2_dfa_match() encounters a condition item This return is given if pcre2_dfa_match() encounters a condition item
that uses a back reference for the condition, or a test for recursion that uses a backreference for the condition, or a test for recursion in
in a specific group. These are not supported. a specific group. These are not supported.
PCRE2_ERROR_DFA_WSSIZE PCRE2_ERROR_DFA_WSSIZE
@ -3683,8 +3683,8 @@ NEWLINE RECOGNITION
--enable-newline-is-nul --enable-newline-is-nul
which causes NUL (binary zero) is set as the default line-ending char- which causes NUL (binary zero) to be set as the default line-ending
acter. character.
Whatever default line ending convention is selected when PCRE2 is built Whatever default line ending convention is selected when PCRE2 is built
can be overridden by applications that use the library. At build time can be overridden by applications that use the library. At build time
@ -3745,18 +3745,18 @@ LIMITING PCRE2 RESOURCE USAGE
stack to record backtracking points. The more nested backtracking stack to record backtracking points. The more nested backtracking
points there are (that is, the deeper the search tree), the more memory points there are (that is, the deeper the search tree), the more memory
is needed. If the initial vector is not large enough, heap memory is is needed. If the initial vector is not large enough, heap memory is
used, up to a certain limit, which is specified in kilobytes. The limit used, up to a certain limit, which is specified in kibibytes (units of
can be changed at run time, as described in the pcre2api documentation. 1024 bytes). The limit can be changed at run time, as described in the
The default limit (in effect unlimited) is 20 million. You can change pcre2api documentation. The default limit (in effect unlimited) is 20
this by a setting such as million. You can change this by a setting such as
--with-heap-limit=500 --with-heap-limit=500
which limits the amount of heap to 500 kilobytes. This limit applies which limits the amount of heap to 500 KiB. This limit applies only to
only to interpretive matching in pcre2_match() and pcre2_dfa_match(), interpretive matching in pcre2_match() and pcre2_dfa_match(), which may
which may also use the heap for internal workspace when processing com- also use the heap for internal workspace when processing complicated
plicated patterns. This limit does not apply when JIT (which has its patterns. This limit does not apply when JIT (which has its own memory
own memory arrangements) is used. arrangements) is used.
You can also explicitly limit the depth of nested backtracking in the You can also explicitly limit the depth of nested backtracking in the
pcre2_match() interpreter. This limit defaults to the value that is set pcre2_match() interpreter. This limit defaults to the value that is set
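
As a quick check on the units in the hunk above, the following Python sketch converts a heap limit given in kibibytes to bytes. The helper name heap_limit_bytes is hypothetical and is not part of PCRE2; the values 500 and 20 million come from the text.

```python
KIB = 1024  # one kibibyte, in bytes


def heap_limit_bytes(limit_kib: int) -> int:
    """Convert a heap limit expressed in kibibytes to bytes."""
    return limit_kib * KIB


# --with-heap-limit=500 caps heap use at 500 KiB.
print(heap_limit_bytes(500))  # 512000

# The default of 20 million KiB is roughly 19 GiB, which is why the
# text calls it "in effect unlimited".
print(heap_limit_bytes(20_000_000) / 1024**3)
```
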
@ -4005,10 +4005,10 @@ SUPPORT FOR FUZZERS
Setting --enable-fuzz-support also causes a binary called pcre2fuz- Setting --enable-fuzz-support also causes a binary called pcre2fuz-
zcheck to be created. This is normally run under valgrind or used when zcheck to be created. This is normally run under valgrind or used when
PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
function and outputs information about it is doing. The input strings function and outputs information about what it is doing. The input
are specified by arguments: if an argument starts with "=" the rest of strings are specified by arguments: if an argument starts with "=" the
it is a literal input string. Otherwise, it is assumed to be a file rest of it is a literal input string. Otherwise, it is assumed to be a
name, and the contents of the file are the test string. file name, and the contents of the file are the test string.
OBSOLETE OPTION OBSOLETE OPTION
@ -4167,9 +4167,9 @@ MISSING CALLOUTS
all branches are anchorable. all branches are anchorable.
This optimization is disabled, however, if .* is in an atomic group or This optimization is disabled, however, if .* is in an atomic group or
if there is a back reference to the capturing group in which it if there is a backreference to the capturing group in which it appears.
appears. It is also disabled if the pattern contains (*PRUNE) or It is also disabled if the pattern contains (*PRUNE) or (*SKIP). How-
(*SKIP). However, the presence of callouts does not affect it. ever, the presence of callouts does not affect it.
For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT
and applied to the string "aa", the pcre2test output is: and applied to the string "aa", the pcre2test output is:
@ -4489,7 +4489,7 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized asser- 2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized asser-
tions, but they do not mean what you might think. For example, (?!a){3} tions, but they do not mean what you might think. For example, (?!a){3}
does not assert that the next three characters are not "a". It just does not assert that the next three characters are not "a". It just
asserts that the next character is not "a" three times (in principle: asserts that the next character is not "a" three times (in principle;
PCRE2 optimizes this to run the assertion just once). Perl allows some PCRE2 optimizes this to run the assertion just once). Perl allows some
repeat quantifiers on other assertions, for example, \b* (but not repeat quantifiers on other assertions, for example, \b* (but not
\b{3}), but these do not seem to have any use. \b{3}), but these do not seem to have any use.
@ -4534,9 +4534,9 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
classes. classes.
7. Fairly obviously, PCRE2 does not support the (?{code}) and 7. Fairly obviously, PCRE2 does not support the (?{code}) and
(??{code}) constructions. However, there is support PCRE2's "callout" (??{code}) constructions. However, PCRE2 does have a "callout" feature,
feature, which allows an external function to be called during pattern which allows an external function to be called during pattern matching.
matching. See the pcre2callout documentation for details. See the pcre2callout documentation for details.
8. Subroutine calls (whether recursive or not) were treated as atomic 8. Subroutine calls (whether recursive or not) were treated as atomic
groups up to PCRE2 release 10.23, but from release 10.30 this changed, groups up to PCRE2 release 10.23, but from release 10.30 this changed,
@ -4604,9 +4604,9 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
different length of string. Perl requires them all to have the same different length of string. Perl requires them all to have the same
length. length.
(b) From PCRE2 10.23, back references to groups of fixed length are (b) From PCRE2 10.23, backreferences to groups of fixed length are sup-
supported in lookbehinds, provided that there is no possibility of ref- ported in lookbehinds, provided that there is no possibility of refer-
erencing a non-unique number or name. Perl does not support backrefer- encing a non-unique number or name. Perl does not support backrefer-
ences in lookbehinds. ences in lookbehinds.
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the (c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
@ -5103,9 +5103,9 @@ SIZE AND OTHER LIMITATIONS
limit to the depth of nesting of parenthesized subpatterns of all limit to the depth of nesting of parenthesized subpatterns of all
kinds. This is imposed in order to limit the amount of system stack kinds. This is imposed in order to limit the amount of system stack
used at compile time. The default limit can be specified when PCRE2 is used at compile time. The default limit can be specified when PCRE2 is
built; the default default is 250. An application can change this limit built; if not, the default is set to 250. An application can change
by calling pcre2_set_parens_nest_limit() to set the limit in a compile this limit by calling pcre2_set_parens_nest_limit() to set the limit in
context. a compile context.
The maximum length of name for a named subpattern is 32 code units, and The maximum length of name for a named subpattern is 32 code units, and
the maximum number of named subpatterns is 10000. the maximum number of named subpatterns is 10000.
@ -5929,7 +5929,8 @@ SPECIAL START-OF-PATTERN ITEMS
pcre2_match() for it to have any effect. In other words, the pattern pcre2_match() for it to have any effect. In other words, the pattern
writer can lower the limits set by the programmer, but not raise them. writer can lower the limits set by the programmer, but not raise them.
If there is more than one setting of one of these limits, the lower If there is more than one setting of one of these limits, the lower
value is used. The heap limit is specified in kilobytes. value is used. The heap limit is specified in kibibytes (units of 1024
bytes).
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This
name is still recognized for backwards compatibility. name is still recognized for backwards compatibility.
@ -6230,8 +6231,8 @@ BACKSLASH
All UTF modes no greater than 0x10ffff and a valid code point All UTF modes no greater than 0x10ffff and a valid code point
Invalid Unicode code points are all those in the range 0xd800 to 0xdfff Invalid Unicode code points are all those in the range 0xd800 to 0xdfff
(the so-called "surrogate" codepoints). The check for these can be dis- (the so-called "surrogate" code points). The check for these can be
abled by the caller of pcre2_compile() by setting the option disabled by the caller of pcre2_compile() by setting the option
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
Escape sequences in character classes Escape sequences in character classes
@ -6257,7 +6258,7 @@ BACKSLASH
The sequence \g followed by a signed or unsigned number, optionally The sequence \g followed by a signed or unsigned number, optionally
enclosed in braces, is an absolute or relative backreference. A named enclosed in braces, is an absolute or relative backreference. A named
back reference can be coded as \g{name}. Back references are discussed backreference can be coded as \g{name}. Backreferences are discussed
later, following the discussion of parenthesized subpatterns. later, following the discussion of parenthesized subpatterns.
Absolute and relative subroutine calls Absolute and relative subroutine calls
@ -6266,8 +6267,8 @@ BACKSLASH
name or a number enclosed either in angle brackets or single quotes, is name or a number enclosed either in angle brackets or single quotes, is
an alternative syntax for referencing a subpattern as a "subroutine". an alternative syntax for referencing a subpattern as a "subroutine".
Details are discussed later. Note that \g{...} (Perl syntax) and Details are discussed later. Note that \g{...} (Perl syntax) and
\g<...> (Oniguruma syntax) are not synonymous. The former is a back \g<...> (Oniguruma syntax) are not synonymous. The former is a backref-
reference; the latter is a subroutine call. erence; the latter is a subroutine call.
Generic character types Generic character types
@ -6593,7 +6594,7 @@ BACKSLASH
lowed by a modifier). Extending characters are allowed before the modi- lowed by a modifier). Extending characters are allowed before the modi-
fier. fier.
7. Do not break within emoji zwj sequences (zero-width jointer followed 7. Do not break within emoji zwj sequences (zero-width joiner followed
by "glue after ZWJ" or "base glue after ZWJ"). by "glue after ZWJ" or "base glue after ZWJ").
8. Do not break within emoji flag sequences. That is, do not break 8. Do not break within emoji flag sequences. That is, do not break
@ -7285,7 +7286,7 @@ NAMED SUBPATTERNS
In PCRE2, a subpattern can be named in one of three ways: (?<name>...) In PCRE2, a subpattern can be named in one of three ways: (?<name>...)
or (?'name'...) as in Perl, or (?P<name>...) as in Python. References or (?'name'...) as in Perl, or (?P<name>...) as in Python. References
to capturing parentheses from other parts of the pattern, such as back to capturing parentheses from other parts of the pattern, such as back-
references, recursion, and conditions, can be made by name as well as references, recursion, and conditions, can be made by name as well as
by number. by number.
@ -7321,8 +7322,8 @@ NAMED SUBPATTERNS
that name that matched. This saves searching to find which numbered that name that matched. This saves searching to find which numbered
subpattern it was. subpattern it was.
If you make a back reference to a non-unique named subpattern from If you make a backreference to a non-unique named subpattern from else-
elsewhere in the pattern, the subpatterns to which the name refers are where in the pattern, the subpatterns to which the name refers are
checked in the order in which they appear in the overall pattern. The checked in the order in which they appear in the overall pattern. The
first one that is set is used for the reference. For example, this pat- first one that is set is used for the reference. For example, this pat-
tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo": tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo":
@ -7481,9 +7482,9 @@ REPETITION
mization, or alternatively, using ^ to indicate anchoring explicitly. mization, or alternatively, using ^ to indicate anchoring explicitly.
However, there are some cases where the optimization cannot be used. However, there are some cases where the optimization cannot be used.
When .* is inside capturing parentheses that are the subject of a back When .* is inside capturing parentheses that are the subject of a
reference elsewhere in the pattern, a match at the start may fail where backreference elsewhere in the pattern, a match at the start may fail
a later one succeeds. Consider, for example: where a later one succeeds. Consider, for example:
(.*)abc\1 (.*)abc\1
@ -7631,7 +7632,7 @@ BACK REFERENCES
it is always taken as a backreference, and causes an error only if it is always taken as a backreference, and causes an error only if
there are not that many capturing left parentheses in the entire pat- there are not that many capturing left parentheses in the entire pat-
tern. In other words, the parentheses that are referenced need not be tern. In other words, the parentheses that are referenced need not be
to the left of the reference for numbers less than 8. A "forward back to the left of the reference for numbers less than 8. A "forward back-
reference" of this type can make sense when a repetition is involved reference" of this type can make sense when a repetition is involved
and the subpattern to the right has participated in an earlier itera- and the subpattern to the right has participated in an earlier itera-
tion. tion.
@ -7671,10 +7672,10 @@ BACK REFERENCES
This kind of forward reference can be useful in patterns that repeat. This kind of forward reference can be useful in patterns that repeat.
Perl does not support the use of + in this way. Perl does not support the use of + in this way.
A back reference matches whatever actually matched the capturing sub- A backreference matches whatever actually matched the capturing subpat-
pattern in the current subject string, rather than anything matching tern in the current subject string, rather than anything matching the
the subpattern itself (see "Subpatterns as subroutines" below for a way subpattern itself (see "Subpatterns as subroutines" below for a way of
of doing that). So the pattern doing that). So the pattern
(sens|respons)e and \1ibility (sens|respons)e and \1ibility
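
The behaviour described in the hunk above can be sketched with Python's re module, used here only as a stand-in for PCRE2 on the assumption that the two agree for this simple pattern. The helper name backref_matches is hypothetical.

```python
import re

# Pattern taken from the text; Python's re stands in for PCRE2 here.
SENS_PATTERN = re.compile(r"(sens|respons)e and \1ibility")


def backref_matches(subject: str) -> bool:
    """Return True if the whole subject matches the pattern."""
    return SENS_PATTERN.fullmatch(subject) is not None


# \1 repeats the text that group 1 actually matched, so both halves
# of the subject must use the same alternative.
print(backref_matches("sense and sensibility"))        # True
print(backref_matches("response and responsibility"))  # True
print(backref_matches("sense and responsibility"))     # False
```
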
@ -7704,14 +7705,14 @@ BACK REFERENCES
before or after the reference. before or after the reference.
There may be more than one backreference to the same subpattern. If a There may be more than one backreference to the same subpattern. If a
subpattern has not actually been used in a particular match, any back subpattern has not actually been used in a particular match, any back-
references to it always fail by default. For example, the pattern references to it always fail by default. For example, the pattern
(a|(bc))\2 (a|(bc))\2
always fails if it starts to match "a" rather than "bc". However, if always fails if it starts to match "a" rather than "bc". However, if
the PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a back the PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backref-
reference to an unset value matches an empty string. erence to an unset value matches an empty string.
Because there may be many capturing parentheses in a pattern, all dig- Because there may be many capturing parentheses in a pattern, all dig-
its following a backslash are taken as part of a potential backrefer- its following a backslash are taken as part of a potential backrefer-
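
A minimal sketch of the default behaviour for a backreference to an unset group, again using Python's re module as a stand-in for PCRE2. Python has no equivalent of PCRE2_MATCH_UNSET_BACKREF, so only the default case is shown. The helper name unset_backref_matches is hypothetical.

```python
import re

# Pattern taken from the text: group 2 is set only when "bc" is matched.
UNSET_PATTERN = re.compile(r"(a|(bc))\2")


def unset_backref_matches(subject: str) -> bool:
    """Return True if the pattern matches anywhere in the subject."""
    return UNSET_PATTERN.search(subject) is not None


# Group 2 captured "bc", so \2 can match the second "bc".
print(unset_backref_matches("bcbc"))  # True

# When the first alternative matches "a", group 2 is unset and the
# reference \2 fails, so there is no match.
print(unset_backref_matches("aa"))    # False
```
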
@ -7730,13 +7731,13 @@ BACK REFERENCES
(a|b\1)+ (a|b\1)+
matches any number of "a"s and also "aba", "ababbaa" etc. At each iter- matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
ation of the subpattern, the back reference matches the character ation of the subpattern, the backreference matches the character string
string corresponding to the previous iteration. In order for this to corresponding to the previous iteration. In order for this to work, the
work, the pattern must be such that the first iteration does not need pattern must be such that the first iteration does not need to match
to match the back reference. This can be done using alternation, as in the backreference. This can be done using alternation, as in the exam-
the example above, or by a quantifier with a minimum of zero. ple above, or by a quantifier with a minimum of zero.
Back references of this type cause the group that they reference to be Backreferences of this type cause the group that they reference to be
treated as an atomic group. Once the whole group has been matched, a treated as an atomic group. Once the whole group has been matched, a
subsequent matching failure cannot cause backtracking into the middle subsequent matching failure cannot cause backtracking into the middle
of the group. of the group.
@ -7871,8 +7872,8 @@ ASSERTIONS
However, recursion, that is, a "subroutine" call into a group that is However, recursion, that is, a "subroutine" call into a group that is
already active, is not supported. already active, is not supported.
Perl does not support back references in lookbehinds. PCRE2 does sup- Perl does not support backreferences in lookbehinds. PCRE2 does support
port them, but only if certain conditions are met. The them, but only if certain conditions are met. The
PCRE2_MATCH_UNSET_BACKREF option must not be set, there must be no use PCRE2_MATCH_UNSET_BACKREF option must not be set, there must be no use
of (?| in the pattern (it creates duplicate subpattern numbers), and if of (?| in the pattern (it creates duplicate subpattern numbers), and if
the backreference is by name, the name must be unique. Of course, the the backreference is by name, the name must be unique. Of course, the
@ -8332,11 +8333,10 @@ RECURSIVE PATTERNS
^(.)(\1|a(?2)) ^(.)(\1|a(?2))
This pattern matches "bab". The first capturing parentheses match "b", This pattern matches "bab". The first capturing parentheses match "b",
then in the second group, when the back reference \1 fails to match then in the second group, when the backreference \1 fails to match "b",
"b", the second alternative matches "a" and then recurses. In the the second alternative matches "a" and then recurses. In the recursion,
recursion, \1 does now match "b" and so the whole match succeeds. This \1 does now match "b" and so the whole match succeeds. This match used
match used to fail in Perl, but in later versions (I tried 5.024) it to fail in Perl, but in later versions (I tried 5.024) it now works.
now works.
SUBPATTERNS AS SUBROUTINES SUBPATTERNS AS SUBROUTINES
@ -9253,11 +9253,10 @@ COMPILING A PATTERN
If this option is set, the re_endp field in the preg structure (which If this option is set, the re_endp field in the preg structure (which
has the type const char *) must be set to point to the character beyond has the type const char *) must be set to point to the character beyond
the end of the pattern before calling regcomp(). The pattern itself may the end of the pattern before calling regcomp(). The pattern itself may
now contain binary zeroes, which are treated as data characters. With- now contain binary zeros, which are treated as data characters. Without
out REG_PEND, a binary zero terminates the pattern and the re_endp REG_PEND, a binary zero terminates the pattern and the re_endp field is
field is ignored. This is a GNU extension to the POSIX standard and ignored. This is a GNU extension to the POSIX standard and should be
should be used with caution in software intended to be portable to used with caution in software intended to be portable to other systems.
other systems.
REG_UCP REG_UCP
@ -9364,10 +9363,10 @@ MATCHING A PATTERN
REG_STARTEND REG_STARTEND
When this option is set, the subject string is starts at string + When this option is set, the subject string starts at string +
pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should
point to the first character beyond the string. There may be binary point to the first character beyond the string. There may be binary
zeroes within the subject string, and indeed, using REG_STARTEND is the zeros within the subject string, and indeed, using REG_STARTEND is the
only way to pass a subject string that contains a binary zero. only way to pass a subject string that contains a binary zero.
Whatever the value of pmatch[0].rm_so, the offsets of the matched Whatever the value of pmatch[0].rm_so, the offsets of the matched
@ -9995,7 +9994,7 @@ OPTION SETTING
one of them may appear. For the first three, d is a decimal number. one of them may appear. For the first three, d is a decimal number.
(*LIMIT_DEPTH=d) set the backtracking limit to d (*LIMIT_DEPTH=d) set the backtracking limit to d
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes (*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d) set the match limit to d (*LIMIT_MATCH=d) set the match limit to d
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching (*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching

@ -1,4 +1,4 @@
.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "16 June 2017" "PCRE2 10.30" .TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "16 June 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS

@ -16,7 +16,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
This function is part of an experimental set of pattern conversion functions. This function is part of an experimental set of pattern conversion functions.
It sets the component separator character that is used when converting globs. It sets the component separator character that is used when converting globs.
The second argument must one of the characters forward slash, backslash, or The second argument must be one of the characters forward slash, backslash, or
dot. The default is backslash when running under Windows, otherwise forward dot. The default is backslash when running under Windows, otherwise forward
slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
the second argument is invalid. the second argument is invalid.

@ -1,4 +1,4 @@
.TH PCRE2_SET_DEPTH_LIMIT 3 "11 April 2017" "PCRE2 10.30" .TH PCRE2_SET_HEAP_LIMIT 3 "11 April 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS

@ -497,10 +497,10 @@ U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
.P .P
Each of the first three conventions is used by at least one operating system as Each of the first three conventions is used by at least one operating system as
its standard newline sequence. When PCRE2 is built, a default can be specified. its standard newline sequence. When PCRE2 is built, a default can be specified.
The default default is LF, which is the Unix standard. However, the newline If it is not, the default is set to LF, which is the Unix standard. However,
convention can be changed by an application when calling \fBpcre2_compile()\fP, the newline convention can be changed by an application when calling
or it can be specified by special text at the start of the pattern itself; this \fBpcre2_compile()\fP, or it can be specified by special text at the start of
overrides any other settings. See the the pattern itself; this overrides any other settings. See the
.\" HREF .\" HREF
\fBpcre2pattern\fP \fBpcre2pattern\fP
.\" .\"
@ -885,19 +885,20 @@ offset limit. In other words, whichever limit comes first is used.
.B " uint32_t \fIvalue\fP);" .B " uint32_t \fIvalue\fP);"
.fi .fi
.sp .sp
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum The \fIheap_limit\fP parameter specifies, in units of kibibytes (1024 bytes),
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking the maximum amount of heap memory that \fBpcre2_match()\fP may use to hold
information when running an interpretive match. This limit also applies to backtracking information when running an interpretive match. This limit also
\fBpcre2_dfa_match()\fP, which may use the heap when processing patterns with a applies to \fBpcre2_dfa_match()\fP, which may use the heap when processing
lot of nested pattern recursion or lookarounds or atomic groups. This limit patterns with a lot of nested pattern recursion or lookarounds or atomic
does not apply to matching with the JIT optimization, which has its own memory groups. This limit does not apply to matching with the JIT optimization, which
control arrangements (see the has its own memory control arrangements (see the
.\" HREF .\" HREF
\fBpcre2jit\fP \fBpcre2jit\fP
.\" .\"
documentation for more details). If the limit is reached, the negative error documentation for more details). If the limit is reached, the negative error
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
built; the default default is very large and is essentially "unlimited". is built; if it is not, the default is set very large and is essentially
"unlimited".
.P .P
A value for the heap limit may also be supplied by an item at the start of a A value for the heap limit may also be supplied by an item at the start of a
pattern of the form pattern of the form
@ -975,7 +976,7 @@ The depth limit is not relevant, and is ignored, when matching is done using
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
uses it to limit the depth of nested internal recursive function calls that uses it to limit the depth of nested internal recursive function calls that
implement atomic groups, lookaround assertions, and pattern recursions. This implement atomic groups, lookaround assertions, and pattern recursions. This
limits, indirectly, the amount of system stack this is used. It was more useful limits, indirectly, the amount of system stack that is used. It was more useful
in versions before 10.32, when stack memory was used for local workspace in versions before 10.32, when stack memory was used for local workspace
vectors for recursive function calls. From version 10.32, only local variables vectors for recursive function calls. From version 10.32, only local variables
are allocated on the stack and as each call uses only a few hundred bytes, even are allocated on the stack and as each call uses only a few hundred bytes, even
@ -989,11 +990,11 @@ using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
probably better to limit heap usage directly by calling probably better to limit heap usage directly by calling
\fBpcre2_set_heap_limit()\fP. \fBpcre2_set_heap_limit()\fP.
.P .P
The default value for the depth limit can be set when PCRE2 is built; the The default value for the depth limit can be set when PCRE2 is built; if it is
default default is the same value as the default for the match limit. If the not, the default is set to the same value as the default for the match limit.
limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns If the limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
item at the start of a pattern of the form supplied by an item at the start of a pattern of the form
.sp .sp
(*LIMIT_DEPTH=ddd) (*LIMIT_DEPTH=ddd)
.sp .sp
@ -1050,7 +1051,7 @@ given with \fBpcre2_set_depth_limit()\fP above.
.sp .sp
PCRE2_CONFIG_HEAPLIMIT PCRE2_CONFIG_HEAPLIMIT
.sp .sp
The output is a uint32_t integer that gives, in kilobytes, the default limit The output is a uint32_t integer that gives, in kibibytes, the default limit
for the amount of heap memory used by \fBpcre2_match()\fP or for the amount of heap memory used by \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP. Further details are given with \fBpcre2_dfa_match()\fP. Further details are given with
\fBpcre2_set_heap_limit()\fP above. \fBpcre2_set_heap_limit()\fP above.
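To make the unit change concrete, here is a minimal sketch in C that converts a heap limit given in kibibytes into bytes. The helper name `heap_limit_bytes` is hypothetical and is not part of the PCRE2 API. The sample values 500 and 20000000 are the ones that appear in the build settings touched by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API: convert a heap limit
   expressed in kibibytes (units of 1024 bytes) into a byte count.
   The result is 64-bit because a uint32_t limit multiplied by 1024
   can exceed the range of a 32-bit integer. */
uint64_t heap_limit_bytes(uint32_t limit_kib)
{
  return (uint64_t)limit_kib * 1024u;
}
```

With this helper, a limit of 500 corresponds to 512000 bytes, not 500000, which is the distinction the wording change from "kilobytes" to "kibibytes" is meant to capture.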
@ -1367,7 +1368,7 @@ If this bit is set, letters in the pattern match both upper and lower case
letters in the subject. It is equivalent to Perl's /i option, and it can be letters in the subject. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
properties are used for all characters with more than one other case, and for properties are used for all characters with more than one other case, and for
all characters whose code points are greater than U+007f. For lower valued all characters whose code points are greater than U+007F. For lower valued
characters with only one other case, a lookup table is used for speed. When characters with only one other case, a lookup table is used for speed. When
PCRE2_UTF is not set, a lookup table is used for all code points less than 256, PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
and higher code points (available only in 16-bit or 32-bit mode) are treated as and higher code points (available only in 16-bit or 32-bit mode) are treated as
@ -1550,8 +1551,8 @@ If this option is set, it disables the use of numbered capturing parentheses in
the pattern. Any opening parenthesis that is not followed by ? behaves as if it the pattern. Any opening parenthesis that is not followed by ? behaves as if it
were followed by ?: but named parentheses can still be used for capturing (and were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). This is the same as Perl's /n option. they acquire numbers in the usual way). This is the same as Perl's /n option.
Note that, when this option is set, references to capturing groups (back Note that, when this option is set, references to capturing groups
references or recursion/subroutine calls) may only refer to named groups, (backreferences or recursion/subroutine calls) may only refer to named groups,
though the reference can be by name or by number. though the reference can be by name or by number.
.sp .sp
PCRE2_NO_AUTO_POSSESS PCRE2_NO_AUTO_POSSESS
@ -1976,10 +1977,10 @@ returned if there are no back references.
.sp .sp
PCRE2_INFO_BSR PCRE2_INFO_BSR
.sp .sp
The output is a uint32_t whose value indicates what character sequences the \eR The output is a uint32_t integer whose value indicates what character sequences
escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR matches the \eR escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR
any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \eR matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
matches only CR, LF, or CRLF. that \eR matches only CR, LF, or CRLF.
.sp .sp
PCRE2_INFO_CAPTURECOUNT PCRE2_INFO_CAPTURECOUNT
.sp .sp
@ -1991,10 +1992,10 @@ The third argument should point to an \fBuint32_t\fP variable.
.sp .sp
If the pattern set a backtracking depth limit by including an item of the form If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the should point to a uint32_t integer. If no such value has been set, the call to
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
that this limit will only be used during matching if it is less than the limit limit will only be used during matching if it is less than the limit set or
set or defaulted by the caller of the match function. defaulted by the caller of the match function.
.sp .sp
PCRE2_INFO_FIRSTBITMAP PCRE2_INFO_FIRSTBITMAP
.sp .sp
@ -2004,7 +2005,7 @@ values for the first code unit in any match. For example, a pattern that starts
with [abc] results in a table with three bits set. When code unit values with [abc] results in a table with three bits set. When code unit values
greater than 255 are supported, the flag bit for 255 means "any code unit of greater than 255 are supported, the flag bit for 255 means "any code unit of
value 255 or above". If such a table was constructed, a pointer to it is value 255 or above". If such a table was constructed, a pointer to it is
returned. Otherwise NULL is returned. The third argument should point to an returned. Otherwise NULL is returned. The third argument should point to a
\fBconst uint8_t *\fP variable. \fBconst uint8_t *\fP variable.
.sp .sp
PCRE2_INFO_FIRSTCODETYPE PCRE2_INFO_FIRSTCODETYPE
@ -2031,7 +2032,7 @@ and up to 0xffffffff when not using UTF-32 mode.
.sp .sp
Return the size (in bytes) of the data frames that are used to remember Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by \fBpcre2_match()\fP backtracking positions when the pattern is processed by \fBpcre2_match()\fP
without the use of JIT. The third argument should point to an \fBsize_t\fP without the use of JIT. The third argument should point to a \fBsize_t\fP
variable. The frame size depends on the number of capturing parentheses in the variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables. pattern. Each additional capturing group adds two PCRE2_SIZE variables.
.sp .sp
@ -2051,10 +2052,10 @@ the equivalent hexadecimal or octal escape sequences.
.sp .sp
If the pattern set a heap memory limit by including an item of the form If the pattern set a heap memory limit by including an item of the form
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the should point to a uint32_t integer. If no such value has been set, the call to
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
that this limit will only be used during matching if it is less than the limit limit will only be used during matching if it is less than the limit set or
set or defaulted by the caller of the match function. defaulted by the caller of the match function.
.sp .sp
PCRE2_INFO_JCHANGED PCRE2_INFO_JCHANGED
.sp .sp
@ -2098,15 +2099,15 @@ in such cases.
.sp .sp
If the pattern set a match limit by including an item of the form If the pattern set a match limit by including an item of the form
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the should point to a uint32_t integer. If no such value has been set, the call to
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
that this limit will only be used during matching if it is less than the limit limit will only be used during matching if it is less than the limit set or
set or defaulted by the caller of the match function. defaulted by the caller of the match function.
.sp .sp
PCRE2_INFO_MAXLOOKBEHIND PCRE2_INFO_MAXLOOKBEHIND
.sp .sp
Return the number of characters (not code units) in the longest lookbehind Return the number of characters (not code units) in the longest lookbehind
assertion in the pattern. The third argument should point to an unsigned 32-bit assertion in the pattern. The third argument should point to a uint32_t
integer. This information is useful when doing multi-segment matching using the integer. This information is useful when doing multi-segment matching using the
partial matching facilities. Note that the simple assertions \eb and \eB partial matching facilities. Note that the simple assertions \eb and \eB
require a one-character lookbehind. \eA also registers a one-character require a one-character lookbehind. \eA also registers a one-character
@ -2393,7 +2394,7 @@ zero, the search for a match starts at the beginning of the subject, and this
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
must point to the start of a character, or to the end of the subject (in UTF-32 must point to the start of a character, or to the end of the subject (in UTF-32
mode, one code unit equals one character, so all offsets are valid). Like the mode, one code unit equals one character, so all offsets are valid). Like the
pattern string, the subject may contain binary zeroes. pattern string, the subject may contain binary zeros.
.P .P
A non-zero starting offset is useful when searching for another match in the A non-zero starting offset is useful when searching for another match in the
same subject by calling \fBpcre2_match()\fP again after a previous success. same subject by calling \fBpcre2_match()\fP again after a previous success.
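The starting-offset rule for UTF-8 subjects can be illustrated with a short check. This is only a sketch under stated assumptions: the function name `utf8_offset_is_char_start` is hypothetical, it is not how PCRE2 itself validates offsets, and it assumes the subject is already valid UTF-8, so that any byte of the form 10xxxxxx is a continuation byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API. Returns 1 if offset
   points to the start of a character or to the end of the subject,
   and 0 otherwise. Assumes subject holds valid UTF-8. */
int utf8_offset_is_char_start(const uint8_t *subject, size_t length,
                              size_t offset)
{
  if (offset > length) return 0;   /* beyond the end of the subject */
  if (offset == length) return 1;  /* the end of the subject is allowed */
  /* Continuation bytes have the bit pattern 10xxxxxx. */
  return (subject[offset] & 0xC0u) != 0x80u;
}
```

For the four-byte subject "a", 0xC3, 0xA9, "z", offsets 0, 1, 3 and 4 are acceptable, while offset 2 falls inside the two-byte character and is not.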


@ -216,7 +216,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
.sp .sp
--enable-newline-is-nul --enable-newline-is-nul
.sp .sp
which causes NUL (binary zero) is set as the default line-ending character. which causes NUL (binary zero) to be set as the default line-ending character.
.P .P
Whatever default line ending convention is selected when PCRE2 is built can be Whatever default line ending convention is selected when PCRE2 is built can be
overridden by applications that use the library. At build time it is overridden by applications that use the library. At build time it is
@ -281,8 +281,8 @@ The \fBpcre2_match()\fP function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking points there stack to record backtracking points. The more nested backtracking points there
are (that is, the deeper the search tree), the more memory is needed. If the are (that is, the deeper the search tree), the more memory is needed. If the
initial vector is not large enough, heap memory is used, up to a certain limit, initial vector is not large enough, heap memory is used, up to a certain limit,
which is specified in kilobytes. The limit can be changed at run time, as which is specified in kibibytes (units of 1024 bytes). The limit can be changed
described in the at run time, as described in the
.\" HREF .\" HREF
\fBpcre2api\fP \fBpcre2api\fP
.\" .\"
@ -291,7 +291,7 @@ change this by a setting such as
.sp .sp
--with-heap-limit=500 --with-heap-limit=500
.sp .sp
which limits the amount of heap to 500 kilobytes. This limit applies only to which limits the amount of heap to 500 KiB. This limit applies only to
interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
may also use the heap for internal workspace when processing complicated may also use the heap for internal workspace when processing complicated
patterns. This limit does not apply when JIT (which has its own memory patterns. This limit does not apply when JIT (which has its own memory
@ -552,7 +552,7 @@ generated from the string.
Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP
to be created. This is normally run under valgrind or used when PCRE2 is to be created. This is normally run under valgrind or used when PCRE2 is
compiled with address sanitizing enabled. It calls the fuzzing function and compiled with address sanitizing enabled. It calls the fuzzing function and
outputs information about it is doing. The input strings are specified by outputs information about what it is doing. The input strings are specified by
arguments: if an argument starts with "=" the rest of it is a literal input arguments: if an argument starts with "=" the rest of it is a literal input
string. Otherwise, it is assumed to be a file name, and the contents of the string. Otherwise, it is assumed to be a file name, and the contents of the
file are the test string. file are the test string.


@ -19,7 +19,7 @@ page.
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but 2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
they do not mean what you might think. For example, (?!a){3} does not assert they do not mean what you might think. For example, (?!a){3} does not assert
that the next three characters are not "a". It just asserts that the next that the next three characters are not "a". It just asserts that the next
character is not "a" three times (in principle: PCRE2 optimizes this to run the character is not "a" three times (in principle; PCRE2 optimizes this to run the
assertion just once). Perl allows some repeat quantifiers on other assertions, assertion just once). Perl allows some repeat quantifiers on other assertions,
for example, \eb* (but not \eb{3}), but these do not seem to have any use. for example, \eb* (but not \eb{3}), but these do not seem to have any use.
.P .P
@ -62,8 +62,8 @@ Note the following examples:
The \eQ...\eE sequence is recognized both inside and outside character classes. The \eQ...\eE sequence is recognized both inside and outside character classes.
.P .P
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code}) 7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
constructions. However, there is support PCRE2's "callout" feature, which constructions. However, PCRE2 does have a "callout" feature, which allows an
allows an external function to be called during pattern matching. See the external function to be called during pattern matching. See the
.\" HREF .\" HREF
\fBpcre2callout\fP \fBpcre2callout\fP
.\" .\"


@ -57,9 +57,10 @@ controlled by parameters that can be set by the \fB--buffer-size\fP and
that is obtained at the start of processing. If an input file contains very that is obtained at the start of processing. If an input file contains very
long lines, a larger buffer may be needed; this is handled by automatically long lines, a larger buffer may be needed; this is handled by automatically
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
default values for these parameters are specified when \fBpcre2grep\fP is default values for these parameters can be set when \fBpcre2grep\fP is
built, with the default defaults being 20K and 1M respectively. An error occurs built; if nothing is specified, the defaults are set to 20K and 1M
if a line is too long and the buffer can no longer be expanded. respectively. An error occurs if a line is too long and the buffer can no
longer be expanded.
.P .P
The block of memory that is actually used is three times the "buffer size", to The block of memory that is actually used is three times the "buffer size", to
allow for buffering "before" and "after" lines. If the buffer size is too allow for buffering "before" and "after" lines. If the buffer size is too
@ -434,13 +435,13 @@ short form for this option.
When this option is given, non-compressed input is read and processed line by When this option is given, non-compressed input is read and processed line by
line, and the output is flushed after each write. By default, input is read in line, and the output is flushed after each write. By default, input is read in
large chunks, unless \fBpcre2grep\fP can determine that it is reading from a large chunks, unless \fBpcre2grep\fP can determine that it is reading from a
terminal (which is currently possible only in Unix-like environments). Output terminal (which is currently possible only in Unix-like environments or
to terminal is normally automatically flushed by the operating system. This Windows). Output to terminal is normally automatically flushed by the operating
option can be useful when the input or output is attached to a pipe and you do system. This option can be useful when the input or output is attached to a
not want \fBpcre2grep\fP to buffer up large amounts of data. However, its use pipe and you do not want \fBpcre2grep\fP to buffer up large amounts of data.
will affect performance, and the \fB-M\fP (multiline) option ceases to work. However, its use will affect performance, and the \fB-M\fP (multiline) option
When input is from a compressed .gz or .bz2 file, \fB--line-buffered\fP is ceases to work. When input is from a compressed .gz or .bz2 file,
ignored. \fB--line-buffered\fP is ignored.
.TP .TP
\fB--line-offsets\fP \fB--line-offsets\fP
Instead of showing lines or parts of lines that match, show each match as a Instead of showing lines or parts of lines that match, show each match as a
@ -470,11 +471,11 @@ is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
counter that is incremented each time around its main processing loop. If the counter that is incremented each time around its main processing loop. If the
value set by \fB--match-limit\fP is reached, an error occurs. value set by \fB--match-limit\fP is reached, an error occurs.
.sp .sp
The \fB--heap-limit\fP option specifies, as a number of kilobytes, the amount The \fB--heap-limit\fP option specifies, as a number of kibibytes (units of
of heap memory that may be used for matching. Heap memory is needed only if 1024 bytes), the amount of heap memory that may be used for matching. Heap
matching the pattern requires a significant number of nested backtracking memory is needed only if matching the pattern requires a significant number of
points to be remembered. This parameter can be set to zero to forbid the use of nested backtracking points to be remembered. This parameter can be set to zero
heap memory altogether. to forbid the use of heap memory altogether.
.sp .sp
The \fB--depth-limit\fP option limits the depth of nested backtracking points, The \fB--depth-limit\fP option limits the depth of nested backtracking points,
which indirectly limits the amount of memory that is used. The amount of memory which indirectly limits the amount of memory that is used. The amount of memory
@ -483,9 +484,9 @@ parentheses in the pattern, so the amount of memory that is used before this
limit acts varies from pattern to pattern. This limit is of use only if it is limit acts varies from pattern to pattern. This limit is of use only if it is
set smaller than \fB--match-limit\fP. set smaller than \fB--match-limit\fP.
.sp .sp
There are no short forms for these options. The default settings are specified There are no short forms for these options. The default limits can be set
when the PCRE2 library is compiled, with the default defaults being very large when the PCRE2 library is compiled; if they are not specified, the defaults
and so effectively unlimited. are very large and so effectively unlimited.
.TP .TP
\fB--max-buffer-size=\fInumber\fP \fB--max-buffer-size=\fInumber\fP
This limits the expansion of the processing buffer, whose initial size can be This limits the expansion of the processing buffer, whose initial size can be


@ -56,10 +56,10 @@ DESCRIPTION
that is obtained at the start of processing. If an input file contains that is obtained at the start of processing. If an input file contains
very long lines, a larger buffer may be needed; this is handled by very long lines, a larger buffer may be needed; this is handled by
automatically extending the buffer, up to the limit specified by --max- automatically extending the buffer, up to the limit specified by --max-
buffer-size. The default values for these parameters are specified when buffer-size. The default values for these parameters can be set when
pcre2grep is built, with the default defaults being 20K and 1M respec- pcre2grep is built; if nothing is specified, the defaults are set to
tively. An error occurs if a line is too long and the buffer can no 20K and 1M respectively. An error occurs if a line is too long and the
longer be expanded. buffer can no longer be expanded.
The block of memory that is actually used is three times the "buffer The block of memory that is actually used is three times the "buffer
size", to allow for buffering "before" and "after" lines. If the buffer size", to allow for buffering "before" and "after" lines. If the buffer
@ -475,14 +475,14 @@ OPTIONS
processed line by line, and the output is flushed after each processed line by line, and the output is flushed after each
write. By default, input is read in large chunks, unless write. By default, input is read in large chunks, unless
pcre2grep can determine that it is reading from a terminal pcre2grep can determine that it is reading from a terminal
(which is currently possible only in Unix-like environments). (which is currently possible only in Unix-like environments
Output to terminal is normally automatically flushed by the or Windows). Output to terminal is normally automatically
operating system. This option can be useful when the input or flushed by the operating system. This option can be useful
output is attached to a pipe and you do not want pcre2grep to when the input or output is attached to a pipe and you do not
buffer up large amounts of data. However, its use will affect want pcre2grep to buffer up large amounts of data. However,
performance, and the -M (multiline) option ceases to work. its use will affect performance, and the -M (multiline)
When input is from a compressed .gz or .bz2 file, --line- option ceases to work. When input is from a compressed .gz or
buffered is ignored. .bz2 file, --line-buffered is ignored.
--line-offsets --line-offsets
Instead of showing lines or parts of lines that match, show Instead of showing lines or parts of lines that match, show
@ -517,12 +517,12 @@ OPTIONS
processing loop. If the value set by --match-limit is processing loop. If the value set by --match-limit is
reached, an error occurs. reached, an error occurs.
The --heap-limit option specifies, as a number of kilobytes, The --heap-limit option specifies, as a number of kibibytes
the amount of heap memory that may be used for matching. Heap (units of 1024 bytes), the amount of heap memory that may be
memory is needed only if matching the pattern requires a sig- used for matching. Heap memory is needed only if matching the
nificant number of nested backtracking points to be remem- pattern requires a significant number of nested backtracking
bered. This parameter can be set to zero to forbid the use of points to be remembered. This parameter can be set to zero to
heap memory altogether. forbid the use of heap memory altogether.
The --depth-limit option limits the depth of nested back- The --depth-limit option limits the depth of nested back-
tracking points, which indirectly limits the amount of memory tracking points, which indirectly limits the amount of memory
@ -532,10 +532,10 @@ OPTIONS
limit acts varies from pattern to pattern. This limit is of limit acts varies from pattern to pattern. This limit is of
use only if it is set smaller than --match-limit. use only if it is set smaller than --match-limit.
There are no short forms for these options. The default set- There are no short forms for these options. The default lim-
tings are specified when the PCRE2 library is compiled, with its can be set when the PCRE2 library is compiled; if they
the default defaults being very large and so effectively are not specified, the defaults are very large and so effec-
unlimited. tively unlimited.
--max-buffer-size=number --max-buffer-size=number
This limits the expansion of the processing buffer, whose This limits the expansion of the processing buffer, whose


@ -38,9 +38,9 @@ There is no limit to the number of parenthesized subpatterns, but there can be
no more than 65535 capturing subpatterns. There is, however, a limit to the no more than 65535 capturing subpatterns. There is, however, a limit to the
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
order to limit the amount of system stack used at compile time. The default order to limit the amount of system stack used at compile time. The default
limit can be specified when PCRE2 is built; the default default is 250. An limit can be specified when PCRE2 is built; if not, the default is set to 250.
application can change this limit by calling pcre2_set_parens_nest_limit() to An application can change this limit by calling pcre2_set_parens_nest_limit()
set the limit in a compile context. to set the limit in a compile context.
.P .P
The maximum length of name for a named subpattern is 32 code units, and the The maximum length of name for a named subpattern is 32 code units, and the
maximum number of named subpatterns is 10000. maximum number of named subpatterns is 10000.


@ -163,7 +163,7 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
for it to have any effect. In other words, the pattern writer can lower the for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used. The heap limit is setting of one of these limits, the lower value is used. The heap limit is
specified in kilobytes. specified in kibibytes (units of 1024 bytes).
.P .P
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
still recognized for backwards compatibility. still recognized for backwards compatibility.
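The rule that a pattern can lower a limit but never raise it can be modelled in a few lines of C. This is a hypothetical model of the behaviour described in the text, not PCRE2 source code; the name `effective_limit` and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not PCRE2 source code. caller_limit is the value
   set (or defaulted) by the caller of the match function, and
   pattern_limits holds every (*LIMIT_...=d) setting found at the start
   of the pattern. The smallest value wins, and a pattern setting that
   is larger than the caller's limit has no effect. */
uint32_t effective_limit(uint32_t caller_limit,
                         const uint32_t *pattern_limits, size_t count)
{
  uint32_t limit = caller_limit;
  for (size_t i = 0; i < count; i++)
    if (pattern_limits[i] < limit) limit = pattern_limits[i];
  return limit;
}
```

For example, with a caller limit of 1000, a pattern setting of 5000 leaves the limit at 1000, whereas two pattern settings of 200 and 50 reduce it to 50.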
@ -528,7 +528,7 @@ by code point, as described above.
.sp .sp
The sequence \eg followed by a signed or unsigned number, optionally enclosed The sequence \eg followed by a signed or unsigned number, optionally enclosed
in braces, is an absolute or relative backreference. A named backreference in braces, is an absolute or relative backreference. A named backreference
can be coded as \eg{name}. Back references are discussed can be coded as \eg{name}. Backreferences are discussed
.\" HTML <a href="#backreferences"> .\" HTML <a href="#backreferences">
.\" </a> .\" </a>
later, later,
@ -1026,7 +1026,7 @@ joiner" characters. Characters with the "mark" property always have the
6. Do not break within emoji modifier sequences (a base character followed by a 6. Do not break within emoji modifier sequences (a base character followed by a
modifier). Extending characters are allowed before the modifier. modifier). Extending characters are allowed before the modifier.
.P .P
7. Do not break within emoji zwj sequences (zero-width jointer followed by 7. Do not break within emoji zwj sequences (zero-width joiner followed by
"glue after ZWJ" or "base glue after ZWJ"). "glue after ZWJ" or "base glue after ZWJ").
.P .P
8. Do not break within emoji flag sequences. That is, do not break between 8. Do not break within emoji flag sequences. That is, do not break between
@ -2205,8 +2205,8 @@ A subpattern that is referenced by name may appear in the pattern before or
after the reference. after the reference.
.P .P
There may be more than one backreference to the same subpattern. If a There may be more than one backreference to the same subpattern. If a
subpattern has not actually been used in a particular match, any back subpattern has not actually been used in a particular match, any backreferences
references to it always fail by default. For example, the pattern to it always fail by default. For example, the pattern
.sp .sp
(a|(bc))\e2 (a|(bc))\e2
.sp .sp
@ -2243,7 +2243,7 @@ that the first iteration does not need to match the back reference. This can be
done using alternation, as in the example above, or by a quantifier with a done using alternation, as in the example above, or by a quantifier with a
minimum of zero. minimum of zero.
.P .P
Back references of this type cause the group that they reference to be treated Backreferences of this type cause the group that they reference to be treated
as an as an
.\" HTML <a href="#atomicgroup"> .\" HTML <a href="#atomicgroup">
.\" </a> .\" </a>


@ -115,7 +115,7 @@ because it disables the use of back references.
If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
(which has the type const char *) must be set to point to the character beyond (which has the type const char *) must be set to point to the character beyond
the end of the pattern before calling \fBregcomp()\fP. The pattern itself may the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
now contain binary zeroes, which are treated as data characters. Without now contain binary zeros, which are treated as data characters. Without
REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
ignored. This is a GNU extension to the POSIX standard and should be used with ignored. This is a GNU extension to the POSIX standard and should be used with
caution in software intended to be portable to other systems. caution in software intended to be portable to other systems.
@ -224,10 +224,10 @@ function.
.sp .sp
REG_STARTEND REG_STARTEND
.sp .sp
When this option is set, the subject string is starts at \fIstring\fP + When this option is set, the subject string starts at \fIstring\fP +
\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which \fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
should point to the first character beyond the string. There may be binary should point to the first character beyond the string. There may be binary
zeroes within the subject string, and indeed, using REG_STARTEND is the only zeros within the subject string, and indeed, using REG_STARTEND is the only
way to pass a subject string that contains a binary zero. way to pass a subject string that contains a binary zero.
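
The REG_STARTEND behaviour described in this hunk can be sketched as follows. This is an illustration only: it assumes the platform's own <regex.h> defines REG_STARTEND (glibc and the BSDs do), whereas a PCRE2 build would use the POSIX wrapper header instead. The helper name search_range is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Search buf[start..end) for an extended-syntax pattern.
   Returns 0 on a match, -1 if the pattern does not compile, or the
   non-zero regexec() result otherwise. On success the match offsets
   are stored in *match_so and *match_eo. */
int search_range(const char *pattern, const char *buf,
                 size_t start, size_t end,
                 size_t *match_so, size_t *match_eo)
{
    regex_t re;
    regmatch_t pm[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* With REG_STARTEND the subject is delimited by pmatch[0] on entry,
       so the buffer may contain binary zeros. */
    pm[0].rm_so = (regoff_t)start;
    pm[0].rm_eo = (regoff_t)end;

    rc = regexec(&re, buf, 1, pm, REG_STARTEND);
    regfree(&re);
    if (rc != 0)
        return rc;

    *match_so = (size_t)pm[0].rm_so;
    *match_eo = (size_t)pm[0].rm_eo;
    return 0;
}
```

Because the end of the subject is given explicitly rather than found by scanning for a terminator, a buffer such as { 'a', 'b', 0, 'b', 'b' } can be searched past its embedded zero.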
.P .P
Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string

@ -419,7 +419,7 @@ of the newline or \eR options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number. appear. For the first three, d is a decimal number.
.sp .sp
(*LIMIT_DEPTH=d) set the backtracking limit to d (*LIMIT_DEPTH=d) set the backtracking limit to d
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes (*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d) set the match limit to d (*LIMIT_MATCH=d) set the match limit to d
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching (*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching

@ -101,7 +101,7 @@ to occur).
UTF-8 (in its original definition) is not capable of encoding values greater UTF-8 (in its original definition) is not capable of encoding values greater
than 0x7fffffff, but such values can be handled by the 32-bit library. When than 0x7fffffff, but such values can be handled by the 32-bit library. When
testing this library in non-UTF mode with \fButf8_input\fP set, if any testing this library in non-UTF mode with \fButf8_input\fP set, if any
character is preceded by the byte 0xff (which is an illegal byte in UTF-8) character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
0x80000000 is added to the character's value. This is the only way of passing 0x80000000 is added to the character's value. This is the only way of passing
such code points in a pattern string. For subject strings, using an escape such code points in a pattern string. For subject strings, using an escape
sequence is preferable. sequence is preferable.
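
The 0xff convention described in this hunk can be sketched in C. This is not pcre2test's own input code, which is not part of this diff; the function names are hypothetical, and the decoder follows the original UTF-8 definition (sequences of up to six bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one character in the original (up to 6-byte) UTF-8 scheme.
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t decode_utf8_orig(const unsigned char *p, size_t len,
                               uint32_t *out)
{
    size_t extra, i;
    uint32_t v;
    unsigned char c;

    if (len == 0) return 0;
    c = p[0];
    if (c < 0x80)      { extra = 0; v = c; }
    else if (c < 0xc0) return 0;               /* stray continuation byte */
    else if (c < 0xe0) { extra = 1; v = c & 0x1f; }
    else if (c < 0xf0) { extra = 2; v = c & 0x0f; }
    else if (c < 0xf8) { extra = 3; v = c & 0x07; }
    else if (c < 0xfc) { extra = 4; v = c & 0x03; }
    else if (c < 0xfe) { extra = 5; v = c & 0x01; }
    else return 0;                             /* 0xfe and 0xff are invalid */

    if (len < extra + 1) return 0;
    for (i = 1; i <= extra; i++) {
        if ((p[i] & 0xc0) != 0x80) return 0;   /* not a continuation byte */
        v = (v << 6) | (uint32_t)(p[i] & 0x3f);
    }
    *out = v;
    return extra + 1;
}

/* Read one character, treating a leading 0xff byte as "add 0x80000000
   to the value of the character that follows". Returns the total number
   of bytes consumed, or 0 on malformed input. */
size_t read_test_char(const unsigned char *p, size_t len, uint32_t *out)
{
    size_t high = (len > 0 && p[0] == 0xff) ? 1 : 0;
    size_t used = decode_utf8_orig(p + high, len - high, out);

    if (used == 0) return 0;
    if (high) *out += 0x80000000u;
    return used + high;
}
```

Since 0xff can never begin a valid UTF-8 sequence, using it as a prefix marker cannot be confused with ordinary input.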
@ -220,7 +220,7 @@ Do not output the version number of \fBpcre2test\fP at the start of execution.
.TP 10 .TP 10
\fB-S\fP \fIsize\fP \fB-S\fP \fIsize\fP
On Unix-like systems, set the size of the run-time stack to \fIsize\fP On Unix-like systems, set the size of the run-time stack to \fIsize\fP
megabytes. mebibytes (units of 1024*1024 bytes).
.TP 10 .TP 10
\fB-subject\fP \fImodifier-list\fP \fB-subject\fP \fImodifier-list\fP
Behave as if each subject line contains the given modifiers. Behave as if each subject line contains the given modifiers.
@ -639,8 +639,8 @@ The effects of these modifiers are described in the following sections.
.sp .sp
The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is
set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode", set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
\eR matches any Unicode newline sequence. The default is specified when PCRE2 \eR matches any Unicode newline sequence. The default can be specified when
is built, with the default default being Unicode. PCRE2 is built; if it is not, the default is set to Unicode.
.P .P
The \fBnewline\fP modifier specifies which characters are to be interpreted as The \fBnewline\fP modifier specifies which characters are to be interpreted as
newlines, both in the pattern and in subject lines. The type must be one of CR, newlines, both in the pattern and in subject lines. The type must be one of CR,
@ -1381,11 +1381,11 @@ matching provokes an error return ("bad option value") from
.sp .sp
The \fBjitstack\fP modifier provides a way of setting the maximum stack size The \fBjitstack\fP modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if JIT that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Setting optimization is not being used. The value is a number of kibibytes (units of
zero reverts to the default of 32K. Providing a stack that is larger than the 1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
default is necessary only for very complicated patterns. If \fBjitstack\fP is that is larger than the default is necessary only for very complicated
set non-zero on a subject line it overrides any value that was set on the patterns. If \fBjitstack\fP is set non-zero on a subject line it overrides any
pattern. value that was set on the pattern.
. .
. .
.SS "Setting heap, match, and depth limits" .SS "Setting heap, match, and depth limits"
@ -1427,10 +1427,10 @@ matching, \fImatch_limit\fP controls the total number of calls, both recursive
and non-recursive, to the internal matching function, thus controlling the and non-recursive, to the internal matching function, thus controlling the
overall amount of computing resource that is used. overall amount of computing resource that is used.
.P .P
For both kinds of matching, the \fIheap_limit\fP number (which is in kilobytes) For both kinds of matching, the \fIheap_limit\fP number, which is in kibibytes
limits the amount of heap memory used for matching. A value of zero disables (units of 1024 bytes), limits the amount of heap memory used for matching. A
the use of any heap memory; many simple pattern matches can be done without value of zero disables the use of any heap memory; many simple pattern matches
using the heap, so this is not an unreasonable setting. can be done without using the heap, so zero is not an unreasonable setting.
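
The move from "kilobytes" and "megabytes" to "kibibytes" and "mebibytes" is about exact byte counts. A minimal sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Binary units: 1 KiB = 1024 bytes, 1 MiB = 1024 * 1024 bytes.
   The result is 64-bit because large limits overflow 32 bits. */
uint64_t kib_to_bytes(uint32_t kib)
{
    return (uint64_t)kib * 1024u;
}

uint64_t mib_to_bytes(uint32_t mib)
{
    return (uint64_t)mib * 1024u * 1024u;
}
```

For the default heap limit of 20000000 this gives 20480000000 bytes, which is 480000000 bytes more than a decimal reading of "kilobytes" would suggest.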
. .
. .
.SS "Showing MARK names" .SS "Showing MARK names"

@ -94,7 +94,7 @@ INPUT ENCODING
UTF-8 (in its original definition) is not capable of encoding values UTF-8 (in its original definition) is not capable of encoding values
greater than 0x7fffffff, but such values can be handled by the 32-bit greater than 0x7fffffff, but such values can be handled by the 32-bit
library. When testing this library in non-UTF mode with utf8_input set, library. When testing this library in non-UTF mode with utf8_input set,
if any character is preceded by the byte 0xff (which is an illegal byte if any character is preceded by the byte 0xff (which is an invalid byte
in UTF-8) 0x80000000 is added to the character's value. This is the in UTF-8) 0x80000000 is added to the character's value. This is the
only way of passing such code points in a pattern string. For subject only way of passing such code points in a pattern string. For subject
strings, using an escape sequence is preferable. strings, using an escape sequence is preferable.
@ -208,7 +208,7 @@ COMMAND LINE OPTIONS
execution. execution.
-S size On Unix-like systems, set the size of the run-time stack to -S size On Unix-like systems, set the size of the run-time stack to
size megabytes. size mebibytes (units of 1024*1024 bytes).
-subject modifier-list -subject modifier-list
Behave as if each subject line contains the given modifiers. Behave as if each subject line contains the given modifiers.
@ -614,8 +614,9 @@ PATTERN MODIFIERS
The bsr modifier specifies what \R in a pattern should match. If it is The bsr modifier specifies what \R in a pattern should match. If it is
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to
"unicode", \R matches any Unicode newline sequence. The default is "unicode", \R matches any Unicode newline sequence. The default can be
specified when PCRE2 is built, with the default default being Unicode. specified when PCRE2 is built; if it is not, the default is set to Uni-
code.
The newline modifier specifies which characters are to be interpreted The newline modifier specifies which characters are to be interpreted
as newlines, both in the pattern and in subject lines. The type must be as newlines, both in the pattern and in subject lines. The type must be
@ -1272,11 +1273,11 @@ SUBJECT MODIFIERS
The jitstack modifier provides a way of setting the maximum stack size The jitstack modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if that is used by the just-in-time optimization code. It is ignored if
JIT optimization is not being used. The value is a number of kilobytes. JIT optimization is not being used. The value is a number of kibibytes
Setting zero reverts to the default of 32K. Providing a stack that is (units of 1024 bytes). Setting zero reverts to the default of 32KiB.
larger than the default is necessary only for very complicated pat- Providing a stack that is larger than the default is necessary only for
terns. If jitstack is set non-zero on a subject line it overrides any very complicated patterns. If jitstack is set non-zero on a subject
value that was set on the pattern. line it overrides any value that was set on the pattern.
Setting heap, match, and depth limits Setting heap, match, and depth limits
@ -1315,11 +1316,11 @@ SUBJECT MODIFIERS
tion, thus controlling the overall amount of computing resource that is tion, thus controlling the overall amount of computing resource that is
used. used.
For both kinds of matching, the heap_limit number (which is in kilo- For both kinds of matching, the heap_limit number, which is in
bytes) limits the amount of heap memory used for matching. A value of kibibytes (units of 1024 bytes), limits the amount of heap memory used
zero disables the use of any heap memory; many simple pattern matches for matching. A value of zero disables the use of any heap memory; many
can be done without using the heap, so this is not an unreasonable set- simple pattern matches can be done without using the heap, so zero is
ting. not an unreasonable setting.
Showing MARK names Showing MARK names

@ -51,7 +51,7 @@ fi
# utf invoke UTF-8 functionality # utf invoke UTF-8 functionality
# #
# The data lines must not have any pcre2test modifiers. Unless # The data lines must not have any pcre2test modifiers. Unless
# "subject_litersl" is on the pattern, data lines are processed as # "subject_literal" is on the pattern, data lines are processed as
# Perl double-quoted strings, so if they contain " $ or @ characters, these # Perl double-quoted strings, so if they contain " $ or @ characters, these
# have to be escaped. For this reason, all such characters in the # have to be escaped. For this reason, all such characters in the
# Perl-compatible testinput1 and testinput4 files are escaped so that they can # Perl-compatible testinput1 and testinput4 files are escaped so that they can

@ -132,8 +132,9 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <zlib.h> header file. */ /* Define to 1 if you have the <zlib.h> header file. */
/* #undef HAVE_ZLIB_H */ /* #undef HAVE_ZLIB_H */
/* This limits the amount of memory that pcre2_match() may use while matching /* This limits the amount of memory that may be used while matching a pattern.
a pattern. The value is in kilobytes. */ It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
to JIT matching. The value is in kilobytes. */
#ifndef HEAP_LIMIT #ifndef HEAP_LIMIT
#define HEAP_LIMIT 20000000 #define HEAP_LIMIT 20000000
#endif #endif
@ -155,7 +156,8 @@ sure both macros are undefined; an emulation function will then be used. */
/* The value of MATCH_LIMIT determines the default number of times the /* The value of MATCH_LIMIT determines the default number of times the
pcre2_match() function can record a backtrack position during a single pcre2_match() function can record a backtrack position during a single
matching attempt. There is a runtime interface for setting a different matching attempt. The value is also used to limit a loop counter in
pcre2_dfa_match(). There is a runtime interface for setting a different
limit. The limit exists in order to catch runaway regular expressions that limit. The limit exists in order to catch runaway regular expressions that
take for ever to determine that they do not match. The default is set very take for ever to determine that they do not match. The default is set very
large so that it does not accidentally catch legitimate cases. */ large so that it does not accidentally catch legitimate cases. */
@ -170,7 +172,9 @@ sure both macros are undefined; an emulation function will then be used. */
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
must be less than the value of MATCH_LIMIT. The default is to use the same must be less than the value of MATCH_LIMIT. The default is to use the same
value as MATCH_LIMIT. There is a runtime method for setting a different value as MATCH_LIMIT. There is a runtime method for setting a different
limit. */ limit. In the case of pcre2_dfa_match(), this limit controls the depth of
the internal nested function calls that are used for pattern recursions,
lookarounds, and atomic groups. */
#ifndef MATCH_LIMIT_DEPTH #ifndef MATCH_LIMIT_DEPTH
#define MATCH_LIMIT_DEPTH MATCH_LIMIT #define MATCH_LIMIT_DEPTH MATCH_LIMIT
#endif #endif
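
The #ifndef pattern used for these limits can be sketched in isolation. The MATCH_LIMIT value below is the default quoted in the build files of this commit; the helper function is hypothetical and only restates the comment's remark that a depth limit is useful only when it is lower than the match limit.

```c
#include <stdint.h>

/* A value supplied by the build (for example with -DMATCH_LIMIT=...)
   takes precedence; otherwise the default applies. */
#ifndef MATCH_LIMIT
#define MATCH_LIMIT 10000000
#endif

/* Unless set explicitly, the depth limit tracks the match limit. */
#ifndef MATCH_LIMIT_DEPTH
#define MATCH_LIMIT_DEPTH MATCH_LIMIT
#endif

/* Hypothetical helper: a depth limit can only trigger before the match
   limit does if it is strictly smaller. */
int depth_limit_has_effect(uint32_t depth_limit, uint32_t match_limit)
{
    return depth_limit < match_limit;
}
```

With the defaults, depth_limit_has_effect(MATCH_LIMIT_DEPTH, MATCH_LIMIT) is false, so the depth limit only matters once it has been lowered at build time or at run time.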
@ -210,7 +214,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_NAME "PCRE2" #define PACKAGE_NAME "PCRE2"
/* Define to the full name and version of this package. */ /* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE2 10.31" #define PACKAGE_STRING "PCRE2 10.32-RC1"
/* Define to the one symbol short name of this package. */ /* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre2" #define PACKAGE_TARNAME "pcre2"
@ -219,7 +223,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_URL "" #define PACKAGE_URL ""
/* Define to the version of this package. */ /* Define to the version of this package. */
#define PACKAGE_VERSION "10.31" #define PACKAGE_VERSION "10.32-RC1"
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested /* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system parentheses (of any kind) in a pattern. This limits the amount of system
@ -339,7 +343,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif #endif
/* Version number of package */ /* Version number of package */
#define VERSION "10.31" #define VERSION "10.32-RC1"
/* Define to 1 if on MINIX. */ /* Define to 1 if on MINIX. */
/* #undef _MINIX */ /* #undef _MINIX */

@ -134,7 +134,7 @@ sure both macros are undefined; an emulation function will then be used. */
/* This limits the amount of memory that may be used while matching a pattern. /* This limits the amount of memory that may be used while matching a pattern.
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
to JIT matching. The value is in kilobytes. */ to JIT matching. The value is in kibibytes (units of 1024 bytes). */
#undef HEAP_LIMIT #undef HEAP_LIMIT
/* The value of LINK_SIZE determines the number of bytes used to store links /* The value of LINK_SIZE determines the number of bytes used to store links

@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
/* The current PCRE version information. */ /* The current PCRE version information. */
#define PCRE2_MAJOR 10 #define PCRE2_MAJOR 10
#define PCRE2_MINOR 31 #define PCRE2_MINOR 32
#define PCRE2_PRERELEASE #define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2018-02-12 #define PCRE2_DATE 2018-02-19
/* When an application links to a PCRE DLL in Windows, the symbols that are /* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate imported have to be identified as such. When building PCRE2, the appropriate

@ -387,8 +387,8 @@ return (mb->callout)(cb, mb->callout_data);
*************************************************/ *************************************************/
/* This function is called when internal_dfa_match() is about to be called /* This function is called when internal_dfa_match() is about to be called
recursively and there is insufficient workingspace left in the current work recursively and there is insufficient working space left in the current
space block. If there's an existing next block, use it; otherwise get a new workspace block. If there's an existing next block, use it; otherwise get a new
block unless the heap limit is reached. block unless the heap limit is reached.
Arguments: Arguments:

@ -43,7 +43,7 @@ POSSIBILITY OF SUCH DAMAGE.
#include "config.h" #include "config.h"
#endif #endif
/* These defines enables debugging code */ /* These defines enable debugging code */
//#define DEBUG_FRAMES_DISPLAY //#define DEBUG_FRAMES_DISPLAY
//#define DEBUG_SHOW_OPS //#define DEBUG_SHOW_OPS
@ -2464,7 +2464,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
/* ===================================================================== */ /* ===================================================================== */
/* Match a single character type repeatedly. Note that the property type /* Match a single character type repeatedly. Note that the property type
does not need to be in a stack frame as it not used within an RMATCH() does not need to be in a stack frame as it is not used within an RMATCH()
loop. */ loop. */
#define Lstart_eptr F->temp_sptr[0] #define Lstart_eptr F->temp_sptr[0]

@ -492,7 +492,7 @@ so many of them that they are split into two fields. */
/* These are the matching controls that may be set either on a pattern or on a /* These are the matching controls that may be set either on a pattern or on a
data line. They are copied from the pattern controls as initial settings for data line. They are copied from the pattern controls as initial settings for
data line controls Note that CTL_MEMORY is not included here, because it does data line controls. Note that CTL_MEMORY is not included here, because it does
different things in the two cases. */ different things in the two cases. */
#define CTL_ALLPD (CTL_AFTERTEXT|\ #define CTL_ALLPD (CTL_AFTERTEXT|\
@ -5411,7 +5411,7 @@ switch(errorcode)
/* The pattern is now in pbuffer[8|16|32], with the length in code units in /* The pattern is now in pbuffer[8|16|32], with the length in code units in
patlen. If it is to be converted, copy the result back afterwards so that it patlen. If it is to be converted, copy the result back afterwards so that it
it ends up back in the usual place. */ ends up back in the usual place. */
if (pat_patctl.convert_type != CONVERT_UNSET) if (pat_patctl.convert_type != CONVERT_UNSET)
{ {
@ -5735,7 +5735,7 @@ return PR_OK;
*************************************************/ *************************************************/
/* This is used for DFA, normal, and JIT fast matching. For DFA matching it /* This is used for DFA, normal, and JIT fast matching. For DFA matching it
should only called with the third argument set to PCRE2_ERROR_DEPTHLIMIT. should only be called with the third argument set to PCRE2_ERROR_DEPTHLIMIT.
Arguments: Arguments:
pp the subject string pp the subject string
@ -7766,7 +7766,7 @@ printf(" -LM list pattern and subject modifiers, then exit\n");
printf(" -q quiet: do not output PCRE2 version number at start\n"); printf(" -q quiet: do not output PCRE2 version number at start\n");
printf(" -pattern <s> set default pattern modifier fields\n"); printf(" -pattern <s> set default pattern modifier fields\n");
printf(" -subject <s> set default subject modifier fields\n"); printf(" -subject <s> set default subject modifier fields\n");
printf(" -S <n> set stack size to <n> megabytes\n"); printf(" -S <n> set stack size to <n> mebibytes\n");
printf(" -t [<n>] time compilation and execution, repeating <n> times\n"); printf(" -t [<n>] time compilation and execution, repeating <n> times\n");
printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n"); printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n");
printf(" -T same as -t, but show total times at the end\n"); printf(" -T same as -t, but show total times at the end\n");