Typos in documentation and comments noted by Jason Hood.
parent fa58ac6734
commit fabea723cf
@@ -146,7 +146,7 @@ SET(PCRE2_PARENS_NEST_LIMIT "250" CACHE STRING
 "Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.")
 
 SET(PCRE2_HEAP_LIMIT "20000000" CACHE STRING
-"Default limit on heap memory (kilobytes). See HEAP_LIMIT in config.h.in for details.")
+"Default limit on heap memory (kibibytes). See HEAP_LIMIT in config.h.in for details.")
 
 SET(PCRE2_MATCH_LIMIT "10000000" CACHE STRING
 "Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.")
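Reviewer note: the change from "kilobytes" to "kibibytes" is more than cosmetic, because the two units differ by 2.4%. The sketch below shows the arithmetic for the default value quoted in the hunk above. The helper names are hypothetical and are not part of PCRE2; they only illustrate the unit conversion, not how the library applies the limit internally.

```c
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): convert a heap limit given in
   kibibytes, the unit the corrected text names, into a byte count. */
static uint64_t heap_limit_kib_to_bytes(uint32_t kib)
{
    return (uint64_t)kib * 1024u;   /* 1 KiB = 1024 bytes */
}

/* The same number misread as decimal kilobytes, for comparison only. */
static uint64_t heap_limit_kb_to_bytes(uint32_t kb)
{
    return (uint64_t)kb * 1000u;    /* 1 kB = 1000 bytes */
}
```

With the default of 20000000, the first helper gives 20480000000 bytes and the second gives 20000000000 bytes, so the old wording understated the default by 480000000 bytes.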
@@ -17,7 +17,7 @@ groups altogether. Now it shows those that come before any actual captures as
 3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
 whatever the build configuration was. It now correctly says "\R matches all
 Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
-specified. Similarly, running "pcfre2test -C bsr" never produced the result
+specified. Similarly, running "pcre2test -C bsr" never produced the result
 ANY.
 
 4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
@@ -370,7 +370,7 @@ tests to improve coverage.
 31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
 pcre2test, a crash could occur.
 
-32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that
+32. Make -bigstack in RunTest allocate a 64MB stack (instead of 16 MB) so that
 all the tests can run with clang's sanitizing options.
 
 33. Implement extra compile options in the compile context and add the first
HACKING (4 lines changed)
@@ -348,7 +348,7 @@ The /i, /m, or /s options (PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and
 others) may be changed in the middle of patterns by items such as (?i). Their
 processing is handled entirely at compile time by generating different opcodes
 for the different settings. The runtime functions do not need to keep track of
-an options state.
+an option's state.
 
 PCRE2_DUPNAMES, PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE
 are tracked and processed during the parsing pre-pass. The others are handled
@@ -764,7 +764,7 @@ OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
 bracket from the start of the whole pattern. OP_RECURSE is also used for
 "subroutine" calls, even though they are not strictly a recursion. Up till
 release 10.30 recursions were treated as atomic groups, making them
-incompatible with Perl (but PCRE had then well before Perl did). From 10.30,
+incompatible with Perl (but PCRE had them well before Perl did). From 10.30,
 backtracking into recursions is supported.
 
 Repeated recursions used to be wrapped inside OP_ONCE brackets, which not only
NEWS (4 lines changed)
@@ -31,7 +31,7 @@ remembering backtracking positions. This makes --disable-stack-for-recursion a
 NOOP. The new implementation allows backtracking into recursive group calls in
 patterns, making it more compatible with Perl, and also fixes some other
 previously hard-to-do issues. For patterns that have a lot of backtracking, the
-heap is now used, and there is explicit limit on the amount, settable by
+heap is now used, and there is an explicit limit on the amount, settable by
 pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
 but is renamed as "depth limit" (though the old names remain for
 compatibility).
@@ -53,7 +53,7 @@ also supported.
 
 5. Additional compile options in the compile context are now available, and the
 first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
-PCRE2_EXTRA_BAD_ESCAPE_IS LITERAL.
+PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
 
 6. The newline type PCRE2_NEWLINE_NUL is now available.
 
@@ -127,7 +127,7 @@ can skip ahead to the CMake section.
 src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
 these yourself.
 
-Not also that the pcre2_fuzzsupport.c file contains special code that is
+Note also that the pcre2_fuzzsupport.c file contains special code that is
 useful to those who want to run fuzzing tests on the PCRE2 library. Unless
 you are doing that, you can ignore it.
 
@@ -186,7 +186,7 @@ can skip ahead to the CMake section.
 
 STACK SIZE IN WINDOWS ENVIRONMENTS
 
-Prior to release 10.30 the default system stack size of 1Mb in some Windows
+Prior to release 10.30 the default system stack size of 1MB in some Windows
 environments caused issues with some tests. This should no longer be the case
 for 10.30 and later releases.
 
README (17 lines changed)
@@ -257,9 +257,10 @@ library. They are also documented in the pcre2build man page.
 
 --with-heap-limit=500
 
-The units are kilobytes. This limit does not apply when the JIT optimization
-(which has its own memory control features) is used. There is more discussion
-on the pcre2api man page (search for pcre2_set_heap_limit).
+The units are kibibytes (units of 1024 bytes). This limit does not apply when
+the JIT optimization (which has its own memory control features) is used.
+There is more discussion on the pcre2api man page (search for
+pcre2_set_heap_limit).
 
 . In the 8-bit library, the default maximum compiled pattern size is around
 64K bytes. You can increase this by adding --with-link-size=3 to the
@@ -319,10 +320,10 @@ library. They are also documented in the pcre2build man page.
 . When JIT support is enabled, pcre2grep automatically makes use of it, unless
 you add --disable-pcre2grep-jit to the "configure" command.
 
-. On non-Windows sytems there is support for calling external scripts during
-matching in the pcre2grep command via PCRE2's callout facility with string
-arguments. This support can be disabled by adding --disable-pcre2grep-callout
-to the "configure" command.
+. There is support for calling external programs during matching in the
+pcre2grep command, using PCRE2's callout facility with string arguments. This
+support can be disabled by adding --disable-pcre2grep-callout to the
+"configure" command.
 
 . The pcre2grep program currently supports only 8-bit data files, and so
 requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
@@ -887,4 +888,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 27 April 2018
+Last updated: 17 June 2018
@@ -708,7 +708,7 @@ $valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >
 printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep
 $valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep
 
-# It seems inpossible to handle NUL characters easily in Solaris (aka SunOS).
+# It seems impossible to handle NUL characters easily in Solaris (aka SunOS).
 # The version of sed explicitly doesn't like them. For the moment, we just
 # don't run this test under SunOS. Fudge the output so that the comparison
 # works. A similar problem has also been reported for MacOS (Darwin).
RunTest (2 lines changed)
@@ -843,7 +843,7 @@ for bmode in "$test8" "$test16" "$test32"; do
 checkresult $? 24 ""
 fi
 
-# UTF pattern converson tests
+# UTF pattern conversion tests
 
 if [ "$do25" = yes ] ; then
 echo $title25
@@ -288,7 +288,7 @@ AC_ARG_WITH(parens-nest-limit,
 # Handle --with-heap-limit
 AC_ARG_WITH(heap-limit,
 AS_HELP_STRING([--with-heap-limit=N],
-[default limit on heap memory (kilobytes, default=20000000)]),
+[default limit on heap memory (kibibytes, default=20000000)]),
 , with_heap_limit=20000000)
 
 # Handle --with-match-limit=N
@@ -754,7 +754,7 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
 AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
 This limits the amount of memory that may be used while matching
 a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
-not apply to JIT matching. The value is in kilobytes.])
+not apply to JIT matching. The value is in kibibytes (units of 1024 bytes).])
 
 AC_DEFINE([MAX_NAME_SIZE], [32], [
 This limit is parameterized just in case anybody ever wants to
@@ -1017,7 +1017,7 @@ $PACKAGE-$VERSION configuration summary:
 Rebuild char tables ................ : ${enable_rebuild_chartables}
 Internal link size ................. : ${with_link_size}
 Nested parentheses limit ........... : ${with_parens_nest_limit}
-Heap limit ......................... : ${with_heap_limit} kilobytes
+Heap limit ......................... : ${with_heap_limit} kibibytes
 Match limit ........................ : ${with_match_limit}
 Match depth limit .................. : ${with_match_limit_depth}
 Build shared libs .................. : ${enable_shared}
@@ -127,7 +127,7 @@ can skip ahead to the CMake section.
 src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
 these yourself.
 
-Not also that the pcre2_fuzzsupport.c file contains special code that is
+Note also that the pcre2_fuzzsupport.c file contains special code that is
 useful to those who want to run fuzzing tests on the PCRE2 library. Unless
 you are doing that, you can ignore it.
 
@@ -186,7 +186,7 @@ can skip ahead to the CMake section.
 
 STACK SIZE IN WINDOWS ENVIRONMENTS
 
-Prior to release 10.30 the default system stack size of 1Mb in some Windows
+Prior to release 10.30 the default system stack size of 1MB in some Windows
 environments caused issues with some tests. This should no longer be the case
 for 10.30 and later releases.
 
@@ -257,9 +257,10 @@ library. They are also documented in the pcre2build man page.
 
 --with-heap-limit=500
 
-The units are kilobytes. This limit does not apply when the JIT optimization
-(which has its own memory control features) is used. There is more discussion
-on the pcre2api man page (search for pcre2_set_heap_limit).
+The units are kibibytes (units of 1024 bytes). This limit does not apply when
+the JIT optimization (which has its own memory control features) is used.
+There is more discussion on the pcre2api man page (search for
+pcre2_set_heap_limit).
 
 . In the 8-bit library, the default maximum compiled pattern size is around
 64K bytes. You can increase this by adding --with-link-size=3 to the
@@ -319,10 +320,10 @@ library. They are also documented in the pcre2build man page.
 . When JIT support is enabled, pcre2grep automatically makes use of it, unless
 you add --disable-pcre2grep-jit to the "configure" command.
 
-. On non-Windows sytems there is support for calling external scripts during
-matching in the pcre2grep command via PCRE2's callout facility with string
-arguments. This support can be disabled by adding --disable-pcre2grep-callout
-to the "configure" command.
+. There is support for calling external programs during matching in the
+pcre2grep command, using PCRE2's callout facility with string arguments. This
+support can be disabled by adding --disable-pcre2grep-callout to the
+"configure" command.
 
 . The pcre2grep program currently supports only 8-bit data files, and so
 requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
@@ -887,4 +888,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 27 April 2018
+Last updated: 17 June 2018
@@ -65,7 +65,7 @@ The option bits are:
 PCRE2_EXTENDED Ignore white space and # comments
 PCRE2_FIRSTLINE Force matching to be before newline
 PCRE2_LITERAL Pattern characters are all literal
-PCRE2_MATCH_UNSET_BACKREF Match unset back references
+PCRE2_MATCH_UNSET_BACKREF Match unset backreferences
 PCRE2_MULTILINE ^ and $ match newlines within data
 PCRE2_NEVER_BACKSLASH_C Lock out the use of \C in patterns
 PCRE2_NEVER_UCP Lock out PCRE2_UCP, e.g. via (*UCP)
@@ -36,7 +36,7 @@ request are as follows:
 <pre>
 PCRE2_INFO_ALLOPTIONS Final options after compiling
 PCRE2_INFO_ARGOPTIONS Options passed to <b>pcre2_compile()</b>
-PCRE2_INFO_BACKREFMAX Number of highest back reference
+PCRE2_INFO_BACKREFMAX Number of highest backreference
 PCRE2_INFO_BSR What \R matches:
 PCRE2_BSR_UNICODE: Unicode line endings
 PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
@@ -28,7 +28,7 @@ DESCRIPTION
 <P>
 This function is part of an experimental set of pattern conversion functions.
 It sets the component separator character that is used when converting globs.
-The second argument must one of the characters forward slash, backslash, or
+The second argument must be one of the characters forward slash, backslash, or
 dot. The default is backslash when running under Windows, otherwise forward
 slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
 the second argument is invalid.
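Reviewer note: the rule stated in the hunk above (three legal separator characters, a platform-dependent default, zero on success and an error code otherwise) can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the names sketch_default_separator, sketch_set_glob_separator and SKETCH_ERROR_BADDATA are hypothetical, the error value is a placeholder for PCRE2_ERROR_BADDATA rather than its real value, and using _WIN32 to detect Windows is an assumption about the build environment.

```c
#include <stdint.h>

/* Hypothetical placeholder for PCRE2_ERROR_BADDATA; the real constant is
   defined in pcre2.h and its value is not reproduced here. */
#define SKETCH_ERROR_BADDATA (-1)

/* Default separator as the text describes it: backslash when running under
   Windows, forward slash otherwise. */
static uint32_t sketch_default_separator(void)
{
#ifdef _WIN32
    return '\\';
#else
    return '/';
#endif
}

/* Accept only forward slash, backslash, or dot. On success the stored
   separator is updated and zero is returned; otherwise it is left untouched
   and the error code is returned. */
static int sketch_set_glob_separator(uint32_t *stored, uint32_t separator)
{
    if (separator != '/' && separator != '\\' && separator != '.')
        return SKETCH_ERROR_BADDATA;
    *stored = separator;
    return 0;
}
```

A caller would start from sketch_default_separator() and override it only when a different convention is wanted; an invalid character such as ':' is rejected without changing the stored value.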
@@ -562,10 +562,10 @@ U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
 <P>
 Each of the first three conventions is used by at least one operating system as
 its standard newline sequence. When PCRE2 is built, a default can be specified.
-The default default is LF, which is the Unix standard. However, the newline
-convention can be changed by an application when calling <b>pcre2_compile()</b>,
-or it can be specified by special text at the start of the pattern itself; this
-overrides any other settings. See the
+If it is not, the default is set to LF, which is the Unix standard. However,
+the newline convention can be changed by an application when calling
+<b>pcre2_compile()</b>, or it can be specified by special text at the start of
+the pattern itself; this overrides any other settings. See the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 page for details of the special character sequences.
 </P>
@@ -949,17 +949,18 @@ offset limit. In other words, whichever limit comes first is used.
 <b> uint32_t <i>value</i>);</b>
 <br>
 <br>
-The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
-amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
-information when running an interpretive match. This limit also applies to
-<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a
-lot of nested pattern recursion or lookarounds or atomic groups. This limit
-does not apply to matching with the JIT optimization, which has its own memory
-control arrangements (see the
+The <i>heap_limit</i> parameter specifies, in units of kibibytes (1024 bytes),
+the maximum amount of heap memory that <b>pcre2_match()</b> may use to hold
+backtracking information when running an interpretive match. This limit also
+applies to <b>pcre2_dfa_match()</b>, which may use the heap when processing
+patterns with a lot of nested pattern recursion or lookarounds or atomic
+groups. This limit does not apply to matching with the JIT optimization, which
+has its own memory control arrangements (see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation for more details). If the limit is reached, the negative error
-code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
-built; the default default is very large and is essentially "unlimited".
+code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
+is built; if it is not, the default is set very large and is essentially
+"unlimited".
 </P>
 <P>
 A value for the heap limit may also be supplied by an item at the start of a
@@ -1044,7 +1045,7 @@ The depth limit is not relevant, and is ignored, when matching is done using
 JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
 uses it to limit the depth of nested internal recursive function calls that
 implement atomic groups, lookaround assertions, and pattern recursions. This
-limits, indirectly, the amount of system stack this is used. It was more useful
+limits, indirectly, the amount of system stack that is used. It was more useful
 in versions before 10.32, when stack memory was used for local workspace
 vectors for recursive function calls. From version 10.32, only local variables
 are allocated on the stack and as each call uses only a few hundred bytes, even
@@ -1060,11 +1061,11 @@ probably better to limit heap usage directly by calling
 <b>pcre2_set_heap_limit()</b>.
 </P>
 <P>
-The default value for the depth limit can be set when PCRE2 is built; the
-default default is the same value as the default for the match limit. If the
-limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
-PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
-item at the start of a pattern of the form
+The default value for the depth limit can be set when PCRE2 is built; if it is
+not, the default is set to the same value as the default for the match limit.
+If the limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
+returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
+supplied by an item at the start of a pattern of the form
 <pre>
 (*LIMIT_DEPTH=ddd)
 </pre>
@@ -1120,7 +1121,7 @@ given with <b>pcre2_set_depth_limit()</b> above.
 <pre>
 PCRE2_CONFIG_HEAPLIMIT
 </pre>
-The output is a uint32_t integer that gives, in kilobytes, the default limit
+The output is a uint32_t integer that gives, in kibibytes, the default limit
 for the amount of heap memory used by <b>pcre2_match()</b> or
 <b>pcre2_dfa_match()</b>. Further details are given with
 <b>pcre2_set_heap_limit()</b> above.
@@ -1431,7 +1432,7 @@ If this bit is set, letters in the pattern match both upper and lower case
 letters in the subject. It is equivalent to Perl's /i option, and it can be
 changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
 properties are used for all characters with more than one other case, and for
-all characters whose code points are greater than U+007f. For lower valued
+all characters whose code points are greater than U+007F. For lower valued
 characters with only one other case, a lookup table is used for speed. When
 PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
 and higher code points (available only in 16-bit or 32-bit mode) are treated as
@@ -1551,7 +1552,7 @@ error.
 <pre>
 PCRE2_MATCH_UNSET_BACKREF
 </pre>
-If this option is set, a back reference to an unset subpattern group matches an
+If this option is set, a backreference to an unset subpattern group matches an
 empty string (by default this causes the current matching alternative to fail).
 A pattern such as (\1)(a) succeeds when this option is set (assuming it can
 find an "a" in the subject), whereas it fails by default, for Perl
@@ -1613,8 +1614,8 @@ If this option is set, it disables the use of numbered capturing parentheses in
 the pattern. Any opening parenthesis that is not followed by ? behaves as if it
 were followed by ?: but named parentheses can still be used for capturing (and
 they acquire numbers in the usual way). This is the same as Perl's /n option.
-Note that, when this option is set, references to capturing groups (back
-references or recursion/subroutine calls) may only refer to named groups,
+Note that, when this option is set, references to capturing groups
+(backreferences or recursion/subroutine calls) may only refer to named groups,
 though the reference can be by name or by number.
 <pre>
 PCRE2_NO_AUTO_POSSESS
@@ -1633,7 +1634,7 @@ If this option is set, it disables an optimization that is applied when .* is
 the first significant item in a top-level branch of a pattern, and all the
 other branches also start with .* or with \A or \G or ^. The optimization is
 automatically disabled for .* if it is inside an atomic group or a capturing
-group that is the subject of a back reference, or if the pattern contains
+group that is the subject of a backreference, or if the pattern contains
 (*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
 automatically anchored if PCRE2_DOTALL is set for all the .* items and
 PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
@ -1999,7 +2000,7 @@ When .* is the first significant item, anchoring is possible only when all the
|
||||||
following are true:
|
following are true:
|
||||||
<pre>
|
<pre>
|
||||||
.* is not in an atomic group
|
.* is not in an atomic group
|
||||||
.* is not in a capturing group that is the subject of a back reference
|
.* is not in a capturing group that is the subject of a backreference
|
||||||
PCRE2_DOTALL is in force for .*
|
PCRE2_DOTALL is in force for .*
|
||||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||||
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
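The five conditions listed in this hunk can be read as a single conjunction. The following is a minimal Python sketch of that reading, not PCRE2's implementation; the `DotStar` record, its field names, and `can_auto_anchor` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DotStar:
    """Hypothetical description of a leading .* item (not a PCRE2 type)."""
    in_atomic_group: bool = False
    in_backreferenced_group: bool = False
    dotall_in_force: bool = True


def can_auto_anchor(item: DotStar, has_prune_or_skip: bool,
                    no_dotstar_anchor: bool) -> bool:
    """Return True only when all five documented conditions hold."""
    return (not item.in_atomic_group
            and not item.in_backreferenced_group
            and item.dotall_in_force
            and not has_prune_or_skip
            and not no_dotstar_anchor)
```

Failing any one condition is enough to prevent anchoring, which is why the sketch uses a plain `and` chain rather than counting conditions.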
||||||
|
@ -2009,20 +2010,20 @@ options returned for PCRE2_INFO_ALLOPTIONS.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_BACKREFMAX
|
PCRE2_INFO_BACKREFMAX
|
||||||
</pre>
|
</pre>
|
||||||
Return the number of the highest back reference in the pattern. The third
|
Return the number of the highest backreference in the pattern. The third
|
||||||
argument should point to an <b>uint32_t</b> variable. Named subpatterns acquire
|
argument should point to an <b>uint32_t</b> variable. Named subpatterns acquire
|
||||||
numbers as well as names, and these count towards the highest back reference.
|
numbers as well as names, and these count towards the highest backreference.
|
||||||
Back references such as \4 or \g{12} match the captured characters of the
|
Backreferences such as \4 or \g{12} match the captured characters of the
|
||||||
given group, but in addition, the check that a capturing group is set in a
|
given group, but in addition, the check that a capturing group is set in a
|
||||||
conditional subpattern such as (?(3)a|b) is also a back reference. Zero is
|
conditional subpattern such as (?(3)a|b) is also a backreference. Zero is
|
||||||
returned if there are no back references.
|
returned if there are no backreferences.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_BSR
|
PCRE2_INFO_BSR
|
||||||
</pre>
|
</pre>
|
||||||
The output is a uint32_t whose value indicates what character sequences the \R
|
The output is a uint32_t integer whose value indicates what character sequences
|
||||||
escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R matches
|
the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R
|
||||||
any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \R
|
matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
|
||||||
matches only CR, LF, or CRLF.
|
that \R matches only CR, LF, or CRLF.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_CAPTURECOUNT
|
PCRE2_INFO_CAPTURECOUNT
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2034,10 +2035,10 @@ The third argument should point to an <b>uint32_t</b> variable.
|
||||||
</pre>
|
</pre>
|
||||||
If the pattern set a backtracking depth limit by including an item of the form
|
If the pattern set a backtracking depth limit by including an item of the form
|
||||||
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
|
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
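The rule in the paragraph above (an in-pattern limit is used only if it is less than the caller's limit) amounts to taking a minimum. A small Python sketch, assuming a hypothetical helper name `effective_limit` and using `None` to stand for the case where the pattern sets no limit:

```python
from typing import Optional


def effective_limit(caller_limit: int, pattern_limit: Optional[int]) -> int:
    """Combine the caller's limit with an optional in-pattern (*LIMIT_...=nnnn).

    pattern_limit is None when the pattern sets no limit, which is the case
    where pcre2_pattern_info() reports PCRE2_ERROR_UNSET.
    """
    if pattern_limit is None:
        return caller_limit
    # The in-pattern value is used only if it is less than the caller's.
    return min(caller_limit, pattern_limit)
```

The same reading applies to the heap and match limits described in the later hunks, since their paragraphs use identical wording.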
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_FIRSTBITMAP
|
PCRE2_INFO_FIRSTBITMAP
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2047,7 +2048,7 @@ values for the first code unit in any match. For example, a pattern that starts
|
||||||
with [abc] results in a table with three bits set. When code unit values
|
with [abc] results in a table with three bits set. When code unit values
|
||||||
greater than 255 are supported, the flag bit for 255 means "any code unit of
|
greater than 255 are supported, the flag bit for 255 means "any code unit of
|
||||||
value 255 or above". If such a table was constructed, a pointer to it is
|
value 255 or above". If such a table was constructed, a pointer to it is
|
||||||
returned. Otherwise NULL is returned. The third argument should point to an
|
returned. Otherwise NULL is returned. The third argument should point to a
|
||||||
<b>const uint8_t *</b> variable.
|
<b>const uint8_t *</b> variable.
|
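To illustrate how a 256-bit first-code-unit table can be consulted, here is a hedged Python sketch. The text only says that the table has one flag bit per code unit value and that bit 255 covers "255 or above"; the byte and bit layout used below (bit `c % 8` of byte `c // 8`) and both function names are assumptions made for the example.

```python
def make_bitmap(code_units):
    """Build a 256-bit table (32 bytes) with one flag bit per code unit."""
    table = bytearray(32)
    for c in code_units:
        c = min(c, 255)  # bit 255 stands for "255 or above"
        table[c // 8] |= 1 << (c % 8)  # assumed layout, see lead-in
    return bytes(table)


def may_start_match(table, code_unit):
    """Return True if the table allows this code unit to start a match."""
    c = min(code_unit, 255)
    return bool(table[c // 8] & (1 << (c % 8)))
```

For a pattern starting with [abc], `make_bitmap(b"abc")` sets exactly three bits, matching the description above.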
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_FIRSTCODETYPE
|
PCRE2_INFO_FIRSTCODETYPE
|
||||||
|
@ -2074,7 +2075,7 @@ and up to 0xffffffff when not using UTF-32 mode.
|
||||||
</pre>
|
</pre>
|
||||||
Return the size (in bytes) of the data frames that are used to remember
|
Return the size (in bytes) of the data frames that are used to remember
|
||||||
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
|
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
|
||||||
without the use of JIT. The third argument should point to an <b>size_t</b>
|
without the use of JIT. The third argument should point to a <b>size_t</b>
|
||||||
variable. The frame size depends on the number of capturing parentheses in the
|
variable. The frame size depends on the number of capturing parentheses in the
|
||||||
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -2094,10 +2095,10 @@ the equivalent hexadecimal or octal escape sequences.
|
||||||
</pre>
|
</pre>
|
||||||
If the pattern set a heap memory limit by including an item of the form
|
If the pattern set a heap memory limit by including an item of the form
|
||||||
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
|
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_JCHANGED
|
PCRE2_INFO_JCHANGED
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2141,15 +2142,15 @@ in such cases.
|
||||||
</pre>
|
</pre>
|
||||||
If the pattern set a match limit by including an item of the form
|
If the pattern set a match limit by including an item of the form
|
||||||
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
|
<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_MAXLOOKBEHIND
|
PCRE2_INFO_MAXLOOKBEHIND
|
||||||
</pre>
|
</pre>
|
||||||
Return the number of characters (not code units) in the longest lookbehind
|
Return the number of characters (not code units) in the longest lookbehind
|
||||||
assertion in the pattern. The third argument should point to an unsigned 32-bit
|
assertion in the pattern. The third argument should point to a uint32_t
|
||||||
integer. This information is useful when doing multi-segment matching using the
|
integer. This information is useful when doing multi-segment matching using the
|
||||||
partial matching facilities. Note that the simple assertions \b and \B
|
partial matching facilities. Note that the simple assertions \b and \B
|
||||||
require a one-character lookbehind. \A also registers a one-character
|
require a one-character lookbehind. \A also registers a one-character
|
||||||
|
@ -2417,7 +2418,7 @@ zero, the search for a match starts at the beginning of the subject, and this
|
||||||
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
|
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
|
||||||
must point to the start of a character, or to the end of the subject (in UTF-32
|
must point to the start of a character, or to the end of the subject (in UTF-32
|
||||||
mode, one code unit equals one character, so all offsets are valid). Like the
|
mode, one code unit equals one character, so all offsets are valid). Like the
|
||||||
pattern string, the subject may contain binary zeroes.
|
pattern string, the subject may contain binary zeros.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A non-zero starting offset is useful when searching for another match in the
|
A non-zero starting offset is useful when searching for another match in the
|
||||||
|
@ -3559,12 +3560,12 @@ There are in addition the following errors that are specific to
|
||||||
</pre>
|
</pre>
|
||||||
This return is given if <b>pcre2_dfa_match()</b> encounters an item in the
|
This return is given if <b>pcre2_dfa_match()</b> encounters an item in the
|
||||||
pattern that it does not support, for instance, the use of \C in a UTF mode or
|
pattern that it does not support, for instance, the use of \C in a UTF mode or
|
||||||
a back reference.
|
a backreference.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_DFA_UCOND
|
PCRE2_ERROR_DFA_UCOND
|
||||||
</pre>
|
</pre>
|
||||||
This return is given if <b>pcre2_dfa_match()</b> encounters a condition item
|
This return is given if <b>pcre2_dfa_match()</b> encounters a condition item
|
||||||
that uses a back reference for the condition, or a test for recursion in a
|
that uses a backreference for the condition, or a test for recursion in a
|
||||||
specific group. These are not supported.
|
specific group. These are not supported.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_DFA_WSSIZE
|
PCRE2_ERROR_DFA_WSSIZE
|
||||||
|
|
|
@ -227,7 +227,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
|
||||||
<pre>
|
<pre>
|
||||||
--enable-newline-is-nul
|
--enable-newline-is-nul
|
||||||
</pre>
|
</pre>
|
||||||
which causes NUL (binary zero) is set as the default line-ending character.
|
which causes NUL (binary zero) to be set as the default line-ending character.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Whatever default line ending convention is selected when PCRE2 is built can be
|
Whatever default line ending convention is selected when PCRE2 is built can be
|
||||||
|
@ -286,15 +286,15 @@ The <b>pcre2_match()</b> function starts out using a 20K vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
are (that is, the deeper the search tree), the more memory is needed. If the
|
are (that is, the deeper the search tree), the more memory is needed. If the
|
||||||
initial vector is not large enough, heap memory is used, up to a certain limit,
|
initial vector is not large enough, heap memory is used, up to a certain limit,
|
||||||
which is specified in kilobytes. The limit can be changed at run time, as
|
which is specified in kibibytes (units of 1024 bytes). The limit can be changed
|
||||||
described in the
|
at run time, as described in the
|
||||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
documentation. The default limit (in effect unlimited) is 20 million. You can
|
documentation. The default limit (in effect unlimited) is 20 million. You can
|
||||||
change this by a setting such as
|
change this by a setting such as
|
||||||
<pre>
|
<pre>
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
</pre>
|
</pre>
|
||||||
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
which limits the amount of heap to 500 KiB. This limit applies only to
|
||||||
interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
|
interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
|
||||||
may also use the heap for internal workspace when processing complicated
|
may also use the heap for internal workspace when processing complicated
|
||||||
patterns. This limit does not apply when JIT (which has its own memory
|
patterns. This limit does not apply when JIT (which has its own memory
|
||||||
|
@ -542,7 +542,7 @@ generated from the string.
|
||||||
Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
|
Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
|
||||||
to be created. This is normally run under valgrind or used when PCRE2 is
|
to be created. This is normally run under valgrind or used when PCRE2 is
|
||||||
compiled with address sanitizing enabled. It calls the fuzzing function and
|
compiled with address sanitizing enabled. It calls the fuzzing function and
|
||||||
outputs information about it is doing. The input strings are specified by
|
outputs information about what it is doing. The input strings are specified by
|
||||||
arguments: if an argument starts with "=" the rest of it is a literal input
|
arguments: if an argument starts with "=" the rest of it is a literal input
|
||||||
string. Otherwise, it is assumed to be a file name, and the contents of the
|
string. Otherwise, it is assumed to be a file name, and the contents of the
|
||||||
file are the test string.
|
file are the test string.
|
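The argument convention described above can be sketched as follows. This is an illustrative Python model, not the code of <b>pcre2fuzzcheck</b>; the function name is hypothetical, and encoding the literal string as UTF-8 is an assumption.

```python
from pathlib import Path


def input_from_argument(arg: str) -> bytes:
    """Return the test string selected by one command-line argument."""
    if arg.startswith("="):
        # Everything after the leading "=" is the literal input string.
        return arg[1:].encode("utf-8")
    # Otherwise the argument names a file whose contents are the test string.
    return Path(arg).read_bytes()
```

Only the first "=" is special in this sketch, so an argument such as "==x" yields the literal string "=x".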
||||||
|
|
|
@ -143,7 +143,7 @@ branch, automatic anchoring occurs if all branches are anchorable.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
This optimization is disabled, however, if .* is in an atomic group or if there
|
This optimization is disabled, however, if .* is in an atomic group or if there
|
||||||
is a back reference to the capturing group in which it appears. It is also
|
is a backreference to the capturing group in which it appears. It is also
|
||||||
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
|
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
|
||||||
callouts does not affect it.
|
callouts does not affect it.
|
||||||
</P>
|
</P>
|
||||||
|
|
|
@ -31,7 +31,7 @@ page.
|
||||||
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
||||||
they do not mean what you might think. For example, (?!a){3} does not assert
|
they do not mean what you might think. For example, (?!a){3} does not assert
|
||||||
that the next three characters are not "a". It just asserts that the next
|
that the next three characters are not "a". It just asserts that the next
|
||||||
character is not "a" three times (in principle: PCRE2 optimizes this to run the
|
character is not "a" three times (in principle; PCRE2 optimizes this to run the
|
||||||
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
||||||
for example, \b* (but not \b{3}), but these do not seem to have any use.
|
for example, \b* (but not \b{3}), but these do not seem to have any use.
|
||||||
</P>
|
</P>
|
||||||
|
@ -77,8 +77,8 @@ The \Q...\E sequence is recognized both inside and outside character classes.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
||||||
constructions. However, there is support PCRE2's "callout" feature, which
|
constructions. However, PCRE2 does have a "callout" feature, which allows an
|
||||||
allows an external function to be called during pattern matching. See the
|
external function to be called during pattern matching. See the
|
||||||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||||
documentation for details.
|
documentation for details.
|
||||||
</P>
|
</P>
|
||||||
|
@ -156,7 +156,7 @@ each alternative branch of a lookbehind assertion can match a different length
|
||||||
of string. Perl requires them all to have the same length.
|
of string. Perl requires them all to have the same length.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(b) From PCRE2 10.23, back references to groups of fixed length are supported
|
(b) From PCRE2 10.23, backreferences to groups of fixed length are supported
|
||||||
in lookbehinds, provided that there is no possibility of referencing a
|
in lookbehinds, provided that there is no possibility of referencing a
|
||||||
non-unique number or name. Perl does not support backreferences in lookbehinds.
|
non-unique number or name. Perl does not support backreferences in lookbehinds.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -86,9 +86,10 @@ controlled by parameters that can be set by the <b>--buffer-size</b> and
|
||||||
that is obtained at the start of processing. If an input file contains very
|
that is obtained at the start of processing. If an input file contains very
|
||||||
long lines, a larger buffer may be needed; this is handled by automatically
|
long lines, a larger buffer may be needed; this is handled by automatically
|
||||||
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
||||||
default values for these parameters are specified when <b>pcre2grep</b> is
|
default values for these parameters can be set when <b>pcre2grep</b> is
|
||||||
built, with the default defaults being 20K and 1M respectively. An error occurs
|
built; if nothing is specified, the defaults are set to 20K and 1M
|
||||||
if a line is too long and the buffer can no longer be expanded.
|
respectively. An error occurs if a line is too long and the buffer can no
|
||||||
|
longer be expanded.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The block of memory that is actually used is three times the "buffer size", to
|
The block of memory that is actually used is three times the "buffer size", to
|
||||||
|
@ -500,13 +501,13 @@ short form for this option.
|
||||||
When this option is given, non-compressed input is read and processed line by
|
When this option is given, non-compressed input is read and processed line by
|
||||||
line, and the output is flushed after each write. By default, input is read in
|
line, and the output is flushed after each write. By default, input is read in
|
||||||
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
|
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
|
||||||
terminal (which is currently possible only in Unix-like environments). Output
|
terminal (which is currently possible only in Unix-like environments or
|
||||||
to terminal is normally automatically flushed by the operating system. This
|
Windows). Output to terminal is normally automatically flushed by the operating
|
||||||
option can be useful when the input or output is attached to a pipe and you do
|
system. This option can be useful when the input or output is attached to a
|
||||||
not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
|
pipe and you do not want <b>pcre2grep</b> to buffer up large amounts of data.
|
||||||
will affect performance, and the <b>-M</b> (multiline) option ceases to work.
|
However, its use will affect performance, and the <b>-M</b> (multiline) option
|
||||||
When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
|
ceases to work. When input is from a compressed .gz or .bz2 file,
|
||||||
ignored.
|
<b>--line-buffered</b> is ignored.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>--line-offsets</b>
|
<b>--line-offsets</b>
|
||||||
|
@ -541,11 +542,11 @@ counter that is incremented each time around its main processing loop. If the
|
||||||
value set by <b>--match-limit</b> is reached, an error occurs.
|
value set by <b>--match-limit</b> is reached, an error occurs.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
The <b>--heap-limit</b> option specifies, as a number of kilobytes, the amount
|
The <b>--heap-limit</b> option specifies, as a number of kibibytes (units of
|
||||||
of heap memory that may be used for matching. Heap memory is needed only if
|
1024 bytes), the amount of heap memory that may be used for matching. Heap
|
||||||
matching the pattern requires a significant number of nested backtracking
|
memory is needed only if matching the pattern requires a significant number of
|
||||||
points to be remembered. This parameter can be set to zero to forbid the use of
|
nested backtracking points to be remembered. This parameter can be set to zero
|
||||||
heap memory altogether.
|
to forbid the use of heap memory altogether.
|
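Because the limit is expressed in kibibytes, converting it to bytes is a multiplication by 1024. A short Python sketch of the arithmetic, with a hypothetical helper name:

```python
KIB = 1024  # one kibibyte, in bytes


def heap_limit_bytes(limit_kib: int) -> int:
    """Convert a heap limit given in kibibytes to bytes."""
    if limit_kib < 0:
        raise ValueError("heap limit cannot be negative")
    return limit_kib * KIB
```

With this reading, a limit of 500 corresponds to 512000 bytes, a limit of zero forbids heap use altogether, and the build default of 20000000 corresponds to 20480000000 bytes.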
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
The <b>--depth-limit</b> option limits the depth of nested backtracking points,
|
The <b>--depth-limit</b> option limits the depth of nested backtracking points,
|
||||||
|
@ -556,9 +557,9 @@ limit acts varies from pattern to pattern. This limit is of use only if it is
|
||||||
set smaller than <b>--match-limit</b>.
|
set smaller than <b>--match-limit</b>.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
There are no short forms for these options. The default settings are specified
|
There are no short forms for these options. The default limits can be set
|
||||||
when the PCRE2 library is compiled, with the default defaults being very large
|
when the PCRE2 library is compiled; if they are not specified, the defaults
|
||||||
and so effectively unlimited.
|
are very large and so effectively unlimited.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
\fB--max-buffer-size=<i>number</i>
|
\fB--max-buffer-size=<i>number</i>
|
||||||
|
|
|
@ -54,9 +54,9 @@ There is no limit to the number of parenthesized subpatterns, but there can be
|
||||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||||
order to limit the amount of system stack used at compile time. The default
|
order to limit the amount of system stack used at compile time. The default
|
||||||
limit can be specified when PCRE2 is built; the default default is 250. An
|
limit can be specified when PCRE2 is built; if not, the default is set to 250.
|
||||||
application can change this limit by calling pcre2_set_parens_nest_limit() to
|
An application can change this limit by calling pcre2_set_parens_nest_limit()
|
||||||
set the limit in a compile context.
|
to set the limit in a compile context.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The maximum length of name for a named subpattern is 32 code units, and the
|
The maximum length of name for a named subpattern is 32 code units, and the
|
||||||
|
|
|
@ -85,7 +85,7 @@ ungreedy repetition quantifiers are specified in the pattern.
|
||||||
Because it ends up with a single path through the tree, it is relatively
|
Because it ends up with a single path through the tree, it is relatively
|
||||||
straightforward for this algorithm to keep track of the substrings that are
|
straightforward for this algorithm to keep track of the substrings that are
|
||||||
matched by portions of the pattern in parentheses. This provides support for
|
matched by portions of the pattern in parentheses. This provides support for
|
||||||
capturing parentheses and back references.
|
capturing parentheses and backreferences.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC4" href="#TOC1">THE ALTERNATIVE MATCHING ALGORITHM</a><br>
|
<br><a name="SEC4" href="#TOC1">THE ALTERNATIVE MATCHING ALGORITHM</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -158,7 +158,7 @@ possibilities, and PCRE2's implementation of this algorithm does not attempt to
|
||||||
do this. This means that no captured substrings are available.
|
do this. This means that no captured substrings are available.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
3. Because no substrings are captured, back references within the pattern are
|
3. Because no substrings are captured, backreferences within the pattern are
|
||||||
not supported, and cause errors if encountered.
|
not supported, and cause errors if encountered.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -215,7 +215,7 @@ because it has to search for all possible matches, but is also because it is
|
||||||
less susceptible to optimization.
|
less susceptible to optimization.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
2. Capturing parentheses and back references are not supported.
|
2. Capturing parentheses and backreferences are not supported.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
3. Although atomic groups are supported, their use does not provide the
|
3. Although atomic groups are supported, their use does not provide the
|
||||||
|
|
|
@ -31,7 +31,7 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<li><a name="TOC16" href="#SEC16">NAMED SUBPATTERNS</a>
|
<li><a name="TOC16" href="#SEC16">NAMED SUBPATTERNS</a>
|
||||||
<li><a name="TOC17" href="#SEC17">REPETITION</a>
|
<li><a name="TOC17" href="#SEC17">REPETITION</a>
|
||||||
<li><a name="TOC18" href="#SEC18">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
|
<li><a name="TOC18" href="#SEC18">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
|
||||||
<li><a name="TOC19" href="#SEC19">BACK REFERENCES</a>
|
<li><a name="TOC19" href="#SEC19">BACKREFERENCES</a>
|
||||||
<li><a name="TOC20" href="#SEC20">ASSERTIONS</a>
|
<li><a name="TOC20" href="#SEC20">ASSERTIONS</a>
|
||||||
<li><a name="TOC21" href="#SEC21">CONDITIONAL SUBPATTERNS</a>
|
<li><a name="TOC21" href="#SEC21">CONDITIONAL SUBPATTERNS</a>
|
||||||
<li><a name="TOC22" href="#SEC22">COMMENTS</a>
|
<li><a name="TOC22" href="#SEC22">COMMENTS</a>
|
||||||
|
@ -196,7 +196,7 @@ be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
|
||||||
for it to have any effect. In other words, the pattern writer can lower the
|
for it to have any effect. In other words, the pattern writer can lower the
|
||||||
limits set by the programmer, but not raise them. If there is more than one
|
limits set by the programmer, but not raise them. If there is more than one
|
||||||
setting of one of these limits, the lower value is used. The heap limit is
|
setting of one of these limits, the lower value is used. The heap limit is
|
||||||
specified in kilobytes.
|
specified in kibibytes (units of 1024 bytes).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
||||||
|
@ -342,7 +342,7 @@ In particular, if you want to match a backslash, you write \\.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
In a UTF mode, only ASCII numbers and letters have any special meaning after a
|
In a UTF mode, only ASCII numbers and letters have any special meaning after a
|
||||||
backslash. All other characters (in particular, those whose codepoints are
|
backslash. All other characters (in particular, those whose code points are
|
||||||
greater than 127) are treated as literals.
|
greater than 127) are treated as literals.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -390,7 +390,7 @@ these escapes are as follows:
|
||||||
\r carriage return (hex 0D)
|
\r carriage return (hex 0D)
|
||||||
\t tab (hex 09)
|
\t tab (hex 09)
|
||||||
\0dd character with octal code 0dd
|
\0dd character with octal code 0dd
|
||||||
\ddd character with octal code ddd, or back reference
|
\ddd character with octal code ddd, or backreference
|
||||||
\o{ddd..} character with octal code ddd..
|
\o{ddd..} character with octal code ddd..
|
||||||
\xhh character with hex code hh
|
\xhh character with hex code hh
|
||||||
\x{hhh..} character with hex code hhh.. (default mode)
|
\x{hhh..} character with hex code hhh.. (default mode)
|
||||||
|
@ -438,13 +438,13 @@ follows is itself an octal digit.
|
||||||
The escape \o must be followed by a sequence of octal digits, enclosed in
|
The escape \o must be followed by a sequence of octal digits, enclosed in
|
||||||
braces. An error occurs if this is not the case. This escape is a recent
|
braces. An error occurs if this is not the case. This escape is a recent
|
||||||
addition to Perl; it provides way of specifying character code points as octal
|
addition to Perl; it provides way of specifying character code points as octal
|
||||||
numbers greater than 0777, and it also allows octal numbers and back references
|
numbers greater than 0777, and it also allows octal numbers and backreferences
|
||||||
to be unambiguously specified.
|
to be unambiguously specified.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
For greater clarity and unambiguity, it is best to avoid following \ by a
|
For greater clarity and unambiguity, it is best to avoid following \ by a
|
||||||
digit greater than zero. Instead, use \o{} or \x{} to specify character
|
digit greater than zero. Instead, use \o{} or \x{} to specify character
|
||||||
numbers, and \g{} to specify back references. The following paragraphs
|
numbers, and \g{} to specify backreferences. The following paragraphs
|
||||||
describe the old, ambiguous syntax.
|
describe the old, ambiguous syntax.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -455,7 +455,7 @@ and Perl has changed over time, causing PCRE2 also to change.
|
||||||
Outside a character class, PCRE2 reads the digit and any following digits as a
|
Outside a character class, PCRE2 reads the digit and any following digits as a
|
||||||
decimal number. If the number is less than 10, begins with the digit 8 or 9, or
|
decimal number. If the number is less than 10, begins with the digit 8 or 9, or
|
||||||
if there are at least that many previous capturing left parentheses in the
|
if there are at least that many previous capturing left parentheses in the
|
||||||
expression, the entire sequence is taken as a <i>back reference</i>. A
|
expression, the entire sequence is taken as a <i>backreference</i>. A
|
||||||
description of how this works is given
|
description of how this works is given
|
||||||
<a href="#backreferences">later,</a>
|
<a href="#backreferences">later,</a>
|
||||||
following the discussion of
|
following the discussion of
|
||||||
|
@ -470,13 +470,13 @@ for themselves. For example, outside a character class:
|
||||||
<pre>
|
<pre>
|
||||||
\040 is another way of writing an ASCII space
|
\040 is another way of writing an ASCII space
|
||||||
\40 is the same, provided there are fewer than 40 previous capturing subpatterns
|
\40 is the same, provided there are fewer than 40 previous capturing subpatterns
|
||||||
\7 is always a back reference
|
\7 is always a backreference
|
||||||
\11 might be a back reference, or another way of writing a tab
|
\11 might be a backreference, or another way of writing a tab
|
||||||
\011 is always a tab
|
\011 is always a tab
|
||||||
\0113 is a tab followed by the character "3"
|
\0113 is a tab followed by the character "3"
|
||||||
\113 might be a back reference, otherwise the character with octal code 113
|
\113 might be a backreference, otherwise the character with octal code 113
|
||||||
\377 might be a back reference, otherwise the value 255 (decimal)
|
\377 might be a backreference, otherwise the value 255 (decimal)
|
||||||
\81 is always a back reference .sp
|
\81 is always a backreference .sp
|
||||||
</pre>
|
</pre>
|
||||||
Note that octal values of 100 or greater that are specified using this syntax
|
Note that octal values of 100 or greater that are specified using this syntax
|
||||||
must not be introduced by a leading zero, because no more than three octal
|
must not be introduced by a leading zero, because no more than three octal
|
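The digit-handling rules in this hunk can be sketched as a small classifier. This is an illustrative reading of the documented rules, not PCRE2's actual parser; the function names and the `groups_before` parameter are hypothetical, and limits on the resulting character value (for example 0xff in 8-bit non-UTF mode) are not modelled.

```python
def _read_octal(digits):
    """Consume at most three leading octal digits; return the rest untouched."""
    end = 0
    while end < min(3, len(digits)) and digits[end] in "01234567":
        end += 1
    return ("octal", int(digits[:end], 8), digits[end:])


def classify_backslash_digits(digits, groups_before):
    """Classify the decimal digits that follow a backslash outside a class.

    digits        -- the run of digits after the backslash, e.g. "113"
    groups_before -- number of capturing left parentheses seen so far
                     (hypothetical parameter for this sketch)
    Returns (kind, value, rest); rest holds digits left over as literals.
    """
    if digits[0] == "0":
        # A leading zero always means octal (zero plus up to two more digits).
        return _read_octal(digits)
    number = int(digits)
    if number < 10 or digits[0] in "89" or number <= groups_before:
        return ("backreference", number, "")
    # Otherwise fall back to reading up to three octal digits.
    return _read_octal(digits)
```

Under this reading, `classify_backslash_digits("11", 3)` is treated as a tab (octal 11), while the same digits with eleven or more preceding groups are treated as a backreference.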
||||||
|
@ -512,10 +512,10 @@ limited to certain values, as follows:
|
||||||
8-bit non-UTF mode no greater than 0xff
|
8-bit non-UTF mode no greater than 0xff
|
||||||
16-bit non-UTF mode no greater than 0xffff
|
16-bit non-UTF mode no greater than 0xffff
|
||||||
32-bit non-UTF mode no greater than 0xffffffff
|
32-bit non-UTF mode no greater than 0xffffffff
|
||||||
All UTF modes no greater than 0x10ffff and a valid codepoint
|
All UTF modes no greater than 0x10ffff and a valid code point
|
||||||
</pre>
|
</pre>
|
||||||
Invalid Unicode codepoints are all those in the range 0xd800 to 0xdfff (the
|
Invalid Unicode code points are all those in the range 0xd800 to 0xdfff (the
|
||||||
so-called "surrogate" codepoints). The check for these can be disabled by the
|
so-called "surrogate" code points). The check for these can be disabled by the
|
||||||
caller of <b>pcre2_compile()</b> by setting the option
|
caller of <b>pcre2_compile()</b> by setting the option
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
||||||
</P>
|
</P>
|
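The limits in the table above can be expressed as a short check. This is only a sketch of the documented ranges: the function name and parameters are hypothetical, and `allow_surrogates` merely stands in for the effect of PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.

```python
UNICODE_MAX = 0x10FFFF
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF

# Largest value per code unit width in non-UTF mode, as listed in the table.
NON_UTF_MAX = {8: 0xFF, 16: 0xFFFF, 32: 0xFFFFFFFF}


def is_allowed_value(value, code_unit_bits, utf, allow_surrogates=False):
    """Return True if an escaped character value is within the documented limit.

    code_unit_bits   -- 8, 16 or 32 (which library width is in use)
    utf              -- True when a UTF mode is active
    allow_surrogates -- hypothetical flag modelling the surrogate-check opt-out
    """
    if value < 0:
        return False
    if not utf:
        return value <= NON_UTF_MAX[code_unit_bits]
    if value > UNICODE_MAX:
        return False
    is_surrogate = SURROGATE_FIRST <= value <= SURROGATE_LAST
    return allow_surrogates or not is_surrogate
```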
||||||
|
@ -544,12 +544,12 @@ is set, \U matches a "U" character, and \u can be used to define a character
|
||||||
by code point, as described above.
|
by code point, as described above.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Absolute and relative back references
|
Absolute and relative backreferences
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The sequence \g followed by a signed or unsigned number, optionally enclosed
|
The sequence \g followed by a signed or unsigned number, optionally enclosed
|
||||||
in braces, is an absolute or relative back reference. A named back reference
|
in braces, is an absolute or relative backreference. A named backreference
|
||||||
can be coded as \g{name}. Back references are discussed
|
can be coded as \g{name}. backreferences are discussed
|
||||||
<a href="#backreferences">later,</a>
|
<a href="#backreferences">later,</a>
|
||||||
following the discussion of
|
following the discussion of
|
||||||
<a href="#subpattern">parenthesized subpatterns.</a>
|
<a href="#subpattern">parenthesized subpatterns.</a>
|
||||||
|
@ -563,7 +563,7 @@ a number enclosed either in angle brackets or single quotes, is an alternative
|
||||||
syntax for referencing a subpattern as a "subroutine". Details are discussed
|
syntax for referencing a subpattern as a "subroutine". Details are discussed
|
||||||
<a href="#onigurumasubroutines">later.</a>
|
<a href="#onigurumasubroutines">later.</a>
|
||||||
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i>
|
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i>
|
||||||
synonymous. The former is a back reference; the latter is a
|
synonymous. The former is a backreference; the latter is a
|
||||||
<a href="#subpatternsassubroutines">subroutine</a>
|
<a href="#subpatternsassubroutines">subroutine</a>
|
||||||
call.
|
call.
|
||||||
<a name="genericchartypes"></a></P>
|
<a name="genericchartypes"></a></P>
|
||||||
|
@ -694,7 +694,7 @@ line, U+0085). Because this is an atomic group, the two-character sequence is
|
||||||
treated as a single unit that cannot be split.
|
treated as a single unit that cannot be split.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
In other modes, two additional characters whose codepoints are greater than 255
|
In other modes, two additional characters whose code points are greater than 255
|
||||||
are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
|
are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
|
||||||
Unicode support is not needed for these characters to be recognized.
|
Unicode support is not needed for these characters to be recognized.
|
||||||
</P>
|
</P>
|
||||||
|
@ -729,8 +729,8 @@ Unicode character properties
|
||||||
When PCRE2 is built with Unicode support (the default), three additional escape
|
When PCRE2 is built with Unicode support (the default), three additional escape
|
||||||
sequences that match characters with specific properties are available. In
|
sequences that match characters with specific properties are available. In
|
||||||
8-bit non-UTF-8 mode, these sequences are of course limited to testing
|
8-bit non-UTF-8 mode, these sequences are of course limited to testing
|
||||||
characters whose codepoints are less than 256, but they do work in this mode.
|
characters whose code points are less than 256, but they do work in this mode.
|
||||||
In 32-bit non-UTF mode, codepoints greater than 0x10ffff (the Unicode limit)
|
In 32-bit non-UTF mode, code points greater than 0x10ffff (the Unicode limit)
|
||||||
may be encountered. These are all treated as being in the Common script and
|
may be encountered. These are all treated as being in the Common script and
|
||||||
with an unassigned type. The extra escape sequences are:
|
with an unassigned type. The extra escape sequences are:
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -1037,7 +1037,7 @@ joiner" characters. Characters with the "mark" property always have the
|
||||||
modifier). Extending characters are allowed before the modifier.
|
modifier). Extending characters are allowed before the modifier.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
7. Do not break within emoji zwj sequences (zero-width jointer followed by
|
7. Do not break within emoji zwj sequences (zero-width joiner followed by
|
||||||
"glue after ZWJ" or "base glue after ZWJ").
|
"glue after ZWJ" or "base glue after ZWJ").
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -1731,7 +1731,7 @@ numbers underneath show in which buffer the captured content will be stored.
|
||||||
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
|
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
|
||||||
# 1 2 2 3 2 3 4
|
# 1 2 2 3 2 3 4
|
||||||
</pre>
|
</pre>
|
||||||
A back reference to a numbered subpattern uses the most recent value that is
|
A backreference to a numbered subpattern uses the most recent value that is
|
||||||
set for that number by any subpattern. The following pattern matches "abcabc"
|
set for that number by any subpattern. The following pattern matches "abcabc"
|
||||||
or "defdef":
|
or "defdef":
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -1771,7 +1771,7 @@ have different names, but PCRE2 does not.
|
||||||
In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
|
In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
|
||||||
(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
|
(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
|
||||||
parentheses from other parts of the pattern, such as
|
parentheses from other parts of the pattern, such as
|
||||||
<a href="#backreferences">back references,</a>
|
<a href="#backreferences">backreferences,</a>
|
||||||
<a href="#recursion">recursion,</a>
|
<a href="#recursion">recursion,</a>
|
||||||
and
|
and
|
||||||
<a href="#conditions">conditions,</a>
|
<a href="#conditions">conditions,</a>
|
||||||
|
@ -1811,7 +1811,7 @@ for the first (and in this example, the only) subpattern of that name that
|
||||||
matched. This saves searching to find which numbered subpattern it was.
|
matched. This saves searching to find which numbered subpattern it was.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If you make a back reference to a non-unique named subpattern from elsewhere in
|
If you make a backreference to a non-unique named subpattern from elsewhere in
|
||||||
the pattern, the subpatterns to which the name refers are checked in the order
|
the pattern, the subpatterns to which the name refers are checked in the order
|
||||||
in which they appear in the overall pattern. The first one that is set is used
|
in which they appear in the overall pattern. The first one that is set is used
|
||||||
for the reference. For example, this pattern matches both "foofoo" and
|
for the reference. For example, this pattern matches both "foofoo" and
|
||||||
|
@ -1859,7 +1859,7 @@ items:
|
||||||
the \R escape sequence
|
the \R escape sequence
|
||||||
an escape such as \d or \pL that matches a single character
|
an escape such as \d or \pL that matches a single character
|
||||||
a character class
|
a character class
|
||||||
a back reference
|
a backreference
|
||||||
a parenthesized subpattern (including most assertions)
|
a parenthesized subpattern (including most assertions)
|
||||||
a subroutine call to a subpattern (recursive or otherwise)
|
a subroutine call to a subpattern (recursive or otherwise)
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1980,7 +1980,7 @@ alternatively, using ^ to indicate anchoring explicitly.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
However, there are some cases where the optimization cannot be used. When .*
|
However, there are some cases where the optimization cannot be used. When .*
|
||||||
is inside capturing parentheses that are the subject of a back reference
|
is inside capturing parentheses that are the subject of a backreference
|
||||||
elsewhere in the pattern, a match at the start may fail where a later one
|
elsewhere in the pattern, a match at the start may fail where a later one
|
||||||
succeeds. Consider, for example:
|
succeeds. Consider, for example:
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -2121,30 +2121,30 @@ an atomic group, like this:
|
||||||
</pre>
|
</pre>
|
||||||
sequences of non-digits cannot be broken, and failure happens quickly.
|
sequences of non-digits cannot be broken, and failure happens quickly.
|
||||||
<a name="backreferences"></a></P>
|
<a name="backreferences"></a></P>
|
||||||
<br><a name="SEC19" href="#TOC1">BACK REFERENCES</a><br>
|
<br><a name="SEC19" href="#TOC1">BACKREFERENCES</a><br>
|
||||||
<P>
|
<P>
|
||||||
Outside a character class, a backslash followed by a digit greater than 0 (and
|
Outside a character class, a backslash followed by a digit greater than 0 (and
|
||||||
possibly further digits) is a back reference to a capturing subpattern earlier
|
possibly further digits) is a backreference to a capturing subpattern earlier
|
||||||
(that is, to its left) in the pattern, provided there have been that many
|
(that is, to its left) in the pattern, provided there have been that many
|
||||||
previous capturing left parentheses.
|
previous capturing left parentheses.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
However, if the decimal number following the backslash is less than 8, it is
|
However, if the decimal number following the backslash is less than 8, it is
|
||||||
always taken as a back reference, and causes an error only if there are not
|
always taken as a backreference, and causes an error only if there are not
|
||||||
that many capturing left parentheses in the entire pattern. In other words, the
|
that many capturing left parentheses in the entire pattern. In other words, the
|
||||||
parentheses that are referenced need not be to the left of the reference for
|
parentheses that are referenced need not be to the left of the reference for
|
||||||
numbers less than 8. A "forward back reference" of this type can make sense
|
numbers less than 8. A "forward backreference" of this type can make sense
|
||||||
when a repetition is involved and the subpattern to the right has participated
|
when a repetition is involved and the subpattern to the right has participated
|
||||||
in an earlier iteration.
|
in an earlier iteration.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
It is not possible to have a numerical "forward back reference" to a subpattern
|
It is not possible to have a numerical "forward backreference" to a subpattern
|
||||||
whose number is 8 or more using this syntax because a sequence such as \50 is
|
whose number is 8 or more using this syntax because a sequence such as \50 is
|
||||||
interpreted as a character defined in octal. See the subsection entitled
|
interpreted as a character defined in octal. See the subsection entitled
|
||||||
"Non-printing characters"
|
"Non-printing characters"
|
||||||
<a href="#digitsafterbackslash">above</a>
|
<a href="#digitsafterbackslash">above</a>
|
||||||
for further details of the handling of digits following a backslash. There is
|
for further details of the handling of digits following a backslash. There is
|
||||||
no such problem when named parentheses are used. A back reference to any
|
no such problem when named parentheses are used. A backreference to any
|
||||||
subpattern is possible using named parentheses (see below).
|
subpattern is possible using named parentheses (see below).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -2175,7 +2175,7 @@ of forward reference can be useful it patterns that repeat. Perl does not
|
||||||
support the use of + in this way.
|
support the use of + in this way.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A back reference matches whatever actually matched the capturing subpattern in
|
A backreference matches whatever actually matched the capturing subpattern in
|
||||||
the current subject string, rather than anything matching the subpattern
|
the current subject string, rather than anything matching the subpattern
|
||||||
itself (see
|
itself (see
|
||||||
<a href="#subpatternsassubroutines">"Subpatterns as subroutines"</a>
|
<a href="#subpatternsassubroutines">"Subpatterns as subroutines"</a>
|
||||||
|
@ -2185,7 +2185,7 @@ below for a way of doing that). So the pattern
|
||||||
</pre>
|
</pre>
|
||||||
matches "sense and sensibility" and "response and responsibility", but not
|
matches "sense and sensibility" and "response and responsibility", but not
|
||||||
"sense and responsibility". If caseful matching is in force at the time of the
|
"sense and responsibility". If caseful matching is in force at the time of the
|
||||||
back reference, the case of letters is relevant. For example,
|
backreference, the case of letters is relevant. For example,
|
||||||
<pre>
|
<pre>
|
||||||
((?i)rah)\s+\1
|
((?i)rah)\s+\1
|
||||||
</pre>
|
</pre>
|
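The "sense and sensibility" behaviour can be tried out with any engine that supports numbered backreferences. The sketch below uses Python's `re` module rather than PCRE2, and the pattern itself is an assumption, since it lies outside the lines shown in this hunk. The helper name is hypothetical.

```python
import re

# Assumed pattern: the hunk only shows the strings it matches, not the
# pattern text, so this is a reconstruction for illustration.
PATTERN = re.compile(r"(sens|respons)e and \1ibility")


def matches_whole(text):
    """Return True if the whole text matches the assumed pattern."""
    return PATTERN.fullmatch(text) is not None
```

The backreference `\1` must repeat the text that group 1 actually captured, so mixing "sens" with "respons" does not match.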
||||||
|
@ -2193,10 +2193,10 @@ matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
|
||||||
capturing subpattern is matched caselessly.
|
capturing subpattern is matched caselessly.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
There are several different ways of writing back references to named
|
There are several different ways of writing backreferences to named
|
||||||
subpatterns. The .NET syntax \k{name} and the Perl syntax \k<name> or
|
subpatterns. The .NET syntax \k{name} and the Perl syntax \k<name> or
|
||||||
\k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
|
\k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
|
||||||
back reference syntax, in which \g can be used for both numeric and named
|
backreference syntax, in which \g can be used for both numeric and named
|
||||||
references, is also supported. We could rewrite the above example in any of
|
references, is also supported. We could rewrite the above example in any of
|
||||||
the following ways:
|
the following ways:
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -2209,30 +2209,30 @@ A subpattern that is referenced by name may appear in the pattern before or
|
||||||
after the reference.
|
after the reference.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
There may be more than one back reference to the same subpattern. If a
|
There may be more than one backreference to the same subpattern. If a
|
||||||
subpattern has not actually been used in a particular match, any back
|
subpattern has not actually been used in a particular match, any backreferences
|
||||||
references to it always fail by default. For example, the pattern
|
to it always fail by default. For example, the pattern
|
||||||
<pre>
|
<pre>
|
||||||
(a|(bc))\2
|
(a|(bc))\2
|
||||||
</pre>
|
</pre>
|
||||||
always fails if it starts to match "a" rather than "bc". However, if the
|
always fails if it starts to match "a" rather than "bc". However, if the
|
||||||
PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a back reference to an
|
PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backreference to an
|
||||||
unset value matches an empty string.
|
unset value matches an empty string.
|
||||||
</P>
|
</P>
|
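The default behaviour for an unset group can be observed with the pattern from the paragraph above. This sketch uses Python's `re` module, which also fails a backreference to an unset group; it is not PCRE2, and it has no counterpart to PCRE2_MATCH_UNSET_BACKREF, so only the default case is shown. The helper name is hypothetical.

```python
import re

# Pattern taken from the paragraph above; group 2 is set only when the
# "bc" alternative is the one that matched.
PATTERN = re.compile(r"(a|(bc))\2")


def find(text):
    """Return the matched substring, or None when nothing matches."""
    found = PATTERN.search(text)
    return found.group(0) if found else None
```

When the match starts with "a", group 2 never takes part, so `\2` fails and the overall match fails.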
||||||
<P>
|
<P>
|
||||||
Because there may be many capturing parentheses in a pattern, all digits
|
Because there may be many capturing parentheses in a pattern, all digits
|
||||||
following a backslash are taken as part of a potential back reference number.
|
following a backslash are taken as part of a potential backreference number.
|
||||||
If the pattern continues with a digit character, some delimiter must be used to
|
If the pattern continues with a digit character, some delimiter must be used to
|
||||||
terminate the back reference. If the PCRE2_EXTENDED option is set, this can be
|
terminate the backreference. If the PCRE2_EXTENDED option is set, this can be
|
||||||
white space. Otherwise, the \g{ syntax or an empty comment (see
|
white space. Otherwise, the \g{ syntax or an empty comment (see
|
||||||
<a href="#comments">"Comments"</a>
|
<a href="#comments">"Comments"</a>
|
||||||
below) can be used.
|
below) can be used.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Recursive back references
|
Recursive backreferences
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
A back reference that occurs inside the parentheses to which it refers fails
|
A backreference that occurs inside the parentheses to which it refers fails
|
||||||
when the subpattern is first used, so, for example, (a\1) never matches.
|
when the subpattern is first used, so, for example, (a\1) never matches.
|
||||||
However, such references can be useful inside repeated subpatterns. For
|
However, such references can be useful inside repeated subpatterns. For
|
||||||
example, the pattern
|
example, the pattern
|
||||||
|
@ -2240,14 +2240,14 @@ example, the pattern
|
||||||
(a|b\1)+
|
(a|b\1)+
|
||||||
</pre>
|
</pre>
|
||||||
matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
|
matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
|
||||||
the subpattern, the back reference matches the character string corresponding
|
the subpattern, the backreference matches the character string corresponding
|
||||||
to the previous iteration. In order for this to work, the pattern must be such
|
to the previous iteration. In order for this to work, the pattern must be such
|
||||||
that the first iteration does not need to match the back reference. This can be
|
that the first iteration does not need to match the backreference. This can be
|
||||||
done using alternation, as in the example above, or by a quantifier with a
|
done using alternation, as in the example above, or by a quantifier with a
|
||||||
minimum of zero.
|
minimum of zero.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Back references of this type cause the group that they reference to be treated
|
backreferences of this type cause the group that they reference to be treated
|
||||||
as an
|
as an
|
||||||
<a href="#atomicgroup">atomic group.</a>
|
<a href="#atomicgroup">atomic group.</a>
|
||||||
Once the whole group has been matched, a subsequent matching failure cannot
|
Once the whole group has been matched, a subsequent matching failure cannot
|
||||||
|
@ -2397,10 +2397,10 @@ that is, a "subroutine" call into a group that is already active,
|
||||||
is not supported.
|
is not supported.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Perl does not support back references in lookbehinds. PCRE2 does support them,
|
Perl does not support backreferences in lookbehinds. PCRE2 does support them,
|
||||||
but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
|
but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
|
||||||
must not be set, there must be no use of (?| in the pattern (it creates
|
must not be set, there must be no use of (?| in the pattern (it creates
|
||||||
duplicate subpattern numbers), and if the back reference is by name, the name
|
duplicate subpattern numbers), and if the backreference is by name, the name
|
||||||
must be unique. Of course, the referenced subpattern must itself be of fixed
|
must be unique. Of course, the referenced subpattern must itself be of fixed
|
||||||
length. The following pattern matches words containing at least two characters
|
length. The following pattern matches words containing at least two characters
|
||||||
that begin and end with the same character:
|
that begin and end with the same character:
|
||||||
|
@ -2882,7 +2882,7 @@ in PCRE2 these values can be referenced. Consider this pattern:
|
||||||
^(.)(\1|a(?2))
|
^(.)(\1|a(?2))
|
||||||
</pre>
|
</pre>
|
||||||
This pattern matches "bab". The first capturing parentheses match "b", then in
|
This pattern matches "bab". The first capturing parentheses match "b", then in
|
||||||
the second group, when the back reference \1 fails to match "b", the second
|
the second group, when the backreference \1 fails to match "b", the second
|
||||||
alternative matches "a" and then recurses. In the recursion, \1 does now match
|
alternative matches "a" and then recurses. In the recursion, \1 does now match
|
||||||
"b" and so the whole match succeeds. This match used to fail in Perl, but in
|
"b" and so the whole match succeeds. This match used to fail in Perl, but in
|
||||||
later versions (I tried 5.024) it now works.
|
later versions (I tried 5.024) it now works.
|
||||||
|
@ -2943,7 +2943,7 @@ plus or a minus sign it is taken as a relative reference. For example:
|
||||||
(abc)(?i:\g<-1>)
|
(abc)(?i:\g<-1>)
|
||||||
</pre>
|
</pre>
|
||||||
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i>
|
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i>
|
||||||
synonymous. The former is a back reference; the latter is a subroutine call.
|
synonymous. The former is a backreference; the latter is a subroutine call.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
|
<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
|
|
@ -132,14 +132,14 @@ When a pattern that is compiled with this flag is passed to <b>regexec()</b> for
|
||||||
matching, the <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no
|
matching, the <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no
|
||||||
captured strings are returned. Versions of the PCRE library prior to 10.22 used
|
captured strings are returned. Versions of the PCRE library prior to 10.22 used
|
||||||
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
|
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
|
||||||
because it disables the use of back references.
|
because it disables the use of backreferences.
|
||||||
<pre>
|
<pre>
|
||||||
REG_PEND
|
REG_PEND
|
||||||
</pre>
|
</pre>
|
||||||
If this option is set, the <b>reg_endp</b> field in the <i>preg</i> structure
|
If this option is set, the <b>reg_endp</b> field in the <i>preg</i> structure
|
||||||
(which has the type const char *) must be set to point to the character beyond
|
(which has the type const char *) must be set to point to the character beyond
|
||||||
the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
|
the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
|
||||||
now contain binary zeroes, which are treated as data characters. Without
|
now contain binary zeros, which are treated as data characters. Without
|
||||||
REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
|
REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
|
||||||
ignored. This is a GNU extension to the POSIX standard and should be used with
|
ignored. This is a GNU extension to the POSIX standard and should be used with
|
||||||
caution in software intended to be portable to other systems.
|
caution in software intended to be portable to other systems.
|
||||||
|
@ -248,10 +248,10 @@ function.
|
||||||
<pre>
|
<pre>
|
||||||
REG_STARTEND
|
REG_STARTEND
|
||||||
</pre>
|
</pre>
|
||||||
When this option is set, the subject string is starts at <i>string</i> +
|
When this option is set, the subject string starts at <i>string</i> +
|
||||||
<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
|
<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
|
||||||
should point to the first character beyond the string. There may be binary
|
should point to the first character beyond the string. There may be binary
|
||||||
zeroes within the subject string, and indeed, using REG_STARTEND is the only
|
zeros within the subject string, and indeed, using REG_STARTEND is the only
|
||||||
way to pass a subject string that contains a binary zero.
|
way to pass a subject string that contains a binary zero.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
|
|
@ -442,7 +442,7 @@ of the newline or \R options with similar syntax. More than one of them may
|
||||||
appear. For the first three, d is a decimal number.
|
appear. For the first three, d is a decimal number.
|
||||||
<pre>
|
<pre>
|
||||||
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
||||||
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes
|
(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
|
||||||
(*LIMIT_MATCH=d) set the match limit to d
|
(*LIMIT_MATCH=d) set the match limit to d
|
||||||
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
||||||
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
||||||
|
|
|
@ -129,7 +129,7 @@ to occur).
|
||||||
UTF-8 (in its original definition) is not capable of encoding values greater
|
UTF-8 (in its original definition) is not capable of encoding values greater
|
||||||
than 0x7fffffff, but such values can be handled by the 32-bit library. When
|
than 0x7fffffff, but such values can be handled by the 32-bit library. When
|
||||||
testing this library in non-UTF mode with <b>utf8_input</b> set, if any
|
testing this library in non-UTF mode with <b>utf8_input</b> set, if any
|
||||||
character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
|
character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
|
||||||
0x80000000 is added to the character's value. This is the only way of passing
|
0x80000000 is added to the character's value. This is the only way of passing
|
||||||
such code points in a pattern string. For subject strings, using an escape
|
such code points in a pattern string. For subject strings, using an escape
|
||||||
sequence is preferable.
|
sequence is preferable.
|
||||||
|
@ -264,7 +264,7 @@ Do not output the version number of <b>pcre2test</b> at the start of execution.
|
||||||
<P>
|
<P>
|
||||||
<b>-S</b> <i>size</i>
|
<b>-S</b> <i>size</i>
|
||||||
On Unix-like systems, set the size of the run-time stack to <i>size</i>
|
On Unix-like systems, set the size of the run-time stack to <i>size</i>
|
||||||
megabytes.
|
mebibytes (units of 1024*1024 bytes).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-subject</b> <i>modifier-list</i>
|
<b>-subject</b> <i>modifier-list</i>
|
||||||
|
@ -679,8 +679,8 @@ Newline and \R handling
|
||||||
<P>
|
<P>
|
||||||
The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
|
The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
|
||||||
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
|
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
|
||||||
\R matches any Unicode newline sequence. The default is specified when PCRE2
|
\R matches any Unicode newline sequence. The default can be specified when
|
||||||
is built, with the default default being Unicode.
|
PCRE2 is built; if it is not, the default is set to Unicode.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>newline</b> modifier specifies which characters are to be interpreted as
|
The <b>newline</b> modifier specifies which characters are to be interpreted as
|
||||||
|
@ -1418,11 +1418,11 @@ Setting the JIT stack size
|
||||||
<P>
|
<P>
|
||||||
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
|
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if JIT
|
that is used by the just-in-time optimization code. It is ignored if JIT
|
||||||
optimization is not being used. The value is a number of kilobytes. Setting
|
optimization is not being used. The value is a number of kibibytes (units of
|
||||||
zero reverts to the default of 32K. Providing a stack that is larger than the
|
1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
|
||||||
default is necessary only for very complicated patterns. If <b>jitstack</b> is
|
that is larger than the default is necessary only for very complicated
|
||||||
set non-zero on a subject line it overrides any value that was set on the
|
patterns. If <b>jitstack</b> is set non-zero on a subject line it overrides any
|
||||||
pattern.
|
value that was set on the pattern.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting heap, match, and depth limits
|
Setting heap, match, and depth limits
|
||||||
|
@ -1468,10 +1468,10 @@ and non-recursive, to the internal matching function, thus controlling the
|
||||||
overall amount of computing resource that is used.
|
overall amount of computing resource that is used.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes)
|
For both kinds of matching, the <i>heap_limit</i> number, which is in kibibytes
|
||||||
limits the amount of heap memory used for matching. A value of zero disables
|
(units of 1024 bytes), limits the amount of heap memory used for matching. A
|
||||||
the use of any heap memory; many simple pattern matches can be done without
|
value of zero disables the use of any heap memory; many simple pattern matches
|
||||||
using the heap, so this is not an unreasonable setting.
|
can be done without using the heap, so zero is not an unreasonable setting.
|
||||||
</P>
|
</P>
|
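The unit changes in these hunks (kilobytes to kibibytes, megabytes to mebibytes) matter because the two readings give different byte counts. The helpers below are a hypothetical sketch of the arithmetic only and are not part of pcre2test.

```python
KIBIBYTE = 1024           # units used by heap_limit and jitstack
MEBIBYTE = 1024 * 1024    # units used by the -S stack size option


def kibibytes_to_bytes(count):
    """Convert a limit given in kibibytes (units of 1024 bytes) to bytes."""
    return count * KIBIBYTE


def mebibytes_to_bytes(count):
    """Convert a size given in mebibytes (units of 1024*1024 bytes) to bytes."""
    return count * MEBIBYTE
```

For instance, the 32KiB default JIT stack is 32768 bytes, not the 32000 bytes that a decimal reading of "kilobytes" would suggest.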
||||||
<br><b>
|
<br><b>
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
|
|
@ -53,7 +53,7 @@ compatibility with Perl 5.6. PCRE2 does not support this.
|
||||||
WIDE CHARACTERS AND UTF MODES
|
WIDE CHARACTERS AND UTF MODES
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Codepoints less than 256 can be specified in patterns by either braced or
|
Code points less than 256 can be specified in patterns by either braced or
|
||||||
unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). Larger
|
unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). Larger
|
||||||
values have to use braced sequences. Unbraced octal code points up to \777 are
|
values have to use braced sequences. Unbraced octal code points up to \777 are
|
||||||
also recognized; larger ones can be coded using \o{...}.
|
also recognized; larger ones can be coded using \o{...}.
|
||||||
|
@ -116,7 +116,7 @@ CASE-EQUIVALENCE IN UTF MODES
|
||||||
Case-insensitive matching in a UTF mode makes use of Unicode properties except
|
Case-insensitive matching in a UTF mode makes use of Unicode properties except
|
||||||
for characters whose code points are less than 128 and that have at most two
|
for characters whose code points are less than 128 and that have at most two
|
||||||
case-equivalent values. For these, a direct table lookup is used for speed. A
|
case-equivalent values. For these, a direct table lookup is used for speed. A
|
||||||
few Unicode characters such as Greek sigma have more than two codepoints that
|
few Unicode characters such as Greek sigma have more than two code points that
|
||||||
are case-equivalent, and these are treated as such.
|
are case-equivalent, and these are treated as such.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
|
|
2629
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -53,7 +53,7 @@ The option bits are:
|
||||||
PCRE2_EXTENDED Ignore white space and # comments
|
PCRE2_EXTENDED Ignore white space and # comments
|
||||||
PCRE2_FIRSTLINE Force matching to be before newline
|
PCRE2_FIRSTLINE Force matching to be before newline
|
||||||
PCRE2_LITERAL Pattern characters are all literal
|
PCRE2_LITERAL Pattern characters are all literal
|
||||||
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
PCRE2_MATCH_UNSET_BACKREF Match unset backreferences
|
||||||
PCRE2_MULTILINE ^ and $ match newlines within data
|
PCRE2_MULTILINE ^ and $ match newlines within data
|
||||||
PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns
|
PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns
|
||||||
PCRE2_NEVER_UCP Lock out PCRE2_UCP, e.g. via (*UCP)
|
PCRE2_NEVER_UCP Lock out PCRE2_UCP, e.g. via (*UCP)
|
||||||
|
|
|
@ -65,7 +65,7 @@ subject that is terminated by a binary zero code unit. The options are:
|
||||||
match even if there is a full match
|
match even if there is a full match
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial
|
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial
|
||||||
match if no full matches are found
|
match if no full matches are found
|
||||||
.sp
|
.sp
|
||||||
For details of partial matching, see the
|
For details of partial matching, see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
|
|
@ -24,7 +24,7 @@ request are as follows:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_ALLOPTIONS Final options after compiling
|
PCRE2_INFO_ALLOPTIONS Final options after compiling
|
||||||
PCRE2_INFO_ARGOPTIONS Options passed to \fBpcre2_compile()\fP
|
PCRE2_INFO_ARGOPTIONS Options passed to \fBpcre2_compile()\fP
|
||||||
PCRE2_INFO_BACKREFMAX Number of highest back reference
|
PCRE2_INFO_BACKREFMAX Number of highest backreference
|
||||||
PCRE2_INFO_BSR What \eR matches:
|
PCRE2_INFO_BSR What \eR matches:
|
||||||
PCRE2_BSR_UNICODE: Unicode line endings
|
PCRE2_BSR_UNICODE: Unicode line endings
|
||||||
PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
|
PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "16 June 2017" "PCRE2 10.30"
|
.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "16 June 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
|
|
@ -16,7 +16,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
This function is part of an experimental set of pattern conversion functions.
|
This function is part of an experimental set of pattern conversion functions.
|
||||||
It sets the component separator character that is used when converting globs.
|
It sets the component separator character that is used when converting globs.
|
||||||
The second argument must one of the characters forward slash, backslash, or
|
The second argument must be one of the characters forward slash, backslash, or
|
||||||
dot. The default is backslash when running under Windows, otherwise forward
|
dot. The default is backslash when running under Windows, otherwise forward
|
||||||
slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
|
slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
|
||||||
the second argument is invalid.
|
the second argument is invalid.
|
||||||
|
|
|
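The separator rule described in the hunk above can be sketched in a few lines of C. This is an illustration only, not PCRE2's implementation: `check_glob_separator` and `SKETCH_ERROR_BADDATA` are hypothetical names, and the real error code PCRE2_ERROR_BADDATA is defined in pcre2.h.

```c
#include <stdint.h>

/* Hypothetical stand-in for PCRE2_ERROR_BADDATA; the real value is
   defined in pcre2.h and is not reproduced here. */
#define SKETCH_ERROR_BADDATA (-1)

/* Accept only forward slash, backslash, or dot as a glob component
   separator. Returns zero for success, or the stand-in error code if
   the separator is anything else. */
static int check_glob_separator(uint32_t separator)
{
    switch (separator) {
    case '/':
    case '\\':
    case '.':
        return 0;
    default:
        return SKETCH_ERROR_BADDATA;
    }
}
```

A caller would typically test the result against zero before going on to convert a glob.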
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SET_DEPTH_LIMIT 3 "11 April 2017" "PCRE2 10.30"
|
.TH PCRE2_SET_HEAP_LIMIT 3 "11 April 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
|
107
doc/pcre2api.3
|
@ -497,10 +497,10 @@ U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
|
||||||
.P
|
.P
|
||||||
Each of the first three conventions is used by at least one operating system as
|
Each of the first three conventions is used by at least one operating system as
|
||||||
its standard newline sequence. When PCRE2 is built, a default can be specified.
|
its standard newline sequence. When PCRE2 is built, a default can be specified.
|
||||||
The default default is LF, which is the Unix standard. However, the newline
|
If it is not, the default is set to LF, which is the Unix standard. However,
|
||||||
convention can be changed by an application when calling \fBpcre2_compile()\fP,
|
the newline convention can be changed by an application when calling
|
||||||
or it can be specified by special text at the start of the pattern itself; this
|
\fBpcre2_compile()\fP, or it can be specified by special text at the start of
|
||||||
overrides any other settings. See the
|
the pattern itself; this overrides any other settings. See the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2pattern\fP
|
\fBpcre2pattern\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -885,19 +885,20 @@ offset limit. In other words, whichever limit comes first is used.
|
||||||
.B " uint32_t \fIvalue\fP);"
|
.B " uint32_t \fIvalue\fP);"
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
|
The \fIheap_limit\fP parameter specifies, in units of kibibytes (1024 bytes),
|
||||||
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
|
the maximum amount of heap memory that \fBpcre2_match()\fP may use to hold
|
||||||
information when running an interpretive match. This limit also applies to
|
backtracking information when running an interpretive match. This limit also
|
||||||
\fBpcre2_dfa_match()\fP, which may use the heap when processing patterns with a
|
applies to \fBpcre2_dfa_match()\fP, which may use the heap when processing
|
||||||
lot of nested pattern recursion or lookarounds or atomic groups. This limit
|
patterns with a lot of nested pattern recursion or lookarounds or atomic
|
||||||
does not apply to matching with the JIT optimization, which has its own memory
|
groups. This limit does not apply to matching with the JIT optimization, which
|
||||||
control arrangements (see the
|
has its own memory control arrangements (see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
.\"
|
.\"
|
||||||
documentation for more details). If the limit is reached, the negative error
|
documentation for more details). If the limit is reached, the negative error
|
||||||
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
|
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
|
||||||
built; the default default is very large and is essentially "unlimited".
|
is built; if it is not, the default is set very large and is essentially
|
||||||
|
"unlimited".
|
||||||
.P
|
.P
|
||||||
A value for the heap limit may also be supplied by an item at the start of a
|
A value for the heap limit may also be supplied by an item at the start of a
|
||||||
pattern of the form
|
pattern of the form
|
||||||
|
@ -975,7 +976,7 @@ The depth limit is not relevant, and is ignored, when matching is done using
|
||||||
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
|
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
|
||||||
uses it to limit the depth of nested internal recursive function calls that
|
uses it to limit the depth of nested internal recursive function calls that
|
||||||
implement atomic groups, lookaround assertions, and pattern recursions. This
|
implement atomic groups, lookaround assertions, and pattern recursions. This
|
||||||
limits, indirectly, the amount of system stack this is used. It was more useful
|
limits, indirectly, the amount of system stack that is used. It was more useful
|
||||||
in versions before 10.32, when stack memory was used for local workspace
|
in versions before 10.32, when stack memory was used for local workspace
|
||||||
vectors for recursive function calls. From version 10.32, only local variables
|
vectors for recursive function calls. From version 10.32, only local variables
|
||||||
are allocated on the stack and as each call uses only a few hundred bytes, even
|
are allocated on the stack and as each call uses only a few hundred bytes, even
|
||||||
|
@ -989,11 +990,11 @@ using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
|
||||||
probably better to limit heap usage directly by calling
|
probably better to limit heap usage directly by calling
|
||||||
\fBpcre2_set_heap_limit()\fP.
|
\fBpcre2_set_heap_limit()\fP.
|
||||||
.P
|
.P
|
||||||
The default value for the depth limit can be set when PCRE2 is built; the
|
The default value for the depth limit can be set when PCRE2 is built; if it is
|
||||||
default default is the same value as the default for the match limit. If the
|
not, the default is set to the same value as the default for the match limit.
|
||||||
limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
|
If the limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
|
||||||
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
|
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
|
||||||
item at the start of a pattern of the form
|
supplied by an item at the start of a pattern of the form
|
||||||
.sp
|
.sp
|
||||||
(*LIMIT_DEPTH=ddd)
|
(*LIMIT_DEPTH=ddd)
|
||||||
.sp
|
.sp
|
||||||
|
@ -1050,7 +1051,7 @@ given with \fBpcre2_set_depth_limit()\fP above.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_HEAPLIMIT
|
PCRE2_CONFIG_HEAPLIMIT
|
||||||
.sp
|
.sp
|
||||||
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
The output is a uint32_t integer that gives, in kibibytes, the default limit
|
||||||
for the amount of heap memory used by \fBpcre2_match()\fP or
|
for the amount of heap memory used by \fBpcre2_match()\fP or
|
||||||
\fBpcre2_dfa_match()\fP. Further details are given with
|
\fBpcre2_dfa_match()\fP. Further details are given with
|
||||||
\fBpcre2_set_heap_limit()\fP above.
|
\fBpcre2_set_heap_limit()\fP above.
|
||||||
|
@ -1367,7 +1368,7 @@ If this bit is set, letters in the pattern match both upper and lower case
|
||||||
letters in the subject. It is equivalent to Perl's /i option, and it can be
|
letters in the subject. It is equivalent to Perl's /i option, and it can be
|
||||||
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
|
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
|
||||||
properties are used for all characters with more than one other case, and for
|
properties are used for all characters with more than one other case, and for
|
||||||
all characters whose code points are greater than U+007f. For lower valued
|
all characters whose code points are greater than U+007F. For lower valued
|
||||||
characters with only one other case, a lookup table is used for speed. When
|
characters with only one other case, a lookup table is used for speed. When
|
||||||
PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
|
PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
|
||||||
and higher code points (available only in 16-bit or 32-bit mode) are treated as
|
and higher code points (available only in 16-bit or 32-bit mode) are treated as
|
||||||
|
@ -1489,7 +1490,7 @@ error.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_MATCH_UNSET_BACKREF
|
PCRE2_MATCH_UNSET_BACKREF
|
||||||
.sp
|
.sp
|
||||||
If this option is set, a back reference to an unset subpattern group matches an
|
If this option is set, a backreference to an unset subpattern group matches an
|
||||||
empty string (by default this causes the current matching alternative to fail).
|
empty string (by default this causes the current matching alternative to fail).
|
||||||
A pattern such as (\e1)(a) succeeds when this option is set (assuming it can
|
A pattern such as (\e1)(a) succeeds when this option is set (assuming it can
|
||||||
find an "a" in the subject), whereas it fails by default, for Perl
|
find an "a" in the subject), whereas it fails by default, for Perl
|
||||||
|
@ -1550,8 +1551,8 @@ If this option is set, it disables the use of numbered capturing parentheses in
|
||||||
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
|
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
|
||||||
were followed by ?: but named parentheses can still be used for capturing (and
|
were followed by ?: but named parentheses can still be used for capturing (and
|
||||||
they acquire numbers in the usual way). This is the same as Perl's /n option.
|
they acquire numbers in the usual way). This is the same as Perl's /n option.
|
||||||
Note that, when this option is set, references to capturing groups (back
|
Note that, when this option is set, references to capturing groups
|
||||||
references or recursion/subroutine calls) may only refer to named groups,
|
(backreferences or recursion/subroutine calls) may only refer to named groups,
|
||||||
though the reference can be by name or by number.
|
though the reference can be by name or by number.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_NO_AUTO_POSSESS
|
PCRE2_NO_AUTO_POSSESS
|
||||||
|
@ -1570,7 +1571,7 @@ If this option is set, it disables an optimization that is applied when .* is
|
||||||
the first significant item in a top-level branch of a pattern, and all the
|
the first significant item in a top-level branch of a pattern, and all the
|
||||||
other branches also start with .* or with \eA or \eG or ^. The optimization is
|
other branches also start with .* or with \eA or \eG or ^. The optimization is
|
||||||
automatically disabled for .* if it is inside an atomic group or a capturing
|
automatically disabled for .* if it is inside an atomic group or a capturing
|
||||||
group that is the subject of a back reference, or if the pattern contains
|
group that is the subject of a backreference, or if the pattern contains
|
||||||
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
|
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
|
||||||
automatically anchored if PCRE2_DOTALL is set for all the .* items and
|
automatically anchored if PCRE2_DOTALL is set for all the .* items and
|
||||||
PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
|
PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
|
||||||
|
@ -1956,7 +1957,7 @@ following are true:
|
||||||
.* is not in an atomic group
|
.* is not in an atomic group
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
.* is not in a capturing group that is the subject
|
.* is not in a capturing group that is the subject
|
||||||
of a back reference
|
of a backreference
|
||||||
PCRE2_DOTALL is in force for .*
|
PCRE2_DOTALL is in force for .*
|
||||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||||
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
||||||
|
@ -1966,20 +1967,20 @@ options returned for PCRE2_INFO_ALLOPTIONS.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_BACKREFMAX
|
PCRE2_INFO_BACKREFMAX
|
||||||
.sp
|
.sp
|
||||||
Return the number of the highest back reference in the pattern. The third
|
Return the number of the highest backreference in the pattern. The third
|
||||||
argument should point to an \fBuint32_t\fP variable. Named subpatterns acquire
|
argument should point to an \fBuint32_t\fP variable. Named subpatterns acquire
|
||||||
numbers as well as names, and these count towards the highest back reference.
|
numbers as well as names, and these count towards the highest backreference.
|
||||||
Back references such as \e4 or \eg{12} match the captured characters of the
|
Backreferences such as \e4 or \eg{12} match the captured characters of the
|
||||||
given group, but in addition, the check that a capturing group is set in a
|
given group, but in addition, the check that a capturing group is set in a
|
||||||
conditional subpattern such as (?(3)a|b) is also a back reference. Zero is
|
conditional subpattern such as (?(3)a|b) is also a backreference. Zero is
|
||||||
returned if there are no back references.
|
returned if there are no backreferences.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_BSR
|
PCRE2_INFO_BSR
|
||||||
.sp
|
.sp
|
||||||
The output is a uint32_t whose value indicates what character sequences the \eR
|
The output is a uint32_t integer whose value indicates what character sequences
|
||||||
escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR matches
|
the \eR escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR
|
||||||
any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \eR
|
matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
|
||||||
matches only CR, LF, or CRLF.
|
that \eR matches only CR, LF, or CRLF.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_CAPTURECOUNT
|
PCRE2_INFO_CAPTURECOUNT
|
||||||
.sp
|
.sp
|
||||||
|
@ -1991,10 +1992,10 @@ The third argument should point to an \fBuint32_t\fP variable.
|
||||||
.sp
|
.sp
|
||||||
If the pattern set a backtracking depth limit by including an item of the form
|
If the pattern set a backtracking depth limit by including an item of the form
|
||||||
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
|
\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_FIRSTBITMAP
|
PCRE2_INFO_FIRSTBITMAP
|
||||||
.sp
|
.sp
|
||||||
|
@ -2004,7 +2005,7 @@ values for the first code unit in any match. For example, a pattern that starts
|
||||||
with [abc] results in a table with three bits set. When code unit values
|
with [abc] results in a table with three bits set. When code unit values
|
||||||
greater than 255 are supported, the flag bit for 255 means "any code unit of
|
greater than 255 are supported, the flag bit for 255 means "any code unit of
|
||||||
value 255 or above". If such a table was constructed, a pointer to it is
|
value 255 or above". If such a table was constructed, a pointer to it is
|
||||||
returned. Otherwise NULL is returned. The third argument should point to an
|
returned. Otherwise NULL is returned. The third argument should point to a
|
||||||
\fBconst uint8_t *\fP variable.
|
\fBconst uint8_t *\fP variable.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_FIRSTCODETYPE
|
PCRE2_INFO_FIRSTCODETYPE
|
||||||
|
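As a rough illustration of how an application might consult the first code unit table described in the hunk above, the sketch below tests one bit of it. It rests on assumptions that this diff does not state: that the table is 256 bits (32 bytes) long, and that bit n is stored in byte n/8 at position n%8, least significant bit first. The function name `may_start_match` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Report whether a code unit value may start a match, given a pointer
   to a first-code-unit table such as the one returned for
   PCRE2_INFO_FIRSTBITMAP.
   Assumptions (not confirmed by the text above): the table holds 256
   bits, and bit n lives in byte n/8 at bit position n%8. */
static int may_start_match(const uint8_t *bitmap, uint32_t code_unit)
{
    if (bitmap == NULL)
        return 1;              /* no table: nothing can be ruled out */
    if (code_unit > 255)
        code_unit = 255;       /* bit 255 means "255 or above" */
    return (bitmap[code_unit / 8] >> (code_unit % 8)) & 1u;
}
```

With a table built for a pattern that starts with [abc], only the three bits for "a", "b", and "c" would be set, so any other starting code unit could be skipped without running the matcher.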
@ -2031,7 +2032,7 @@ and up to 0xffffffff when not using UTF-32 mode.
|
||||||
.sp
|
.sp
|
||||||
Return the size (in bytes) of the data frames that are used to remember
|
Return the size (in bytes) of the data frames that are used to remember
|
||||||
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
|
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
|
||||||
without the use of JIT. The third argument should point to an \fBsize_t\fP
|
without the use of JIT. The third argument should point to a \fBsize_t\fP
|
||||||
variable. The frame size depends on the number of capturing parentheses in the
|
variable. The frame size depends on the number of capturing parentheses in the
|
||||||
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
||||||
.sp
|
.sp
|
||||||
|
@ -2051,10 +2052,10 @@ the equivalent hexadecimal or octal escape sequences.
|
||||||
.sp
|
.sp
|
||||||
If the pattern set a heap memory limit by including an item of the form
|
If the pattern set a heap memory limit by including an item of the form
|
||||||
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
|
\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_JCHANGED
|
PCRE2_INFO_JCHANGED
|
||||||
.sp
|
.sp
|
||||||
|
@ -2098,15 +2099,15 @@ in such cases.
|
||||||
.sp
|
.sp
|
||||||
If the pattern set a match limit by including an item of the form
|
If the pattern set a match limit by including an item of the form
|
||||||
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to a uint32_t integer. If no such value has been set, the call to
|
||||||
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
|
\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
|
||||||
that this limit will only be used during matching if it is less than the limit
|
limit will only be used during matching if it is less than the limit set or
|
||||||
set or defaulted by the caller of the match function.
|
defaulted by the caller of the match function.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_MAXLOOKBEHIND
|
PCRE2_INFO_MAXLOOKBEHIND
|
||||||
.sp
|
.sp
|
||||||
Return the number of characters (not code units) in the longest lookbehind
|
Return the number of characters (not code units) in the longest lookbehind
|
||||||
assertion in the pattern. The third argument should point to an unsigned 32-bit
|
assertion in the pattern. The third argument should point to a uint32_t
|
||||||
integer. This information is useful when doing multi-segment matching using the
|
integer. This information is useful when doing multi-segment matching using the
|
||||||
partial matching facilities. Note that the simple assertions \eb and \eB
|
partial matching facilities. Note that the simple assertions \eb and \eB
|
||||||
require a one-character lookbehind. \eA also registers a one-character
|
require a one-character lookbehind. \eA also registers a one-character
|
||||||
|
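The rule repeated in the (*LIMIT_DEPTH=nnnn), (*LIMIT_HEAP=nnnn), and (*LIMIT_MATCH=nnnn) paragraphs above is that a limit set in the pattern is used during matching only if it is less than the limit set or defaulted by the caller. A minimal sketch of that selection follows; the function and parameter names are hypothetical and this is not the library's own code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Choose the limit that applies during matching. A value set in the
   pattern is honoured only when it is smaller than the caller's limit;
   otherwise the caller's (set or defaulted) limit stands. */
static uint32_t effective_limit(uint32_t caller_limit,
                                bool pattern_has_limit,
                                uint32_t pattern_limit)
{
    if (pattern_has_limit && pattern_limit < caller_limit)
        return pattern_limit;
    return caller_limit;
}
```

The consequence is that a pattern can tighten a limit but cannot raise it above what the calling application allows.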
@ -2393,7 +2394,7 @@ zero, the search for a match starts at the beginning of the subject, and this
|
||||||
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
|
is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
|
||||||
must point to the start of a character, or to the end of the subject (in UTF-32
|
must point to the start of a character, or to the end of the subject (in UTF-32
|
||||||
mode, one code unit equals one character, so all offsets are valid). Like the
|
mode, one code unit equals one character, so all offsets are valid). Like the
|
||||||
pattern string, the subject may contain binary zeroes.
|
pattern string, the subject may contain binary zeros.
|
||||||
.P
|
.P
|
||||||
A non-zero starting offset is useful when searching for another match in the
|
A non-zero starting offset is useful when searching for another match in the
|
||||||
same subject by calling \fBpcre2_match()\fP again after a previous success.
|
same subject by calling \fBpcre2_match()\fP again after a previous success.
|
||||||
|
@ -3562,12 +3563,12 @@ There are in addition the following errors that are specific to
|
||||||
.sp
|
.sp
|
||||||
This return is given if \fBpcre2_dfa_match()\fP encounters an item in the
|
This return is given if \fBpcre2_dfa_match()\fP encounters an item in the
|
||||||
pattern that it does not support, for instance, the use of \eC in a UTF mode or
|
pattern that it does not support, for instance, the use of \eC in a UTF mode or
|
||||||
a back reference.
|
a backreference.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_DFA_UCOND
|
PCRE2_ERROR_DFA_UCOND
|
||||||
.sp
|
.sp
|
||||||
This return is given if \fBpcre2_dfa_match()\fP encounters a condition item
|
This return is given if \fBpcre2_dfa_match()\fP encounters a condition item
|
||||||
that uses a back reference for the condition, or a test for recursion in a
|
that uses a backreference for the condition, or a test for recursion in a
|
||||||
specific group. These are not supported.
|
specific group. These are not supported.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_DFA_WSSIZE
|
PCRE2_ERROR_DFA_WSSIZE
|
||||||
|
|
|
@ -216,7 +216,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
|
||||||
.sp
|
.sp
|
||||||
--enable-newline-is-nul
|
--enable-newline-is-nul
|
||||||
.sp
|
.sp
|
||||||
which causes NUL (binary zero) is set as the default line-ending character.
|
which causes NUL (binary zero) to be set as the default line-ending character.
|
||||||
.P
|
.P
|
||||||
Whatever default line ending convention is selected when PCRE2 is built can be
|
Whatever default line ending convention is selected when PCRE2 is built can be
|
||||||
overridden by applications that use the library. At build time it is
|
overridden by applications that use the library. At build time it is
|
||||||
|
@ -281,8 +281,8 @@ The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
are (that is, the deeper the search tree), the more memory is needed. If the
|
are (that is, the deeper the search tree), the more memory is needed. If the
|
||||||
initial vector is not large enough, heap memory is used, up to a certain limit,
|
initial vector is not large enough, heap memory is used, up to a certain limit,
|
||||||
which is specified in kilobytes. The limit can be changed at run time, as
|
which is specified in kibibytes (units of 1024 bytes). The limit can be changed
|
||||||
described in the
|
at run time, as described in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2api\fP
|
\fBpcre2api\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -291,7 +291,7 @@ change this by a setting such as
|
||||||
.sp
|
.sp
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
.sp
|
.sp
|
||||||
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
which limits the amount of heap to 500 KiB. This limit applies only to
|
||||||
interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
|
interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
|
||||||
may also use the heap for internal workspace when processing complicated
|
may also use the heap for internal workspace when processing complicated
|
||||||
patterns. This limit does not apply when JIT (which has its own memory
|
patterns. This limit does not apply when JIT (which has its own memory
|
||||||
|
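The change from "kilobytes" to "kibibytes" in the hunks above matters for the arithmetic: the unit is 1024 bytes, so --with-heap-limit=500 corresponds to 512000 bytes rather than 500000. The conversion can be sketched as below; the function name is hypothetical, and a 64-bit result is used because a large limit such as 20000000 KiB does not fit in 32 bits once multiplied out.

```c
#include <stdint.h>

/* Convert a heap limit given in kibibytes (units of 1024 bytes) to
   bytes. The result is 64-bit so that large limits do not overflow. */
static uint64_t heap_limit_kib_to_bytes(uint32_t limit_kib)
{
    return (uint64_t)limit_kib * 1024u;
}
```

For example, a limit of 500 yields 512000 bytes, and a limit of zero yields zero bytes.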
@ -552,7 +552,7 @@ generated from the string.
|
||||||
Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP
|
Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP
|
||||||
to be created. This is normally run under valgrind or used when PCRE2 is
|
to be created. This is normally run under valgrind or used when PCRE2 is
|
||||||
compiled with address sanitizing enabled. It calls the fuzzing function and
|
compiled with address sanitizing enabled. It calls the fuzzing function and
|
||||||
outputs information about it is doing. The input strings are specified by
|
outputs information about what it is doing. The input strings are specified by
|
||||||
arguments: if an argument starts with "=" the rest of it is a literal input
|
arguments: if an argument starts with "=" the rest of it is a literal input
|
||||||
string. Otherwise, it is assumed to be a file name, and the contents of the
|
string. Otherwise, it is assumed to be a file name, and the contents of the
|
||||||
file are the test string.
|
file are the test string.
|
||||||
|
|
|
@ -128,7 +128,7 @@ start only after an internal newline or at the beginning of the subject, and
|
||||||
branch, automatic anchoring occurs if all branches are anchorable.
|
branch, automatic anchoring occurs if all branches are anchorable.
|
||||||
.P
|
.P
|
||||||
This optimization is disabled, however, if .* is in an atomic group or if there
|
This optimization is disabled, however, if .* is in an atomic group or if there
|
||||||
is a back reference to the capturing group in which it appears. It is also
|
is a backreference to the capturing group in which it appears. It is also
|
||||||
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
|
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
|
||||||
callouts does not affect it.
|
callouts does not affect it.
|
||||||
.P
|
.P
|
||||||
|
|
|
@ -19,7 +19,7 @@ page.
|
||||||
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
||||||
they do not mean what you might think. For example, (?!a){3} does not assert
|
they do not mean what you might think. For example, (?!a){3} does not assert
|
||||||
that the next three characters are not "a". It just asserts that the next
|
that the next three characters are not "a". It just asserts that the next
|
||||||
character is not "a" three times (in principle: PCRE2 optimizes this to run the
|
character is not "a" three times (in principle; PCRE2 optimizes this to run the
|
||||||
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
||||||
for example, \eb* (but not \eb{3}), but these do not seem to have any use.
|
for example, \eb* (but not \eb{3}), but these do not seem to have any use.
|
||||||
.P
|
.P
|
||||||
|
@ -62,8 +62,8 @@ Note the following examples:
|
||||||
The \eQ...\eE sequence is recognized both inside and outside character classes.
|
The \eQ...\eE sequence is recognized both inside and outside character classes.
|
||||||
.P
|
.P
|
||||||
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
||||||
constructions. However, there is support PCRE2's "callout" feature, which
|
constructions. However, PCRE2 does have a "callout" feature, which allows an
|
||||||
allows an external function to be called during pattern matching. See the
|
external function to be called during pattern matching. See the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2callout\fP
|
\fBpcre2callout\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -131,7 +131,7 @@ list is with respect to Perl 5.26:
|
||||||
each alternative branch of a lookbehind assertion can match a different length
|
each alternative branch of a lookbehind assertion can match a different length
|
||||||
of string. Perl requires them all to have the same length.
|
of string. Perl requires them all to have the same length.
|
||||||
.sp
|
.sp
|
||||||
(b) From PCRE2 10.23, back references to groups of fixed length are supported
|
(b) From PCRE2 10.23, backreferences to groups of fixed length are supported
|
||||||
in lookbehinds, provided that there is no possibility of referencing a
|
in lookbehinds, provided that there is no possibility of referencing a
|
||||||
non-unique number or name. Perl does not support backreferences in lookbehinds.
|
non-unique number or name. Perl does not support backreferences in lookbehinds.
|
||||||
.sp
|
.sp
|
||||||
|
|
|
@ -57,9 +57,10 @@ controlled by parameters that can be set by the \fB--buffer-size\fP and
|
||||||
that is obtained at the start of processing. If an input file contains very
|
that is obtained at the start of processing. If an input file contains very
|
||||||
long lines, a larger buffer may be needed; this is handled by automatically
|
long lines, a larger buffer may be needed; this is handled by automatically
|
||||||
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
|
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
|
||||||
default values for these parameters are specified when \fBpcre2grep\fP is
|
default values for these parameters can be set when \fBpcre2grep\fP is
|
||||||
built, with the default defaults being 20K and 1M respectively. An error occurs
|
built; if nothing is specified, the defaults are set to 20K and 1M
|
||||||
if a line is too long and the buffer can no longer be expanded.
|
respectively. An error occurs if a line is too long and the buffer can no
|
||||||
|
longer be expanded.
|
||||||
.P
|
.P
|
||||||
The block of memory that is actually used is three times the "buffer size", to
|
The block of memory that is actually used is three times the "buffer size", to
|
||||||
allow for buffering "before" and "after" lines. If the buffer size is too
|
allow for buffering "before" and "after" lines. If the buffer size is too
|
||||||
|
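The buffer sizing described in this hunk is simple arithmetic. A minimal sketch in Python, assuming that the "20K" and "1M" defaults mean 20 * 1024 and 1024 * 1024 bytes; the constant and function names are hypothetical and are not part of pcre2grep:

```python
DEFAULT_BUFFER_SIZE = 20 * 1024        # "20K", assumed to mean 20480 bytes
DEFAULT_MAX_BUFFER_SIZE = 1024 * 1024  # "1M", assumed to mean 1048576 bytes


def memory_block_size(buffer_size):
    """Return the size of the block actually used for a given buffer size.

    The text states that the block is three times the "buffer size", to
    leave room for buffering "before" and "after" lines.
    """
    return 3 * buffer_size
```

With the assumed defaults, the initial block would be 61440 bytes, and it could grow to at most 3145728 bytes before a long line causes an error.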
@ -434,13 +435,13 @@ short form for this option.
|
||||||
When this option is given, non-compressed input is read and processed line by
|
When this option is given, non-compressed input is read and processed line by
|
||||||
line, and the output is flushed after each write. By default, input is read in
|
line, and the output is flushed after each write. By default, input is read in
|
||||||
large chunks, unless \fBpcre2grep\fP can determine that it is reading from a
|
large chunks, unless \fBpcre2grep\fP can determine that it is reading from a
|
||||||
terminal (which is currently possible only in Unix-like environments). Output
|
terminal (which is currently possible only in Unix-like environments or
|
||||||
to terminal is normally automatically flushed by the operating system. This
|
Windows). Output to terminal is normally automatically flushed by the operating
|
||||||
option can be useful when the input or output is attached to a pipe and you do
|
system. This option can be useful when the input or output is attached to a
|
||||||
not want \fBpcre2grep\fP to buffer up large amounts of data. However, its use
|
pipe and you do not want \fBpcre2grep\fP to buffer up large amounts of data.
|
||||||
will affect performance, and the \fB-M\fP (multiline) option ceases to work.
|
However, its use will affect performance, and the \fB-M\fP (multiline) option
|
||||||
When input is from a compressed .gz or .bz2 file, \fB--line-buffered\fP is
|
ceases to work. When input is from a compressed .gz or .bz2 file,
|
||||||
ignored.
|
\fB--line-buffered\fP is ignored.
|
||||||
.TP
|
.TP
|
||||||
\fB--line-offsets\fP
|
\fB--line-offsets\fP
|
||||||
Instead of showing lines or parts of lines that match, show each match as a
|
Instead of showing lines or parts of lines that match, show each match as a
|
||||||
|
@ -470,11 +471,11 @@ is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
|
||||||
counter that is incremented each time around its main processing loop. If the
|
counter that is incremented each time around its main processing loop. If the
|
||||||
value set by \fB--match-limit\fP is reached, an error occurs.
|
value set by \fB--match-limit\fP is reached, an error occurs.
|
||||||
.sp
|
.sp
|
||||||
The \fB--heap-limit\fP option specifies, as a number of kilobytes, the amount
|
The \fB--heap-limit\fP option specifies, as a number of kibibytes (units of
|
||||||
of heap memory that may be used for matching. Heap memory is needed only if
|
1024 bytes), the amount of heap memory that may be used for matching. Heap
|
||||||
matching the pattern requires a significant number of nested backtracking
|
memory is needed only if matching the pattern requires a significant number of
|
||||||
points to be remembered. This parameter can be set to zero to forbid the use of
|
nested backtracking points to be remembered. This parameter can be set to zero
|
||||||
heap memory altogether.
|
to forbid the use of heap memory altogether.
|
||||||
.sp
|
.sp
|
||||||
The \fB--depth-limit\fP option limits the depth of nested backtracking points,
|
The \fB--depth-limit\fP option limits the depth of nested backtracking points,
|
||||||
which indirectly limits the amount of memory that is used. The amount of memory
|
which indirectly limits the amount of memory that is used. The amount of memory
|
||||||
|
@ -483,9 +484,9 @@ parentheses in the pattern, so the amount of memory that is used before this
|
||||||
limit acts varies from pattern to pattern. This limit is of use only if it is
|
limit acts varies from pattern to pattern. This limit is of use only if it is
|
||||||
set smaller than \fB--match-limit\fP.
|
set smaller than \fB--match-limit\fP.
|
||||||
.sp
|
.sp
|
||||||
There are no short forms for these options. The default settings are specified
|
There are no short forms for these options. The default limits can be set
|
||||||
when the PCRE2 library is compiled, with the default defaults being very large
|
when the PCRE2 library is compiled; if they are not specified, the defaults
|
||||||
and so effectively unlimited.
|
are very large and so effectively unlimited.
|
||||||
.TP
|
.TP
|
||||||
\fB--max-buffer-size=\fInumber\fP
|
\fB--max-buffer-size=\fInumber\fP
|
||||||
This limits the expansion of the processing buffer, whose initial size can be
|
This limits the expansion of the processing buffer, whose initial size can be
|
||||||
|
|
|
@ -56,10 +56,10 @@ DESCRIPTION
|
||||||
that is obtained at the start of processing. If an input file contains
|
that is obtained at the start of processing. If an input file contains
|
||||||
very long lines, a larger buffer may be needed; this is handled by
|
very long lines, a larger buffer may be needed; this is handled by
|
||||||
automatically extending the buffer, up to the limit specified by --max-
|
automatically extending the buffer, up to the limit specified by --max-
|
||||||
buffer-size. The default values for these parameters are specified when
|
buffer-size. The default values for these parameters can be set when
|
||||||
pcre2grep is built, with the default defaults being 20K and 1M respec-
|
pcre2grep is built; if nothing is specified, the defaults are set to
|
||||||
tively. An error occurs if a line is too long and the buffer can no
|
20K and 1M respectively. An error occurs if a line is too long and the
|
||||||
longer be expanded.
|
buffer can no longer be expanded.
|
||||||
|
|
||||||
The block of memory that is actually used is three times the "buffer
|
The block of memory that is actually used is three times the "buffer
|
||||||
size", to allow for buffering "before" and "after" lines. If the buffer
|
size", to allow for buffering "before" and "after" lines. If the buffer
|
||||||
|
@ -475,14 +475,14 @@ OPTIONS
|
||||||
processed line by line, and the output is flushed after each
|
processed line by line, and the output is flushed after each
|
||||||
write. By default, input is read in large chunks, unless
|
write. By default, input is read in large chunks, unless
|
||||||
pcre2grep can determine that it is reading from a terminal
|
pcre2grep can determine that it is reading from a terminal
|
||||||
(which is currently possible only in Unix-like environments).
|
(which is currently possible only in Unix-like environments
|
||||||
Output to terminal is normally automatically flushed by the
|
or Windows). Output to terminal is normally automatically
|
||||||
operating system. This option can be useful when the input or
|
flushed by the operating system. This option can be useful
|
||||||
output is attached to a pipe and you do not want pcre2grep to
|
when the input or output is attached to a pipe and you do not
|
||||||
buffer up large amounts of data. However, its use will affect
|
want pcre2grep to buffer up large amounts of data. However,
|
||||||
performance, and the -M (multiline) option ceases to work.
|
its use will affect performance, and the -M (multiline)
|
||||||
When input is from a compressed .gz or .bz2 file, --line-
|
option ceases to work. When input is from a compressed .gz or
|
||||||
buffered is ignored.
|
.bz2 file, --line-buffered is ignored.
|
||||||
|
|
||||||
--line-offsets
|
--line-offsets
|
||||||
Instead of showing lines or parts of lines that match, show
|
Instead of showing lines or parts of lines that match, show
|
||||||
|
@ -517,12 +517,12 @@ OPTIONS
|
||||||
processing loop. If the value set by --match-limit is
|
processing loop. If the value set by --match-limit is
|
||||||
reached, an error occurs.
|
reached, an error occurs.
|
||||||
|
|
||||||
The --heap-limit option specifies, as a number of kilobytes,
|
The --heap-limit option specifies, as a number of kibibytes
|
||||||
the amount of heap memory that may be used for matching. Heap
|
(units of 1024 bytes), the amount of heap memory that may be
|
||||||
memory is needed only if matching the pattern requires a sig-
|
used for matching. Heap memory is needed only if matching the
|
||||||
nificant number of nested backtracking points to be remem-
|
pattern requires a significant number of nested backtracking
|
||||||
bered. This parameter can be set to zero to forbid the use of
|
points to be remembered. This parameter can be set to zero to
|
||||||
heap memory altogether.
|
forbid the use of heap memory altogether.
|
||||||
|
|
||||||
The --depth-limit option limits the depth of nested back-
|
The --depth-limit option limits the depth of nested back-
|
||||||
tracking points, which indirectly limits the amount of memory
|
tracking points, which indirectly limits the amount of memory
|
||||||
|
@ -532,10 +532,10 @@ OPTIONS
|
||||||
limit acts varies from pattern to pattern. This limit is of
|
limit acts varies from pattern to pattern. This limit is of
|
||||||
use only if it is set smaller than --match-limit.
|
use only if it is set smaller than --match-limit.
|
||||||
|
|
||||||
There are no short forms for these options. The default set-
|
There are no short forms for these options. The default lim-
|
||||||
tings are specified when the PCRE2 library is compiled, with
|
its can be set when the PCRE2 library is compiled; if they
|
||||||
the default defaults being very large and so effectively
|
are not specified, the defaults are very large and so effec-
|
||||||
unlimited.
|
tively unlimited.
|
||||||
|
|
||||||
--max-buffer-size=number
|
--max-buffer-size=number
|
||||||
This limits the expansion of the processing buffer, whose
|
This limits the expansion of the processing buffer, whose
|
||||||
|
|
|
@ -38,9 +38,9 @@ There is no limit to the number of parenthesized subpatterns, but there can be
|
||||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||||
order to limit the amount of system stack used at compile time. The default
|
order to limit the amount of system stack used at compile time. The default
|
||||||
limit can be specified when PCRE2 is built; the default default is 250. An
|
limit can be specified when PCRE2 is built; if not, the default is set to 250.
|
||||||
application can change this limit by calling pcre2_set_parens_nest_limit() to
|
An application can change this limit by calling pcre2_set_parens_nest_limit()
|
||||||
set the limit in a compile context.
|
to set the limit in a compile context.
|
||||||
.P
|
.P
|
||||||
The maximum length of name for a named subpattern is 32 code units, and the
|
The maximum length of name for a named subpattern is 32 code units, and the
|
||||||
maximum number of named subpatterns is 10000.
|
maximum number of named subpatterns is 10000.
|
||||||
|
|
|
@ -67,7 +67,7 @@ ungreedy repetition quantifiers are specified in the pattern.
|
||||||
Because it ends up with a single path through the tree, it is relatively
|
Because it ends up with a single path through the tree, it is relatively
|
||||||
straightforward for this algorithm to keep track of the substrings that are
|
straightforward for this algorithm to keep track of the substrings that are
|
||||||
matched by portions of the pattern in parentheses. This provides support for
|
matched by portions of the pattern in parentheses. This provides support for
|
||||||
capturing parentheses and back references.
|
capturing parentheses and backreferences.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "THE ALTERNATIVE MATCHING ALGORITHM"
|
.SH "THE ALTERNATIVE MATCHING ALGORITHM"
|
||||||
|
@ -134,7 +134,7 @@ straightforward to keep track of captured substrings for the different matching
|
||||||
possibilities, and PCRE2's implementation of this algorithm does not attempt to
|
possibilities, and PCRE2's implementation of this algorithm does not attempt to
|
||||||
do this. This means that no captured substrings are available.
|
do this. This means that no captured substrings are available.
|
||||||
.P
|
.P
|
||||||
3. Because no substrings are captured, back references within the pattern are
|
3. Because no substrings are captured, backreferences within the pattern are
|
||||||
not supported, and cause errors if encountered.
|
not supported, and cause errors if encountered.
|
||||||
.P
|
.P
|
||||||
4. For the same reason, conditional expressions that use a backreference as the
|
4. For the same reason, conditional expressions that use a backreference as the
|
||||||
|
@ -188,7 +188,7 @@ The alternative algorithm suffers from a number of disadvantages:
|
||||||
because it has to search for all possible matches, but is also because it is
|
because it has to search for all possible matches, but is also because it is
|
||||||
less susceptible to optimization.
|
less susceptible to optimization.
|
||||||
.P
|
.P
|
||||||
2. Capturing parentheses and back references are not supported.
|
2. Capturing parentheses and backreferences are not supported.
|
||||||
.P
|
.P
|
||||||
3. Although atomic groups are supported, their use does not provide the
|
3. Although atomic groups are supported, their use does not provide the
|
||||||
performance advantage that it does for the standard algorithm.
|
performance advantage that it does for the standard algorithm.
|
||||||
|
|
|
@ -163,7 +163,7 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
|
||||||
for it to have any effect. In other words, the pattern writer can lower the
|
for it to have any effect. In other words, the pattern writer can lower the
|
||||||
limits set by the programmer, but not raise them. If there is more than one
|
limits set by the programmer, but not raise them. If there is more than one
|
||||||
setting of one of these limits, the lower value is used. The heap limit is
|
setting of one of these limits, the lower value is used. The heap limit is
|
||||||
specified in kilobytes.
|
specified in kibibytes (units of 1024 bytes).
|
||||||
.P
|
.P
|
||||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
||||||
still recognized for backwards compatibility.
|
still recognized for backwards compatibility.
|
||||||
|
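The rule in this hunk (a pattern can lower a limit but not raise it, the lowest of several settings wins, and the heap limit is counted in units of 1024 bytes) can be sketched as follows. This is an illustrative Python model only; the function names are hypothetical and do not correspond to anything in the PCRE2 API:

```python
KIBIBYTE = 1024  # the heap limit is expressed in units of 1024 bytes


def effective_limit(caller_limit, pattern_settings=()):
    """Combine the caller's limit with any settings made inside the pattern.

    A setting in the pattern only has an effect if it is lower than the
    caller's value, and when there are several settings the lowest is used.
    """
    return min([caller_limit, *pattern_settings])


def heap_limit_in_bytes(limit_kib):
    """Convert a heap limit given in kibibytes to bytes."""
    return limit_kib * KIBIBYTE
```

For example, a caller limit of 1000 combined with pattern settings of 5000 and 200 gives an effective limit of 200, and a heap limit of 20000000 kibibytes corresponds to 20480000000 bytes.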
@ -318,7 +318,7 @@ precede a non-alphanumeric with backslash to specify that it stands for itself.
|
||||||
In particular, if you want to match a backslash, you write \e\e.
|
In particular, if you want to match a backslash, you write \e\e.
|
||||||
.P
|
.P
|
||||||
In a UTF mode, only ASCII numbers and letters have any special meaning after a
|
In a UTF mode, only ASCII numbers and letters have any special meaning after a
|
||||||
backslash. All other characters (in particular, those whose codepoints are
|
backslash. All other characters (in particular, those whose code points are
|
||||||
greater than 127) are treated as literals.
|
greater than 127) are treated as literals.
|
||||||
.P
|
.P
|
||||||
If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
|
If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
|
||||||
|
@ -367,7 +367,7 @@ these escapes are as follows:
|
||||||
\er carriage return (hex 0D)
|
\er carriage return (hex 0D)
|
||||||
\et tab (hex 09)
|
\et tab (hex 09)
|
||||||
\e0dd character with octal code 0dd
|
\e0dd character with octal code 0dd
|
||||||
\eddd character with octal code ddd, or back reference
|
\eddd character with octal code ddd, or backreference
|
||||||
\eo{ddd..} character with octal code ddd..
|
\eo{ddd..} character with octal code ddd..
|
||||||
\exhh character with hex code hh
|
\exhh character with hex code hh
|
||||||
\ex{hhh..} character with hex code hhh.. (default mode)
|
\ex{hhh..} character with hex code hhh.. (default mode)
|
||||||
|
@ -410,12 +410,12 @@ follows is itself an octal digit.
|
||||||
The escape \eo must be followed by a sequence of octal digits, enclosed in
|
The escape \eo must be followed by a sequence of octal digits, enclosed in
|
||||||
braces. An error occurs if this is not the case. This escape is a recent
|
braces. An error occurs if this is not the case. This escape is a recent
|
||||||
addition to Perl; it provides a way of specifying character code points as octal
|
addition to Perl; it provides a way of specifying character code points as octal
|
||||||
numbers greater than 0777, and it also allows octal numbers and back references
|
numbers greater than 0777, and it also allows octal numbers and backreferences
|
||||||
to be unambiguously specified.
|
to be unambiguously specified.
|
||||||
.P
|
.P
|
||||||
For greater clarity and unambiguity, it is best to avoid following \e by a
|
For greater clarity and unambiguity, it is best to avoid following \e by a
|
||||||
digit greater than zero. Instead, use \eo{} or \ex{} to specify character
|
digit greater than zero. Instead, use \eo{} or \ex{} to specify character
|
||||||
numbers, and \eg{} to specify back references. The following paragraphs
|
numbers, and \eg{} to specify backreferences. The following paragraphs
|
||||||
describe the old, ambiguous syntax.
|
describe the old, ambiguous syntax.
|
||||||
.P
|
.P
|
||||||
The handling of a backslash followed by a digit other than 0 is complicated,
|
The handling of a backslash followed by a digit other than 0 is complicated,
|
||||||
|
@ -424,7 +424,7 @@ and Perl has changed over time, causing PCRE2 also to change.
|
||||||
Outside a character class, PCRE2 reads the digit and any following digits as a
|
Outside a character class, PCRE2 reads the digit and any following digits as a
|
||||||
decimal number. If the number is less than 10, begins with the digit 8 or 9, or
|
decimal number. If the number is less than 10, begins with the digit 8 or 9, or
|
||||||
if there are at least that many previous capturing left parentheses in the
|
if there are at least that many previous capturing left parentheses in the
|
||||||
expression, the entire sequence is taken as a \fIback reference\fP. A
|
expression, the entire sequence is taken as a \fIbackreference\fP. A
|
||||||
description of how this works is given
|
description of how this works is given
|
||||||
.\" HTML <a href="#backreferences">
|
.\" HTML <a href="#backreferences">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
|
@ -446,20 +446,20 @@ for themselves. For example, outside a character class:
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
\e40 is the same, provided there are fewer than 40
|
\e40 is the same, provided there are fewer than 40
|
||||||
previous capturing subpatterns
|
previous capturing subpatterns
|
||||||
\e7 is always a back reference
|
\e7 is always a backreference
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
\e11 might be a back reference, or another way of
|
\e11 might be a backreference, or another way of
|
||||||
writing a tab
|
writing a tab
|
||||||
\e011 is always a tab
|
\e011 is always a tab
|
||||||
\e0113 is a tab followed by the character "3"
|
\e0113 is a tab followed by the character "3"
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
\e113 might be a back reference, otherwise the
|
\e113 might be a backreference, otherwise the
|
||||||
character with octal code 113
|
character with octal code 113
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
\e377 might be a back reference, otherwise
|
\e377 might be a backreference, otherwise
|
||||||
the value 255 (decimal)
|
the value 255 (decimal)
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
\e81 is always a back reference
|
\e81 is always a backreference
|
||||||
.sp
|
.sp
|
||||||
Note that octal values of 100 or greater that are specified using this syntax
|
Note that octal values of 100 or greater that are specified using this syntax
|
||||||
must not be introduced by a leading zero, because no more than three octal
|
must not be introduced by a leading zero, because no more than three octal
|
||||||
|
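The old, ambiguous syntax in the table above can be modelled with a small classifier. This is a sketch in Python that follows the rule as stated in the text for sequences outside a character class; it is not PCRE2's implementation, and the function name and return format are assumptions made for illustration:

```python
def classify_digit_escape(digits, previous_groups):
    """Classify the digits that follow a backslash outside a character class.

    digits          -- the run of decimal digits after the backslash, e.g. "113"
    previous_groups -- number of capturing left parentheses seen so far

    Returns (kind, value, rest): kind is "backreference" or "octal", value is
    the group number or character code, and rest holds any trailing digits
    that stand for themselves.
    """
    if not digits or not digits.isdigit():
        raise ValueError("expected one or more decimal digits")

    if digits[0] != "0":
        number = int(digits)
        # Less than 10, starting with 8 or 9, or enough previous groups:
        # the entire sequence is a backreference.
        if number < 10 or digits[0] in "89" or number <= previous_groups:
            return ("backreference", number, "")

    # Otherwise read up to three octal digits as a character code.
    octal = ""
    for ch in digits[:3]:
        if ch not in "01234567":
            break
        octal += ch
    return ("octal", int(octal, 8), digits[len(octal):])
```

For instance, `classify_digit_escape("0113", 0)` yields a tab (code 9) followed by the literal "3", while `classify_digit_escape("11", 11)` yields a backreference to group 11.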
@ -492,10 +492,10 @@ limited to certain values, as follows:
|
||||||
8-bit non-UTF mode no greater than 0xff
|
8-bit non-UTF mode no greater than 0xff
|
||||||
16-bit non-UTF mode no greater than 0xffff
|
16-bit non-UTF mode no greater than 0xffff
|
||||||
32-bit non-UTF mode no greater than 0xffffffff
|
32-bit non-UTF mode no greater than 0xffffffff
|
||||||
All UTF modes no greater than 0x10ffff and a valid codepoint
|
All UTF modes no greater than 0x10ffff and a valid code point
|
||||||
.sp
|
.sp
|
||||||
Invalid Unicode codepoints are all those in the range 0xd800 to 0xdfff (the
|
Invalid Unicode code points are all those in the range 0xd800 to 0xdfff (the
|
||||||
so-called "surrogate" codepoints). The check for these can be disabled by the
|
so-called "surrogate" code points). The check for these can be disabled by the
|
||||||
caller of \fBpcre2_compile()\fP by setting the option
|
caller of \fBpcre2_compile()\fP by setting the option
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
||||||
.
|
.
|
||||||
|
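The limits in this table can be expressed as a small check. The following Python sketch is an illustration only: the mode labels, the function name, and the allow_surrogates parameter (standing in for PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) are assumptions, not PCRE2 identifiers:

```python
# Largest value that may be specified in each mode, as listed in the table.
MAX_BY_MODE = {
    "8-bit non-UTF": 0xFF,
    "16-bit non-UTF": 0xFFFF,
    "32-bit non-UTF": 0xFFFFFFFF,
    "UTF": 0x10FFFF,
}


def is_allowed_code_point(value, mode, allow_surrogates=False):
    """Return True if value may be specified as a character in the given mode."""
    if value < 0 or value > MAX_BY_MODE[mode]:
        return False
    # In UTF modes the surrogate range 0xd800 to 0xdfff is invalid unless
    # the check has been disabled by the caller.
    if mode == "UTF" and not allow_surrogates and 0xD800 <= value <= 0xDFFF:
        return False
    return True
```

Under these assumptions, 0xd800 is rejected in a UTF mode but accepted in 16-bit non-UTF mode, where only the 0xffff upper bound applies.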
@ -523,12 +523,12 @@ is set, \eU matches a "U" character, and \eu can be used to define a character
|
||||||
by code point, as described above.
|
by code point, as described above.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Absolute and relative back references"
|
.SS "Absolute and relative backreferences"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
||||||
in braces, is an absolute or relative back reference. A named back reference
|
in braces, is an absolute or relative backreference. A named backreference
|
||||||
can be coded as \eg{name}. Back references are discussed
|
can be coded as \eg{name}. Backreferences are discussed
|
||||||
.\" HTML <a href="#backreferences">
|
.\" HTML <a href="#backreferences">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
later,
|
later,
|
||||||
|
@ -551,7 +551,7 @@ syntax for referencing a subpattern as a "subroutine". Details are discussed
|
||||||
later.
|
later.
|
||||||
.\"
|
.\"
|
||||||
Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
|
Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
|
||||||
synonymous. The former is a back reference; the latter is a
|
synonymous. The former is a backreference; the latter is a
|
||||||
.\" HTML <a href="#subpatternsassubroutines">
|
.\" HTML <a href="#subpatternsassubroutines">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
subroutine
|
subroutine
|
||||||
|
@ -692,7 +692,7 @@ U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
|
||||||
line, U+0085). Because this is an atomic group, the two-character sequence is
|
line, U+0085). Because this is an atomic group, the two-character sequence is
|
||||||
treated as a single unit that cannot be split.
|
treated as a single unit that cannot be split.
|
||||||
.P
|
.P
|
||||||
In other modes, two additional characters whose codepoints are greater than 255
|
In other modes, two additional characters whose code points are greater than 255
|
||||||
are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
|
are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
|
||||||
Unicode support is not needed for these characters to be recognized.
|
Unicode support is not needed for these characters to be recognized.
|
||||||
.P
|
.P
|
||||||
|
@ -727,8 +727,8 @@ an error.
|
||||||
When PCRE2 is built with Unicode support (the default), three additional escape
|
When PCRE2 is built with Unicode support (the default), three additional escape
|
||||||
sequences that match characters with specific properties are available. In
|
sequences that match characters with specific properties are available. In
|
||||||
8-bit non-UTF-8 mode, these sequences are of course limited to testing
|
8-bit non-UTF-8 mode, these sequences are of course limited to testing
|
||||||
characters whose codepoints are less than 256, but they do work in this mode.
|
characters whose code points are less than 256, but they do work in this mode.
|
||||||
In 32-bit non-UTF mode, codepoints greater than 0x10ffff (the Unicode limit)
|
In 32-bit non-UTF mode, code points greater than 0x10ffff (the Unicode limit)
|
||||||
may be encountered. These are all treated as being in the Common script and
|
may be encountered. These are all treated as being in the Common script and
|
||||||
with an unassigned type. The extra escape sequences are:
|
with an unassigned type. The extra escape sequences are:
|
||||||
.sp
|
.sp
|
||||||
|
@ -1026,7 +1026,7 @@ joiner" characters. Characters with the "mark" property always have the
|
||||||
6. Do not break within emoji modifier sequences (a base character followed by a
|
6. Do not break within emoji modifier sequences (a base character followed by a
|
||||||
modifier). Extending characters are allowed before the modifier.
|
modifier). Extending characters are allowed before the modifier.
|
||||||
.P
|
.P
|
||||||
7. Do not break within emoji zwj sequences (zero-width jointer followed by
|
7. Do not break within emoji zwj sequences (zero-width joiner followed by
|
||||||
"glue after ZWJ" or "base glue after ZWJ").
|
"glue after ZWJ" or "base glue after ZWJ").
|
||||||
.P
|
.P
|
||||||
8. Do not break within emoji flag sequences. That is, do not break between
|
8. Do not break within emoji flag sequences. That is, do not break between
|
||||||
|
@ -1724,7 +1724,7 @@ numbers underneath show in which buffer the captured content will be stored.
|
||||||
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
|
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
|
||||||
# 1 2 2 3 2 3 4
|
# 1 2 2 3 2 3 4
|
||||||
.sp
|
.sp
|
||||||
A back reference to a numbered subpattern uses the most recent value that is
|
A backreference to a numbered subpattern uses the most recent value that is
|
||||||
set for that number by any subpattern. The following pattern matches "abcabc"
|
set for that number by any subpattern. The following pattern matches "abcabc"
|
||||||
or "defdef":
|
or "defdef":
|
||||||
.sp
|
.sp
|
||||||
|
@ -1768,7 +1768,7 @@ In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
|
||||||
parentheses from other parts of the pattern, such as
|
parentheses from other parts of the pattern, such as
|
||||||
.\" HTML <a href="#backreferences">
|
.\" HTML <a href="#backreferences">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
back references,
|
backreferences,
|
||||||
.\"
|
.\"
|
||||||
.\" HTML <a href="#recursion">
|
.\" HTML <a href="#recursion">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
|
@ -1811,7 +1811,7 @@ The convenience functions for extracting the data by name returns the substring
|
||||||
for the first (and in this example, the only) subpattern of that name that
|
for the first (and in this example, the only) subpattern of that name that
|
||||||
matched. This saves searching to find which numbered subpattern it was.
|
matched. This saves searching to find which numbered subpattern it was.
|
||||||
.P
|
.P
|
||||||
If you make a back reference to a non-unique named subpattern from elsewhere in
|
If you make a backreference to a non-unique named subpattern from elsewhere in
|
||||||
the pattern, the subpatterns to which the name refers are checked in the order
|
the pattern, the subpatterns to which the name refers are checked in the order
|
||||||
in which they appear in the overall pattern. The first one that is set is used
|
in which they appear in the overall pattern. The first one that is set is used
|
||||||
for the reference. For example, this pattern matches both "foofoo" and
|
for the reference. For example, this pattern matches both "foofoo" and
|
||||||
|
@ -1863,7 +1863,7 @@ items:
|
||||||
the \eR escape sequence
|
the \eR escape sequence
|
||||||
an escape such as \ed or \epL that matches a single character
|
an escape such as \ed or \epL that matches a single character
|
||||||
a character class
|
a character class
|
||||||
a back reference
|
a backreference
|
||||||
a parenthesized subpattern (including most assertions)
|
a parenthesized subpattern (including most assertions)
|
||||||
a subroutine call to a subpattern (recursive or otherwise)
|
a subroutine call to a subpattern (recursive or otherwise)
|
||||||
.sp
|
.sp
|
||||||
|
@ -1980,7 +1980,7 @@ worth setting PCRE2_DOTALL in order to obtain this optimization, or
|
||||||
alternatively, using ^ to indicate anchoring explicitly.
|
alternatively, using ^ to indicate anchoring explicitly.
|
||||||
.P
|
.P
|
||||||
However, there are some cases where the optimization cannot be used. When .*
|
However, there are some cases where the optimization cannot be used. When .*
|
||||||
is inside capturing parentheses that are the subject of a back reference
|
is inside capturing parentheses that are the subject of a backreference
|
||||||
elsewhere in the pattern, a match at the start may fail where a later one
|
elsewhere in the pattern, a match at the start may fail where a later one
|
||||||
succeeds. Consider, for example:
|
succeeds. Consider, for example:
|
||||||
.sp
|
.sp
|
||||||
|
@ -2116,23 +2116,23 @@ sequences of non-digits cannot be broken, and failure happens quickly.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="backreferences"></a>
|
.\" HTML <a name="backreferences"></a>
|
||||||
.SH "BACK REFERENCES"
|
.SH "BACKREFERENCES"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
Outside a character class, a backslash followed by a digit greater than 0 (and
|
Outside a character class, a backslash followed by a digit greater than 0 (and
|
||||||
possibly further digits) is a back reference to a capturing subpattern earlier
|
possibly further digits) is a backreference to a capturing subpattern earlier
|
||||||
(that is, to its left) in the pattern, provided there have been that many
|
(that is, to its left) in the pattern, provided there have been that many
|
||||||
previous capturing left parentheses.
|
previous capturing left parentheses.
|
||||||
.P
|
.P
|
||||||
However, if the decimal number following the backslash is less than 8, it is
|
However, if the decimal number following the backslash is less than 8, it is
|
||||||
always taken as a back reference, and causes an error only if there are not
|
always taken as a backreference, and causes an error only if there are not
|
||||||
that many capturing left parentheses in the entire pattern. In other words, the
|
that many capturing left parentheses in the entire pattern. In other words, the
|
||||||
parentheses that are referenced need not be to the left of the reference for
|
parentheses that are referenced need not be to the left of the reference for
|
||||||
numbers less than 8. A "forward back reference" of this type can make sense
|
numbers less than 8. A "forward backreference" of this type can make sense
|
||||||
when a repetition is involved and the subpattern to the right has participated
|
when a repetition is involved and the subpattern to the right has participated
|
||||||
in an earlier iteration.
|
in an earlier iteration.
|
||||||
.P
|
.P
|
||||||
It is not possible to have a numerical "forward back reference" to a subpattern
|
It is not possible to have a numerical "forward backreference" to a subpattern
|
||||||
whose number is 8 or more using this syntax because a sequence such as \e50 is
|
whose number is 8 or more using this syntax because a sequence such as \e50 is
|
||||||
interpreted as a character defined in octal. See the subsection entitled
|
interpreted as a character defined in octal. See the subsection entitled
|
||||||
"Non-printing characters"
|
"Non-printing characters"
|
||||||
|
@ -2141,7 +2141,7 @@ interpreted as a character defined in octal. See the subsection entitled
|
||||||
above
|
above
|
||||||
.\"
|
.\"
|
||||||
for further details of the handling of digits following a backslash. There is
|
for further details of the handling of digits following a backslash. There is
|
||||||
no such problem when named parentheses are used. A back reference to any
|
no such problem when named parentheses are used. A backreference to any
|
||||||
subpattern is possible using named parentheses (see below).
|
subpattern is possible using named parentheses (see below).
|
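The octal ambiguity mentioned in this hunk can be checked with simple arithmetic. The following is a minimal sketch in Python, not PCRE2 code; the helper name `octal_escape_char` is hypothetical and only illustrates which character a sequence such as \50 denotes when its digits are read as octal.

```python
def octal_escape_char(digits: str) -> str:
    """Return the character whose code point is `digits` read as octal."""
    return chr(int(digits, 8))

if __name__ == "__main__":
    # Octal 50 is decimal 40, which is the left parenthesis.
    print(repr(octal_escape_char("50")))
```

This is why a numeric forward reference to group 50 cannot be written as a backslash followed by the digits 50: the digits are consumed as a character code instead.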
||||||
.P
|
.P
|
||||||
Another way of avoiding the ambiguity inherent in the use of digits following a
|
Another way of avoiding the ambiguity inherent in the use of digits following a
|
||||||
|
@ -2169,7 +2169,7 @@ The sequence \eg{+1} is a reference to the next capturing subpattern. This kind
|
||||||
of forward reference can be useful in patterns that repeat. Perl does not
|
of forward reference can be useful in patterns that repeat. Perl does not
|
||||||
support the use of + in this way.
|
support the use of + in this way.
|
||||||
.P
|
.P
|
||||||
A back reference matches whatever actually matched the capturing subpattern in
|
A backreference matches whatever actually matched the capturing subpattern in
|
||||||
the current subject string, rather than anything matching the subpattern
|
the current subject string, rather than anything matching the subpattern
|
||||||
itself (see
|
itself (see
|
||||||
.\" HTML <a href="#subpatternsassubroutines">
|
.\" HTML <a href="#subpatternsassubroutines">
|
||||||
|
@ -2182,17 +2182,17 @@ below for a way of doing that). So the pattern
|
||||||
.sp
|
.sp
|
||||||
matches "sense and sensibility" and "response and responsibility", but not
|
matches "sense and sensibility" and "response and responsibility", but not
|
||||||
"sense and responsibility". If caseful matching is in force at the time of the
|
"sense and responsibility". If caseful matching is in force at the time of the
|
||||||
back reference, the case of letters is relevant. For example,
|
backreference, the case of letters is relevant. For example,
|
||||||
.sp
|
.sp
|
||||||
((?i)rah)\es+\e1
|
((?i)rah)\es+\e1
|
||||||
.sp
|
.sp
|
||||||
matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
|
matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
|
||||||
capturing subpattern is matched caselessly.
|
capturing subpattern is matched caselessly.
|
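As a rough illustration of the "sense and sensibility" behaviour described in this hunk, here is a minimal sketch using Python's built-in `re` module rather than PCRE2 itself. The pattern line is not shown in the hunk, so the pattern below is an assumption reconstructed from the quoted example strings, and the helper name `matches_whole` is hypothetical.

```python
import re

# Assumed pattern: the hunk quotes only the strings it matches, not the pattern.
PATTERN = re.compile(r"(sens|respons)e and \1ibility")

def matches_whole(subject: str) -> bool:
    """Return True if the entire subject matches PATTERN."""
    return PATTERN.fullmatch(subject) is not None

if __name__ == "__main__":
    for s in ("sense and sensibility",
              "response and responsibility",
              "sense and responsibility"):
        print(s, "->", matches_whole(s))
```

The backreference repeats the text that group 1 actually captured, so the third string fails even though each half would match the group on its own.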
||||||
.P
|
.P
|
||||||
There are several different ways of writing back references to named
|
There are several different ways of writing backreferences to named
|
||||||
subpatterns. The .NET syntax \ek{name} and the Perl syntax \ek<name> or
|
subpatterns. The .NET syntax \ek{name} and the Perl syntax \ek<name> or
|
||||||
\ek'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
|
\ek'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
|
||||||
back reference syntax, in which \eg can be used for both numeric and named
|
backreference syntax, in which \eg can be used for both numeric and named
|
||||||
references, is also supported. We could rewrite the above example in any of
|
references, is also supported. We could rewrite the above example in any of
|
||||||
the following ways:
|
the following ways:
|
||||||
.sp
|
.sp
|
||||||
|
@ -2204,20 +2204,20 @@ the following ways:
|
||||||
A subpattern that is referenced by name may appear in the pattern before or
|
A subpattern that is referenced by name may appear in the pattern before or
|
||||||
after the reference.
|
after the reference.
|
||||||
.P
|
.P
|
||||||
There may be more than one back reference to the same subpattern. If a
|
There may be more than one backreference to the same subpattern. If a
|
||||||
subpattern has not actually been used in a particular match, any back
|
subpattern has not actually been used in a particular match, any backreferences
|
||||||
references to it always fail by default. For example, the pattern
|
to it always fail by default. For example, the pattern
|
||||||
.sp
|
.sp
|
||||||
(a|(bc))\e2
|
(a|(bc))\e2
|
||||||
.sp
|
.sp
|
||||||
always fails if it starts to match "a" rather than "bc". However, if the
|
always fails if it starts to match "a" rather than "bc". However, if the
|
||||||
PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a back reference to an
|
PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backreference to an
|
||||||
unset value matches an empty string.
|
unset value matches an empty string.
|
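The default "unset group" behaviour described above can be sketched with Python's `re` module, which also fails a backreference to a group that did not take part in the match. This is an illustration under that assumption, not PCRE2 code; it does not attempt to model the PCRE2_MATCH_UNSET_BACKREF option, and the helper name `first_match` is hypothetical.

```python
import re

# Group 2 is set only when the "bc" alternative of group 1 is taken.
PATTERN = re.compile(r"(a|(bc))\2")

def first_match(subject: str):
    """Return the matched text, or None if the pattern does not match."""
    m = PATTERN.search(subject)
    return m.group(0) if m else None

if __name__ == "__main__":
    for s in ("bcbc", "a", "aa"):
        print(repr(s), "->", first_match(s))
```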
||||||
.P
|
.P
|
||||||
Because there may be many capturing parentheses in a pattern, all digits
|
Because there may be many capturing parentheses in a pattern, all digits
|
||||||
following a backslash are taken as part of a potential back reference number.
|
following a backslash are taken as part of a potential backreference number.
|
||||||
If the pattern continues with a digit character, some delimiter must be used to
|
If the pattern continues with a digit character, some delimiter must be used to
|
||||||
terminate the back reference. If the PCRE2_EXTENDED option is set, this can be
|
terminate the backreference. If the PCRE2_EXTENDED option is set, this can be
|
||||||
white space. Otherwise, the \eg{ syntax or an empty comment (see
|
white space. Otherwise, the \eg{ syntax or an empty comment (see
|
||||||
.\" HTML <a href="#comments">
|
.\" HTML <a href="#comments">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
|
@ -2226,10 +2226,10 @@ white space. Otherwise, the \eg{ syntax or an empty comment (see
|
||||||
below) can be used.
|
below) can be used.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Recursive back references"
|
.SS "Recursive backreferences"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
A back reference that occurs inside the parentheses to which it refers fails
|
A backreference that occurs inside the parentheses to which it refers fails
|
||||||
when the subpattern is first used, so, for example, (a\e1) never matches.
|
when the subpattern is first used, so, for example, (a\e1) never matches.
|
||||||
However, such references can be useful inside repeated subpatterns. For
|
However, such references can be useful inside repeated subpatterns. For
|
||||||
example, the pattern
|
example, the pattern
|
||||||
|
@ -2237,13 +2237,13 @@ example, the pattern
|
||||||
(a|b\e1)+
|
(a|b\e1)+
|
||||||
.sp
|
.sp
|
||||||
matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
|
matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
|
||||||
the subpattern, the back reference matches the character string corresponding
|
the subpattern, the backreference matches the character string corresponding
|
||||||
to the previous iteration. In order for this to work, the pattern must be such
|
to the previous iteration. In order for this to work, the pattern must be such
|
||||||
that the first iteration does not need to match the back reference. This can be
|
that the first iteration does not need to match the backreference. This can be
|
||||||
done using alternation, as in the example above, or by a quantifier with a
|
done using alternation, as in the example above, or by a quantifier with a
|
||||||
minimum of zero.
|
minimum of zero.
|
||||||
.P
|
.P
|
||||||
Back references of this type cause the group that they reference to be treated
|
Backreferences of this type cause the group that they reference to be treated
|
||||||
as an
|
as an
|
||||||
.\" HTML <a href="#atomicgroup">
|
.\" HTML <a href="#atomicgroup">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
|
@ -2406,10 +2406,10 @@ recursion,
|
||||||
that is, a "subroutine" call into a group that is already active,
|
that is, a "subroutine" call into a group that is already active,
|
||||||
is not supported.
|
is not supported.
|
||||||
.P
|
.P
|
||||||
Perl does not support back references in lookbehinds. PCRE2 does support them,
|
Perl does not support backreferences in lookbehinds. PCRE2 does support them,
|
||||||
but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
|
but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
|
||||||
must not be set, there must be no use of (?| in the pattern (it creates
|
must not be set, there must be no use of (?| in the pattern (it creates
|
||||||
duplicate subpattern numbers), and if the back reference is by name, the name
|
duplicate subpattern numbers), and if the backreference is by name, the name
|
||||||
must be unique. Of course, the referenced subpattern must itself be of fixed
|
must be unique. Of course, the referenced subpattern must itself be of fixed
|
||||||
length. The following pattern matches words containing at least two characters
|
length. The following pattern matches words containing at least two characters
|
||||||
that begin and end with the same character:
|
that begin and end with the same character:
|
||||||
|
@ -2899,7 +2899,7 @@ in PCRE2 these values can be referenced. Consider this pattern:
|
||||||
^(.)(\e1|a(?2))
|
^(.)(\e1|a(?2))
|
||||||
.sp
|
.sp
|
||||||
This pattern matches "bab". The first capturing parentheses match "b", then in
|
This pattern matches "bab". The first capturing parentheses match "b", then in
|
||||||
the second group, when the back reference \e1 fails to match "b", the second
|
the second group, when the backreference \e1 fails to match "b", the second
|
||||||
alternative matches "a" and then recurses. In the recursion, \e1 does now match
|
alternative matches "a" and then recurses. In the recursion, \e1 does now match
|
||||||
"b" and so the whole match succeeds. This match used to fail in Perl, but in
|
"b" and so the whole match succeeds. This match used to fail in Perl, but in
|
||||||
later versions (I tried 5.024) it now works.
|
later versions (I tried 5.024) it now works.
|
||||||
|
@ -2964,7 +2964,7 @@ plus or a minus sign it is taken as a relative reference. For example:
|
||||||
(abc)(?i:\eg<-1>)
|
(abc)(?i:\eg<-1>)
|
||||||
.sp
|
.sp
|
||||||
Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
|
Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
|
||||||
synonymous. The former is a back reference; the latter is a subroutine call.
|
synonymous. The former is a backreference; the latter is a subroutine call.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH CALLOUTS
|
.SH CALLOUTS
|
||||||
|
|
|
@ -108,14 +108,14 @@ When a pattern that is compiled with this flag is passed to \fBregexec()\fP for
|
||||||
matching, the \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no
|
matching, the \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no
|
||||||
captured strings are returned. Versions of the PCRE library prior to 10.22 used
|
captured strings are returned. Versions of the PCRE library prior to 10.22 used
|
||||||
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
|
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
|
||||||
because it disables the use of back references.
|
because it disables the use of backreferences.
|
||||||
.sp
|
.sp
|
||||||
REG_PEND
|
REG_PEND
|
||||||
.sp
|
.sp
|
||||||
If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
|
If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
|
||||||
(which has the type const char *) must be set to point to the character beyond
|
(which has the type const char *) must be set to point to the character beyond
|
||||||
the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
|
the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
|
||||||
now contain binary zeroes, which are treated as data characters. Without
|
now contain binary zeros, which are treated as data characters. Without
|
||||||
REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
|
REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
|
||||||
ignored. This is a GNU extension to the POSIX standard and should be used with
|
ignored. This is a GNU extension to the POSIX standard and should be used with
|
||||||
caution in software intended to be portable to other systems.
|
caution in software intended to be portable to other systems.
|
||||||
|
@ -224,10 +224,10 @@ function.
|
||||||
.sp
|
.sp
|
||||||
REG_STARTEND
|
REG_STARTEND
|
||||||
.sp
|
.sp
|
||||||
When this option is set, the subject string is starts at \fIstring\fP +
|
When this option is set, the subject string starts at \fIstring\fP +
|
||||||
\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
|
\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
|
||||||
should point to the first character beyond the string. There may be binary
|
should point to the first character beyond the string. There may be binary
|
||||||
zeroes within the subject string, and indeed, using REG_STARTEND is the only
|
zeros within the subject string, and indeed, using REG_STARTEND is the only
|
||||||
way to pass a subject string that contains a binary zero.
|
way to pass a subject string that contains a binary zero.
|
||||||
.P
|
.P
|
||||||
Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string
|
Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string
|
||||||
|
|
|
@ -419,7 +419,7 @@ of the newline or \eR options with similar syntax. More than one of them may
|
||||||
appear. For the first three, d is a decimal number.
|
appear. For the first three, d is a decimal number.
|
||||||
.sp
|
.sp
|
||||||
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
(*LIMIT_DEPTH=d) set the backtracking limit to d
|
||||||
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes
|
(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
|
||||||
(*LIMIT_MATCH=d) set the match limit to d
|
(*LIMIT_MATCH=d) set the match limit to d
|
||||||
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
|
||||||
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
|
||||||
|
|
|
@ -101,7 +101,7 @@ to occur).
|
||||||
UTF-8 (in its original definition) is not capable of encoding values greater
|
UTF-8 (in its original definition) is not capable of encoding values greater
|
||||||
than 0x7fffffff, but such values can be handled by the 32-bit library. When
|
than 0x7fffffff, but such values can be handled by the 32-bit library. When
|
||||||
testing this library in non-UTF mode with \fButf8_input\fP set, if any
|
testing this library in non-UTF mode with \fButf8_input\fP set, if any
|
||||||
character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
|
character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
|
||||||
0x80000000 is added to the character's value. This is the only way of passing
|
0x80000000 is added to the character's value. This is the only way of passing
|
||||||
such code points in a pattern string. For subject strings, using an escape
|
such code points in a pattern string. For subject strings, using an escape
|
||||||
sequence is preferable.
|
sequence is preferable.
|
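Two facts in this paragraph lend themselves to a quick check: the byte 0xff can never appear in valid UTF-8, and the offset added for a 0xff-prefixed character is 0x80000000. The sketch below uses Python's standard UTF-8 decoder and plain integer arithmetic; it does not reproduce how pcre2test parses its input, and the names `is_valid_utf8`, `with_high_bit` and `HIGH_BIT_OFFSET` are hypothetical.

```python
HIGH_BIT_OFFSET = 0x80000000  # value added when a character follows byte 0xff

def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without error."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def with_high_bit(code_point: int) -> int:
    """Return the 32-bit value obtained by adding the offset to a code point."""
    return code_point + HIGH_BIT_OFFSET

if __name__ == "__main__":
    print(is_valid_utf8(b"\xff"))        # 0xff is never a valid UTF-8 byte
    print(hex(with_high_bit(0x41)))      # value above the original UTF-8 range
```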
||||||
|
@ -220,7 +220,7 @@ Do not output the version number of \fBpcre2test\fP at the start of execution.
|
||||||
.TP 10
|
.TP 10
|
||||||
\fB-S\fP \fIsize\fP
|
\fB-S\fP \fIsize\fP
|
||||||
On Unix-like systems, set the size of the run-time stack to \fIsize\fP
|
On Unix-like systems, set the size of the run-time stack to \fIsize\fP
|
||||||
megabytes.
|
mebibytes (units of 1024*1024 bytes).
|
||||||
.TP 10
|
.TP 10
|
||||||
\fB-subject\fP \fImodifier-list\fP
|
\fB-subject\fP \fImodifier-list\fP
|
||||||
Behave as if each subject line contains the given modifiers.
|
Behave as if each subject line contains the given modifiers.
|
||||||
|
@ -639,8 +639,8 @@ The effects of these modifiers are described in the following sections.
|
||||||
.sp
|
.sp
|
||||||
The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is
|
The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is
|
||||||
set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
|
set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
|
||||||
\eR matches any Unicode newline sequence. The default is specified when PCRE2
|
\eR matches any Unicode newline sequence. The default can be specified when
|
||||||
is built, with the default default being Unicode.
|
PCRE2 is built; if it is not, the default is set to Unicode.
|
||||||
.P
|
.P
|
||||||
The \fBnewline\fP modifier specifies which characters are to be interpreted as
|
The \fBnewline\fP modifier specifies which characters are to be interpreted as
|
||||||
newlines, both in the pattern and in subject lines. The type must be one of CR,
|
newlines, both in the pattern and in subject lines. The type must be one of CR,
|
||||||
|
@ -1381,11 +1381,11 @@ matching provokes an error return ("bad option value") from
|
||||||
.sp
|
.sp
|
||||||
The \fBjitstack\fP modifier provides a way of setting the maximum stack size
|
The \fBjitstack\fP modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if JIT
|
that is used by the just-in-time optimization code. It is ignored if JIT
|
||||||
optimization is not being used. The value is a number of kilobytes. Setting
|
optimization is not being used. The value is a number of kibibytes (units of
|
||||||
zero reverts to the default of 32K. Providing a stack that is larger than the
|
1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
|
||||||
default is necessary only for very complicated patterns. If \fBjitstack\fP is
|
that is larger than the default is necessary only for very complicated
|
||||||
set non-zero on a subject line it overrides any value that was set on the
|
patterns. If \fBjitstack\fP is set non-zero on a subject line it overrides any
|
||||||
pattern.
|
value that was set on the pattern.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Setting heap, match, and depth limits"
|
.SS "Setting heap, match, and depth limits"
|
||||||
|
@ -1427,10 +1427,10 @@ matching, \fImatch_limit\fP controls the total number of calls, both recursive
|
||||||
and non-recursive, to the internal matching function, thus controlling the
|
and non-recursive, to the internal matching function, thus controlling the
|
||||||
overall amount of computing resource that is used.
|
overall amount of computing resource that is used.
|
||||||
.P
|
.P
|
||||||
For both kinds of matching, the \fIheap_limit\fP number (which is in kilobytes)
|
For both kinds of matching, the \fIheap_limit\fP number, which is in kibibytes
|
||||||
limits the amount of heap memory used for matching. A value of zero disables
|
(units of 1024 bytes), limits the amount of heap memory used for matching. A
|
||||||
the use of any heap memory; many simple pattern matches can be done without
|
value of zero disables the use of any heap memory; many simple pattern matches
|
||||||
using the heap, so this is not an unreasonable setting.
|
can be done without using the heap, so zero is not an unreasonable setting.
|
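The switch from "kilobytes" to "kibibytes" matters once the numbers get large. As a small worked example, assuming the default HEAP_LIMIT of 20000000 and the 32KiB JIT stack default quoted elsewhere in this commit, the conversion to bytes can be sketched in Python as follows; the helper name `kib_to_bytes` is hypothetical.

```python
BYTES_PER_KIB = 1024  # one kibibyte, as the revised text defines it

def kib_to_bytes(kib: int) -> int:
    """Convert a limit expressed in kibibytes to bytes."""
    return kib * BYTES_PER_KIB

if __name__ == "__main__":
    print(kib_to_bytes(20000000))  # default heap limit in bytes
    print(kib_to_bytes(32))        # default JIT stack size in bytes
```

Read as decimal kilobytes, the default heap limit would be 20000000000 bytes, which is 480000000 bytes less than the value the library actually uses.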
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Showing MARK names"
|
.SS "Showing MARK names"
|
||||||
|
|
File diff suppressed because it is too large
|
@ -46,7 +46,7 @@ compatibility with Perl 5.6. PCRE2 does not support this.
|
||||||
.SH "WIDE CHARACTERS AND UTF MODES"
|
.SH "WIDE CHARACTERS AND UTF MODES"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
Codepoints less than 256 can be specified in patterns by either braced or
|
Code points less than 256 can be specified in patterns by either braced or
|
||||||
unbraced hexadecimal escape sequences (for example, \ex{b3} or \exb3). Larger
|
unbraced hexadecimal escape sequences (for example, \ex{b3} or \exb3). Larger
|
||||||
values have to use braced sequences. Unbraced octal code points up to \e777 are
|
values have to use braced sequences. Unbraced octal code points up to \e777 are
|
||||||
also recognized; larger ones can be coded using \eo{...}.
|
also recognized; larger ones can be coded using \eo{...}.
|
||||||
|
@ -109,7 +109,7 @@ not PCRE2_UCP is set.
|
||||||
Case-insensitive matching in a UTF mode makes use of Unicode properties except
|
Case-insensitive matching in a UTF mode makes use of Unicode properties except
|
||||||
for characters whose code points are less than 128 and that have at most two
|
for characters whose code points are less than 128 and that have at most two
|
||||||
case-equivalent values. For these, a direct table lookup is used for speed. A
|
case-equivalent values. For these, a direct table lookup is used for speed. A
|
||||||
few Unicode characters such as Greek sigma have more than two codepoints that
|
few Unicode characters such as Greek sigma have more than two code points that
|
||||||
are case-equivalent, and these are treated as such.
|
are case-equivalent, and these are treated as such.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
|
|
@ -51,7 +51,7 @@ fi
|
||||||
# utf invoke UTF-8 functionality
|
# utf invoke UTF-8 functionality
|
||||||
#
|
#
|
||||||
# The data lines must not have any pcre2test modifiers. Unless
|
# The data lines must not have any pcre2test modifiers. Unless
|
||||||
# "subject_litersl" is on the pattern, data lines are processed as
|
# "subject_literal" is on the pattern, data lines are processed as
|
||||||
# Perl double-quoted strings, so if they contain " $ or @ characters, these
|
# Perl double-quoted strings, so if they contain " $ or @ characters, these
|
||||||
# have to be escaped. For this reason, all such characters in the
|
# have to be escaped. For this reason, all such characters in the
|
||||||
# Perl-compatible testinput1 and testinput4 files are escaped so that they can
|
# Perl-compatible testinput1 and testinput4 files are escaped so that they can
|
||||||
|
|
|
@ -132,8 +132,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
/* Define to 1 if you have the <zlib.h> header file. */
|
/* Define to 1 if you have the <zlib.h> header file. */
|
||||||
/* #undef HAVE_ZLIB_H */
|
/* #undef HAVE_ZLIB_H */
|
||||||
|
|
||||||
/* This limits the amount of memory that pcre2_match() may use while matching
|
/* This limits the amount of memory that may be used while matching a pattern.
|
||||||
a pattern. The value is in kilobytes. */
|
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
||||||
|
to JIT matching. The value is in kilobytes. */
|
||||||
#ifndef HEAP_LIMIT
|
#ifndef HEAP_LIMIT
|
||||||
#define HEAP_LIMIT 20000000
|
#define HEAP_LIMIT 20000000
|
||||||
#endif
|
#endif
|
||||||
|
@ -155,7 +156,8 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* The value of MATCH_LIMIT determines the default number of times the
|
/* The value of MATCH_LIMIT determines the default number of times the
|
||||||
pcre2_match() function can record a backtrack position during a single
|
pcre2_match() function can record a backtrack position during a single
|
||||||
matching attempt. There is a runtime interface for setting a different
|
matching attempt. The value is also used to limit a loop counter in
|
||||||
|
pcre2_dfa_match(). There is a runtime interface for setting a different
|
||||||
limit. The limit exists in order to catch runaway regular expressions that
|
limit. The limit exists in order to catch runaway regular expressions that
|
||||||
take for ever to determine that they do not match. The default is set very
|
take for ever to determine that they do not match. The default is set very
|
||||||
large so that it does not accidentally catch legitimate cases. */
|
large so that it does not accidentally catch legitimate cases. */
|
||||||
|
@ -170,7 +172,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
|
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
|
||||||
must be less than the value of MATCH_LIMIT. The default is to use the same
|
must be less than the value of MATCH_LIMIT. The default is to use the same
|
||||||
value as MATCH_LIMIT. There is a runtime method for setting a different
|
value as MATCH_LIMIT. There is a runtime method for setting a different
|
||||||
limit. */
|
limit. In the case of pcre2_dfa_match(), this limit controls the depth of
|
||||||
|
the internal nested function calls that are used for pattern recursions,
|
||||||
|
lookarounds, and atomic groups. */
|
||||||
#ifndef MATCH_LIMIT_DEPTH
|
#ifndef MATCH_LIMIT_DEPTH
|
||||||
#define MATCH_LIMIT_DEPTH MATCH_LIMIT
|
#define MATCH_LIMIT_DEPTH MATCH_LIMIT
|
||||||
#endif
|
#endif
|
||||||
|
@ -210,7 +214,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_NAME "PCRE2"
|
#define PACKAGE_NAME "PCRE2"
|
||||||
|
|
||||||
/* Define to the full name and version of this package. */
|
/* Define to the full name and version of this package. */
|
||||||
#define PACKAGE_STRING "PCRE2 10.31"
|
#define PACKAGE_STRING "PCRE2 10.32-RC1"
|
||||||
|
|
||||||
/* Define to the one symbol short name of this package. */
|
/* Define to the one symbol short name of this package. */
|
||||||
#define PACKAGE_TARNAME "pcre2"
|
#define PACKAGE_TARNAME "pcre2"
|
||||||
|
@ -219,7 +223,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_URL ""
|
#define PACKAGE_URL ""
|
||||||
|
|
||||||
/* Define to the version of this package. */
|
/* Define to the version of this package. */
|
||||||
#define PACKAGE_VERSION "10.31"
|
#define PACKAGE_VERSION "10.32-RC1"
|
||||||
|
|
||||||
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
||||||
parentheses (of any kind) in a pattern. This limits the amount of system
|
parentheses (of any kind) in a pattern. This limits the amount of system
|
||||||
|
@ -339,7 +343,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* Version number of package */
|
/* Version number of package */
|
||||||
#define VERSION "10.31"
|
#define VERSION "10.32-RC1"
|
||||||
|
|
||||||
/* Define to 1 if on MINIX. */
|
/* Define to 1 if on MINIX. */
|
||||||
/* #undef _MINIX */
|
/* #undef _MINIX */
|
||||||
|
|
|
@ -134,7 +134,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* This limits the amount of memory that may be used while matching a pattern.
|
/* This limits the amount of memory that may be used while matching a pattern.
|
||||||
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
||||||
to JIT matching. The value is in kilobytes. */
|
to JIT matching. The value is in kibibytes (units of 1024 bytes). */
|
||||||
#undef HEAP_LIMIT
|
#undef HEAP_LIMIT
|
||||||
|
|
||||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||||
|
|
|
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
/* The current PCRE version information. */
|
/* The current PCRE version information. */
|
||||||
|
|
||||||
#define PCRE2_MAJOR 10
|
#define PCRE2_MAJOR 10
|
||||||
#define PCRE2_MINOR 31
|
#define PCRE2_MINOR 32
|
||||||
#define PCRE2_PRERELEASE
|
#define PCRE2_PRERELEASE -RC1
|
||||||
#define PCRE2_DATE 2018-02-12
|
#define PCRE2_DATE 2018-02-19
|
||||||
|
|
||||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||||
imported have to be identified as such. When building PCRE2, the appropriate
|
imported have to be identified as such. When building PCRE2, the appropriate
|
||||||
|
|
|
@ -4261,11 +4261,11 @@ goto FAILED;
|
||||||
|
|
||||||
|
|
||||||
/*************************************************
|
/*************************************************
|
||||||
* Find first significant op code *
|
* Find first significant opcode *
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* This is called by several functions that scan a compiled expression looking
|
/* This is called by several functions that scan a compiled expression looking
|
||||||
for a fixed first character, or an anchoring op code etc. It skips over things
|
for a fixed first character, or an anchoring opcode etc. It skips over things
|
||||||
that do not influence this. For some calls, it makes sense to skip negative
|
that do not influence this. For some calls, it makes sense to skip negative
|
||||||
forward and all backward assertions, and also the \b assertion; for others it
|
forward and all backward assertions, and also the \b assertion; for others it
|
||||||
does not.
|
does not.
|
||||||
|
@ -5472,7 +5472,7 @@ for (;; pptr++)
|
||||||
set xclass = TRUE. Then, in the pre-compile phase, accumulate the length
|
set xclass = TRUE. Then, in the pre-compile phase, accumulate the length
|
||||||
of the extra data and reset the pointer. This is so that very large
|
of the extra data and reset the pointer. This is so that very large
|
||||||
classes that contain a zillion wide characters or Unicode property tests
|
classes that contain a zillion wide characters or Unicode property tests
|
||||||
do not overwrite the work space (which is on the stack). */
|
do not overwrite the workspace (which is on the stack). */
|
||||||
|
|
||||||
if (class_uchardata > class_uchardata_base)
|
if (class_uchardata > class_uchardata_base)
|
||||||
{
|
{
|
||||||
|
@ -7460,7 +7460,7 @@ length of the BRA and KET and any extra code units that are required at the
|
||||||
beginning. We accumulate in a local variable to save frequent testing of
|
beginning. We accumulate in a local variable to save frequent testing of
|
||||||
lengthptr for NULL. We cannot do this by looking at the value of 'code' at the
|
lengthptr for NULL. We cannot do this by looking at the value of 'code' at the
|
||||||
start and end of each alternative, because compiled items are discarded during
|
start and end of each alternative, because compiled items are discarded during
|
||||||
the pre-compile phase so that the work space is not exceeded. */
|
the pre-compile phase so that the workspace is not exceeded. */
|
||||||
|
|
||||||
length = 2 + 2*LINK_SIZE + skipunits;
|
length = 2 + 2*LINK_SIZE + skipunits;
|
||||||
|
|
||||||
|
|
|
@ -387,8 +387,8 @@ return (mb->callout)(cb, mb->callout_data);
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* This function is called when internal_dfa_match() is about to be called
|
/* This function is called when internal_dfa_match() is about to be called
|
||||||
recursively and there is insufficient workingspace left in the current work
|
recursively and there is insufficient working space left in the current
|
||||||
space block. If there's an existing next block, use it; otherwise get a new
|
workspace block. If there's an existing next block, use it; otherwise get a new
|
||||||
block unless the heap limit is reached.
|
block unless the heap limit is reached.
|
||||||
|
|
||||||
Arguments:
|
Arguments:
|
||||||
|
@ -2800,7 +2800,7 @@ for (;;)
|
||||||
local_workspace, /* workspace vector */
|
local_workspace, /* workspace vector */
|
||||||
RWS_RSIZE, /* size of same */
|
RWS_RSIZE, /* size of same */
|
||||||
rlevel, /* function recursion level */
|
rlevel, /* function recursion level */
|
||||||
RWS); /* recursion work space */
|
RWS); /* recursion workspace */
|
||||||
|
|
||||||
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
|
|
|
@ -43,7 +43,7 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
#include "config.h"
|
#include "config.h"
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* These defines enables debugging code */
|
/* These defines enable debugging code */
|
||||||
|
|
||||||
//#define DEBUG_FRAMES_DISPLAY
|
//#define DEBUG_FRAMES_DISPLAY
|
||||||
//#define DEBUG_SHOW_OPS
|
//#define DEBUG_SHOW_OPS
|
||||||
|
@ -1776,7 +1776,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
|
|
||||||
|
|
||||||
/* ===================================================================== */
|
/* ===================================================================== */
|
||||||
/* Match a bit-mapped character class, possibly repeatedly. These op codes
|
/* Match a bit-mapped character class, possibly repeatedly. These opcodes
|
||||||
are used when all the characters in the class have values in the range
|
are used when all the characters in the class have values in the range
|
||||||
0-255, and either the matching is caseful, or the characters are in the
|
0-255, and either the matching is caseful, or the characters are in the
|
||||||
range 0-127 when UTF processing is enabled. The only difference between
|
range 0-127 when UTF processing is enabled. The only difference between
|
||||||
|
@ -2464,7 +2464,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
|
|
||||||
/* ===================================================================== */
|
/* ===================================================================== */
|
||||||
/* Match a single character type repeatedly. Note that the property type
|
/* Match a single character type repeatedly. Note that the property type
|
||||||
does not need to be in a stack frame as it not used within an RMATCH()
|
does not need to be in a stack frame as it is not used within an RMATCH()
|
||||||
loop. */
|
loop. */
|
||||||
|
|
||||||
#define Lstart_eptr F->temp_sptr[0]
|
#define Lstart_eptr F->temp_sptr[0]
|
||||||
|
@ -4143,7 +4143,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
|
|
||||||
/* The "byte" (i.e. "code unit") case is the same as non-UTF */
|
/* The "byte" (i.e. "code unit") case is the same as non-UTF */
|
||||||
|
|
||||||
case OP_ANYBYTE:
|
case OP_ANYBYTE:
|
||||||
fc = Lmax - Lmin;
|
fc = Lmax - Lmin;
|
||||||
|
@ -5424,7 +5424,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
Feptr -= number;
|
Feptr -= number;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Save the earliest consulted character, then skip to next op code */
|
/* Save the earliest consulted character, then skip to next opcode */
|
||||||
|
|
||||||
if (Feptr < mb->start_used_ptr) mb->start_used_ptr = Feptr;
|
if (Feptr < mb->start_used_ptr) mb->start_used_ptr = Feptr;
|
||||||
Fecode += 1 + LINK_SIZE;
|
Fecode += 1 + LINK_SIZE;
|
||||||
|
@ -5929,7 +5929,7 @@ in rrc. */
|
||||||
|
|
||||||
RETURN_SWITCH:
|
RETURN_SWITCH:
|
||||||
if (Frdepth == 0) return rrc; /* Exit from the top level */
|
if (Frdepth == 0) return rrc; /* Exit from the top level */
|
||||||
F = (heapframe *)((char *)F - Fback_frame); /* Back track */
|
F = (heapframe *)((char *)F - Fback_frame); /* Backtrack */
|
||||||
mb->cb->callout_flags |= PCRE2_CALLOUT_BACKTRACK; /* Note for callouts */
|
mb->cb->callout_flags |= PCRE2_CALLOUT_BACKTRACK; /* Note for callouts */
|
||||||
|
|
||||||
#ifdef DEBUG_SHOW_RMATCH
|
#ifdef DEBUG_SHOW_RMATCH
|
||||||
|
|
|
@ -1274,7 +1274,7 @@ do
|
||||||
break;
|
break;
|
||||||
|
|
||||||
/* Single character types set the bits and stop. Note that if PCRE2_UCP
|
/* Single character types set the bits and stop. Note that if PCRE2_UCP
|
||||||
is set, we do not see these op codes because \d etc are converted to
|
is set, we do not see these opcodes because \d etc are converted to
|
||||||
properties. Therefore, these apply in the case when only characters less
|
properties. Therefore, these apply in the case when only characters less
|
||||||
than 256 are recognized to match the types. */
|
than 256 are recognized to match the types. */
|
||||||
|
|
||||||
|
|
|
@ -170,7 +170,7 @@ are implementing).
|
||||||
by E_Modifier). Extend characters are allowed before the modifier; this
|
by E_Modifier). Extend characters are allowed before the modifier; this
|
||||||
cannot be represented in this table, the code has to deal with it.
|
cannot be represented in this table, the code has to deal with it.
|
||||||
|
|
||||||
8. Do not break within emoji zwj sequences (ZWJ followed by Glue_After_Zwj or
|
8. Do not break within emoji zwj sequences (ZWJ followed by Glue_After_Zwj or
|
||||||
E_Base_GAZ).
|
E_Base_GAZ).
|
||||||
|
|
||||||
9. Do not break within emoji flag sequences. That is, do not break between
|
9. Do not break within emoji flag sequences. That is, do not break between
|
||||||
|
|
|
@ -492,7 +492,7 @@ so many of them that they are split into two fields. */
|
||||||
|
|
||||||
/* These are the matching controls that may be set either on a pattern or on a
|
/* These are the matching controls that may be set either on a pattern or on a
|
||||||
data line. They are copied from the pattern controls as initial settings for
|
data line. They are copied from the pattern controls as initial settings for
|
||||||
data line controls Note that CTL_MEMORY is not included here, because it does
|
data line controls. Note that CTL_MEMORY is not included here, because it does
|
||||||
different things in the two cases. */
|
different things in the two cases. */
|
||||||
|
|
||||||
#define CTL_ALLPD (CTL_AFTERTEXT|\
|
#define CTL_ALLPD (CTL_AFTERTEXT|\
|
||||||
|
@ -5411,7 +5411,7 @@ switch(errorcode)
|
||||||
|
|
||||||
/* The pattern is now in pbuffer[8|16|32], with the length in code units in
|
/* The pattern is now in pbuffer[8|16|32], with the length in code units in
|
||||||
patlen. If it is to be converted, copy the result back afterwards so that it
|
patlen. If it is to be converted, copy the result back afterwards so that it
|
||||||
it ends up back in the usual place. */
|
ends up back in the usual place. */
|
||||||
|
|
||||||
if (pat_patctl.convert_type != CONVERT_UNSET)
|
if (pat_patctl.convert_type != CONVERT_UNSET)
|
||||||
{
|
{
|
||||||
|
@ -5735,7 +5735,7 @@ return PR_OK;
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* This is used for DFA, normal, and JIT fast matching. For DFA matching it
|
/* This is used for DFA, normal, and JIT fast matching. For DFA matching it
|
||||||
should only called with the third argument set to PCRE2_ERROR_DEPTHLIMIT.
|
should only be called with the third argument set to PCRE2_ERROR_DEPTHLIMIT.
|
||||||
|
|
||||||
Arguments:
|
Arguments:
|
||||||
pp the subject string
|
pp the subject string
|
||||||
|
@ -7766,7 +7766,7 @@ printf(" -LM list pattern and subject modifiers, then exit\n");
|
||||||
printf(" -q quiet: do not output PCRE2 version number at start\n");
|
printf(" -q quiet: do not output PCRE2 version number at start\n");
|
||||||
printf(" -pattern <s> set default pattern modifier fields\n");
|
printf(" -pattern <s> set default pattern modifier fields\n");
|
||||||
printf(" -subject <s> set default subject modifier fields\n");
|
printf(" -subject <s> set default subject modifier fields\n");
|
||||||
printf(" -S <n> set stack size to <n> megabytes\n");
|
printf(" -S <n> set stack size to <n> mebibytes\n");
|
||||||
printf(" -t [<n>] time compilation and execution, repeating <n> times\n");
|
printf(" -t [<n>] time compilation and execution, repeating <n> times\n");
|
||||||
printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n");
|
printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n");
|
||||||
printf(" -T same as -t, but show total times at the end\n");
|
printf(" -T same as -t, but show total times at the end\n");
|
||||||
|
|
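The unit corrections in this commit (kilobytes to kibibytes for HEAP_LIMIT, megabytes to mebibytes for the pcre2test -S option) reflect that these limits are counted in multiples of 1024 bytes rather than 1000. As a rough illustration only, the conversion can be sketched as below; the helper names kib_to_bytes and mib_to_bytes are hypothetical and are not part of PCRE2.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; PCRE2 does not define these. */

/* A kibibyte is 1024 bytes, the unit the corrected HEAP_LIMIT comment names. */
static uint64_t kib_to_bytes(uint64_t kib)
{
  return kib * UINT64_C(1024);
}

/* A mebibyte is 1024 * 1024 bytes, the unit the corrected -S help text names. */
static uint64_t mib_to_bytes(uint64_t mib)
{
  return mib * UINT64_C(1024) * UINT64_C(1024);
}
```

Under this reading, the default PCRE2_HEAP_LIMIT of 20000000 shown in the CMake hunk corresponds to kib_to_bytes(20000000), that is 20480000000 bytes, not the 20000000000 bytes that "kilobytes" in the decimal sense would suggest.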