Typos in documentation and comments noted by Jason Hood.
parent fa58ac6734
commit fabea723cf
@ -146,7 +146,7 @@ SET(PCRE2_PARENS_NEST_LIMIT "250" CACHE STRING
 "Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.")

 SET(PCRE2_HEAP_LIMIT "20000000" CACHE STRING
-"Default limit on heap memory (kilobytes). See HEAP_LIMIT in config.h.in for details.")
+"Default limit on heap memory (kibibytes). See HEAP_LIMIT in config.h.in for details.")

 SET(PCRE2_MATCH_LIMIT "10000000" CACHE STRING
 "Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.")
@ -17,7 +17,7 @@ groups altogether. Now it shows those that come before any actual captures as
 3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
 whatever the build configuration was. It now correctly says "\R matches all
 Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
-specified. Similarly, running "pcfre2test -C bsr" never produced the result
+specified. Similarly, running "pcre2test -C bsr" never produced the result
 ANY.

 4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
@ -370,7 +370,7 @@ tests to improve coverage.
 31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
 pcre2test, a crash could occur.

-32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that
+32. Make -bigstack in RunTest allocate a 64MB stack (instead of 16 MB) so that
 all the tests can run with clang's sanitizing options.

 33. Implement extra compile options in the compile context and add the first
4
HACKING
@ -348,7 +348,7 @@ The /i, /m, or /s options (PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and
 others) may be changed in the middle of patterns by items such as (?i). Their
 processing is handled entirely at compile time by generating different opcodes
 for the different settings. The runtime functions do not need to keep track of
-an options state.
+an option's state.

 PCRE2_DUPNAMES, PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE
 are tracked and processed during the parsing pre-pass. The others are handled
@ -764,7 +764,7 @@ OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
 bracket from the start of the whole pattern. OP_RECURSE is also used for
 "subroutine" calls, even though they are not strictly a recursion. Up till
 release 10.30 recursions were treated as atomic groups, making them
-incompatible with Perl (but PCRE had then well before Perl did). From 10.30,
+incompatible with Perl (but PCRE had them well before Perl did). From 10.30,
 backtracking into recursions is supported.

 Repeated recursions used to be wrapped inside OP_ONCE brackets, which not only
4
NEWS
@ -31,7 +31,7 @@ remembering backtracking positions. This makes --disable-stack-for-recursion a
 NOOP. The new implementation allows backtracking into recursive group calls in
 patterns, making it more compatible with Perl, and also fixes some other
 previously hard-to-do issues. For patterns that have a lot of backtracking, the
-heap is now used, and there is explicit limit on the amount, settable by
+heap is now used, and there is an explicit limit on the amount, settable by
 pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
 but is renamed as "depth limit" (though the old names remain for
 compatibility).
@ -53,7 +53,7 @@ also supported.

 5. Additional compile options in the compile context are now available, and the
 first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
-PCRE2_EXTRA_BAD_ESCAPE_IS LITERAL.
+PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

 6. The newline type PCRE2_NEWLINE_NUL is now available.

@ -127,7 +127,7 @@ can skip ahead to the CMake section.
 src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
 these yourself.

-Not also that the pcre2_fuzzsupport.c file contains special code that is
+Note also that the pcre2_fuzzsupport.c file contains special code that is
 useful to those who want to run fuzzing tests on the PCRE2 library. Unless
 you are doing that, you can ignore it.

@ -186,7 +186,7 @@ can skip ahead to the CMake section.

 STACK SIZE IN WINDOWS ENVIRONMENTS

-Prior to release 10.30 the default system stack size of 1Mb in some Windows
+Prior to release 10.30 the default system stack size of 1MB in some Windows
 environments caused issues with some tests. This should no longer be the case
 for 10.30 and later releases.

17
README
@ -257,9 +257,10 @@ library. They are also documented in the pcre2build man page.

 --with-heap-limit=500

-The units are kilobytes. This limit does not apply when the JIT optimization
-(which has its own memory control features) is used. There is more discussion
-on the pcre2api man page (search for pcre2_set_heap_limit).
+The units are kibibytes (units of 1024 bytes). This limit does not apply when
+the JIT optimization (which has its own memory control features) is used.
+There is more discussion on the pcre2api man page (search for
+pcre2_set_heap_limit).

 . In the 8-bit library, the default maximum compiled pattern size is around
 64K bytes. You can increase this by adding --with-link-size=3 to the
@ -319,10 +320,10 @@ library. They are also documented in the pcre2build man page.
 . When JIT support is enabled, pcre2grep automatically makes use of it, unless
 you add --disable-pcre2grep-jit to the "configure" command.

-. On non-Windows sytems there is support for calling external scripts during
-matching in the pcre2grep command via PCRE2's callout facility with string
-arguments. This support can be disabled by adding --disable-pcre2grep-callout
-to the "configure" command.
+. There is support for calling external programs during matching in the
+pcre2grep command, using PCRE2's callout facility with string arguments. This
+support can be disabled by adding --disable-pcre2grep-callout to the
+"configure" command.

 . The pcre2grep program currently supports only 8-bit data files, and so
 requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
@ -887,4 +888,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 27 April 2018
+Last updated: 17 June 2018
@ -708,7 +708,7 @@ $valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >
 printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep
 $valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep

-# It seems inpossible to handle NUL characters easily in Solaris (aka SunOS).
+# It seems impossible to handle NUL characters easily in Solaris (aka SunOS).
 # The version of sed explicitly doesn't like them. For the moment, we just
 # don't run this test under SunOS. Fudge the output so that the comparison
 # works. A similar problem has also been reported for MacOS (Darwin).
2
RunTest
@ -843,7 +843,7 @@ for bmode in "$test8" "$test16" "$test32"; do
 checkresult $? 24 ""
 fi

-# UTF pattern converson tests
+# UTF pattern conversion tests

 if [ "$do25" = yes ] ; then
 echo $title25
@ -288,7 +288,7 @@ AC_ARG_WITH(parens-nest-limit,
 # Handle --with-heap-limit
 AC_ARG_WITH(heap-limit,
 AS_HELP_STRING([--with-heap-limit=N],
-[default limit on heap memory (kilobytes, default=20000000)]),
+[default limit on heap memory (kibibytes, default=20000000)]),
 , with_heap_limit=20000000)

 # Handle --with-match-limit=N
@ -754,7 +754,7 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
 AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
 This limits the amount of memory that may be used while matching
 a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
-not apply to JIT matching. The value is in kilobytes.])
+not apply to JIT matching. The value is in kibibytes (units of 1024 bytes).])

 AC_DEFINE([MAX_NAME_SIZE], [32], [
 This limit is parameterized just in case anybody ever wants to
@ -1017,7 +1017,7 @@ $PACKAGE-$VERSION configuration summary:
 Rebuild char tables ................ : ${enable_rebuild_chartables}
 Internal link size ................. : ${with_link_size}
 Nested parentheses limit ........... : ${with_parens_nest_limit}
-Heap limit ......................... : ${with_heap_limit} kilobytes
+Heap limit ......................... : ${with_heap_limit} kibibytes
 Match limit ........................ : ${with_match_limit}
 Match depth limit .................. : ${with_match_limit_depth}
 Build shared libs .................. : ${enable_shared}
@ -127,7 +127,7 @@ can skip ahead to the CMake section.
 src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile
 these yourself.

-Not also that the pcre2_fuzzsupport.c file contains special code that is
+Note also that the pcre2_fuzzsupport.c file contains special code that is
 useful to those who want to run fuzzing tests on the PCRE2 library. Unless
 you are doing that, you can ignore it.

@ -186,7 +186,7 @@ can skip ahead to the CMake section.

 STACK SIZE IN WINDOWS ENVIRONMENTS

-Prior to release 10.30 the default system stack size of 1Mb in some Windows
+Prior to release 10.30 the default system stack size of 1MB in some Windows
 environments caused issues with some tests. This should no longer be the case
 for 10.30 and later releases.

@ -257,9 +257,10 @@ library. They are also documented in the pcre2build man page.

 --with-heap-limit=500

-The units are kilobytes. This limit does not apply when the JIT optimization
-(which has its own memory control features) is used. There is more discussion
-on the pcre2api man page (search for pcre2_set_heap_limit).
+The units are kibibytes (units of 1024 bytes). This limit does not apply when
+the JIT optimization (which has its own memory control features) is used.
+There is more discussion on the pcre2api man page (search for
+pcre2_set_heap_limit).

 . In the 8-bit library, the default maximum compiled pattern size is around
 64K bytes. You can increase this by adding --with-link-size=3 to the
@ -319,10 +320,10 @@ library. They are also documented in the pcre2build man page.
 . When JIT support is enabled, pcre2grep automatically makes use of it, unless
 you add --disable-pcre2grep-jit to the "configure" command.

-. On non-Windows sytems there is support for calling external scripts during
-matching in the pcre2grep command via PCRE2's callout facility with string
-arguments. This support can be disabled by adding --disable-pcre2grep-callout
-to the "configure" command.
+. There is support for calling external programs during matching in the
+pcre2grep command, using PCRE2's callout facility with string arguments. This
+support can be disabled by adding --disable-pcre2grep-callout to the
+"configure" command.

 . The pcre2grep program currently supports only 8-bit data files, and so
 requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
@ -887,4 +888,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 27 April 2018
+Last updated: 17 June 2018
@ -28,7 +28,7 @@ DESCRIPTION
 <P>
 This function is part of an experimental set of pattern conversion functions.
 It sets the component separator character that is used when converting globs.
-The second argument must one of the characters forward slash, backslash, or
+The second argument must be one of the characters forward slash, backslash, or
 dot. The default is backslash when running under Windows, otherwise forward
 slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
 the second argument is invalid.
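The validation rule stated for pcre2_set_glob_separator() can be sketched as a standalone check. This is not the library's code: check_glob_separator and SKETCH_ERROR_BADDATA are hypothetical placeholders. The real function returns PCRE2_ERROR_BADDATA from pcre2.h, whose numeric value is not assumed here.

```c
#include <stdint.h>

/* Hypothetical stand-in for PCRE2_ERROR_BADDATA; the real value comes
   from pcre2.h and is deliberately not assumed in this sketch. */
#define SKETCH_ERROR_BADDATA (-1)

/* Hypothetical check: return 0 if c is an allowed glob component
   separator (forward slash, backslash, or dot), otherwise the
   "bad data" error code. */
int check_glob_separator(uint32_t c)
{
  switch (c)
    {
    case '/':
    case '\\':
    case '.':
      return 0;
    default:
      return SKETCH_ERROR_BADDATA;
    }
}
```

Any other character, such as ':', is rejected with the error code, mirroring the "zero for success" contract described above.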
@ -562,10 +562,10 @@ U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
 <P>
 Each of the first three conventions is used by at least one operating system as
 its standard newline sequence. When PCRE2 is built, a default can be specified.
-The default default is LF, which is the Unix standard. However, the newline
-convention can be changed by an application when calling <b>pcre2_compile()</b>,
-or it can be specified by special text at the start of the pattern itself; this
-overrides any other settings. See the
+If it is not, the default is set to LF, which is the Unix standard. However,
+the newline convention can be changed by an application when calling
+<b>pcre2_compile()</b>, or it can be specified by special text at the start of
+the pattern itself; this overrides any other settings. See the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 page for details of the special character sequences.
 </P>
@ -949,17 +949,18 @@ offset limit. In other words, whichever limit comes first is used.
 <b> uint32_t <i>value</i>);</b>
 <br>
 <br>
-The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
-amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
-information when running an interpretive match. This limit also applies to
-<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a
-lot of nested pattern recursion or lookarounds or atomic groups. This limit
-does not apply to matching with the JIT optimization, which has its own memory
-control arrangements (see the
+The <i>heap_limit</i> parameter specifies, in units of kibibytes (1024 bytes),
+the maximum amount of heap memory that <b>pcre2_match()</b> may use to hold
+backtracking information when running an interpretive match. This limit also
+applies to <b>pcre2_dfa_match()</b>, which may use the heap when processing
+patterns with a lot of nested pattern recursion or lookarounds or atomic
+groups. This limit does not apply to matching with the JIT optimization, which
+has its own memory control arrangements (see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation for more details). If the limit is reached, the negative error
-code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
-built; the default default is very large and is essentially "unlimited".
+code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
+is built; if it is not, the default is set very large and is essentially
+"unlimited".
 </P>
 <P>
 A value for the heap limit may also be supplied by an item at the start of a
@ -1044,7 +1045,7 @@ The depth limit is not relevant, and is ignored, when matching is done using
 JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
 uses it to limit the depth of nested internal recursive function calls that
 implement atomic groups, lookaround assertions, and pattern recursions. This
-limits, indirectly, the amount of system stack this is used. It was more useful
+limits, indirectly, the amount of system stack that is used. It was more useful
 in versions before 10.32, when stack memory was used for local workspace
 vectors for recursive function calls. From version 10.32, only local variables
 are allocated on the stack and as each call uses only a few hundred bytes, even
@ -1060,11 +1061,11 @@ probably better to limit heap usage directly by calling
 <b>pcre2_set_heap_limit()</b>.
 </P>
 <P>
-The default value for the depth limit can be set when PCRE2 is built; the
-default default is the same value as the default for the match limit. If the
-limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
-PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
-item at the start of a pattern of the form
+The default value for the depth limit can be set when PCRE2 is built; if it is
+not, the default is set to the same value as the default for the match limit.
+If the limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
+returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
+supplied by an item at the start of a pattern of the form
 <pre>
 (*LIMIT_DEPTH=ddd)
 </pre>
@ -1120,7 +1121,7 @@ given with <b>pcre2_set_depth_limit()</b> above.
 <pre>
 PCRE2_CONFIG_HEAPLIMIT
 </pre>
-The output is a uint32_t integer that gives, in kilobytes, the default limit
+The output is a uint32_t integer that gives, in kibibytes, the default limit
 for the amount of heap memory used by <b>pcre2_match()</b> or
 <b>pcre2_dfa_match()</b>. Further details are given with
 <b>pcre2_set_heap_limit()</b> above.
@ -1431,7 +1432,7 @@ If this bit is set, letters in the pattern match both upper and lower case
 letters in the subject. It is equivalent to Perl's /i option, and it can be
 changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
 properties are used for all characters with more than one other case, and for
-all characters whose code points are greater than U+007f. For lower valued
+all characters whose code points are greater than U+007F. For lower valued
 characters with only one other case, a lookup table is used for speed. When
 PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
 and higher code points (available only in 16-bit or 32-bit mode) are treated as
@ -1613,8 +1614,8 @@ If this option is set, it disables the use of numbered capturing parentheses in
 the pattern. Any opening parenthesis that is not followed by ? behaves as if it
 were followed by ?: but named parentheses can still be used for capturing (and
 they acquire numbers in the usual way). This is the same as Perl's /n option.
-Note that, when this option is set, references to capturing groups (back
-references or recursion/subroutine calls) may only refer to named groups,
+Note that, when this option is set, references to capturing groups
+(backreferences or recursion/subroutine calls) may only refer to named groups,
 though the reference can be by name or by number.
 <pre>
 PCRE2_NO_AUTO_POSSESS
@ -2019,10 +2020,10 @@ returned if there are no back references.
 <pre>
 PCRE2_INFO_BSR
 </pre>
-The output is a uint32_t whose value indicates what character sequences the \R
-escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R matches
-any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \R
-matches only CR, LF, or CRLF.
+The output is a uint32_t integer whose value indicates what character sequences
+the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R
+matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
+that \R matches only CR, LF, or CRLF.
 <pre>
 PCRE2_INFO_CAPTURECOUNT
 </pre>
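The two PCRE2_INFO_BSR values above describe different sets of line endings for \R. As an illustration only, here is a small C sketch that is not PCRE2 source: the names bsr_mode and bsr_match_length are hypothetical, and the Unicode set assumed is CR, LF, CRLF, VT, FF, NEL, LS, and PS. It returns how many code points \R would consume at a position in a 32-bit subject, or 0 if there is no line ending there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names mirroring PCRE2_BSR_UNICODE and PCRE2_BSR_ANYCRLF. */
typedef enum { BSR_UNICODE, BSR_ANYCRLF } bsr_mode;

/* Hypothetical helper: number of code points \R would consume at
   subject[pos], or 0 if none. CRLF is consumed as a single unit. */
size_t bsr_match_length(const uint32_t *subject, size_t len, size_t pos,
                        bsr_mode mode)
{
  if (pos >= len) return 0;
  uint32_t c = subject[pos];
  if (c == 0x0d)                        /* CR, possibly followed by LF */
    return (pos + 1 < len && subject[pos + 1] == 0x0a) ? 2 : 1;
  if (c == 0x0a) return 1;              /* LF */
  if (mode == BSR_ANYCRLF) return 0;    /* only CR, LF, or CRLF */
  switch (c)
    {
    case 0x0b:    /* VT */
    case 0x0c:    /* FF */
    case 0x85:    /* NEL */
    case 0x2028:  /* LS */
    case 0x2029:  /* PS */
      return 1;
    default:
      return 0;
    }
}
```

Under BSR_ANYCRLF a NEL (U+0085) is not a line ending, so the call returns 0 where BSR_UNICODE returns 1; a CRLF pair returns 2 in both modes.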
@ -2034,10 +2035,10 @@ The third argument should point to an <b>uint32_t</b> variable.
 </pre>
 If the pattern set a backtracking depth limit by including an item of the form
 (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
-should point to an unsigned 32-bit integer. If no such value has been set, the
-call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
-that this limit will only be used during matching if it is less than the limit
-set or defaulted by the caller of the match function.
+should point to a uint32_t integer. If no such value has been set, the call to
+<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
+limit will only be used during matching if it is less than the limit set or
+defaulted by the caller of the match function.
 <pre>
 PCRE2_INFO_FIRSTBITMAP
 </pre>
@ -2047,7 +2048,7 @@ values for the first code unit in any match. For example, a pattern that starts
 with [abc] results in a table with three bits set. When code unit values
 greater than 255 are supported, the flag bit for 255 means "any code unit of
 value 255 or above". If such a table was constructed, a pointer to it is
-returned. Otherwise NULL is returned. The third argument should point to an
+returned. Otherwise NULL is returned. The third argument should point to a
 <b>const uint8_t *</b> variable.
 <pre>
 PCRE2_INFO_FIRSTCODETYPE
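The first-code-unit table described above can be modelled as a 256-bit bitmap in which every value of 255 or more shares the last bit. This is a sketch under stated assumptions: first_bitmap and the bitmap_* functions are hypothetical, and the bit layout (bit c%8 of byte c/8) is assumed rather than taken from PCRE2.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit table, one bit per code unit value 0-255.
   The layout (bit c%8 of byte c/8) is an assumption for illustration. */
typedef struct { uint8_t bits[32]; } first_bitmap;

void bitmap_clear(first_bitmap *t)
{
  memset(t->bits, 0, sizeof t->bits);
}

/* Record that a match may start with code unit c; all values of 255
   and above share the flag bit for 255. */
void bitmap_add(first_bitmap *t, uint32_t c)
{
  if (c > 255) c = 255;
  t->bits[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Return 1 if code unit c may start a match according to the table. */
int bitmap_test(const first_bitmap *t, uint32_t c)
{
  if (c > 255) c = 255;
  return (t->bits[c / 8] >> (c % 8)) & 1;
}
```

Adding 'a', 'b', and 'c' sets exactly three bits, matching the [abc] example; adding any code unit above 255 sets the bit for 255, after which every such code unit tests as a possible start.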
@ -2074,7 +2075,7 @@ and up to 0xffffffff when not using UTF-32 mode.
 </pre>
 Return the size (in bytes) of the data frames that are used to remember
 backtracking positions when the pattern is processed by <b>pcre2_match()</b>
-without the use of JIT. The third argument should point to an <b>size_t</b>
+without the use of JIT. The third argument should point to a <b>size_t</b>
 variable. The frame size depends on the number of capturing parentheses in the
 pattern. Each additional capturing group adds two PCRE2_SIZE variables.
 <pre>
@ -2094,10 +2095,10 @@ the equivalent hexadecimal or octal escape sequences.
 </pre>
 If the pattern set a heap memory limit by including an item of the form
 (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
-should point to an unsigned 32-bit integer. If no such value has been set, the
-call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
-that this limit will only be used during matching if it is less than the limit
-set or defaulted by the caller of the match function.
+should point to a uint32_t integer. If no such value has been set, the call to
+<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
+limit will only be used during matching if it is less than the limit set or
+defaulted by the caller of the match function.
 <pre>
 PCRE2_INFO_JCHANGED
 </pre>
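The PCRE2_INFO_HEAPLIMIT text (like its DEPTHLIMIT and MATCHLIMIT counterparts) says a limit set in the pattern is only used if it is smaller than the caller's limit. A hedged sketch of that rule: effective_limit and its pattern_set flag are hypothetical, with pattern_set == false standing in for pcre2_pattern_info() returning PCRE2_ERROR_UNSET.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: combine the limit set or defaulted by the caller
   of the match function with an optional (*LIMIT_...) value from the
   pattern. pattern_set is false when the pattern had no such item. */
uint32_t effective_limit(uint32_t caller_limit, bool pattern_set,
                         uint32_t pattern_limit)
{
  if (pattern_set && pattern_limit < caller_limit)
    return pattern_limit;   /* a pattern item can only lower the limit */
  return caller_limit;
}
```

A pattern starting with (*LIMIT_HEAP=500) therefore tightens a large default, but cannot raise a caller limit of 100.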
@ -2141,15 +2142,15 @@ in such cases.
 </pre>
 If the pattern set a match limit by including an item of the form
 (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
-should point to an unsigned 32-bit integer. If no such value has been set, the
-call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
-that this limit will only be used during matching if it is less than the limit
-set or defaulted by the caller of the match function.
+should point to a uint32_t integer. If no such value has been set, the call to
+<b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note that this
+limit will only be used during matching if it is less than the limit set or
+defaulted by the caller of the match function.
 <pre>
 PCRE2_INFO_MAXLOOKBEHIND
 </pre>
 Return the number of characters (not code units) in the longest lookbehind
-assertion in the pattern. The third argument should point to an unsigned 32-bit
+assertion in the pattern. The third argument should point to a uint32_t
 integer. This information is useful when doing multi-segment matching using the
 partial matching facilities. Note that the simple assertions \b and \B
 require a one-character lookbehind. \A also registers a one-character
@ -2417,7 +2418,7 @@ zero, the search for a match starts at the beginning of the subject, and this
 is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
 must point to the start of a character, or to the end of the subject (in UTF-32
 mode, one code unit equals one character, so all offsets are valid). Like the
-pattern string, the subject may contain binary zeroes.
+pattern string, the subject may contain binary zeros.
 </P>
 <P>
 A non-zero starting offset is useful when searching for another match in the
@ -227,7 +227,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
 <pre>
 --enable-newline-is-nul
 </pre>
-which causes NUL (binary zero) is set as the default line-ending character.
+which causes NUL (binary zero) to be set as the default line-ending character.
 </P>
 <P>
 Whatever default line ending convention is selected when PCRE2 is built can be
@ -286,15 +286,15 @@ The <b>pcre2_match()</b> function starts out using a 20K vector on the system
 stack to record backtracking points. The more nested backtracking points there
 are (that is, the deeper the search tree), the more memory is needed. If the
 initial vector is not large enough, heap memory is used, up to a certain limit,
-which is specified in kilobytes. The limit can be changed at run time, as
-described in the
+which is specified in kibibytes (units of 1024 bytes). The limit can be changed
+at run time, as described in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation. The default limit (in effect unlimited) is 20 million. You can
 change this by a setting such as
 <pre>
 --with-heap-limit=500
 </pre>
-which limits the amount of heap to 500 kilobytes. This limit applies only to
+which limits the amount of heap to 500 KiB. This limit applies only to
 interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
 may also use the heap for internal workspace when processing complicated
 patterns. This limit does not apply when JIT (which has its own memory
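Because the units are now stated as kibibytes, a heap limit value converts to bytes by multiplying by 1024. A minimal sketch: heap_limit_bytes is a hypothetical helper, not part of the PCRE2 API. It widens to 64 bits because the default of 20000000 overflows a uint32_t once multiplied.

```c
#include <stdint.h>

/* Hypothetical helper: convert a heap limit in kibibytes (units of 1024
   bytes, as used by --with-heap-limit and pcre2_set_heap_limit()) into
   bytes, computing in 64 bits to avoid 32-bit overflow. */
uint64_t heap_limit_bytes(uint32_t kibibytes)
{
  return (uint64_t)kibibytes * 1024u;
}
```

For --with-heap-limit=500 this gives 512000 bytes; the default of 20000000 gives 20480000000 bytes (roughly 19 GiB), which is why it is described as effectively unlimited.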
@ -542,7 +542,7 @@ generated from the string.
 Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
 to be created. This is normally run under valgrind or used when PCRE2 is
 compiled with address sanitizing enabled. It calls the fuzzing function and
-outputs information about it is doing. The input strings are specified by
+outputs information about what it is doing. The input strings are specified by
 arguments: if an argument starts with "=" the rest of it is a literal input
 string. Otherwise, it is assumed to be a file name, and the contents of the
 file are the test string.
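The argument convention for pcre2fuzzcheck can be expressed as a small C sketch. The names arg_kind and classify_fuzz_arg are hypothetical and this is not the code in pcre2_fuzzsupport.c; it only mirrors the documented rule.

```c
/* Hypothetical classification of a pcre2fuzzcheck argument, following
   the documented rule: "=text" is a literal input string, anything else
   names a file whose contents are the test string. */
typedef enum { ARG_LITERAL, ARG_FILE } arg_kind;

arg_kind classify_fuzz_arg(const char *arg, const char **payload)
{
  if (arg[0] == '=')
    {
    *payload = arg + 1;   /* the rest of the argument is the input */
    return ARG_LITERAL;
    }
  *payload = arg;         /* the whole argument is a file name */
  return ARG_FILE;
}
```

So "=abc" yields the literal input "abc", while "input.txt" is treated as a file name to be read.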
@ -31,7 +31,7 @@ page.
 2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
 they do not mean what you might think. For example, (?!a){3} does not assert
 that the next three characters are not "a". It just asserts that the next
-character is not "a" three times (in principle: PCRE2 optimizes this to run the
+character is not "a" three times (in principle; PCRE2 optimizes this to run the
 assertion just once). Perl allows some repeat quantifiers on other assertions,
 for example, \b* (but not \b{3}), but these do not seem to have any use.
 </P>
@ -77,8 +77,8 @@ The \Q...\E sequence is recognized both inside and outside character classes.
 </P>
 <P>
 7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
-constructions. However, there is support PCRE2's "callout" feature, which
-allows an external function to be called during pattern matching. See the
+constructions. However, PCRE2 does have a "callout" feature, which allows an
+external function to be called during pattern matching. See the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation for details.
 </P>
@ -86,9 +86,10 @@ controlled by parameters that can be set by the <b>--buffer-size</b> and
 that is obtained at the start of processing. If an input file contains very
 long lines, a larger buffer may be needed; this is handled by automatically
 extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
-default values for these parameters are specified when <b>pcre2grep</b> is
-built, with the default defaults being 20K and 1M respectively. An error occurs
-if a line is too long and the buffer can no longer be expanded.
+default values for these parameters can be set when <b>pcre2grep</b> is
+built; if nothing is specified, the defaults are set to 20K and 1M
+respectively. An error occurs if a line is too long and the buffer can no
+longer be expanded.
 </P>
 <P>
 The block of memory that is actually used is three times the "buffer size", to
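The pcre2grep buffer behaviour described above (start at the buffer size, extend for long lines, fail at --max-buffer-size) can be sketched as follows. Both the doubling strategy and the name grow_buffer_size are assumptions made for illustration; the text does not say how pcre2grep chooses the new size. The example values treat 20K and 1M as 20480 and 1048576 bytes.

```c
#include <stddef.h>

/* Hypothetical growth rule (an assumption, not pcre2grep's code):
   double the buffer size until a line of line_len bytes fits, never
   exceeding max_size. Returns the new size, or 0 if the line cannot fit
   even at max_size, which corresponds to the documented error. */
size_t grow_buffer_size(size_t current, size_t max_size, size_t line_len)
{
  size_t size = (current == 0) ? 1 : current;   /* avoid doubling zero */
  while (size < line_len)
    {
    if (size >= max_size) return 0;             /* cannot expand further */
    size = (size > max_size / 2) ? max_size : size * 2;
    }
  return size;
}
```

With those values, a 100000-byte line grows the buffer to 163840 bytes, while a 2000000-byte line cannot fit and yields the error result 0.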
@ -500,13 +501,13 @@ short form for this option.
 When this option is given, non-compressed input is read and processed line by
 line, and the output is flushed after each write. By default, input is read in
 large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
-terminal (which is currently possible only in Unix-like environments). Output
-to terminal is normally automatically flushed by the operating system. This
-option can be useful when the input or output is attached to a pipe and you do
-not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
-will affect performance, and the <b>-M</b> (multiline) option ceases to work.
-When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
-ignored.
+terminal (which is currently possible only in Unix-like environments or
+Windows). Output to terminal is normally automatically flushed by the operating
+system. This option can be useful when the input or output is attached to a
+pipe and you do not want <b>pcre2grep</b> to buffer up large amounts of data.
+However, its use will affect performance, and the <b>-M</b> (multiline) option
+ceases to work. When input is from a compressed .gz or .bz2 file,
+<b>--line-buffered</b> is ignored.
 </P>
 <P>
 <b>--line-offsets</b>
@ -541,11 +542,11 @@ counter that is incremented each time around its main processing loop. If the
 value set by <b>--match-limit</b> is reached, an error occurs.
 <br>
 <br>
-The <b>--heap-limit</b> option specifies, as a number of kilobytes, the amount
-of heap memory that may be used for matching. Heap memory is needed only if
-matching the pattern requires a significant number of nested backtracking
-points to be remembered. This parameter can be set to zero to forbid the use of
-heap memory altogether.
+The <b>--heap-limit</b> option specifies, as a number of kibibytes (units of
+1024 bytes), the amount of heap memory that may be used for matching. Heap
+memory is needed only if matching the pattern requires a significant number of
+nested backtracking points to be remembered. This parameter can be set to zero
+to forbid the use of heap memory altogether.
 <br>
 <br>
 The <b>--depth-limit</b> option limits the depth of nested backtracking points,
@ -556,9 +557,9 @@ limit acts varies from pattern to pattern. This limit is of use only if it is
|
|||
set smaller than <b>--match-limit</b>.
|
||||
<br>
|
||||
<br>
|
||||
There are no short forms for these options. The default settings are specified
|
||||
when the PCRE2 library is compiled, with the default defaults being very large
|
||||
and so effectively unlimited.
|
||||
There are no short forms for these options. The default limits can be set
|
||||
when the PCRE2 library is compiled; if they are not specified, the defaults
|
||||
are very large and so effectively unlimited.
|
||||
</P>
|
||||
<P>
|
||||
\fB--max-buffer-size=<i>number</i>
|
||||
|
|
|
@ -54,9 +54,9 @@ There is no limit to the number of parenthesized subpatterns, but there can be
no more than 65535 capturing subpatterns. There is, however, a limit to the
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
order to limit the amount of system stack used at compile time. The default
limit can be specified when PCRE2 is built; the default default is 250. An
application can change this limit by calling pcre2_set_parens_nest_limit() to
set the limit in a compile context.
limit can be specified when PCRE2 is built; if not, the default is set to 250.
An application can change this limit by calling pcre2_set_parens_nest_limit()
to set the limit in a compile context.
</P>
<P>
The maximum length of name for a named subpattern is 32 code units, and the
@ -196,7 +196,7 @@ be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used. The heap limit is
specified in kilobytes.
specified in kibibytes (units of 1024 bytes).
</P>
<P>
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
@ -549,7 +549,7 @@ Absolute and relative back references
<P>
The sequence \g followed by a signed or unsigned number, optionally enclosed
in braces, is an absolute or relative backreference. A named backreference
can be coded as \g{name}. Back references are discussed
can be coded as \g{name}. backreferences are discussed
<a href="#backreferences">later,</a>
following the discussion of
<a href="#subpattern">parenthesized subpatterns.</a>
@ -1037,7 +1037,7 @@ joiner" characters. Characters with the "mark" property always have the
modifier). Extending characters are allowed before the modifier.
</P>
<P>
7. Do not break within emoji zwj sequences (zero-width jointer followed by
7. Do not break within emoji zwj sequences (zero-width joiner followed by
"glue after ZWJ" or "base glue after ZWJ").
</P>
<P>
@ -2210,8 +2210,8 @@ after the reference.
</P>
<P>
There may be more than one backreference to the same subpattern. If a
subpattern has not actually been used in a particular match, any back
references to it always fail by default. For example, the pattern
subpattern has not actually been used in a particular match, any backreferences
to it always fail by default. For example, the pattern
<pre>
(a|(bc))\2
</pre>
@ -2247,7 +2247,7 @@ done using alternation, as in the example above, or by a quantifier with a
minimum of zero.
</P>
<P>
Back references of this type cause the group that they reference to be treated
backreferences of this type cause the group that they reference to be treated
as an
<a href="#atomicgroup">atomic group.</a>
Once the whole group has been matched, a subsequent matching failure cannot
@ -139,7 +139,7 @@ because it disables the use of back references.
If this option is set, the <b>reg_endp</b> field in the <i>preg</i> structure
(which has the type const char *) must be set to point to the character beyond
the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
now contain binary zeroes, which are treated as data characters. Without
now contain binary zeros, which are treated as data characters. Without
REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
ignored. This is a GNU extension to the POSIX standard and should be used with
caution in software intended to be portable to other systems.
@ -248,10 +248,10 @@ function.
<pre>
REG_STARTEND
</pre>
When this option is set, the subject string is starts at <i>string</i> +
When this option is set, the subject string starts at <i>string</i> +
<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
should point to the first character beyond the string. There may be binary
zeroes within the subject string, and indeed, using REG_STARTEND is the only
zeros within the subject string, and indeed, using REG_STARTEND is the only
way to pass a subject string that contains a binary zero.
</P>
<P>
@ -442,7 +442,7 @@ of the newline or \R options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number.
<pre>
(*LIMIT_DEPTH=d) set the backtracking limit to d
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes
(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d) set the match limit to d
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
@ -129,7 +129,7 @@ to occur).
UTF-8 (in its original definition) is not capable of encoding values greater
than 0x7fffffff, but such values can be handled by the 32-bit library. When
testing this library in non-UTF mode with <b>utf8_input</b> set, if any
character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
0x80000000 is added to the character's value. This is the only way of passing
such code points in a pattern string. For subject strings, using an escape
sequence is preferable.
@ -264,7 +264,7 @@ Do not output the version number of <b>pcre2test</b> at the start of execution.
<P>
<b>-S</b> <i>size</i>
On Unix-like systems, set the size of the run-time stack to <i>size</i>
megabytes.
mebibytes (units of 1024*1024 bytes).
</P>
<P>
<b>-subject</b> <i>modifier-list</i>
@ -679,8 +679,8 @@ Newline and \R handling
<P>
The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
\R matches any Unicode newline sequence. The default is specified when PCRE2
is built, with the default default being Unicode.
\R matches any Unicode newline sequence. The default can be specified when
PCRE2 is built; if it is not, the default is set to Unicode.
</P>
<P>
The <b>newline</b> modifier specifies which characters are to be interpreted as
@ -1418,11 +1418,11 @@ Setting the JIT stack size
<P>
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Setting
zero reverts to the default of 32K. Providing a stack that is larger than the
default is necessary only for very complicated patterns. If <b>jitstack</b> is
set non-zero on a subject line it overrides any value that was set on the
pattern.
optimization is not being used. The value is a number of kibibytes (units of
1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
that is larger than the default is necessary only for very complicated
patterns. If <b>jitstack</b> is set non-zero on a subject line it overrides any
value that was set on the pattern.
</P>
<br><b>
Setting heap, match, and depth limits
@ -1468,10 +1468,10 @@ and non-recursive, to the internal matching function, thus controlling the
overall amount of computing resource that is used.
</P>
<P>
For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes)
limits the amount of heap memory used for matching. A value of zero disables
the use of any heap memory; many simple pattern matches can be done without
using the heap, so this is not an unreasonable setting.
For both kinds of matching, the <i>heap_limit</i> number, which is in kibibytes
(units of 1024 bytes), limits the amount of heap memory used for matching. A
value of zero disables the use of any heap memory; many simple pattern matches
can be done without using the heap, so zero is not an unreasonable setting.
</P>
<br><b>
Showing MARK names
249
doc/pcre2.txt
@ -619,11 +619,12 @@ NEWLINES

Each of the first three conventions is used by at least one operating
system as its standard newline sequence. When PCRE2 is built, a default
can be specified. The default default is LF, which is the Unix stan-
dard. However, the newline convention can be changed by an application
when calling pcre2_compile(), or it can be specified by special text at
the start of the pattern itself; this overrides any other settings. See
the pcre2pattern page for details of the special character sequences.
can be specified. If it is not, the default is set to LF, which is the
Unix standard. However, the newline convention can be changed by an
application when calling pcre2_compile(), or it can be specified by
special text at the start of the pattern itself; this overrides any
other settings. See the pcre2pattern page for details of the special
character sequences.

In the PCRE2 documentation the word "newline" is used to mean "the
character or pair of characters that indicate a line break". The choice
@ -957,17 +958,17 @@ PCRE2 CONTEXTS
int pcre2_set_heap_limit(pcre2_match_context *mcontext,
uint32_t value);

The heap_limit parameter specifies, in units of kilobytes, the maximum
amount of heap memory that pcre2_match() may use to hold backtracking
information when running an interpretive match. This limit also applies
to pcre2_dfa_match(), which may use the heap when processing patterns
with a lot of nested pattern recursion or lookarounds or atomic groups.
This limit does not apply to matching with the JIT optimization, which
has its own memory control arrangements (see the pcre2jit documentation
for more details). If the limit is reached, the negative error code
PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2
is built; the default default is very large and is essentially "unlim-
ited".
The heap_limit parameter specifies, in units of kibibytes (1024 bytes),
the maximum amount of heap memory that pcre2_match() may use to hold
backtracking information when running an interpretive match. This limit
also applies to pcre2_dfa_match(), which may use the heap when process-
ing patterns with a lot of nested pattern recursion or lookarounds or
atomic groups. This limit does not apply to matching with the JIT opti-
mization, which has its own memory control arrangements (see the
pcre2jit documentation for more details). If the limit is reached, the
negative error code PCRE2_ERROR_HEAPLIMIT is returned. The default
limit can be set when PCRE2 is built; if it is not, the default is set
very large and is essentially "unlimited".

A value for the heap limit may also be supplied by an item at the start
of a pattern of the form
@ -1042,7 +1043,7 @@ PCRE2 CONTEXTS
using JIT compiled code. However, it is supported by pcre2_dfa_match(),
which uses it to limit the depth of nested internal recursive function
calls that implement atomic groups, lookaround assertions, and pattern
recursions. This limits, indirectly, the amount of system stack this is
recursions. This limits, indirectly, the amount of system stack that is
used. It was more useful in versions before 10.32, when stack memory
was used for local workspace vectors for recursive function calls. From
version 10.32, only local variables are allocated on the stack and as
@ -1058,10 +1059,11 @@ PCRE2 CONTEXTS
directly by calling pcre2_set_heap_limit().

The default value for the depth limit can be set when PCRE2 is built;
the default default is the same value as the default for the match
limit. If the limit is exceeded, pcre2_match() or pcre2_dfa_match()
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
supplied by an item at the start of a pattern of the form
if it is not, the default is set to the same value as the default for
the match limit. If the limit is exceeded, pcre2_match() or
pcre2_dfa_match() returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth
limit may also be supplied by an item at the start of a pattern of the
form

(*LIMIT_DEPTH=ddd)

@ -1117,7 +1119,7 @@ CHECKING BUILD-TIME OPTIONS

PCRE2_CONFIG_HEAPLIMIT

The output is a uint32_t integer that gives, in kilobytes, the default
The output is a uint32_t integer that gives, in kibibytes, the default
limit for the amount of heap memory used by pcre2_match() or
pcre2_dfa_match(). Further details are given with
pcre2_set_heap_limit() above.
@ -1413,7 +1415,7 @@ COMPILING A PATTERN
it can be changed within a pattern by a (?i) option setting. If
PCRE2_UTF is set, Unicode properties are used for all characters with
more than one other case, and for all characters whose code points are
greater than U+007f. For lower valued characters with only one other
greater than U+007F. For lower valued characters with only one other
case, a lookup table is used for speed. When PCRE2_UTF is not set, a
lookup table is used for all code points less than 256, and higher code
points (available only in 16-bit or 32-bit mode) are treated as not
@ -1983,18 +1985,17 @@ INFORMATION ABOUT A COMPILED PATTERN
Return the number of the highest backreference in the pattern. The
third argument should point to an uint32_t variable. Named subpatterns
acquire numbers as well as names, and these count towards the highest
back reference. Back references such as \4 or \g{12} match the cap-
tured characters of the given group, but in addition, the check that a
capturing group is set in a conditional subpattern such as (?(3)a|b) is
also a back reference. Zero is returned if there are no back refer-
ences.
backreference. Backreferences such as \4 or \g{12} match the captured
characters of the given group, but in addition, the check that a cap-
turing group is set in a conditional subpattern such as (?(3)a|b) is
also a backreference. Zero is returned if there are no backreferences.

PCRE2_INFO_BSR

The output is a uint32_t whose value indicates what character sequences
the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that
\R matches any Unicode line ending sequence; a value of PCRE2_BSR_ANY-
CRLF means that \R matches only CR, LF, or CRLF.
The output is a uint32_t integer whose value indicates what character
sequences the \R escape sequence matches. A value of PCRE2_BSR_UNICODE
means that \R matches any Unicode line ending sequence; a value of
PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF.

PCRE2_INFO_CAPTURECOUNT

@ -2006,8 +2007,8 @@ INFORMATION ABOUT A COMPILED PATTERN

If the pattern set a backtracking depth limit by including an item of
the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The
third argument should point to an unsigned 32-bit integer. If no such
value has been set, the call to pcre2_pattern_info() returns the error
third argument should point to a uint32_t integer. If no such value has
been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
ing if it is less than the limit set or defaulted by the caller of the
match function.
@ -2021,7 +2022,7 @@ INFORMATION ABOUT A COMPILED PATTERN
code unit values greater than 255 are supported, the flag bit for 255
means "any code unit of value 255 or above". If such a table was con-
structed, a pointer to it is returned. Otherwise NULL is returned. The
third argument should point to an const uint8_t * variable.
third argument should point to a const uint8_t * variable.

PCRE2_INFO_FIRSTCODETYPE

@ -2048,7 +2049,7 @@ INFORMATION ABOUT A COMPILED PATTERN

Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by pcre2_match()
without the use of JIT. The third argument should point to an size_t
without the use of JIT. The third argument should point to a size_t
variable. The frame size depends on the number of capturing parentheses
in the pattern. Each additional capturing group adds two PCRE2_SIZE
variables.
@ -2070,11 +2071,10 @@ INFORMATION ABOUT A COMPILED PATTERN

If the pattern set a heap memory limit by including an item of the form
(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argu-
ment should point to an unsigned 32-bit integer. If no such value has
been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
ing if it is less than the limit set or defaulted by the caller of the
match function.
ment should point to a uint32_t integer. If no such value has been set,
the call to pcre2_pattern_info() returns the error PCRE2_ERROR_UNSET.
Note that this limit will only be used during matching if it is less
than the limit set or defaulted by the caller of the match function.

PCRE2_INFO_JCHANGED

@ -2120,8 +2120,8 @@ INFORMATION ABOUT A COMPILED PATTERN

If the pattern set a match limit by including an item of the form
(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value
has been set, the call to pcre2_pattern_info() returns the error
argument should point to a uint32_t integer. If no such value has been
set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
ing if it is less than the limit set or defaulted by the caller of the
match function.
@ -2129,15 +2129,15 @@ INFORMATION ABOUT A COMPILED PATTERN
PCRE2_INFO_MAXLOOKBEHIND

Return the number of characters (not code units) in the longest lookbe-
hind assertion in the pattern. The third argument should point to an
unsigned 32-bit integer. This information is useful when doing multi-
segment matching using the partial matching facilities. Note that the
simple assertions \b and \B require a one-character lookbehind. \A also
registers a one-character lookbehind, though it does not actually
inspect the previous character. This is to ensure that at least one
character from the old segment is retained when a new segment is pro-
cessed. Otherwise, if there are no lookbehinds in the pattern, \A might
match incorrectly at the start of a second or subsequent segment.
hind assertion in the pattern. The third argument should point to a
uint32_t integer. This information is useful when doing multi-segment
matching using the partial matching facilities. Note that the simple
assertions \b and \B require a one-character lookbehind. \A also regis-
ters a one-character lookbehind, though it does not actually inspect
the previous character. This is to ensure that at least one character
from the old segment is retained when a new segment is processed. Oth-
erwise, if there are no lookbehinds in the pattern, \A might match
incorrectly at the start of a second or subsequent segment.

PCRE2_INFO_MINLENGTH

@ -2378,7 +2378,7 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
set must point to the start of a character, or to the end of the sub-
ject (in UTF-32 mode, one code unit equals one character, so all off-
sets are valid). Like the pattern string, the subject may contain
binary zeroes.
binary zeros.

A non-zero starting offset is useful when searching for another match
in the same subject by calling pcre2_match() again after a previous
@ -3445,8 +3445,8 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
PCRE2_ERROR_DFA_UCOND

This return is given if pcre2_dfa_match() encounters a condition item
that uses a back reference for the condition, or a test for recursion
in a specific group. These are not supported.
that uses a backreference for the condition, or a test for recursion in
a specific group. These are not supported.

PCRE2_ERROR_DFA_WSSIZE

@ -3683,8 +3683,8 @@ NEWLINE RECOGNITION

--enable-newline-is-nul

which causes NUL (binary zero) is set as the default line-ending char-
acter.
which causes NUL (binary zero) to be set as the default line-ending
character.

Whatever default line ending convention is selected when PCRE2 is built
can be overridden by applications that use the library. At build time
@ -3745,18 +3745,18 @@ LIMITING PCRE2 RESOURCE USAGE
stack to record backtracking points. The more nested backtracking
points there are (that is, the deeper the search tree), the more memory
is needed. If the initial vector is not large enough, heap memory is
used, up to a certain limit, which is specified in kilobytes. The limit
can be changed at run time, as described in the pcre2api documentation.
The default limit (in effect unlimited) is 20 million. You can change
this by a setting such as
used, up to a certain limit, which is specified in kibibytes (units of
1024 bytes). The limit can be changed at run time, as described in the
pcre2api documentation. The default limit (in effect unlimited) is 20
million. You can change this by a setting such as

--with-heap-limit=500

which limits the amount of heap to 500 kilobytes. This limit applies
only to interpretive matching in pcre2_match() and pcre2_dfa_match(),
which may also use the heap for internal workspace when processing com-
plicated patterns. This limit does not apply when JIT (which has its
own memory arrangements) is used.
which limits the amount of heap to 500 KiB. This limit applies only to
interpretive matching in pcre2_match() and pcre2_dfa_match(), which may
also use the heap for internal workspace when processing complicated
patterns. This limit does not apply when JIT (which has its own memory
arrangements) is used.

You can also explicitly limit the depth of nested backtracking in the
pcre2_match() interpreter. This limit defaults to the value that is set
@ -4005,10 +4005,10 @@ SUPPORT FOR FUZZERS
Setting --enable-fuzz-support also causes a binary called pcre2fuz-
zcheck to be created. This is normally run under valgrind or used when
PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
function and outputs information about it is doing. The input strings
are specified by arguments: if an argument starts with "=" the rest of
it is a literal input string. Otherwise, it is assumed to be a file
name, and the contents of the file are the test string.
function and outputs information about what it is doing. The input
strings are specified by arguments: if an argument starts with "=" the
rest of it is a literal input string. Otherwise, it is assumed to be a
file name, and the contents of the file are the test string.


OBSOLETE OPTION
@ -4167,9 +4167,9 @@ MISSING CALLOUTS
all branches are anchorable.

This optimization is disabled, however, if .* is in an atomic group or
if there is a back reference to the capturing group in which it
appears. It is also disabled if the pattern contains (*PRUNE) or
(*SKIP). However, the presence of callouts does not affect it.
if there is a backreference to the capturing group in which it appears.
It is also disabled if the pattern contains (*PRUNE) or (*SKIP). How-
ever, the presence of callouts does not affect it.

For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT
and applied to the string "aa", the pcre2test output is:
@ -4489,7 +4489,7 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized asser-
tions, but they do not mean what you might think. For example, (?!a){3}
does not assert that the next three characters are not "a". It just
asserts that the next character is not "a" three times (in principle:
asserts that the next character is not "a" three times (in principle;
PCRE2 optimizes this to run the assertion just once). Perl allows some
repeat quantifiers on other assertions, for example, \b* (but not
\b{3}), but these do not seem to have any use.
@ -4534,9 +4534,9 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
classes.

7. Fairly obviously, PCRE2 does not support the (?{code}) and
(??{code}) constructions. However, there is support PCRE2's "callout"
feature, which allows an external function to be called during pattern
matching. See the pcre2callout documentation for details.
(??{code}) constructions. However, PCRE2 does have a "callout" feature,
which allows an external function to be called during pattern matching.
See the pcre2callout documentation for details.

8. Subroutine calls (whether recursive or not) were treated as atomic
groups up to PCRE2 release 10.23, but from release 10.30 this changed,
@ -4604,9 +4604,9 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
different length of string. Perl requires them all to have the same
length.

(b) From PCRE2 10.23, back references to groups of fixed length are
supported in lookbehinds, provided that there is no possibility of ref-
erencing a non-unique number or name. Perl does not support backrefer-
(b) From PCRE2 10.23, backreferences to groups of fixed length are sup-
ported in lookbehinds, provided that there is no possibility of refer-
encing a non-unique number or name. Perl does not support backrefer-
ences in lookbehinds.

(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
@ -5103,9 +5103,9 @@ SIZE AND OTHER LIMITATIONS
limit to the depth of nesting of parenthesized subpatterns of all
kinds. This is imposed in order to limit the amount of system stack
used at compile time. The default limit can be specified when PCRE2 is
built; the default default is 250. An application can change this limit
by calling pcre2_set_parens_nest_limit() to set the limit in a compile
context.
built; if not, the default is set to 250. An application can change
this limit by calling pcre2_set_parens_nest_limit() to set the limit in
a compile context.

The maximum length of name for a named subpattern is 32 code units, and
the maximum number of named subpatterns is 10000.
@ -5929,7 +5929,8 @@ SPECIAL START-OF-PATTERN ITEMS
pcre2_match() for it to have any effect. In other words, the pattern
writer can lower the limits set by the programmer, but not raise them.
If there is more than one setting of one of these limits, the lower
value is used. The heap limit is specified in kilobytes.
value is used. The heap limit is specified in kibibytes (units of 1024
bytes).

Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This
name is still recognized for backwards compatibility.
@ -6230,8 +6231,8 @@ BACKSLASH
All UTF modes no greater than 0x10ffff and a valid code point

Invalid Unicode code points are all those in the range 0xd800 to 0xdfff
(the so-called "surrogate" codepoints). The check for these can be dis-
abled by the caller of pcre2_compile() by setting the option
(the so-called "surrogate" code points). The check for these can be
disabled by the caller of pcre2_compile() by setting the option
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.

Escape sequences in character classes
@ -6257,7 +6258,7 @@ BACKSLASH

The sequence \g followed by a signed or unsigned number, optionally
enclosed in braces, is an absolute or relative backreference. A named
back reference can be coded as \g{name}. Back references are discussed
backreference can be coded as \g{name}. backreferences are discussed
later, following the discussion of parenthesized subpatterns.

Absolute and relative subroutine calls
@@ -6266,8 +6267,8 @@ BACKSLASH
 name or a number enclosed either in angle brackets or single quotes, is
 an alternative syntax for referencing a subpattern as a "subroutine".
 Details are discussed later. Note that \g{...} (Perl syntax) and
-\g<...> (Oniguruma syntax) are not synonymous. The former is a back
-reference; the latter is a subroutine call.
+\g<...> (Oniguruma syntax) are not synonymous. The former is a backref-
+erence; the latter is a subroutine call.

 Generic character types

@@ -6593,7 +6594,7 @@ BACKSLASH
 lowed by a modifier). Extending characters are allowed before the modi-
 fier.

-7. Do not break within emoji zwj sequences (zero-width jointer followed
+7. Do not break within emoji zwj sequences (zero-width joiner followed
 by "glue after ZWJ" or "base glue after ZWJ").

 8. Do not break within emoji flag sequences. That is, do not break
@@ -7285,7 +7286,7 @@ NAMED SUBPATTERNS

 In PCRE2, a subpattern can be named in one of three ways: (?<name>...)
 or (?'name'...) as in Perl, or (?P<name>...) as in Python. References
-to capturing parentheses from other parts of the pattern, such as back
+to capturing parentheses from other parts of the pattern, such as back-
 references, recursion, and conditions, can be made by name as well as
 by number.

@@ -7321,8 +7322,8 @@ NAMED SUBPATTERNS
 that name that matched. This saves searching to find which numbered
 subpattern it was.

-If you make a back reference to a non-unique named subpattern from
-elsewhere in the pattern, the subpatterns to which the name refers are
+If you make a backreference to a non-unique named subpattern from else-
+where in the pattern, the subpatterns to which the name refers are
 checked in the order in which they appear in the overall pattern. The
 first one that is set is used for the reference. For example, this pat-
 tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo":
@@ -7481,9 +7482,9 @@ REPETITION
 mization, or alternatively, using ^ to indicate anchoring explicitly.

 However, there are some cases where the optimization cannot be used.
-When .* is inside capturing parentheses that are the subject of a back
-reference elsewhere in the pattern, a match at the start may fail where
-a later one succeeds. Consider, for example:
+When .* is inside capturing parentheses that are the subject of a
+backreference elsewhere in the pattern, a match at the start may fail
+where a later one succeeds. Consider, for example:

 (.*)abc\1

@@ -7631,7 +7632,7 @@ BACK REFERENCES
 it is always taken as a backreference, and causes an error only if
 there are not that many capturing left parentheses in the entire pat-
 tern. In other words, the parentheses that are referenced need not be
-to the left of the reference for numbers less than 8. A "forward back
+to the left of the reference for numbers less than 8. A "forward back-
 reference" of this type can make sense when a repetition is involved
 and the subpattern to the right has participated in an earlier itera-
 tion.
@@ -7671,10 +7672,10 @@ BACK REFERENCES
 This kind of forward reference can be useful it patterns that repeat.
 Perl does not support the use of + in this way.

-A back reference matches whatever actually matched the capturing sub-
-pattern in the current subject string, rather than anything matching
-the subpattern itself (see "Subpatterns as subroutines" below for a way
-of doing that). So the pattern
+A backreference matches whatever actually matched the capturing subpat-
+tern in the current subject string, rather than anything matching the
+subpattern itself (see "Subpatterns as subroutines" below for a way of
+doing that). So the pattern

 (sens|respons)e and \1ibility

@@ -7704,14 +7705,14 @@ BACK REFERENCES
 before or after the reference.

 There may be more than one backreference to the same subpattern. If a
-subpattern has not actually been used in a particular match, any back
+subpattern has not actually been used in a particular match, any back-
 references to it always fail by default. For example, the pattern

 (a|(bc))\2

 always fails if it starts to match "a" rather than "bc". However, if
-the PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a back
-reference to an unset value matches an empty string.
+the PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backref-
+erence to an unset value matches an empty string.

 Because there may be many capturing parentheses in a pattern, all dig-
 its following a backslash are taken as part of a potential backrefer-
@@ -7730,13 +7731,13 @@ BACK REFERENCES
 (a|b\1)+

 matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
-ation of the subpattern, the back reference matches the character
-string corresponding to the previous iteration. In order for this to
-work, the pattern must be such that the first iteration does not need
-to match the back reference. This can be done using alternation, as in
-the example above, or by a quantifier with a minimum of zero.
+ation of the subpattern, the backreference matches the character string
+corresponding to the previous iteration. In order for this to work, the
+pattern must be such that the first iteration does not need to match
+the backreference. This can be done using alternation, as in the exam-
+ple above, or by a quantifier with a minimum of zero.

-Back references of this type cause the group that they reference to be
+backreferences of this type cause the group that they reference to be
 treated as an atomic group. Once the whole group has been matched, a
 subsequent matching failure cannot cause backtracking into the middle
 of the group.
@@ -7871,8 +7872,8 @@ ASSERTIONS
 However, recursion, that is, a "subroutine" call into a group that is
 already active, is not supported.

-Perl does not support back references in lookbehinds. PCRE2 does sup-
-port them, but only if certain conditions are met. The
+Perl does not support backreferences in lookbehinds. PCRE2 does support
+them, but only if certain conditions are met. The
 PCRE2_MATCH_UNSET_BACKREF option must not be set, there must be no use
 of (?| in the pattern (it creates duplicate subpattern numbers), and if
 the backreference is by name, the name must be unique. Of course, the
@@ -8332,11 +8333,10 @@ RECURSIVE PATTERNS
 ^(.)(\1|a(?2))

 This pattern matches "bab". The first capturing parentheses match "b",
-then in the second group, when the back reference \1 fails to match
-"b", the second alternative matches "a" and then recurses. In the
-recursion, \1 does now match "b" and so the whole match succeeds. This
-match used to fail in Perl, but in later versions (I tried 5.024) it
-now works.
+then in the second group, when the backreference \1 fails to match "b",
+the second alternative matches "a" and then recurses. In the recursion,
+\1 does now match "b" and so the whole match succeeds. This match used
+to fail in Perl, but in later versions (I tried 5.024) it now works.


 SUBPATTERNS AS SUBROUTINES
@@ -9253,11 +9253,10 @@ COMPILING A PATTERN
 If this option is set, the reg_endp field in the preg structure (which
 has the type const char *) must be set to point to the character beyond
 the end of the pattern before calling regcomp(). The pattern itself may
-now contain binary zeroes, which are treated as data characters. With-
-out REG_PEND, a binary zero terminates the pattern and the re_endp
-field is ignored. This is a GNU extension to the POSIX standard and
-should be used with caution in software intended to be portable to
-other systems.
+now contain binary zeros, which are treated as data characters. Without
+REG_PEND, a binary zero terminates the pattern and the re_endp field is
+ignored. This is a GNU extension to the POSIX standard and should be
+used with caution in software intended to be portable to other systems.

 REG_UCP

@@ -9364,10 +9363,10 @@ MATCHING A PATTERN

 REG_STARTEND

-When this option is set, the subject string is starts at string +
+When this option is set, the subject string starts at string +
 pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should
 point to the first character beyond the string. There may be binary
-zeroes within the subject string, and indeed, using REG_STARTEND is the
+zeros within the subject string, and indeed, using REG_STARTEND is the
 only way to pass a subject string that contains a binary zero.

 Whatever the value of pmatch[0].rm_so, the offsets of the matched
@@ -9995,7 +9994,7 @@ OPTION SETTING
 one of them may appear. For the first three, d is a decimal number.

 (*LIMIT_DEPTH=d) set the backtracking limit to d
-(*LIMIT_HEAP=d) set the heap size limit to d kilobytes
+(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
 (*LIMIT_MATCH=d) set the match limit to d
 (*NOTEMPTY) set PCRE2_NOTEMPTY when matching
 (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
@@ -1,4 +1,4 @@
-.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "16 June 2017" "PCRE2 10.30"
+.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "16 June 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -16,7 +16,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
 This function is part of an experimental set of pattern conversion functions.
 It sets the component separator character that is used when converting globs.
-The second argument must one of the characters forward slash, backslash, or
+The second argument must be one of the characters forward slash, backslash, or
 dot. The default is backslash when running under Windows, otherwise forward
 slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if
 the second argument is invalid.
@@ -1,4 +1,4 @@
-.TH PCRE2_SET_DEPTH_LIMIT 3 "11 April 2017" "PCRE2 10.30"
+.TH PCRE2_SET_HEAP_LIMIT 3 "11 April 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -497,10 +497,10 @@ U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
 .P
 Each of the first three conventions is used by at least one operating system as
 its standard newline sequence. When PCRE2 is built, a default can be specified.
-The default default is LF, which is the Unix standard. However, the newline
-convention can be changed by an application when calling \fBpcre2_compile()\fP,
-or it can be specified by special text at the start of the pattern itself; this
-overrides any other settings. See the
+If it is not, the default is set to LF, which is the Unix standard. However,
+the newline convention can be changed by an application when calling
+\fBpcre2_compile()\fP, or it can be specified by special text at the start of
+the pattern itself; this overrides any other settings. See the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@@ -885,19 +885,20 @@ offset limit. In other words, whichever limit comes first is used.
 .B " uint32_t \fIvalue\fP);"
 .fi
 .sp
-The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
-amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
-information when running an interpretive match. This limit also applies to
-\fBpcre2_dfa_match()\fP, which may use the heap when processing patterns with a
-lot of nested pattern recursion or lookarounds or atomic groups. This limit
-does not apply to matching with the JIT optimization, which has its own memory
-control arrangements (see the
+The \fIheap_limit\fP parameter specifies, in units of kibibytes (1024 bytes),
+the maximum amount of heap memory that \fBpcre2_match()\fP may use to hold
+backtracking information when running an interpretive match. This limit also
+applies to \fBpcre2_dfa_match()\fP, which may use the heap when processing
+patterns with a lot of nested pattern recursion or lookarounds or atomic
+groups. This limit does not apply to matching with the JIT optimization, which
+has its own memory control arrangements (see the
 .\" HREF
 \fBpcre2jit\fP
 .\"
 documentation for more details). If the limit is reached, the negative error
-code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
-built; the default default is very large and is essentially "unlimited".
+code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
+is built; if it is not, the default is set very large and is essentially
+"unlimited".
 .P
 A value for the heap limit may also be supplied by an item at the start of a
 pattern of the form
@@ -975,7 +976,7 @@ The depth limit is not relevant, and is ignored, when matching is done using
 JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
 uses it to limit the depth of nested internal recursive function calls that
 implement atomic groups, lookaround assertions, and pattern recursions. This
-limits, indirectly, the amount of system stack this is used. It was more useful
+limits, indirectly, the amount of system stack that is used. It was more useful
 in versions before 10.32, when stack memory was used for local workspace
 vectors for recursive function calls. From version 10.32, only local variables
 are allocated on the stack and as each call uses only a few hundred bytes, even
@@ -989,11 +990,11 @@ using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
 probably better to limit heap usage directly by calling
 \fBpcre2_set_heap_limit()\fP.
 .P
-The default value for the depth limit can be set when PCRE2 is built; the
-default default is the same value as the default for the match limit. If the
-limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
-PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
-item at the start of a pattern of the form
+The default value for the depth limit can be set when PCRE2 is built; if it is
+not, the default is set to the same value as the default for the match limit.
+If the limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
+returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
+supplied by an item at the start of a pattern of the form
 .sp
   (*LIMIT_DEPTH=ddd)
 .sp
@@ -1050,7 +1051,7 @@ given with \fBpcre2_set_depth_limit()\fP above.
 .sp
   PCRE2_CONFIG_HEAPLIMIT
 .sp
-The output is a uint32_t integer that gives, in kilobytes, the default limit
+The output is a uint32_t integer that gives, in kibibytes, the default limit
 for the amount of heap memory used by \fBpcre2_match()\fP or
 \fBpcre2_dfa_match()\fP. Further details are given with
 \fBpcre2_set_heap_limit()\fP above.
@@ -1367,7 +1368,7 @@ If this bit is set, letters in the pattern match both upper and lower case
 letters in the subject. It is equivalent to Perl's /i option, and it can be
 changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
 properties are used for all characters with more than one other case, and for
-all characters whose code points are greater than U+007f. For lower valued
+all characters whose code points are greater than U+007F. For lower valued
 characters with only one other case, a lookup table is used for speed. When
 PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
 and higher code points (available only in 16-bit or 32-bit mode) are treated as
@@ -1550,8 +1551,8 @@ If this option is set, it disables the use of numbered capturing parentheses in
 the pattern. Any opening parenthesis that is not followed by ? behaves as if it
 were followed by ?: but named parentheses can still be used for capturing (and
 they acquire numbers in the usual way). This is the same as Perl's /n option.
-Note that, when this option is set, references to capturing groups (back
-references or recursion/subroutine calls) may only refer to named groups,
+Note that, when this option is set, references to capturing groups
+(backreferences or recursion/subroutine calls) may only refer to named groups,
 though the reference can be by name or by number.
 .sp
   PCRE2_NO_AUTO_POSSESS
@@ -1976,10 +1977,10 @@ returned if there are no back references.
 .sp
   PCRE2_INFO_BSR
 .sp
-The output is a uint32_t whose value indicates what character sequences the \eR
-escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR matches
-any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \eR
-matches only CR, LF, or CRLF.
+The output is a uint32_t integer whose value indicates what character sequences
+the \eR escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR
+matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
+that \eR matches only CR, LF, or CRLF.
 .sp
   PCRE2_INFO_CAPTURECOUNT
 .sp
@@ -1991,10 +1992,10 @@ The third argument should point to an \fBuint32_t\fP variable.
 .sp
 If the pattern set a backtracking depth limit by including an item of the form
 (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
-should point to an unsigned 32-bit integer. If no such value has been set, the
-call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
-that this limit will only be used during matching if it is less than the limit
-set or defaulted by the caller of the match function.
+should point to a uint32_t integer. If no such value has been set, the call to
+\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
+limit will only be used during matching if it is less than the limit set or
+defaulted by the caller of the match function.
 .sp
   PCRE2_INFO_FIRSTBITMAP
 .sp
@@ -2004,7 +2005,7 @@ values for the first code unit in any match. For example, a pattern that starts
 with [abc] results in a table with three bits set. When code unit values
 greater than 255 are supported, the flag bit for 255 means "any code unit of
 value 255 or above". If such a table was constructed, a pointer to it is
-returned. Otherwise NULL is returned. The third argument should point to an
+returned. Otherwise NULL is returned. The third argument should point to a
 \fBconst uint8_t *\fP variable.
 .sp
   PCRE2_INFO_FIRSTCODETYPE
@@ -2031,7 +2032,7 @@ and up to 0xffffffff when not using UTF-32 mode.
 .sp
 Return the size (in bytes) of the data frames that are used to remember
 backtracking positions when the pattern is processed by \fBpcre2_match()\fP
-without the use of JIT. The third argument should point to an \fBsize_t\fP
+without the use of JIT. The third argument should point to a \fBsize_t\fP
 variable. The frame size depends on the number of capturing parentheses in the
 pattern. Each additional capturing group adds two PCRE2_SIZE variables.
 .sp
@@ -2051,10 +2052,10 @@ the equivalent hexadecimal or octal escape sequences.
 .sp
 If the pattern set a heap memory limit by including an item of the form
 (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
-should point to an unsigned 32-bit integer. If no such value has been set, the
-call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
-that this limit will only be used during matching if it is less than the limit
-set or defaulted by the caller of the match function.
+should point to a uint32_t integer. If no such value has been set, the call to
+\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
+limit will only be used during matching if it is less than the limit set or
+defaulted by the caller of the match function.
 .sp
   PCRE2_INFO_JCHANGED
 .sp
@@ -2098,15 +2099,15 @@ in such cases.
 .sp
 If the pattern set a match limit by including an item of the form
 (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
-should point to an unsigned 32-bit integer. If no such value has been set, the
-call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
-that this limit will only be used during matching if it is less than the limit
-set or defaulted by the caller of the match function.
+should point to a uint32_t integer. If no such value has been set, the call to
+\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this
+limit will only be used during matching if it is less than the limit set or
+defaulted by the caller of the match function.
 .sp
   PCRE2_INFO_MAXLOOKBEHIND
 .sp
 Return the number of characters (not code units) in the longest lookbehind
-assertion in the pattern. The third argument should point to an unsigned 32-bit
+assertion in the pattern. The third argument should point to a uint32_t
 integer. This information is useful when doing multi-segment matching using the
 partial matching facilities. Note that the simple assertions \eb and \eB
 require a one-character lookbehind. \eA also registers a one-character
@@ -2393,7 +2394,7 @@ zero, the search for a match starts at the beginning of the subject, and this
 is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset
 must point to the start of a character, or to the end of the subject (in UTF-32
 mode, one code unit equals one character, so all offsets are valid). Like the
-pattern string, the subject may contain binary zeroes.
+pattern string, the subject may contain binary zeros.
 .P
 A non-zero starting offset is useful when searching for another match in the
 same subject by calling \fBpcre2_match()\fP again after a previous success.
@@ -216,7 +216,7 @@ separator, U+2028), and PS (paragraph separator, U+2029). The final option is
 .sp
   --enable-newline-is-nul
 .sp
-which causes NUL (binary zero) is set as the default line-ending character.
+which causes NUL (binary zero) to be set as the default line-ending character.
 .P
 Whatever default line ending convention is selected when PCRE2 is built can be
 overridden by applications that use the library. At build time it is
@@ -281,8 +281,8 @@ The \fBpcre2_match()\fP function starts out using a 20K vector on the system
 stack to record backtracking points. The more nested backtracking points there
 are (that is, the deeper the search tree), the more memory is needed. If the
 initial vector is not large enough, heap memory is used, up to a certain limit,
-which is specified in kilobytes. The limit can be changed at run time, as
-described in the
+which is specified in kibibytes (units of 1024 bytes). The limit can be changed
+at run time, as described in the
 .\" HREF
 \fBpcre2api\fP
 .\"
@@ -291,7 +291,7 @@ change this by a setting such as
 .sp
   --with-heap-limit=500
 .sp
-which limits the amount of heap to 500 kilobytes. This limit applies only to
+which limits the amount of heap to 500 KiB. This limit applies only to
 interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
 may also use the heap for internal workspace when processing complicated
 patterns. This limit does not apply when JIT (which has its own memory
@@ -552,7 +552,7 @@ generated from the string.
 Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP
 to be created. This is normally run under valgrind or used when PCRE2 is
 compiled with address sanitizing enabled. It calls the fuzzing function and
-outputs information about it is doing. The input strings are specified by
+outputs information about what it is doing. The input strings are specified by
 arguments: if an argument starts with "=" the rest of it is a literal input
 string. Otherwise, it is assumed to be a file name, and the contents of the
 file are the test string.
@@ -19,7 +19,7 @@ page.
 2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
 they do not mean what you might think. For example, (?!a){3} does not assert
 that the next three characters are not "a". It just asserts that the next
-character is not "a" three times (in principle: PCRE2 optimizes this to run the
+character is not "a" three times (in principle; PCRE2 optimizes this to run the
 assertion just once). Perl allows some repeat quantifiers on other assertions,
 for example, \eb* (but not \eb{3}), but these do not seem to have any use.
 .P
@@ -62,8 +62,8 @@ Note the following examples:
 The \eQ...\eE sequence is recognized both inside and outside character classes.
 .P
 7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
-constructions. However, there is support PCRE2's "callout" feature, which
-allows an external function to be called during pattern matching. See the
+constructions. However, PCRE2 does have a "callout" feature, which allows an
+external function to be called during pattern matching. See the
 .\" HREF
 \fBpcre2callout\fP
 .\"
@@ -57,9 +57,10 @@ controlled by parameters that can be set by the \fB--buffer-size\fP and
 that is obtained at the start of processing. If an input file contains very
 long lines, a larger buffer may be needed; this is handled by automatically
 extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
-default values for these parameters are specified when \fBpcre2grep\fP is
-built, with the default defaults being 20K and 1M respectively. An error occurs
-if a line is too long and the buffer can no longer be expanded.
+default values for these parameters can be set when \fBpcre2grep\fP is
+built; if nothing is specified, the defaults are set to 20K and 1M
+respectively. An error occurs if a line is too long and the buffer can no
+longer be expanded.
 .P
 The block of memory that is actually used is three times the "buffer size", to
 allow for buffering "before" and "after" lines. If the buffer size is too
@@ -434,13 +435,13 @@ short form for this option.
 When this option is given, non-compressed input is read and processed line by
 line, and the output is flushed after each write. By default, input is read in
 large chunks, unless \fBpcre2grep\fP can determine that it is reading from a
-terminal (which is currently possible only in Unix-like environments). Output
-to terminal is normally automatically flushed by the operating system. This
-option can be useful when the input or output is attached to a pipe and you do
-not want \fBpcre2grep\fP to buffer up large amounts of data. However, its use
-will affect performance, and the \fB-M\fP (multiline) option ceases to work.
-When input is from a compressed .gz or .bz2 file, \fB--line-buffered\fP is
-ignored.
+terminal (which is currently possible only in Unix-like environments or
+Windows). Output to terminal is normally automatically flushed by the operating
+system. This option can be useful when the input or output is attached to a
+pipe and you do not want \fBpcre2grep\fP to buffer up large amounts of data.
+However, its use will affect performance, and the \fB-M\fP (multiline) option
+ceases to work. When input is from a compressed .gz or .bz2 file,
+\fB--line-buffered\fP is ignored.
 .TP
 \fB--line-offsets\fP
 Instead of showing lines or parts of lines that match, show each match as a
@@ -470,11 +471,11 @@ is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
 counter that is incremented each time around its main processing loop. If the
 value set by \fB--match-limit\fP is reached, an error occurs.
 .sp
-The \fB--heap-limit\fP option specifies, as a number of kilobytes, the amount
-of heap memory that may be used for matching. Heap memory is needed only if
-matching the pattern requires a significant number of nested backtracking
-points to be remembered. This parameter can be set to zero to forbid the use of
-heap memory altogether.
+The \fB--heap-limit\fP option specifies, as a number of kibibytes (units of
+1024 bytes), the amount of heap memory that may be used for matching. Heap
+memory is needed only if matching the pattern requires a significant number of
+nested backtracking points to be remembered. This parameter can be set to zero
+to forbid the use of heap memory altogether.
 .sp
 The \fB--depth-limit\fP option limits the depth of nested backtracking points,
 which indirectly limits the amount of memory that is used. The amount of memory
@@ -483,9 +484,9 @@ parentheses in the pattern, so the amount of memory that is used before this
 limit acts varies from pattern to pattern. This limit is of use only if it is
 set smaller than \fB--match-limit\fP.
 .sp
-There are no short forms for these options. The default settings are specified
-when the PCRE2 library is compiled, with the default defaults being very large
-and so effectively unlimited.
+There are no short forms for these options. The default limits can be set
+when the PCRE2 library is compiled; if they are not specified, the defaults
+are very large and so effectively unlimited.
 .TP
 \fB--max-buffer-size=\fInumber\fP
 This limits the expansion of the processing buffer, whose initial size can be
@@ -56,10 +56,10 @@ DESCRIPTION
 that is obtained at the start of processing. If an input file contains
 very long lines, a larger buffer may be needed; this is handled by
 automatically extending the buffer, up to the limit specified by --max-
-buffer-size. The default values for these parameters are specified when
-pcre2grep is built, with the default defaults being 20K and 1M respec-
-tively. An error occurs if a line is too long and the buffer can no
-longer be expanded.
+buffer-size. The default values for these parameters can be set when
+pcre2grep is built; if nothing is specified, the defaults are set to
+20K and 1M respectively. An error occurs if a line is too long and the
+buffer can no longer be expanded.

 The block of memory that is actually used is three times the "buffer
 size", to allow for buffering "before" and "after" lines. If the buffer
@@ -475,14 +475,14 @@ OPTIONS
processed line by line, and the output is flushed after each
write. By default, input is read in large chunks, unless
pcre2grep can determine that it is reading from a terminal
(which is currently possible only in Unix-like environments).
Output to terminal is normally automatically flushed by the
operating system. This option can be useful when the input or
output is attached to a pipe and you do not want pcre2grep to
buffer up large amounts of data. However, its use will affect
performance, and the -M (multiline) option ceases to work.
When input is from a compressed .gz or .bz2 file, --line-
buffered is ignored.
(which is currently possible only in Unix-like environments
or Windows). Output to terminal is normally automatically
flushed by the operating system. This option can be useful
when the input or output is attached to a pipe and you do not
want pcre2grep to buffer up large amounts of data. However,
its use will affect performance, and the -M (multiline)
option ceases to work. When input is from a compressed .gz or
.bz2 file, --line-buffered is ignored.

--line-offsets
Instead of showing lines or parts of lines that match, show
@@ -517,12 +517,12 @@ OPTIONS
processing loop. If the value set by --match-limit is
reached, an error occurs.

The --heap-limit option specifies, as a number of kilobytes,
the amount of heap memory that may be used for matching. Heap
memory is needed only if matching the pattern requires a sig-
nificant number of nested backtracking points to be remem-
bered. This parameter can be set to zero to forbid the use of
heap memory altogether.
The --heap-limit option specifies, as a number of kibibytes
(units of 1024 bytes), the amount of heap memory that may be
used for matching. Heap memory is needed only if matching the
pattern requires a significant number of nested backtracking
points to be remembered. This parameter can be set to zero to
forbid the use of heap memory altogether.

The --depth-limit option limits the depth of nested back-
tracking points, which indirectly limits the amount of memory
@@ -532,10 +532,10 @@ OPTIONS
limit acts varies from pattern to pattern. This limit is of
use only if it is set smaller than --match-limit.

There are no short forms for these options. The default set-
tings are specified when the PCRE2 library is compiled, with
the default defaults being very large and so effectively
unlimited.
There are no short forms for these options. The default lim-
its can be set when the PCRE2 library is compiled; if they
are not specified, the defaults are very large and so effec-
tively unlimited.

--max-buffer-size=number
This limits the expansion of the processing buffer, whose

@@ -38,9 +38,9 @@ There is no limit to the number of parenthesized subpatterns, but there can be
no more than 65535 capturing subpatterns. There is, however, a limit to the
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
order to limit the amount of system stack used at compile time. The default
limit can be specified when PCRE2 is built; the default default is 250. An
application can change this limit by calling pcre2_set_parens_nest_limit() to
set the limit in a compile context.
limit can be specified when PCRE2 is built; if not, the default is set to 250.
An application can change this limit by calling pcre2_set_parens_nest_limit()
to set the limit in a compile context.
.P
The maximum length of name for a named subpattern is 32 code units, and the
maximum number of named subpatterns is 10000.

@@ -163,7 +163,7 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used. The heap limit is
specified in kilobytes.
specified in kibibytes (units of 1024 bytes).
.P
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
still recognized for backwards compatibility.
@@ -528,7 +528,7 @@ by code point, as described above.
.sp
The sequence \eg followed by a signed or unsigned number, optionally enclosed
in braces, is an absolute or relative backreference. A named backreference
can be coded as \eg{name}. Back references are discussed
can be coded as \eg{name}. Backreferences are discussed
.\" HTML <a href="#backreferences">
.\" </a>
later,
@@ -1026,7 +1026,7 @@ joiner" characters. Characters with the "mark" property always have the
6. Do not break within emoji modifier sequences (a base character followed by a
modifier). Extending characters are allowed before the modifier.
.P
7. Do not break within emoji zwj sequences (zero-width jointer followed by
7. Do not break within emoji zwj sequences (zero-width joiner followed by
"glue after ZWJ" or "base glue after ZWJ").
.P
8. Do not break within emoji flag sequences. That is, do not break between
@@ -2205,8 +2205,8 @@ A subpattern that is referenced by name may appear in the pattern before or
after the reference.
.P
There may be more than one backreference to the same subpattern. If a
subpattern has not actually been used in a particular match, any back
references to it always fail by default. For example, the pattern
subpattern has not actually been used in a particular match, any backreferences
to it always fail by default. For example, the pattern
.sp
(a|(bc))\e2
.sp
@@ -2243,7 +2243,7 @@ that the first iteration does not need to match the back reference. This can be
done using alternation, as in the example above, or by a quantifier with a
minimum of zero.
.P
Back references of this type cause the group that they reference to be treated
Backreferences of this type cause the group that they reference to be treated
as an
.\" HTML <a href="#atomicgroup">
.\" </a>

@@ -115,7 +115,7 @@ because it disables the use of back references.
If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
(which has the type const char *) must be set to point to the character beyond
the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
now contain binary zeroes, which are treated as data characters. Without
now contain binary zeros, which are treated as data characters. Without
REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
ignored. This is a GNU extension to the POSIX standard and should be used with
caution in software intended to be portable to other systems.
@@ -224,10 +224,10 @@ function.
.sp
REG_STARTEND
.sp
When this option is set, the subject string is starts at \fIstring\fP +
When this option is set, the subject string starts at \fIstring\fP +
\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
should point to the first character beyond the string. There may be binary
zeroes within the subject string, and indeed, using REG_STARTEND is the only
zeros within the subject string, and indeed, using REG_STARTEND is the only
way to pass a subject string that contains a binary zero.
.P
Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string

@@ -419,7 +419,7 @@ of the newline or \eR options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number.
.sp
(*LIMIT_DEPTH=d) set the backtracking limit to d
(*LIMIT_HEAP=d) set the heap size limit to d kilobytes
(*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d) set the match limit to d
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching

@@ -101,7 +101,7 @@ to occur).
UTF-8 (in its original definition) is not capable of encoding values greater
than 0x7fffffff, but such values can be handled by the 32-bit library. When
testing this library in non-UTF mode with \fButf8_input\fP set, if any
character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
0x80000000 is added to the character's value. This is the only way of passing
such code points in a pattern string. For subject strings, using an escape
sequence is preferable.
@@ -220,7 +220,7 @@ Do not output the version number of \fBpcre2test\fP at the start of execution.
.TP 10
\fB-S\fP \fIsize\fP
On Unix-like systems, set the size of the run-time stack to \fIsize\fP
megabytes.
mebibytes (units of 1024*1024 bytes).
.TP 10
\fB-subject\fP \fImodifier-list\fP
Behave as if each subject line contains the given modifiers.
@@ -639,8 +639,8 @@ The effects of these modifiers are described in the following sections.
.sp
The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is
set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
\eR matches any Unicode newline sequence. The default is specified when PCRE2
is built, with the default default being Unicode.
\eR matches any Unicode newline sequence. The default can be specified when
PCRE2 is built; if it is not, the default is set to Unicode.
.P
The \fBnewline\fP modifier specifies which characters are to be interpreted as
newlines, both in the pattern and in subject lines. The type must be one of CR,
@@ -1381,11 +1381,11 @@ matching provokes an error return ("bad option value") from
.sp
The \fBjitstack\fP modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Setting
zero reverts to the default of 32K. Providing a stack that is larger than the
default is necessary only for very complicated patterns. If \fBjitstack\fP is
set non-zero on a subject line it overrides any value that was set on the
pattern.
optimization is not being used. The value is a number of kibibytes (units of
1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
that is larger than the default is necessary only for very complicated
patterns. If \fBjitstack\fP is set non-zero on a subject line it overrides any
value that was set on the pattern.
.
.
.SS "Setting heap, match, and depth limits"
@@ -1427,10 +1427,10 @@ matching, \fImatch_limit\fP controls the total number of calls, both recursive
and non-recursive, to the internal matching function, thus controlling the
overall amount of computing resource that is used.
.P
For both kinds of matching, the \fIheap_limit\fP number (which is in kilobytes)
limits the amount of heap memory used for matching. A value of zero disables
the use of any heap memory; many simple pattern matches can be done without
using the heap, so this is not an unreasonable setting.
For both kinds of matching, the \fIheap_limit\fP number, which is in kibibytes
(units of 1024 bytes), limits the amount of heap memory used for matching. A
value of zero disables the use of any heap memory; many simple pattern matches
can be done without using the heap, so zero is not an unreasonable setting.
.
.
.SS "Showing MARK names"

@@ -94,7 +94,7 @@ INPUT ENCODING
UTF-8 (in its original definition) is not capable of encoding values
greater than 0x7fffffff, but such values can be handled by the 32-bit
library. When testing this library in non-UTF mode with utf8_input set,
if any character is preceded by the byte 0xff (which is an illegal byte
if any character is preceded by the byte 0xff (which is an invalid byte
in UTF-8) 0x80000000 is added to the character's value. This is the
only way of passing such code points in a pattern string. For subject
strings, using an escape sequence is preferable.
@@ -208,7 +208,7 @@ COMMAND LINE OPTIONS
execution.

-S size On Unix-like systems, set the size of the run-time stack to
size megabytes.
size mebibytes (units of 1024*1024 bytes).

-subject modifier-list
Behave as if each subject line contains the given modifiers.
@@ -614,8 +614,9 @@ PATTERN MODIFIERS

The bsr modifier specifies what \R in a pattern should match. If it is
set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to
"unicode", \R matches any Unicode newline sequence. The default is
specified when PCRE2 is built, with the default default being Unicode.
"unicode", \R matches any Unicode newline sequence. The default can be
specified when PCRE2 is built; if it is not, the default is set to Uni-
code.

The newline modifier specifies which characters are to be interpreted
as newlines, both in the pattern and in subject lines. The type must be
@@ -1272,11 +1273,11 @@ SUBJECT MODIFIERS

The jitstack modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if
JIT optimization is not being used. The value is a number of kilobytes.
Setting zero reverts to the default of 32K. Providing a stack that is
larger than the default is necessary only for very complicated pat-
terns. If jitstack is set non-zero on a subject line it overrides any
value that was set on the pattern.
JIT optimization is not being used. The value is a number of kibibytes
(units of 1024 bytes). Setting zero reverts to the default of 32KiB.
Providing a stack that is larger than the default is necessary only for
very complicated patterns. If jitstack is set non-zero on a subject
line it overrides any value that was set on the pattern.

Setting heap, match, and depth limits

@@ -1315,11 +1316,11 @@ SUBJECT MODIFIERS
tion, thus controlling the overall amount of computing resource that is
used.

For both kinds of matching, the heap_limit number (which is in kilo-
bytes) limits the amount of heap memory used for matching. A value of
zero disables the use of any heap memory; many simple pattern matches
can be done without using the heap, so this is not an unreasonable set-
ting.
For both kinds of matching, the heap_limit number, which is in
kibibytes (units of 1024 bytes), limits the amount of heap memory used
for matching. A value of zero disables the use of any heap memory; many
simple pattern matches can be done without using the heap, so zero is
not an unreasonable setting.

Showing MARK names


@@ -51,7 +51,7 @@ fi
# utf invoke UTF-8 functionality
#
# The data lines must not have any pcre2test modifiers. Unless
# "subject_litersl" is on the pattern, data lines are processed as
# "subject_literal" is on the pattern, data lines are processed as
# Perl double-quoted strings, so if they contain " $ or @ characters, these
# have to be escaped. For this reason, all such characters in the
# Perl-compatible testinput1 and testinput4 files are escaped so that they can

@@ -132,8 +132,9 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <zlib.h> header file. */
/* #undef HAVE_ZLIB_H */

/* This limits the amount of memory that pcre2_match() may use while matching
a pattern. The value is in kilobytes. */
/* This limits the amount of memory that may be used while matching a pattern.
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
to JIT matching. The value is in kilobytes. */
#ifndef HEAP_LIMIT
#define HEAP_LIMIT 20000000
#endif
@@ -155,7 +156,8 @@ sure both macros are undefined; an emulation function will then be used. */

/* The value of MATCH_LIMIT determines the default number of times the
pcre2_match() function can record a backtrack position during a single
matching attempt. There is a runtime interface for setting a different
matching attempt. The value is also used to limit a loop counter in
pcre2_dfa_match(). There is a runtime interface for setting a different
limit. The limit exists in order to catch runaway regular expressions that
take for ever to determine that they do not match. The default is set very
large so that it does not accidentally catch legitimate cases. */
@@ -170,7 +172,9 @@ sure both macros are undefined; an emulation function will then be used. */
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
must be less than the value of MATCH_LIMIT. The default is to use the same
value as MATCH_LIMIT. There is a runtime method for setting a different
limit. */
limit. In the case of pcre2_dfa_match(), this limit controls the depth of
the internal nested function calls that are used for pattern recursions,
lookarounds, and atomic groups. */
#ifndef MATCH_LIMIT_DEPTH
#define MATCH_LIMIT_DEPTH MATCH_LIMIT
#endif
@@ -210,7 +214,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_NAME "PCRE2"

/* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE2 10.31"
#define PACKAGE_STRING "PCRE2 10.32-RC1"

/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre2"
@@ -219,7 +223,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_URL ""

/* Define to the version of this package. */
#define PACKAGE_VERSION "10.31"
#define PACKAGE_VERSION "10.32-RC1"

/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system
@@ -339,7 +343,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif

/* Version number of package */
#define VERSION "10.31"
#define VERSION "10.32-RC1"

/* Define to 1 if on MINIX. */
/* #undef _MINIX */

@@ -134,7 +134,7 @@ sure both macros are undefined; an emulation function will then be used. */

/* This limits the amount of memory that may be used while matching a pattern.
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
to JIT matching. The value is in kilobytes. */
to JIT matching. The value is in kibibytes (units of 1024 bytes). */
#undef HEAP_LIMIT

/* The value of LINK_SIZE determines the number of bytes used to store links

@@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
/* The current PCRE version information. */

#define PCRE2_MAJOR 10
#define PCRE2_MINOR 31
#define PCRE2_PRERELEASE
#define PCRE2_DATE 2018-02-12
#define PCRE2_MINOR 32
#define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2018-02-19

/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate

@@ -387,8 +387,8 @@ return (mb->callout)(cb, mb->callout_data);
*************************************************/

/* This function is called when internal_dfa_match() is about to be called
recursively and there is insufficient workingspace left in the current work
space block. If there's an existing next block, use it; otherwise get a new
recursively and there is insufficient working space left in the current
workspace block. If there's an existing next block, use it; otherwise get a new
block unless the heap limit is reached.

Arguments:

@@ -43,7 +43,7 @@ POSSIBILITY OF SUCH DAMAGE.
#include "config.h"
#endif

/* These defines enables debugging code */
/* These defines enable debugging code */

//#define DEBUG_FRAMES_DISPLAY
//#define DEBUG_SHOW_OPS
@@ -2464,7 +2464,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);

/* ===================================================================== */
/* Match a single character type repeatedly. Note that the property type
does not need to be in a stack frame as it not used within an RMATCH()
does not need to be in a stack frame as it is not used within an RMATCH()
loop. */

#define Lstart_eptr F->temp_sptr[0]

@@ -492,7 +492,7 @@ so many of them that they are split into two fields. */

/* These are the matching controls that may be set either on a pattern or on a
data line. They are copied from the pattern controls as initial settings for
data line controls Note that CTL_MEMORY is not included here, because it does
data line controls. Note that CTL_MEMORY is not included here, because it does
different things in the two cases. */

#define CTL_ALLPD (CTL_AFTERTEXT|\
@@ -5411,7 +5411,7 @@ switch(errorcode)

/* The pattern is now in pbuffer[8|16|32], with the length in code units in
patlen. If it is to be converted, copy the result back afterwards so that it
it ends up back in the usual place. */
ends up back in the usual place. */

if (pat_patctl.convert_type != CONVERT_UNSET)
{
@@ -5735,7 +5735,7 @@ return PR_OK;
*************************************************/

/* This is used for DFA, normal, and JIT fast matching. For DFA matching it
should only called with the third argument set to PCRE2_ERROR_DEPTHLIMIT.
should only be called with the third argument set to PCRE2_ERROR_DEPTHLIMIT.

Arguments:
pp the subject string
@@ -7766,7 +7766,7 @@ printf(" -LM list pattern and subject modifiers, then exit\n");
printf(" -q quiet: do not output PCRE2 version number at start\n");
printf(" -pattern <s> set default pattern modifier fields\n");
printf(" -subject <s> set default subject modifier fields\n");
printf(" -S <n> set stack size to <n> megabytes\n");
printf(" -S <n> set stack size to <n> mebibytes\n");
printf(" -t [<n>] time compilation and execution, repeating <n> times\n");
printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n");
printf(" -T same as -t, but show total times at the end\n");