Re-factor pcre2_dfa_match() to use the heap instead of the stack for workspace
vectors when doing recursive function calls.
parent fb413521fc
commit 75747ebb11
10
ChangeLog
|
@ -50,7 +50,15 @@ offset is set zero for early errors.
|
||||||
|
|
||||||
(c) Support for non-C99 snprintf() that returns -1 in the overflow case.
|
(c) Support for non-C99 snprintf() that returns -1 in the overflow case.
|
||||||
|
|
||||||
11. Minor tidy of pcre2_dfa_matgch() code.
|
11. Minor tidy of pcre2_dfa_match() code.
|
||||||
|
|
||||||
|
12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
|
||||||
|
use the stack for local workspace and local ovectors. Instead, an initial block
|
||||||
|
of stack is reserved, but if this is insufficient, heap memory is used. The
|
||||||
|
heap limit parameter now applies to pcre2_dfa_match().
|
||||||
|
|
||||||
|
13. If a "find limits" test of DFA matching in pcre2test resulted in too many
|
||||||
|
matches for the ovector, no matches were displayed.
|
||||||
|
|
||||||
|
|
||||||
Version 10.31 12-February-2018
|
Version 10.31 12-February-2018
|
||||||
|
|
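ChangeLog item 12 above (reserve an initial block, fall back to the heap, and apply the heap limit) can be pictured with a small stand-alone model. This is only a sketch of the general technique, not the code in this commit: `ws_pool`, `ws_acquire`, `ws_release`, `INITIAL_SLOTS` and the `WS_*` codes are hypothetical names, and the real block sizes and bookkeeping in pcre2_dfa_match() may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not the PCRE2 source. */
#define INITIAL_SLOTS 64   /* ints in the block reserved up front */

enum { WS_OK = 0, WS_HEAPLIMIT = -1, WS_NOMEMORY = -2 };

typedef struct {
  int      initial[INITIAL_SLOTS]; /* reserved block (on the stack if the pool is a local) */
  size_t   initial_used;           /* ints handed out from the reserved block */
  uint64_t heap_bytes;             /* heap currently held by this pool */
  uint32_t heap_limit_kib;         /* limit in kilobytes; 0 disables heap use */
} ws_pool;

/* Hand out `count` ints of workspace for one nested call. */
static int ws_acquire(ws_pool *pool, size_t count, int **out, int *from_heap)
{
  /* First choice: carve the workspace out of the reserved block. */
  if (count <= INITIAL_SLOTS - pool->initial_used) {
    *out = pool->initial + pool->initial_used;
    pool->initial_used += count;
    *from_heap = 0;
    return WS_OK;
  }

  /* The reserved block is too small: use the heap, but only within the limit. */
  uint64_t limit_bytes = (uint64_t)pool->heap_limit_kib * 1024u;
  if (count > SIZE_MAX / sizeof(int)) return WS_HEAPLIMIT;
  uint64_t bytes = (uint64_t)count * sizeof(int);
  if (bytes > limit_bytes - pool->heap_bytes) return WS_HEAPLIMIT;

  int *p = malloc((size_t)bytes);
  if (p == NULL) return WS_NOMEMORY;
  pool->heap_bytes += bytes;
  *out = p;
  *from_heap = 1;
  return WS_OK;
}

/* Give back the most recently acquired workspace when the nested call returns. */
static void ws_release(ws_pool *pool, int *ws, size_t count, int from_heap)
{
  if (from_heap) {
    free(ws);
    pool->heap_bytes -= (uint64_t)count * sizeof(int);
  } else {
    pool->initial_used -= count;
  }
}
```

In this model a caller declares one `ws_pool` as a local variable, so shallow nesting never touches the heap, and a heap limit of zero means that only the reserved block is ever used.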
12
README
|
@ -241,9 +241,11 @@ library. They are also documented in the pcre2build man page.
|
||||||
discussion in the pcre2api man page (search for pcre2_set_match_limit).
|
discussion in the pcre2api man page (search for pcre2_set_match_limit).
|
||||||
|
|
||||||
. There is a separate counter that limits the depth of nested backtracking
|
. There is a separate counter that limits the depth of nested backtracking
|
||||||
during a matching process, which indirectly limits the amount of heap memory
|
(pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
|
||||||
that is used. This also has a default of ten million, which is essentially
|
matching process, which indirectly limits the amount of heap memory that is
|
||||||
"unlimited". You can change the default by setting, for example,
|
used, and in the case of pcre2_dfa_match() the amount of stack as well. This
|
||||||
|
counter also has a default of ten million, which is essentially "unlimited".
|
||||||
|
You can change the default by setting, for example,
|
||||||
|
|
||||||
--with-match-limit-depth=5000
|
--with-match-limit-depth=5000
|
||||||
|
|
||||||
|
@ -251,7 +253,7 @@ library. They are also documented in the pcre2build man page.
|
||||||
pcre2_set_depth_limit).
|
pcre2_set_depth_limit).
|
||||||
|
|
||||||
. You can also set an explicit limit on the amount of heap memory used by
|
. You can also set an explicit limit on the amount of heap memory used by
|
||||||
the pcre2_match() interpreter:
|
the pcre2_match() and pcre2_dfa_match() interpreters:
|
||||||
|
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
|
|
||||||
|
@ -885,4 +887,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 25 February 2018
|
Last updated: 27 April 2018
|
||||||
|
|
21
configure.ac
|
@ -142,7 +142,7 @@ AC_ARG_ENABLE(jit,
|
||||||
AS_HELP_STRING([--enable-jit],
|
AS_HELP_STRING([--enable-jit],
|
||||||
[enable Just-In-Time compiling support]),
|
[enable Just-In-Time compiling support]),
|
||||||
, enable_jit=no)
|
, enable_jit=no)
|
||||||
|
|
||||||
# This code enables JIT if the hardware supports it.
|
# This code enables JIT if the hardware supports it.
|
||||||
if test "$enable_jit" = "auto"; then
|
if test "$enable_jit" = "auto"; then
|
||||||
AC_LANG(C)
|
AC_LANG(C)
|
||||||
|
@ -718,10 +718,11 @@ AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
|
||||||
AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [
|
AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [
|
||||||
The value of MATCH_LIMIT determines the default number of times the
|
The value of MATCH_LIMIT determines the default number of times the
|
||||||
pcre2_match() function can record a backtrack position during a single
|
pcre2_match() function can record a backtrack position during a single
|
||||||
matching attempt. There is a runtime interface for setting a different limit.
|
matching attempt. The value is also used to limit a loop counter in
|
||||||
The limit exists in order to catch runaway regular expressions that take for
|
pcre2_dfa_match(). There is a runtime interface for setting a different
|
||||||
ever to determine that they do not match. The default is set very large so
|
limit. The limit exists in order to catch runaway regular expressions that
|
||||||
that it does not accidentally catch legitimate cases.])
|
take for ever to determine that they do not match. The default is set very
|
||||||
|
large so that it does not accidentally catch legitimate cases.])
|
||||||
|
|
||||||
# --with-match-limit-recursion is an obsolete synonym for --with-match-limit-depth
|
# --with-match-limit-recursion is an obsolete synonym for --with-match-limit-depth
|
||||||
|
|
||||||
|
@ -745,11 +746,15 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
|
||||||
the maximum amount of heap memory that is used. The value of
|
the maximum amount of heap memory that is used. The value of
|
||||||
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it must
|
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it must
|
||||||
be less than the value of MATCH_LIMIT. The default is to use the same value
|
be less than the value of MATCH_LIMIT. The default is to use the same value
|
||||||
as MATCH_LIMIT. There is a runtime method for setting a different limit.])
|
as MATCH_LIMIT. There is a runtime method for setting a different limit. In
|
||||||
|
the case of pcre2_dfa_match(), this limit controls the depth of the internal
|
||||||
|
nested function calls that are used for pattern recursions, lookarounds, and
|
||||||
|
atomic groups.])
|
||||||
|
|
||||||
AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
|
AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
|
||||||
This limits the amount of memory that pcre2_match() may use while matching
|
This limits the amount of memory that may be used while matching
|
||||||
a pattern. The value is in kilobytes.])
|
a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
|
||||||
|
not apply to JIT matching. The value is in kilobytes.])
|
||||||
|
|
||||||
AC_DEFINE([MAX_NAME_SIZE], [32], [
|
AC_DEFINE([MAX_NAME_SIZE], [32], [
|
||||||
This limit is parameterized just in case anybody ever wants to
|
This limit is parameterized just in case anybody ever wants to
|
||||||
|
|
|
@ -10,6 +10,7 @@ This document contains the following sections:
|
||||||
Calling conventions in Windows environments
|
Calling conventions in Windows environments
|
||||||
Comments about Win32 builds
|
Comments about Win32 builds
|
||||||
Building PCRE2 on Windows with CMake
|
Building PCRE2 on Windows with CMake
|
||||||
|
Building PCRE2 on Windows with Visual Studio
|
||||||
Testing with RunTest.bat
|
Testing with RunTest.bat
|
||||||
Building PCRE2 on native z/OS and z/VM
|
Building PCRE2 on native z/OS and z/VM
|
||||||
|
|
||||||
|
@ -328,6 +329,18 @@ cache can be deleted by selecting "File > Delete Cache".
|
||||||
most recent build configuration is targeted by the tests. A summary of
|
most recent build configuration is targeted by the tests. A summary of
|
||||||
test results is presented. Complete test output is subsequently
|
test results is presented. Complete test output is subsequently
|
||||||
available for review in Testing\Temporary under your build dir.
|
available for review in Testing\Temporary under your build dir.
|
||||||
|
|
||||||
|
|
||||||
|
BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO
|
||||||
|
|
||||||
|
The code currently cannot be compiled without a stdint.h header, which is
|
||||||
|
available only in relatively recent versions of Visual Studio. However, this
|
||||||
|
portable and permissively-licensed implementation of the header worked without
|
||||||
|
issue:
|
||||||
|
|
||||||
|
http://www.azillionmonkeys.com/qed/pstdint.h
|
||||||
|
|
||||||
|
Just rename it and drop it into the top level of the build tree.
|
||||||
|
|
||||||
|
|
||||||
TESTING WITH RUNTEST.BAT
|
TESTING WITH RUNTEST.BAT
|
||||||
|
@ -382,6 +395,6 @@ Everything in that location, source and executable, is in EBCDIC and native
|
||||||
z/OS file formats. The port provides an API for LE languages such as COBOL and
|
z/OS file formats. The port provides an API for LE languages such as COBOL and
|
||||||
for the z/OS and z/VM versions of the Rexx languages.
|
for the z/OS and z/VM versions of the Rexx languages.
|
||||||
|
|
||||||
===============================
|
===========================
|
||||||
Last Updated: 13 September 2017
|
Last Updated: 19 April 2018
|
||||||
===============================
|
===========================
|
||||||
|
|
|
@ -241,9 +241,11 @@ library. They are also documented in the pcre2build man page.
|
||||||
discussion in the pcre2api man page (search for pcre2_set_match_limit).
|
discussion in the pcre2api man page (search for pcre2_set_match_limit).
|
||||||
|
|
||||||
. There is a separate counter that limits the depth of nested backtracking
|
. There is a separate counter that limits the depth of nested backtracking
|
||||||
during a matching process, which indirectly limits the amount of heap memory
|
(pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
|
||||||
that is used. This also has a default of ten million, which is essentially
|
matching process, which indirectly limits the amount of heap memory that is
|
||||||
"unlimited". You can change the default by setting, for example,
|
used, and in the case of pcre2_dfa_match() the amount of stack as well. This
|
||||||
|
counter also has a default of ten million, which is essentially "unlimited".
|
||||||
|
You can change the default by setting, for example,
|
||||||
|
|
||||||
--with-match-limit-depth=5000
|
--with-match-limit-depth=5000
|
||||||
|
|
||||||
|
@ -251,7 +253,7 @@ library. They are also documented in the pcre2build man page.
|
||||||
pcre2_set_depth_limit).
|
pcre2_set_depth_limit).
|
||||||
|
|
||||||
. You can also set an explicit limit on the amount of heap memory used by
|
. You can also set an explicit limit on the amount of heap memory used by
|
||||||
the pcre2_match() interpreter:
|
the pcre2_match() and pcre2_dfa_match() interpreters:
|
||||||
|
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
|
|
||||||
|
@ -885,4 +887,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 25 February 2018
|
Last updated: 27 April 2018
|
||||||
|
|
|
@ -46,9 +46,9 @@ just once (except when processing lookaround assertions). This function is
|
||||||
<i>wscount</i> Number of elements in the vector
|
<i>wscount</i> Number of elements in the vector
|
||||||
</pre>
|
</pre>
|
||||||
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
|
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
|
||||||
up a callout function or specify the match and/or the recursion depth limits.
|
up a callout function or specify the heap limit or the match or the recursion
|
||||||
The <i>length</i> and <i>startoffset</i> values are code units, not characters.
|
depth limits. The <i>length</i> and <i>startoffset</i> values are code units, not
|
||||||
The options are:
|
characters. The options are:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
|
|
|
@ -951,14 +951,15 @@ offset limit. In other words, whichever limit comes first is used.
|
||||||
<br>
|
<br>
|
||||||
The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
|
The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
|
||||||
amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
|
amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
|
||||||
information when running an interpretive match. This limit does not apply to
|
information when running an interpretive match. This limit also applies to
|
||||||
matching with the JIT optimization, which has its own memory control
|
<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a
|
||||||
arrangements (see the
|
lot of nested pattern recursion or lookarounds or atomic groups. This limit
|
||||||
|
does not apply to matching with the JIT optimization, which has its own memory
|
||||||
|
control arrangements (see the
|
||||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||||
documentation for more details), nor does it apply to <b>pcre2_dfa_match()</b>.
|
documentation for more details). If the limit is reached, the negative error
|
||||||
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
|
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
|
||||||
returned. The default limit is set when PCRE2 is built; the default default is
|
built; the default default is very large and is essentially "unlimited".
|
||||||
very large and is essentially "unlimited".
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A value for the heap limit may also be supplied by an item at the start of a
|
A value for the heap limit may also be supplied by an item at the start of a
|
||||||
|
@ -978,6 +979,12 @@ Heap memory is used only if the initial vector is too small. If the heap limit
|
||||||
is set to a value less than 21 (in particular, zero) no heap memory will be
|
is set to a value less than 21 (in particular, zero) no heap memory will be
|
||||||
used. In this case, only patterns that do not have a lot of nested backtracking
|
used. In this case, only patterns that do not have a lot of nested backtracking
|
||||||
can be successfully processed.
|
can be successfully processed.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Similarly, for <b>pcre2_dfa_match()</b>, a vector on the system stack is used
|
||||||
|
when processing pattern recursions, lookarounds, or atomic groups, and only if
|
||||||
|
this is not big enough is heap memory used. In this case, too, setting a value
|
||||||
|
of zero disables the use of the heap.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
||||||
|
@ -1035,11 +1042,22 @@ backtracking.
|
||||||
<P>
|
<P>
|
||||||
The depth limit is not relevant, and is ignored, when matching is done using
|
The depth limit is not relevant, and is ignored, when matching is done using
|
||||||
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
|
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
|
||||||
uses it to limit the depth of internal recursive function calls that implement
|
uses it to limit the depth of nested internal recursive function calls that
|
||||||
atomic groups, lookaround assertions, and pattern recursions. This is,
|
implement atomic groups, lookaround assertions, and pattern recursions. This
|
||||||
therefore, an indirect limit on the amount of system stack that is used. A
|
limits, indirectly, the amount of system stack that is used. It was more useful
|
||||||
recursive pattern such as /(.)(?1)/, when matched to a very long string using
|
in versions before 10.32, when stack memory was used for local workspace
|
||||||
<b>pcre2_dfa_match()</b>, can use a great deal of stack.
|
vectors for recursive function calls. From version 10.32, only local variables
|
||||||
|
are allocated on the stack and as each call uses only a few hundred bytes, even
|
||||||
|
a small stack can support quite a lot of recursion.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If the depth of internal recursive function calls is great enough, local
|
||||||
|
workspace vectors are allocated on the heap from version 10.32 onwards, so the
|
||||||
|
depth limit also indirectly limits the amount of heap memory that is used. A
|
||||||
|
recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
|
||||||
|
using <b>pcre2_dfa_match()</b>, can use a great deal of memory. However, it is
|
||||||
|
probably better to limit heap usage directly by calling
|
||||||
|
<b>pcre2_set_heap_limit()</b>.
|
||||||
</P>
|
</P>
|
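The way a depth limit bounds nested function calls, as described in the paragraphs above, can be illustrated with a toy recursive matcher for nested parentheses. This is a sketch under stated assumptions, not PCRE2 code: `match_group`, `match_nested` and the `SK_*` codes are hypothetical, and the choice to count the top-level call as depth 1 and to fail when the depth exceeds the limit is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch only. */
enum { SK_MATCH = 1, SK_NOMATCH = 0, SK_DEPTHLIMIT = -1 };

/* Match one group "(" group* ")" starting at s[*pos]. Each nested group costs
   one level of C recursion, in the same way that each nested pattern
   recursion, lookaround, or atomic group costs one internal function call. */
static int match_group(const char *s, size_t *pos, uint32_t depth,
                       uint32_t depth_limit)
{
  if (depth > depth_limit) return SK_DEPTHLIMIT;  /* refuse to nest deeper */
  if (s[*pos] != '(') return SK_NOMATCH;
  (*pos)++;
  while (s[*pos] == '(') {
    int rc = match_group(s, pos, depth + 1, depth_limit);
    if (rc != SK_MATCH) return rc;                /* propagate failure or limit */
  }
  if (s[*pos] != ')') return SK_NOMATCH;
  (*pos)++;
  return SK_MATCH;
}

/* Entry point: the top-level call counts as depth 1. */
static int match_nested(const char *s, uint32_t depth_limit)
{
  size_t pos = 0;
  return match_group(s, &pos, 1, depth_limit);
}
```

With these assumptions, `match_nested("((()))", 3)` succeeds, while the same subject with a limit of 2 stops with `SK_DEPTHLIMIT` before the third level is entered.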
||||||
<P>
|
<P>
|
||||||
The default value for the depth limit can be set when PCRE2 is built; the
|
The default value for the depth limit can be set when PCRE2 is built; the
|
||||||
|
@ -1096,15 +1114,16 @@ and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively.
|
||||||
PCRE2_CONFIG_DEPTHLIMIT
|
PCRE2_CONFIG_DEPTHLIMIT
|
||||||
</pre>
|
</pre>
|
||||||
The output is a uint32_t integer that gives the default limit for the depth of
|
The output is a uint32_t integer that gives the default limit for the depth of
|
||||||
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
|
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions,
|
||||||
and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
|
lookarounds, and atomic groups in <b>pcre2_dfa_match()</b>. Further details are
|
||||||
<b>pcre2_set_depth_limit()</b> above.
|
given with <b>pcre2_set_depth_limit()</b> above.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_HEAPLIMIT
|
PCRE2_CONFIG_HEAPLIMIT
|
||||||
</pre>
|
</pre>
|
||||||
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
||||||
for the amount of heap memory used by <b>pcre2_match()</b>. Further details are
|
for the amount of heap memory used by <b>pcre2_match()</b> or
|
||||||
given with <b>pcre2_set_heap_limit()</b> above.
|
<b>pcre2_dfa_match()</b>. Further details are given with
|
||||||
|
<b>pcre2_set_heap_limit()</b> above.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_JIT
|
PCRE2_CONFIG_JIT
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3510,17 +3529,7 @@ capture.
|
||||||
Calls to the convenience functions that extract substrings by name
|
Calls to the convenience functions that extract substrings by name
|
||||||
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
|
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
|
||||||
DFA match. The convenience functions that extract substrings by number never
|
DFA match. The convenience functions that extract substrings by number never
|
||||||
return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
|
return PCRE2_ERROR_NOSUBSTRING.
|
||||||
slightly different:
|
|
||||||
<pre>
|
|
||||||
PCRE2_ERROR_UNAVAILABLE
|
|
||||||
</pre>
|
|
||||||
The ovector is not big enough to include a slot for the given substring number.
|
|
||||||
<pre>
|
|
||||||
PCRE2_ERROR_UNSET
|
|
||||||
</pre>
|
|
||||||
There is a slot in the ovector for this substring, but there were insufficient
|
|
||||||
matches to fill it.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The matched strings are stored in the ovector in reverse order of length; that
|
The matched strings are stored in the ovector in reverse order of length; that
|
||||||
|
@ -3594,9 +3603,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 31 December 2017
|
Last updated: 27 April 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -295,9 +295,10 @@ change this by a setting such as
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
</pre>
|
</pre>
|
||||||
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
||||||
interpretive matching in pcre2_match(). It does not apply when JIT (which has
|
interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
|
||||||
its own memory arrangements) is used, nor does it apply to
|
may also use the heap for internal workspace when processing complicated
|
||||||
<b>pcre2_dfa_match()</b>.
|
patterns. This limit does not apply when JIT (which has its own memory
|
||||||
|
arrangements) is used.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
You can also explicitly limit the depth of nested backtracking in the
|
You can also explicitly limit the depth of nested backtracking in the
|
||||||
|
@ -573,7 +574,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 25 February 2018
|
Last updated: 26 April 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -310,10 +310,12 @@ PCRE2_UNSET.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
For DFA matching, the <i>offset_vector</i> field points to the ovector that was
|
For DFA matching, the <i>offset_vector</i> field points to the ovector that was
|
||||||
passed to the matching function in the match data block, but it holds no useful
|
passed to the matching function in the match data block for callouts at the top
|
||||||
information at callout time because <b>pcre2_dfa_match()</b> does not support
|
level, but to an internal ovector during the processing of pattern recursions,
|
||||||
substring capturing. The value of <i>capture_top</i> is always 1 and the value
|
lookarounds, and atomic groups. However, these ovectors hold no useful
|
||||||
of <i>capture_last</i> is always 0 for DFA matching.
|
information because <b>pcre2_dfa_match()</b> does not support substring
|
||||||
|
capturing. The value of <i>capture_top</i> is always 1 and the value of
|
||||||
|
<i>capture_last</i> is always 0 for DFA matching.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
|
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
|
||||||
|
@ -461,9 +463,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 22 December 2017
|
Last updated: 26 April 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -173,12 +173,12 @@ the application to apply the JIT optimization by calling
|
||||||
Setting match resource limits
|
Setting match resource limits
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The pcre2_match() function contains a counter that is incremented every time it
|
The <b>pcre2_match()</b> function contains a counter that is incremented every
|
||||||
goes round its main loop. The caller of <b>pcre2_match()</b> can set a limit on
|
time it goes round its main loop. The caller of <b>pcre2_match()</b> can set a
|
||||||
this counter, which therefore limits the amount of computing resource used for
|
limit on this counter, which therefore limits the amount of computing resource
|
||||||
a match. The maximum depth of nested backtracking can also be limited; this
|
used for a match. The maximum depth of nested backtracking can also be limited;
|
||||||
indirectly restricts the amount of heap memory that is used, but there is also
|
this indirectly restricts the amount of heap memory that is used, but there is
|
||||||
an explicit memory limit that can be set.
|
also an explicit memory limit that can be set.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
These facilities are provided to catch runaway matches that are provoked by
|
These facilities are provided to catch runaway matches that are provoked by
|
||||||
|
@ -195,20 +195,22 @@ where d is any number of decimal digits. However, the value of the setting must
|
||||||
be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
|
be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
|
||||||
for it to have any effect. In other words, the pattern writer can lower the
|
for it to have any effect. In other words, the pattern writer can lower the
|
||||||
limits set by the programmer, but not raise them. If there is more than one
|
limits set by the programmer, but not raise them. If there is more than one
|
||||||
setting of one of these limits, the lower value is used.
|
setting of one of these limits, the lower value is used. The heap limit is
|
||||||
|
specified in kilobytes.
|
||||||
</P>
|
</P>
|
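The rule in the paragraph above (a limit given in the pattern can lower the caller's limit but never raise it, and the lowest of several settings wins) amounts to taking a minimum. A minimal sketch follows, assuming a hypothetical helper `effective_limit()` that receives the caller's limit and the values from any in-pattern settings; it is not a PCRE2 function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API.
   caller_limit:   the value set (or defaulted) by the caller.
   pattern_limits: values from limit settings at the start of the pattern.
   n:              number of such settings (may be 0). */
static uint32_t effective_limit(uint32_t caller_limit,
                                const uint32_t *pattern_limits, size_t n)
{
  uint32_t limit = caller_limit;
  for (size_t i = 0; i < n; i++) {
    /* A pattern setting only takes effect if it lowers the current limit. */
    if (pattern_limits[i] < limit) limit = pattern_limits[i];
  }
  return limit;
}
```

For example, with a caller limit of 1000, in-pattern settings of 5000 and 200 give an effective limit of 200, and a single setting of 5000 leaves the limit at 1000.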
||||||
<P>
|
<P>
|
||||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
||||||
still recognized for backwards compatibility.
|
still recognized for backwards compatibility.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
|
The heap limit applies only when the <b>pcre2_match()</b> or
|
||||||
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
<b>pcre2_dfa_match()</b> interpreters are used for matching. It does not apply
|
||||||
(but in a different way) when JIT is being used, or when
|
to JIT. The match limit is used (but in a different way) when JIT is being
|
||||||
<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
|
used, or when <b>pcre2_dfa_match()</b> is called, to limit computing resource
|
||||||
matching functions. The depth limit is ignored by JIT but is relevant for DFA
|
usage by those matching functions. The depth limit is ignored by JIT but is
|
||||||
matching, which uses function recursion for recursions within the pattern. In
|
relevant for DFA matching, which uses function recursion for recursions within
|
||||||
this case, the depth limit controls the amount of system stack that is used.
|
the pattern and for lookaround assertions and atomic groups. In this case, the
|
||||||
|
depth limit controls the depth of such recursion.
|
||||||
<a name="newlines"></a></P>
|
<a name="newlines"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Newline conventions
|
Newline conventions
|
||||||
|
@ -2818,11 +2820,6 @@ matched at the top level, its final captured value is unset, even if it was
|
||||||
(temporarily) set at a deeper level during the matching process.
|
(temporarily) set at a deeper level during the matching process.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
|
|
||||||
obtain extra memory from the heap to store data during a recursion. If no
|
|
||||||
memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
Do not confuse the (?R) item with the condition (R), which tests for recursion.
|
Do not confuse the (?R) item with the condition (R), which tests for recursion.
|
||||||
Consider this pattern, which matches text in angle brackets, allowing for
|
Consider this pattern, which matches text in angle brackets, allowing for
|
||||||
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
|
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
|
||||||
|
@ -3479,9 +3476,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 12 September 2017
|
Last updated: 25 April 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -93,9 +93,17 @@ may also reduce the memory requirements.
|
||||||
<P>
|
<P>
|
||||||
In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
|
In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
|
||||||
function calls, but only for processing atomic groups, lookaround assertions,
|
function calls, but only for processing atomic groups, lookaround assertions,
|
||||||
and recursion within the pattern. Too much nested recursion may cause stack
|
and recursion within the pattern. The original version of the code used to
|
||||||
issues. The "match depth" parameter can be used to limit the depth of function
|
allocate quite large internal workspace vectors on the stack, which caused some
|
||||||
recursion in <b>pcre2_dfa_match()</b>.
|
problems for some patterns in environments with small stacks. From release
|
||||||
|
10.32 the code for <b>pcre2_dfa_match()</b> has been re-factored to use heap
|
||||||
|
memory when necessary for internal workspace when recursing, though recursive
|
||||||
|
function calls are still used.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The "match depth" parameter can be used to limit the depth of function
|
||||||
|
recursion, and the "match heap" parameter to limit heap memory in
|
||||||
|
<b>pcre2_dfa_match()</b>.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
|
<br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -244,9 +252,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 08 April 2017
|
Last updated: 25 April 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -1199,7 +1199,7 @@ pattern.
|
||||||
get=<number or name> extract captured substring
|
get=<number or name> extract captured substring
|
||||||
getall extract all captured substrings
|
getall extract all captured substrings
|
||||||
/g global global matching
|
/g global global matching
|
||||||
heap_limit=<n> set a limit on heap memory
|
heap_limit=<n> set a limit on heap memory (Kbytes)
|
||||||
jitstack=<n> set size of JIT stack
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
match_limit=<n> set a match limit
|
match_limit=<n> set a match limit
|
||||||
|
@ -1438,20 +1438,17 @@ Finding minimum limits
|
||||||
<P>
|
<P>
|
||||||
If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
|
If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
|
||||||
calls the relevant matching function several times, setting different values in
|
calls the relevant matching function several times, setting different values in
|
||||||
the match context via <b>pcre2_set_heap_limit(), \fBpcre2_set_match_limit()</b>,
|
the match context via <b>pcre2_set_heap_limit()</b>,
|
||||||
or <b>pcre2_set_depth_limit()</b> until it finds the minimum values for each
|
<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
|
||||||
parameter that allows the match to complete without error.
|
the minimum values for each parameter that allows the match to complete without
|
||||||
|
error. If JIT is being used, only the match limit is relevant.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If JIT is being used, only the match limit is relevant. If DFA matching is
|
When using this modifier, the pattern should not contain any limit settings
|
||||||
being used, only the depth limit is relevant.
|
such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
|
||||||
</P>
|
lower than the minimum matching value, the minimum value cannot be found
|
||||||
<P>
|
because <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of
|
||||||
The <i>match_limit</i> number is a measure of the amount of backtracking
|
an in-pattern limit; they cannot increase it.
|
||||||
that takes place, and learning the minimum value can be instructive. For most
|
|
||||||
simple matches, the number is quite small, but for patterns with very large
|
|
||||||
numbers of matching possibilities, it can become large very quickly with
|
|
||||||
increasing length of subject string.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
|
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
|
||||||
|
@ -1460,6 +1457,22 @@ searched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
|
||||||
recursive calls of the internal function that is used for handling pattern
|
recursive calls of the internal function that is used for handling pattern
|
||||||
recursion, lookaround assertions, and atomic groups.
|
recursion, lookaround assertions, and atomic groups.
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
For non-DFA matching, the <i>match_limit</i> number is a measure of the amount
|
||||||
|
of backtracking that takes place, and learning the minimum value can be
|
||||||
|
instructive. For most simple matches, the number is quite small, but for
|
||||||
|
patterns with very large numbers of matching possibilities, it can become large
|
||||||
|
very quickly with increasing length of subject string. In the case of DFA
|
||||||
|
matching, <i>match_limit</i> controls the total number of calls, both recursive
|
||||||
|
and non-recursive, to the internal matching function, thus controlling the
|
||||||
|
overall amount of computing resource that is used.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes)
|
||||||
|
limits the amount of heap memory used for matching. A value of zero disables
|
||||||
|
the use of any heap memory; many simple pattern matches can be done without
|
||||||
|
using the heap, so this is not an unreasonable setting.
|
||||||
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -1476,13 +1489,14 @@ Showing memory usage
|
||||||
<P>
|
<P>
|
||||||
The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
|
The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
|
||||||
memory allocation and freeing calls that occur during a call to
|
memory allocation and freeing calls that occur during a call to
|
||||||
<b>pcre2_match()</b>. These occur only when a match requires a bigger vector
|
<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. These occur only when a match
|
||||||
than the default for remembering backtracking points. In many cases there will
|
requires a bigger vector than the default for remembering backtracking points
|
||||||
be no heap memory used and therefore no additional output. No heap memory is
|
(<b>pcre2_match()</b>) or for internal workspace (<b>pcre2_dfa_match()</b>). In
|
||||||
allocated during matching with <b>pcre2_dfa_match</b> or with JIT, so in those
|
many cases there will be no heap memory used and therefore no additional
|
||||||
cases the <b>memory</b> modifier never has any effect. For this modifier to
|
output. No heap memory is allocated during matching with JIT, so in that case
|
||||||
work, the <b>null_context</b> modifier must not be set on both the pattern and
|
the <b>memory</b> modifier never has any effect. For this modifier to work, the
|
||||||
the subject, though it can be set on one or the other.
|
<b>null_context</b> modifier must not be set on both the pattern and the
|
||||||
|
subject, though it can be set on one or the other.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting a starting offset
|
Setting a starting offset
|
||||||
|
@ -1982,9 +1996,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 21 December 2017
|
Last updated: 25 April 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
909
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
|
.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -34,9 +34,9 @@ just once (except when processing lookaround assertions). This function is
|
||||||
\fIwscount\fP Number of elements in the vector
|
\fIwscount\fP Number of elements in the vector
|
||||||
.sp
|
.sp
|
||||||
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
|
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
|
||||||
up a callout function or specify the match and/or the recursion depth limits.
|
up a callout function or specify the heap limit or the match or the recursion
|
||||||
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
|
depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
|
||||||
The options are:
|
characters. The options are:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "31 December 2017" "PCRE2 10.31"
|
.TH PCRE2API 3 "27 April 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -887,16 +887,17 @@ offset limit. In other words, whichever limit comes first is used.
|
||||||
.sp
|
.sp
|
||||||
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
|
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
|
||||||
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
|
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
|
||||||
information when running an interpretive match. This limit does not apply to
|
information when running an interpretive match. This limit also applies to
|
||||||
matching with the JIT optimization, which has its own memory control
|
\fBpcre2_dfa_match()\fP, which may use the heap when processing patterns with a
|
||||||
arrangements (see the
|
lot of nested pattern recursion or lookarounds or atomic groups. This limit
|
||||||
|
does not apply to matching with the JIT optimization, which has its own memory
|
||||||
|
control arrangements (see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
.\"
|
.\"
|
||||||
documentation for more details), nor does it apply to \fBpcre2_dfa_match()\fP.
|
documentation for more details). If the limit is reached, the negative error
|
||||||
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
|
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
|
||||||
returned. The default limit is set when PCRE2 is built; the default default is
|
built; the default default is very large and is essentially "unlimited".
|
||||||
very large and is essentially "unlimited".
|
|
||||||
.P
|
.P
|
||||||
A value for the heap limit may also be supplied by an item at the start of a
|
A value for the heap limit may also be supplied by an item at the start of a
|
||||||
pattern of the form
|
pattern of the form
|
||||||
|
@ -914,6 +915,11 @@ Heap memory is used only if the initial vector is too small. If the heap limit
|
||||||
is set to a value less than 21 (in particular, zero) no heap memory will be
|
is set to a value less than 21 (in particular, zero) no heap memory will be
|
||||||
used. In this case, only patterns that do not have a lot of nested backtracking
|
used. In this case, only patterns that do not have a lot of nested backtracking
|
||||||
can be successfully processed.
|
can be successfully processed.
|
||||||
|
.P
|
||||||
|
Similarly, for \fBpcre2_dfa_match()\fP, a vector on the system stack is used
|
||||||
|
when processing pattern recursions, lookarounds, or atomic groups, and only if
|
||||||
|
this is not big enough is heap memory used. In this case, too, setting a value
|
||||||
|
of zero disables the use of the heap.
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
||||||
|
@ -967,11 +973,21 @@ backtracking.
|
||||||
.P
|
.P
|
||||||
The depth limit is not relevant, and is ignored, when matching is done using
|
The depth limit is not relevant, and is ignored, when matching is done using
|
||||||
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
|
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
|
||||||
uses it to limit the depth of internal recursive function calls that implement
|
uses it to limit the depth of nested internal recursive function calls that
|
||||||
atomic groups, lookaround assertions, and pattern recursions. This is,
|
implement atomic groups, lookaround assertions, and pattern recursions. This
|
||||||
therefore, an indirect limit on the amount of system stack that is used. A
|
limits, indirectly, the amount of system stack that is used. It was more useful
|
||||||
recursive pattern such as /(.)(?1)/, when matched to a very long string using
|
in versions before 10.32, when stack memory was used for local workspace
|
||||||
\fBpcre2_dfa_match()\fP, can use a great deal of stack.
|
vectors for recursive function calls. From version 10.32, only local variables
|
||||||
|
are allocated on the stack and as each call uses only a few hundred bytes, even
|
||||||
|
a small stack can support quite a lot of recursion.
|
||||||
|
.P
|
||||||
|
If the depth of internal recursive function calls is great enough, local
|
||||||
|
workspace vectors are allocated on the heap from version 10.32 onwards, so the
|
||||||
|
depth limit also indirectly limits the amount of heap memory that is used. A
|
||||||
|
recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
|
||||||
|
using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
|
||||||
|
probably better to limit heap usage directly by calling
|
||||||
|
\fBpcre2_set_heap_limit()\fP.
|
||||||
.P
|
.P
|
||||||
The default value for the depth limit can be set when PCRE2 is built; the
|
The default value for the depth limit can be set when PCRE2 is built; the
|
||||||
default default is the same value as the default for the match limit. If the
|
default default is the same value as the default for the match limit. If the
|
||||||
|
@ -1028,15 +1044,16 @@ and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively.
|
||||||
PCRE2_CONFIG_DEPTHLIMIT
|
PCRE2_CONFIG_DEPTHLIMIT
|
||||||
.sp
|
.sp
|
||||||
The output is a uint32_t integer that gives the default limit for the depth of
|
The output is a uint32_t integer that gives the default limit for the depth of
|
||||||
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
|
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions,
|
||||||
and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
|
lookarounds, and atomic groups in \fBpcre2_dfa_match()\fP. Further details are
|
||||||
\fBpcre2_set_depth_limit()\fP above.
|
given with \fBpcre2_set_depth_limit()\fP above.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_HEAPLIMIT
|
PCRE2_CONFIG_HEAPLIMIT
|
||||||
.sp
|
.sp
|
||||||
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
||||||
for the amount of heap memory used by \fBpcre2_match()\fP. Further details are
|
for the amount of heap memory used by \fBpcre2_match()\fP or
|
||||||
given with \fBpcre2_set_heap_limit()\fP above.
|
\fBpcre2_dfa_match()\fP. Further details are given with
|
||||||
|
\fBpcre2_set_heap_limit()\fP above.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_JIT
|
PCRE2_CONFIG_JIT
|
||||||
.sp
|
.sp
|
||||||
|
@ -3514,17 +3531,7 @@ capture.
|
||||||
Calls to the convenience functions that extract substrings by name
|
Calls to the convenience functions that extract substrings by name
|
||||||
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
|
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
|
||||||
DFA match. The convenience functions that extract substrings by number never
|
DFA match. The convenience functions that extract substrings by number never
|
||||||
return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
|
return PCRE2_ERROR_NOSUBSTRING.
|
||||||
slightly different:
|
|
||||||
.sp
|
|
||||||
PCRE2_ERROR_UNAVAILABLE
|
|
||||||
.sp
|
|
||||||
The ovector is not big enough to include a slot for the given substring number.
|
|
||||||
.sp
|
|
||||||
PCRE2_ERROR_UNSET
|
|
||||||
.sp
|
|
||||||
There is a slot in the ovector for this substring, but there were insufficient
|
|
||||||
matches to fill it.
|
|
||||||
.P
|
.P
|
||||||
The matched strings are stored in the ovector in reverse order of length; that
|
The matched strings are stored in the ovector in reverse order of length; that
|
||||||
is, the longest matching string is first. If there were too many matches to fit
|
is, the longest matching string is first. If there were too many matches to fit
|
||||||
|
@ -3605,6 +3612,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 31 December 2017
|
Last updated: 27 April 2018
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2BUILD 3 "25 February 2018" "PCRE2 10.32"
|
.TH PCRE2BUILD 3 "26 April 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.
|
.
|
||||||
|
@ -292,9 +292,10 @@ change this by a setting such as
|
||||||
--with-heap-limit=500
|
--with-heap-limit=500
|
||||||
.sp
|
.sp
|
||||||
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
||||||
interpretive matching in pcre2_match(). It does not apply when JIT (which has
|
interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
|
||||||
its own memory arrangements) is used, nor does it apply to
|
may also use the heap for internal workspace when processing complicated
|
||||||
\fBpcre2_dfa_match()\fP.
|
patterns. This limit does not apply when JIT (which has its own memory
|
||||||
|
arrangements) is used.
|
||||||
.P
|
.P
|
||||||
You can also explicitly limit the depth of nested backtracking in the
|
You can also explicitly limit the depth of nested backtracking in the
|
||||||
\fBpcre2_match()\fP interpreter. This limit defaults to the value that is set
|
\fBpcre2_match()\fP interpreter. This limit defaults to the value that is set
|
||||||
|
@ -590,6 +591,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 25 February 2018
|
Last updated: 26 April 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2CALLOUT 3 "22 December 2017" "PCRE2 10.31"
|
.TH PCRE2CALLOUT 3 "26 April 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -291,10 +291,12 @@ than \fIcapture_top\fP also have both of their ovector slots set to
|
||||||
PCRE2_UNSET.
|
PCRE2_UNSET.
|
||||||
.P
|
.P
|
||||||
For DFA matching, the \fIoffset_vector\fP field points to the ovector that was
|
For DFA matching, the \fIoffset_vector\fP field points to the ovector that was
|
||||||
passed to the matching function in the match data block, but it holds no useful
|
passed to the matching function in the match data block for callouts at the top
|
||||||
information at callout time because \fBpcre2_dfa_match()\fP does not support
|
level, but to an internal ovector during the processing of pattern recursions,
|
||||||
substring capturing. The value of \fIcapture_top\fP is always 1 and the value
|
lookarounds, and atomic groups. However, these ovectors hold no useful
|
||||||
of \fIcapture_last\fP is always 0 for DFA matching.
|
information because \fBpcre2_dfa_match()\fP does not support substring
|
||||||
|
capturing. The value of \fIcapture_top\fP is always 1 and the value of
|
||||||
|
\fIcapture_last\fP is always 0 for DFA matching.
|
||||||
.P
|
.P
|
||||||
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
|
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
|
||||||
that were passed to the matching function.
|
that were passed to the matching function.
|
||||||
|
@ -441,6 +443,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 22 December 2017
|
Last updated: 26 April 2018
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "12 September 2017" "PCRE2 10.31"
|
.TH PCRE2PATTERN 3 "25 April 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -141,12 +141,12 @@ the application to apply the JIT optimization by calling
|
||||||
.SS "Setting match resource limits"
|
.SS "Setting match resource limits"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The pcre2_match() function contains a counter that is incremented every time it
|
The \fBpcre2_match()\fP function contains a counter that is incremented every
|
||||||
goes round its main loop. The caller of \fBpcre2_match()\fP can set a limit on
|
time it goes round its main loop. The caller of \fBpcre2_match()\fP can set a
|
||||||
this counter, which therefore limits the amount of computing resource used for
|
limit on this counter, which therefore limits the amount of computing resource
|
||||||
a match. The maximum depth of nested backtracking can also be limited; this
|
used for a match. The maximum depth of nested backtracking can also be limited;
|
||||||
indirectly restricts the amount of heap memory that is used, but there is also
|
this indirectly restricts the amount of heap memory that is used, but there is
|
||||||
an explicit memory limit that can be set.
|
also an explicit memory limit that can be set.
|
||||||
.P
|
.P
|
||||||
These facilities are provided to catch runaway matches that are provoked by
|
These facilities are provided to catch runaway matches that are provoked by
|
||||||
patterns with huge matching trees (a typical example is a pattern with nested
|
patterns with huge matching trees (a typical example is a pattern with nested
|
||||||
|
@ -162,18 +162,20 @@ where d is any number of decimal digits. However, the value of the setting must
|
||||||
be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
|
be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
|
||||||
for it to have any effect. In other words, the pattern writer can lower the
|
for it to have any effect. In other words, the pattern writer can lower the
|
||||||
limits set by the programmer, but not raise them. If there is more than one
|
limits set by the programmer, but not raise them. If there is more than one
|
||||||
setting of one of these limits, the lower value is used.
|
setting of one of these limits, the lower value is used. The heap limit is
|
||||||
|
specified in kilobytes.
|
||||||
.P
|
.P
|
||||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
||||||
still recognized for backwards compatibility.
|
still recognized for backwards compatibility.
|
||||||
.P
|
.P
|
||||||
The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
|
The heap limit applies only when the \fBpcre2_match()\fP or
|
||||||
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
\fBpcre2_dfa_match()\fP interpreters are used for matching. It does not apply
|
||||||
(but in a different way) when JIT is being used, or when
|
to JIT. The match limit is used (but in a different way) when JIT is being
|
||||||
\fBpcre2_dfa_match()\fP is called, to limit computing resource usage by those
|
used, or when \fBpcre2_dfa_match()\fP is called, to limit computing resource
|
||||||
matching functions. The depth limit is ignored by JIT but is relevant for DFA
|
usage by those matching functions. The depth limit is ignored by JIT but is
|
||||||
matching, which uses function recursion for recursions within the pattern. In
|
relevant for DFA matching, which uses function recursion for recursions within
|
||||||
this case, the depth limit controls the amount of system stack that is used.
|
the pattern and for lookaround assertions and atomic groups. In this case, the
|
||||||
|
depth limit controls the depth of such recursion.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="newlines"></a>
|
.\" HTML <a name="newlines"></a>
|
||||||
|
@ -2838,10 +2840,6 @@ the last value taken on at the top level. If a capturing subpattern is not
|
||||||
matched at the top level, its final captured value is unset, even if it was
|
matched at the top level, its final captured value is unset, even if it was
|
||||||
(temporarily) set at a deeper level during the matching process.
|
(temporarily) set at a deeper level during the matching process.
|
||||||
.P
|
.P
|
||||||
If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
|
|
||||||
obtain extra memory from the heap to store data during a recursion. If no
|
|
||||||
memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
|
|
||||||
.P
|
|
||||||
Do not confuse the (?R) item with the condition (R), which tests for recursion.
|
Do not confuse the (?R) item with the condition (R), which tests for recursion.
|
||||||
Consider this pattern, which matches text in angle brackets, allowing for
|
Consider this pattern, which matches text in angle brackets, allowing for
|
||||||
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
|
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
|
||||||
|
@ -3505,6 +3503,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 12 September 2017
|
Last updated: 25 April 2018
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PERFORM 3 "08 April 2017" "PCRE2 10.30"
|
.TH PCRE2PERFORM 3 "25 April 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 PERFORMANCE"
|
.SH "PCRE2 PERFORMANCE"
|
||||||
|
@ -78,9 +78,16 @@ may also reduce the memory requirements.
|
||||||
.P
|
.P
|
||||||
In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
|
In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
|
||||||
function calls, but only for processing atomic groups, lookaround assertions,
|
function calls, but only for processing atomic groups, lookaround assertions,
|
||||||
and recursion within the pattern. Too much nested recursion may cause stack
|
and recursion within the pattern. The original version of the code used to
|
||||||
issues. The "match depth" parameter can be used to limit the depth of function
|
allocate quite large internal workspace vectors on the stack, which caused some
|
||||||
recursion in \fBpcre2_dfa_match()\fP.
|
problems for some patterns in environments with small stacks. From release
|
||||||
|
10.32 the code for \fBpcre2_dfa_match()\fP has been re-factored to use heap
|
||||||
|
memory when necessary for internal workspace when recursing, though recursive
|
||||||
|
function calls are still used.
|
||||||
|
.P
|
||||||
|
The "match depth" parameter can be used to limit the depth of function
|
||||||
|
recursion, and the "match heap" parameter to limit heap memory in
|
||||||
|
\fBpcre2_dfa_match()\fP.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "PROCESSING TIME"
|
.SH "PROCESSING TIME"
|
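The strategy described in the pcre2perform hunk above (reserve an initial block on the stack, and fall back to the heap under a kilobyte limit only when that block runs out) can be illustrated with a minimal sketch. Everything here is hypothetical: the names workspace_pool, pool_init, pool_get, pool_release and the block size of 64 ints are assumptions chosen for illustration, and this is not the code that pcre2_dfa_match() actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes throughout; not PCRE2's internal code. */
#define STACK_BLOCK_INTS 64            /* size of the initial reserved block */

typedef struct {
  int    stack_block[STACK_BLOCK_INTS]; /* lives on the stack when the pool
                                           is a local of the top-level call */
  size_t stack_used;                    /* ints handed out from the block */
  size_t heap_used;                     /* bytes currently taken from heap */
  size_t heap_limit;                    /* limit in bytes */
} workspace_pool;

/* The limit is given in kilobytes, as the heap limit is in the text above. */
void pool_init(workspace_pool *p, size_t heap_limit_kb)
{
  p->stack_used = 0;
  p->heap_used = 0;
  p->heap_limit = heap_limit_kb * 1024;
}

/* Returns a vector of n ints, or NULL if the heap limit would be exceeded
   or malloc() fails. *from_heap records where the vector came from. */
int *pool_get(workspace_pool *p, size_t n, int *from_heap)
{
  assert(n > 0);
  if (n <= STACK_BLOCK_INTS - p->stack_used) {
    int *v = p->stack_block + p->stack_used;  /* carve from reserved block */
    p->stack_used += n;
    *from_heap = 0;
    return v;
  }
  size_t bytes = n * sizeof(int);
  if (bytes > p->heap_limit - p->heap_used) return NULL;  /* over the limit */
  int *v = malloc(bytes);
  if (v == NULL) return NULL;
  p->heap_used += bytes;
  *from_heap = 1;
  return v;
}

/* Vectors must be released in reverse order of allocation, which is the
   natural order when each recursive call releases its own vector on return. */
void pool_release(workspace_pool *p, int *v, size_t n, int from_heap)
{
  if (from_heap) {
    free(v);
    p->heap_used -= n * sizeof(int);
  } else {
    p->stack_used -= n;
  }
}
```

In this sketch a limit of zero means that a request which does not fit in the reserved block fails at once, which mirrors the documented behaviour that a heap limit of zero disables any use of the heap.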
||||||
|
@ -232,6 +239,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 08 April 2017
|
Last updated: 25 April 2018
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "21 Decbmber 2017" "PCRE 10.31"
|
.TH PCRE2TEST 1 "25 April 2018" "PCRE 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -1168,7 +1168,7 @@ pattern.
|
||||||
get=<number or name> extract captured substring
|
get=<number or name> extract captured substring
|
||||||
getall extract all captured substrings
|
getall extract all captured substrings
|
||||||
/g global global matching
|
/g global global matching
|
||||||
heap_limit=<n> set a limit on heap memory
|
heap_limit=<n> set a limit on heap memory (Kbytes)
|
||||||
jitstack=<n> set size of JIT stack
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
match_limit=<n> set a match limit
|
match_limit=<n> set a match limit
|
||||||
|
@ -1401,24 +1401,36 @@ the appropriate limits in the match context. These values are ignored when the
|
||||||
.sp
|
.sp
|
||||||
If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP
|
If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP
|
||||||
calls the relevant matching function several times, setting different values in
|
calls the relevant matching function several times, setting different values in
|
||||||
the match context via \fBpcre2_set_heap_limit(), \fBpcre2_set_match_limit()\fP,
|
the match context via \fBpcre2_set_heap_limit()\fP,
|
||||||
or \fBpcre2_set_depth_limit()\fP until it finds the minimum values for each
|
\fBpcre2_set_match_limit()\fP, or \fBpcre2_set_depth_limit()\fP until it finds
|
||||||
parameter that allows the match to complete without error.
|
the minimum values for each parameter that allows the match to complete without
|
||||||
|
error. If JIT is being used, only the match limit is relevant.
|
||||||
.P
|
.P
|
||||||
If JIT is being used, only the match limit is relevant. If DFA matching is
|
When using this modifier, the pattern should not contain any limit settings
|
||||||
being used, only the depth limit is relevant.
|
such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
|
||||||
.P
|
lower than the minimum matching value, the minimum value cannot be found
|
||||||
The \fImatch_limit\fP number is a measure of the amount of backtracking
|
because \fBpcre2_set_match_limit()\fP etc. are only able to reduce the value of
|
||||||
that takes place, and learning the minimum value can be instructive. For most
|
an in-pattern limit; they cannot increase it.
|
||||||
simple matches, the number is quite small, but for patterns with very large
|
|
||||||
numbers of matching possibilities, it can become large very quickly with
|
|
||||||
increasing length of subject string.
|
|
||||||
.P
|
.P
|
||||||
For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
|
For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
|
||||||
much nested backtracking happens (that is, how deeply the pattern's tree is
|
much nested backtracking happens (that is, how deeply the pattern's tree is
|
||||||
searched). In the case of DFA matching, \fIdepth_limit\fP controls the depth of
|
searched). In the case of DFA matching, \fIdepth_limit\fP controls the depth of
|
||||||
recursive calls of the internal function that is used for handling pattern
|
recursive calls of the internal function that is used for handling pattern
|
||||||
recursion, lookaround assertions, and atomic groups.
|
recursion, lookaround assertions, and atomic groups.
|
||||||
|
.P
|
||||||
|
For non-DFA matching, the \fImatch_limit\fP number is a measure of the amount
|
||||||
|
of backtracking that takes place, and learning the minimum value can be
|
||||||
|
instructive. For most simple matches, the number is quite small, but for
|
||||||
|
patterns with very large numbers of matching possibilities, it can become large
|
||||||
|
very quickly with increasing length of subject string. In the case of DFA
|
||||||
|
matching, \fImatch_limit\fP controls the total number of calls, both recursive
|
||||||
|
and non-recursive, to the internal matching function, thus controlling the
|
||||||
|
overall amount of computing resource that is used.
|
||||||
|
.P
|
||||||
|
For both kinds of matching, the \fIheap_limit\fP number (which is in kilobytes)
|
||||||
|
limits the amount of heap memory used for matching. A value of zero disables
|
||||||
|
the use of any heap memory; many simple pattern matches can be done without
|
||||||
|
using the heap, so this is not an unreasonable setting.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Showing MARK names"
|
.SS "Showing MARK names"
|
||||||
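As an illustration of the find_limits behaviour documented in the hunk above, here is a minimal C sketch of one way to search for the minimum value of a limit: double the candidate until the match completes, then bisect between the last failing value and the first succeeding one. This is not taken from the pcre2test source. The names find_min_limit, try_match_fn, and needs_at_least are hypothetical, and the sketch assumes that a match which completes with limit n also completes with any larger limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one matching attempt under a given limit.
   Returns 1 if the match completes, 0 if the limit was exceeded. */
typedef int (*try_match_fn)(uint32_t limit, void *data);

/* Simulated matcher for demonstration: the "match" needs at least
   *(uint32_t *)data units of the limited resource. */
static int needs_at_least(uint32_t limit, void *data)
{
    return limit >= *(const uint32_t *)data;
}

/* Returns the smallest limit for which try_match() reports success.
   Assumes success is monotonic in the limit. */
static uint32_t find_min_limit(try_match_fn try_match, void *data)
{
    uint32_t lo, hi;

    assert(try_match != NULL);

    /* Zero is a legitimate answer, for example for heap_limit. */
    if (try_match(0, data)) return 0;

    lo = 0;    /* largest value known to fail */
    hi = 64;   /* first candidate */

    /* Phase 1: double the candidate until the match completes. */
    while (!try_match(hi, data)) {
        lo = hi;
        if (hi == UINT32_MAX) return UINT32_MAX;  /* never completes */
        hi = (hi > UINT32_MAX / 2) ? UINT32_MAX : hi * 2;
    }

    /* Phase 2: bisect; lo always fails and hi always succeeds. */
    while (hi - lo > 1) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (try_match(mid, data)) hi = mid; else lo = mid;
    }
    return hi;
}
```

In a real program the callback would set the limit in a match context, for example with pcre2_set_match_limit(), and then call the matching function; that wiring is left out here so that the sketch stays independent of the library.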
|
@ -1437,13 +1449,14 @@ is added to the non-match message.
|
||||||
.sp
|
.sp
|
||||||
The \fBmemory\fP modifier causes \fBpcre2test\fP to log the sizes of all heap
|
The \fBmemory\fP modifier causes \fBpcre2test\fP to log the sizes of all heap
|
||||||
memory allocation and freeing calls that occur during a call to
|
memory allocation and freeing calls that occur during a call to
|
||||||
\fBpcre2_match()\fP. These occur only when a match requires a bigger vector
|
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. These occur only when a match
|
||||||
than the default for remembering backtracking points. In many cases there will
|
requires a bigger vector than the default for remembering backtracking points
|
||||||
be no heap memory used and therefore no additional output. No heap memory is
|
(\fBpcre2_match()\fP) or for internal workspace (\fBpcre2_dfa_match()\fP). In
|
||||||
allocated during matching with \fBpcre2_dfa_match\fP or with JIT, so in those
|
many cases there will be no heap memory used and therefore no additional
|
||||||
cases the \fBmemory\fP modifier never has any effect. For this modifier to
|
output. No heap memory is allocated during matching with JIT, so in that case
|
||||||
work, the \fBnull_context\fP modifier must not be set on both the pattern and
|
the \fBmemory\fP modifier never has any effect. For this modifier to work, the
|
||||||
the subject, though it can be set on one or the other.
|
\fBnull_context\fP modifier must not be set on both the pattern and the
|
||||||
|
subject, though it can be set on one or the other.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Setting a starting offset"
|
.SS "Setting a starting offset"
|
||||||
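The logging described for the memory modifier can be pictured with allocator callbacks of the shape that pcre2_general_context_create() accepts: a size plus a user data pointer for allocation, and a block plus a user data pointer for freeing. The sketch below is standalone C and does not link against PCRE2. The names mem_log, logging_malloc, and logging_free are hypothetical, and the printed format is an assumption rather than the actual pcre2test output.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 16   /* arbitrary table size for this sketch */

/* Remembers the size of each live block so it can be reported on free. */
typedef struct {
    void  *block[MAX_TRACKED];
    size_t size[MAX_TRACKED];
    size_t malloc_calls;
    size_t free_calls;
} mem_log;

/* Allocation callback: logs the requested size and records the block. */
static void *logging_malloc(size_t size, void *data)
{
    mem_log *ml = data;
    void *p = malloc(size);

    ml->malloc_calls++;
    printf("malloc %zu\n", size);

    if (p != NULL) {
        for (int i = 0; i < MAX_TRACKED; i++) {
            if (ml->block[i] == NULL) {
                ml->block[i] = p;
                ml->size[i] = size;
                break;
            }
        }
    }
    return p;
}

/* Free callback: logs the remembered size, if any, then releases it. */
static void logging_free(void *block, void *data)
{
    mem_log *ml = data;
    int found = 0;

    if (block == NULL) return;
    ml->free_calls++;

    for (int i = 0; i < MAX_TRACKED; i++) {
        if (ml->block[i] == block) {
            printf("free %zu\n", ml->size[i]);
            ml->block[i] = NULL;
            found = 1;
            break;
        }
    }
    if (!found) printf("free (size not recorded)\n");
    free(block);
}
```

A zero-initialised mem_log would be passed as the user data pointer. With JIT matching no such calls occur, which is why the documentation says the modifier then has no effect.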
|
@ -1962,6 +1975,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 21 December 2017
|
Last updated: 25 April 2018
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1071,7 +1071,7 @@ SUBJECT MODIFIERS
|
||||||
get=<number or name> extract captured substring
|
get=<number or name> extract captured substring
|
||||||
getall extract all captured substrings
|
getall extract all captured substrings
|
||||||
/g global global matching
|
/g global global matching
|
||||||
heap_limit=<n> set a limit on heap memory
|
heap_limit=<n> set a limit on heap memory (Kbytes)
|
||||||
jitstack=<n> set size of JIT stack
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
match_limit=<n> set a match limit
|
match_limit=<n> set a match limit
|
||||||
|
@ -1291,126 +1291,139 @@ SUBJECT MODIFIERS
|
||||||
values in the match context via pcre2_set_heap_limit(),
|
values in the match context via pcre2_set_heap_limit(),
|
||||||
pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
|
pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
|
||||||
minimum values for each parameter that allows the match to complete
|
minimum values for each parameter that allows the match to complete
|
||||||
without error.
|
without error. If JIT is being used, only the match limit is relevant.
|
||||||
|
|
||||||
If JIT is being used, only the match limit is relevant. If DFA matching
|
When using this modifier, the pattern should not contain any limit set-
|
||||||
is being used, only the depth limit is relevant.
|
tings such as (*LIMIT_MATCH=...) within it. If such a setting is
|
||||||
|
present and is lower than the minimum matching value, the minimum value
|
||||||
|
cannot be found because pcre2_set_match_limit() etc. are only able to
|
||||||
|
reduce the value of an in-pattern limit; they cannot increase it.
|
||||||
|
|
||||||
The match_limit number is a measure of the amount of backtracking that
|
For non-DFA matching, the minimum depth_limit number is a measure of
|
||||||
takes place, and learning the minimum value can be instructive. For
|
|
||||||
most simple matches, the number is quite small, but for patterns with
|
|
||||||
very large numbers of matching possibilities, it can become large very
|
|
||||||
quickly with increasing length of subject string.
|
|
||||||
|
|
||||||
For non-DFA matching, the minimum depth_limit number is a measure of
|
|
||||||
how much nested backtracking happens (that is, how deeply the pattern's
|
how much nested backtracking happens (that is, how deeply the pattern's
|
||||||
tree is searched). In the case of DFA matching, depth_limit controls
|
tree is searched). In the case of DFA matching, depth_limit controls
|
||||||
the depth of recursive calls of the internal function that is used for
|
the depth of recursive calls of the internal function that is used for
|
||||||
handling pattern recursion, lookaround assertions, and atomic groups.
|
handling pattern recursion, lookaround assertions, and atomic groups.
|
||||||
|
|
||||||
|
For non-DFA matching, the match_limit number is a measure of the amount
|
||||||
|
of backtracking that takes place, and learning the minimum value can be
|
||||||
|
instructive. For most simple matches, the number is quite small, but
|
||||||
|
for patterns with very large numbers of matching possibilities, it can
|
||||||
|
become large very quickly with increasing length of subject string. In
|
||||||
|
the case of DFA matching, match_limit controls the total number of
|
||||||
|
calls, both recursive and non-recursive, to the internal matching func-
|
||||||
|
tion, thus controlling the overall amount of computing resource that is
|
||||||
|
used.
|
||||||
|
|
||||||
|
For both kinds of matching, the heap_limit number (which is in kilo-
|
||||||
|
bytes) limits the amount of heap memory used for matching. A value of
|
||||||
|
zero disables the use of any heap memory; many simple pattern matches
|
||||||
|
can be done without using the heap, so this is not an unreasonable set-
|
||||||
|
ting.
|
||||||
|
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
|
||||||
|
|
||||||
The mark modifier causes the names from backtracking control verbs that
|
The mark modifier causes the names from backtracking control verbs that
|
||||||
are returned from calls to pcre2_match() to be displayed. If a mark is
|
are returned from calls to pcre2_match() to be displayed. If a mark is
|
||||||
returned for a match, non-match, or partial match, pcre2test shows it.
|
returned for a match, non-match, or partial match, pcre2test shows it.
|
||||||
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
||||||
it is added to the non-match message.
|
it is added to the non-match message.
|
||||||
|
|
||||||
Showing memory usage
|
Showing memory usage
|
||||||
|
|
||||||
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
||||||
ory allocation and freeing calls that occur during a call to
|
ory allocation and freeing calls that occur during a call to
|
||||||
pcre2_match(). These occur only when a match requires a bigger vector
|
pcre2_match() or pcre2_dfa_match(). These occur only when a match
|
||||||
than the default for remembering backtracking points. In many cases
|
requires a bigger vector than the default for remembering backtracking
|
||||||
there will be no heap memory used and therefore no additional output.
|
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
||||||
No heap memory is allocated during matching with pcre2_dfa_match or
|
In many cases there will be no heap memory used and therefore no addi-
|
||||||
with JIT, so in those cases the memory modifier never has any effect.
|
tional output. No heap memory is allocated during matching with JIT, so
|
||||||
For this modifier to work, the null_context modifier must not be set on
|
in that case the memory modifier never has any effect. For this modi-
|
||||||
both the pattern and the subject, though it can be set on one or the
|
fier to work, the null_context modifier must not be set on both the
|
||||||
other.
|
pattern and the subject, though it can be set on one or the other.
|
||||||
|
|
||||||
Setting a starting offset
|
Setting a starting offset
|
||||||
|
|
||||||
The offset modifier sets an offset in the subject string at which
|
The offset modifier sets an offset in the subject string at which
|
||||||
matching starts. Its value is a number of code units, not characters.
|
matching starts. Its value is a number of code units, not characters.
|
||||||
|
|
||||||
Setting an offset limit
|
Setting an offset limit
|
||||||
|
|
||||||
The offset_limit modifier sets a limit for unanchored matches. If a
|
The offset_limit modifier sets a limit for unanchored matches. If a
|
||||||
match cannot be found starting at or before this offset in the subject,
|
match cannot be found starting at or before this offset in the subject,
|
||||||
a "no match" return is given. The data value is a number of code units,
|
a "no match" return is given. The data value is a number of code units,
|
||||||
not characters. When this modifier is used, the use_offset_limit modi-
|
not characters. When this modifier is used, the use_offset_limit modi-
|
||||||
fier must have been set for the pattern; if not, an error is generated.
|
fier must have been set for the pattern; if not, an error is generated.
|
||||||
|
|
||||||
Setting the size of the output vector
|
Setting the size of the output vector
|
||||||
|
|
||||||
The ovector modifier applies only to the subject line in which it
|
The ovector modifier applies only to the subject line in which it
|
||||||
appears, though of course it can also be used to set a default in a
|
appears, though of course it can also be used to set a default in a
|
||||||
#subject command. It specifies the number of pairs of offsets that are
|
#subject command. It specifies the number of pairs of offsets that are
|
||||||
available for storing matching information. The default is 15.
|
available for storing matching information. The default is 15.
|
||||||
|
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
regexec() to be called with a NULL capture vector. When not testing the
|
regexec() to be called with a NULL capture vector. When not testing the
|
||||||
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
||||||
ate_from_pattern() to be called, in order to create a match block of
|
ate_from_pattern() to be called, in order to create a match block of
|
||||||
exactly the right size for the pattern. (It is not possible to create a
|
exactly the right size for the pattern. (It is not possible to create a
|
||||||
match block with a zero-length ovector; there is always at least one
|
match block with a zero-length ovector; there is always at least one
|
||||||
pair of offsets.)
|
pair of offsets.)
|
||||||
|
|
||||||
Passing the subject as zero-terminated
|
Passing the subject as zero-terminated
|
||||||
|
|
||||||
By default, the subject string is passed to a native API matching func-
|
By default, the subject string is passed to a native API matching func-
|
||||||
tion with its correct length. In order to test the facility for passing
|
tion with its correct length. In order to test the facility for passing
|
||||||
a zero-terminated string, the zero_terminate modifier is provided. It
|
a zero-terminated string, the zero_terminate modifier is provided. It
|
||||||
causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching
|
causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching
|
||||||
via the POSIX interface, this modifier is ignored, with a warning.
|
via the POSIX interface, this modifier is ignored, with a warning.
|
||||||
|
|
||||||
When testing pcre2_substitute(), this modifier also has the effect of
|
When testing pcre2_substitute(), this modifier also has the effect of
|
||||||
passing the replacement string as zero-terminated.
|
passing the replacement string as zero-terminated.
|
||||||
|
|
||||||
Passing a NULL context
|
Passing a NULL context
|
||||||
|
|
||||||
Normally, pcre2test passes a context block to pcre2_match(),
|
Normally, pcre2test passes a context block to pcre2_match(),
|
||||||
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
|
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
|
||||||
set, however, NULL is passed. This is for testing that the matching
|
set, however, NULL is passed. This is for testing that the matching
|
||||||
functions behave correctly in this case (they use default values). This
|
functions behave correctly in this case (they use default values). This
|
||||||
modifier cannot be used with the find_limits modifier or when testing
|
modifier cannot be used with the find_limits modifier or when testing
|
||||||
the substitution function.
|
the substitution function.
|
||||||
|
|
||||||
|
|
||||||
THE ALTERNATIVE MATCHING FUNCTION
|
THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
By default, pcre2test uses the standard PCRE2 matching function,
|
By default, pcre2test uses the standard PCRE2 matching function,
|
||||||
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
||||||
native matching function, pcre2_dfa_match(), which operates in a dif-
|
native matching function, pcre2_dfa_match(), which operates in a dif-
|
||||||
ferent way, and has some restrictions. The differences between the two
|
ferent way, and has some restrictions. The differences between the two
|
||||||
functions are described in the pcre2matching documentation.
|
functions are described in the pcre2matching documentation.
|
||||||
|
|
||||||
If the dfa modifier is set, the alternative matching function is used.
|
If the dfa modifier is set, the alternative matching function is used.
|
||||||
This function finds all possible matches at a given point in the sub-
|
This function finds all possible matches at a given point in the sub-
|
||||||
ject. If, however, the dfa_shortest modifier is set, processing stops
|
ject. If, however, the dfa_shortest modifier is set, processing stops
|
||||||
after the first match is found. This is always the shortest possible
|
after the first match is found. This is always the shortest possible
|
||||||
match.
|
match.
|
||||||
|
|
||||||
|
|
||||||
DEFAULT OUTPUT FROM pcre2test
|
DEFAULT OUTPUT FROM pcre2test
|
||||||
|
|
||||||
This section describes the output when the normal matching function,
|
This section describes the output when the normal matching function,
|
||||||
pcre2_match(), is being used.
|
pcre2_match(), is being used.
|
||||||
|
|
||||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||||
strings, starting with number 0 for the string that matched the whole
|
strings, starting with number 0 for the string that matched the whole
|
||||||
pattern. Otherwise, it outputs "No match" when the return is
|
pattern. Otherwise, it outputs "No match" when the return is
|
||||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
||||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
||||||
this is the entire substring that was inspected during the partial
|
this is the entire substring that was inspected during the partial
|
||||||
match; it may include characters before the actual match start if a
|
match; it may include characters before the actual match start if a
|
||||||
lookbehind assertion, \K, \b, or \B was involved.)
|
lookbehind assertion, \K, \b, or \B was involved.)
|
||||||
|
|
||||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||||
and a short descriptive phrase. If the error is a failed UTF string
|
and a short descriptive phrase. If the error is a failed UTF string
|
||||||
check, the code unit offset of the start of the failing character is
|
check, the code unit offset of the start of the failing character is
|
||||||
also output. Here is an example of an interactive pcre2test run.
|
also output. Here is an example of an interactive pcre2test run.
|
||||||
|
|
||||||
$ pcre2test
|
$ pcre2test
|
||||||
|
@ -1426,8 +1439,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Unset capturing substrings that are not followed by one that is set are
|
Unset capturing substrings that are not followed by one that is set are
|
||||||
not shown by pcre2test unless the allcaptures modifier is specified. In
|
not shown by pcre2test unless the allcaptures modifier is specified. In
|
||||||
the following example, there are two capturing substrings, but when the
|
the following example, there are two capturing substrings, but when the
|
||||||
first data line is matched, the second, unset substring is not shown.
|
first data line is matched, the second, unset substring is not shown.
|
||||||
An "internal" unset substring is shown as "<unset>", as for the second
|
An "internal" unset substring is shown as "<unset>", as for the second
|
||||||
data line.
|
data line.
|
||||||
|
|
||||||
re> /(a)|(b)/
|
re> /(a)|(b)/
|
||||||
|
@ -1439,11 +1452,11 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
1: <unset>
|
1: <unset>
|
||||||
2: b
|
2: b
|
||||||
|
|
||||||
If the strings contain any non-printing characters, they are output as
|
If the strings contain any non-printing characters, they are output as
|
||||||
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
||||||
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
||||||
nition of non-printing characters. If the aftertext modifier is set,
|
nition of non-printing characters. If the aftertext modifier is set,
|
||||||
the output for substring 0 is followed by the rest of the subject
|
the output for substring 0 is followed by the rest of the subject
|
||||||
string, identified by "0+" like this:
|
string, identified by "0+" like this:
|
||||||
|
|
||||||
re> /cat/aftertext
|
re> /cat/aftertext
|
||||||
|
@ -1451,7 +1464,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: cat
|
0: cat
|
||||||
0+ aract
|
0+ aract
|
||||||
|
|
||||||
If global matching is requested, the results of successive matching
|
If global matching is requested, the results of successive matching
|
||||||
attempts are output in sequence, like this:
|
attempts are output in sequence, like this:
|
||||||
|
|
||||||
re> /\Bi(\w\w)/g
|
re> /\Bi(\w\w)/g
|
||||||
|
@ -1463,8 +1476,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: ipp
|
0: ipp
|
||||||
1: pp
|
1: pp
|
||||||
|
|
||||||
"No match" is output only if the first match attempt fails. Here is an
|
"No match" is output only if the first match attempt fails. Here is an
|
||||||
example of a failure message (the offset 4 that is specified by the
|
example of a failure message (the offset 4 that is specified by the
|
||||||
offset modifier is past the end of the subject string):
|
offset modifier is past the end of the subject string):
|
||||||
|
|
||||||
re> /xyz/
|
re> /xyz/
|
||||||
|
@ -1472,7 +1485,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Error -24 (bad offset value)
|
Error -24 (bad offset value)
|
||||||
|
|
||||||
Note that whereas patterns can be continued over several lines (a plain
|
Note that whereas patterns can be continued over several lines (a plain
|
||||||
">" prompt is used for continuations), subject lines may not. However
|
">" prompt is used for continuations), subject lines may not. However
|
||||||
newlines can be included in a subject by means of the \n escape (or \r,
|
newlines can be included in a subject by means of the \n escape (or \r,
|
||||||
\r\n, etc., depending on the newline sequence setting).
|
\r\n, etc., depending on the newline sequence setting).
|
||||||
|
|
||||||
|
@ -1480,7 +1493,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
When the alternative matching function, pcre2_dfa_match(), is used, the
|
When the alternative matching function, pcre2_dfa_match(), is used, the
|
||||||
output consists of a list of all the matches that start at the first
|
output consists of a list of all the matches that start at the first
|
||||||
point in the subject where there is at least one match. For example:
|
point in the subject where there is at least one match. For example:
|
||||||
|
|
||||||
re> /(tang|tangerine|tan)/
|
re> /(tang|tangerine|tan)/
|
||||||
|
@ -1489,11 +1502,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tang
|
1: tang
|
||||||
2: tan
|
2: tan
|
||||||
|
|
||||||
Using the normal matching function on this data finds only "tang". The
|
Using the normal matching function on this data finds only "tang". The
|
||||||
longest matching string is always given first (and numbered zero).
|
longest matching string is always given first (and numbered zero).
|
||||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
||||||
followed by the partially matching substring. Note that this is the
|
followed by the partially matching substring. Note that this is the
|
||||||
entire substring that was inspected during the partial match; it may
|
entire substring that was inspected during the partial match; it may
|
||||||
include characters before the actual match start if a lookbehind asser-
|
include characters before the actual match start if a lookbehind asser-
|
||||||
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
||||||
|
|
||||||
|
@ -1509,16 +1522,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tan
|
1: tan
|
||||||
0: tan
|
0: tan
|
||||||
|
|
||||||
The alternative matching function does not support substring capture,
|
The alternative matching function does not support substring capture,
|
||||||
so the modifiers that are concerned with captured substrings are not
|
so the modifiers that are concerned with captured substrings are not
|
||||||
relevant.
|
relevant.
|
||||||
|
|
||||||
|
|
||||||
RESTARTING AFTER A PARTIAL MATCH
|
RESTARTING AFTER A PARTIAL MATCH
|
||||||
|
|
||||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||||
TIAL return, indicating that the subject partially matched the pattern,
|
TIAL return, indicating that the subject partially matched the pattern,
|
||||||
you can restart the match with additional subject data by means of the
|
you can restart the match with additional subject data by means of the
|
||||||
dfa_restart modifier. For example:
|
dfa_restart modifier. For example:
|
||||||
|
|
||||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||||
|
@ -1527,37 +1540,37 @@ RESTARTING AFTER A PARTIAL MATCH
|
||||||
data> n05\=dfa,dfa_restart
|
data> n05\=dfa,dfa_restart
|
||||||
0: n05
|
0: n05
|
||||||
|
|
||||||
For further information about partial matching, see the pcre2partial
|
For further information about partial matching, see the pcre2partial
|
||||||
documentation.
|
documentation.
|
||||||
|
|
||||||
|
|
||||||
CALLOUTS
|
CALLOUTS
|
||||||
|
|
||||||
If the pattern contains any callout requests, pcre2test's callout func-
|
If the pattern contains any callout requests, pcre2test's callout func-
|
||||||
tion is called during matching unless callout_none is specified. This
|
tion is called during matching unless callout_none is specified. This
|
||||||
works with both matching functions, and with JIT, though there are some
|
works with both matching functions, and with JIT, though there are some
|
||||||
differences in behaviour. The output for callouts with numerical argu-
|
differences in behaviour. The output for callouts with numerical argu-
|
||||||
ments and those with string arguments is slightly different.
|
ments and those with string arguments is slightly different.
|
||||||
|
|
||||||
Callouts with numerical arguments
|
Callouts with numerical arguments
|
||||||
|
|
||||||
By default, the callout function displays the callout number, the start
|
By default, the callout function displays the callout number, the start
|
||||||
and current positions in the subject text at the callout time, and the
|
and current positions in the subject text at the callout time, and the
|
||||||
next pattern item to be tested. For example:
|
next pattern item to be tested. For example:
|
||||||
|
|
||||||
--->pqrabcdef
|
--->pqrabcdef
|
||||||
0 ^ ^ \d
|
0 ^ ^ \d
|
||||||
|
|
||||||
This output indicates that callout number 0 occurred for a match
|
This output indicates that callout number 0 occurred for a match
|
||||||
attempt starting at the fourth character of the subject string, when
|
attempt starting at the fourth character of the subject string, when
|
||||||
the pointer was at the seventh character, and when the next pattern
|
the pointer was at the seventh character, and when the next pattern
|
||||||
item was \d. Just one circumflex is output if the start and current
|
item was \d. Just one circumflex is output if the start and current
|
||||||
positions are the same, or if the current position precedes the start
|
positions are the same, or if the current position precedes the start
|
||||||
position, which can happen if the callout is in a lookbehind assertion.
|
position, which can happen if the callout is in a lookbehind assertion.
|
||||||
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the auto_callout pattern modifier. In this case, instead of
|
a result of the auto_callout pattern modifier. In this case, instead of
|
||||||
showing the callout number, the offset in the pattern, preceded by a
|
showing the callout number, the offset in the pattern, preceded by a
|
||||||
plus, is output. For example:
|
plus, is output. For example:
|
||||||
|
|
||||||
re> /\d?[A-E]\*/auto_callout
|
re> /\d?[A-E]\*/auto_callout
|
||||||
|
@ -1570,7 +1583,7 @@ CALLOUTS
|
||||||
0: E*
|
0: E*
|
||||||
|
|
||||||
If a pattern contains (*MARK) items, an additional line is output when-
|
If a pattern contains (*MARK) items, an additional line is output when-
|
||||||
ever a change of latest mark is passed to the callout function. For
|
ever a change of latest mark is passed to the callout function. For
|
||||||
example:
|
example:
|
||||||
|
|
||||||
re> /a(*MARK:X)bc/auto_callout
|
re> /a(*MARK:X)bc/auto_callout
|
||||||
|
@ -1584,17 +1597,17 @@ CALLOUTS
|
||||||
+12 ^ ^
|
+12 ^ ^
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
The mark changes between matching "a" and "b", but stays the same for
|
The mark changes between matching "a" and "b", but stays the same for
|
||||||
the rest of the match, so nothing more is output. If, as a result of
|
the rest of the match, so nothing more is output. If, as a result of
|
||||||
backtracking, the mark reverts to being unset, the text "<unset>" is
|
backtracking, the mark reverts to being unset, the text "<unset>" is
|
||||||
output.
|
output.
|
||||||
|
|
||||||
Callouts with string arguments
|
Callouts with string arguments
|
||||||
|
|
||||||
The output for a callout with a string argument is similar, except that
|
The output for a callout with a string argument is similar, except that
|
||||||
instead of outputting a callout number before the position indicators,
|
instead of outputting a callout number before the position indicators,
|
||||||
the callout string and its offset in the pattern string are output
|
the callout string and its offset in the pattern string are output
|
||||||
before the reflection of the subject string, and the subject string is
|
before the reflection of the subject string, and the subject string is
|
||||||
reflected for each callout. For example:
|
reflected for each callout. For example:
|
||||||
|
|
||||||
re> /^ab(?C'first')cd(?C"second")ef/
|
re> /^ab(?C'first')cd(?C"second")ef/
|
||||||
|
@ -1610,26 +1623,26 @@ CALLOUTS
|
||||||
|
|
||||||
Callout modifiers
|
Callout modifiers
|
||||||
|
|
||||||
The callout function in pcre2test returns zero (carry on matching) by
|
The callout function in pcre2test returns zero (carry on matching) by
|
||||||
default, but you can use a callout_fail modifier in a subject line to
|
default, but you can use a callout_fail modifier in a subject line to
|
||||||
change this and other parameters of the callout (see below).
|
change this and other parameters of the callout (see below).
|
||||||
|
|
||||||
If the callout_capture modifier is set, the current captured groups are
|
If the callout_capture modifier is set, the current captured groups are
|
||||||
output when a callout occurs. This is useful only for non-DFA matching,
|
output when a callout occurs. This is useful only for non-DFA matching,
|
||||||
as pcre2_dfa_match() does not support capturing, so no captures are
|
as pcre2_dfa_match() does not support capturing, so no captures are
|
||||||
ever shown.
|
ever shown.
|
||||||
|
|
||||||
The normal callout output, showing the callout number or pattern offset
|
The normal callout output, showing the callout number or pattern offset
|
||||||
(as described above) is suppressed if the callout_no_where modifier is
|
(as described above) is suppressed if the callout_no_where modifier is
|
||||||
set.
|
set.
|
||||||
|
|
||||||
When using the interpretive matching function pcre2_match() without
|
When using the interpretive matching function pcre2_match() without
|
||||||
JIT, setting the callout_extra modifier causes additional output from
|
JIT, setting the callout_extra modifier causes additional output from
|
||||||
pcre2test's callout function to be generated. For the first callout in
|
pcre2test's callout function to be generated. For the first callout in
|
||||||
a match attempt at a new starting position in the subject, "New match
|
a match attempt at a new starting position in the subject, "New match
|
||||||
attempt" is output. If there has been a backtrack since the last call-
|
attempt" is output. If there has been a backtrack since the last call-
|
||||||
out (or start of matching if this is the first callout), "Backtrack" is
|
out (or start of matching if this is the first callout), "Backtrack" is
|
||||||
output, followed by "No other matching paths" if the backtrack ended
|
output, followed by "No other matching paths" if the backtrack ended
|
||||||
the previous match attempt. For example:
|
the previous match attempt. For example:
|
||||||
|
|
||||||
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
||||||
|
@ -1666,82 +1679,82 @@ CALLOUTS
|
||||||
+1 ^ a+
|
+1 ^ a+
|
||||||
No match
|
No match
|
||||||
|
|
||||||
Notice that various optimizations must be turned off if you want all
|
Notice that various optimizations must be turned off if you want all
|
||||||
possible matching paths to be scanned. If no_start_optimize is not
|
possible matching paths to be scanned. If no_start_optimize is not
|
||||||
used, there is an immediate "no match", without any callouts, because
|
used, there is an immediate "no match", without any callouts, because
|
||||||
the starting optimization fails to find "b" in the subject, which it
|
the starting optimization fails to find "b" in the subject, which it
|
||||||
knows must be present for any match. If no_auto_possess is not used,
|
knows must be present for any match. If no_auto_possess is not used,
|
||||||
the "a+" item is turned into "a++", which reduces the number of back-
|
the "a+" item is turned into "a++", which reduces the number of back-
|
||||||
tracks.
|
tracks.
|
||||||
|
|
||||||
The callout_extra modifier has no effect if used with the DFA matching
|
The callout_extra modifier has no effect if used with the DFA matching
|
||||||
function, or with JIT.
|
function, or with JIT.
|
||||||
|
|
||||||
Return values from callouts
|
Return values from callouts
|
||||||
|
|
||||||
The default return from the callout function is zero, which allows
|
The default return from the callout function is zero, which allows
|
||||||
matching to continue. The callout_fail modifier can be given one or two
|
matching to continue. The callout_fail modifier can be given one or two
|
||||||
numbers. If there is only one number, 1 is returned instead of 0 (caus-
|
numbers. If there is only one number, 1 is returned instead of 0 (caus-
|
||||||
ing matching to backtrack) when a callout of that number is reached. If
|
ing matching to backtrack) when a callout of that number is reached. If
|
||||||
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
|
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
|
||||||
reached and there have been at least <m> callouts. The callout_error
|
reached and there have been at least <m> callouts. The callout_error
|
||||||
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
|
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
|
||||||
ing the entire matching process to be aborted. If both these modifiers
|
ing the entire matching process to be aborted. If both these modifiers
|
||||||
are set for the same callout number, callout_error takes precedence.
|
are set for the same callout number, callout_error takes precedence.
|
||||||
Note that callouts with string arguments are always given the number
|
Note that callouts with string arguments are always given the number
|
||||||
zero.
|
zero.
|
||||||
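The return-value rules described above for callout_fail and callout_error can be modelled by a small sketch. This is not pcre2test's code: the struct, the function name and the stand-in error constant are hypothetical, and it assumes the <m> threshold counts all callouts seen so far in the match.

```c
/* Stand-in for PCRE2_ERROR_CALLOUT from pcre2.h; the numeric value is an
   assumption made only so that this sketch compiles without the library. */
#define SKETCH_ERROR_CALLOUT (-37)

/* Hypothetical holder for the two modifiers. A "set" flag of 0 means the
   modifier was not given. When only <n> is given, leave the <m> field at 0
   so that the "at least <m> callouts" test always passes. */
typedef struct sketch_callout_ctl {
  int fail_set;   unsigned int fail_n;   unsigned int fail_m;
  int error_set;  unsigned int error_n;  unsigned int error_m;
} sketch_callout_ctl;

/* Decide what a callout function would return for one callout. */
static int
sketch_callout_return(const sketch_callout_ctl *ctl,
  unsigned int callout_number, unsigned int callouts_so_far)
{
/* callout_error is tested first because it takes precedence. */
if (ctl->error_set && callout_number == ctl->error_n &&
    callouts_so_far >= ctl->error_m)
  return SKETCH_ERROR_CALLOUT;   /* abort the whole match */

if (ctl->fail_set && callout_number == ctl->fail_n &&
    callouts_so_far >= ctl->fail_m)
  return 1;                      /* force a backtrack */

return 0;                        /* default: matching continues */
}
```

Testing callout_error before callout_fail is what gives it precedence when both modifiers name the same callout number.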
|
|
||||||
The callout_data modifier can be given an unsigned or a negative num-
|
The callout_data modifier can be given an unsigned or a negative num-
|
||||||
ber. This is set as the "user data" that is passed to the matching
|
ber. This is set as the "user data" that is passed to the matching
|
||||||
function, and passed back when the callout function is invoked. Any
|
function, and passed back when the callout function is invoked. Any
|
||||||
value other than zero is used as a return from pcre2test's callout
|
value other than zero is used as a return from pcre2test's callout
|
||||||
function.
|
function.
|
||||||
|
|
||||||
Inserting callouts can be helpful when using pcre2test to check compli-
|
Inserting callouts can be helpful when using pcre2test to check compli-
|
||||||
cated regular expressions. For further information about callouts, see
|
cated regular expressions. For further information about callouts, see
|
||||||
the pcre2callout documentation.
|
the pcre2callout documentation.
|
||||||
|
|
||||||
|
|
||||||
NON-PRINTING CHARACTERS
|
NON-PRINTING CHARACTERS
|
||||||
|
|
||||||
When pcre2test is outputting text in the compiled version of a pattern,
|
When pcre2test is outputting text in the compiled version of a pattern,
|
||||||
bytes other than 32-126 are always treated as non-printing characters
|
bytes other than 32-126 are always treated as non-printing characters
|
||||||
and are therefore shown as hex escapes.
|
and are therefore shown as hex escapes.
|
||||||
|
|
||||||
When pcre2test is outputting text that is a matched part of a subject
|
When pcre2test is outputting text that is a matched part of a subject
|
||||||
string, it behaves in the same way, unless a different locale has been
|
string, it behaves in the same way, unless a different locale has been
|
||||||
set for the pattern (using the locale modifier). In this case, the
|
set for the pattern (using the locale modifier). In this case, the
|
||||||
isprint() function is used to distinguish printing and non-printing
|
isprint() function is used to distinguish printing and non-printing
|
||||||
characters.
|
characters.
|
||||||
|
|
||||||
|
|
||||||
SAVING AND RESTORING COMPILED PATTERNS
|
SAVING AND RESTORING COMPILED PATTERNS
|
||||||
|
|
||||||
It is possible to save compiled patterns on disc or elsewhere, and
|
It is possible to save compiled patterns on disc or elsewhere, and
|
||||||
reload them later, subject to a number of restrictions. JIT data cannot
|
reload them later, subject to a number of restrictions. JIT data cannot
|
||||||
be saved. The host on which the patterns are reloaded must be running
|
be saved. The host on which the patterns are reloaded must be running
|
||||||
the same version of PCRE2, with the same code unit width, and must also
|
the same version of PCRE2, with the same code unit width, and must also
|
||||||
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||||
compiled patterns can be saved they must be serialized, that is, con-
|
compiled patterns can be saved they must be serialized, that is, con-
|
||||||
verted to a stream of bytes. A single byte stream may contain any num-
|
verted to a stream of bytes. A single byte stream may contain any num-
|
||||||
ber of compiled patterns, but they must all use the same character
|
ber of compiled patterns, but they must all use the same character
|
||||||
tables. A single copy of the tables is included in the byte stream (its
|
tables. A single copy of the tables is included in the byte stream (its
|
||||||
size is 1088 bytes).
|
size is 1088 bytes).
|
||||||
|
|
||||||
The functions whose names begin with pcre2_serialize_ are used for
|
The functions whose names begin with pcre2_serialize_ are used for
|
||||||
serializing and de-serializing. They are described in the pcre2serial-
|
serializing and de-serializing. They are described in the pcre2serial-
|
||||||
ize documentation. In this section we describe the features of
|
ize documentation. In this section we describe the features of
|
||||||
pcre2test that can be used to test these functions.
|
pcre2test that can be used to test these functions.
|
||||||
|
|
||||||
When a pattern with push modifier is successfully compiled, it is
|
When a pattern with push modifier is successfully compiled, it is
|
||||||
pushed onto a stack of compiled patterns, and pcre2test expects the
|
pushed onto a stack of compiled patterns, and pcre2test expects the
|
||||||
next line to contain a new pattern (or command) instead of a subject
|
next line to contain a new pattern (or command) instead of a subject
|
||||||
line. By contrast, the pushcopy modifier causes a copy of the compiled
|
line. By contrast, the pushcopy modifier causes a copy of the compiled
|
||||||
pattern to be stacked, leaving the original available for immediate
|
pattern to be stacked, leaving the original available for immediate
|
||||||
matching. By using push and/or pushcopy, a number of patterns can be
|
matching. By using push and/or pushcopy, a number of patterns can be
|
||||||
compiled and retained. These modifiers are incompatible with posix, and
|
compiled and retained. These modifiers are incompatible with posix, and
|
||||||
control modifiers that act at match time are ignored (with a message)
|
control modifiers that act at match time are ignored (with a message)
|
||||||
for the stacked patterns. The jitverify modifier applies only at com-
|
for the stacked patterns. The jitverify modifier applies only at com-
|
||||||
pile time.
|
pile time.
|
||||||
|
|
||||||
The command
|
The command
|
||||||
|
@ -1749,21 +1762,21 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
#save <filename>
|
#save <filename>
|
||||||
|
|
||||||
causes all the stacked patterns to be serialized and the result written
|
causes all the stacked patterns to be serialized and the result written
|
||||||
to the named file. Afterwards, all the stacked patterns are freed. The
|
to the named file. Afterwards, all the stacked patterns are freed. The
|
||||||
command
|
command
|
||||||
|
|
||||||
#load <filename>
|
#load <filename>
|
||||||
|
|
||||||
reads the data in the file, and then arranges for it to be de-serial-
|
reads the data in the file, and then arranges for it to be de-serial-
|
||||||
ized, with the resulting compiled patterns added to the pattern stack.
|
ized, with the resulting compiled patterns added to the pattern stack.
|
||||||
The pattern on the top of the stack can be retrieved by the #pop com-
|
The pattern on the top of the stack can be retrieved by the #pop com-
|
||||||
mand, which must be followed by lines of subjects that are to be
|
mand, which must be followed by lines of subjects that are to be
|
||||||
matched with the pattern, terminated as usual by an empty line or end
|
matched with the pattern, terminated as usual by an empty line or end
|
||||||
of file. This command may be followed by a modifier list containing
|
of file. This command may be followed by a modifier list containing
|
||||||
only control modifiers that act after a pattern has been compiled. In
|
only control modifiers that act after a pattern has been compiled. In
|
||||||
particular, hex, posix, posix_nosub, push, and pushcopy are not
|
particular, hex, posix, posix_nosub, push, and pushcopy are not
|
||||||
allowed, nor are any option-setting modifiers. The JIT modifiers are,
|
allowed, nor are any option-setting modifiers. The JIT modifiers are,
|
||||||
however permitted. Here is an example that saves and reloads two pat-
|
however permitted. Here is an example that saves and reloads two pat-
|
||||||
terns.
|
terns.
|
||||||
|
|
||||||
/abc/push
|
/abc/push
|
||||||
|
@ -1776,10 +1789,10 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
#pop jit,bincode
|
#pop jit,bincode
|
||||||
abc
|
abc
|
||||||
|
|
||||||
If jitverify is used with #pop, it does not automatically imply jit,
|
If jitverify is used with #pop, it does not automatically imply jit,
|
||||||
which is different behaviour from when it is used on a pattern.
|
which is different behaviour from when it is used on a pattern.
|
||||||
|
|
||||||
The #popcopy command is analogous to the pushcopy modifier in that it
|
The #popcopy command is analogous to the pushcopy modifier in that it
|
||||||
makes current a copy of the topmost stack pattern, leaving the original
|
makes current a copy of the topmost stack pattern, leaving the original
|
||||||
still on the stack.
|
still on the stack.
|
||||||
|
|
||||||
|
@ -1799,5 +1812,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 21 December 2017
|
Last updated: 25 April 2018
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
|
|
|
@ -132,8 +132,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
/* Define to 1 if you have the <zlib.h> header file. */
|
/* Define to 1 if you have the <zlib.h> header file. */
|
||||||
#undef HAVE_ZLIB_H
|
#undef HAVE_ZLIB_H
|
||||||
|
|
||||||
/* This limits the amount of memory that pcre2_match() may use while matching
|
/* This limits the amount of memory that may be used while matching a pattern.
|
||||||
a pattern. The value is in kilobytes. */
|
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
||||||
|
to JIT matching. The value is in kilobytes. */
|
||||||
#undef HEAP_LIMIT
|
#undef HEAP_LIMIT
|
||||||
|
|
||||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||||
|
@ -148,7 +149,8 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* The value of MATCH_LIMIT determines the default number of times the
|
/* The value of MATCH_LIMIT determines the default number of times the
|
||||||
pcre2_match() function can record a backtrack position during a single
|
pcre2_match() function can record a backtrack position during a single
|
||||||
matching attempt. There is a runtime interface for setting a different
|
matching attempt. The value is also used to limit a loop counter in
|
||||||
|
pcre2_dfa_match(). There is a runtime interface for setting a different
|
||||||
limit. The limit exists in order to catch runaway regular expressions that
|
limit. The limit exists in order to catch runaway regular expressions that
|
||||||
take for ever to determine that they do not match. The default is set very
|
take for ever to determine that they do not match. The default is set very
|
||||||
large so that it does not accidentally catch legitimate cases. */
|
large so that it does not accidentally catch legitimate cases. */
|
||||||
|
@ -161,7 +163,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
|
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
|
||||||
must be less than the value of MATCH_LIMIT. The default is to use the same
|
must be less than the value of MATCH_LIMIT. The default is to use the same
|
||||||
value as MATCH_LIMIT. There is a runtime method for setting a different
|
value as MATCH_LIMIT. There is a runtime method for setting a different
|
||||||
limit. */
|
limit. In the case of pcre2_dfa_match(), this limit controls the depth of
|
||||||
|
the internal nested function calls that are used for pattern recursions,
|
||||||
|
lookarounds, and atomic groups. */
|
||||||
#undef MATCH_LIMIT_DEPTH
|
#undef MATCH_LIMIT_DEPTH
|
||||||
|
|
||||||
/* This limit is parameterized just in case anybody ever wants to change it.
|
/* This limit is parameterized just in case anybody ever wants to change it.
|
||||||
|
|
|
@ -292,6 +292,35 @@ typedef struct stateblock {
|
||||||
#define INTS_PER_STATEBLOCK (int)(sizeof(stateblock)/sizeof(int))
|
#define INTS_PER_STATEBLOCK (int)(sizeof(stateblock)/sizeof(int))
|
||||||
|
|
||||||
|
|
||||||
|
/* Before version 10.32 the recursive calls of internal_dfa_match() were passed
|
||||||
|
local working space and output vectors that were created on the stack. This has
|
||||||
|
caused issues for some patterns, especially in small-stack environments such as
|
||||||
|
Windows. A new scheme is now in use which sets up a vector on the stack, but if
|
||||||
|
this is too small, heap memory is used, up to the heap_limit. The main
|
||||||
|
parameters are all numbers of ints because the workspace is a vector of ints.
|
||||||
|
|
||||||
|
The size of the starting stack vector, DFA_START_RWS_SIZE, is in bytes, and is
|
||||||
|
defined in pcre2_internal.h so as to be available to pcre2test when it is
|
||||||
|
finding the minimum heap requirement for a match. */
|
||||||
|
|
||||||
|
#define OVEC_UNIT (sizeof(PCRE2_SIZE)/sizeof(int))
|
||||||
|
|
||||||
|
#define RWS_BASE_SIZE (DFA_START_RWS_SIZE/sizeof(int)) /* Stack vector */
|
||||||
|
#define RWS_RSIZE 1000 /* Work size for recursion */
|
||||||
|
#define RWS_OVEC_RSIZE (1000*OVEC_UNIT) /* Ovector for recursion */
|
||||||
|
#define RWS_OVEC_OSIZE (2*OVEC_UNIT) /* Ovector in other cases */
|
||||||
|
|
||||||
|
/* This structure is at the start of each workspace block. */
|
||||||
|
|
||||||
|
typedef struct RWS_anchor {
|
||||||
|
struct RWS_anchor *next;
|
||||||
|
unsigned int size; /* Number of ints */
|
||||||
|
unsigned int free; /* Number of ints */
|
||||||
|
} RWS_anchor;
|
||||||
|
|
||||||
|
#define RWS_ANCHOR_SIZE (sizeof(RWS_anchor)/sizeof(int))
|
||||||
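The anchor structure above heads each workspace block, and the later hunks take an ovector plus a workspace from the unused tail of the block before a nested call and give them back afterwards. A minimal sketch of that bookkeeping follows. The sketch_ names are hypothetical, and the out-of-space case simply returns NULL here, whereas the real code calls more_workspace().

```c
#include <stddef.h>

/* Hypothetical mirror of RWS_anchor: sizes are counted in ints. */
typedef struct sketch_anchor {
  struct sketch_anchor *next;
  unsigned int size;   /* total ints in the block, anchor included */
  unsigned int free;   /* unused ints at the end of the block */
} sketch_anchor;

#define SKETCH_ANCHOR_INTS (sizeof(sketch_anchor)/sizeof(int))

/* Initialise a block of nints ints; the anchor occupies its start. */
static void
sketch_init(int *block, unsigned int nints)
{
sketch_anchor *a = (sketch_anchor *)block;
a->next = NULL;
a->size = nints;
a->free = nints - (unsigned int)SKETCH_ANCHOR_INTS;
}

/* Take nints ints from the first unused position, or return NULL if the
   block has too little room left. */
static int *
sketch_carve(int *block, unsigned int nints)
{
sketch_anchor *a = (sketch_anchor *)block;
int *p;
if (a->free < nints) return NULL;
p = block + a->size - a->free;   /* first unused int */
a->free -= nints;
return p;
}

/* Give back the most recently carved nints ints (stack discipline). */
static void
sketch_release(int *block, unsigned int nints)
{
sketch_anchor *a = (sketch_anchor *)block;
a->free += nints;
}
```

Because nested calls return in strict last-in, first-out order, a single free count is enough to track usage; no per-allocation header is needed.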
|
|
||||||
|
|
||||||
|
|
||||||
/*************************************************
|
/*************************************************
|
||||||
* Process a callout *
|
* Process a callout *
|
||||||
|
@ -353,6 +382,61 @@ return (mb->callout)(cb, mb->callout_data);
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Expand local workspace memory *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
/* This function is called when internal_dfa_match() is about to be called
|
||||||
|
recursively and there is insufficient working space left in the current work
|
||||||
|
space block. If there's an existing next block, use it; otherwise get a new
|
||||||
|
block unless the heap limit is reached.
|
||||||
|
|
||||||
|
Arguments:
|
||||||
|
rwsptr pointer to block pointer (updated)
|
||||||
|
ovecsize space needed for an ovector
|
||||||
|
mb the match block
|
||||||
|
|
||||||
|
Returns: 0 rwsptr has been updated
|
||||||
|
!0 an error code
|
||||||
|
*/
|
||||||
|
|
||||||
|
static int
|
||||||
|
more_workspace(RWS_anchor **rwsptr, unsigned int ovecsize, dfa_match_block *mb)
|
||||||
|
{
|
||||||
|
RWS_anchor *rws = *rwsptr;
|
||||||
|
RWS_anchor *new;
|
||||||
|
|
||||||
|
if (rws->next != NULL)
|
||||||
|
{
|
||||||
|
new = rws->next;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* All sizes are in units of sizeof(int), except for mb->heap_limit, which is in
|
||||||
|
kilobytes. */
|
||||||
|
|
||||||
|
else
|
||||||
|
{
|
||||||
|
unsigned int newsize = rws->size * 2;
|
||||||
|
unsigned int heapleft = (unsigned int)
|
||||||
|
(((1024/sizeof(int))*mb->heap_limit - mb->heap_used));
|
||||||
|
if (newsize > heapleft) newsize = heapleft;
|
||||||
|
if (newsize < RWS_RSIZE + ovecsize + RWS_ANCHOR_SIZE)
|
||||||
|
return PCRE2_ERROR_HEAPLIMIT;
|
||||||
|
new = mb->memctl.malloc(newsize*sizeof(int), mb->memctl.memory_data);
|
||||||
|
if (new == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
mb->heap_used += newsize;
|
||||||
|
new->next = NULL;
|
||||||
|
new->size = newsize;
|
||||||
|
rws->next = new;
|
||||||
|
}
|
||||||
|
|
||||||
|
new->free = new->size - RWS_ANCHOR_SIZE;
|
||||||
|
*rwsptr = new;
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
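The sizing decision inside more_workspace() can be studied in isolation. The sketch below reproduces only the arithmetic: double the current block, cap it at what the heap limit still allows, and fail if the result cannot hold one workspace, one ovector and an anchor. The function name and status codes are hypothetical stand-ins. Unlike the hunk above, it works in size_t and guards the subtraction, which is a simplification chosen for the sketch and not part of the commit.

```c
#include <stddef.h>

#define SKETCH_RSIZE 1000   /* mirrors RWS_RSIZE in the diff */

/* Stand-ins for 0 and PCRE2_ERROR_HEAPLIMIT. */
enum { SKETCH_OK = 0, SKETCH_HEAPLIMIT = -1 };

/* All sizes are in ints, except heap_limit_kb, which is in kilobytes.
   On success the chosen size for the next block is stored in *new_ints. */
static int
sketch_next_block_size(size_t cur_ints, size_t heap_limit_kb,
  size_t heap_used_ints, size_t ovec_ints, size_t anchor_ints,
  size_t *new_ints)
{
size_t limit_ints = (1024/sizeof(int)) * heap_limit_kb;
size_t heapleft = (heap_used_ints < limit_ints)?
  limit_ints - heap_used_ints : 0;
size_t newsize = cur_ints * 2;            /* try to double each time */

if (newsize > heapleft) newsize = heapleft;
if (newsize < SKETCH_RSIZE + ovec_ints + anchor_ints)
  return SKETCH_HEAPLIMIT;                /* not enough for one frame */

*new_ints = newsize;
return SKETCH_OK;
}
```

Doubling keeps the number of heap allocations small for deep nesting, while the cap means the last block may be smaller than double the previous one.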
|
|
||||||
|
|
||||||
|
|
||||||
/*************************************************
|
/*************************************************
|
||||||
* Match a Regular Expression - DFA engine *
|
* Match a Regular Expression - DFA engine *
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
@ -431,7 +515,8 @@ internal_dfa_match(
|
||||||
uint32_t offsetcount,
|
uint32_t offsetcount,
|
||||||
int *workspace,
|
int *workspace,
|
||||||
int wscount,
|
int wscount,
|
||||||
uint32_t rlevel)
|
uint32_t rlevel,
|
||||||
|
int *RWS)
|
||||||
{
|
{
|
||||||
stateblock *active_states, *new_states, *temp_states;
|
stateblock *active_states, *new_states, *temp_states;
|
||||||
stateblock *next_active_state, *next_new_state;
|
stateblock *next_active_state, *next_new_state;
|
||||||
|
@ -2587,10 +2672,22 @@ for (;;)
|
||||||
case OP_ASSERTBACK:
|
case OP_ASSERTBACK:
|
||||||
case OP_ASSERTBACK_NOT:
|
case OP_ASSERTBACK_NOT:
|
||||||
{
|
{
|
||||||
PCRE2_SPTR endasscode = code + GET(code, 1);
|
|
||||||
PCRE2_SIZE local_offsets[2];
|
|
||||||
int rc;
|
int rc;
|
||||||
int local_workspace[1000];
|
int *local_workspace;
|
||||||
|
PCRE2_SIZE *local_offsets;
|
||||||
|
PCRE2_SPTR endasscode = code + GET(code, 1);
|
||||||
|
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||||
|
|
||||||
|
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
|
||||||
|
{
|
||||||
|
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
|
||||||
|
if (rc != 0) return rc;
|
||||||
|
RWS = (int *)rws;
|
||||||
|
}
|
||||||
|
|
||||||
|
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||||
|
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
|
||||||
|
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);
|
while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);
|
||||||
|
|
||||||
|
@ -2600,10 +2697,13 @@ for (;;)
|
||||||
ptr, /* where we currently are */
|
ptr, /* where we currently are */
|
||||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||||
local_offsets, /* offset vector */
|
local_offsets, /* offset vector */
|
||||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
|
||||||
local_workspace, /* workspace vector */
|
local_workspace, /* workspace vector */
|
||||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
RWS_RSIZE, /* size of same */
|
||||||
rlevel); /* function recursion level */
|
rlevel, /* function recursion level */
|
||||||
|
RWS); /* recursion workspace */
|
||||||
|
|
||||||
|
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
|
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
|
||||||
if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
|
if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
|
||||||
|
@ -2670,11 +2770,23 @@ for (;;)
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
PCRE2_SIZE local_offsets[2];
|
|
||||||
int local_workspace[1000];
|
|
||||||
int rc;
|
int rc;
|
||||||
|
int *local_workspace;
|
||||||
|
PCRE2_SIZE *local_offsets;
|
||||||
PCRE2_SPTR asscode = code + LINK_SIZE + 1;
|
PCRE2_SPTR asscode = code + LINK_SIZE + 1;
|
||||||
PCRE2_SPTR endasscode = asscode + GET(asscode, 1);
|
PCRE2_SPTR endasscode = asscode + GET(asscode, 1);
|
||||||
|
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||||
|
|
||||||
|
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
|
||||||
|
{
|
||||||
|
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
|
||||||
|
if (rc != 0) return rc;
|
||||||
|
RWS = (int *)rws;
|
||||||
|
}
|
||||||
|
|
||||||
|
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||||
|
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
|
||||||
|
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);
|
while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);
|
||||||
|
|
||||||
|
@ -2684,10 +2796,13 @@ for (;;)
|
||||||
ptr, /* where we currently are */
|
ptr, /* where we currently are */
|
||||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||||
local_offsets, /* offset vector */
|
local_offsets, /* offset vector */
|
||||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
|
||||||
local_workspace, /* workspace vector */
|
local_workspace, /* workspace vector */
|
||||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
RWS_RSIZE, /* size of same */
|
||||||
rlevel); /* function recursion level */
|
rlevel, /* function recursion level */
|
||||||
|
RWS); /* recursion work space */
|
||||||
|
|
||||||
|
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
|
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
|
||||||
if ((rc >= 0) ==
|
if ((rc >= 0) ==
|
||||||
|
@ -2702,13 +2817,25 @@ for (;;)
|
||||||
/*-----------------------------------------------------------------*/
|
/*-----------------------------------------------------------------*/
|
||||||
case OP_RECURSE:
|
case OP_RECURSE:
|
||||||
{
|
{
|
||||||
|
int rc;
|
||||||
|
int *local_workspace;
|
||||||
|
PCRE2_SIZE *local_offsets;
|
||||||
|
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||||
dfa_recursion_info *ri;
|
dfa_recursion_info *ri;
|
||||||
PCRE2_SIZE local_offsets[1000];
|
|
||||||
int local_workspace[1000];
|
|
||||||
PCRE2_SPTR callpat = start_code + GET(code, 1);
|
PCRE2_SPTR callpat = start_code + GET(code, 1);
|
||||||
uint32_t recno = (callpat == mb->start_code)? 0 :
|
uint32_t recno = (callpat == mb->start_code)? 0 :
|
||||||
GET2(callpat, 1 + LINK_SIZE);
|
GET2(callpat, 1 + LINK_SIZE);
|
||||||
int rc;
|
|
||||||
|
if (rws->free < RWS_RSIZE + RWS_OVEC_RSIZE)
|
||||||
|
{
|
||||||
|
rc = more_workspace(&rws, RWS_OVEC_RSIZE, mb);
|
||||||
|
if (rc != 0) return rc;
|
||||||
|
RWS = (int *)rws;
|
||||||
|
}
|
||||||
|
|
||||||
|
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||||
|
local_workspace = ((int *)local_offsets) + RWS_OVEC_RSIZE;
|
||||||
|
rws->free -= RWS_RSIZE + RWS_OVEC_RSIZE;
|
||||||
|
|
||||||
/* Check for repeating a recursion without advancing the subject
|
/* Check for repeating a recursion without advancing the subject
|
||||||
pointer. This should catch convoluted mutual recursions. (Some simple
|
pointer. This should catch convoluted mutual recursions. (Some simple
|
||||||
|
@ -2732,11 +2859,13 @@ for (;;)
|
||||||
ptr, /* where we currently are */
|
ptr, /* where we currently are */
|
||||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||||
local_offsets, /* offset vector */
|
local_offsets, /* offset vector */
|
||||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
RWS_OVEC_RSIZE/OVEC_UNIT, /* size of same */
|
||||||
local_workspace, /* workspace vector */
|
local_workspace, /* workspace vector */
|
||||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
RWS_RSIZE, /* size of same */
|
||||||
rlevel); /* function recursion level */
|
rlevel, /* function recursion level */
|
||||||
|
RWS); /* recursion workspace */
|
||||||
|
|
||||||
|
rws->free += RWS_RSIZE + RWS_OVEC_RSIZE;
|
||||||
mb->recursive = new_recursive.prevrec; /* Done this recursion */
|
mb->recursive = new_recursive.prevrec; /* Done this recursion */
|
||||||
|
|
||||||
/* Ran out of internal offsets */
|
/* Ran out of internal offsets */
|
||||||
|
@ -2782,10 +2911,25 @@ for (;;)
|
||||||
case OP_SCBRAPOS:
|
case OP_SCBRAPOS:
|
||||||
case OP_BRAPOSZERO:
|
case OP_BRAPOSZERO:
|
||||||
{
|
{
|
||||||
|
int rc;
|
||||||
|
int *local_workspace;
|
||||||
|
PCRE2_SIZE *local_offsets;
|
||||||
PCRE2_SIZE charcount, matched_count;
|
PCRE2_SIZE charcount, matched_count;
|
||||||
PCRE2_SPTR local_ptr = ptr;
|
PCRE2_SPTR local_ptr = ptr;
|
||||||
|
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||||
BOOL allow_zero;
|
BOOL allow_zero;
|
||||||
|
|
||||||
|
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
|
||||||
|
{
|
||||||
|
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
|
||||||
|
if (rc != 0) return rc;
|
||||||
|
RWS = (int *)rws;
|
||||||
|
}
|
||||||
|
|
||||||
|
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||||
|
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
|
||||||
|
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
if (codevalue == OP_BRAPOSZERO)
|
if (codevalue == OP_BRAPOSZERO)
|
||||||
{
|
{
|
||||||
allow_zero = TRUE;
|
allow_zero = TRUE;
|
||||||
|
@ -2798,19 +2942,17 @@ for (;;)
|
||||||
|
|
||||||
for (matched_count = 0;; matched_count++)
|
for (matched_count = 0;; matched_count++)
|
||||||
{
|
{
|
||||||
PCRE2_SIZE local_offsets[2];
|
rc = internal_dfa_match(
|
||||||
int local_workspace[1000];
|
|
||||||
|
|
||||||
int rc = internal_dfa_match(
|
|
||||||
mb, /* fixed match data */
|
mb, /* fixed match data */
|
||||||
code, /* this subexpression's code */
|
code, /* this subexpression's code */
|
||||||
local_ptr, /* where we currently are */
|
local_ptr, /* where we currently are */
|
||||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||||
local_offsets, /* offset vector */
|
local_offsets, /* offset vector */
|
||||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
|
||||||
local_workspace, /* workspace vector */
|
local_workspace, /* workspace vector */
|
||||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
RWS_RSIZE, /* size of same */
|
||||||
rlevel); /* function recursion level */
|
rlevel, /* function recursion level */
|
||||||
|
RWS); /* recursion workspace */
|
||||||
|
|
||||||
/* Failed to match */
|
/* Failed to match */
|
||||||
|
|
||||||
|
@ -2827,6 +2969,8 @@ for (;;)
|
||||||
local_ptr += charcount; /* Advance temporary position ptr */
|
local_ptr += charcount; /* Advance temporary position ptr */
|
||||||
}
|
}
|
||||||
|
|
||||||
|
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
/* At this point we have matched the subpattern matched_count
|
/* At this point we have matched the subpattern matched_count
|
||||||
times, and local_ptr is pointing to the character after the end of the
|
times, and local_ptr is pointing to the character after the end of the
|
||||||
last match. */
|
last match. */
|
||||||
|
@ -2869,19 +3013,35 @@ for (;;)
|
||||||
/*-----------------------------------------------------------------*/
|
/*-----------------------------------------------------------------*/
|
||||||
case OP_ONCE:
|
case OP_ONCE:
|
||||||
{
|
{
|
||||||
PCRE2_SIZE local_offsets[2];
|
int rc;
|
||||||
int local_workspace[1000];
|
int *local_workspace;
|
||||||
|
PCRE2_SIZE *local_offsets;
|
||||||
|
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||||
|
|
||||||
int rc = internal_dfa_match(
|
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
|
||||||
|
{
|
||||||
|
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
|
||||||
|
if (rc != 0) return rc;
|
||||||
|
RWS = (int *)rws;
|
||||||
|
}
|
||||||
|
|
||||||
|
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||||
|
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
|
||||||
|
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
|
rc = internal_dfa_match(
|
||||||
mb, /* fixed match data */
|
mb, /* fixed match data */
|
||||||
code, /* this subexpression's code */
|
code, /* this subexpression's code */
|
||||||
ptr, /* where we currently are */
|
ptr, /* where we currently are */
|
||||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||||
local_offsets, /* offset vector */
|
local_offsets, /* offset vector */
|
||||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
|
||||||
local_workspace, /* workspace vector */
|
local_workspace, /* workspace vector */
|
||||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
RWS_RSIZE, /* size of same */
|
||||||
rlevel); /* function recursion level */
|
rlevel, /* function recursion level */
|
||||||
|
RWS); /* recursion workspace */
|
||||||
|
|
||||||
|
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||||
|
|
||||||
if (rc >= 0)
|
if (rc >= 0)
|
||||||
{
|
{
|
||||||
|
@ -3063,6 +3223,7 @@ pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
|
||||||
PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data,
|
PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data,
|
||||||
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
|
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
|
||||||
{
|
{
|
||||||
|
int rc;
|
||||||
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
||||||
|
|
||||||
PCRE2_SPTR start_match;
|
PCRE2_SPTR start_match;
|
||||||
|
@ -3071,9 +3232,9 @@ PCRE2_SPTR bumpalong_limit;
|
||||||
PCRE2_SPTR req_cu_ptr;
|
PCRE2_SPTR req_cu_ptr;
|
||||||
|
|
||||||
BOOL utf, anchored, startline, firstline;
|
BOOL utf, anchored, startline, firstline;
|
||||||
|
|
||||||
BOOL has_first_cu = FALSE;
|
BOOL has_first_cu = FALSE;
|
||||||
BOOL has_req_cu = FALSE;
|
BOOL has_req_cu = FALSE;
|
||||||
|
|
||||||
PCRE2_UCHAR first_cu = 0;
|
PCRE2_UCHAR first_cu = 0;
|
||||||
PCRE2_UCHAR first_cu2 = 0;
|
PCRE2_UCHAR first_cu2 = 0;
|
||||||
PCRE2_UCHAR req_cu = 0;
|
PCRE2_UCHAR req_cu = 0;
|
||||||
|
@ -3088,6 +3249,17 @@ pcre2_callout_block cb;
|
||||||
dfa_match_block actual_match_block;
|
dfa_match_block actual_match_block;
|
||||||
dfa_match_block *mb = &actual_match_block;
|
dfa_match_block *mb = &actual_match_block;
|
||||||
|
|
||||||
|
/* Set up a starting block of memory for use during recursive calls to
|
||||||
|
internal_dfa_match(). By putting this on the stack, it minimizes resource use
|
||||||
|
in the case when it is not needed. If this is too small, more memory is
|
||||||
|
obtained from the heap. At the start of each block is an anchor structure.*/
|
||||||
|
|
||||||
|
int base_recursion_workspace[RWS_BASE_SIZE];
|
||||||
|
RWS_anchor *rws = (RWS_anchor *)base_recursion_workspace;
|
||||||
|
rws->next = NULL;
|
||||||
|
rws->size = RWS_BASE_SIZE;
|
||||||
|
rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
|
||||||
|
|
||||||
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
||||||
subject string. */
|
subject string. */
|
||||||
|
|
||||||
|
@ -3184,6 +3356,7 @@ if (mcontext == NULL)
|
||||||
mb->memctl = re->memctl;
|
mb->memctl = re->memctl;
|
||||||
mb->match_limit = PRIV(default_match_context).match_limit;
|
mb->match_limit = PRIV(default_match_context).match_limit;
|
||||||
mb->match_limit_depth = PRIV(default_match_context).depth_limit;
|
mb->match_limit_depth = PRIV(default_match_context).depth_limit;
|
||||||
|
mb->heap_limit = PRIV(default_match_context).heap_limit;
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
|
@ -3198,6 +3371,7 @@ else
|
||||||
mb->memctl = mcontext->memctl;
|
mb->memctl = mcontext->memctl;
|
||||||
mb->match_limit = mcontext->match_limit;
|
mb->match_limit = mcontext->match_limit;
|
||||||
mb->match_limit_depth = mcontext->depth_limit;
|
mb->match_limit_depth = mcontext->depth_limit;
|
||||||
|
mb->heap_limit = mcontext->heap_limit;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (mb->match_limit > re->limit_match)
|
if (mb->match_limit > re->limit_match)
|
||||||
|
@ -3206,6 +3380,9 @@ if (mb->match_limit > re->limit_match)
|
||||||
if (mb->match_limit_depth > re->limit_depth)
|
if (mb->match_limit_depth > re->limit_depth)
|
||||||
mb->match_limit_depth = re->limit_depth;
|
mb->match_limit_depth = re->limit_depth;
|
||||||
|
|
||||||
|
if (mb->heap_limit > re->limit_heap)
|
||||||
|
mb->heap_limit = re->limit_heap;
|
||||||
|
|
||||||
mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
|
mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
|
||||||
re->name_count * re->name_entry_size;
|
re->name_count * re->name_entry_size;
|
||||||
mb->tables = re->tables;
|
mb->tables = re->tables;
|
||||||
|
@ -3215,6 +3392,7 @@ mb->start_offset = start_offset;
|
||||||
mb->moptions = options;
|
mb->moptions = options;
|
||||||
mb->poptions = re->overall_options;
|
mb->poptions = re->overall_options;
|
||||||
mb->match_call_count = 0;
|
mb->match_call_count = 0;
|
||||||
|
mb->heap_used = 0;
|
||||||
|
|
||||||
/* Process the \R and newline settings. */
|
/* Process the \R and newline settings. */
|
||||||
|
|
||||||
|
@ -3351,8 +3529,6 @@ a match. */
|
||||||
|
|
||||||
for (;;)
|
for (;;)
|
||||||
{
|
{
|
||||||
int rc;
|
|
||||||
|
|
||||||
/* ----------------- Start of match optimizations ---------------- */
|
/* ----------------- Start of match optimizations ---------------- */
|
||||||
|
|
||||||
/* There are some optimizations that avoid running the match if a known
|
/* There are some optimizations that avoid running the match if a known
|
||||||
|
@ -3544,7 +3720,7 @@ for (;;)
|
||||||
in characters, we treat it as code units to avoid spending too much time
|
in characters, we treat it as code units to avoid spending too much time
|
||||||
in this optimization. */
|
in this optimization. */
|
||||||
|
|
||||||
if (end_subject - start_match < re->minlength) return PCRE2_ERROR_NOMATCH;
|
if (end_subject - start_match < re->minlength) goto NOMATCH_EXIT;
|
||||||
|
|
||||||
/* If req_cu is set, we know that that code unit must appear in the
|
/* If req_cu is set, we know that that code unit must appear in the
|
||||||
subject for the match to succeed. If the first code unit is set, req_cu
|
subject for the match to succeed. If the first code unit is set, req_cu
|
||||||
|
@ -3621,7 +3797,8 @@ for (;;)
|
||||||
(uint32_t)match_data->oveccount * 2, /* actual size of same */
|
(uint32_t)match_data->oveccount * 2, /* actual size of same */
|
||||||
workspace, /* workspace vector */
|
workspace, /* workspace vector */
|
||||||
(int)wscount, /* size of same */
|
(int)wscount, /* size of same */
|
||||||
0); /* function recurse level */
|
0, /* function recurse level */
|
||||||
|
base_recursion_workspace); /* initial workspace for recursion */
|
||||||
|
|
||||||
/* Anything other than "no match" means we are done, always; otherwise, carry
|
/* Anything other than "no match" means we are done, always; otherwise, carry
|
||||||
on only if not anchored. */
|
on only if not anchored. */
|
||||||
|
@ -3637,7 +3814,7 @@ for (;;)
|
||||||
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
|
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
|
||||||
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
|
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
|
||||||
match_data->rc = rc;
|
match_data->rc = rc;
|
||||||
return rc;
|
goto EXIT;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Advance to the next subject character unless we are at the end of a line
|
/* Advance to the next subject character unless we are at the end of a line
|
||||||
|
@ -3668,8 +3845,18 @@ for (;;)
|
||||||
|
|
||||||
} /* "Bumpalong" loop */
|
} /* "Bumpalong" loop */
|
||||||
|
|
||||||
|
NOMATCH_EXIT:
|
||||||
|
rc = PCRE2_ERROR_NOMATCH;
|
||||||
|
|
||||||
return PCRE2_ERROR_NOMATCH;
|
EXIT:
|
||||||
|
while (rws->next != NULL)
|
||||||
|
{
|
||||||
|
RWS_anchor *next = rws->next;
|
||||||
|
rws->next = next->next;
|
||||||
|
mb->memctl.free(next, mb->memctl.memory_data);
|
||||||
|
}
|
||||||
|
|
||||||
|
return rc;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* End of pcre2_dfa_match.c */
|
/* End of pcre2_dfa_match.c */
|
||||||
|
|
|
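The cleanup at the EXIT label above can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the PCRE2 code: `rws_anchor`, `ANCHOR_INTS`, `rws_append`, `rws_release_chain` and `rws_demo` are hypothetical names, the base anchor is a plain stack struct rather than the head of an int vector, and `malloc()`/`free()` stand in for the `mb->memctl` callbacks used in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for RWS_anchor: only the three fields that are
   visible in the diff (next, size, free) are modelled here. */
typedef struct rws_anchor {
  struct rws_anchor *next;  /* next block in the chain, or NULL */
  unsigned int size;        /* total number of ints in this block */
  unsigned int free;        /* number of ints still unused */
} rws_anchor;

/* Number of ints occupied by the anchor at the start of a block. */
#define ANCHOR_INTS \
  ((unsigned int)((sizeof(rws_anchor) + sizeof(int) - 1) / sizeof(int)))

/* Append one heap block of nints ints after the last block of the chain.
   Returns the new block, or NULL if the allocation fails. */
static rws_anchor *rws_append(rws_anchor *tail, unsigned int nints)
{
  rws_anchor *blk;
  assert(tail != NULL && tail->next == NULL && nints > ANCHOR_INTS);
  blk = malloc((size_t)nints * sizeof(int));
  if (blk == NULL) return NULL;
  blk->next = NULL;
  blk->size = nints;
  blk->free = nints - ANCHOR_INTS;
  tail->next = blk;
  return blk;
}

/* Free every block chained after the base block, mirroring the loop at
   EXIT. The base block itself is not freed because it lives on the stack.
   Returns the number of heap blocks released. */
static int rws_release_chain(rws_anchor *base)
{
  int released = 0;
  while (base->next != NULL)
    {
    rws_anchor *next = base->next;
    base->next = next->next;   /* unlink before freeing */
    free(next);
    released++;
    }
  return released;
}

/* Build a chain of nblocks heap blocks behind a stack anchor, then release
   it. Returns the number of blocks freed, or -1 if an allocation failed. */
static int rws_demo(int nblocks)
{
  rws_anchor base = { NULL, ANCHOR_INTS, 0 };
  rws_anchor *tail = &base;
  int i, ok = 1;
  for (i = 0; i < nblocks; i++)
    {
    rws_anchor *blk = rws_append(tail, 1024);
    if (blk == NULL) { ok = 0; break; }
    tail = blk;
    }
  i = rws_release_chain(&base);
  return ok ? i : -1;
}
```

Unlinking each block from the base anchor before freeing it means the loop never reads a pointer out of memory that has already been released, and a single exit path can release every block whichever return code is in use.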
@ -253,6 +253,11 @@ maximum size of this can be limited. */
|
||||||
|
|
||||||
#define START_FRAMES_SIZE 20480
|
#define START_FRAMES_SIZE 20480
|
||||||
|
|
||||||
|
/* Similarly, for DFA matching, an initial internal workspace vector is
|
||||||
|
allocated on the stack. */
|
||||||
|
|
||||||
|
#define DFA_START_RWS_SIZE 30720
|
||||||
|
|
||||||
/* Define the default BSR convention. */
|
/* Define the default BSR convention. */
|
||||||
|
|
||||||
#ifdef BSR_ANYCRLF
|
#ifdef BSR_ANYCRLF
|
||||||
|
|
|
@ -896,6 +896,8 @@ typedef struct dfa_match_block {
|
||||||
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
|
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
|
||||||
const uint8_t *tables; /* Character tables */
|
const uint8_t *tables; /* Character tables */
|
||||||
PCRE2_SIZE start_offset; /* The start offset value */
|
PCRE2_SIZE start_offset; /* The start offset value */
|
||||||
|
PCRE2_SIZE heap_limit; /* As it says */
|
||||||
|
PCRE2_SIZE heap_used; /* As it says */
|
||||||
uint32_t match_limit; /* As it says */
|
uint32_t match_limit; /* As it says */
|
||||||
uint32_t match_limit_depth; /* As it says */
|
uint32_t match_limit_depth; /* As it says */
|
||||||
uint32_t match_call_count; /* Number of calls of internal function */
|
uint32_t match_call_count; /* Number of calls of internal function */
|
||||||
|
|
|
@ -5760,6 +5760,8 @@ PCRE2_SET_HEAP_LIMIT(dat_context, max);
|
||||||
|
|
||||||
for (;;)
|
for (;;)
|
||||||
{
|
{
|
||||||
|
uint32_t stack_start = 0;
|
||||||
|
|
||||||
if (errnumber == PCRE2_ERROR_HEAPLIMIT)
|
if (errnumber == PCRE2_ERROR_HEAPLIMIT)
|
||||||
{
|
{
|
||||||
PCRE2_SET_HEAP_LIMIT(dat_context, mid);
|
PCRE2_SET_HEAP_LIMIT(dat_context, mid);
|
||||||
|
@ -5775,6 +5777,7 @@ for (;;)
|
||||||
|
|
||||||
if ((dat_datctl.control & CTL_DFA) != 0)
|
if ((dat_datctl.control & CTL_DFA) != 0)
|
||||||
{
|
{
|
||||||
|
stack_start = DFA_START_RWS_SIZE/1024;
|
||||||
if (dfa_workspace == NULL)
|
if (dfa_workspace == NULL)
|
||||||
dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
|
dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
|
||||||
if (dfa_matched++ == 0)
|
if (dfa_matched++ == 0)
|
||||||
|
@ -5789,11 +5792,21 @@ for (;;)
|
||||||
dat_datctl.options, match_data, PTR(dat_context));
|
dat_datctl.options, match_data, PTR(dat_context));
|
||||||
|
|
||||||
else
|
else
|
||||||
|
{
|
||||||
|
stack_start = START_FRAMES_SIZE/1024;
|
||||||
PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
|
PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
|
||||||
dat_datctl.options, match_data, PTR(dat_context));
|
dat_datctl.options, match_data, PTR(dat_context));
|
||||||
|
}
|
||||||
|
|
||||||
if (capcount == errnumber)
|
if (capcount == errnumber)
|
||||||
{
|
{
|
||||||
|
if ((mid & 0x80000000u) != 0)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "Can't find minimum %s limit: check pattern for "
|
||||||
|
"restriction\n", msg);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
min = mid;
|
min = mid;
|
||||||
mid = (mid == max - 1)? max : (max != UINT32_MAX)? (min + max)/2 : mid*2;
|
mid = (mid == max - 1)? max : (max != UINT32_MAX)? (min + max)/2 : mid*2;
|
||||||
}
|
}
|
||||||
|
@ -5802,11 +5815,12 @@ for (;;)
|
||||||
capcount == PCRE2_ERROR_PARTIAL)
|
capcount == PCRE2_ERROR_PARTIAL)
|
||||||
{
|
{
|
||||||
/* If we've not hit the error with a heap limit less than the size of the
|
/* If we've not hit the error with a heap limit less than the size of the
|
||||||
initial stack frame vector, the heap is not being used, so the minimum
|
initial stack frame vector (for pcre2_match()) or the initial stack
|
||||||
limit is zero; there's no need to go on. The other limits are always
|
workspace vector (for pcre2_dfa_match()), the heap is not being used, so
|
||||||
greater than zero. */
|
the minimum limit is zero; there's no need to go on. The other limits are
|
||||||
|
always greater than zero. */
|
||||||
|
|
||||||
if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < START_FRAMES_SIZE/1024)
|
if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < stack_start)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "Minimum %s limit = 0\n", msg);
|
fprintf(outfile, "Minimum %s limit = 0\n", msg);
|
||||||
break;
|
break;
|
||||||
|
@ -6771,7 +6785,7 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
|
||||||
PCRE2_SIZE end = pmatch[i].rm_eo;
|
PCRE2_SIZE end = pmatch[i].rm_eo;
|
||||||
for (j = last_printed + 1; j < i; j++)
|
for (j = last_printed + 1; j < i; j++)
|
||||||
fprintf(outfile, "%2d: <unset>\n", (int)j);
|
fprintf(outfile, "%2d: <unset>\n", (int)j);
|
||||||
last_printed = i;
|
last_printed = i;
|
||||||
if (start > end)
|
if (start > end)
|
||||||
{
|
{
|
||||||
start = pmatch[i].rm_eo;
|
start = pmatch[i].rm_eo;
|
||||||
|
@ -7139,18 +7153,16 @@ else for (gmatched = 0;; gmatched++)
|
||||||
(double)CLOCKS_PER_SEC);
|
(double)CLOCKS_PER_SEC);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Find the heap, match and depth limits if requested. The match and heap
|
/* Find the heap, match and depth limits if requested. The depth and heap
|
||||||
limits are not relevant for DFA matching and the depth and heap limits are
|
limits are not relevant for JIT. The return from check_match_limit() is the
|
||||||
not relevant for JIT. The return from check_match_limit() is the return from
|
return from the final call to pcre2_match() or pcre2_dfa_match(). */
|
||||||
the final call to pcre2_match() or pcre2_dfa_match(). */
|
|
||||||
|
|
||||||
if ((dat_datctl.control & CTL_FINDLIMITS) != 0)
|
if ((dat_datctl.control & CTL_FINDLIMITS) != 0)
|
||||||
{
|
{
|
||||||
capcount = 0; /* This stops compiler warnings */
|
capcount = 0; /* This stops compiler warnings */
|
||||||
|
|
||||||
if ((dat_datctl.control & CTL_DFA) == 0 &&
|
if (FLD(compiled_code, executable_jit) == NULL ||
|
||||||
(FLD(compiled_code, executable_jit) == NULL ||
|
(dat_datctl.options & PCRE2_NO_JIT) != 0)
|
||||||
(dat_datctl.options & PCRE2_NO_JIT) != 0))
|
|
||||||
{
|
{
|
||||||
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
|
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
|
||||||
}
|
}
|
||||||
|
@ -7165,6 +7177,12 @@ else for (gmatched = 0;; gmatched++)
|
||||||
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_DEPTHLIMIT,
|
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_DEPTHLIMIT,
|
||||||
"depth");
|
"depth");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (capcount == 0)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "Matched, but offsets vector is too small to show all matches\n");
|
||||||
|
capcount = dat_datctl.oveccount;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Otherwise just run a single match, setting up a callout if required (the
|
/* Otherwise just run a single match, setting up a callout if required (the
|
||||||
|
@ -7877,7 +7895,7 @@ else
|
||||||
(void)PCRE2_CONFIG(PCRE2_CONFIG_NEWLINE, &optval);
|
(void)PCRE2_CONFIG(PCRE2_CONFIG_NEWLINE, &optval);
|
||||||
print_newline_config(optval, FALSE);
|
print_newline_config(optval, FALSE);
|
||||||
(void)PCRE2_CONFIG(PCRE2_CONFIG_BSR, &optval);
|
(void)PCRE2_CONFIG(PCRE2_CONFIG_BSR, &optval);
|
||||||
printf(" \\R matches %s\n",
|
printf(" \\R matches %s\n",
|
||||||
(optval == PCRE2_BSR_ANYCRLF)? "CR, LF, or CRLF only" :
|
(optval == PCRE2_BSR_ANYCRLF)? "CR, LF, or CRLF only" :
|
||||||
"all Unicode newlines");
|
"all Unicode newlines");
|
||||||
(void)PCRE2_CONFIG(PCRE2_CONFIG_NEVER_BACKSLASH_C, &optval);
|
(void)PCRE2_CONFIG(PCRE2_CONFIG_NEVER_BACKSLASH_C, &optval);
|
||||||
|
|
|
@ -4874,6 +4874,14 @@
|
||||||
\= Expect depth limit exceeded
|
\= Expect depth limit exceeded
|
||||||
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||||
|
|
||||||
|
/(*LIMIT_HEAP=0)^((.)(?1)|.)$/
|
||||||
|
\= Expect heap limit exceeded
|
||||||
|
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||||
|
|
||||||
|
/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/
|
||||||
|
\= Expect success
|
||||||
|
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||||
|
|
||||||
/(02-)?[0-9]{3}-[0-9]{3}/
|
/(02-)?[0-9]{3}-[0-9]{3}/
|
||||||
02-123-123
|
02-123-123
|
||||||
|
|
||||||
|
|
|
@ -7667,12 +7667,23 @@ No match
|
||||||
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||||
Failed: error -53: matching depth limit exceeded
|
Failed: error -53: matching depth limit exceeded
|
||||||
|
|
||||||
|
/(*LIMIT_HEAP=0)^((.)(?1)|.)$/
|
||||||
|
\= Expect heap limit exceeded
|
||||||
|
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||||
|
Failed: error -63: heap limit exceeded
|
||||||
|
|
||||||
|
/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/
|
||||||
|
\= Expect success
|
||||||
|
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||||
|
0: a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||||
|
|
||||||
/(02-)?[0-9]{3}-[0-9]{3}/
|
/(02-)?[0-9]{3}-[0-9]{3}/
|
||||||
02-123-123
|
02-123-123
|
||||||
0: 02-123-123
|
0: 02-123-123
|
||||||
|
|
||||||
/^(a(?2))(b)(?1)/
|
/^(a(?2))(b)(?1)/
|
||||||
abbab\=find_limits
|
abbab\=find_limits
|
||||||
|
Minimum heap limit = 0
|
||||||
Minimum match limit = 4
|
Minimum match limit = 4
|
||||||
Minimum depth limit = 2
|
Minimum depth limit = 2
|
||||||
0: abbab
|
0: abbab
|
||||||
|
|