Re-factor pcre2_dfa_match() to use the heap instead of the stack for workspace
vectors when doing recursive function calls.
parent fb413521fc
commit 75747ebb11
10
ChangeLog
|
@@ -50,7 +50,15 @@ offset is set zero for early errors.
|
|||
|
||||
(c) Support for non-C99 snprintf() that returns -1 in the overflow case.
|
||||
|
||||
11. Minor tidy of pcre2_dfa_matgch() code.
|
||||
11. Minor tidy of pcre2_dfa_match() code.
|
||||
|
||||
12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
|
||||
use the stack for local workspace and local ovectors. Instead, an initial block
|
||||
of stack is reserved, but if this is insufficient, heap memory is used. The
|
||||
heap limit parameter now applies to pcre2_dfa_match().
|
||||
|
||||
13. If a "find limits" test of DFA matching in pcre2test resulted in too many
|
||||
matches for the ovector, no matches were displayed.
|
||||
|
||||
|
||||
Version 10.31 12-February-2018
|
||||
|
|
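ChangeLog item 12 above describes a reserve-then-fall-back scheme: an initial block is set aside on the stack, and heap memory is used only when that block is insufficient, subject to the heap limit. The following is a minimal C sketch of that idea. It is not the code from pcre2_dfa_match(); the names workspace_pool, pool_init, pool_get and pool_release, and the size INITIAL_INTS, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define INITIAL_INTS 64  /* hypothetical size of the reserved stack block */

/* Hypothetical per-match bookkeeping. When a workspace_pool is declared as a
   local variable, stack_block is the "initial block of stack". */
typedef struct {
    int    stack_block[INITIAL_INTS];
    size_t stack_used;   /* ints already handed out from stack_block */
    size_t heap_used;    /* bytes currently held on the heap */
    size_t heap_limit;   /* heap limit, converted from kilobytes to bytes */
} workspace_pool;

/* One workspace vector handed to a nested call. */
typedef struct {
    int   *ptr;
    size_t count;
    int    on_heap;
} workspace;

static void pool_init(workspace_pool *p, size_t heap_limit_kib)
{
    p->stack_used = 0;
    p->heap_used = 0;
    /* Saturate rather than overflow when converting kilobytes to bytes. */
    p->heap_limit = (heap_limit_kib > SIZE_MAX / 1024) ? SIZE_MAX
                                                       : heap_limit_kib * 1024;
}

/* Returns 0 on success, or -1 if the request does not fit in the stack block
   and the heap limit (or malloc itself) refuses it. */
static int pool_get(workspace_pool *p, size_t count, workspace *out)
{
    assert(p != NULL && out != NULL);
    out->count = count;

    /* First choice: carve the vector out of the reserved stack block. */
    if (count <= INITIAL_INTS - p->stack_used) {
        out->ptr = p->stack_block + p->stack_used;
        out->on_heap = 0;
        p->stack_used += count;
        return 0;
    }

    /* Fallback: use the heap, but only within the remaining allowance. */
    if (count > (p->heap_limit - p->heap_used) / sizeof(int))
        return -1;
    out->ptr = malloc(count * sizeof(int));
    if (out->ptr == NULL)
        return -1;
    out->on_heap = 1;
    p->heap_used += count * sizeof(int);
    return 0;
}

/* Must be called in reverse order of pool_get, as nested calls naturally do. */
static void pool_release(workspace_pool *p, workspace *w)
{
    if (w->on_heap) {
        free(w->ptr);
        p->heap_used -= w->count * sizeof(int);
    } else {
        p->stack_used -= w->count;
    }
    w->ptr = NULL;
}
```

In this sketch a nested call would obtain its vector with pool_get() on entry and hand it back with pool_release() on exit. A heap limit of zero means every request that does not fit in the reserved block fails, which mirrors the documented behaviour that a zero limit disables use of the heap.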
12
README
|
@@ -241,9 +241,11 @@ library. They are also documented in the pcre2build man page.
|
|||
discussion in the pcre2api man page (search for pcre2_set_match_limit).
|
||||
|
||||
. There is a separate counter that limits the depth of nested backtracking
|
||||
during a matching process, which indirectly limits the amount of heap memory
|
||||
that is used. This also has a default of ten million, which is essentially
|
||||
"unlimited". You can change the default by setting, for example,
|
||||
(pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
|
||||
matching process, which indirectly limits the amount of heap memory that is
|
||||
used, and in the case of pcre2_dfa_match() the amount of stack as well. This
|
||||
counter also has a default of ten million, which is essentially "unlimited".
|
||||
You can change the default by setting, for example,
|
||||
|
||||
--with-match-limit-depth=5000
|
||||
|
||||
|
@@ -251,7 +253,7 @@ library. They are also documented in the pcre2build man page.
|
|||
pcre2_set_depth_limit).
|
||||
|
||||
. You can also set an explicit limit on the amount of heap memory used by
|
||||
the pcre2_match() interpreter:
|
||||
the pcre2_match() and pcre2_dfa_match() interpreters:
|
||||
|
||||
--with-heap-limit=500
|
||||
|
||||
|
@@ -885,4 +887,4 @@ The distribution should contain the files listed below.
|
|||
Philip Hazel
|
||||
Email local part: ph10
|
||||
Email domain: cam.ac.uk
|
||||
Last updated: 25 February 2018
|
||||
Last updated: 27 April 2018
|
||||
|
|
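The README text above says that the depth counter limits nested function calls in pcre2_dfa_match(). The general technique, a depth value checked against a limit on entry to each nested call, can be sketched in C as below. This is an illustration only: the walker type, walk(), walk_group() and the SKETCH_* error codes are hypothetical, and the input is a toy string of nested parentheses rather than a compiled pattern.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ERROR_DEPTHLIMIT (-1)  /* hypothetical: nesting exceeded limit */
#define SKETCH_ERROR_SYNTAX     (-2)  /* hypothetical: unbalanced parentheses */

typedef struct {
    const char *p;           /* current position in the input */
    uint32_t    depth_limit; /* maximum permitted depth of nested calls */
} walker;

/* Consume one balanced group whose '(' is at w->p. The depth argument is the
   nesting level of this call, starting at 1 for a top-level group. */
static int walk_group(walker *w, uint32_t depth)
{
    if (depth > w->depth_limit)
        return SKETCH_ERROR_DEPTHLIMIT;   /* refuse to nest any deeper */

    w->p++;                               /* skip '(' */
    while (*w->p != ')') {
        if (*w->p == '\0')
            return SKETCH_ERROR_SYNTAX;
        if (*w->p == '(') {
            int rc = walk_group(w, depth + 1);   /* nested function call */
            if (rc != 0)
                return rc;
        } else {
            w->p++;
        }
    }
    w->p++;                               /* skip ')' */
    return 0;
}

/* Walk a whole string; returns 0, or one of the SKETCH_* codes. */
static int walk(const char *s, uint32_t depth_limit)
{
    walker w = { s, depth_limit };
    while (*w.p != '\0') {
        if (*w.p == '(') {
            int rc = walk_group(&w, 1);
            if (rc != 0)
                return rc;
        } else if (*w.p == ')') {
            return SKETCH_ERROR_SYNTAX;
        } else {
            w.p++;
        }
    }
    return 0;
}
```

Because each nested call consumes a stack frame, capping the depth indirectly caps stack use. If each call also allocates workspace, the same cap indirectly bounds that memory as well.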
19
configure.ac
|
@@ -718,10 +718,11 @@ AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
|
|||
AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [
|
||||
The value of MATCH_LIMIT determines the default number of times the
|
||||
pcre2_match() function can record a backtrack position during a single
|
||||
matching attempt. There is a runtime interface for setting a different limit.
|
||||
The limit exists in order to catch runaway regular expressions that take for
|
||||
ever to determine that they do not match. The default is set very large so
|
||||
that it does not accidentally catch legitimate cases.])
|
||||
matching attempt. The value is also used to limit a loop counter in
|
||||
pcre2_dfa_match(). There is a runtime interface for setting a different
|
||||
limit. The limit exists in order to catch runaway regular expressions that
|
||||
take for ever to determine that they do not match. The default is set very
|
||||
large so that it does not accidentally catch legitimate cases.])
|
||||
|
||||
# --with-match-limit-recursion is an obsolete synonym for --with-match-limit-depth
|
||||
|
||||
|
@@ -745,11 +746,15 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
|
|||
the maximum amount of heap memory that is used. The value of
|
||||
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it must
|
||||
be less than the value of MATCH_LIMIT. The default is to use the same value
|
||||
as MATCH_LIMIT. There is a runtime method for setting a different limit.])
|
||||
as MATCH_LIMIT. There is a runtime method for setting a different limit. In
|
||||
the case of pcre2_dfa_match(), this limit controls the depth of the internal
|
||||
nested function calls that are used for pattern recursions, lookarounds, and
|
||||
atomic groups.])
|
||||
|
||||
AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
|
||||
This limits the amount of memory that pcre2_match() may use while matching
|
||||
a pattern. The value is in kilobytes.])
|
||||
This limits the amount of memory that may be used while matching
|
||||
a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
|
||||
not apply to JIT matching. The value is in kilobytes.])
|
||||
|
||||
AC_DEFINE([MAX_NAME_SIZE], [32], [
|
||||
This limit is parameterized just in case anybody ever wants to
|
||||
|
|
|
@@ -10,6 +10,7 @@ This document contains the following sections:
|
|||
Calling conventions in Windows environments
|
||||
Comments about Win32 builds
|
||||
Building PCRE2 on Windows with CMake
|
||||
Building PCRE2 on Windows with Visual Studio
|
||||
Testing with RunTest.bat
|
||||
Building PCRE2 on native z/OS and z/VM
|
||||
|
||||
|
@@ -330,6 +331,18 @@ cache can be deleted by selecting "File > Delete Cache".
|
|||
available for review in Testing\Temporary under your build dir.
|
||||
|
||||
|
||||
BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO
|
||||
|
||||
The code currently cannot be compiled without a stdint.h header, which is
|
||||
available only in relatively recent versions of Visual Studio. However, this
|
||||
portable and permissively-licensed implementation of the header worked without
|
||||
issue:
|
||||
|
||||
http://www.azillionmonkeys.com/qed/pstdint.h
|
||||
|
||||
Just rename it and drop it into the top level of the build tree.
|
||||
|
||||
|
||||
TESTING WITH RUNTEST.BAT
|
||||
|
||||
If configured with CMake, building the test project ("make test" or building
|
||||
|
@@ -382,6 +395,6 @@ Everything in that location, source and executable, is in EBCDIC and native
|
|||
z/OS file formats. The port provides an API for LE languages such as COBOL and
|
||||
for the z/OS and z/VM versions of the Rexx languages.
|
||||
|
||||
===============================
|
||||
Last Updated: 13 September 2017
|
||||
===============================
|
||||
===========================
|
||||
Last Updated: 19 April 2018
|
||||
===========================
|
||||
|
|
|
@@ -241,9 +241,11 @@ library. They are also documented in the pcre2build man page.
|
|||
discussion in the pcre2api man page (search for pcre2_set_match_limit).
|
||||
|
||||
. There is a separate counter that limits the depth of nested backtracking
|
||||
during a matching process, which indirectly limits the amount of heap memory
|
||||
that is used. This also has a default of ten million, which is essentially
|
||||
"unlimited". You can change the default by setting, for example,
|
||||
(pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
|
||||
matching process, which indirectly limits the amount of heap memory that is
|
||||
used, and in the case of pcre2_dfa_match() the amount of stack as well. This
|
||||
counter also has a default of ten million, which is essentially "unlimited".
|
||||
You can change the default by setting, for example,
|
||||
|
||||
--with-match-limit-depth=5000
|
||||
|
||||
|
@@ -251,7 +253,7 @@ library. They are also documented in the pcre2build man page.
|
|||
pcre2_set_depth_limit).
|
||||
|
||||
. You can also set an explicit limit on the amount of heap memory used by
|
||||
the pcre2_match() interpreter:
|
||||
the pcre2_match() and pcre2_dfa_match() interpreters:
|
||||
|
||||
--with-heap-limit=500
|
||||
|
||||
|
@@ -885,4 +887,4 @@ The distribution should contain the files listed below.
|
|||
Philip Hazel
|
||||
Email local part: ph10
|
||||
Email domain: cam.ac.uk
|
||||
Last updated: 25 February 2018
|
||||
Last updated: 27 April 2018
|
||||
|
|
|
@@ -46,9 +46,9 @@ just once (except when processing lookaround assertions). This function is
|
|||
<i>wscount</i> Number of elements in the vector
|
||||
</pre>
|
||||
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
|
||||
up a callout function or specify the match and/or the recursion depth limits.
|
||||
The <i>length</i> and <i>startoffset</i> values are code units, not characters.
|
||||
The options are:
|
||||
up a callout function or specify the heap limit or the match or the recursion
|
||||
depth limits. The <i>length</i> and <i>startoffset</i> values are code units, not
|
||||
characters. The options are:
|
||||
<pre>
|
||||
PCRE2_ANCHORED Match only at the first position
|
||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||
|
|
|
@@ -951,14 +951,15 @@ offset limit. In other words, whichever limit comes first is used.
|
|||
<br>
|
||||
The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
|
||||
amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
|
||||
information when running an interpretive match. This limit does not apply to
|
||||
matching with the JIT optimization, which has its own memory control
|
||||
arrangements (see the
|
||||
information when running an interpretive match. This limit also applies to
|
||||
<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a
|
||||
lot of nested pattern recursion or lookarounds or atomic groups. This limit
|
||||
does not apply to matching with the JIT optimization, which has its own memory
|
||||
control arrangements (see the
|
||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||
documentation for more details), nor does it apply to <b>pcre2_dfa_match()</b>.
|
||||
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
|
||||
returned. The default limit is set when PCRE2 is built; the default default is
|
||||
very large and is essentially "unlimited".
|
||||
documentation for more details). If the limit is reached, the negative error
|
||||
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
|
||||
built; the default default is very large and is essentially "unlimited".
|
||||
</P>
|
||||
<P>
|
||||
A value for the heap limit may also be supplied by an item at the start of a
|
||||
|
@@ -978,6 +979,12 @@ Heap memory is used only if the initial vector is too small. If the heap limit
|
|||
is set to a value less than 21 (in particular, zero) no heap memory will be
|
||||
used. In this case, only patterns that do not have a lot of nested backtracking
|
||||
can be successfully processed.
|
||||
</P>
|
||||
<P>
|
||||
Similarly, for <b>pcre2_dfa_match()</b>, a vector on the system stack is used
|
||||
when processing pattern recursions, lookarounds, or atomic groups, and only if
|
||||
this is not big enough is heap memory used. In this case, too, setting a value
|
||||
of zero disables the use of the heap.
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
||||
|
@@ -1035,11 +1042,22 @@ backtracking.
|
|||
<P>
|
||||
The depth limit is not relevant, and is ignored, when matching is done using
|
||||
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
|
||||
uses it to limit the depth of internal recursive function calls that implement
|
||||
atomic groups, lookaround assertions, and pattern recursions. This is,
|
||||
therefore, an indirect limit on the amount of system stack that is used. A
|
||||
recursive pattern such as /(.)(?1)/, when matched to a very long string using
|
||||
<b>pcre2_dfa_match()</b>, can use a great deal of stack.
|
||||
uses it to limit the depth of nested internal recursive function calls that
|
||||
implement atomic groups, lookaround assertions, and pattern recursions. This
|
||||
limits, indirectly, the amount of system stack this is used. It was more useful
|
||||
in versions before 10.32, when stack memory was used for local workspace
|
||||
vectors for recursive function calls. From version 10.32, only local variables
|
||||
are allocated on the stack and as each call uses only a few hundred bytes, even
|
||||
a small stack can support quite a lot of recursion.
|
||||
</P>
|
||||
<P>
|
||||
If the depth of internal recursive function calls is great enough, local
|
||||
workspace vectors are allocated on the heap from version 10.32 onwards, so the
|
||||
depth limit also indirectly limits the amount of heap memory that is used. A
|
||||
recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
|
||||
using <b>pcre2_dfa_match()</b>, can use a great deal of memory. However, it is
|
||||
probably better to limit heap usage directly by calling
|
||||
<b>pcre2_set_heap_limit()</b>.
|
||||
</P>
|
||||
<P>
|
||||
The default value for the depth limit can be set when PCRE2 is built; the
|
||||
|
@@ -1096,15 +1114,16 @@ and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively.
|
|||
PCRE2_CONFIG_DEPTHLIMIT
|
||||
</pre>
|
||||
The output is a uint32_t integer that gives the default limit for the depth of
|
||||
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
|
||||
and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
|
||||
<b>pcre2_set_depth_limit()</b> above.
|
||||
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions,
|
||||
lookarounds, and atomic groups in <b>pcre2_dfa_match()</b>. Further details are
|
||||
given with <b>pcre2_set_depth_limit()</b> above.
|
||||
<pre>
|
||||
PCRE2_CONFIG_HEAPLIMIT
|
||||
</pre>
|
||||
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
||||
for the amount of heap memory used by <b>pcre2_match()</b>. Further details are
|
||||
given with <b>pcre2_set_heap_limit()</b> above.
|
||||
for the amount of heap memory used by <b>pcre2_match()</b> or
|
||||
<b>pcre2_dfa_match()</b>. Further details are given with
|
||||
<b>pcre2_set_heap_limit()</b> above.
|
||||
<pre>
|
||||
PCRE2_CONFIG_JIT
|
||||
</pre>
|
||||
|
@@ -3510,17 +3529,7 @@ capture.
|
|||
Calls to the convenience functions that extract substrings by name
|
||||
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
|
||||
DFA match. The convenience functions that extract substrings by number never
|
||||
return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
|
||||
slightly different:
|
||||
<pre>
|
||||
PCRE2_ERROR_UNAVAILABLE
|
||||
</pre>
|
||||
The ovector is not big enough to include a slot for the given substring number.
|
||||
<pre>
|
||||
PCRE2_ERROR_UNSET
|
||||
</pre>
|
||||
There is a slot in the ovector for this substring, but there were insufficient
|
||||
matches to fill it.
|
||||
return PCRE2_ERROR_NOSUBSTRING.
|
||||
</P>
|
||||
<P>
|
||||
The matched strings are stored in the ovector in reverse order of length; that
|
||||
|
@@ -3594,9 +3603,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 31 December 2017
|
||||
Last updated: 27 April 2018
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@@ -295,9 +295,10 @@ change this by a setting such as
|
|||
--with-heap-limit=500
|
||||
</pre>
|
||||
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
||||
interpretive matching in pcre2_match(). It does not apply when JIT (which has
|
||||
its own memory arrangements) is used, nor does it apply to
|
||||
<b>pcre2_dfa_match()</b>.
|
||||
interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
|
||||
may also use the heap for internal workspace when processing complicated
|
||||
patterns. This limit does not apply when JIT (which has its own memory
|
||||
arrangements) is used.
|
||||
</P>
|
||||
<P>
|
||||
You can also explicitly limit the depth of nested backtracking in the
|
||||
|
@@ -573,7 +574,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 25 February 2018
|
||||
Last updated: 26 April 2018
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@@ -310,10 +310,12 @@ PCRE2_UNSET.
|
|||
</P>
|
||||
<P>
|
||||
For DFA matching, the <i>offset_vector</i> field points to the ovector that was
|
||||
passed to the matching function in the match data block, but it holds no useful
|
||||
information at callout time because <b>pcre2_dfa_match()</b> does not support
|
||||
substring capturing. The value of <i>capture_top</i> is always 1 and the value
|
||||
of <i>capture_last</i> is always 0 for DFA matching.
|
||||
passed to the matching function in the match data block for callouts at the top
|
||||
level, but to an internal ovector during the processing of pattern recursions,
|
||||
lookarounds, and atomic groups. However, these ovectors hold no useful
|
||||
information because <b>pcre2_dfa_match()</b> does not support substring
|
||||
capturing. The value of <i>capture_top</i> is always 1 and the value of
|
||||
<i>capture_last</i> is always 0 for DFA matching.
|
||||
</P>
|
||||
<P>
|
||||
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
|
||||
|
@@ -461,9 +463,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 22 December 2017
|
||||
Last updated: 26 April 2018
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@@ -173,12 +173,12 @@ the application to apply the JIT optimization by calling
|
|||
Setting match resource limits
|
||||
</b><br>
|
||||
<P>
|
||||
The pcre2_match() function contains a counter that is incremented every time it
|
||||
goes round its main loop. The caller of <b>pcre2_match()</b> can set a limit on
|
||||
this counter, which therefore limits the amount of computing resource used for
|
||||
a match. The maximum depth of nested backtracking can also be limited; this
|
||||
indirectly restricts the amount of heap memory that is used, but there is also
|
||||
an explicit memory limit that can be set.
|
||||
The <b>pcre2_match()</b> function contains a counter that is incremented every
|
||||
time it goes round its main loop. The caller of <b>pcre2_match()</b> can set a
|
||||
limit on this counter, which therefore limits the amount of computing resource
|
||||
used for a match. The maximum depth of nested backtracking can also be limited;
|
||||
this indirectly restricts the amount of heap memory that is used, but there is
|
||||
also an explicit memory limit that can be set.
|
||||
</P>
|
||||
<P>
|
||||
These facilities are provided to catch runaway matches that are provoked by
|
||||
|
@@ -195,20 +195,22 @@ where d is any number of decimal digits. However, the value of the setting must
|
|||
be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
|
||||
for it to have any effect. In other words, the pattern writer can lower the
|
||||
limits set by the programmer, but not raise them. If there is more than one
|
||||
setting of one of these limits, the lower value is used.
|
||||
setting of one of these limits, the lower value is used. The heap limit is
|
||||
specified in kilobytes.
|
||||
</P>
|
||||
<P>
|
||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
||||
still recognized for backwards compatibility.
|
||||
</P>
|
||||
<P>
|
||||
The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
|
||||
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
||||
(but in a different way) when JIT is being used, or when
|
||||
<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
|
||||
matching functions. The depth limit is ignored by JIT but is relevant for DFA
|
||||
matching, which uses function recursion for recursions within the pattern. In
|
||||
this case, the depth limit controls the amount of system stack that is used.
|
||||
The heap limit applies only when the <b>pcre2_match()</b> or
|
||||
<b>pcre2_dfa_match()</b> interpreters are used for matching. It does not apply
|
||||
to JIT. The match limit is used (but in a different way) when JIT is being
|
||||
used, or when <b>pcre2_dfa_match()</b> is called, to limit computing resource
|
||||
usage by those matching functions. The depth limit is ignored by JIT but is
|
||||
relevant for DFA matching, which uses function recursion for recursions within
|
||||
the pattern and for lookaround assertions and atomic groups. In this case, the
|
||||
depth limit controls the depth of such recursion.
|
||||
<a name="newlines"></a></P>
|
||||
<br><b>
|
||||
Newline conventions
|
||||
|
@@ -2818,11 +2820,6 @@ matched at the top level, its final captured value is unset, even if it was
|
|||
(temporarily) set at a deeper level during the matching process.
|
||||
</P>
|
||||
<P>
|
||||
If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
|
||||
obtain extra memory from the heap to store data during a recursion. If no
|
||||
memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
|
||||
</P>
|
||||
<P>
|
||||
Do not confuse the (?R) item with the condition (R), which tests for recursion.
|
||||
Consider this pattern, which matches text in angle brackets, allowing for
|
||||
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
|
||||
|
@@ -3479,9 +3476,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 12 September 2017
|
||||
Last updated: 25 April 2018
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@@ -93,9 +93,17 @@ may also reduce the memory requirements.
|
|||
<P>
|
||||
In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
|
||||
function calls, but only for processing atomic groups, lookaround assertions,
|
||||
and recursion within the pattern. Too much nested recursion may cause stack
|
||||
issues. The "match depth" parameter can be used to limit the depth of function
|
||||
recursion in <b>pcre2_dfa_match()</b>.
|
||||
and recursion within the pattern. The original version of the code used to
|
||||
allocate quite large internal workspace vectors on the stack, which caused some
|
||||
problems for some patterns in environments with small stacks. From release
|
||||
10.32 the code for <b>pcre2_dfa_match()</b> has been re-factored to use heap
|
||||
memory when necessary for internal workspace when recursing, though recursive
|
||||
function calls are still used.
|
||||
</P>
|
||||
<P>
|
||||
The "match depth" parameter can be used to limit the depth of function
|
||||
recursion, and the "match heap" parameter to limit heap memory in
|
||||
<b>pcre2_dfa_match()</b>.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
|
||||
<P>
|
||||
|
@@ -244,9 +252,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 08 April 2017
|
||||
Last updated: 25 April 2018
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@@ -1199,7 +1199,7 @@ pattern.
|
|||
get=<number or name> extract captured substring
|
||||
getall extract all captured substrings
|
||||
/g global global matching
|
||||
heap_limit=<n> set a limit on heap memory
|
||||
heap_limit=<n> set a limit on heap memory (Kbytes)
|
||||
jitstack=<n> set size of JIT stack
|
||||
mark show mark values
|
||||
match_limit=<n> set a match limit
|
||||
|
@@ -1438,20 +1438,17 @@ Finding minimum limits
|
|||
<P>
|
||||
If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
|
||||
calls the relevant matching function several times, setting different values in
|
||||
the match context via <b>pcre2_set_heap_limit(), \fBpcre2_set_match_limit()</b>,
|
||||
or <b>pcre2_set_depth_limit()</b> until it finds the minimum values for each
|
||||
parameter that allows the match to complete without error.
|
||||
the match context via <b>pcre2_set_heap_limit()</b>,
|
||||
<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
|
||||
the minimum values for each parameter that allows the match to complete without
|
||||
error. If JIT is being used, only the match limit is relevant.
|
||||
</P>
|
||||
<P>
|
||||
If JIT is being used, only the match limit is relevant. If DFA matching is
|
||||
being used, only the depth limit is relevant.
|
||||
</P>
|
||||
<P>
|
||||
The <i>match_limit</i> number is a measure of the amount of backtracking
|
||||
that takes place, and learning the minimum value can be instructive. For most
|
||||
simple matches, the number is quite small, but for patterns with very large
|
||||
numbers of matching possibilities, it can become large very quickly with
|
||||
increasing length of subject string.
|
||||
When using this modifier, the pattern should not contain any limit settings
|
||||
such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
|
||||
lower than the minimum matching value, the minimum value cannot be found
|
||||
because <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of
|
||||
an in-pattern limit; they cannot increase it.
|
||||
</P>
|
||||
<P>
|
||||
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
|
||||
|
@@ -1460,6 +1457,22 @@ searched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
|
|||
recursive calls of the internal function that is used for handling pattern
|
||||
recursion, lookaround assertions, and atomic groups.
|
||||
</P>
|
||||
<P>
|
||||
For non-DFA matching, the <i>match_limit</i> number is a measure of the amount
|
||||
of backtracking that takes place, and learning the minimum value can be
|
||||
instructive. For most simple matches, the number is quite small, but for
|
||||
patterns with very large numbers of matching possibilities, it can become large
|
||||
very quickly with increasing length of subject string. In the case of DFA
|
||||
matching, <i>match_limit</i> controls the total number of calls, both recursive
|
||||
and non-recursive, to the internal matching function, thus controlling the
|
||||
overall amount of computing resource that is used.
|
||||
</P>
|
||||
<P>
|
||||
For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes)
|
||||
limits the amount of heap memory used for matching. A value of zero disables
|
||||
the use of any heap memory; many simple pattern matches can be done without
|
||||
using the heap, so this is not an unreasonable setting.
|
||||
</P>
|
||||
<br><b>
|
||||
Showing MARK names
|
||||
</b><br>
|
||||
|
@@ -1476,13 +1489,14 @@ Showing memory usage
|
|||
<P>
|
||||
The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
|
||||
memory allocation and freeing calls that occur during a call to
|
||||
<b>pcre2_match()</b>. These occur only when a match requires a bigger vector
|
||||
than the default for remembering backtracking points. In many cases there will
|
||||
be no heap memory used and therefore no additional output. No heap memory is
|
||||
allocated during matching with <b>pcre2_dfa_match</b> or with JIT, so in those
|
||||
cases the <b>memory</b> modifier never has any effect. For this modifier to
|
||||
work, the <b>null_context</b> modifier must not be set on both the pattern and
|
||||
the subject, though it can be set on one or the other.
|
||||
<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. These occur only when a match
|
||||
requires a bigger vector than the default for remembering backtracking points
|
||||
(<b>pcre2_match()</b>) or for internal workspace (<b>pcre2_dfa_match()</b>). In
|
||||
many cases there will be no heap memory used and therefore no additional
|
||||
output. No heap memory is allocated during matching with JIT, so in that case
|
||||
the <b>memory</b> modifier never has any effect. For this modifier to work, the
|
||||
<b>null_context</b> modifier must not be set on both the pattern and the
|
||||
subject, though it can be set on one or the other.
|
||||
</P>
|
||||
<br><b>
|
||||
Setting a starting offset
|
||||
|
@@ -1982,9 +1996,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 21 December 2017
|
||||
Last updated: 25 April 2018
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
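The find_limits description in the hunks above amounts to a search for the smallest limit value at which a match completes without error. One way to carry out such a search, assuming that a match which succeeds at some limit also succeeds at every larger limit, is to double a trial value until it succeeds and then bisect. The C sketch below shows that approach. It is not taken from pcre2test; the names trial_fn, needs_at_least and find_min_limit are hypothetical, and needs_at_least() merely stands in for a real matching call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: run one match attempt with the given limit and
   return 0 if it completed, non-zero if the limit was too low. */
typedef int (*trial_fn)(uint32_t limit, void *ctx);

/* Stand-in for a matching call: succeeds once the limit reaches the
   threshold stored in *ctx. */
static int needs_at_least(uint32_t limit, void *ctx)
{
    uint32_t threshold = *(const uint32_t *)ctx;
    return (limit >= threshold) ? 0 : -1;
}

/* Find the smallest limit for which trial() succeeds. Returns 0 and stores
   the result in *min_out, or returns -1 if even UINT32_MAX fails. */
static int find_min_limit(trial_fn trial, void *ctx, uint32_t *min_out)
{
    uint32_t lo = 0;    /* every value below lo is known to fail */
    uint32_t hi = 64;   /* arbitrary starting guess */

    /* Phase 1: double the guess until a trial succeeds. */
    while (trial(hi, ctx) != 0) {
        if (hi == UINT32_MAX)
            return -1;
        lo = hi + 1;
        hi = (hi > UINT32_MAX / 2) ? UINT32_MAX : hi * 2;
    }

    /* Phase 2: bisect between the failing and succeeding bounds. */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (trial(mid, ctx) == 0)
            hi = mid;
        else
            lo = mid + 1;
    }

    *min_out = hi;
    return 0;
}
```

The doubling phase keeps the number of trials logarithmic in the answer, which matters when each trial is a full match attempt. The same routine could be run once per parameter (heap, match, depth) with a different callback.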
131
doc/pcre2.txt
|
@@ -959,13 +959,15 @@ PCRE2 CONTEXTS
|
|||
|
||||
The heap_limit parameter specifies, in units of kilobytes, the maximum
|
||||
amount of heap memory that pcre2_match() may use to hold backtracking
|
||||
information when running an interpretive match. This limit does not
|
||||
apply to matching with the JIT optimization, which has its own memory
|
||||
control arrangements (see the pcre2jit documentation for more details),
|
||||
nor does it apply to pcre2_dfa_match(). If the limit is reached, the
|
||||
negative error code PCRE2_ERROR_HEAPLIMIT is returned. The default
|
||||
limit is set when PCRE2 is built; the default default is very large and
|
||||
is essentially "unlimited".
|
||||
information when running an interpretive match. This limit also applies
|
||||
to pcre2_dfa_match(), which may use the heap when processing patterns
|
||||
with a lot of nested pattern recursion or lookarounds or atomic groups.
|
||||
This limit does not apply to matching with the JIT optimization, which
|
||||
has its own memory control arrangements (see the pcre2jit documentation
|
||||
for more details). If the limit is reached, the negative error code
|
||||
PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2
|
||||
is built; the default default is very large and is essentially "unlim-
|
||||
ited".
|
||||
|
||||
A value for the heap limit may also be supplied by an item at the start
|
||||
of a pattern of the form
|
||||
|
@@ -984,6 +986,11 @@ PCRE2 CONTEXTS
|
|||
zero) no heap memory will be used. In this case, only patterns that do
|
||||
not have a lot of nested backtracking can be successfully processed.
|
||||
|
||||
Similarly, for pcre2_dfa_match(), a vector on the system stack is used
|
||||
when processing pattern recursions, lookarounds, or atomic groups, and
|
||||
only if this is not big enough is heap memory used. In this case, too,
|
||||
setting a value of zero disables the use of the heap.
|
||||
|
||||
int pcre2_set_match_limit(pcre2_match_context *mcontext,
|
||||
uint32_t value);
|
||||
|
||||
|
@@ -1033,12 +1040,22 @@ PCRE2 CONTEXTS
|
|||
|
||||
The depth limit is not relevant, and is ignored, when matching is done
|
||||
using JIT compiled code. However, it is supported by pcre2_dfa_match(),
|
||||
which uses it to limit the depth of internal recursive function calls
|
||||
that implement atomic groups, lookaround assertions, and pattern recur-
|
||||
sions. This is, therefore, an indirect limit on the amount of system
|
||||
stack that is used. A recursive pattern such as /(.)(?1)/, when matched
|
||||
to a very long string using pcre2_dfa_match(), can use a great deal of
|
||||
stack.
|
||||
which uses it to limit the depth of nested internal recursive function
|
||||
calls that implement atomic groups, lookaround assertions, and pattern
|
||||
recursions. This limits, indirectly, the amount of system stack that is
|
||||
used. It was more useful in versions before 10.32, when stack memory
|
||||
was used for local workspace vectors for recursive function calls. From
|
||||
version 10.32, only local variables are allocated on the stack and as
|
||||
each call uses only a few hundred bytes, even a small stack can support
|
||||
quite a lot of recursion.
|
||||
|
||||
If the depth of internal recursive function calls is great enough,
|
||||
local workspace vectors are allocated on the heap from version 10.32
|
||||
onwards, so the depth limit also indirectly limits the amount of heap
|
||||
memory that is used. A recursive pattern such as /(.(?2))((?1)|)/, when
|
||||
matched to a very long string using pcre2_dfa_match(), can use a great
|
||||
deal of memory. However, it is probably better to limit heap usage
|
||||
directly by calling pcre2_set_heap_limit().
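
As a rough illustration of how a depth limit bounds nested function calls, consider the sketch below. It is not taken from pcre2_dfa_match(); the function nested_call and the error code SKETCH_ERROR_DEPTHLIMIT are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ERROR_DEPTHLIMIT (-1)   /* hypothetical error code */

/* Recurses `nesting` levels deep, as nested atomic groups, lookarounds or
   pattern recursions would, and gives up once the current depth exceeds
   depth_limit. Returns the deepest level reached, or the error code. */
static int nested_call(uint32_t nesting, uint32_t depth, uint32_t depth_limit)
{
    if (depth > depth_limit) return SKETCH_ERROR_DEPTHLIMIT;
    if (nesting == 0) return (int)depth;

    assert(depth < UINT32_MAX);        /* guard the increment below */
    return nested_call(nesting - 1, depth + 1, depth_limit);
}
```

Because each level needs its own workspace, capping the depth also caps the total workspace that can be requested, which is the indirect heap limit the paragraph refers to.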
|
||||
|
||||
The default value for the depth limit can be set when PCRE2 is built;
|
||||
the default default is the same value as the default for the match
|
||||
|
@ -1095,14 +1112,15 @@ CHECKING BUILD-TIME OPTIONS
|
|||
|
||||
The output is a uint32_t integer that gives the default limit for the
|
||||
depth of nested backtracking in pcre2_match() or the depth of nested
|
||||
recursions and lookarounds in pcre2_dfa_match(). Further details are
|
||||
given with pcre2_set_depth_limit() above.
|
||||
recursions, lookarounds, and atomic groups in pcre2_dfa_match(). Fur-
|
||||
ther details are given with pcre2_set_depth_limit() above.
|
||||
|
||||
PCRE2_CONFIG_HEAPLIMIT
|
||||
|
||||
The output is a uint32_t integer that gives, in kilobytes, the default
|
||||
limit for the amount of heap memory used by pcre2_match(). Further
|
||||
details are given with pcre2_set_heap_limit() above.
|
||||
limit for the amount of heap memory used by pcre2_match() or
|
||||
pcre2_dfa_match(). Further details are given with
|
||||
pcre2_set_heap_limit() above.
|
||||
|
||||
PCRE2_CONFIG_JIT
|
||||
|
||||
|
@ -3396,18 +3414,7 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
|||
Calls to the convenience functions that extract substrings by name
|
||||
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used
|
||||
after a DFA match. The convenience functions that extract substrings by
|
||||
number never return PCRE2_ERROR_NOSUBSTRING, and the meanings of some
|
||||
other errors are slightly different:
|
||||
|
||||
PCRE2_ERROR_UNAVAILABLE
|
||||
|
||||
The ovector is not big enough to include a slot for the given substring
|
||||
number.
|
||||
|
||||
PCRE2_ERROR_UNSET
|
||||
|
||||
There is a slot in the ovector for this substring, but there were
|
||||
insufficient matches to fill it.
|
||||
number never return PCRE2_ERROR_NOSUBSTRING.
|
||||
|
||||
The matched strings are stored in the ovector in reverse order of
|
||||
length; that is, the longest matching string is first. If there were
|
||||
|
@ -3476,8 +3483,8 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 31 December 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 27 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
@ -3746,9 +3753,10 @@ LIMITING PCRE2 RESOURCE USAGE
|
|||
--with-heap-limit=500
|
||||
|
||||
which limits the amount of heap to 500 kilobytes. This limit applies
|
||||
only to interpretive matching in pcre2_match(). It does not apply when
|
||||
JIT (which has its own memory arrangements) is used, nor does it apply
|
||||
to pcre2_dfa_match().
|
||||
only to interpretive matching in pcre2_match() and pcre2_dfa_match(),
|
||||
which may also use the heap for internal workspace when processing com-
|
||||
plicated patterns. This limit does not apply when JIT (which has its
|
||||
own memory arrangements) is used.
|
||||
|
||||
You can also explicitly limit the depth of nested backtracking in the
|
||||
pcre2_match() interpreter. This limit defaults to the value that is set
|
||||
|
@ -4030,7 +4038,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 25 February 2018
|
||||
Last updated: 26 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -4311,10 +4319,12 @@ THE CALLOUT INTERFACE
|
|||
their ovector slots set to PCRE2_UNSET.
|
||||
|
||||
For DFA matching, the offset_vector field points to the ovector that
|
||||
was passed to the matching function in the match data block, but it
|
||||
holds no useful information at callout time because pcre2_dfa_match()
|
||||
does not support substring capturing. The value of capture_top is
|
||||
always 1 and the value of capture_last is always 0 for DFA matching.
|
||||
was passed to the matching function in the match data block for call-
|
||||
outs at the top level, but to an internal ovector during the processing
|
||||
of pattern recursions, lookarounds, and atomic groups. However, these
|
||||
ovectors hold no useful information because pcre2_dfa_match() does not
|
||||
support substring capturing. The value of capture_top is always 1 and
|
||||
the value of capture_last is always 0 for DFA matching.
|
||||
|
||||
The subject and subject_length fields contain copies of the values that
|
||||
were passed to the matching function.
|
||||
|
@ -4454,8 +4464,8 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 22 December 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 26 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
@ -5919,19 +5929,19 @@ SPECIAL START-OF-PATTERN ITEMS
|
|||
pcre2_match() for it to have any effect. In other words, the pattern
|
||||
writer can lower the limits set by the programmer, but not raise them.
|
||||
If there is more than one setting of one of these limits, the lower
|
||||
value is used.
|
||||
value is used. The heap limit is specified in kilobytes.
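
The rule that a pattern can lower a limit but never raise it amounts to taking a minimum. A minimal sketch, assuming a hypothetical helper effective_limit() that is not part of the PCRE2 API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* caller_limit:   value set (or defaulted) by the application.
   pattern_limits: values taken from limit items at the start of the
                   pattern, in any order.
   Returns the limit that is actually applied: the smallest of them all,
   so a pattern setting above the caller's value has no effect. */
static uint32_t effective_limit(uint32_t caller_limit,
                                const uint32_t *pattern_limits, size_t count)
{
    assert(count == 0 || pattern_limits != NULL);

    uint32_t limit = caller_limit;
    for (size_t i = 0; i < count; i++) {
        if (pattern_limits[i] < limit) limit = pattern_limits[i];
    }
    return limit;
}
```

For the heap limit, all the values being compared are in kilobytes.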
|
||||
|
||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This
|
||||
name is still recognized for backwards compatibility.
|
||||
|
||||
The heap limit applies only when the pcre2_match() interpreter is used
|
||||
for matching. It does not apply to JIT or DFA matching. The match limit
|
||||
is used (but in a different way) when JIT is being used, or when
|
||||
The heap limit applies only when the pcre2_match() or pcre2_dfa_match()
|
||||
interpreters are used for matching. It does not apply to JIT. The match
|
||||
limit is used (but in a different way) when JIT is being used, or when
|
||||
pcre2_dfa_match() is called, to limit computing resource usage by those
|
||||
matching functions. The depth limit is ignored by JIT but is relevant
|
||||
for DFA matching, which uses function recursion for recursions within
|
||||
the pattern. In this case, the depth limit controls the amount of sys-
|
||||
tem stack that is used.
|
||||
the pattern and for lookaround assertions and atomic groups. In this
|
||||
case, the depth limit controls the depth of such recursion.
|
||||
|
||||
Newline conventions
|
||||
|
||||
|
@ -8260,11 +8270,6 @@ RECURSIVE PATTERNS
|
|||
unset, even if it was (temporarily) set at a deeper level during the
|
||||
matching process.
|
||||
|
||||
If there are more than 15 capturing parentheses in a pattern, PCRE2 has
|
||||
to obtain extra memory from the heap to store data during a recursion.
|
||||
If no memory can be obtained, the match fails with the
|
||||
PCRE2_ERROR_NOMEMORY error.
|
||||
|
||||
Do not confuse the (?R) item with the condition (R), which tests for
|
||||
recursion. Consider this pattern, which matches text in angle brack-
|
||||
ets, allowing for arbitrary nesting. Only digits are allowed in nested
|
||||
|
@ -8887,8 +8892,8 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 12 September 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 25 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
@ -8973,9 +8978,17 @@ STACK AND HEAP USAGE AT RUN TIME
|
|||
|
||||
In contrast to pcre2_match(), pcre2_dfa_match() does use recursive
|
||||
function calls, but only for processing atomic groups, lookaround
|
||||
assertions, and recursion within the pattern. Too much nested recursion
|
||||
may cause stack issues. The "match depth" parameter can be used to
|
||||
limit the depth of function recursion in pcre2_dfa_match().
|
||||
assertions, and recursion within the pattern. The original version of
|
||||
the code used to allocate quite large internal workspace vectors on the
|
||||
stack, which caused some problems for some patterns in environments
|
||||
with small stacks. From release 10.32 the code for pcre2_dfa_match()
|
||||
has been re-factored to use heap memory when necessary for internal
|
||||
workspace when recursing, though recursive function calls are still
|
||||
used.
|
||||
|
||||
The "match depth" parameter can be used to limit the depth of function
|
||||
recursion, and the "match heap" parameter to limit heap memory in
|
||||
pcre2_dfa_match().
|
||||
|
||||
|
||||
PROCESSING TIME
|
||||
|
@ -9115,8 +9128,8 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 08 April 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 25 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
|
||||
.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -34,9 +34,9 @@ just once (except when processing lookaround assertions). This function is
|
|||
\fIwscount\fP Number of elements in the vector
|
||||
.sp
|
||||
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
|
||||
up a callout function or specify the match and/or the recursion depth limits.
|
||||
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
|
||||
The options are:
|
||||
up a callout function or specify the heap limit or the match or the recursion
|
||||
depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
|
||||
characters. The options are:
|
||||
.sp
|
||||
PCRE2_ANCHORED Match only at the first position
|
||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "31 December 2017" "PCRE2 10.31"
|
||||
.TH PCRE2API 3 "27 April 2018" "PCRE2 10.32"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -887,16 +887,17 @@ offset limit. In other words, whichever limit comes first is used.
|
|||
.sp
|
||||
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
|
||||
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
|
||||
information when running an interpretive match. This limit does not apply to
|
||||
matching with the JIT optimization, which has its own memory control
|
||||
arrangements (see the
|
||||
information when running an interpretive match. This limit also applies to
|
||||
\fBpcre2_dfa_match()\fP, which may use the heap when processing patterns with a
|
||||
lot of nested pattern recursion or lookarounds or atomic groups. This limit
|
||||
does not apply to matching with the JIT optimization, which has its own memory
|
||||
control arrangements (see the
|
||||
.\" HREF
|
||||
\fBpcre2jit\fP
|
||||
.\"
|
||||
documentation for more details), nor does it apply to \fBpcre2_dfa_match()\fP.
|
||||
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
|
||||
returned. The default limit is set when PCRE2 is built; the default default is
|
||||
very large and is essentially "unlimited".
|
||||
documentation for more details). If the limit is reached, the negative error
|
||||
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
|
||||
built; the default default is very large and is essentially "unlimited".
|
||||
.P
|
||||
A value for the heap limit may also be supplied by an item at the start of a
|
||||
pattern of the form
|
||||
|
@ -914,6 +915,11 @@ Heap memory is used only if the initial vector is too small. If the heap limit
|
|||
is set to a value less than 21 (in particular, zero) no heap memory will be
|
||||
used. In this case, only patterns that do not have a lot of nested backtracking
|
||||
can be successfully processed.
|
||||
.P
|
||||
Similarly, for \fBpcre2_dfa_match()\fP, a vector on the system stack is used
|
||||
when processing pattern recursions, lookarounds, or atomic groups, and only if
|
||||
this is not big enough is heap memory used. In this case, too, setting a value
|
||||
of zero disables the use of the heap.
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
||||
|
@ -967,11 +973,21 @@ backtracking.
|
|||
.P
|
||||
The depth limit is not relevant, and is ignored, when matching is done using
|
||||
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
|
||||
uses it to limit the depth of internal recursive function calls that implement
|
||||
atomic groups, lookaround assertions, and pattern recursions. This is,
|
||||
therefore, an indirect limit on the amount of system stack that is used. A
|
||||
recursive pattern such as /(.)(?1)/, when matched to a very long string using
|
||||
\fBpcre2_dfa_match()\fP, can use a great deal of stack.
|
||||
uses it to limit the depth of nested internal recursive function calls that
|
||||
implement atomic groups, lookaround assertions, and pattern recursions. This
|
||||
limits, indirectly, the amount of system stack that is used. It was more useful
|
||||
in versions before 10.32, when stack memory was used for local workspace
|
||||
vectors for recursive function calls. From version 10.32, only local variables
|
||||
are allocated on the stack and as each call uses only a few hundred bytes, even
|
||||
a small stack can support quite a lot of recursion.
|
||||
.P
|
||||
If the depth of internal recursive function calls is great enough, local
|
||||
workspace vectors are allocated on the heap from version 10.32 onwards, so the
|
||||
depth limit also indirectly limits the amount of heap memory that is used. A
|
||||
recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
|
||||
using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
|
||||
probably better to limit heap usage directly by calling
|
||||
\fBpcre2_set_heap_limit()\fP.
|
||||
.P
|
||||
The default value for the depth limit can be set when PCRE2 is built; the
|
||||
default default is the same value as the default for the match limit. If the
|
||||
|
@ -1028,15 +1044,16 @@ and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively.
|
|||
PCRE2_CONFIG_DEPTHLIMIT
|
||||
.sp
|
||||
The output is a uint32_t integer that gives the default limit for the depth of
|
||||
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
|
||||
and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
|
||||
\fBpcre2_set_depth_limit()\fP above.
|
||||
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions,
|
||||
lookarounds, and atomic groups in \fBpcre2_dfa_match()\fP. Further details are
|
||||
given with \fBpcre2_set_depth_limit()\fP above.
|
||||
.sp
|
||||
PCRE2_CONFIG_HEAPLIMIT
|
||||
.sp
|
||||
The output is a uint32_t integer that gives, in kilobytes, the default limit
|
||||
for the amount of heap memory used by \fBpcre2_match()\fP. Further details are
|
||||
given with \fBpcre2_set_heap_limit()\fP above.
|
||||
for the amount of heap memory used by \fBpcre2_match()\fP or
|
||||
\fBpcre2_dfa_match()\fP. Further details are given with
|
||||
\fBpcre2_set_heap_limit()\fP above.
|
||||
.sp
|
||||
PCRE2_CONFIG_JIT
|
||||
.sp
|
||||
|
@ -3514,17 +3531,7 @@ capture.
|
|||
Calls to the convenience functions that extract substrings by name
|
||||
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
|
||||
DFA match. The convenience functions that extract substrings by number never
|
||||
return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
|
||||
slightly different:
|
||||
.sp
|
||||
PCRE2_ERROR_UNAVAILABLE
|
||||
.sp
|
||||
The ovector is not big enough to include a slot for the given substring number.
|
||||
.sp
|
||||
PCRE2_ERROR_UNSET
|
||||
.sp
|
||||
There is a slot in the ovector for this substring, but there were insufficient
|
||||
matches to fill it.
|
||||
return PCRE2_ERROR_NOSUBSTRING.
|
||||
.P
|
||||
The matched strings are stored in the ovector in reverse order of length; that
|
||||
is, the longest matching string is first. If there were too many matches to fit
|
||||
|
@ -3605,6 +3612,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 31 December 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 27 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2BUILD 3 "25 February 2018" "PCRE2 10.32"
|
||||
.TH PCRE2BUILD 3 "26 April 2018" "PCRE2 10.32"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.
|
||||
|
@ -292,9 +292,10 @@ change this by a setting such as
|
|||
--with-heap-limit=500
|
||||
.sp
|
||||
which limits the amount of heap to 500 kilobytes. This limit applies only to
|
||||
interpretive matching in pcre2_match(). It does not apply when JIT (which has
|
||||
its own memory arrangements) is used, nor does it apply to
|
||||
\fBpcre2_dfa_match()\fP.
|
||||
interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
|
||||
may also use the heap for internal workspace when processing complicated
|
||||
patterns. This limit does not apply when JIT (which has its own memory
|
||||
arrangements) is used.
|
||||
.P
|
||||
You can also explicitly limit the depth of nested backtracking in the
|
||||
\fBpcre2_match()\fP interpreter. This limit defaults to the value that is set
|
||||
|
@ -590,6 +591,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 25 February 2018
|
||||
Last updated: 26 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2CALLOUT 3 "22 December 2017" "PCRE2 10.31"
|
||||
.TH PCRE2CALLOUT 3 "26 April 2018" "PCRE2 10.32"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -291,10 +291,12 @@ than \fIcapture_top\fP also have both of their ovector slots set to
|
|||
PCRE2_UNSET.
|
||||
.P
|
||||
For DFA matching, the \fIoffset_vector\fP field points to the ovector that was
|
||||
passed to the matching function in the match data block, but it holds no useful
|
||||
information at callout time because \fBpcre2_dfa_match()\fP does not support
|
||||
substring capturing. The value of \fIcapture_top\fP is always 1 and the value
|
||||
of \fIcapture_last\fP is always 0 for DFA matching.
|
||||
passed to the matching function in the match data block for callouts at the top
|
||||
level, but to an internal ovector during the processing of pattern recursions,
|
||||
lookarounds, and atomic groups. However, these ovectors hold no useful
|
||||
information because \fBpcre2_dfa_match()\fP does not support substring
|
||||
capturing. The value of \fIcapture_top\fP is always 1 and the value of
|
||||
\fIcapture_last\fP is always 0 for DFA matching.
|
||||
.P
|
||||
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
|
||||
that were passed to the matching function.
|
||||
|
@ -441,6 +443,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 22 December 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 26 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PATTERN 3 "12 September 2017" "PCRE2 10.31"
|
||||
.TH PCRE2PATTERN 3 "25 April 2018" "PCRE2 10.32"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||
|
@ -141,12 +141,12 @@ the application to apply the JIT optimization by calling
|
|||
.SS "Setting match resource limits"
|
||||
.rs
|
||||
.sp
|
||||
The pcre2_match() function contains a counter that is incremented every time it
|
||||
goes round its main loop. The caller of \fBpcre2_match()\fP can set a limit on
|
||||
this counter, which therefore limits the amount of computing resource used for
|
||||
a match. The maximum depth of nested backtracking can also be limited; this
|
||||
indirectly restricts the amount of heap memory that is used, but there is also
|
||||
an explicit memory limit that can be set.
|
||||
The \fBpcre2_match()\fP function contains a counter that is incremented every
|
||||
time it goes round its main loop. The caller of \fBpcre2_match()\fP can set a
|
||||
limit on this counter, which therefore limits the amount of computing resource
|
||||
used for a match. The maximum depth of nested backtracking can also be limited;
|
||||
this indirectly restricts the amount of heap memory that is used, but there is
|
||||
also an explicit memory limit that can be set.
|
||||
.P
|
||||
These facilities are provided to catch runaway matches that are provoked by
|
||||
patterns with huge matching trees (a typical example is a pattern with nested
|
||||
|
@ -162,18 +162,20 @@ where d is any number of decimal digits. However, the value of the setting must
|
|||
be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
|
||||
for it to have any effect. In other words, the pattern writer can lower the
|
||||
limits set by the programmer, but not raise them. If there is more than one
|
||||
setting of one of these limits, the lower value is used.
|
||||
setting of one of these limits, the lower value is used. The heap limit is
|
||||
specified in kilobytes.
|
||||
.P
|
||||
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
|
||||
still recognized for backwards compatibility.
|
||||
.P
|
||||
The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
|
||||
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
||||
(but in a different way) when JIT is being used, or when
|
||||
\fBpcre2_dfa_match()\fP is called, to limit computing resource usage by those
|
||||
matching functions. The depth limit is ignored by JIT but is relevant for DFA
|
||||
matching, which uses function recursion for recursions within the pattern. In
|
||||
this case, the depth limit controls the amount of system stack that is used.
|
||||
The heap limit applies only when the \fBpcre2_match()\fP or
|
||||
\fBpcre2_dfa_match()\fP interpreters are used for matching. It does not apply
|
||||
to JIT. The match limit is used (but in a different way) when JIT is being
|
||||
used, or when \fBpcre2_dfa_match()\fP is called, to limit computing resource
|
||||
usage by those matching functions. The depth limit is ignored by JIT but is
|
||||
relevant for DFA matching, which uses function recursion for recursions within
|
||||
the pattern and for lookaround assertions and atomic groups. In this case, the
|
||||
depth limit controls the depth of such recursion.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="newlines"></a>
|
||||
|
@ -2838,10 +2840,6 @@ the last value taken on at the top level. If a capturing subpattern is not
|
|||
matched at the top level, its final captured value is unset, even if it was
|
||||
(temporarily) set at a deeper level during the matching process.
|
||||
.P
|
||||
If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
|
||||
obtain extra memory from the heap to store data during a recursion. If no
|
||||
memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
|
||||
.P
|
||||
Do not confuse the (?R) item with the condition (R), which tests for recursion.
|
||||
Consider this pattern, which matches text in angle brackets, allowing for
|
||||
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
|
||||
|
@ -3505,6 +3503,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 12 September 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 25 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PERFORM 3 "08 April 2017" "PCRE2 10.30"
|
||||
.TH PCRE2PERFORM 3 "25 April 2018" "PCRE2 10.32"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 PERFORMANCE"
|
||||
|
@ -78,9 +78,16 @@ may also reduce the memory requirements.
|
|||
.P
|
||||
In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
|
||||
function calls, but only for processing atomic groups, lookaround assertions,
|
||||
and recursion within the pattern. Too much nested recursion may cause stack
|
||||
issues. The "match depth" parameter can be used to limit the depth of function
|
||||
recursion in \fBpcre2_dfa_match()\fP.
|
||||
and recursion within the pattern. The original version of the code used to
|
||||
allocate quite large internal workspace vectors on the stack, which caused some
|
||||
problems for some patterns in environments with small stacks. From release
|
||||
10.32 the code for \fBpcre2_dfa_match()\fP has been re-factored to use heap
|
||||
memory when necessary for internal workspace when recursing, though recursive
|
||||
function calls are still used.
|
||||
.P
|
||||
The "match depth" parameter can be used to limit the depth of function
|
||||
recursion, and the "match heap" parameter to limit heap memory in
|
||||
\fBpcre2_dfa_match()\fP.
|
||||
.
|
||||
.
|
||||
.SH "PROCESSING TIME"
|
||||
|
@ -232,6 +239,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 08 April 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 25 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2TEST 1 "21 Decbmber 2017" "PCRE 10.31"
|
||||
.TH PCRE2TEST 1 "25 April 2018" "PCRE 10.32"
|
||||
.SH NAME
|
||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
|
@ -1168,7 +1168,7 @@ pattern.
|
|||
get=<number or name> extract captured substring
|
||||
getall extract all captured substrings
|
||||
/g global global matching
|
||||
heap_limit=<n> set a limit on heap memory
|
||||
heap_limit=<n> set a limit on heap memory (Kbytes)
|
||||
jitstack=<n> set size of JIT stack
|
||||
mark show mark values
|
||||
match_limit=<n> set a match limit
|
||||
|
@ -1401,24 +1401,36 @@ the appropriate limits in the match context. These values are ignored when the
|
|||
.sp
|
||||
If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP
|
||||
calls the relevant matching function several times, setting different values in
|
||||
the match context via \fBpcre2_set_heap_limit(), \fBpcre2_set_match_limit()\fP,
|
||||
or \fBpcre2_set_depth_limit()\fP until it finds the minimum values for each
|
||||
parameter that allows the match to complete without error.
|
||||
the match context via \fBpcre2_set_heap_limit()\fP,
|
||||
\fBpcre2_set_match_limit()\fP, or \fBpcre2_set_depth_limit()\fP until it finds
|
||||
the minimum values for each parameter that allows the match to complete without
|
||||
error. If JIT is being used, only the match limit is relevant.
|
||||
.P
|
||||
If JIT is being used, only the match limit is relevant. If DFA matching is
|
||||
being used, only the depth limit is relevant.
|
||||
.P
|
||||
The \fImatch_limit\fP number is a measure of the amount of backtracking
|
||||
that takes place, and learning the minimum value can be instructive. For most
|
||||
simple matches, the number is quite small, but for patterns with very large
|
||||
numbers of matching possibilities, it can become large very quickly with
|
||||
increasing length of subject string.
|
||||
When using this modifier, the pattern should not contain any limit settings
|
||||
such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
|
||||
lower than the minimum matching value, the minimum value cannot be found
|
||||
because \fBpcre2_set_match_limit()\fP etc. are only able to reduce the value of
|
||||
an in-pattern limit; they cannot increase it.
|
||||
.P
|
||||
For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
|
||||
much nested backtracking happens (that is, how deeply the pattern's tree is
|
||||
searched). In the case of DFA matching, \fIdepth_limit\fP controls the depth of
|
||||
recursive calls of the internal function that is used for handling pattern
|
||||
recursion, lookaround assertions, and atomic groups.
|
||||
.P
|
||||
For non-DFA matching, the \fImatch_limit\fP number is a measure of the amount
|
||||
of backtracking that takes place, and learning the minimum value can be
|
||||
instructive. For most simple matches, the number is quite small, but for
|
||||
patterns with very large numbers of matching possibilities, it can become large
|
||||
very quickly with increasing length of subject string. In the case of DFA
|
||||
matching, \fImatch_limit\fP controls the total number of calls, both recursive
|
||||
and non-recursive, to the internal matching function, thus controlling the
|
||||
overall amount of computing resource that is used.
|
||||
.P
|
||||
For both kinds of matching, the \fIheap_limit\fP number (which is in kilobytes)
|
||||
limits the amount of heap memory used for matching. A value of zero disables
|
||||
the use of any heap memory; many simple pattern matches can be done without
|
||||
using the heap, so this is not an unreasonable setting.
|
||||
.
|
||||
.
|
||||
.SS "Showing MARK names"
|
||||
|
@ -1437,13 +1449,14 @@ is added to the non-match message.
|
|||
.sp
|
||||
The \fBmemory\fP modifier causes \fBpcre2test\fP to log the sizes of all heap
|
||||
memory allocation and freeing calls that occur during a call to
|
||||
\fBpcre2_match()\fP. These occur only when a match requires a bigger vector
|
||||
than the default for remembering backtracking points. In many cases there will
|
||||
be no heap memory used and therefore no additional output. No heap memory is
|
||||
allocated during matching with \fBpcre2_dfa_match\fP or with JIT, so in those
|
||||
cases the \fBmemory\fP modifier never has any effect. For this modifier to
|
||||
work, the \fBnull_context\fP modifier must not be set on both the pattern and
|
||||
the subject, though it can be set on one or the other.
|
||||
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. These occur only when a match
|
||||
requires a bigger vector than the default for remembering backtracking points
|
||||
(\fBpcre2_match()\fP) or for internal workspace (\fBpcre2_dfa_match()\fP). In
|
||||
many cases there will be no heap memory used and therefore no additional
|
||||
output. No heap memory is allocated during matching with JIT, so in that case
|
||||
the \fBmemory\fP modifier never has any effect. For this modifier to work, the
|
||||
\fBnull_context\fP modifier must not be set on both the pattern and the
|
||||
subject, though it can be set on one or the other.
|
||||
.
|
||||
.
|
||||
.SS "Setting a starting offset"
|
||||
|
@ -1962,6 +1975,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 21 December 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 25 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1071,7 +1071,7 @@ SUBJECT MODIFIERS
|
|||
get=<number or name> extract captured substring
|
||||
getall extract all captured substrings
|
||||
/g global global matching
|
||||
heap_limit=<n> set a limit on heap memory
|
||||
heap_limit=<n> set a limit on heap memory (Kbytes)
|
||||
jitstack=<n> set size of JIT stack
|
||||
mark show mark values
|
||||
match_limit=<n> set a match limit
|
||||
|
@ -1291,16 +1291,13 @@ SUBJECT MODIFIERS
|
|||
values in the match context via pcre2_set_heap_limit(),
|
||||
pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
|
||||
minimum values for each parameter that allows the match to complete
|
||||
without error.
|
||||
without error. If JIT is being used, only the match limit is relevant.
|
||||
|
||||
If JIT is being used, only the match limit is relevant. If DFA matching
|
||||
is being used, only the depth limit is relevant.
|
||||
|
||||
The match_limit number is a measure of the amount of backtracking that
|
||||
takes place, and learning the minimum value can be instructive. For
|
||||
most simple matches, the number is quite small, but for patterns with
|
||||
very large numbers of matching possibilities, it can become large very
|
||||
quickly with increasing length of subject string.
|
||||
When using this modifier, the pattern should not contain any limit set-
|
||||
tings such as (*LIMIT_MATCH=...) within it. If such a setting is
|
||||
present and is lower than the minimum matching value, the minimum value
|
||||
cannot be found because pcre2_set_match_limit() etc. are only able to
|
||||
reduce the value of an in-pattern limit; they cannot increase it.
|
||||
|
||||
For non-DFA matching, the minimum depth_limit number is a measure of
|
||||
how much nested backtracking happens (that is, how deeply the pattern's
|
||||
|
@ -1308,6 +1305,22 @@ SUBJECT MODIFIERS
|
|||
the depth of recursive calls of the internal function that is used for
|
||||
handling pattern recursion, lookaround assertions, and atomic groups.
|
||||
|
||||
For non-DFA matching, the match_limit number is a measure of the amount
|
||||
of backtracking that takes place, and learning the minimum value can be
|
||||
instructive. For most simple matches, the number is quite small, but
|
||||
for patterns with very large numbers of matching possibilities, it can
|
||||
become large very quickly with increasing length of subject string. In
|
||||
the case of DFA matching, match_limit controls the total number of
|
||||
calls, both recursive and non-recursive, to the internal matching func-
|
||||
tion, thus controlling the overall amount of computing resource that is
|
||||
used.
|
||||
|
||||
For both kinds of matching, the heap_limit number (which is in kilo-
|
||||
bytes) limits the amount of heap memory used for matching. A value of
|
||||
zero disables the use of any heap memory; many simple pattern matches
|
||||
can be done without using the heap, so this is not an unreasonable set-
|
||||
ting.
|
||||
|
||||
Showing MARK names
|
||||
|
||||
|
||||
|
@ -1321,14 +1334,14 @@ SUBJECT MODIFIERS
|
|||
|
||||
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
||||
ory allocation and freeing calls that occur during a call to
|
||||
pcre2_match(). These occur only when a match requires a bigger vector
|
||||
than the default for remembering backtracking points. In many cases
|
||||
there will be no heap memory used and therefore no additional output.
|
||||
No heap memory is allocated during matching with pcre2_dfa_match or
|
||||
with JIT, so in those cases the memory modifier never has any effect.
|
||||
For this modifier to work, the null_context modifier must not be set on
|
||||
both the pattern and the subject, though it can be set on one or the
|
||||
other.
|
||||
pcre2_match() or pcre2_dfa_match(). These occur only when a match
|
||||
requires a bigger vector than the default for remembering backtracking
|
||||
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
||||
In many cases there will be no heap memory used and therefore no addi-
|
||||
tional output. No heap memory is allocated during matching with JIT, so
|
||||
in that case the memory modifier never has any effect. For this modi-
|
||||
fier to work, the null_context modifier must not be set on both the
|
||||
pattern and the subject, though it can be set on one or the other.
|
||||
|
||||
Setting a starting offset
|
||||
|
||||
|
@ -1799,5 +1812,5 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 21 December 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 25 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
|
|
|
@ -132,8 +132,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
/* Define to 1 if you have the <zlib.h> header file. */
|
||||
#undef HAVE_ZLIB_H
|
||||
|
||||
/* This limits the amount of memory that pcre2_match() may use while matching
|
||||
a pattern. The value is in kilobytes. */
|
||||
/* This limits the amount of memory that may be used while matching a pattern.
|
||||
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
||||
to JIT matching. The value is in kilobytes. */
|
||||
#undef HEAP_LIMIT
|
||||
|
||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||
|
@ -148,7 +149,8 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
|
||||
/* The value of MATCH_LIMIT determines the default number of times the
|
||||
pcre2_match() function can record a backtrack position during a single
|
||||
matching attempt. There is a runtime interface for setting a different
|
||||
matching attempt. The value is also used to limit a loop counter in
|
||||
pcre2_dfa_match(). There is a runtime interface for setting a different
|
||||
limit. The limit exists in order to catch runaway regular expressions that
|
||||
take for ever to determine that they do not match. The default is set very
|
||||
large so that it does not accidentally catch legitimate cases. */
|
||||
|
@ -161,7 +163,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
|
||||
must be less than the value of MATCH_LIMIT. The default is to use the same
|
||||
value as MATCH_LIMIT. There is a runtime method for setting a different
|
||||
limit. */
|
||||
limit. In the case of pcre2_dfa_match(), this limit controls the depth of
|
||||
the internal nested function calls that are used for pattern recursions,
|
||||
lookarounds, and atomic groups. */
|
||||
#undef MATCH_LIMIT_DEPTH
|
||||
|
||||
/* This limit is parameterized just in case anybody ever wants to change it.
|
||||
|
|
|
@ -292,6 +292,35 @@ typedef struct stateblock {
|
|||
#define INTS_PER_STATEBLOCK (int)(sizeof(stateblock)/sizeof(int))
|
||||
|
||||
|
||||
/* Before version 10.32 the recursive calls of internal_dfa_match() were passed
|
||||
local working space and output vectors that were created on the stack. This has
|
||||
caused issues for some patterns, especially in small-stack environments such as
|
||||
Windows. A new scheme is now in use which sets up a vector on the stack, but if
|
||||
this is too small, heap memory is used, up to the heap_limit. The main
|
||||
parameters are all numbers of ints because the workspace is a vector of ints.
|
||||
|
||||
The size of the starting stack vector, DFA_START_RWS_SIZE, is in bytes, and is
|
||||
defined in pcre2_internal.h so as to be available to pcre2test when it is
|
||||
finding the minimum heap requirement for a match. */
|
||||
|
||||
#define OVEC_UNIT (sizeof(PCRE2_SIZE)/sizeof(int))
|
||||
|
||||
#define RWS_BASE_SIZE (DFA_START_RWS_SIZE/sizeof(int)) /* Stack vector */
|
||||
#define RWS_RSIZE 1000 /* Work size for recursion */
|
||||
#define RWS_OVEC_RSIZE (1000*OVEC_UNIT) /* Ovector for recursion */
|
||||
#define RWS_OVEC_OSIZE (2*OVEC_UNIT) /* Ovector in other cases */
|
||||
|
||||
/* This structure is at the start of each workspace block. */
|
||||
|
||||
typedef struct RWS_anchor {
|
||||
struct RWS_anchor *next;
|
||||
unsigned int size; /* Number of ints */
|
||||
unsigned int free; /* Number of ints */
|
||||
} RWS_anchor;
|
||||
|
||||
#define RWS_ANCHOR_SIZE (sizeof(RWS_anchor)/sizeof(int))
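
The layout described in the comment above (an anchor at the start of each block, with the unused ints at the tail handed out and returned in stack order) can be sketched as follows. This is a minimal illustration, not the code from the commit: `block_anchor`, `block_init`, `block_reserve`, and `block_release` are hypothetical names, and the sketch assumes the memory passed in is suitably aligned for the anchor structure.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the anchor at the start of each workspace block.
   All sizes are counted in ints, as in the comment above. */
typedef struct block_anchor {
  struct block_anchor *next;
  unsigned int size;   /* total ints in the block, anchor included */
  unsigned int free;   /* ints still unused at the end of the block */
} block_anchor;

#define ANCHOR_INTS (sizeof(block_anchor)/sizeof(int))

/* Initialise a block of 'nints' ints starting at 'mem'.
   Assumption: 'mem' is aligned well enough to hold a block_anchor. */
static block_anchor *block_init(int *mem, unsigned int nints)
{
  block_anchor *b = (block_anchor *)mem;
  assert(nints > ANCHOR_INTS);
  b->next = NULL;
  b->size = nints;
  b->free = nints - (unsigned int)ANCHOR_INTS;
  return b;
}

/* Reserve 'nints' ints from the free tail of the block.
   Returns NULL if the block does not have enough room left. */
static int *block_reserve(block_anchor *b, unsigned int nints)
{
  int *p;
  if (b->free < nints) return NULL;
  p = (int *)b + b->size - b->free;   /* first unused int */
  b->free -= nints;
  return p;
}

/* Give back the most recent reservation; callers must release in the
   reverse order of reservation, as nested calls naturally do. */
static void block_release(block_anchor *b, unsigned int nints)
{
  b->free += nints;
}
```

A caller would reserve one slice before a nested call, split it into an ovector part and a workspace part, and release the same number of ints after the call returns.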
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Process a callout *
|
||||
|
@ -353,6 +382,61 @@ return (mb->callout)(cb, mb->callout_data);
|
|||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Expand local workspace memory *
|
||||
*************************************************/
|
||||
|
||||
/* This function is called when internal_dfa_match() is about to be called
recursively and there is insufficient working space left in the current
workspace block. If there's an existing next block, use it; otherwise get a new
block unless the heap limit is reached.

Arguments:
  rwsptr     pointer to block pointer (updated)
  ovecsize   space needed for an ovector
  mb         the match block

Returns:     0 rwsptr has been updated
            !0 an error code
*/
|
||||
|
||||
static int
|
||||
more_workspace(RWS_anchor **rwsptr, unsigned int ovecsize, dfa_match_block *mb)
|
||||
{
|
||||
RWS_anchor *rws = *rwsptr;
|
||||
RWS_anchor *new;
|
||||
|
||||
if (rws->next != NULL)
|
||||
{
|
||||
new = rws->next;
|
||||
}
|
||||
|
||||
/* All sizes are in units of sizeof(int), except for mb->heap_limit, which is in
kilobytes. */
|
||||
|
||||
else
|
||||
{
|
||||
unsigned int newsize = rws->size * 2;
|
||||
unsigned int heapleft = (unsigned int)
|
||||
(((1024/sizeof(int))*mb->heap_limit - mb->heap_used));
|
||||
if (newsize > heapleft) newsize = heapleft;
|
||||
if (newsize < RWS_RSIZE + ovecsize + RWS_ANCHOR_SIZE)
|
||||
return PCRE2_ERROR_HEAPLIMIT;
|
||||
new = mb->memctl.malloc(newsize*sizeof(int), mb->memctl.memory_data);
|
||||
if (new == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||
mb->heap_used += newsize;
|
||||
new->next = NULL;
|
||||
new->size = newsize;
|
||||
rws->next = new;
|
||||
}
|
||||
|
||||
new->free = new->size - RWS_ANCHOR_SIZE;
|
||||
*rwsptr = new;
|
||||
return 0;
|
||||
}
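
The sizing rule used when a new block is needed (double the current block, clip to what the heap limit still allows, and give up if the result cannot hold one recursion's worth of data) can be sketched in isolation. This is a hedged illustration only: `next_block_ints` is a hypothetical helper, not a function in the commit, and unlike the code above it explicitly treats "already over the limit" as zero space left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decide how many ints the next heap block should hold.
   cur_ints        size of the current block, in ints
   heap_limit_kib  the heap limit, in kilobytes
   heap_used_ints  heap already obtained for workspace, in ints
   needed_ints     minimum useful block size, in ints
   Returns 0 when the heap limit leaves too little room. */
static size_t next_block_ints(size_t cur_ints, size_t heap_limit_kib,
  size_t heap_used_ints, size_t needed_ints)
{
  size_t limit_ints, heap_left, new_ints;

  assert(needed_ints > 0);

  /* Convert the limit from kilobytes to ints. */
  limit_ints = (1024/sizeof(int)) * heap_limit_kib;

  /* Guard against wrap-around if usage already meets or exceeds the limit. */
  heap_left = (heap_used_ints < limit_ints)? limit_ints - heap_used_ints : 0;

  new_ints = cur_ints * 2;                       /* grow geometrically */
  if (new_ints > heap_left) new_ints = heap_left;
  if (new_ints < needed_ints) return 0;          /* heap limit reached */
  return new_ints;
}
```

With a heap limit of zero the helper always returns 0, which corresponds to the documented behaviour that a zero limit disables any use of heap memory.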
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Match a Regular Expression - DFA engine *
|
||||
*************************************************/
|
||||
|
@ -431,7 +515,8 @@ internal_dfa_match(
|
|||
uint32_t offsetcount,
|
||||
int *workspace,
|
||||
int wscount,
|
||||
uint32_t rlevel)
|
||||
uint32_t rlevel,
|
||||
int *RWS)
|
||||
{
|
||||
stateblock *active_states, *new_states, *temp_states;
|
||||
stateblock *next_active_state, *next_new_state;
|
||||
|
@ -2587,10 +2672,22 @@ for (;;)
|
|||
case OP_ASSERTBACK:
|
||||
case OP_ASSERTBACK_NOT:
|
||||
{
|
||||
PCRE2_SPTR endasscode = code + GET(code, 1);
|
||||
PCRE2_SIZE local_offsets[2];
|
||||
int rc;
|
||||
int local_workspace[1000];
|
||||
int *local_workspace;
|
||||
PCRE2_SIZE *local_offsets;
|
||||
PCRE2_SPTR endasscode = code + GET(code, 1);
|
||||
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||
|
||||
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
|
||||
{
|
||||
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
|
||||
if (rc != 0) return rc;
|
||||
RWS = (int *)rws;
|
||||
}
|
||||
|
||||
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
|
||||
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||
|
||||
while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);
|
||||
|
||||
|
@ -2600,10 +2697,13 @@ for (;;)
|
|||
ptr, /* where we currently are */
|
||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||
local_offsets, /* offset vector */
|
||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
||||
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
|
||||
local_workspace, /* workspace vector */
|
||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
||||
rlevel); /* function recursion level */
|
||||
RWS_RSIZE, /* size of same */
|
||||
rlevel, /* function recursion level */
|
||||
RWS); /* recursion workspace */
|
||||
|
||||
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||
|
||||
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
|
||||
if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
|
||||
|
@ -2670,11 +2770,23 @@ for (;;)
|
|||
|
||||
else
|
||||
{
|
||||
PCRE2_SIZE local_offsets[2];
|
||||
int local_workspace[1000];
|
||||
int rc;
|
||||
int *local_workspace;
|
||||
PCRE2_SIZE *local_offsets;
|
||||
PCRE2_SPTR asscode = code + LINK_SIZE + 1;
|
||||
PCRE2_SPTR endasscode = asscode + GET(asscode, 1);
|
||||
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||
|
||||
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
|
||||
{
|
||||
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
|
||||
if (rc != 0) return rc;
|
||||
RWS = (int *)rws;
|
||||
}
|
||||
|
||||
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
|
||||
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||
|
||||
while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);
|
||||
|
||||
|
@ -2684,10 +2796,13 @@ for (;;)
|
|||
ptr, /* where we currently are */
|
||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||
local_offsets, /* offset vector */
|
||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
||||
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
|
||||
local_workspace, /* workspace vector */
|
||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
||||
rlevel); /* function recursion level */
|
||||
RWS_RSIZE, /* size of same */
|
||||
rlevel, /* function recursion level */
|
||||
RWS); /* recursion workspace */
|
||||
|
||||
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||
|
||||
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
|
||||
if ((rc >= 0) ==
|
||||
|
@ -2702,13 +2817,25 @@ for (;;)
|
|||
/*-----------------------------------------------------------------*/
|
||||
case OP_RECURSE:
|
||||
{
|
||||
int rc;
|
||||
int *local_workspace;
|
||||
PCRE2_SIZE *local_offsets;
|
||||
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||
dfa_recursion_info *ri;
|
||||
PCRE2_SIZE local_offsets[1000];
|
||||
int local_workspace[1000];
|
||||
PCRE2_SPTR callpat = start_code + GET(code, 1);
|
||||
uint32_t recno = (callpat == mb->start_code)? 0 :
|
||||
GET2(callpat, 1 + LINK_SIZE);
|
||||
int rc;
|
||||
|
||||
if (rws->free < RWS_RSIZE + RWS_OVEC_RSIZE)
|
||||
{
|
||||
rc = more_workspace(&rws, RWS_OVEC_RSIZE, mb);
|
||||
if (rc != 0) return rc;
|
||||
RWS = (int *)rws;
|
||||
}
|
||||
|
||||
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||
local_workspace = ((int *)local_offsets) + RWS_OVEC_RSIZE;
|
||||
rws->free -= RWS_RSIZE + RWS_OVEC_RSIZE;
|
||||
|
||||
/* Check for repeating a recursion without advancing the subject
|
||||
pointer. This should catch convoluted mutual recursions. (Some simple
|
||||
|
@ -2732,11 +2859,13 @@ for (;;)
|
|||
ptr, /* where we currently are */
|
||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||
local_offsets, /* offset vector */
|
||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
||||
RWS_OVEC_RSIZE/OVEC_UNIT, /* size of same */
|
||||
local_workspace, /* workspace vector */
|
||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
||||
rlevel); /* function recursion level */
|
||||
RWS_RSIZE, /* size of same */
|
||||
rlevel, /* function recursion level */
|
||||
RWS); /* recursion workspace */
|
||||
|
||||
rws->free += RWS_RSIZE + RWS_OVEC_RSIZE;
|
||||
mb->recursive = new_recursive.prevrec; /* Done this recursion */
|
||||
|
||||
/* Ran out of internal offsets */
|
||||
|
@ -2782,10 +2911,25 @@ for (;;)
|
|||
case OP_SCBRAPOS:
|
||||
case OP_BRAPOSZERO:
|
||||
{
|
||||
int rc;
|
||||
int *local_workspace;
|
||||
PCRE2_SIZE *local_offsets;
|
||||
PCRE2_SIZE charcount, matched_count;
|
||||
PCRE2_SPTR local_ptr = ptr;
|
||||
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||
BOOL allow_zero;
|
||||
|
||||
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
|
||||
{
|
||||
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
|
||||
if (rc != 0) return rc;
|
||||
RWS = (int *)rws;
|
||||
}
|
||||
|
||||
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
|
||||
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||
|
||||
if (codevalue == OP_BRAPOSZERO)
|
||||
{
|
||||
allow_zero = TRUE;
|
||||
|
@ -2798,19 +2942,17 @@ for (;;)
|
|||
|
||||
for (matched_count = 0;; matched_count++)
|
||||
{
|
||||
PCRE2_SIZE local_offsets[2];
|
||||
int local_workspace[1000];
|
||||
|
||||
int rc = internal_dfa_match(
|
||||
rc = internal_dfa_match(
|
||||
mb, /* fixed match data */
|
||||
code, /* this subexpression's code */
|
||||
local_ptr, /* where we currently are */
|
||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||
local_offsets, /* offset vector */
|
||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
||||
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
|
||||
local_workspace, /* workspace vector */
|
||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
||||
rlevel); /* function recursion level */
|
||||
RWS_RSIZE, /* size of same */
|
||||
rlevel, /* function recursion level */
|
||||
RWS); /* recursion workspace */
|
||||
|
||||
/* Failed to match */
|
||||
|
||||
|
@ -2827,6 +2969,8 @@ for (;;)
|
|||
local_ptr += charcount; /* Advance temporary position ptr */
|
||||
}
|
||||
|
||||
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||
|
||||
/* At this point we have matched the subpattern matched_count
|
||||
times, and local_ptr is pointing to the character after the end of the
|
||||
last match. */
|
||||
|
@ -2869,19 +3013,35 @@ for (;;)
|
|||
/*-----------------------------------------------------------------*/
|
||||
case OP_ONCE:
|
||||
{
|
||||
PCRE2_SIZE local_offsets[2];
|
||||
int local_workspace[1000];
|
||||
int rc;
|
||||
int *local_workspace;
|
||||
PCRE2_SIZE *local_offsets;
|
||||
RWS_anchor *rws = (RWS_anchor *)RWS;
|
||||
|
||||
int rc = internal_dfa_match(
|
||||
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
|
||||
{
|
||||
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
|
||||
if (rc != 0) return rc;
|
||||
RWS = (int *)rws;
|
||||
}
|
||||
|
||||
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
|
||||
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
|
||||
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||
|
||||
rc = internal_dfa_match(
|
||||
mb, /* fixed match data */
|
||||
code, /* this subexpression's code */
|
||||
ptr, /* where we currently are */
|
||||
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
|
||||
local_offsets, /* offset vector */
|
||||
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
|
||||
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
|
||||
local_workspace, /* workspace vector */
|
||||
sizeof(local_workspace)/sizeof(int), /* size of same */
|
||||
rlevel); /* function recursion level */
|
||||
RWS_RSIZE, /* size of same */
|
||||
rlevel, /* function recursion level */
|
||||
RWS); /* recursion workspace */
|
||||
|
||||
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
|
||||
|
||||
if (rc >= 0)
|
||||
{
|
||||
|
@ -3063,6 +3223,7 @@ pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
|
|||
PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data,
|
||||
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
|
||||
{
|
||||
int rc;
|
||||
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
||||
|
||||
PCRE2_SPTR start_match;
|
||||
|
@ -3071,9 +3232,9 @@ PCRE2_SPTR bumpalong_limit;
|
|||
PCRE2_SPTR req_cu_ptr;
|
||||
|
||||
BOOL utf, anchored, startline, firstline;
|
||||
|
||||
BOOL has_first_cu = FALSE;
|
||||
BOOL has_req_cu = FALSE;
|
||||
|
||||
PCRE2_UCHAR first_cu = 0;
|
||||
PCRE2_UCHAR first_cu2 = 0;
|
||||
PCRE2_UCHAR req_cu = 0;
|
||||
|
@ -3088,6 +3249,17 @@ pcre2_callout_block cb;
|
|||
dfa_match_block actual_match_block;
|
||||
dfa_match_block *mb = &actual_match_block;
|
||||
|
||||
/* Set up a starting block of memory for use during recursive calls to
|
||||
internal_dfa_match(). By putting this on the stack, it minimizes resource use
|
||||
in the case when it is not needed. If this is too small, more memory is
|
||||
obtained from the heap. At the start of each block is an anchor structure. */
|
||||
|
||||
int base_recursion_workspace[RWS_BASE_SIZE];
|
||||
RWS_anchor *rws = (RWS_anchor *)base_recursion_workspace;
|
||||
rws->next = NULL;
|
||||
rws->size = RWS_BASE_SIZE;
|
||||
rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
|
||||
|
||||
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
||||
subject string. */
|
||||
|
||||
|
@ -3184,6 +3356,7 @@ if (mcontext == NULL)
|
|||
mb->memctl = re->memctl;
|
||||
mb->match_limit = PRIV(default_match_context).match_limit;
|
||||
mb->match_limit_depth = PRIV(default_match_context).depth_limit;
|
||||
mb->heap_limit = PRIV(default_match_context).heap_limit;
|
||||
}
|
||||
else
|
||||
{
|
||||
|
@ -3198,6 +3371,7 @@ else
|
|||
mb->memctl = mcontext->memctl;
|
||||
mb->match_limit = mcontext->match_limit;
|
||||
mb->match_limit_depth = mcontext->depth_limit;
|
||||
mb->heap_limit = mcontext->heap_limit;
|
||||
}
|
||||
|
||||
if (mb->match_limit > re->limit_match)
|
||||
|
@ -3206,6 +3380,9 @@ if (mb->match_limit > re->limit_match)
|
|||
if (mb->match_limit_depth > re->limit_depth)
|
||||
mb->match_limit_depth = re->limit_depth;
|
||||
|
||||
if (mb->heap_limit > re->limit_heap)
|
||||
mb->heap_limit = re->limit_heap;
|
||||
|
||||
mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
|
||||
re->name_count * re->name_entry_size;
|
||||
mb->tables = re->tables;
|
||||
|
@ -3215,6 +3392,7 @@ mb->start_offset = start_offset;
|
|||
mb->moptions = options;
|
||||
mb->poptions = re->overall_options;
|
||||
mb->match_call_count = 0;
|
||||
mb->heap_used = 0;
|
||||
|
||||
/* Process the \R and newline settings. */
|
||||
|
||||
|
@ -3351,8 +3529,6 @@ a match. */
|
|||
|
||||
for (;;)
|
||||
{
|
||||
int rc;
|
||||
|
||||
/* ----------------- Start of match optimizations ---------------- */
|
||||
|
||||
/* There are some optimizations that avoid running the match if a known
|
||||
|
@ -3544,7 +3720,7 @@ for (;;)
|
|||
in characters, we treat it as code units to avoid spending too much time
|
||||
in this optimization. */
|
||||
|
||||
if (end_subject - start_match < re->minlength) return PCRE2_ERROR_NOMATCH;
|
||||
if (end_subject - start_match < re->minlength) goto NOMATCH_EXIT;
|
||||
|
||||
/* If req_cu is set, we know that that code unit must appear in the
|
||||
subject for the match to succeed. If the first code unit is set, req_cu
|
||||
|
@ -3621,7 +3797,8 @@ for (;;)
|
|||
(uint32_t)match_data->oveccount * 2, /* actual size of same */
|
||||
workspace, /* workspace vector */
|
||||
(int)wscount, /* size of same */
|
||||
0); /* function recurse level */
|
||||
0, /* function recurse level */
|
||||
base_recursion_workspace); /* initial workspace for recursion */
|
||||
|
||||
/* Anything other than "no match" means we are done, always; otherwise, carry
|
||||
on only if not anchored. */
|
||||
|
@ -3637,7 +3814,7 @@ for (;;)
|
|||
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
|
||||
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
|
||||
match_data->rc = rc;
|
||||
return rc;
|
||||
goto EXIT;
|
||||
}
|
||||
|
||||
/* Advance to the next subject character unless we are at the end of a line
|
||||
|
@ -3668,8 +3845,18 @@ for (;;)
|
|||
|
||||
} /* "Bumpalong" loop */
|
||||
|
||||
NOMATCH_EXIT:
|
||||
rc = PCRE2_ERROR_NOMATCH;
|
||||
|
||||
return PCRE2_ERROR_NOMATCH;
|
||||
EXIT:
|
||||
while (rws->next != NULL)
|
||||
{
|
||||
RWS_anchor *next = rws->next;
|
||||
rws->next = next->next;
|
||||
mb->memctl.free(next, mb->memctl.memory_data);
|
||||
}
|
||||
|
||||
return rc;
|
||||
}
|
||||
|
||||
/* End of pcre2_dfa_match.c */
|
||||
|
|
|
@ -253,6 +253,11 @@ maximum size of this can be limited. */
|
|||
|
||||
#define START_FRAMES_SIZE 20480
|
||||
|
||||
/* Similarly, for DFA matching, an initial internal workspace vector is
|
||||
allocated on the stack. */
|
||||
|
||||
#define DFA_START_RWS_SIZE 30720
|
||||
|
||||
/* Define the default BSR convention. */
|
||||
|
||||
#ifdef BSR_ANYCRLF
|
||||
|
|
|
@ -896,6 +896,8 @@ typedef struct dfa_match_block {
|
|||
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
|
||||
const uint8_t *tables; /* Character tables */
|
||||
PCRE2_SIZE start_offset; /* The start offset value */
|
||||
PCRE2_SIZE heap_limit; /* As it says */
|
||||
PCRE2_SIZE heap_used; /* As it says */
|
||||
uint32_t match_limit; /* As it says */
|
||||
uint32_t match_limit_depth; /* As it says */
|
||||
uint32_t match_call_count; /* Number of calls of internal function */
|
||||
|
|
|
@ -5760,6 +5760,8 @@ PCRE2_SET_HEAP_LIMIT(dat_context, max);
|
|||
|
||||
for (;;)
|
||||
{
|
||||
uint32_t stack_start = 0;
|
||||
|
||||
if (errnumber == PCRE2_ERROR_HEAPLIMIT)
|
||||
{
|
||||
PCRE2_SET_HEAP_LIMIT(dat_context, mid);
|
||||
|
@ -5775,6 +5777,7 @@ for (;;)
|
|||
|
||||
if ((dat_datctl.control & CTL_DFA) != 0)
|
||||
{
|
||||
stack_start = DFA_START_RWS_SIZE/1024;
|
||||
if (dfa_workspace == NULL)
|
||||
dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
|
||||
if (dfa_matched++ == 0)
|
||||
|
@ -5789,11 +5792,21 @@ for (;;)
|
|||
dat_datctl.options, match_data, PTR(dat_context));
|
||||
|
||||
else
|
||||
{
|
||||
stack_start = START_FRAMES_SIZE/1024;
|
||||
PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
|
||||
dat_datctl.options, match_data, PTR(dat_context));
|
||||
}
|
||||
|
||||
if (capcount == errnumber)
|
||||
{
|
||||
if ((mid & 0x80000000u) != 0)
|
||||
{
|
||||
fprintf(outfile, "Can't find minimum %s limit: check pattern for "
|
||||
"restriction\n", msg);
|
||||
break;
|
||||
}
|
||||
|
||||
min = mid;
|
||||
mid = (mid == max - 1)? max : (max != UINT32_MAX)? (min + max)/2 : mid*2;
|
||||
}
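
The search for a minimum limit follows a "double until it works, then bisect" pattern. The sketch below is a simplified, self-contained illustration of that pattern under stated assumptions: `succeeds_with` is a hypothetical stand-in for running the match with a given limit, the starting trial value of 64 is an assumption, and the midpoint is computed as `min + (max - min)/2` to avoid overflow rather than as `(min + max)/2`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "run the match with this limit": the match
   completes once the limit reaches 'needed'. */
static int succeeds_with(uint32_t limit, uint32_t needed)
{
  return limit >= needed;
}

/* Find the smallest limit (at least 1) for which succeeds_with() is true.
   Assumption: 1 <= needed <= 0x80000000, so doubling never wraps. */
static uint32_t find_min_limit(uint32_t needed)
{
  uint32_t min = 0;            /* largest limit known to fail */
  uint32_t mid = 64;           /* current trial value (assumed start) */
  uint32_t max = UINT32_MAX;   /* smallest limit known to succeed, once found */

  assert(needed >= 1 && needed <= 0x80000000u);

  for (;;)
    {
    if (!succeeds_with(mid, needed))
      {
      /* Too small: raise the lower bound. Double while no upper bound is
         known, otherwise bisect; step straight to max when adjacent. */
      min = mid;
      mid = (mid == max - 1)? max :
            (max != UINT32_MAX)? min + (max - min)/2 : mid*2;
      }
    else
      {
      /* Large enough: finished if the value just below is known to fail. */
      if (mid == min + 1) return mid;
      max = mid;
      mid = min + (max - min)/2;
      }
    }
}
```

For example, with `needed` equal to 4 the trial values run 64, 32, 16, 8, 4, 2, 3, 4 before the search settles on 4.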
|
||||
|
@ -5802,11 +5815,12 @@ for (;;)
|
|||
capcount == PCRE2_ERROR_PARTIAL)
|
||||
{
|
||||
/* If we've not hit the error with a heap limit less than the size of the
|
||||
initial stack frame vector, the heap is not being used, so the minimum
|
||||
limit is zero; there's no need to go on. The other limits are always
|
||||
greater than zero. */
|
||||
initial stack frame vector (for pcre2_match()) or the initial stack
|
||||
workspace vector (for pcre2_dfa_match()), the heap is not being used, so
|
||||
the minimum limit is zero; there's no need to go on. The other limits are
|
||||
always greater than zero. */
|
||||
|
||||
if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < START_FRAMES_SIZE/1024)
|
||||
if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < stack_start)
|
||||
{
|
||||
fprintf(outfile, "Minimum %s limit = 0\n", msg);
|
||||
break;
|
||||
|
@ -7139,18 +7153,16 @@ else for (gmatched = 0;; gmatched++)
|
|||
(double)CLOCKS_PER_SEC);
|
||||
}
|
||||
|
||||
/* Find the heap, match and depth limits if requested. The match and heap
|
||||
limits are not relevant for DFA matching and the depth and heap limits are
|
||||
not relevant for JIT. The return from check_match_limit() is the return from
|
||||
the final call to pcre2_match() or pcre2_dfa_match(). */
|
||||
/* Find the heap, match and depth limits if requested. The depth and heap
|
||||
limits are not relevant for JIT. The return from check_match_limit() is the
|
||||
return from the final call to pcre2_match() or pcre2_dfa_match(). */
|
||||
|
||||
if ((dat_datctl.control & CTL_FINDLIMITS) != 0)
|
||||
{
|
||||
capcount = 0; /* This stops compiler warnings */
|
||||
|
||||
if ((dat_datctl.control & CTL_DFA) == 0 &&
|
||||
(FLD(compiled_code, executable_jit) == NULL ||
|
||||
(dat_datctl.options & PCRE2_NO_JIT) != 0))
|
||||
if (FLD(compiled_code, executable_jit) == NULL ||
|
||||
(dat_datctl.options & PCRE2_NO_JIT) != 0)
|
||||
{
|
||||
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
|
||||
}
|
||||
|
@ -7165,6 +7177,12 @@ else for (gmatched = 0;; gmatched++)
|
|||
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_DEPTHLIMIT,
|
||||
"depth");
|
||||
}
|
||||
|
||||
if (capcount == 0)
|
||||
{
|
||||
fprintf(outfile, "Matched, but offsets vector is too small to show all matches\n");
|
||||
capcount = dat_datctl.oveccount;
|
||||
}
|
||||
}
|
||||
|
||||
/* Otherwise just run a single match, setting up a callout if required (the
|
||||
|
|
|
@ -4874,6 +4874,14 @@
|
|||
\= Expect depth limit exceeded
|
||||
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||
|
||||
/(*LIMIT_HEAP=0)^((.)(?1)|.)$/
|
||||
\= Expect heap limit exceeded
|
||||
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||
|
||||
/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/
|
||||
\= Expect success
|
||||
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||
|
||||
/(02-)?[0-9]{3}-[0-9]{3}/
|
||||
02-123-123
|
||||
|
||||
|
|
|
@ -7667,12 +7667,23 @@ No match
|
|||
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||
Failed: error -53: matching depth limit exceeded
|
||||
|
||||
/(*LIMIT_HEAP=0)^((.)(?1)|.)$/
|
||||
\= Expect heap limit exceeded
|
||||
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||
Failed: error -63: heap limit exceeded
|
||||
|
||||
/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/
|
||||
\= Expect success
|
||||
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||
0: a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
|
||||
|
||||
/(02-)?[0-9]{3}-[0-9]{3}/
|
||||
02-123-123
|
||||
0: 02-123-123
|
||||
|
||||
/^(a(?2))(b)(?1)/
|
||||
abbab\=find_limits
|
||||
Minimum heap limit = 0
|
||||
Minimum match limit = 4
|
||||
Minimum depth limit = 2
|
||||
0: abbab
|
||||