Re-factor pcre2_dfa_match() to use the heap instead of the stack for workspace
vectors when doing recursive function calls.
Philip.Hazel 2018-04-27 16:48:35 +00:00
parent fb413521fc
commit 75747ebb11
28 changed files with 1221 additions and 871 deletions


@ -50,7 +50,15 @@ offset is set zero for early errors.
(c) Support for non-C99 snprintf() that returns -1 in the overflow case.
11. Minor tidy of pcre2_dfa_matgch() code.
11. Minor tidy of pcre2_dfa_match() code.
12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
use the stack for local workspace and local ovectors. Instead, an initial block
of stack is reserved, but if this is insufficient, heap memory is used. The
heap limit parameter now applies to pcre2_dfa_match().
13. If a "find limits" test of DFA matching in pcre2test resulted in too many
matches for the ovector, no matches were displayed.
Version 10.31 12-February-2018
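ChangeLog item 12 above describes the idea of reserving an initial block of
stack and falling back to the heap, under a heap limit, only when that block
runs out. The following is a minimal sketch of that general technique, not the
actual pcre2_dfa_match() code: every name in it (STACK_SLOTS, ws_state,
nested_call, run_nested, the WS_ERR_* codes) is hypothetical, and the "work"
done at each level is a placeholder.

```c
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not PCRE2's internal code. */
#define STACK_SLOTS 100          /* ints in the block reserved on the stack */
#define WS_ERR_HEAPLIMIT (-1)    /* stand-in for PCRE2_ERROR_HEAPLIMIT */
#define WS_ERR_NOMEMORY  (-2)    /* stand-in for PCRE2_ERROR_NOMEMORY */

typedef struct {
  int   *block;        /* initial block, owned by the top-level caller */
  size_t block_used;   /* ints of that block already handed out */
  size_t heap_limit;   /* heap limit in bytes */
  size_t heap_used;    /* bytes of heap currently held */
  int    heap_allocs;  /* number of times the heap was needed */
} ws_state;

/* One nested call: take `need` ints of workspace, recurse, then release. */
static int nested_call(ws_state *s, int depth, size_t need)
{
  size_t bytes = need * sizeof(int);
  int from_heap = 0;
  int rc = 0;
  int *ws;

  if (s->block_used + need <= STACK_SLOTS) {
    ws = s->block + s->block_used;          /* slice of the initial block */
    s->block_used += need;
  } else {
    if (s->heap_used + bytes > s->heap_limit) return WS_ERR_HEAPLIMIT;
    ws = malloc(bytes);
    if (ws == NULL) return WS_ERR_NOMEMORY;
    s->heap_used += bytes;
    s->heap_allocs++;
    from_heap = 1;
  }

  ws[0] = depth;                            /* stand-in for real matching work */
  if (depth > 1) rc = nested_call(s, depth - 1, need);

  if (from_heap) { free(ws); s->heap_used -= bytes; }
  else s->block_used -= need;
  return rc;
}

/* Returns the number of heap allocations made, or a negative error code.
   `need` must be at least 1; the heap limit is given in kilobytes. */
int run_nested(int depth, size_t need, size_t heap_limit_kib)
{
  int block[STACK_SLOTS];
  ws_state s = { block, 0, heap_limit_kib * 1024, 0, 0 };
  int rc;
  if (depth < 1 || need < 1) return 0;
  rc = nested_call(&s, depth, need);
  return (rc < 0) ? rc : s.heap_allocs;
}
```

In this sketch a heap limit of zero still allows shallow nesting to succeed,
because the first few levels fit in the initial block; deeper nesting then
fails with the heap-limit error instead of growing the stack.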

README

@ -241,9 +241,11 @@ library. They are also documented in the pcre2build man page.
discussion in the pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of nested backtracking
during a matching process, which indirectly limits the amount of heap memory
that is used. This also has a default of ten million, which is essentially
"unlimited". You can change the default by setting, for example,
(pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
matching process, which indirectly limits the amount of heap memory that is
used, and in the case of pcre2_dfa_match() the amount of stack as well. This
counter also has a default of ten million, which is essentially "unlimited".
You can change the default by setting, for example,
--with-match-limit-depth=5000
@ -251,7 +253,7 @@ library. They are also documented in the pcre2build man page.
pcre2_set_depth_limit).
. You can also set an explicit limit on the amount of heap memory used by
the pcre2_match() interpreter:
the pcre2_match() and pcre2_dfa_match() interpreters:
--with-heap-limit=500
@ -885,4 +887,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 25 February 2018
Last updated: 27 April 2018


@ -718,10 +718,11 @@ AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [
The value of MATCH_LIMIT determines the default number of times the
pcre2_match() function can record a backtrack position during a single
matching attempt. There is a runtime interface for setting a different limit.
The limit exists in order to catch runaway regular expressions that take for
ever to determine that they do not match. The default is set very large so
that it does not accidentally catch legitimate cases.])
matching attempt. The value is also used to limit a loop counter in
pcre2_dfa_match(). There is a runtime interface for setting a different
limit. The limit exists in order to catch runaway regular expressions that
take for ever to determine that they do not match. The default is set very
large so that it does not accidentally catch legitimate cases.])
# --with-match-limit-recursion is an obsolete synonym for --with-match-limit-depth
@ -745,11 +746,15 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
the maximum amount of heap memory that is used. The value of
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it must
be less than the value of MATCH_LIMIT. The default is to use the same value
as MATCH_LIMIT. There is a runtime method for setting a different limit.])
as MATCH_LIMIT. There is a runtime method for setting a different limit. In
the case of pcre2_dfa_match(), this limit controls the depth of the internal
nested function calls that are used for pattern recursions, lookarounds, and
atomic groups.])
AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
This limits the amount of memory that pcre2_match() may use while matching
a pattern. The value is in kilobytes.])
This limits the amount of memory that may be used while matching
a pattern. It applies to both pcre2_match() and pcre2_dfa_match(). It does
not apply to JIT matching. The value is in kilobytes.])
AC_DEFINE([MAX_NAME_SIZE], [32], [
This limit is parameterized just in case anybody ever wants to


@ -10,6 +10,7 @@ This document contains the following sections:
Calling conventions in Windows environments
Comments about Win32 builds
Building PCRE2 on Windows with CMake
Building PCRE2 on Windows with Visual Studio
Testing with RunTest.bat
Building PCRE2 on native z/OS and z/VM
@ -330,6 +331,18 @@ cache can be deleted by selecting "File > Delete Cache".
available for review in Testing\Temporary under your build dir.
BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO
The code currently cannot be compiled without a stdint.h header, which is
available only in relatively recent versions of Visual Studio. However, this
portable and permissively-licensed implementation of the header worked without
issue:
http://www.azillionmonkeys.com/qed/pstdint.h
Just rename it and drop it into the top level of the build tree.
TESTING WITH RUNTEST.BAT
If configured with CMake, building the test project ("make test" or building
@ -382,6 +395,6 @@ Everything in that location, source and executable, is in EBCDIC and native
z/OS file formats. The port provides an API for LE languages such as COBOL and
for the z/OS and z/VM versions of the Rexx languages.
===============================
Last Updated: 13 September 2017
===============================
===========================
Last Updated: 19 April 2018
===========================


@ -241,9 +241,11 @@ library. They are also documented in the pcre2build man page.
discussion in the pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of nested backtracking
during a matching process, which indirectly limits the amount of heap memory
that is used. This also has a default of ten million, which is essentially
"unlimited". You can change the default by setting, for example,
(pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
matching process, which indirectly limits the amount of heap memory that is
used, and in the case of pcre2_dfa_match() the amount of stack as well. This
counter also has a default of ten million, which is essentially "unlimited".
You can change the default by setting, for example,
--with-match-limit-depth=5000
@ -251,7 +253,7 @@ library. They are also documented in the pcre2build man page.
pcre2_set_depth_limit).
. You can also set an explicit limit on the amount of heap memory used by
the pcre2_match() interpreter:
the pcre2_match() and pcre2_dfa_match() interpreters:
--with-heap-limit=500
@ -885,4 +887,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 25 February 2018
Last updated: 27 April 2018


@ -46,9 +46,9 @@ just once (except when processing lookaround assertions). This function is
<i>wscount</i> Number of elements in the vector
</pre>
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
up a callout function or specify the match and/or the recursion depth limits.
The <i>length</i> and <i>startoffset</i> values are code units, not characters.
The options are:
up a callout function or specify the heap limit or the match or the recursion
depth limits. The <i>length</i> and <i>startoffset</i> values are code units, not
characters. The options are:
<pre>
PCRE2_ANCHORED Match only at the first position
PCRE2_ENDANCHORED Pattern can match only at end of subject


@ -951,14 +951,15 @@ offset limit. In other words, whichever limit comes first is used.
<br>
The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
information when running an interpretive match. This limit does not apply to
matching with the JIT optimization, which has its own memory control
arrangements (see the
information when running an interpretive match. This limit also applies to
<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a
lot of nested pattern recursion or lookarounds or atomic groups. This limit
does not apply to matching with the JIT optimization, which has its own memory
control arrangements (see the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details), nor does it apply to <b>pcre2_dfa_match()</b>.
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
returned. The default limit is set when PCRE2 is built; the default default is
very large and is essentially "unlimited".
documentation for more details). If the limit is reached, the negative error
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
built; the default default is very large and is essentially "unlimited".
</P>
<P>
A value for the heap limit may also be supplied by an item at the start of a
@ -978,6 +979,12 @@ Heap memory is used only if the initial vector is too small. If the heap limit
is set to a value less than 21 (in particular, zero) no heap memory will be
used. In this case, only patterns that do not have a lot of nested backtracking
can be successfully processed.
</P>
<P>
Similarly, for <b>pcre2_dfa_match()</b>, a vector on the system stack is used
when processing pattern recursions, lookarounds, or atomic groups, and only if
this is not big enough is heap memory used. In this case, too, setting a value
of zero disables the use of the heap.
<br>
<br>
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
@ -1035,11 +1042,22 @@ backtracking.
<P>
The depth limit is not relevant, and is ignored, when matching is done using
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
uses it to limit the depth of internal recursive function calls that implement
atomic groups, lookaround assertions, and pattern recursions. This is,
therefore, an indirect limit on the amount of system stack that is used. A
recursive pattern such as /(.)(?1)/, when matched to a very long string using
<b>pcre2_dfa_match()</b>, can use a great deal of stack.
uses it to limit the depth of nested internal recursive function calls that
implement atomic groups, lookaround assertions, and pattern recursions. This
limits, indirectly, the amount of system stack that is used. It was more useful
in versions before 10.32, when stack memory was used for local workspace
vectors for recursive function calls. From version 10.32, only local variables
are allocated on the stack and as each call uses only a few hundred bytes, even
a small stack can support quite a lot of recursion.
</P>
<P>
If the depth of internal recursive function calls is great enough, local
workspace vectors are allocated on the heap from version 10.32 onwards, so the
depth limit also indirectly limits the amount of heap memory that is used. A
recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
using <b>pcre2_dfa_match()</b>, can use a great deal of memory. However, it is
probably better to limit heap usage directly by calling
<b>pcre2_set_heap_limit()</b>.
</P>
<P>
The default value for the depth limit can be set when PCRE2 is built; the
@ -1096,15 +1114,16 @@ and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively.
PCRE2_CONFIG_DEPTHLIMIT
</pre>
The output is a uint32_t integer that gives the default limit for the depth of
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
<b>pcre2_set_depth_limit()</b> above.
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions,
lookarounds, and atomic groups in <b>pcre2_dfa_match()</b>. Further details are
given with <b>pcre2_set_depth_limit()</b> above.
<pre>
PCRE2_CONFIG_HEAPLIMIT
</pre>
The output is a uint32_t integer that gives, in kilobytes, the default limit
for the amount of heap memory used by <b>pcre2_match()</b>. Further details are
given with <b>pcre2_set_heap_limit()</b> above.
for the amount of heap memory used by <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b>. Further details are given with
<b>pcre2_set_heap_limit()</b> above.
<pre>
PCRE2_CONFIG_JIT
</pre>
@ -3510,17 +3529,7 @@ capture.
Calls to the convenience functions that extract substrings by name
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
DFA match. The convenience functions that extract substrings by number never
return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
slightly different:
<pre>
PCRE2_ERROR_UNAVAILABLE
</pre>
The ovector is not big enough to include a slot for the given substring number.
<pre>
PCRE2_ERROR_UNSET
</pre>
There is a slot in the ovector for this substring, but there were insufficient
matches to fill it.
return PCRE2_ERROR_NOSUBSTRING.
</P>
<P>
The matched strings are stored in the ovector in reverse order of length; that
@ -3594,9 +3603,9 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 31 December 2017
Last updated: 27 April 2018
<br>
Copyright &copy; 1997-2017 University of Cambridge.
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.


@ -295,9 +295,10 @@ change this by a setting such as
--with-heap-limit=500
</pre>
which limits the amount of heap to 500 kilobytes. This limit applies only to
interpretive matching in pcre2_match(). It does not apply when JIT (which has
its own memory arrangements) is used, nor does it apply to
<b>pcre2_dfa_match()</b>.
interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
may also use the heap for internal workspace when processing complicated
patterns. This limit does not apply when JIT (which has its own memory
arrangements) is used.
</P>
<P>
You can also explicitly limit the depth of nested backtracking in the
@ -573,7 +574,7 @@ Cambridge, England.
</P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P>
Last updated: 25 February 2018
Last updated: 26 April 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>


@ -310,10 +310,12 @@ PCRE2_UNSET.
</P>
<P>
For DFA matching, the <i>offset_vector</i> field points to the ovector that was
passed to the matching function in the match data block, but it holds no useful
information at callout time because <b>pcre2_dfa_match()</b> does not support
substring capturing. The value of <i>capture_top</i> is always 1 and the value
of <i>capture_last</i> is always 0 for DFA matching.
passed to the matching function in the match data block for callouts at the top
level, but to an internal ovector during the processing of pattern recursions,
lookarounds, and atomic groups. However, these ovectors hold no useful
information because <b>pcre2_dfa_match()</b> does not support substring
capturing. The value of <i>capture_top</i> is always 1 and the value of
<i>capture_last</i> is always 0 for DFA matching.
</P>
<P>
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
@ -461,9 +463,9 @@ Cambridge, England.
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
Last updated: 22 December 2017
Last updated: 26 April 2018
<br>
Copyright &copy; 1997-2017 University of Cambridge.
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.


@ -173,12 +173,12 @@ the application to apply the JIT optimization by calling
Setting match resource limits
</b><br>
<P>
The pcre2_match() function contains a counter that is incremented every time it
goes round its main loop. The caller of <b>pcre2_match()</b> can set a limit on
this counter, which therefore limits the amount of computing resource used for
a match. The maximum depth of nested backtracking can also be limited; this
indirectly restricts the amount of heap memory that is used, but there is also
an explicit memory limit that can be set.
The <b>pcre2_match()</b> function contains a counter that is incremented every
time it goes round its main loop. The caller of <b>pcre2_match()</b> can set a
limit on this counter, which therefore limits the amount of computing resource
used for a match. The maximum depth of nested backtracking can also be limited;
this indirectly restricts the amount of heap memory that is used, but there is
also an explicit memory limit that can be set.
</P>
<P>
These facilities are provided to catch runaway matches that are provoked by
@ -195,20 +195,22 @@ where d is any number of decimal digits. However, the value of the setting must
be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used.
setting of one of these limits, the lower value is used. The heap limit is
specified in kilobytes.
</P>
<P>
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
still recognized for backwards compatibility.
</P>
<P>
The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
for matching. It does not apply to JIT or DFA matching. The match limit is used
(but in a different way) when JIT is being used, or when
<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
matching functions. The depth limit is ignored by JIT but is relevant for DFA
matching, which uses function recursion for recursions within the pattern. In
this case, the depth limit controls the amount of system stack that is used.
The heap limit applies only when the <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b> interpreters are used for matching. It does not apply
to JIT. The match limit is used (but in a different way) when JIT is being
used, or when <b>pcre2_dfa_match()</b> is called, to limit computing resource
usage by those matching functions. The depth limit is ignored by JIT but is
relevant for DFA matching, which uses function recursion for recursions within
the pattern and for lookaround assertions and atomic groups. In this case, the
depth limit controls the depth of such recursion.
<a name="newlines"></a></P>
<br><b>
Newline conventions
@ -2818,11 +2820,6 @@ matched at the top level, its final captured value is unset, even if it was
(temporarily) set at a deeper level during the matching process.
</P>
<P>
If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
obtain extra memory from the heap to store data during a recursion. If no
memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
</P>
<P>
Do not confuse the (?R) item with the condition (R), which tests for recursion.
Consider this pattern, which matches text in angle brackets, allowing for
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
@ -3479,9 +3476,9 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 12 September 2017
Last updated: 25 April 2018
<br>
Copyright &copy; 1997-2017 University of Cambridge.
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
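
The pcre2pattern text in this diff states that an in-pattern setting such as
(*LIMIT_MATCH=...) can lower a limit set by the caller but not raise it, and
that when a limit is set more than once the lower value is used. A minimal
sketch of that rule follows; effective_limit() is a hypothetical helper written
for illustration and is not part of the PCRE2 API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a PCRE2 function. Given the limit set (or
   defaulted) by the caller and the values of any in-pattern settings,
   return the limit that is actually in force. */
uint32_t effective_limit(uint32_t caller_limit,
                         const uint32_t *pattern_settings, size_t count)
{
  uint32_t limit = caller_limit;
  size_t i;
  for (i = 0; i < count; i++) {
    /* A setting only has an effect when it is lower than the current value. */
    if (pattern_settings[i] < limit) limit = pattern_settings[i];
  }
  return limit;
}
```

With a caller limit of 1000, in-pattern settings of 500 and 200 give 200, while
a single in-pattern setting of 5000 leaves the limit at 1000.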


@ -93,9 +93,17 @@ may also reduce the memory requirements.
<P>
In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
function calls, but only for processing atomic groups, lookaround assertions,
and recursion within the pattern. Too much nested recursion may cause stack
issues. The "match depth" parameter can be used to limit the depth of function
recursion in <b>pcre2_dfa_match()</b>.
and recursion within the pattern. The original version of the code used to
allocate quite large internal workspace vectors on the stack, which caused some
problems for some patterns in environments with small stacks. From release
10.32 the code for <b>pcre2_dfa_match()</b> has been re-factored to use heap
memory when necessary for internal workspace when recursing, though recursive
function calls are still used.
</P>
<P>
The "match depth" parameter can be used to limit the depth of function
recursion, and the "match heap" parameter to limit heap memory in
<b>pcre2_dfa_match()</b>.
</P>
<br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
<P>
@ -244,9 +252,9 @@ Cambridge, England.
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 08 April 2017
Last updated: 25 April 2018
<br>
Copyright &copy; 1997-2017 University of Cambridge.
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.


@ -1199,7 +1199,7 @@ pattern.
get=&#60;number or name&#62; extract captured substring
getall extract all captured substrings
/g global global matching
heap_limit=&#60;n&#62; set a limit on heap memory
heap_limit=&#60;n&#62; set a limit on heap memory (Kbytes)
jitstack=&#60;n&#62; set size of JIT stack
mark show mark values
match_limit=&#60;n&#62; set a match limit
@ -1438,20 +1438,17 @@ Finding minimum limits
<P>
If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
calls the relevant matching function several times, setting different values in
the match context via <b>pcre2_set_heap_limit(), \fBpcre2_set_match_limit()</b>,
or <b>pcre2_set_depth_limit()</b> until it finds the minimum values for each
parameter that allows the match to complete without error.
the match context via <b>pcre2_set_heap_limit()</b>,
<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
the minimum values for each parameter that allows the match to complete without
error. If JIT is being used, only the match limit is relevant.
</P>
<P>
If JIT is being used, only the match limit is relevant. If DFA matching is
being used, only the depth limit is relevant.
</P>
<P>
The <i>match_limit</i> number is a measure of the amount of backtracking
that takes place, and learning the minimum value can be instructive. For most
simple matches, the number is quite small, but for patterns with very large
numbers of matching possibilities, it can become large very quickly with
increasing length of subject string.
When using this modifier, the pattern should not contain any limit settings
such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
lower than the minimum matching value, the minimum value cannot be found
because <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of
an in-pattern limit; they cannot increase it.
</P>
<P>
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
@ -1460,6 +1457,22 @@ searched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
recursive calls of the internal function that is used for handling pattern
recursion, lookaround assertions, and atomic groups.
</P>
<P>
For non-DFA matching, the <i>match_limit</i> number is a measure of the amount
of backtracking that takes place, and learning the minimum value can be
instructive. For most simple matches, the number is quite small, but for
patterns with very large numbers of matching possibilities, it can become large
very quickly with increasing length of subject string. In the case of DFA
matching, <i>match_limit</i> controls the total number of calls, both recursive
and non-recursive, to the internal matching function, thus controlling the
overall amount of computing resource that is used.
</P>
<P>
For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes)
limits the amount of heap memory used for matching. A value of zero disables
the use of any heap memory; many simple pattern matches can be done without
using the heap, so this is not an unreasonable setting.
</P>
<br><b>
Showing MARK names
</b><br>
@ -1476,13 +1489,14 @@ Showing memory usage
<P>
The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
memory allocation and freeing calls that occur during a call to
<b>pcre2_match()</b>. These occur only when a match requires a bigger vector
than the default for remembering backtracking points. In many cases there will
be no heap memory used and therefore no additional output. No heap memory is
allocated during matching with <b>pcre2_dfa_match</b> or with JIT, so in those
cases the <b>memory</b> modifier never has any effect. For this modifier to
work, the <b>null_context</b> modifier must not be set on both the pattern and
the subject, though it can be set on one or the other.
<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. These occur only when a match
requires a bigger vector than the default for remembering backtracking points
(<b>pcre2_match()</b>) or for internal workspace (<b>pcre2_dfa_match()</b>). In
many cases there will be no heap memory used and therefore no additional
output. No heap memory is allocated during matching with JIT, so in that case
the <b>memory</b> modifier never has any effect. For this modifier to work, the
<b>null_context</b> modifier must not be set on both the pattern and the
subject, though it can be set on one or the other.
</P>
<br><b>
Setting a starting offset
@ -1982,9 +1996,9 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 21 December 2017
Last updated: 25 April 2018
<br>
Copyright &copy; 1997-2017 University of Cambridge.
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
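
The find_limits text in this diff says that pcre2test reruns the match with
different limit values until it finds the minimum that lets the match complete
without error. One plausible way to do such a search is plain bisection, as
sketched below. This is an assumption for illustration and may not be how
pcre2test is implemented: find_min_limit(), try_match_fn, and needs_at_least()
are hypothetical names, and the sketch assumes that if a match succeeds with
some limit it also succeeds with every larger one.

```c
#include <stdint.h>

/* Hypothetical callback: returns non-zero if the match completes without
   error when run with the given limit. */
typedef int (*try_match_fn)(uint32_t limit, void *ctx);

/* Bisection for the smallest limit at which try_match succeeds. Assumes
   success is monotonic in the limit and that UINT32_MAX always succeeds. */
uint32_t find_min_limit(try_match_fn try_match, void *ctx)
{
  uint32_t lo = 0, hi = UINT32_MAX;
  while (lo < hi) {
    uint32_t mid = lo + (hi - lo) / 2;
    if (try_match(mid, ctx)) hi = mid;   /* mid is enough; try smaller */
    else lo = mid + 1;                   /* mid is too small */
  }
  return lo;
}

/* Stand-in for a real match attempt: succeeds once the limit reaches the
   threshold that ctx points to. */
int needs_at_least(uint32_t limit, void *ctx)
{
  return limit >= *(const uint32_t *)ctx;
}
```

Each probe in a real tool would be a full call to the matching function, so
bisection keeps the number of reruns to at most 32 for a 32-bit limit.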


@ -959,13 +959,15 @@ PCRE2 CONTEXTS
The heap_limit parameter specifies, in units of kilobytes, the maximum
amount of heap memory that pcre2_match() may use to hold backtracking
information when running an interpretive match. This limit does not
apply to matching with the JIT optimization, which has its own memory
control arrangements (see the pcre2jit documentation for more details),
nor does it apply to pcre2_dfa_match(). If the limit is reached, the
negative error code PCRE2_ERROR_HEAPLIMIT is returned. The default
limit is set when PCRE2 is built; the default default is very large and
is essentially "unlimited".
information when running an interpretive match. This limit also applies
to pcre2_dfa_match(), which may use the heap when processing patterns
with a lot of nested pattern recursion or lookarounds or atomic groups.
This limit does not apply to matching with the JIT optimization, which
has its own memory control arrangements (see the pcre2jit documentation
for more details). If the limit is reached, the negative error code
PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2
is built; the default default is very large and is essentially "unlim-
ited".
A value for the heap limit may also be supplied by an item at the start
of a pattern of the form
@ -984,6 +986,11 @@ PCRE2 CONTEXTS
zero) no heap memory will be used. In this case, only patterns that do
not have a lot of nested backtracking can be successfully processed.
Similarly, for pcre2_dfa_match(), a vector on the system stack is used
when processing pattern recursions, lookarounds, or atomic groups, and
only if this is not big enough is heap memory used. In this case, too,
setting a value of zero disables the use of the heap.
int pcre2_set_match_limit(pcre2_match_context *mcontext,
uint32_t value);
@ -1033,12 +1040,22 @@ PCRE2 CONTEXTS
The depth limit is not relevant, and is ignored, when matching is done
using JIT compiled code. However, it is supported by pcre2_dfa_match(),
which uses it to limit the depth of internal recursive function calls
that implement atomic groups, lookaround assertions, and pattern recur-
sions. This is, therefore, an indirect limit on the amount of system
stack that is used. A recursive pattern such as /(.)(?1)/, when matched
to a very long string using pcre2_dfa_match(), can use a great deal of
stack.
which uses it to limit the depth of nested internal recursive function
calls that implement atomic groups, lookaround assertions, and pattern
recursions. This limits, indirectly, the amount of system stack that is
used. It was more useful in versions before 10.32, when stack memory
was used for local workspace vectors for recursive function calls. From
version 10.32, only local variables are allocated on the stack and as
each call uses only a few hundred bytes, even a small stack can support
quite a lot of recursion.
If the depth of internal recursive function calls is great enough,
local workspace vectors are allocated on the heap from version 10.32
onwards, so the depth limit also indirectly limits the amount of heap
memory that is used. A recursive pattern such as /(.(?2))((?1)|)/, when
matched to a very long string using pcre2_dfa_match(), can use a great
deal of memory. However, it is probably better to limit heap usage
directly by calling pcre2_set_heap_limit().
The default value for the depth limit can be set when PCRE2 is built;
the default default is the same value as the default for the match
@ -1095,14 +1112,15 @@ CHECKING BUILD-TIME OPTIONS
The output is a uint32_t integer that gives the default limit for the
depth of nested backtracking in pcre2_match() or the depth of nested
recursions and lookarounds in pcre2_dfa_match(). Further details are
given with pcre2_set_depth_limit() above.
recursions, lookarounds, and atomic groups in pcre2_dfa_match(). Fur-
ther details are given with pcre2_set_depth_limit() above.
PCRE2_CONFIG_HEAPLIMIT
The output is a uint32_t integer that gives, in kilobytes, the default
limit for the amount of heap memory used by pcre2_match(). Further
details are given with pcre2_set_heap_limit() above.
limit for the amount of heap memory used by pcre2_match() or
pcre2_dfa_match(). Further details are given with
pcre2_set_heap_limit() above.
PCRE2_CONFIG_JIT
@ -3396,18 +3414,7 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
Calls to the convenience functions that extract substrings by name
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used
after a DFA match. The convenience functions that extract substrings by
number never return PCRE2_ERROR_NOSUBSTRING, and the meanings of some
other errors are slightly different:
PCRE2_ERROR_UNAVAILABLE
The ovector is not big enough to include a slot for the given substring
number.
PCRE2_ERROR_UNSET
There is a slot in the ovector for this substring, but there were
insufficient matches to fill it.
number never return PCRE2_ERROR_NOSUBSTRING.
The matched strings are stored in the ovector in reverse order of
length; that is, the longest matching string is first. If there were
@ -3476,8 +3483,8 @@ AUTHOR
REVISION
Last updated: 31 December 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 27 April 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
@ -3746,9 +3753,10 @@ LIMITING PCRE2 RESOURCE USAGE
--with-heap-limit=500
which limits the amount of heap to 500 kilobytes. This limit applies
only to interpretive matching in pcre2_match(). It does not apply when
JIT (which has its own memory arrangements) is used, nor does it apply
to pcre2_dfa_match().
only to interpretive matching in pcre2_match() and pcre2_dfa_match(),
which may also use the heap for internal workspace when processing com-
plicated patterns. This limit does not apply when JIT (which has its
own memory arrangements) is used.
You can also explicitly limit the depth of nested backtracking in the
pcre2_match() interpreter. This limit defaults to the value that is set
@ -4030,7 +4038,7 @@ AUTHOR
REVISION
Last updated: 25 February 2018
Last updated: 26 April 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
@ -4311,10 +4319,12 @@ THE CALLOUT INTERFACE
their ovector slots set to PCRE2_UNSET.
For DFA matching, the offset_vector field points to the ovector that
was passed to the matching function in the match data block, but it
holds no useful information at callout time because pcre2_dfa_match()
does not support substring capturing. The value of capture_top is
always 1 and the value of capture_last is always 0 for DFA matching.
was passed to the matching function in the match data block for call-
outs at the top level, but to an internal ovector during the processing
of pattern recursions, lookarounds, and atomic groups. However, these
ovectors hold no useful information because pcre2_dfa_match() does not
support substring capturing. The value of capture_top is always 1 and
the value of capture_last is always 0 for DFA matching.
The subject and subject_length fields contain copies of the values that
were passed to the matching function.
@ -4454,8 +4464,8 @@ AUTHOR
REVISION
Last updated: 22 December 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 26 April 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
@ -5919,19 +5929,19 @@ SPECIAL START-OF-PATTERN ITEMS
pcre2_match() for it to have any effect. In other words, the pattern
writer can lower the limits set by the programmer, but not raise them.
If there is more than one setting of one of these limits, the lower
value is used.
value is used. The heap limit is specified in kilobytes.
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This
name is still recognized for backwards compatibility.
The heap limit applies only when the pcre2_match() interpreter is used
for matching. It does not apply to JIT or DFA matching. The match limit
is used (but in a different way) when JIT is being used, or when
The heap limit applies only when the pcre2_match() or pcre2_dfa_match()
interpreters are used for matching. It does not apply to JIT. The match
limit is used (but in a different way) when JIT is being used, or when
pcre2_dfa_match() is called, to limit computing resource usage by those
matching functions. The depth limit is ignored by JIT but is relevant
for DFA matching, which uses function recursion for recursions within
the pattern. In this case, the depth limit controls the amount of sys-
tem stack that is used.
the pattern and for lookaround assertions and atomic groups. In this
case, the depth limit controls the depth of such recursion.
Newline conventions
@ -8260,11 +8270,6 @@ RECURSIVE PATTERNS
unset, even if it was (temporarily) set at a deeper level during the
matching process.
If there are more than 15 capturing parentheses in a pattern, PCRE2 has
to obtain extra memory from the heap to store data during a recursion.
If no memory can be obtained, the match fails with the
PCRE2_ERROR_NOMEMORY error.
Do not confuse the (?R) item with the condition (R), which tests for
recursion. Consider this pattern, which matches text in angle brack-
ets, allowing for arbitrary nesting. Only digits are allowed in nested
@ -8887,8 +8892,8 @@ AUTHOR
REVISION
Last updated: 12 September 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 25 April 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
@ -8973,9 +8978,17 @@ STACK AND HEAP USAGE AT RUN TIME
In contrast to pcre2_match(), pcre2_dfa_match() does use recursive
function calls, but only for processing atomic groups, lookaround
assertions, and recursion within the pattern. Too much nested recursion
may cause stack issues. The "match depth" parameter can be used to
limit the depth of function recursion in pcre2_dfa_match().
assertions, and recursion within the pattern. The original version of
the code used to allocate quite large internal workspace vectors on the
stack, which caused some problems for some patterns in environments
with small stacks. From release 10.32 the code for pcre2_dfa_match()
has been re-factored to use heap memory when necessary for internal
workspace when recursing, though recursive function calls are still
used.
The "match depth" parameter can be used to limit the depth of function
recursion, and the "match heap" parameter to limit heap memory in
pcre2_dfa_match().
PROCESSING TIME
@ -9115,8 +9128,8 @@ AUTHOR
REVISION
Last updated: 08 April 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 25 April 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------

View File

@ -1,4 +1,4 @@
.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -34,9 +34,9 @@ just once (except when processing lookaround assertions). This function is
\fIwscount\fP Number of elements in the vector
.sp
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
up a callout function or specify the match and/or the recursion depth limits.
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
The options are:
up a callout function or specify the heap limit or the match or the recursion
depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
characters. The options are:
.sp
PCRE2_ANCHORED Match only at the first position
PCRE2_ENDANCHORED Pattern can match only at end of subject

View File

@ -1,4 +1,4 @@
.TH PCRE2API 3 "31 December 2017" "PCRE2 10.31"
.TH PCRE2API 3 "27 April 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -887,16 +887,17 @@ offset limit. In other words, whichever limit comes first is used.
.sp
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
information when running an interpretive match. This limit does not apply to
matching with the JIT optimization, which has its own memory control
arrangements (see the
information when running an interpretive match. This limit also applies to
\fBpcre2_dfa_match()\fP, which may use the heap when processing patterns with a
lot of nested pattern recursion or lookarounds or atomic groups. This limit
does not apply to matching with the JIT optimization, which has its own memory
control arrangements (see the
.\" HREF
\fBpcre2jit\fP
.\"
documentation for more details), nor does it apply to \fBpcre2_dfa_match()\fP.
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
returned. The default limit is set when PCRE2 is built; the default default is
very large and is essentially "unlimited".
documentation for more details). If the limit is reached, the negative error
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
built; the default default is very large and is essentially "unlimited".
.P
A value for the heap limit may also be supplied by an item at the start of a
pattern of the form
@ -914,6 +915,11 @@ Heap memory is used only if the initial vector is too small. If the heap limit
is set to a value less than 21 (in particular, zero) no heap memory will be
used. In this case, only patterns that do not have a lot of nested backtracking
can be successfully processed.
.P
Similarly, for \fBpcre2_dfa_match()\fP, a vector on the system stack is used
when processing pattern recursions, lookarounds, or atomic groups, and only if
this is not big enough is heap memory used. In this case, too, setting a value
of zero disables the use of the heap.
.sp
.nf
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
@ -967,11 +973,21 @@ backtracking.
.P
The depth limit is not relevant, and is ignored, when matching is done using
JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
uses it to limit the depth of internal recursive function calls that implement
atomic groups, lookaround assertions, and pattern recursions. This is,
therefore, an indirect limit on the amount of system stack that is used. A
recursive pattern such as /(.)(?1)/, when matched to a very long string using
\fBpcre2_dfa_match()\fP, can use a great deal of stack.
uses it to limit the depth of nested internal recursive function calls that
implement atomic groups, lookaround assertions, and pattern recursions. This
limits, indirectly, the amount of system stack that is used. It was more useful
in versions before 10.32, when stack memory was used for local workspace
vectors for recursive function calls. From version 10.32, only local variables
are allocated on the stack and as each call uses only a few hundred bytes, even
a small stack can support quite a lot of recursion.
.P
If the depth of internal recursive function calls is great enough, local
workspace vectors are allocated on the heap from version 10.32 onwards, so the
depth limit also indirectly limits the amount of heap memory that is used. A
recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
probably better to limit heap usage directly by calling
\fBpcre2_set_heap_limit()\fP.
.P
The default value for the depth limit can be set when PCRE2 is built; the
default default is the same value as the default for the match limit. If the
@ -1028,15 +1044,16 @@ and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively.
PCRE2_CONFIG_DEPTHLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the depth of
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
\fBpcre2_set_depth_limit()\fP above.
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions,
lookarounds, and atomic groups in \fBpcre2_dfa_match()\fP. Further details are
given with \fBpcre2_set_depth_limit()\fP above.
.sp
PCRE2_CONFIG_HEAPLIMIT
.sp
The output is a uint32_t integer that gives, in kilobytes, the default limit
for the amount of heap memory used by \fBpcre2_match()\fP. Further details are
given with \fBpcre2_set_heap_limit()\fP above.
for the amount of heap memory used by \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP. Further details are given with
\fBpcre2_set_heap_limit()\fP above.
.sp
PCRE2_CONFIG_JIT
.sp
@ -3514,17 +3531,7 @@ capture.
Calls to the convenience functions that extract substrings by name
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
DFA match. The convenience functions that extract substrings by number never
return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
slightly different:
.sp
PCRE2_ERROR_UNAVAILABLE
.sp
The ovector is not big enough to include a slot for the given substring number.
.sp
PCRE2_ERROR_UNSET
.sp
There is a slot in the ovector for this substring, but there were insufficient
matches to fill it.
return PCRE2_ERROR_NOSUBSTRING.
.P
The matched strings are stored in the ovector in reverse order of length; that
is, the longest matching string is first. If there were too many matches to fit
@ -3605,6 +3612,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 31 December 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 27 April 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2BUILD 3 "25 February 2018" "PCRE2 10.32"
.TH PCRE2BUILD 3 "26 April 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.
@ -292,9 +292,10 @@ change this by a setting such as
--with-heap-limit=500
.sp
which limits the amount of heap to 500 kilobytes. This limit applies only to
interpretive matching in pcre2_match(). It does not apply when JIT (which has
its own memory arrangements) is used, nor does it apply to
\fBpcre2_dfa_match()\fP.
interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which
may also use the heap for internal workspace when processing complicated
patterns. This limit does not apply when JIT (which has its own memory
arrangements) is used.
.P
You can also explicitly limit the depth of nested backtracking in the
\fBpcre2_match()\fP interpreter. This limit defaults to the value that is set
@ -590,6 +591,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 25 February 2018
Last updated: 26 April 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2CALLOUT 3 "22 December 2017" "PCRE2 10.31"
.TH PCRE2CALLOUT 3 "26 April 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -291,10 +291,12 @@ than \fIcapture_top\fP also have both of their ovector slots set to
PCRE2_UNSET.
.P
For DFA matching, the \fIoffset_vector\fP field points to the ovector that was
passed to the matching function in the match data block, but it holds no useful
information at callout time because \fBpcre2_dfa_match()\fP does not support
substring capturing. The value of \fIcapture_top\fP is always 1 and the value
of \fIcapture_last\fP is always 0 for DFA matching.
passed to the matching function in the match data block for callouts at the top
level, but to an internal ovector during the processing of pattern recursions,
lookarounds, and atomic groups. However, these ovectors hold no useful
information because \fBpcre2_dfa_match()\fP does not support substring
capturing. The value of \fIcapture_top\fP is always 1 and the value of
\fIcapture_last\fP is always 0 for DFA matching.
.P
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
that were passed to the matching function.
@ -441,6 +443,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 22 December 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 26 April 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "12 September 2017" "PCRE2 10.31"
.TH PCRE2PATTERN 3 "25 April 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -141,12 +141,12 @@ the application to apply the JIT optimization by calling
.SS "Setting match resource limits"
.rs
.sp
The pcre2_match() function contains a counter that is incremented every time it
goes round its main loop. The caller of \fBpcre2_match()\fP can set a limit on
this counter, which therefore limits the amount of computing resource used for
a match. The maximum depth of nested backtracking can also be limited; this
indirectly restricts the amount of heap memory that is used, but there is also
an explicit memory limit that can be set.
The \fBpcre2_match()\fP function contains a counter that is incremented every
time it goes round its main loop. The caller of \fBpcre2_match()\fP can set a
limit on this counter, which therefore limits the amount of computing resource
used for a match. The maximum depth of nested backtracking can also be limited;
this indirectly restricts the amount of heap memory that is used, but there is
also an explicit memory limit that can be set.
.P
These facilities are provided to catch runaway matches that are provoked by
patterns with huge matching trees (a typical example is a pattern with nested
@ -162,18 +162,20 @@ where d is any number of decimal digits. However, the value of the setting must
be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used.
setting of one of these limits, the lower value is used. The heap limit is
specified in kilobytes.
.P
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
still recognized for backwards compatibility.
.P
The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
for matching. It does not apply to JIT or DFA matching. The match limit is used
(but in a different way) when JIT is being used, or when
\fBpcre2_dfa_match()\fP is called, to limit computing resource usage by those
matching functions. The depth limit is ignored by JIT but is relevant for DFA
matching, which uses function recursion for recursions within the pattern. In
this case, the depth limit controls the amount of system stack that is used.
The heap limit applies only when the \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP interpreters are used for matching. It does not apply
to JIT. The match limit is used (but in a different way) when JIT is being
used, or when \fBpcre2_dfa_match()\fP is called, to limit computing resource
usage by those matching functions. The depth limit is ignored by JIT but is
relevant for DFA matching, which uses function recursion for recursions within
the pattern and for lookaround assertions and atomic groups. In this case, the
depth limit controls the depth of such recursion.
.
.
.\" HTML <a name="newlines"></a>
@ -2838,10 +2840,6 @@ the last value taken on at the top level. If a capturing subpattern is not
matched at the top level, its final captured value is unset, even if it was
(temporarily) set at a deeper level during the matching process.
.P
If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
obtain extra memory from the heap to store data during a recursion. If no
memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
.P
Do not confuse the (?R) item with the condition (R), which tests for recursion.
Consider this pattern, which matches text in angle brackets, allowing for
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
@ -3505,6 +3503,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 12 September 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 25 April 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2PERFORM 3 "08 April 2017" "PCRE2 10.30"
.TH PCRE2PERFORM 3 "25 April 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 PERFORMANCE"
@ -78,9 +78,16 @@ may also reduce the memory requirements.
.P
In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
function calls, but only for processing atomic groups, lookaround assertions,
and recursion within the pattern. Too much nested recursion may cause stack
issues. The "match depth" parameter can be used to limit the depth of function
recursion in \fBpcre2_dfa_match()\fP.
and recursion within the pattern. The original version of the code used to
allocate quite large internal workspace vectors on the stack, which caused some
problems for some patterns in environments with small stacks. From release
10.32 the code for \fBpcre2_dfa_match()\fP has been re-factored to use heap
memory when necessary for internal workspace when recursing, though recursive
function calls are still used.
.P
The "match depth" parameter can be used to limit the depth of function
recursion, and the "match heap" parameter to limit heap memory in
\fBpcre2_dfa_match()\fP.
.
.
.SH "PROCESSING TIME"
@ -232,6 +239,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 08 April 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 25 April 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "21 Decbmber 2017" "PCRE 10.31"
.TH PCRE2TEST 1 "25 April 2018" "PCRE 10.32"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -1168,7 +1168,7 @@ pattern.
get=<number or name> extract captured substring
getall extract all captured substrings
/g global global matching
heap_limit=<n> set a limit on heap memory
heap_limit=<n> set a limit on heap memory (Kbytes)
jitstack=<n> set size of JIT stack
mark show mark values
match_limit=<n> set a match limit
@ -1401,24 +1401,36 @@ the appropriate limits in the match context. These values are ignored when the
.sp
If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP
calls the relevant matching function several times, setting different values in
the match context via \fBpcre2_set_heap_limit(), \fBpcre2_set_match_limit()\fP,
or \fBpcre2_set_depth_limit()\fP until it finds the minimum values for each
parameter that allows the match to complete without error.
the match context via \fBpcre2_set_heap_limit()\fP,
\fBpcre2_set_match_limit()\fP, or \fBpcre2_set_depth_limit()\fP until it finds
the minimum values for each parameter that allows the match to complete without
error. If JIT is being used, only the match limit is relevant.
.P
If JIT is being used, only the match limit is relevant. If DFA matching is
being used, only the depth limit is relevant.
.P
The \fImatch_limit\fP number is a measure of the amount of backtracking
that takes place, and learning the minimum value can be instructive. For most
simple matches, the number is quite small, but for patterns with very large
numbers of matching possibilities, it can become large very quickly with
increasing length of subject string.
When using this modifier, the pattern should not contain any limit settings
such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
lower than the minimum matching value, the minimum value cannot be found
because \fBpcre2_set_match_limit()\fP etc. are only able to reduce the value of
an in-pattern limit; they cannot increase it.
.P
For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
much nested backtracking happens (that is, how deeply the pattern's tree is
searched). In the case of DFA matching, \fIdepth_limit\fP controls the depth of
recursive calls of the internal function that is used for handling pattern
recursion, lookaround assertions, and atomic groups.
.P
For non-DFA matching, the \fImatch_limit\fP number is a measure of the amount
of backtracking that takes place, and learning the minimum value can be
instructive. For most simple matches, the number is quite small, but for
patterns with very large numbers of matching possibilities, it can become large
very quickly with increasing length of subject string. In the case of DFA
matching, \fImatch_limit\fP controls the total number of calls, both recursive
and non-recursive, to the internal matching function, thus controlling the
overall amount of computing resource that is used.
.P
For both kinds of matching, the \fIheap_limit\fP number (which is in kilobytes)
limits the amount of heap memory used for matching. A value of zero disables
the use of any heap memory; many simple pattern matches can be done without
using the heap, so this is not an unreasonable setting.
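The search for a minimum limit described above can be illustrated with a small, self-contained sketch. It is not the code used by pcre2test, whose search strategy is not shown in this diff. It assumes plain bisection, assumes that a match which succeeds at some limit also succeeds at every larger limit, and uses the hypothetical names find_min_limit(), try_fn, and needs_at_least() in place of a real call to a matching function.

```c
#include <stdint.h>

/* Hypothetical callback type: returns nonzero if the match completes without
   a limit error when run with the given limit value. */
typedef int (*try_fn)(uint32_t limit, void *data);

/* Find the smallest limit in [0, max] for which try_limit succeeds. Assumes
   that success is monotonic in the limit and that max itself succeeds. */
uint32_t find_min_limit(try_fn try_limit, void *data, uint32_t max)
{
  uint32_t lo = 0, hi = max;
  while (lo < hi)
  {
    uint32_t mid = lo + (hi - lo) / 2;
    if (try_limit(mid, data)) hi = mid;   /* mid is enough: look lower */
    else lo = mid + 1;                    /* mid is too small: look higher */
  }
  return lo;
}

/* Stand-in for a real match attempt: succeeds once the limit reaches the
   threshold pointed to by data. */
int needs_at_least(uint32_t limit, void *data)
{
  return limit >= *(uint32_t *)data;
}
```

In a real test harness the callback would set the limit in a match context, run the match, and report whether a limit error came back.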
.
.
.SS "Showing MARK names"
@ -1437,13 +1449,14 @@ is added to the non-match message.
.sp
The \fBmemory\fP modifier causes \fBpcre2test\fP to log the sizes of all heap
memory allocation and freeing calls that occur during a call to
\fBpcre2_match()\fP. These occur only when a match requires a bigger vector
than the default for remembering backtracking points. In many cases there will
be no heap memory used and therefore no additional output. No heap memory is
allocated during matching with \fBpcre2_dfa_match\fP or with JIT, so in those
cases the \fBmemory\fP modifier never has any effect. For this modifier to
work, the \fBnull_context\fP modifier must not be set on both the pattern and
the subject, though it can be set on one or the other.
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. These occur only when a match
requires a bigger vector than the default for remembering backtracking points
(\fBpcre2_match()\fP) or for internal workspace (\fBpcre2_dfa_match()\fP). In
many cases there will be no heap memory used and therefore no additional
output. No heap memory is allocated during matching with JIT, so in that case
the \fBmemory\fP modifier never has any effect. For this modifier to work, the
\fBnull_context\fP modifier must not be set on both the pattern and the
subject, though it can be set on one or the other.
.
.
.SS "Setting a starting offset"
@ -1962,6 +1975,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 21 December 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 25 April 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -1071,7 +1071,7 @@ SUBJECT MODIFIERS
get=<number or name> extract captured substring
getall extract all captured substrings
/g global global matching
heap_limit=<n> set a limit on heap memory
heap_limit=<n> set a limit on heap memory (Kbytes)
jitstack=<n> set size of JIT stack
mark show mark values
match_limit=<n> set a match limit
@ -1291,16 +1291,13 @@ SUBJECT MODIFIERS
values in the match context via pcre2_set_heap_limit(),
pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
minimum values for each parameter that allows the match to complete
without error.
without error. If JIT is being used, only the match limit is relevant.
If JIT is being used, only the match limit is relevant. If DFA matching
is being used, only the depth limit is relevant.
The match_limit number is a measure of the amount of backtracking that
takes place, and learning the minimum value can be instructive. For
most simple matches, the number is quite small, but for patterns with
very large numbers of matching possibilities, it can become large very
quickly with increasing length of subject string.
When using this modifier, the pattern should not contain any limit set-
tings such as (*LIMIT_MATCH=...) within it. If such a setting is
present and is lower than the minimum matching value, the minimum value
cannot be found because pcre2_set_match_limit() etc. are only able to
reduce the value of an in-pattern limit; they cannot increase it.
For non-DFA matching, the minimum depth_limit number is a measure of
how much nested backtracking happens (that is, how deeply the pattern's
@ -1308,6 +1305,22 @@ SUBJECT MODIFIERS
the depth of recursive calls of the internal function that is used for
handling pattern recursion, lookaround assertions, and atomic groups.
For non-DFA matching, the match_limit number is a measure of the amount
of backtracking that takes place, and learning the minimum value can be
instructive. For most simple matches, the number is quite small, but
for patterns with very large numbers of matching possibilities, it can
become large very quickly with increasing length of subject string. In
the case of DFA matching, match_limit controls the total number of
calls, both recursive and non-recursive, to the internal matching func-
tion, thus controlling the overall amount of computing resource that is
used.
For both kinds of matching, the heap_limit number (which is in kilo-
bytes) limits the amount of heap memory used for matching. A value of
zero disables the use of any heap memory; many simple pattern matches
can be done without using the heap, so this is not an unreasonable set-
ting.
Showing MARK names
@ -1321,14 +1334,14 @@ SUBJECT MODIFIERS
The memory modifier causes pcre2test to log the sizes of all heap mem-
ory allocation and freeing calls that occur during a call to
pcre2_match(). These occur only when a match requires a bigger vector
than the default for remembering backtracking points. In many cases
there will be no heap memory used and therefore no additional output.
No heap memory is allocated during matching with pcre2_dfa_match or
with JIT, so in those cases the memory modifier never has any effect.
For this modifier to work, the null_context modifier must not be set on
both the pattern and the subject, though it can be set on one or the
other.
pcre2_match() or pcre2_dfa_match(). These occur only when a match
requires a bigger vector than the default for remembering backtracking
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
In many cases there will be no heap memory used and therefore no addi-
tional output. No heap memory is allocated during matching with JIT, so
in that case the memory modifier never has any effect. For this modi-
fier to work, the null_context modifier must not be set on both the
pattern and the subject, though it can be set on one or the other.
Setting a starting offset
@ -1799,5 +1812,5 @@ AUTHOR
REVISION
Last updated: 21 December 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 25 April 2018
Copyright (c) 1997-2018 University of Cambridge.

View File

@ -132,8 +132,9 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <zlib.h> header file. */
#undef HAVE_ZLIB_H
/* This limits the amount of memory that pcre2_match() may use while matching
a pattern. The value is in kilobytes. */
/* This limits the amount of memory that may be used while matching a pattern.
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
to JIT matching. The value is in kilobytes. */
#undef HEAP_LIMIT
/* The value of LINK_SIZE determines the number of bytes used to store links
@ -148,7 +149,8 @@ sure both macros are undefined; an emulation function will then be used. */
/* The value of MATCH_LIMIT determines the default number of times the
pcre2_match() function can record a backtrack position during a single
matching attempt. There is a runtime interface for setting a different
matching attempt. The value is also used to limit a loop counter in
pcre2_dfa_match(). There is a runtime interface for setting a different
limit. The limit exists in order to catch runaway regular expressions that
take for ever to determine that they do not match. The default is set very
large so that it does not accidentally catch legitimate cases. */
@ -161,7 +163,9 @@ sure both macros are undefined; an emulation function will then be used. */
MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it
must be less than the value of MATCH_LIMIT. The default is to use the same
value as MATCH_LIMIT. There is a runtime method for setting a different
limit. */
limit. In the case of pcre2_dfa_match(), this limit controls the depth of
the internal nested function calls that are used for pattern recursions,
lookarounds, and atomic groups. */
#undef MATCH_LIMIT_DEPTH
/* This limit is parameterized just in case anybody ever wants to change it.

View File

@ -292,6 +292,35 @@ typedef struct stateblock {
#define INTS_PER_STATEBLOCK (int)(sizeof(stateblock)/sizeof(int))
/* Before version 10.32 the recursive calls of internal_dfa_match() were passed
local working space and output vectors that were created on the stack. This has
caused issues for some patterns, especially in small-stack environments such as
Windows. A new scheme is now in use which sets up a vector on the stack, but if
this is too small, heap memory is used, up to the heap_limit. The main
parameters are all numbers of ints because the workspace is a vector of ints.
The size of the starting stack vector, DFA_START_RWS_SIZE, is in bytes, and is
defined in pcre2_internal.h so as to be available to pcre2test when it is
finding the minimum heap requirement for a match. */
#define OVEC_UNIT (sizeof(PCRE2_SIZE)/sizeof(int))
#define RWS_BASE_SIZE (DFA_START_RWS_SIZE/sizeof(int)) /* Stack vector */
#define RWS_RSIZE 1000 /* Work size for recursion */
#define RWS_OVEC_RSIZE (1000*OVEC_UNIT) /* Ovector for recursion */
#define RWS_OVEC_OSIZE (2*OVEC_UNIT) /* Ovector in other cases */
/* This structure is at the start of each workspace block. */
typedef struct RWS_anchor {
struct RWS_anchor *next;
unsigned int size; /* Number of ints */
unsigned int free; /* Number of ints */
} RWS_anchor;
#define RWS_ANCHOR_SIZE (sizeof(RWS_anchor)/sizeof(int))
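The following is a minimal sketch of how a block headed by such an anchor might hand out space, with every count in units of int as the comment above describes. It is an illustration only: the names ws_anchor, ws_init(), and ws_take() are hypothetical, and the code that actually carves workspace and ovectors out of a block is not shown in this part of the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for RWS_anchor: a header at the start of each block. */
typedef struct ws_anchor {
  struct ws_anchor *next;
  unsigned int size;   /* Total ints in the block, header included */
  unsigned int free;   /* Ints not yet handed out */
} ws_anchor;

#define WS_ANCHOR_INTS ((unsigned int)(sizeof(ws_anchor)/sizeof(int)))

/* Prepare a block of nints ints for use. The block must be suitably aligned
   for ws_anchor (memory from malloc() is). */
void ws_init(int *block, unsigned int nints)
{
  ws_anchor *a = (ws_anchor *)block;
  assert(nints > WS_ANCHOR_INTS);
  a->next = NULL;
  a->size = nints;
  a->free = nints - WS_ANCHOR_INTS;
}

/* Carve need ints from the block, or return NULL if too little is free. */
int *ws_take(int *block, unsigned int need)
{
  ws_anchor *a = (ws_anchor *)block;
  int *p;
  if (a->free < need) return NULL;
  p = block + (a->size - a->free);   /* First int not yet handed out */
  a->free -= need;
  return p;
}
```

A NULL return from ws_take() corresponds to the point at which the real code would move on to another block.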
/*************************************************
* Process a callout *
@ -353,6 +382,61 @@ return (mb->callout)(cb, mb->callout_data);
/*************************************************
* Expand local workspace memory *
*************************************************/
/* This function is called when internal_dfa_match() is about to be called
recursively and there is insufficient working space left in the current work
space block. If there's an existing next block, use it; otherwise get a new
block unless the heap limit is reached.
Arguments:
rwsptr pointer to block pointer (updated)
ovecsize space needed for an ovector
mb the match block
Returns: 0 rwsptr has been updated
!0 an error code
*/
static int
more_workspace(RWS_anchor **rwsptr, unsigned int ovecsize, dfa_match_block *mb)
{
RWS_anchor *rws = *rwsptr;
RWS_anchor *new;
if (rws->next != NULL)
{
new = rws->next;
}
/* All sizes are in units of sizeof(int), except for mb->heap_limit, which is in
kilobytes. */
else
{
unsigned int newsize = rws->size * 2;
unsigned int heapleft = (unsigned int)
(((1024/sizeof(int))*mb->heap_limit - mb->heap_used));
if (newsize > heapleft) newsize = heapleft;
if (newsize < RWS_RSIZE + ovecsize + RWS_ANCHOR_SIZE)
return PCRE2_ERROR_HEAPLIMIT;
new = mb->memctl.malloc(newsize*sizeof(int), mb->memctl.memory_data);
if (new == NULL) return PCRE2_ERROR_NOMEMORY;
mb->heap_used += newsize;
new->next = NULL;
new->size = newsize;
rws->next = new;
}
new->free = new->size - RWS_ANCHOR_SIZE;
*rwsptr = new;
return 0;
}
/*************************************************
* Match a Regular Expression - DFA engine *
*************************************************/
@ -431,7 +515,8 @@ internal_dfa_match(
uint32_t offsetcount,
int *workspace,
int wscount,
uint32_t rlevel)
uint32_t rlevel,
int *RWS)
{
stateblock *active_states, *new_states, *temp_states;
stateblock *next_active_state, *next_new_state;
@ -2587,10 +2672,22 @@ for (;;)
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
{
PCRE2_SPTR endasscode = code + GET(code, 1);
PCRE2_SIZE local_offsets[2];
int rc;
int local_workspace[1000];
int *local_workspace;
PCRE2_SIZE *local_offsets;
PCRE2_SPTR endasscode = code + GET(code, 1);
RWS_anchor *rws = (RWS_anchor *)RWS;
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
{
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
if (rc != 0) return rc;
RWS = (int *)rws;
}
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);
@ -2600,10 +2697,13 @@ for (;;)
ptr, /* where we currently are */
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
local_offsets, /* offset vector */
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
local_workspace, /* workspace vector */
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
RWS_RSIZE, /* size of same */
rlevel, /* function recursion level */
RWS); /* recursion workspace */
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
@ -2670,11 +2770,23 @@ for (;;)
else
{
PCRE2_SIZE local_offsets[2];
int local_workspace[1000];
int rc;
int *local_workspace;
PCRE2_SIZE *local_offsets;
PCRE2_SPTR asscode = code + LINK_SIZE + 1;
PCRE2_SPTR endasscode = asscode + GET(asscode, 1);
RWS_anchor *rws = (RWS_anchor *)RWS;
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
{
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
if (rc != 0) return rc;
RWS = (int *)rws;
}
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1);
@ -2684,10 +2796,13 @@ for (;;)
ptr, /* where we currently are */
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
local_offsets, /* offset vector */
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
local_workspace, /* workspace vector */
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
RWS_RSIZE, /* size of same */
rlevel, /* function recursion level */
RWS); /* recursion workspace */
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
if ((rc >= 0) ==
@ -2702,13 +2817,25 @@ for (;;)
/*-----------------------------------------------------------------*/
case OP_RECURSE:
{
int rc;
int *local_workspace;
PCRE2_SIZE *local_offsets;
RWS_anchor *rws = (RWS_anchor *)RWS;
dfa_recursion_info *ri;
PCRE2_SIZE local_offsets[1000];
int local_workspace[1000];
PCRE2_SPTR callpat = start_code + GET(code, 1);
uint32_t recno = (callpat == mb->start_code)? 0 :
GET2(callpat, 1 + LINK_SIZE);
int rc;
if (rws->free < RWS_RSIZE + RWS_OVEC_RSIZE)
{
rc = more_workspace(&rws, RWS_OVEC_RSIZE, mb);
if (rc != 0) return rc;
RWS = (int *)rws;
}
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
local_workspace = ((int *)local_offsets) + RWS_OVEC_RSIZE;
rws->free -= RWS_RSIZE + RWS_OVEC_RSIZE;
/* Check for repeating a recursion without advancing the subject
pointer. This should catch convoluted mutual recursions. (Some simple
@ -2732,11 +2859,13 @@ for (;;)
ptr, /* where we currently are */
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
local_offsets, /* offset vector */
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
RWS_OVEC_RSIZE/OVEC_UNIT, /* size of same */
local_workspace, /* workspace vector */
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
RWS_RSIZE, /* size of same */
rlevel, /* function recursion level */
RWS); /* recursion workspace */
rws->free += RWS_RSIZE + RWS_OVEC_RSIZE;
mb->recursive = new_recursive.prevrec; /* Done this recursion */
/* Ran out of internal offsets */
@ -2782,10 +2911,25 @@ for (;;)
case OP_SCBRAPOS:
case OP_BRAPOSZERO:
{
int rc;
int *local_workspace;
PCRE2_SIZE *local_offsets;
PCRE2_SIZE charcount, matched_count;
PCRE2_SPTR local_ptr = ptr;
RWS_anchor *rws = (RWS_anchor *)RWS;
BOOL allow_zero;
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
{
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
if (rc != 0) return rc;
RWS = (int *)rws;
}
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
if (codevalue == OP_BRAPOSZERO)
{
allow_zero = TRUE;
@ -2798,19 +2942,17 @@ for (;;)
for (matched_count = 0;; matched_count++)
{
PCRE2_SIZE local_offsets[2];
int local_workspace[1000];
int rc = internal_dfa_match(
rc = internal_dfa_match(
mb, /* fixed match data */
code, /* this subexpression's code */
local_ptr, /* where we currently are */
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
local_offsets, /* offset vector */
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
local_workspace, /* workspace vector */
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
RWS_RSIZE, /* size of same */
rlevel, /* function recursion level */
RWS); /* recursion workspace */
/* Failed to match */
@ -2827,6 +2969,8 @@ for (;;)
local_ptr += charcount; /* Advance temporary position ptr */
}
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
/* At this point we have matched the subpattern matched_count
times, and local_ptr is pointing to the character after the end of the
last match. */
@ -2869,19 +3013,35 @@ for (;;)
/*-----------------------------------------------------------------*/
case OP_ONCE:
{
PCRE2_SIZE local_offsets[2];
int local_workspace[1000];
int rc;
int *local_workspace;
PCRE2_SIZE *local_offsets;
RWS_anchor *rws = (RWS_anchor *)RWS;
int rc = internal_dfa_match(
if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE)
{
rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb);
if (rc != 0) return rc;
RWS = (int *)rws;
}
local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free);
local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE;
rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE;
rc = internal_dfa_match(
mb, /* fixed match data */
code, /* this subexpression's code */
ptr, /* where we currently are */
(PCRE2_SIZE)(ptr - start_subject), /* start offset */
local_offsets, /* offset vector */
sizeof(local_offsets)/sizeof(PCRE2_SIZE), /* size of same */
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */
local_workspace, /* workspace vector */
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
RWS_RSIZE, /* size of same */
rlevel, /* function recursion level */
RWS); /* recursion workspace */
rws->free += RWS_RSIZE + RWS_OVEC_OSIZE;
if (rc >= 0)
{
@ -3063,6 +3223,7 @@ pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data,
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
{
int rc;
const pcre2_real_code *re = (const pcre2_real_code *)code;
PCRE2_SPTR start_match;
@ -3071,9 +3232,9 @@ PCRE2_SPTR bumpalong_limit;
PCRE2_SPTR req_cu_ptr;
BOOL utf, anchored, startline, firstline;
BOOL has_first_cu = FALSE;
BOOL has_req_cu = FALSE;
PCRE2_UCHAR first_cu = 0;
PCRE2_UCHAR first_cu2 = 0;
PCRE2_UCHAR req_cu = 0;
@ -3088,6 +3249,17 @@ pcre2_callout_block cb;
dfa_match_block actual_match_block;
dfa_match_block *mb = &actual_match_block;
/* Set up a starting block of memory for use during recursive calls to
internal_dfa_match(). By putting this on the stack, it minimizes resource use
in the case when it is not needed. If this is too small, more memory is
obtained from the heap. At the start of each block is an anchor structure. */
int base_recursion_workspace[RWS_BASE_SIZE];
RWS_anchor *rws = (RWS_anchor *)base_recursion_workspace;
rws->next = NULL;
rws->size = RWS_BASE_SIZE;
rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
subject string. */
@ -3184,6 +3356,7 @@ if (mcontext == NULL)
mb->memctl = re->memctl;
mb->match_limit = PRIV(default_match_context).match_limit;
mb->match_limit_depth = PRIV(default_match_context).depth_limit;
mb->heap_limit = PRIV(default_match_context).heap_limit;
}
else
{
@ -3198,6 +3371,7 @@ else
mb->memctl = mcontext->memctl;
mb->match_limit = mcontext->match_limit;
mb->match_limit_depth = mcontext->depth_limit;
mb->heap_limit = mcontext->heap_limit;
}
if (mb->match_limit > re->limit_match)
@ -3206,6 +3380,9 @@ if (mb->match_limit > re->limit_match)
if (mb->match_limit_depth > re->limit_depth)
mb->match_limit_depth = re->limit_depth;
if (mb->heap_limit > re->limit_heap)
mb->heap_limit = re->limit_heap;
mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
re->name_count * re->name_entry_size;
mb->tables = re->tables;
@ -3215,6 +3392,7 @@ mb->start_offset = start_offset;
mb->moptions = options;
mb->poptions = re->overall_options;
mb->match_call_count = 0;
mb->heap_used = 0;
/* Process the \R and newline settings. */
@ -3351,8 +3529,6 @@ a match. */
for (;;)
{
int rc;
/* ----------------- Start of match optimizations ---------------- */
/* There are some optimizations that avoid running the match if a known
@ -3544,7 +3720,7 @@ for (;;)
in characters, we treat it as code units to avoid spending too much time
in this optimization. */
if (end_subject - start_match < re->minlength) return PCRE2_ERROR_NOMATCH;
if (end_subject - start_match < re->minlength) goto NOMATCH_EXIT;
/* If req_cu is set, we know that that code unit must appear in the
subject for the match to succeed. If the first code unit is set, req_cu
@ -3621,7 +3797,8 @@ for (;;)
(uint32_t)match_data->oveccount * 2, /* actual size of same */
workspace, /* workspace vector */
(int)wscount, /* size of same */
0); /* function recurse level */
0, /* function recurse level */
base_recursion_workspace); /* initial workspace for recursion */
/* Anything other than "no match" means we are done, always; otherwise, carry
on only if not anchored. */
@ -3637,7 +3814,7 @@ for (;;)
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
match_data->rc = rc;
return rc;
goto EXIT;
}
/* Advance to the next subject character unless we are at the end of a line
@ -3668,8 +3845,18 @@ for (;;)
} /* "Bumpalong" loop */
NOMATCH_EXIT:
rc = PCRE2_ERROR_NOMATCH;
return PCRE2_ERROR_NOMATCH;
EXIT:
while (rws->next != NULL)
{
RWS_anchor *next = rws->next;
rws->next = next->next;
mb->memctl.free(next, mb->memctl.memory_data);
}
return rc;
}
/* End of pcre2_dfa_match.c */
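
Note: the workspace scheme introduced in this file (a base block, a chain of heap blocks that double in size, claiming a frame from the free tail of a block, and freeing the chain on exit) can be tried in isolation. The following is a minimal sketch under stated assumptions, not the PCRE2 code: the names ws_anchor, ws_next_block, ws_claim, ws_release and ws_free_chain are hypothetical, and the limit is expressed in ints rather than in kibibytes as mb->heap_limit is.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for RWS_anchor: sits at the start of every block. */
typedef struct ws_anchor {
  struct ws_anchor *next;
  unsigned int size;   /* total ints in the block, anchor included */
  unsigned int free;   /* unclaimed ints at the end of the block */
} ws_anchor;

#define WS_ANCHOR_INTS ((unsigned int)(sizeof(ws_anchor)/sizeof(int)))

/* Step to the next block, allocating one of twice the current size if the
chain ends here. The limit and *used are counted in ints (an assumption of
this sketch). Returns 0 on success, -1 for limit reached, -2 for no memory. */
static int ws_next_block(ws_anchor **cur, unsigned int need,
  unsigned int limit, unsigned int *used)
{
  ws_anchor *blk = (*cur)->next;
  if (blk == NULL) {
    unsigned int newsize = (*cur)->size * 2;
    unsigned int left = limit - *used;
    if (newsize > left) newsize = left;
    if (newsize < need + WS_ANCHOR_INTS) return -1;
    blk = malloc(newsize * sizeof(int));
    if (blk == NULL) return -2;
    *used += newsize;
    blk->next = NULL;
    blk->size = newsize;
    (*cur)->next = blk;
  }
  blk->free = blk->size - WS_ANCHOR_INTS;   /* a reused block starts empty */
  *cur = blk;
  return 0;
}

/* Claim `need` ints from the free tail of the current block, moving to the
next block first if the tail is too short. Returns NULL on failure. */
static int *ws_claim(ws_anchor **cur, unsigned int need,
  unsigned int limit, unsigned int *used)
{
  ws_anchor *blk;
  int *area;
  if ((*cur)->free < need && ws_next_block(cur, need, limit, used) != 0)
    return NULL;
  blk = *cur;
  if (blk->free < need) return NULL;
  area = (int *)blk + blk->size - blk->free;
  blk->free -= need;
  return area;
}

/* Give back the most recently claimed frame of `need` ints. */
static void ws_release(ws_anchor *blk, unsigned int need)
{
  assert(blk->free + need <= blk->size - WS_ANCHOR_INTS);
  blk->free += need;
}

/* Free every heap block chained after the base block, which is not freed. */
static void ws_free_chain(ws_anchor *base)
{
  while (base->next != NULL) {
    ws_anchor *next = base->next;
    base->next = next->next;
    free(next);
  }
}
```

In this sketch the caller owns the base block (for example a stack array wrapped in a union with ws_anchor so that it is suitably aligned), claims a frame before each nested call, and releases it afterwards, which is the same claim, call, release order used at each recursive call site in the diff.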


@ -253,6 +253,11 @@ maximum size of this can be limited. */
#define START_FRAMES_SIZE 20480
/* Similarly, for DFA matching, an initial internal workspace vector is
allocated on the stack. */
#define DFA_START_RWS_SIZE 30720
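
Note: to get a feel for how far the 30720-byte starting vector goes before the heap is needed, the sizes defined in pcre2_dfa_match.c can be combined as below. This is a rough sketch that copies the macros from the diff; the helper names rws_usable_ints, recursion_levels_in_base and other_levels_in_base are hypothetical, and the sketch assumes PCRE2_SIZE is size_t and that every nested call claims one frame of the same kind.

```c
#include <assert.h>
#include <stddef.h>

#define DFA_START_RWS_SIZE 30720                       /* bytes */
#define OVEC_UNIT       (sizeof(size_t)/sizeof(int))   /* PCRE2_SIZE assumed size_t */
#define RWS_BASE_SIZE   (DFA_START_RWS_SIZE/sizeof(int))
#define RWS_RSIZE       1000
#define RWS_OVEC_RSIZE  (1000*OVEC_UNIT)
#define RWS_OVEC_OSIZE  (2*OVEC_UNIT)

typedef struct RWS_anchor {
  struct RWS_anchor *next;
  unsigned int size;
  unsigned int free;
} RWS_anchor;

#define RWS_ANCHOR_SIZE (sizeof(RWS_anchor)/sizeof(int))

/* Ints left in the base block once the anchor has been accounted for. */
static size_t rws_usable_ints(void)
{
  return RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
}

/* Nested OP_RECURSE frames (workspace plus 1000-pair ovector) that fit. */
static size_t recursion_levels_in_base(void)
{
  return rws_usable_ints() / (RWS_RSIZE + RWS_OVEC_RSIZE);
}

/* Nested lookaround/atomic-group frames (workspace plus 2-item ovector). */
static size_t other_levels_in_base(void)
{
  return rws_usable_ints() / (RWS_RSIZE + RWS_OVEC_OSIZE);
}
```

On a typical LP64 platform (4-byte int, 8-byte size_t and pointers) this works out to 7680 ints in the base block, 4 of them taken by the anchor, so two nested pattern recursions of 3000 ints each, or seven nested lookarounds or atomic groups of 1004 ints each, fit before more_workspace() has to go to the heap. Other platforms give different figures.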
/* Define the default BSR convention. */
#ifdef BSR_ANYCRLF


@ -896,6 +896,8 @@ typedef struct dfa_match_block {
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
const uint8_t *tables; /* Character tables */
PCRE2_SIZE start_offset; /* The start offset value */
PCRE2_SIZE heap_limit; /* As it says */
PCRE2_SIZE heap_used; /* As it says */
uint32_t match_limit; /* As it says */
uint32_t match_limit_depth; /* As it says */
uint32_t match_call_count; /* Number of calls of internal function */


@ -5760,6 +5760,8 @@ PCRE2_SET_HEAP_LIMIT(dat_context, max);
for (;;)
{
uint32_t stack_start = 0;
if (errnumber == PCRE2_ERROR_HEAPLIMIT)
{
PCRE2_SET_HEAP_LIMIT(dat_context, mid);
@ -5775,6 +5777,7 @@ for (;;)
if ((dat_datctl.control & CTL_DFA) != 0)
{
stack_start = DFA_START_RWS_SIZE/1024;
if (dfa_workspace == NULL)
dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
if (dfa_matched++ == 0)
@ -5789,11 +5792,21 @@ for (;;)
dat_datctl.options, match_data, PTR(dat_context));
else
{
stack_start = START_FRAMES_SIZE/1024;
PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
dat_datctl.options, match_data, PTR(dat_context));
}
if (capcount == errnumber)
{
if ((mid & 0x80000000u) != 0)
{
fprintf(outfile, "Can't find minimum %s limit: check pattern for "
"restriction\n", msg);
break;
}
min = mid;
mid = (mid == max - 1)? max : (max != UINT32_MAX)? (min + max)/2 : mid*2;
}
@ -5802,11 +5815,12 @@ for (;;)
capcount == PCRE2_ERROR_PARTIAL)
{
/* If we've not hit the error with a heap limit less than the size of the
initial stack frame vector, the heap is not being used, so the minimum
limit is zero; there's no need to go on. The other limits are always
greater than zero. */
initial stack frame vector (for pcre2_match()) or the initial stack
workspace vector (for pcre2_dfa_match()), the heap is not being used, so
the minimum limit is zero; there's no need to go on. The other limits are
always greater than zero. */
if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < START_FRAMES_SIZE/1024)
if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < stack_start)
{
fprintf(outfile, "Minimum %s limit = 0\n", msg);
break;
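
Note: the limit search in these hunks doubles the trial limit while the match still fails with the limit error, then bisects between the last failing and the first passing value. The sketch below isolates that search. It is an illustration only: hits_limit() is a hypothetical stand-in for running the match with a given limit, find_min_limit() is a made-up name, and the starting values (0, 64, UINT32_MAX) and the success branch are assumptions, since that part of the loop is not shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one match attempt: the limit error occurs
whenever the limit is below the amount the match really needs. */
static int hits_limit(uint32_t limit, uint32_t needed)
{
  return limit < needed;
}

/* Returns the smallest limit that does not trigger the error, or 0 if the
search would have to double past the top bit (the "can't find minimum"
case that the diff adds a check for). */
static uint32_t find_min_limit(uint32_t needed)
{
  uint32_t min = 0, mid = 64, max = UINT32_MAX;
  for (;;) {
    if (hits_limit(mid, needed)) {
      if ((mid & 0x80000000u) != 0) return 0;   /* cannot double again */
      min = mid;
      mid = (mid == max - 1)? max : (max != UINT32_MAX)? (min + max)/2 : mid*2;
    }
    else {
      if (mid == min + 1) return mid;           /* min fails, mid passes */
      max = mid;
      mid = (min + max)/2;
    }
  }
}
```

The sketch never reports a minimum of zero. In pcre2test the zero case for the heap limit is handled separately by comparing mid with stack_start, as the hunk above shows.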
@ -7139,18 +7153,16 @@ else for (gmatched = 0;; gmatched++)
(double)CLOCKS_PER_SEC);
}
/* Find the heap, match and depth limits if requested. The match and heap
limits are not relevant for DFA matching and the depth and heap limits are
not relevant for JIT. The return from check_match_limit() is the return from
the final call to pcre2_match() or pcre2_dfa_match(). */
/* Find the heap, match and depth limits if requested. The depth and heap
limits are not relevant for JIT. The return from check_match_limit() is the
return from the final call to pcre2_match() or pcre2_dfa_match(). */
if ((dat_datctl.control & CTL_FINDLIMITS) != 0)
{
capcount = 0; /* This stops compiler warnings */
if ((dat_datctl.control & CTL_DFA) == 0 &&
(FLD(compiled_code, executable_jit) == NULL ||
(dat_datctl.options & PCRE2_NO_JIT) != 0))
if (FLD(compiled_code, executable_jit) == NULL ||
(dat_datctl.options & PCRE2_NO_JIT) != 0)
{
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
}
@ -7165,6 +7177,12 @@ else for (gmatched = 0;; gmatched++)
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_DEPTHLIMIT,
"depth");
}
if (capcount == 0)
{
fprintf(outfile, "Matched, but offsets vector is too small to show all matches\n");
capcount = dat_datctl.oveccount;
}
}
/* Otherwise just run a single match, setting up a callout if required (the

8
testdata/testinput6 vendored

@ -4874,6 +4874,14 @@
\= Expect depth limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
/(*LIMIT_HEAP=0)^((.)(?1)|.)$/
\= Expect heap limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/
\= Expect success
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
/(02-)?[0-9]{3}-[0-9]{3}/
02-123-123

11
testdata/testoutput6 vendored

@ -7667,12 +7667,23 @@ No match
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
Failed: error -53: matching depth limit exceeded
/(*LIMIT_HEAP=0)^((.)(?1)|.)$/
\= Expect heap limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
Failed: error -63: heap limit exceeded
/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/
\= Expect success
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
0: a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
/(02-)?[0-9]{3}-[0-9]{3}/
02-123-123
0: 02-123-123
/^(a(?2))(b)(?1)/
abbab\=find_limits
Minimum heap limit = 0
Minimum match limit = 4
Minimum depth limit = 2
0: abbab