Make pcre2_dfa_match() take notice of the match limit, to catch patterns that
use too much resource. This should fix oss-fuzz 1761.
Philip.Hazel 2017-05-30 10:42:57 +00:00
parent a16919ce6f
commit c0902e176f
16 changed files with 1340 additions and 1318 deletions

@@ -173,6 +173,10 @@ one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
 35. A lookbehind assertion that had a zero-length branch caused undefined
 behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.
+36. The match limit value now also applies to pcre2_dfa_match() as there are
+patterns that can use up a lot of resources without necessarily recursing very
+deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
 Version 10.23 14-February-2017
 ------------------------------
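The motivation in item 36 can be illustrated with a toy matcher. This is only a sketch and not PCRE2's implementation: `toy_scan`, `TOY_ERROR_MATCHLIMIT`, and the idea of a fixed number of "active states" are hypothetical names invented for the example. It shows work that grows without the call depth growing, and how a step counter checked against a limit bounds that work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error code for this sketch; not the PCRE2 constant. */
#define TOY_ERROR_MATCHLIMIT (-1)

/*
 * Toy stand-in for a non-recursive matcher: for every subject position it
 * visits every "active state" once. The work is subject_len * active_states
 * steps, yet the call depth never grows, so a depth limit alone would not
 * stop it. A step counter compared against match_limit does.
 *
 * Returns the number of steps performed, or TOY_ERROR_MATCHLIMIT if the
 * scan would need more than match_limit steps.
 */
int64_t toy_scan(size_t subject_len, size_t active_states,
                 uint32_t match_limit)
{
    uint32_t steps = 0;

    for (size_t pos = 0; pos < subject_len; pos++) {
        for (size_t state = 0; state < active_states; state++) {
            if (steps >= match_limit)
                return TOY_ERROR_MATCHLIMIT;   /* budget exhausted */
            steps++;
        }
    }
    return (int64_t)steps;
}
```

With a budget of 100 steps, `toy_scan(10, 10, 100)` completes, while `toy_scan(10, 10, 99)` stops early with the error code. How PCRE2 itself counts work in pcre2_dfa_match() is not shown in this diff.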

@@ -46,8 +46,9 @@ just once (except when processing lookaround assertions). This function is
 <i>wscount</i> Number of elements in the vector
 </pre>
 For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
-up a callout function or specify the recursion depth limit. The <i>length</i>
-and <i>startoffset</i> values are code units, not characters. The options are:
+up a callout function or specify the match and/or the recursion depth limits.
+The <i>length</i> and <i>startoffset</i> values are code units, not characters.
+The options are:
 <pre>
 PCRE2_ANCHORED Match only at the first position
 PCRE2_ENDANCHORED Pattern can match only at end of subject

@ -329,7 +329,7 @@ document for an overview of all the PCRE2 documentation.
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b> <b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<br> <br>
<br> <br>
These functions became obsolete at release 10.30 and are retained only for These functions became obsolete at release 10.30 and are retained only for
backward compatibility. They should not be used in new code. The first is backward compatibility. They should not be used in new code. The first is
replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and
has no effect (it always returns zero). has no effect (it always returns zero).
@ -428,10 +428,10 @@ documentation, and the
documentation describes how to compile and run it. documentation describes how to compile and run it.
</P> </P>
<P> <P>
The compiling and matching functions recognize various options that are passed The compiling and matching functions recognize various options that are passed
as bits in an options argument. There are also some more complicated parameters as bits in an options argument. There are also some more complicated parameters
such as custom memory management functions and resource limits that are passed such as custom memory management functions and resource limits that are passed
in "contexts" (which are just memory blocks, described below). Simple in "contexts" (which are just memory blocks, described below). Simple
applications do not need to make use of contexts. applications do not need to make use of contexts.
</P> </P>
<P> <P>
@ -450,7 +450,7 @@ More complicated programs might need to make use of the specialist functions
<P> <P>
JIT matching is automatically used by <b>pcre2_match()</b> if it is available, JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
matching, which gives improved performance at the expense of less sanity matching, which gives improved performance at the expense of less sanity
checking. The JIT-specific functions are discussed in the checking. The JIT-specific functions are discussed in the
<a href="pcre2jit.html"><b>pcre2jit</b></a> <a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation. documentation.
@ -705,7 +705,7 @@ following compile-time parameters:
The newline character sequence The newline character sequence
The compile time nested parentheses limit The compile time nested parentheses limit
The maximum length of the pattern string The maximum length of the pattern string
The extra options bits (none set by default) The extra options bits (none set by default)
</pre> </pre>
A compile context is also required if you are using custom memory management. A compile context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of If none of these apply, just pass NULL as the context argument of
@ -757,9 +757,9 @@ in the current locale.
<br> <br>
As PCRE2 has developed, almost all the 32 option bits that are available in As PCRE2 has developed, almost all the 32 option bits that are available in
the <i>options</i> argument of <b>pcre2_compile()</b> have been used up. To avoid the <i>options</i> argument of <b>pcre2_compile()</b> have been used up. To avoid
running out, the compile context contains a set of extra option bits which are running out, the compile context contains a set of extra option bits which are
used for some newer, assumed rarer, options. This function sets those bits. It used for some newer, assumed rarer, options. This function sets those bits. It
always sets all the bits (either on or off). It does not modify any existing always sets all the bits (either on or off). It does not modify any existing
setting. The available options are defined in the section entitled "Extra setting. The available options are defined in the section entitled "Extra
compile options" compile options"
<a href="#extracompileoptions">below.</a> <a href="#extracompileoptions">below.</a>
@ -783,8 +783,8 @@ PCRE2_SIZE variable can hold, which is effectively unlimited.
This specifies which characters or character sequences are to be recognized as This specifies which characters or character sequences are to be recognized as
newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only), newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only),
PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above),
PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the
NUL character, that is a binary zero). NUL character, that is a binary zero).
</P> </P>
<P> <P>
@ -837,7 +837,7 @@ A match context is required if you want to:
<pre> <pre>
Set up a callout function Set up a callout function
Set an offset limit for matching an unanchored pattern Set an offset limit for matching an unanchored pattern
Change the limit on the amount of heap used when matching Change the limit on the amount of heap used when matching
Change the backtracking match limit Change the backtracking match limit
Change the backtracking depth limit Change the backtracking depth limit
Set custom memory management specifically for the match Set custom memory management specifically for the match
@ -908,15 +908,15 @@ In other words, whichever limit comes first is used.
<b> uint32_t <i>value</i>);</b> <b> uint32_t <i>value</i>);</b>
<br> <br>
<br> <br>
The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
information when running an interpretive match. This limit does not apply to information when running an interpretive match. This limit does not apply to
matching with the JIT optimization, which has its own memory control matching with the JIT optimization, which has its own memory control
arrangements (see the arrangements (see the
<a href="pcre2jit.html"><b>pcre2jit</b></a> <a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details), nor does it apply to <b>pcre2_dfa_match()</b>. documentation for more details), nor does it apply to <b>pcre2_dfa_match()</b>.
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
returned. The default limit is set when PCRE2 is built; the default default is returned. The default limit is set when PCRE2 is built; the default default is
very large and is essentially "unlimited". very large and is essentially "unlimited".
</P> </P>
<P> <P>
@ -932,11 +932,11 @@ limit is set, less than the default.
<P> <P>
The <b>pcre2_match()</b> function starts out using a 20K vector on the system The <b>pcre2_match()</b> function starts out using a 20K vector on the system
stack for recording backtracking points. The more nested backtracking points stack for recording backtracking points. The more nested backtracking points
there are (that is, the deeper the search tree), the more memory is needed. there are (that is, the deeper the search tree), the more memory is needed.
Heap memory is used only if the initial vector is too small. If the heap limit Heap memory is used only if the initial vector is too small. If the heap limit
is set to a value less than 21 (in particular, zero) no heap memory will be is set to a value less than 21 (in particular, zero) no heap memory will be
used. In this case, only patterns that do not have a lot of nested backtracking used. In this case, only patterns that do not have a lot of nested backtracking
can be successfully processed. can be successfully processed.
<br> <br>
<br> <br>
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b> <b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
@@ -954,8 +954,8 @@ time round its main matching loop. If this value reaches the match limit,
 <b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
 the effect of limiting the amount of backtracking that can take place. For
 patterns that are not anchored, the count restarts from zero for each position
-in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
-which ignores it.
+in the subject string. This limit also applies to <b>pcre2_dfa_match()</b>,
+though the counting is done in a different way.
 </P>
 <P>
 When <b>pcre2_match()</b> is called with a pattern that was successfully
@@ -974,8 +974,8 @@ of the form
 (*LIMIT_MATCH=ddd)
 </pre>
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
-less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
-limit is set, less than the default.
+less than the limit set by the caller of <b>pcre2_match()</b> or
+<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
 <br>
 <br>
 <b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
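The (*LIMIT_MATCH=ddd) rule in the hunk above amounts to taking the smaller of two values. The helper below is a minimal sketch of that rule only; `effective_match_limit` and its parameters are hypothetical names, not PCRE2 identifiers, and the real library may organise this differently.

```c
#include <stdint.h>

/*
 * Sketch of the rule for (*LIMIT_MATCH=ddd): the pattern's value is used
 * only when it is lower than the limit the caller set (or the default if
 * the caller set none). A pattern can tighten the limit, never raise it.
 *
 * has_pattern_limit is 0 when the pattern contains no (*LIMIT_MATCH=...)
 * item. All names here are hypothetical, not PCRE2 identifiers.
 */
uint32_t effective_match_limit(uint32_t caller_or_default_limit,
                               int has_pattern_limit,
                               uint32_t pattern_limit)
{
    if (has_pattern_limit && pattern_limit < caller_or_default_limit)
        return pattern_limit;
    return caller_or_default_limit;
}
```

For example, a pattern asking for 5000 under a caller limit of 1000 still runs with 1000, whereas a pattern asking for 500 lowers the effective limit to 500. After this commit the documentation states that the same rule holds whether the caller uses pcre2_match() or pcre2_dfa_match().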
@ -983,7 +983,7 @@ limit is set, less than the default.
<br> <br>
<br> <br>
This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>. This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>.
Each time a nested backtracking point is passed, a new memory "frame" is used Each time a nested backtracking point is passed, a new memory "frame" is used
to remember the state of matching at that point. Thus, this parameter to remember the state of matching at that point. Thus, this parameter
indirectly limits the amount of memory that is used in a match. However, indirectly limits the amount of memory that is used in a match. However,
because the size of each memory "frame" depends on the number of capturing because the size of each memory "frame" depends on the number of capturing
@ -1107,7 +1107,7 @@ sequence that is recognized as meaning "newline". The values are:
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF) PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
PCRE2_NEWLINE_ANY Any Unicode line ending PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
PCRE2_NEWLINE_NUL The NUL character (binary zero) PCRE2_NEWLINE_NUL The NUL character (binary zero)
</pre> </pre>
The default should normally correspond to the standard sequence for your The default should normally correspond to the standard sequence for your
operating system. operating system.
@ -1334,7 +1334,7 @@ parenthesis. The name is not processed in any way, and it is not possible to
include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES
option is set, normal backslash processing is applied to verb names and only an option is set, normal backslash processing is applied to verb names and only an
unescaped closing parenthesis terminates the name. A closing parenthesis can be unescaped closing parenthesis terminates the name. A closing parenthesis can be
included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
or PCRE2_EXTENDED_MORE option is set, unescaped whitespace in verb names is or PCRE2_EXTENDED_MORE option is set, unescaped whitespace in verb names is
skipped and #-comments are recognized in this mode, exactly as in the rest of skipped and #-comments are recognized in this mode, exactly as in the rest of
the pattern. the pattern.
@ -1352,12 +1352,12 @@ documentation.
</pre> </pre>
If this bit is set, letters in the pattern match both upper and lower case If this bit is set, letters in the pattern match both upper and lower case
letters in the subject. It is equivalent to Perl's /i option, and it can be letters in the subject. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
properties are used for all characters with more than one other case, and for properties are used for all characters with more than one other case, and for
all characters whose code points are greater than U+007f. For lower valued all characters whose code points are greater than U+007f. For lower valued
characters with only one other case, a lookup table is used for speed. When characters with only one other case, a lookup table is used for speed. When
PCRE2_UTF is not set, a lookup table is used for all code points less than 256, PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
and higher code points (available only in 16-bit or 32-bit mode) are treated as and higher code points (available only in 16-bit or 32-bit mode) are treated as
not having another case. not having another case.
<pre> <pre>
PCRE2_DOLLAR_ENDONLY PCRE2_DOLLAR_ENDONLY
@ -1391,11 +1391,11 @@ documentation.
PCRE2_ENDANCHORED PCRE2_ENDANCHORED
</pre> </pre>
If this bit is set, the end of any pattern match must be right at the end of If this bit is set, the end of any pattern match must be right at the end of
the string being searched (the "subject string"). If the pattern match the string being searched (the "subject string"). If the pattern match
succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the
match fails at the current starting point. For unanchored patterns, a new match match fails at the current starting point. For unanchored patterns, a new match
is then tried at the next starting point. However, if the match succeeds by is then tried at the next starting point. However, if the match succeeds by
reaching the end of the pattern, but not the end of the subject, backtracking reaching the end of the pattern, but not the end of the subject, backtracking
occurs and an alternative match may be found. Consider these two patterns: occurs and an alternative match may be found. Consider these two patterns:
<pre> <pre>
.(*ACCEPT)|.. .(*ACCEPT)|..
@ -1407,9 +1407,9 @@ achieved by appropriate constructs in the pattern itself, which is the only way
to do it in Perl. to do it in Perl.
</P> </P>
<P> <P>
For DFA matching with <b>pcre2_dfa_match()</b>, PCRE2_ENDANCHORED applies only For DFA matching with <b>pcre2_dfa_match()</b>, PCRE2_ENDANCHORED applies only
to the first (that is, the longest) matched string. Other parallel matches, to the first (that is, the longest) matched string. Other parallel matches,
which are necessarily substrings of the first one, must obviously end before which are necessarily substrings of the first one, must obviously end before
the end of the subject. the end of the subject.
<pre> <pre>
PCRE2_EXTENDED PCRE2_EXTENDED
@ -1584,7 +1584,7 @@ current starting position, which in this case, it does. However, if the same
match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
subject string does not happen. The first match attempt is run starting from subject string does not happen. The first match attempt is run starting from
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so "D" and when this fails, (*COMMIT) prevents any further matches being tried, so
the overall result is "no match". the overall result is "no match".
</P> </P>
<P> <P>
There are also other start-up optimizations. For example, a minimum length for There are also other start-up optimizations. For example, a minimum length for
@ -1610,13 +1610,13 @@ and
in the in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a> <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a
negative error code. negative error code.
</P> </P>
<P> <P>
If you know that your pattern is a valid UTF string, and you want to skip this If you know that your pattern is a valid UTF string, and you want to skip this
check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When
it is set, the effect of passing an invalid UTF string as a pattern is it is set, the effect of passing an invalid UTF string as a pattern is
undefined. It may cause your program to crash or loop. undefined. It may cause your program to crash or loop.
</P> </P>
<P> <P>
Note that this option can also be passed to <b>pcre2_match()</b> and Note that this option can also be passed to <b>pcre2_match()</b> and
@ -1685,13 +1685,13 @@ calling the <b>pcre2_set_compile_extra_options()</b> function are as follows:
<pre> <pre>
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
</pre> </pre>
This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate" forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate"
code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode
code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot
therefore be represented in UTF-16. They can be represented in UTF-8 and therefore be represented in UTF-16. They can be represented in UTF-8 and
UTF-32, but are defined as invalid code points, and cause errors if encountered UTF-32, but are defined as invalid code points, and cause errors if encountered
in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2. in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
</P> </P>
<P> <P>
These values also cause errors if encountered in escape sequences such as These values also cause errors if encountered in escape sequences such as
@ -1702,9 +1702,9 @@ not disable the error that occurs, because it applies only to the testing of
input strings for UTF validity. input strings for UTF validity.
</P> </P>
<P> <P>
If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
incorporated in the compiled pattern. However, they can only match subject incorporated in the compiled pattern. However, they can only match subject
characters if the matching function is called with PCRE2_NO_UTF_CHECK set. characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
</P> </P>
<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br> <br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
@ -1914,7 +1914,7 @@ The third argument should point to an <b>uint32_t</b> variable.
If the pattern set a backtracking depth limit by including an item of the form If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the should point to an unsigned 32-bit integer. If no such value has been set, the
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
that this limit will only be used during matching if it is less than the limit that this limit will only be used during matching if it is less than the limit
set or defaulted by the caller of the match function. set or defaulted by the caller of the match function.
<pre> <pre>
@ -2123,7 +2123,7 @@ The output is one of the following <b>uint32_t</b> values:
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF) PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
PCRE2_NEWLINE_ANY Any Unicode line ending PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
PCRE2_NEWLINE_NUL The NUL character (binary zero) PCRE2_NEWLINE_NUL The NUL character (binary zero)
</pre> </pre>
This identifies the character sequence that will be recognized as meaning This identifies the character sequence that will be recognized as meaning
"newline" while matching. "newline" while matching.
@ -2334,8 +2334,8 @@ instead of one.
<P> <P>
If a non-zero starting offset is passed when the pattern is anchored, a single If a non-zero starting offset is passed when the pattern is anchored, a single
attempt to match at the given offset is made. This can only succeed if the attempt to match at the given offset is made. This can only succeed if the
pattern does not require the match to be at the start of the subject. In other pattern does not require the match to be at the start of the subject. In other
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A. the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A.
<a name="matchoptions"></a></P> <a name="matchoptions"></a></P>
<br><b> <br><b>
@ -2508,7 +2508,7 @@ reference, and so advances only by one character after the first failure.
</P> </P>
<P> <P>
An explicit match for CR of LF is either a literal appearance of one of those An explicit match for CR of LF is either a literal appearance of one of those
characters in the pattern, or one of the \r or \n or equivalent octal or characters in the pattern, or one of the \r or \n or equivalent octal or
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
does \s, even though it includes CR and LF in the characters that it matches. does \s, even though it includes CR and LF in the characters that it matches.
</P> </P>
@ -2751,9 +2751,9 @@ The backtracking match limit was reached.
<pre> <pre>
PCRE2_ERROR_NOMEMORY PCRE2_ERROR_NOMEMORY
</pre> </pre>
If a pattern contains many nested backtracking points, heap memory is used to If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default remember them. This error is given when the memory allocation function (default
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
if the amount of memory needed exceeds the heap limit. if the amount of memory needed exceeds the heap limit.
<pre> <pre>
PCRE2_ERROR_NULL PCRE2_ERROR_NULL
@@ -3471,7 +3471,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 May 2017
+Last updated: 30 May 2017
 <br>
 Copyright &copy; 1997-2017 University of Cambridge.
 <br>

@@ -260,9 +260,9 @@ setting such as
 <pre>
 --with-match-limit=500000
 </pre>
-to the <b>configure</b> command. This setting has no effect on the
-<b>pcre2_dfa_match()</b> matching function, but it does also limit JIT matching
-(though the counting is done differently).
+to the <b>configure</b> command. This setting also applies to the
+<b>pcre2_dfa_match()</b> matching function, and to JIT matching (though the
+counting is done differently).
 </P>
 <P>
 The <b>pcre2_match()</b> function starts out using a 20K vector on the system
@@ -554,7 +554,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 10 April 2017
+Last updated: 30 May 2017
 <br>
 Copyright &copy; 1997-2017 University of Cambridge.
 <br>

@@ -204,11 +204,11 @@ still recognized for backwards compatibility.
 <P>
 The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
 for matching. It does not apply to JIT or DFA matching. The match limit is used
-(but in a different way) when JIT is being used, but it is not relevant, and is
-ignored, when matching with <b>pcre2_dfa_match()</b>. The depth limit is ignored
-by JIT but is relevant for DFA matching, which uses function recursion for
-recursions within the pattern. In this case, the depth limit controls the
-amount of system stack that is used.
+(but in a different way) when JIT is being used, or when
+<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
+matching functions. The depth limit is ignored by JIT but is relevant for DFA
+matching, which uses function recursion for recursions within the pattern. In
+this case, the depth limit controls the amount of system stack that is used.
 <a name="newlines"></a></P>
 <br><b>
 Newline conventions
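The revised paragraph in the hunk above can be summarised as a small lookup of which limit is honoured by which matcher. The sketch below encodes the wording as it stands after this commit; the enum and function names (`toy_engine`, `toy_limit`, `toy_limit_applies`) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical labels for the three matchers discussed in the text. */
enum toy_engine { TOY_INTERPRETER, TOY_JIT, TOY_DFA };

/* Hypothetical labels for the three resource limits. */
enum toy_limit { TOY_HEAP_LIMIT, TOY_MATCH_LIMIT, TOY_DEPTH_LIMIT };

/*
 * Returns true when, according to the revised documentation text, the
 * given limit has an effect for the given matcher:
 *   heap limit  - pcre2_match() interpreter only
 *   match limit - interpreter, JIT, and pcre2_dfa_match()
 *                 (counted differently in each)
 *   depth limit - interpreter and pcre2_dfa_match(); ignored by JIT
 */
bool toy_limit_applies(enum toy_engine engine, enum toy_limit limit)
{
    switch (limit) {
    case TOY_HEAP_LIMIT:  return engine == TOY_INTERPRETER;
    case TOY_MATCH_LIMIT: return true;
    case TOY_DEPTH_LIMIT: return engine != TOY_JIT;
    }
    return false;
}
```

The only entry that changes with this commit is the match limit for DFA matching, which the old text described as ignored.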
@@ -3445,7 +3445,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 May 2017
+Last updated: 30 May 2017
 <br>
 Copyright &copy; 1997-2017 University of Cambridge.
 <br>

File diff suppressed because it is too large.

@@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "04 April 2017" "PCRE2 10.30"
+.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -34,8 +34,9 @@ just once (except when processing lookaround assertions). This function is
 \fIwscount\fP Number of elements in the vector
 .sp
 For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function or specify the recursion depth limit. The \fIlength\fP
-and \fIstartoffset\fP values are code units, not characters. The options are:
+up a callout function or specify the match and/or the recursion depth limits.
+The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
+The options are:
 .sp
 PCRE2_ANCHORED Match only at the first position
 PCRE2_ENDANCHORED Pattern can match only at end of subject

@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "26 May 2017" "PCRE2 10.30"
+.TH PCRE2API 3 "30 May 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -266,7 +266,7 @@ document for an overview of all the PCRE2 documentation.
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);" .B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.fi .fi
.sp .sp
These functions became obsolete at release 10.30 and are retained only for These functions became obsolete at release 10.30 and are retained only for
backward compatibility. They should not be used in new code. The first is backward compatibility. They should not be used in new code. The first is
replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
has no effect (it always returns zero). has no effect (it always returns zero).
@ -365,10 +365,10 @@ documentation, and the
.\" .\"
documentation describes how to compile and run it. documentation describes how to compile and run it.
.P .P
The compiling and matching functions recognize various options that are passed The compiling and matching functions recognize various options that are passed
as bits in an options argument. There are also some more complicated parameters as bits in an options argument. There are also some more complicated parameters
such as custom memory management functions and resource limits that are passed such as custom memory management functions and resource limits that are passed
in "contexts" (which are just memory blocks, described below). Simple in "contexts" (which are just memory blocks, described below). Simple
applications do not need to make use of contexts. applications do not need to make use of contexts.
.P .P
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
@ -384,7 +384,7 @@ More complicated programs might need to make use of the specialist functions
.P .P
JIT matching is automatically used by \fBpcre2_match()\fP if it is available, JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
matching, which gives improved performance at the expense of less sanity matching, which gives improved performance at the expense of less sanity
checking. The JIT-specific functions are discussed in the checking. The JIT-specific functions are discussed in the
.\" HREF .\" HREF
\fBpcre2jit\fP \fBpcre2jit\fP
@ -646,7 +646,7 @@ following compile-time parameters:
The newline character sequence The newline character sequence
The compile time nested parentheses limit The compile time nested parentheses limit
The maximum length of the pattern string The maximum length of the pattern string
The extra options bits (none set by default) The extra options bits (none set by default)
.sp .sp
A compile context is also required if you are using custom memory management. A compile context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of If none of these apply, just pass NULL as the context argument of
@ -695,9 +695,9 @@ in the current locale.
.sp .sp
As PCRE2 has developed, almost all the 32 option bits that are available in As PCRE2 has developed, almost all the 32 option bits that are available in
the \fIoptions\fP argument of \fBpcre2_compile()\fP have been used up. To avoid the \fIoptions\fP argument of \fBpcre2_compile()\fP have been used up. To avoid
running out, the compile context contains a set of extra option bits which are running out, the compile context contains a set of extra option bits which are
used for some newer, assumed rarer, options. This function sets those bits. It used for some newer, assumed rarer, options. This function sets those bits. It
always sets all the bits (either on or off). It does not modify any existing always sets all the bits (either on or off). It does not modify any existing
setting. The available options are defined in the section entitled "Extra setting. The available options are defined in the section entitled "Extra
compile options" compile options"
.\" HTML <a href="#extracompileoptions"> .\" HTML <a href="#extracompileoptions">
@ -724,8 +724,8 @@ PCRE2_SIZE variable can hold, which is effectively unlimited.
This specifies which characters or character sequences are to be recognized as This specifies which characters or character sequences are to be recognized as
newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only), newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only),
PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above),
PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the
NUL character, that is a binary zero). NUL character, that is a binary zero).
.P .P
A pattern can override the value set in the compile context by starting with a A pattern can override the value set in the compile context by starting with a
@ -778,7 +778,7 @@ A match context is required if you want to:
.sp .sp
Set up a callout function Set up a callout function
Set an offset limit for matching an unanchored pattern Set an offset limit for matching an unanchored pattern
Change the limit on the amount of heap used when matching Change the limit on the amount of heap used when matching
Change the backtracking match limit Change the backtracking match limit
Change the backtracking depth limit Change the backtracking depth limit
Set custom memory management specifically for the match Set custom memory management specifically for the match
@ -846,7 +846,7 @@ In other words, whichever limit comes first is used.
.B " uint32_t \fIvalue\fP);" .B " uint32_t \fIvalue\fP);"
.fi .fi
.sp .sp
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
information when running an interpretive match. This limit does not apply to information when running an interpretive match. This limit does not apply to
matching with the JIT optimization, which has its own memory control matching with the JIT optimization, which has its own memory control
@ -855,8 +855,8 @@ arrangements (see the
\fBpcre2jit\fP \fBpcre2jit\fP
.\" .\"
documentation for more details), nor does it apply to \fBpcre2_dfa_match()\fP. documentation for more details), nor does it apply to \fBpcre2_dfa_match()\fP.
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
returned. The default limit is set when PCRE2 is built; the default default is returned. The default limit is set when PCRE2 is built; the default default is
very large and is essentially "unlimited". very large and is essentially "unlimited".
.P .P
A value for the heap limit may also be supplied by an item at the start of a A value for the heap limit may also be supplied by an item at the start of a
@ -870,11 +870,11 @@ limit is set, less than the default.
.P .P
The \fBpcre2_match()\fP function starts out using a 20K vector on the system The \fBpcre2_match()\fP function starts out using a 20K vector on the system
stack for recording backtracking points. The more nested backtracking points stack for recording backtracking points. The more nested backtracking points
there are (that is, the deeper the search tree), the more memory is needed. there are (that is, the deeper the search tree), the more memory is needed.
Heap memory is used only if the initial vector is too small. If the heap limit Heap memory is used only if the initial vector is too small. If the heap limit
is set to a value less than 21 (in particular, zero) no heap memory will be is set to a value less than 21 (in particular, zero) no heap memory will be
used. In this case, only patterns that do not have a lot of nested backtracking used. In this case, only patterns that do not have a lot of nested backtracking
can be successfully processed. can be successfully processed.
.sp .sp
.nf .nf
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
@ -891,8 +891,8 @@ time round its main matching loop. If this value reaches the match limit,
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has \fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
the effect of limiting the amount of backtracking that can take place. For the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP, in the subject string. This limit also applies to \fBpcre2_dfa_match()\fP,
which ignores it. though the counting is done in a different way.
.P .P
When \fBpcre2_match()\fP is called with a pattern that was successfully When \fBpcre2_match()\fP is called with a pattern that was successfully
processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
@ -909,8 +909,8 @@ of the form
(*LIMIT_MATCH=ddd) (*LIMIT_MATCH=ddd)
.sp .sp
where ddd is a decimal number. However, such a setting is ignored unless ddd is where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such less than the limit set by the caller of \fBpcre2_match()\fP or
limit is set, less than the default. \fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
.sp .sp
.nf .nf
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
@ -918,7 +918,7 @@ limit is set, less than the default.
.fi .fi
.sp .sp
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP. This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
Each time a nested backtracking point is passed, a new memory "frame" is used Each time a nested backtracking point is passed, a new memory "frame" is used
to remember the state of matching at that point. Thus, this parameter to remember the state of matching at that point. Thus, this parameter
indirectly limits the amount of memory that is used in a match. However, indirectly limits the amount of memory that is used in a match. However,
because the size of each memory "frame" depends on the number of capturing because the size of each memory "frame" depends on the number of capturing
@ -1040,7 +1040,7 @@ sequence that is recognized as meaning "newline". The values are:
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF) PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
PCRE2_NEWLINE_ANY Any Unicode line ending PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
PCRE2_NEWLINE_NUL The NUL character (binary zero) PCRE2_NEWLINE_NUL The NUL character (binary zero)
.sp .sp
The default should normally correspond to the standard sequence for your The default should normally correspond to the standard sequence for your
operating system. operating system.
@ -1270,7 +1270,7 @@ parenthesis. The name is not processed in any way, and it is not possible to
include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES
option is set, normal backslash processing is applied to verb names and only an option is set, normal backslash processing is applied to verb names and only an
unescaped closing parenthesis terminates the name. A closing parenthesis can be unescaped closing parenthesis terminates the name. A closing parenthesis can be
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
or PCRE2_EXTENDED_MORE option is set, unescaped whitespace in verb names is or PCRE2_EXTENDED_MORE option is set, unescaped whitespace in verb names is
skipped and #-comments are recognized in this mode, exactly as in the rest of skipped and #-comments are recognized in this mode, exactly as in the rest of
the pattern. the pattern.
@ -1290,12 +1290,12 @@ documentation.
.sp .sp
If this bit is set, letters in the pattern match both upper and lower case If this bit is set, letters in the pattern match both upper and lower case
letters in the subject. It is equivalent to Perl's /i option, and it can be letters in the subject. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
properties are used for all characters with more than one other case, and for properties are used for all characters with more than one other case, and for
all characters whose code points are greater than U+007f. For lower valued all characters whose code points are greater than U+007f. For lower valued
characters with only one other case, a lookup table is used for speed. When characters with only one other case, a lookup table is used for speed. When
PCRE2_UTF is not set, a lookup table is used for all code points less than 256, PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
and higher code points (available only in 16-bit or 32-bit mode) are treated as and higher code points (available only in 16-bit or 32-bit mode) are treated as
not having another case. not having another case.
.sp .sp
PCRE2_DOLLAR_ENDONLY PCRE2_DOLLAR_ENDONLY
@ -1331,11 +1331,11 @@ documentation.
PCRE2_ENDANCHORED PCRE2_ENDANCHORED
.sp .sp
If this bit is set, the end of any pattern match must be right at the end of If this bit is set, the end of any pattern match must be right at the end of
the string being searched (the "subject string"). If the pattern match the string being searched (the "subject string"). If the pattern match
succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the
match fails at the current starting point. For unanchored patterns, a new match match fails at the current starting point. For unanchored patterns, a new match
is then tried at the next starting point. However, if the match succeeds by is then tried at the next starting point. However, if the match succeeds by
reaching the end of the pattern, but not the end of the subject, backtracking reaching the end of the pattern, but not the end of the subject, backtracking
occurs and an alternative match may be found. Consider these two patterns: occurs and an alternative match may be found. Consider these two patterns:
.sp .sp
.(*ACCEPT)|.. .(*ACCEPT)|..
@ -1346,9 +1346,9 @@ whereas the second matches "bc". The effect of PCRE2_ENDANCHORED can also be
achieved by appropriate constructs in the pattern itself, which is the only way achieved by appropriate constructs in the pattern itself, which is the only way
to do it in Perl. to do it in Perl.
.P .P
For DFA matching with \fBpcre2_dfa_match()\fP, PCRE2_ENDANCHORED applies only For DFA matching with \fBpcre2_dfa_match()\fP, PCRE2_ENDANCHORED applies only
to the first (that is, the longest) matched string. Other parallel matches, to the first (that is, the longest) matched string. Other parallel matches,
which are necessarily substrings of the first one, must obviously end before which are necessarily substrings of the first one, must obviously end before
the end of the subject. the end of the subject.
.sp .sp
PCRE2_EXTENDED PCRE2_EXTENDED
@ -1520,7 +1520,7 @@ current starting position, which in this case, it does. However, if the same
match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
subject string does not happen. The first match attempt is run starting from subject string does not happen. The first match attempt is run starting from
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so "D" and when this fails, (*COMMIT) prevents any further matches being tried, so
the overall result is "no match". the overall result is "no match".
.P .P
There are also other start-up optimizations. For example, a minimum length for There are also other start-up optimizations. For example, a minimum length for
the subject may be recorded. Consider the pattern the subject may be recorded. Consider the pattern
@ -1556,12 +1556,12 @@ in the
\fBpcre2unicode\fP \fBpcre2unicode\fP
.\" .\"
document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
negative error code. negative error code.
.P .P
If you know that your pattern is a valid UTF string, and you want to skip this If you know that your pattern is a valid UTF string, and you want to skip this
check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When
it is set, the effect of passing an invalid UTF string as a pattern is it is set, the effect of passing an invalid UTF string as a pattern is
undefined. It may cause your program to crash or loop. undefined. It may cause your program to crash or loop.
.P .P
Note that this option can also be passed to \fBpcre2_match()\fP and Note that this option can also be passed to \fBpcre2_match()\fP and
\fBpcre2_dfa_match()\fP, to suppress UTF validity checking of the subject \fBpcre2_dfa_match()\fP, to suppress UTF validity checking of the subject
@ -1575,7 +1575,7 @@ such as \ex{d800} you can set the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra
option, as described in the section entitled "Extra compile options" option, as described in the section entitled "Extra compile options"
.\" HTML <a href="#extracompileoptions"> .\" HTML <a href="#extracompileoptions">
.\" </a> .\" </a>
below. below.
.\" .\"
However, this is possible only in UTF-8 and UTF-32 modes, because these values However, this is possible only in UTF-8 and UTF-32 modes, because these values
are not representable in UTF-16. are not representable in UTF-16.
@ -1642,13 +1642,13 @@ calling the \fBpcre2_set_compile_extra_options()\fP function are as follows:
.sp .sp
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
.sp .sp
This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate" forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate"
code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode
code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot
therefore be represented in UTF-16. They can be represented in UTF-8 and therefore be represented in UTF-16. They can be represented in UTF-8 and
UTF-32, but are defined as invalid code points, and cause errors if encountered UTF-32, but are defined as invalid code points, and cause errors if encountered
in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2. in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
.P .P
These values also cause errors if encountered in escape sequences such as These values also cause errors if encountered in escape sequences such as
\ex{d912} within a pattern. However, it seems that some applications, when \ex{d912} within a pattern. However, it seems that some applications, when
@ -1657,9 +1657,9 @@ for the surrogates using escape sequences. The PCRE2_NO_UTF_CHECK option does
not disable the error that occurs, because it applies only to the testing of not disable the error that occurs, because it applies only to the testing of
input strings for UTF validity. input strings for UTF validity.
.P .P
If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
incorporated in the compiled pattern. However, they can only match subject incorporated in the compiled pattern. However, they can only match subject
characters if the matching function is called with PCRE2_NO_UTF_CHECK set. characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
. .
. .
@ -1881,7 +1881,7 @@ The third argument should point to an \fBuint32_t\fP variable.
If the pattern set a backtracking depth limit by including an item of the form If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the should point to an unsigned 32-bit integer. If no such value has been set, the
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
that this limit will only be used during matching if it is less than the limit that this limit will only be used during matching if it is less than the limit
set or defaulted by the caller of the match function. set or defaulted by the caller of the match function.
.sp .sp
@ -2092,7 +2092,7 @@ The output is one of the following \fBuint32_t\fP values:
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF) PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
PCRE2_NEWLINE_ANY Any Unicode line ending PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
PCRE2_NEWLINE_NUL The NUL character (binary zero) PCRE2_NEWLINE_NUL The NUL character (binary zero)
.sp .sp
This identifies the character sequence that will be recognized as meaning This identifies the character sequence that will be recognized as meaning
"newline" while matching. "newline" while matching.
@ -2319,8 +2319,8 @@ instead of one.
.P .P
If a non-zero starting offset is passed when the pattern is anchored, a single If a non-zero starting offset is passed when the pattern is anchored, a single
attempt to match at the given offset is made. This can only succeed if the attempt to match at the given offset is made. This can only succeed if the
pattern does not require the match to be at the start of the subject. In other pattern does not require the match to be at the start of the subject. In other
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA. the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
. .
. .
@ -2509,7 +2509,7 @@ start, it skips both the CR and the LF before retrying. However, the pattern
reference, and so advances only by one character after the first failure. reference, and so advances only by one character after the first failure.
.P .P
An explicit match for CR or LF is either a literal appearance of one of those An explicit match for CR or LF is either a literal appearance of one of those
characters in the pattern, or one of the \er or \en or equivalent octal or characters in the pattern, or one of the \er or \en or equivalent octal or
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
does \es, even though it includes CR and LF in the characters that it matches. does \es, even though it includes CR and LF in the characters that it matches.
.P .P
@ -2769,9 +2769,9 @@ The backtracking match limit was reached.
.sp .sp
PCRE2_ERROR_NOMEMORY PCRE2_ERROR_NOMEMORY
.sp .sp
If a pattern contains many nested backtracking points, heap memory is used to If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default remember them. This error is given when the memory allocation function (default
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
if the amount of memory needed exceeds the heap limit. if the amount of memory needed exceeds the heap limit.
.sp .sp
PCRE2_ERROR_NULL PCRE2_ERROR_NULL
@ -3491,6 +3491,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 26 May 2017 Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi
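
The (*LIMIT_MATCH=ddd) rule in the pcre2api changes above (a pattern can lower the match limit but never raise it) can be sketched as a simple minimum. This is an illustration only: effective_match_limit() is a hypothetical helper, not a PCRE2 function, and it assumes that a pattern without a (*LIMIT_MATCH=ddd) item is represented by UINT32_MAX.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API.
   caller_or_default_limit: the value set with pcre2_set_match_limit(), or
                            the build-time default if the caller set none.
   pattern_limit:           the ddd from (*LIMIT_MATCH=ddd); assumed here to
                            be UINT32_MAX when the pattern sets no limit. */
static uint32_t effective_match_limit(uint32_t caller_or_default_limit,
                                      uint32_t pattern_limit)
{
  /* The pattern's value is used only when it is the smaller of the two. */
  return (pattern_limit < caller_or_default_limit)
    ? pattern_limit : caller_or_default_limit;
}
```

With a caller limit of 10000000 and a pattern that starts with (*LIMIT_MATCH=100), the sketch yields 100; with a caller limit of 100 and (*LIMIT_MATCH=500), it stays at 100.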

View File

@ -1,4 +1,4 @@
.TH PCRE2BUILD 3 "10 April 2017" "PCRE2 10.30" .TH PCRE2BUILD 3 "30 May 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
. .
@ -256,9 +256,9 @@ setting such as
.sp .sp
--with-match-limit=500000 --with-match-limit=500000
.sp .sp
to the \fBconfigure\fP command. This setting has no effect on the to the \fBconfigure\fP command. This setting also applies to the
\fBpcre2_dfa_match()\fP matching function, but it does also limit JIT matching \fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
(though the counting is done differently). counting is done differently).
.P .P
The \fBpcre2_match()\fP function starts out using a 20K vector on the system The \fBpcre2_match()\fP function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking points there stack to record backtracking points. The more nested backtracking points there
@ -572,6 +572,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 10 April 2017 Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi

View File

@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "26 May 2017" "PCRE2 10.30" .TH PCRE2PATTERN 3 "30 May 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS" .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -169,11 +169,11 @@ still recognized for backwards compatibility.
.P .P
The heap limit applies only when the \fBpcre2_match()\fP interpreter is used The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
for matching. It does not apply to JIT or DFA matching. The match limit is used for matching. It does not apply to JIT or DFA matching. The match limit is used
(but in a different way) when JIT is being used, but it is not relevant, and is (but in a different way) when JIT is being used, or when
ignored, when matching with \fBpcre2_dfa_match()\fP. The depth limit is ignored \fBpcre2_dfa_match()\fP is called, to limit computing resource usage by those
by JIT but is relevant for DFA matching, which uses function recursion for matching functions. The depth limit is ignored by JIT but is relevant for DFA
recursions within the pattern. In this case, the depth limit controls the matching, which uses function recursion for recursions within the pattern. In
amount of system stack that is used. this case, the depth limit controls the amount of system stack that is used.
. .
. .
.\" HTML <a name="newlines"></a> .\" HTML <a name="newlines"></a>
@ -3475,6 +3475,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 26 May 2017 Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi
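
The revised pcre2pattern paragraph above says which limit is honoured by which matching function. A minimal sketch of that mapping follows. The enum and function names are hypothetical and exist only for this illustration. The sketch records only whether a limit is taken into account, not how it is counted.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these are PCRE2 identifiers. */
typedef enum {
  SKETCH_MATCHER_INTERPRETER,   /* pcre2_match() without JIT */
  SKETCH_MATCHER_JIT,           /* pcre2_match() using JIT code */
  SKETCH_MATCHER_DFA            /* pcre2_dfa_match() */
} sketch_matcher;

typedef enum {
  SKETCH_LIMIT_HEAP,
  SKETCH_LIMIT_MATCH,
  SKETCH_LIMIT_DEPTH
} sketch_limit;

/* Returns true if, according to the text above, the given limit is taken
   into account by the given matcher after this change. */
static bool sketch_limit_applies(sketch_limit limit, sketch_matcher matcher)
{
  switch (limit) {
    case SKETCH_LIMIT_HEAP:  return matcher == SKETCH_MATCHER_INTERPRETER;
    case SKETCH_LIMIT_MATCH: return true;  /* counted differently by JIT and DFA */
    case SKETCH_LIMIT_DEPTH: return matcher != SKETCH_MATCHER_JIT;
  }
  return false;
}
```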

View File

@ -396,6 +396,7 @@ BOOL utf = FALSE;
BOOL reset_could_continue = FALSE; BOOL reset_could_continue = FALSE;
if (mb->match_call_count++ >= mb->match_limit) return PCRE2_ERROR_MATCHLIMIT;
if (rlevel++ > mb->match_limit_depth) return PCRE2_ERROR_DEPTHLIMIT; if (rlevel++ > mb->match_limit_depth) return PCRE2_ERROR_DEPTHLIMIT;
offsetcount &= (uint32_t)(-2); /* Round down */ offsetcount &= (uint32_t)(-2); /* Round down */
@ -3218,6 +3219,7 @@ if (mcontext == NULL)
{ {
mb->callout = NULL; mb->callout = NULL;
mb->memctl = re->memctl; mb->memctl = re->memctl;
mb->match_limit = PRIV(default_match_context).match_limit;
mb->match_limit_depth = PRIV(default_match_context).depth_limit; mb->match_limit_depth = PRIV(default_match_context).depth_limit;
} }
else else
@ -3231,8 +3233,13 @@ else
mb->callout = mcontext->callout; mb->callout = mcontext->callout;
mb->callout_data = mcontext->callout_data; mb->callout_data = mcontext->callout_data;
mb->memctl = mcontext->memctl; mb->memctl = mcontext->memctl;
mb->match_limit = mcontext->match_limit;
mb->match_limit_depth = mcontext->depth_limit; mb->match_limit_depth = mcontext->depth_limit;
} }
if (mb->match_limit > re->limit_match)
mb->match_limit = re->limit_match;
if (mb->match_limit_depth > re->limit_depth) if (mb->match_limit_depth > re->limit_depth)
mb->match_limit_depth = re->limit_depth; mb->match_limit_depth = re->limit_depth;
@ -3244,6 +3251,7 @@ mb->end_subject = end_subject;
mb->start_offset = start_offset; mb->start_offset = start_offset;
mb->moptions = options; mb->moptions = options;
mb->poptions = re->overall_options; mb->poptions = re->overall_options;
mb->match_call_count = 0;
/* Process the \R and newline settings. */ /* Process the \R and newline settings. */
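
The two checks at the top of the internal function in the hunks above can be modelled in isolation to show why a call counter catches patterns that a depth limit alone misses. The sketch below is a simplified stand-in, not the real matcher: sketch_block, sketch_internal_match(), sketch_run() and the two error values are hypothetical placeholders, and only the two comparison lines mirror the diff.

```c
#include <stdint.h>

/* Placeholder error values for this sketch; the real codes are defined in
   pcre2.h as PCRE2_ERROR_MATCHLIMIT and PCRE2_ERROR_DEPTHLIMIT. */
#define SKETCH_ERROR_MATCHLIMIT (-1)
#define SKETCH_ERROR_DEPTHLIMIT (-2)

/* Cut-down stand-in for the three fields this commit relies on. */
typedef struct sketch_block {
  uint32_t match_limit;        /* maximum number of internal calls */
  uint32_t match_limit_depth;  /* maximum nesting of internal calls */
  uint32_t match_call_count;   /* internal calls made so far */
} sketch_block;

/* Stand-in for the internal matching function. 'nested' is how many further
   nested calls this call makes; the real function does actual matching. */
static int sketch_internal_match(sketch_block *mb, uint32_t rlevel,
                                 uint32_t nested)
{
  if (mb->match_call_count++ >= mb->match_limit) return SKETCH_ERROR_MATCHLIMIT;
  if (rlevel++ > mb->match_limit_depth) return SKETCH_ERROR_DEPTHLIMIT;
  if (nested > 0) return sketch_internal_match(mb, rlevel, nested - 1);
  return 0;
}

/* Makes 'calls' shallow top-level calls in sequence, the way a pattern can
   consume resources without ever nesting deeply. */
static int sketch_run(sketch_block *mb, uint32_t calls)
{
  mb->match_call_count = 0;    /* reset once during setup, as in the diff */
  for (uint32_t i = 0; i < calls; i++) {
    int rc = sketch_internal_match(mb, 0, 0);
    if (rc != 0) return rc;
  }
  return 0;
}
```

With both limits set to 100, sketch_run() succeeds for 100 shallow calls and returns the match-limit error on the 101st, even though the nesting level never rises above zero, so the depth check alone would not have stopped it.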

View File

@ -178,20 +178,20 @@ for (i = 0; i < 2; i++)
return 0; return 0;
} }
(void)pcre2_set_match_limit(match_context, 100); (void)pcre2_set_match_limit(match_context, 100);
(void)pcre2_set_depth_limit(match_context, 100);
(void)pcre2_set_callout(match_context, callout_function, &callout_count); (void)pcre2_set_callout(match_context, callout_function, &callout_count);
} }
/* Match twice, with and without options, with a depth limit of 100. */ /* Match twice, with and without options. */
(void)pcre2_set_depth_limit(match_context, 100);
for (j = 0; j < 2; j++) for (j = 0; j < 2; j++)
{ {
#ifdef STANDALONE #ifdef STANDALONE
printf("Match options %.8x", match_options); printf("Match options %.8x", match_options);
printf("%s%s%s%s%s%s%s%s%s\n", printf("%s%s%s%s%s%s%s%s%s%s\n",
((match_options & PCRE2_ANCHORED) != 0)? ",anchored" : "", ((match_options & PCRE2_ANCHORED) != 0)? ",anchored" : "",
((match_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "", ((match_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "",
((match_options & PCRE2_NO_JIT) != 0)? ",no_jit" : "",
((match_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "", ((match_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "",
((match_options & PCRE2_NOTBOL) != 0)? ",notbol" : "", ((match_options & PCRE2_NOTBOL) != 0)? ",notbol" : "",
((match_options & PCRE2_NOTEMPTY) != 0)? ",notempty" : "", ((match_options & PCRE2_NOTEMPTY) != 0)? ",notempty" : "",
@ -217,9 +217,8 @@ for (i = 0; i < 2; i++)
match_options = 0; /* For second time */ match_options = 0; /* For second time */
} }
/* Match with DFA twice, with and without options, depth limit of 10. */ /* Match with DFA twice, with and without options. */
(void)pcre2_set_depth_limit(match_context, 10);
match_options = save_match_options & ~PCRE2_NO_JIT; /* Not valid for DFA */ match_options = save_match_options & ~PCRE2_NO_JIT; /* Not valid for DFA */
for (j = 0; j < 2; j++) for (j = 0; j < 2; j++)

View File

@ -877,7 +877,9 @@ typedef struct dfa_match_block {
PCRE2_SPTR last_used_ptr; /* Latest consulted character */ PCRE2_SPTR last_used_ptr; /* Latest consulted character */
const uint8_t *tables; /* Character tables */ const uint8_t *tables; /* Character tables */
PCRE2_SIZE start_offset; /* The start offset value */ PCRE2_SIZE start_offset; /* The start offset value */
uint32_t match_limit; /* As it says */
uint32_t match_limit_depth; /* As it says */ uint32_t match_limit_depth; /* As it says */
uint32_t match_call_count; /* Number of calls of internal function */
uint32_t moptions; /* Match options */ uint32_t moptions; /* Match options */
uint32_t poptions; /* Pattern options */ uint32_t poptions; /* Pattern options */
uint32_t nltype; /* Newline type */ uint32_t nltype; /* Newline type */

View File

@@ -7054,17 +7054,15 @@ else for (gmatched = 0;; gmatched++)
 {
 capcount = 0; /* This stops compiler warnings */
-if ((dat_datctl.control & CTL_DFA) == 0)
-  {
-  if (FLD(compiled_code, executable_jit) == NULL ||
-      (dat_datctl.options & PCRE2_NO_JIT) != 0)
-    {
-    (void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT,
-      "heap");
-    }
-  capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT,
-    "match");
-  }
+if ((dat_datctl.control & CTL_DFA) == 0 &&
+    (FLD(compiled_code, executable_jit) == NULL ||
+     (dat_datctl.options & PCRE2_NO_JIT) != 0))
+  {
+  (void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
+  }
+
+capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT,
+  "match");
 if (FLD(compiled_code, executable_jit) == NULL ||
     (dat_datctl.options & PCRE2_NO_JIT) != 0 ||

testdata/testinput6

@@ -4941,4 +4941,7 @@
 /(?<=|abc)/endanchored
 abcde\=aftertext
/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor
.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););
 # End of testinput6

@@ -7691,6 +7691,7 @@ Failed: error -53: matching depth limit exceeded
 /^(a(?2))(b)(?1)/
 abbab\=find_limits
+Minimum match limit = 4
 Minimum depth limit = 2
 0: abbab
@@ -7766,4 +7767,8 @@ No match
 0:
 0+
/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor
.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););
+Failed: error -47: match limit exceeded
 # End of testinput6