Make pcre2_dfa_match() take notice of the match limit, to catch patterns that
use too much resource. This should fix oss-fuzz 1761.
parent a16919ce6f
commit c0902e176f
@@ -173,6 +173,10 @@ one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
 35. A lookbehind assertion that had a zero-length branch caused undefined
 behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.
 
+36. The match limit value now also applies to pcre2_dfa_match() as there are
+patterns that can use up a lot of resources without necessarily recursing very
+deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
+
 
 Version 10.23 14-February-2017
 ------------------------------
@@ -46,8 +46,9 @@ just once (except when processing lookaround assertions). This function is
 <i>wscount</i> Number of elements in the vector
 </pre>
 For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
-up a callout function or specify the recursion depth limit. The <i>length</i>
-and <i>startoffset</i> values are code units, not characters. The options are:
+up a callout function or specify the match and/or the recursion depth limits.
+The <i>length</i> and <i>startoffset</i> values are code units, not characters.
+The options are:
 <pre>
 PCRE2_ANCHORED Match only at the first position
 PCRE2_ENDANCHORED Pattern can match only at end of subject
@@ -329,7 +329,7 @@ document for an overview of all the PCRE2 documentation.
 <b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
 <br>
 <br>
 These functions became obsolete at release 10.30 and are retained only for
 backward compatibility. They should not be used in new code. The first is
 replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and
 has no effect (it always returns zero).
@@ -428,10 +428,10 @@ documentation, and the
 documentation describes how to compile and run it.
 </P>
 <P>
 The compiling and matching functions recognize various options that are passed
 as bits in an options argument. There are also some more complicated parameters
 such as custom memory management functions and resource limits that are passed
 in "contexts" (which are just memory blocks, described below). Simple
 applications do not need to make use of contexts.
 </P>
 <P>
@@ -450,7 +450,7 @@ More complicated programs might need to make use of the specialist functions
 <P>
 JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
 unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
 matching, which gives improved performance at the expense of less sanity
 checking. The JIT-specific functions are discussed in the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation.
@@ -705,7 +705,7 @@ following compile-time parameters:
 The newline character sequence
 The compile time nested parentheses limit
 The maximum length of the pattern string
 The extra options bits (none set by default)
 </pre>
 A compile context is also required if you are using custom memory management.
 If none of these apply, just pass NULL as the context argument of
@@ -757,9 +757,9 @@ in the current locale.
 <br>
 As PCRE2 has developed, almost all the 32 option bits that are available in
 the <i>options</i> argument of <b>pcre2_compile()</b> have been used up. To avoid
 running out, the compile context contains a set of extra option bits which are
 used for some newer, assumed rarer, options. This function sets those bits. It
 always sets all the bits (either on or off). It does not modify any existing
 setting. The available options are defined in the section entitled "Extra
 compile options"
 <a href="#extracompileoptions">below.</a>
@@ -783,8 +783,8 @@ PCRE2_SIZE variable can hold, which is effectively unlimited.
 This specifies which characters or character sequences are to be recognized as
 newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only),
 PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
 sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above),
 PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the
 NUL character, that is a binary zero).
 </P>
 <P>
@@ -837,7 +837,7 @@ A match context is required if you want to:
 <pre>
 Set up a callout function
 Set an offset limit for matching an unanchored pattern
 Change the limit on the amount of heap used when matching
 Change the backtracking match limit
 Change the backtracking depth limit
 Set custom memory management specifically for the match
@@ -908,15 +908,15 @@ In other words, whichever limit comes first is used.
 <b> uint32_t <i>value</i>);</b>
 <br>
 <br>
 The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
 amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
 information when running an interpretive match. This limit does not apply to
 matching with the JIT optimization, which has its own memory control
 arrangements (see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation for more details), nor does it apply to <b>pcre2_dfa_match()</b>.
 If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
 returned. The default limit is set when PCRE2 is built; the default default is
 very large and is essentially "unlimited".
 </P>
 <P>
@@ -932,11 +932,11 @@ limit is set, less than the default.
 <P>
 The <b>pcre2_match()</b> function starts out using a 20K vector on the system
 stack for recording backtracking points. The more nested backtracking points
 there are (that is, the deeper the search tree), the more memory is needed.
 Heap memory is used only if the initial vector is too small. If the heap limit
 is set to a value less than 21 (in particular, zero) no heap memory will be
 used. In this case, only patterns that do not have a lot of nested backtracking
 can be successfully processed.
 <br>
 <br>
 <b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
@@ -954,8 +954,8 @@ time round its main matching loop. If this value reaches the match limit,
 <b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
 the effect of limiting the amount of backtracking that can take place. For
 patterns that are not anchored, the count restarts from zero for each position
-in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
-which ignores it.
+in the subject string. This limit also applies to <b>pcre2_dfa_match()</b>,
+though the counting is done in a different way.
 </P>
 <P>
 When <b>pcre2_match()</b> is called with a pattern that was successfully
@@ -974,8 +974,8 @@ of the form
 (*LIMIT_MATCH=ddd)
 </pre>
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
-less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
-limit is set, less than the default.
+less than the limit set by the caller of <b>pcre2_match()</b> or
+<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
 <br>
 <br>
 <b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
@@ -983,7 +983,7 @@ limit is set, less than the default.
 <br>
 <br>
 This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>.
 Each time a nested backtracking point is passed, a new memory "frame" is used
 to remember the state of matching at that point. Thus, this parameter
 indirectly limits the amount of memory that is used in a match. However,
 because the size of each memory "frame" depends on the number of capturing
@@ -1107,7 +1107,7 @@ sequence that is recognized as meaning "newline". The values are:
 PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
 PCRE2_NEWLINE_ANY Any Unicode line ending
 PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
 PCRE2_NEWLINE_NUL The NUL character (binary zero)
 </pre>
 The default should normally correspond to the standard sequence for your
 operating system.
@@ -1334,7 +1334,7 @@ parenthesis. The name is not processed in any way, and it is not possible to
 include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES
 option is set, normal backslash processing is applied to verb names and only an
 unescaped closing parenthesis terminates the name. A closing parenthesis can be
 included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
 or PCRE2_EXTENDED_MORE option is set, unescaped whitespace in verb names is
 skipped and #-comments are recognized in this mode, exactly as in the rest of
 the pattern.
@@ -1352,12 +1352,12 @@ documentation.
 </pre>
 If this bit is set, letters in the pattern match both upper and lower case
 letters in the subject. It is equivalent to Perl's /i option, and it can be
 changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
 properties are used for all characters with more than one other case, and for
 all characters whose code points are greater than U+007f. For lower valued
 characters with only one other case, a lookup table is used for speed. When
 PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
 and higher code points (available only in 16-bit or 32-bit mode) are treated as
 not having another case.
 <pre>
 PCRE2_DOLLAR_ENDONLY
@@ -1391,11 +1391,11 @@ documentation.
 PCRE2_ENDANCHORED
 </pre>
 If this bit is set, the end of any pattern match must be right at the end of
 the string being searched (the "subject string"). If the pattern match
 succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the
 match fails at the current starting point. For unanchored patterns, a new match
 is then tried at the next starting point. However, if the match succeeds by
 reaching the end of the pattern, but not the end of the subject, backtracking
 occurs and an alternative match may be found. Consider these two patterns:
 <pre>
 .(*ACCEPT)|..
@@ -1407,9 +1407,9 @@ achieved by appropriate constructs in the pattern itself, which is the only way
 to do it in Perl.
 </P>
 <P>
 For DFA matching with <b>pcre2_dfa_match()</b>, PCRE2_ENDANCHORED applies only
 to the first (that is, the longest) matched string. Other parallel matches,
 which are necessarily substrings of the first one, must obviously end before
 the end of the subject.
 <pre>
 PCRE2_EXTENDED
@@ -1584,7 +1584,7 @@ current starting position, which in this case, it does. However, if the same
 match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
 subject string does not happen. The first match attempt is run starting from
 "D" and when this fails, (*COMMIT) prevents any further matches being tried, so
 the overall result is "no match".
 </P>
 <P>
 There are also other start-up optimizations. For example, a minimum length for
@@ -1610,13 +1610,13 @@ and
 in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
 document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a
 negative error code.
 </P>
 <P>
 If you know that your pattern is a valid UTF string, and you want to skip this
 check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When
 it is set, the effect of passing an invalid UTF string as a pattern is
 undefined. It may cause your program to crash or loop.
 </P>
 <P>
 Note that this option can also be passed to <b>pcre2_match()</b> and
@@ -1685,13 +1685,13 @@ calling the <b>pcre2_set_compile_extra_options()</b> function are as follows:
 <pre>
 PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 </pre>
 This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
 forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate"
 code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode
 code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot
 therefore be represented in UTF-16. They can be represented in UTF-8 and
 UTF-32, but are defined as invalid code points, and cause errors if encountered
 in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
 </P>
 <P>
 These values also cause errors if encountered in escape sequences such as
@@ -1702,9 +1702,9 @@ not disable the error that occurs, because it applies only to the testing of
 input strings for UTF validity.
 </P>
 <P>
 If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
 point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
 incorporated in the compiled pattern. However, they can only match subject
 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
 </P>
 <br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
@@ -1914,7 +1914,7 @@ The third argument should point to an <b>uint32_t</b> variable.
 If the pattern set a backtracking depth limit by including an item of the form
 (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
 should point to an unsigned 32-bit integer. If no such value has been set, the
 call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET. Note
 that this limit will only be used during matching if it is less than the limit
 set or defaulted by the caller of the match function.
 <pre>
@@ -2123,7 +2123,7 @@ The output is one of the following <b>uint32_t</b> values:
 PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
 PCRE2_NEWLINE_ANY Any Unicode line ending
 PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
 PCRE2_NEWLINE_NUL The NUL character (binary zero)
 </pre>
 This identifies the character sequence that will be recognized as meaning
 "newline" while matching.
@@ -2334,8 +2334,8 @@ instead of one.
 <P>
 If a non-zero starting offset is passed when the pattern is anchored, a single
 attempt to match at the given offset is made. This can only succeed if the
 pattern does not require the match to be at the start of the subject. In other
 words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
 the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A.
 <a name="matchoptions"></a></P>
 <br><b>
@@ -2508,7 +2508,7 @@ reference, and so advances only by one character after the first failure.
 </P>
 <P>
 An explicit match for CR of LF is either a literal appearance of one of those
 characters in the pattern, or one of the \r or \n or equivalent octal or
 hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
 does \s, even though it includes CR and LF in the characters that it matches.
 </P>
@@ -2751,9 +2751,9 @@ The backtracking match limit was reached.
 <pre>
 PCRE2_ERROR_NOMEMORY
 </pre>
 If a pattern contains many nested backtracking points, heap memory is used to
 remember them. This error is given when the memory allocation function (default
 or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
 if the amount of memory needed exceeds the heap limit.
 <pre>
 PCRE2_ERROR_NULL
@@ -3471,7 +3471,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 May 2017
+Last updated: 30 May 2017
 <br>
 Copyright © 1997-2017 University of Cambridge.
 <br>
@@ -260,9 +260,9 @@ setting such as
 <pre>
 --with-match-limit=500000
 </pre>
-to the <b>configure</b> command. This setting has no effect on the
-<b>pcre2_dfa_match()</b> matching function, but it does also limit JIT matching
-(though the counting is done differently).
+to the <b>configure</b> command. This setting also applies to the
+<b>pcre2_dfa_match()</b> matching function, and to JIT matching (though the
+counting is done differently).
 </P>
 <P>
 The <b>pcre2_match()</b> function starts out using a 20K vector on the system
@@ -554,7 +554,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 10 April 2017
+Last updated: 30 May 2017
 <br>
 Copyright © 1997-2017 University of Cambridge.
 <br>
@ -204,11 +204,11 @@ still recognized for backwards compatibility.
|
||||||
<P>
|
<P>
|
||||||
The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
|
The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
|
||||||
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
||||||
(but in a different way) when JIT is being used, but it is not relevant, and is
|
(but in a different way) when JIT is being used, or when
|
||||||
ignored, when matching with <b>pcre2_dfa_match()</b>. The depth limit is ignored
|
<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
|
||||||
by JIT but is relevant for DFA matching, which uses function recursion for
|
matching functions. The depth limit is ignored by JIT but is relevant for DFA
|
||||||
recursions within the pattern. In this case, the depth limit controls the
|
matching, which uses function recursion for recursions within the pattern. In
|
||||||
amount of system stack that is used.
|
this case, the depth limit controls the amount of system stack that is used.
|
||||||
<a name="newlines"></a></P>
|
<a name="newlines"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Newline conventions
|
Newline conventions
|
||||||
|
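Review note: the hunk above changes which matchers honour which limit. The C sketch below restates that as a lookup table. The names `limit_kind`, `matcher` and `limit_applies` are invented for illustration and are not part of the PCRE2 API. The table records only what the revised paragraph and the pcre2_set_depth_limit() description say, and the interpreter/depth entry relies on the latter.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of this is PCRE2 API. */
enum limit_kind { LIMIT_HEAP, LIMIT_MATCH, LIMIT_DEPTH };
enum matcher { MATCHER_INTERPRETER, MATCHER_JIT, MATCHER_DFA };

/* Returns true if, after this commit, the given limit is documented as
   being taken into account by the given matching method. For JIT and DFA
   the match limit is counted differently from the interpreter. */
static bool limit_applies(enum limit_kind limit, enum matcher m)
{
  static const bool table[3][3] = {
    /*            interpreter  JIT    DFA   */
    /* heap  */ { true,        false, false },
    /* match */ { true,        true,  true  },
    /* depth */ { true,        false, true  },
  };
  return table[limit][m];
}
```

The only entry that differs from the old wording is match limit against DFA, which was previously documented as ignored.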
@ -3445,7 +3445,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 26 May 2017
|
Last updated: 30 May 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
2309
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_DFA_MATCH 3 "04 April 2017" "PCRE2 10.30"
|
.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -34,8 +34,9 @@ just once (except when processing lookaround assertions). This function is
|
||||||
\fIwscount\fP Number of elements in the vector
|
\fIwscount\fP Number of elements in the vector
|
||||||
.sp
|
.sp
|
||||||
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
|
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
|
||||||
up a callout function or specify the recursion depth limit. The \fIlength\fP
|
up a callout function or specify the match and/or the recursion depth limits.
|
||||||
and \fIstartoffset\fP values are code units, not characters. The options are:
|
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
|
||||||
|
The options are:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
|
|
122
doc/pcre2api.3
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "26 May 2017" "PCRE2 10.30"
|
.TH PCRE2API 3 "30 May 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -266,7 +266,7 @@ document for an overview of all the PCRE2 documentation.
|
||||||
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
|
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
These functions became obsolete at release 10.30 and are retained only for
|
These functions became obsolete at release 10.30 and are retained only for
|
||||||
backward compatibility. They should not be used in new code. The first is
|
backward compatibility. They should not be used in new code. The first is
|
||||||
replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
|
replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
|
||||||
has no effect (it always returns zero).
|
has no effect (it always returns zero).
|
||||||
|
@ -365,10 +365,10 @@ documentation, and the
|
||||||
.\"
|
.\"
|
||||||
documentation describes how to compile and run it.
|
documentation describes how to compile and run it.
|
||||||
.P
|
.P
|
||||||
The compiling and matching functions recognize various options that are passed
|
The compiling and matching functions recognize various options that are passed
|
||||||
as bits in an options argument. There are also some more complicated parameters
|
as bits in an options argument. There are also some more complicated parameters
|
||||||
such as custom memory management functions and resource limits that are passed
|
such as custom memory management functions and resource limits that are passed
|
||||||
in "contexts" (which are just memory blocks, described below). Simple
|
in "contexts" (which are just memory blocks, described below). Simple
|
||||||
applications do not need to make use of contexts.
|
applications do not need to make use of contexts.
|
||||||
.P
|
.P
|
||||||
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
|
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
|
||||||
|
@ -384,7 +384,7 @@ More complicated programs might need to make use of the specialist functions
|
||||||
.P
|
.P
|
||||||
JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
|
JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
|
||||||
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
|
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
|
||||||
matching, which gives improved performance at the expense of less sanity
|
matching, which gives improved performance at the expense of less sanity
|
||||||
checking. The JIT-specific functions are discussed in the
|
checking. The JIT-specific functions are discussed in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
|
@ -646,7 +646,7 @@ following compile-time parameters:
|
||||||
The newline character sequence
|
The newline character sequence
|
||||||
The compile time nested parentheses limit
|
The compile time nested parentheses limit
|
||||||
The maximum length of the pattern string
|
The maximum length of the pattern string
|
||||||
The extra options bits (none set by default)
|
The extra options bits (none set by default)
|
||||||
.sp
|
.sp
|
||||||
A compile context is also required if you are using custom memory management.
|
A compile context is also required if you are using custom memory management.
|
||||||
If none of these apply, just pass NULL as the context argument of
|
If none of these apply, just pass NULL as the context argument of
|
||||||
|
@ -695,9 +695,9 @@ in the current locale.
|
||||||
.sp
|
.sp
|
||||||
As PCRE2 has developed, almost all the 32 option bits that are available in
|
As PCRE2 has developed, almost all the 32 option bits that are available in
|
||||||
the \fIoptions\fP argument of \fBpcre2_compile()\fP have been used up. To avoid
|
the \fIoptions\fP argument of \fBpcre2_compile()\fP have been used up. To avoid
|
||||||
running out, the compile context contains a set of extra option bits which are
|
running out, the compile context contains a set of extra option bits which are
|
||||||
used for some newer, assumed rarer, options. This function sets those bits. It
|
used for some newer, assumed rarer, options. This function sets those bits. It
|
||||||
always sets all the bits (either on or off). It does not modify any existing
|
always sets all the bits (either on or off). It does not modify any existing
|
||||||
setting. The available options are defined in the section entitled "Extra
|
setting. The available options are defined in the section entitled "Extra
|
||||||
compile options"
|
compile options"
|
||||||
.\" HTML <a href="#extracompileoptions">
|
.\" HTML <a href="#extracompileoptions">
|
||||||
|
@ -724,8 +724,8 @@ PCRE2_SIZE variable can hold, which is effectively unlimited.
|
||||||
This specifies which characters or character sequences are to be recognized as
|
This specifies which characters or character sequences are to be recognized as
|
||||||
newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only),
|
newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only),
|
||||||
PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
|
PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
|
||||||
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above),
|
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above),
|
||||||
PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the
|
PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the
|
||||||
NUL character, that is a binary zero).
|
NUL character, that is a binary zero).
|
||||||
.P
|
.P
|
||||||
A pattern can override the value set in the compile context by starting with a
|
A pattern can override the value set in the compile context by starting with a
|
||||||
|
@ -778,7 +778,7 @@ A match context is required if you want to:
|
||||||
.sp
|
.sp
|
||||||
Set up a callout function
|
Set up a callout function
|
||||||
Set an offset limit for matching an unanchored pattern
|
Set an offset limit for matching an unanchored pattern
|
||||||
Change the limit on the amount of heap used when matching
|
Change the limit on the amount of heap used when matching
|
||||||
Change the backtracking match limit
|
Change the backtracking match limit
|
||||||
Change the backtracking depth limit
|
Change the backtracking depth limit
|
||||||
Set custom memory management specifically for the match
|
Set custom memory management specifically for the match
|
||||||
|
@ -846,7 +846,7 @@ In other words, whichever limit comes first is used.
|
||||||
.B " uint32_t \fIvalue\fP);"
|
.B " uint32_t \fIvalue\fP);"
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
|
The \fIheap_limit\fP parameter specifies, in units of kilobytes, the maximum
|
||||||
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
|
amount of heap memory that \fBpcre2_match()\fP may use to hold backtracking
|
||||||
information when running an interpretive match. This limit does not apply to
|
information when running an interpretive match. This limit does not apply to
|
||||||
matching with the JIT optimization, which has its own memory control
|
matching with the JIT optimization, which has its own memory control
|
||||||
|
@ -855,8 +855,8 @@ arrangements (see the
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
.\"
|
.\"
|
||||||
documentation for more details), nor does it apply to \fBpcre2_dfa_match()\fP.
|
documentation for more details), nor does it apply to \fBpcre2_dfa_match()\fP.
|
||||||
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
|
If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
|
||||||
returned. The default limit is set when PCRE2 is built; the default default is
|
returned. The default limit is set when PCRE2 is built; the default default is
|
||||||
very large and is essentially "unlimited".
|
very large and is essentially "unlimited".
|
||||||
.P
|
.P
|
||||||
A value for the heap limit may also be supplied by an item at the start of a
|
A value for the heap limit may also be supplied by an item at the start of a
|
||||||
|
@ -870,11 +870,11 @@ limit is set, less than the default.
|
||||||
.P
|
.P
|
||||||
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
||||||
stack for recording backtracking points. The more nested backtracking points
|
stack for recording backtracking points. The more nested backtracking points
|
||||||
there are (that is, the deeper the search tree), the more memory is needed.
|
there are (that is, the deeper the search tree), the more memory is needed.
|
||||||
Heap memory is used only if the initial vector is too small. If the heap limit
|
Heap memory is used only if the initial vector is too small. If the heap limit
|
||||||
is set to a value less than 21 (in particular, zero) no heap memory will be
|
is set to a value less than 21 (in particular, zero) no heap memory will be
|
||||||
used. In this case, only patterns that do not have a lot of nested backtracking
|
used. In this case, only patterns that do not have a lot of nested backtracking
|
||||||
can be successfully processed.
|
can be successfully processed.
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
||||||
|
@ -891,8 +891,8 @@ time round its main matching loop. If this value reaches the match limit,
|
||||||
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
|
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
|
||||||
the effect of limiting the amount of backtracking that can take place. For
|
the effect of limiting the amount of backtracking that can take place. For
|
||||||
patterns that are not anchored, the count restarts from zero for each position
|
patterns that are not anchored, the count restarts from zero for each position
|
||||||
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
|
in the subject string. This limit also applies to \fBpcre2_dfa_match()\fP,
|
||||||
which ignores it.
|
though the counting is done in a different way.
|
||||||
.P
|
.P
|
||||||
When \fBpcre2_match()\fP is called with a pattern that was successfully
|
When \fBpcre2_match()\fP is called with a pattern that was successfully
|
||||||
processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
|
processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
|
||||||
|
@ -909,8 +909,8 @@ of the form
|
||||||
(*LIMIT_MATCH=ddd)
|
(*LIMIT_MATCH=ddd)
|
||||||
.sp
|
.sp
|
||||||
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||||
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
|
less than the limit set by the caller of \fBpcre2_match()\fP or
|
||||||
limit is set, less than the default.
|
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
|
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
|
||||||
|
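Review note: the rule in the hunk above, that a (*LIMIT_MATCH=ddd) item only ever lowers the limit, can be sketched as a small C function. This is a minimal illustration, not the library's code. `effective_match_limit` and its parameters are hypothetical names, and `caller_limit` stands for whatever the caller of pcre2_match() or pcre2_dfa_match() set, or the build-time default if nothing was set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API.
   caller_limit:       limit set via the match context, or the default.
   pattern_sets_limit: true if the pattern starts with (*LIMIT_MATCH=ddd).
   pattern_limit:      the ddd value; ignored when pattern_sets_limit is false. */
static uint32_t effective_match_limit(uint32_t caller_limit,
                                      bool pattern_sets_limit,
                                      uint32_t pattern_limit)
{
  /* The in-pattern setting is honoured only when it is smaller. */
  if (pattern_sets_limit && pattern_limit < caller_limit)
    return pattern_limit;
  return caller_limit;
}
```

Under this model a pattern can reduce the limit but cannot raise it above the value the application chose.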
@ -918,7 +918,7 @@ limit is set, less than the default.
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
|
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
|
||||||
Each time a nested backtracking point is passed, a new memory "frame" is used
|
Each time a nested backtracking point is passed, a new memory "frame" is used
|
||||||
to remember the state of matching at that point. Thus, this parameter
|
to remember the state of matching at that point. Thus, this parameter
|
||||||
indirectly limits the amount of memory that is used in a match. However,
|
indirectly limits the amount of memory that is used in a match. However,
|
||||||
because the size of each memory "frame" depends on the number of capturing
|
because the size of each memory "frame" depends on the number of capturing
|
||||||
|
@ -1040,7 +1040,7 @@ sequence that is recognized as meaning "newline". The values are:
|
||||||
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
|
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
|
||||||
PCRE2_NEWLINE_ANY Any Unicode line ending
|
PCRE2_NEWLINE_ANY Any Unicode line ending
|
||||||
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
||||||
PCRE2_NEWLINE_NUL The NUL character (binary zero)
|
PCRE2_NEWLINE_NUL The NUL character (binary zero)
|
||||||
.sp
|
.sp
|
||||||
The default should normally correspond to the standard sequence for your
|
The default should normally correspond to the standard sequence for your
|
||||||
operating system.
|
operating system.
|
||||||
|
@ -1270,7 +1270,7 @@ parenthesis. The name is not processed in any way, and it is not possible to
|
||||||
include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES
|
include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES
|
||||||
option is set, normal backslash processing is applied to verb names and only an
|
option is set, normal backslash processing is applied to verb names and only an
|
||||||
unescaped closing parenthesis terminates the name. A closing parenthesis can be
|
unescaped closing parenthesis terminates the name. A closing parenthesis can be
|
||||||
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
|
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
|
||||||
or PCRE2_EXTENDED_MORE option is set, unescaped whitespace in verb names is
|
or PCRE2_EXTENDED_MORE option is set, unescaped whitespace in verb names is
|
||||||
skipped and #-comments are recognized in this mode, exactly as in the rest of
|
skipped and #-comments are recognized in this mode, exactly as in the rest of
|
||||||
the pattern.
|
the pattern.
|
||||||
|
@ -1290,12 +1290,12 @@ documentation.
|
||||||
.sp
|
.sp
|
||||||
If this bit is set, letters in the pattern match both upper and lower case
|
If this bit is set, letters in the pattern match both upper and lower case
|
||||||
letters in the subject. It is equivalent to Perl's /i option, and it can be
|
letters in the subject. It is equivalent to Perl's /i option, and it can be
|
||||||
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
|
changed within a pattern by a (?i) option setting. If PCRE2_UTF is set, Unicode
|
||||||
properties are used for all characters with more than one other case, and for
|
properties are used for all characters with more than one other case, and for
|
||||||
all characters whose code points are greater than U+007f. For lower valued
|
all characters whose code points are greater than U+007f. For lower valued
|
||||||
characters with only one other case, a lookup table is used for speed. When
|
characters with only one other case, a lookup table is used for speed. When
|
||||||
PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
|
PCRE2_UTF is not set, a lookup table is used for all code points less than 256,
|
||||||
and higher code points (available only in 16-bit or 32-bit mode) are treated as
|
and higher code points (available only in 16-bit or 32-bit mode) are treated as
|
||||||
not having another case.
|
not having another case.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_DOLLAR_ENDONLY
|
PCRE2_DOLLAR_ENDONLY
|
||||||
|
@ -1331,11 +1331,11 @@ documentation.
|
||||||
PCRE2_ENDANCHORED
|
PCRE2_ENDANCHORED
|
||||||
.sp
|
.sp
|
||||||
If this bit is set, the end of any pattern match must be right at the end of
|
If this bit is set, the end of any pattern match must be right at the end of
|
||||||
the string being searched (the "subject string"). If the pattern match
|
the string being searched (the "subject string"). If the pattern match
|
||||||
succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the
|
succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the
|
||||||
match fails at the current starting point. For unanchored patterns, a new match
|
match fails at the current starting point. For unanchored patterns, a new match
|
||||||
is then tried at the next starting point. However, if the match succeeds by
|
is then tried at the next starting point. However, if the match succeeds by
|
||||||
reaching the end of the pattern, but not the end of the subject, backtracking
|
reaching the end of the pattern, but not the end of the subject, backtracking
|
||||||
occurs and an alternative match may be found. Consider these two patterns:
|
occurs and an alternative match may be found. Consider these two patterns:
|
||||||
.sp
|
.sp
|
||||||
.(*ACCEPT)|..
|
.(*ACCEPT)|..
|
||||||
|
@ -1346,9 +1346,9 @@ whereas the second matches "bc". The effect of PCRE2_ENDANCHORED can also be
|
||||||
achieved by appropriate constructs in the pattern itself, which is the only way
|
achieved by appropriate constructs in the pattern itself, which is the only way
|
||||||
to do it in Perl.
|
to do it in Perl.
|
||||||
.P
|
.P
|
||||||
For DFA matching with \fBpcre2_dfa_match()\fP, PCRE2_ENDANCHORED applies only
|
For DFA matching with \fBpcre2_dfa_match()\fP, PCRE2_ENDANCHORED applies only
|
||||||
to the first (that is, the longest) matched string. Other parallel matches,
|
to the first (that is, the longest) matched string. Other parallel matches,
|
||||||
which are necessarily substrings of the first one, must obviously end before
|
which are necessarily substrings of the first one, must obviously end before
|
||||||
the end of the subject.
|
the end of the subject.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_EXTENDED
|
PCRE2_EXTENDED
|
||||||
|
@ -1520,7 +1520,7 @@ current starting position, which in this case, it does. However, if the same
|
||||||
match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
|
match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
|
||||||
subject string does not happen. The first match attempt is run starting from
|
subject string does not happen. The first match attempt is run starting from
|
||||||
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
|
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
|
||||||
the overall result is "no match".
|
the overall result is "no match".
|
||||||
.P
|
.P
|
||||||
There are also other start-up optimizations. For example, a minimum length for
|
There are also other start-up optimizations. For example, a minimum length for
|
||||||
the subject may be recorded. Consider the pattern
|
the subject may be recorded. Consider the pattern
|
||||||
|
@ -1556,12 +1556,12 @@ in the
|
||||||
\fBpcre2unicode\fP
|
\fBpcre2unicode\fP
|
||||||
.\"
|
.\"
|
||||||
document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
|
document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
|
||||||
negative error code.
|
negative error code.
|
||||||
.P
|
.P
|
||||||
If you know that your pattern is a valid UTF string, and you want to skip this
|
If you know that your pattern is a valid UTF string, and you want to skip this
|
||||||
check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When
|
check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When
|
||||||
it is set, the effect of passing an invalid UTF string as a pattern is
|
it is set, the effect of passing an invalid UTF string as a pattern is
|
||||||
undefined. It may cause your program to crash or loop.
|
undefined. It may cause your program to crash or loop.
|
||||||
.P
|
.P
|
||||||
Note that this option can also be passed to \fBpcre2_match()\fP and
|
Note that this option can also be passed to \fBpcre2_match()\fP and
|
||||||
\fBpcre2_dfa_match()\fP, to suppress UTF validity checking of the subject
|
\fBpcre2_dfa_match()\fP, to suppress UTF validity checking of the subject
|
||||||
|
@ -1575,7 +1575,7 @@ such as \ex{d800} you can set the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra
|
||||||
option, as described in the section entitled "Extra compile options"
|
option, as described in the section entitled "Extra compile options"
|
||||||
.\" HTML <a href="#extracompileoptions">
|
.\" HTML <a href="#extracompileoptions">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
below.
|
below.
|
||||||
.\"
|
.\"
|
||||||
However, this is possible only in UTF-8 and UTF-32 modes, because these values
|
However, this is possible only in UTF-8 and UTF-32 modes, because these values
|
||||||
are not representable in UTF-16.
|
are not representable in UTF-16.
|
||||||
|
@ -1642,13 +1642,13 @@ calling the \fBpcre2_set_compile_extra_options()\fP function are as follows:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||||
.sp
|
.sp
|
||||||
This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
|
This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
|
||||||
forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate"
|
forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate"
|
||||||
code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode
|
code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode
|
||||||
code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot
|
code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot
|
||||||
therefore be represented in UTF-16. They can be represented in UTF-8 and
|
therefore be represented in UTF-16. They can be represented in UTF-8 and
|
||||||
UTF-32, but are defined as invalid code points, and cause errors if encountered
|
UTF-32, but are defined as invalid code points, and cause errors if encountered
|
||||||
in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
|
in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
|
||||||
.P
|
.P
|
||||||
These values also cause errors if encountered in escape sequences such as
|
These values also cause errors if encountered in escape sequences such as
|
||||||
\ex{d912} within a pattern. However, it seems that some applications, when
|
\ex{d912} within a pattern. However, it seems that some applications, when
|
||||||
|
@ -1657,9 +1657,9 @@ for the surrogates using escape sequences. The PCRE2_NO_UTF_CHECK option does
|
||||||
not disable the error that occurs, because it applies only to the testing of
|
not disable the error that occurs, because it applies only to the testing of
|
||||||
input strings for UTF validity.
|
input strings for UTF validity.
|
||||||
.P
|
.P
|
||||||
If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
|
If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
|
||||||
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
|
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
|
||||||
incorporated in the compiled pattern. However, they can only match subject
|
incorporated in the compiled pattern. However, they can only match subject
|
||||||
characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
|
characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -1881,7 +1881,7 @@ The third argument should point to an \fBuint32_t\fP variable.
|
||||||
If the pattern set a backtracking depth limit by including an item of the form
|
If the pattern set a backtracking depth limit by including an item of the form
|
||||||
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
should point to an unsigned 32-bit integer. If no such value has been set, the
|
||||||
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
|
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note
|
||||||
that this limit will only be used during matching if it is less than the limit
|
that this limit will only be used during matching if it is less than the limit
|
||||||
set or defaulted by the caller of the match function.
|
set or defaulted by the caller of the match function.
|
||||||
.sp
|
.sp
|
||||||
|
@ -2092,7 +2092,7 @@ The output is one of the following \fBuint32_t\fP values:
|
||||||
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
|
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
|
||||||
PCRE2_NEWLINE_ANY Any Unicode line ending
|
PCRE2_NEWLINE_ANY Any Unicode line ending
|
||||||
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
||||||
PCRE2_NEWLINE_NUL The NUL character (binary zero)
|
PCRE2_NEWLINE_NUL The NUL character (binary zero)
|
||||||
.sp
|
.sp
|
||||||
This identifies the character sequence that will be recognized as meaning
|
This identifies the character sequence that will be recognized as meaning
|
||||||
"newline" while matching.
|
"newline" while matching.
|
||||||
|
@ -2319,8 +2319,8 @@ instead of one.
|
||||||
.P
|
.P
|
||||||
If a non-zero starting offset is passed when the pattern is anchored, a single
|
If a non-zero starting offset is passed when the pattern is anchored, a single
|
||||||
attempt to match at the given offset is made. This can only succeed if the
|
attempt to match at the given offset is made. This can only succeed if the
|
||||||
pattern does not require the match to be at the start of the subject. In other
|
pattern does not require the match to be at the start of the subject. In other
|
||||||
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
|
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
|
||||||
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
|
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -2509,7 +2509,7 @@ start, it skips both the CR and the LF before retrying. However, the pattern
|
||||||
reference, and so advances only by one character after the first failure.
|
reference, and so advances only by one character after the first failure.
|
||||||
.P
|
.P
|
||||||
An explicit match for CR or LF is either a literal appearance of one of those
|
An explicit match for CR or LF is either a literal appearance of one of those
|
||||||
characters in the pattern, or one of the \er or \en or equivalent octal or
|
characters in the pattern, or one of the \er or \en or equivalent octal or
|
||||||
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
|
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
|
||||||
does \es, even though it includes CR and LF in the characters that it matches.
|
does \es, even though it includes CR and LF in the characters that it matches.
|
||||||
.P
|
.P
|
||||||
|
@ -2769,9 +2769,9 @@ The backtracking match limit was reached.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_NOMEMORY
|
PCRE2_ERROR_NOMEMORY
|
||||||
.sp
|
.sp
|
||||||
If a pattern contains many nested backtracking points, heap memory is used to
|
If a pattern contains many nested backtracking points, heap memory is used to
|
||||||
remember them. This error is given when the memory allocation function (default
|
remember them. This error is given when the memory allocation function (default
|
||||||
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
|
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
|
||||||
if the amount of memory needed exceeds the heap limit.
|
if the amount of memory needed exceeds the heap limit.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_NULL
|
PCRE2_ERROR_NULL
|
||||||
|
@ -3491,6 +3491,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 26 May 2017
|
Last updated: 30 May 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2BUILD 3 "10 April 2017" "PCRE2 10.30"
|
.TH PCRE2BUILD 3 "30 May 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.
|
.
|
||||||
|
@ -256,9 +256,9 @@ setting such as
|
||||||
.sp
|
.sp
|
||||||
--with-match-limit=500000
|
--with-match-limit=500000
|
||||||
.sp
|
.sp
|
||||||
to the \fBconfigure\fP command. This setting has no effect on the
|
to the \fBconfigure\fP command. This setting also applies to the
|
||||||
\fBpcre2_dfa_match()\fP matching function, but it does also limit JIT matching
|
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
|
||||||
(though the counting is done differently).
|
counting is done differently).
|
||||||
.P
|
.P
|
||||||
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
|
@ -572,6 +572,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 10 April 2017
|
Last updated: 30 May 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "26 May 2017" "PCRE2 10.30"
|
.TH PCRE2PATTERN 3 "30 May 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -169,11 +169,11 @@ still recognized for backwards compatibility.
|
||||||
.P
|
.P
|
||||||
The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
|
The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
|
||||||
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
||||||
(but in a different way) when JIT is being used, but it is not relevant, and is
|
(but in a different way) when JIT is being used, or when
|
||||||
ignored, when matching with \fBpcre2_dfa_match()\fP. The depth limit is ignored
|
\fBpcre2_dfa_match()\fP is called, to limit computing resource usage by those
|
||||||
by JIT but is relevant for DFA matching, which uses function recursion for
|
matching functions. The depth limit is ignored by JIT but is relevant for DFA
|
||||||
recursions within the pattern. In this case, the depth limit controls the
|
matching, which uses function recursion for recursions within the pattern. In
|
||||||
amount of system stack that is used.
|
this case, the depth limit controls the amount of system stack that is used.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="newlines"></a>
|
.\" HTML <a name="newlines"></a>
|
||||||
|
@ -3475,6 +3475,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 26 May 2017
|
Last updated: 30 May 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -396,6 +396,7 @@ BOOL utf = FALSE;
|
||||||
|
|
||||||
BOOL reset_could_continue = FALSE;
|
BOOL reset_could_continue = FALSE;
|
||||||
|
|
||||||
|
if (mb->match_call_count++ >= mb->match_limit) return PCRE2_ERROR_MATCHLIMIT;
|
||||||
if (rlevel++ > mb->match_limit_depth) return PCRE2_ERROR_DEPTHLIMIT;
|
if (rlevel++ > mb->match_limit_depth) return PCRE2_ERROR_DEPTHLIMIT;
|
||||||
offsetcount &= (uint32_t)(-2); /* Round down */
|
offsetcount &= (uint32_t)(-2); /* Round down */
|
||||||
|
|
||||||
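Reviewer note: the line added in this hunk counts every entry to the internal DFA function and gives up once the count reaches the limit, independent of how deep the recursion is. The following is a minimal stand-alone sketch of that counting idea, not the library's code. The names `sketch_block`, `sketch_internal_call`, `sketch_run` and `SKETCH_ERROR_MATCHLIMIT` are hypothetical; the value -47 is taken from the expected test output later in this diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for PCRE2_ERROR_MATCHLIMIT (reported as -47
   in the test output of this commit). */
#define SKETCH_ERROR_MATCHLIMIT (-47)

/* Hypothetical, reduced version of the two fields added to the
   DFA match block by this commit. */
typedef struct {
  uint32_t match_limit;       /* Maximum number of internal calls */
  uint32_t match_call_count;  /* Calls made so far */
} sketch_block;

/* Models one entry to the internal matching function: the counter is
   tested against the limit and then incremented, as in the added line. */
int sketch_internal_call(sketch_block *mb)
{
  if (mb->match_call_count++ >= mb->match_limit)
    return SKETCH_ERROR_MATCHLIMIT;
  return 0;
}

/* Makes `calls` sequential (non-nested) calls, which is the case the
   depth limit alone cannot catch. Returns 0 or the limit error. */
int sketch_run(uint32_t limit, uint32_t calls)
{
  sketch_block mb = { limit, 0 };
  uint32_t i;
  for (i = 0; i < calls; i++)
    {
    int rc = sketch_internal_call(&mb);
    if (rc != 0) return rc;
    }
  return 0;
}
```

With a limit of 100, exactly 100 calls succeed under this sketch and the 101st reports the limit error, because the comparison uses `>=` on the pre-increment value.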
|
@ -3218,6 +3219,7 @@ if (mcontext == NULL)
|
||||||
{
|
{
|
||||||
mb->callout = NULL;
|
mb->callout = NULL;
|
||||||
mb->memctl = re->memctl;
|
mb->memctl = re->memctl;
|
||||||
|
mb->match_limit = PRIV(default_match_context).match_limit;
|
||||||
mb->match_limit_depth = PRIV(default_match_context).depth_limit;
|
mb->match_limit_depth = PRIV(default_match_context).depth_limit;
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
|
@ -3231,8 +3233,13 @@ else
|
||||||
mb->callout = mcontext->callout;
|
mb->callout = mcontext->callout;
|
||||||
mb->callout_data = mcontext->callout_data;
|
mb->callout_data = mcontext->callout_data;
|
||||||
mb->memctl = mcontext->memctl;
|
mb->memctl = mcontext->memctl;
|
||||||
|
mb->match_limit = mcontext->match_limit;
|
||||||
mb->match_limit_depth = mcontext->depth_limit;
|
mb->match_limit_depth = mcontext->depth_limit;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (mb->match_limit > re->limit_match)
|
||||||
|
mb->match_limit = re->limit_match;
|
||||||
|
|
||||||
if (mb->match_limit_depth > re->limit_depth)
|
if (mb->match_limit_depth > re->limit_depth)
|
||||||
mb->match_limit_depth = re->limit_depth;
|
mb->match_limit_depth = re->limit_depth;
|
||||||
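Reviewer note: after copying the limit from the match context (or the default context), this hunk lowers it to the pattern's own limit when that is smaller, mirroring the existing depth-limit handling. A minimal sketch of that rule follows; the function name `sketch_effective_limit` and its parameter names are hypothetical and do not appear in the source.

```c
#include <stdint.h>

/* Returns the limit that would actually be enforced: the value from the
   match context, reduced to the pattern's limit if the pattern asks for
   less. A pattern-level limit can lower the context value but never
   raise it. */
uint32_t sketch_effective_limit(uint32_t context_limit, uint32_t pattern_limit)
{
  return (context_limit > pattern_limit) ? pattern_limit : context_limit;
}
```

For instance, a context limit of 10000000 combined with a pattern limit of 100 gives 100 under this sketch, while a context limit of 50 with the same pattern limit stays at 50.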
|
|
||||||
|
@ -3244,6 +3251,7 @@ mb->end_subject = end_subject;
|
||||||
mb->start_offset = start_offset;
|
mb->start_offset = start_offset;
|
||||||
mb->moptions = options;
|
mb->moptions = options;
|
||||||
mb->poptions = re->overall_options;
|
mb->poptions = re->overall_options;
|
||||||
|
mb->match_call_count = 0;
|
||||||
|
|
||||||
/* Process the \R and newline settings. */
|
/* Process the \R and newline settings. */
|
||||||
|
|
||||||
|
|
|
@ -178,20 +178,20 @@ for (i = 0; i < 2; i++)
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
(void)pcre2_set_match_limit(match_context, 100);
|
(void)pcre2_set_match_limit(match_context, 100);
|
||||||
|
(void)pcre2_set_depth_limit(match_context, 100);
|
||||||
(void)pcre2_set_callout(match_context, callout_function, &callout_count);
|
(void)pcre2_set_callout(match_context, callout_function, &callout_count);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Match twice, with and without options, with a depth limit of 100. */
|
/* Match twice, with and without options. */
|
||||||
|
|
||||||
(void)pcre2_set_depth_limit(match_context, 100);
|
|
||||||
|
|
||||||
for (j = 0; j < 2; j++)
|
for (j = 0; j < 2; j++)
|
||||||
{
|
{
|
||||||
#ifdef STANDALONE
|
#ifdef STANDALONE
|
||||||
printf("Match options %.8x", match_options);
|
printf("Match options %.8x", match_options);
|
||||||
printf("%s%s%s%s%s%s%s%s%s\n",
|
printf("%s%s%s%s%s%s%s%s%s%s\n",
|
||||||
((match_options & PCRE2_ANCHORED) != 0)? ",anchored" : "",
|
((match_options & PCRE2_ANCHORED) != 0)? ",anchored" : "",
|
||||||
((match_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "",
|
((match_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "",
|
||||||
|
((match_options & PCRE2_NO_JIT) != 0)? ",no_jit" : "",
|
||||||
((match_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "",
|
((match_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "",
|
||||||
((match_options & PCRE2_NOTBOL) != 0)? ",notbol" : "",
|
((match_options & PCRE2_NOTBOL) != 0)? ",notbol" : "",
|
||||||
((match_options & PCRE2_NOTEMPTY) != 0)? ",notempty" : "",
|
((match_options & PCRE2_NOTEMPTY) != 0)? ",notempty" : "",
|
||||||
|
@ -217,9 +217,8 @@ for (i = 0; i < 2; i++)
|
||||||
match_options = 0; /* For second time */
|
match_options = 0; /* For second time */
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Match with DFA twice, with and without options, depth limit of 10. */
|
/* Match with DFA twice, with and without options. */
|
||||||
|
|
||||||
(void)pcre2_set_depth_limit(match_context, 10);
|
|
||||||
match_options = save_match_options & ~PCRE2_NO_JIT; /* Not valid for DFA */
|
match_options = save_match_options & ~PCRE2_NO_JIT; /* Not valid for DFA */
|
||||||
|
|
||||||
for (j = 0; j < 2; j++)
|
for (j = 0; j < 2; j++)
|
||||||
|
|
|
@ -877,7 +877,9 @@ typedef struct dfa_match_block {
|
||||||
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
|
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
|
||||||
const uint8_t *tables; /* Character tables */
|
const uint8_t *tables; /* Character tables */
|
||||||
PCRE2_SIZE start_offset; /* The start offset value */
|
PCRE2_SIZE start_offset; /* The start offset value */
|
||||||
|
uint32_t match_limit; /* As it says */
|
||||||
uint32_t match_limit_depth; /* As it says */
|
uint32_t match_limit_depth; /* As it says */
|
||||||
|
uint32_t match_call_count; /* Number of calls of internal function */
|
||||||
uint32_t moptions; /* Match options */
|
uint32_t moptions; /* Match options */
|
||||||
uint32_t poptions; /* Pattern options */
|
uint32_t poptions; /* Pattern options */
|
||||||
uint32_t nltype; /* Newline type */
|
uint32_t nltype; /* Newline type */
|
||||||
|
|
|
@ -7054,17 +7054,15 @@ else for (gmatched = 0;; gmatched++)
|
||||||
{
|
{
|
||||||
capcount = 0; /* This stops compiler warnings */
|
capcount = 0; /* This stops compiler warnings */
|
||||||
|
|
||||||
if ((dat_datctl.control & CTL_DFA) == 0)
|
if ((dat_datctl.control & CTL_DFA) == 0 &&
|
||||||
{
|
(FLD(compiled_code, executable_jit) == NULL ||
|
||||||
if (FLD(compiled_code, executable_jit) == NULL ||
|
(dat_datctl.options & PCRE2_NO_JIT) != 0))
|
||||||
(dat_datctl.options & PCRE2_NO_JIT) != 0)
|
{
|
||||||
{
|
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
|
||||||
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT,
|
}
|
||||||
"heap");
|
|
||||||
}
|
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT,
|
||||||
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT,
|
"match");
|
||||||
"match");
|
|
||||||
}
|
|
||||||
|
|
||||||
if (FLD(compiled_code, executable_jit) == NULL ||
|
if (FLD(compiled_code, executable_jit) == NULL ||
|
||||||
(dat_datctl.options & PCRE2_NO_JIT) != 0 ||
|
(dat_datctl.options & PCRE2_NO_JIT) != 0 ||
|
||||||
|
|
|
@ -4941,4 +4941,7 @@
|
||||||
/(?<=|abc)/endanchored
|
/(?<=|abc)/endanchored
|
||||||
abcde\=aftertext
|
abcde\=aftertext
|
||||||
|
|
||||||
|
/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor
|
||||||
|
.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););
|
||||||
|
|
||||||
# End of testinput6
|
# End of testinput6
|
||||||
|
|
|
@ -7691,6 +7691,7 @@ Failed: error -53: matching depth limit exceeded
|
||||||
|
|
||||||
/^(a(?2))(b)(?1)/
|
/^(a(?2))(b)(?1)/
|
||||||
abbab\=find_limits
|
abbab\=find_limits
|
||||||
|
Minimum match limit = 4
|
||||||
Minimum depth limit = 2
|
Minimum depth limit = 2
|
||||||
0: abbab
|
0: abbab
|
||||||
|
|
||||||
|
@ -7766,4 +7767,8 @@ No match
|
||||||
0:
|
0:
|
||||||
0+
|
0+
|
||||||
|
|
||||||
|
/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor
|
||||||
|
.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););
|
||||||
|
Failed: error -47: match limit exceeded
|
||||||
|
|
||||||
# End of testinput6
|
# End of testinput6
|
||||||
|
|