Make pcre2_dfa_match() take notice of the match limit, to catch patterns that
use too much resource. This should fix oss-fuzz 1761.
parent a16919ce6f
commit c0902e176f
@ -173,6 +173,10 @@ one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
|||
35. A lookbehind assertion that had a zero-length branch caused undefined
|
||||
behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.
|
||||
|
||||
36. The match limit value now also applies to pcre2_dfa_match() as there are
|
||||
patterns that can use up a lot of resources without necessarily recursing very
|
||||
deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
|
||||
|
||||
|
||||
Version 10.23 14-February-2017
|
||||
------------------------------
|
||||
|
|
|
@ -46,8 +46,9 @@ just once (except when processing lookaround assertions). This function is
|
|||
<i>wscount</i> Number of elements in the vector
|
||||
</pre>
|
||||
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
|
||||
up a callout function or specify the recursion depth limit. The <i>length</i>
|
||||
and <i>startoffset</i> values are code units, not characters. The options are:
|
||||
up a callout function or specify the match and/or the recursion depth limits.
|
||||
The <i>length</i> and <i>startoffset</i> values are code units, not characters.
|
||||
The options are:
|
||||
<pre>
|
||||
PCRE2_ANCHORED Match only at the first position
|
||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||
|
|
|
@ -954,8 +954,8 @@ time round its main matching loop. If this value reaches the match limit,
|
|||
<b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
|
||||
the effect of limiting the amount of backtracking that can take place. For
|
||||
patterns that are not anchored, the count restarts from zero for each position
|
||||
in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
|
||||
which ignores it.
|
||||
in the subject string. This limit also applies to <b>pcre2_dfa_match()</b>,
|
||||
though the counting is done in a different way.
|
||||
</P>
|
||||
<P>
|
||||
When <b>pcre2_match()</b> is called with a pattern that was successfully
|
||||
|
@ -974,8 +974,8 @@ of the form
|
|||
(*LIMIT_MATCH=ddd)
|
||||
</pre>
|
||||
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||
less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
|
||||
limit is set, less than the default.
|
||||
less than the limit set by the caller of <b>pcre2_match()</b> or
|
||||
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
||||
|
@ -3471,7 +3471,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 26 May 2017
|
||||
Last updated: 30 May 2017
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -260,9 +260,9 @@ setting such as
|
|||
<pre>
|
||||
--with-match-limit=500000
|
||||
</pre>
|
||||
to the <b>configure</b> command. This setting has no effect on the
|
||||
<b>pcre2_dfa_match()</b> matching function, but it does also limit JIT matching
|
||||
(though the counting is done differently).
|
||||
to the <b>configure</b> command. This setting also applies to the
|
||||
<b>pcre2_dfa_match()</b> matching function, and to JIT matching (though the
|
||||
counting is done differently).
|
||||
</P>
|
||||
<P>
|
||||
The <b>pcre2_match()</b> function starts out using a 20K vector on the system
|
||||
|
@ -554,7 +554,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 10 April 2017
|
||||
Last updated: 30 May 2017
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -204,11 +204,11 @@ still recognized for backwards compatibility.
|
|||
<P>
|
||||
The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
|
||||
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
||||
(but in a different way) when JIT is being used, but it is not relevant, and is
|
||||
ignored, when matching with <b>pcre2_dfa_match()</b>. The depth limit is ignored
|
||||
by JIT but is relevant for DFA matching, which uses function recursion for
|
||||
recursions within the pattern. In this case, the depth limit controls the
|
||||
amount of system stack that is used.
|
||||
(but in a different way) when JIT is being used, or when
|
||||
<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
|
||||
matching functions. The depth limit is ignored by JIT but is relevant for DFA
|
||||
matching, which uses function recursion for recursions within the pattern. In
|
||||
this case, the depth limit controls the amount of system stack that is used.
|
||||
<a name="newlines"></a></P>
|
||||
<br><b>
|
||||
Newline conventions
|
||||
|
@ -3445,7 +3445,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 26 May 2017
|
||||
Last updated: 30 May 2017
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
<br>
|
||||
|
|
2309
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2_DFA_MATCH 3 "04 April 2017" "PCRE2 10.30"
|
||||
.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -34,8 +34,9 @@ just once (except when processing lookaround assertions). This function is
|
|||
\fIwscount\fP Number of elements in the vector
|
||||
.sp
|
||||
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
|
||||
up a callout function or specify the recursion depth limit. The \fIlength\fP
|
||||
and \fIstartoffset\fP values are code units, not characters. The options are:
|
||||
up a callout function or specify the match and/or the recursion depth limits.
|
||||
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
|
||||
The options are:
|
||||
.sp
|
||||
PCRE2_ANCHORED Match only at the first position
|
||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "26 May 2017" "PCRE2 10.30"
|
||||
.TH PCRE2API 3 "30 May 2017" "PCRE2 10.30"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -891,8 +891,8 @@ time round its main matching loop. If this value reaches the match limit,
|
|||
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
|
||||
the effect of limiting the amount of backtracking that can take place. For
|
||||
patterns that are not anchored, the count restarts from zero for each position
|
||||
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
|
||||
which ignores it.
|
||||
in the subject string. This limit also applies to \fBpcre2_dfa_match()\fP,
|
||||
though the counting is done in a different way.
|
||||
.P
|
||||
When \fBpcre2_match()\fP is called with a pattern that was successfully
|
||||
processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
|
||||
|
@ -909,8 +909,8 @@ of the form
|
|||
(*LIMIT_MATCH=ddd)
|
||||
.sp
|
||||
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
|
||||
limit is set, less than the default.
|
||||
less than the limit set by the caller of \fBpcre2_match()\fP or
|
||||
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
|
||||
|
@ -3491,6 +3491,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 26 May 2017
|
||||
Last updated: 30 May 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2BUILD 3 "10 April 2017" "PCRE2 10.30"
|
||||
.TH PCRE2BUILD 3 "30 May 2017" "PCRE2 10.30"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.
|
||||
|
@ -256,9 +256,9 @@ setting such as
|
|||
.sp
|
||||
--with-match-limit=500000
|
||||
.sp
|
||||
to the \fBconfigure\fP command. This setting has no effect on the
|
||||
\fBpcre2_dfa_match()\fP matching function, but it does also limit JIT matching
|
||||
(though the counting is done differently).
|
||||
to the \fBconfigure\fP command. This setting also applies to the
|
||||
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
|
||||
counting is done differently).
|
||||
.P
|
||||
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
||||
stack to record backtracking points. The more nested backtracking points there
|
||||
|
@ -572,6 +572,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 10 April 2017
|
||||
Last updated: 30 May 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PATTERN 3 "26 May 2017" "PCRE2 10.30"
|
||||
.TH PCRE2PATTERN 3 "30 May 2017" "PCRE2 10.30"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||
|
@ -169,11 +169,11 @@ still recognized for backwards compatibility.
|
|||
.P
|
||||
The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
|
||||
for matching. It does not apply to JIT or DFA matching. The match limit is used
|
||||
(but in a different way) when JIT is being used, but it is not relevant, and is
|
||||
ignored, when matching with \fBpcre2_dfa_match()\fP. The depth limit is ignored
|
||||
by JIT but is relevant for DFA matching, which uses function recursion for
|
||||
recursions within the pattern. In this case, the depth limit controls the
|
||||
amount of system stack that is used.
|
||||
(but in a different way) when JIT is being used, or when
|
||||
\fBpcre2_dfa_match()\fP is called, to limit computing resource usage by those
|
||||
matching functions. The depth limit is ignored by JIT but is relevant for DFA
|
||||
matching, which uses function recursion for recursions within the pattern. In
|
||||
this case, the depth limit controls the amount of system stack that is used.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="newlines"></a>
|
||||
|
@ -3475,6 +3475,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 26 May 2017
|
||||
Last updated: 30 May 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -396,6 +396,7 @@ BOOL utf = FALSE;
|
|||
|
||||
BOOL reset_could_continue = FALSE;
|
||||
|
||||
if (mb->match_call_count++ >= mb->match_limit) return PCRE2_ERROR_MATCHLIMIT;
|
||||
if (rlevel++ > mb->match_limit_depth) return PCRE2_ERROR_DEPTHLIMIT;
|
||||
offsetcount &= (uint32_t)(-2); /* Round down */
|
||||
|
||||
|
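Note on the hunk above: the two checks at the top of the internal DFA function can be modelled in isolation. The following is a minimal sketch, not the PCRE2 source. The struct, the function, and the `SKETCH_` names are hypothetical; only the three field names and the two error numbers (-47 and -53, as printed in the test output of this commit) come from the diff.

```c
#include <stdint.h>

/* Stand-in error codes. The values match the messages in this commit's test
   output ("error -47: match limit exceeded", "error -53: matching depth
   limit exceeded"); the names are local to this sketch. */
#define SKETCH_ERROR_MATCHLIMIT (-47)
#define SKETCH_ERROR_DEPTHLIMIT (-53)

/* Hypothetical cut-down match block holding only the three new fields. */
typedef struct {
  uint32_t match_limit;        /* maximum number of internal calls */
  uint32_t match_limit_depth;  /* maximum nesting level */
  uint32_t match_call_count;   /* internal calls made so far */
} sketch_block;

/* Hypothetical recursive worker. Every entry counts once against the match
   limit, and the nesting level is checked against the depth limit. It
   recurses `remaining` more times to imitate nested sub-matches. */
static int sketch_internal_match(sketch_block *mb, uint32_t rlevel,
                                 uint32_t remaining)
{
  if (mb->match_call_count++ >= mb->match_limit) return SKETCH_ERROR_MATCHLIMIT;
  if (rlevel++ > mb->match_limit_depth) return SKETCH_ERROR_DEPTHLIMIT;
  if (remaining == 0) return 0;  /* nothing left to do: report success */
  return sketch_internal_match(mb, rlevel, remaining - 1);
}
```

In this model the counter is shared across all calls through the block, so a pattern that makes many shallow calls can hit the match limit without ever approaching the depth limit, which is the situation the commit message describes.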
@ -3218,6 +3219,7 @@ if (mcontext == NULL)
|
|||
{
|
||||
mb->callout = NULL;
|
||||
mb->memctl = re->memctl;
|
||||
mb->match_limit = PRIV(default_match_context).match_limit;
|
||||
mb->match_limit_depth = PRIV(default_match_context).depth_limit;
|
||||
}
|
||||
else
|
||||
|
@ -3231,8 +3233,13 @@ else
|
|||
mb->callout = mcontext->callout;
|
||||
mb->callout_data = mcontext->callout_data;
|
||||
mb->memctl = mcontext->memctl;
|
||||
mb->match_limit = mcontext->match_limit;
|
||||
mb->match_limit_depth = mcontext->depth_limit;
|
||||
}
|
||||
|
||||
if (mb->match_limit > re->limit_match)
|
||||
mb->match_limit = re->limit_match;
|
||||
|
||||
if (mb->match_limit_depth > re->limit_depth)
|
||||
mb->match_limit_depth = re->limit_depth;
|
||||
|
||||
|
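Note on the hunk above: the two `if` statements mean that a limit stored in the compiled pattern, for example from `(*LIMIT_MATCH=ddd)`, can lower the caller's limit but never raise it. A minimal sketch of that rule follows. The helper name is hypothetical, and it assumes that the pattern's stored limit is a very large value when the pattern sets none.

```c
#include <stdint.h>

/* Hypothetical helper: the limit in force is the smaller of the caller's
   (or default) limit and the limit recorded in the compiled pattern. */
static uint32_t sketch_effective_limit(uint32_t caller_limit,
                                       uint32_t pattern_limit)
{
  return (caller_limit > pattern_limit) ? pattern_limit : caller_limit;
}
```

With a caller limit of 1000 and a pattern limit of 100 the result is 100; with the values swapped it is still 100, because the pattern cannot raise the caller's limit.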
@ -3244,6 +3251,7 @@ mb->end_subject = end_subject;
|
|||
mb->start_offset = start_offset;
|
||||
mb->moptions = options;
|
||||
mb->poptions = re->overall_options;
|
||||
mb->match_call_count = 0;
|
||||
|
||||
/* Process the \R and newline settings. */
|
||||
|
||||
|
|
|
@ -178,20 +178,20 @@ for (i = 0; i < 2; i++)
|
|||
return 0;
|
||||
}
|
||||
(void)pcre2_set_match_limit(match_context, 100);
|
||||
(void)pcre2_set_depth_limit(match_context, 100);
|
||||
(void)pcre2_set_callout(match_context, callout_function, &callout_count);
|
||||
}
|
||||
|
||||
/* Match twice, with and without options, with a depth limit of 100. */
|
||||
|
||||
(void)pcre2_set_depth_limit(match_context, 100);
|
||||
/* Match twice, with and without options. */
|
||||
|
||||
for (j = 0; j < 2; j++)
|
||||
{
|
||||
#ifdef STANDALONE
|
||||
printf("Match options %.8x", match_options);
|
||||
printf("%s%s%s%s%s%s%s%s%s\n",
|
||||
printf("%s%s%s%s%s%s%s%s%s%s\n",
|
||||
((match_options & PCRE2_ANCHORED) != 0)? ",anchored" : "",
|
||||
((match_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "",
|
||||
((match_options & PCRE2_NO_JIT) != 0)? ",no_jit" : "",
|
||||
((match_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "",
|
||||
((match_options & PCRE2_NOTBOL) != 0)? ",notbol" : "",
|
||||
((match_options & PCRE2_NOTEMPTY) != 0)? ",notempty" : "",
|
||||
|
@ -217,9 +217,8 @@ for (i = 0; i < 2; i++)
|
|||
match_options = 0; /* For second time */
|
||||
}
|
||||
|
||||
/* Match with DFA twice, with and without options, depth limit of 10. */
|
||||
/* Match with DFA twice, with and without options. */
|
||||
|
||||
(void)pcre2_set_depth_limit(match_context, 10);
|
||||
match_options = save_match_options & ~PCRE2_NO_JIT; /* Not valid for DFA */
|
||||
|
||||
for (j = 0; j < 2; j++)
|
||||
|
|
|
@ -877,7 +877,9 @@ typedef struct dfa_match_block {
|
|||
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
|
||||
const uint8_t *tables; /* Character tables */
|
||||
PCRE2_SIZE start_offset; /* The start offset value */
|
||||
uint32_t match_limit; /* As it says */
|
||||
uint32_t match_limit_depth; /* As it says */
|
||||
uint32_t match_call_count; /* Number of calls of internal function */
|
||||
uint32_t moptions; /* Match options */
|
||||
uint32_t poptions; /* Pattern options */
|
||||
uint32_t nltype; /* Newline type */
|
||||
|
|
|
@ -7054,18 +7054,16 @@ else for (gmatched = 0;; gmatched++)
|
|||
{
|
||||
capcount = 0; /* This stops compiler warnings */
|
||||
|
||||
if ((dat_datctl.control & CTL_DFA) == 0)
|
||||
if ((dat_datctl.control & CTL_DFA) == 0 &&
|
||||
(FLD(compiled_code, executable_jit) == NULL ||
|
||||
(dat_datctl.options & PCRE2_NO_JIT) != 0))
|
||||
{
|
||||
if (FLD(compiled_code, executable_jit) == NULL ||
|
||||
(dat_datctl.options & PCRE2_NO_JIT) != 0)
|
||||
{
|
||||
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT,
|
||||
"heap");
|
||||
}
|
||||
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT,
|
||||
"match");
|
||||
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
|
||||
}
|
||||
|
||||
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT,
|
||||
"match");
|
||||
|
||||
if (FLD(compiled_code, executable_jit) == NULL ||
|
||||
(dat_datctl.options & PCRE2_NO_JIT) != 0 ||
|
||||
(dat_datctl.control & CTL_DFA) != 0)
|
||||
|
|
|
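Note on the hunk above: with this change the `find_limits` modifier also reports a minimum match limit for DFA matching (see the `Minimum match limit = 4` line added to the test output). One way such a minimum can be found is a doubling search followed by bisection. The sketch below illustrates that idea only; it is not the code of `check_match_limit()` in pcre2test. All names in it are hypothetical, and it assumes that a limit of 0 always fails and that success is monotone in the limit.

```c
#include <stdint.h>

/* Hypothetical predicate type: returns nonzero if a match attempt run with
   this limit finishes without hitting the limit error. */
typedef int (*sketch_succeeds_fn)(uint32_t limit, void *data);

/* Example predicate for experimentation: succeeds once the limit reaches
   the threshold that `data` points to. */
static int sketch_needs_at_least(uint32_t limit, void *data)
{
  return limit >= *(const uint32_t *)data;
}

/* Find the smallest limit for which ok() succeeds. */
static uint32_t sketch_find_min_limit(sketch_succeeds_fn ok, void *data)
{
  uint32_t lo = 0;  /* largest limit known (or assumed) to be too small */
  uint32_t hi = 1;  /* current candidate */

  /* Phase 1: double the candidate until it succeeds. If nothing below
     UINT32_MAX succeeds, stop there without testing it. */
  while (!ok(hi, data))
    {
    lo = hi;
    if (hi > UINT32_MAX / 2) { hi = UINT32_MAX; break; }
    hi *= 2;
    }

  /* Phase 2: bisect between the failing lo and the succeeding hi. */
  while (hi - lo > 1)
    {
    uint32_t mid = lo + (hi - lo) / 2;
    if (ok(mid, data)) hi = mid; else lo = mid;
    }

  return hi;
}
```

In a real harness the predicate would run the match with the candidate limit set on the match context and report whether the limit error came back.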
@ -4941,4 +4941,7 @@
|
|||
/(?<=|abc)/endanchored
|
||||
abcde\=aftertext
|
||||
|
||||
/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor
|
||||
.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););
|
||||
|
||||
# End of testinput6
|
||||
|
|
|
@ -7691,6 +7691,7 @@ Failed: error -53: matching depth limit exceeded
|
|||
|
||||
/^(a(?2))(b)(?1)/
|
||||
abbab\=find_limits
|
||||
Minimum match limit = 4
|
||||
Minimum depth limit = 2
|
||||
0: abbab
|
||||
|
||||
|
@ -7766,4 +7767,8 @@ No match
|
|||
0:
|
||||
0+
|
||||
|
||||
/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor
|
||||
.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););
|
||||
Failed: error -47: match limit exceeded
|
||||
|
||||
# End of testinput6
|
||||
|
|