Make pcre2_dfa_match() take notice of the match limit, to catch patterns that
use too much resource. This should fix oss-fuzz 1761.
Philip.Hazel 2017-05-30 10:42:57 +00:00
parent a16919ce6f
commit c0902e176f
16 changed files with 1340 additions and 1318 deletions

@@ -173,6 +173,10 @@ one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
35. A lookbehind assertion that had a zero-length branch caused undefined
behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.
36. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
Version 10.23 14-February-2017
------------------------------
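Item 36 argues that a pattern can use a lot of resources without recursing very deeply. A toy model makes the point concrete. The code below is hypothetical and not taken from PCRE2: it explores a search tree `fanout` branches wide and `depth` levels deep, charges every internal call against `match_limit`, and checks the recursion level against `depth_limit`. This loosely mirrors the two checks at the top of the internal DFA function in this commit. The error values -47 and -53 are taken from the pcre2test output in this commit, and all type and function names are invented for illustration.

```c
#include <stdint.h>

/* Values borrowed from the pcre2test output in this commit. */
#define TOY_ERROR_MATCHLIMIT (-47)
#define TOY_ERROR_DEPTHLIMIT (-53)

typedef struct {
  uint32_t match_limit;       /* maximum number of internal calls */
  uint32_t depth_limit;       /* maximum recursion level */
  uint32_t match_call_count;  /* internal calls made so far */
} toy_block;

/* Explore every node of the tree. Each call is charged against the call
   limit first and then checked against the depth limit. Returns 0 once the
   whole tree has been explored, or a negative limit error. */
static int toy_search(toy_block *mb, uint32_t rlevel, uint32_t depth,
                      uint32_t fanout)
{
  if (mb->match_call_count++ >= mb->match_limit) return TOY_ERROR_MATCHLIMIT;
  if (rlevel > mb->depth_limit) return TOY_ERROR_DEPTHLIMIT;
  if (depth == 0) return 0;                 /* leaf: nothing more to try */
  for (uint32_t i = 0; i < fanout; i++)
    {
    int rc = toy_search(mb, rlevel + 1, depth - 1, fanout);
    if (rc < 0) return rc;                  /* propagate a limit error */
    }
  return 0;
}

int toy_run(uint32_t match_limit, uint32_t depth_limit,
            uint32_t depth, uint32_t fanout)
{
  toy_block mb = { match_limit, depth_limit, 0 };
  return toy_search(&mb, 0, depth, fanout);
}
```

With `depth` 3 and `fanout` 10 the toy makes 1 + 10 + 100 + 1000 = 1111 calls but never recurses more than three levels. As a result, `toy_run(100, 5, 3, 10)` fails on the call limit even though a depth limit of 5 is never reached.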

@@ -46,8 +46,9 @@ just once (except when processing lookaround assertions). This function is
<i>wscount</i> Number of elements in the vector
</pre>
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
up a callout function or specify the recursion depth limit. The <i>length</i>
and <i>startoffset</i> values are code units, not characters. The options are:
up a callout function or specify the match and/or the recursion depth limits.
The <i>length</i> and <i>startoffset</i> values are code units, not characters.
The options are:
<pre>
PCRE2_ANCHORED Match only at the first position
PCRE2_ENDANCHORED Pattern can match only at end of subject

@@ -954,8 +954,8 @@ time round its main matching loop. If this value reaches the match limit,
<b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
which ignores it.
in the subject string. This limit also applies to <b>pcre2_dfa_match()</b>,
though the counting is done in a different way.
</P>
<P>
When <b>pcre2_match()</b> is called with a pattern that was successfully
@@ -974,8 +974,8 @@ of the form
(*LIMIT_MATCH=ddd)
</pre>
where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
limit is set, less than the default.
less than the limit set by the caller of <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
<br>
<br>
<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
@@ -3471,7 +3471,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 May 2017
Last updated: 30 May 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>
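The rule stated in the pcre2api hunks for `(*LIMIT_MATCH=ddd)` says the pattern's value is honoured only when it is lower than the caller's limit (or the default). In effect, the smaller of the two values wins. The sketch below shows that rule in isolation. `parse_limit_match()` and `effective_match_limit()` are hypothetical helpers, not PCRE2 functions. The parser examines only a single item at the very start of the pattern and simply ignores a malformed item; it does not attempt to reproduce PCRE2's compile-time error handling.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: if `pattern` starts with (*LIMIT_MATCH=ddd), store
   ddd in *value and return the length of the item; otherwise return 0.
   Values that do not fit in 32 bits are treated as malformed. */
static size_t parse_limit_match(const char *pattern, uint32_t *value)
{
  static const char prefix[] = "(*LIMIT_MATCH=";
  size_t n = sizeof(prefix) - 1;
  if (strncmp(pattern, prefix, n) != 0) return 0;

  const char *p = pattern + n;
  uint64_t v = 0;
  if (*p < '0' || *p > '9') return 0;       /* need at least one digit */
  while (*p >= '0' && *p <= '9')
    {
    v = v * 10 + (uint64_t)(*p - '0');
    if (v > UINT32_MAX) return 0;           /* too large: ignore the item */
    p++;
    }
  if (*p != ')') return 0;
  *value = (uint32_t)v;
  return (size_t)(p + 1 - pattern);
}

/* The pattern's limit is used only if it is lower than the caller's limit
   (or the default, when the caller sets none). */
uint32_t effective_match_limit(const char *pattern, uint32_t caller_limit)
{
  uint32_t pattern_limit;
  if (parse_limit_match(pattern, &pattern_limit) == 0) return caller_limit;
  return (pattern_limit < caller_limit)? pattern_limit : caller_limit;
}
```

For example, a pattern starting with `(*LIMIT_MATCH=100)` runs with a limit of 100 under a caller limit of 10000000. Under a caller limit of 50, it stays at 50.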

@@ -260,9 +260,9 @@ setting such as
<pre>
--with-match-limit=500000
</pre>
to the <b>configure</b> command. This setting has no effect on the
<b>pcre2_dfa_match()</b> matching function, but it does also limit JIT matching
(though the counting is done differently).
to the <b>configure</b> command. This setting also applies to the
<b>pcre2_dfa_match()</b> matching function, and to JIT matching (though the
counting is done differently).
</P>
<P>
The <b>pcre2_match()</b> function starts out using a 20K vector on the system
@@ -554,7 +554,7 @@ Cambridge, England.
</P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P>
Last updated: 10 April 2017
Last updated: 30 May 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@@ -204,11 +204,11 @@ still recognized for backwards compatibility.
<P>
The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
for matching. It does not apply to JIT or DFA matching. The match limit is used
(but in a different way) when JIT is being used, but it is not relevant, and is
ignored, when matching with <b>pcre2_dfa_match()</b>. The depth limit is ignored
by JIT but is relevant for DFA matching, which uses function recursion for
recursions within the pattern. In this case, the depth limit controls the
amount of system stack that is used.
(but in a different way) when JIT is being used, or when
<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
matching functions. The depth limit is ignored by JIT but is relevant for DFA
matching, which uses function recursion for recursions within the pattern. In
this case, the depth limit controls the amount of system stack that is used.
<a name="newlines"></a></P>
<br><b>
Newline conventions
@@ -3445,7 +3445,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 May 2017
Last updated: 30 May 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>
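The applicability rules in the revised pcre2pattern text can be restated as a small lookup: the heap limit applies only to the `pcre2_match()` interpreter, the match limit applies to all three matchers (counted differently by JIT and DFA), and the depth limit is ignored by JIT. The enums and function below are hypothetical and exist only to restate that text. The text here does not mention how the depth limit affects the interpreter. The sketch assumes it applies there as well, as it does for `pcre2_match()` in PCRE2 10.30.

```c
#include <stdbool.h>

typedef enum { TOY_INTERPRETER, TOY_JIT, TOY_DFA } toy_matcher_kind;
typedef enum { TOY_HEAP_LIMIT, TOY_MATCH_LIMIT, TOY_DEPTH_LIMIT } toy_limit_kind;

/* True if the given limit constrains the given matcher. */
bool limit_applies(toy_limit_kind limit, toy_matcher_kind matcher)
{
  switch (limit)
    {
    case TOY_HEAP_LIMIT:  return matcher == TOY_INTERPRETER;
    case TOY_MATCH_LIMIT: return true;               /* JIT and DFA count differently */
    case TOY_DEPTH_LIMIT: return matcher != TOY_JIT; /* ignored by JIT */
    }
  return false;
}
```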

@@ -963,8 +963,8 @@ PCRE2 CONTEXTS
limit, pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT.
This has the effect of limiting the amount of backtracking that can
take place. For patterns that are not anchored, the count restarts from
zero for each position in the subject string. This limit is not rele-
vant to pcre2_dfa_match(), which ignores it.
zero for each position in the subject string. This limit also applies
to pcre2_dfa_match(), though the counting is done in a different way.
When pcre2_match() is called with a pattern that was successfully pro-
cessed by pcre2_jit_compile(), the way in which matching is executed is
@@ -981,8 +981,8 @@ PCRE2 CONTEXTS
(*LIMIT_MATCH=ddd)
where ddd is a decimal number. However, such a setting is ignored
unless ddd is less than the limit set by the caller of pcre2_match()
or, if no such limit is set, less than the default.
unless ddd is less than the limit set by the caller of pcre2_match() or
pcre2_dfa_match() or, if no such limit is set, less than the default.
int pcre2_set_depth_limit(pcre2_match_context *mcontext,
uint32_t value);
@@ -3350,7 +3350,7 @@ AUTHOR
REVISION
Last updated: 26 May 2017
Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
@@ -3586,9 +3586,9 @@ LIMITING PCRE2 RESOURCE USAGE
--with-match-limit=500000
to the configure command. This setting has no effect on the
pcre2_dfa_match() matching function, but it does also limit JIT match-
ing (though the counting is done differently).
to the configure command. This setting also applies to the
pcre2_dfa_match() matching function, and to JIT matching (though the
counting is done differently).
The pcre2_match() function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking
@@ -3885,7 +3885,7 @@ AUTHOR
REVISION
Last updated: 10 April 2017
Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
@@ -5751,11 +5751,12 @@ SPECIAL START-OF-PATTERN ITEMS
The heap limit applies only when the pcre2_match() interpreter is used
for matching. It does not apply to JIT or DFA matching. The match limit
is used (but in a different way) when JIT is being used, but it is not
relevant, and is ignored, when matching with pcre2_dfa_match(). The
depth limit is ignored by JIT but is relevant for DFA matching, which
uses function recursion for recursions within the pattern. In this
case, the depth limit controls the amount of system stack that is used.
is used (but in a different way) when JIT is being used, or when
pcre2_dfa_match() is called, to limit computing resource usage by those
matching functions. The depth limit is ignored by JIT but is relevant
for DFA matching, which uses function recursion for recursions within
the pattern. In this case, the depth limit controls the amount of sys-
tem stack that is used.
Newline conventions
@@ -8686,7 +8687,7 @@ AUTHOR
REVISION
Last updated: 26 May 2017
Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

@@ -1,4 +1,4 @@
.TH PCRE2_DFA_MATCH 3 "04 April 2017" "PCRE2 10.30"
.TH PCRE2_DFA_MATCH 3 "30 May 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@@ -34,8 +34,9 @@ just once (except when processing lookaround assertions). This function is
\fIwscount\fP Number of elements in the vector
.sp
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
up a callout function or specify the recursion depth limit. The \fIlength\fP
and \fIstartoffset\fP values are code units, not characters. The options are:
up a callout function or specify the match and/or the recursion depth limits.
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
The options are:
.sp
PCRE2_ANCHORED Match only at the first position
PCRE2_ENDANCHORED Pattern can match only at end of subject

@@ -1,4 +1,4 @@
.TH PCRE2API 3 "26 May 2017" "PCRE2 10.30"
.TH PCRE2API 3 "30 May 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@@ -891,8 +891,8 @@ time round its main matching loop. If this value reaches the match limit,
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
which ignores it.
in the subject string. This limit also applies to \fBpcre2_dfa_match()\fP,
though the counting is done in a different way.
.P
When \fBpcre2_match()\fP is called with a pattern that was successfully
processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
@@ -909,8 +909,8 @@ of the form
(*LIMIT_MATCH=ddd)
.sp
where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
limit is set, less than the default.
less than the limit set by the caller of \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
.sp
.nf
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
@@ -3491,6 +3491,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 26 May 2017
Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

@@ -1,4 +1,4 @@
.TH PCRE2BUILD 3 "10 April 2017" "PCRE2 10.30"
.TH PCRE2BUILD 3 "30 May 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.
@@ -256,9 +256,9 @@ setting such as
.sp
--with-match-limit=500000
.sp
to the \fBconfigure\fP command. This setting has no effect on the
\fBpcre2_dfa_match()\fP matching function, but it does also limit JIT matching
(though the counting is done differently).
to the \fBconfigure\fP command. This setting also applies to the
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
counting is done differently).
.P
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking points there
@@ -572,6 +572,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 10 April 2017
Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

@@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "26 May 2017" "PCRE2 10.30"
.TH PCRE2PATTERN 3 "30 May 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -169,11 +169,11 @@ still recognized for backwards compatibility.
.P
The heap limit applies only when the \fBpcre2_match()\fP interpreter is used
for matching. It does not apply to JIT or DFA matching. The match limit is used
(but in a different way) when JIT is being used, but it is not relevant, and is
ignored, when matching with \fBpcre2_dfa_match()\fP. The depth limit is ignored
by JIT but is relevant for DFA matching, which uses function recursion for
recursions within the pattern. In this case, the depth limit controls the
amount of system stack that is used.
(but in a different way) when JIT is being used, or when
\fBpcre2_dfa_match()\fP is called, to limit computing resource usage by those
matching functions. The depth limit is ignored by JIT but is relevant for DFA
matching, which uses function recursion for recursions within the pattern. In
this case, the depth limit controls the amount of system stack that is used.
.
.
.\" HTML <a name="newlines"></a>
@@ -3475,6 +3475,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 26 May 2017
Last updated: 30 May 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

@@ -396,6 +396,7 @@ BOOL utf = FALSE;
BOOL reset_could_continue = FALSE;
if (mb->match_call_count++ >= mb->match_limit) return PCRE2_ERROR_MATCHLIMIT;
if (rlevel++ > mb->match_limit_depth) return PCRE2_ERROR_DEPTHLIMIT;
offsetcount &= (uint32_t)(-2); /* Round down */
@@ -3218,6 +3219,7 @@ if (mcontext == NULL)
{
mb->callout = NULL;
mb->memctl = re->memctl;
mb->match_limit = PRIV(default_match_context).match_limit;
mb->match_limit_depth = PRIV(default_match_context).depth_limit;
}
else
@@ -3231,8 +3233,13 @@ else
mb->callout = mcontext->callout;
mb->callout_data = mcontext->callout_data;
mb->memctl = mcontext->memctl;
mb->match_limit = mcontext->match_limit;
mb->match_limit_depth = mcontext->depth_limit;
}
if (mb->match_limit > re->limit_match)
mb->match_limit = re->limit_match;
if (mb->match_limit_depth > re->limit_depth)
mb->match_limit_depth = re->limit_depth;
@@ -3244,6 +3251,7 @@ mb->end_subject = end_subject;
mb->start_offset = start_offset;
mb->moptions = options;
mb->poptions = re->overall_options;
mb->match_call_count = 0;
/* Process the \R and newline settings. */
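The pcre2_dfa_match.c hunks above set up the working match limit in three steps. First, they take the limit from the match context, or the built-in default when no context is passed. Second, they lower it to the pattern's own `(*LIMIT_MATCH=)` value if that is smaller. Third, they reset the call counter before matching begins. The sketch below restates those steps using simplified, hypothetical structures rather than the real PCRE2 types. It assumes that a pattern without `(*LIMIT_MATCH=)` records `UINT32_MAX`, and it uses 10000000 as a stand-in for the configure-time default that `--with-match-limit` overrides.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the configure-time default (set by --with-match-limit). */
#define TOY_DEFAULT_MATCH_LIMIT 10000000u

typedef struct { uint32_t match_limit; } toy_match_context;

/* limit_match holds the (*LIMIT_MATCH=ddd) value, or UINT32_MAX if none
   (an assumption of this sketch). */
typedef struct { uint32_t limit_match; } toy_code;

typedef struct {
  uint32_t match_limit;       /* limit in force for this match */
  uint32_t match_call_count;  /* internal calls made so far */
} toy_dfa_block;

void toy_setup(toy_dfa_block *mb, const toy_code *re,
               const toy_match_context *mcontext)
{
  /* 1. Caller's limit, or the default when no match context is given. */
  mb->match_limit = (mcontext == NULL)?
    TOY_DEFAULT_MATCH_LIMIT : mcontext->match_limit;

  /* 2. A smaller limit requested by the pattern takes precedence. */
  if (mb->match_limit > re->limit_match)
    mb->match_limit = re->limit_match;

  /* 3. The call counter is reset before matching begins. */
  mb->match_call_count = 0;
}
```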

@@ -178,20 +178,20 @@ for (i = 0; i < 2; i++)
return 0;
}
(void)pcre2_set_match_limit(match_context, 100);
(void)pcre2_set_depth_limit(match_context, 100);
(void)pcre2_set_callout(match_context, callout_function, &callout_count);
}
/* Match twice, with and without options, with a depth limit of 100. */
(void)pcre2_set_depth_limit(match_context, 100);
/* Match twice, with and without options. */
for (j = 0; j < 2; j++)
{
#ifdef STANDALONE
printf("Match options %.8x", match_options);
printf("%s%s%s%s%s%s%s%s%s\n",
printf("%s%s%s%s%s%s%s%s%s%s\n",
((match_options & PCRE2_ANCHORED) != 0)? ",anchored" : "",
((match_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "",
((match_options & PCRE2_NO_JIT) != 0)? ",no_jit" : "",
((match_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "",
((match_options & PCRE2_NOTBOL) != 0)? ",notbol" : "",
((match_options & PCRE2_NOTEMPTY) != 0)? ",notempty" : "",
@@ -217,9 +217,8 @@ for (i = 0; i < 2; i++)
match_options = 0; /* For second time */
}
/* Match with DFA twice, with and without options, depth limit of 10. */
/* Match with DFA twice, with and without options. */
(void)pcre2_set_depth_limit(match_context, 10);
match_options = save_match_options & ~PCRE2_NO_JIT; /* Not valid for DFA */
for (j = 0; j < 2; j++)

@@ -877,7 +877,9 @@ typedef struct dfa_match_block {
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
const uint8_t *tables; /* Character tables */
PCRE2_SIZE start_offset; /* The start offset value */
uint32_t match_limit; /* As it says */
uint32_t match_limit_depth; /* As it says */
uint32_t match_call_count; /* Number of calls of internal function */
uint32_t moptions; /* Match options */
uint32_t poptions; /* Pattern options */
uint32_t nltype; /* Newline type */

@@ -7054,17 +7054,15 @@ else for (gmatched = 0;; gmatched++)
{
capcount = 0; /* This stops compiler warnings */
if ((dat_datctl.control & CTL_DFA) == 0)
if ((dat_datctl.control & CTL_DFA) == 0 &&
(FLD(compiled_code, executable_jit) == NULL ||
(dat_datctl.options & PCRE2_NO_JIT) != 0))
{
if (FLD(compiled_code, executable_jit) == NULL ||
(dat_datctl.options & PCRE2_NO_JIT) != 0)
{
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT,
"heap");
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
}
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT,
"match");
}
if (FLD(compiled_code, executable_jit) == NULL ||
(dat_datctl.options & PCRE2_NO_JIT) != 0 ||
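The `find_limits` modifier used in the test data makes pcre2test report the smallest limits that still let a match complete, for example "Minimum match limit = 4". After this commit it also reports the match limit for DFA matching, as the new "Minimum match limit = 4" line in the test output hunk shows. One way to find such a minimum is to re-run the match with different limits: double the limit until the match succeeds, then bisect. The sketch below illustrates that idea in simplified form and is not pcre2test's own `check_match_limit()` code. `find_min_limit()` and the demonstration matcher `needs_n_calls()` are hypothetical. The search assumes that a limit of 0 always fails and that any limit above a successful one also succeeds.

```c
#include <stdint.h>

#define TOY_ERROR_MATCHLIMIT (-47)

/* One match attempt with the given limit. Returns TOY_ERROR_MATCHLIMIT if
   the limit was exceeded, any other value otherwise. */
typedef int (*toy_matcher)(uint32_t limit, void *data);

/* Smallest limit for which run() does not report TOY_ERROR_MATCHLIMIT, or 0
   if even UINT32_MAX is not enough. */
uint32_t find_min_limit(toy_matcher run, void *data)
{
  uint32_t lo = 0;   /* largest limit known (or assumed) to fail */
  uint32_t hi = 1;   /* candidate limit */

  /* Double the candidate until a run succeeds, without overflowing. */
  while (run(hi, data) == TOY_ERROR_MATCHLIMIT)
    {
    if (hi == UINT32_MAX) return 0;
    lo = hi;
    hi = (hi > UINT32_MAX / 2)? UINT32_MAX : hi * 2;
    }

  /* Bisect, keeping run(lo) failing and run(hi) succeeding. */
  while (hi - lo > 1)
    {
    uint32_t mid = lo + (hi - lo) / 2;
    if (run(mid, data) == TOY_ERROR_MATCHLIMIT) lo = mid; else hi = mid;
    }
  return hi;
}

/* Demonstration matcher: succeeds once the limit reaches *data. */
int needs_n_calls(uint32_t limit, void *data)
{
  uint32_t needed = *(const uint32_t *)data;
  return (limit >= needed)? 0 : TOY_ERROR_MATCHLIMIT;
}
```

Doubling first keeps the number of trial matches logarithmic in the answer, even when no upper bound is known in advance.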

testdata/testinput6
@@ -4941,4 +4941,7 @@
/(?<=|abc)/endanchored
abcde\=aftertext
/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor
.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););
# End of testinput6

@@ -7691,6 +7691,7 @@ Failed: error -53: matching depth limit exceeded
/^(a(?2))(b)(?1)/
abbab\=find_limits
Minimum match limit = 4
Minimum depth limit = 2
0: abbab
@@ -7766,4 +7767,8 @@ No match
0:
0+
/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor
.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););
Failed: error -47: match limit exceeded
# End of testinput6