Make the recursion limit apply to DFA matching.

Philip.Hazel 2016-12-23 11:04:51 +00:00
parent 3df9674c4e
commit 1f87b60f01
10 changed files with 66 additions and 30 deletions

@ -233,6 +233,10 @@ too many nested or recursive back references. If the limit was reached in
certain recursive cases it failed to be triggered and an internal error could
be the result.
36. The pcre2_dfa_match() function now takes note of the recursion limit for
the internal recursive calls that are used for lookarounds and recursions within
the pattern.
Version 10.22 29-July-2016
--------------------------

@ -1,4 +1,4 @@
.TH PCRE2_DFA_MATCH 3 "12 May 2013" "PCRE2 10.00"
.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -33,8 +33,8 @@ is \fBpcre2_match()\fP.) The arguments for this function are:
\fIwscount\fP Number of elements in the vector
.sp
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
up a callout function. The \fIlength\fP and \fIstartoffset\fP values are code
units, not characters. The options are:
up a callout function or specify the recursion limit. The \fIlength\fP and
\fIstartoffset\fP values are code units, not characters. The options are:
.sp
PCRE2_ANCHORED Match only at the first position
PCRE2_NOTBOL Subject is not the beginning of a line

@ -1,4 +1,4 @@
.TH PCRE2API 3 "22 November 2016" "PCRE2 10.23"
.TH PCRE2API 3 "24 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -840,20 +840,22 @@ This limit is of use only if it is set smaller than \fImatch_limit\fP.
Limiting the recursion depth limits the amount of system stack that can be
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
and is ignored, when matching is done using JIT compiled code or by the
\fBpcre2_dfa_match()\fP function.
and is ignored, when matching is done using JIT compiled code. However, it is
supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
.P
The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
default default is the same value as the default for \fImatch_limit\fP. If the
limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_RECURSIONLIMIT. A
value for the recursion limit may also be supplied by an item at the start of a
pattern of the form
limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
supplied by an item at the start of a pattern of the form
.sp
(*LIMIT_RECURSION=ddd)
.sp
where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
limit is set, less than the default.
less than the limit set by the caller of \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
.sp
.nf
.B int pcre2_set_recursion_memory_management(
@ -3319,6 +3321,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 22 November 2016
Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi
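
The rule in the pcre2api.3 text above, that a (*LIMIT_RECURSION=ddd) item can lower the caller's limit but never raise it, amounts to taking a minimum. A minimal sketch in C follows; the names `effective_recursion_limit` and `NO_PATTERN_LIMIT` are hypothetical and are not part of PCRE2, and the sketch assumes that `caller_limit` already holds the build-time default when the caller set nothing.

```c
#include <stdint.h>

/* Hypothetical sentinel: the pattern has no (*LIMIT_RECURSION=ddd) item. */
#define NO_PATTERN_LIMIT UINT32_MAX

/* caller_limit:  limit set by the caller of the matching function, or the
                  default if the caller set none (assumption of this sketch).
   pattern_limit: ddd from the pattern, or NO_PATTERN_LIMIT.
   The pattern's value is used only when it is smaller than the caller's. */
static uint32_t effective_recursion_limit(uint32_t caller_limit,
                                          uint32_t pattern_limit)
{
  return (pattern_limit < caller_limit) ? pattern_limit : caller_limit;
}
```

With a caller limit of 1000, a pattern value of 600 would take effect, whereas a pattern value of 5000 would be ignored and the limit would stay at 1000.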

@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "23 October 2016" "PCRE2 10.23"
.TH PCRE2PATTERN 3 "23 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -158,6 +158,11 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used.
.P
The match limit is used (but in a different way) when JIT is being used, but it
is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP.
However, the recursion limit is relevant for DFA matching, which does use some
function recursion, in particular, for recursions within the pattern.
.
.
.\" HTML <a name="newlines"></a>
@ -3477,6 +3482,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 23 October 2016
Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi

@ -1,4 +1,4 @@
.TH PCRE2STACK 3 "21 November 2014" "PCRE2 10.00"
.TH PCRE2STACK 3 "23 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 DISCUSSION OF STACK USAGE"
@ -43,11 +43,12 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls.
Normally, these are never very deep, and the limit on the complexity of
\fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given.
However, it is possible to write patterns with runaway infinite recursions;
such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack. At
present, there is no protection against this.
such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a
limit is applied (see below).
.P
The comments that follow do NOT apply to \fBpcre2_dfa_match()\fP; they are
relevant only for \fBpcre2_match()\fP without the JIT optimization.
The comments in the next three sections do not apply to
\fBpcre2_dfa_match()\fP; they are relevant only for \fBpcre2_match()\fP without
the JIT optimization.
.
.
.SS "Reducing \fBpcre2_match()\fP's stack usage"
@ -147,6 +148,15 @@ pattern to match. This is done by calling \fBpcre2_match()\fP repeatedly with
different limits.
.
.
.SS "Limiting \fBpcre2_dfa_match()\fP's stack usage"
.rs
.sp
The recursion limit, as described above for \fBpcre2_match()\fP, also applies
to \fBpcre2_dfa_match()\fP, whose use of recursive function calls for
recursions in the pattern can lead to runaway stack usage. The non-recursive
match limit is not relevant for DFA matching, and is ignored.
.
.
.SS "Changing stack size in Unix-like systems"
.rs
.sp
@ -197,6 +207,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 21 November 2014
Copyright (c) 1997-2014 University of Cambridge.
Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi

@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "28 September 2016" "PCRE2 10.23"
.TH PCRE2SYNTAX 3 "23 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -428,9 +428,10 @@ appear.
(*UCP) set PCRE2_UCP (use Unicode properties for \ed etc)
.sp
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
limits set by the caller of pcre2_match(), not increase them. The application
can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
PCRE2_NEVER_UCP options, respectively, at compile time.
limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, not
increase them. The application can lock out the use of (*UTF) and (*UCP) by
setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at
compile time.
.
.
.SH "NEWLINE CONVENTION"
@ -584,6 +585,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 28 September 2016
Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi

@ -371,7 +371,7 @@ internal_dfa_match(
uint32_t offsetcount,
int *workspace,
int wscount,
int rlevel)
uint32_t rlevel)
{
stateblock *active_states, *new_states, *temp_states;
stateblock *next_active_state, *next_new_state;
@ -400,7 +400,7 @@ BOOL utf = FALSE;
BOOL reset_could_continue = FALSE;
rlevel++;
if (rlevel++ > mb->match_limit_recursion) return PCRE2_ERROR_RECURSIONLIMIT;
offsetcount &= (uint32_t)(-2); /* Round down */
wscount -= 2;
@ -2591,7 +2591,7 @@ for (;;)
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
{ ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
}
@ -2710,7 +2710,7 @@ for (;;)
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
if ((rc >= 0) ==
(condcode == OP_ASSERT || condcode == OP_ASSERTBACK))
{ ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
@ -3216,6 +3216,7 @@ if (mcontext == NULL)
{
mb->callout = NULL;
mb->memctl = re->memctl;
mb->match_limit_recursion = PRIV(default_match_context).recursion_limit;
}
else
{
@ -3228,7 +3229,10 @@ else
mb->callout = mcontext->callout;
mb->callout_data = mcontext->callout_data;
mb->memctl = mcontext->memctl;
mb->match_limit_recursion = mcontext->recursion_limit;
}
if (mb->match_limit_recursion > re->limit_recursion)
mb->match_limit_recursion = re->limit_recursion;
mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
re->name_count * re->name_entry_size;
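
The guard added in the hunks above can be illustrated with a toy model. This is a sketch only, not PCRE2's DFA algorithm: `toy_match_block`, `toy_match`, and the `TOY_` constants are hypothetical names, and the function simply makes one nested call per remaining subject character, loosely imitating what a pattern such as ^((.)(?1)|.)$ provokes. The value -53 is taken from the test output in this commit; -1 for "no match" is an assumption of the sketch.

```c
#include <stdint.h>

#define TOY_ERROR_NOMATCH        (-1)   /* assumed "no match" code */
#define TOY_ERROR_RECURSIONLIMIT (-53)  /* code shown in the test output */

/* Hypothetical stand-in for the one field of the match block used here. */
typedef struct {
  uint32_t match_limit_recursion;
} toy_match_block;

/* Returns the number of characters matched, or a negative error code.
   rlevel is the function recursion level; the top-level call passes 0. */
static int toy_match(const toy_match_block *mb, const char *s, uint32_t rlevel)
{
  int rc;

  /* Same shape as the check in the diff: compare, then increment. */
  if (rlevel++ > mb->match_limit_recursion) return TOY_ERROR_RECURSIONLIMIT;

  if (s[0] == '\0') return TOY_ERROR_NOMATCH;  /* '.' needs a character */
  if (s[1] == '\0') return 1;                  /* lone '.' alternative */

  rc = toy_match(mb, s + 1, rlevel);           /* (.)(?1): nested call */

  /* As in the diff, any error other than "no match" is passed straight up. */
  if (rc < 0 && rc != TOY_ERROR_NOMATCH) return rc;
  return (rc >= 0) ? rc + 1 : TOY_ERROR_NOMATCH;
}
```

In this model a subject of n characters reaches recursion level n-1, so a limit of 2 is enough for "abc" while a limit of 1 makes the same call return the recursion-limit error instead of overrunning the stack.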

@ -843,6 +843,7 @@ typedef struct dfa_match_block {
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
const uint8_t *tables; /* Character tables */
PCRE2_SIZE start_offset; /* The start offset value */
uint32_t match_limit_recursion; /* As it says */
uint32_t moptions; /* Match options */
uint32_t poptions; /* Pattern options */
uint32_t nltype; /* Newline type */

testdata/testinput6

@ -4882,4 +4882,8 @@
aaa\=dfa,allcaptures
a\=dfa,allcaptures
/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
\= Expect recursion limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
# End of testinput6

@ -7682,4 +7682,9 @@ No match
** Ignored after DFA matching: allcaptures
0: a
/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
\= Expect recursion limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
Failed: error -53: recursion limit exceeded
# End of testinput6