Make the recursion limit apply to DFA matching.

Philip.Hazel 2016-12-23 11:04:51 +00:00
parent 3df9674c4e
commit 1f87b60f01
10 changed files with 66 additions and 30 deletions

@ -233,6 +233,10 @@ too many nested or recursive back references. If the limit was reached in
 certain recursive cases it failed to be triggered and an internal error could
 be the result.
+36. The pcre2_dfa_match() function now takes note of the recursion limit for
+the internal recursive calls that are used for lookrounds and recursions within
+the pattern.
 Version 10.22 29-July-2016
 --------------------------

@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "12 May 2013" "PCRE2 10.00"
+.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -33,8 +33,8 @@ is \fBpcre2_match()\fP.) The arguments for this function are:
 \fIwscount\fP Number of elements in the vector
 .sp
 For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function. The \fIlength\fP and \fIstartoffset\fP values are code
-units, not characters. The options are:
+up a callout function or specify the recursion limit. The \fIlength\fP and
+\fIstartoffset\fP values are code units, not characters. The options are:
 .sp
 PCRE2_ANCHORED Match only at the first position
 PCRE2_NOTBOL Subject is not the beginning of a line

@ -1,4 +1,4 @@
-.TH PCRE2API 3 "22 November 2016" "PCRE2 10.23"
+.TH PCRE2API 3 "24 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -840,20 +840,22 @@ This limit is of use only if it is set smaller than \fImatch_limit\fP.
 Limiting the recursion depth limits the amount of system stack that can be
 used, or, when PCRE2 has been compiled to use memory on the heap instead of the
 stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code or by the
-\fBpcre2_dfa_match()\fP function.
+and is ignored, when matching is done using JIT compiled code. However, it is
+supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
+frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
+stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
 .P
 The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
 default default is the same value as the default for \fImatch_limit\fP. If the
-limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_RECURSIONLIMIT. A
-value for the recursion limit may also be supplied by an item at the start of a
-pattern of the form
+limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
+PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
+supplied by an item at the start of a pattern of the form
 .sp
 (*LIMIT_RECURSION=ddd)
 .sp
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
-less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
-limit is set, less than the default.
+less than the limit set by the caller of \fBpcre2_match()\fP or
+\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
 .sp
 .nf
 .B int pcre2_set_recursion_memory_management(
@ -3319,6 +3321,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 22 November 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi

@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "23 October 2016" "PCRE2 10.23"
+.TH PCRE2PATTERN 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -158,6 +158,11 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
 for it to have any effect. In other words, the pattern writer can lower the
 limits set by the programmer, but not raise them. If there is more than one
 setting of one of these limits, the lower value is used.
+.P
+The match limit is used (but in a different way) when JIT is being used, but it
+is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP.
+However, the recursion limit is relevant for DFA matching, which does use some
+function recursion, in particular, for recursions within the pattern.
 .
 .
 .\" HTML <a name="newlines"></a>
@ -3477,6 +3482,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 October 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
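
The "lower but not raise" rule described in this page's text amounts to taking the smaller of two values. A minimal sketch in C, not the library's code: the helper name is hypothetical, and a pattern without a (*LIMIT_RECURSION=ddd) item is assumed to be represented by UINT32_MAX.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only.
   caller_limit:  value set by the caller, or the build-time default.
   pattern_limit: value from (*LIMIT_RECURSION=ddd); UINT32_MAX is assumed
                  here to mean "no such item in the pattern". */
uint32_t sketch_effective_limit(uint32_t caller_limit, uint32_t pattern_limit)
{
  /* The pattern can only lower the limit, never raise it. */
  return (pattern_limit < caller_limit) ? pattern_limit : caller_limit;
}
```

Under these assumptions a caller limit of 1000 with (*LIMIT_RECURSION=600) gives 600, while a caller limit of 500 stays at 500.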

@ -1,4 +1,4 @@
-.TH PCRE2STACK 3 "21 November 2014" "PCRE2 10.00"
+.TH PCRE2STACK 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 DISCUSSION OF STACK USAGE"
@ -43,11 +43,12 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls.
 Normally, these are never very deep, and the limit on the complexity of
 \fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given.
 However, it is possible to write patterns with runaway infinite recursions;
-such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack. At
-present, there is no protection against this.
+such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a
+limit is applied (see below).
 .P
-The comments that follow do NOT apply to \fBpcre2_dfa_match()\fP; they are
-relevant only for \fBpcre2_match()\fP without the JIT optimization.
+The comments in the next three sections do not apply to
+\fBpcre2_dfa_match()\fP; they are relevant only for \fBpcre2_match()\fP without
+the JIT optimization.
 .
 .
 .SS "Reducing \fBpcre2_match()\fP's stack usage"
@ -147,6 +148,15 @@ pattern to match. This is done by calling \fBpcre2_match()\fP repeatedly with
 different limits.
 .
 .
+.SS "Limiting \fBpcre2_dfa_match()\fP's stack usage"
+.rs
+.sp
+The recursion limit, as described above for \fBpcre2_match()\fP, also applies
+to \fBpcre2_dfa_match()\fP, whose use of recursive function calls for
+recursions in the pattern can lead to runaway stack usage. The non-recursive
+match limit is not relevant for DFA matching, and is ignored.
+.
+.
 .SS "Changing stack size in Unix-like systems"
 .rs
 .sp
@ -197,6 +207,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 23 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi

@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "28 September 2016" "PCRE2 10.23"
+.TH PCRE2SYNTAX 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -428,9 +428,10 @@ appear.
 (*UCP) set PCRE2_UCP (use Unicode properties for \ed etc)
 .sp
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
-limits set by the caller of pcre2_match(), not increase them. The application
-can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
-PCRE2_NEVER_UCP options, respectively, at compile time.
+limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, not
+increase them. The application can lock out the use of (*UTF) and (*UCP) by
+setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at
+compile time.
 .
 .
 .SH "NEWLINE CONVENTION"
@ -584,6 +585,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 September 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi

@ -371,7 +371,7 @@ internal_dfa_match(
 uint32_t offsetcount,
 int *workspace,
 int wscount,
-int rlevel)
+uint32_t rlevel)
 {
 stateblock *active_states, *new_states, *temp_states;
 stateblock *next_active_state, *next_new_state;
@ -400,7 +400,7 @@ BOOL utf = FALSE;
 BOOL reset_could_continue = FALSE;
-rlevel++;
+if (rlevel++ > mb->match_limit_recursion) return PCRE2_ERROR_RECURSIONLIMIT;
 offsetcount &= (uint32_t)(-2); /* Round down */
 wscount -= 2;
@ -2591,7 +2591,7 @@ for (;;)
 sizeof(local_workspace)/sizeof(int), /* size of same */
 rlevel); /* function recursion level */
-if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
+if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
 if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
 { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
 }
@ -2710,7 +2710,7 @@ for (;;)
 sizeof(local_workspace)/sizeof(int), /* size of same */
 rlevel); /* function recursion level */
-if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
+if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
 if ((rc >= 0) ==
 (condcode == OP_ASSERT || condcode == OP_ASSERTBACK))
 { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
@ -3216,6 +3216,7 @@ if (mcontext == NULL)
 {
 mb->callout = NULL;
 mb->memctl = re->memctl;
+mb->match_limit_recursion = PRIV(default_match_context).recursion_limit;
 }
 else
 {
@ -3228,7 +3229,10 @@ else
 mb->callout = mcontext->callout;
 mb->callout_data = mcontext->callout_data;
 mb->memctl = mcontext->memctl;
+mb->match_limit_recursion = mcontext->recursion_limit;
 }
+if (mb->match_limit_recursion > re->limit_recursion)
+mb->match_limit_recursion = re->limit_recursion;
 mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
 re->name_count * re->name_entry_size;
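
The guard added at the top of internal_dfa_match() can be tried in isolation. The following is a minimal stand-alone sketch, not PCRE2 code: the struct, the function and the macro name are hypothetical stand-ins, and only the comparison itself and the error value -53 (as shown in the test output of this commit) are taken from the diff.

```c
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; these are not PCRE2 definitions. */
#define SKETCH_ERROR_RECURSIONLIMIT (-53)  /* value shown in the test output */

typedef struct {
  uint32_t match_limit_recursion;  /* effective limit for this match */
  uint32_t deepest;                /* deepest frame reached, for inspection */
} sketch_block;

/* Enter one frame, then recurse `remaining` more times.
   `rlevel` is the number of frames already active above this one,
   so the outermost call passes 0. */
int sketch_internal(sketch_block *mb, uint32_t remaining, uint32_t rlevel)
{
  /* Same shape as the guard in the diff: compare first, then increment. */
  if (rlevel++ > mb->match_limit_recursion) return SKETCH_ERROR_RECURSIONLIMIT;
  if (rlevel > mb->deepest) mb->deepest = rlevel;
  if (remaining == 0) return 0;
  /* A negative result from the nested call is passed straight up. */
  return sketch_internal(mb, remaining - 1, rlevel);
}
```

Because the comparison uses `>` on the value before the increment, a limit of 3 in this sketch lets four nested frames run and rejects the fifth.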

@ -843,6 +843,7 @@ typedef struct dfa_match_block {
 PCRE2_SPTR last_used_ptr; /* Latest consulted character */
 const uint8_t *tables; /* Character tables */
 PCRE2_SIZE start_offset; /* The start offset value */
+uint32_t match_limit_recursion; /* As it says */
 uint32_t moptions; /* Match options */
 uint32_t poptions; /* Pattern options */
 uint32_t nltype; /* Newline type */

testdata/testinput6

@ -4882,4 +4882,8 @@
 aaa\=dfa,allcaptures
 a\=dfa,allcaptures
+/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
+\= Expect recursion limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
 # End of testinput6

@ -7682,4 +7682,9 @@ No match
 ** Ignored after DFA matching: allcaptures
 0: a
+/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
+\= Expect recursion limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+Failed: error -53: recursion limit exceeded
 # End of testinput6