Make the recursion limit apply to DFA matching.
parent 3df9674c4e
commit 1f87b60f01
@@ -233,6 +233,10 @@ too many nested or recursive back references. If the limit was reached in
 certain recursive cases it failed to be triggered and an internal error could
 be the result.
 
+36. The pcre2_dfa_match() function now takes note of the recursion limit for
+the internal recursive calls that are used for lookrounds and recursions within
+the pattern.
+
 
 Version 10.22 29-July-2016
 --------------------------
@@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "12 May 2013" "PCRE2 10.00"
+.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -33,8 +33,8 @@ is \fBpcre2_match()\fP.) The arguments for this function are:
 \fIwscount\fP Number of elements in the vector
 .sp
 For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function. The \fIlength\fP and \fIstartoffset\fP values are code
-units, not characters. The options are:
+up a callout function or specify the recursion limit. The \fIlength\fP and
+\fIstartoffset\fP values are code units, not characters. The options are:
 .sp
 PCRE2_ANCHORED Match only at the first position
 PCRE2_NOTBOL Subject is not the beginning of a line
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "22 November 2016" "PCRE2 10.23"
+.TH PCRE2API 3 "24 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -840,20 +840,22 @@ This limit is of use only if it is set smaller than \fImatch_limit\fP.
 Limiting the recursion depth limits the amount of system stack that can be
 used, or, when PCRE2 has been compiled to use memory on the heap instead of the
 stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code or by the
-\fBpcre2_dfa_match()\fP function.
+and is ignored, when matching is done using JIT compiled code. However, it is
+supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
+frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
+stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
 .P
 The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
 default default is the same value as the default for \fImatch_limit\fP. If the
-limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_RECURSIONLIMIT. A
-value for the recursion limit may also be supplied by an item at the start of a
-pattern of the form
+limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
+PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
+supplied by an item at the start of a pattern of the form
 .sp
 (*LIMIT_RECURSION=ddd)
 .sp
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
-less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
-limit is set, less than the default.
+less than the limit set by the caller of \fBpcre2_match()\fP or
+\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
 .sp
 .nf
 .B int pcre2_set_recursion_memory_management(
@@ -3319,6 +3321,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 22 November 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "23 October 2016" "PCRE2 10.23"
+.TH PCRE2PATTERN 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -158,6 +158,11 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
 for it to have any effect. In other words, the pattern writer can lower the
 limits set by the programmer, but not raise them. If there is more than one
 setting of one of these limits, the lower value is used.
+.P
+The match limit is used (but in a different way) when JIT is being used, but it
+is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP.
+However, the recursion limit is relevant for DFA matching, which does use some
+function recursion, in particular, for recursions within the pattern.
 .
 .
 .\" HTML <a name="newlines"></a>
@@ -3477,6 +3482,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 October 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2STACK 3 "21 November 2014" "PCRE2 10.00"
+.TH PCRE2STACK 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 DISCUSSION OF STACK USAGE"
@@ -43,11 +43,12 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls.
 Normally, these are never very deep, and the limit on the complexity of
 \fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given.
 However, it is possible to write patterns with runaway infinite recursions;
-such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack. At
-present, there is no protection against this.
+such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a
+limit is applied (see below).
 .P
-The comments that follow do NOT apply to \fBpcre2_dfa_match()\fP; they are
-relevant only for \fBpcre2_match()\fP without the JIT optimization.
+The comments in the next three sections do not apply to
+\fBpcre2_dfa_match()\fP; they are relevant only for \fBpcre2_match()\fP without
+the JIT optimization.
 .
 .
 .SS "Reducing \fBpcre2_match()\fP's stack usage"
@@ -147,6 +148,15 @@ pattern to match. This is done by calling \fBpcre2_match()\fP repeatedly with
 different limits.
 .
 .
+.SS "Limiting \fBpcre2_dfa_match()\fP's stack usage"
+.rs
+.sp
+The recursion limit, as described above for \fBpcre2_match()\fP, also applies
+to \fBpcre2_dfa_match()\fP, whose use of recursive function calls for
+recursions in the pattern can lead to runaway stack usage. The non-recursive
+match limit is not relevant for DFA matching, and is ignored.
+.
+.
 .SS "Changing stack size in Unix-like systems"
 .rs
 .sp
@@ -197,6 +207,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 23 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "28 September 2016" "PCRE2 10.23"
+.TH PCRE2SYNTAX 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -428,9 +428,10 @@ appear.
 (*UCP) set PCRE2_UCP (use Unicode properties for \ed etc)
 .sp
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
-limits set by the caller of pcre2_match(), not increase them. The application
-can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
-PCRE2_NEVER_UCP options, respectively, at compile time.
+limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, not
+increase them. The application can lock out the use of (*UTF) and (*UCP) by
+setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at
+compile time.
 .
 .
 .SH "NEWLINE CONVENTION"
@@ -584,6 +585,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 September 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
@@ -371,7 +371,7 @@ internal_dfa_match(
 uint32_t offsetcount,
 int *workspace,
 int wscount,
-int rlevel)
+uint32_t rlevel)
 {
 stateblock *active_states, *new_states, *temp_states;
 stateblock *next_active_state, *next_new_state;
@@ -400,7 +400,7 @@ BOOL utf = FALSE;
 
 BOOL reset_could_continue = FALSE;
 
-rlevel++;
+if (rlevel++ > mb->match_limit_recursion) return PCRE2_ERROR_RECURSIONLIMIT;
 offsetcount &= (uint32_t)(-2); /* Round down */
 
 wscount -= 2;
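The guard added at the top of internal_dfa_match() can be illustrated in isolation. The following is a minimal, self-contained sketch, not PCRE2 code: `walk()`, `ERR_RECURSION_LIMIT`, and the `limit` parameter are hypothetical names chosen for illustration. Only the shape of the check (`rlevel++ > limit` on unsigned values) follows the hunk above.

```c
#include <stdint.h>

/* Hypothetical error code standing in for PCRE2_ERROR_RECURSIONLIMIT. */
#define ERR_RECURSION_LIMIT (-1)

/* Recurse `remaining` more levels, refusing to go deeper than `limit`.
   Each entry compares the current level with the limit and then
   increments it, mirroring `if (rlevel++ > limit) return ...;`.
   Returns the level count after the deepest entry, or
   ERR_RECURSION_LIMIT if the limit was exceeded on the way down. */
int walk(uint32_t remaining, uint32_t rlevel, uint32_t limit)
{
  if (rlevel++ > limit) return ERR_RECURSION_LIMIT;
  if (remaining == 0) return (int)rlevel;
  return walk(remaining - 1, rlevel, limit);
}
```

In this sketch the top-level call is made with `rlevel` equal to 0, so a limit of N permits N+1 nested entries before the error is returned. The error propagates unchanged through every enclosing call, which is the behaviour the later hunks arrange for the assertion and condition call sites.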
@@ -2591,7 +2591,7 @@ for (;;)
 sizeof(local_workspace)/sizeof(int), /* size of same */
 rlevel); /* function recursion level */
 
-if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
+if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
 if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
 { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
 }
@@ -2710,7 +2710,7 @@ for (;;)
 sizeof(local_workspace)/sizeof(int), /* size of same */
 rlevel); /* function recursion level */
 
-if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
+if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
 if ((rc >= 0) ==
 (condcode == OP_ASSERT || condcode == OP_ASSERTBACK))
 { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
@@ -3216,6 +3216,7 @@ if (mcontext == NULL)
 {
 mb->callout = NULL;
 mb->memctl = re->memctl;
+mb->match_limit_recursion = PRIV(default_match_context).recursion_limit;
 }
 else
 {
@@ -3228,7 +3229,10 @@ else
 mb->callout = mcontext->callout;
 mb->callout_data = mcontext->callout_data;
 mb->memctl = mcontext->memctl;
+mb->match_limit_recursion = mcontext->recursion_limit;
 }
+if (mb->match_limit_recursion > re->limit_recursion)
+mb->match_limit_recursion = re->limit_recursion;
 
 mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
 re->name_count * re->name_entry_size;
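The limit selection in the two hunks above amounts to taking the smaller of the caller's limit and the pattern's limit, which is why a (*LIMIT_RECURSION=ddd) item can lower the limit but never raise it. The following is a minimal sketch of that rule. The names `effective_limit`, `caller_limit`, `pattern_limit`, and `NO_PATTERN_LIMIT` are hypothetical and not part of PCRE2. The sketch assumes that a pattern without a limit item carries a value too large to win the comparison, modelled here as UINT32_MAX.

```c
#include <stdint.h>

/* Assumed placeholder meaning "the pattern sets no limit of its own". */
#define NO_PATTERN_LIMIT UINT32_MAX

/* Return the limit that applies to a match: the caller's (or default)
   limit, lowered to the pattern's limit when the pattern asks for less. */
uint32_t effective_limit(uint32_t caller_limit, uint32_t pattern_limit)
{
  uint32_t limit = caller_limit;
  if (limit > pattern_limit) limit = pattern_limit;
  return limit;
}
```

For example, a caller limit of 1000 combined with (*LIMIT_RECURSION=600) yields 600, whereas a caller limit of 500 is left at 500.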
@@ -843,6 +843,7 @@ typedef struct dfa_match_block {
 PCRE2_SPTR last_used_ptr; /* Latest consulted character */
 const uint8_t *tables; /* Character tables */
 PCRE2_SIZE start_offset; /* The start offset value */
+uint32_t match_limit_recursion; /* As it says */
 uint32_t moptions; /* Match options */
 uint32_t poptions; /* Pattern options */
 uint32_t nltype; /* Newline type */
@@ -4882,4 +4882,8 @@
aaa\=dfa,allcaptures
a\=dfa,allcaptures

/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
\= Expect recursion limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]

# End of testinput6
@@ -7682,4 +7682,9 @@ No match
** Ignored after DFA matching: allcaptures
0: a

/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
\= Expect recursion limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
Failed: error -53: recursion limit exceeded

# End of testinput6