Make the recursion limit apply to DFA matching.
parent 3df9674c4e
commit 1f87b60f01
@@ -233,6 +233,10 @@ too many nested or recursive back references. If the limit was reached in
 certain recursive cases it failed to be triggered and an internal error could
 be the result.
 
+36. The pcre2_dfa_match() function now takes note of the recursion limit for
+the internal recursive calls that are used for lookrounds and recursions within
+the pattern.
+
 
 Version 10.22 29-July-2016
 --------------------------
@@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "12 May 2013" "PCRE2 10.00"
+.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -33,8 +33,8 @@ is \fBpcre2_match()\fP.) The arguments for this function are:
 \fIwscount\fP Number of elements in the vector
 .sp
 For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function. The \fIlength\fP and \fIstartoffset\fP values are code
-units, not characters. The options are:
+up a callout function or specify the recursion limit. The \fIlength\fP and
+\fIstartoffset\fP values are code units, not characters. The options are:
 .sp
 PCRE2_ANCHORED Match only at the first position
 PCRE2_NOTBOL Subject is not the beginning of a line
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "22 November 2016" "PCRE2 10.23"
+.TH PCRE2API 3 "24 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -840,20 +840,22 @@ This limit is of use only if it is set smaller than \fImatch_limit\fP.
 Limiting the recursion depth limits the amount of system stack that can be
 used, or, when PCRE2 has been compiled to use memory on the heap instead of the
 stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code or by the
-\fBpcre2_dfa_match()\fP function.
+and is ignored, when matching is done using JIT compiled code. However, it is
+supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
+frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
+stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
 .P
 The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
 default default is the same value as the default for \fImatch_limit\fP. If the
-limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_RECURSIONLIMIT. A
-value for the recursion limit may also be supplied by an item at the start of a
-pattern of the form
+limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
+PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
+supplied by an item at the start of a pattern of the form
 .sp
 (*LIMIT_RECURSION=ddd)
 .sp
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
-less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
-limit is set, less than the default.
+less than the limit set by the caller of \fBpcre2_match()\fP or
+\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
 .sp
 .nf
 .B int pcre2_set_recursion_memory_management(
@@ -3319,6 +3321,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 22 November 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "23 October 2016" "PCRE2 10.23"
+.TH PCRE2PATTERN 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -158,6 +158,11 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
 for it to have any effect. In other words, the pattern writer can lower the
 limits set by the programmer, but not raise them. If there is more than one
 setting of one of these limits, the lower value is used.
+.P
+The match limit is used (but in a different way) when JIT is being used, but it
+is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP.
+However, the recursion limit is relevant for DFA matching, which does use some
+function recursion, in particular, for recursions within the pattern.
 .
 .
 .\" HTML <a name="newlines"></a>
@@ -3477,6 +3482,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 October 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2STACK 3 "21 November 2014" "PCRE2 10.00"
+.TH PCRE2STACK 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 DISCUSSION OF STACK USAGE"
@@ -43,11 +43,12 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls.
 Normally, these are never very deep, and the limit on the complexity of
 \fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given.
 However, it is possible to write patterns with runaway infinite recursions;
-such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack. At
-present, there is no protection against this.
+such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a
+limit is applied (see below).
 .P
-The comments that follow do NOT apply to \fBpcre2_dfa_match()\fP; they are
-relevant only for \fBpcre2_match()\fP without the JIT optimization.
+The comments in the next three sections do not apply to
+\fBpcre2_dfa_match()\fP; they are relevant only for \fBpcre2_match()\fP without
+the JIT optimization.
 .
 .
 .SS "Reducing \fBpcre2_match()\fP's stack usage"
@@ -147,6 +148,15 @@ pattern to match. This is done by calling \fBpcre2_match()\fP repeatedly with
 different limits.
 .
 .
+.SS "Limiting \fBpcre2_dfa_match()\fP's stack usage"
+.rs
+.sp
+The recursion limit, as described above for \fBpcre2_match()\fP, also applies
+to \fBpcre2_dfa_match()\fP, whose use of recursive function calls for
+recursions in the pattern can lead to runaway stack usage. The non-recursive
+match limit is not relevant for DFA matching, and is ignored.
+.
+.
 .SS "Changing stack size in Unix-like systems"
 .rs
 .sp
@@ -197,6 +207,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 23 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "28 September 2016" "PCRE2 10.23"
+.TH PCRE2SYNTAX 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -428,9 +428,10 @@ appear.
 (*UCP) set PCRE2_UCP (use Unicode properties for \ed etc)
 .sp
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
-limits set by the caller of pcre2_match(), not increase them. The application
-can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
-PCRE2_NEVER_UCP options, respectively, at compile time.
+limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, not
+increase them. The application can lock out the use of (*UTF) and (*UCP) by
+setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at
+compile time.
 .
 .
 .SH "NEWLINE CONVENTION"
@@ -584,6 +585,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 September 2016
+Last updated: 23 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
@@ -371,7 +371,7 @@ internal_dfa_match(
 uint32_t offsetcount,
 int *workspace,
 int wscount,
-int rlevel)
+uint32_t rlevel)
 {
 stateblock *active_states, *new_states, *temp_states;
 stateblock *next_active_state, *next_new_state;
@@ -400,7 +400,7 @@ BOOL utf = FALSE;
 
 BOOL reset_could_continue = FALSE;
 
-rlevel++;
+if (rlevel++ > mb->match_limit_recursion) return PCRE2_ERROR_RECURSIONLIMIT;
 offsetcount &= (uint32_t)(-2); /* Round down */
 
 wscount -= 2;
@@ -2591,7 +2591,7 @@ for (;;)
 sizeof(local_workspace)/sizeof(int), /* size of same */
 rlevel); /* function recursion level */
 
-if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
+if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
 if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
 { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
 }
@@ -2710,7 +2710,7 @@ for (;;)
 sizeof(local_workspace)/sizeof(int), /* size of same */
 rlevel); /* function recursion level */
 
-if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
+if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
 if ((rc >= 0) ==
 (condcode == OP_ASSERT || condcode == OP_ASSERTBACK))
 { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
@@ -3216,6 +3216,7 @@ if (mcontext == NULL)
 {
 mb->callout = NULL;
 mb->memctl = re->memctl;
+mb->match_limit_recursion = PRIV(default_match_context).recursion_limit;
 }
 else
 {
@@ -3228,7 +3229,10 @@ else
 mb->callout = mcontext->callout;
 mb->callout_data = mcontext->callout_data;
 mb->memctl = mcontext->memctl;
+mb->match_limit_recursion = mcontext->recursion_limit;
 }
+if (mb->match_limit_recursion > re->limit_recursion)
+mb->match_limit_recursion = re->limit_recursion;
 
 mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
 re->name_count * re->name_entry_size;
@@ -843,6 +843,7 @@ typedef struct dfa_match_block {
 PCRE2_SPTR last_used_ptr; /* Latest consulted character */
 const uint8_t *tables; /* Character tables */
 PCRE2_SIZE start_offset; /* The start offset value */
+uint32_t match_limit_recursion; /* As it says */
 uint32_t moptions; /* Match options */
 uint32_t poptions; /* Pattern options */
 uint32_t nltype; /* Newline type */
|
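The limit check added to internal_dfa_match() can be modelled in isolation. The following is a minimal sketch, not the PCRE2 source: `toy_match_block`, `effective_limit()`, `nested_call()` and `TOY_ERROR_RECURSIONLIMIT` are hypothetical names introduced for illustration. Only the `rlevel++ > limit` test, the clamping of the caller's limit to the pattern's limit, the upward propagation of negative return codes, and the error value -53 are taken from this commit. The sketch assumes the outermost call is made with `rlevel` equal to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PCRE2_ERROR_RECURSIONLIMIT; -53 is the value
   reported in this commit's expected test output. */
#define TOY_ERROR_RECURSIONLIMIT (-53)

/* Hypothetical, cut-down match block holding only the new field. */
typedef struct toy_match_block {
  uint32_t match_limit_recursion;
} toy_match_block;

/* The limit embedded in a pattern can lower the caller's limit but never
   raise it, so the effective limit is the smaller of the two. */
uint32_t effective_limit(uint32_t caller_limit, uint32_t pattern_limit)
{
  return (caller_limit > pattern_limit) ? pattern_limit : caller_limit;
}

/* Models a chain of nested internal calls. depth_needed is how many further
   levels this call wants to open. Returns the number of levels successfully
   opened below this one, or a negative error code. */
int nested_call(const toy_match_block *mb, uint32_t depth_needed,
                uint32_t rlevel)
{
  int rc;

  assert(mb != NULL);

  /* Same shape as the check in the diff: compare first, then increment. */
  if (rlevel++ > mb->match_limit_recursion) return TOY_ERROR_RECURSIONLIMIT;

  if (depth_needed == 0) return 0;   /* innermost level, nothing to open */

  rc = nested_call(mb, depth_needed - 1, rlevel);

  /* A hard error from an inner level is passed up unchanged. The real code
     additionally lets PCRE2_ERROR_NOMATCH through as an ordinary result. */
  if (rc < 0) return rc;

  return rc + 1;
}
```

Because the comparison happens before the increment and uses `>`, a limit of 3 allows the outermost call plus three nested levels: with `match_limit_recursion` set to 3, `nested_call(&mb, 3, 0)` returns 3, while `nested_call(&mb, 4, 0)` returns -53.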
@@ -4882,4 +4882,8 @@
 aaa\=dfa,allcaptures
 a\=dfa,allcaptures
 
+/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
+\= Expect recursion limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+
 # End of testinput6
@@ -7682,4 +7682,9 @@ No match
 ** Ignored after DFA matching: allcaptures
 0: a
 
+/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
+\= Expect recursion limit exceeded
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+Failed: error -53: recursion limit exceeded
+
 # End of testinput6