From 1f87b60f015fca7cddb7abaeacc34852540d2361 Mon Sep 17 00:00:00 2001 From: "Philip.Hazel" Date: Fri, 23 Dec 2016 11:04:51 +0000 Subject: [PATCH] Make the recursion limit apply to DFA matching. --- ChangeLog | 4 ++++ doc/pcre2_dfa_match.3 | 6 +++--- doc/pcre2api.3 | 20 +++++++++++--------- doc/pcre2pattern.3 | 9 +++++++-- doc/pcre2stack.3 | 24 +++++++++++++++++------- doc/pcre2syntax.3 | 11 ++++++----- src/pcre2_dfa_match.c | 12 ++++++++---- src/pcre2_intmodedep.h | 1 + testdata/testinput6 | 4 ++++ testdata/testoutput6 | 5 +++++ 10 files changed, 66 insertions(+), 30 deletions(-) diff --git a/ChangeLog b/ChangeLog index c24adef..d5689cb 100644 --- a/ChangeLog +++ b/ChangeLog @@ -233,6 +233,10 @@ too many nested or recursive back references. If the limit was reached in certain recursive cases it failed to be triggered and an internal error could be the result. +36. The pcre2_dfa_match() function now takes note of the recursion limit for +the internal recursive calls that are used for lookrounds and recursions within +the pattern. + Version 10.22 29-July-2016 -------------------------- diff --git a/doc/pcre2_dfa_match.3 b/doc/pcre2_dfa_match.3 index f45da0d..d2132d5 100644 --- a/doc/pcre2_dfa_match.3 +++ b/doc/pcre2_dfa_match.3 @@ -1,4 +1,4 @@ -.TH PCRE2_DFA_MATCH 3 "12 May 2013" "PCRE2 10.00" +.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH SYNOPSIS @@ -33,8 +33,8 @@ is \fBpcre2_match()\fP.) The arguments for this function are: \fIwscount\fP Number of elements in the vector .sp For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set -up a callout function. The \fIlength\fP and \fIstartoffset\fP values are code -units, not characters. The options are: +up a callout function or specify the recursion limit. The \fIlength\fP and +\fIstartoffset\fP values are code units, not characters. 
The options are: .sp PCRE2_ANCHORED Match only at the first position PCRE2_NOTBOL Subject is not the beginning of a line diff --git a/doc/pcre2api.3 b/doc/pcre2api.3 index 6baf88e..fc434fd 100644 --- a/doc/pcre2api.3 +++ b/doc/pcre2api.3 @@ -1,4 +1,4 @@ -.TH PCRE2API 3 "22 November 2016" "PCRE2 10.23" +.TH PCRE2API 3 "24 December 2016" "PCRE2 10.23" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .sp @@ -840,20 +840,22 @@ This limit is of use only if it is set smaller than \fImatch_limit\fP. Limiting the recursion depth limits the amount of system stack that can be used, or, when PCRE2 has been compiled to use memory on the heap instead of the stack, the amount of heap memory that can be used. This limit is not relevant, -and is ignored, when matching is done using JIT compiled code or by the -\fBpcre2_dfa_match()\fP function. +and is ignored, when matching is done using JIT compiled code. However, it is +supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less +frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of +stack by a recursive pattern such as /(.)(?1)/ matched to a very long string. .P The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the default default is the same value as the default for \fImatch_limit\fP. If the -limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_RECURSIONLIMIT. A -value for the recursion limit may also be supplied by an item at the start of a -pattern of the form +limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return +PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be +supplied by an item at the start of a pattern of the form .sp (*LIMIT_RECURSION=ddd) .sp where ddd is a decimal number. However, such a setting is ignored unless ddd is -less than the limit set by the caller of \fBpcre2_match()\fP or, if no such -limit is set, less than the default. 
+less than the limit set by the caller of \fBpcre2_match()\fP or +\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default. .sp .nf .B int pcre2_set_recursion_memory_management( @@ -3319,6 +3321,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 22 November 2016 +Last updated: 23 December 2016 Copyright (c) 1997-2016 University of Cambridge. .fi diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3 index eaae8ef..33e5698 100644 --- a/doc/pcre2pattern.3 +++ b/doc/pcre2pattern.3 @@ -1,4 +1,4 @@ -.TH PCRE2PATTERN 3 "23 October 2016" "PCRE2 10.23" +.TH PCRE2PATTERN 3 "23 December 2016" "PCRE2 10.23" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "PCRE2 REGULAR EXPRESSION DETAILS" @@ -158,6 +158,11 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP for it to have any effect. In other words, the pattern writer can lower the limits set by the programmer, but not raise them. If there is more than one setting of one of these limits, the lower value is used. +.P +The match limit is used (but in a different way) when JIT is being used, but it +is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP. +However, the recursion limit is relevant for DFA matching, which does use some +function recursion, in particular, for recursions within the pattern. . . .\" HTML @@ -3477,6 +3482,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 23 October 2016 +Last updated: 23 December 2016 Copyright (c) 1997-2016 University of Cambridge. .fi diff --git a/doc/pcre2stack.3 b/doc/pcre2stack.3 index 8711263..4c3d4f0 100644 --- a/doc/pcre2stack.3 +++ b/doc/pcre2stack.3 @@ -1,4 +1,4 @@ -.TH PCRE2STACK 3 "21 November 2014" "PCRE2 10.00" +.TH PCRE2STACK 3 "23 December 2016" "PCRE2 10.23" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "PCRE2 DISCUSSION OF STACK USAGE" @@ -43,11 +43,12 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls. 
Normally, these are never very deep, and the limit on the complexity of \fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given. However, it is possible to write patterns with runaway infinite recursions; -such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack. At -present, there is no protection against this. +such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a +limit is applied (see below). .P -The comments that follow do NOT apply to \fBpcre2_dfa_match()\fP; they are -relevant only for \fBpcre2_match()\fP without the JIT optimization. +The comments in the next three sections do not apply to +\fBpcre2_dfa_match()\fP; they are relevant only for \fBpcre2_match()\fP without +the JIT optimization. . . .SS "Reducing \fBpcre2_match()\fP's stack usage" @@ -147,6 +148,15 @@ pattern to match. This is done by calling \fBpcre2_match()\fP repeatedly with different limits. . . +.SS "Limiting \fBpcre2_dfa_match()\fP's stack usage" +.rs +.sp +The recursion limit, as described above for \fBpcre2_match()\fP, also applies +to \fBpcre2_dfa_match()\fP, whose use of recursive function calls for +recursions in the pattern can lead to runaway stack usage. The non-recursive +match limit is not relevant for DFA matching, and is ignored. +. +. .SS "Changing stack size in Unix-like systems" .rs .sp @@ -197,6 +207,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 21 November 2014 -Copyright (c) 1997-2014 University of Cambridge. +Last updated: 23 December 2016 +Copyright (c) 1997-2016 University of Cambridge. .fi diff --git a/doc/pcre2syntax.3 b/doc/pcre2syntax.3 index cb149a1..451736e 100644 --- a/doc/pcre2syntax.3 +++ b/doc/pcre2syntax.3 @@ -1,4 +1,4 @@ -.TH PCRE2SYNTAX 3 "28 September 2016" "PCRE2 10.23" +.TH PCRE2SYNTAX 3 "23 December 2016" "PCRE2 10.23" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" @@ -428,9 +428,10 @@ appear. 
(*UCP) set PCRE2_UCP (use Unicode properties for \ed etc) .sp Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the -limits set by the caller of pcre2_match(), not increase them. The application -can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or -PCRE2_NEVER_UCP options, respectively, at compile time. +limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, not +increase them. The application can lock out the use of (*UTF) and (*UCP) by +setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at +compile time. . . .SH "NEWLINE CONVENTION" @@ -584,6 +585,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 28 September 2016 +Last updated: 23 December 2016 Copyright (c) 1997-2016 University of Cambridge. .fi diff --git a/src/pcre2_dfa_match.c b/src/pcre2_dfa_match.c index 347c0ac..962023d 100644 --- a/src/pcre2_dfa_match.c +++ b/src/pcre2_dfa_match.c @@ -371,7 +371,7 @@ internal_dfa_match( uint32_t offsetcount, int *workspace, int wscount, - int rlevel) + uint32_t rlevel) { stateblock *active_states, *new_states, *temp_states; stateblock *next_active_state, *next_new_state; @@ -400,7 +400,7 @@ BOOL utf = FALSE; BOOL reset_could_continue = FALSE; -rlevel++; +if (rlevel++ > mb->match_limit_recursion) return PCRE2_ERROR_RECURSIONLIMIT; offsetcount &= (uint32_t)(-2); /* Round down */ wscount -= 2; @@ -2591,7 +2591,7 @@ for (;;) sizeof(local_workspace)/sizeof(int), /* size of same */ rlevel); /* function recursion level */ - if (rc == PCRE2_ERROR_DFA_UITEM) return rc; + if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc; if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK)) { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); } } @@ -2710,7 +2710,7 @@ for (;;) sizeof(local_workspace)/sizeof(int), /* size of same */ rlevel); /* function recursion level */ - if (rc == PCRE2_ERROR_DFA_UITEM) return rc; + if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc; if ((rc >= 
0) == (condcode == OP_ASSERT || condcode == OP_ASSERTBACK)) { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); } @@ -3216,6 +3216,7 @@ if (mcontext == NULL) { mb->callout = NULL; mb->memctl = re->memctl; + mb->match_limit_recursion = PRIV(default_match_context).recursion_limit; } else { @@ -3228,7 +3229,10 @@ else mb->callout = mcontext->callout; mb->callout_data = mcontext->callout_data; mb->memctl = mcontext->memctl; + mb->match_limit_recursion = mcontext->recursion_limit; } +if (mb->match_limit_recursion > re->limit_recursion) + mb->match_limit_recursion = re->limit_recursion; mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) + re->name_count * re->name_entry_size; diff --git a/src/pcre2_intmodedep.h b/src/pcre2_intmodedep.h index 843204b..61590cd 100644 --- a/src/pcre2_intmodedep.h +++ b/src/pcre2_intmodedep.h @@ -843,6 +843,7 @@ typedef struct dfa_match_block { PCRE2_SPTR last_used_ptr; /* Latest consulted character */ const uint8_t *tables; /* Character tables */ PCRE2_SIZE start_offset; /* The start offset value */ + uint32_t match_limit_recursion; /* As it says */ uint32_t moptions; /* Match options */ uint32_t poptions; /* Pattern options */ uint32_t nltype; /* Newline type */ diff --git a/testdata/testinput6 b/testdata/testinput6 index 3f058b5..c74554b 100644 --- a/testdata/testinput6 +++ b/testdata/testinput6 @@ -4882,4 +4882,8 @@ aaa\=dfa,allcaptures a\=dfa,allcaptures +/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/ +\= Expect recursion limit exceeded + 
a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + # End of testinput6 diff --git a/testdata/testoutput6 b/testdata/testoutput6 index 4f71446..b3b7779 100644 --- a/testdata/testoutput6 +++ b/testdata/testoutput6 @@ -7682,4 +7682,9 @@ No match ** Ignored after DFA matching: allcaptures 0: a +/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/ +\= Expect recursion limit exceeded + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] +Failed: error -53: recursion limit 
exceeded + # End of testinput6
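
Note on the guard added to internal_dfa_match(): the patch checks the function recursion level against mb->match_limit_recursion on entry, and callers now pass up any negative result other than a plain no-match. The sketch below models that behaviour outside PCRE2. It is not the library's code: nested_match(), effective_limit() and the SKETCH_* constants are hypothetical names chosen for illustration, and the toy matcher only imitates a pattern such as /^((.)(?1)|.)$/ by making one nested call per remaining character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCRE2 error codes. -53 is the value shown in
   the expected test output in this patch; -1 stands in for "no match". */
enum { SKETCH_NOMATCH = -1, SKETCH_RECURSIONLIMIT = -53 };

/* The pattern's (*LIMIT_RECURSION=ddd) value can lower the caller's limit
   but never raise it, mirroring the clamp added in pcre2_dfa_match(). */
uint32_t effective_limit(uint32_t caller_limit, uint32_t pattern_limit)
{
  return (caller_limit > pattern_limit) ? pattern_limit : caller_limit;
}

/* Toy recursive matcher: one nested call per remaining character.
   The first statement has the same shape as the guard in the patch. */
int nested_match(size_t len, uint32_t limit, uint32_t rlevel)
{
  int rc;

  if (rlevel++ > limit) return SKETCH_RECURSIONLIMIT;
  if (len == 0) return SKETCH_NOMATCH;   /* nothing left to match */
  if (len == 1) return 1;                /* final character matches */

  rc = nested_match(len - 1, limit, rlevel);

  /* Propagate hard errors (such as the limit being hit) unchanged;
     only a plain no-match is handled locally. */
  if (rc < 0 && rc != SKETCH_NOMATCH) return rc;
  return (rc >= 0) ? 1 : SKETCH_NOMATCH;
}
```

In this model, nested_match(4, 3, 0) succeeds, while nested_match(5, 3, 0) returns SKETCH_RECURSIONLIMIT, because the fifth nested call sees a level greater than the limit. The new test case in testdata/testinput6 exercises the same situation with a limit of 600 and a long subject.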