From 0a2033f0f75c4c6af6971e315a5dc1ad67820e9c Mon Sep 17 00:00:00 2001 From: "Philip.Hazel" Date: Wed, 18 Dec 2019 16:16:12 +0000 Subject: [PATCH] Remove atomic restriction on capture groups containing recursive back references, as since 10.30 it has been unnecessary. --- ChangeLog | 11 ++++ configure.ac | 6 +- doc/html/pcre2pattern.html | 10 ++-- doc/pcre2.txt | 70 +++++++++++----------- doc/pcre2pattern.3 | 12 ++-- src/pcre2_compile.c | 50 +--------------- src/pcre2_internal.h | 4 +- testdata/testinput1 | 7 +++ testdata/testinput2 | 20 +++---- testdata/testoutput1 | 14 +++++ testdata/testoutput2 | 78 ++++++++++++------------ testdata/testoutput8-16-2 | 118 +++++++++++++++++-------------------- testdata/testoutput8-32-2 | 118 +++++++++++++++++-------------------- testdata/testoutput8-8-2 | 118 +++++++++++++++++-------------------- 14 files changed, 300 insertions(+), 336 deletions(-) diff --git a/ChangeLog b/ChangeLog index 80d2ec5..4507cec 100644 --- a/ChangeLog +++ b/ChangeLog @@ -11,6 +11,17 @@ Version 10.35 3. A JIT bug is fixed which allowed to read the fields of the compiled pattern before its existence is checked. +4. Back in the PCRE1 day, capturing groups that contained recursive back +references to themselves were made atomic (version 8.01, change 18) because +after the end of a repeated group, the captured substrings had their values from +the final repetition, not from an earlier repetition that might be the +destination of a backtrack. This feature was documented, and was carried over +into PCRE2. However, it has now been realized that the major refactoring that +was done for 10.30 has made this atomicizing unnecessary, and it is confusing +when users are unaware of it, making some patterns appear not to be working as +expected. Capture values of recursive back references in repeated groups are +now correctly backtracked, so this unnecessary restriction has been removed. 
+ Version 10.34 21-November-2019 ------------------------------ diff --git a/configure.ac b/configure.ac index 30d4ddd..0a7cedb 100644 --- a/configure.ac +++ b/configure.ac @@ -9,9 +9,9 @@ dnl The PCRE2_PRERELEASE feature is for identifying release candidates. It might dnl be defined as -RC2, for example. For real releases, it should be empty. m4_define(pcre2_major, [10]) -m4_define(pcre2_minor, [34]) -m4_define(pcre2_prerelease, []) -m4_define(pcre2_date, [2019-11-21]) +m4_define(pcre2_minor, [35]) +m4_define(pcre2_prerelease, [-RC1]) +m4_define(pcre2_date, [2019-11-27]) # NOTE: The CMakeLists.txt file searches for the above variables in the first # 50 lines of this file. Please update that if the variables above are moved. diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html index 0aa2191..f365306 100644 --- a/doc/html/pcre2pattern.html +++ b/doc/html/pcre2pattern.html @@ -2349,11 +2349,11 @@ using alternation, as in the example above, or by a quantifier with a minimum of zero.

-Backreferences of this type cause the group that they reference to be treated -as an +For versions of PCRE2 less than 10.25, backreferences of this type used to +cause the group that they reference to be treated as an atomic group. -Once the whole group has been matched, a subsequent matching failure cannot -cause backtracking into the middle of the group. +This restriction no longer applies, and backtracking into such groups can occur +as normal.


ASSERTIONS

@@ -3833,7 +3833,7 @@ Cambridge, England.


REVISION

-Last updated: 29 July 2019 +Last updated: 18 December 2019
Copyright © 1997-2019 University of Cambridge.
diff --git a/doc/pcre2.txt b/doc/pcre2.txt index 948b91a..ed4b4e3 100644 --- a/doc/pcre2.txt +++ b/doc/pcre2.txt @@ -180,8 +180,8 @@ REVISION Last updated: 17 September 2018 Copyright (c) 1997-2018 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2API(3) Library Functions Manual PCRE2API(3) @@ -3724,8 +3724,8 @@ REVISION Last updated: 02 September 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2BUILD(3) Library Functions Manual PCRE2BUILD(3) @@ -4296,8 +4296,8 @@ REVISION Last updated: 03 March 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2CALLOUT(3) Library Functions Manual PCRE2CALLOUT(3) @@ -4726,8 +4726,8 @@ REVISION Last updated: 03 February 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2COMPAT(3) Library Functions Manual PCRE2COMPAT(3) @@ -4935,8 +4935,8 @@ REVISION Last updated: 13 July 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2JIT(3) Library Functions Manual PCRE2JIT(3) @@ -5360,8 +5360,8 @@ REVISION Last updated: 23 May 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2LIMITS(3) Library Functions Manual PCRE2LIMITS(3) @@ -5430,8 +5430,8 @@ REVISION Last updated: 02 February 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2MATCHING(3) Library Functions Manual PCRE2MATCHING(3) @@ -5654,8 +5654,8 @@ REVISION Last updated: 23 May 2019 Copyright (c) 1997-2019 University of Cambridge. 
------------------------------------------------------------------------------ - - + + PCRE2PARTIAL(3) Library Functions Manual PCRE2PARTIAL(3) @@ -6034,8 +6034,8 @@ REVISION Last updated: 04 September 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2PATTERN(3) Library Functions Manual PCRE2PATTERN(3) @@ -8078,10 +8078,10 @@ BACKREFERENCES the backreference. This can be done using alternation, as in the exam- ple above, or by a quantifier with a minimum of zero. - Backreferences of this type cause the group that they reference to be - treated as an atomic group. Once the whole group has been matched, a - subsequent matching failure cannot cause backtracking into the middle - of the group. + For versions of PCRE2 less than 10.25, backreferences of this type used + to cause the group that they reference to be treated as an atomic + group. This restriction no longer applies, and backtracking into such + groups can occur as normal. ASSERTIONS @@ -9463,11 +9463,11 @@ AUTHOR REVISION - Last updated: 29 July 2019 + Last updated: 18 December 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2PERFORM(3) Library Functions Manual PCRE2PERFORM(3) @@ -9701,8 +9701,8 @@ REVISION Last updated: 03 February 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2POSIX(3) Library Functions Manual PCRE2POSIX(3) @@ -10031,8 +10031,8 @@ REVISION Last updated: 30 January 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2SAMPLE(3) Library Functions Manual PCRE2SAMPLE(3) @@ -10310,8 +10310,8 @@ REVISION Last updated: 27 June 2018 Copyright (c) 1997-2018 University of Cambridge. 
------------------------------------------------------------------------------ - - + + PCRE2SYNTAX(3) Library Functions Manual PCRE2SYNTAX(3) @@ -10823,8 +10823,8 @@ REVISION Last updated: 29 July 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2UNICODE(3) Library Functions Manual PCRE2UNICODE(3) @@ -11256,5 +11256,5 @@ REVISION Last updated: 24 May 2019 Copyright (c) 1997-2019 University of Cambridge. ------------------------------------------------------------------------------ - - + + diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3 index d5228f3..dbf7634 100644 --- a/doc/pcre2pattern.3 +++ b/doc/pcre2pattern.3 @@ -1,4 +1,4 @@ -.TH PCRE2PATTERN 3 "29 July 2019" "PCRE2 10.34" +.TH PCRE2PATTERN 3 "18 December 2019" "PCRE2 10.35" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "PCRE2 REGULAR EXPRESSION DETAILS" @@ -2346,14 +2346,14 @@ the first iteration does not need to match the backreference. This can be done using alternation, as in the example above, or by a quantifier with a minimum of zero. .P -Backreferences of this type cause the group that they reference to be treated -as an +For versions of PCRE2 less than 10.25, backreferences of this type used to +cause the group that they reference to be treated as an .\" HTML .\" atomic group. .\" -Once the whole group has been matched, a subsequent matching failure cannot -cause backtracking into the middle of the group. +This restriction no longer applies, and backtracking into such groups can occur +as normal. . . .\" HTML @@ -3874,6 +3874,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 29 July 2019 +Last updated: 18 December 2019 Copyright (c) 1997-2019 University of Cambridge. 
.fi diff --git a/src/pcre2_compile.c b/src/pcre2_compile.c index f2e6b6b..8ad4583 100644 --- a/src/pcre2_compile.c +++ b/src/pcre2_compile.c @@ -6671,23 +6671,11 @@ for (;; pptr++) } /* For a back reference, update the back reference map and the - maximum back reference. Then, for each group, we must check to - see if it is recursive, that is, it is inside the group that it - references. A flag is set so that the group can be made atomic. - */ + maximum back reference. */ cb->backref_map |= (groupnumber < 32)? (1u << groupnumber) : 1; if (groupnumber > cb->top_backref) cb->top_backref = groupnumber; - - for (oc = cb->open_caps; oc != NULL; oc = oc->next) - { - if (oc->number == groupnumber) - { - oc->flag = TRUE; - break; - } - } } } @@ -7682,19 +7670,6 @@ for (;; pptr++) cb->backref_map |= (meta_arg < 32)? (1u << meta_arg) : 1; if (meta_arg > cb->top_backref) cb->top_backref = meta_arg; - - /* Check to see if this back reference is recursive, that it, it - is inside the group that it references. A flag is set so that the - group can be made atomic. */ - - for (oc = cb->open_caps; oc != NULL; oc = oc->next) - { - if (oc->number == meta_arg) - { - oc->flag = TRUE; - break; - } - } break; @@ -8035,7 +8010,6 @@ and skip over the pattern offset. */ lookbehind = *code == OP_ASSERTBACK || *code == OP_ASSERTBACK_NOT || *code == OP_ASSERTBACK_NA; - if (lookbehind) { lookbehindlength = META_DATA(pptr[-1]); @@ -8053,7 +8027,6 @@ if (*code == OP_CBRA) capnumber = GET2(code, 1 + LINK_SIZE); capitem.number = capnumber; capitem.next = cb->open_caps; - capitem.flag = FALSE; capitem.assert_depth = cb->assert_depth; cb->open_caps = &capitem; } @@ -8182,26 +8155,9 @@ for (;;) PUT(code, 1, (int)(code - start_bracket)); code += 1 + LINK_SIZE; - /* If it was a capturing subpattern, check to see if it contained any - recursive back references. If so, we must wrap it in atomic brackets. In - any event, remove the block from the chain. 
*/ + /* If it was a capturing subpattern, remove the block from the chain. */ - if (capnumber > 0) - { - if (cb->open_caps->flag) - { - (void)memmove(start_bracket + 1 + LINK_SIZE, start_bracket, - CU2BYTES(code - start_bracket)); - *start_bracket = OP_ONCE; - code += 1 + LINK_SIZE; - PUT(start_bracket, 1, (int)(code - start_bracket)); - *code = OP_KET; - PUT(code, 1, (int)(code - start_bracket)); - code += 1 + LINK_SIZE; - length += 2 + 2*LINK_SIZE; - } - cb->open_caps = cb->open_caps->next; - } + if (capnumber > 0) cb->open_caps = cb->open_caps->next; /* Set values to pass back */ diff --git a/src/pcre2_internal.h b/src/pcre2_internal.h index fe8ffe5..ac96d2d 100644 --- a/src/pcre2_internal.h +++ b/src/pcre2_internal.h @@ -1759,13 +1759,11 @@ typedef struct pcre2_memctl { /* Structure for building a chain of open capturing subpatterns during compiling, so that instructions to close them can be compiled when (*ACCEPT) is -encountered. This is also used to identify subpatterns that contain recursive -back references to themselves, so that they can be made atomic. */ +encountered. */ typedef struct open_capitem { struct open_capitem *next; /* Chain link */ uint16_t number; /* Capture number */ - uint16_t flag; /* Set TRUE if recursive back ref */ uint16_t assert_depth; /* Assertion depth when opened */ } open_capitem; diff --git a/testdata/testinput1 b/testdata/testinput1 index f5159d6..109de29 100644 --- a/testdata/testinput1 +++ b/testdata/testinput1 @@ -6386,4 +6386,11 @@ ef) x/x,mark /^(?a)(?()b)((?<=b).*)$/ abc +/^(a\1?){4}$/ + aaaa + aaaaaa + +/^((\1+)|\d)+133X$/ + 111133X + # End of testinput1 diff --git a/testdata/testinput2 b/testdata/testinput2 index 655e519..b700d9e 100644 --- a/testdata/testinput2 +++ b/testdata/testinput2 @@ -324,16 +324,7 @@ \= Expect no match fooabar -# This one is here because Perl behaves differently; see also the following. 
- -/^(a\1?){4}$/I -\= Expect no match - aaaa - aaaaaa - -# Perl does not fail these two for the final subjects. Neither did PCRE until -# release 8.01. The problem is in backtracking into a subpattern that contains -# a recursive reference to itself. PCRE has now made these into atomic patterns. +# Perl does not fail these two for the final subjects. /^(xa|=?\1a){2}$/ xa=xaa @@ -5772,4 +5763,13 @@ a)"xI /(a)?a/I manm +/^(?|(\*)(*napla:\S*_(\2?+.+))|(\w)(?=\S*_(\2?+\1)))+_\2$/ + *abc_12345abc + +/^(?|(\*)(*napla:\S*_(\3?+.+))|(\w)(?=\S*_((\2?+\1))))+_\2$/ + *abc_12345abc + +/^((\1+)(?C)|\d)+133X$/ + 111133X\=callout_capture + # End of testinput2 diff --git a/testdata/testoutput1 b/testdata/testoutput1 index ad2175b..c425ed4 100644 --- a/testdata/testoutput1 +++ b/testdata/testoutput1 @@ -10112,4 +10112,18 @@ No match 1: a 2: c +/^(a\1?){4}$/ + aaaa + 0: aaaa + 1: a + aaaaaa + 0: aaaaaa + 1: aa + +/^((\1+)|\d)+133X$/ + 111133X + 0: 111133X + 1: 11 + 2: 11 + # End of testinput1 diff --git a/testdata/testoutput2 b/testdata/testoutput2 index c733c12..df2f230 100644 --- a/testdata/testoutput2 +++ b/testdata/testoutput2 @@ -809,24 +809,7 @@ Subject length lower bound = 3 fooabar No match -# This one is here because Perl behaves differently; see also the following. - -/^(a\1?){4}$/I -Capture group count = 1 -Max back reference = 1 -Compile options: -Overall options: anchored -First code unit = 'a' -Subject length lower bound = 4 -\= Expect no match - aaaa -No match - aaaaaa -No match - -# Perl does not fail these two for the final subjects. Neither did PCRE until -# release 8.01. The problem is in backtracking into a subpattern that contains -# a recursive reference to itself. PCRE has now made these into atomic patterns. +# Perl does not fail these two for the final subjects. 
/^(xa|=?\1a){2}$/ xa=xaa @@ -10060,7 +10043,6 @@ No match ------------------------------------------------------------------ Bra ^ - Once CBra 1 ab CBra 2 @@ -10071,8 +10053,6 @@ No match Alt x Ket - Ket - Once CBra 1 ab CBra 2 @@ -10083,7 +10063,6 @@ No match Alt x Ket - Ket $ Ket End @@ -10479,27 +10458,23 @@ Failed: error 168 at offset 3: \c must be followed by a printable ASCII characte /(?P(?P=abn)xxx)/B ------------------------------------------------------------------ Bra - Once CBra 1 \1 xxx Ket Ket - Ket End ------------------------------------------------------------------ /(a\1z)/B ------------------------------------------------------------------ Bra - Once CBra 1 a \1 z Ket Ket - Ket End ------------------------------------------------------------------ @@ -11299,27 +11274,23 @@ No match /(?P(?P=abn)xxx)/B ------------------------------------------------------------------ Bra - Once CBra 1 \1 xxx Ket Ket - Ket End ------------------------------------------------------------------ /(a\1z)/B ------------------------------------------------------------------ Bra - Once CBra 1 a \1 z Ket Ket - Ket End ------------------------------------------------------------------ @@ -13319,7 +13290,6 @@ Failed: error 144 at offset 5: subpattern name must start with a non-digit Bra Brazero SCBra 1 - Once CBra 2 CBra 3 a @@ -13331,7 +13301,6 @@ Failed: error 144 at offset 5: subpattern name must start with a non-digit Ket Recurse Ket - Ket KetRmax a?+ Ket @@ -13999,7 +13968,6 @@ Matched, but too many substrings /((?+1)(\1))/B ------------------------------------------------------------------ Bra - Once CBra 1 Recurse CBra 2 @@ -14007,7 +13975,6 @@ Matched, but too many substrings Ket Ket Ket - Ket End ------------------------------------------------------------------ @@ -14425,7 +14392,6 @@ Subject length lower bound = 1 ------------------------------------------------------------------ Bra Any - Once CBra 1 Recurse Recurse @@ -14434,7 +14400,6 @@ Subject length lower 
bound = 1 Alt $ Ket - Ket CBra 2 Ket Ket @@ -14445,7 +14410,6 @@ Subject length lower bound = 1 ------------------------------------------------------------------ Bra Any - Once CBra 1 Recurse Recurse @@ -14457,7 +14421,6 @@ Subject length lower bound = 1 Alt $ Ket - Ket CBra 3 Ket Ket @@ -17435,6 +17398,45 @@ Subject length lower bound = 1 manm 0: a +/^(?|(\*)(*napla:\S*_(\2?+.+))|(\w)(?=\S*_(\2?+\1)))+_\2$/ + *abc_12345abc + 0: *abc_12345abc + 1: c + 2: 12345abc + +/^(?|(\*)(*napla:\S*_(\3?+.+))|(\w)(?=\S*_((\2?+\1))))+_\2$/ + *abc_12345abc + 0: *abc_12345abc + 1: c + 2: 12345abc + 3: 12345abc + +/^((\1+)(?C)|\d)+133X$/ + 111133X\=callout_capture +Callout 0: last capture = 2 + 1: 1 + 2: 111 +--->111133X + ^ ^ | +Callout 0: last capture = 2 + 1: 3 + 2: 3 +--->111133X + ^ ^ | +Callout 0: last capture = 2 + 1: 1 + 2: 11 +--->111133X + ^ ^ | +Callout 0: last capture = 2 + 1: 3 + 2: 3 +--->111133X + ^ ^ | + 0: 111133X + 1: 11 + 2: 11 + # End of testinput2 Error -70: PCRE2_ERROR_BADDATA (unknown error number) Error -62: bad serialized data diff --git a/testdata/testoutput8-16-2 b/testdata/testoutput8-16-2 index ff3474b..569a860 100644 --- a/testdata/testoutput8-16-2 +++ b/testdata/testoutput8-16-2 @@ -720,41 +720,37 @@ Memory allocation (code space): 14 /(((a\2)|(a*)\g<-1>))*a?/ ------------------------------------------------------------------ - 0 39 Bra + 0 35 Bra 2 Brazero - 3 32 SCBra 1 - 6 27 Once - 8 12 CBra 2 - 11 7 CBra 3 - 14 a - 16 \2 - 18 7 Ket - 20 11 Alt - 22 5 CBra 4 - 25 a* - 27 5 Ket - 29 22 Recurse - 31 23 Ket - 33 27 Ket - 35 32 KetRmax - 37 a?+ - 39 39 Ket - 41 End + 3 28 SCBra 1 + 6 12 CBra 2 + 9 7 CBra 3 + 12 a + 14 \2 + 16 7 Ket + 18 11 Alt + 20 5 CBra 4 + 23 a* + 25 5 Ket + 27 20 Recurse + 29 23 Ket + 31 28 KetRmax + 33 a?+ + 35 35 Ket + 37 End ------------------------------------------------------------------ /((?+1)(\1))/ ------------------------------------------------------------------ - 0 20 Bra - 2 16 Once - 4 12 CBra 1 - 7 9 Recurse - 9 5 
CBra 2 - 12 \1 - 14 5 Ket - 16 12 Ket - 18 16 Ket - 20 20 Ket - 22 End + 0 16 Bra + 2 12 CBra 1 + 5 7 Recurse + 7 5 CBra 2 + 10 \1 + 12 5 Ket + 14 12 Ket + 16 16 Ket + 18 End ------------------------------------------------------------------ "(?1)(?#?'){2}(a)" @@ -771,45 +767,41 @@ Memory allocation (code space): 14 /.((?2)(?R)|\1|$)()/ ------------------------------------------------------------------ - 0 28 Bra + 0 24 Bra 2 Any - 3 18 Once - 5 7 CBra 1 - 8 23 Recurse - 10 0 Recurse - 12 4 Alt - 14 \1 - 16 3 Alt - 18 $ - 19 14 Ket - 21 18 Ket - 23 3 CBra 2 - 26 3 Ket - 28 28 Ket - 30 End + 3 7 CBra 1 + 6 19 Recurse + 8 0 Recurse + 10 4 Alt + 12 \1 + 14 3 Alt + 16 $ + 17 14 Ket + 19 3 CBra 2 + 22 3 Ket + 24 24 Ket + 26 End ------------------------------------------------------------------ /.((?3)(?R)()(?2)|\1|$)()/ ------------------------------------------------------------------ - 0 35 Bra + 0 31 Bra 2 Any - 3 25 Once - 5 14 CBra 1 - 8 30 Recurse - 10 0 Recurse - 12 3 CBra 2 - 15 3 Ket - 17 12 Recurse - 19 4 Alt - 21 \1 - 23 3 Alt - 25 $ - 26 21 Ket - 28 25 Ket - 30 3 CBra 3 - 33 3 Ket - 35 35 Ket - 37 End + 3 14 CBra 1 + 6 26 Recurse + 8 0 Recurse + 10 3 CBra 2 + 13 3 Ket + 15 10 Recurse + 17 4 Alt + 19 \1 + 21 3 Alt + 23 $ + 24 21 Ket + 26 3 CBra 3 + 29 3 Ket + 31 31 Ket + 33 End ------------------------------------------------------------------ /(?1)()((((((\1++))\x85)+)|))/ diff --git a/testdata/testoutput8-32-2 b/testdata/testoutput8-32-2 index 7d1c931..91d96c9 100644 --- a/testdata/testoutput8-32-2 +++ b/testdata/testoutput8-32-2 @@ -720,41 +720,37 @@ Memory allocation (code space): 28 /(((a\2)|(a*)\g<-1>))*a?/ ------------------------------------------------------------------ - 0 39 Bra + 0 35 Bra 2 Brazero - 3 32 SCBra 1 - 6 27 Once - 8 12 CBra 2 - 11 7 CBra 3 - 14 a - 16 \2 - 18 7 Ket - 20 11 Alt - 22 5 CBra 4 - 25 a* - 27 5 Ket - 29 22 Recurse - 31 23 Ket - 33 27 Ket - 35 32 KetRmax - 37 a?+ - 39 39 Ket - 41 End + 3 28 SCBra 1 + 6 12 CBra 2 + 9 7 CBra 3 
+ 12 a + 14 \2 + 16 7 Ket + 18 11 Alt + 20 5 CBra 4 + 23 a* + 25 5 Ket + 27 20 Recurse + 29 23 Ket + 31 28 KetRmax + 33 a?+ + 35 35 Ket + 37 End ------------------------------------------------------------------ /((?+1)(\1))/ ------------------------------------------------------------------ - 0 20 Bra - 2 16 Once - 4 12 CBra 1 - 7 9 Recurse - 9 5 CBra 2 - 12 \1 - 14 5 Ket - 16 12 Ket - 18 16 Ket - 20 20 Ket - 22 End + 0 16 Bra + 2 12 CBra 1 + 5 7 Recurse + 7 5 CBra 2 + 10 \1 + 12 5 Ket + 14 12 Ket + 16 16 Ket + 18 End ------------------------------------------------------------------ "(?1)(?#?'){2}(a)" @@ -771,45 +767,41 @@ Memory allocation (code space): 28 /.((?2)(?R)|\1|$)()/ ------------------------------------------------------------------ - 0 28 Bra + 0 24 Bra 2 Any - 3 18 Once - 5 7 CBra 1 - 8 23 Recurse - 10 0 Recurse - 12 4 Alt - 14 \1 - 16 3 Alt - 18 $ - 19 14 Ket - 21 18 Ket - 23 3 CBra 2 - 26 3 Ket - 28 28 Ket - 30 End + 3 7 CBra 1 + 6 19 Recurse + 8 0 Recurse + 10 4 Alt + 12 \1 + 14 3 Alt + 16 $ + 17 14 Ket + 19 3 CBra 2 + 22 3 Ket + 24 24 Ket + 26 End ------------------------------------------------------------------ /.((?3)(?R)()(?2)|\1|$)()/ ------------------------------------------------------------------ - 0 35 Bra + 0 31 Bra 2 Any - 3 25 Once - 5 14 CBra 1 - 8 30 Recurse - 10 0 Recurse - 12 3 CBra 2 - 15 3 Ket - 17 12 Recurse - 19 4 Alt - 21 \1 - 23 3 Alt - 25 $ - 26 21 Ket - 28 25 Ket - 30 3 CBra 3 - 33 3 Ket - 35 35 Ket - 37 End + 3 14 CBra 1 + 6 26 Recurse + 8 0 Recurse + 10 3 CBra 2 + 13 3 Ket + 15 10 Recurse + 17 4 Alt + 19 \1 + 21 3 Alt + 23 $ + 24 21 Ket + 26 3 CBra 3 + 29 3 Ket + 31 31 Ket + 33 End ------------------------------------------------------------------ /(?1)()((((((\1++))\x85)+)|))/ diff --git a/testdata/testoutput8-8-2 b/testdata/testoutput8-8-2 index 4c4e6a8..8393d5c 100644 --- a/testdata/testoutput8-8-2 +++ b/testdata/testoutput8-8-2 @@ -720,41 +720,37 @@ Memory allocation (code space): 10 /(((a\2)|(a*)\g<-1>))*a?/ 
------------------------------------------------------------------ - 0 57 Bra + 0 51 Bra 3 Brazero - 4 48 SCBra 1 - 9 40 Once - 12 18 CBra 2 - 17 10 CBra 3 - 22 a - 24 \2 - 27 10 Ket - 30 16 Alt - 33 7 CBra 4 - 38 a* - 40 7 Ket - 43 33 Recurse - 46 34 Ket - 49 40 Ket - 52 48 KetRmax - 55 a?+ - 57 57 Ket - 60 End + 4 42 SCBra 1 + 9 18 CBra 2 + 14 10 CBra 3 + 19 a + 21 \2 + 24 10 Ket + 27 16 Alt + 30 7 CBra 4 + 35 a* + 37 7 Ket + 40 30 Recurse + 43 34 Ket + 46 42 KetRmax + 49 a?+ + 51 51 Ket + 54 End ------------------------------------------------------------------ /((?+1)(\1))/ ------------------------------------------------------------------ - 0 31 Bra - 3 25 Once - 6 19 CBra 1 - 11 14 Recurse - 14 8 CBra 2 - 19 \1 - 22 8 Ket - 25 19 Ket - 28 25 Ket - 31 31 Ket - 34 End + 0 25 Bra + 3 19 CBra 1 + 8 11 Recurse + 11 8 CBra 2 + 16 \1 + 19 8 Ket + 22 19 Ket + 25 25 Ket + 28 End ------------------------------------------------------------------ "(?1)(?#?'){2}(a)" @@ -771,45 +767,41 @@ Memory allocation (code space): 10 /.((?2)(?R)|\1|$)()/ ------------------------------------------------------------------ - 0 42 Bra + 0 36 Bra 3 Any - 4 27 Once - 7 11 CBra 1 - 12 34 Recurse - 15 0 Recurse - 18 6 Alt - 21 \1 - 24 4 Alt - 27 $ - 28 21 Ket - 31 27 Ket - 34 5 CBra 2 - 39 5 Ket - 42 42 Ket - 45 End + 4 11 CBra 1 + 9 28 Recurse + 12 0 Recurse + 15 6 Alt + 18 \1 + 21 4 Alt + 24 $ + 25 21 Ket + 28 5 CBra 2 + 33 5 Ket + 36 36 Ket + 39 End ------------------------------------------------------------------ /.((?3)(?R)()(?2)|\1|$)()/ ------------------------------------------------------------------ - 0 53 Bra + 0 47 Bra 3 Any - 4 38 Once - 7 22 CBra 1 - 12 45 Recurse - 15 0 Recurse - 18 5 CBra 2 - 23 5 Ket - 26 18 Recurse - 29 6 Alt - 32 \1 - 35 4 Alt - 38 $ - 39 32 Ket - 42 38 Ket - 45 5 CBra 3 - 50 5 Ket - 53 53 Ket - 56 End + 4 22 CBra 1 + 9 39 Recurse + 12 0 Recurse + 15 5 CBra 2 + 20 5 Ket + 23 15 Recurse + 26 6 Alt + 29 \1 + 32 4 Alt + 35 $ + 36 32 Ket + 39 5 CBra 3 + 44 5 
Ket + 47 47 Ket + 50 End ------------------------------------------------------------------ /(?1)()((((((\1++))\x85)+)|))/