From 7db5904b9ff6ec51856e6b3749e9258413535c0c Mon Sep 17 00:00:00 2001 From: "Philip.Hazel" Date: Thu, 12 Jul 2018 17:04:43 +0000 Subject: [PATCH] Documentation and tests update and minor tweak to perltest.sh. --- doc/html/pcre2pattern.html | 69 ++++++++++++++++++++-------- doc/pcre2.txt | 93 +++++++++++++++++++++++++------------- doc/pcre2pattern.3 | 69 ++++++++++++++++++++-------- perltest.sh | 10 ++-- testdata/testinput1 | 9 ++++ testdata/testoutput1 | 15 ++++++ 6 files changed, 192 insertions(+), 73 deletions(-) diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html index f68499d..4b95340 100644 --- a/doc/html/pcre2pattern.html +++ b/doc/html/pcre2pattern.html @@ -3227,13 +3227,13 @@ Verbs that act after backtracking

The following verbs do nothing when they are encountered. Matching continues -with what follows, but if there is no subsequent match, causing a backtrack to -the verb, a failure is forced. That is, backtracking cannot pass to the left of -the verb. However, when one of these verbs appears inside an atomic group or in -an assertion that is true, its effect is confined to that group, because once -the group has been matched, there is never any backtracking into it. In this -situation, backtracking has to jump to the left of the entire atomic group or -assertion. +with what follows, but if there is a subsequent match failure, causing a +backtrack to the verb, a failure is forced. That is, backtracking cannot pass +to the left of the verb. However, when one of these verbs appears inside an +atomic group or in a lookaround assertion that is true, its effect is confined +to that group, because once the group has been matched, there is never any +backtracking into it. Backtracking from beyond an assertion or an atomic group +ignores the entire group, and seeks a preceding backtracking point.

These verbs differ in exactly what kind of failure occurs when backtracking @@ -3321,12 +3321,37 @@ instead of skipping on to "c".

   (*SKIP:NAME)
 
-When (*SKIP) has an associated name, its behaviour is modified. When it is -triggered, the previous path through the pattern is searched for the most -recent (*MARK) that has the same name. If one is found, the "bumpalong" advance -is to the subject position that corresponds to that (*MARK) instead of to where -(*SKIP) was encountered. If no (*MARK) with a matching name is found, the -(*SKIP) is ignored. +When (*SKIP) has an associated name, its behaviour is modified. When such a +(*SKIP) is triggered, the previous path through the pattern is searched for the +most recent (*MARK) that has the same name. If one is found, the "bumpalong" +advance is to the subject position that corresponds to that (*MARK) instead of +to where (*SKIP) was encountered. If no (*MARK) with a matching name is found, +the (*SKIP) is ignored. +

+

+The search for a (*MARK) name uses the normal backtracking mechanism, which +means that it does not see (*MARK) settings that are inside atomic groups or +assertions, because they are never re-entered by backtracking. Compare the +following pcre2test examples: +

+    re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+  data: abc
+   0: a
+   1: a
+  data:
+    re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+  data: abc
+   0: b
+   1: b
+
+In the first example, the (*MARK) setting is in an atomic group, so it is not +seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows +the second branch of the pattern to be tried at the first character position. +In the second example, the (*MARK) setting is not in an atomic group. This +allows (*SKIP:X) to immediately cause a new matching attempt to start at the +second character. This time, the (*MARK) is never seen because "a" does not +match "b", so the matcher immediately jumps to the second branch of the +pattern.

Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores @@ -3456,6 +3481,14 @@ a positive assertion and false for a negative one; captured substrings are retained in both cases.

+The remaining verbs act only when a later failure causes a backtrack to +reach them. This means that their effect is confined to the assertion, +because lookaround assertions are atomic. A backtrack that occurs after an +assertion is complete does not jump back into the assertion. Note in particular +that a (*MARK) name that is set in an assertion is not "seen" by an instance of +(*SKIP:NAME) later in the pattern. +

+

The effect of (*THEN) is not allowed to escape beyond an assertion. If there are no more branches to try, (*THEN) causes a positive assertion to be false, and a negative assertion to be true. @@ -3463,10 +3496,10 @@ and a negative assertion to be true.

The other backtracking verbs are not treated specially if they appear in a standalone positive assertion. In a conditional positive assertion, -backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be -false. However, for both standalone and conditional negative assertions, -backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be -true, without considering any further alternative branches. +backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE) +causes the condition to be false. However, for both standalone and conditional +negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes +the assertion to be true, without considering any further alternative branches.


Backtracking verbs in subroutines @@ -3509,7 +3542,7 @@ Cambridge, England.


REVISION

-Last updated: 10 July 2018 +Last updated: 11 July 2018
Copyright © 1997-2018 University of Cambridge.
diff --git a/doc/pcre2.txt b/doc/pcre2.txt index 449c3cb..8df2f9b 100644 --- a/doc/pcre2.txt +++ b/doc/pcre2.txt @@ -8695,14 +8695,14 @@ BACKTRACKING CONTROL Verbs that act after backtracking The following verbs do nothing when they are encountered. Matching con- - tinues with what follows, but if there is no subsequent match, causing - a backtrack to the verb, a failure is forced. That is, backtracking - cannot pass to the left of the verb. However, when one of these verbs - appears inside an atomic group or in an assertion that is true, its - effect is confined to that group, because once the group has been - matched, there is never any backtracking into it. In this situation, - backtracking has to jump to the left of the entire atomic group or - assertion. + tinues with what follows, but if there is a subsequent match failure, + causing a backtrack to the verb, a failure is forced. That is, back- + tracking cannot pass to the left of the verb. However, when one of + these verbs appears inside an atomic group or in a lookaround assertion + that is true, its effect is confined to that group, because once the + group has been matched, there is never any backtracking into it. Back- + tracking from beyond an assertion or an atomic group ignores the entire + group, and seeks a preceding backtracking point. These verbs differ in exactly what kind of failure occurs when back- tracking reaches them. The behaviour described below is what happens @@ -8790,12 +8790,36 @@ BACKTRACKING CONTROL (*SKIP:NAME) - When (*SKIP) has an associated name, its behaviour is modified. When it - is triggered, the previous path through the pattern is searched for the - most recent (*MARK) that has the same name. If one is found, the - "bumpalong" advance is to the subject position that corresponds to that - (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with - a matching name is found, the (*SKIP) is ignored. 
+ When (*SKIP) has an associated name, its behaviour is modified. When + such a (*SKIP) is triggered, the previous path through the pattern is + searched for the most recent (*MARK) that has the same name. If one is + found, the "bumpalong" advance is to the subject position that corre- + sponds to that (*MARK) instead of to where (*SKIP) was encountered. If + no (*MARK) with a matching name is found, the (*SKIP) is ignored. + + The search for a (*MARK) name uses the normal backtracking mechanism, + which means that it does not see (*MARK) settings that are inside + atomic groups or assertions, because they are never re-entered by back- + tracking. Compare the following pcre2test examples: + + re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/ + data: abc + 0: a + 1: a + data: + re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/ + data: abc + 0: b + 1: b + + In the first example, the (*MARK) setting is in an atomic group, so it + is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. + This allows the second branch of the pattern to be tried at the first + character position. In the second example, the (*MARK) setting is not + in an atomic group. This allows (*SKIP:X) to immediately cause a new + matching attempt to start at the second character. This time, the + (*MARK) is never seen because "a" does not match "b", so the matcher + immediately jumps to the second branch of the pattern. Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME). @@ -8915,41 +8939,48 @@ BACKTRACKING CONTROL true for a positive assertion and false for a negative one; captured substrings are retained in both cases. - The effect of (*THEN) is not allowed to escape beyond an assertion. If - there are no more branches to try, (*THEN) causes a positive assertion + The remaining verbs act only when a later failure causes a backtrack to + reach them. 
This means that their effect is confined to the assertion, + because lookaround assertions are atomic. A backtrack that occurs after + an assertion is complete does not jump back into the assertion. Note in + particular that a (*MARK) name that is set in an assertion is not + "seen" by an instance of (*SKIP:NAME) later in the pattern. + + The effect of (*THEN) is not allowed to escape beyond an assertion. If + there are no more branches to try, (*THEN) causes a positive assertion to be false, and a negative assertion to be true. - The other backtracking verbs are not treated specially if they appear - in a standalone positive assertion. In a conditional positive asser- - tion, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the con- - dition to be false. However, for both standalone and conditional nega- - tive assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) - causes the assertion to be true, without considering any further alter- - native branches. + The other backtracking verbs are not treated specially if they appear + in a standalone positive assertion. In a conditional positive asser- + tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP), + or (*PRUNE) causes the condition to be false. However, for both stand- + alone and conditional negative assertions, backtracking into (*COMMIT), + (*SKIP), or (*PRUNE) causes the assertion to be true, without consider- + ing any further alternative branches. Backtracking verbs in subroutines - These behaviours occur whether or not the subpattern is called recur- + These behaviours occur whether or not the subpattern is called recur- sively. Perl's treatment of subroutines is different in some cases. - (*FAIL) in a subpattern called as a subroutine has its normal effect: + (*FAIL) in a subpattern called as a subroutine has its normal effect: it forces an immediate backtrack. 
- (*ACCEPT) in a subpattern called as a subroutine causes the subroutine - match to succeed without any further processing. Matching then contin- + (*ACCEPT) in a subpattern called as a subroutine causes the subroutine + match to succeed without any further processing. Matching then contin- ues after the subroutine call. (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause the subroutine match to fail. - (*THEN) skips to the next alternative in the innermost enclosing group - within the subpattern that has alternatives. If there is no such group + (*THEN) skips to the next alternative in the innermost enclosing group + within the subpattern that has alternatives. If there is no such group within the subpattern, (*THEN) causes the subroutine match to fail. SEE ALSO - pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3), + pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3), pcre2(3). @@ -8962,7 +8993,7 @@ AUTHOR REVISION - Last updated: 10 July 2018 + Last updated: 11 July 2018 Copyright (c) 1997-2018 University of Cambridge. ------------------------------------------------------------------------------ diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3 index 84e5811..ab8d228 100644 --- a/doc/pcre2pattern.3 +++ b/doc/pcre2pattern.3 @@ -1,4 +1,4 @@ -.TH PCRE2PATTERN 3 "10 July 2018" "PCRE2 10.32" +.TH PCRE2PATTERN 3 "11 July 2018" "PCRE2 10.32" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "PCRE2 REGULAR EXPRESSION DETAILS" @@ -3262,13 +3262,13 @@ to ensure that the match is always attempted. .rs .sp The following verbs do nothing when they are encountered. Matching continues -with what follows, but if there is no subsequent match, causing a backtrack to -the verb, a failure is forced. That is, backtracking cannot pass to the left of -the verb. 
However, when one of these verbs appears inside an atomic group or in -an assertion that is true, its effect is confined to that group, because once -the group has been matched, there is never any backtracking into it. In this -situation, backtracking has to jump to the left of the entire atomic group or -assertion. +with what follows, but if there is a subsequent match failure, causing a +backtrack to the verb, a failure is forced. That is, backtracking cannot pass +to the left of the verb. However, when one of these verbs appears inside an +atomic group or in a lookaround assertion that is true, its effect is confined +to that group, because once the group has been matched, there is never any +backtracking into it. Backtracking from beyond an assertion or an atomic group +ignores the entire group, and seeks a preceding backtracking point. .P These verbs differ in exactly what kind of failure occurs when backtracking reaches them. The behaviour described below is what happens when the verb is @@ -3352,12 +3352,36 @@ instead of skipping on to "c". .sp (*SKIP:NAME) .sp -When (*SKIP) has an associated name, its behaviour is modified. When it is -triggered, the previous path through the pattern is searched for the most -recent (*MARK) that has the same name. If one is found, the "bumpalong" advance -is to the subject position that corresponds to that (*MARK) instead of to where -(*SKIP) was encountered. If no (*MARK) with a matching name is found, the -(*SKIP) is ignored. +When (*SKIP) has an associated name, its behaviour is modified. When such a +(*SKIP) is triggered, the previous path through the pattern is searched for the +most recent (*MARK) that has the same name. If one is found, the "bumpalong" +advance is to the subject position that corresponds to that (*MARK) instead of +to where (*SKIP) was encountered. If no (*MARK) with a matching name is found, +the (*SKIP) is ignored. 
+.P +The search for a (*MARK) name uses the normal backtracking mechanism, which +means that it does not see (*MARK) settings that are inside atomic groups or +assertions, because they are never re-entered by backtracking. Compare the +following \fBpcre2test\fP examples: +.sp + re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/ + data: abc + 0: a + 1: a + data: + re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/ + data: abc + 0: b + 1: b +.sp +In the first example, the (*MARK) setting is in an atomic group, so it is not +seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows +the second branch of the pattern to be tried at the first character position. +In the second example, the (*MARK) setting is not in an atomic group. This +allows (*SKIP:X) to immediately cause a new matching attempt to start at the +second character. This time, the (*MARK) is never seen because "a" does not +match "b", so the matcher immediately jumps to the second branch of the +pattern. .P Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME). @@ -3481,16 +3505,23 @@ If the assertion is a condition, (*ACCEPT) causes the condition to be true for a positive assertion and false for a negative one; captured substrings are retained in both cases. .P +The remaining verbs act only when a later failure causes a backtrack to +reach them. This means that their effect is confined to the assertion, +because lookaround assertions are atomic. A backtrack that occurs after an +assertion is complete does not jump back into the assertion. Note in particular +that a (*MARK) name that is set in an assertion is not "seen" by an instance of +(*SKIP:NAME) latter in the pattern. +.P The effect of (*THEN) is not allowed to escape beyond an assertion. If there are no more branches to try, (*THEN) causes a positive assertion to be false, and a negative assertion to be true. 
.P The other backtracking verbs are not treated specially if they appear in a standalone positive assertion. In a conditional positive assertion, -backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be -false. However, for both standalone and conditional negative assertions, -backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be -true, without considering any further alternative branches. +backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE) +causes the condition to be false. However, for both standalone and conditional +negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes +the assertion to be true, without considering any further alternative branches. . . .\" HTML @@ -3536,6 +3567,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 10 July 2018 +Last updated: 11 July 2018 Copyright (c) 1997-2018 University of Cambridge. .fi diff --git a/perltest.sh b/perltest.sh index 6e4f44a..97d21ba 100755 --- a/perltest.sh +++ b/perltest.sh @@ -43,7 +43,7 @@ fi # afteralltext ignored # dupnames ignored (Perl always allows) # jitstack ignored -# mark ignored +# mark show mark information # no_auto_possess ignored # no_start_optimize ignored # subject_literal does not process subjects for escapes @@ -172,9 +172,9 @@ for (;;) $mod =~ s/jitstack=\d+,?//; - # Remove "mark" (asks pcre2test to check MARK data) */ + # The "mark" modifier requests checking of MARK data */ - $mod =~ s/mark,?//; + $show_mark = ($mod =~ s/mark,?//); # "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl @@ -279,7 +279,7 @@ for (;;) elsif (scalar(@subs) == 0) { printf $outfile "No match"; - if (defined $REGERROR && $REGERROR != 1) + if ($show_mark && defined $REGERROR && $REGERROR != 1) { printf $outfile (", mark = %s", &pchars($REGERROR)); } printf $outfile "\n"; } @@ -307,7 +307,7 @@ for (;;) # set and the input pattern was a UTF-8 string. We can, however, force # it to be so marked. 
- if (defined $REGMARK && $REGMARK != 1) + if ($show_mark && defined $REGMARK && $REGMARK != 1) { $xx = $REGMARK; $xx = Encode::decode_utf8($xx) if $utf8; diff --git a/testdata/testinput1 b/testdata/testinput1 index 1b3191c..5a4ec41 100644 --- a/testdata/testinput1 +++ b/testdata/testinput1 @@ -6202,4 +6202,13 @@ ef) x/x,mark /(?<=(?=.){4,5}x)/ +/a(?=.(*:X))(*SKIP:X)(*F)|(.)/ + abc + +/a(?>(*:X))(*SKIP:X)(*F)|(.)/ + abc + +/a(?:(*:X))(*SKIP:X)(*F)|(.)/ + abc + # End of testinput1 diff --git a/testdata/testoutput1 b/testdata/testoutput1 index 06469fa..44f4745 100644 --- a/testdata/testoutput1 +++ b/testdata/testoutput1 @@ -9841,4 +9841,19 @@ No match /(?<=(?=.){4,5}x)/ +/a(?=.(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: a + 1: a + +/a(?>(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: a + 1: a + +/a(?:(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: b + 1: b + # End of testinput1