From 7db5904b9ff6ec51856e6b3749e9258413535c0c Mon Sep 17 00:00:00 2001
From: "Philip.Hazel"
The following verbs do nothing when they are encountered. Matching continues
-with what follows, but if there is no subsequent match, causing a backtrack to
-the verb, a failure is forced. That is, backtracking cannot pass to the left of
-the verb. However, when one of these verbs appears inside an atomic group or in
-an assertion that is true, its effect is confined to that group, because once
-the group has been matched, there is never any backtracking into it. In this
-situation, backtracking has to jump to the left of the entire atomic group or
-assertion.
+with what follows, but if there is a subsequent match failure, causing a
+backtrack to the verb, a failure is forced. That is, backtracking cannot pass
+to the left of the verb. However, when one of these verbs appears inside an
+atomic group or in a lookaround assertion that is true, its effect is confined
+to that group, because once the group has been matched, there is never any
+backtracking into it. Backtracking from beyond an assertion or an atomic group
+ignores the entire group, and seeks a preceding backtracking point.
These verbs differ in exactly what kind of failure occurs when backtracking
@@ -3321,12 +3321,37 @@ instead of skipping on to "c".
(*SKIP:NAME)
-When (*SKIP) has an associated name, its behaviour is modified. When it is
-triggered, the previous path through the pattern is searched for the most
-recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
-is to the subject position that corresponds to that (*MARK) instead of to where
-(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
-(*SKIP) is ignored.
+When (*SKIP) has an associated name, its behaviour is modified. When such a
+(*SKIP) is triggered, the previous path through the pattern is searched for the
+most recent (*MARK) that has the same name. If one is found, the "bumpalong"
+advance is to the subject position that corresponds to that (*MARK) instead of
+to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
+the (*SKIP) is ignored.
+
+The search for a (*MARK) name uses the normal backtracking mechanism, which
+means that it does not see (*MARK) settings that are inside atomic groups or
+assertions, because they are never re-entered by backtracking. Compare the
+following pcre2test examples:
+
+ re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: a
+ 1: a
+ data:
+ re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: b
+ 1: b
+
+In the first example, the (*MARK) setting is in an atomic group, so it is not
+seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
+the second branch of the pattern to be tried at the first character position.
+In the second example, the (*MARK) setting is not in an atomic group. This
+allows (*SKIP:X) to immediately cause a new matching attempt to start at the
+second character. This time, the (*MARK) is never seen because "a" does not
+match "b", so the matcher immediately jumps to the second branch of the
+pattern.
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
@@ -3456,6 +3481,14 @@ a positive assertion and false for a negative one; captured substrings are
retained in both cases.
+The remaining verbs act only when a later failure causes a backtrack to
+reach them. This means that their effect is confined to the assertion,
+because lookaround assertions are atomic. A backtrack that occurs after an
+assertion is complete does not jump back into the assertion. Note in particular
+that a (*MARK) name that is set in an assertion is not "seen" by an instance of
+(*SKIP:NAME) later in the pattern.
+
+
The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true.
@@ -3463,10 +3496,10 @@ and a negative assertion to be true.
The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
-false. However, for both standalone and conditional negative assertions,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
-true, without considering any further alternative branches.
+backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
+causes the condition to be false. However, for both standalone and conditional
+negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
+the assertion to be true, without considering any further alternative branches.
-Last updated: 10 July 2018
+Last updated: 11 July 2018
Copyright © 1997-2018 University of Cambridge.
diff --git a/doc/pcre2.txt b/doc/pcre2.txt
index 449c3cb..8df2f9b 100644
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@@ -8695,14 +8695,14 @@ BACKTRACKING CONTROL
Verbs that act after backtracking
The following verbs do nothing when they are encountered. Matching con-
- tinues with what follows, but if there is no subsequent match, causing
- a backtrack to the verb, a failure is forced. That is, backtracking
- cannot pass to the left of the verb. However, when one of these verbs
- appears inside an atomic group or in an assertion that is true, its
- effect is confined to that group, because once the group has been
- matched, there is never any backtracking into it. In this situation,
- backtracking has to jump to the left of the entire atomic group or
- assertion.
+ tinues with what follows, but if there is a subsequent match failure,
+ causing a backtrack to the verb, a failure is forced. That is, back-
+ tracking cannot pass to the left of the verb. However, when one of
+ these verbs appears inside an atomic group or in a lookaround assertion
+ that is true, its effect is confined to that group, because once the
+ group has been matched, there is never any backtracking into it. Back-
+ tracking from beyond an assertion or an atomic group ignores the entire
+ group, and seeks a preceding backtracking point.
These verbs differ in exactly what kind of failure occurs when back-
tracking reaches them. The behaviour described below is what happens
@@ -8790,12 +8790,36 @@ BACKTRACKING CONTROL
(*SKIP:NAME)
- When (*SKIP) has an associated name, its behaviour is modified. When it
- is triggered, the previous path through the pattern is searched for the
- most recent (*MARK) that has the same name. If one is found, the
- "bumpalong" advance is to the subject position that corresponds to that
- (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with
- a matching name is found, the (*SKIP) is ignored.
+ When (*SKIP) has an associated name, its behaviour is modified. When
+ such a (*SKIP) is triggered, the previous path through the pattern is
+ searched for the most recent (*MARK) that has the same name. If one is
+ found, the "bumpalong" advance is to the subject position that corre-
+ sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
+ no (*MARK) with a matching name is found, the (*SKIP) is ignored.
+
+ The search for a (*MARK) name uses the normal backtracking mechanism,
+ which means that it does not see (*MARK) settings that are inside
+ atomic groups or assertions, because they are never re-entered by back-
+ tracking. Compare the following pcre2test examples:
+
+ re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: a
+ 1: a
+ data:
+ re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: b
+ 1: b
+
+ In the first example, the (*MARK) setting is in an atomic group, so it
+ is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
+ This allows the second branch of the pattern to be tried at the first
+ character position. In the second example, the (*MARK) setting is not
+ in an atomic group. This allows (*SKIP:X) to immediately cause a new
+ matching attempt to start at the second character. This time, the
+ (*MARK) is never seen because "a" does not match "b", so the matcher
+ immediately jumps to the second branch of the pattern.
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@@ -8915,41 +8939,48 @@ BACKTRACKING CONTROL
true for a positive assertion and false for a negative one; captured
substrings are retained in both cases.
- The effect of (*THEN) is not allowed to escape beyond an assertion. If
- there are no more branches to try, (*THEN) causes a positive assertion
+ The remaining verbs act only when a later failure causes a backtrack to
+ reach them. This means that their effect is confined to the assertion,
+ because lookaround assertions are atomic. A backtrack that occurs after
+ an assertion is complete does not jump back into the assertion. Note in
+ particular that a (*MARK) name that is set in an assertion is not
+ "seen" by an instance of (*SKIP:NAME) later in the pattern.
+
+ The effect of (*THEN) is not allowed to escape beyond an assertion. If
+ there are no more branches to try, (*THEN) causes a positive assertion
to be false, and a negative assertion to be true.
- The other backtracking verbs are not treated specially if they appear
- in a standalone positive assertion. In a conditional positive asser-
- tion, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the con-
- dition to be false. However, for both standalone and conditional nega-
- tive assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE)
- causes the assertion to be true, without considering any further alter-
- native branches.
+ The other backtracking verbs are not treated specially if they appear
+ in a standalone positive assertion. In a conditional positive asser-
+ tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
+ or (*PRUNE) causes the condition to be false. However, for both stand-
+ alone and conditional negative assertions, backtracking into (*COMMIT),
+ (*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
+ ing any further alternative branches.
Backtracking verbs in subroutines
- These behaviours occur whether or not the subpattern is called recur-
+ These behaviours occur whether or not the subpattern is called recur-
sively. Perl's treatment of subroutines is different in some cases.
- (*FAIL) in a subpattern called as a subroutine has its normal effect:
+ (*FAIL) in a subpattern called as a subroutine has its normal effect:
it forces an immediate backtrack.
- (*ACCEPT) in a subpattern called as a subroutine causes the subroutine
- match to succeed without any further processing. Matching then contin-
+ (*ACCEPT) in a subpattern called as a subroutine causes the subroutine
+ match to succeed without any further processing. Matching then contin-
ues after the subroutine call.
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
cause the subroutine match to fail.
- (*THEN) skips to the next alternative in the innermost enclosing group
- within the subpattern that has alternatives. If there is no such group
+ (*THEN) skips to the next alternative in the innermost enclosing group
+ within the subpattern that has alternatives. If there is no such group
within the subpattern, (*THEN) causes the subroutine match to fail.
SEE ALSO
- pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3),
+ pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3),
pcre2(3).
@@ -8962,7 +8993,7 @@ AUTHOR
REVISION
- Last updated: 10 July 2018
+ Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3
index 84e5811..ab8d228 100644
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "10 July 2018" "PCRE2 10.32"
+.TH PCRE2PATTERN 3 "11 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3262,13 +3262,13 @@ to ensure that the match is always attempted.
.rs
.sp
The following verbs do nothing when they are encountered. Matching continues
-with what follows, but if there is no subsequent match, causing a backtrack to
-the verb, a failure is forced. That is, backtracking cannot pass to the left of
-the verb. However, when one of these verbs appears inside an atomic group or in
-an assertion that is true, its effect is confined to that group, because once
-the group has been matched, there is never any backtracking into it. In this
-situation, backtracking has to jump to the left of the entire atomic group or
-assertion.
+with what follows, but if there is a subsequent match failure, causing a
+backtrack to the verb, a failure is forced. That is, backtracking cannot pass
+to the left of the verb. However, when one of these verbs appears inside an
+atomic group or in a lookaround assertion that is true, its effect is confined
+to that group, because once the group has been matched, there is never any
+backtracking into it. Backtracking from beyond an assertion or an atomic group
+ignores the entire group, and seeks a preceding backtracking point.
.P
These verbs differ in exactly what kind of failure occurs when backtracking
reaches them. The behaviour described below is what happens when the verb is
@@ -3352,12 +3352,36 @@ instead of skipping on to "c".
.sp
(*SKIP:NAME)
.sp
-When (*SKIP) has an associated name, its behaviour is modified. When it is
-triggered, the previous path through the pattern is searched for the most
-recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
-is to the subject position that corresponds to that (*MARK) instead of to where
-(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
-(*SKIP) is ignored.
+When (*SKIP) has an associated name, its behaviour is modified. When such a
+(*SKIP) is triggered, the previous path through the pattern is searched for the
+most recent (*MARK) that has the same name. If one is found, the "bumpalong"
+advance is to the subject position that corresponds to that (*MARK) instead of
+to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
+the (*SKIP) is ignored.
+.P
+The search for a (*MARK) name uses the normal backtracking mechanism, which
+means that it does not see (*MARK) settings that are inside atomic groups or
+assertions, because they are never re-entered by backtracking. Compare the
+following \fBpcre2test\fP examples:
+.sp
+ re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: a
+ 1: a
+ data:
+ re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+ data: abc
+ 0: b
+ 1: b
+.sp
+In the first example, the (*MARK) setting is in an atomic group, so it is not
+seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
+the second branch of the pattern to be tried at the first character position.
+In the second example, the (*MARK) setting is not in an atomic group. This
+allows (*SKIP:X) to immediately cause a new matching attempt to start at the
+second character. This time, the (*MARK) is never seen because "a" does not
+match "b", so the matcher immediately jumps to the second branch of the
+pattern.
.P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@@ -3481,16 +3505,23 @@ If the assertion is a condition, (*ACCEPT) causes the condition to be true for
a positive assertion and false for a negative one; captured substrings are
retained in both cases.
.P
+The remaining verbs act only when a later failure causes a backtrack to
+reach them. This means that their effect is confined to the assertion,
+because lookaround assertions are atomic. A backtrack that occurs after an
+assertion is complete does not jump back into the assertion. Note in particular
+that a (*MARK) name that is set in an assertion is not "seen" by an instance of
+(*SKIP:NAME) later in the pattern.
+.P
The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true.
.P
The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
-false. However, for both standalone and conditional negative assertions,
-backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
-true, without considering any further alternative branches.
+backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
+causes the condition to be false. However, for both standalone and conditional
+negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
+the assertion to be true, without considering any further alternative branches.
.
.
.\" HTML
@@ -3536,6 +3567,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 10 July 2018
+Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi
diff --git a/perltest.sh b/perltest.sh
index 6e4f44a..97d21ba 100755
--- a/perltest.sh
+++ b/perltest.sh
@@ -43,7 +43,7 @@ fi
# afteralltext ignored
# dupnames ignored (Perl always allows)
# jitstack ignored
-# mark ignored
+# mark show mark information
# no_auto_possess ignored
# no_start_optimize ignored
# subject_literal does not process subjects for escapes
@@ -172,9 +172,9 @@ for (;;)
$mod =~ s/jitstack=\d+,?//;
- # Remove "mark" (asks pcre2test to check MARK data) */
+ # The "mark" modifier requests checking of MARK data
- $mod =~ s/mark,?//;
+ $show_mark = ($mod =~ s/mark,?//);
# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl
@@ -279,7 +279,7 @@ for (;;)
elsif (scalar(@subs) == 0)
{
printf $outfile "No match";
- if (defined $REGERROR && $REGERROR != 1)
+ if ($show_mark && defined $REGERROR && $REGERROR != 1)
{ printf $outfile (", mark = %s", &pchars($REGERROR)); }
printf $outfile "\n";
}
@@ -307,7 +307,7 @@ for (;;)
# set and the input pattern was a UTF-8 string. We can, however, force
# it to be so marked.
- if (defined $REGMARK && $REGMARK != 1)
+ if ($show_mark && defined $REGMARK && $REGMARK != 1)
{
$xx = $REGMARK;
$xx = Encode::decode_utf8($xx) if $utf8;
diff --git a/testdata/testinput1 b/testdata/testinput1
index 1b3191c..5a4ec41 100644
--- a/testdata/testinput1
+++ b/testdata/testinput1
@@ -6202,4 +6202,13 @@ ef) x/x,mark
/(?<=(?=.){4,5}x)/
+/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+
+/a(?>(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+
+/a(?:(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+
# End of testinput1
diff --git a/testdata/testoutput1 b/testdata/testoutput1
index 06469fa..44f4745 100644
--- a/testdata/testoutput1
+++ b/testdata/testoutput1
@@ -9841,4 +9841,19 @@ No match
/(?<=(?=.){4,5}x)/
+/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+ 0: a
+ 1: a
+
+/a(?>(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+ 0: a
+ 1: a
+
+/a(?:(*:X))(*SKIP:X)(*F)|(.)/
+ abc
+ 0: b
+ 1: b
+
# End of testinput1