Documentation and tests update and minor tweak to perltest.sh.

Philip.Hazel 2018-07-12 17:04:43 +00:00
parent 9bd1f3030e
commit 7db5904b9f
6 changed files with 192 additions and 73 deletions

@ -3227,13 +3227,13 @@ Verbs that act after backtracking
</b><br> </b><br>
<P> <P>
The following verbs do nothing when they are encountered. Matching continues The following verbs do nothing when they are encountered. Matching continues
with what follows, but if there is no subsequent match, causing a backtrack to with what follows, but if there is a subsequent match failure, causing a
the verb, a failure is forced. That is, backtracking cannot pass to the left of backtrack to the verb, a failure is forced. That is, backtracking cannot pass
the verb. However, when one of these verbs appears inside an atomic group or in to the left of the verb. However, when one of these verbs appears inside an
an assertion that is true, its effect is confined to that group, because once atomic group or in a lookaround assertion that is true, its effect is confined
the group has been matched, there is never any backtracking into it. In this to that group, because once the group has been matched, there is never any
situation, backtracking has to jump to the left of the entire atomic group or backtracking into it. Backtracking from beyond an assertion or an atomic group
assertion. ignores the entire group, and seeks a preceding backtracking point.
</P> </P>
<P> <P>
These verbs differ in exactly what kind of failure occurs when backtracking These verbs differ in exactly what kind of failure occurs when backtracking
@ -3321,12 +3321,37 @@ instead of skipping on to "c".
<pre> <pre>
(*SKIP:NAME) (*SKIP:NAME)
</pre> </pre>
When (*SKIP) has an associated name, its behaviour is modified. When it is When (*SKIP) has an associated name, its behaviour is modified. When such a
triggered, the previous path through the pattern is searched for the most (*SKIP) is triggered, the previous path through the pattern is searched for the
recent (*MARK) that has the same name. If one is found, the "bumpalong" advance most recent (*MARK) that has the same name. If one is found, the "bumpalong"
is to the subject position that corresponds to that (*MARK) instead of to where advance is to the subject position that corresponds to that (*MARK) instead of
(*SKIP) was encountered. If no (*MARK) with a matching name is found, the to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
(*SKIP) is ignored. the (*SKIP) is ignored.
</P>
<P>
The search for a (*MARK) name uses the normal backtracking mechanism, which
means that it does not see (*MARK) settings that are inside atomic groups or
assertions, because they are never re-entered by backtracking. Compare the
following <b>pcre2test</b> examples:
<pre>
re&#62; /a(?&#62;(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re&#62; /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b
</pre>
In the first example, the (*MARK) setting is in an atomic group, so it is not
seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
the second branch of the pattern to be tried at the first character position.
In the second example, the (*MARK) setting is not in an atomic group. This
allows (*SKIP:X) to immediately cause a new matching attempt to start at the
second character. This time, the (*MARK) is never seen because "a" does not
match "b", so the matcher immediately jumps to the second branch of the
pattern.
</P> </P>
<P> <P>
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
@ -3456,6 +3481,14 @@ a positive assertion and false for a negative one; captured substrings are
retained in both cases. retained in both cases.
</P> </P>
<P> <P>
The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after an
assertion is complete does not jump back into the assertion. Note in particular
that a (*MARK) name that is set in an assertion is not "seen" by an instance of
(*SKIP:NAME) later in the pattern.
</P>
<P>
The effect of (*THEN) is not allowed to escape beyond an assertion. If there The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false, are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true. and a negative assertion to be true.
@ -3463,10 +3496,10 @@ and a negative assertion to be true.
<P> <P>
The other backtracking verbs are not treated specially if they appear in a The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion, standalone positive assertion. In a conditional positive assertion,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
false. However, for both standalone and conditional negative assertions, causes the condition to be false. However, for both standalone and conditional
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
true, without considering any further alternative branches. the assertion to be true, without considering any further alternative branches.
<a name="btsub"></a></P> <a name="btsub"></a></P>
<br><b> <br><b>
Backtracking verbs in subroutines Backtracking verbs in subroutines
@ -3509,7 +3542,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br> <br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 10 July 2018 Last updated: 11 July 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>

@ -8695,14 +8695,14 @@ BACKTRACKING CONTROL
Verbs that act after backtracking Verbs that act after backtracking
The following verbs do nothing when they are encountered. Matching con- The following verbs do nothing when they are encountered. Matching con-
tinues with what follows, but if there is no subsequent match, causing tinues with what follows, but if there is a subsequent match failure,
a backtrack to the verb, a failure is forced. That is, backtracking causing a backtrack to the verb, a failure is forced. That is, back-
cannot pass to the left of the verb. However, when one of these verbs tracking cannot pass to the left of the verb. However, when one of
appears inside an atomic group or in an assertion that is true, its these verbs appears inside an atomic group or in a lookaround assertion
effect is confined to that group, because once the group has been that is true, its effect is confined to that group, because once the
matched, there is never any backtracking into it. In this situation, group has been matched, there is never any backtracking into it. Back-
backtracking has to jump to the left of the entire atomic group or tracking from beyond an assertion or an atomic group ignores the entire
assertion. group, and seeks a preceding backtracking point.
These verbs differ in exactly what kind of failure occurs when back- These verbs differ in exactly what kind of failure occurs when back-
tracking reaches them. The behaviour described below is what happens tracking reaches them. The behaviour described below is what happens
@ -8790,12 +8790,36 @@ BACKTRACKING CONTROL
(*SKIP:NAME) (*SKIP:NAME)
When (*SKIP) has an associated name, its behaviour is modified. When it When (*SKIP) has an associated name, its behaviour is modified. When
is triggered, the previous path through the pattern is searched for the such a (*SKIP) is triggered, the previous path through the pattern is
most recent (*MARK) that has the same name. If one is found, the searched for the most recent (*MARK) that has the same name. If one is
"bumpalong" advance is to the subject position that corresponds to that found, the "bumpalong" advance is to the subject position that corre-
(*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
a matching name is found, the (*SKIP) is ignored. no (*MARK) with a matching name is found, the (*SKIP) is ignored.
The search for a (*MARK) name uses the normal backtracking mechanism,
which means that it does not see (*MARK) settings that are inside
atomic groups or assertions, because they are never re-entered by back-
tracking. Compare the following pcre2test examples:
re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b
In the first example, the (*MARK) setting is in an atomic group, so it
is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
This allows the second branch of the pattern to be tried at the first
character position. In the second example, the (*MARK) setting is not
in an atomic group. This allows (*SKIP:X) to immediately cause a new
matching attempt to start at the second character. This time, the
(*MARK) is never seen because "a" does not match "b", so the matcher
immediately jumps to the second branch of the pattern.
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME). ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@ -8915,41 +8939,48 @@ BACKTRACKING CONTROL
true for a positive assertion and false for a negative one; captured true for a positive assertion and false for a negative one; captured
substrings are retained in both cases. substrings are retained in both cases.
The effect of (*THEN) is not allowed to escape beyond an assertion. If The remaining verbs act only when a later failure causes a backtrack to
there are no more branches to try, (*THEN) causes a positive assertion reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after
an assertion is complete does not jump back into the assertion. Note in
particular that a (*MARK) name that is set in an assertion is not
"seen" by an instance of (*SKIP:NAME) later in the pattern.
The effect of (*THEN) is not allowed to escape beyond an assertion. If
there are no more branches to try, (*THEN) causes a positive assertion
to be false, and a negative assertion to be true. to be false, and a negative assertion to be true.
The other backtracking verbs are not treated specially if they appear The other backtracking verbs are not treated specially if they appear
in a standalone positive assertion. In a conditional positive asser- in a standalone positive assertion. In a conditional positive asser-
tion, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the con- tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
dition to be false. However, for both standalone and conditional nega- or (*PRUNE) causes the condition to be false. However, for both stand-
tive assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) alone and conditional negative assertions, backtracking into (*COMMIT),
causes the assertion to be true, without considering any further alter- (*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
native branches. ing any further alternative branches.
Backtracking verbs in subroutines Backtracking verbs in subroutines
These behaviours occur whether or not the subpattern is called recur- These behaviours occur whether or not the subpattern is called recur-
sively. Perl's treatment of subroutines is different in some cases. sively. Perl's treatment of subroutines is different in some cases.
(*FAIL) in a subpattern called as a subroutine has its normal effect: (*FAIL) in a subpattern called as a subroutine has its normal effect:
it forces an immediate backtrack. it forces an immediate backtrack.
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine (*ACCEPT) in a subpattern called as a subroutine causes the subroutine
match to succeed without any further processing. Matching then contin- match to succeed without any further processing. Matching then contin-
ues after the subroutine call. ues after the subroutine call.
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
cause the subroutine match to fail. cause the subroutine match to fail.
(*THEN) skips to the next alternative in the innermost enclosing group (*THEN) skips to the next alternative in the innermost enclosing group
within the subpattern that has alternatives. If there is no such group within the subpattern that has alternatives. If there is no such group
within the subpattern, (*THEN) causes the subroutine match to fail. within the subpattern, (*THEN) causes the subroutine match to fail.
SEE ALSO SEE ALSO
pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3), pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3),
pcre2(3). pcre2(3).
@ -8962,7 +8993,7 @@ AUTHOR
REVISION REVISION
Last updated: 10 July 2018 Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "10 July 2018" "PCRE2 10.32" .TH PCRE2PATTERN 3 "11 July 2018" "PCRE2 10.32"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS" .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -3262,13 +3262,13 @@ to ensure that the match is always attempted.
.rs .rs
.sp .sp
The following verbs do nothing when they are encountered. Matching continues The following verbs do nothing when they are encountered. Matching continues
with what follows, but if there is no subsequent match, causing a backtrack to with what follows, but if there is a subsequent match failure, causing a
the verb, a failure is forced. That is, backtracking cannot pass to the left of backtrack to the verb, a failure is forced. That is, backtracking cannot pass
the verb. However, when one of these verbs appears inside an atomic group or in to the left of the verb. However, when one of these verbs appears inside an
an assertion that is true, its effect is confined to that group, because once atomic group or in a lookaround assertion that is true, its effect is confined
the group has been matched, there is never any backtracking into it. In this to that group, because once the group has been matched, there is never any
situation, backtracking has to jump to the left of the entire atomic group or backtracking into it. Backtracking from beyond an assertion or an atomic group
assertion. ignores the entire group, and seeks a preceding backtracking point.
.P .P
These verbs differ in exactly what kind of failure occurs when backtracking These verbs differ in exactly what kind of failure occurs when backtracking
reaches them. The behaviour described below is what happens when the verb is reaches them. The behaviour described below is what happens when the verb is
@ -3352,12 +3352,36 @@ instead of skipping on to "c".
.sp .sp
(*SKIP:NAME) (*SKIP:NAME)
.sp .sp
When (*SKIP) has an associated name, its behaviour is modified. When it is When (*SKIP) has an associated name, its behaviour is modified. When such a
triggered, the previous path through the pattern is searched for the most (*SKIP) is triggered, the previous path through the pattern is searched for the
recent (*MARK) that has the same name. If one is found, the "bumpalong" advance most recent (*MARK) that has the same name. If one is found, the "bumpalong"
is to the subject position that corresponds to that (*MARK) instead of to where advance is to the subject position that corresponds to that (*MARK) instead of
(*SKIP) was encountered. If no (*MARK) with a matching name is found, the to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
(*SKIP) is ignored. the (*SKIP) is ignored.
.P
The search for a (*MARK) name uses the normal backtracking mechanism, which
means that it does not see (*MARK) settings that are inside atomic groups or
assertions, because they are never re-entered by backtracking. Compare the
following \fBpcre2test\fP examples:
.sp
re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b
.sp
In the first example, the (*MARK) setting is in an atomic group, so it is not
seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
the second branch of the pattern to be tried at the first character position.
In the second example, the (*MARK) setting is not in an atomic group. This
allows (*SKIP:X) to immediately cause a new matching attempt to start at the
second character. This time, the (*MARK) is never seen because "a" does not
match "b", so the matcher immediately jumps to the second branch of the
pattern.
.P .P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME). names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@ -3481,16 +3505,23 @@ If the assertion is a condition, (*ACCEPT) causes the condition to be true for
a positive assertion and false for a negative one; captured substrings are a positive assertion and false for a negative one; captured substrings are
retained in both cases. retained in both cases.
.P .P
The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after an
assertion is complete does not jump back into the assertion. Note in particular
that a (*MARK) name that is set in an assertion is not "seen" by an instance of
(*SKIP:NAME) later in the pattern.
.P
The effect of (*THEN) is not allowed to escape beyond an assertion. If there The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false, are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true. and a negative assertion to be true.
.P .P
The other backtracking verbs are not treated specially if they appear in a The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion, standalone positive assertion. In a conditional positive assertion,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
false. However, for both standalone and conditional negative assertions, causes the condition to be false. However, for both standalone and conditional
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
true, without considering any further alternative branches. the assertion to be true, without considering any further alternative branches.
. .
. .
.\" HTML <a name="btsub"></a> .\" HTML <a name="btsub"></a>
@ -3536,6 +3567,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 10 July 2018 Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
.fi .fi

@ -43,7 +43,7 @@ fi
# afteralltext ignored # afteralltext ignored
# dupnames ignored (Perl always allows) # dupnames ignored (Perl always allows)
# jitstack ignored # jitstack ignored
# mark ignored # mark show mark information
# no_auto_possess ignored # no_auto_possess ignored
# no_start_optimize ignored # no_start_optimize ignored
# subject_literal does not process subjects for escapes # subject_literal does not process subjects for escapes
@ -172,9 +172,9 @@ for (;;)
$mod =~ s/jitstack=\d+,?//; $mod =~ s/jitstack=\d+,?//;
# Remove "mark" (asks pcre2test to check MARK data) */ # The "mark" modifier requests checking of MARK data */
$mod =~ s/mark,?//; $show_mark = ($mod =~ s/mark,?//);
# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl # "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl
@ -279,7 +279,7 @@ for (;;)
elsif (scalar(@subs) == 0) elsif (scalar(@subs) == 0)
{ {
printf $outfile "No match"; printf $outfile "No match";
if (defined $REGERROR && $REGERROR != 1) if ($show_mark && defined $REGERROR && $REGERROR != 1)
{ printf $outfile (", mark = %s", &pchars($REGERROR)); } { printf $outfile (", mark = %s", &pchars($REGERROR)); }
printf $outfile "\n"; printf $outfile "\n";
} }
@ -307,7 +307,7 @@ for (;;)
# set and the input pattern was a UTF-8 string. We can, however, force # set and the input pattern was a UTF-8 string. We can, however, force
# it to be so marked. # it to be so marked.
if (defined $REGMARK && $REGMARK != 1) if ($show_mark && defined $REGMARK && $REGMARK != 1)
{ {
$xx = $REGMARK; $xx = $REGMARK;
$xx = Encode::decode_utf8($xx) if $utf8; $xx = Encode::decode_utf8($xx) if $utf8;
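
The perltest.sh change above can be read as two steps: the "mark" modifier is stripped and its presence is remembered, and mark information is printed only when that flag is set. The following is a minimal Python sketch of that logic, not the script itself. The function names `strip_mark_modifier` and `format_no_match` are hypothetical, and the `&pchars` escaping used by the real script is left out.

```python
import re


def strip_mark_modifier(mod):
    """Remove the first "mark" modifier from a modifier string.

    Returns (remaining_modifiers, show_mark). This mirrors the Perl line
    `$show_mark = ($mod =~ s/mark,?//);`, which replaces one occurrence
    and yields a true value only if a substitution happened.
    """
    new_mod, count = re.subn(r"mark,?", "", mod, count=1)
    return new_mod, count > 0


def format_no_match(show_mark, regerror):
    """Build the "No match" output line (hypothetical helper).

    The mark is appended only when the pattern asked for it and a mark
    value is available, as in the updated condition
    `$show_mark && defined $REGERROR && $REGERROR != 1`.
    """
    line = "No match"
    if show_mark and regerror is not None and regerror != 1:
        line += ", mark = %s" % regerror
    return line


if __name__ == "__main__":
    mod, show_mark = strip_mark_modifier("x,mark")
    print(repr(mod), show_mark)
    print(format_no_match(show_mark, "X"))
```

With this arrangement a pattern that does not carry the "mark" modifier never reports mark data, even when Perl has set `$REGERROR` or `$REGMARK`.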

9
testdata/testinput1 vendored

@ -6202,4 +6202,13 @@ ef) x/x,mark
/(?<=(?=.){4,5}x)/ /(?<=(?=.){4,5}x)/
/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
abc
/a(?>(*:X))(*SKIP:X)(*F)|(.)/
abc
/a(?:(*:X))(*SKIP:X)(*F)|(.)/
abc
# End of testinput1 # End of testinput1

15
testdata/testoutput1 vendored

@ -9841,4 +9841,19 @@ No match
/(?<=(?=.){4,5}x)/ /(?<=(?=.){4,5}x)/
/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
abc
0: a
1: a
/a(?>(*:X))(*SKIP:X)(*F)|(.)/
abc
0: a
1: a
/a(?:(*:X))(*SKIP:X)(*F)|(.)/
abc
0: b
1: b
# End of testinput1 # End of testinput1
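
The rule that the new test patterns exercise can be modelled in a few lines. This is a toy Python model of the documented behaviour of (*SKIP:NAME), not PCRE2 code. It assumes the caller already knows which (*MARK) settings are still reachable by backtracking; marks set inside atomic groups or assertions are simply left out of the list. The name `skip_restart_offset` is hypothetical.

```python
def skip_restart_offset(skip_name, visible_marks):
    """Return the subject offset for the next match attempt, or None.

    visible_marks is a list of (name, offset) pairs in the order the
    marks were passed on the current path. Only marks that backtracking
    can still reach belong in this list. The most recent mark with a
    matching name wins; if there is none, the (*SKIP) is ignored and
    None is returned.
    """
    for name, offset in reversed(visible_marks):
        if name == skip_name:
            return offset
    return None


if __name__ == "__main__":
    # /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/ on "abc": the mark is set after
    # "a", at offset 1, and is visible, so the next attempt starts there.
    print(skip_restart_offset("X", [("X", 1)]))

    # /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/ on "abc": the mark is inside an
    # atomic group, so it is not visible and the (*SKIP) is ignored.
    print(skip_restart_offset("X", []))
```

In the second case the ignored (*SKIP) lets normal backtracking reach the second branch at the first character, which is why the expected output for that pattern is "a" rather than "b".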