Documentation and tests update and minor tweak to perltest.sh.

Philip.Hazel 2018-07-12 17:04:43 +00:00
parent 9bd1f3030e
commit 7db5904b9f
6 changed files with 192 additions and 73 deletions


@ -3227,13 +3227,13 @@ Verbs that act after backtracking
</b><br>
<P>
The following verbs do nothing when they are encountered. Matching continues
with what follows, but if there is no subsequent match, causing a backtrack to
the verb, a failure is forced. That is, backtracking cannot pass to the left of
the verb. However, when one of these verbs appears inside an atomic group or in
an assertion that is true, its effect is confined to that group, because once
the group has been matched, there is never any backtracking into it. In this
situation, backtracking has to jump to the left of the entire atomic group or
assertion.
with what follows, but if there is a subsequent match failure, causing a
backtrack to the verb, a failure is forced. That is, backtracking cannot pass
to the left of the verb. However, when one of these verbs appears inside an
atomic group or in a lookaround assertion that is true, its effect is confined
to that group, because once the group has been matched, there is never any
backtracking into it. Backtracking from beyond an assertion or an atomic group
ignores the entire group, and seeks a preceding backtracking point.
</P>
<P>
These verbs differ in exactly what kind of failure occurs when backtracking
@ -3321,12 +3321,37 @@ instead of skipping on to "c".
<pre>
(*SKIP:NAME)
</pre>
When (*SKIP) has an associated name, its behaviour is modified. When it is
triggered, the previous path through the pattern is searched for the most
recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
is to the subject position that corresponds to that (*MARK) instead of to where
(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
(*SKIP) is ignored.
When (*SKIP) has an associated name, its behaviour is modified. When such a
(*SKIP) is triggered, the previous path through the pattern is searched for the
most recent (*MARK) that has the same name. If one is found, the "bumpalong"
advance is to the subject position that corresponds to that (*MARK) instead of
to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
the (*SKIP) is ignored.
</P>
<P>
The search for a (*MARK) name uses the normal backtracking mechanism, which
means that it does not see (*MARK) settings that are inside atomic groups or
assertions, because they are never re-entered by backtracking. Compare the
following <b>pcre2test</b> examples:
<pre>
re&#62; /a(?&#62;(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re&#62; /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b
</pre>
In the first example, the (*MARK) setting is in an atomic group, so it is not
seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
the second branch of the pattern to be tried at the first character position.
In the second example, the (*MARK) setting is not in an atomic group. This
allows (*SKIP:X) to immediately cause a new matching attempt to start at the
second character. This time, the (*MARK) is never seen because "a" does not
match "b", so the matcher immediately jumps to the second branch of the
pattern.
</P>
<P>
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
@ -3456,6 +3481,14 @@ a positive assertion and false for a negative one; captured substrings are
retained in both cases.
</P>
<P>
The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after an
assertion is complete does not jump back into the assertion. Note in particular
that a (*MARK) name that is set in an assertion is not "seen" by an instance of
(*SKIP:NAME) later in the pattern.
</P>
<P>
The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true.
@ -3463,10 +3496,10 @@ and a negative assertion to be true.
<P>
The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
false. However, for both standalone and conditional negative assertions,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
true, without considering any further alternative branches.
backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
causes the condition to be false. However, for both standalone and conditional
negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
the assertion to be true, without considering any further alternative branches.
<a name="btsub"></a></P>
<br><b>
Backtracking verbs in subroutines
@ -3509,7 +3542,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 10 July 2018
Last updated: 11 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>


@ -8695,14 +8695,14 @@ BACKTRACKING CONTROL
Verbs that act after backtracking
The following verbs do nothing when they are encountered. Matching con-
tinues with what follows, but if there is no subsequent match, causing
a backtrack to the verb, a failure is forced. That is, backtracking
cannot pass to the left of the verb. However, when one of these verbs
appears inside an atomic group or in an assertion that is true, its
effect is confined to that group, because once the group has been
matched, there is never any backtracking into it. In this situation,
backtracking has to jump to the left of the entire atomic group or
assertion.
tinues with what follows, but if there is a subsequent match failure,
causing a backtrack to the verb, a failure is forced. That is, back-
tracking cannot pass to the left of the verb. However, when one of
these verbs appears inside an atomic group or in a lookaround assertion
that is true, its effect is confined to that group, because once the
group has been matched, there is never any backtracking into it. Back-
tracking from beyond an assertion or an atomic group ignores the entire
group, and seeks a preceding backtracking point.
These verbs differ in exactly what kind of failure occurs when back-
tracking reaches them. The behaviour described below is what happens
@ -8790,12 +8790,36 @@ BACKTRACKING CONTROL
(*SKIP:NAME)
When (*SKIP) has an associated name, its behaviour is modified. When it
is triggered, the previous path through the pattern is searched for the
most recent (*MARK) that has the same name. If one is found, the
"bumpalong" advance is to the subject position that corresponds to that
(*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with
a matching name is found, the (*SKIP) is ignored.
When (*SKIP) has an associated name, its behaviour is modified. When
such a (*SKIP) is triggered, the previous path through the pattern is
searched for the most recent (*MARK) that has the same name. If one is
found, the "bumpalong" advance is to the subject position that corre-
sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
no (*MARK) with a matching name is found, the (*SKIP) is ignored.
The search for a (*MARK) name uses the normal backtracking mechanism,
which means that it does not see (*MARK) settings that are inside
atomic groups or assertions, because they are never re-entered by back-
tracking. Compare the following pcre2test examples:
re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b
In the first example, the (*MARK) setting is in an atomic group, so it
is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
This allows the second branch of the pattern to be tried at the first
character position. In the second example, the (*MARK) setting is not
in an atomic group. This allows (*SKIP:X) to immediately cause a new
matching attempt to start at the second character. This time, the
(*MARK) is never seen because "a" does not match "b", so the matcher
immediately jumps to the second branch of the pattern.
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@ -8915,17 +8939,24 @@ BACKTRACKING CONTROL
true for a positive assertion and false for a negative one; captured
substrings are retained in both cases.
The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after
an assertion is complete does not jump back into the assertion. Note in
particular that a (*MARK) name that is set in an assertion is not
"seen" by an instance of (*SKIP:NAME) later in the pattern.
The effect of (*THEN) is not allowed to escape beyond an assertion. If
there are no more branches to try, (*THEN) causes a positive assertion
to be false, and a negative assertion to be true.
The other backtracking verbs are not treated specially if they appear
in a standalone positive assertion. In a conditional positive asser-
tion, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the con-
dition to be false. However, for both standalone and conditional nega-
tive assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE)
causes the assertion to be true, without considering any further alter-
native branches.
tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
or (*PRUNE) causes the condition to be false. However, for both stand-
alone and conditional negative assertions, backtracking into (*COMMIT),
(*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
ing any further alternative branches.
Backtracking verbs in subroutines
@ -8962,7 +8993,7 @@ AUTHOR
REVISION
Last updated: 10 July 2018
Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "10 July 2018" "PCRE2 10.32"
.TH PCRE2PATTERN 3 "11 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -3262,13 +3262,13 @@ to ensure that the match is always attempted.
.rs
.sp
The following verbs do nothing when they are encountered. Matching continues
with what follows, but if there is no subsequent match, causing a backtrack to
the verb, a failure is forced. That is, backtracking cannot pass to the left of
the verb. However, when one of these verbs appears inside an atomic group or in
an assertion that is true, its effect is confined to that group, because once
the group has been matched, there is never any backtracking into it. In this
situation, backtracking has to jump to the left of the entire atomic group or
assertion.
with what follows, but if there is a subsequent match failure, causing a
backtrack to the verb, a failure is forced. That is, backtracking cannot pass
to the left of the verb. However, when one of these verbs appears inside an
atomic group or in a lookaround assertion that is true, its effect is confined
to that group, because once the group has been matched, there is never any
backtracking into it. Backtracking from beyond an assertion or an atomic group
ignores the entire group, and seeks a preceding backtracking point.
.P
These verbs differ in exactly what kind of failure occurs when backtracking
reaches them. The behaviour described below is what happens when the verb is
@ -3352,12 +3352,36 @@ instead of skipping on to "c".
.sp
(*SKIP:NAME)
.sp
When (*SKIP) has an associated name, its behaviour is modified. When it is
triggered, the previous path through the pattern is searched for the most
recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
is to the subject position that corresponds to that (*MARK) instead of to where
(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
(*SKIP) is ignored.
When (*SKIP) has an associated name, its behaviour is modified. When such a
(*SKIP) is triggered, the previous path through the pattern is searched for the
most recent (*MARK) that has the same name. If one is found, the "bumpalong"
advance is to the subject position that corresponds to that (*MARK) instead of
to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
the (*SKIP) is ignored.
.P
The search for a (*MARK) name uses the normal backtracking mechanism, which
means that it does not see (*MARK) settings that are inside atomic groups or
assertions, because they are never re-entered by backtracking. Compare the
following \fBpcre2test\fP examples:
.sp
re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b
.sp
In the first example, the (*MARK) setting is in an atomic group, so it is not
seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
the second branch of the pattern to be tried at the first character position.
In the second example, the (*MARK) setting is not in an atomic group. This
allows (*SKIP:X) to immediately cause a new matching attempt to start at the
second character. This time, the (*MARK) is never seen because "a" does not
match "b", so the matcher immediately jumps to the second branch of the
pattern.
.P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@ -3481,16 +3505,23 @@ If the assertion is a condition, (*ACCEPT) causes the condition to be true for
a positive assertion and false for a negative one; captured substrings are
retained in both cases.
.P
The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after an
assertion is complete does not jump back into the assertion. Note in particular
that a (*MARK) name that is set in an assertion is not "seen" by an instance of
(*SKIP:NAME) later in the pattern.
.P
The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true.
.P
The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
false. However, for both standalone and conditional negative assertions,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
true, without considering any further alternative branches.
backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
causes the condition to be false. However, for both standalone and conditional
negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
the assertion to be true, without considering any further alternative branches.
.
.
.\" HTML <a name="btsub"></a>
@ -3536,6 +3567,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 10 July 2018
Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi


@ -43,7 +43,7 @@ fi
# afteralltext ignored
# dupnames ignored (Perl always allows)
# jitstack ignored
# mark ignored
# mark show mark information
# no_auto_possess ignored
# no_start_optimize ignored
# subject_literal does not process subjects for escapes
@ -172,9 +172,9 @@ for (;;)
$mod =~ s/jitstack=\d+,?//;
# Remove "mark" (asks pcre2test to check MARK data) */
# The "mark" modifier requests checking of MARK data
$mod =~ s/mark,?//;
$show_mark = ($mod =~ s/mark,?//);
# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl
@ -279,7 +279,7 @@ for (;;)
elsif (scalar(@subs) == 0)
{
printf $outfile "No match";
if (defined $REGERROR && $REGERROR != 1)
if ($show_mark && defined $REGERROR && $REGERROR != 1)
{ printf $outfile (", mark = %s", &pchars($REGERROR)); }
printf $outfile "\n";
}
@ -307,7 +307,7 @@ for (;;)
# set and the input pattern was a UTF-8 string. We can, however, force
# it to be so marked.
if (defined $REGMARK && $REGMARK != 1)
if ($show_mark && defined $REGMARK && $REGMARK != 1)
{
$xx = $REGMARK;
$xx = Encode::decode_utf8($xx) if $utf8;

testdata/testinput1 (9 lines changed)

@ -6202,4 +6202,13 @@ ef) x/x,mark
/(?<=(?=.){4,5}x)/
/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
abc
/a(?>(*:X))(*SKIP:X)(*F)|(.)/
abc
/a(?:(*:X))(*SKIP:X)(*F)|(.)/
abc
# End of testinput1

testdata/testoutput1 (15 lines changed)

@ -9841,4 +9841,19 @@ No match
/(?<=(?=.){4,5}x)/
/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
abc
0: a
1: a
/a(?>(*:X))(*SKIP:X)(*F)|(.)/
abc
0: a
1: a
/a(?:(*:X))(*SKIP:X)(*F)|(.)/
abc
0: b
1: b
# End of testinput1