Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK) followed by (*ACCEPT) in an assertion. More small updates to perltest.sh.
Philip.Hazel 2018-07-21 14:34:51 +00:00
parent 635d04fbb7
commit 192b82cf6e
23 changed files with 551 additions and 334 deletions


@ -119,9 +119,16 @@ backtrack into the first of the atomic groups. A complicated example is
shouldn't find a MARK (because it is in an atomic group), but it did. shouldn't find a MARK (because it is in an atomic group), but it did.
26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set 26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
certain modifiers that the script recognizes; (2) Unsupported #command lines a list of modifiers for all subsequent patterns - only those that the script
give a warning when they are ignored; (3) Mark data is output only if the recognizes are meaningful; (2) #subject lines can be used to set or unset a
"mark" modifier is present. default "mark" modifier; (3) Unsupported #command lines give a warning when
they are ignored; (4) Mark data is output only if the "mark" modifier is
present.
27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
28. A (*MARK) name was not being passed back for positive assertions that were
terminated by (*ACCEPT).
Version 10.31 12-February-2018 Version 10.31 12-February-2018

27
HACKING

@ -256,6 +256,7 @@ The following are followed by a length element, then a number of character code
values (which should match with the length): values (which should match with the length):
META_MARK (*MARK:xxxx) META_MARK (*MARK:xxxx)
META_COMMIT_ARG (*COMMIT:xxxx)
META_PRUNE_ARG (*PRUNE:xxx) META_PRUNE_ARG (*PRUNE:xxx)
META_SKIP_ARG (*SKIP:xxxx) META_SKIP_ARG (*SKIP:xxxx)
META_THEN_ARG (*THEN:xxxx) META_THEN_ARG (*THEN:xxxx)
@ -382,7 +383,7 @@ that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode
Opcodes with no following data Opcodes with no following data
------------------------------ ------------------------------
These items are all just one unit long These items are all just one unit long:
OP_END end of pattern OP_END end of pattern
OP_ANY match any one character other than newline OP_ANY match any one character other than newline
@ -430,14 +431,22 @@ character). Another use is for [^] when empty classes are permitted
(PCRE2_ALLOW_EMPTY_CLASS is set). (PCRE2_ALLOW_EMPTY_CLASS is set).
Backtracking control verbs with optional data Backtracking control verbs
--------------------------------------------- --------------------------
(*THEN) without an argument generates the opcode OP_THEN and no following data. Verbs with no arguments generate opcodes with no following data (as listed
OP_MARK is followed by the mark name, preceded by a length in one code unit, in the section above).
and followed by a binary zero. For (*PRUNE), (*SKIP), and (*THEN) with
arguments, the opcodes OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used, (*MARK:NAME) generates OP_MARK followed by the mark name, preceded by a
with the name following in the same format as OP_MARK. length in one code unit, and followed by a binary zero. The name length is
limited by the size of the code unit.
(*ACCEPT:NAME) and (*FAIL:NAME) are compiled as (*MARK:NAME)(*ACCEPT) and
(*MARK:NAME)(*FAIL) respectively.
For (*COMMIT:NAME), (*PRUNE:NAME), (*SKIP:NAME), and (*THEN:NAME), the opcodes
OP_COMMIT_ARG, OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used, with the
name following in the same format as for OP_MARK.
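The layout described above can be sketched in a few lines of Python. This is only an illustrative model, not PCRE2's compiler: opcodes are shown as symbolic strings rather than real numeric values, `compile_verb` and `encode_name` are hypothetical helper names, OP_ACCEPT and OP_FAIL are assumed to be the no-argument opcode names, and names are assumed to be ASCII so that one character occupies one code unit.

```python
# Symbolic opcodes for verbs that have a dedicated "with argument" form.
ARG_OPCODES = {
    "COMMIT": "OP_COMMIT_ARG",
    "PRUNE": "OP_PRUNE_ARG",
    "SKIP": "OP_SKIP_ARG",
    "THEN": "OP_THEN_ARG",
}
NO_ARG_VERBS = {"ACCEPT", "FAIL", "COMMIT", "PRUNE", "SKIP", "THEN"}


def encode_name(opcode, name, code_unit_bits=8):
    """Opcode, one-unit length, the name's code units, then a binary zero."""
    max_len = (1 << code_unit_bits) - 1  # length must fit in one code unit
    if len(name) > max_len:
        raise ValueError("name too long for one length code unit")
    return [opcode, len(name)] + [ord(ch) for ch in name] + [0]


def compile_verb(verb, name=None, code_unit_bits=8):
    """Return the (symbolic) compiled items for one backtracking verb."""
    if verb == "MARK":
        if not name:
            raise ValueError("(*MARK) requires a name")
        return encode_name("OP_MARK", name, code_unit_bits)
    if verb not in NO_ARG_VERBS:
        raise ValueError("unknown verb: " + verb)
    if name is None:
        return ["OP_" + verb]  # no following data
    if verb in ("ACCEPT", "FAIL"):
        # Compiled as (*MARK:NAME) followed by the plain verb.
        return encode_name("OP_MARK", name, code_unit_bits) + ["OP_" + verb]
    return encode_name(ARG_OPCODES[verb], name, code_unit_bits)


print(compile_verb("ACCEPT", "ok"))
print(compile_verb("COMMIT", "A"))
```

Under this model (*ACCEPT:ok) produces an OP_MARK item followed by a bare OP_ACCEPT, whereas (*COMMIT:A) produces a single OP_COMMIT_ARG item carrying the name.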
Matching literal characters Matching literal characters
@ -814,4 +823,4 @@ not a real opcode, but is used to check at compile time that tables indexed by
opcode are the correct length, in order to catch updating errors. opcode are the correct length, in order to catch updating errors.
Philip Hazel Philip Hazel
21 April 2017 20 July 2018


@ -3122,17 +3122,16 @@ in the
documentation. documentation.
</P> </P>
<P> <P>
Experiments with Perl suggest that it too has similar optimizations, sometimes Experiments with Perl suggest that it too has similar optimizations, and like
leading to anomalous results. PCRE2, turning them off can change the result of a match.
</P> </P>
<br><b> <br><b>
Verbs that act immediately Verbs that act immediately
</b><br> </b><br>
<P> <P>
The following verbs act as soon as they are encountered. They may not be The following verbs act as soon as they are encountered.
followed by a name.
<pre> <pre>
(*ACCEPT) (*ACCEPT) or (*ACCEPT:NAME)
</pre> </pre>
This verb causes the match to end successfully, skipping the remainder of the This verb causes the match to end successfully, skipping the remainder of the
pattern. However, when it is inside a subpattern that is called as a pattern. However, when it is inside a subpattern that is called as a
@ -3149,19 +3148,23 @@ example:
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
the outer parentheses. the outer parentheses.
<pre> <pre>
(*FAIL) or (*F) (*FAIL) or (*FAIL:NAME)
</pre> </pre>
This verb causes a matching failure, forcing backtracking to occur. It is This verb causes a matching failure, forcing backtracking to occur. It may be
equivalent to (?!) but easier to read. The Perl documentation notes that it is abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
probably useful only when combined with (?{}) or (??{}). Those are, of course, documentation notes that it is probably useful only when combined with (?{}) or
Perl features that are not present in PCRE2. The nearest equivalent is the (??{}). Those are, of course, Perl features that are not present in PCRE2. The
callout feature, as for example in this pattern: nearest equivalent is the callout feature, as for example in this pattern:
<pre> <pre>
a+(?C)(*FAIL) a+(?C)(*FAIL)
</pre> </pre>
A match with the string "aaaa" always fails, but the callout is taken before A match with the string "aaaa" always fails, but the callout is taken before
each backtrack happens (in this example, 10 times). each backtrack happens (in this example, 10 times).
</P> </P>
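The figure of 10 callouts for "aaaa" can be checked with a small counting model. The sketch below does not call PCRE2; it assumes a plain backtracking matcher with no optimizations, in which a+ first takes the longest run of "a" and then gives back one character at a time, and every start position in the subject is tried. `count_callouts` is a hypothetical name used only for this illustration.

```python
def count_callouts(subject):
    """Count how often the callout in a+(?C)(*FAIL) would be reached."""
    calls = 0
    for start in range(len(subject)):
        # Greedy a+: find the longest run of "a" from this start position.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # Each length from the longest down to 1 reaches the callout once,
        # after which (*FAIL) forces a backtrack to the next shorter length.
        for _length in range(run, 0, -1):
            calls += 1
    return calls


print(count_callouts("aaaa"))
```

For "aaaa" the four start positions contribute 4, 3, 2 and 1 callouts, which sums to 10.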
<P>
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
</P>
<br><b> <br><b>
Recording which path was taken Recording which path was taken
</b><br> </b><br>
@ -3186,9 +3189,9 @@ assertions and atomic groups. (There are differences in those cases when
(*MARK) is used in conjunction with (*SKIP) as described below.) (*MARK) is used in conjunction with (*SKIP) as described below.)
</P> </P>
<P> <P>
As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
arguments. Whichever is last on the matching path is passed back. See below for associated NAME arguments. Whichever is last on the matching path is passed
more details of these other verbs. back. See below for more details of these other verbs.
</P> </P>
<P> <P>
Here is an example of <b>pcre2test</b> output, where the "mark" modifier Here is an example of <b>pcre2test</b> output, where the "mark" modifier
@ -3250,22 +3253,25 @@ reaches them. The behaviour described below is what happens when the verb is
not in a subroutine or an assertion. Subsequent sections cover these special not in a subroutine or an assertion. Subsequent sections cover these special
cases. cases.
<pre> <pre>
(*COMMIT) (*COMMIT) or (*COMMIT:NAME)
</pre> </pre>
This verb, which may not be followed by a name, causes the whole match to fail This verb causes the whole match to fail outright if there is a later matching
outright if there is a later matching failure that causes backtracking to reach failure that causes backtracking to reach it. Even if the pattern is
it. Even if the pattern is unanchored, no further attempts to find a match by unanchored, no further attempts to find a match by advancing the starting point
advancing the starting point take place. If (*COMMIT) is the only backtracking take place. If (*COMMIT) is the only backtracking verb that is encountered,
verb that is encountered, once it has been passed <b>pcre2_match()</b> is once it has been passed <b>pcre2_match()</b> is committed to finding a match at
committed to finding a match at the current starting point, or not at all. For the current starting point, or not at all. For example:
example:
<pre> <pre>
a+(*COMMIT)b a+(*COMMIT)b
</pre> </pre>
This matches "xxaab" but not "aacaab". It can be thought of as a kind of This matches "xxaab" but not "aacaab". It can be thought of as a kind of
dynamic anchor, or "I've started, so I must finish." The name of the most dynamic anchor, or "I've started, so I must finish."
recently passed (*MARK) in the path is passed back when (*COMMIT) forces a </P>
match failure. <P>
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
</P> </P>
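The "xxaab" versus "aacaab" example for a+(*COMMIT)b can be traced with a hand-rolled model of that one pattern. This is a sketch under simplifying assumptions, not PCRE2 itself: `match_a_plus_b` is a hypothetical function, it hard-codes the pattern a+b, and it ignores any start-of-match optimizations.

```python
def match_a_plus_b(subject, commit=False):
    """Model a+b, or a+(*COMMIT)b when commit=True.

    Returns (start, end) offsets of the match, or None if there is none.
    """
    start = 0
    while start < len(subject):
        # Greedy a+: longest run of "a" from this start position.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # Try "b" after the longest run, then after shorter runs.
        for length in range(run, 0, -1):
            end = start + length
            if end < len(subject) and subject[end] == "b":
                return (start, end + 1)
            if commit:
                # (*COMMIT) was passed and backtracking has reached it:
                # the whole match fails, with no advance of the start point.
                return None
        start += 1  # normal "bumpalong" to the next starting character
    return None


print(match_a_plus_b("xxaab", commit=True))
print(match_a_plus_b("aacaab", commit=True))
print(match_a_plus_b("aacaab"))
```

In "xxaab" the first two attempts fail before (*COMMIT) is ever reached, so the start point still advances; in "aacaab" the very first attempt passes (*COMMIT) and then fails on "c", which ends the whole match.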
<P> <P>
If there is more than one backtracking verb in a pattern, a different one that If there is more than one backtracking verb in a pattern, a different one that
@ -3309,7 +3315,7 @@ as (*COMMIT).
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
like (*MARK:NAME) in that the name is remembered for passing back to the like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK), caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*PRUNE) or (*THEN). ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
<pre> <pre>
(*SKIP) (*SKIP)
</pre> </pre>
@ -3317,7 +3323,7 @@ This verb, when given without a name, is like (*PRUNE), except that if the
pattern is unanchored, the "bumpalong" advance is not to the next character, pattern is unanchored, the "bumpalong" advance is not to the next character,
but to the position in the subject where (*SKIP) was encountered. (*SKIP) but to the position in the subject where (*SKIP) was encountered. (*SKIP)
signifies that whatever text was matched leading up to it cannot be part of a signifies that whatever text was matched leading up to it cannot be part of a
successful match. Consider: successful match if there is a later mismatch. Consider:
<pre> <pre>
a+(*SKIP)b a+(*SKIP)b
</pre> </pre>
@ -3364,7 +3370,7 @@ the second branch of the pattern.
</P> </P>
<P> <P>
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME). names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
<pre> <pre>
(*THEN) or (*THEN:NAME) (*THEN) or (*THEN:NAME)
</pre> </pre>
@ -3383,10 +3389,10 @@ more alternatives, so there is a backtrack to whatever came before the entire
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE). group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
</P> </P>
<P> <P>
The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN). The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
It is like (*MARK:NAME) in that the name is remembered for passing back to the like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK), caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*PRUNE) and (*THEN). ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
</P> </P>
<P> <P>
A subpattern that does not contain a | character is just a part of the A subpattern that does not contain a | character is just a part of the
@ -3461,13 +3467,14 @@ onto (*COMMIT).
Backtracking verbs in repeated groups Backtracking verbs in repeated groups
</b><br> </b><br>
<P> <P>
PCRE2 differs from Perl in its handling of backtracking verbs in repeated PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
groups. For example, consider: repeated groups. For example, consider:
<pre> <pre>
/(a(*COMMIT)b)+ac/ /(a(*COMMIT)b)+ac/
</pre> </pre>
If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT) If the subject is "abac", Perl matches unless its optimizations are disabled,
in the second repeat of the group acts. but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
acts.
<a name="btassert"></a></P> <a name="btassert"></a></P>
<br><b> <br><b>
Backtracking verbs in assertions Backtracking verbs in assertions
@ -3480,9 +3487,10 @@ subpattern.
</P> </P>
<P> <P>
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
without any further processing; captured strings are retained. In a standalone without any further processing; captured strings and a (*MARK) name (if set)
negative assertion, (*ACCEPT) causes the assertion to fail without any further are retained. In a standalone negative assertion, (*ACCEPT) causes the
processing; captured substrings are discarded. assertion to fail without any further processing; captured substrings and any
(*MARK) name are discarded.
</P> </P>
<P> <P>
If the assertion is a condition, (*ACCEPT) causes the condition to be true for If the assertion is a condition, (*ACCEPT) causes the condition to be true for
@ -3515,16 +3523,16 @@ Backtracking verbs in subroutines
</b><br> </b><br>
<P> <P>
These behaviours occur whether or not the subpattern is called recursively. These behaviours occur whether or not the subpattern is called recursively.
Perl's treatment of subroutines is different in some cases.
</P>
<P>
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
an immediate backtrack.
</P> </P>
<P> <P>
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to (*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
succeed without any further processing. Matching then continues after the succeed without any further processing. Matching then continues after the
subroutine call. subroutine call. Perl documents this behaviour. Perl's treatment of the other
verbs in subroutines is different in some cases.
</P>
<P>
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
an immediate backtrack.
</P> </P>
<P> <P>
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
@ -3551,7 +3559,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br> <br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 16 July 2018 Last updated: 20 July 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -569,7 +569,11 @@ condition if the relevant named group exists.
</P> </P>
<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br> <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P> <P>
The following act immediately they are reached: All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
if :NAME is present. The others just set a name for passing back to the caller,
but this is not a name that (*SKIP) can see. The following act immediately they
are reached:
<pre> <pre>
(*ACCEPT) force successful match (*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F) (*FAIL) force backtrack; synonym (*F)
@ -582,13 +586,13 @@ pattern is not anchored.
<pre> <pre>
(*COMMIT) overall failure, no advance of starting point (*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character (*PRUNE) advance to next starting character
(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
(*SKIP) advance to current matching position (*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier (*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored (*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation (*THEN) local failure, backtrack to next alternation
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) </pre>
</PRE> The effect of one of these verbs in a group called as a subroutine is confined
to the subroutine call.
</P> </P>
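The naming rules summarised in this section (a name is mandatory for (*MARK), optional for the other verbs, and (*F) is a synonym for (*FAIL)) can be expressed as a small validator. This is an illustrative sketch only: `parse_verb` and `VERB_RE` are hypothetical names, the regular expression is a simplification written for this example, and it does not reproduce PCRE2's real parsing of verb names.

```python
import re

# Simplified shape of a verb: (*VERB) or (*VERB:NAME)
VERB_RE = re.compile(r"\(\*([A-Z]+)(?::([^)]+))?\)")

KNOWN_VERBS = {"ACCEPT", "FAIL", "MARK", "COMMIT", "PRUNE", "SKIP", "THEN"}


def parse_verb(text):
    """Split a verb such as "(*COMMIT:X)" into ("COMMIT", "X")."""
    m = VERB_RE.fullmatch(text)
    if m is None:
        raise ValueError("not a backtracking verb: " + text)
    verb, name = m.group(1), m.group(2)
    if verb == "F":
        verb = "FAIL"  # (*F) is a synonym for (*FAIL)
    if verb not in KNOWN_VERBS:
        raise ValueError("unknown verb: " + verb)
    if verb == "MARK" and name is None:
        raise ValueError("(*MARK) requires a name")
    return verb, name


print(parse_verb("(*COMMIT:X)"))
print(parse_verb("(*F)"))
```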
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br> <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P> <P>
@ -617,7 +621,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br> <br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 07 July 2018 Last updated: 21 July 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -410,10 +410,11 @@ patterns. Modifiers on a pattern can change these settings.
The appearance of this line causes all subsequent modifier settings to be The appearance of this line causes all subsequent modifier settings to be
checked for compatibility with the <b>perltest.sh</b> script, which is used to checked for compatibility with the <b>perltest.sh</b> script, which is used to
confirm that Perl gives the same results as PCRE2. Also, apart from comment confirm that Perl gives the same results as PCRE2. Also, apart from comment
lines, none of the other command lines are permitted, because they and many lines, #pattern commands, and #subject commands that set or unset "mark", no
of the modifiers are specific to <b>pcre2test</b>, and should not be used in command lines are permitted, because they and many of the modifiers are
test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b> specific to <b>pcre2test</b>, and should not be used in test files that are also
command helps detect tests that are accidentally put in the wrong file. processed by <b>perltest.sh</b>. The <b>#perltest</b> command helps detect tests
that are accidentally put in the wrong file.
<pre> <pre>
#pop [&#60;modifiers&#62;] #pop [&#60;modifiers&#62;]
#popcopy [&#60;modifiers&#62;] #popcopy [&#60;modifiers&#62;]
@ -2003,7 +2004,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 16 July 2018 Last updated: 21 July 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -8601,44 +8601,46 @@ BACKTRACKING CONTROL
in the pcre2api documentation. in the pcre2api documentation.
Experiments with Perl suggest that it too has similar optimizations, Experiments with Perl suggest that it too has similar optimizations,
sometimes leading to anomalous results. and like PCRE2, turning them off can change the result of a match.
Verbs that act immediately Verbs that act immediately
The following verbs act as soon as they are encountered. They may not The following verbs act as soon as they are encountered.
be followed by a name.
(*ACCEPT) (*ACCEPT) or (*ACCEPT:NAME)
This verb causes the match to end successfully, skipping the remainder This verb causes the match to end successfully, skipping the remainder
of the pattern. However, when it is inside a subpattern that is called of the pattern. However, when it is inside a subpattern that is called
as a subroutine, only that subpattern is ended successfully. Matching as a subroutine, only that subpattern is ended successfully. Matching
then continues at the outer level. If (*ACCEPT) is triggered in a posi- then continues at the outer level. If (*ACCEPT) is triggered in a posi-
tive assertion, the assertion succeeds; in a negative assertion, the tive assertion, the assertion succeeds; in a negative assertion, the
assertion fails. assertion fails.
If (*ACCEPT) is inside capturing parentheses, the data so far is cap- If (*ACCEPT) is inside capturing parentheses, the data so far is cap-
tured. For example: tured. For example:
A((?:A|B(*ACCEPT)|C)D) A((?:A|B(*ACCEPT)|C)D)
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap- This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
tured by the outer parentheses. tured by the outer parentheses.
(*FAIL) or (*F) (*FAIL) or (*FAIL:NAME)
This verb causes a matching failure, forcing backtracking to occur. It This verb causes a matching failure, forcing backtracking to occur. It
is equivalent to (?!) but easier to read. The Perl documentation notes may be abbreviated to (*F). It is equivalent to (?!) but easier to
that it is probably useful only when combined with (?{}) or (??{}). read. The Perl documentation notes that it is probably useful only when
Those are, of course, Perl features that are not present in PCRE2. The combined with (?{}) or (??{}). Those are, of course, Perl features that
nearest equivalent is the callout feature, as for example in this pat- are not present in PCRE2. The nearest equivalent is the callout fea-
tern: ture, as for example in this pattern:
a+(?C)(*FAIL) a+(?C)(*FAIL)
A match with the string "aaaa" always fails, but the callout is taken A match with the string "aaaa" always fails, but the callout is taken
before each backtrack happens (in this example, 10 times). before each backtrack happens (in this example, 10 times).
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
Recording which path was taken Recording which path was taken
There is one verb whose main purpose is to track how a match was There is one verb whose main purpose is to track how a match was
@ -8659,9 +8661,9 @@ BACKTRACKING CONTROL
cases when (*MARK) is used in conjunction with (*SKIP) as described cases when (*MARK) is used in conjunction with (*SKIP) as described
below.) below.)
As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
NAME arguments. Whichever is last on the matching path is passed back. associated NAME arguments. Whichever is last on the matching path is
See below for more details of these other verbs. passed back. See below for more details of these other verbs.
Here is an example of pcre2test output, where the "mark" modifier Here is an example of pcre2test output, where the "mark" modifier
requests the retrieval and outputting of (*MARK) data: requests the retrieval and outputting of (*MARK) data:
@ -8717,22 +8719,26 @@ BACKTRACKING CONTROL
when the verb is not in a subroutine or an assertion. Subsequent sec- when the verb is not in a subroutine or an assertion. Subsequent sec-
tions cover these special cases. tions cover these special cases.
(*COMMIT) (*COMMIT) or (*COMMIT:NAME)
This verb, which may not be followed by a name, causes the whole match This verb causes the whole match to fail outright if there is a later
to fail outright if there is a later matching failure that causes back- matching failure that causes backtracking to reach it. Even if the pat-
tracking to reach it. Even if the pattern is unanchored, no further tern is unanchored, no further attempts to find a match by advancing
attempts to find a match by advancing the starting point take place. If the starting point take place. If (*COMMIT) is the only backtracking
(*COMMIT) is the only backtracking verb that is encountered, once it verb that is encountered, once it has been passed pcre2_match() is com-
has been passed pcre2_match() is committed to finding a match at the mitted to finding a match at the current starting point, or not at all.
current starting point, or not at all. For example: For example:
a+(*COMMIT)b a+(*COMMIT)b
This matches "xxaab" but not "aacaab". It can be thought of as a kind This matches "xxaab" but not "aacaab". It can be thought of as a kind
of dynamic anchor, or "I've started, so I must finish." The name of the of dynamic anchor, or "I've started, so I must finish."
most recently passed (*MARK) in the path is passed back when (*COMMIT)
forces a match failure. The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COM-
MIT). It is like (*MARK:NAME) in that the name is remembered for pass-
ing back to the caller. However, (*SKIP:NAME) searches only for names
set with (*MARK), ignoring those set by (*COMMIT), (*PRUNE) and
(*THEN).
If there is more than one backtracking verb in a pattern, a different If there is more than one backtracking verb in a pattern, a different
one that follows (*COMMIT) may be triggered first, so merely passing one that follows (*COMMIT) may be triggered first, so merely passing
@ -8776,7 +8782,7 @@ BACKTRACKING CONTROL
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
It is like (*MARK:NAME) in that the name is remembered for passing back It is like (*MARK:NAME) in that the name is remembered for passing back
to the caller. However, (*SKIP:NAME) searches only for names set with to the caller. However, (*SKIP:NAME) searches only for names set with
(*MARK), ignoring those set by (*PRUNE) or (*THEN). (*MARK), ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
(*SKIP) (*SKIP)
@ -8784,29 +8790,30 @@ BACKTRACKING CONTROL
the pattern is unanchored, the "bumpalong" advance is not to the next the pattern is unanchored, the "bumpalong" advance is not to the next
character, but to the position in the subject where (*SKIP) was encoun- character, but to the position in the subject where (*SKIP) was encoun-
tered. (*SKIP) signifies that whatever text was matched leading up to tered. (*SKIP) signifies that whatever text was matched leading up to
it cannot be part of a successful match. Consider: it cannot be part of a successful match if there is a later mismatch.
Consider:
a+(*SKIP)b a+(*SKIP)b
If the subject is "aaaac...", after the first match attempt fails If the subject is "aaaac...", after the first match attempt fails
(starting at the first character in the string), the starting point (starting at the first character in the string), the starting point
skips on to start the next attempt at "c". Note that a possessive quan- skips on to start the next attempt at "c". Note that a possessive quan-
tifier does not have the same effect as this example; although it would tifier does not have the same effect as this example; although it would
suppress backtracking during the first match attempt, the second suppress backtracking during the first match attempt, the second
attempt would start at the second character instead of skipping on to attempt would start at the second character instead of skipping on to
"c". "c".
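The difference in "bumpalong" behaviour for a+(*SKIP)b can be seen by listing the start positions that get tried. The sketch below is a hand-rolled model of that single pattern, not a call into PCRE2: `start_positions` is a hypothetical name, and start-of-match optimizations, which could skip positions on their own, are assumed to be off.

```python
def start_positions(subject, skip=False):
    """List the start offsets tried for a+b, or a+(*SKIP)b when skip=True."""
    tried = []
    start = 0
    while start < len(subject):
        tried.append(start)
        # Greedy a+: longest run of "a" from this start position.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        end = start + run
        if run > 0 and end < len(subject) and subject[end] == "b":
            return tried  # matched at this start position
        if skip and run > 0:
            # (*SKIP) was reached at offset `end`; restart from there.
            start = end
        else:
            # Ordinary advance to the next character.
            start += 1
    return tried


print(start_positions("aaaac", skip=True))
print(start_positions("aaaac", skip=False))
```

With (*SKIP) the second attempt begins at "c" (offset 4); without it, each of the offsets 1, 2 and 3 is tried in turn, which is also what would happen with a possessive quantifier.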
(*SKIP:NAME) (*SKIP:NAME)
When (*SKIP) has an associated name, its behaviour is modified. When When (*SKIP) has an associated name, its behaviour is modified. When
such a (*SKIP) is triggered, the previous path through the pattern is such a (*SKIP) is triggered, the previous path through the pattern is
searched for the most recent (*MARK) that has the same name. If one is searched for the most recent (*MARK) that has the same name. If one is
found, the "bumpalong" advance is to the subject position that corre- found, the "bumpalong" advance is to the subject position that corre-
sponds to that (*MARK) instead of to where (*SKIP) was encountered. If sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
no (*MARK) with a matching name is found, the (*SKIP) is ignored. no (*MARK) with a matching name is found, the (*SKIP) is ignored.
The search for a (*MARK) name uses the normal backtracking mechanism, The search for a (*MARK) name uses the normal backtracking mechanism,
which means that it does not see (*MARK) settings that are inside which means that it does not see (*MARK) settings that are inside
atomic groups or assertions, because they are never re-entered by back- atomic groups or assertions, because they are never re-entered by back-
tracking. Compare the following pcre2test examples: tracking. Compare the following pcre2test examples:
@ -8820,18 +8827,19 @@ BACKTRACKING CONTROL
0: b 0: b
1: b 1: b
In the first example, the (*MARK) setting is in an atomic group, so it In the first example, the (*MARK) setting is in an atomic group, so it
is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
This allows the second branch of the pattern to be tried at the first This allows the second branch of the pattern to be tried at the first
character position. In the second example, the (*MARK) setting is not character position. In the second example, the (*MARK) setting is not
in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it
backtracks, and this causes a new matching attempt to start at the sec- backtracks, and this causes a new matching attempt to start at the sec-
ond character. This time, the (*MARK) is never seen because "a" does ond character. This time, the (*MARK) is never seen because "a" does
not match "b", so the matcher immediately jumps to the second branch of not match "b", so the matcher immediately jumps to the second branch of
the pattern. the pattern.
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME). ignores names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or
(*THEN:NAME).
(*THEN) or (*THEN:NAME) (*THEN) or (*THEN:NAME)
@ -8850,87 +8858,87 @@ BACKTRACKING CONTROL
track to whatever came before the entire group. If (*THEN) is not track to whatever came before the entire group. If (*THEN) is not
inside an alternation, it acts like (*PRUNE). inside an alternation, it acts like (*PRUNE).
The behaviour of (*THEN:NAME) is the not the same as The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN).
(*MARK:NAME)(*THEN). It is like (*MARK:NAME) in that the name is It is like (*MARK:NAME) in that the name is remembered for passing back
remembered for passing back to the caller. However, (*SKIP:NAME) to the caller. However, (*SKIP:NAME) searches only for names set with
searches only for names set with (*MARK), ignoring those set by (*MARK), ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
(*PRUNE) and (*THEN).
A subpattern that does not contain a | character is just a part of the A subpattern that does not contain a | character is just a part of the
enclosing alternative; it is not a nested alternation with only one enclosing alternative; it is not a nested alternation with only one
alternative. The effect of (*THEN) extends beyond such a subpattern to alternative. The effect of (*THEN) extends beyond such a subpattern to
the enclosing alternative. Consider this pattern, where A, B, etc. are the enclosing alternative. Consider this pattern, where A, B, etc. are
complex pattern fragments that do not contain any | characters at this complex pattern fragments that do not contain any | characters at this
level: level:
A (B(*THEN)C) | D A (B(*THEN)C) | D
If A and B are matched, but there is a failure in C, matching does not If A and B are matched, but there is a failure in C, matching does not
backtrack into A; instead it moves to the next alternative, that is, D. backtrack into A; instead it moves to the next alternative, that is, D.
However, if the subpattern containing (*THEN) is given an alternative, However, if the subpattern containing (*THEN) is given an alternative,
it behaves differently: it behaves differently:
A (B(*THEN)C | (*FAIL)) | D A (B(*THEN)C | (*FAIL)) | D
The effect of (*THEN) is now confined to the inner subpattern. After a The effect of (*THEN) is now confined to the inner subpattern. After a
failure in C, matching moves to (*FAIL), which causes the whole subpat- failure in C, matching moves to (*FAIL), which causes the whole subpat-
tern to fail because there are no more alternatives to try. In this tern to fail because there are no more alternatives to try. In this
case, matching does now backtrack into A. case, matching does now backtrack into A.
Note that a conditional subpattern is not considered as having two Note that a conditional subpattern is not considered as having two
alternatives, because only one is ever used. In other words, the | alternatives, because only one is ever used. In other words, the |
character in a conditional subpattern has a different meaning. Ignoring character in a conditional subpattern has a different meaning. Ignoring
white space, consider: white space, consider:
^.*? (?(?=a) a | b(*THEN)c ) ^.*? (?(?=a) a | b(*THEN)c )
If the subject is "ba", this pattern does not match. Because .*? is If the subject is "ba", this pattern does not match. Because .*? is
ungreedy, it initially matches zero characters. The condition (?=a) ungreedy, it initially matches zero characters. The condition (?=a)
then fails, the character "b" is matched, but "c" is not. At this then fails, the character "b" is matched, but "c" is not. At this
point, matching does not backtrack to .*? as might perhaps be expected point, matching does not backtrack to .*? as might perhaps be expected
from the presence of the | character. The conditional subpattern is from the presence of the | character. The conditional subpattern is
part of the single alternative that comprises the whole pattern, and so part of the single alternative that comprises the whole pattern, and so
the match fails. (If there was a backtrack into .*?, allowing it to the match fails. (If there was a backtrack into .*?, allowing it to
match "b", the match would succeed.) match "b", the match would succeed.)
The verbs just described provide four different "strengths" of control The verbs just described provide four different "strengths" of control
when subsequent matching fails. (*THEN) is the weakest, carrying on the when subsequent matching fails. (*THEN) is the weakest, carrying on the
match at the next alternative. (*PRUNE) comes next, failing the match match at the next alternative. (*PRUNE) comes next, failing the match
at the current starting position, but allowing an advance to the next at the current starting position, but allowing an advance to the next
character (for an unanchored pattern). (*SKIP) is similar, except that character (for an unanchored pattern). (*SKIP) is similar, except that
the advance may be more than one character. (*COMMIT) is the strongest, the advance may be more than one character. (*COMMIT) is the strongest,
causing the entire match to fail. causing the entire match to fail.
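
As a rough illustration of these strengths, the following Python sketch hand-simulates an unanchored match of the single pattern a+(*VERB)b. It is not PCRE2 code and does not call any regex engine; the function name simulate and its return shape are invented for this illustration. (*THEN) is treated like (*PRUNE) because the pattern contains no alternation.

```python
def simulate(subject, verb):
    """Hand-simulate an unanchored match of a+(*VERB)b.

    Returns (match_span_or_None, list_of_start_offsets_tried).
    Hypothetical helper for illustration only; not part of PCRE2.
    """
    if verb not in ("COMMIT", "PRUNE", "SKIP", "THEN"):
        raise ValueError("unsupported verb: " + verb)
    n = len(subject)
    start = 0
    tried = []
    while start <= n:
        tried.append(start)
        # Greedy a+ from the current starting position.
        pos = start
        while pos < n and subject[pos] == "a":
            pos += 1
        if pos == start:
            # a+ failed before the verb was reached: ordinary bumpalong.
            start += 1
            continue
        # The verb is passed here, at subject offset pos.
        if pos < n and subject[pos] == "b":
            return (start, pos + 1), tried
        # "b" failed, so the first backtrack lands on the verb.
        if verb == "COMMIT":
            return None, tried      # whole match fails, no further starts
        if verb == "SKIP":
            start = pos             # restart where (*SKIP) was encountered
        else:
            start += 1              # (*PRUNE), or (*THEN) outside alternation
    return None, tried


if __name__ == "__main__":
    for verb in ("COMMIT", "PRUNE", "SKIP"):
        print(verb, simulate("aaaacaab", verb))
```

In this simulation the subject "aaaacaab" fails outright with COMMIT after one attempt, while PRUNE tries starting offsets 0 to 5 and SKIP tries only 0, 4 and 5 before both match at offset 5.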
More than one backtracking verb More than one backtracking verb
If more than one backtracking verb is present in a pattern, the one If more than one backtracking verb is present in a pattern, the one
that is backtracked onto first acts. For example, consider this pat- that is backtracked onto first acts. For example, consider this pat-
tern, where A, B, etc. are complex pattern fragments: tern, where A, B, etc. are complex pattern fragments:
(A(*COMMIT)B(*THEN)C|ABD) (A(*COMMIT)B(*THEN)C|ABD)
If A matches but B fails, the backtrack to (*COMMIT) causes the entire If A matches but B fails, the backtrack to (*COMMIT) causes the entire
match to fail. However, if A and B match, but C fails, the backtrack to match to fail. However, if A and B match, but C fails, the backtrack to
(*THEN) causes the next alternative (ABD) to be tried. This behaviour (*THEN) causes the next alternative (ABD) to be tried. This behaviour
is consistent, but is not always the same as Perl's. It means that if is consistent, but is not always the same as Perl's. It means that if
two or more backtracking verbs appear in succession, all but the last two or more backtracking verbs appear in succession, all but the last
of them has no effect. Consider this example: of them has no effect. Consider this example:
...(*COMMIT)(*PRUNE)... ...(*COMMIT)(*PRUNE)...
If there is a matching failure to the right, backtracking onto (*PRUNE) If there is a matching failure to the right, backtracking onto (*PRUNE)
causes it to be triggered, and its action is taken. There can never be causes it to be triggered, and its action is taken. There can never be
a backtrack onto (*COMMIT). a backtrack onto (*COMMIT).
Backtracking verbs in repeated groups Backtracking verbs in repeated groups
PCRE2 differs from Perl in its handling of backtracking verbs in PCRE2 sometimes differs from Perl in its handling of backtracking verbs
repeated groups. For example, consider: in repeated groups. For example, consider:
/(a(*COMMIT)b)+ac/ /(a(*COMMIT)b)+ac/
If the subject is "abac", Perl matches, but PCRE2 fails because the If the subject is "abac", Perl matches unless its optimizations are
(*COMMIT) in the second repeat of the group acts. disabled, but PCRE2 always fails because the (*COMMIT) in the second
repeat of the group acts.
Backtracking verbs in assertions Backtracking verbs in assertions
@ -8940,44 +8948,46 @@ BACKTRACKING CONTROL
in a conditional subpattern. in a conditional subpattern.
(*ACCEPT) in a standalone positive assertion causes the assertion to (*ACCEPT) in a standalone positive assertion causes the assertion to
succeed without any further processing; captured strings are retained. succeed without any further processing; captured strings and a (*MARK)
In a standalone negative assertion, (*ACCEPT) causes the assertion to name (if set) are retained. In a standalone negative assertion,
fail without any further processing; captured substrings are discarded. (*ACCEPT) causes the assertion to fail without any further processing;
captured substrings and any (*MARK) name are discarded.
If the assertion is a condition, (*ACCEPT) causes the condition to be If the assertion is a condition, (*ACCEPT) causes the condition to be
true for a positive assertion and false for a negative one; captured true for a positive assertion and false for a negative one; captured
substrings are retained in both cases. substrings are retained in both cases.
The remaining verbs act only when a later failure causes a backtrack to The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion, reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after because lookaround assertions are atomic. A backtrack that occurs after
an assertion is complete does not jump back into the assertion. Note in an assertion is complete does not jump back into the assertion. Note in
particular that a (*MARK) name that is set in an assertion is not particular that a (*MARK) name that is set in an assertion is not
"seen" by an instance of (*SKIP:NAME) later in the pattern. "seen" by an instance of (*SKIP:NAME) later in the pattern.
The effect of (*THEN) is not allowed to escape beyond an assertion. If The effect of (*THEN) is not allowed to escape beyond an assertion. If
there are no more branches to try, (*THEN) causes a positive assertion there are no more branches to try, (*THEN) causes a positive assertion
to be false, and a negative assertion to be true. to be false, and a negative assertion to be true.
The other backtracking verbs are not treated specially if they appear The other backtracking verbs are not treated specially if they appear
in a standalone positive assertion. In a conditional positive asser- in a standalone positive assertion. In a conditional positive asser-
tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP), tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
or (*PRUNE) causes the condition to be false. However, for both stand- or (*PRUNE) causes the condition to be false. However, for both stand-
alone and conditional negative assertions, backtracking into (*COMMIT), alone and conditional negative assertions, backtracking into (*COMMIT),
(*SKIP), or (*PRUNE) causes the assertion to be true, without consider- (*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
ing any further alternative branches. ing any further alternative branches.
Backtracking verbs in subroutines Backtracking verbs in subroutines
These behaviours occur whether or not the subpattern is called recur- These behaviours occur whether or not the subpattern is called recur-
sively. Perl's treatment of subroutines is different in some cases. sively.
(*FAIL) in a subpattern called as a subroutine has its normal effect:
it forces an immediate backtrack.
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine (*ACCEPT) in a subpattern called as a subroutine causes the subroutine
match to succeed without any further processing. Matching then contin- match to succeed without any further processing. Matching then contin-
ues after the subroutine call. ues after the subroutine call. Perl documents this behaviour. Perl's
treatment of the other verbs in subroutines is different in some cases.
(*FAIL) in a subpattern called as a subroutine has its normal effect:
it forces an immediate backtrack.
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
cause the subroutine match to fail. cause the subroutine match to fail.
@ -9002,7 +9012,7 @@ AUTHOR
REVISION REVISION
Last updated: 16 July 2018 Last updated: 20 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -10226,7 +10236,11 @@ CONDITIONAL PATTERNS
BACKTRACKING CONTROL BACKTRACKING CONTROL
The following act immediately they are reached: All backtracking control verbs may be in the form (*VERB:NAME). For
(*MARK) the name is mandatory, for the others it is optional. (*SKIP)
changes its behaviour if :NAME is present. The others just set a name
for passing back to the caller, but this is not a name that (*SKIP) can
see. The following act immediately they are reached:
(*ACCEPT) force successful match (*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F) (*FAIL) force backtrack; synonym (*F)
@ -10239,12 +10253,13 @@ BACKTRACKING CONTROL
(*COMMIT) overall failure, no advance of starting point (*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character (*PRUNE) advance to next starting character
(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
(*SKIP) advance to current matching position (*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier (*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored (*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation (*THEN) local failure, backtrack to next alternation
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
The effect of one of these verbs in a group called as a subroutine is
confined to the subroutine call.
CALLOUTS CALLOUTS
@ -10254,14 +10269,14 @@ CALLOUTS
(?C"text") callout with string data (?C"text") callout with string data
The allowed string delimiters are ` ' " ^ % # $ (which are the same for The allowed string delimiters are ` ' " ^ % # $ (which are the same for
the start and the end), and the starting delimiter { matched with the the start and the end), and the starting delimiter { matched with the
ending delimiter }. To encode the ending delimiter within the string, ending delimiter }. To encode the ending delimiter within the string,
double it. double it.
SEE ALSO SEE ALSO
pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
pcre2(3). pcre2(3).
@ -10274,7 +10289,7 @@ AUTHOR
REVISION REVISION
Last updated: 07 July 2018 Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "16 July 2018" "PCRE2 10.32" .TH PCRE2PATTERN 3 "20 July 2018" "PCRE2 10.32"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS" .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -3154,17 +3154,16 @@ in the
.\" .\"
documentation. documentation.
.P .P
Experiments with Perl suggest that it too has similar optimizations, sometimes Experiments with Perl suggest that it too has similar optimizations, and like
leading to anomalous results. PCRE2, turning them off can change the result of a match.
. .
. .
.SS "Verbs that act immediately" .SS "Verbs that act immediately"
.rs .rs
.sp .sp
The following verbs act as soon as they are encountered. They may not be The following verbs act as soon as they are encountered.
followed by a name.
.sp .sp
(*ACCEPT) (*ACCEPT) or (*ACCEPT:NAME)
.sp .sp
This verb causes the match to end successfully, skipping the remainder of the This verb causes the match to end successfully, skipping the remainder of the
pattern. However, when it is inside a subpattern that is called as a pattern. However, when it is inside a subpattern that is called as a
@ -3180,18 +3179,21 @@ example:
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
the outer parentheses. the outer parentheses.
.sp .sp
(*FAIL) or (*F) (*FAIL) or (*FAIL:NAME)
.sp .sp
This verb causes a matching failure, forcing backtracking to occur. It is This verb causes a matching failure, forcing backtracking to occur. It may be
equivalent to (?!) but easier to read. The Perl documentation notes that it is abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
probably useful only when combined with (?{}) or (??{}). Those are, of course, documentation notes that it is probably useful only when combined with (?{}) or
Perl features that are not present in PCRE2. The nearest equivalent is the (??{}). Those are, of course, Perl features that are not present in PCRE2. The
callout feature, as for example in this pattern: nearest equivalent is the callout feature, as for example in this pattern:
.sp .sp
a+(?C)(*FAIL) a+(?C)(*FAIL)
.sp .sp
A match with the string "aaaa" always fails, but the callout is taken before A match with the string "aaaa" always fails, but the callout is taken before
each backtrack happens (in this example, 10 times). each backtrack happens (in this example, 10 times).
.P
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
. .
. .
.SS "Recording which path was taken" .SS "Recording which path was taken"
@ -3220,9 +3222,9 @@ documentation. This applies to all instances of (*MARK), including those inside
assertions and atomic groups. (There are differences in those cases when assertions and atomic groups. (There are differences in those cases when
(*MARK) is used in conjunction with (*SKIP) as described below.) (*MARK) is used in conjunction with (*SKIP) as described below.)
.P .P
As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
arguments. Whichever is last on the matching path is passed back. See below for associated NAME arguments. Whichever is last on the matching path is passed
more details of these other verbs. back. See below for more details of these other verbs.
.P .P
Here is an example of \fBpcre2test\fP output, where the "mark" modifier Here is an example of \fBpcre2test\fP output, where the "mark" modifier
requests the retrieval and outputting of (*MARK) data: requests the retrieval and outputting of (*MARK) data:
@ -3282,22 +3284,24 @@ reaches them. The behaviour described below is what happens when the verb is
not in a subroutine or an assertion. Subsequent sections cover these special not in a subroutine or an assertion. Subsequent sections cover these special
cases. cases.
.sp .sp
(*COMMIT) (*COMMIT) or (*COMMIT:NAME)
.sp .sp
This verb, which may not be followed by a name, causes the whole match to fail This verb causes the whole match to fail outright if there is a later matching
outright if there is a later matching failure that causes backtracking to reach failure that causes backtracking to reach it. Even if the pattern is
it. Even if the pattern is unanchored, no further attempts to find a match by unanchored, no further attempts to find a match by advancing the starting point
advancing the starting point take place. If (*COMMIT) is the only backtracking take place. If (*COMMIT) is the only backtracking verb that is encountered,
verb that is encountered, once it has been passed \fBpcre2_match()\fP is once it has been passed \fBpcre2_match()\fP is committed to finding a match at
committed to finding a match at the current starting point, or not at all. For the current starting point, or not at all. For example:
example:
.sp .sp
a+(*COMMIT)b a+(*COMMIT)b
.sp .sp
This matches "xxaab" but not "aacaab". It can be thought of as a kind of This matches "xxaab" but not "aacaab". It can be thought of as a kind of
dynamic anchor, or "I've started, so I must finish." The name of the most dynamic anchor, or "I've started, so I must finish."
recently passed (*MARK) in the path is passed back when (*COMMIT) forces a .P
match failure. The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
.P .P
If there is more than one backtracking verb in a pattern, a different one that If there is more than one backtracking verb in a pattern, a different one that
follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
@ -3338,7 +3342,7 @@ as (*COMMIT).
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
like (*MARK:NAME) in that the name is remembered for passing back to the like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK), caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*PRUNE) or (*THEN). ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
.sp .sp
(*SKIP) (*SKIP)
.sp .sp
@ -3346,7 +3350,7 @@ This verb, when given without a name, is like (*PRUNE), except that if the
pattern is unanchored, the "bumpalong" advance is not to the next character, pattern is unanchored, the "bumpalong" advance is not to the next character,
but to the position in the subject where (*SKIP) was encountered. (*SKIP) but to the position in the subject where (*SKIP) was encountered. (*SKIP)
signifies that whatever text was matched leading up to it cannot be part of a signifies that whatever text was matched leading up to it cannot be part of a
successful match. Consider: successful match if there is a later mismatch. Consider:
.sp .sp
a+(*SKIP)b a+(*SKIP)b
.sp .sp
@ -3391,7 +3395,7 @@ never seen because "a" does not match "b", so the matcher immediately jumps to
the second branch of the pattern. the second branch of the pattern.
.P .P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME). names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
.sp .sp
(*THEN) or (*THEN:NAME) (*THEN) or (*THEN:NAME)
.sp .sp
@ -3409,10 +3413,10 @@ succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no
more alternatives, so there is a backtrack to whatever came before the entire more alternatives, so there is a backtrack to whatever came before the entire
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE). group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
.P .P
The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN). The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
It is like (*MARK:NAME) in that the name is remembered for passing back to the like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK), caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*PRUNE) and (*THEN). ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
.P .P
A subpattern that does not contain a | character is just a part of the A subpattern that does not contain a | character is just a part of the
enclosing alternative; it is not a nested alternation with only one enclosing alternative; it is not a nested alternation with only one
@ -3485,13 +3489,14 @@ onto (*COMMIT).
.SS "Backtracking verbs in repeated groups" .SS "Backtracking verbs in repeated groups"
.rs .rs
.sp .sp
PCRE2 differs from Perl in its handling of backtracking verbs in repeated PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
groups. For example, consider: repeated groups. For example, consider:
.sp .sp
/(a(*COMMIT)b)+ac/ /(a(*COMMIT)b)+ac/
.sp .sp
If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT) If the subject is "abac", Perl matches unless its optimizations are disabled,
in the second repeat of the group acts. but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
acts.
. .
. .
.\" HTML <a name="btassert"></a> .\" HTML <a name="btassert"></a>
@ -3504,9 +3509,10 @@ not the assertion is standalone or acting as the condition in a conditional
subpattern. subpattern.
.P .P
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
without any further processing; captured strings are retained. In a standalone without any further processing; captured strings and a (*MARK) name (if set)
negative assertion, (*ACCEPT) causes the assertion to fail without any further are retained. In a standalone negative assertion, (*ACCEPT) causes the
processing; captured substrings are discarded. assertion to fail without any further processing; captured substrings and any
(*MARK) name are discarded.
.P .P
If the assertion is a condition, (*ACCEPT) causes the condition to be true for If the assertion is a condition, (*ACCEPT) causes the condition to be true for
a positive assertion and false for a negative one; captured substrings are a positive assertion and false for a negative one; captured substrings are
@ -3536,14 +3542,14 @@ the assertion to be true, without considering any further alternative branches.
.rs .rs
.sp .sp
These behaviours occur whether or not the subpattern is called recursively. These behaviours occur whether or not the subpattern is called recursively.
Perl's treatment of subroutines is different in some cases.
.P
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
an immediate backtrack.
.P .P
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to (*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
succeed without any further processing. Matching then continues after the succeed without any further processing. Matching then continues after the
subroutine call. subroutine call. Perl documents this behaviour. Perl's treatment of the other
verbs in subroutines is different in some cases.
.P
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
an immediate backtrack.
.P .P
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
the subroutine match to fail. the subroutine match to fail.
@ -3574,6 +3580,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 16 July 2018 Last updated: 20 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "07 July 2018" "PCRE2 10.32" .TH PCRE2SYNTAX 3 "21 July 2018" "PCRE2 10.32"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -410,8 +410,6 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?>...) atomic, non-capturing group (?>...) atomic, non-capturing group
. .
. .
.
.
.SH "COMMENT" .SH "COMMENT"
.rs .rs
.sp .sp
@ -552,7 +550,11 @@ condition if the relevant named group exists.
.SH "BACKTRACKING CONTROL" .SH "BACKTRACKING CONTROL"
.rs .rs
.sp .sp
The following act immediately they are reached: All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
if :NAME is present. The others just set a name for passing back to the caller,
but this is not a name that (*SKIP) can see. The following act immediately they
are reached:
.sp .sp
(*ACCEPT) force successful match (*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F) (*FAIL) force backtrack; synonym (*F)
@ -565,12 +567,13 @@ pattern is not anchored.
.sp .sp
(*COMMIT) overall failure, no advance of starting point (*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character (*PRUNE) advance to next starting character
(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
(*SKIP) advance to current matching position (*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier (*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored (*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation (*THEN) local failure, backtrack to next alternation
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) .sp
The effect of one of these verbs in a group called as a subroutine is confined
to the subroutine call.
. .
. .
.SH "CALLOUTS" .SH "CALLOUTS"
@ -606,6 +609,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 07 July 2018 Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "16 July 2018" "PCRE 10.32" .TH PCRE2TEST 1 "21 July 2018" "PCRE 10.32"
.SH NAME .SH NAME
pcre2test - a program for testing Perl-compatible regular expressions. pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
@ -360,10 +360,11 @@ patterns. Modifiers on a pattern can change these settings.
The appearance of this line causes all subsequent modifier settings to be The appearance of this line causes all subsequent modifier settings to be
checked for compatibility with the \fBperltest.sh\fP script, which is used to checked for compatibility with the \fBperltest.sh\fP script, which is used to
confirm that Perl gives the same results as PCRE2. Also, apart from comment confirm that Perl gives the same results as PCRE2. Also, apart from comment
lines, none of the other command lines are permitted, because they and many lines, #pattern commands, and #subject commands that set or unset "mark", no
of the modifiers are specific to \fBpcre2test\fP, and should not be used in command lines are permitted, because they and many of the modifiers are
test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP specific to \fBpcre2test\fP, and should not be used in test files that are also
command helps detect tests that are accidentally put in the wrong file. processed by \fBperltest.sh\fP. The \fB#perltest\fP command helps detect tests
that are accidentally put in the wrong file.
.sp .sp
#pop [<modifiers>] #pop [<modifiers>]
#popcopy [<modifiers>] #popcopy [<modifiers>]
@ -1981,6 +1982,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 16 July 2018 Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
.fi .fi


@ -344,11 +344,11 @@ COMMAND LINES
The appearance of this line causes all subsequent modifier settings to The appearance of this line causes all subsequent modifier settings to
be checked for compatibility with the perltest.sh script, which is used be checked for compatibility with the perltest.sh script, which is used
to confirm that Perl gives the same results as PCRE2. Also, apart from to confirm that Perl gives the same results as PCRE2. Also, apart from
comment lines, none of the other command lines are permitted, because comment lines, #pattern commands, and #subject commands that set or
they and many of the modifiers are specific to pcre2test, and should unset "mark", no command lines are permitted, because they and many of
not be used in test files that are also processed by perltest.sh. The the modifiers are specific to pcre2test, and should not be used in test
#perltest command helps detect tests that are accidentally put in the files that are also processed by perltest.sh. The #perltest command
wrong file. helps detect tests that are accidentally put in the wrong file.
#pop [<modifiers>] #pop [<modifiers>]
#popcopy [<modifiers>] #popcopy [<modifiers>]
@ -1818,5 +1818,5 @@ AUTHOR
REVISION REVISION
Last updated: 16 July 2018 Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.


@ -45,17 +45,19 @@ fi
# jitstack ignored # jitstack ignored
# mark show mark information # mark show mark information
# no_auto_possess ignored # no_auto_possess ignored
# no_start_optimize insert ({""}) at pattern start (disable Perl optimizing) # no_start_optimize insert (??{""}) at pattern start (disables optimizing)
# subject_literal does not process subjects for escapes # subject_literal does not process subjects for escapes
# ucp sets Perl's /u modifier # ucp sets Perl's /u modifier
# utf invoke UTF-8 functionality # utf invoke UTF-8 functionality
# #
# Comment lines are ignored. The #pattern command can be used to set modifiers # Comment lines are ignored. The #pattern command can be used to set modifiers
# that will be added to each subsequent pattern. NOTE: this is different to # that will be added to each subsequent pattern, after any modifiers it may
# pcre2test where #pattern sets defaults, some of which can be overridden on # already have. NOTE: this is different to pcre2test where #pattern sets
# individual patterns. The #perltest, #forbid_utf, and #newline_default # defaults which can be overridden on individual patterns. The #subject command
# commands, which are needed in the relevant pcre2test files, are ignored. Any # may be used to set or unset a default "mark" modifier for data lines. This is
# other #-command is ignored, with a warning message. # the only use of #subject that is supported. The #perltest, #forbid_utf, and
# #newline_default commands, which are needed in the relevant pcre2test files,
# are ignored. Any other #-command is ignored, with a warning message.
# #
# The data lines must not have any pcre2test modifiers. Unless # The data lines must not have any pcre2test modifiers. Unless
# "subject_literal" is on the pattern, data lines are processed as # "subject_literal" is on the pattern, data lines are processed as
@ -146,6 +148,22 @@ for (;;)
$extra_modifiers =~ s/\s+$//; $extra_modifiers =~ s/\s+$//;
next; next;
} }
elsif ($_ =~ /^#subject(.*)/)
{
$mod = $1;
chomp($mod);
$mod =~ s/\s+$//;
if ($mod =~ s/(-?)mark,?//)
{
$minus = $1;
$default_show_mark = ($minus =~ /^$/);
}
if ($mod !~ /^\s*$/)
{
printf $outfile "** Warning: \"$mod\" in #subject ignored\n";
}
next;
}
elsif ($_ =~ /^#/) elsif ($_ =~ /^#/)
{ {
if ($_ !~ /^#newline_default|^#perltest|^#forbid_utf/) if ($_ !~ /^#newline_default|^#perltest|^#forbid_utf/)
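The #subject handling added to perltest.sh above can be restated in C for readers who do not read Perl. This is only a sketch under stated assumptions: parse_subject_arg is a hypothetical helper that exists in neither perltest.sh nor pcre2test. It mirrors the Perl logic of removing the first "mark" or "-mark" (plus one following comma) and treating any other non-blank text as something to warn about.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of perltest.sh or pcre2test. "arg" is the
text that follows "#subject". The first "mark" or "-mark" sets or unsets
*default_mark. The return value is true when anything other than that item,
one following comma, and white space remains, which is the case in which the
script prints its "ignored" warning. */

bool parse_subject_arg(const char *arg, bool *default_mark)
{
const char *hit;
const char *from;
const char *to;
const char *p;

assert(arg != NULL && default_mark != NULL);

hit = strstr(arg, "mark");
from = to = arg;                 /* Empty span when "mark" is absent */

if (hit != NULL)
  {
  bool minus = (hit > arg && hit[-1] == '-');
  *default_mark = !minus;        /* "mark" sets, "-mark" unsets */
  from = minus? hit - 1 : hit;
  to = hit + 4;
  if (*to == ',') to++;          /* Mirrors the optional comma in the regex */
  }

for (p = arg; *p != '\0'; p++)
  {
  if (p >= from && p < to) continue;           /* Inside the removed item */
  if (!isspace((unsigned char)*p)) return true;
  }

return false;
}
```

With this sketch, " mark" and " -mark" change the default silently, while " mark,aftertext" changes the default and also reports leftover text.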
@ -172,9 +190,9 @@ for (;;)
$pattern =~ /^\s*((.).*\2)(.*)$/s; $pattern =~ /^\s*((.).*\2)(.*)$/s;
$pat = $1; $pat = $1;
$del = $2;
$mod = "$3,$extra_modifiers"; $mod = "$3,$extra_modifiers";
$mod =~ s/^,\s*//; $mod =~ s/^,\s*//;
$del = $2;
# The private "aftertext" modifier means "print $' afterwards". # The private "aftertext" modifier means "print $' afterwards".
@ -202,7 +220,7 @@ for (;;)
# The "mark" modifier requests checking of MARK data */ # The "mark" modifier requests checking of MARK data */
$show_mark = ($mod =~ s/mark,?//); $show_mark = $default_show_mark | ($mod =~ s/mark,?//);
# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl # "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl


@ -281,6 +281,7 @@ pcre2_pattern_convert(). */
#define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE 156 #define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE 156
#define PCRE2_ERROR_BACKSLASH_G_SYNTAX 157 #define PCRE2_ERROR_BACKSLASH_G_SYNTAX 157
#define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158 #define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158
/* Error 159 is obsolete and should now never occur */
#define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED 159 #define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED 159
#define PCRE2_ERROR_VERB_UNKNOWN 160 #define PCRE2_ERROR_VERB_UNKNOWN 160
#define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG 161 #define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG 161
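Keeping the definition of error 159 even though it is obsolete is consistent with the way the compile error texts appear to be stored: one long string of zero-terminated messages, in which a message is found by counting terminators. Under that assumption, deleting a message would renumber every later error, so a placeholder is needed. The sketch below uses a hypothetical three-entry table (toy_texts) and a hypothetical lookup function (nth_text); it is not the library's own lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a message table: zero-terminated messages stored
back to back. The middle entry is a placeholder that keeps the third message
at index 2. */

static const char toy_texts[] =
  "first message\0"
  "obsolete error (should not occur)\0"
  "third message\0";

/* Return a pointer to message number n (counting from zero), or NULL if the
table holds fewer than n+1 messages. "size" is sizeof the table. */

const char *nth_text(const char *table, size_t size, unsigned int n)
{
size_t i = 0;

assert(table != NULL);

while (n > 0)
  {
  while (i < size && table[i] != '\0') i++;    /* Skip one message */
  if (i >= size) return NULL;
  i++;                                         /* Skip its terminator */
  n--;
  }

return (i < size && table[i] != '\0')? table + i : NULL;
}
```

Removing the placeholder line from toy_texts would make nth_text(..., 2) return NULL instead of "third message", which is the renumbering that the obsolete entry avoids.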


@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2017 University of Cambridge New API code Copyright (c) 2016-2018 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -250,34 +250,35 @@ is present where expected in a conditional group. */
#define META_LOOKBEHINDNOT 0x80250000u /* (?<! */ #define META_LOOKBEHINDNOT 0x80250000u /* (?<! */
/* These must be kept in this order, with consecutive values, and the _ARG /* These must be kept in this order, with consecutive values, and the _ARG
versions of PRUNE, SKIP, and THEN immediately after their non-argument versions of COMMIT, PRUNE, SKIP, and THEN immediately after their non-argument
versions. */ versions. */
#define META_MARK 0x80260000u /* (*MARK) */ #define META_MARK 0x80260000u /* (*MARK) */
#define META_ACCEPT 0x80270000u /* (*ACCEPT) */ #define META_ACCEPT 0x80270000u /* (*ACCEPT) */
#define META_COMMIT 0x80280000u /* (*COMMIT) */ #define META_FAIL 0x80280000u /* (*FAIL) */
#define META_FAIL 0x80290000u /* (*FAIL) */ #define META_COMMIT 0x80290000u /* These */
#define META_PRUNE 0x802a0000u /* These pairs must */ #define META_COMMIT_ARG 0x802a0000u /* pairs */
#define META_PRUNE_ARG 0x802b0000u /* be */ #define META_PRUNE 0x802b0000u /* must */
#define META_SKIP 0x802c0000u /* kept */ #define META_PRUNE_ARG 0x802c0000u /* be */
#define META_SKIP_ARG 0x802d0000u /* in */ #define META_SKIP 0x802d0000u /* kept */
#define META_THEN 0x802e0000u /* this */ #define META_SKIP_ARG 0x802e0000u /* in */
#define META_THEN_ARG 0x802f0000u /* order */ #define META_THEN 0x802f0000u /* this */
#define META_THEN_ARG 0x80300000u /* order */
/* These must be kept in groups of adjacent 3 values, and all together. */ /* These must be kept in groups of adjacent 3 values, and all together. */
#define META_ASTERISK 0x80300000u /* * */ #define META_ASTERISK 0x80310000u /* * */
#define META_ASTERISK_PLUS 0x80310000u /* *+ */ #define META_ASTERISK_PLUS 0x80320000u /* *+ */
#define META_ASTERISK_QUERY 0x80320000u /* *? */ #define META_ASTERISK_QUERY 0x80330000u /* *? */
#define META_PLUS 0x80330000u /* + */ #define META_PLUS 0x80340000u /* + */
#define META_PLUS_PLUS 0x80340000u /* ++ */ #define META_PLUS_PLUS 0x80350000u /* ++ */
#define META_PLUS_QUERY 0x80350000u /* +? */ #define META_PLUS_QUERY 0x80360000u /* +? */
#define META_QUERY 0x80360000u /* ? */ #define META_QUERY 0x80370000u /* ? */
#define META_QUERY_PLUS 0x80370000u /* ?+ */ #define META_QUERY_PLUS 0x80380000u /* ?+ */
#define META_QUERY_QUERY 0x80380000u /* ?? */ #define META_QUERY_QUERY 0x80390000u /* ?? */
#define META_MINMAX 0x80390000u /* {n,m} repeat */ #define META_MINMAX 0x803a0000u /* {n,m} repeat */
#define META_MINMAX_PLUS 0x803a0000u /* {n,m}+ repeat */ #define META_MINMAX_PLUS 0x803b0000u /* {n,m}+ repeat */
#define META_MINMAX_QUERY 0x803b0000u /* {n,m}? repeat */ #define META_MINMAX_QUERY 0x803c0000u /* {n,m}? repeat */
#define META_FIRST_QUANTIFIER META_ASTERISK #define META_FIRST_QUANTIFIER META_ASTERISK
#define META_LAST_QUANTIFIER META_MINMAX_QUERY #define META_LAST_QUANTIFIER META_MINMAX_QUERY
@ -327,8 +328,9 @@ static unsigned char meta_extra_lengths[] = {
SIZEOFFSET, /* META_LOOKBEHINDNOT */ SIZEOFFSET, /* META_LOOKBEHINDNOT */
1, /* META_MARK - plus the string length */ 1, /* META_MARK - plus the string length */
0, /* META_ACCEPT */ 0, /* META_ACCEPT */
0, /* META_COMMIT */
0, /* META_FAIL */ 0, /* META_FAIL */
0, /* META_COMMIT */
1, /* META_COMMIT_ARG - plus the string length */
0, /* META_PRUNE */ 0, /* META_PRUNE */
1, /* META_PRUNE_ARG - plus the string length */ 1, /* META_PRUNE_ARG - plus the string length */
0, /* META_SKIP */ 0, /* META_SKIP */
@ -586,9 +588,9 @@ static const char verbnames[] =
"\0" /* Empty name is a shorthand for MARK */ "\0" /* Empty name is a shorthand for MARK */
STRING_MARK0 STRING_MARK0
STRING_ACCEPT0 STRING_ACCEPT0
STRING_COMMIT0
STRING_F0 STRING_F0
STRING_FAIL0 STRING_FAIL0
STRING_COMMIT0
STRING_PRUNE0 STRING_PRUNE0
STRING_SKIP0 STRING_SKIP0
STRING_THEN; STRING_THEN;
@ -596,11 +598,11 @@ static const char verbnames[] =
static const verbitem verbs[] = { static const verbitem verbs[] = {
{ 0, META_MARK, +1 }, /* > 0 => must have an argument */ { 0, META_MARK, +1 }, /* > 0 => must have an argument */
{ 4, META_MARK, +1 }, { 4, META_MARK, +1 },
{ 6, META_ACCEPT, -1 }, /* < 0 => must not have an argument */ { 6, META_ACCEPT, -1 }, /* < 0 => Optional argument, convert to pre-MARK */
{ 6, META_COMMIT, -1 },
{ 1, META_FAIL, -1 }, { 1, META_FAIL, -1 },
{ 4, META_FAIL, -1 }, { 4, META_FAIL, -1 },
{ 5, META_PRUNE, 0 }, /* Argument is optional; bump META code if found */ { 6, META_COMMIT, 0 },
{ 5, META_PRUNE, 0 }, /* Optional argument; bump META code if found */
{ 4, META_SKIP, 0 }, { 4, META_SKIP, 0 },
{ 4, META_THEN, 0 } { 4, META_THEN, 0 }
}; };
@ -610,8 +612,8 @@ static const int verbcount = sizeof(verbs)/sizeof(verbitem);
/* Verb opcodes, indexed by their META code offset from META_MARK. */ /* Verb opcodes, indexed by their META code offset from META_MARK. */
static const uint32_t verbops[] = { static const uint32_t verbops[] = {
OP_MARK, OP_ACCEPT, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_PRUNE_ARG, OP_SKIP, OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
OP_SKIP_ARG, OP_THEN, OP_THEN_ARG }; OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */ /* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
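The ordering constraints on the META codes and on verbops[] can be checked with a little arithmetic. The sketch below copies the new META values from this diff; the helper functions and the table of opcode names are hypothetical and exist only to show the two relationships the comments rely on: an _ARG code is its plain code plus 0x00010000, and the verbops[] index is the distance from META_MARK shifted right by 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the new side of the diff. */

#define META_MARK        0x80260000u
#define META_ACCEPT      0x80270000u
#define META_FAIL        0x80280000u
#define META_COMMIT      0x80290000u
#define META_COMMIT_ARG  0x802a0000u
#define META_PRUNE       0x802b0000u
#define META_PRUNE_ARG   0x802c0000u
#define META_SKIP        0x802d0000u
#define META_SKIP_ARG    0x802e0000u
#define META_THEN        0x802f0000u
#define META_THEN_ARG    0x80300000u

/* Hypothetical stand-in for verbops[]: opcode names in the same order. */

static const char *const verbop_names[] = {
  "OP_MARK", "OP_ACCEPT", "OP_FAIL", "OP_COMMIT", "OP_COMMIT_ARG", "OP_PRUNE",
  "OP_PRUNE_ARG", "OP_SKIP", "OP_SKIP_ARG", "OP_THEN", "OP_THEN_ARG" };

/* "Bump" a plain verb code to its _ARG partner. */

uint32_t meta_with_arg(uint32_t meta)
{
return meta + 0x00010000u;
}

/* Index into the verb opcode table for a given META code. */

size_t verbops_index(uint32_t meta)
{
assert(meta >= META_MARK && meta <= META_THEN_ARG);
return (size_t)((meta - META_MARK) >> 16);
}

const char *verbop_name(uint32_t meta)
{
return verbop_names[verbops_index(meta)];
}
```

This is why META_COMMIT had to move below META_FAIL in this commit: inserting META_COMMIT_ARG directly after it keeps every pair adjacent, and the table stays in step as long as it is reordered in the same way.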
@ -976,8 +978,8 @@ for (;;)
case META_POSIX_NEG: fprintf(stderr, "META_POSIX_NEG %d", *pptr++); break; case META_POSIX_NEG: fprintf(stderr, "META_POSIX_NEG %d", *pptr++); break;
case META_ACCEPT: fprintf(stderr, "META (*ACCEPT)"); break; case META_ACCEPT: fprintf(stderr, "META (*ACCEPT)"); break;
case META_COMMIT: fprintf(stderr, "META (*COMMIT)"); break;
case META_FAIL: fprintf(stderr, "META (*FAIL)"); break; case META_FAIL: fprintf(stderr, "META (*FAIL)"); break;
case META_COMMIT: fprintf(stderr, "META (*COMMIT)"); break;
case META_PRUNE: fprintf(stderr, "META (*PRUNE)"); break; case META_PRUNE: fprintf(stderr, "META (*PRUNE)"); break;
case META_SKIP: fprintf(stderr, "META (*SKIP)"); break; case META_SKIP: fprintf(stderr, "META (*SKIP)"); break;
case META_THEN: fprintf(stderr, "META (*THEN)"); break; case META_THEN: fprintf(stderr, "META (*THEN)"); break;
@ -1067,6 +1069,10 @@ for (;;)
fprintf(stderr, "META (*MARK:"); fprintf(stderr, "META (*MARK:");
goto SHOWARG; goto SHOWARG;
case META_COMMIT_ARG:
fprintf(stderr, "META (*COMMIT:");
goto SHOWARG;
case META_PRUNE_ARG: case META_PRUNE_ARG:
fprintf(stderr, "META (*PRUNE:"); fprintf(stderr, "META (*PRUNE:");
goto SHOWARG; goto SHOWARG;
@ -2290,6 +2296,7 @@ uint32_t *previous_callout = NULL;
uint32_t *parsed_pattern = cb->parsed_pattern; uint32_t *parsed_pattern = cb->parsed_pattern;
uint32_t *parsed_pattern_end = cb->parsed_pattern_end; uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
uint32_t meta_quantifier = 0; uint32_t meta_quantifier = 0;
uint32_t add_after_mark = 0;
uint16_t nest_depth = 0; uint16_t nest_depth = 0;
int after_manual_callout = 0; int after_manual_callout = 0;
int expect_cond_assert = 0; int expect_cond_assert = 0;
@ -2461,6 +2468,16 @@ while (ptr < ptrend)
goto FAILED; goto FAILED;
} }
*verblengthptr = (uint32_t)verbnamelength; *verblengthptr = (uint32_t)verbnamelength;
/* If this name was on a verb such as (*ACCEPT) which does not continue,
a (*MARK) was generated for the name. We now add the original verb as the
next item. */
if (add_after_mark != 0)
{
*parsed_pattern++ = add_after_mark;
add_after_mark = 0;
}
break; break;
case CHAR_BACKSLASH: case CHAR_BACKSLASH:
@ -3454,13 +3471,25 @@ while (ptr < ptrend)
if (*ptr++ == CHAR_COLON) /* Skip past : or ) */ if (*ptr++ == CHAR_COLON) /* Skip past : or ) */
{ {
if (verbs[i].has_arg < 0) /* Argument is forbidden */ /* Some optional arguments can be treated as a preceding (*MARK) */
if (verbs[i].has_arg < 0)
{ {
errorcode = ERR59; add_after_mark = verbs[i].meta;
goto FAILED; *parsed_pattern++ = META_MARK;
} }
*parsed_pattern++ = verbs[i].meta +
((verbs[i].meta != META_MARK)? 0x00010000u:0); /* The remaining verbs with arguments (except *MARK) need a different
opcode. */
else
{
*parsed_pattern++ = verbs[i].meta +
((verbs[i].meta != META_MARK)? 0x00010000u:0);
}
/* Set up for reading the name in the main loop. */
verblengthptr = parsed_pattern++; verblengthptr = parsed_pattern++;
verbnamestart = ptr; verbnamestart = ptr;
inverbname = TRUE; inverbname = TRUE;
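Taken together with the add_after_mark handling earlier in this file, the effect of the change above is that (*ACCEPT:X) and (*FAIL:X) are parsed as if (*MARK:X) had been written immediately before the plain verb, whereas (*COMMIT:X) gets its own _ARG code. The following is a much simplified sketch of that token stream. emit_verb_with_name is a hypothetical function; the real parser reads the name in its main loop and handles escapes, length limits, and errors that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define META_MARK    0x80260000u   /* Values from the new side of the diff */
#define META_ACCEPT  0x80270000u
#define META_COMMIT  0x80290000u

/* Hypothetical, simplified emitter. "has_arg" has the same meaning as in the
verbs[] table: < 0 means convert to a preceding MARK, 0 means bump the META
code, > 0 is MARK itself. Returns the number of items written to "out". */

size_t emit_verb_with_name(uint32_t meta, int has_arg, const char *name,
  uint32_t *out)
{
size_t n = 0;
uint32_t add_after_mark = 0;
uint32_t *lengthptr;
uint32_t length = 0;

assert(name != NULL && out != NULL);

if (has_arg < 0)
  {
  add_after_mark = meta;           /* Remember the verb for later */
  out[n++] = META_MARK;            /* The name goes on a MARK instead */
  }
else
  {
  out[n++] = meta + ((meta != META_MARK)? 0x00010000u : 0);
  }

lengthptr = out + n++;             /* Slot for the name length */
for (; *name != '\0'; name++)
  {
  out[n++] = (unsigned char)*name;
  length++;
  }
*lengthptr = length;

if (add_after_mark != 0) out[n++] = add_after_mark;   /* Original verb */
return n;
}
```

For the name "X", META_ACCEPT with has_arg -1 yields MARK, 1, 'X', ACCEPT, while META_COMMIT with has_arg 0 yields COMMIT_ARG, 1, 'X'.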
@ -5654,6 +5683,7 @@ for (;; pptr++)
cb->had_pruneorskip = TRUE; cb->had_pruneorskip = TRUE;
/* Fall through */ /* Fall through */
case META_MARK: case META_MARK:
case META_COMMIT_ARG:
VERB_ARG: VERB_ARG:
*code++ = verbops[(meta - META_MARK) >> 16]; *code++ = verbops[(meta - META_MARK) >> 16];
/* The length is in characters. */ /* The length is in characters. */
@ -8002,6 +8032,7 @@ for (;;)
break; break;
case OP_MARK: case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
case OP_SKIP_ARG: case OP_SKIP_ARG:
case OP_THEN_ARG: case OP_THEN_ARG:
@ -8310,6 +8341,7 @@ for (;; pptr++)
break; break;
case META_MARK: /* Add the length of the name. */ case META_MARK: /* Add the length of the name. */
case META_COMMIT_ARG:
case META_PRUNE_ARG: case META_PRUNE_ARG:
case META_SKIP_ARG: case META_SKIP_ARG:
case META_THEN_ARG: case META_THEN_ARG:
@ -8500,6 +8532,7 @@ for (;; pptr++)
goto EXIT; goto EXIT;
case META_MARK: case META_MARK:
case META_COMMIT_ARG:
case META_PRUNE_ARG: case META_PRUNE_ARG:
case META_SKIP_ARG: case META_SKIP_ARG:
case META_THEN_ARG: case META_THEN_ARG:
@ -8967,6 +9000,7 @@ for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
break; break;
case META_MARK: case META_MARK:
case META_COMMIT_ARG:
case META_PRUNE_ARG: case META_PRUNE_ARG:
case META_SKIP_ARG: case META_SKIP_ARG:
case META_THEN_ARG: case META_THEN_ARG:


@ -181,7 +181,8 @@ static const uint8_t coptable[] = {
0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */ 0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */
0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */ 0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */
0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */ 0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */
0, 0, 0, 0, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */ 0, 0, /* COMMIT, COMMIT_ARG */
0, 0, 0, /* FAIL, ACCEPT, ASSERT_ACCEPT */
0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */ 0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */
}; };
@ -254,7 +255,8 @@ static const uint8_t poptable[] = {
0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */ 0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */
0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */ 0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */
0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */ 0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */
0, 0, 0, 0, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */ 0, 0, /* COMMIT, COMMIT_ARG */
0, 0, 0, /* FAIL, ACCEPT, ASSERT_ACCEPT */
0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */ 0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */
}; };


@ -133,7 +133,8 @@ static const unsigned char compile_error_texts[] =
"internal error: unknown newline setting\0" "internal error: unknown newline setting\0"
"\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0" "\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
"(?R (recursive pattern call) must be followed by a closing parenthesis\0" "(?R (recursive pattern call) must be followed by a closing parenthesis\0"
"an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0" /* "an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0" */
"obsolete error (should not occur)\0" /* Was the above */
/* 60 */ /* 60 */
"(*VERB) not recognized or malformed\0" "(*VERB) not recognized or malformed\0"
"group number is too big\0" "group number is too big\0"


@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2017 University of Cambridge New API code Copyright (c) 2016-2018 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -1583,23 +1583,26 @@ enum {
OP_THEN, /* 155 */ OP_THEN, /* 155 */
OP_THEN_ARG, /* 156 same, but with argument */ OP_THEN_ARG, /* 156 same, but with argument */
OP_COMMIT, /* 157 */ OP_COMMIT, /* 157 */
OP_COMMIT_ARG, /* 158 same, but with argument */
/* These are forced failure and success verbs */ /* These are forced failure and success verbs. FAIL and ACCEPT do accept an
argument, but these cases can be compiled as, for example, (*MARK:X)(*FAIL)
without the need for a special opcode. */
OP_FAIL, /* 158 */ OP_FAIL, /* 159 */
OP_ACCEPT, /* 159 */ OP_ACCEPT, /* 160 */
OP_ASSERT_ACCEPT, /* 160 Used inside assertions */ OP_ASSERT_ACCEPT, /* 161 Used inside assertions */
OP_CLOSE, /* 161 Used before OP_ACCEPT to close open captures */ OP_CLOSE, /* 162 Used before OP_ACCEPT to close open captures */
/* This is used to skip a subpattern with a {0} quantifier */ /* This is used to skip a subpattern with a {0} quantifier */
OP_SKIPZERO, /* 162 */ OP_SKIPZERO, /* 163 */
/* This is used to identify a DEFINE group during compilation so that it can /* This is used to identify a DEFINE group during compilation so that it can
be checked for having only one branch. It is changed to OP_FALSE before be checked for having only one branch. It is changed to OP_FALSE before
compilation finishes. */ compilation finishes. */
OP_DEFINE, /* 163 */ OP_DEFINE, /* 164 */
/* This is not an opcode, but is used to check that tables indexed by opcode /* This is not an opcode, but is used to check that tables indexed by opcode
are the correct length, in order to catch updating errors - there have been are the correct length, in order to catch updating errors - there have been
@ -1655,7 +1658,7 @@ some cases doesn't actually use these names at all). */
"Cond false", "Cond true", \ "Cond false", "Cond true", \
"Brazero", "Braminzero", "Braposzero", \ "Brazero", "Braminzero", "Braposzero", \
"*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP", \ "*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP", \
"*THEN", "*THEN", "*COMMIT", "*FAIL", \ "*THEN", "*THEN", "*COMMIT", "*COMMIT", "*FAIL", \
"*ACCEPT", "*ASSERT_ACCEPT", \ "*ACCEPT", "*ASSERT_ACCEPT", \
"Close", "Skip zero", "Define" "Close", "Skip zero", "Define"
@ -1747,7 +1750,8 @@ in UTF-8 mode. The code that uses this table must know about such things. */
3, 1, 3, /* MARK, PRUNE, PRUNE_ARG */ \ 3, 1, 3, /* MARK, PRUNE, PRUNE_ARG */ \
1, 3, /* SKIP, SKIP_ARG */ \ 1, 3, /* SKIP, SKIP_ARG */ \
1, 3, /* THEN, THEN_ARG */ \ 1, 3, /* THEN, THEN_ARG */ \
1, 1, 1, 1, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */ \ 1, 3, /* COMMIT, COMMIT_ARG */ \
1, 1, 1, /* FAIL, ACCEPT, ASSERT_ACCEPT */ \
1+IMM2_SIZE, 1, /* CLOSE, SKIPZERO */ \ 1+IMM2_SIZE, 1, /* CLOSE, SKIPZERO */ \
1 /* DEFINE */ 1 /* DEFINE */


@ -839,6 +839,7 @@ switch(*cc)
#endif #endif
case OP_MARK: case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
case OP_SKIP_ARG: case OP_SKIP_ARG:
case OP_THEN_ARG: case OP_THEN_ARG:
@ -939,6 +940,7 @@ while (cc < ccend)
common->control_head_ptr = 1; common->control_head_ptr = 1;
/* Fall through. */ /* Fall through. */
case OP_COMMIT_ARG:
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
case OP_MARK: case OP_MARK:
if (common->mark_ptr == 0) if (common->mark_ptr == 0)
@ -1553,6 +1555,7 @@ while (cc < ccend)
break; break;
case OP_MARK: case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
case OP_THEN_ARG: case OP_THEN_ARG:
SLJIT_ASSERT(common->mark_ptr != 0); SLJIT_ASSERT(common->mark_ptr != 0);
@ -1733,6 +1736,7 @@ while (cc < ccend)
break; break;
case OP_MARK: case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
case OP_THEN_ARG: case OP_THEN_ARG:
SLJIT_ASSERT(common->mark_ptr != 0); SLJIT_ASSERT(common->mark_ptr != 0);
@ -2041,6 +2045,7 @@ while (cc < ccend)
break; break;
case OP_MARK: case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
case OP_THEN_ARG: case OP_THEN_ARG:
SLJIT_ASSERT(common->mark_ptr != 0); SLJIT_ASSERT(common->mark_ptr != 0);
@ -2428,6 +2433,7 @@ while (cc < ccend)
break; break;
case OP_MARK: case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
case OP_THEN_ARG: case OP_THEN_ARG:
SLJIT_ASSERT(common->mark_ptr != 0); SLJIT_ASSERT(common->mark_ptr != 0);
@ -10350,7 +10356,8 @@ backtrack_common *backtrack;
PCRE2_UCHAR opcode = *cc; PCRE2_UCHAR opcode = *cc;
PCRE2_SPTR ccend = cc + 1; PCRE2_SPTR ccend = cc + 1;
if (opcode == OP_PRUNE_ARG || opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG) if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG ||
opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG)
ccend += 2 + cc[1]; ccend += 2 + cc[1];
PUSH_BACKTRACK(sizeof(backtrack_common), cc, NULL); PUSH_BACKTRACK(sizeof(backtrack_common), cc, NULL);
@ -10362,7 +10369,7 @@ if (opcode == OP_SKIP)
return ccend; return ccend;
} }
if (opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG) if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG)
{ {
OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0);
OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)(cc + 2)); OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)(cc + 2));
@ -10681,6 +10688,7 @@ while (cc < ccend)
case OP_THEN: case OP_THEN:
case OP_THEN_ARG: case OP_THEN_ARG:
case OP_COMMIT: case OP_COMMIT:
case OP_COMMIT_ARG:
cc = compile_control_verb_matchingpath(common, cc, parent); cc = compile_control_verb_matchingpath(common, cc, parent);
break; break;
@ -11755,6 +11763,7 @@ while (current)
break; break;
case OP_COMMIT: case OP_COMMIT:
case OP_COMMIT_ARG:
if (!common->local_quit_available) if (!common->local_quit_available)
OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH); OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH);
if (common->quit_label == NULL) if (common->quit_label == NULL)


@ -149,7 +149,7 @@ changed, the code at RETURN_SWITCH below must be updated in sync. */
enum { RM1=1, RM2, RM3, RM4, RM5, RM6, RM7, RM8, RM9, RM10, enum { RM1=1, RM2, RM3, RM4, RM5, RM6, RM7, RM8, RM9, RM10,
RM11, RM12, RM13, RM14, RM15, RM16, RM17, RM18, RM19, RM20, RM11, RM12, RM13, RM14, RM15, RM16, RM17, RM18, RM19, RM20,
RM21, RM22, RM23, RM24, RM25, RM26, RM27, RM28, RM29, RM30, RM21, RM22, RM23, RM24, RM25, RM26, RM27, RM28, RM29, RM30,
RM31, RM32, RM33, RM34, RM35 }; RM31, RM32, RM33, RM34, RM35, RM36 };
#ifdef SUPPORT_WIDE_CHARS #ifdef SUPPORT_WIDE_CHARS
enum { RM100=100, RM101 }; enum { RM100=100, RM101 };
@ -770,7 +770,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
/* ===================================================================== */ /* ===================================================================== */
/* Real or forced end of the pattern, assertion, or recursion. In an /* Real or forced end of the pattern, assertion, or recursion. In an
assertion ACCEPT, update the last used pointer and remember the current assertion ACCEPT, update the last used pointer and remember the current
frame so that the captures can be fished out of it. */ frame so that the captures and mark can be fished out of it. */
case OP_ASSERT_ACCEPT: case OP_ASSERT_ACCEPT:
if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr; if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
@ -5119,7 +5119,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
/* Positive assertions are like other groups except that PCRE doesn't allow /* Positive assertions are like other groups except that PCRE doesn't allow
the effect of (*THEN) to escape beyond an assertion; it is therefore the effect of (*THEN) to escape beyond an assertion; it is therefore
treated as NOMATCH. (*ACCEPT) is treated as successful assertion, with its treated as NOMATCH. (*ACCEPT) is treated as successful assertion, with its
captures retained. Any other return is an error. */ captures and mark retained. Any other return is an error. */
#define Lframe_type F->temp_32[0] #define Lframe_type F->temp_32[0]
@ -5136,6 +5136,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
(char *)assert_accept_frame + offsetof(heapframe, ovector), (char *)assert_accept_frame + offsetof(heapframe, ovector),
assert_accept_frame->offset_top * sizeof(PCRE2_SIZE)); assert_accept_frame->offset_top * sizeof(PCRE2_SIZE));
Foffset_top = assert_accept_frame->offset_top; Foffset_top = assert_accept_frame->offset_top;
Fmark = assert_accept_frame->mark;
break; break;
} }
if (rrc != MATCH_NOMATCH && rrc != MATCH_THEN) RRETURN(rrc); if (rrc != MATCH_NOMATCH && rrc != MATCH_THEN) RRETURN(rrc);
@ -5837,6 +5838,13 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
mb->verb_current_recurse = Fcurrent_recurse; mb->verb_current_recurse = Fcurrent_recurse;
RRETURN(MATCH_COMMIT); RRETURN(MATCH_COMMIT);
case OP_COMMIT_ARG:
Fmark = mb->nomatch_mark = Fecode + 2;
RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM36);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
mb->verb_current_recurse = Fcurrent_recurse;
RRETURN(MATCH_COMMIT);
case OP_PRUNE: case OP_PRUNE:
RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM14); RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM14);
if (rrc != MATCH_NOMATCH) RRETURN(rrc); if (rrc != MATCH_NOMATCH) RRETURN(rrc);
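The address arithmetic in the new OP_COMMIT_ARG case can be illustrated in isolation. The sketch assumes the compiled layout that the code above implies: the opcode, one code unit holding the name length, the name, and a terminating zero, with a fixed length of 3 in the opcode length table. The opcode values, the table, and the function names are hypothetical, and only 8-bit code units are considered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode numbers and length table for the sketch only. */

enum { TOY_OP_END = 0, TOY_OP_COMMIT = 1, TOY_OP_COMMIT_ARG = 2 };

static const uint8_t toy_lengths[] = { 1, 1, 3 };

/* The mark name starts two code units after the opcode and is terminated by
a zero, so it can be used directly as a C string in 8-bit mode. */

const char *toy_mark_name(const uint8_t *code)
{
assert(code != NULL && *code == TOY_OP_COMMIT_ARG);
return (const char *)(code + 2);
}

/* Advance to the next opcode: the fixed length from the table, plus the
name length for the _ARG form. */

const uint8_t *toy_next_op(const uint8_t *code)
{
size_t len;

assert(code != NULL && *code <= TOY_OP_COMMIT_ARG);

len = toy_lengths[*code];
if (*code == TOY_OP_COMMIT_ARG) len += code[1];
return code + len;
}
```

For the sequence COMMIT_ARG, 1, 'X', 0, COMMIT, END, the name is "X" and the next opcode is four code units on, which corresponds to Fecode + 2 and Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1] in the case above.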
@ -5942,7 +5950,7 @@ switch (Freturn_id)
LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(16) LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(16)
LBL(17) LBL(18) LBL(19) LBL(20) LBL(21) LBL(22) LBL(23) LBL(24) LBL(17) LBL(18) LBL(19) LBL(20) LBL(21) LBL(22) LBL(23) LBL(24)
LBL(25) LBL(26) LBL(27) LBL(28) LBL(29) LBL(30) LBL(31) LBL(32) LBL(25) LBL(26) LBL(27) LBL(28) LBL(29) LBL(30) LBL(31) LBL(32)
LBL(33) LBL(34) LBL(35) LBL(33) LBL(34) LBL(35) LBL(36)
#ifdef SUPPORT_WIDE_CHARS #ifdef SUPPORT_WIDE_CHARS
LBL(100) LBL(101) LBL(100) LBL(101)


@ -4678,12 +4678,6 @@ uint16_t first_listed_newline;
const char *cmdname; const char *cmdname;
uint8_t *argptr, *serial; uint8_t *argptr, *serial;
if (restrict_for_perl_test)
{
fprintf(outfile, "** #-commands are not allowed after #perltest\n");
return PR_ABEND;
}
yield = PR_OK; yield = PR_OK;
cmd = CMD_UNKNOWN; cmd = CMD_UNKNOWN;
cmdlen = 0; cmdlen = 0;
@ -4702,6 +4696,12 @@ for (i = 0; i < cmdlistcount; i++)
argptr = buffer + cmdlen + 1; argptr = buffer + cmdlen + 1;
if (restrict_for_perl_test && cmd != CMD_PATTERN && cmd != CMD_SUBJECT)
{
fprintf(outfile, "** #%s is not allowed after #perltest\n", cmdname);
return PR_ABEND;
}
switch(cmd) switch(cmd)
{ {
case CMD_UNKNOWN: case CMD_UNKNOWN:

testdata/testinput1

@ -6203,10 +6203,47 @@ ef) x/x,mark
/a(?:(*:X))(*SKIP:X)(*F)|(.)/ /a(?:(*:X))(*SKIP:X)(*F)|(.)/
abc abc
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/no_start_optimize #pattern no_start_optimize
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
abc abc
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/no_start_optimize /(?>a(*:1))(?>b)(*SKIP:1)x|.*/
abc abc
#subject mark
/a(*ACCEPT:X)b/
abc
/(?=a(*ACCEPT:QQ)bc)axyz/
axyz
/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/
abc
/a(*F:X)b/
abc
/(?(DEFINE)(a(*F:X)))(?1)b/
abc
/a(*COMMIT:X)b/
abc
/(?(DEFINE)(a(*COMMIT:X)))(?1)b/
abc
/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/
aaaabd
/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/
aaaabd
/a(*COMMIT:X)b/
axabc
#pattern -no_start_optimize
#subject -mark
# End of testinput1 # End of testinput1

testdata/testinput2

@ -2949,10 +2949,9 @@
/abc(*:)pqr/ /abc(*:)pqr/
/abc(*FAIL:123)xyz/
# This should, and does, fail. In Perl, it does not, which I think is a # This should, and does, fail. In Perl, it does not, which I think is a
# bug because replacing the B in the pattern by (B|D) does make it fail. # bug because replacing the B in the pattern by (B|D) does make it fail.
# Turning off Perl's optimization by inserting (??{""}) also makes it fail.
/A(*COMMIT)B/aftertext,mark /A(*COMMIT)B/aftertext,mark
\= Expect no match \= Expect no match

testdata/testoutput1

@ -9846,12 +9846,64 @@ No match
0: b 0: b
1: b 1: b
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/no_start_optimize #pattern no_start_optimize
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
abc abc
0: abc 0: abc
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/no_start_optimize /(?>a(*:1))(?>b)(*SKIP:1)x|.*/
abc abc
0: abc 0: abc
#subject mark
/a(*ACCEPT:X)b/
abc
0: a
MK: X
/(?=a(*ACCEPT:QQ)bc)axyz/
axyz
0: axyz
MK: QQ
/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/
abc
0: ab
MK: X
/a(*F:X)b/
abc
No match, mark = X
/(?(DEFINE)(a(*F:X)))(?1)b/
abc
No match, mark = X
/a(*COMMIT:X)b/
abc
0: ab
MK: X
/(?(DEFINE)(a(*COMMIT:X)))(?1)b/
abc
0: ab
MK: X
/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/
aaaabd
0: bd
/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/
aaaabd
No match, mark = X
/a(*COMMIT:X)b/
axabc
No match, mark = X
#pattern -no_start_optimize
#subject -mark
# End of testinput1 # End of testinput1


@ -10154,11 +10154,9 @@ Failed: error 166 at offset 10: (*MARK) must have an argument
/abc(*:)pqr/ /abc(*:)pqr/
Failed: error 166 at offset 6: (*MARK) must have an argument Failed: error 166 at offset 6: (*MARK) must have an argument
/abc(*FAIL:123)xyz/
Failed: error 159 at offset 10: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)
# This should, and does, fail. In Perl, it does not, which I think is a # This should, and does, fail. In Perl, it does not, which I think is a
# bug because replacing the B in the pattern by (B|D) does make it fail. # bug because replacing the B in the pattern by (B|D) does make it fail.
# Turning off Perl's optimization by inserting (??{""}) also makes it fail.
/A(*COMMIT)B/aftertext,mark /A(*COMMIT)B/aftertext,mark
\= Expect no match \= Expect no match