Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK)
followed by (*ACCEPT) in an assertion. More small updates to perltest.sh.
parent 635d04fbb7
commit 192b82cf6e

ChangeLog (13 lines changed)
@@ -119,9 +119,16 @@ backtrack into the first of the atomic groups. A complicated example is
 shouldn't find a MARK (because it is in an atomic group), but it did.

 26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
-certain modifiers that the script recognizes; (2) Unsupported #command lines
-give a warning when they are ignored; (3) Mark data is output only if the
-"mark" modifier is present.
+a list of modifiers for all subsequent patterns - only those that the script
+recognizes are meaningful; (2) #subject lines can be used to set or unset a
+default "mark" modifier; (3) Unsupported #command lines give a warning when
+they are ignored; (4) Mark data is output only if the "mark" modifier is
+present.
+
+27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
+
+28. A (*MARK) name was not being passed back for positive assertions that were
+terminated by (*ACCEPT).


 Version 10.31 12-February-2018

HACKING (27 lines changed)
@@ -256,6 +256,7 @@ The following are followed by a length element, then a number of character code
 values (which should match with the length):

 META_MARK (*MARK:xxxx)
+META_COMMIT_ARG (*COMMIT:xxxx)
 META_PRUNE_ARG (*PRUNE:xxx)
 META_SKIP_ARG (*SKIP:xxxx)
 META_THEN_ARG (*THEN:xxxx)
@@ -382,7 +383,7 @@ that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode
 Opcodes with no following data
 ------------------------------

-These items are all just one unit long
+These items are all just one unit long:

 OP_END end of pattern
 OP_ANY match any one character other than newline
@@ -430,14 +431,22 @@ character). Another use is for [^] when empty classes are permitted
 (PCRE2_ALLOW_EMPTY_CLASS is set).


-Backtracking control verbs with optional data
----------------------------------------------
+Backtracking control verbs
+--------------------------

-(*THEN) without an argument generates the opcode OP_THEN and no following data.
-OP_MARK is followed by the mark name, preceded by a length in one code unit,
-and followed by a binary zero. For (*PRUNE), (*SKIP), and (*THEN) with
-arguments, the opcodes OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used,
-with the name following in the same format as OP_MARK.
+Verbs with no arguments generate opcodes with no following data (as listed
+in the section above).
+
+(*MARK:NAME) generates OP_MARK followed by the mark name, preceded by a
+length in one code unit, and followed by a binary zero. The name length is
+limited by the size of the code unit.
+
+(*ACCEPT:NAME) and (*FAIL:NAME) are compiled as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL) respectively.
+
+For (*COMMIT:NAME), (*PRUNE:NAME), (*SKIP:NAME), and (*THEN:NAME), the opcodes
+OP_COMMIT_ARG, OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used, with the
+name following in the same format as for OP_MARK.


 Matching literal characters
@@ -814,4 +823,4 @@ not a real opcode, but is used to check at compile time that tables indexed by
 opcode are the correct length, in order to catch updating errors.

 Philip Hazel
-21 April 2017
+20 July 2018
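To make the HACKING description of named verbs concrete, here is a minimal Python sketch of the layout it describes for 8-bit code units: an opcode, the name length in one code unit, the name itself, then a binary zero, with (*ACCEPT:NAME) and (*FAIL:NAME) expanded into a MARK item followed by the plain verb. The opcode numbers and the helpers `named_item` and `compile_verb` are hypothetical placeholders, not PCRE2's real values or internal functions.

```python
from typing import Optional

# Placeholder opcode numbers for illustration only; PCRE2's real values differ.
OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_PRUNE, OP_SKIP, OP_THEN = range(1, 7)
OP_MARK, OP_COMMIT_ARG, OP_PRUNE_ARG, OP_SKIP_ARG, OP_THEN_ARG = range(7, 12)

PLAIN = {"ACCEPT": OP_ACCEPT, "FAIL": OP_FAIL, "COMMIT": OP_COMMIT,
         "PRUNE": OP_PRUNE, "SKIP": OP_SKIP, "THEN": OP_THEN}
WITH_ARG = {"COMMIT": OP_COMMIT_ARG, "PRUNE": OP_PRUNE_ARG,
            "SKIP": OP_SKIP_ARG, "THEN": OP_THEN_ARG}


def named_item(opcode: int, name: str) -> bytes:
    """Opcode, then the name length in one code unit, the name, and a zero."""
    data = name.encode("utf-8")
    if len(data) > 255:
        # With 8-bit code units the stored length cannot exceed 255.
        raise ValueError("name too long for one 8-bit code unit")
    return bytes([opcode, len(data)]) + data + b"\x00"


def compile_verb(verb: str, name: Optional[str] = None) -> bytes:
    """Hypothetical compiler for a single backtracking verb."""
    if verb == "MARK":
        if not name:
            raise ValueError("(*MARK) requires a name")
        return named_item(OP_MARK, name)
    if name is None:
        # Verbs without an argument have no following data.
        return bytes([PLAIN[verb]])
    if verb in ("ACCEPT", "FAIL"):
        # Compiled as (*MARK:NAME)(*ACCEPT) or (*MARK:NAME)(*FAIL).
        return named_item(OP_MARK, name) + bytes([PLAIN[verb]])
    # (*COMMIT:NAME), (*PRUNE:NAME), (*SKIP:NAME), (*THEN:NAME)
    return named_item(WITH_ARG[verb], name)
```

With wider code units the same layout would hold, which is why the name length limit depends on the code unit size.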

@@ -3122,17 +3122,16 @@ in the
 documentation.
 </P>
 <P>
-Experiments with Perl suggest that it too has similar optimizations, sometimes
-leading to anomalous results.
+Experiments with Perl suggest that it too has similar optimizations, and like
+PCRE2, turning them off can change the result of a match.
 </P>
 <br><b>
 Verbs that act immediately
 </b><br>
 <P>
-The following verbs act as soon as they are encountered. They may not be
-followed by a name.
+The following verbs act as soon as they are encountered.
 <pre>
-(*ACCEPT)
+(*ACCEPT) or (*ACCEPT:NAME)
 </pre>
 This verb causes the match to end successfully, skipping the remainder of the
 pattern. However, when it is inside a subpattern that is called as a
@@ -3149,19 +3148,23 @@ example:
 This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
 the outer parentheses.
 <pre>
-(*FAIL) or (*F)
+(*FAIL) or (*FAIL:NAME)
 </pre>
-This verb causes a matching failure, forcing backtracking to occur. It is
-equivalent to (?!) but easier to read. The Perl documentation notes that it is
-probably useful only when combined with (?{}) or (??{}). Those are, of course,
-Perl features that are not present in PCRE2. The nearest equivalent is the
-callout feature, as for example in this pattern:
+This verb causes a matching failure, forcing backtracking to occur. It may be
+abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
+documentation notes that it is probably useful only when combined with (?{}) or
+(??{}). Those are, of course, Perl features that are not present in PCRE2. The
+nearest equivalent is the callout feature, as for example in this pattern:
 <pre>
 a+(?C)(*FAIL)
 </pre>
 A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 </P>
+<P>
+(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
+(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+</P>
 <br><b>
 Recording which path was taken
 </b><br>
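The figure of 10 callouts for a+(?C)(*FAIL) against "aaaa" can be checked by hand: at each starting offset, a+ first takes the whole remaining run of "a" characters and then gives them back one at a time, reaching the callout once per length. Offsets 0, 1, 2 and 3 contribute 4 + 3 + 2 + 1 = 10. The Python sketch below models that naive backtracking; it is not PCRE2, `callouts_taken` is a hypothetical helper, and it deliberately ignores any start-of-match optimizations the real matcher may apply.

```python
def callouts_taken(subject: str) -> int:
    """Count how often (?C) is reached for unanchored a+(?C)(*FAIL)."""
    count = 0
    for start in range(len(subject)):
        # Length of the run of "a" characters that a+ can consume here.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ tries run, run-1, ..., 1 characters; each try reaches the
        # callout, then (*FAIL) forces a backtrack to the next shorter try.
        count += run
    return count
```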
@@ -3186,9 +3189,9 @@ assertions and atomic groups. (There are differences in those cases when
 (*MARK) is used in conjunction with (*SKIP) as described below.)
 </P>
 <P>
-As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME
-arguments. Whichever is last on the matching path is passed back. See below for
-more details of these other verbs.
+As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
+associated NAME arguments. Whichever is last on the matching path is passed
+back. See below for more details of these other verbs.
 </P>
 <P>
 Here is an example of <b>pcre2test</b> output, where the "mark" modifier
@@ -3250,22 +3253,25 @@ reaches them. The behaviour described below is what happens when the verb is
 not in a subroutine or an assertion. Subsequent sections cover these special
 cases.
 <pre>
-(*COMMIT)
+(*COMMIT) or (*COMMIT:NAME)
 </pre>
-This verb, which may not be followed by a name, causes the whole match to fail
-outright if there is a later matching failure that causes backtracking to reach
-it. Even if the pattern is unanchored, no further attempts to find a match by
-advancing the starting point take place. If (*COMMIT) is the only backtracking
-verb that is encountered, once it has been passed <b>pcre2_match()</b> is
-committed to finding a match at the current starting point, or not at all. For
-example:
+This verb causes the whole match to fail outright if there is a later matching
+failure that causes backtracking to reach it. Even if the pattern is
+unanchored, no further attempts to find a match by advancing the starting point
+take place. If (*COMMIT) is the only backtracking verb that is encountered,
+once it has been passed <b>pcre2_match()</b> is committed to finding a match at
+the current starting point, or not at all. For example:
 <pre>
 a+(*COMMIT)b
 </pre>
 This matches "xxaab" but not "aacaab". It can be thought of as a kind of
-dynamic anchor, or "I've started, so I must finish." The name of the most
-recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
-match failure.
+dynamic anchor, or "I've started, so I must finish."
 </P>
 <P>
+The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
+caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+</P>
+<P>
 If there is more than one backtracking verb in a pattern, a different one that
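The "xxaab" / "aacaab" example for a+(*COMMIT)b can be traced with a toy model. The Python sketch below is hand-written for this one pattern (it is not PCRE2, and `find_a_plus_b` is a hypothetical name): with `commit=False` it behaves like plain a+b, trying shorter runs of "a" and then later starting offsets; with `commit=True`, the first failure of "b" after the greedy a+ backtracks onto (*COMMIT) and ends the whole search.

```python
from typing import Optional


def find_a_plus_b(subject: str, commit: bool) -> Optional[int]:
    """Start offset of the first match of a+b (commit=False) or
    a+(*COMMIT)b (commit=True), or None if there is no match."""
    for start in range(len(subject)):
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ is greedy: try the longest run first, then shorter ones.
        for length in range(run, 0, -1):
            end = start + length
            if end < len(subject) and subject[end] == "b":
                return start
            if commit:
                # Backtracking onto (*COMMIT): the whole match fails, with
                # no shorter a+ and no advance of the starting point.
                return None
    return None
```

Without (*COMMIT), "aacaab" matches starting at offset 3; that later match is exactly what the verb rules out.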
@@ -3309,7 +3315,7 @@ as (*COMMIT).
 The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) or (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
 <pre>
 (*SKIP)
 </pre>
@@ -3317,7 +3323,7 @@ This verb, when given without a name, is like (*PRUNE), except that if the
 pattern is unanchored, the "bumpalong" advance is not to the next character,
 but to the position in the subject where (*SKIP) was encountered. (*SKIP)
 signifies that whatever text was matched leading up to it cannot be part of a
-successful match. Consider:
+successful match if there is a later mismatch. Consider:
 <pre>
 a+(*SKIP)b
 </pre>
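(*PRUNE) and (*SKIP) differ only in where the next attempt starts. The Python sketch below is a toy model (not PCRE2) that records the starting offsets tried for a+(*PRUNE)b and a+(*SKIP)b; `attempt_starts` is a hypothetical helper, and it assumes every offset is tried in turn, with none of the real matcher's start-of-match optimizations.

```python
from typing import List


def attempt_starts(subject: str, verb: str) -> List[int]:
    """Starting offsets tried by a+(*PRUNE)b or a+(*SKIP)b; stops at a match."""
    starts = []
    start = 0
    while start < len(subject):
        starts.append(start)
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        if run == 0:
            start += 1          # a+ cannot match here; ordinary bumpalong
            continue
        if start + run < len(subject) and subject[start + run] == "b":
            break               # a match starting at `start`
        # "b" failed, so backtracking reaches the verb after the greedy a+.
        if verb == "SKIP":
            start += run        # resume where (*SKIP) was encountered
        else:
            start += 1          # (*PRUNE): advance to the next character
    return starts
```

With the subject "aaaac", the (*SKIP) version tries offsets 0 and 4 (the "c"), while the (*PRUNE) version tries 0, 1, 2, 3 and 4. In this model the (*PRUNE) sequence is also what the text describes for a possessive quantifier, whose second attempt starts at the second character.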
@@ -3364,7 +3370,7 @@ the second branch of the pattern.
 </P>
 <P>
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
-names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
 <pre>
 (*THEN) or (*THEN:NAME)
 </pre>
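The two name rules touched by this commit can be stated side by side: the name passed back to the caller comes from whichever named verb is last on the matching path, but (*SKIP:NAME) looks only at (*MARK) names. The Python sketch below models the path as a simple list of (verb, name, offset) entries; `PathEntry`, `skip_target` and `name_passed_back` are hypothetical, and the model ignores the rule that (*MARK) settings inside atomic groups and assertions are not seen by the search.

```python
from typing import List, Optional, Tuple

# One entry per named verb passed on the current matching path:
# (verb, name, subject offset). Simplified, hypothetical model.
PathEntry = Tuple[str, str, int]


def skip_target(path: List[PathEntry], name: str) -> Optional[int]:
    """Offset that (*SKIP:name) would advance to, or None if it is ignored."""
    for verb, entry_name, offset in reversed(path):
        # Only (*MARK) names are visible; (*COMMIT), (*PRUNE) and (*THEN)
        # names are passed over.
        if verb == "MARK" and entry_name == name:
            return offset
    return None


def name_passed_back(path: List[PathEntry]) -> Optional[str]:
    """Whichever named verb is last on the path supplies the returned name."""
    return path[-1][1] if path else None
```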
@@ -3383,10 +3389,10 @@ more alternatives, so there is a backtrack to whatever came before the entire
 group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 </P>
 <P>
-The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
-It is like (*MARK:NAME) in that the name is remembered for passing back to the
+The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*PRUNE) and (*THEN).
+ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
 </P>
 <P>
 A subpattern that does not contain a | character is just a part of the
@@ -3461,13 +3467,14 @@ onto (*COMMIT).
 Backtracking verbs in repeated groups
 </b><br>
 <P>
-PCRE2 differs from Perl in its handling of backtracking verbs in repeated
-groups. For example, consider:
+PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
+repeated groups. For example, consider:
 <pre>
 /(a(*COMMIT)b)+ac/
 </pre>
-If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT)
-in the second repeat of the group acts.
+If the subject is "abac", Perl matches unless its optimizations are disabled,
+but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
+acts.
 <a name="btassert"></a></P>
 <br><b>
 Backtracking verbs in assertions
@@ -3480,9 +3487,10 @@ subpattern.
 </P>
 <P>
 (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
-without any further processing; captured strings are retained. In a standalone
-negative assertion, (*ACCEPT) causes the assertion to fail without any further
-processing; captured substrings are discarded.
+without any further processing; captured strings and a (*MARK) name (if set)
+are retained. In a standalone negative assertion, (*ACCEPT) causes the
+assertion to fail without any further processing; captured substrings and any
+(*MARK) name are discarded.
 </P>
 <P>
 If the assertion is a condition, (*ACCEPT) causes the condition to be true for
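The revised paragraph on standalone assertions (the behaviour fixed by ChangeLog item 28) amounts to a small decision table. The Python sketch below encodes it; `AssertionOutcome` and `accept_in_standalone_assertion` are hypothetical names, and assertions used as conditions, which the text treats separately, are not modelled.

```python
from typing import Dict, NamedTuple, Optional


class AssertionOutcome(NamedTuple):
    succeeded: bool
    captures: Dict[int, str]
    mark: Optional[str]


def accept_in_standalone_assertion(negative: bool, captures: Dict[int, str],
                                   mark: Optional[str]) -> AssertionOutcome:
    """Outcome when (*ACCEPT) is reached inside a standalone assertion."""
    if negative:
        # Negative assertion: it fails; captures and any (*MARK) name go.
        return AssertionOutcome(False, {}, None)
    # Positive assertion: it succeeds; captures and the (*MARK) name stay.
    return AssertionOutcome(True, dict(captures), mark)
```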
@@ -3515,16 +3523,16 @@ Backtracking verbs in subroutines
 </b><br>
 <P>
 These behaviours occur whether or not the subpattern is called recursively.
-Perl's treatment of subroutines is different in some cases.
 </P>
 <P>
-(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
-an immediate backtrack.
-</P>
-<P>
 (*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
 succeed without any further processing. Matching then continues after the
-subroutine call.
+subroutine call. Perl documents this behaviour. Perl's treatment of the other
+verbs in subroutines is different in some cases.
 </P>
 <P>
+(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
+an immediate backtrack.
+</P>
+<P>
 (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
@@ -3551,7 +3559,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 16 July 2018
+Last updated: 20 July 2018
 <br>
 Copyright © 1997-2018 University of Cambridge.
 <br>

@@ -569,7 +569,11 @@ condition if the relevant named group exists.
 </P>
 <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
-The following act immediately they are reached:
+All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
+name is mandatory; for the others it is optional. (*SKIP) changes its behaviour
+if :NAME is present. The others just set a name for passing back to the caller,
+but this is not a name that (*SKIP) can see. The following act immediately they
+are reached:
 <pre>
 (*ACCEPT) force successful match
 (*FAIL) force backtrack; synonym (*F)
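The summary rule above (every verb may take :NAME, the name is mandatory only for (*MARK), and (*F) is a synonym for (*FAIL)) can be expressed as a small parser. This Python sketch is hypothetical and not PCRE2's parser: `parse_verb` is an illustrative name, it assumes a name is any non-empty run of characters other than ")", it accepts (*F) only in its bare form, and it does not model any other spellings PCRE2 may accept.

```python
import re
from typing import Optional, Tuple

# Hypothetical grammar for the verb forms summarized above.
_VERB = re.compile(
    r"\(\*(?:(F)|(ACCEPT|FAIL|COMMIT|PRUNE|SKIP|THEN|MARK)(?::([^)]+))?)\)"
)


def parse_verb(text: str) -> Tuple[str, Optional[str]]:
    """Return (verb, name) for a single verb such as "(*SKIP:X)"."""
    m = _VERB.fullmatch(text)
    if m is None:
        raise ValueError(f"not a recognized verb: {text!r}")
    if m.group(1):
        return "FAIL", None        # (*F) is a synonym for (*FAIL)
    verb, name = m.group(2), m.group(3)
    if verb == "MARK" and name is None:
        raise ValueError("(*MARK) requires a name")
    return verb, name
```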
@@ -582,13 +586,13 @@ pattern is not anchored.
 <pre>
 (*COMMIT) overall failure, no advance of starting point
 (*PRUNE) advance to next starting character
-(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
 (*SKIP) advance to current matching position
 (*SKIP:NAME) advance to position corresponding to an earlier
 (*MARK:NAME); if not found, the (*SKIP) is ignored
 (*THEN) local failure, backtrack to next alternation
-(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
-</PRE>
+</pre>
+The effect of one of these verbs in a group called as a subroutine is confined
+to the subroutine call.
 </P>
 <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
 <P>
@@ -617,7 +621,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 07 July 2018
+Last updated: 21 July 2018
 <br>
 Copyright © 1997-2018 University of Cambridge.
 <br>

@@ -410,10 +410,11 @@ patterns. Modifiers on a pattern can change these settings.
 The appearance of this line causes all subsequent modifier settings to be
 checked for compatibility with the <b>perltest.sh</b> script, which is used to
 confirm that Perl gives the same results as PCRE2. Also, apart from comment
-lines, none of the other command lines are permitted, because they and many
-of the modifiers are specific to <b>pcre2test</b>, and should not be used in
-test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
-command helps detect tests that are accidentally put in the wrong file.
+lines, #pattern commands, and #subject commands that set or unset "mark", no
+command lines are permitted, because they and many of the modifiers are
+specific to <b>pcre2test</b>, and should not be used in test files that are also
+processed by <b>perltest.sh</b>. The <b>#perltest</b> command helps detect tests
+that are accidentally put in the wrong file.
 <pre>
 #pop [<modifiers>]
 #popcopy [<modifiers>]
@@ -2003,7 +2004,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 16 July 2018
+Last updated: 21 July 2018
 <br>
 Copyright © 1997-2018 University of Cambridge.
 <br>

doc/pcre2.txt (247 lines changed)
@@ -8601,44 +8601,46 @@ BACKTRACKING CONTROL
 in the pcre2api documentation.

 Experiments with Perl suggest that it too has similar optimizations,
-sometimes leading to anomalous results.
+and like PCRE2, turning them off can change the result of a match.

 Verbs that act immediately

-The following verbs act as soon as they are encountered. They may not
-be followed by a name.
+The following verbs act as soon as they are encountered.

-(*ACCEPT)
+(*ACCEPT) or (*ACCEPT:NAME)

 This verb causes the match to end successfully, skipping the remainder
 of the pattern. However, when it is inside a subpattern that is called
 as a subroutine, only that subpattern is ended successfully. Matching
 then continues at the outer level. If (*ACCEPT) is triggered in a posi-
 tive assertion, the assertion succeeds; in a negative assertion, the
 assertion fails.

 If (*ACCEPT) is inside capturing parentheses, the data so far is cap-
 tured. For example:

 A((?:A|B(*ACCEPT)|C)D)

 This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
 tured by the outer parentheses.

-(*FAIL) or (*F)
+(*FAIL) or (*FAIL:NAME)

-This verb causes a matching failure, forcing backtracking to occur. It
-is equivalent to (?!) but easier to read. The Perl documentation notes
-that it is probably useful only when combined with (?{}) or (??{}).
-Those are, of course, Perl features that are not present in PCRE2. The
-nearest equivalent is the callout feature, as for example in this pat-
-tern:
+This verb causes a matching failure, forcing backtracking to occur. It
+may be abbreviated to (*F). It is equivalent to (?!) but easier to
+read. The Perl documentation notes that it is probably useful only when
+combined with (?{}) or (??{}). Those are, of course, Perl features that
+are not present in PCRE2. The nearest equivalent is the callout fea-
+ture, as for example in this pattern:

 a+(?C)(*FAIL)

 A match with the string "aaaa" always fails, but the callout is taken
 before each backtrack happens (in this example, 10 times).

+(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
+(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+
 Recording which path was taken

 There is one verb whose main purpose is to track how a match was
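The capture behaviour of (*ACCEPT) in A((?:A|B(*ACCEPT)|C)D) can be reproduced with a hand-written matcher for that one pattern. This Python sketch is not PCRE2: `match_at_start` is a hypothetical name, and it only tries the match at offset 0, whereas the real pattern is unanchored. It shows why matching "AB" leaves group 1 holding just "B": the group opened before (*ACCEPT) fires, so it captures whatever it has matched so far.

```python
from typing import Optional, Tuple


def match_at_start(subject: str) -> Optional[Tuple[str, str]]:
    """Hand-written matcher for A((?:A|B(*ACCEPT)|C)D) at offset 0.

    Returns (whole match, group 1) or None.
    """
    if subject[:1] != "A":
        return None
    group1_start = 1
    nxt = subject[1:2]
    if nxt == "B":
        # (*ACCEPT) ends the match at once; group 1 gets what it has so far.
        end = 2
        return subject[:end], subject[group1_start:end]
    if nxt in ("A", "C") and subject[2:3] == "D":
        end = 3
        return subject[:end], subject[group1_start:end]
    return None
```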
@@ -8659,9 +8661,9 @@ BACKTRACKING CONTROL
 cases when (*MARK) is used in conjunction with (*SKIP) as described
 below.)

-As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated
-NAME arguments. Whichever is last on the matching path is passed back.
-See below for more details of these other verbs.
+As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
+associated NAME arguments. Whichever is last on the matching path is
+passed back. See below for more details of these other verbs.

 Here is an example of pcre2test output, where the "mark" modifier
 requests the retrieval and outputting of (*MARK) data:
@@ -8717,22 +8719,26 @@ BACKTRACKING CONTROL
 when the verb is not in a subroutine or an assertion. Subsequent sec-
 tions cover these special cases.

-(*COMMIT)
+(*COMMIT) or (*COMMIT:NAME)

-This verb, which may not be followed by a name, causes the whole match
-to fail outright if there is a later matching failure that causes back-
-tracking to reach it. Even if the pattern is unanchored, no further
-attempts to find a match by advancing the starting point take place. If
-(*COMMIT) is the only backtracking verb that is encountered, once it
-has been passed pcre2_match() is committed to finding a match at the
-current starting point, or not at all. For example:
+This verb causes the whole match to fail outright if there is a later
+matching failure that causes backtracking to reach it. Even if the pat-
+tern is unanchored, no further attempts to find a match by advancing
+the starting point take place. If (*COMMIT) is the only backtracking
+verb that is encountered, once it has been passed pcre2_match() is com-
+mitted to finding a match at the current starting point, or not at all.
+For example:

 a+(*COMMIT)b

 This matches "xxaab" but not "aacaab". It can be thought of as a kind
-of dynamic anchor, or "I've started, so I must finish." The name of the
-most recently passed (*MARK) in the path is passed back when (*COMMIT)
-forces a match failure.
+of dynamic anchor, or "I've started, so I must finish."
+
+The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COM-
+MIT). It is like (*MARK:NAME) in that the name is remembered for pass-
+ing back to the caller. However, (*SKIP:NAME) searches only for names
+set with (*MARK), ignoring those set by (*COMMIT), (*PRUNE) and
+(*THEN).

 If there is more than one backtracking verb in a pattern, a different
 one that follows (*COMMIT) may be triggered first, so merely passing
@@ -8776,7 +8782,7 @@ BACKTRACKING CONTROL
 The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
 It is like (*MARK:NAME) in that the name is remembered for passing back
 to the caller. However, (*SKIP:NAME) searches only for names set with
-(*MARK), ignoring those set by (*PRUNE) or (*THEN).
+(*MARK), ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).

 (*SKIP)

@@ -8784,29 +8790,30 @@ BACKTRACKING CONTROL
 the pattern is unanchored, the "bumpalong" advance is not to the next
 character, but to the position in the subject where (*SKIP) was encoun-
 tered. (*SKIP) signifies that whatever text was matched leading up to
-it cannot be part of a successful match. Consider:
+it cannot be part of a successful match if there is a later mismatch.
+Consider:

 a+(*SKIP)b

 If the subject is "aaaac...", after the first match attempt fails
 (starting at the first character in the string), the starting point
 skips on to start the next attempt at "c". Note that a possessive quan-
 tifier does not have the same effect as this example; although it would
 suppress backtracking during the first match attempt, the second
 attempt would start at the second character instead of skipping on to
 "c".

 (*SKIP:NAME)

 When (*SKIP) has an associated name, its behaviour is modified. When
 such a (*SKIP) is triggered, the previous path through the pattern is
 searched for the most recent (*MARK) that has the same name. If one is
 found, the "bumpalong" advance is to the subject position that corre-
 sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
 no (*MARK) with a matching name is found, the (*SKIP) is ignored.

 The search for a (*MARK) name uses the normal backtracking mechanism,
 which means that it does not see (*MARK) settings that are inside
 atomic groups or assertions, because they are never re-entered by back-
 tracking. Compare the following pcre2test examples:

@@ -8820,18 +8827,19 @@ BACKTRACKING CONTROL
 0: b
 1: b

 In the first example, the (*MARK) setting is in an atomic group, so it
 is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
 This allows the second branch of the pattern to be tried at the first
 character position. In the second example, the (*MARK) setting is not
 in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it
 backtracks, and this causes a new matching attempt to start at the sec-
 ond character. This time, the (*MARK) is never seen because "a" does
 not match "b", so the matcher immediately jumps to the second branch of
 the pattern.

 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
-ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+ignores names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or
+(*THEN:NAME).

 (*THEN) or (*THEN:NAME)

@@ -8850,87 +8858,87 @@ BACKTRACKING CONTROL
 track to whatever came before the entire group. If (*THEN) is not
 inside an alternation, it acts like (*PRUNE).

-The behaviour of (*THEN:NAME) is the not the same as
-(*MARK:NAME)(*THEN). It is like (*MARK:NAME) in that the name is
-remembered for passing back to the caller. However, (*SKIP:NAME)
-searches only for names set with (*MARK), ignoring those set by
-(*PRUNE) and (*THEN).
+The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN).
+It is like (*MARK:NAME) in that the name is remembered for passing back
+to the caller. However, (*SKIP:NAME) searches only for names set with
+(*MARK), ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).

 A subpattern that does not contain a | character is just a part of the
 enclosing alternative; it is not a nested alternation with only one
 alternative. The effect of (*THEN) extends beyond such a subpattern to
 the enclosing alternative. Consider this pattern, where A, B, etc. are
 complex pattern fragments that do not contain any | characters at this
 level:

 A (B(*THEN)C) | D

 If A and B are matched, but there is a failure in C, matching does not
 backtrack into A; instead it moves to the next alternative, that is, D.
 However, if the subpattern containing (*THEN) is given an alternative,
 it behaves differently:

 A (B(*THEN)C | (*FAIL)) | D

 The effect of (*THEN) is now confined to the inner subpattern. After a
 failure in C, matching moves to (*FAIL), which causes the whole subpat-
 tern to fail because there are no more alternatives to try. In this
 case, matching does now backtrack into A.

 Note that a conditional subpattern is not considered as having two
 alternatives, because only one is ever used. In other words, the |
 character in a conditional subpattern has a different meaning. Ignoring
 white space, consider:

 ^.*? (?(?=a) a | b(*THEN)c )

 If the subject is "ba", this pattern does not match. Because .*? is
 ungreedy, it initially matches zero characters. The condition (?=a)
 then fails, the character "b" is matched, but "c" is not. At this
 point, matching does not backtrack to .*? as might perhaps be expected
 from the presence of the | character. The conditional subpattern is
 part of the single alternative that comprises the whole pattern, and so
 the match fails. (If there was a backtrack into .*?, allowing it to
 match "b", the match would succeed.)

 The verbs just described provide four different "strengths" of control
 when subsequent matching fails. (*THEN) is the weakest, carrying on the
 match at the next alternative. (*PRUNE) comes next, failing the match
 at the current starting position, but allowing an advance to the next
 character (for an unanchored pattern). (*SKIP) is similar, except that
 the advance may be more than one character. (*COMMIT) is the strongest,
 causing the entire match to fail.

 More than one backtracking verb

 If more than one backtracking verb is present in a pattern, the one
 that is backtracked onto first acts. For example, consider this pat-
 tern, where A, B, etc. are complex pattern fragments:

 (A(*COMMIT)B(*THEN)C|ABD)

 If A matches but B fails, the backtrack to (*COMMIT) causes the entire
 match to fail. However, if A and B match, but C fails, the backtrack to
 (*THEN) causes the next alternative (ABD) to be tried. This behaviour
 is consistent, but is not always the same as Perl's. It means that if
 two or more backtracking verbs appear in succession, all but the last
 of them has no effect. Consider this example:

 ...(*COMMIT)(*PRUNE)...

 If there is a matching failure to the right, backtracking onto (*PRUNE)
 causes it to be triggered, and its action is taken. There can never be
 a backtrack onto (*COMMIT).

 Backtracking verbs in repeated groups
|
||||
|
||||
PCRE2 differs from Perl in its handling of backtracking verbs in
|
||||
repeated groups. For example, consider:
|
||||
PCRE2 sometimes differs from Perl in its handling of backtracking verbs
|
||||
in repeated groups. For example, consider:
|
||||
|
||||
/(a(*COMMIT)b)+ac/
|
||||
|
||||
If the subject is "abac", Perl matches, but PCRE2 fails because the
|
||||
(*COMMIT) in the second repeat of the group acts.
|
||||
If the subject is "abac", Perl matches unless its optimizations are
|
||||
disabled, but PCRE2 always fails because the (*COMMIT) in the second
|
||||
repeat of the group acts.
|
||||
|
||||
Backtracking verbs in assertions
|
||||
|
||||
|
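The four strengths of control described above can be seen in a small, self-contained C model of the unanchored pattern a+(*VERB)b. This is a hypothetical toy written for illustration, not PCRE2 code. The names toy_match() and enum toy_verb are invented, the model handles only this one pattern shape, and it deliberately ignores start-up optimizations. It therefore shows the backtracking rules, not what pcre2_match() does internally.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical verb selector for this toy model; not a PCRE2 type. */
enum toy_verb { VERB_NONE, VERB_THEN, VERB_PRUNE, VERB_SKIP, VERB_COMMIT };

/* Toy model of the unanchored pattern a+(*VERB)b (VERB_NONE means no verb).
   Returns the offset where a match starts, or -1 for no match, and stores
   in *attempts the number of starting positions tried.  No start-up
   optimizations are modelled. */
static int toy_match(const char *subject, enum toy_verb verb, int *attempts)
{
  int len = (int)strlen(subject);
  int start = 0;

  *attempts = 0;
  while (start < len)
    {
    int run = 0;
    int next_start = start + 1;            /* normal one-character bumpalong */

    (*attempts)++;
    while (start + run < len && subject[start + run] == 'a') run++;

    /* a+ is greedy: try the longest run of "a" first, then give back one
       character at a time. */
    for (int k = run; k >= 1; k--)
      {
      int pos = start + k;                 /* the verb is passed here */

      if (pos < len && subject[pos] == 'b') return start;

      /* "b" failed, so backtracking reaches the verb. */
      if (verb == VERB_COMMIT) return -1;  /* no further attempts at all */
      if (verb == VERB_PRUNE || verb == VERB_THEN)
        break;                             /* THEN acts like PRUNE: no | here */
      if (verb == VERB_SKIP)
        {
        next_start = pos;                  /* resume where (*SKIP) was passed */
        break;
        }
      /* VERB_NONE: carry on backtracking into a+. */
      }
    start = next_start;
    }
  return -1;
}
```

With the subject "aaaacaab", the verb-free and (*PRUNE) variants both find the match at offset 5 after trying six starting positions. (*SKIP) reaches it after three, because it jumps straight to "c". (*COMMIT) gives up after the first attempt.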
@@ -8940,44 +8948,46 @@ BACKTRACKING CONTROL
in a conditional subpattern.

(*ACCEPT) in a standalone positive assertion causes the assertion to
succeed without any further processing; captured strings are retained.
In a standalone negative assertion, (*ACCEPT) causes the assertion to
fail without any further processing; captured substrings are discarded.
succeed without any further processing; captured strings and a (*MARK)
name (if set) are retained. In a standalone negative assertion,
(*ACCEPT) causes the assertion to fail without any further processing;
captured substrings and any (*MARK) name are discarded.

If the assertion is a condition, (*ACCEPT) causes the condition to be
true for a positive assertion and false for a negative one; captured
substrings are retained in both cases.

The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after
an assertion is complete does not jump back into the assertion. Note in
particular that a (*MARK) name that is set in an assertion is not
"seen" by an instance of (*SKIP:NAME) latter in the pattern.

The effect of (*THEN) is not allowed to escape beyond an assertion. If
there are no more branches to try, (*THEN) causes a positive assertion
to be false, and a negative assertion to be true.

The other backtracking verbs are not treated specially if they appear
in a standalone positive assertion. In a conditional positive asser-
tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
or (*PRUNE) causes the condition to be false. However, for both stand-
alone and conditional negative assertions, backtracking into (*COMMIT),
(*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
ing any further alternative branches.

Backtracking verbs in subroutines

These behaviours occur whether or not the subpattern is called recur-
sively. Perl's treatment of subroutines is different in some cases.

(*FAIL) in a subpattern called as a subroutine has its normal effect:
it forces an immediate backtrack.
These behaviours occur whether or not the subpattern is called recur-
sively.

(*ACCEPT) in a subpattern called as a subroutine causes the subroutine
match to succeed without any further processing. Matching then contin-
ues after the subroutine call.
ues after the subroutine call. Perl documents this behaviour. Perl's
treatment of the other verbs in subroutines is different in some cases.

(*FAIL) in a subpattern called as a subroutine has its normal effect:
it forces an immediate backtrack.

(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
cause the subroutine match to fail.
@@ -9002,7 +9012,7 @@ AUTHOR

REVISION

Last updated: 16 July 2018
Last updated: 20 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------

@@ -10226,7 +10236,11 @@ CONDITIONAL PATTERNS

BACKTRACKING CONTROL

The following act immediately they are reached:
All backtracking control verbs may be in the form (*VERB:NAME). For
(*MARK) the name is mandatory, for the others it is optional. (*SKIP)
changes its behaviour if :NAME is present. The others just set a name
for passing back to the caller, but this is not a name that (*SKIP) can
see. The following act immediately they are reached:

(*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F)
@@ -10239,12 +10253,13 @@ BACKTRACKING CONTROL

(*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character
(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
(*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)

The effect of one of these verbs in a group called as a subroutine is
confined to the subroutine call.


CALLOUTS
@@ -10254,14 +10269,14 @@ CALLOUTS
(?C"text") callout with string data

The allowed string delimiters are ` ' " ^ % # $ (which are the same for
the start and the end), and the starting delimiter { matched with the
ending delimiter }. To encode the ending delimiter within the string,
double it.


SEE ALSO

pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
pcre2(3).


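The delimiter rules for callout strings can be sketched in C as follows. Two rules apply: the same character closes the string, except that { is closed by }, and a doubled ending delimiter stands for one literal copy of it. The helpers decode_callout_string() and closing_delimiter() are hypothetical and written only for this note; they are not functions from the PCRE2 sources. The sketch assumes that src points at the opening delimiter. It returns the number of characters consumed so that a caller could resume scanning after the closing delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Closing delimiter for an allowed opening delimiter, or 0 if the character
   cannot start a callout string. */
static char closing_delimiter(char open)
{
  if (open == '{') return '}';
  if (open != '\0' && strchr("`'\"^%#$", open) != NULL) return open;
  return 0;
}

/* Decode a callout string starting at its opening delimiter.  A doubled
   closing delimiter stands for one literal copy of it.  The decoded text
   goes to out (size outsize).  Returns the number of source characters
   consumed, including both delimiters, or -1 for an unknown delimiter, a
   missing terminator, or a too-small output buffer. */
static int decode_callout_string(const char *src, char *out, size_t outsize)
{
  char close = closing_delimiter(src[0]);
  const char *p = src + 1;
  size_t n = 0;

  if (close == 0 || outsize == 0) return -1;
  for (;;)
    {
    if (*p == '\0') return -1;             /* unterminated string */
    if (*p == close)
      {
      if (p[1] != close) break;            /* a single delimiter ends it */
      p++;                                 /* doubled: keep one copy */
      }
    if (n + 1 >= outsize) return -1;       /* leave room for the NUL */
    out[n++] = *p++;
    }
  out[n] = '\0';
  return (int)(p + 1 - src);
}
```

For example, the string part of (?C'it''s') decodes to it's, and {a}}b} decodes to a}b.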
@@ -10274,7 +10289,7 @@ AUTHOR

REVISION

Last updated: 07 July 2018
Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------

@@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "16 July 2018" "PCRE2 10.32"
.TH PCRE2PATTERN 3 "20 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3154,17 +3154,16 @@ in the
.\"
documentation.
.P
Experiments with Perl suggest that it too has similar optimizations, sometimes
leading to anomalous results.
Experiments with Perl suggest that it too has similar optimizations, and like
PCRE2, turning them off can change the result of a match.
.
.
.SS "Verbs that act immediately"
.rs
.sp
The following verbs act as soon as they are encountered. They may not be
followed by a name.
The following verbs act as soon as they are encountered.
.sp
(*ACCEPT)
(*ACCEPT) or (*ACCEPT:NAME)
.sp
This verb causes the match to end successfully, skipping the remainder of the
pattern. However, when it is inside a subpattern that is called as a
@@ -3180,18 +3179,21 @@ example:
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
the outer parentheses.
.sp
(*FAIL) or (*F)
(*FAIL) or (*FAIL:NAME)
.sp
This verb causes a matching failure, forcing backtracking to occur. It is
equivalent to (?!) but easier to read. The Perl documentation notes that it is
probably useful only when combined with (?{}) or (??{}). Those are, of course,
Perl features that are not present in PCRE2. The nearest equivalent is the
callout feature, as for example in this pattern:
This verb causes a matching failure, forcing backtracking to occur. It may be
abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
documentation notes that it is probably useful only when combined with (?{}) or
(??{}). Those are, of course, Perl features that are not present in PCRE2. The
nearest equivalent is the callout feature, as for example in this pattern:
.sp
a+(?C)(*FAIL)
.sp
A match with the string "aaaa" always fails, but the callout is taken before
each backtrack happens (in this example, 10 times).
.P
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
.
.
.SS "Recording which path was taken"
@@ -3220,9 +3222,9 @@ documentation. This applies to all instances of (*MARK), including those inside
assertions and atomic groups. (There are differences in those cases when
(*MARK) is used in conjunction with (*SKIP) as described below.)
.P
As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME
arguments. Whichever is last on the matching path is passed back. See below for
more details of these other verbs.
As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
associated NAME arguments. Whichever is last on the matching path is passed
back. See below for more details of these other verbs.
.P
Here is an example of \fBpcre2test\fP output, where the "mark" modifier
requests the retrieval and outputting of (*MARK) data:
@@ -3282,22 +3284,24 @@ reaches them. The behaviour described below is what happens when the verb is
not in a subroutine or an assertion. Subsequent sections cover these special
cases.
.sp
(*COMMIT)
(*COMMIT) or (*COMMIT:NAME)
.sp
This verb, which may not be followed by a name, causes the whole match to fail
outright if there is a later matching failure that causes backtracking to reach
it. Even if the pattern is unanchored, no further attempts to find a match by
advancing the starting point take place. If (*COMMIT) is the only backtracking
verb that is encountered, once it has been passed \fBpcre2_match()\fP is
committed to finding a match at the current starting point, or not at all. For
example:
This verb causes the whole match to fail outright if there is a later matching
failure that causes backtracking to reach it. Even if the pattern is
unanchored, no further attempts to find a match by advancing the starting point
take place. If (*COMMIT) is the only backtracking verb that is encountered,
once it has been passed \fBpcre2_match()\fP is committed to finding a match at
the current starting point, or not at all. For example:
.sp
a+(*COMMIT)b
.sp
This matches "xxaab" but not "aacaab". It can be thought of as a kind of
dynamic anchor, or "I've started, so I must finish." The name of the most
recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
match failure.
dynamic anchor, or "I've started, so I must finish."
.P
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
.P
If there is more than one backtracking verb in a pattern, a different one that
follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
@@ -3338,7 +3342,7 @@ as (*COMMIT).
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*PRUNE) or (*THEN).
ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
.sp
(*SKIP)
.sp
@@ -3346,7 +3350,7 @@ This verb, when given without a name, is like (*PRUNE), except that if the
pattern is unanchored, the "bumpalong" advance is not to the next character,
but to the position in the subject where (*SKIP) was encountered. (*SKIP)
signifies that whatever text was matched leading up to it cannot be part of a
successful match. Consider:
successful match if there is a later mismatch. Consider:
.sp
a+(*SKIP)b
.sp
@@ -3391,7 +3395,7 @@ never seen because "a" does not match "b", so the matcher immediately jumps to
the second branch of the pattern.
.P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME).
names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
.sp
(*THEN) or (*THEN:NAME)
.sp
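Two different lookups are involved here. The name passed back to the caller is simply the last name on the path, whichever verb set it. (*SKIP:NAME) looks only at names set by (*MARK). The distinction can be modelled as below. This is an illustrative sketch, not PCRE2's internal representation: struct path_name, returned_name(), and skip_target() are hypothetical, and the matching path is assumed to be recorded as a plain array, oldest entry first. Following the documentation, (*ACCEPT:NAME) and (*FAIL:NAME) would be recorded as NV_MARK entries, because they behave like (*MARK:NAME) followed by the verb.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Which verb set a name.  (*ACCEPT:NAME) and (*FAIL:NAME) are recorded as
   NV_MARK, since they behave like (*MARK:NAME) followed by the verb. */
enum named_verb { NV_MARK, NV_COMMIT, NV_PRUNE, NV_THEN };

/* One named verb passed on the current matching path (hypothetical). */
struct path_name {
  const char *name;
  enum named_verb verb;
  size_t offset;                           /* subject offset at that point */
};

/* Name passed back to the caller: the last one on the path, whichever
   verb set it, or NULL if none was set. */
static const char *returned_name(const struct path_name *path, size_t count)
{
  return count == 0 ? NULL : path[count - 1].name;
}

/* (*SKIP:NAME): find the most recent (*MARK) with this name, ignoring names
   set by (*COMMIT), (*PRUNE), and (*THEN).  Returns the offset to skip to,
   or -1 if there is none, in which case the (*SKIP) is ignored. */
static long skip_target(const struct path_name *path, size_t count,
  const char *name)
{
  while (count-- > 0)
    if (path[count].verb == NV_MARK && strcmp(path[count].name, name) == 0)
      return (long)path[count].offset;
  return -1;
}
```

In this model, a path of (*MARK:A), (*PRUNE:A), (*COMMIT:B) passes back "B". However, (*SKIP:B) finds nothing to skip to, and (*SKIP:A) uses the offset recorded by the (*MARK).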
@@ -3409,10 +3413,10 @@ succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no
more alternatives, so there is a backtrack to whatever came before the entire
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
.P
The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
It is like (*MARK:NAME) in that the name is remembered for passing back to the
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*PRUNE) and (*THEN).
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
.P
A subpattern that does not contain a | character is just a part of the
enclosing alternative; it is not a nested alternation with only one
@@ -3485,13 +3489,14 @@ onto (*COMMIT).
.SS "Backtracking verbs in repeated groups"
.rs
.sp
PCRE2 differs from Perl in its handling of backtracking verbs in repeated
groups. For example, consider:
PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
repeated groups. For example, consider:
.sp
/(a(*COMMIT)b)+ac/
.sp
If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT)
in the second repeat of the group acts.
If the subject is "abac", Perl matches unless its optimizations are disabled,
but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
acts.
.
.
.\" HTML <a name="btassert"></a>
@@ -3504,9 +3509,10 @@ not the assertion is standalone or acting as the condition in a conditional
subpattern.
.P
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
without any further processing; captured strings are retained. In a standalone
negative assertion, (*ACCEPT) causes the assertion to fail without any further
processing; captured substrings are discarded.
without any further processing; captured strings and a (*MARK) name (if set)
are retained. In a standalone negative assertion, (*ACCEPT) causes the
assertion to fail without any further processing; captured substrings and any
(*MARK) name are discarded.
.P
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
a positive assertion and false for a negative one; captured substrings are
@@ -3536,14 +3542,14 @@ the assertion to be true, without considering any further alternative branches.
.rs
.sp
These behaviours occur whether or not the subpattern is called recursively.
Perl's treatment of subroutines is different in some cases.
.P
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
an immediate backtrack.
.P
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
succeed without any further processing. Matching then continues after the
subroutine call.
subroutine call. Perl documents this behaviour. Perl's treatment of the other
verbs in subroutines is different in some cases.
.P
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
an immediate backtrack.
.P
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
the subroutine match to fail.
@@ -3574,6 +3580,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 16 July 2018
Last updated: 20 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

@@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "07 July 2018" "PCRE2 10.32"
.TH PCRE2SYNTAX 3 "21 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -410,8 +410,6 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?>...) atomic, non-capturing group
.
.
.
.
.SH "COMMENT"
.rs
.sp
@@ -552,7 +550,11 @@ condition if the relevant named group exists.
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act immediately they are reached:
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
if :NAME is present. The others just set a name for passing back to the caller,
but this is not a name that (*SKIP) can see. The following act immediately they
are reached:
.sp
(*ACCEPT) force successful match
(*FAIL) force backtrack; synonym (*F)
@@ -565,12 +567,13 @@ pattern is not anchored.
.sp
(*COMMIT) overall failure, no advance of starting point
(*PRUNE) advance to next starting character
(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
(*SKIP) advance to current matching position
(*SKIP:NAME) advance to position corresponding to an earlier
(*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN) local failure, backtrack to next alternation
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
.sp
The effect of one of these verbs in a group called as a subroutine is confined
to the subroutine call.
.
.
.SH "CALLOUTS"
@@ -606,6 +609,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 07 July 2018
Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

@@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "16 July 2018" "PCRE 10.32"
.TH PCRE2TEST 1 "21 July 2018" "PCRE 10.32"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -360,10 +360,11 @@ patterns. Modifiers on a pattern can change these settings.
The appearance of this line causes all subsequent modifier settings to be
checked for compatibility with the \fBperltest.sh\fP script, which is used to
confirm that Perl gives the same results as PCRE2. Also, apart from comment
lines, none of the other command lines are permitted, because they and many
of the modifiers are specific to \fBpcre2test\fP, and should not be used in
test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
command helps detect tests that are accidentally put in the wrong file.
lines, #pattern commands, and #subject commands that set or unset "mark", no
command lines are permitted, because they and many of the modifiers are
specific to \fBpcre2test\fP, and should not be used in test files that are also
processed by \fBperltest.sh\fP. The \fB#perltest\fP command helps detect tests
that are accidentally put in the wrong file.
.sp
#pop [<modifiers>]
#popcopy [<modifiers>]
@@ -1981,6 +1982,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 16 July 2018
Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

@@ -344,11 +344,11 @@ COMMAND LINES
The appearance of this line causes all subsequent modifier settings to
be checked for compatibility with the perltest.sh script, which is used
to confirm that Perl gives the same results as PCRE2. Also, apart from
comment lines, none of the other command lines are permitted, because
they and many of the modifiers are specific to pcre2test, and should
not be used in test files that are also processed by perltest.sh. The
#perltest command helps detect tests that are accidentally put in the
wrong file.
comment lines, #pattern commands, and #subject commands that set or
unset "mark", no command lines are permitted, because they and many of
the modifiers are specific to pcre2test, and should not be used in test
files that are also processed by perltest.sh. The #perltest command
helps detect tests that are accidentally put in the wrong file.

#pop [<modifiers>]
#popcopy [<modifiers>]
@@ -1818,5 +1818,5 @@ AUTHOR

REVISION

Last updated: 16 July 2018
Last updated: 21 July 2018
Copyright (c) 1997-2018 University of Cambridge.

52
perltest.sh
@@ -45,17 +45,19 @@ fi
# jitstack ignored
# mark show mark information
# no_auto_possess ignored
# no_start_optimize insert ({""}) at pattern start (disable Perl optimizing)
# no_start_optimize insert (??{""}) at pattern start (disables optimizing)
# subject_literal does not process subjects for escapes
# ucp sets Perl's /u modifier
# utf invoke UTF-8 functionality
#
# Comment lines are ignored. The #pattern command can be used to set modifiers
# that will be added to each subsequent pattern. NOTE: this is different to
# pcre2test where #pattern sets defaults, some of which can be overridden on
# individual patterns. The #perltest, #forbid_utf, and #newline_default
# commands, which are needed in the relevant pcre2test files, are ignored. Any
# other #-command is ignored, with a warning message.
# that will be added to each subsequent pattern, after any modifiers it may
# already have. NOTE: this is different to pcre2test where #pattern sets
# defaults which can be overridden on individual patterns. The #subject command
# may be used to set or unset a default "mark" modifier for data lines. This is
# the only use of #subject that is supported. The #perltest, #forbid_utf, and
# #newline_default commands, which are needed in the relevant pcre2test files,
# are ignored. Any other #-command is ignored, with a warning message.
#
# The data lines must not have any pcre2test modifiers. Unless
# "subject_literal" is on the pattern, data lines are processed as
@@ -135,23 +137,39 @@ for (;;)
last if ! ($_ = <$infile>);
printf $outfile "$_" if ! $interact;
next if ($_ =~ /^\s*$/ || $_ =~ /^#[\s!]/);

# A few of pcre2test's #-commands are supported, or just ignored. Any others
# cause an error.

if ($_ =~ /^#pattern(.*)/)
{
$extra_modifiers = $1;
chomp($extra_modifiers);
$extra_modifiers =~ s/\s+$//;
next;
}
elsif ($_ =~ /^#subject(.*)/)
{
$mod = $1;
chomp($mod);
$mod =~ s/\s+$//;
if ($mod =~ s/(-?)mark,?//)
{
$minus = $1;
$default_show_mark = ($minus =~ /^$/);
}
if ($mod !~ /^\s*$/)
{
printf $outfile "** Warning: \"$mod\" in #subject ignored\n";
}
next;
}
elsif ($_ =~ /^#/)
{
if ($_ !~ /^#newline_default|^#perltest|^#forbid_utf/)
{
printf $outfile "** Warning: #-command ignored: %s", $_;
}
next;
}

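For readers who do not read Perl, the new #subject handling in perltest.sh above can be restated in C. subject_command() is a hypothetical helper and is not part of pcre2test or perltest.sh. Like the Perl code, it acts on the first "mark" or "-mark" in the argument text and removes it together with one following comma. It then reports whether anything non-blank is left over, which is the case in which the script prints a warning.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* C restatement (hypothetical helper) of perltest.sh's "#subject" handling.
   args is the text after "#subject" and is modified in place.  The first
   "mark" or "-mark" sets or clears *default_show_mark and is removed along
   with one following comma.  Returns true if non-blank text remains, which
   the script reports with a warning. */
static bool subject_command(char *args, bool *default_show_mark)
{
  char *m = strstr(args, "mark");

  if (m != NULL)
    {
    bool minus = (m > args && m[-1] == '-');
    char *begin = minus ? m - 1 : m;
    char *end = m + 4;                     /* just past "mark" */

    if (*end == ',') end++;
    *default_show_mark = !minus;
    memmove(begin, end, strlen(end) + 1);  /* remove it, keeping the NUL */
    }

  for (const char *p = args; *p != '\0'; p++)
    if (!isspace((unsigned char)*p)) return true;
  return false;
}
```

In the script, this default is combined with any "mark" modifier on an individual pattern using a bitwise or. As a result, a pattern-level "mark" can add mark output, but that expression gives a pattern no way to cancel a default set by #subject.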
@@ -172,9 +190,9 @@ for (;;)

$pattern =~ /^\s*((.).*\2)(.*)$/s;
$pat = $1;
$del = $2;
$mod = "$3,$extra_modifiers";
$mod =~ s/^,\s*//;

# The private "aftertext" modifier means "print $' afterwards".

@@ -202,7 +220,7 @@ for (;;)

# The "mark" modifier requests checking of MARK data */

$show_mark = ($mod =~ s/mark,?//);
$show_mark = $default_show_mark | ($mod =~ s/mark,?//);

# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl

@@ -214,7 +232,7 @@ for (;;)

# Use no_start_optimize (disable PCRE2 start-up optimization) to disable Perl
# optimization by inserting (??{""}) at the start of the pattern.

if ($mod =~ s/no_start_optimize,?//) { $pat =~ s/$del/$del(??{""})/; }

# Add back retained modifiers and check that the pattern is valid.

@@ -281,6 +281,7 @@ pcre2_pattern_convert(). */
#define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE 156
#define PCRE2_ERROR_BACKSLASH_G_SYNTAX 157
#define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158
/* Error 159 is obsolete and should now never occur */
#define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED 159
#define PCRE2_ERROR_VERB_UNKNOWN 160
#define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG 161

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.

Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2017 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -250,34 +250,35 @@ is present where expected in a conditional group. */
#define META_LOOKBEHINDNOT 0x80250000u /* (?<! */

/* These must be kept in this order, with consecutive values, and the _ARG
versions of PRUNE, SKIP, and THEN immediately after their non-argument
versions of COMMIT, PRUNE, SKIP, and THEN immediately after their non-argument
versions. */

#define META_MARK 0x80260000u /* (*MARK) */
#define META_ACCEPT 0x80270000u /* (*ACCEPT) */
#define META_COMMIT 0x80280000u /* (*COMMIT) */
#define META_FAIL 0x80290000u /* (*FAIL) */
#define META_PRUNE 0x802a0000u /* These pairs must */
#define META_PRUNE_ARG 0x802b0000u /* be */
#define META_SKIP 0x802c0000u /* kept */
#define META_SKIP_ARG 0x802d0000u /* in */
#define META_THEN 0x802e0000u /* this */
#define META_THEN_ARG 0x802f0000u /* order */
#define META_FAIL 0x80280000u /* (*FAIL) */
#define META_COMMIT 0x80290000u /* These */
#define META_COMMIT_ARG 0x802a0000u /* pairs */
#define META_PRUNE 0x802b0000u /* must */
#define META_PRUNE_ARG 0x802c0000u /* be */
#define META_SKIP 0x802d0000u /* kept */
#define META_SKIP_ARG 0x802e0000u /* in */
#define META_THEN 0x802f0000u /* this */
#define META_THEN_ARG 0x80300000u /* order */

/* These must be kept in groups of adjacent 3 values, and all together. */

#define META_ASTERISK 0x80300000u /* * */
#define META_ASTERISK_PLUS 0x80310000u /* *+ */
#define META_ASTERISK_QUERY 0x80320000u /* *? */
#define META_PLUS 0x80330000u /* + */
#define META_PLUS_PLUS 0x80340000u /* ++ */
#define META_PLUS_QUERY 0x80350000u /* +? */
#define META_QUERY 0x80360000u /* ? */
#define META_QUERY_PLUS 0x80370000u /* ?+ */
#define META_QUERY_QUERY 0x80380000u /* ?? */
#define META_MINMAX 0x80390000u /* {n,m} repeat */
#define META_MINMAX_PLUS 0x803a0000u /* {n,m}+ repeat */
#define META_MINMAX_QUERY 0x803b0000u /* {n,m}? repeat */
#define META_ASTERISK 0x80310000u /* * */
#define META_ASTERISK_PLUS 0x80320000u /* *+ */
#define META_ASTERISK_QUERY 0x80330000u /* *? */
#define META_PLUS 0x80340000u /* + */
#define META_PLUS_PLUS 0x80350000u /* ++ */
#define META_PLUS_QUERY 0x80360000u /* +? */
#define META_QUERY 0x80370000u /* ? */
#define META_QUERY_PLUS 0x80380000u /* ?+ */
#define META_QUERY_QUERY 0x80390000u /* ?? */
#define META_MINMAX 0x803a0000u /* {n,m} repeat */
#define META_MINMAX_PLUS 0x803b0000u /* {n,m}+ repeat */
#define META_MINMAX_QUERY 0x803c0000u /* {n,m}? repeat */

#define META_FIRST_QUANTIFIER META_ASTERISK
#define META_LAST_QUANTIFIER META_MINMAX_QUERY
@ -327,8 +328,9 @@ static unsigned char meta_extra_lengths[] = {
|
|||
SIZEOFFSET, /* META_LOOKBEHINDNOT */
|
||||
1, /* META_MARK - plus the string length */
|
||||
0, /* META_ACCEPT */
|
||||
0, /* META_COMMIT */
|
||||
0, /* META_FAIL */
|
||||
0, /* META_COMMIT */
|
||||
1, /* META_COMMIT_ARG - plus the string length */
|
||||
0, /* META_PRUNE */
|
||||
1, /* META_PRUNE_ARG - plus the string length */
|
||||
0, /* META_SKIP */
|
||||
|
@ -586,9 +588,9 @@ static const char verbnames[] =
|
|||
"\0" /* Empty name is a shorthand for MARK */
|
||||
STRING_MARK0
|
||||
STRING_ACCEPT0
|
||||
STRING_COMMIT0
|
||||
STRING_F0
|
||||
STRING_FAIL0
|
||||
STRING_COMMIT0
|
||||
STRING_PRUNE0
|
||||
STRING_SKIP0
|
||||
STRING_THEN;
|
||||
|
@ -596,11 +598,11 @@ static const char verbnames[] =
|
|||
static const verbitem verbs[] = {
|
||||
{ 0, META_MARK, +1 }, /* > 0 => must have an argument */
|
||||
{ 4, META_MARK, +1 },
|
||||
{ 6, META_ACCEPT, -1 }, /* < 0 => must not have an argument */
|
||||
{ 6, META_COMMIT, -1 },
|
||||
{ 6, META_ACCEPT, -1 }, /* < 0 => Optional argument, convert to pre-MARK */
|
||||
{ 1, META_FAIL, -1 },
|
||||
{ 4, META_FAIL, -1 },
|
||||
{ 5, META_PRUNE, 0 }, /* Argument is optional; bump META code if found */
|
||||
{ 6, META_COMMIT, 0 },
|
||||
{ 5, META_PRUNE, 0 }, /* Optional argument; bump META code if found */
|
||||
{ 4, META_SKIP, 0 },
|
||||
{ 4, META_THEN, 0 }
|
||||
};
|
||||
|
@ -610,8 +612,8 @@ static const int verbcount = sizeof(verbs)/sizeof(verbitem);
|
|||
/* Verb opcodes, indexed by their META code offset from META_MARK. */
|
||||
|
||||
static const uint32_t verbops[] = {
|
||||
OP_MARK, OP_ACCEPT, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_PRUNE_ARG, OP_SKIP,
|
||||
OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
|
||||
OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
|
||||
OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
|
||||
|
||||
/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
|
||||
|
||||
|
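The verbops[] comment says the table is indexed by each verb's META code offset from META_MARK. The following is a minimal sketch of that arithmetic using the new META values from this diff. The verbop_names[] strings are stand-ins for the real OP_ codes, and verb_index(), bump_to_arg() and verb_opname() are hypothetical helper names. The sketch shows why each plain verb must sit directly before its _ARG partner: adding 0x00010000u to META_COMMIT has to land exactly on META_COMMIT_ARG.

```c
#include <assert.h>
#include <stdint.h>

/* Verb META codes as defined on the new side of the diff. The low 16 bits
   are always zero, so neighbouring codes differ by 0x00010000. */
#define META_MARK       0x80260000u
#define META_ACCEPT     0x80270000u
#define META_FAIL       0x80280000u
#define META_COMMIT     0x80290000u
#define META_COMMIT_ARG 0x802a0000u
#define META_PRUNE      0x802b0000u
#define META_PRUNE_ARG  0x802c0000u
#define META_SKIP       0x802d0000u
#define META_SKIP_ARG   0x802e0000u
#define META_THEN       0x802f0000u
#define META_THEN_ARG   0x80300000u

/* Hypothetical stand-ins for the new verbops[] entries, in the same order. */
static const char *const verbop_names[] = {
  "OP_MARK", "OP_ACCEPT", "OP_FAIL", "OP_COMMIT", "OP_COMMIT_ARG",
  "OP_PRUNE", "OP_PRUNE_ARG", "OP_SKIP", "OP_SKIP_ARG",
  "OP_THEN", "OP_THEN_ARG" };

/* Offset from META_MARK in units of 0x00010000, as in
   verbops[(meta - META_MARK) >> 16]. */
static unsigned verb_index(uint32_t meta)
{
  assert(meta >= META_MARK && meta <= META_THEN_ARG);
  return (unsigned)((meta - META_MARK) >> 16);
}

/* A verb whose name is optional becomes its _ARG partner by moving one
   step up. This only works because each pair is adjacent, plain first. */
static uint32_t bump_to_arg(uint32_t meta)
{
  return meta + 0x00010000u;
}

static const char *verb_opname(uint32_t meta)
{
  return verbop_names[verb_index(meta)];
}
```

With the new ordering, verb_index(META_COMMIT_ARG) is 4, which selects OP_COMMIT_ARG. Inserting META_COMMIT_ARG anywhere other than directly after META_COMMIT would break both the bump and the lookup.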
@ -976,8 +978,8 @@ for (;;)
case META_POSIX_NEG: fprintf(stderr, "META_POSIX_NEG %d", *pptr++); break;

case META_ACCEPT: fprintf(stderr, "META (*ACCEPT)"); break;
case META_COMMIT: fprintf(stderr, "META (*COMMIT)"); break;
case META_FAIL: fprintf(stderr, "META (*FAIL)"); break;
case META_COMMIT: fprintf(stderr, "META (*COMMIT)"); break;
case META_PRUNE: fprintf(stderr, "META (*PRUNE)"); break;
case META_SKIP: fprintf(stderr, "META (*SKIP)"); break;
case META_THEN: fprintf(stderr, "META (*THEN)"); break;
@ -1067,6 +1069,10 @@ for (;;)
fprintf(stderr, "META (*MARK:");
goto SHOWARG;

case META_COMMIT_ARG:
fprintf(stderr, "META (*COMMIT:");
goto SHOWARG;

case META_PRUNE_ARG:
fprintf(stderr, "META (*PRUNE:");
goto SHOWARG;
@ -2290,6 +2296,7 @@ uint32_t *previous_callout = NULL;
uint32_t *parsed_pattern = cb->parsed_pattern;
uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
uint32_t meta_quantifier = 0;
uint32_t add_after_mark = 0;
uint16_t nest_depth = 0;
int after_manual_callout = 0;
int expect_cond_assert = 0;
@ -2461,6 +2468,16 @@ while (ptr < ptrend)
goto FAILED;
}
*verblengthptr = (uint32_t)verbnamelength;

/* If this name was on a verb such as (*ACCEPT) which does not continue,
a (*MARK) was generated for the name. We now add the original verb as the
next item. */

if (add_after_mark != 0)
{
*parsed_pattern++ = add_after_mark;
add_after_mark = 0;
}
break;

case CHAR_BACKSLASH:
@ -3454,13 +3471,25 @@ while (ptr < ptrend)

if (*ptr++ == CHAR_COLON) /* Skip past : or ) */
{
if (verbs[i].has_arg < 0) /* Argument is forbidden */
/* Some optional arguments can be treated as a preceding (*MARK) */

if (verbs[i].has_arg < 0)
{
errorcode = ERR59;
goto FAILED;
add_after_mark = verbs[i].meta;
*parsed_pattern++ = META_MARK;
}
*parsed_pattern++ = verbs[i].meta +
((verbs[i].meta != META_MARK)? 0x00010000u:0);

/* The remaining verbs with arguments (except *MARK) need a different
opcode. */

else
{
*parsed_pattern++ = verbs[i].meta +
((verbs[i].meta != META_MARK)? 0x00010000u:0);
}

/* Set up for reading the name in the main loop. */

verblengthptr = parsed_pattern++;
verbnamestart = ptr;
inverbname = TRUE;
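With this change, a name on (*ACCEPT) or (*FAIL) (has_arg < 0 in the new verbs[] table) is no longer an error. The parser emits a (*MARK) that carries the name and remembers the original verb in add_after_mark, so it can be appended once the name has been read. The sketch below compresses that two-step process into one hypothetical function, emit_verb(). It assumes one code unit per name character and is a simplified model, not the PCRE2 implementation: the real code fills in the length through verblengthptr after reading the name in its main loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subset of the new META codes from the diff. */
#define META_MARK       0x80260000u
#define META_ACCEPT     0x80270000u
#define META_FAIL       0x80280000u
#define META_COMMIT     0x80290000u
#define META_COMMIT_ARG 0x802a0000u

/* Hypothetical parser step: write one verb plus its optional name into a
   parsed-pattern buffer and return the number of units written.
   has_arg follows verbs[]: > 0 name required, 0 name optional (bump to the
   _ARG code), < 0 name is turned into a preceding (*MARK). */
static size_t emit_verb(uint32_t *out, uint32_t meta, int has_arg,
                        const char *name)
{
  size_t n = 0;
  uint32_t add_after_mark = 0;

  assert(has_arg <= 0 || name != NULL);   /* (*MARK) needs a name */

  if (name == NULL)
    {
    out[n++] = meta;                      /* bare verb, no argument */
    return n;
    }

  if (has_arg < 0)
    {
    add_after_mark = meta;                /* e.g. (*ACCEPT:X) ...          */
    out[n++] = META_MARK;                 /* ... becomes (*MARK:X)(*ACCEPT) */
    }
  else
    out[n++] = meta + ((meta != META_MARK)? 0x00010000u : 0);

  size_t len = strlen(name);
  out[n++] = (uint32_t)len;               /* length element */
  for (size_t i = 0; i < len; i++)
    out[n++] = (uint8_t)name[i];          /* one unit per character */

  if (add_after_mark != 0)
    out[n++] = add_after_mark;            /* original verb after the name */
  return n;
}
```

For example, emit_verb(buf, META_ACCEPT, -1, "X") writes META_MARK, 1, 'X', META_ACCEPT. (*COMMIT:AB), on the other hand, is bumped to META_COMMIT_ARG and keeps its own opcode.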
@ -5654,6 +5683,7 @@ for (;; pptr++)
cb->had_pruneorskip = TRUE;
/* Fall through */
case META_MARK:
case META_COMMIT_ARG:
VERB_ARG:
*code++ = verbops[(meta - META_MARK) >> 16];
/* The length is in characters. */
@ -8002,6 +8032,7 @@ for (;;)
break;

case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG:
case OP_SKIP_ARG:
case OP_THEN_ARG:
@ -8310,6 +8341,7 @@ for (;; pptr++)
break;

case META_MARK: /* Add the length of the name. */
case META_COMMIT_ARG:
case META_PRUNE_ARG:
case META_SKIP_ARG:
case META_THEN_ARG:
@ -8500,6 +8532,7 @@ for (;; pptr++)
goto EXIT;

case META_MARK:
case META_COMMIT_ARG:
case META_PRUNE_ARG:
case META_SKIP_ARG:
case META_THEN_ARG:
@ -8967,6 +9000,7 @@ for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
break;

case META_MARK:
case META_COMMIT_ARG:
case META_PRUNE_ARG:
case META_SKIP_ARG:
case META_THEN_ARG:

@ -181,7 +181,8 @@ static const uint8_t coptable[] = {
0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */
0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */
0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */
0, 0, 0, 0, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */
0, 0, /* COMMIT, COMMIT_ARG */
0, 0, 0, /* FAIL, ACCEPT, ASSERT_ACCEPT */
0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */
};

@ -254,7 +255,8 @@ static const uint8_t poptable[] = {
0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */
0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */
0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */
0, 0, 0, 0, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */
0, 0, /* COMMIT, COMMIT_ARG */
0, 0, 0, /* FAIL, ACCEPT, ASSERT_ACCEPT */
0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */
};

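Both of these tables need exactly one entry per opcode, so inserting OP_COMMIT_ARG into the opcode list means adding a 0 to each of them. The OP_TABLE_LENGTH comment in the opcode enum, further down in this diff, describes a length check for exactly this kind of updating error. The sketch below shows one way to express such a check, using C11 static_assert on a cut-down, hypothetical TOY_ enum. This diff does not show whether PCRE2 uses this exact mechanism, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down opcode enum: only the tail touched by the diff,
   in the new order. TOY_TABLE_LENGTH is not an opcode; it counts them. */
enum {
  TOY_COMMIT, TOY_COMMIT_ARG,
  TOY_FAIL, TOY_ACCEPT, TOY_ASSERT_ACCEPT,
  TOY_CLOSE, TOY_SKIPZERO, TOY_DEFINE,
  TOY_TABLE_LENGTH
};

/* One entry per opcode, laid out like the coptable[] rows above. Leaving
   out the COMMIT_ARG entry would shift every later entry by one. */
static const uint8_t toy_coptable[] = {
  0, 0,       /* COMMIT, COMMIT_ARG */
  0, 0, 0,    /* FAIL, ACCEPT, ASSERT_ACCEPT */
  0, 0, 0     /* CLOSE, SKIPZERO, DEFINE */
};

/* Compile-time length check (C11): the build fails if the table and the
   enum disagree in size. */
static_assert(sizeof(toy_coptable) == TOY_TABLE_LENGTH,
              "toy_coptable must have one entry per opcode");
```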
@ -133,7 +133,8 @@ static const unsigned char compile_error_texts[] =
"internal error: unknown newline setting\0"
"\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
"(?R (recursive pattern call) must be followed by a closing parenthesis\0"
"an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0"
/* "an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0" */
"obsolete error (should not occur)\0" /* Was the above */
/* 60 */
"(*VERB) not recognized or malformed\0"
"group number is too big\0"

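compile_error_texts[] is one long string of NUL-terminated messages, looked up by position. That is presumably why message 59 is replaced by a placeholder rather than deleted: removing it would renumber every later error. The following is a sketch of positional lookup. It uses a hypothetical nth_text() helper and an abbreviated sample list, and its indexes are relative to the sample, not real PCRE2 error codes.

```c
#include <assert.h>
#include <stddef.h>

/* Abbreviated sample in the same layout as compile_error_texts[]: each
   message ends in "\0", and the compiler's own trailing NUL then forms an
   empty string that marks the end. Index 0 here corresponds to error 58. */
static const char sample_texts[] =
  "(?R (recursive pattern call) must be followed by a closing parenthesis\0"
  "obsolete error (should not occur)\0"
  "(*VERB) not recognized or malformed\0";

/* Hypothetical helper: return the n-th message (0-based), or NULL if the
   list holds fewer than n+1 messages. */
static const char *nth_text(const char *list, int n)
{
  const char *p = list;
  assert(n >= 0);
  for (; n > 0; n--)
    {
    while (*p != '\0') p++;       /* skip the current message */
    p++;                          /* step over its terminator */
    if (*p == '\0') return NULL;  /* empty string: end of list */
    }
  return p;
}
```

Dropping the placeholder from this sample would make nth_text(sample_texts, 1) return the "(*VERB)" message instead. Every caller that relies on a fixed error number would then get the wrong text.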
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.

Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2017 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -253,7 +253,7 @@ maximum size of this can be limited. */

#define START_FRAMES_SIZE 20480

/* Similarly, for DFA matching, an initial internal workspace vector is
/* Similarly, for DFA matching, an initial internal workspace vector is
allocated on the stack. */

#define DFA_START_RWS_SIZE 30720
@ -1583,23 +1583,26 @@ enum {
OP_THEN, /* 155 */
OP_THEN_ARG, /* 156 same, but with argument */
OP_COMMIT, /* 157 */
OP_COMMIT_ARG, /* 158 same, but with argument */

/* These are forced failure and success verbs */
/* These are forced failure and success verbs. FAIL and ACCEPT do accept an
argument, but these cases can be compiled as, for example, (*MARK:X)(*FAIL)
without the need for a special opcode. */

OP_FAIL, /* 158 */
OP_ACCEPT, /* 159 */
OP_ASSERT_ACCEPT, /* 160 Used inside assertions */
OP_CLOSE, /* 161 Used before OP_ACCEPT to close open captures */
OP_FAIL, /* 159 */
OP_ACCEPT, /* 160 */
OP_ASSERT_ACCEPT, /* 161 Used inside assertions */
OP_CLOSE, /* 162 Used before OP_ACCEPT to close open captures */

/* This is used to skip a subpattern with a {0} quantifier */

OP_SKIPZERO, /* 162 */
OP_SKIPZERO, /* 163 */

/* This is used to identify a DEFINE group during compilation so that it can
be checked for having only one branch. It is changed to OP_FALSE before
compilation finishes. */

OP_DEFINE, /* 163 */
OP_DEFINE, /* 164 */

/* This is not an opcode, but is used to check that tables indexed by opcode
are the correct length, in order to catch updating errors - there have been
@ -1655,7 +1658,7 @@ some cases doesn't actually use these names at all). */
"Cond false", "Cond true", \
"Brazero", "Braminzero", "Braposzero", \
"*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP", \
"*THEN", "*THEN", "*COMMIT", "*FAIL", \
"*THEN", "*THEN", "*COMMIT", "*COMMIT", "*FAIL", \
"*ACCEPT", "*ASSERT_ACCEPT", \
"Close", "Skip zero", "Define"

@ -1747,7 +1750,8 @@ in UTF-8 mode. The code that uses this table must know about such things. */
3, 1, 3, /* MARK, PRUNE, PRUNE_ARG */ \
1, 3, /* SKIP, SKIP_ARG */ \
1, 3, /* THEN, THEN_ARG */ \
1, 1, 1, 1, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */ \
1, 3, /* COMMIT, COMMIT_ARG */ \
1, 1, 1, /* FAIL, ACCEPT, ASSERT_ACCEPT */ \
1+IMM2_SIZE, 1, /* CLOSE, SKIPZERO */ \
1 /* DEFINE */

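For the verbs that take an argument, the OP_LENGTHS entry of 3 covers only the fixed part of the item. That is assumed here to be the opcode, a length unit, and a zero after the name. Code that steps over such an item must therefore also add the name length stored in its second unit. The interpreter (Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1]) and the JIT (ccend += 2 + cc[1]) both do this for OP_COMMIT_ARG in this diff. The following toy walker uses hypothetical T_ opcodes, not the real PCRE2 values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; T_COUNT is the number of opcodes. */
enum { T_COMMIT, T_COMMIT_ARG, T_FAIL, T_ACCEPT, T_END, T_COUNT };

/* Fixed length of each item, modelled on the new OP_LENGTHS rows:
   COMMIT 1, COMMIT_ARG 3, FAIL 1, ACCEPT 1 (T_END is given 1). For
   COMMIT_ARG the 3 is assumed to cover the opcode, the length unit and a
   zero after the name; the name itself is not included. */
static const uint8_t t_lengths[T_COUNT] = { 1, 3, 1, 1, 1 };

/* Step over one item. For an item with an argument, add the name length
   held in code[1], as in Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1]. */
static const uint8_t *next_item(const uint8_t *code)
{
  assert(*code < T_COUNT);
  size_t len = t_lengths[*code];
  if (*code == T_COMMIT_ARG) len += code[1];
  return code + len;
}

/* Count the items in a program, including the final T_END. */
static int count_items(const uint8_t *code)
{
  int n = 1;
  while (*code != T_END)
    {
    code = next_item(code);
    n++;
    }
  return n;
}
```

Suppose (*COMMIT:XY)(*FAIL) is encoded as { T_COMMIT_ARG, 2, 'X', 'Y', 0, T_FAIL, T_END }. Then next_item() moves from the first unit to T_FAIL at offset 5, and count_items() returns 3. The JIT's cc + 1 followed by += 2 + cc[1] reaches the same offset.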
@ -839,6 +839,7 @@ switch(*cc)
#endif

case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG:
case OP_SKIP_ARG:
case OP_THEN_ARG:
@ -939,6 +940,7 @@ while (cc < ccend)
common->control_head_ptr = 1;
/* Fall through. */

case OP_COMMIT_ARG:
case OP_PRUNE_ARG:
case OP_MARK:
if (common->mark_ptr == 0)
@ -1553,6 +1555,7 @@ while (cc < ccend)
break;

case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG:
case OP_THEN_ARG:
SLJIT_ASSERT(common->mark_ptr != 0);
@ -1733,6 +1736,7 @@ while (cc < ccend)
break;

case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG:
case OP_THEN_ARG:
SLJIT_ASSERT(common->mark_ptr != 0);
@ -2041,6 +2045,7 @@ while (cc < ccend)
break;

case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG:
case OP_THEN_ARG:
SLJIT_ASSERT(common->mark_ptr != 0);
@ -2428,6 +2433,7 @@ while (cc < ccend)
break;

case OP_MARK:
case OP_COMMIT_ARG:
case OP_PRUNE_ARG:
case OP_THEN_ARG:
SLJIT_ASSERT(common->mark_ptr != 0);
@ -10350,7 +10356,8 @@ backtrack_common *backtrack;
PCRE2_UCHAR opcode = *cc;
PCRE2_SPTR ccend = cc + 1;

if (opcode == OP_PRUNE_ARG || opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG)
if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG ||
opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG)
ccend += 2 + cc[1];

PUSH_BACKTRACK(sizeof(backtrack_common), cc, NULL);
@ -10362,7 +10369,7 @@ if (opcode == OP_SKIP)
return ccend;
}

if (opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG)
if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG)
{
OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0);
OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)(cc + 2));
@ -10681,6 +10688,7 @@ while (cc < ccend)
case OP_THEN:
case OP_THEN_ARG:
case OP_COMMIT:
case OP_COMMIT_ARG:
cc = compile_control_verb_matchingpath(common, cc, parent);
break;

@ -11755,6 +11763,7 @@ while (current)
break;

case OP_COMMIT:
case OP_COMMIT_ARG:
if (!common->local_quit_available)
OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH);
if (common->quit_label == NULL)

@ -149,7 +149,7 @@ changed, the code at RETURN_SWITCH below must be updated in sync. */
enum { RM1=1, RM2, RM3, RM4, RM5, RM6, RM7, RM8, RM9, RM10,
RM11, RM12, RM13, RM14, RM15, RM16, RM17, RM18, RM19, RM20,
RM21, RM22, RM23, RM24, RM25, RM26, RM27, RM28, RM29, RM30,
RM31, RM32, RM33, RM34, RM35 };
RM31, RM32, RM33, RM34, RM35, RM36 };

#ifdef SUPPORT_WIDE_CHARS
enum { RM100=100, RM101 };
@ -770,7 +770,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
/* ===================================================================== */
/* Real or forced end of the pattern, assertion, or recursion. In an
assertion ACCEPT, update the last used pointer and remember the current
frame so that the captures can be fished out of it. */
frame so that the captures and mark can be fished out of it. */

case OP_ASSERT_ACCEPT:
if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
@ -5119,7 +5119,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
/* Positive assertions are like other groups except that PCRE doesn't allow
the effect of (*THEN) to escape beyond an assertion; it is therefore
treated as NOMATCH. (*ACCEPT) is treated as successful assertion, with its
captures retained. Any other return is an error. */
captures and mark retained. Any other return is an error. */

#define Lframe_type F->temp_32[0]

@ -5136,6 +5136,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
(char *)assert_accept_frame + offsetof(heapframe, ovector),
assert_accept_frame->offset_top * sizeof(PCRE2_SIZE));
Foffset_top = assert_accept_frame->offset_top;
Fmark = assert_accept_frame->mark;
break;
}
if (rrc != MATCH_NOMATCH && rrc != MATCH_THEN) RRETURN(rrc);
@ -5837,6 +5838,13 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
mb->verb_current_recurse = Fcurrent_recurse;
RRETURN(MATCH_COMMIT);

case OP_COMMIT_ARG:
Fmark = mb->nomatch_mark = Fecode + 2;
RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM36);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
mb->verb_current_recurse = Fcurrent_recurse;
RRETURN(MATCH_COMMIT);

case OP_PRUNE:
RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM14);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
@ -5942,7 +5950,7 @@ switch (Freturn_id)
LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(16)
LBL(17) LBL(18) LBL(19) LBL(20) LBL(21) LBL(22) LBL(23) LBL(24)
LBL(25) LBL(26) LBL(27) LBL(28) LBL(29) LBL(30) LBL(31) LBL(32)
LBL(33) LBL(34) LBL(35)
LBL(33) LBL(34) LBL(35) LBL(36)

#ifdef SUPPORT_WIDE_CHARS
LBL(100) LBL(101)

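The fix for (*MARK) followed by (*ACCEPT) inside a positive assertion is the single added line Fmark = assert_accept_frame->mark. It copies the mark from the accepting frame along with the captures. The following reduced, hypothetical frame structure (toy_frame and adopt_assert_accept() are illustrative names, not PCRE2's) makes the effect visible. It corresponds to the new test /(?=a(*ACCEPT:QQ)bc)axyz/, which now reports MK: QQ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced frame: only what the assertion code copies. */
typedef struct {
  size_t offset_top;          /* number of ovector slots in use */
  const char *mark;           /* current (*MARK) name, or NULL */
  size_t ovector[6];          /* capture offsets */
} toy_frame;

/* After a positive assertion ends with (*ACCEPT), adopt the state of the
   frame that accepted it. Without the final assignment, a name set by
   (*MARK) or (*ACCEPT:NAME) inside the assertion would be lost. */
static void adopt_assert_accept(toy_frame *F, const toy_frame *accept)
{
  assert(accept->offset_top <= sizeof(F->ovector) / sizeof(F->ovector[0]));
  memcpy(F->ovector, accept->ovector, accept->offset_top * sizeof(size_t));
  F->offset_top = accept->offset_top;
  F->mark = accept->mark;     /* the equivalent of the line this diff adds */
}
```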
@ -4678,12 +4678,6 @@ uint16_t first_listed_newline;
const char *cmdname;
uint8_t *argptr, *serial;

if (restrict_for_perl_test)
{
fprintf(outfile, "** #-commands are not allowed after #perltest\n");
return PR_ABEND;
}

yield = PR_OK;
cmd = CMD_UNKNOWN;
cmdlen = 0;
@ -4702,6 +4696,12 @@ for (i = 0; i < cmdlistcount; i++)

argptr = buffer + cmdlen + 1;

if (restrict_for_perl_test && cmd != CMD_PATTERN && cmd != CMD_SUBJECT)
{
fprintf(outfile, "** #%s is not allowed after #perltest\n", cmdname);
return PR_ABEND;
}

switch(cmd)
{
case CMD_UNKNOWN:

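The pcre2test change moves the #perltest restriction so that it runs after the command has been looked up. This keeps #pattern and #subject usable in Perl-compatible test files, which the new testinput1 lines rely on. The following is a sketch of the resulting rule. command_allowed() is a hypothetical name, and CMD_OTHER is a hypothetical stand-in for every other recognized #-command.

```c
#include <assert.h>
#include <stdbool.h>

/* CMD_PATTERN, CMD_SUBJECT and CMD_UNKNOWN appear in the diff; CMD_OTHER is
   a hypothetical stand-in for every other recognized #-command. */
typedef enum { CMD_PATTERN, CMD_SUBJECT, CMD_OTHER, CMD_UNKNOWN } cmd_t;

/* Returns true if a #-command may be processed. Once #perltest has been
   seen, only #pattern and #subject are accepted; before this change every
   #-command was rejected at that point. */
static bool command_allowed(bool restrict_for_perl_test, cmd_t cmd)
{
  assert(cmd <= CMD_UNKNOWN);
  if (!restrict_for_perl_test) return true;
  return cmd == CMD_PATTERN || cmd == CMD_SUBJECT;
}
```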
@ -6203,10 +6203,47 @@ ef) x/x,mark
/a(?:(*:X))(*SKIP:X)(*F)|(.)/
abc

/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/no_start_optimize
#pattern no_start_optimize

/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
abc

/(?>a(*:1))(?>b)(*SKIP:1)x|.*/no_start_optimize
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/
abc

#subject mark

/a(*ACCEPT:X)b/
abc

/(?=a(*ACCEPT:QQ)bc)axyz/
axyz

/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/
abc

/a(*F:X)b/
abc

/(?(DEFINE)(a(*F:X)))(?1)b/
abc

/a(*COMMIT:X)b/
abc

/(?(DEFINE)(a(*COMMIT:X)))(?1)b/
abc

/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/
aaaabd

/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/
aaaabd

/a(*COMMIT:X)b/
axabc

#pattern -no_start_optimize
#subject -mark

# End of testinput1

@ -2949,10 +2949,9 @@

/abc(*:)pqr/

/abc(*FAIL:123)xyz/

# This should, and does, fail. In Perl, it does not, which I think is a
# bug because replacing the B in the pattern by (B|D) does make it fail.
# Turning off Perl's optimization by inserting (??{""}) also makes it fail.

/A(*COMMIT)B/aftertext,mark
\= Expect no match

@ -9846,12 +9846,64 @@ No match
0: b
1: b

/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/no_start_optimize
#pattern no_start_optimize

/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
abc
0: abc

/(?>a(*:1))(?>b)(*SKIP:1)x|.*/no_start_optimize
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/
abc
0: abc

#subject mark

/a(*ACCEPT:X)b/
abc
0: a
MK: X

/(?=a(*ACCEPT:QQ)bc)axyz/
axyz
0: axyz
MK: QQ

/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/
abc
0: ab
MK: X

/a(*F:X)b/
abc
No match, mark = X

/(?(DEFINE)(a(*F:X)))(?1)b/
abc
No match, mark = X

/a(*COMMIT:X)b/
abc
0: ab
MK: X

/(?(DEFINE)(a(*COMMIT:X)))(?1)b/
abc
0: ab
MK: X

/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/
aaaabd
0: bd

/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/
aaaabd
No match, mark = X

/a(*COMMIT:X)b/
axabc
No match, mark = X

#pattern -no_start_optimize
#subject -mark

# End of testinput1

@ -10154,11 +10154,9 @@ Failed: error 166 at offset 10: (*MARK) must have an argument
/abc(*:)pqr/
Failed: error 166 at offset 6: (*MARK) must have an argument

/abc(*FAIL:123)xyz/
Failed: error 159 at offset 10: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)

# This should, and does, fail. In Perl, it does not, which I think is a
# bug because replacing the B in the pattern by (B|D) does make it fail.
# Turning off Perl's optimization by inserting (??{""}) also makes it fail.

/A(*COMMIT)B/aftertext,mark
\= Expect no match