Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK)
followed by (*ACCEPT) in an assertion. More small updates to perltest.sh.
parent 635d04fbb7
commit 192b82cf6e
13
ChangeLog
|
@ -119,9 +119,16 @@ backtrack into the first of the atomic groups. A complicated example is
|
||||||
shouldn't find a MARK (because it is in an atomic group), but it did.
|
shouldn't find a MARK (because it is in an atomic group), but it did.
|
||||||
|
|
||||||
26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
|
26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
|
||||||
certain modifiers that the script recognizes; (2) Unsupported #command lines
|
a list of modifiers for all subsequent patterns - only those that the script
|
||||||
give a warning when they are ignored; (3) Mark data is output only if the
|
recognizes are meaningful; (2) #subject lines can be used to set or unset a
|
||||||
"mark" modifier is present.
|
default "mark" modifier; (3) Unsupported #command lines give a warning when
|
||||||
|
they are ignored; (4) Mark data is output only if the "mark" modifier is
|
||||||
|
present.
|
||||||
|
|
||||||
|
27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
|
||||||
|
|
||||||
|
28. A (*MARK) name was not being passed back for positive assertions that were
|
||||||
|
terminated by (*ACCEPT).
|
||||||
|
|
||||||
|
|
||||||
Version 10.31 12-February-2018
|
Version 10.31 12-February-2018
|
||||||
|
|
27
HACKING
|
@ -256,6 +256,7 @@ The following are followed by a length element, then a number of character code
|
||||||
values (which should match with the length):
|
values (which should match with the length):
|
||||||
|
|
||||||
META_MARK (*MARK:xxxx)
|
META_MARK (*MARK:xxxx)
|
||||||
|
META_COMMIT_ARG (*COMMIT:xxxx)
|
||||||
META_PRUNE_ARG (*PRUNE:xxx)
|
META_PRUNE_ARG (*PRUNE:xxx)
|
||||||
META_SKIP_ARG (*SKIP:xxxx)
|
META_SKIP_ARG (*SKIP:xxxx)
|
||||||
META_THEN_ARG (*THEN:xxxx)
|
META_THEN_ARG (*THEN:xxxx)
|
||||||
|
@ -382,7 +383,7 @@ that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode
|
||||||
Opcodes with no following data
|
Opcodes with no following data
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
||||||
These items are all just one unit long
|
These items are all just one unit long:
|
||||||
|
|
||||||
OP_END end of pattern
|
OP_END end of pattern
|
||||||
OP_ANY match any one character other than newline
|
OP_ANY match any one character other than newline
|
||||||
|
@ -430,14 +431,22 @@ character). Another use is for [^] when empty classes are permitted
|
||||||
(PCRE2_ALLOW_EMPTY_CLASS is set).
|
(PCRE2_ALLOW_EMPTY_CLASS is set).
|
||||||
|
|
||||||
|
|
||||||
Backtracking control verbs with optional data
|
Backtracking control verbs
|
||||||
---------------------------------------------
|
--------------------------
|
||||||
|
|
||||||
(*THEN) without an argument generates the opcode OP_THEN and no following data.
|
Verbs with no arguments generate opcodes with no following data (as listed
|
||||||
OP_MARK is followed by the mark name, preceded by a length in one code unit,
|
in the section above).
|
||||||
and followed by a binary zero. For (*PRUNE), (*SKIP), and (*THEN) with
|
|
||||||
arguments, the opcodes OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used,
|
(*MARK:NAME) generates OP_MARK followed by the mark name, preceded by a
|
||||||
with the name following in the same format as OP_MARK.
|
length in one code unit, and followed by a binary zero. The name length is
|
||||||
|
limited by the size of the code unit.
|
||||||
|
|
||||||
|
(*ACCEPT:NAME) and (*FAIL:NAME) are compiled as (*MARK:NAME)(*ACCEPT) and
|
||||||
|
(*MARK:NAME)(*FAIL) respectively.
|
||||||
|
|
||||||
|
For (*COMMIT:NAME), (*PRUNE:NAME), (*SKIP:NAME), and (*THEN:NAME), the opcodes
|
||||||
|
OP_COMMIT_ARG, OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used, with the
|
||||||
|
name following in the same format as for OP_MARK.
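As a rough illustration of the layout described above, the following Python sketch builds the sequence of code units for a verb that carries a name: the opcode, one length unit, the name, then a binary zero. The opcode numbers and the helper name encode_verb_with_name are hypothetical and chosen only for this example; the real values live in the PCRE2 sources. The sketch also assumes that every character of the name fits in one code unit.

```python
# Hypothetical opcode numbers, for illustration only; the real values are
# defined in the PCRE2 sources and differ from these.
OPCODES = {
    "OP_MARK": 1,
    "OP_COMMIT_ARG": 2,
    "OP_PRUNE_ARG": 3,
    "OP_SKIP_ARG": 4,
    "OP_THEN_ARG": 5,
}


def encode_verb_with_name(opcode, name, code_unit_bits=8):
    """Return [opcode, length, name code units..., 0] as a list of ints."""
    max_unit = (1 << code_unit_bits) - 1
    units = [ord(ch) for ch in name]
    # The length is stored in a single code unit, which caps the name length.
    if len(units) > max_unit:
        raise ValueError("name too long for one length code unit")
    # Simplifying assumption: one character per code unit.
    if any(u > max_unit for u in units):
        raise ValueError("character does not fit in one code unit")
    return [OPCODES[opcode], len(units), *units, 0]


print(encode_verb_with_name("OP_COMMIT_ARG", "ab"))  # [2, 2, 97, 98, 0]
```

With 8-bit code units the sketch rejects names longer than 255 characters, which mirrors the statement that the name length is limited by the size of the code unit.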
|
||||||
|
|
||||||
|
|
||||||
Matching literal characters
|
Matching literal characters
|
||||||
|
@ -814,4 +823,4 @@ not a real opcode, but is used to check at compile time that tables indexed by
|
||||||
opcode are the correct length, in order to catch updating errors.
|
opcode are the correct length, in order to catch updating errors.
|
||||||
|
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
21 April 2017
|
20 July 2018
|
||||||
|
|
|
@ -3122,17 +3122,16 @@ in the
|
||||||
documentation.
|
documentation.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Experiments with Perl suggest that it too has similar optimizations, sometimes
|
Experiments with Perl suggest that it too has similar optimizations, and like
|
||||||
leading to anomalous results.
|
PCRE2, turning them off can change the result of a match.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Verbs that act immediately
|
Verbs that act immediately
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The following verbs act as soon as they are encountered. They may not be
|
The following verbs act as soon as they are encountered.
|
||||||
followed by a name.
|
|
||||||
<pre>
|
<pre>
|
||||||
(*ACCEPT)
|
(*ACCEPT) or (*ACCEPT:NAME)
|
||||||
</pre>
|
</pre>
|
||||||
This verb causes the match to end successfully, skipping the remainder of the
|
This verb causes the match to end successfully, skipping the remainder of the
|
||||||
pattern. However, when it is inside a subpattern that is called as a
|
pattern. However, when it is inside a subpattern that is called as a
|
||||||
|
@ -3149,19 +3148,23 @@ example:
|
||||||
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
|
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
|
||||||
the outer parentheses.
|
the outer parentheses.
|
||||||
<pre>
|
<pre>
|
||||||
(*FAIL) or (*F)
|
(*FAIL) or (*FAIL:NAME)
|
||||||
</pre>
|
</pre>
|
||||||
This verb causes a matching failure, forcing backtracking to occur. It is
|
This verb causes a matching failure, forcing backtracking to occur. It may be
|
||||||
equivalent to (?!) but easier to read. The Perl documentation notes that it is
|
abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
|
||||||
probably useful only when combined with (?{}) or (??{}). Those are, of course,
|
documentation notes that it is probably useful only when combined with (?{}) or
|
||||||
Perl features that are not present in PCRE2. The nearest equivalent is the
|
(??{}). Those are, of course, Perl features that are not present in PCRE2. The
|
||||||
callout feature, as for example in this pattern:
|
nearest equivalent is the callout feature, as for example in this pattern:
|
||||||
<pre>
|
<pre>
|
||||||
a+(?C)(*FAIL)
|
a+(?C)(*FAIL)
|
||||||
</pre>
|
</pre>
|
||||||
A match with the string "aaaa" always fails, but the callout is taken before
|
A match with the string "aaaa" always fails, but the callout is taken before
|
||||||
each backtrack happens (in this example, 10 times).
|
each backtrack happens (in this example, 10 times).
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
|
||||||
|
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
|
||||||
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Recording which path was taken
|
Recording which path was taken
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -3186,9 +3189,9 @@ assertions and atomic groups. (There are differences in those cases when
|
||||||
(*MARK) is used in conjunction with (*SKIP) as described below.)
|
(*MARK) is used in conjunction with (*SKIP) as described below.)
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME
|
As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
|
||||||
arguments. Whichever is last on the matching path is passed back. See below for
|
associated NAME arguments. Whichever is last on the matching path is passed
|
||||||
more details of these other verbs.
|
back. See below for more details of these other verbs.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Here is an example of <b>pcre2test</b> output, where the "mark" modifier
|
Here is an example of <b>pcre2test</b> output, where the "mark" modifier
|
||||||
|
@ -3250,22 +3253,25 @@ reaches them. The behaviour described below is what happens when the verb is
|
||||||
not in a subroutine or an assertion. Subsequent sections cover these special
|
not in a subroutine or an assertion. Subsequent sections cover these special
|
||||||
cases.
|
cases.
|
||||||
<pre>
|
<pre>
|
||||||
(*COMMIT)
|
(*COMMIT) or (*COMMIT:NAME)
|
||||||
</pre>
|
</pre>
|
||||||
This verb, which may not be followed by a name, causes the whole match to fail
|
This verb causes the whole match to fail outright if there is a later matching
|
||||||
outright if there is a later matching failure that causes backtracking to reach
|
failure that causes backtracking to reach it. Even if the pattern is
|
||||||
it. Even if the pattern is unanchored, no further attempts to find a match by
|
unanchored, no further attempts to find a match by advancing the starting point
|
||||||
advancing the starting point take place. If (*COMMIT) is the only backtracking
|
take place. If (*COMMIT) is the only backtracking verb that is encountered,
|
||||||
verb that is encountered, once it has been passed <b>pcre2_match()</b> is
|
once it has been passed <b>pcre2_match()</b> is committed to finding a match at
|
||||||
committed to finding a match at the current starting point, or not at all. For
|
the current starting point, or not at all. For example:
|
||||||
example:
|
|
||||||
<pre>
|
<pre>
|
||||||
a+(*COMMIT)b
|
a+(*COMMIT)b
|
||||||
</pre>
|
</pre>
|
||||||
This matches "xxaab" but not "aacaab". It can be thought of as a kind of
|
This matches "xxaab" but not "aacaab". It can be thought of as a kind of
|
||||||
dynamic anchor, or "I've started, so I must finish." The name of the most
|
dynamic anchor, or "I've started, so I must finish."
|
||||||
recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
|
</P>
|
||||||
match failure.
|
<P>
|
||||||
|
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
|
||||||
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
|
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If there is more than one backtracking verb in a pattern, a different one that
|
If there is more than one backtracking verb in a pattern, a different one that
|
||||||
|
@ -3309,7 +3315,7 @@ as (*COMMIT).
|
||||||
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
|
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
|
||||||
like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
ignoring those set by (*PRUNE) or (*THEN).
|
ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
|
||||||
<pre>
|
<pre>
|
||||||
(*SKIP)
|
(*SKIP)
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3317,7 +3323,7 @@ This verb, when given without a name, is like (*PRUNE), except that if the
|
||||||
pattern is unanchored, the "bumpalong" advance is not to the next character,
|
pattern is unanchored, the "bumpalong" advance is not to the next character,
|
||||||
but to the position in the subject where (*SKIP) was encountered. (*SKIP)
|
but to the position in the subject where (*SKIP) was encountered. (*SKIP)
|
||||||
signifies that whatever text was matched leading up to it cannot be part of a
|
signifies that whatever text was matched leading up to it cannot be part of a
|
||||||
successful match. Consider:
|
successful match if there is a later mismatch. Consider:
|
||||||
<pre>
|
<pre>
|
||||||
a+(*SKIP)b
|
a+(*SKIP)b
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3364,7 +3370,7 @@ the second branch of the pattern.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
|
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
|
||||||
names that are set by (*PRUNE:NAME) or (*THEN:NAME).
|
names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
|
||||||
<pre>
|
<pre>
|
||||||
(*THEN) or (*THEN:NAME)
|
(*THEN) or (*THEN:NAME)
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3383,10 +3389,10 @@ more alternatives, so there is a backtrack to whatever came before the entire
|
||||||
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
|
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
|
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
|
||||||
It is like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
ignoring those set by (*PRUNE) and (*THEN).
|
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A subpattern that does not contain a | character is just a part of the
|
A subpattern that does not contain a | character is just a part of the
|
||||||
|
@ -3461,13 +3467,14 @@ onto (*COMMIT).
|
||||||
Backtracking verbs in repeated groups
|
Backtracking verbs in repeated groups
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 differs from Perl in its handling of backtracking verbs in repeated
|
PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
|
||||||
groups. For example, consider:
|
repeated groups. For example, consider:
|
||||||
<pre>
|
<pre>
|
||||||
/(a(*COMMIT)b)+ac/
|
/(a(*COMMIT)b)+ac/
|
||||||
</pre>
|
</pre>
|
||||||
If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT)
|
If the subject is "abac", Perl matches unless its optimizations are disabled,
|
||||||
in the second repeat of the group acts.
|
but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
|
||||||
|
acts.
|
||||||
<a name="btassert"></a></P>
|
<a name="btassert"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Backtracking verbs in assertions
|
Backtracking verbs in assertions
|
||||||
|
@ -3480,9 +3487,10 @@ subpattern.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
|
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
|
||||||
without any further processing; captured strings are retained. In a standalone
|
without any further processing; captured strings and a (*MARK) name (if set)
|
||||||
negative assertion, (*ACCEPT) causes the assertion to fail without any further
|
are retained. In a standalone negative assertion, (*ACCEPT) causes the
|
||||||
processing; captured substrings are discarded.
|
assertion to fail without any further processing; captured substrings and any
|
||||||
|
(*MARK) name are discarded.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
|
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
|
||||||
|
@ -3515,16 +3523,16 @@ Backtracking verbs in subroutines
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
These behaviours occur whether or not the subpattern is called recursively.
|
These behaviours occur whether or not the subpattern is called recursively.
|
||||||
Perl's treatment of subroutines is different in some cases.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
|
|
||||||
an immediate backtrack.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
|
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
|
||||||
succeed without any further processing. Matching then continues after the
|
succeed without any further processing. Matching then continues after the
|
||||||
subroutine call.
|
subroutine call. Perl documents this behaviour. Perl's treatment of the other
|
||||||
|
verbs in subroutines is different in some cases.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
|
||||||
|
an immediate backtrack.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
|
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
|
||||||
|
@ -3551,7 +3559,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 16 July 2018
|
Last updated: 20 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -569,7 +569,11 @@ condition if the relevant named group exists.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
|
<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
|
||||||
<P>
|
<P>
|
||||||
The following act immediately they are reached:
|
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
|
||||||
|
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
|
||||||
|
if :NAME is present. The others just set a name for passing back to the caller,
|
||||||
|
but this is not a name that (*SKIP) can see. The following act immediately they
|
||||||
|
are reached:
|
||||||
<pre>
|
<pre>
|
||||||
(*ACCEPT) force successful match
|
(*ACCEPT) force successful match
|
||||||
(*FAIL) force backtrack; synonym (*F)
|
(*FAIL) force backtrack; synonym (*F)
|
||||||
|
@ -582,13 +586,13 @@ pattern is not anchored.
|
||||||
<pre>
|
<pre>
|
||||||
(*COMMIT) overall failure, no advance of starting point
|
(*COMMIT) overall failure, no advance of starting point
|
||||||
(*PRUNE) advance to next starting character
|
(*PRUNE) advance to next starting character
|
||||||
(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
|
|
||||||
(*SKIP) advance to current matching position
|
(*SKIP) advance to current matching position
|
||||||
(*SKIP:NAME) advance to position corresponding to an earlier
|
(*SKIP:NAME) advance to position corresponding to an earlier
|
||||||
(*MARK:NAME); if not found, the (*SKIP) is ignored
|
(*MARK:NAME); if not found, the (*SKIP) is ignored
|
||||||
(*THEN) local failure, backtrack to next alternation
|
(*THEN) local failure, backtrack to next alternation
|
||||||
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
|
</pre>
|
||||||
</PRE>
|
The effect of one of these verbs in a group called as a subroutine is confined
|
||||||
|
to the subroutine call.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
|
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -617,7 +621,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 07 July 2018
|
Last updated: 21 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -410,10 +410,11 @@ patterns. Modifiers on a pattern can change these settings.
|
||||||
The appearance of this line causes all subsequent modifier settings to be
|
The appearance of this line causes all subsequent modifier settings to be
|
||||||
checked for compatibility with the <b>perltest.sh</b> script, which is used to
|
checked for compatibility with the <b>perltest.sh</b> script, which is used to
|
||||||
confirm that Perl gives the same results as PCRE2. Also, apart from comment
|
confirm that Perl gives the same results as PCRE2. Also, apart from comment
|
||||||
lines, none of the other command lines are permitted, because they and many
|
lines, #pattern commands, and #subject commands that set or unset "mark", no
|
||||||
of the modifiers are specific to <b>pcre2test</b>, and should not be used in
|
command lines are permitted, because they and many of the modifiers are
|
||||||
test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
|
specific to <b>pcre2test</b>, and should not be used in test files that are also
|
||||||
command helps detect tests that are accidentally put in the wrong file.
|
processed by <b>perltest.sh</b>. The <b>#perltest</b> command helps detect tests
|
||||||
|
that are accidentally put in the wrong file.
|
||||||
<pre>
|
<pre>
|
||||||
#pop [<modifiers>]
|
#pop [<modifiers>]
|
||||||
#popcopy [<modifiers>]
|
#popcopy [<modifiers>]
|
||||||
|
@ -2003,7 +2004,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 16 July 2018
|
Last updated: 21 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
247
doc/pcre2.txt
|
@ -8601,44 +8601,46 @@ BACKTRACKING CONTROL
|
||||||
in the pcre2api documentation.
|
in the pcre2api documentation.
|
||||||
|
|
||||||
Experiments with Perl suggest that it too has similar optimizations,
|
Experiments with Perl suggest that it too has similar optimizations,
|
||||||
sometimes leading to anomalous results.
|
and like PCRE2, turning them off can change the result of a match.
|
||||||
|
|
||||||
Verbs that act immediately
|
Verbs that act immediately
|
||||||
|
|
||||||
The following verbs act as soon as they are encountered. They may not
|
The following verbs act as soon as they are encountered.
|
||||||
be followed by a name.
|
|
||||||
|
|
||||||
(*ACCEPT)
|
(*ACCEPT) or (*ACCEPT:NAME)
|
||||||
|
|
||||||
This verb causes the match to end successfully, skipping the remainder
|
This verb causes the match to end successfully, skipping the remainder
|
||||||
of the pattern. However, when it is inside a subpattern that is called
|
of the pattern. However, when it is inside a subpattern that is called
|
||||||
as a subroutine, only that subpattern is ended successfully. Matching
|
as a subroutine, only that subpattern is ended successfully. Matching
|
||||||
then continues at the outer level. If (*ACCEPT) is triggered in a positive
|
then continues at the outer level. If (*ACCEPT) is triggered in a positive
|
||||||
assertion, the assertion succeeds; in a negative assertion, the
|
assertion, the assertion succeeds; in a negative assertion, the
|
||||||
assertion fails.
|
assertion fails.
|
||||||
|
|
||||||
If (*ACCEPT) is inside capturing parentheses, the data so far is cap-
|
If (*ACCEPT) is inside capturing parentheses, the data so far is cap-
|
||||||
tured. For example:
|
tured. For example:
|
||||||
|
|
||||||
A((?:A|B(*ACCEPT)|C)D)
|
A((?:A|B(*ACCEPT)|C)D)
|
||||||
|
|
||||||
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
|
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
|
||||||
tured by the outer parentheses.
|
tured by the outer parentheses.
|
||||||
|
|
||||||
(*FAIL) or (*F)
|
(*FAIL) or (*FAIL:NAME)
|
||||||
|
|
||||||
This verb causes a matching failure, forcing backtracking to occur. It
|
This verb causes a matching failure, forcing backtracking to occur. It
|
||||||
is equivalent to (?!) but easier to read. The Perl documentation notes
|
may be abbreviated to (*F). It is equivalent to (?!) but easier to
|
||||||
that it is probably useful only when combined with (?{}) or (??{}).
|
read. The Perl documentation notes that it is probably useful only when
|
||||||
Those are, of course, Perl features that are not present in PCRE2. The
|
combined with (?{}) or (??{}). Those are, of course, Perl features that
|
||||||
nearest equivalent is the callout feature, as for example in this pat-
|
are not present in PCRE2. The nearest equivalent is the callout fea-
|
||||||
tern:
|
ture, as for example in this pattern:
|
||||||
|
|
||||||
a+(?C)(*FAIL)
|
a+(?C)(*FAIL)
|
||||||
|
|
||||||
A match with the string "aaaa" always fails, but the callout is taken
|
A match with the string "aaaa" always fails, but the callout is taken
|
||||||
before each backtrack happens (in this example, 10 times).
|
before each backtrack happens (in this example, 10 times).
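The figure of 10 callouts for "aaaa" can be checked with a small Python model of the backtracking. This is only a sketch of the counting argument, not of how PCRE2 works internally; the function name count_callouts is made up for the example, and it assumes that no start-of-match optimization skips any attempt.

```python
def count_callouts(subject):
    """Count how often (?C) is reached when matching a+(?C)(*FAIL)."""
    calls = 0
    for start in range(len(subject) + 1):
        # Greedy a+ first takes the longest run of "a" from this start.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # (*FAIL) makes a+ give back one character at a time, so the
        # callout is reached once for each length from `run` down to 1.
        calls += run
    return calls


print(count_callouts("aaaa"))  # 10
```

The four starting positions contribute 4, 3, 2, and 1 callouts respectively.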
|
||||||
|
|
||||||
|
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
|
||||||
|
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
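The equivalence stated above can be pictured as a textual rewrite. The Python sketch below is purely illustrative: the helper expand_named_verbs is hypothetical, it works on the pattern string with a simple regular expression, and it ignores escapes, \Q...\E sequences, and character classes, so it should not be read as a description of how PCRE2 itself parses patterns.

```python
import re

# Matches (*ACCEPT:NAME) or (*FAIL:NAME); NAME is anything up to the
# closing parenthesis. This is a simplification for illustration.
_NAMED_VERB = re.compile(r"\(\*(ACCEPT|FAIL):([^)]+)\)")


def expand_named_verbs(pattern):
    """Rewrite (*ACCEPT:NAME) and (*FAIL:NAME) as (*MARK:NAME) plus the verb."""
    return _NAMED_VERB.sub(
        lambda m: "(*MARK:{})(*{})".format(m.group(2), m.group(1)), pattern
    )


print(expand_named_verbs("a(*ACCEPT:done)b"))  # a(*MARK:done)(*ACCEPT)b
```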
|
||||||
|
|
||||||
Recording which path was taken
|
Recording which path was taken
|
||||||
|
|
||||||
There is one verb whose main purpose is to track how a match was
|
There is one verb whose main purpose is to track how a match was
|
||||||
|
@ -8659,9 +8661,9 @@ BACKTRACKING CONTROL
|
||||||
cases when (*MARK) is used in conjunction with (*SKIP) as described
|
cases when (*MARK) is used in conjunction with (*SKIP) as described
|
||||||
below.)
|
below.)
|
||||||
|
|
||||||
As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated
|
As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
|
||||||
NAME arguments. Whichever is last on the matching path is passed back.
|
associated NAME arguments. Whichever is last on the matching path is
|
||||||
See below for more details of these other verbs.
|
passed back. See below for more details of these other verbs.
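The "last on the matching path" rule can be sketched in a few lines of Python. The representation of the path as a list of (verb, name) pairs and the function name name_passed_back are assumptions made for this example only.

```python
def name_passed_back(path):
    """Return the name that would be passed back for a matching path.

    `path` lists (verb, name) pairs in the order the verbs were passed;
    name is None for a verb without an argument.
    """
    result = None
    for verb, name in path:
        # Only these verbs set a name for the caller in this model.
        if verb in ("MARK", "COMMIT", "PRUNE", "THEN") and name is not None:
            result = name
    return result


print(name_passed_back([("MARK", "A"), ("PRUNE", "B"), ("THEN", None)]))  # B
```

A (*THEN) without a name leaves the earlier name in place, so "B" is the one reported here.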
|
||||||
|
|
||||||
Here is an example of pcre2test output, where the "mark" modifier
|
Here is an example of pcre2test output, where the "mark" modifier
|
||||||
requests the retrieval and outputting of (*MARK) data:
|
requests the retrieval and outputting of (*MARK) data:
|
||||||
|
@ -8717,22 +8719,26 @@ BACKTRACKING CONTROL
|
||||||
when the verb is not in a subroutine or an assertion. Subsequent sec-
|
when the verb is not in a subroutine or an assertion. Subsequent sec-
|
||||||
tions cover these special cases.
|
tions cover these special cases.
|
||||||
|
|
||||||
(*COMMIT)
|
(*COMMIT) or (*COMMIT:NAME)
|
||||||
|
|
||||||
This verb, which may not be followed by a name, causes the whole match
|
This verb causes the whole match to fail outright if there is a later
|
||||||
to fail outright if there is a later matching failure that causes back-
|
matching failure that causes backtracking to reach it. Even if the pat-
|
||||||
tracking to reach it. Even if the pattern is unanchored, no further
|
tern is unanchored, no further attempts to find a match by advancing
|
||||||
attempts to find a match by advancing the starting point take place. If
|
the starting point take place. If (*COMMIT) is the only backtracking
|
||||||
(*COMMIT) is the only backtracking verb that is encountered, once it
|
verb that is encountered, once it has been passed pcre2_match() is com-
|
||||||
has been passed pcre2_match() is committed to finding a match at the
|
mitted to finding a match at the current starting point, or not at all.
|
||||||
current starting point, or not at all. For example:
|
For example:
|
||||||
|
|
||||||
a+(*COMMIT)b
|
a+(*COMMIT)b
|
||||||
|
|
||||||
This matches "xxaab" but not "aacaab". It can be thought of as a kind
|
This matches "xxaab" but not "aacaab". It can be thought of as a kind
|
||||||
of dynamic anchor, or "I've started, so I must finish." The name of the
|
of dynamic anchor, or "I've started, so I must finish."
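The effect of (*COMMIT) on the example a+(*COMMIT)b can be modelled with a toy matcher written in Python. This is a sketch for this one pattern shape only; the function search_a_plus_b and its commit flag are hypothetical, and the model ignores any start-of-match optimizations.

```python
def search_a_plus_b(subject, commit):
    """Toy search for a+b, or for a+(*COMMIT)b when commit is True.

    Returns (start, end) offsets of the match, or None.
    """
    start = 0
    while start < len(subject):
        # Greedy a+: take the longest run of "a" at this start.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # Try "b" after the run, then after shorter runs (backtracking).
        for taken in range(run, 0, -1):
            end = start + taken
            if end < len(subject) and subject[end] == "b":
                return (start, end + 1)
            if commit:
                # Backtracking has reached (*COMMIT): fail outright,
                # with no advance of the starting point.
                return None
        start += 1  # normal "bumpalong" to the next character
    return None


print(search_a_plus_b("xxaab", commit=True))    # (2, 5)
print(search_a_plus_b("aacaab", commit=True))   # None
print(search_a_plus_b("aacaab", commit=False))  # (3, 6)
```

In "xxaab" the verb is not passed until the attempt that starts at the first "a", which then succeeds. In "aacaab" the first attempt passes the verb and then fails at "c", so the later "aab" is never tried.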
|
||||||
most recently passed (*MARK) in the path is passed back when (*COMMIT)
|
|
||||||
forces a match failure.
|
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COM-
|
||||||
|
MIT). It is like (*MARK:NAME) in that the name is remembered for pass-
|
||||||
|
ing back to the caller. However, (*SKIP:NAME) searches only for names
|
||||||
|
set with (*MARK), ignoring those set by (*COMMIT), (*PRUNE) and
|
||||||
|
(*THEN).
|
||||||
|
|
||||||
If there is more than one backtracking verb in a pattern, a different
|
If there is more than one backtracking verb in a pattern, a different
|
||||||
one that follows (*COMMIT) may be triggered first, so merely passing
|
one that follows (*COMMIT) may be triggered first, so merely passing
|
||||||
|
@ -8776,7 +8782,7 @@ BACKTRACKING CONTROL
|
||||||
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
|
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
|
||||||
It is like (*MARK:NAME) in that the name is remembered for passing back
|
It is like (*MARK:NAME) in that the name is remembered for passing back
|
||||||
to the caller. However, (*SKIP:NAME) searches only for names set with
|
to the caller. However, (*SKIP:NAME) searches only for names set with
|
||||||
(*MARK), ignoring those set by (*PRUNE) or (*THEN).
|
(*MARK), ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
|
||||||
|
|
||||||
(*SKIP)
|
(*SKIP)
|
||||||
|
|
||||||
|
@ -8784,29 +8790,30 @@ BACKTRACKING CONTROL
|
||||||
the pattern is unanchored, the "bumpalong" advance is not to the next
|
the pattern is unanchored, the "bumpalong" advance is not to the next
|
||||||
character, but to the position in the subject where (*SKIP) was encoun-
|
character, but to the position in the subject where (*SKIP) was encoun-
|
||||||
tered. (*SKIP) signifies that whatever text was matched leading up to
|
tered. (*SKIP) signifies that whatever text was matched leading up to
|
||||||
it cannot be part of a successful match. Consider:
|
it cannot be part of a successful match if there is a later mismatch.
|
||||||
|
Consider:
|
||||||
|
|
||||||
a+(*SKIP)b
|
a+(*SKIP)b
|
||||||
|
|
||||||
If the subject is "aaaac...", after the first match attempt fails
|
If the subject is "aaaac...", after the first match attempt fails
|
||||||
(starting at the first character in the string), the starting point
|
(starting at the first character in the string), the starting point
|
||||||
skips on to start the next attempt at "c". Note that a possessive quantifier
|
skips on to start the next attempt at "c". Note that a possessive quantifier
|
||||||
does not have the same effect as this example; although it would
|
does not have the same effect as this example; although it would
|
||||||
suppress backtracking during the first match attempt, the second
|
suppress backtracking during the first match attempt, the second
|
||||||
attempt would start at the second character instead of skipping on to
|
attempt would start at the second character instead of skipping on to
|
||||||
"c".
|
"c".
|
||||||
|
|
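The difference between a+(*SKIP)b and the possessive form a++b lies in where the next attempt starts, which the following Python sketch tries to make visible. It lists the starting offsets that a simple matcher would try; the function attempt_starts is hypothetical and the model assumes that no start-of-match optimization is in effect.

```python
def attempt_starts(subject, use_skip):
    """List the start offsets tried for a+(*SKIP)b or, if not use_skip, a++b."""
    starts = []
    start = 0
    while start < len(subject):
        starts.append(start)
        # Both forms take the longest run of "a" and never give any back.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        pos = start + run
        if run > 0 and pos < len(subject) and subject[pos] == "b":
            return starts  # matched at this start
        # With (*SKIP) the next attempt begins where the verb was met;
        # otherwise it begins one character further on.
        start = pos if (use_skip and run > 0) else start + 1
    return starts


print(attempt_starts("aaaac", use_skip=True))   # [0, 4]
print(attempt_starts("aaaac", use_skip=False))  # [0, 1, 2, 3, 4]
```

With (*SKIP) the second attempt starts at offset 4, the "c"; with the possessive quantifier every intermediate offset is still tried.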
||||||
(*SKIP:NAME)
|
(*SKIP:NAME)
|
||||||
|
|
||||||
When (*SKIP) has an associated name, its behaviour is modified. When
|
When (*SKIP) has an associated name, its behaviour is modified. When
|
||||||
such a (*SKIP) is triggered, the previous path through the pattern is
|
such a (*SKIP) is triggered, the previous path through the pattern is
|
||||||
searched for the most recent (*MARK) that has the same name. If one is
|
searched for the most recent (*MARK) that has the same name. If one is
|
||||||
found, the "bumpalong" advance is to the subject position that corre-
|
found, the "bumpalong" advance is to the subject position that corre-
|
||||||
sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
|
sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
|
||||||
no (*MARK) with a matching name is found, the (*SKIP) is ignored.
|
no (*MARK) with a matching name is found, the (*SKIP) is ignored.
|
||||||
|
|
||||||
The search for a (*MARK) name uses the normal backtracking mechanism,
|
The search for a (*MARK) name uses the normal backtracking mechanism,
|
||||||
which means that it does not see (*MARK) settings that are inside
|
which means that it does not see (*MARK) settings that are inside
|
||||||
atomic groups or assertions, because they are never re-entered by back-
|
atomic groups or assertions, because they are never re-entered by back-
|
||||||
tracking. Compare the following pcre2test examples:
|
tracking. Compare the following pcre2test examples:
|
||||||
|
|
||||||
|
@ -8820,18 +8827,19 @@ BACKTRACKING CONTROL
|
||||||
0: b
|
0: b
|
||||||
1: b
|
1: b
|
||||||
|
|
||||||
In the first example, the (*MARK) setting is in an atomic group, so it
|
In the first example, the (*MARK) setting is in an atomic group, so it
|
||||||
is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
|
is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
|
||||||
This allows the second branch of the pattern to be tried at the first
|
This allows the second branch of the pattern to be tried at the first
|
||||||
character position. In the second example, the (*MARK) setting is not
|
character position. In the second example, the (*MARK) setting is not
|
||||||
in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it
|
in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it
|
||||||
backtracks, and this causes a new matching attempt to start at the sec-
|
backtracks, and this causes a new matching attempt to start at the sec-
|
||||||
ond character. This time, the (*MARK) is never seen because "a" does
|
ond character. This time, the (*MARK) is never seen because "a" does
|
||||||
not match "b", so the matcher immediately jumps to the second branch of
|
not match "b", so the matcher immediately jumps to the second branch of
|
||||||
the pattern.
|
the pattern.
|
||||||
|
|
||||||
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
|
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
|
||||||
ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
|
ignores names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or
|
||||||
|
(*THEN:NAME).
|
||||||
|
|
||||||
(*THEN) or (*THEN:NAME)
|
(*THEN) or (*THEN:NAME)
|
||||||
|
|
||||||
|
@ -8850,87 +8858,87 @@ BACKTRACKING CONTROL
|
||||||
track to whatever came before the entire group. If (*THEN) is not
|
track to whatever came before the entire group. If (*THEN) is not
|
||||||
inside an alternation, it acts like (*PRUNE).
|
inside an alternation, it acts like (*PRUNE).
|
||||||
|
|
||||||
The behaviour of (*THEN:NAME) is the not the same as
|
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN).
|
||||||
(*MARK:NAME)(*THEN). It is like (*MARK:NAME) in that the name is
|
It is like (*MARK:NAME) in that the name is remembered for passing back
|
||||||
remembered for passing back to the caller. However, (*SKIP:NAME)
|
to the caller. However, (*SKIP:NAME) searches only for names set with
|
||||||
searches only for names set with (*MARK), ignoring those set by
|
(*MARK), ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
||||||
(*PRUNE) and (*THEN).
|
|
||||||
|
|
||||||
A subpattern that does not contain a | character is just a part of the
|
A subpattern that does not contain a | character is just a part of the
|
||||||
enclosing alternative; it is not a nested alternation with only one
|
enclosing alternative; it is not a nested alternation with only one
|
||||||
alternative. The effect of (*THEN) extends beyond such a subpattern to
|
alternative. The effect of (*THEN) extends beyond such a subpattern to
|
||||||
the enclosing alternative. Consider this pattern, where A, B, etc. are
|
the enclosing alternative. Consider this pattern, where A, B, etc. are
|
||||||
complex pattern fragments that do not contain any | characters at this
|
complex pattern fragments that do not contain any | characters at this
|
||||||
level:
|
level:
|
||||||
|
|
||||||
A (B(*THEN)C) | D
|
A (B(*THEN)C) | D
|
||||||
|
|
||||||
If A and B are matched, but there is a failure in C, matching does not
|
If A and B are matched, but there is a failure in C, matching does not
|
||||||
backtrack into A; instead it moves to the next alternative, that is, D.
|
backtrack into A; instead it moves to the next alternative, that is, D.
|
||||||
However, if the subpattern containing (*THEN) is given an alternative,
|
However, if the subpattern containing (*THEN) is given an alternative,
|
||||||
it behaves differently:
|
it behaves differently:
|
||||||
|
|
||||||
A (B(*THEN)C | (*FAIL)) | D
|
A (B(*THEN)C | (*FAIL)) | D
|
||||||
|
|
||||||
The effect of (*THEN) is now confined to the inner subpattern. After a
|
The effect of (*THEN) is now confined to the inner subpattern. After a
|
||||||
failure in C, matching moves to (*FAIL), which causes the whole subpat-
|
failure in C, matching moves to (*FAIL), which causes the whole subpat-
|
||||||
tern to fail because there are no more alternatives to try. In this
|
tern to fail because there are no more alternatives to try. In this
|
||||||
case, matching does now backtrack into A.
|
case, matching does now backtrack into A.
|
||||||
|
|
||||||
Note that a conditional subpattern is not considered as having two
|
Note that a conditional subpattern is not considered as having two
|
||||||
alternatives, because only one is ever used. In other words, the |
|
alternatives, because only one is ever used. In other words, the |
|
||||||
character in a conditional subpattern has a different meaning. Ignoring
|
character in a conditional subpattern has a different meaning. Ignoring
|
||||||
white space, consider:
|
white space, consider:
|
||||||
|
|
||||||
^.*? (?(?=a) a | b(*THEN)c )
|
^.*? (?(?=a) a | b(*THEN)c )
|
||||||
|
|
||||||
If the subject is "ba", this pattern does not match. Because .*? is
|
If the subject is "ba", this pattern does not match. Because .*? is
|
||||||
ungreedy, it initially matches zero characters. The condition (?=a)
|
ungreedy, it initially matches zero characters. The condition (?=a)
|
||||||
then fails, the character "b" is matched, but "c" is not. At this
|
then fails, the character "b" is matched, but "c" is not. At this
|
||||||
point, matching does not backtrack to .*? as might perhaps be expected
|
point, matching does not backtrack to .*? as might perhaps be expected
|
||||||
from the presence of the | character. The conditional subpattern is
|
from the presence of the | character. The conditional subpattern is
|
||||||
part of the single alternative that comprises the whole pattern, and so
|
part of the single alternative that comprises the whole pattern, and so
|
||||||
the match fails. (If there was a backtrack into .*?, allowing it to
|
the match fails. (If there was a backtrack into .*?, allowing it to
|
||||||
match "b", the match would succeed.)
|
match "b", the match would succeed.)
|
||||||
|
|
||||||
The verbs just described provide four different "strengths" of control
|
The verbs just described provide four different "strengths" of control
|
||||||
when subsequent matching fails. (*THEN) is the weakest, carrying on the
|
when subsequent matching fails. (*THEN) is the weakest, carrying on the
|
||||||
match at the next alternative. (*PRUNE) comes next, failing the match
|
match at the next alternative. (*PRUNE) comes next, failing the match
|
||||||
at the current starting position, but allowing an advance to the next
|
at the current starting position, but allowing an advance to the next
|
||||||
character (for an unanchored pattern). (*SKIP) is similar, except that
|
character (for an unanchored pattern). (*SKIP) is similar, except that
|
||||||
the advance may be more than one character. (*COMMIT) is the strongest,
|
the advance may be more than one character. (*COMMIT) is the strongest,
|
||||||
causing the entire match to fail.
|
causing the entire match to fail.
|
||||||
|
|
||||||
More than one backtracking verb
|
More than one backtracking verb
|
||||||
|
|
||||||
If more than one backtracking verb is present in a pattern, the one
|
If more than one backtracking verb is present in a pattern, the one
|
||||||
that is backtracked onto first acts. For example, consider this pat-
|
that is backtracked onto first acts. For example, consider this pat-
|
||||||
tern, where A, B, etc. are complex pattern fragments:
|
tern, where A, B, etc. are complex pattern fragments:
|
||||||
|
|
||||||
(A(*COMMIT)B(*THEN)C|ABD)
|
(A(*COMMIT)B(*THEN)C|ABD)
|
||||||
|
|
||||||
If A matches but B fails, the backtrack to (*COMMIT) causes the entire
|
If A matches but B fails, the backtrack to (*COMMIT) causes the entire
|
||||||
match to fail. However, if A and B match, but C fails, the backtrack to
|
match to fail. However, if A and B match, but C fails, the backtrack to
|
||||||
(*THEN) causes the next alternative (ABD) to be tried. This behaviour
|
(*THEN) causes the next alternative (ABD) to be tried. This behaviour
|
||||||
is consistent, but is not always the same as Perl's. It means that if
|
is consistent, but is not always the same as Perl's. It means that if
|
||||||
two or more backtracking verbs appear in succession, all but the last
|
two or more backtracking verbs appear in succession, all but the last
|
||||||
of them has no effect. Consider this example:
|
of them has no effect. Consider this example:
|
||||||
|
|
||||||
...(*COMMIT)(*PRUNE)...
|
...(*COMMIT)(*PRUNE)...
|
||||||
|
|
||||||
If there is a matching failure to the right, backtracking onto (*PRUNE)
|
If there is a matching failure to the right, backtracking onto (*PRUNE)
|
||||||
causes it to be triggered, and its action is taken. There can never be
|
causes it to be triggered, and its action is taken. There can never be
|
||||||
a backtrack onto (*COMMIT).
|
a backtrack onto (*COMMIT).
|
||||||
|
|
||||||
Backtracking verbs in repeated groups
|
Backtracking verbs in repeated groups
|
||||||
|
|
||||||
PCRE2 differs from Perl in its handling of backtracking verbs in
|
PCRE2 sometimes differs from Perl in its handling of backtracking verbs
|
||||||
repeated groups. For example, consider:
|
in repeated groups. For example, consider:
|
||||||
|
|
||||||
/(a(*COMMIT)b)+ac/
|
/(a(*COMMIT)b)+ac/
|
||||||
|
|
||||||
If the subject is "abac", Perl matches, but PCRE2 fails because the
|
If the subject is "abac", Perl matches unless its optimizations are
|
||||||
(*COMMIT) in the second repeat of the group acts.
|
disabled, but PCRE2 always fails because the (*COMMIT) in the second
|
||||||
|
repeat of the group acts.
|
||||||
|
|
||||||
Backtracking verbs in assertions
|
Backtracking verbs in assertions
|
||||||
|
|
||||||
|
@ -8940,44 +8948,46 @@ BACKTRACKING CONTROL
|
||||||
in a conditional subpattern.
|
in a conditional subpattern.
|
||||||
|
|
||||||
(*ACCEPT) in a standalone positive assertion causes the assertion to
|
(*ACCEPT) in a standalone positive assertion causes the assertion to
|
||||||
succeed without any further processing; captured strings are retained.
|
succeed without any further processing; captured strings and a (*MARK)
|
||||||
In a standalone negative assertion, (*ACCEPT) causes the assertion to
|
name (if set) are retained. In a standalone negative assertion,
|
||||||
fail without any further processing; captured substrings are discarded.
|
(*ACCEPT) causes the assertion to fail without any further processing;
|
||||||
|
captured substrings and any (*MARK) name are discarded.
|
||||||
|
|
||||||
If the assertion is a condition, (*ACCEPT) causes the condition to be
|
If the assertion is a condition, (*ACCEPT) causes the condition to be
|
||||||
true for a positive assertion and false for a negative one; captured
|
true for a positive assertion and false for a negative one; captured
|
||||||
substrings are retained in both cases.
|
substrings are retained in both cases.
|
||||||
|
|
||||||
The remaining verbs act only when a later failure causes a backtrack to
|
The remaining verbs act only when a later failure causes a backtrack to
|
||||||
reach them. This means that their effect is confined to the assertion,
|
reach them. This means that their effect is confined to the assertion,
|
||||||
because lookaround assertions are atomic. A backtrack that occurs after
|
because lookaround assertions are atomic. A backtrack that occurs after
|
||||||
an assertion is complete does not jump back into the assertion. Note in
|
an assertion is complete does not jump back into the assertion. Note in
|
||||||
particular that a (*MARK) name that is set in an assertion is not
|
particular that a (*MARK) name that is set in an assertion is not
|
||||||
"seen" by an instance of (*SKIP:NAME) later in the pattern.
|
"seen" by an instance of (*SKIP:NAME) later in the pattern.
|
||||||
|
|
||||||
The effect of (*THEN) is not allowed to escape beyond an assertion. If
|
The effect of (*THEN) is not allowed to escape beyond an assertion. If
|
||||||
there are no more branches to try, (*THEN) causes a positive assertion
|
there are no more branches to try, (*THEN) causes a positive assertion
|
||||||
to be false, and a negative assertion to be true.
|
to be false, and a negative assertion to be true.
|
||||||
|
|
||||||
The other backtracking verbs are not treated specially if they appear
|
The other backtracking verbs are not treated specially if they appear
|
||||||
in a standalone positive assertion. In a conditional positive asser-
|
in a standalone positive assertion. In a conditional positive asser-
|
||||||
tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
|
tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
|
||||||
or (*PRUNE) causes the condition to be false. However, for both stand-
|
or (*PRUNE) causes the condition to be false. However, for both stand-
|
||||||
alone and conditional negative assertions, backtracking into (*COMMIT),
|
alone and conditional negative assertions, backtracking into (*COMMIT),
|
||||||
(*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
|
(*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
|
||||||
ing any further alternative branches.
|
ing any further alternative branches.
|
||||||
|
|
||||||
Backtracking verbs in subroutines
|
Backtracking verbs in subroutines
|
||||||
|
|
||||||
These behaviours occur whether or not the subpattern is called recur-
|
These behaviours occur whether or not the subpattern is called recur-
|
||||||
sively. Perl's treatment of subroutines is different in some cases.
|
sively.
|
||||||
|
|
||||||
(*FAIL) in a subpattern called as a subroutine has its normal effect:
|
|
||||||
it forces an immediate backtrack.
|
|
||||||
|
|
||||||
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine
|
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine
|
||||||
match to succeed without any further processing. Matching then contin-
|
match to succeed without any further processing. Matching then contin-
|
||||||
ues after the subroutine call.
|
ues after the subroutine call. Perl documents this behaviour. Perl's
|
||||||
|
treatment of the other verbs in subroutines is different in some cases.
|
||||||
|
|
||||||
|
(*FAIL) in a subpattern called as a subroutine has its normal effect:
|
||||||
|
it forces an immediate backtrack.
|
||||||
|
|
||||||
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
|
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
|
||||||
cause the subroutine match to fail.
|
cause the subroutine match to fail.
|
||||||
|
@ -9002,7 +9012,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 16 July 2018
|
Last updated: 20 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@ -10226,7 +10236,11 @@ CONDITIONAL PATTERNS
|
||||||
|
|
||||||
BACKTRACKING CONTROL
|
BACKTRACKING CONTROL
|
||||||
|
|
||||||
The following act immediately they are reached:
|
All backtracking control verbs may be in the form (*VERB:NAME). For
|
||||||
|
(*MARK) the name is mandatory, for the others it is optional. (*SKIP)
|
||||||
|
changes its behaviour if :NAME is present. The others just set a name
|
||||||
|
for passing back to the caller, but this is not a name that (*SKIP) can
|
||||||
|
see. The following act immediately they are reached:
|
||||||
|
|
||||||
(*ACCEPT) force successful match
|
(*ACCEPT) force successful match
|
||||||
(*FAIL) force backtrack; synonym (*F)
|
(*FAIL) force backtrack; synonym (*F)
|
||||||
|
@ -10239,12 +10253,13 @@ BACKTRACKING CONTROL
|
||||||
|
|
||||||
(*COMMIT) overall failure, no advance of starting point
|
(*COMMIT) overall failure, no advance of starting point
|
||||||
(*PRUNE) advance to next starting character
|
(*PRUNE) advance to next starting character
|
||||||
(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
|
|
||||||
(*SKIP) advance to current matching position
|
(*SKIP) advance to current matching position
|
||||||
(*SKIP:NAME) advance to position corresponding to an earlier
|
(*SKIP:NAME) advance to position corresponding to an earlier
|
||||||
(*MARK:NAME); if not found, the (*SKIP) is ignored
|
(*MARK:NAME); if not found, the (*SKIP) is ignored
|
||||||
(*THEN) local failure, backtrack to next alternation
|
(*THEN) local failure, backtrack to next alternation
|
||||||
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
|
|
||||||
|
The effect of one of these verbs in a group called as a subroutine is
|
||||||
|
confined to the subroutine call.
|
||||||
|
|
||||||
|
|
||||||
CALLOUTS
|
CALLOUTS
|
||||||
|
@ -10254,14 +10269,14 @@ CALLOUTS
|
||||||
(?C"text") callout with string data
|
(?C"text") callout with string data
|
||||||
|
|
||||||
The allowed string delimiters are ` ' " ^ % # $ (which are the same for
|
The allowed string delimiters are ` ' " ^ % # $ (which are the same for
|
||||||
the start and the end), and the starting delimiter { matched with the
|
the start and the end), and the starting delimiter { matched with the
|
||||||
ending delimiter }. To encode the ending delimiter within the string,
|
ending delimiter }. To encode the ending delimiter within the string,
|
||||||
double it.
|
double it.
|
||||||
|
|
||||||
|
|
||||||
SEE ALSO
|
SEE ALSO
|
||||||
|
|
||||||
pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
|
pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
|
||||||
pcre2(3).
|
pcre2(3).
|
||||||
|
|
||||||
|
|
||||||
|
@ -10274,7 +10289,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 07 July 2018
|
Last updated: 21 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "16 July 2018" "PCRE2 10.32"
|
.TH PCRE2PATTERN 3 "20 July 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -3154,17 +3154,16 @@ in the
|
||||||
.\"
|
.\"
|
||||||
documentation.
|
documentation.
|
||||||
.P
|
.P
|
||||||
Experiments with Perl suggest that it too has similar optimizations, sometimes
|
Experiments with Perl suggest that it too has similar optimizations, and like
|
||||||
leading to anomalous results.
|
PCRE2, turning them off can change the result of a match.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Verbs that act immediately"
|
.SS "Verbs that act immediately"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The following verbs act as soon as they are encountered. They may not be
|
The following verbs act as soon as they are encountered.
|
||||||
followed by a name.
|
|
||||||
.sp
|
.sp
|
||||||
(*ACCEPT)
|
(*ACCEPT) or (*ACCEPT:NAME)
|
||||||
.sp
|
.sp
|
||||||
This verb causes the match to end successfully, skipping the remainder of the
|
This verb causes the match to end successfully, skipping the remainder of the
|
||||||
pattern. However, when it is inside a subpattern that is called as a
|
pattern. However, when it is inside a subpattern that is called as a
|
||||||
|
@ -3180,18 +3179,21 @@ example:
|
||||||
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
|
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
|
||||||
the outer parentheses.
|
the outer parentheses.
|
||||||
.sp
|
.sp
|
||||||
(*FAIL) or (*F)
|
(*FAIL) or (*FAIL:NAME)
|
||||||
.sp
|
.sp
|
||||||
This verb causes a matching failure, forcing backtracking to occur. It is
|
This verb causes a matching failure, forcing backtracking to occur. It may be
|
||||||
equivalent to (?!) but easier to read. The Perl documentation notes that it is
|
abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
|
||||||
probably useful only when combined with (?{}) or (??{}). Those are, of course,
|
documentation notes that it is probably useful only when combined with (?{}) or
|
||||||
Perl features that are not present in PCRE2. The nearest equivalent is the
|
(??{}). Those are, of course, Perl features that are not present in PCRE2. The
|
||||||
callout feature, as for example in this pattern:
|
nearest equivalent is the callout feature, as for example in this pattern:
|
||||||
.sp
|
.sp
|
||||||
a+(?C)(*FAIL)
|
a+(?C)(*FAIL)
|
||||||
.sp
|
.sp
|
||||||
A match with the string "aaaa" always fails, but the callout is taken before
|
A match with the string "aaaa" always fails, but the callout is taken before
|
||||||
each backtrack happens (in this example, 10 times).
|
each backtrack happens (in this example, 10 times).
|
||||||
|
.P
|
||||||
|
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
|
||||||
|
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Recording which path was taken"
|
.SS "Recording which path was taken"
|
||||||
|
@ -3220,9 +3222,9 @@ documentation. This applies to all instances of (*MARK), including those inside
|
||||||
assertions and atomic groups. (There are differences in those cases when
|
assertions and atomic groups. (There are differences in those cases when
|
||||||
(*MARK) is used in conjunction with (*SKIP) as described below.)
|
(*MARK) is used in conjunction with (*SKIP) as described below.)
|
||||||
.P
|
.P
|
||||||
As well as (*MARK), the (*PRUNE) and (*THEN) verbs may have associated NAME
|
As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
|
||||||
arguments. Whichever is last on the matching path is passed back. See below for
|
associated NAME arguments. Whichever is last on the matching path is passed
|
||||||
more details of these other verbs.
|
back. See below for more details of these other verbs.
|
||||||
.P
|
.P
|
||||||
Here is an example of \fBpcre2test\fP output, where the "mark" modifier
|
Here is an example of \fBpcre2test\fP output, where the "mark" modifier
|
||||||
requests the retrieval and outputting of (*MARK) data:
|
requests the retrieval and outputting of (*MARK) data:
|
||||||
|
@ -3282,22 +3284,24 @@ reaches them. The behaviour described below is what happens when the verb is
|
||||||
not in a subroutine or an assertion. Subsequent sections cover these special
|
not in a subroutine or an assertion. Subsequent sections cover these special
|
||||||
cases.
|
cases.
|
||||||
.sp
|
.sp
|
||||||
(*COMMIT)
|
(*COMMIT) or (*COMMIT:NAME)
|
||||||
.sp
|
.sp
|
||||||
This verb, which may not be followed by a name, causes the whole match to fail
|
This verb causes the whole match to fail outright if there is a later matching
|
||||||
outright if there is a later matching failure that causes backtracking to reach
|
failure that causes backtracking to reach it. Even if the pattern is
|
||||||
it. Even if the pattern is unanchored, no further attempts to find a match by
|
unanchored, no further attempts to find a match by advancing the starting point
|
||||||
advancing the starting point take place. If (*COMMIT) is the only backtracking
|
take place. If (*COMMIT) is the only backtracking verb that is encountered,
|
||||||
verb that is encountered, once it has been passed \fBpcre2_match()\fP is
|
once it has been passed \fBpcre2_match()\fP is committed to finding a match at
|
||||||
committed to finding a match at the current starting point, or not at all. For
|
the current starting point, or not at all. For example:
|
||||||
example:
|
|
||||||
.sp
|
.sp
|
||||||
a+(*COMMIT)b
|
a+(*COMMIT)b
|
||||||
.sp
|
.sp
|
||||||
This matches "xxaab" but not "aacaab". It can be thought of as a kind of
|
This matches "xxaab" but not "aacaab". It can be thought of as a kind of
|
||||||
dynamic anchor, or "I've started, so I must finish." The name of the most
|
dynamic anchor, or "I've started, so I must finish."
|
||||||
recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
|
.P
|
||||||
match failure.
|
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
|
||||||
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
|
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
||||||
.P
|
.P
|
||||||
If there is more than one backtracking verb in a pattern, a different one that
|
If there is more than one backtracking verb in a pattern, a different one that
|
||||||
follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
|
follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
|
||||||
|
@ -3338,7 +3342,7 @@ as (*COMMIT).
|
||||||
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
|
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
|
||||||
like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
ignoring those set by (*PRUNE) or (*THEN).
|
ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
|
||||||
.sp
|
.sp
|
||||||
(*SKIP)
|
(*SKIP)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3346,7 +3350,7 @@ This verb, when given without a name, is like (*PRUNE), except that if the
|
||||||
pattern is unanchored, the "bumpalong" advance is not to the next character,
|
pattern is unanchored, the "bumpalong" advance is not to the next character,
|
||||||
but to the position in the subject where (*SKIP) was encountered. (*SKIP)
|
but to the position in the subject where (*SKIP) was encountered. (*SKIP)
|
||||||
signifies that whatever text was matched leading up to it cannot be part of a
|
signifies that whatever text was matched leading up to it cannot be part of a
|
||||||
successful match. Consider:
|
successful match if there is a later mismatch. Consider:
|
||||||
.sp
|
.sp
|
||||||
a+(*SKIP)b
|
a+(*SKIP)b
|
||||||
.sp
|
.sp
|
||||||
|
@ -3391,7 +3395,7 @@ never seen because "a" does not match "b", so the matcher immediately jumps to
|
||||||
the second branch of the pattern.
|
the second branch of the pattern.
|
||||||
.P
|
.P
|
||||||
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
|
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
|
||||||
names that are set by (*PRUNE:NAME) or (*THEN:NAME).
|
names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
|
||||||
.sp
|
.sp
|
||||||
(*THEN) or (*THEN:NAME)
|
(*THEN) or (*THEN:NAME)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3409,10 +3413,10 @@ succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no
|
||||||
more alternatives, so there is a backtrack to whatever came before the entire
|
more alternatives, so there is a backtrack to whatever came before the entire
|
||||||
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
|
group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
|
||||||
.P
|
.P
|
||||||
The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
|
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
|
||||||
It is like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
ignoring those set by (*PRUNE) and (*THEN).
|
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
||||||
.P
|
.P
|
||||||
A subpattern that does not contain a | character is just a part of the
|
A subpattern that does not contain a | character is just a part of the
|
||||||
enclosing alternative; it is not a nested alternation with only one
|
enclosing alternative; it is not a nested alternation with only one
|
||||||
|
@ -3485,13 +3489,14 @@ onto (*COMMIT).
|
||||||
.SS "Backtracking verbs in repeated groups"
|
.SS "Backtracking verbs in repeated groups"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
PCRE2 differs from Perl in its handling of backtracking verbs in repeated
|
PCRE2 sometimes differs from Perl in its handling of backtracking verbs in
|
||||||
groups. For example, consider:
|
repeated groups. For example, consider:
|
||||||
.sp
|
.sp
|
||||||
/(a(*COMMIT)b)+ac/
|
/(a(*COMMIT)b)+ac/
|
||||||
.sp
|
.sp
|
||||||
If the subject is "abac", Perl matches, but PCRE2 fails because the (*COMMIT)
|
If the subject is "abac", Perl matches unless its optimizations are disabled,
|
||||||
in the second repeat of the group acts.
|
but PCRE2 always fails because the (*COMMIT) in the second repeat of the group
|
||||||
|
acts.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="btassert"></a>
|
.\" HTML <a name="btassert"></a>
|
||||||
|
@ -3504,9 +3509,10 @@ not the assertion is standalone or acting as the condition in a conditional
|
||||||
subpattern.
|
subpattern.
|
||||||
.P
|
.P
|
||||||
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
|
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
|
||||||
without any further processing; captured strings are retained. In a standalone
|
without any further processing; captured strings and a (*MARK) name (if set)
|
||||||
negative assertion, (*ACCEPT) causes the assertion to fail without any further
|
are retained. In a standalone negative assertion, (*ACCEPT) causes the
|
||||||
processing; captured substrings are discarded.
|
assertion to fail without any further processing; captured substrings and any
|
||||||
|
(*MARK) name are discarded.
|
||||||
.P
|
.P
|
||||||
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
|
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
|
||||||
a positive assertion and false for a negative one; captured substrings are
|
a positive assertion and false for a negative one; captured substrings are
|
||||||
|
@ -3536,14 +3542,14 @@ the assertion to be true, without considering any further alternative branches.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
These behaviours occur whether or not the subpattern is called recursively.
|
These behaviours occur whether or not the subpattern is called recursively.
|
||||||
Perl's treatment of subroutines is different in some cases.
|
|
||||||
.P
|
|
||||||
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
|
|
||||||
an immediate backtrack.
|
|
||||||
.P
|
.P
|
||||||
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
|
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
|
||||||
succeed without any further processing. Matching then continues after the
|
succeed without any further processing. Matching then continues after the
|
||||||
subroutine call.
|
subroutine call. Perl documents this behaviour. Perl's treatment of the other
|
||||||
|
verbs in subroutines is different in some cases.
|
||||||
|
.P
|
||||||
|
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
|
||||||
|
an immediate backtrack.
|
||||||
.P
|
.P
|
||||||
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
|
(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
|
||||||
the subroutine match to fail.
|
the subroutine match to fail.
|
||||||
|
@ -3574,6 +3580,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 16 July 2018
|
Last updated: 20 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "07 July 2018" "PCRE2 10.32"
|
.TH PCRE2SYNTAX 3 "21 July 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -410,8 +410,6 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
(?>...) atomic, non-capturing group
|
(?>...) atomic, non-capturing group
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.
|
|
||||||
.
|
|
||||||
.SH "COMMENT"
|
.SH "COMMENT"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -552,7 +550,11 @@ condition if the relevant named group exists.
|
||||||
.SH "BACKTRACKING CONTROL"
|
.SH "BACKTRACKING CONTROL"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The following act immediately they are reached:
|
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
|
||||||
|
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
|
||||||
|
if :NAME is present. The others just set a name for passing back to the caller,
|
||||||
|
but this is not a name that (*SKIP) can see. The following act immediately they
|
||||||
|
are reached:
|
||||||
.sp
|
.sp
|
||||||
(*ACCEPT) force successful match
|
(*ACCEPT) force successful match
|
||||||
(*FAIL) force backtrack; synonym (*F)
|
(*FAIL) force backtrack; synonym (*F)
|
||||||
|
@ -565,12 +567,13 @@ pattern is not anchored.
|
||||||
.sp
|
.sp
|
||||||
(*COMMIT) overall failure, no advance of starting point
|
(*COMMIT) overall failure, no advance of starting point
|
||||||
(*PRUNE) advance to next starting character
|
(*PRUNE) advance to next starting character
|
||||||
(*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
|
|
||||||
(*SKIP) advance to current matching position
|
(*SKIP) advance to current matching position
|
||||||
(*SKIP:NAME) advance to position corresponding to an earlier
|
(*SKIP:NAME) advance to position corresponding to an earlier
|
||||||
(*MARK:NAME); if not found, the (*SKIP) is ignored
|
(*MARK:NAME); if not found, the (*SKIP) is ignored
|
||||||
(*THEN) local failure, backtrack to next alternation
|
(*THEN) local failure, backtrack to next alternation
|
||||||
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
|
.sp
|
||||||
|
The effect of one of these verbs in a group called as a subroutine is confined
|
||||||
|
to the subroutine call.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "CALLOUTS"
|
.SH "CALLOUTS"
|
||||||
|
@ -606,6 +609,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 07 July 2018
|
Last updated: 21 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "16 July 2018" "PCRE 10.32"
|
.TH PCRE2TEST 1 "21 July 2018" "PCRE 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -360,10 +360,11 @@ patterns. Modifiers on a pattern can change these settings.
|
||||||
The appearance of this line causes all subsequent modifier settings to be
|
The appearance of this line causes all subsequent modifier settings to be
|
||||||
checked for compatibility with the \fBperltest.sh\fP script, which is used to
|
checked for compatibility with the \fBperltest.sh\fP script, which is used to
|
||||||
confirm that Perl gives the same results as PCRE2. Also, apart from comment
|
confirm that Perl gives the same results as PCRE2. Also, apart from comment
|
||||||
lines, none of the other command lines are permitted, because they and many
|
lines, #pattern commands, and #subject commands that set or unset "mark", no
|
||||||
of the modifiers are specific to \fBpcre2test\fP, and should not be used in
|
command lines are permitted, because they and many of the modifiers are
|
||||||
test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
|
specific to \fBpcre2test\fP, and should not be used in test files that are also
|
||||||
command helps detect tests that are accidentally put in the wrong file.
|
processed by \fBperltest.sh\fP. The \fB#perltest\fP command helps detect tests
|
||||||
|
that are accidentally put in the wrong file.
|
||||||
.sp
|
.sp
|
||||||
#pop [<modifiers>]
|
#pop [<modifiers>]
|
||||||
#popcopy [<modifiers>]
|
#popcopy [<modifiers>]
|
||||||
|
@ -1981,6 +1982,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 16 July 2018
|
Last updated: 21 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -344,11 +344,11 @@ COMMAND LINES
|
||||||
The appearance of this line causes all subsequent modifier settings to
|
The appearance of this line causes all subsequent modifier settings to
|
||||||
be checked for compatibility with the perltest.sh script, which is used
|
be checked for compatibility with the perltest.sh script, which is used
|
||||||
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
||||||
comment lines, none of the other command lines are permitted, because
|
comment lines, #pattern commands, and #subject commands that set or
|
||||||
they and many of the modifiers are specific to pcre2test, and should
|
unset "mark", no command lines are permitted, because they and many of
|
||||||
not be used in test files that are also processed by perltest.sh. The
|
the modifiers are specific to pcre2test, and should not be used in test
|
||||||
#perltest command helps detect tests that are accidentally put in the
|
files that are also processed by perltest.sh. The #perltest command
|
||||||
wrong file.
|
helps detect tests that are accidentally put in the wrong file.
|
||||||
|
|
||||||
#pop [<modifiers>]
|
#pop [<modifiers>]
|
||||||
#popcopy [<modifiers>]
|
#popcopy [<modifiers>]
|
||||||
|
@ -1818,5 +1818,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 16 July 2018
|
Last updated: 21 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
|
|
52
perltest.sh
|
@ -45,17 +45,19 @@ fi
|
||||||
# jitstack ignored
|
# jitstack ignored
|
||||||
# mark show mark information
|
# mark show mark information
|
||||||
# no_auto_possess ignored
|
# no_auto_possess ignored
|
||||||
# no_start_optimize insert ({""}) at pattern start (disable Perl optimizing)
|
# no_start_optimize insert (??{""}) at pattern start (disables optimizing)
|
||||||
# subject_literal does not process subjects for escapes
|
# subject_literal does not process subjects for escapes
|
||||||
# ucp sets Perl's /u modifier
|
# ucp sets Perl's /u modifier
|
||||||
# utf invoke UTF-8 functionality
|
# utf invoke UTF-8 functionality
|
||||||
#
|
#
|
||||||
# Comment lines are ignored. The #pattern command can be used to set modifiers
|
# Comment lines are ignored. The #pattern command can be used to set modifiers
|
||||||
# that will be added to each subsequent pattern. NOTE: this is different to
|
# that will be added to each subsequent pattern, after any modifiers it may
|
||||||
# pcre2test where #pattern sets defaults, some of which can be overridden on
|
# already have. NOTE: this is different to pcre2test where #pattern sets
|
||||||
# individual patterns. The #perltest, #forbid_utf, and #newline_default
|
# defaults which can be overridden on individual patterns. The #subject command
|
||||||
# commands, which are needed in the relevant pcre2test files, are ignored. Any
|
# may be used to set or unset a default "mark" modifier for data lines. This is
|
||||||
# other #-command is ignored, with a warning message.
|
# the only use of #subject that is supported. The #perltest, #forbid_utf, and
|
||||||
|
# #newline_default commands, which are needed in the relevant pcre2test files,
|
||||||
|
# are ignored. Any other #-command is ignored, with a warning message.
|
||||||
#
|
#
|
||||||
# The data lines must not have any pcre2test modifiers. Unless
|
# The data lines must not have any pcre2test modifiers. Unless
|
||||||
# "subject_literal" is on the pattern, data lines are processed as
|
# "subject_literal" is on the pattern, data lines are processed as
|
||||||
|
@ -135,23 +137,39 @@ for (;;)
|
||||||
last if ! ($_ = <$infile>);
|
last if ! ($_ = <$infile>);
|
||||||
printf $outfile "$_" if ! $interact;
|
printf $outfile "$_" if ! $interact;
|
||||||
next if ($_ =~ /^\s*$/ || $_ =~ /^#[\s!]/);
|
next if ($_ =~ /^\s*$/ || $_ =~ /^#[\s!]/);
|
||||||
|
|
||||||
# A few of pcre2test's #-commands are supported, or just ignored. Any others
|
# A few of pcre2test's #-commands are supported, or just ignored. Any others
|
||||||
# are ignored, with a warning.
|
# are ignored, with a warning.
|
||||||
|
|
||||||
if ($_ =~ /^#pattern(.*)/)
|
if ($_ =~ /^#pattern(.*)/)
|
||||||
{
|
{
|
||||||
$extra_modifiers = $1;
|
$extra_modifiers = $1;
|
||||||
chomp($extra_modifiers);
|
chomp($extra_modifiers);
|
||||||
$extra_modifiers =~ s/\s+$//;
|
$extra_modifiers =~ s/\s+$//;
|
||||||
next;
|
next;
|
||||||
}
|
}
|
||||||
|
elsif ($_ =~ /^#subject(.*)/)
|
||||||
|
{
|
||||||
|
$mod = $1;
|
||||||
|
chomp($mod);
|
||||||
|
$mod =~ s/\s+$//;
|
||||||
|
if ($mod =~ s/(-?)mark,?//)
|
||||||
|
{
|
||||||
|
$minus = $1;
|
||||||
|
$default_show_mark = ($minus =~ /^$/);
|
||||||
|
}
|
||||||
|
if ($mod !~ /^\s*$/)
|
||||||
|
{
|
||||||
|
printf $outfile "** Warning: \"$mod\" in #subject ignored\n";
|
||||||
|
}
|
||||||
|
next;
|
||||||
|
}
|
||||||
elsif ($_ =~ /^#/)
|
elsif ($_ =~ /^#/)
|
||||||
{
|
{
|
||||||
if ($_ !~ /^#newline_default|^#perltest|^#forbid_utf/)
|
if ($_ !~ /^#newline_default|^#perltest|^#forbid_utf/)
|
||||||
{
|
{
|
||||||
printf $outfile "** Warning: #-command ignored: %s", $_;
|
printf $outfile "** Warning: #-command ignored: %s", $_;
|
||||||
}
|
}
|
||||||
next;
|
next;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -172,9 +190,9 @@ for (;;)
|
||||||
|
|
||||||
$pattern =~ /^\s*((.).*\2)(.*)$/s;
|
$pattern =~ /^\s*((.).*\2)(.*)$/s;
|
||||||
$pat = $1;
|
$pat = $1;
|
||||||
|
$del = $2;
|
||||||
$mod = "$3,$extra_modifiers";
|
$mod = "$3,$extra_modifiers";
|
||||||
$mod =~ s/^,\s*//;
|
$mod =~ s/^,\s*//;
|
||||||
$del = $2;
|
|
||||||
|
|
||||||
# The private "aftertext" modifier means "print $' afterwards".
|
# The private "aftertext" modifier means "print $' afterwards".
|
||||||
|
|
||||||
|
@ -202,7 +220,7 @@ for (;;)
|
||||||
|
|
||||||
# The "mark" modifier requests checking of MARK data
|
# The "mark" modifier requests checking of MARK data
|
||||||
|
|
||||||
$show_mark = ($mod =~ s/mark,?//);
|
$show_mark = $default_show_mark | ($mod =~ s/mark,?//);
|
||||||
|
|
||||||
# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl
|
# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl
|
||||||
|
|
||||||
|
@ -214,7 +232,7 @@ for (;;)
|
||||||
|
|
||||||
# Use no_start_optimize (disable PCRE2 start-up optimization) to disable Perl
|
# Use no_start_optimize (disable PCRE2 start-up optimization) to disable Perl
|
||||||
# optimization by inserting (??{""}) at the start of the pattern.
|
# optimization by inserting (??{""}) at the start of the pattern.
|
||||||
|
|
||||||
if ($mod =~ s/no_start_optimize,?//) { $pat =~ s/$del/$del(??{""})/; }
|
if ($mod =~ s/no_start_optimize,?//) { $pat =~ s/$del/$del(??{""})/; }
|
||||||
|
|
||||||
# Add back retained modifiers and check that the pattern is valid.
|
# Add back retained modifiers and check that the pattern is valid.
|
||||||
|
|
|
@ -281,6 +281,7 @@ pcre2_pattern_convert(). */
|
||||||
#define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE 156
|
#define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE 156
|
||||||
#define PCRE2_ERROR_BACKSLASH_G_SYNTAX 157
|
#define PCRE2_ERROR_BACKSLASH_G_SYNTAX 157
|
||||||
#define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158
|
#define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158
|
||||||
|
/* Error 159 is obsolete and should now never occur */
|
||||||
#define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED 159
|
#define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED 159
|
||||||
#define PCRE2_ERROR_VERB_UNKNOWN 160
|
#define PCRE2_ERROR_VERB_UNKNOWN 160
|
||||||
#define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG 161
|
#define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG 161
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2017 University of Cambridge
|
New API code Copyright (c) 2016-2018 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -250,34 +250,35 @@ is present where expected in a conditional group. */
|
||||||
#define META_LOOKBEHINDNOT 0x80250000u /* (?<! */
|
#define META_LOOKBEHINDNOT 0x80250000u /* (?<! */
|
||||||
|
|
||||||
/* These must be kept in this order, with consecutive values, and the _ARG
|
/* These must be kept in this order, with consecutive values, and the _ARG
|
||||||
versions of PRUNE, SKIP, and THEN immediately after their non-argument
|
versions of COMMIT, PRUNE, SKIP, and THEN immediately after their non-argument
|
||||||
versions. */
|
versions. */
|
||||||
|
|
||||||
#define META_MARK 0x80260000u /* (*MARK) */
|
#define META_MARK 0x80260000u /* (*MARK) */
|
||||||
#define META_ACCEPT 0x80270000u /* (*ACCEPT) */
|
#define META_ACCEPT 0x80270000u /* (*ACCEPT) */
|
||||||
#define META_COMMIT 0x80280000u /* (*COMMIT) */
|
#define META_FAIL 0x80280000u /* (*FAIL) */
|
||||||
#define META_FAIL 0x80290000u /* (*FAIL) */
|
#define META_COMMIT 0x80290000u /* These */
|
||||||
#define META_PRUNE 0x802a0000u /* These pairs must */
|
#define META_COMMIT_ARG 0x802a0000u /* pairs */
|
||||||
#define META_PRUNE_ARG 0x802b0000u /* be */
|
#define META_PRUNE 0x802b0000u /* must */
|
||||||
#define META_SKIP 0x802c0000u /* kept */
|
#define META_PRUNE_ARG 0x802c0000u /* be */
|
||||||
#define META_SKIP_ARG 0x802d0000u /* in */
|
#define META_SKIP 0x802d0000u /* kept */
|
||||||
#define META_THEN 0x802e0000u /* this */
|
#define META_SKIP_ARG 0x802e0000u /* in */
|
||||||
#define META_THEN_ARG 0x802f0000u /* order */
|
#define META_THEN 0x802f0000u /* this */
|
||||||
|
#define META_THEN_ARG 0x80300000u /* order */
|
||||||
|
|
||||||
/* These must be kept in groups of adjacent 3 values, and all together. */
|
/* These must be kept in groups of adjacent 3 values, and all together. */
|
||||||
|
|
||||||
#define META_ASTERISK 0x80300000u /* * */
|
#define META_ASTERISK 0x80310000u /* * */
|
||||||
#define META_ASTERISK_PLUS 0x80310000u /* *+ */
|
#define META_ASTERISK_PLUS 0x80320000u /* *+ */
|
||||||
#define META_ASTERISK_QUERY 0x80320000u /* *? */
|
#define META_ASTERISK_QUERY 0x80330000u /* *? */
|
||||||
#define META_PLUS 0x80330000u /* + */
|
#define META_PLUS 0x80340000u /* + */
|
||||||
#define META_PLUS_PLUS 0x80340000u /* ++ */
|
#define META_PLUS_PLUS 0x80350000u /* ++ */
|
||||||
#define META_PLUS_QUERY 0x80350000u /* +? */
|
#define META_PLUS_QUERY 0x80360000u /* +? */
|
||||||
#define META_QUERY 0x80360000u /* ? */
|
#define META_QUERY 0x80370000u /* ? */
|
||||||
#define META_QUERY_PLUS 0x80370000u /* ?+ */
|
#define META_QUERY_PLUS 0x80380000u /* ?+ */
|
||||||
#define META_QUERY_QUERY 0x80380000u /* ?? */
|
#define META_QUERY_QUERY 0x80390000u /* ?? */
|
||||||
#define META_MINMAX 0x80390000u /* {n,m} repeat */
|
#define META_MINMAX 0x803a0000u /* {n,m} repeat */
|
||||||
#define META_MINMAX_PLUS 0x803a0000u /* {n,m}+ repeat */
|
#define META_MINMAX_PLUS 0x803b0000u /* {n,m}+ repeat */
|
||||||
#define META_MINMAX_QUERY 0x803b0000u /* {n,m}? repeat */
|
#define META_MINMAX_QUERY 0x803c0000u /* {n,m}? repeat */
|
||||||
|
|
||||||
#define META_FIRST_QUANTIFIER META_ASTERISK
|
#define META_FIRST_QUANTIFIER META_ASTERISK
|
||||||
#define META_LAST_QUANTIFIER META_MINMAX_QUERY
|
#define META_LAST_QUANTIFIER META_MINMAX_QUERY
|
||||||
|
@ -327,8 +328,9 @@ static unsigned char meta_extra_lengths[] = {
|
||||||
SIZEOFFSET, /* META_LOOKBEHINDNOT */
|
SIZEOFFSET, /* META_LOOKBEHINDNOT */
|
||||||
1, /* META_MARK - plus the string length */
|
1, /* META_MARK - plus the string length */
|
||||||
0, /* META_ACCEPT */
|
0, /* META_ACCEPT */
|
||||||
0, /* META_COMMIT */
|
|
||||||
0, /* META_FAIL */
|
0, /* META_FAIL */
|
||||||
|
0, /* META_COMMIT */
|
||||||
|
1, /* META_COMMIT_ARG - plus the string length */
|
||||||
0, /* META_PRUNE */
|
0, /* META_PRUNE */
|
||||||
1, /* META_PRUNE_ARG - plus the string length */
|
1, /* META_PRUNE_ARG - plus the string length */
|
||||||
0, /* META_SKIP */
|
0, /* META_SKIP */
|
||||||
|
@ -586,9 +588,9 @@ static const char verbnames[] =
|
||||||
"\0" /* Empty name is a shorthand for MARK */
|
"\0" /* Empty name is a shorthand for MARK */
|
||||||
STRING_MARK0
|
STRING_MARK0
|
||||||
STRING_ACCEPT0
|
STRING_ACCEPT0
|
||||||
STRING_COMMIT0
|
|
||||||
STRING_F0
|
STRING_F0
|
||||||
STRING_FAIL0
|
STRING_FAIL0
|
||||||
|
STRING_COMMIT0
|
||||||
STRING_PRUNE0
|
STRING_PRUNE0
|
||||||
STRING_SKIP0
|
STRING_SKIP0
|
||||||
STRING_THEN;
|
STRING_THEN;
|
||||||
|
@ -596,11 +598,11 @@ static const char verbnames[] =
|
||||||
static const verbitem verbs[] = {
|
static const verbitem verbs[] = {
|
||||||
{ 0, META_MARK, +1 }, /* > 0 => must have an argument */
|
{ 0, META_MARK, +1 }, /* > 0 => must have an argument */
|
||||||
{ 4, META_MARK, +1 },
|
{ 4, META_MARK, +1 },
|
||||||
{ 6, META_ACCEPT, -1 }, /* < 0 => must not have an argument */
|
{ 6, META_ACCEPT, -1 }, /* < 0 => Optional argument, convert to pre-MARK */
|
||||||
{ 6, META_COMMIT, -1 },
|
|
||||||
{ 1, META_FAIL, -1 },
|
{ 1, META_FAIL, -1 },
|
||||||
{ 4, META_FAIL, -1 },
|
{ 4, META_FAIL, -1 },
|
||||||
{ 5, META_PRUNE, 0 }, /* Argument is optional; bump META code if found */
|
{ 6, META_COMMIT, 0 },
|
||||||
|
{ 5, META_PRUNE, 0 }, /* Optional argument; bump META code if found */
|
||||||
{ 4, META_SKIP, 0 },
|
{ 4, META_SKIP, 0 },
|
||||||
{ 4, META_THEN, 0 }
|
{ 4, META_THEN, 0 }
|
||||||
};
|
};
|
||||||
|
@ -610,8 +612,8 @@ static const int verbcount = sizeof(verbs)/sizeof(verbitem);
|
||||||
/* Verb opcodes, indexed by their META code offset from META_MARK. */
|
/* Verb opcodes, indexed by their META code offset from META_MARK. */
|
||||||
|
|
||||||
static const uint32_t verbops[] = {
|
static const uint32_t verbops[] = {
|
||||||
OP_MARK, OP_ACCEPT, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_PRUNE_ARG, OP_SKIP,
|
OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
|
||||||
OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
|
OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
|
||||||
|
|
||||||
/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
|
/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
|
||||||
|
|
||||||
|
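The reordering of the META codes and of verbops[] in the hunks above has to stay in step, because the compile loop indexes the table with (meta - META_MARK) >> 16. The following is a minimal sketch of that relationship, not the library's code: the META values are copied from the new side of the diff, while the demo_ names and the table of strings are hypothetical stand-ins for the real OP_ enum values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* META codes as they appear on the new side of the diff. */
#define META_MARK        0x80260000u
#define META_ACCEPT      0x80270000u
#define META_FAIL        0x80280000u
#define META_COMMIT      0x80290000u
#define META_COMMIT_ARG  0x802a0000u
#define META_PRUNE       0x802b0000u
#define META_PRUNE_ARG   0x802c0000u
#define META_SKIP        0x802d0000u
#define META_SKIP_ARG    0x802e0000u
#define META_THEN        0x802f0000u
#define META_THEN_ARG    0x80300000u

/* Hypothetical stand-in for verbops[]: opcode names in the same order as
the new table, used here only so that the mapping can be inspected. */
static const char *const demo_verbop_names[] = {
  "OP_MARK", "OP_ACCEPT", "OP_FAIL", "OP_COMMIT", "OP_COMMIT_ARG", "OP_PRUNE",
  "OP_PRUNE_ARG", "OP_SKIP", "OP_SKIP_ARG", "OP_THEN", "OP_THEN_ARG" };

#define DEMO_VERBOP_COUNT \
  (sizeof(demo_verbop_names)/sizeof(demo_verbop_names[0]))

/* Same index expression as the compile loop uses on verbops[]. */
static size_t demo_verb_index(uint32_t meta)
{
return (size_t)((meta - META_MARK) >> 16);
}

/* The argument form of COMMIT, PRUNE, SKIP, or THEN is the next META code. */
static uint32_t demo_with_arg(uint32_t meta)
{
return meta + 0x00010000u;
}

/* Look up the opcode name for a verb META code. */
static const char *demo_verbop_name(uint32_t meta)
{
size_t i = demo_verb_index(meta);
assert(i < DEMO_VERBOP_COUNT);
return demo_verbop_names[i];
}
```

With these values, demo_with_arg(META_COMMIT) yields META_COMMIT_ARG, and demo_verbop_name() maps it to the fifth table entry. This is why the comment in the source requires each _ARG code to follow its non-argument partner immediately.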
@ -976,8 +978,8 @@ for (;;)
|
||||||
case META_POSIX_NEG: fprintf(stderr, "META_POSIX_NEG %d", *pptr++); break;
|
case META_POSIX_NEG: fprintf(stderr, "META_POSIX_NEG %d", *pptr++); break;
|
||||||
|
|
||||||
case META_ACCEPT: fprintf(stderr, "META (*ACCEPT)"); break;
|
case META_ACCEPT: fprintf(stderr, "META (*ACCEPT)"); break;
|
||||||
case META_COMMIT: fprintf(stderr, "META (*COMMIT)"); break;
|
|
||||||
case META_FAIL: fprintf(stderr, "META (*FAIL)"); break;
|
case META_FAIL: fprintf(stderr, "META (*FAIL)"); break;
|
||||||
|
case META_COMMIT: fprintf(stderr, "META (*COMMIT)"); break;
|
||||||
case META_PRUNE: fprintf(stderr, "META (*PRUNE)"); break;
|
case META_PRUNE: fprintf(stderr, "META (*PRUNE)"); break;
|
||||||
case META_SKIP: fprintf(stderr, "META (*SKIP)"); break;
|
case META_SKIP: fprintf(stderr, "META (*SKIP)"); break;
|
||||||
case META_THEN: fprintf(stderr, "META (*THEN)"); break;
|
case META_THEN: fprintf(stderr, "META (*THEN)"); break;
|
||||||
|
@ -1067,6 +1069,10 @@ for (;;)
|
||||||
fprintf(stderr, "META (*MARK:");
|
fprintf(stderr, "META (*MARK:");
|
||||||
goto SHOWARG;
|
goto SHOWARG;
|
||||||
|
|
||||||
|
case META_COMMIT_ARG:
|
||||||
|
fprintf(stderr, "META (*COMMIT:");
|
||||||
|
goto SHOWARG;
|
||||||
|
|
||||||
case META_PRUNE_ARG:
|
case META_PRUNE_ARG:
|
||||||
fprintf(stderr, "META (*PRUNE:");
|
fprintf(stderr, "META (*PRUNE:");
|
||||||
goto SHOWARG;
|
goto SHOWARG;
|
||||||
|
@ -2290,6 +2296,7 @@ uint32_t *previous_callout = NULL;
|
||||||
uint32_t *parsed_pattern = cb->parsed_pattern;
|
uint32_t *parsed_pattern = cb->parsed_pattern;
|
||||||
uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
|
uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
|
||||||
uint32_t meta_quantifier = 0;
|
uint32_t meta_quantifier = 0;
|
||||||
|
uint32_t add_after_mark = 0;
|
||||||
uint16_t nest_depth = 0;
|
uint16_t nest_depth = 0;
|
||||||
int after_manual_callout = 0;
|
int after_manual_callout = 0;
|
||||||
int expect_cond_assert = 0;
|
int expect_cond_assert = 0;
|
||||||
|
@ -2461,6 +2468,16 @@ while (ptr < ptrend)
|
||||||
goto FAILED;
|
goto FAILED;
|
||||||
}
|
}
|
||||||
*verblengthptr = (uint32_t)verbnamelength;
|
*verblengthptr = (uint32_t)verbnamelength;
|
||||||
|
|
||||||
|
/* If this name was on a verb such as (*ACCEPT) which does not continue,
|
||||||
|
a (*MARK) was generated for the name. We now add the original verb as the
|
||||||
|
next item. */
|
||||||
|
|
||||||
|
if (add_after_mark != 0)
|
||||||
|
{
|
||||||
|
*parsed_pattern++ = add_after_mark;
|
||||||
|
add_after_mark = 0;
|
||||||
|
}
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case CHAR_BACKSLASH:
|
case CHAR_BACKSLASH:
|
||||||
|
@ -3454,13 +3471,25 @@ while (ptr < ptrend)
|
||||||
|
|
||||||
if (*ptr++ == CHAR_COLON) /* Skip past : or ) */
|
if (*ptr++ == CHAR_COLON) /* Skip past : or ) */
|
||||||
{
|
{
|
||||||
if (verbs[i].has_arg < 0) /* Argument is forbidden */
|
/* Some optional arguments can be treated as a preceding (*MARK) */
|
||||||
|
|
||||||
|
if (verbs[i].has_arg < 0)
|
||||||
{
|
{
|
||||||
errorcode = ERR59;
|
add_after_mark = verbs[i].meta;
|
||||||
goto FAILED;
|
*parsed_pattern++ = META_MARK;
|
||||||
}
|
}
|
||||||
*parsed_pattern++ = verbs[i].meta +
|
|
||||||
((verbs[i].meta != META_MARK)? 0x00010000u:0);
|
/* The remaining verbs with arguments (except *MARK) need a different
|
||||||
|
opcode. */
|
||||||
|
|
||||||
|
else
|
||||||
|
{
|
||||||
|
*parsed_pattern++ = verbs[i].meta +
|
||||||
|
((verbs[i].meta != META_MARK)? 0x00010000u:0);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Set up for reading the name in the main loop. */
|
||||||
|
|
||||||
verblengthptr = parsed_pattern++;
|
verblengthptr = parsed_pattern++;
|
||||||
verbnamestart = ptr;
|
verbnamestart = ptr;
|
||||||
inverbname = TRUE;
|
inverbname = TRUE;
|
||||||
|
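The two hunks above change what the parser emits for a verb with a name. A verb whose table entry has has_arg < 0, such as (*ACCEPT) or (*FAIL), is written as a (*MARK) with that name, followed by the plain verb. The other verbs are written using their _ARG META code. The sketch below illustrates the resulting layout under simplifying assumptions: demo_emit_verb_with_name() is a hypothetical helper, the name is a plain C string with no escape processing or length limit, and the real parser reads the name in its main loop instead of in one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* META codes as they appear on the new side of the diff. */
#define META_MARK        0x80260000u
#define META_ACCEPT      0x80270000u
#define META_FAIL        0x80280000u
#define META_COMMIT      0x80290000u
#define META_COMMIT_ARG  0x802a0000u

/* Hypothetical helper. Writes the parsed form of (*VERB:NAME) to out and
returns the number of items written. has_arg follows the convention of the
verbs[] table: < 0 means "optional name, emit as a preceding (*MARK)". */
static size_t demo_emit_verb_with_name(uint32_t meta, int has_arg,
  const char *name, uint32_t *out)
{
uint32_t *p = out;
uint32_t *lengthptr;
uint32_t add_after_mark = 0;
uint32_t length = 0;

assert(name != NULL && out != NULL);

if (has_arg < 0)
  {
  add_after_mark = meta;       /* Remember the verb for later */
  *p++ = META_MARK;            /* The name is attached to a (*MARK) */
  }
else
  {
  *p++ = meta + ((meta != META_MARK)? 0x00010000u : 0);
  }

/* A length item, then the character code values of the name. */
lengthptr = p++;
for (; *name != '\0'; name++)
  {
  *p++ = (uint32_t)(unsigned char)(*name);
  length++;
  }
*lengthptr = length;

/* Add the original verb after the generated (*MARK), if one is pending. */
if (add_after_mark != 0) *p++ = add_after_mark;
return (size_t)(p - out);
}
```

Under these assumptions, (*ACCEPT:X) becomes META_MARK, 1, 'X', META_ACCEPT, whereas (*COMMIT:AB) becomes META_COMMIT_ARG, 2, 'A', 'B'.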
@ -5654,6 +5683,7 @@ for (;; pptr++)
|
||||||
cb->had_pruneorskip = TRUE;
|
cb->had_pruneorskip = TRUE;
|
||||||
/* Fall through */
|
/* Fall through */
|
||||||
case META_MARK:
|
case META_MARK:
|
||||||
|
case META_COMMIT_ARG:
|
||||||
VERB_ARG:
|
VERB_ARG:
|
||||||
*code++ = verbops[(meta - META_MARK) >> 16];
|
*code++ = verbops[(meta - META_MARK) >> 16];
|
||||||
/* The length is in characters. */
|
/* The length is in characters. */
|
||||||
|
@ -8002,6 +8032,7 @@ for (;;)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case OP_MARK:
|
case OP_MARK:
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
case OP_PRUNE_ARG:
|
case OP_PRUNE_ARG:
|
||||||
case OP_SKIP_ARG:
|
case OP_SKIP_ARG:
|
||||||
case OP_THEN_ARG:
|
case OP_THEN_ARG:
|
||||||
|
@ -8310,6 +8341,7 @@ for (;; pptr++)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case META_MARK: /* Add the length of the name. */
|
case META_MARK: /* Add the length of the name. */
|
||||||
|
case META_COMMIT_ARG:
|
||||||
case META_PRUNE_ARG:
|
case META_PRUNE_ARG:
|
||||||
case META_SKIP_ARG:
|
case META_SKIP_ARG:
|
||||||
case META_THEN_ARG:
|
case META_THEN_ARG:
|
||||||
|
@ -8500,6 +8532,7 @@ for (;; pptr++)
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
|
|
||||||
case META_MARK:
|
case META_MARK:
|
||||||
|
case META_COMMIT_ARG:
|
||||||
case META_PRUNE_ARG:
|
case META_PRUNE_ARG:
|
||||||
case META_SKIP_ARG:
|
case META_SKIP_ARG:
|
||||||
case META_THEN_ARG:
|
case META_THEN_ARG:
|
||||||
|
@ -8967,6 +9000,7 @@ for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case META_MARK:
|
case META_MARK:
|
||||||
|
case META_COMMIT_ARG:
|
||||||
case META_PRUNE_ARG:
|
case META_PRUNE_ARG:
|
||||||
case META_SKIP_ARG:
|
case META_SKIP_ARG:
|
||||||
case META_THEN_ARG:
|
case META_THEN_ARG:
|
||||||
|
|
|
@ -181,7 +181,8 @@ static const uint8_t coptable[] = {
|
||||||
0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */
|
0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */
|
||||||
0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */
|
0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */
|
||||||
0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */
|
0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */
|
||||||
0, 0, 0, 0, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */
|
0, 0, /* COMMIT, COMMIT_ARG */
|
||||||
|
0, 0, 0, /* FAIL, ACCEPT, ASSERT_ACCEPT */
|
||||||
0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */
|
0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */
|
||||||
};
|
};
|
||||||
|
|
||||||
|
@ -254,7 +255,8 @@ static const uint8_t poptable[] = {
|
||||||
0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */
|
0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */
|
||||||
0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */
|
0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */
|
||||||
0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */
|
0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */
|
||||||
0, 0, 0, 0, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */
|
0, 0, /* COMMIT, COMMIT_ARG */
|
||||||
|
0, 0, 0, /* FAIL, ACCEPT, ASSERT_ACCEPT */
|
||||||
0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */
|
0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
|
@ -133,7 +133,8 @@ static const unsigned char compile_error_texts[] =
|
||||||
"internal error: unknown newline setting\0"
|
"internal error: unknown newline setting\0"
|
||||||
"\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
|
"\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
|
||||||
"(?R (recursive pattern call) must be followed by a closing parenthesis\0"
|
"(?R (recursive pattern call) must be followed by a closing parenthesis\0"
|
||||||
"an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0"
|
/* "an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0" */
|
||||||
|
"obsolete error (should not occur)\0" /* Was the above */
|
||||||
/* 60 */
|
/* 60 */
|
||||||
"(*VERB) not recognized or malformed\0"
|
"(*VERB) not recognized or malformed\0"
|
||||||
"group number is too big\0"
|
"group number is too big\0"
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2017 University of Cambridge
|
New API code Copyright (c) 2016-2018 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -253,7 +253,7 @@ maximum size of this can be limited. */
|
||||||
|
|
||||||
#define START_FRAMES_SIZE 20480
|
#define START_FRAMES_SIZE 20480
|
||||||
|
|
||||||
/* Similarly, for DFA matching, an initial internal workspace vector is
|
/* Similarly, for DFA matching, an initial internal workspace vector is
|
||||||
allocated on the stack. */
|
allocated on the stack. */
|
||||||
|
|
||||||
#define DFA_START_RWS_SIZE 30720
|
#define DFA_START_RWS_SIZE 30720
|
||||||
|
@ -1583,23 +1583,26 @@ enum {
|
||||||
OP_THEN, /* 155 */
|
OP_THEN, /* 155 */
|
||||||
OP_THEN_ARG, /* 156 same, but with argument */
|
OP_THEN_ARG, /* 156 same, but with argument */
|
||||||
OP_COMMIT, /* 157 */
|
OP_COMMIT, /* 157 */
|
||||||
|
OP_COMMIT_ARG, /* 158 same, but with argument */
|
||||||
|
|
||||||
/* These are forced failure and success verbs */
|
/* These are forced failure and success verbs. FAIL and ACCEPT do accept an
|
||||||
|
argument, but these cases can be compiled as, for example, (*MARK:X)(*FAIL)
|
||||||
|
without the need for a special opcode. */
|
||||||
|
|
||||||
OP_FAIL, /* 158 */
|
OP_FAIL, /* 159 */
|
||||||
OP_ACCEPT, /* 159 */
|
OP_ACCEPT, /* 160 */
|
||||||
OP_ASSERT_ACCEPT, /* 160 Used inside assertions */
|
OP_ASSERT_ACCEPT, /* 161 Used inside assertions */
|
||||||
OP_CLOSE, /* 161 Used before OP_ACCEPT to close open captures */
|
OP_CLOSE, /* 162 Used before OP_ACCEPT to close open captures */
|
||||||
|
|
||||||
/* This is used to skip a subpattern with a {0} quantifier */
|
/* This is used to skip a subpattern with a {0} quantifier */
|
||||||
|
|
||||||
OP_SKIPZERO, /* 162 */
|
OP_SKIPZERO, /* 163 */
|
||||||
|
|
||||||
/* This is used to identify a DEFINE group during compilation so that it can
|
/* This is used to identify a DEFINE group during compilation so that it can
|
||||||
be checked for having only one branch. It is changed to OP_FALSE before
|
be checked for having only one branch. It is changed to OP_FALSE before
|
||||||
compilation finishes. */
|
compilation finishes. */
|
||||||
|
|
||||||
OP_DEFINE, /* 163 */
|
OP_DEFINE, /* 164 */
|
||||||
|
|
||||||
/* This is not an opcode, but is used to check that tables indexed by opcode
|
/* This is not an opcode, but is used to check that tables indexed by opcode
|
||||||
are the correct length, in order to catch updating errors - there have been
|
are the correct length, in order to catch updating errors - there have been
|
||||||
|
@ -1655,7 +1658,7 @@ some cases doesn't actually use these names at all). */
|
||||||
"Cond false", "Cond true", \
|
"Cond false", "Cond true", \
|
||||||
"Brazero", "Braminzero", "Braposzero", \
|
"Brazero", "Braminzero", "Braposzero", \
|
||||||
"*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP", \
|
"*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP", \
|
||||||
"*THEN", "*THEN", "*COMMIT", "*FAIL", \
|
"*THEN", "*THEN", "*COMMIT", "*COMMIT", "*FAIL", \
|
||||||
"*ACCEPT", "*ASSERT_ACCEPT", \
|
"*ACCEPT", "*ASSERT_ACCEPT", \
|
||||||
"Close", "Skip zero", "Define"
|
"Close", "Skip zero", "Define"
|
||||||
|
|
||||||
|
@ -1747,7 +1750,8 @@ in UTF-8 mode. The code that uses this table must know about such things. */
|
||||||
3, 1, 3, /* MARK, PRUNE, PRUNE_ARG */ \
|
3, 1, 3, /* MARK, PRUNE, PRUNE_ARG */ \
|
||||||
1, 3, /* SKIP, SKIP_ARG */ \
|
1, 3, /* SKIP, SKIP_ARG */ \
|
||||||
1, 3, /* THEN, THEN_ARG */ \
|
1, 3, /* THEN, THEN_ARG */ \
|
||||||
1, 1, 1, 1, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */ \
|
1, 3, /* COMMIT, COMMIT_ARG */ \
|
||||||
|
1, 1, 1, /* FAIL, ACCEPT, ASSERT_ACCEPT */ \
|
||||||
1+IMM2_SIZE, 1, /* CLOSE, SKIPZERO */ \
|
1+IMM2_SIZE, 1, /* CLOSE, SKIPZERO */ \
|
||||||
1 /* DEFINE */
|
1 /* DEFINE */
|
||||||
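Both tables in this header gain one entry for OP_COMMIT_ARG, and the OP_TABLE_LENGTH comment earlier in the file notes that such tables have been updated out of step before. The following is a minimal sketch of a compile-time length check, assuming a C11 compiler; the `demo_` and `DEMO_` names are hypothetical and are not taken from the PCRE2 source, which may perform its check differently.

```c
#include <assert.h>   /* static_assert (C11) and assert */
#include <stdint.h>

/* Hypothetical three-opcode enum; the last entry is a count, not an opcode. */
enum { DEMO_COMMIT, DEMO_COMMIT_ARG, DEMO_FAIL, DEMO_TABLE_LENGTH };

/* Two tables indexed by opcode, in the style of the name and length tables. */
static const char *const demo_names[] = { "*COMMIT", "*COMMIT", "*FAIL" };
static const uint8_t demo_lengths[] = { 1, 3, 1 };

#define DEMO_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Compilation fails if an opcode is added without extending a table. */
static_assert(DEMO_COUNT(demo_names) == DEMO_TABLE_LENGTH,
  "name table is out of step with the opcode enum");
static_assert(DEMO_COUNT(demo_lengths) == DEMO_TABLE_LENGTH,
  "length table is out of step with the opcode enum");

/* Look up the printable name of an opcode. */
static const char *demo_name(int op)
{
  assert(op >= 0 && op < DEMO_TABLE_LENGTH);
  return demo_names[op];
}
```

Adding a fourth opcode before DEMO_TABLE_LENGTH without touching either table would stop the build at the corresponding assertion rather than leaving a short table to be indexed at run time.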
|
|
||||||
|
|
|
@ -839,6 +839,7 @@ switch(*cc)
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
case OP_MARK:
|
case OP_MARK:
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
case OP_PRUNE_ARG:
|
case OP_PRUNE_ARG:
|
||||||
case OP_SKIP_ARG:
|
case OP_SKIP_ARG:
|
||||||
case OP_THEN_ARG:
|
case OP_THEN_ARG:
|
||||||
|
@ -939,6 +940,7 @@ while (cc < ccend)
|
||||||
common->control_head_ptr = 1;
|
common->control_head_ptr = 1;
|
||||||
/* Fall through. */
|
/* Fall through. */
|
||||||
|
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
case OP_PRUNE_ARG:
|
case OP_PRUNE_ARG:
|
||||||
case OP_MARK:
|
case OP_MARK:
|
||||||
if (common->mark_ptr == 0)
|
if (common->mark_ptr == 0)
|
||||||
|
@ -1553,6 +1555,7 @@ while (cc < ccend)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case OP_MARK:
|
case OP_MARK:
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
case OP_PRUNE_ARG:
|
case OP_PRUNE_ARG:
|
||||||
case OP_THEN_ARG:
|
case OP_THEN_ARG:
|
||||||
SLJIT_ASSERT(common->mark_ptr != 0);
|
SLJIT_ASSERT(common->mark_ptr != 0);
|
||||||
|
@ -1733,6 +1736,7 @@ while (cc < ccend)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case OP_MARK:
|
case OP_MARK:
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
case OP_PRUNE_ARG:
|
case OP_PRUNE_ARG:
|
||||||
case OP_THEN_ARG:
|
case OP_THEN_ARG:
|
||||||
SLJIT_ASSERT(common->mark_ptr != 0);
|
SLJIT_ASSERT(common->mark_ptr != 0);
|
||||||
|
@ -2041,6 +2045,7 @@ while (cc < ccend)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case OP_MARK:
|
case OP_MARK:
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
case OP_PRUNE_ARG:
|
case OP_PRUNE_ARG:
|
||||||
case OP_THEN_ARG:
|
case OP_THEN_ARG:
|
||||||
SLJIT_ASSERT(common->mark_ptr != 0);
|
SLJIT_ASSERT(common->mark_ptr != 0);
|
||||||
|
@ -2428,6 +2433,7 @@ while (cc < ccend)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case OP_MARK:
|
case OP_MARK:
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
case OP_PRUNE_ARG:
|
case OP_PRUNE_ARG:
|
||||||
case OP_THEN_ARG:
|
case OP_THEN_ARG:
|
||||||
SLJIT_ASSERT(common->mark_ptr != 0);
|
SLJIT_ASSERT(common->mark_ptr != 0);
|
||||||
|
@ -10350,7 +10356,8 @@ backtrack_common *backtrack;
|
||||||
PCRE2_UCHAR opcode = *cc;
|
PCRE2_UCHAR opcode = *cc;
|
||||||
PCRE2_SPTR ccend = cc + 1;
|
PCRE2_SPTR ccend = cc + 1;
|
||||||
|
|
||||||
if (opcode == OP_PRUNE_ARG || opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG)
|
if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG ||
|
||||||
|
opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG)
|
||||||
ccend += 2 + cc[1];
|
ccend += 2 + cc[1];
|
||||||
|
|
||||||
PUSH_BACKTRACK(sizeof(backtrack_common), cc, NULL);
|
PUSH_BACKTRACK(sizeof(backtrack_common), cc, NULL);
|
||||||
|
@ -10362,7 +10369,7 @@ if (opcode == OP_SKIP)
|
||||||
return ccend;
|
return ccend;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG)
|
if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG)
|
||||||
{
|
{
|
||||||
OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0);
|
OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0);
|
||||||
OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)(cc + 2));
|
OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)(cc + 2));
|
||||||
|
@ -10681,6 +10688,7 @@ while (cc < ccend)
|
||||||
case OP_THEN:
|
case OP_THEN:
|
||||||
case OP_THEN_ARG:
|
case OP_THEN_ARG:
|
||||||
case OP_COMMIT:
|
case OP_COMMIT:
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
cc = compile_control_verb_matchingpath(common, cc, parent);
|
cc = compile_control_verb_matchingpath(common, cc, parent);
|
||||||
break;
|
break;
|
||||||
|
|
||||||
|
@ -11755,6 +11763,7 @@ while (current)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case OP_COMMIT:
|
case OP_COMMIT:
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
if (!common->local_quit_available)
|
if (!common->local_quit_available)
|
||||||
OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH);
|
OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH);
|
||||||
if (common->quit_label == NULL)
|
if (common->quit_label == NULL)
|
||||||
|
|
|
@ -149,7 +149,7 @@ changed, the code at RETURN_SWITCH below must be updated in sync. */
|
||||||
enum { RM1=1, RM2, RM3, RM4, RM5, RM6, RM7, RM8, RM9, RM10,
|
enum { RM1=1, RM2, RM3, RM4, RM5, RM6, RM7, RM8, RM9, RM10,
|
||||||
RM11, RM12, RM13, RM14, RM15, RM16, RM17, RM18, RM19, RM20,
|
RM11, RM12, RM13, RM14, RM15, RM16, RM17, RM18, RM19, RM20,
|
||||||
RM21, RM22, RM23, RM24, RM25, RM26, RM27, RM28, RM29, RM30,
|
RM21, RM22, RM23, RM24, RM25, RM26, RM27, RM28, RM29, RM30,
|
||||||
RM31, RM32, RM33, RM34, RM35 };
|
RM31, RM32, RM33, RM34, RM35, RM36 };
|
||||||
|
|
||||||
#ifdef SUPPORT_WIDE_CHARS
|
#ifdef SUPPORT_WIDE_CHARS
|
||||||
enum { RM100=100, RM101 };
|
enum { RM100=100, RM101 };
|
||||||
|
@ -770,7 +770,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
/* ===================================================================== */
|
/* ===================================================================== */
|
||||||
/* Real or forced end of the pattern, assertion, or recursion. In an
|
/* Real or forced end of the pattern, assertion, or recursion. In an
|
||||||
assertion ACCEPT, update the last used pointer and remember the current
|
assertion ACCEPT, update the last used pointer and remember the current
|
||||||
frame so that the captures can be fished out of it. */
|
frame so that the captures and mark can be fished out of it. */
|
||||||
|
|
||||||
case OP_ASSERT_ACCEPT:
|
case OP_ASSERT_ACCEPT:
|
||||||
if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
|
if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
|
||||||
|
@ -5119,7 +5119,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
/* Positive assertions are like other groups except that PCRE doesn't allow
|
/* Positive assertions are like other groups except that PCRE doesn't allow
|
||||||
the effect of (*THEN) to escape beyond an assertion; it is therefore
|
the effect of (*THEN) to escape beyond an assertion; it is therefore
|
||||||
treated as NOMATCH. (*ACCEPT) is treated as successful assertion, with its
|
treated as NOMATCH. (*ACCEPT) is treated as successful assertion, with its
|
||||||
captures retained. Any other return is an error. */
|
captures and mark retained. Any other return is an error. */
|
||||||
|
|
||||||
#define Lframe_type F->temp_32[0]
|
#define Lframe_type F->temp_32[0]
|
||||||
|
|
||||||
|
@ -5136,6 +5136,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
(char *)assert_accept_frame + offsetof(heapframe, ovector),
|
(char *)assert_accept_frame + offsetof(heapframe, ovector),
|
||||||
assert_accept_frame->offset_top * sizeof(PCRE2_SIZE));
|
assert_accept_frame->offset_top * sizeof(PCRE2_SIZE));
|
||||||
Foffset_top = assert_accept_frame->offset_top;
|
Foffset_top = assert_accept_frame->offset_top;
|
||||||
|
Fmark = assert_accept_frame->mark;
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
if (rrc != MATCH_NOMATCH && rrc != MATCH_THEN) RRETURN(rrc);
|
if (rrc != MATCH_NOMATCH && rrc != MATCH_THEN) RRETURN(rrc);
|
||||||
|
@ -5837,6 +5838,13 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
mb->verb_current_recurse = Fcurrent_recurse;
|
mb->verb_current_recurse = Fcurrent_recurse;
|
||||||
RRETURN(MATCH_COMMIT);
|
RRETURN(MATCH_COMMIT);
|
||||||
|
|
||||||
|
case OP_COMMIT_ARG:
|
||||||
|
Fmark = mb->nomatch_mark = Fecode + 2;
|
||||||
|
RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM36);
|
||||||
|
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
|
||||||
|
mb->verb_current_recurse = Fcurrent_recurse;
|
||||||
|
RRETURN(MATCH_COMMIT);
|
||||||
|
|
||||||
case OP_PRUNE:
|
case OP_PRUNE:
|
||||||
RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM14);
|
RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM14);
|
||||||
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
|
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
|
||||||
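The new OP_COMMIT_ARG case takes the mark name from Fecode + 2 and resumes matching at Fecode + OP_lengths[op] + Fecode[1]. The sketch below works through that arithmetic on a toy code vector. It assumes the layout is opcode, length unit, name, terminating zero, which is what the fixed table length of 3 suggests; the `toy_` and `TOY_` names are hypothetical and do not appear in PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature opcode set standing in for the real enum. */
enum { TOY_COMMIT, TOY_COMMIT_ARG, TOY_FAIL, TOY_TABLE_LENGTH };

/* Fixed part of each opcode's length. For the _ARG form, 3 covers the
   opcode itself, the length unit, and the assumed terminating zero. */
static const uint8_t toy_lengths[TOY_TABLE_LENGTH] = { 1, 3, 1 };

/* Return the offset of the opcode that follows the one at code[pos]. */
static size_t toy_next_opcode(const uint8_t *code, size_t pos)
{
  uint8_t op = code[pos];
  assert(op < TOY_TABLE_LENGTH);
  size_t next = pos + toy_lengths[op];
  if (op == TOY_COMMIT_ARG) next += code[pos + 1];  /* add the name length */
  return next;
}

/* Return a pointer to the name carried by an _ARG opcode. */
static const uint8_t *toy_mark_name(const uint8_t *code, size_t pos)
{
  assert(code[pos] == TOY_COMMIT_ARG);
  return code + pos + 2;  /* skip the opcode and the length unit */
}
```

For a vector holding TOY_COMMIT_ARG, 1, 'X', 0, TOY_FAIL, the name pointer lands on 'X' and the next opcode is found at offset 4. The JIT hunk's `ccend = cc + 1` followed by `ccend += 2 + cc[1]` reaches the same position.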
|
@ -5942,7 +5950,7 @@ switch (Freturn_id)
|
||||||
LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(16)
|
LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(16)
|
||||||
LBL(17) LBL(18) LBL(19) LBL(20) LBL(21) LBL(22) LBL(23) LBL(24)
|
LBL(17) LBL(18) LBL(19) LBL(20) LBL(21) LBL(22) LBL(23) LBL(24)
|
||||||
LBL(25) LBL(26) LBL(27) LBL(28) LBL(29) LBL(30) LBL(31) LBL(32)
|
LBL(25) LBL(26) LBL(27) LBL(28) LBL(29) LBL(30) LBL(31) LBL(32)
|
||||||
LBL(33) LBL(34) LBL(35)
|
LBL(33) LBL(34) LBL(35) LBL(36)
|
||||||
|
|
||||||
#ifdef SUPPORT_WIDE_CHARS
|
#ifdef SUPPORT_WIDE_CHARS
|
||||||
LBL(100) LBL(101)
|
LBL(100) LBL(101)
|
||||||
|
|
|
@ -4678,12 +4678,6 @@ uint16_t first_listed_newline;
|
||||||
const char *cmdname;
|
const char *cmdname;
|
||||||
uint8_t *argptr, *serial;
|
uint8_t *argptr, *serial;
|
||||||
|
|
||||||
if (restrict_for_perl_test)
|
|
||||||
{
|
|
||||||
fprintf(outfile, "** #-commands are not allowed after #perltest\n");
|
|
||||||
return PR_ABEND;
|
|
||||||
}
|
|
||||||
|
|
||||||
yield = PR_OK;
|
yield = PR_OK;
|
||||||
cmd = CMD_UNKNOWN;
|
cmd = CMD_UNKNOWN;
|
||||||
cmdlen = 0;
|
cmdlen = 0;
|
||||||
|
@ -4702,6 +4696,12 @@ for (i = 0; i < cmdlistcount; i++)
|
||||||
|
|
||||||
argptr = buffer + cmdlen + 1;
|
argptr = buffer + cmdlen + 1;
|
||||||
|
|
||||||
|
if (restrict_for_perl_test && cmd != CMD_PATTERN && cmd != CMD_SUBJECT)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** #%s is not allowed after #perltest\n", cmdname);
|
||||||
|
return PR_ABEND;
|
||||||
|
}
|
||||||
|
|
||||||
switch(cmd)
|
switch(cmd)
|
||||||
{
|
{
|
||||||
case CMD_UNKNOWN:
|
case CMD_UNKNOWN:
|
||||||
|
|
|
@ -6203,10 +6203,47 @@ ef) x/x,mark
|
||||||
/a(?:(*:X))(*SKIP:X)(*F)|(.)/
|
/a(?:(*:X))(*SKIP:X)(*F)|(.)/
|
||||||
abc
|
abc
|
||||||
|
|
||||||
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/no_start_optimize
|
#pattern no_start_optimize
|
||||||
|
|
||||||
|
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
|
||||||
abc
|
abc
|
||||||
|
|
||||||
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/no_start_optimize
|
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/
|
||||||
abc
|
abc
|
||||||
|
|
||||||
|
#subject mark
|
||||||
|
|
||||||
|
/a(*ACCEPT:X)b/
|
||||||
|
abc
|
||||||
|
|
||||||
|
/(?=a(*ACCEPT:QQ)bc)axyz/
|
||||||
|
axyz
|
||||||
|
|
||||||
|
/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/
|
||||||
|
abc
|
||||||
|
|
||||||
|
/a(*F:X)b/
|
||||||
|
abc
|
||||||
|
|
||||||
|
/(?(DEFINE)(a(*F:X)))(?1)b/
|
||||||
|
abc
|
||||||
|
|
||||||
|
/a(*COMMIT:X)b/
|
||||||
|
abc
|
||||||
|
|
||||||
|
/(?(DEFINE)(a(*COMMIT:X)))(?1)b/
|
||||||
|
abc
|
||||||
|
|
||||||
|
/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/
|
||||||
|
aaaabd
|
||||||
|
|
||||||
|
/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/
|
||||||
|
aaaabd
|
||||||
|
|
||||||
|
/a(*COMMIT:X)b/
|
||||||
|
axabc
|
||||||
|
|
||||||
|
#pattern -no_start_optimize
|
||||||
|
#subject -mark
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -2949,10 +2949,9 @@
|
||||||
|
|
||||||
/abc(*:)pqr/
|
/abc(*:)pqr/
|
||||||
|
|
||||||
/abc(*FAIL:123)xyz/
|
|
||||||
|
|
||||||
# This should, and does, fail. In Perl, it does not, which I think is a
|
# This should, and does, fail. In Perl, it does not, which I think is a
|
||||||
# bug because replacing the B in the pattern by (B|D) does make it fail.
|
# bug because replacing the B in the pattern by (B|D) does make it fail.
|
||||||
|
# Turning off Perl's optimization by inserting (??{""}) also makes it fail.
|
||||||
|
|
||||||
/A(*COMMIT)B/aftertext,mark
|
/A(*COMMIT)B/aftertext,mark
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
|
|
|
@ -9846,12 +9846,64 @@ No match
|
||||||
0: b
|
0: b
|
||||||
1: b
|
1: b
|
||||||
|
|
||||||
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/no_start_optimize
|
#pattern no_start_optimize
|
||||||
|
|
||||||
|
/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/
|
||||||
abc
|
abc
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/no_start_optimize
|
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/
|
||||||
abc
|
abc
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
|
#subject mark
|
||||||
|
|
||||||
|
/a(*ACCEPT:X)b/
|
||||||
|
abc
|
||||||
|
0: a
|
||||||
|
MK: X
|
||||||
|
|
||||||
|
/(?=a(*ACCEPT:QQ)bc)axyz/
|
||||||
|
axyz
|
||||||
|
0: axyz
|
||||||
|
MK: QQ
|
||||||
|
|
||||||
|
/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/
|
||||||
|
abc
|
||||||
|
0: ab
|
||||||
|
MK: X
|
||||||
|
|
||||||
|
/a(*F:X)b/
|
||||||
|
abc
|
||||||
|
No match, mark = X
|
||||||
|
|
||||||
|
/(?(DEFINE)(a(*F:X)))(?1)b/
|
||||||
|
abc
|
||||||
|
No match, mark = X
|
||||||
|
|
||||||
|
/a(*COMMIT:X)b/
|
||||||
|
abc
|
||||||
|
0: ab
|
||||||
|
MK: X
|
||||||
|
|
||||||
|
/(?(DEFINE)(a(*COMMIT:X)))(?1)b/
|
||||||
|
abc
|
||||||
|
0: ab
|
||||||
|
MK: X
|
||||||
|
|
||||||
|
/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/
|
||||||
|
aaaabd
|
||||||
|
0: bd
|
||||||
|
|
||||||
|
/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/
|
||||||
|
aaaabd
|
||||||
|
No match, mark = X
|
||||||
|
|
||||||
|
/a(*COMMIT:X)b/
|
||||||
|
axabc
|
||||||
|
No match, mark = X
|
||||||
|
|
||||||
|
#pattern -no_start_optimize
|
||||||
|
#subject -mark
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -10154,11 +10154,9 @@ Failed: error 166 at offset 10: (*MARK) must have an argument
|
||||||
/abc(*:)pqr/
|
/abc(*:)pqr/
|
||||||
Failed: error 166 at offset 6: (*MARK) must have an argument
|
Failed: error 166 at offset 6: (*MARK) must have an argument
|
||||||
|
|
||||||
/abc(*FAIL:123)xyz/
|
|
||||||
Failed: error 159 at offset 10: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)
|
|
||||||
|
|
||||||
# This should, and does, fail. In Perl, it does not, which I think is a
|
# This should, and does, fail. In Perl, it does not, which I think is a
|
||||||
# bug because replacing the B in the pattern by (B|D) does make it fail.
|
# bug because replacing the B in the pattern by (B|D) does make it fail.
|
||||||
|
# Turning off Perl's optimization by inserting (??{""}) also makes it fail.
|
||||||
|
|
||||||
/A(*COMMIT)B/aftertext,mark
|
/A(*COMMIT)B/aftertext,mark
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
|
|