Allow (*ACCEPT) to be quantified.
parent cc51779d88
commit 306f2b9c57
@@ -25,6 +25,9 @@ PCRE2_MATCH_INVALID_UTF compile-time option.
 7. Adjust the limit for "must have" code unit searching, in particular,
 increase it substantially for non-anchored patterns.
 
+8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
+minimum is potentially useful.
+
 
 Version 10.33 16-April-2019
 ---------------------------
@@ -3224,8 +3224,8 @@ The doubling is removed before the string is passed to the callout function.
 There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
 are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
-possibly behaving differently depending on whether or not a name is present.
-The names are not required to be unique within the pattern.
+and may behave differently depending on whether or not a name argument is
+present. The names are not required to be unique within the pattern.
 </P>
 <P>
 By default, for compatibility with Perl, a name is any sequence of characters
@@ -3253,7 +3253,8 @@ PCRE2_ALT_VERBNAMES is also set.
 The maximum length of a name is 255 in the 8-bit library and 65535 in the
 16-bit and 32-bit libraries. If the name is empty, that is, if the closing
 parenthesis immediately follows the colon, the effect is as if the colon were
-not there. Any number of these verbs may occur in a pattern.
+not there. Any number of these verbs may occur in a pattern. Except for
+(*ACCEPT), they may not be quantified.
 </P>
 <P>
 Since these verbs are specifically related to backtracking, most of them can be
@@ -3316,6 +3317,18 @@ This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
 the outer parentheses.
 </P>
 <P>
+(*ACCEPT) is the only backtracking verb that is allowed to be quantified
+because an ungreedy quantification with a minimum of zero acts only when a
+backtrack happens. Consider, for example,
+<pre>
+A(*ACCEPT)??BC
+</pre>
+where A, B, and C may be complex expressions. After matching "A", the matcher
+processes "BC"; if that fails, causing a backtrack, (*ACCEPT) is triggered and
+the match succeeds. Whereas (*COMMIT) (see below) means "fail on backtrack", a
+repeated (*ACCEPT) of this type means "succeed on backtrack".
+</P>
+<P>
 <b>Warning:</b> (*ACCEPT) should not be used within a script run group, because
 it causes an immediate exit from the group, bypassing the script run checking.
 <pre>
@@ -3333,8 +3346,9 @@ A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 </P>
 <P>
-(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT) and
-(*MARK:NAME)(*FAIL), respectively.
+(*ACCEPT:NAME) and (*FAIL:NAME) behave the same as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL), respectively, that is, a (*MARK) is recorded just before
+the verb acts.
 </P>
 <br><b>
 Recording which path was taken
@@ -3728,7 +3742,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 May 2019
+Last updated: 10 June 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>

301
doc/pcre2.txt
@@ -8947,8 +8947,8 @@ BACKTRACKING CONTROL
 There are a number of special "Backtracking Control Verbs" (to use
 Perl's terminology) that modify the behaviour of backtracking during
 matching. They are generally of the form (*VERB) or (*VERB:NAME). Some
-verbs take either form, possibly behaving differently depending on
-whether or not a name is present. The names are not required to be
+verbs take either form, and may behave differently depending on whether
+or not a name argument is present. The names are not required to be
 unique within the pattern.
 
 By default, for compatibility with Perl, a name is any sequence of
@@ -8975,7 +8975,7 @@ BACKTRACKING CONTROL
 the 16-bit and 32-bit libraries. If the name is empty, that is, if the
 closing parenthesis immediately follows the colon, the effect is as if
 the colon were not there. Any number of these verbs may occur in a pat-
-tern.
+tern. Except for (*ACCEPT), they may not be quantified.
 
 Since these verbs are specifically related to backtracking, most of
 them can be used only when the pattern is to be matched using the tra-
@@ -9025,6 +9025,18 @@ BACKTRACKING CONTROL
 This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
 tured by the outer parentheses.
 
+(*ACCEPT) is the only backtracking verb that is allowed to be quanti-
+fied because an ungreedy quantification with a minimum of zero acts
+only when a backtrack happens. Consider, for example,
+
+A(*ACCEPT)??BC
+
+where A, B, and C may be complex expressions. After matching "A", the
+matcher processes "BC"; if that fails, causing a backtrack, (*ACCEPT)
+is triggered and the match succeeds. Whereas (*COMMIT) (see below)
+means "fail on backtrack", a repeated (*ACCEPT) of this type means
+"succeed on backtrack".
+
 Warning: (*ACCEPT) should not be used within a script run group,
 because it causes an immediate exit from the group, bypassing the
 script run checking.
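The new text describes (*ACCEPT)?? as "succeed on backtrack". One way to see this is a toy backtracking matcher in continuation-passing style. The Python sketch below is hypothetical and is not PCRE2's implementation or API: `lit`, `seq`, `lazy_accept` and `match_at` are invented helpers, A, B and C are single literal characters, and only one anchored match attempt is made. Each node calls the continuation `k` to match the rest of the pattern; the lazy (*ACCEPT) acts only if that continuation fails.

```python
# Toy model only (not PCRE2): a node is m(s, i, k), where k(j) tries the rest
# of the pattern from offset j and returns the overall match end, or None.


def lit(text):
    """Match a literal string, then continue with the rest of the pattern."""
    def m(s, i, k):
        return k(i + len(text)) if s.startswith(text, i) else None
    return m


def seq(*parts):
    """Match each part in turn; a later failure returns here to backtrack."""
    def m(s, i, k):
        if not parts:
            return k(i)
        rest = seq(*parts[1:])
        return parts[0](s, i, lambda j: rest(s, j, k))
    return m


def lazy_accept():
    """(*ACCEPT)??: try zero repetitions first; run (*ACCEPT) only on backtrack."""
    def m(s, i, k):
        end = k(i)        # zero repetitions: just carry on with the rest
        if end is not None:
            return end
        return i          # the rest failed: (*ACCEPT) ends the match right here
    return m


def match_at(pattern, s, start=0):
    """One anchored attempt; returns (start, end) of the match or None."""
    end = pattern(s, start, lambda j: j)
    return None if end is None else (start, end)


pattern = seq(lit("A"), lazy_accept(), lit("B"), lit("C"))   # A(*ACCEPT)??BC
print(match_at(pattern, "ABC"))  # (0, 3): "BC" matched, (*ACCEPT) never acted
print(match_at(pattern, "ABX"))  # (0, 1): "BC" failed, so the match is just "A"
print(match_at(pattern, "XBC"))  # None: "A" failed before (*ACCEPT) was reached
```

The toy only models (*ACCEPT) at the top level of a pattern; it says nothing about assertions, subroutine calls or captured substrings.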
|
@@ -9043,31 +9055,32 @@ BACKTRACKING CONTROL
 A match with the string "aaaa" always fails, but the callout is taken
 before each backtrack happens (in this example, 10 times).
 
-(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT)
-and (*MARK:NAME)(*FAIL), respectively.
+(*ACCEPT:NAME) and (*FAIL:NAME) behave the same as
+(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively, that is, a
+(*MARK) is recorded just before the verb acts.
 
 Recording which path was taken
 
 There is one verb whose main purpose is to track how a match was
 arrived at, though it also has a secondary use in conjunction with
 advancing the match starting point (see (*SKIP) below).
 
 (*MARK:NAME) or (*:NAME)
 
 A name is always required with this verb. For all the other backtrack-
 ing control verbs, a NAME argument is optional.
 
 When a match succeeds, the name of the last-encountered mark name on
 the matching path is passed back to the caller as described in the sec-
 tion entitled "Other information about the match" in the pcre2api docu-
 mentation. This applies to all instances of (*MARK) and other verbs,
 including those inside assertions and atomic groups. However, there are
 differences in those cases when (*MARK) is used in conjunction with
 (*SKIP) as described below.
 
 The mark name that was last encountered on the matching path is passed
 back. A verb without a NAME argument is ignored for this purpose. Here
 is an example of pcre2test output, where the "mark" modifier requests
 the retrieval and outputting of (*MARK) data:
 
 re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
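The (*MARK) rules described in this part of the text (the name on the successful path is passed back after a match, while the last name encountered anywhere is passed back after a failure) can be sketched with a toy matcher. This Python model is hypothetical, not PCRE2 or its API: the helper names are invented, each continuation carries the mark set on the current path, and a shared `state` dict records the last mark passed at all. It has no start-of-match optimizations, so every starting position is tried.

```python
# Toy model only (not PCRE2): a node is m(s, i, mk, state, k), where mk is the
# mark on the current path and k(j, mk) tries the rest of the pattern.


def lit(text):
    """Match a literal string, passing the current path's mark along."""
    def m(s, i, mk, state, k):
        return k(i + len(text), mk) if s.startswith(text, i) else None
    return m


def mark(name):
    """(*MARK:NAME): set the mark for this path and note it as last seen."""
    def m(s, i, mk, state, k):
        state["last_seen"] = name
        return k(i, name)
    return m


def seq(*parts):
    """Match each part in turn, threading the mark through."""
    def m(s, i, mk, state, k):
        if not parts:
            return k(i, mk)
        rest = seq(*parts[1:])
        return parts[0](s, i, mk, state,
                        lambda j, mk2: rest(s, j, mk2, state, k))
    return m


def alt(*branches):
    """Try each alternative in order; the first that succeeds wins."""
    def m(s, i, mk, state, k):
        for branch in branches:
            result = branch(s, i, mk, state, k)
            if result is not None:
                return result
        return None
    return m


def search(pattern, s):
    """Unanchored search; returns (matched_text, mark) or (None, last_mark)."""
    state = {"last_seen": None}
    for start in range(len(s) + 1):      # plain bumpalong, no optimizations
        result = pattern(s, start, None, state, lambda j, mk: (j, mk))
        if result is not None:
            end, mk = result
            return s[start:end], mk      # mark on the successful path
    return None, state["last_seen"]      # last mark seen anywhere


# /X(*MARK:A)Y|X(*MARK:B)Z/
pattern = alt(seq(lit("X"), mark("A"), lit("Y")),
              seq(lit("X"), mark("B"), lit("Z")))
print(search(pattern, "XZ"))   # ('XZ', 'B')
print(search(pattern, "XP"))   # (None, 'B')
```

These two results should agree with the pcre2test outputs quoted in the documentation: mark B for the match ending in "Z", and "No match, mark = B" for the subject "XP".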
@@ -9079,76 +9092,76 @@ BACKTRACKING CONTROL
 MK: B
 
 The (*MARK) name is tagged with "MK:" in this output, and in this exam-
 ple it indicates which of the two alternatives matched. This is a more
 efficient way of obtaining this information than putting each alterna-
 tive in its own capturing parentheses.
 
 If a verb with a name is encountered in a positive assertion that is
 true, the name is recorded and passed back if it is the last-encoun-
 tered. This does not happen for negative assertions or failing positive
 assertions.
 
 After a partial match or a failed match, the last encountered name in
 the entire match process is returned. For example:
 
 re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
 data> XP
 No match, mark = B
 
 Note that in this unanchored example the mark is retained from the
 match attempt that started at the letter "X" in the subject. Subsequent
 match attempts starting at "P" and then with an empty string do not get
 as far as the (*MARK) item, but nevertheless do not reset it.
 
 If you are interested in (*MARK) values after failed matches, you
 should probably set the PCRE2_NO_START_OPTIMIZE option (see above) to
 ensure that the match is always attempted.
 
 Verbs that act after backtracking
 
 The following verbs do nothing when they are encountered. Matching con-
 tinues with what follows, but if there is a subsequent match failure,
 causing a backtrack to the verb, a failure is forced. That is, back-
 tracking cannot pass to the left of the verb. However, when one of
 these verbs appears inside an atomic group or in a lookaround assertion
 that is true, its effect is confined to that group, because once the
 group has been matched, there is never any backtracking into it. Back-
 tracking from beyond an assertion or an atomic group ignores the entire
 group, and seeks a preceding backtracking point.
 
 These verbs differ in exactly what kind of failure occurs when back-
 tracking reaches them. The behaviour described below is what happens
 when the verb is not in a subroutine or an assertion. Subsequent sec-
 tions cover these special cases.
 
 (*COMMIT) or (*COMMIT:NAME)
 
 This verb causes the whole match to fail outright if there is a later
 matching failure that causes backtracking to reach it. Even if the pat-
 tern is unanchored, no further attempts to find a match by advancing
 the starting point take place. If (*COMMIT) is the only backtracking
 verb that is encountered, once it has been passed pcre2_match() is com-
 mitted to finding a match at the current starting point, or not at all.
 For example:
 
 a+(*COMMIT)b
 
 This matches "xxaab" but not "aacaab". It can be thought of as a kind
 of dynamic anchor, or "I've started, so I must finish."
 
 The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COM-
 MIT). It is like (*MARK:NAME) in that the name is remembered for pass-
 ing back to the caller. However, (*SKIP:NAME) searches only for names
 that are set with (*MARK), ignoring those set by any of the other back-
 tracking verbs.
 
 If there is more than one backtracking verb in a pattern, a different
 one that follows (*COMMIT) may be triggered first, so merely passing
 (*COMMIT) during a match does not always guarantee that a match must be
 at this starting point.
 
 Note that (*COMMIT) at the start of a pattern is not the same as an
 anchor, unless PCRE2's start-of-match optimizations are turned off, as
 shown in this output from pcre2test:
 
 re> /(*COMMIT)abc/
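The "fail on backtrack" behaviour of (*COMMIT), and why (*COMMIT) at the start of a pattern is not an anchor when start-of-match optimizations are active, can be illustrated with a toy backtracking matcher. The Python sketch is hypothetical, not PCRE2's implementation: the helper names are invented, (*COMMIT) is modelled as an exception raised when the rest of the pattern fails, and the `first` parameter of `search` is only a crude stand-in for PCRE2 skipping ahead to the first possible "a".

```python
# Toy model only (not PCRE2): a node is m(s, i, k), where k(j) tries the rest
# of the pattern from offset j and returns the overall match end, or None.


class Commit(Exception):
    """Raised when backtracking reaches (*COMMIT): the whole match fails."""


def lit(text):
    def m(s, i, k):
        return k(i + len(text)) if s.startswith(text, i) else None
    return m


def plus(ch):
    """Greedy ch+: take as many as possible, then give back one at a time."""
    def m(s, i, k):
        j = i
        while j < len(s) and s[j] == ch:
            j += 1
        while j > i:                  # at least one ch must be matched
            end = k(j)
            if end is not None:
                return end
            j -= 1                    # backtrack: try one fewer
        return None
    return m


def commit():
    """(*COMMIT): does nothing going forward; aborts everything on backtrack."""
    def m(s, i, k):
        end = k(i)
        if end is None:
            raise Commit()
        return end
    return m


def seq(*parts):
    def m(s, i, k):
        if not parts:
            return k(i)
        rest = seq(*parts[1:])
        return parts[0](s, i, lambda j: rest(s, j, k))
    return m


def search(pattern, s, first=0):
    """Unanchored search from offset `first`; returns (start, end) or None."""
    for start in range(first, len(s) + 1):
        try:
            end = pattern(s, start, lambda j: j)
        except Commit:
            return None               # no further starting points are tried
        if end is not None:
            return start, end
    return None


p1 = seq(plus("a"), commit(), lit("b"))      # a+(*COMMIT)b
print(search(p1, "xxaab"))                   # (2, 5)
print(search(p1, "aacaab"))                  # None

p2 = seq(commit(), lit("abc"))               # (*COMMIT)abc
print(search(p2, "xyzabc"))                  # None: commits at "x", then fails
print(search(p2, "xyzabc", first=3))         # (3, 6): as if skipped to "a"
```

Without the `commit()` node, `search(seq(plus("a"), lit("b")), "aacaab")` goes on to find `(3, 6)`, which is exactly the later attempt that (*COMMIT) rules out.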
@@ -9159,63 +9172,63 @@ BACKTRACKING CONTROL
 data> xyzabc
 No match
 
 For the first pattern, PCRE2 knows that any match must start with "a",
 so the optimization skips along the subject to "a" before applying the
 pattern to the first set of data. The match attempt then succeeds. The
 second pattern disables the optimization that skips along to the first
 character. The pattern is now applied starting at "x", and so the
 (*COMMIT) causes the match to fail without trying any other starting
 points.
 
 (*PRUNE) or (*PRUNE:NAME)
 
 This verb causes the match to fail at the current starting position in
 the subject if there is a later matching failure that causes backtrack-
 ing to reach it. If the pattern is unanchored, the normal "bumpalong"
 advance to the next starting character then happens. Backtracking can
 occur as usual to the left of (*PRUNE), before it is reached, or when
 matching to the right of (*PRUNE), but if there is no match to the
 right, backtracking cannot cross (*PRUNE). In simple cases, the use of
 (*PRUNE) is just an alternative to an atomic group or possessive quan-
 tifier, but there are some uses of (*PRUNE) that cannot be expressed in
 any other way. In an anchored pattern (*PRUNE) has the same effect as
 (*COMMIT).
 
 The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
 It is like (*MARK:NAME) in that the name is remembered for passing back
 to the caller. However, (*SKIP:NAME) searches only for names set with
 (*MARK), ignoring those set by other backtracking verbs.
 
 (*SKIP)
 
 This verb, when given without a name, is like (*PRUNE), except that if
 the pattern is unanchored, the "bumpalong" advance is not to the next
 character, but to the position in the subject where (*SKIP) was encoun-
 tered. (*SKIP) signifies that whatever text was matched leading up to
 it cannot be part of a successful match if there is a later mismatch.
 Consider:
 
 a+(*SKIP)b
 
 If the subject is "aaaac...", after the first match attempt fails
 (starting at the first character in the string), the starting point
 skips on to start the next attempt at "c". Note that a possessive quan-
 tifer does not have the same effect as this example; although it would
 suppress backtracking during the first match attempt, the second
 attempt would start at the second character instead of skipping on to
 "c".
 
 (*SKIP:NAME)
 
 When (*SKIP) has an associated name, its behaviour is modified. When
 such a (*SKIP) is triggered, the previous path through the pattern is
 searched for the most recent (*MARK) that has the same name. If one is
 found, the "bumpalong" advance is to the subject position that corre-
 sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
 no (*MARK) with a matching name is found, the (*SKIP) is ignored.
 
 The search for a (*MARK) name uses the normal backtracking mechanism,
 which means that it does not see (*MARK) settings that are inside
 atomic groups or assertions, because they are never re-entered by back-
 tracking. Compare the following pcre2test examples:
 
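(*PRUNE) and (*SKIP) differ only in where the next match attempt starts. The hypothetical Python toy matcher below records every starting offset it tries for a+(*PRUNE)b and a+(*SKIP)b against "aaaacaab". It is a sketch, not PCRE2's implementation: the helper names are invented, both verbs are modelled as exceptions raised when the rest of the pattern fails, and when (*SKIP) would not move the start forward the toy simply bumps along by one (an assumption made here to avoid looping).

```python
# Toy model only (not PCRE2): a node is m(s, i, k), where k(j) tries the rest
# of the pattern from offset j and returns the overall match end, or None.


class Prune(Exception):
    """Backtracking reached (*PRUNE): fail at this start, bump along by one."""


class Skip(Exception):
    """Backtracking reached (*SKIP): fail at this start, resume at `pos`."""

    def __init__(self, pos):
        super().__init__(pos)
        self.pos = pos


def lit(text):
    def m(s, i, k):
        return k(i + len(text)) if s.startswith(text, i) else None
    return m


def plus(ch):
    """Greedy ch+: take as many as possible, then give back one at a time."""
    def m(s, i, k):
        j = i
        while j < len(s) and s[j] == ch:
            j += 1
        while j > i:
            end = k(j)
            if end is not None:
                return end
            j -= 1
        return None
    return m


def seq(*parts):
    def m(s, i, k):
        if not parts:
            return k(i)
        rest = seq(*parts[1:])
        return parts[0](s, i, lambda j: rest(s, j, k))
    return m


def verb(exc_factory):
    """A verb that does nothing going forward and raises on backtrack."""
    def m(s, i, k):
        end = k(i)
        if end is None:
            raise exc_factory(i)      # i is where the verb was passed
        return end
    return m


prune = lambda: verb(lambda i: Prune())
skip = lambda: verb(Skip)


def search(pattern, s):
    """Unanchored search; returns ((start, end) or None, starts_tried)."""
    tried, start = [], 0
    while start <= len(s):
        tried.append(start)
        try:
            end = pattern(s, start, lambda j: j)
        except Prune:
            end = None
        except Skip as e:
            if e.pos > start:         # resume where (*SKIP) was passed
                start = e.pos
                continue
            end = None                # no progress: ordinary bumpalong
        if end is not None:
            return (start, end), tried
        start += 1
    return None, tried


subject = "aaaacaab"
print(search(seq(plus("a"), prune(), lit("b")), subject))
# ((5, 8), [0, 1, 2, 3, 4, 5])
print(search(seq(plus("a"), skip(), lit("b")), subject))
# ((5, 8), [0, 4, 5])
```

Both versions find the same match, but the (*SKIP) version resumes at offset 4 (the "c") instead of retrying offsets 1 to 3, which is the saving the text describes.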
@@ -9229,105 +9242,105 @@ BACKTRACKING CONTROL
 0: b
 1: b
 
 In the first example, the (*MARK) setting is in an atomic group, so it
 is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
 This allows the second branch of the pattern to be tried at the first
 character position. In the second example, the (*MARK) setting is not
 in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it
 backtracks, and this causes a new matching attempt to start at the sec-
 ond character. This time, the (*MARK) is never seen because "a" does
 not match "b", so the matcher immediately jumps to the second branch of
 the pattern.
 
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
 ignores names that are set by other backtracking verbs.
 
 (*THEN) or (*THEN:NAME)
 
 This verb causes a skip to the next innermost alternative when back-
 tracking reaches it. That is, it cancels any further backtracking
 within the current alternative. Its name comes from the observation
 that it can be used for a pattern-based if-then-else block:
 
 ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
 
 If the COND1 pattern matches, FOO is tried (and possibly further items
 after the end of the group if FOO succeeds); on failure, the matcher
 skips to the second alternative and tries COND2, without backtracking
 into COND1. If that succeeds and BAR fails, COND3 is tried. If subse-
 quently BAZ fails, there are no more alternatives, so there is a back-
 track to whatever came before the entire group. If (*THEN) is not
 inside an alternation, it acts like (*PRUNE).
 
 The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN).
 It is like (*MARK:NAME) in that the name is remembered for passing back
 to the caller. However, (*SKIP:NAME) searches only for names set with
 (*MARK), ignoring those set by other backtracking verbs.
 
 A group that does not contain a | character is just a part of the
 enclosing alternative; it is not a nested alternation with only one
 alternative. The effect of (*THEN) extends beyond such a group to the
 enclosing alternative. Consider this pattern, where A, B, etc. are
 complex pattern fragments that do not contain any | characters at this
 level:
 
 A (B(*THEN)C) | D
 
 If A and B are matched, but there is a failure in C, matching does not
 backtrack into A; instead it moves to the next alternative, that is, D.
 However, if the group containing (*THEN) is given an alternative, it
 behaves differently:
 
 A (B(*THEN)C | (*FAIL)) | D
 
 The effect of (*THEN) is now confined to the inner group. After a fail-
 ure in C, matching moves to (*FAIL), which causes the whole group to
 fail because there are no more alternatives to try. In this case,
 matching does backtrack into A.
 
 Note that a conditional group is not considered as having two alterna-
 tives, because only one is ever used. In other words, the | character
 in a conditional group has a different meaning. Ignoring white space,
 consider:
 
 ^.*? (?(?=a) a | b(*THEN)c )
 
 If the subject is "ba", this pattern does not match. Because .*? is
 ungreedy, it initially matches zero characters. The condition (?=a)
 then fails, the character "b" is matched, but "c" is not. At this
 point, matching does not backtrack to .*? as might perhaps be expected
 from the presence of the | character. The conditional group is part of
 the single alternative that comprises the whole pattern, and so the
 match fails. (If there was a backtrack into .*?, allowing it to match
 "b", the match would succeed.)
 
 The verbs just described provide four different "strengths" of control
 when subsequent matching fails. (*THEN) is the weakest, carrying on the
 match at the next alternative. (*PRUNE) comes next, failing the match
 at the current starting position, but allowing an advance to the next
|
at the current starting position, but allowing an advance to the next
|
||||||
character (for an unanchored pattern). (*SKIP) is similar, except that
|
character (for an unanchored pattern). (*SKIP) is similar, except that
|
||||||
the advance may be more than one character. (*COMMIT) is the strongest,
|
the advance may be more than one character. (*COMMIT) is the strongest,
|
||||||
causing the entire match to fail.
|
causing the entire match to fail.
|
||||||
|
|
||||||
More than one backtracking verb
|
More than one backtracking verb
|
||||||
|
|
||||||
If more than one backtracking verb is present in a pattern, the one
|
If more than one backtracking verb is present in a pattern, the one
|
||||||
that is backtracked onto first acts. For example, consider this pat-
|
that is backtracked onto first acts. For example, consider this pat-
|
||||||
tern, where A, B, etc. are complex pattern fragments:
|
tern, where A, B, etc. are complex pattern fragments:
|
||||||
|
|
||||||
(A(*COMMIT)B(*THEN)C|ABD)
|
(A(*COMMIT)B(*THEN)C|ABD)
|
||||||
|
|
||||||
If A matches but B fails, the backtrack to (*COMMIT) causes the entire
|
If A matches but B fails, the backtrack to (*COMMIT) causes the entire
|
||||||
match to fail. However, if A and B match, but C fails, the backtrack to
|
match to fail. However, if A and B match, but C fails, the backtrack to
|
||||||
(*THEN) causes the next alternative (ABD) to be tried. This behaviour
|
(*THEN) causes the next alternative (ABD) to be tried. This behaviour
|
||||||
is consistent, but is not always the same as Perl's. It means that if
|
is consistent, but is not always the same as Perl's. It means that if
|
||||||
two or more backtracking verbs appear in succession, all the the last
|
two or more backtracking verbs appear in succession, all the the last
|
||||||
of them has no effect. Consider this example:
|
of them has no effect. Consider this example:
|
||||||
|
|
||||||
...(*COMMIT)(*PRUNE)...
|
...(*COMMIT)(*PRUNE)...
|
||||||
|
|
||||||
If there is a matching failure to the right, backtracking onto (*PRUNE)
|
If there is a matching failure to the right, backtracking onto (*PRUNE)
|
||||||
causes it to be triggered, and its action is taken. There can never be
|
causes it to be triggered, and its action is taken. There can never be
|
||||||
a backtrack onto (*COMMIT).
|
a backtrack onto (*COMMIT).
|
||||||
|
|
||||||
Backtracking verbs in repeated groups
|
Backtracking verbs in repeated groups
|
||||||
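The two rules in this part of the text, the ordering of (*THEN), (*PRUNE), (*SKIP) and (*COMMIT) by strength, and the rule that the verb backtracked onto first is the one that acts, can be captured in a small C model. This is a hypothetical sketch, not PCRE2 code: the `verb` enum and the `verb_action()` and `acting_verb()` functions are invented for illustration, and the model tracks only which verbs have been passed, not any real matching state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the four backtracking-control verbs, declared
   from weakest to strongest. This is not part of the PCRE2 API. */
typedef enum { VERB_THEN, VERB_PRUNE, VERB_SKIP, VERB_COMMIT } verb;

/* What a verb does when a later failure backtracks onto it. */
const char *verb_action(verb v)
{
  switch (v)
    {
    case VERB_THEN:   return "try the next alternative";
    case VERB_PRUNE:  return "fail at this start, advance one character";
    case VERB_SKIP:   return "fail at this start, possibly advance further";
    case VERB_COMMIT: return "fail the entire match";
    }
  return "unknown verb";
}

/* Verbs are passed left to right while matching forwards, so a backtrack
   reaches the most recently passed one first. That verb acts, and the
   verbs before it are never reached. */
verb acting_verb(const verb *passed, size_t n)
{
  assert(passed != NULL && n > 0);
  return passed[n - 1];
}
```

For (A(*COMMIT)B(*THEN)C|ABD), a failure in B has passed only (*COMMIT), so it acts; a failure in C has passed both verbs, so (*THEN) acts. For ...(*COMMIT)(*PRUNE)..., (*PRUNE) is always reached first, which is why (*COMMIT) never acts.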
@@ -9337,42 +9350,42 @@ BACKTRACKING CONTROL

 /(a(*COMMIT)b)+ac/

 If the subject is "abac", Perl matches unless its optimizations are
 disabled, but PCRE2 always fails because the (*COMMIT) in the second
 repeat of the group acts.

 Backtracking verbs in assertions

 (*FAIL) in any assertion has its normal effect: it forces an immediate
 backtrack. The behaviour of the other backtracking verbs depends on
 whether the assertion is standalone or is acting as the condition
 in a conditional group.

 (*ACCEPT) in a standalone positive assertion causes the assertion to
 succeed without any further processing; captured strings and a mark
 name (if set) are retained. In a standalone negative assertion,
 (*ACCEPT) causes the assertion to fail without any further processing;
 captured substrings and any mark name are discarded.

 If the assertion is a condition, (*ACCEPT) causes the condition to be
 true for a positive assertion and false for a negative one; captured
 substrings are retained in both cases.

 The remaining verbs act only when a later failure causes a backtrack to
 reach them. This means that their effect is confined to the assertion,
 because lookaround assertions are atomic. A backtrack that occurs after
 an assertion is complete does not jump back into the assertion. Note in
 particular that a (*MARK) name that is set in an assertion is not
 "seen" by an instance of (*SKIP:NAME) later in the pattern.

 The effect of (*THEN) is not allowed to escape beyond an assertion. If
 there are no more branches to try, (*THEN) causes a positive assertion
 to be false, and a negative assertion to be true.

 The other backtracking verbs are not treated specially if they appear
 in a standalone positive assertion. In a conditional positive assertion,
 backtracking (from within the assertion) into (*COMMIT), (*SKIP),
 or (*PRUNE) causes the condition to be false. However, for both standalone
 and conditional negative assertions, backtracking into (*COMMIT),
 (*SKIP), or (*PRUNE) causes the assertion to be true, without considering
 any further alternative branches.
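The rules for (*ACCEPT) inside a lookaround assertion reduce to a small truth table over two inputs: whether the assertion is negative, and whether it is used as a condition. The C sketch below encodes that table as the text describes it. `accept_outcome` and `accept_in_assertion()` are hypothetical names, and mark names are left out because the text states their fate only for standalone assertions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what happens when (*ACCEPT) is reached inside a
   lookaround assertion. The names are illustrative, not PCRE2 API. */
typedef struct {
  bool assertion_true;     /* does the assertion (or condition) hold? */
  bool captures_retained;  /* are substrings captured inside it kept? */
} accept_outcome;

accept_outcome accept_in_assertion(bool negative, bool is_condition)
{
  accept_outcome r;
  /* (*ACCEPT) ends the inner match successfully, so a positive assertion
     holds and a negative one does not, standalone or as a condition. */
  r.assertion_true = !negative;
  /* Captures are discarded only by a standalone negative assertion. */
  r.captures_retained = is_condition || !negative;
  /* Invariant of the table: an assertion that holds keeps its captures. */
  assert(!r.assertion_true || r.captures_retained);
  return r;
}
```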
@@ -9382,26 +9395,26 @@ BACKTRACKING CONTROL
 These behaviours occur whether or not the group is called recursively.

 (*ACCEPT) in a group called as a subroutine causes the subroutine match
 to succeed without any further processing. Matching then continues
 after the subroutine call. Perl documents this behaviour. Perl's treatment
 of the other verbs in subroutines is different in some cases.

 (*FAIL) in a group called as a subroutine has its normal effect: it
 forces an immediate backtrack.

 (*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail
 when triggered by being backtracked to in a group called as a subroutine.
 There is then a backtrack at the outer level.

 (*THEN), when triggered, skips to the next alternative in the innermost
 enclosing group that has alternatives (its normal behaviour). However,
 if there is no such group within the subroutine's group, the subroutine
 match fails and there is a backtrack at the outer level.


 SEE ALSO

 pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3),
 pcre2(3).


@@ -9414,7 +9427,7 @@ AUTHOR

 REVISION

-Last updated: 23 May 2019
+Last updated: 10 June 2019
 Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "23 May 2019" "PCRE2 10.34"
+.TH PCRE2PATTERN 3 "10 June 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3262,8 +3262,8 @@ The doubling is removed before the string is passed to the callout function.
 There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
 are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
-possibly behaving differently depending on whether or not a name is present.
-The names are not required to be unique within the pattern.
+and may behave differently depending on whether or not a name argument is
+present. The names are not required to be unique within the pattern.
 .P
 By default, for compatibility with Perl, a name is any sequence of characters
 that does not include a closing parenthesis. The name is not processed in
@@ -3287,7 +3287,8 @@ PCRE2_ALT_VERBNAMES is also set.
 The maximum length of a name is 255 in the 8-bit library and 65535 in the
 16-bit and 32-bit libraries. If the name is empty, that is, if the closing
 parenthesis immediately follows the colon, the effect is as if the colon were
-not there. Any number of these verbs may occur in a pattern.
+not there. Any number of these verbs may occur in a pattern. Except for
+(*ACCEPT), they may not be quantified.
 .P
 Since these verbs are specifically related to backtracking, most of them can be
 used only when the pattern is to be matched using the traditional matching
@@ -3361,6 +3362,17 @@ example:
 This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
 the outer parentheses.
 .P
+(*ACCEPT) is the only backtracking verb that is allowed to be quantified
+because an ungreedy quantification with a minimum of zero acts only when a
+backtrack happens. Consider, for example,
+.sp
+A(*ACCEPT)??BC
+.sp
+where A, B, and C may be complex expressions. After matching "A", the matcher
+processes "BC"; if that fails, causing a backtrack, (*ACCEPT) is triggered and
+the match succeeds. Whereas (*COMMIT) (see below) means "fail on backtrack", a
+repeated (*ACCEPT) of this type means "succeed on backtrack".
+.P
 \fBWarning:\fP (*ACCEPT) should not be used within a script run group, because
 it causes an immediate exit from the group, bypassing the script run checking.
 .sp
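To make "succeed on backtrack" concrete, the sketch below hand-codes the backtracking order of A(*ACCEPT)??BC, with A, B and C reduced to the single characters a, b and c, next to A(*COMMIT)BC for contrast. It is an illustrative model under those assumptions, not PCRE2's matcher: `match_a_verb_bc()` and `enum tail_verb` are hypothetical, and matching is anchored at the start of the subject.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, hand-written model of two patterns with A, B and C reduced
   to the literal characters a, b and c:
     TAIL_ACCEPT_LAZY  models  A(*ACCEPT)??BC
     TAIL_COMMIT       models  A(*COMMIT)BC
   Matching is anchored at the start of the subject. The function returns
   the match length, or -1 for no match. It mirrors only the order in
   which a backtracking matcher tries things; it is not PCRE2 code. */
enum tail_verb { TAIL_ACCEPT_LAZY, TAIL_COMMIT };

int match_a_verb_bc(const char *subject, enum tail_verb verb)
{
  assert(subject != NULL);
  size_t len = strlen(subject);

  if (len < 1 || subject[0] != 'a') return -1;   /* A must match first */

  /* A lazy (*ACCEPT)?? with minimum zero is skipped on the first pass,
     and (*COMMIT) does nothing when passed, so BC is tried next. */
  if (len >= 3 && subject[1] == 'b' && subject[2] == 'c') return 3;

  /* BC failed: the matcher backtracks onto the verb. */
  if (verb == TAIL_ACCEPT_LAZY) return 1;   /* succeed on backtrack: "a" */
  return -1;                                /* fail on backtrack */
}
```

With TAIL_ACCEPT_LAZY, "abc" matches all three characters and "axy" matches just "a", which agrees with the results recorded for /a(*ACCEPT)??bc/ in the testoutput2 hunk of this commit.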
@@ -3377,8 +3389,9 @@ nearest equivalent is the callout feature, as for example in this pattern:
 A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 .P
-(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT) and
-(*MARK:NAME)(*FAIL), respectively.
+(*ACCEPT:NAME) and (*FAIL:NAME) behave the same as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL), respectively; that is, a (*MARK) is recorded just before
+the verb acts.
 .
 .
 .SS "Recording which path was taken"
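The equivalence between (*ACCEPT:NAME) and (*MARK:NAME)(*ACCEPT) can be pictured as rewriting one token into two. The C sketch below assumes a hypothetical `token` type and `expand_named_verb()` function; PCRE2's real parsed-pattern encoding is different. It also applies the rule, stated in the pcre2pattern hunk above, that an empty name behaves as if the colon were absent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token representation, for illustration only; PCRE2's
   parsed pattern uses a different encoding. */
typedef enum { TOK_MARK, TOK_ACCEPT, TOK_FAIL } tok_kind;
typedef struct { tok_kind kind; const char *name; } token;

/* Rewrite (*ACCEPT:NAME) or (*FAIL:NAME) as (*MARK:NAME) followed by the
   bare verb, so that the mark is recorded just before the verb acts. An
   empty or absent name is treated as if there were no colon, so only
   the bare verb is produced. Returns the number of tokens written. */
size_t expand_named_verb(tok_kind verb, const char *name, token out[2])
{
  size_t n = 0;
  assert(out != NULL && (verb == TOK_ACCEPT || verb == TOK_FAIL));
  if (name != NULL && name[0] != '\0')
    {
    out[n].kind = TOK_MARK;
    out[n].name = name;
    n++;
    }
  out[n].kind = verb;
  out[n].name = NULL;
  return n + 1;
}
```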
@@ -3764,6 +3777,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 May 2019
+Last updated: 10 June 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi

@@ -1419,9 +1419,6 @@ the result is "not a repeat quantifier". */
 EXIT:
 if (yield || *errorcodeptr != 0) *ptrptr = p;
 return yield;
-
-
-
 }


@@ -2450,8 +2447,9 @@ must be last. */

 enum { RANGE_NO, RANGE_STARTED, RANGE_OK_ESCAPED, RANGE_OK_LITERAL };

-/* Only in 32-bit mode can there be literals > META_END. A macros encapsulates
-the storing of literal values in the parsed pattern. */
+/* Only in 32-bit mode can there be literals > META_END. A macro encapsulates
+the storing of literal values in the main parsed pattern, where they can always
+be quantified. */

 #if PCRE2_CODE_UNIT_WIDTH == 32
 #define PARSED_LITERAL(c, p) \
@@ -2474,6 +2472,7 @@ uint32_t delimiter;
 uint32_t namelen;
 uint32_t class_range_state;
 uint32_t *verblengthptr = NULL; /* Value avoids compiler warning */
+uint32_t *verbstartptr = NULL;
 uint32_t *previous_callout = NULL;
 uint32_t *parsed_pattern = cb->parsed_pattern;
 uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
@@ -2640,13 +2639,15 @@ while (ptr < ptrend)

 switch(c)
 {
-default:
-PARSED_LITERAL(c, parsed_pattern);
+default: /* Don't use PARSED_LITERAL() because it */
+#if PCRE2_CODE_UNIT_WIDTH == 32 /* sets okquantifier. */
+if (c >= META_END) *parsed_pattern++ = META_BIGVALUE;
+#endif
+*parsed_pattern++ = c;
 break;

 case CHAR_RIGHT_PARENTHESIS:
 inverbname = FALSE;
-okquantifier = FALSE; /* Was probably set by literals */
 /* This is the length in characters */
 verbnamelength = (PCRE2_SIZE)(parsed_pattern - verblengthptr - 1);
 /* But the limit on the length is in code units */
@@ -3135,6 +3136,21 @@ while (ptr < ptrend)
 goto FAILED_BACK;
 }

+/* Most (*VERB)s are not allowed to be quantified, but an ungreedy
+quantifier can be useful for (*ACCEPT) - meaning "succeed on backtrack", a
+sort of negated (*COMMIT). We therefore allow (*ACCEPT) to be quantified by
+wrapping it in non-capturing brackets, but we have to allow for a preceding
+(*MARK) for when (*ACCEPT) has an argument. */
+
+if (parsed_pattern[-1] == META_ACCEPT)
+{
+uint32_t *p;
+for (p = parsed_pattern - 1; p >= verbstartptr; p--) p[1] = p[0];
+*verbstartptr = META_NOCAPTURE;
+parsed_pattern[1] = META_KET;
+parsed_pattern += 2;
+}
+
 /* Now we can put the quantifier into the parsed pattern vector. At this
 stage, we have only the basic quantifier. The check for a following + or ?
 modifier happens at the top of the loop, after any intervening comments
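The added block turns a quantified (*ACCEPT), possibly preceded by the (*MARK) that (*ACCEPT:NAME) generates, into a non-capturing group so that the ordinary quantifier handling can apply to it. The following C sketch replays that shift-and-wrap on a plain array. The M_* values are hypothetical stand-ins for PCRE2's META_* codes, and the sketch uses indices rather than pointers, a choice that avoids ever computing a pointer before the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCRE2's internal META_* codes. */
enum { M_MARK = 1000, M_ACCEPT, M_NOCAPTURE, M_KET };

/* If the item just before position end is (*ACCEPT), shift the items from
   verbstart onwards up by one slot, put an opening non-capturing bracket
   at verbstart and a closing ket after the verb. buf must have room for
   two more items. Returns the new end position. */
size_t wrap_accept(uint32_t *buf, size_t verbstart, size_t end)
{
  if (end == 0 || buf[end - 1] != M_ACCEPT) return end;
  assert(verbstart < end);
  for (size_t i = end; i > verbstart; i--) buf[i] = buf[i - 1];
  buf[verbstart] = M_NOCAPTURE;
  buf[end + 1] = M_KET;
  return end + 2;
}
```

Given { 'a', M_MARK, M_ACCEPT } with the verb starting at index 1, the buffer becomes { 'a', M_NOCAPTURE, M_MARK, M_ACCEPT, M_KET }, the parsed-pattern analogue of a(?:(*MARK)(*ACCEPT)), to which a following ?? can then be attached.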
@@ -3775,6 +3791,12 @@ while (ptr < ptrend)
 goto FAILED;
 }

+/* Remember where this verb, possibly with a preceding (*MARK), starts,
+for handling quantified (*ACCEPT). */
+
+verbstartptr = parsed_pattern;
+okquantifier = (verbs[i].meta == META_ACCEPT);
+
 /* It appears that Perl allows any characters whatsoever, other than a
 closing parenthesis, to appear in arguments ("names"), so we no longer
 insist on letters, digits, and underscores. Perl does not, however, do
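Together with the removal of the `okquantifier = FALSE` reset in the closing-parenthesis case, these two added lines mean that the flag is decided once, when the verb is recognized, and survives the verb's name and closing parenthesis. A hypothetical C sketch of that rule for a single (*VERB) or (*VERB:NAME) item follows; `quantifier_allowed_after()` is an invented name, and it inspects only the verb word, not the rest of PCRE2's parser state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of the okquantifier rule for one (*VERB) or
   (*VERB:NAME) item: the flag is set once, when the verb word is read,
   and the name characters and closing parenthesis leave it alone.
   Returns true if a quantifier may follow the item. */
bool quantifier_allowed_after(const char *item)
{
  assert(item != NULL);
  if (strncmp(item, "(*", 2) != 0) return false;
  const char *word = item + 2;
  size_t n = strcspn(word, ":)");            /* length of the verb word */
  bool okquantifier = (n == 6 && strncmp(word, "ACCEPT", 6) == 0);
  /* Everything after the verb word is consumed without touching the flag,
     so (*ACCEPT:XX) stays quantifiable and a name such as the one in
     (*COMMIT:XX) cannot make another verb quantifiable. */
  return okquantifier;
}
```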
@@ -9503,10 +9525,10 @@ if (pattern == NULL)

 if (ccontext == NULL)
 ccontext = (pcre2_compile_context *)(&PRIV(default_compile_context));

 /* PCRE2_MATCH_INVALID_UTF implies UTF */

 if ((options & PCRE2_MATCH_INVALID_UTF) != 0) options |= PCRE2_UTF;

 /* Check that all undefined public option bits are zero. */


@@ -5591,4 +5591,16 @@ a)"xI

 /\[()]{65535}(?<A>)/expand

+/a(?:(*ACCEPT))??bc/
+abc
+axy
+
+/a(*ACCEPT)??bc/
+abc
+axy
+
+/a(*ACCEPT:XX)??bc/mark
+abc
+axy
+
 # End of testinput2

@@ -16940,6 +16940,25 @@ Failed: error 197 at offset 131071: too many capturing groups (maximum 65535)
 /\[()]{65535}(?<A>)/expand
 Failed: error 197 at offset 131075: too many capturing groups (maximum 65535)

+/a(?:(*ACCEPT))??bc/
+abc
+0: abc
+axy
+0: a
+
+/a(*ACCEPT)??bc/
+abc
+0: abc
+axy
+0: a
+
+/a(*ACCEPT:XX)??bc/mark
+abc
+0: abc
+axy
+0: a
+MK: XX
+
 # End of testinput2
 Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data