Implement Perl 5.28's alphabetic lookaround syntax, e.g. (*pla:...) and also
(*atomic:...).
parent 69254c77f1
commit f26b0b0bae
|
@ -22,6 +22,14 @@ wrong library in some environments.
|
||||||
|
|
||||||
6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).
|
6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).
|
||||||
|
|
||||||
|
7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
|
||||||
|
lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
|
||||||
|
characterized by a lower case letter following (* and to simplify coding for
|
||||||
|
this, the character tables created by pcre2_maketables() were updated to add a
|
||||||
|
new "is lower case letter" bit. At the same time, the now unused "is
|
||||||
|
hexadecimal digit" bit was removed. The default tables in
|
||||||
|
src/pcre2_chartables.c.dist are updated.
|
||||||
|
|
||||||
|
|
||||||
Version 10.32 10-September-2018
|
Version 10.32 10-September-2018
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
|
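The ChangeLog entry above says the alphabetic forms are recognized by a lower case letter following (*. A minimal sketch of that check follows, assuming the C library's islower() and isupper() as stand-ins for PCRE2's own character tables. The names star_kind and classify_star_group are hypothetical and do not appear in the PCRE2 sources. Treating an upper case letter as the start of a verb is also an assumption made for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical classification of what follows "(*"; not PCRE2's own types. */
typedef enum { STAR_OTHER, STAR_ALPHA_NAME, STAR_VERB } star_kind;

/* Inspect the character after "(*": a lower case letter suggests an
   alphabetic name such as (*pla: or (*atomic:, while an upper case letter
   is treated here (as an assumption) as the start of a verb like (*COMMIT). */
star_kind classify_star_group(const char *p)
{
  if (strncmp(p, "(*", 2) != 0) return STAR_OTHER;
  unsigned char c = (unsigned char)p[2];
  if (islower(c)) return STAR_ALPHA_NAME;
  if (isupper(c)) return STAR_VERB;
  return STAR_OTHER;
}
```

With this sketch, classify_star_group("(*pla:foo)") yields STAR_ALPHA_NAME and classify_star_group("(*COMMIT)") yields STAR_VERB, while a symbolic form such as "(?=foo)" falls through to STAR_OTHER.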
@ -2120,6 +2120,11 @@ special parenthesis, starting with (?> as in this example:
|
||||||
<pre>
|
<pre>
|
||||||
(?>\d+)foo
|
(?>\d+)foo
|
||||||
</pre>
|
</pre>
|
||||||
|
Perl 5.28 introduced an experimental alphabetic form starting with (* which may
|
||||||
|
be easier to remember:
|
||||||
|
<pre>
|
||||||
|
(*atomic:\d+)foo
|
||||||
|
</pre>
|
||||||
This kind of parenthesis "locks up" the part of the pattern it contains once
|
This kind of parenthesis "locks up" the part of the pattern it contains once
|
||||||
it has matched, and a failure further into the pattern is prevented from
|
it has matched, and a failure further into the pattern is prevented from
|
||||||
backtracking into it. Backtracking past it to previous items, however, works as
|
backtracking into it. Backtracking past it to previous items, however, works as
|
||||||
|
@ -2342,11 +2347,17 @@ coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described
|
||||||
<P>
|
<P>
|
||||||
More complicated assertions are coded as subpatterns. There are two kinds:
|
More complicated assertions are coded as subpatterns. There are two kinds:
|
||||||
those that look ahead of the current position in the subject string, and those
|
those that look ahead of the current position in the subject string, and those
|
||||||
that look behind it, and in each case an assertion may be positive (must
|
that look behind it, and in each case an assertion may be positive (must match
|
||||||
succeed for matching to continue) or negative (must not succeed for matching to
|
for the assertion to be true) or negative (must not match for the assertion to
|
||||||
continue). An assertion subpattern is matched in the normal way, except that,
|
be true). An assertion subpattern is matched in the normal way, and if it is
|
||||||
when matching continues after a successful assertion, the matching position in
|
true, matching continues after it, but with the matching position in the
|
||||||
the subject string is as it was before the assertion was processed.
|
subject string as it was before the assertion was processed.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
A lookaround assertion may also appear as the condition in a
|
||||||
|
<a href="#conditions">conditional subpattern</a>
|
||||||
|
(see below). In this case, the result of matching the assertion determines
|
||||||
|
which branch of the condition is followed.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Assertion subpatterns are not capturing subpatterns. If an assertion contains
|
Assertion subpatterns are not capturing subpatterns. If an assertion contains
|
||||||
|
@ -2359,7 +2370,7 @@ adjacent characters are the same.
|
||||||
<P>
|
<P>
|
||||||
When a branch within an assertion fails to match, any substrings that were
|
When a branch within an assertion fails to match, any substrings that were
|
||||||
captured are discarded (as happens with any pattern branch that fails to
|
captured are discarded (as happens with any pattern branch that fails to
|
||||||
match). A negative assertion succeeds only when all its branches fail to match;
|
match). A negative assertion is true only when all its branches fail to match;
|
||||||
this means that no captured substrings are ever retained after a successful
|
this means that no captured substrings are ever retained after a successful
|
||||||
negative assertion. When an assertion contains a matching branch, what happens
|
negative assertion. When an assertion contains a matching branch, what happens
|
||||||
depends on the type of assertion.
|
depends on the type of assertion.
|
||||||
|
@ -2368,7 +2379,7 @@ depends on the type of assertion.
|
||||||
For a positive assertion, internally captured substrings in the successful
|
For a positive assertion, internally captured substrings in the successful
|
||||||
branch are retained, and matching continues with the next pattern item after
|
branch are retained, and matching continues with the next pattern item after
|
||||||
the assertion. For a negative assertion, a matching branch means that the
|
the assertion. For a negative assertion, a matching branch means that the
|
||||||
assertion has failed. If the assertion is being used as a condition in a
|
assertion is not true. If such an assertion is being used as a condition in a
|
||||||
<a href="#conditions">conditional subpattern</a>
|
<a href="#conditions">conditional subpattern</a>
|
||||||
(see below), captured substrings are retained, because matching continues with
|
(see below), captured substrings are retained, because matching continues with
|
||||||
the "no" branch of the condition. For other failing negative assertions,
|
the "no" branch of the condition. For other failing negative assertions,
|
||||||
|
@ -2398,6 +2409,25 @@ without the assertion, the order depending on the greediness of the quantifier.
|
||||||
The assertion is obeyed just once when encountered during matching.
|
The assertion is obeyed just once when encountered during matching.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
|
Alphabetic assertion names
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
Traditionally, symbolic sequences such as (?= and (?<= have been used to specify
|
||||||
|
lookaround assertions. Perl 5.28 introduced some experimental alphabetic
|
||||||
|
alternatives which might be easier to remember. They all start with (* instead
|
||||||
|
of (? and must be written using lower case letters. PCRE2 supports the
|
||||||
|
following synonyms:
|
||||||
|
<pre>
|
||||||
|
(*positive_lookahead: or (*pla: is the same as (?=
|
||||||
|
(*negative_lookahead: or (*nla: is the same as (?!
|
||||||
|
(*positive_lookbehind: or (*plb: is the same as (?<=
|
||||||
|
(*negative_lookbehind: or (*nlb: is the same as (?<!
|
||||||
|
</pre>
|
||||||
|
For example, (*pla:foo) is the same assertion as (?=foo). However, in the
|
||||||
|
following sections, the various assertions are described using the original
|
||||||
|
symbolic forms.
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
Lookahead assertions
|
Lookahead assertions
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
|
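The synonym list in the documentation hunk above can be read as a simple name-to-prefix mapping. The following is a hedged sketch of such a lookup, not the table used in pcre2_compile.c. The names alpha_synonym, synonyms and alpha_to_symbolic are hypothetical, and the "atomic" entry is taken from the atomic grouping section of the same commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of an alphabetic name with its symbolic prefix. */
typedef struct {
  const char *name;
  const char *symbolic;
} alpha_synonym;

static const alpha_synonym synonyms[] = {
  { "positive_lookahead",  "(?="  },
  { "pla",                 "(?="  },
  { "negative_lookahead",  "(?!"  },
  { "nla",                 "(?!"  },
  { "positive_lookbehind", "(?<=" },
  { "plb",                 "(?<=" },
  { "negative_lookbehind", "(?<!" },
  { "nlb",                 "(?<!" },
  { "atomic",              "(?>"  }
};

/* Return the symbolic prefix for a name of the given length, or NULL when
   the name is unknown. The comparison is case-sensitive because the
   documentation requires lower case letters. */
const char *alpha_to_symbolic(const char *name, size_t len)
{
  size_t i;
  for (i = 0; i < sizeof(synonyms) / sizeof(synonyms[0]); i++)
    {
    if (strlen(synonyms[i].name) == len &&
        memcmp(synonyms[i].name, name, len) == 0)
      return synonyms[i].symbolic;
    }
  return NULL;
}
```

For example, alpha_to_symbolic("pla", 3) returns "(?=", matching the documented equivalence of (*pla:foo) and (?=foo), while an upper case spelling such as "PLA" returns NULL.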
@ -3630,7 +3660,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 21 September 2018
|
Last updated: 24 September 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -436,6 +436,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(?>...) atomic, non-capturing group
|
(?>...) atomic, non-capturing group
|
||||||
|
(*atomic:...) atomic, non-capturing group
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
|
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
|
||||||
|
@ -514,10 +515,21 @@ setting with a similar syntax.
|
||||||
<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
|
<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(?=...) positive look ahead
|
(?=...) )
|
||||||
(?!...) negative look ahead
|
(*pla:...) ) positive lookahead
|
||||||
(?<=...) positive look behind
|
(*positive_lookahead:...) )
|
||||||
(?<!...) negative look behind
|
|
||||||
|
(?!...) )
|
||||||
|
(*nla:...) ) negative lookahead
|
||||||
|
(*negative_lookahead:...) )
|
||||||
|
|
||||||
|
(?<=...) )
|
||||||
|
(*plb:...) ) positive lookbehind
|
||||||
|
(*positive_lookbehind:...) )
|
||||||
|
|
||||||
|
(?<!...) )
|
||||||
|
(*nlb:...) ) negative lookbehind
|
||||||
|
(*negative_lookbehind:...) )
|
||||||
</pre>
|
</pre>
|
||||||
Each top-level branch of a lookbehind must be of a fixed length.
|
Each top-level branch of a lookbehind must be of a fixed length.
|
||||||
</P>
|
</P>
|
||||||
|
@ -634,7 +646,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 02 September 2018
|
Last updated: 24 September 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -7760,6 +7760,11 @@ ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
|
||||||
|
|
||||||
(?>\d+)foo
|
(?>\d+)foo
|
||||||
|
|
||||||
|
Perl 5.28 introduced an experimental alphabetic form starting with (*
|
||||||
|
which may be easier to remember:
|
||||||
|
|
||||||
|
(*atomic:\d+)foo
|
||||||
|
|
||||||
This kind of parenthesis "locks up" the part of the pattern it con-
|
This kind of parenthesis "locks up" the part of the pattern it con-
|
||||||
tains once it has matched, and a failure further into the pattern is
|
tains once it has matched, and a failure further into the pattern is
|
||||||
prevented from backtracking into it. Backtracking past it to previous
|
prevented from backtracking into it. Backtracking past it to previous
|
||||||
|
@ -7970,12 +7975,16 @@ ASSERTIONS
|
||||||
More complicated assertions are coded as subpatterns. There are two
|
More complicated assertions are coded as subpatterns. There are two
|
||||||
kinds: those that look ahead of the current position in the subject
|
kinds: those that look ahead of the current position in the subject
|
||||||
string, and those that look behind it, and in each case an assertion
|
string, and those that look behind it, and in each case an assertion
|
||||||
may be positive (must succeed for matching to continue) or negative
|
may be positive (must match for the assertion to be true) or negative
|
||||||
(must not succeed for matching to continue). An assertion subpattern is
|
(must not match for the assertion to be true). An assertion subpattern
|
||||||
matched in the normal way, except that, when matching continues after a
|
is matched in the normal way, and if it is true, matching continues
|
||||||
successful assertion, the matching position in the subject string is as
|
after it, but with the matching position in the subject string as
|
||||||
it was before the assertion was processed.
|
it was before the assertion was processed.
|
||||||
|
|
||||||
|
A lookaround assertion may also appear as the condition in a condi-
|
||||||
|
tional subpattern (see below). In this case, the result of matching the
|
||||||
|
assertion determines which branch of the condition is followed.
|
||||||
|
|
||||||
Assertion subpatterns are not capturing subpatterns. If an assertion
|
Assertion subpatterns are not capturing subpatterns. If an assertion
|
||||||
contains capturing subpatterns within it, these are counted for the
|
contains capturing subpatterns within it, these are counted for the
|
||||||
purposes of numbering the capturing subpatterns in the whole pattern.
|
purposes of numbering the capturing subpatterns in the whole pattern.
|
||||||
|
@ -7985,7 +7994,7 @@ ASSERTIONS
|
||||||
|
|
||||||
When a branch within an assertion fails to match, any substrings that
|
When a branch within an assertion fails to match, any substrings that
|
||||||
were captured are discarded (as happens with any pattern branch that
|
were captured are discarded (as happens with any pattern branch that
|
||||||
fails to match). A negative assertion succeeds only when all its
|
fails to match). A negative assertion is true only when all its
|
||||||
branches fail to match; this means that no captured substrings are ever
|
branches fail to match; this means that no captured substrings are ever
|
||||||
retained after a successful negative assertion. When an assertion con-
|
retained after a successful negative assertion. When an assertion con-
|
||||||
tains a matching branch, what happens depends on the type of assertion.
|
tains a matching branch, what happens depends on the type of assertion.
|
||||||
|
@ -7993,9 +8002,9 @@ ASSERTIONS
|
||||||
For a positive assertion, internally captured substrings in the suc-
|
For a positive assertion, internally captured substrings in the suc-
|
||||||
cessful branch are retained, and matching continues with the next pat-
|
cessful branch are retained, and matching continues with the next pat-
|
||||||
tern item after the assertion. For a negative assertion, a matching
|
tern item after the assertion. For a negative assertion, a matching
|
||||||
branch means that the assertion has failed. If the assertion is being
|
branch means that the assertion is not true. If such an assertion is
|
||||||
used as a condition in a conditional subpattern (see below), captured
|
being used as a condition in a conditional subpattern (see below), cap-
|
||||||
substrings are retained, because matching continues with the "no"
|
tured substrings are retained, because matching continues with the "no"
|
||||||
branch of the condition. For other failing negative assertions, control
|
branch of the condition. For other failing negative assertions, control
|
||||||
passes to the previous backtracking point, thus discarding any captured
|
passes to the previous backtracking point, thus discarding any captured
|
||||||
strings within the assertion.
|
strings within the assertion.
|
||||||
|
@ -8020,6 +8029,23 @@ ASSERTIONS
|
||||||
ignored. The assertion is obeyed just once when encountered during
|
ignored. The assertion is obeyed just once when encountered during
|
||||||
matching.
|
matching.
|
||||||
|
|
||||||
|
Alphabetic assertion names
|
||||||
|
|
||||||
|
Traditionally, symbolic sequences such as (?= and (?<= have been used
|
||||||
|
to specify lookaround assertions. Perl 5.28 introduced some experimen-
|
||||||
|
tal alphabetic alternatives which might be easier to remember. They all
|
||||||
|
start with (* instead of (? and must be written using lower case let-
|
||||||
|
ters. PCRE2 supports the following synonyms:
|
||||||
|
|
||||||
|
(*positive_lookahead: or (*pla: is the same as (?=
|
||||||
|
(*negative_lookahead: or (*nla: is the same as (?!
|
||||||
|
(*positive_lookbehind: or (*plb: is the same as (?<=
|
||||||
|
(*negative_lookbehind: or (*nlb: is the same as (?<!
|
||||||
|
|
||||||
|
For example, (*pla:foo) is the same assertion as (?=foo). However, in
|
||||||
|
the following sections, the various assertions are described using the
|
||||||
|
original symbolic forms.
|
||||||
|
|
||||||
Lookahead assertions
|
Lookahead assertions
|
||||||
|
|
||||||
Lookahead assertions start with (?= for positive assertions and (?! for
|
Lookahead assertions start with (?= for positive assertions and (?! for
|
||||||
|
@ -9179,7 +9205,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 21 September 2018
|
Last updated: 24 September 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@ -10291,6 +10317,7 @@ CAPTURING
|
||||||
ATOMIC GROUPS
|
ATOMIC GROUPS
|
||||||
|
|
||||||
(?>...) atomic, non-capturing group
|
(?>...) atomic, non-capturing group
|
||||||
|
(*atomic:...) atomic, non-capturing group
|
||||||
|
|
||||||
|
|
||||||
COMMENT
|
COMMENT
|
||||||
|
@ -10367,10 +10394,21 @@ WHAT \R MATCHES
|
||||||
|
|
||||||
LOOKAHEAD AND LOOKBEHIND ASSERTIONS
|
LOOKAHEAD AND LOOKBEHIND ASSERTIONS
|
||||||
|
|
||||||
(?=...) positive look ahead
|
(?=...) )
|
||||||
(?!...) negative look ahead
|
(*pla:...) ) positive lookahead
|
||||||
(?<=...) positive look behind
|
(*positive_lookahead:...) )
|
||||||
(?<!...) negative look behind
|
|
||||||
|
(?!...) )
|
||||||
|
(*nla:...) ) negative lookahead
|
||||||
|
(*negative_lookahead:...) )
|
||||||
|
|
||||||
|
(?<=...) )
|
||||||
|
(*plb:...) ) positive lookbehind
|
||||||
|
(*positive_lookbehind:...) )
|
||||||
|
|
||||||
|
(?<!...) )
|
||||||
|
(*nlb:...) ) negative lookbehind
|
||||||
|
(*negative_lookbehind:...) )
|
||||||
|
|
||||||
Each top-level branch of a lookbehind must be of a fixed length.
|
Each top-level branch of a lookbehind must be of a fixed length.
|
||||||
|
|
||||||
|
@ -10487,7 +10525,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 02 September 2018
|
Last updated: 24 September 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "21 September 2018" "PCRE2 10.33"
|
.TH PCRE2PATTERN 3 "24 September 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -2124,6 +2124,11 @@ special parenthesis, starting with (?> as in this example:
|
||||||
.sp
|
.sp
|
||||||
(?>\ed+)foo
|
(?>\ed+)foo
|
||||||
.sp
|
.sp
|
||||||
|
Perl 5.28 introduced an experimental alphabetic form starting with (* which may
|
||||||
|
be easier to remember:
|
||||||
|
.sp
|
||||||
|
(*atomic:\ed+)foo
|
||||||
|
.sp
|
||||||
This kind of parenthesis "locks up" the part of the pattern it contains once
|
This kind of parenthesis "locks up" the part of the pattern it contains once
|
||||||
it has matched, and a failure further into the pattern is prevented from
|
it has matched, and a failure further into the pattern is prevented from
|
||||||
backtracking into it. Backtracking past it to previous items, however, works as
|
backtracking into it. Backtracking past it to previous items, however, works as
|
||||||
|
@ -2351,11 +2356,19 @@ above.
|
||||||
.P
|
.P
|
||||||
More complicated assertions are coded as subpatterns. There are two kinds:
|
More complicated assertions are coded as subpatterns. There are two kinds:
|
||||||
those that look ahead of the current position in the subject string, and those
|
those that look ahead of the current position in the subject string, and those
|
||||||
that look behind it, and in each case an assertion may be positive (must
|
that look behind it, and in each case an assertion may be positive (must match
|
||||||
succeed for matching to continue) or negative (must not succeed for matching to
|
for the assertion to be true) or negative (must not match for the assertion to
|
||||||
continue). An assertion subpattern is matched in the normal way, except that,
|
be true). An assertion subpattern is matched in the normal way, and if it is
|
||||||
when matching continues after a successful assertion, the matching position in
|
true, matching continues after it, but with the matching position in the
|
||||||
the subject string is as it was before the assertion was processed.
|
subject string as it was before the assertion was processed.
|
||||||
|
.P
|
||||||
|
A lookaround assertion may also appear as the condition in a
|
||||||
|
.\" HTML <a href="#conditions">
|
||||||
|
.\" </a>
|
||||||
|
conditional subpattern
|
||||||
|
.\"
|
||||||
|
(see below). In this case, the result of matching the assertion determines
|
||||||
|
which branch of the condition is followed.
|
||||||
.P
|
.P
|
||||||
Assertion subpatterns are not capturing subpatterns. If an assertion contains
|
Assertion subpatterns are not capturing subpatterns. If an assertion contains
|
||||||
capturing subpatterns within it, these are counted for the purposes of
|
capturing subpatterns within it, these are counted for the purposes of
|
||||||
|
@ -2366,7 +2379,7 @@ adjacent characters are the same.
|
||||||
.P
|
.P
|
||||||
When a branch within an assertion fails to match, any substrings that were
|
When a branch within an assertion fails to match, any substrings that were
|
||||||
captured are discarded (as happens with any pattern branch that fails to
|
captured are discarded (as happens with any pattern branch that fails to
|
||||||
match). A negative assertion succeeds only when all its branches fail to match;
|
match). A negative assertion is true only when all its branches fail to match;
|
||||||
this means that no captured substrings are ever retained after a successful
|
this means that no captured substrings are ever retained after a successful
|
||||||
negative assertion. When an assertion contains a matching branch, what happens
|
negative assertion. When an assertion contains a matching branch, what happens
|
||||||
depends on the type of assertion.
|
depends on the type of assertion.
|
||||||
|
@ -2374,7 +2387,7 @@ depends on the type of assertion.
|
||||||
For a positive assertion, internally captured substrings in the successful
|
For a positive assertion, internally captured substrings in the successful
|
||||||
branch are retained, and matching continues with the next pattern item after
|
branch are retained, and matching continues with the next pattern item after
|
||||||
the assertion. For a negative assertion, a matching branch means that the
|
the assertion. For a negative assertion, a matching branch means that the
|
||||||
assertion has failed. If the assertion is being used as a condition in a
|
assertion is not true. If such an assertion is being used as a condition in a
|
||||||
.\" HTML <a href="#conditions">
|
.\" HTML <a href="#conditions">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
conditional subpattern
|
conditional subpattern
|
||||||
|
@ -2406,6 +2419,25 @@ without the assertion, the order depending on the greediness of the quantifier.
|
||||||
The assertion is obeyed just once when encountered during matching.
|
The assertion is obeyed just once when encountered during matching.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SS "Alphabetic assertion names"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
Traditionally, symbolic sequences such as (?= and (?<= have been used to specify
|
||||||
|
lookaround assertions. Perl 5.28 introduced some experimental alphabetic
|
||||||
|
alternatives which might be easier to remember. They all start with (* instead
|
||||||
|
of (? and must be written using lower case letters. PCRE2 supports the
|
||||||
|
following synonyms:
|
||||||
|
.sp
|
||||||
|
(*positive_lookahead: or (*pla: is the same as (?=
|
||||||
|
(*negative_lookahead: or (*nla: is the same as (?!
|
||||||
|
(*positive_lookbehind: or (*plb: is the same as (?<=
|
||||||
|
(*negative_lookbehind: or (*nlb: is the same as (?<!
|
||||||
|
.sp
|
||||||
|
For example, (*pla:foo) is the same assertion as (?=foo). However, in the
|
||||||
|
following sections, the various assertions are described using the original
|
||||||
|
symbolic forms.
|
||||||
|
.
|
||||||
|
.
|
||||||
.SS "Lookahead assertions"
|
.SS "Lookahead assertions"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -3660,6 +3692,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 21 September 2018
|
Last updated: 24 September 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "02 September 2018" "PCRE2 10.32"
|
.TH PCRE2SYNTAX 3 "24 September 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -411,6 +411,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
(?>...) atomic, non-capturing group
|
(?>...) atomic, non-capturing group
|
||||||
|
(*atomic:...) atomic, non-capturing group
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "COMMENT"
|
.SH "COMMENT"
|
||||||
|
@ -491,10 +492,21 @@ setting with a similar syntax.
|
||||||
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
|
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
(?=...) positive look ahead
|
(?=...) )
|
||||||
(?!...) negative look ahead
|
(*pla:...) ) positive lookahead
|
||||||
(?<=...) positive look behind
|
(*positive_lookahead:...) )
|
||||||
(?<!...) negative look behind
|
.sp
|
||||||
|
(?!...) )
|
||||||
|
(*nla:...) ) negative lookahead
|
||||||
|
(*negative_lookahead:...) )
|
||||||
|
.sp
|
||||||
|
(?<=...) )
|
||||||
|
(*plb:...) ) positive lookbehind
|
||||||
|
(*positive_lookbehind:...) )
|
||||||
|
.sp
|
||||||
|
(?<!...) )
|
||||||
|
(*nlb:...) ) negative lookbehind
|
||||||
|
(*negative_lookbehind:...) )
|
||||||
.sp
|
.sp
|
||||||
Each top-level branch of a lookbehind must be of a fixed length.
|
Each top-level branch of a lookbehind must be of a fixed length.
|
||||||
.
|
.
|
||||||
|
@ -621,6 +633,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 02 September 2018
|
Last updated: 24 September 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -103,19 +103,22 @@ const unsigned char _pcre_default_tables[] = {
|
||||||
0,0,0,0,0,0,0,128,
|
0,0,0,0,0,0,0,128,
|
||||||
255,255,255,255,0,0,0,0,
|
255,255,255,255,0,0,0,0,
|
||||||
0,0,0,0,0,0,0,0,
|
0,0,0,0,0,0,0,0,
|
||||||
|
|
||||||
|
/* Fiddled by hand when the table bits changed. May be broken! */
|
||||||
|
|
||||||
128,0,0,0,0,0,0,0,
|
128,0,0,0,0,0,0,0,
|
||||||
0,1,1,0,1,1,0,0,
|
0,1,1,1,1,1,0,0,
|
||||||
0,0,0,0,0,0,0,0,
|
0,0,0,0,0,0,0,0,
|
||||||
0,0,0,0,0,0,0,0,
|
0,0,0,0,0,0,0,0,
|
||||||
1,0,0,0,128,0,0,0,
|
1,0,0,0,128,0,0,0,
|
||||||
128,128,128,128,0,0,128,0,
|
128,128,128,128,0,0,128,0,
|
||||||
28,28,28,28,28,28,28,28,
|
24,24,24,24,24,24,24,24,
|
||||||
28,28,0,0,0,0,0,128,
|
24,24,0,0,0,0,0,128,
|
||||||
0,26,26,26,26,26,26,18,
|
0,18,18,18,18,18,18,18,
|
||||||
18,18,18,18,18,18,18,18,
|
18,18,18,18,18,18,18,18,
|
||||||
18,18,18,18,18,18,18,18,
|
18,18,18,18,18,18,18,18,
|
||||||
18,18,18,128,128,0,128,16,
|
18,18,18,128,128,0,128,16,
|
||||||
0,26,26,26,26,26,26,18,
|
0,18,18,18,18,18,18,18,
|
||||||
18,18,18,18,18,18,18,18,
|
18,18,18,18,18,18,18,18,
|
||||||
18,18,18,18,18,18,18,18,
|
18,18,18,18,18,18,18,18,
|
||||||
18,18,18,128,128,0,0,0,
|
18,18,18,128,128,0,0,0,
|
||||||
|
@ -125,8 +128,8 @@ const unsigned char _pcre_default_tables[] = {
|
||||||
0,0,0,0,0,0,0,0,
|
0,0,0,0,0,0,0,0,
|
||||||
1,0,0,0,0,0,0,0,
|
1,0,0,0,0,0,0,0,
|
||||||
0,0,18,0,0,0,0,0,
|
0,0,18,0,0,0,0,0,
|
||||||
0,0,20,20,0,18,0,0,
|
0,0,24,24,0,18,0,0,
|
||||||
0,20,18,0,0,0,0,0,
|
0,24,18,0,0,0,0,0,
|
||||||
18,18,18,18,18,18,18,18,
|
18,18,18,18,18,18,18,18,
|
||||||
18,18,18,18,18,18,18,18,
|
18,18,18,18,18,18,18,18,
|
||||||
18,18,18,18,18,18,18,0,
|
18,18,18,18,18,18,18,0,
|
||||||
|
|
12
perltest.sh
|
@ -75,6 +75,10 @@ fi
|
||||||
|
|
||||||
(echo "$prefix" ; cat <<'PERLEND'
|
(echo "$prefix" ; cat <<'PERLEND'
|
||||||
|
|
||||||
|
# The alpha assertions currently give warnings even when -w is not specified.
|
||||||
|
|
||||||
|
no warnings "experimental::alpha_assertions";
|
||||||
|
|
||||||
# Function for turning a string into a string of printing chars.
|
# Function for turning a string into a string of printing chars.
|
||||||
|
|
||||||
sub pchars {
|
sub pchars {
|
||||||
|
@ -129,6 +133,9 @@ else { $outfile = "STDOUT"; }
|
||||||
|
|
||||||
printf($outfile "Perl $] Regular Expressions\n\n");
|
printf($outfile "Perl $] Regular Expressions\n\n");
|
||||||
|
|
||||||
|
$extra_modifiers = "";
|
||||||
|
$default_show_mark = 0;
|
||||||
|
|
||||||
# Main loop
|
# Main loop
|
||||||
|
|
||||||
NEXT_RE:
|
NEXT_RE:
|
||||||
|
@ -370,7 +377,10 @@ for (;;)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
# printf $outfile "\n";
|
# By closing OUTFILE explicitly, we avoid a Perl warning in -w mode
|
||||||
|
# about "main::OUTFILE" being used only once.
|
||||||
|
|
||||||
|
close(OUTFILE) if $outfile eq "OUTFILE";
|
||||||
|
|
||||||
PERLEND
|
PERLEND
|
||||||
) | $perl $perlarg - $@
|
) | $perl $perlarg - $@
|
||||||
|
|
|
@ -183,10 +183,10 @@ fprintf(f,
|
||||||
"/* This table identifies various classes of character by individual bits:\n"
|
"/* This table identifies various classes of character by individual bits:\n"
|
||||||
" 0x%02x white space character\n"
|
" 0x%02x white space character\n"
|
||||||
" 0x%02x letter\n"
|
" 0x%02x letter\n"
|
||||||
|
" 0x%02x lower case letter\n"
|
||||||
" 0x%02x decimal digit\n"
|
" 0x%02x decimal digit\n"
|
||||||
" 0x%02x hexadecimal digit\n"
|
|
||||||
" 0x%02x alphanumeric or '_'\n*/\n\n",
|
" 0x%02x alphanumeric or '_'\n*/\n\n",
|
||||||
ctype_space, ctype_letter, ctype_digit, ctype_xdigit, ctype_word);
|
ctype_space, ctype_letter, ctype_lcletter, ctype_digit, ctype_word);
|
||||||
|
|
||||||
fprintf(f, " ");
|
fprintf(f, " ");
|
||||||
for (i = 0; i < 256; i++)
|
for (i = 0; i < 256; i++)
|
||||||
|
|
|
@ -320,6 +320,7 @@ pcre2_pattern_convert(). */
|
||||||
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
|
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
|
||||||
#define PCRE2_ERROR_SUPPORTED_ONLY_IN_UNICODE 193
|
#define PCRE2_ERROR_SUPPORTED_ONLY_IN_UNICODE 193
|
||||||
#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194
|
#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194
|
||||||
|
#define PCRE2_ERROR_ALPHA_ASSERTION_UNKNOWN 195
|
||||||
|
|
||||||
|
|
||||||
/* "Expected" matching error codes: no match and partial match. */
|
/* "Expected" matching error codes: no match and partial match. */
|
||||||
|
|
|
@ -157,8 +157,8 @@ graph print, punct, and cntrl. Other classes are built from combinations. */
|
||||||
/* This table identifies various classes of character by individual bits:
|
/* This table identifies various classes of character by individual bits:
|
||||||
0x01 white space character
|
0x01 white space character
|
||||||
0x02 letter
|
0x02 letter
|
||||||
0x04 decimal digit
|
0x04 lower case letter
|
||||||
0x08 hexadecimal digit
|
0x08 decimal digit
|
||||||
0x10 alphanumeric or '_'
|
0x10 alphanumeric or '_'
|
||||||
*/
|
*/
|
||||||
|
|
||||||
|
@ -168,16 +168,16 @@ graph print, punct, and cntrl. Other classes are built from combinations. */
|
||||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */
|
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */
|
||||||
0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - ' */
|
0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - ' */
|
||||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* ( - / */
|
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* ( - / */
|
||||||
0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */
|
0x18,0x18,0x18,0x18,0x18,0x18,0x18,0x18, /* 0 - 7 */
|
||||||
0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x00, /* 8 - ? */
|
0x18,0x18,0x00,0x00,0x00,0x00,0x00,0x00, /* 8 - ? */
|
||||||
0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* @ - G */
|
0x00,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* @ - G */
|
||||||
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */
|
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */
|
||||||
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */
|
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */
|
||||||
0x12,0x12,0x12,0x00,0x00,0x00,0x00,0x10, /* X - _ */
|
0x12,0x12,0x12,0x00,0x00,0x00,0x00,0x10, /* X - _ */
|
||||||
0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* ` - g */
|
0x00,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* ` - g */
|
||||||
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* h - o */
|
0x16,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* h - o */
|
||||||
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* p - w */
|
0x16,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* p - w */
|
||||||
0x12,0x12,0x12,0x00,0x00,0x00,0x00,0x00, /* x -127 */
|
0x16,0x16,0x16,0x00,0x00,0x00,0x00,0x00, /* x -127 */
|
||||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */
|
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */
|
||||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */
|
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */
|
||||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */
|
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */
|
||||||
|
|
|
@ -615,6 +615,46 @@ static const uint32_t verbops[] = {
|
||||||
OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
|
OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
|
||||||
OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
|
OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
|
||||||
|
|
||||||
|
/* Table of "alpha assertions" like (*pla:...), similar to the (*VERB) table. */
|
||||||
|
|
||||||
|
typedef struct alasitem {
|
||||||
|
unsigned int len; /* Length of name */
|
||||||
|
uint32_t meta; /* Base META_ code */
|
||||||
|
} alasitem;
|
||||||
|
|
||||||
|
static const char alasnames[] =
|
||||||
|
STRING_pla0
|
||||||
|
STRING_plb0
|
||||||
|
STRING_nla0
|
||||||
|
STRING_nlb0
|
||||||
|
STRING_positive_lookahead0
|
||||||
|
STRING_positive_lookbehind0
|
||||||
|
STRING_negative_lookahead0
|
||||||
|
STRING_negative_lookbehind0
|
||||||
|
STRING_atomic0
|
||||||
|
STRING_sr0
|
||||||
|
STRING_asr0
|
||||||
|
STRING_script_run0
|
||||||
|
STRING_atomic_script_run;
|
||||||
|
|
||||||
|
static const alasitem alasmeta[] = {
|
||||||
|
{ 3, META_LOOKAHEAD },
|
||||||
|
{ 3, META_LOOKBEHIND },
|
||||||
|
{ 3, META_LOOKAHEADNOT },
|
||||||
|
{ 3, META_LOOKBEHINDNOT },
|
||||||
|
{ 18, META_LOOKAHEAD },
|
||||||
|
{ 19, META_LOOKBEHIND },
|
||||||
|
{ 18, META_LOOKAHEADNOT },
|
||||||
|
{ 19, META_LOOKBEHINDNOT },
|
||||||
|
{ 6, META_ATOMIC },
|
||||||
|
{ 2, 0 }, /* sr = script run */
|
||||||
|
{ 3, 0 }, /* asr = atomic script run */
|
||||||
|
{ 10, 0 }, /* script run */
|
||||||
|
{ 17, 0 } /* atomic script run */
|
||||||
|
};
|
||||||
|
|
||||||
|
static const int alascount = sizeof(alasmeta)/sizeof(alasitem);
|
||||||
|
|
||||||
/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
|
/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
|
||||||
|
|
||||||
static uint32_t chartypeoffset[] = {
|
static uint32_t chartypeoffset[] = {
|
||||||
|
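The alasnames/alasmeta pair added in this hunk stores all names in one NUL-separated string, with a parallel array giving each name's length. The following is a minimal sketch of how such a pair can be scanned, not the code from pcre2_compile.c: the demo_ identifiers and the DEMO_ codes are hypothetical stand-ins, the script run entries are left out, and plain strncmp() is used where the patch calls PRIV(strncmp_c8).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the META_ codes; the real values are defined
   in pcre2_compile.c and are not shown in this diff. */
enum { DEMO_LOOKAHEAD = 1, DEMO_LOOKBEHIND, DEMO_LOOKAHEADNOT,
       DEMO_LOOKBEHINDNOT, DEMO_ATOMIC };

typedef struct demo_item {
  unsigned int len;    /* Length of name */
  unsigned int meta;   /* Associated code */
} demo_item;

/* Names are concatenated with NUL separators; the last one relies on the
   string literal's own terminating NUL. */
static const char demo_names[] =
  "pla\0"
  "plb\0"
  "nla\0"
  "nlb\0"
  "atomic";

static const demo_item demo_meta[] = {
  { 3, DEMO_LOOKAHEAD },
  { 3, DEMO_LOOKBEHIND },
  { 3, DEMO_LOOKAHEADNOT },
  { 3, DEMO_LOOKBEHINDNOT },
  { 6, DEMO_ATOMIC }
};

static const int demo_count = sizeof(demo_meta)/sizeof(demo_item);

/* Return the index of the matching entry, or -1 if the name is unknown.
   The name pointer steps forward by each entry's length plus its NUL. */
static int demo_lookup(const char *name, size_t namelen)
{
const char *vn = demo_names;
assert(name != NULL);
for (int i = 0; i < demo_count; i++)
  {
  if (namelen == demo_meta[i].len && strncmp(name, vn, namelen) == 0)
    return i;
  vn += demo_meta[i].len + 1;
  }
return -1;
}
```

Comparing the length first means a prefix such as "pl" is never mistaken for "pla", and the lengths in the second table must be kept in step with the strings in the first.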
@ -732,7 +772,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
|
||||||
ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
|
ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
|
||||||
ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
|
ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
|
||||||
ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
|
ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
|
||||||
ERR91, ERR92, ERR93, ERR94 };
|
ERR91, ERR92, ERR93, ERR94, ERR95 };
|
||||||
|
|
||||||
/* This is a table of start-of-pattern options such as (*UTF) and settings such
|
/* This is a table of start-of-pattern options such as (*UTF) and settings such
|
||||||
as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
|
as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
|
||||||
|
@ -2133,9 +2173,10 @@ return -1;
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* This function is called from parse_regex() below whenever it needs to read
|
/* This function is called from parse_regex() below whenever it needs to read
|
||||||
the name of a subpattern or a (*VERB). The initial pointer must be to the
|
the name of a subpattern or a (*VERB) or an (*alpha_assertion). The initial
|
||||||
character before the name. If that character is '*' we are reading a verb name.
|
pointer must be to the character before the name. If that character is '*' we
|
||||||
The pointer is updated to point after the name, for a VERB, or after the name's
|
are reading a verb or alpha assertion name. The pointer is updated to point
|
||||||
|
after the name, for a VERB or alpha assertion name, or after the name's
|
||||||
terminator for a subpattern name. Returning both the offset and the name
|
terminator for a subpattern name. Returning both the offset and the name
|
||||||
pointer is redundant information, but some callers use one and some the other,
|
pointer is redundant information, but some callers use one and some the other,
|
||||||
so it is simplest just to return both.
|
so it is simplest just to return both.
|
||||||
|
@ -2160,27 +2201,29 @@ read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t terminator,
|
||||||
int *errorcodeptr, compile_block *cb)
|
int *errorcodeptr, compile_block *cb)
|
||||||
{
|
{
|
||||||
PCRE2_SPTR ptr = *ptrptr;
|
PCRE2_SPTR ptr = *ptrptr;
|
||||||
BOOL is_verb = (*ptr == CHAR_ASTERISK);
|
BOOL is_group = (*ptr != CHAR_ASTERISK);
|
||||||
uint32_t namelen = 0;
|
uint32_t namelen = 0;
|
||||||
uint32_t ctype = is_verb? ctype_letter : ctype_word;
|
|
||||||
|
|
||||||
if (++ptr >= ptrend)
|
if (++ptr >= ptrend) /* No characters in name */
|
||||||
{
|
{
|
||||||
*errorcodeptr = is_verb? ERR60: /* Verb not recognized or malformed */
|
*errorcodeptr = is_group? ERR62: /* Subpattern name expected */
|
||||||
ERR62; /* Subpattern name expected */
|
ERR60; /* Verb not recognized or malformed */
|
||||||
|
goto FAILED;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* A group name must not start with a digit. If either of the others start with
|
||||||
|
a digit it just won't be recognized. */
|
||||||
|
|
||||||
|
if (is_group && IS_DIGIT(*ptr))
|
||||||
|
{
|
||||||
|
*errorcodeptr = ERR44;
|
||||||
goto FAILED;
|
goto FAILED;
|
||||||
}
|
}
|
||||||
|
|
||||||
*nameptr = ptr;
|
*nameptr = ptr;
|
||||||
*offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern);
|
*offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern);
|
||||||
|
|
||||||
if (IS_DIGIT(*ptr))
|
while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0)
|
||||||
{
|
|
||||||
*errorcodeptr = ERR44; /* Group name must not start with digit */
|
|
||||||
goto FAILED;
|
|
||||||
}
|
|
||||||
|
|
||||||
while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype) != 0)
|
|
||||||
{
|
{
|
||||||
ptr++;
|
ptr++;
|
||||||
namelen++;
|
namelen++;
|
||||||
|
@ -2192,9 +2235,9 @@ while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype) != 0)
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Subpattern names must not be empty, and their terminator is checked here.
|
/* Subpattern names must not be empty, and their terminator is checked here.
|
||||||
(What follows a verb name is checked separately.) */
|
(What follows a verb or alpha assertion name is checked separately.) */
|
||||||
|
|
||||||
if (!is_verb)
|
if (is_group)
|
||||||
{
|
{
|
||||||
if (namelen == 0)
|
if (namelen == 0)
|
||||||
{
|
{
|
||||||
|
@ -2652,8 +2695,14 @@ while (ptr < ptrend)
|
||||||
if (expect_cond_assert > 0)
|
if (expect_cond_assert > 0)
|
||||||
{
|
{
|
||||||
BOOL ok = c == CHAR_LEFT_PARENTHESIS && ptrend - ptr >= 3 &&
|
BOOL ok = c == CHAR_LEFT_PARENTHESIS && ptrend - ptr >= 3 &&
|
||||||
ptr[0] == CHAR_QUESTION_MARK;
|
(ptr[0] == CHAR_QUESTION_MARK || ptr[0] == CHAR_ASTERISK);
|
||||||
if (ok) switch(ptr[1])
|
if (ok)
|
||||||
|
{
|
||||||
|
if (ptr[0] == CHAR_ASTERISK) /* New alpha assertion format, possibly */
|
||||||
|
{
|
||||||
|
ok = MAX_255(ptr[1]) && (cb->ctypes[ptr[1]] & ctype_lcletter) != 0;
|
||||||
|
}
|
||||||
|
else switch(ptr[1]) /* Traditional symbolic format */
|
||||||
{
|
{
|
||||||
case CHAR_C:
|
case CHAR_C:
|
||||||
ok = expect_cond_assert == 2;
|
ok = expect_cond_assert == 2;
|
||||||
|
@ -2670,6 +2719,7 @@ while (ptr < ptrend)
|
||||||
default:
|
default:
|
||||||
ok = FALSE;
|
ok = FALSE;
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
|
||||||
if (!ok)
|
if (!ok)
|
||||||
{
|
{
|
||||||
|
@ -3453,7 +3503,8 @@ while (ptr < ptrend)
|
||||||
case CHAR_LEFT_PARENTHESIS:
|
case CHAR_LEFT_PARENTHESIS:
|
||||||
if (ptr >= ptrend) goto UNCLOSED_PARENTHESIS;
|
if (ptr >= ptrend) goto UNCLOSED_PARENTHESIS;
|
||||||
|
|
||||||
/* If ( is not followed by ? it is either a capture or a special verb. */
|
/* If ( is not followed by ? it is either a capture or a special verb or an
|
||||||
|
alpha assertion. */
|
||||||
|
|
||||||
if (*ptr != CHAR_QUESTION_MARK)
|
if (*ptr != CHAR_QUESTION_MARK)
|
||||||
{
|
{
|
||||||
|
@ -3473,13 +3524,88 @@ while (ptr < ptrend)
|
||||||
else *parsed_pattern++ = META_NOCAPTURE;
|
else *parsed_pattern++ = META_NOCAPTURE;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Do nothing for (* followed by end of pattern or ) so it gives a "bad
|
||||||
|
quantifier" error rather than "(*MARK) must have an argument". */
|
||||||
|
|
||||||
|
else if (ptrend - ptr <= 1 || (c = ptr[1]) == CHAR_RIGHT_PARENTHESIS)
|
||||||
|
break;
|
||||||
|
|
||||||
|
/* Handle "alpha assertions" such as (*pla:...). Most of these are
|
||||||
|
synonyms for the historical symbolic assertions, but the script run ones
|
||||||
|
are new. They are distinguished by starting with a lower case letter.
|
||||||
|
Checking both ends of the alphabet makes this work in all character
|
||||||
|
codes. */
|
||||||
|
|
||||||
|
else if (CHMAX_255(c) && (cb->ctypes[c] & ctype_lcletter) != 0)
|
||||||
|
{
|
||||||
|
uint32_t meta;
|
||||||
|
|
||||||
|
vn = alasnames;
|
||||||
|
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
|
||||||
|
cb)) goto FAILED;
|
||||||
|
if (ptr >= ptrend || *ptr != CHAR_COLON)
|
||||||
|
{
|
||||||
|
errorcode = ERR95; /* Malformed */
|
||||||
|
goto FAILED;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Scan the table of alpha assertion names */
|
||||||
|
|
||||||
|
for (i = 0; i < alascount; i++)
|
||||||
|
{
|
||||||
|
if (namelen == alasmeta[i].len &&
|
||||||
|
PRIV(strncmp_c8)(name, vn, namelen) == 0)
|
||||||
|
break;
|
||||||
|
vn += alasmeta[i].len + 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (i >= alascount)
|
||||||
|
{
|
||||||
|
errorcode = ERR95; /* Alpha assertion not recognized */
|
||||||
|
goto FAILED;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Check for expecting an assertion condition. If so, only lookaround
|
||||||
|
assertions are valid. */
|
||||||
|
|
||||||
|
meta = alasmeta[i].meta;
|
||||||
|
if (prev_expect_cond_assert > 0 &&
|
||||||
|
(meta < META_LOOKAHEAD || meta > META_LOOKBEHINDNOT))
|
||||||
|
{
|
||||||
|
errorcode = ERR28; /* Assertion expected */
|
||||||
|
goto FAILED;
|
||||||
|
}
|
||||||
|
|
||||||
|
switch(meta)
|
||||||
|
{
|
||||||
|
case META_ATOMIC:
|
||||||
|
goto ATOMIC_GROUP;
|
||||||
|
|
||||||
|
case META_LOOKAHEAD:
|
||||||
|
goto POSITIVE_LOOK_AHEAD;
|
||||||
|
|
||||||
|
case META_LOOKAHEADNOT:
|
||||||
|
goto NEGATIVE_LOOK_AHEAD;
|
||||||
|
|
||||||
|
case META_LOOKBEHIND:
|
||||||
|
case META_LOOKBEHINDNOT:
|
||||||
|
*parsed_pattern++ = meta;
|
||||||
|
ptr--;
|
||||||
|
goto LOOKBEHIND;
|
||||||
|
|
||||||
|
/* FIXME: Script Run stuff ... */
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
/* ---- Handle (*VERB) and (*VERB:NAME) ---- */
|
/* ---- Handle (*VERB) and (*VERB:NAME) ---- */
|
||||||
|
|
||||||
/* Do nothing for (*) so it gives a "bad quantifier" error rather than
|
else
|
||||||
"(*MARK) must have an argument". */
|
|
||||||
|
|
||||||
else if (ptrend - ptr > 1 && ptr[1] != CHAR_RIGHT_PARENTHESIS)
|
|
||||||
{
|
{
|
||||||
vn = verbnames;
|
vn = verbnames;
|
||||||
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
|
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
|
||||||
|
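The hunk above makes a three-way decision on what follows "(*": nothing useful (end of pattern or an immediate closing parenthesis), an alpha assertion (lower case first letter), or a (*VERB). A rough sketch of that decision follows, assuming 8-bit input and using islower() in place of the ctype_lcletter table lookup that the patch uses; star_kind and classify_star are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { STAR_NOTHING, STAR_ALPHA_ASSERTION, STAR_VERB } star_kind;

/* ptr points at the '*' that follows '('; ptrend is one past the end of the
   pattern. This only classifies; it does not validate the name itself. */
static star_kind classify_star(const char *ptr, const char *ptrend)
{
assert(ptr != NULL && ptrend != NULL && ptr < ptrend);

/* "(*" at end of pattern, or "(*)": leave it for the quantifier check. */
if (ptrend - ptr <= 1 || ptr[1] == ')') return STAR_NOTHING;

/* A lower case letter introduces an alpha assertion such as (*pla:...). */
if (islower((unsigned char)ptr[1])) return STAR_ALPHA_ASSERTION;

/* Anything else is handed to the (*VERB) code. */
return STAR_VERB;
}
```

With this ordering, a pattern such as (*pla:foo) never reaches the verb table, while (*ACCEPT) and (*UTF16) still do.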
@ -3946,14 +4072,15 @@ while (ptr < ptrend)
|
||||||
if (++ptr >= ptrend) goto UNCLOSED_PARENTHESIS;
|
if (++ptr >= ptrend) goto UNCLOSED_PARENTHESIS;
|
||||||
nest_depth++;
|
nest_depth++;
|
||||||
|
|
||||||
/* If the next character is ? there must be an assertion next (optionally
|
/* If the next character is ? or * there must be an assertion next
|
||||||
preceded by a callout). We do not check this here, but instead we set
|
(optionally preceded by a callout). We do not check this here, but
|
||||||
expect_cond_assert to 2. If this is still greater than zero (callouts
|
instead we set expect_cond_assert to 2. If this is still greater than
|
||||||
decrement it) when the next assertion is read, it will be marked as a
|
zero (callouts decrement it) when the next assertion is read, it will be
|
||||||
condition that must not be repeated. A value greater than zero also
|
marked as a condition that must not be repeated. A value greater than
|
||||||
causes checking that an assertion (possibly with callout) follows. */
|
zero also causes checking that an assertion (possibly with callout)
|
||||||
|
follows. */
|
||||||
|
|
||||||
if (*ptr == CHAR_QUESTION_MARK)
|
if (*ptr == CHAR_QUESTION_MARK || *ptr == CHAR_ASTERISK)
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = META_COND_ASSERT;
|
*parsed_pattern++ = META_COND_ASSERT;
|
||||||
ptr--; /* Pull pointer back to the opening parenthesis. */
|
ptr--; /* Pull pointer back to the opening parenthesis. */
|
||||||
|
@ -4099,6 +4226,7 @@ while (ptr < ptrend)
|
||||||
/* ---- Atomic group ---- */
|
/* ---- Atomic group ---- */
|
||||||
|
|
||||||
case CHAR_GREATER_THAN_SIGN:
|
case CHAR_GREATER_THAN_SIGN:
|
||||||
|
ATOMIC_GROUP: /* Come from (*atomic: */
|
||||||
*parsed_pattern++ = META_ATOMIC;
|
*parsed_pattern++ = META_ATOMIC;
|
||||||
nest_depth++;
|
nest_depth++;
|
||||||
ptr++;
|
ptr++;
|
||||||
|
@ -4108,11 +4236,13 @@ while (ptr < ptrend)
|
||||||
/* ---- Lookahead assertions ---- */
|
/* ---- Lookahead assertions ---- */
|
||||||
|
|
||||||
case CHAR_EQUALS_SIGN:
|
case CHAR_EQUALS_SIGN:
|
||||||
|
POSITIVE_LOOK_AHEAD: /* Come from (*pla: */
|
||||||
*parsed_pattern++ = META_LOOKAHEAD;
|
*parsed_pattern++ = META_LOOKAHEAD;
|
||||||
ptr++;
|
ptr++;
|
||||||
goto POST_ASSERTION;
|
goto POST_ASSERTION;
|
||||||
|
|
||||||
case CHAR_EXCLAMATION_MARK:
|
case CHAR_EXCLAMATION_MARK:
|
||||||
|
NEGATIVE_LOOK_AHEAD: /* Come from (*nla: */
|
||||||
*parsed_pattern++ = META_LOOKAHEADNOT;
|
*parsed_pattern++ = META_LOOKAHEADNOT;
|
||||||
ptr++;
|
ptr++;
|
||||||
goto POST_ASSERTION;
|
goto POST_ASSERTION;
|
||||||
|
@ -4132,6 +4262,8 @@ while (ptr < ptrend)
|
||||||
}
|
}
|
||||||
*parsed_pattern++ = (ptr[1] == CHAR_EQUALS_SIGN)?
|
*parsed_pattern++ = (ptr[1] == CHAR_EQUALS_SIGN)?
|
||||||
META_LOOKBEHIND : META_LOOKBEHINDNOT;
|
META_LOOKBEHIND : META_LOOKBEHINDNOT;
|
||||||
|
|
||||||
|
LOOKBEHIND: /* Come from (*plb: and (*nlb: */
|
||||||
*has_lookbehind = TRUE;
|
*has_lookbehind = TRUE;
|
||||||
offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 2);
|
offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 2);
|
||||||
PUTOFFSET(offset, parsed_pattern);
|
PUTOFFSET(offset, parsed_pattern);
|
||||||
|
|
|
@ -181,6 +181,8 @@ static const unsigned char compile_error_texts[] =
|
||||||
"invalid option bits with PCRE2_LITERAL\0"
|
"invalid option bits with PCRE2_LITERAL\0"
|
||||||
"\\N{U+dddd} is supported only in Unicode (UTF) mode\0"
|
"\\N{U+dddd} is supported only in Unicode (UTF) mode\0"
|
||||||
"invalid hyphen in option setting\0"
|
"invalid hyphen in option setting\0"
|
||||||
|
/* 95 */
|
||||||
|
"(*alpha_assertion) not recognized\0"
|
||||||
;
|
;
|
||||||
|
|
||||||
/* Match-time and UTF error texts are in the same format. */
|
/* Match-time and UTF error texts are in the same format. */
|
||||||
|
|
|
@ -571,8 +571,8 @@ ctype_word has the value 16. */
|
||||||
|
|
||||||
#define ctype_space 0x01
|
#define ctype_space 0x01
|
||||||
#define ctype_letter 0x02
|
#define ctype_letter 0x02
|
||||||
#define ctype_digit 0x04
|
#define ctype_lcletter 0x04
|
||||||
#define ctype_xdigit 0x08 /* not actually used any more */
|
#define ctype_digit 0x08
|
||||||
#define ctype_word 0x10 /* alphanumeric or '_' */
|
#define ctype_word 0x10 /* alphanumeric or '_' */
|
||||||
|
|
||||||
/* Offsets of the various tables from the base tables pointer, and
|
/* Offsets of the various tables from the base tables pointer, and
|
||||||
|
@ -883,6 +883,20 @@ a positive value. */
|
||||||
#define STRING_SKIP0 "SKIP\0"
|
#define STRING_SKIP0 "SKIP\0"
|
||||||
#define STRING_THEN "THEN"
|
#define STRING_THEN "THEN"
|
||||||
|
|
||||||
|
#define STRING_atomic0 "atomic\0"
|
||||||
|
#define STRING_pla0 "pla\0"
|
||||||
|
#define STRING_plb0 "plb\0"
|
||||||
|
#define STRING_nla0 "nla\0"
|
||||||
|
#define STRING_nlb0 "nlb\0"
|
||||||
|
#define STRING_sr0 "sr\0"
|
||||||
|
#define STRING_asr0 "asr\0"
|
||||||
|
#define STRING_positive_lookahead0 "positive_lookahead\0"
|
||||||
|
#define STRING_positive_lookbehind0 "positive_lookbehind\0"
|
||||||
|
#define STRING_negative_lookahead0 "negative_lookahead\0"
|
||||||
|
#define STRING_negative_lookbehind0 "negative_lookbehind\0"
|
||||||
|
#define STRING_script_run0 "script_run\0"
|
||||||
|
#define STRING_atomic_script_run "atomic_script_run"
|
||||||
|
|
||||||
#define STRING_alpha0 "alpha\0"
|
#define STRING_alpha0 "alpha\0"
|
||||||
#define STRING_lower0 "lower\0"
|
#define STRING_lower0 "lower\0"
|
||||||
#define STRING_upper0 "upper\0"
|
#define STRING_upper0 "upper\0"
|
||||||
|
@ -1159,6 +1173,20 @@ only. */
|
||||||
#define STRING_SKIP0 STR_S STR_K STR_I STR_P "\0"
|
#define STRING_SKIP0 STR_S STR_K STR_I STR_P "\0"
|
||||||
#define STRING_THEN STR_T STR_H STR_E STR_N
|
#define STRING_THEN STR_T STR_H STR_E STR_N
|
||||||
|
|
||||||
|
#define STRING_atomic0 STR_a STR_t STR_o STR_m STR_i STR_c "\0"
|
||||||
|
#define STRING_pla0 STR_p STR_l STR_a "\0"
|
||||||
|
#define STRING_plb0 STR_p STR_l STR_b "\0"
|
||||||
|
#define STRING_nla0 STR_n STR_l STR_a "\0"
|
||||||
|
#define STRING_nlb0 STR_n STR_l STR_b "\0"
|
||||||
|
#define STRING_sr0 STR_s STR_r "\0"
|
||||||
|
#define STRING_asr0 STR_a STR_s STR_r "\0"
|
||||||
|
#define STRING_positive_lookahead0 STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
|
||||||
|
#define STRING_positive_lookbehind0 STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
|
||||||
|
#define STRING_negative_lookahead0 STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
|
||||||
|
#define STRING_negative_lookbehind0 STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
|
||||||
|
#define STRING_script_run0 STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n "\0"
|
||||||
|
#define STRING_atomic_script_run STR_a STR_t STR_o STR_m STR_i STR_c STR_UNDERSCORE STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n
|
||||||
|
|
||||||
#define STRING_alpha0 STR_a STR_l STR_p STR_h STR_a "\0"
|
#define STRING_alpha0 STR_a STR_l STR_p STR_h STR_a "\0"
|
||||||
#define STRING_lower0 STR_l STR_o STR_w STR_e STR_r "\0"
|
#define STRING_lower0 STR_l STR_o STR_w STR_e STR_r "\0"
|
||||||
#define STRING_upper0 STR_u STR_p STR_p STR_e STR_r "\0"
|
#define STRING_upper0 STR_u STR_p STR_p STR_e STR_r "\0"
|
||||||
|
|
|
@ -138,8 +138,8 @@ for (i = 0; i < 256; i++)
|
||||||
int x = 0;
|
int x = 0;
|
||||||
if (isspace(i)) x += ctype_space;
|
if (isspace(i)) x += ctype_space;
|
||||||
if (isalpha(i)) x += ctype_letter;
|
if (isalpha(i)) x += ctype_letter;
|
||||||
|
if (islower(i)) x += ctype_lcletter;
|
||||||
if (isdigit(i)) x += ctype_digit;
|
if (isdigit(i)) x += ctype_digit;
|
||||||
if (isxdigit(i)) x += ctype_xdigit;
|
|
||||||
if (isalnum(i) || i == '_') x += ctype_word;
|
if (isalnum(i) || i == '_') x += ctype_word;
|
||||||
*p++ = x;
|
*p++ = x;
|
||||||
}
|
}
|
||||||
|
|
|
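To see how the reassigned bits combine, here is a small self-contained sketch that fills a 256-entry table the same way as the loop above and tests the new lower case bit. It is an illustration only: build_ctypes and is_lcletter are hypothetical helper names, and the values depend on the C locale in effect (the default "C" locale is assumed).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined on the new side of this diff. */
#define ctype_space     0x01
#define ctype_letter    0x02
#define ctype_lcletter  0x04
#define ctype_digit     0x08
#define ctype_word      0x10   /* alphanumeric or '_' */

/* Hypothetical helper: fill the table using the <ctype.h> classifiers. */
static void build_ctypes(uint8_t table[256])
{
assert(table != NULL);
for (int i = 0; i < 256; i++)
  {
  int x = 0;
  if (isspace(i)) x += ctype_space;
  if (isalpha(i)) x += ctype_letter;
  if (islower(i)) x += ctype_lcletter;
  if (isdigit(i)) x += ctype_digit;
  if (isalnum(i) || i == '_') x += ctype_word;
  table[i] = (uint8_t)x;
  }
}

/* Hypothetical helper: the range check plays the role of MAX_255() so that
   wide code units never index past the end of the table. */
static int is_lcletter(const uint8_t table[256], uint32_t c)
{
return c <= 255 && (table[c] & ctype_lcletter) != 0;
}
```

In the "C" locale this gives 0x16 for 'a' (letter, lower case, word), 0x12 for 'A' and 0x18 for '0', matching the updated rows of the default table in this commit.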
@ -6263,4 +6263,69 @@ ef) x/x,mark
|
||||||
aBCDEF
|
aBCDEF
|
||||||
AbCDe f
|
AbCDe f
|
||||||
|
|
||||||
|
/(*pla:foo).{6}/
|
||||||
|
abcfoobarxyz
|
||||||
|
\= Expect no match
|
||||||
|
abcfooba
|
||||||
|
|
||||||
|
/(*positive_lookahead:foo).{6}/
|
||||||
|
abcfoobarxyz
|
||||||
|
|
||||||
|
/(?(*pla:foo).{6}|a..)/
|
||||||
|
foobarbaz
|
||||||
|
abcfoobar
|
||||||
|
|
||||||
|
/(?(*positive_lookahead:foo).{6}|a..)/
|
||||||
|
foobarbaz
|
||||||
|
abcfoobar
|
||||||
|
|
||||||
|
/(*plb:foo)bar/
|
||||||
|
abcfoobar
|
||||||
|
\= Expect no match
|
||||||
|
abcbarfoo
|
||||||
|
|
||||||
|
/(*positive_lookbehind:foo)bar/
|
||||||
|
abcfoobar
|
||||||
|
\= Expect no match
|
||||||
|
abcbarfoo
|
||||||
|
|
||||||
|
/(?(*plb:foo)bar|baz)/
|
||||||
|
abcfoobar
|
||||||
|
bazfoobar
|
||||||
|
abcbazfoobar
|
||||||
|
foobazfoobar
|
||||||
|
|
||||||
|
/(?(*positive_lookbehind:foo)bar|baz)/
|
||||||
|
abcfoobar
|
||||||
|
bazfoobar
|
||||||
|
abcbazfoobar
|
||||||
|
foobazfoobar
|
||||||
|
|
||||||
|
/(*nlb:foo)bar/
|
||||||
|
abcbarfoo
|
||||||
|
\= Expect no match
|
||||||
|
abcfoobar
|
||||||
|
|
||||||
|
/(*negative_lookbehind:foo)bar/
|
||||||
|
abcbarfoo
|
||||||
|
\= Expect no match
|
||||||
|
abcfoobar
|
||||||
|
|
||||||
|
/(?(*nlb:foo)bar|baz)/
|
||||||
|
abcfoobaz
|
||||||
|
abcbarbaz
|
||||||
|
\= Expect no match
|
||||||
|
abcfoobar
|
||||||
|
|
||||||
|
/(?(*negative_lookbehind:foo)bar|baz)/
|
||||||
|
abcfoobaz
|
||||||
|
abcbarbaz
|
||||||
|
\= Expect no match
|
||||||
|
abcfoobar
|
||||||
|
|
||||||
|
/(*atomic:a+)\w/
|
||||||
|
aaab
|
||||||
|
\= Expect no match
|
||||||
|
aaaa
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -5525,4 +5525,10 @@ a)"xI
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
abc\ndef\nxyz
|
abc\ndef\nxyz
|
||||||
|
|
||||||
|
/(?(*ACCEPT)xxx)/
|
||||||
|
|
||||||
|
/(?(*atomic:xx)xxx)/
|
||||||
|
|
||||||
|
/(?(*script_run:xxx)zzz)/
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -9929,4 +9929,100 @@ No match
|
||||||
AbCDe f
|
AbCDe f
|
||||||
No match
|
No match
|
||||||
|
|
||||||
|
/(*pla:foo).{6}/
|
||||||
|
abcfoobarxyz
|
||||||
|
0: foobar
|
||||||
|
\= Expect no match
|
||||||
|
abcfooba
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(*positive_lookahead:foo).{6}/
|
||||||
|
abcfoobarxyz
|
||||||
|
0: foobar
|
||||||
|
|
||||||
|
/(?(*pla:foo).{6}|a..)/
|
||||||
|
foobarbaz
|
||||||
|
0: foobar
|
||||||
|
abcfoobar
|
||||||
|
0: abc
|
||||||
|
|
||||||
|
/(?(*positive_lookahead:foo).{6}|a..)/
|
||||||
|
foobarbaz
|
||||||
|
0: foobar
|
||||||
|
abcfoobar
|
||||||
|
0: abc
|
||||||
|
|
||||||
|
/(*plb:foo)bar/
|
||||||
|
abcfoobar
|
||||||
|
0: bar
|
||||||
|
\= Expect no match
|
||||||
|
abcbarfoo
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(*positive_lookbehind:foo)bar/
|
||||||
|
abcfoobar
|
||||||
|
0: bar
|
||||||
|
\= Expect no match
|
||||||
|
abcbarfoo
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(?(*plb:foo)bar|baz)/
|
||||||
|
abcfoobar
|
||||||
|
0: bar
|
||||||
|
bazfoobar
|
||||||
|
0: baz
|
||||||
|
abcbazfoobar
|
||||||
|
0: baz
|
||||||
|
foobazfoobar
|
||||||
|
0: bar
|
||||||
|
|
||||||
|
/(?(*positive_lookbehind:foo)bar|baz)/
|
||||||
|
abcfoobar
|
||||||
|
0: bar
|
||||||
|
bazfoobar
|
||||||
|
0: baz
|
||||||
|
abcbazfoobar
|
||||||
|
0: baz
|
||||||
|
foobazfoobar
|
||||||
|
0: bar
|
||||||
|
|
||||||
|
/(*nlb:foo)bar/
|
||||||
|
abcbarfoo
|
||||||
|
0: bar
|
||||||
|
\= Expect no match
|
||||||
|
abcfoobar
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(*negative_lookbehind:foo)bar/
|
||||||
|
abcbarfoo
|
||||||
|
0: bar
|
||||||
|
\= Expect no match
|
||||||
|
abcfoobar
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(?(*nlb:foo)bar|baz)/
|
||||||
|
abcfoobaz
|
||||||
|
0: baz
|
||||||
|
abcbarbaz
|
||||||
|
0: bar
|
||||||
|
\= Expect no match
|
||||||
|
abcfoobar
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(?(*negative_lookbehind:foo)bar|baz)/
|
||||||
|
abcfoobaz
|
||||||
|
0: baz
|
||||||
|
abcbarbaz
|
||||||
|
0: bar
|
||||||
|
\= Expect no match
|
||||||
|
abcfoobar
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(*atomic:a+)\w/
|
||||||
|
aaab
|
||||||
|
0: aaab
|
||||||
|
\= Expect no match
|
||||||
|
aaaa
|
||||||
|
No match
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -575,7 +575,7 @@ Last code unit = 'b'
|
||||||
Subject length lower bound = 3
|
Subject length lower bound = 3
|
||||||
|
|
||||||
/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
|
/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
|
||||||
Failed: error 160 at offset 12: (*VERB) not recognized or malformed
|
Failed: error 160 at offset 14: (*VERB) not recognized or malformed
|
||||||
|
|
||||||
/\h/I,utf
|
/\h/I,utf
|
||||||
Capturing subpattern count = 0
|
Capturing subpattern count = 0
|
||||||
|
|
|
@ -538,7 +538,7 @@ No match
|
||||||
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
|
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
|
||||||
|
|
||||||
/(*UTF16)\x{11234}/
|
/(*UTF16)\x{11234}/
|
||||||
Failed: error 160 at offset 5: (*VERB) not recognized or malformed
|
Failed: error 160 at offset 7: (*VERB) not recognized or malformed
|
||||||
abcd\x{11234}pqr
|
abcd\x{11234}pqr
|
||||||
|
|
||||||
/(*UTF)\x{11234}/I
|
/(*UTF)\x{11234}/I
|
||||||
|
@ -559,7 +559,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
|
||||||
abcd\x{11234}pqr
|
abcd\x{11234}pqr
|
||||||
|
|
||||||
/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I
|
/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I
|
||||||
Failed: error 160 at offset 12: (*VERB) not recognized or malformed
|
Failed: error 160 at offset 14: (*VERB) not recognized or malformed
|
||||||
|
|
||||||
/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
|
/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
|
||||||
Capturing subpattern count = 0
|
Capturing subpattern count = 0
|
||||||
|
|
|
@ -16812,6 +16812,15 @@ No match
|
||||||
abc\ndef\nxyz
|
abc\ndef\nxyz
|
||||||
No match
|
No match
|
||||||
|
|
||||||
|
/(?(*ACCEPT)xxx)/
|
||||||
|
Failed: error 128 at offset 2: assertion expected after (?( or (?(?C)
|
||||||
|
|
||||||
|
/(?(*atomic:xx)xxx)/
|
||||||
|
Failed: error 128 at offset 10: assertion expected after (?( or (?(?C)
|
||||||
|
|
||||||
|
/(?(*script_run:xxx)zzz)/
|
||||||
|
Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
||||||
Error -62: bad serialized data
|
Error -62: bad serialized data
|
||||||
|
|