Implement Perl 5.28's alphabetic lookaround syntax, e.g. (*pla:...) and also
(*atomic:...).

parent 69254c77f1
commit f26b0b0bae

14 ChangeLog
@@ -5,8 +5,8 @@ Change Log for PCRE2
 Version 10.33-RC1 15-September-2018
 -----------------------------------

 1. Added "allvector" to pcre2test to make it easy to check the part of the
 ovector that shouldn't be changed, in particular after substitute and failed or
 partial matches.

 2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
@@ -15,13 +15,21 @@ a greater than 1 fixed quantifier. This issue was found by Yunho Kim.
 3. Added support for callouts from pcre2_substitute().

 4. The POSIX functions are now all called pcre2_regcomp() etc., with wrappers
 that use the standard POSIX names. This should help avoid linking with the
 wrong library in some environments.

 5. Fix an xclass matching issue in JIT.

 6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).

+7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
+lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
+characterized by a lower case letter following (* and to simplify coding for
+this, the character tables created by pcre2_maketables() were updated to add a
+new "is lower case letter" bit. At the same time, the now unused "is
+hexadecimal digit" bit was removed. The default tables in
+src/pcre2_chartables.c.dist are updated.
+

 Version 10.32 10-September-2018
 -------------------------------
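Item 7's new bit layout can be illustrated with a small sketch that classifies one byte the way a table generator might. This is not PCRE2's pcre2_maketables() code. It assumes the C library's <ctype.h> functions in the "C" locale, and the helper name ctype_bits() is hypothetical. The bit values follow the new definitions in pcre2_internal.h: 0x04 now means "lower case letter" and 0x08 "decimal digit".

```c
#include <ctype.h>
#include <stdint.h>

/* Bit values as defined after this commit in pcre2_internal.h. */
#define CTYPE_SPACE    0x01  /* white space character */
#define CTYPE_LETTER   0x02  /* letter */
#define CTYPE_LCLETTER 0x04  /* lower case letter (new) */
#define CTYPE_DIGIT    0x08  /* decimal digit (was 0x04) */
#define CTYPE_WORD     0x10  /* alphanumeric or '_' */

/* Hypothetical helper: compute a ctypes entry for one character, assuming
   the "C" locale. There is no longer a hexadecimal-digit bit. */
uint8_t ctype_bits(int c)
{
  uint8_t bits = 0;
  if (isspace(c)) bits |= CTYPE_SPACE;
  if (isalpha(c)) bits |= CTYPE_LETTER;
  if (islower(c)) bits |= CTYPE_LCLETTER;
  if (isdigit(c)) bits |= CTYPE_DIGIT;
  if (isalnum(c) || c == '_') bits |= CTYPE_WORD;
  return bits;
}
```

With this layout, ctype_bits('a') is 0x16, ctype_bits('A') is 0x12 and ctype_bits('7') is 0x18. These match the new entries in the default character table hunk further down.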
@@ -2120,6 +2120,11 @@ special parenthesis, starting with (?> as in this example:
 <pre>
 (?>\d+)foo
 </pre>
+Perl 5.28 introduced an experimental alphabetic form starting with (* which may
+be easier to remember:
+<pre>
+(*atomic:\d+)foo
+</pre>
 This kind of parenthesis "locks up" the part of the pattern it contains once
 it has matched, and a failure further into the pattern is prevented from
 backtracking into it. Backtracking past it to previous items, however, works as
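A small C sketch shows what "locks up" means in practice. It is not PCRE2 code, and the function name is hypothetical. It matches the equivalent of ^\d+X and ^(?>\d+)X for a single literal character X. The backtracking form can give back digits one at a time; the atomic form keeps everything the greedy \d+ consumed, so it can fail where the ordinary form succeeds.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Anchored match of \d+X (atomic == false) or (?>\d+)X (atomic == true)
   at the start of `subject`, where X is the single literal character `x`. */
bool match_digits_then(const char *subject, char x, bool atomic)
{
  size_t n = 0;
  while (isdigit((unsigned char)subject[n])) n++;   /* greedy \d+ */
  if (n == 0) return false;                          /* \d+ needs a digit */

  /* Ordinary backtracking retries with one digit fewer each time, down to
     a single digit. An atomic group keeps only the greedy length. */
  size_t min = atomic ? n : 1;
  for (size_t k = n; k >= min; k--)
    if (subject[k] == x) return true;
  return false;
}
```

On "123" with X = '3', the backtracking form succeeds by giving back the final digit, but the atomic form fails. In the documentation's (?>\d+)foo example, "foo" can never begin with a digit. Giving digits back therefore cannot help there, and the atomic form only saves useless backtracking.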
@@ -2342,11 +2347,17 @@ coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described
 <P>
 More complicated assertions are coded as subpatterns. There are two kinds:
 those that look ahead of the current position in the subject string, and those
-that look behind it, and in each case an assertion may be positive (must
-succeed for matching to continue) or negative (must not succeed for matching to
-continue). An assertion subpattern is matched in the normal way, except that,
-when matching continues after a successful assertion, the matching position in
-the subject string is as it was before the assertion was processed.
+that look behind it, and in each case an assertion may be positive (must match
+for the assertion to be true) or negative (must not match for the assertion to
+be true). An assertion subpattern is matched in the normal way, and if it is
+true, matching continues after it, but with the matching position in the
+subject string reset to what it was before the assertion was processed.
 </P>
 <P>
+A lookaround assertion may also appear as the condition in a
+<a href="#conditions">conditional subpattern</a>
+(see below). In this case, the result of matching the assertion determines
+which branch of the condition is followed.
+</P>
+<P>
 Assertion subpatterns are not capturing subpatterns. If an assertion contains
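The rule that an assertion leaves the matching position unchanged can be sketched in C for one simple case: a literal followed by a literal lookahead, the equivalent of foo(?=bar) or foo(?!bar). The function name and its return convention (end offset of the match, or -1) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Match `lit` at `pos`, then assert without consuming that `ahead` does
   (negative == false) or does not (negative == true) follow. Returns the
   end offset of the match, or -1 if there is no match. */
long match_with_lookahead(const char *subject, size_t pos, const char *lit,
                          const char *ahead, bool negative)
{
  size_t litlen = strlen(lit);
  if (strncmp(subject + pos, lit, litlen) != 0) return -1;
  size_t end = pos + litlen;

  /* The assertion inspects the text at `end` but does not advance `end`. */
  bool found = strncmp(subject + end, ahead, strlen(ahead)) == 0;
  if (found == negative) return -1;
  return (long)end;
}
```

For "foobar", the positive form returns 3, the offset just after "foo". The "bar" examined by the assertion is not part of the match.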
@@ -2359,7 +2370,7 @@ adjacent characters are the same.
 <P>
 When a branch within an assertion fails to match, any substrings that were
 captured are discarded (as happens with any pattern branch that fails to
-match). A negative assertion succeeds only when all its branches fail to match;
+match). A negative assertion is true only when all its branches fail to match;
 this means that no captured substrings are ever retained after a successful
 negative assertion. When an assertion contains a matching branch, what happens
 depends on the type of assertion.
@@ -2368,7 +2379,7 @@ depends on the type of assertion.
 For a positive assertion, internally captured substrings in the successful
 branch are retained, and matching continues with the next pattern item after
 the assertion. For a negative assertion, a matching branch means that the
-assertion has failed. If the assertion is being used as a condition in a
+assertion is not true. If such an assertion is being used as a condition in a
 <a href="#conditions">conditional subpattern</a>
 (see below), captured substrings are retained, because matching continues with
 the "no" branch of the condition. For other failing negative assertions,
@@ -2398,6 +2409,25 @@ without the assertion, the order depending on the greediness of the quantifier.
 The assertion is obeyed just once when encountered during matching.
 </P>
 <br><b>
+Alphabetic assertion names
+</b><br>
+<P>
+Traditionally, symbolic sequences such as (?= and (?<= have been used to specify
+lookaround assertions. Perl 5.28 introduced some experimental alphabetic
+alternatives which might be easier to remember. They all start with (* instead
+of (? and must be written using lower case letters. PCRE2 supports the
+following synonyms:
+<pre>
+(*positive_lookahead: or (*pla: is the same as (?=
+(*negative_lookahead: or (*nla: is the same as (?!
+(*positive_lookbehind: or (*plb: is the same as (?<=
+(*negative_lookbehind: or (*nlb: is the same as (?<!
+</pre>
+For example, (*pla:foo) is the same assertion as (?=foo). However, in the
+following sections, the various assertions are described using the original
+symbolic forms.
+</P>
+<br><b>
 Lookahead assertions
 </b><br>
 <P>
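Each alphabetic name is a pure synonym, so a pattern written with the new names can be rewritten mechanically into the symbolic form. The C sketch below does this for the lookaround names listed above and for (*atomic:. The function name, the output-buffer convention, and the choice to leave every other (*... construct unchanged are illustrative assumptions, not PCRE2 behaviour. The sketch does not parse escapes, \Q...\E or character classes, so an alphabetic opener that appears literally inside one of those would also be rewritten.

```c
#include <stddef.h>
#include <string.h>

/* Synonym table: alphabetic opener -> traditional symbolic opener. */
static const struct {
  const char *alpha;     /* e.g. "(*pla:" */
  const char *symbolic;  /* e.g. "(?=" */
} synonyms[] = {
  { "(*positive_lookahead:",  "(?="  }, { "(*pla:", "(?="  },
  { "(*negative_lookahead:",  "(?!"  }, { "(*nla:", "(?!"  },
  { "(*positive_lookbehind:", "(?<=" }, { "(*plb:", "(?<=" },
  { "(*negative_lookbehind:", "(?<!" }, { "(*nlb:", "(?<!" },
  { "(*atomic:",              "(?>"  },
};

/* Copy `pattern` into `out` (capacity `outsize`), replacing alphabetic
   openers with their symbolic equivalents. Returns 0 on success or -1 if
   `out` is too small. */
int rewrite_alpha_assertions(const char *pattern, char *out, size_t outsize)
{
  size_t o = 0;
  if (outsize == 0) return -1;
  while (*pattern != '\0')
  {
    const char *add = NULL;   /* symbolic replacement, if one applies */
    size_t skip = 1;          /* input characters consumed this step */
    for (size_t i = 0; i < sizeof(synonyms) / sizeof(synonyms[0]); i++)
    {
      size_t len = strlen(synonyms[i].alpha);
      if (strncmp(pattern, synonyms[i].alpha, len) == 0)
      {
        add = synonyms[i].symbolic;
        skip = len;
        break;
      }
    }
    size_t addlen = (add != NULL) ? strlen(add) : 1;
    if (o + addlen + 1 > outsize) return -1;   /* keep room for the NUL */
    if (add != NULL) memcpy(out + o, add, addlen);
    else out[o] = *pattern;
    o += addlen;
    pattern += skip;
  }
  out[o] = '\0';
  return 0;
}
```

For example, "(*pla:foo)bar" becomes "(?=foo)bar". An upper case verb such as (*MARK:A) is copied unchanged, because only lower case names are synonyms.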
@@ -3630,7 +3660,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 September 2018
+Last updated: 24 September 2018
 <br>
 Copyright © 1997-2018 University of Cambridge.
 <br>
@@ -436,6 +436,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
 <P>
 <pre>
 (?>...) atomic, non-capturing group
+(*atomic:...) atomic, non-capturing group
 </PRE>
 </P>
 <br><a name="SEC15" href="#TOC1">COMMENT</a><br>
@@ -514,12 +515,23 @@ setting with a similar syntax.
 <br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
 <pre>
-(?=...) positive look ahead
-(?!...) negative look ahead
-(?<=...) positive look behind
-(?<!...) negative look behind
+(?=...) )
+(*pla:...) ) positive lookahead
+(*positive_lookahead:...) )
+
+(?!...) )
+(*nla:...) ) negative lookahead
+(*negative_lookahead:...) )
+
+(?<=...) )
+(*plb:...) ) positive lookbehind
+(*positive_lookbehind:...) )
+
+(?<!...) )
+(*nlb:...) ) negative lookbehind
+(*negative_lookbehind:...) )
 </pre>
-Each top-level branch of a look behind must be of a fixed length.
+Each top-level branch of a lookbehind must be of a fixed length.
 </P>
 <br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
 <P>
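A deliberately simplified checker illustrates the fixed-length rule for lookbehinds: each top-level branch needs its own fixed length, but different branches may have different lengths. This C sketch is hypothetical and much stricter than PCRE2. It understands only literal characters, '.', and top-level '|'. Any backslash, group, class, quantifier or anchor is rejected outright, because the sketch cannot tell whether it is variable-length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified check of a lookbehind body: every top-level branch must have
   a fixed length. On success the branch lengths are stored in lengths[]
   and *nbranches is set; any unsupported metacharacter gives false. */
bool lookbehind_branch_lengths(const char *body, size_t lengths[],
                               size_t maxbranches, size_t *nbranches)
{
  size_t n = 0, len = 0;
  for (const char *p = body; ; p++)
  {
    if (*p == '|' || *p == '\0')         /* end of a top-level branch */
    {
      if (n == maxbranches) return false;
      lengths[n++] = len;
      len = 0;
      if (*p == '\0') break;
    }
    else if (strchr("\\()[]{}*+?^$", *p) != NULL)
      return false;                      /* not handled by this sketch */
    else
      len++;                             /* literal or '.' matches one char */
  }
  *nbranches = n;
  return true;
}
```

For example, "abc|de" is accepted with branch lengths 3 and 2, while "a+b" is rejected because a quantifier makes the length variable.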
@@ -634,7 +646,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 02 September 2018
+Last updated: 24 September 2018
 <br>
 Copyright © 1997-2018 University of Cambridge.
 <br>
1180 doc/pcre2.txt
File diff suppressed because it is too large
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "21 September 2018" "PCRE2 10.33"
+.TH PCRE2PATTERN 3 "24 September 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -2124,6 +2124,11 @@ special parenthesis, starting with (?> as in this example:
 .sp
 (?>\ed+)foo
 .sp
+Perl 5.28 introduced an experimental alphabetic form starting with (* which may
+be easier to remember:
+.sp
+(*atomic:\ed+)foo
+.sp
 This kind of parenthesis "locks up" the part of the pattern it contains once
 it has matched, and a failure further into the pattern is prevented from
 backtracking into it. Backtracking past it to previous items, however, works as
@@ -2351,11 +2356,19 @@ above.
 .P
 More complicated assertions are coded as subpatterns. There are two kinds:
 those that look ahead of the current position in the subject string, and those
-that look behind it, and in each case an assertion may be positive (must
-succeed for matching to continue) or negative (must not succeed for matching to
-continue). An assertion subpattern is matched in the normal way, except that,
-when matching continues after a successful assertion, the matching position in
-the subject string is as it was before the assertion was processed.
+that look behind it, and in each case an assertion may be positive (must match
+for the assertion to be true) or negative (must not match for the assertion to
+be true). An assertion subpattern is matched in the normal way, and if it is
+true, matching continues after it, but with the matching position in the
+subject string reset to what it was before the assertion was processed.
 .P
+A lookaround assertion may also appear as the condition in a
+.\" HTML <a href="#conditions">
+.\" </a>
+conditional subpattern
+.\"
+(see below). In this case, the result of matching the assertion determines
+which branch of the condition is followed.
+.P
 Assertion subpatterns are not capturing subpatterns. If an assertion contains
 capturing subpatterns within it, these are counted for the purposes of
@@ -2366,7 +2379,7 @@ adjacent characters are the same.
 .P
 When a branch within an assertion fails to match, any substrings that were
 captured are discarded (as happens with any pattern branch that fails to
-match). A negative assertion succeeds only when all its branches fail to match;
+match). A negative assertion is true only when all its branches fail to match;
 this means that no captured substrings are ever retained after a successful
 negative assertion. When an assertion contains a matching branch, what happens
 depends on the type of assertion.
@@ -2374,7 +2387,7 @@ depends on the type of assertion.
 For a positive assertion, internally captured substrings in the successful
 branch are retained, and matching continues with the next pattern item after
 the assertion. For a negative assertion, a matching branch means that the
-assertion has failed. If the assertion is being used as a condition in a
+assertion is not true. If such an assertion is being used as a condition in a
 .\" HTML <a href="#conditions">
 .\" </a>
 conditional subpattern
@@ -2406,6 +2419,25 @@ without the assertion, the order depending on the greediness of the quantifier.
 The assertion is obeyed just once when encountered during matching.
 .
 .
+.SS "Alphabetic assertion names"
+.rs
+.sp
+Traditionally, symbolic sequences such as (?= and (?<= have been used to specify
+lookaround assertions. Perl 5.28 introduced some experimental alphabetic
+alternatives which might be easier to remember. They all start with (* instead
+of (? and must be written using lower case letters. PCRE2 supports the
+following synonyms:
+.sp
+(*positive_lookahead: or (*pla: is the same as (?=
+(*negative_lookahead: or (*nla: is the same as (?!
+(*positive_lookbehind: or (*plb: is the same as (?<=
+(*negative_lookbehind: or (*nlb: is the same as (?<!
+.sp
+For example, (*pla:foo) is the same assertion as (?=foo). However, in the
+following sections, the various assertions are described using the original
+symbolic forms.
+.
+.
 .SS "Lookahead assertions"
 .rs
 .sp
@@ -3660,6 +3692,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 September 2018
+Last updated: 24 September 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "02 September 2018" "PCRE2 10.32"
+.TH PCRE2SYNTAX 3 "24 September 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -411,6 +411,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
 .rs
 .sp
 (?>...) atomic, non-capturing group
+(*atomic:...) atomic, non-capturing group
 .
 .
 .SH "COMMENT"
@@ -491,12 +492,23 @@ setting with a similar syntax.
 .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
 .rs
 .sp
-(?=...) positive look ahead
-(?!...) negative look ahead
-(?<=...) positive look behind
-(?<!...) negative look behind
+(?=...) )
+(*pla:...) ) positive lookahead
+(*positive_lookahead:...) )
 .sp
-Each top-level branch of a look behind must be of a fixed length.
+(?!...) )
+(*nla:...) ) negative lookahead
+(*negative_lookahead:...) )
+.sp
+(?<=...) )
+(*plb:...) ) positive lookbehind
+(*positive_lookbehind:...) )
+.sp
+(?<!...) )
+(*nlb:...) ) negative lookbehind
+(*negative_lookbehind:...) )
+.sp
+Each top-level branch of a lookbehind must be of a fixed length.
 .
 .
 .SH "BACKREFERENCES"
@@ -621,6 +633,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 02 September 2018
+Last updated: 24 September 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
@@ -103,19 +103,22 @@ const unsigned char _pcre_default_tables[] = {
 0,0,0,0,0,0,0,128,
 255,255,255,255,0,0,0,0,
 0,0,0,0,0,0,0,0,
+
+/* Fiddled by hand when the table bits changed. May be broken! */
+
 128,0,0,0,0,0,0,0,
 0,1,1,0,1,1,0,0,
 0,1,1,1,1,1,0,0,
 0,0,0,0,0,0,0,0,
 0,0,0,0,0,0,0,0,
 1,0,0,0,128,0,0,0,
 128,128,128,128,0,0,128,0,
-28,28,28,28,28,28,28,28,
-28,28,0,0,0,0,0,128,
-0,26,26,26,26,26,26,18,
+24,24,24,24,24,24,24,24,
+24,24,0,0,0,0,0,128,
+0,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,128,128,0,128,16,
-0,26,26,26,26,26,26,18,
+0,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,128,128,0,0,0,
@@ -125,8 +128,8 @@ const unsigned char _pcre_default_tables[] = {
 0,0,0,0,0,0,0,0,
 1,0,0,0,0,0,0,0,
 0,0,18,0,0,0,0,0,
-0,0,20,20,0,18,0,0,
-0,20,18,0,0,0,0,0,
+0,0,24,24,0,18,0,0,
+0,24,18,0,0,0,0,0,
 18,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,0,
12 perltest.sh
@@ -75,6 +75,10 @@ fi

 (echo "$prefix" ; cat <<'PERLEND'

+# The alpha assertions currently give warnings even when -w is not specified.
+
+no warnings "experimental::alpha_assertions";
+
 # Function for turning a string into a string of printing chars.

 sub pchars {
@@ -129,6 +133,9 @@ else { $outfile = "STDOUT"; }

 printf($outfile "Perl $] Regular Expressions\n\n");

+$extra_modifiers = "";
+$default_show_mark = 0;
+
 # Main loop

 NEXT_RE:
@@ -370,7 +377,10 @@ for (;;)
 }
 }

-# printf $outfile "\n";
+# By closing OUTFILE explicitly, we avoid a Perl warning in -w mode
+# "main::OUTFILE" used only once".
+
+close(OUTFILE) if $outfile eq "OUTFILE";

 PERLEND
 ) | $perl $perlarg - $@
@@ -183,10 +183,10 @@ fprintf(f,
 "/* This table identifies various classes of character by individual bits:\n"
 " 0x%02x white space character\n"
 " 0x%02x letter\n"
+" 0x%02x lower case letter\n"
 " 0x%02x decimal digit\n"
-" 0x%02x hexadecimal digit\n"
 " 0x%02x alphanumeric or '_'\n*/\n\n",
-ctype_space, ctype_letter, ctype_digit, ctype_xdigit, ctype_word);
+ctype_space, ctype_letter, ctype_lcletter, ctype_digit, ctype_word);

 fprintf(f, " ");
 for (i = 0; i < 256; i++)
@@ -320,6 +320,7 @@ pcre2_pattern_convert(). */
 #define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
 #define PCRE2_ERROR_SUPPORTED_ONLY_IN_UNICODE 193
 #define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194
+#define PCRE2_ERROR_ALPHA_ASSERTION_UNKNOWN 195


 /* "Expected" matching error codes: no match and partial match. */
@@ -157,8 +157,8 @@ graph print, punct, and cntrl. Other classes are built from combinations. */
 /* This table identifies various classes of character by individual bits:
 0x01 white space character
 0x02 letter
-0x04 decimal digit
-0x08 hexadecimal digit
+0x04 lower case letter
+0x08 decimal digit
 0x10 alphanumeric or '_'
 */

@@ -168,16 +168,16 @@ graph print, punct, and cntrl. Other classes are built from combinations. */
 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */
 0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - ' */
 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* ( - / */
-0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */
-0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x00, /* 8 - ? */
-0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* @ - G */
+0x18,0x18,0x18,0x18,0x18,0x18,0x18,0x18, /* 0 - 7 */
+0x18,0x18,0x00,0x00,0x00,0x00,0x00,0x00, /* 8 - ? */
+0x00,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* @ - G */
 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */
 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */
 0x12,0x12,0x12,0x00,0x00,0x00,0x00,0x10, /* X - _ */
-0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* ` - g */
-0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* h - o */
-0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* p - w */
-0x12,0x12,0x12,0x00,0x00,0x00,0x00,0x00, /* x -127 */
+0x00,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* ` - g */
+0x16,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* h - o */
+0x16,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* p - w */
+0x16,0x16,0x16,0x00,0x00,0x00,0x00,0x00, /* x -127 */
 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */
 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */
 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */
@@ -615,6 +615,46 @@ static const uint32_t verbops[] = {
 OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
 OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };

+/* Table of "alpha assertions" like (*pla:...), similar to the (*VERB) table. */
+
+typedef struct alasitem {
+unsigned int len; /* Length of name */
+uint32_t meta; /* Base META_ code */
+} alasitem;
+
+static const char alasnames[] =
+STRING_pla0
+STRING_plb0
+STRING_nla0
+STRING_nlb0
+STRING_positive_lookahead0
+STRING_positive_lookbehind0
+STRING_negative_lookahead0
+STRING_negative_lookbehind0
+STRING_atomic0
+STRING_sr0
+STRING_asr0
+STRING_script_run0
+STRING_atomic_script_run;
+
+static const alasitem alasmeta[] = {
+{ 3, META_LOOKAHEAD },
+{ 3, META_LOOKBEHIND },
+{ 3, META_LOOKAHEADNOT },
+{ 3, META_LOOKBEHINDNOT },
+{ 18, META_LOOKAHEAD },
+{ 19, META_LOOKBEHIND },
+{ 18, META_LOOKAHEADNOT },
+{ 19, META_LOOKBEHINDNOT },
+{ 6, META_ATOMIC },
+{ 2, 0 }, /* sr = script run */
+{ 3, 0 }, /* asr = atomic script run */
+{ 10, 0 }, /* script run */
+{ 17, 0 } /* atomic script run */
+};
+
+static const int alascount = sizeof(alasmeta)/sizeof(alasitem);
+
 /* Offsets from OP_STAR for case-independent and negative repeat opcodes. */

 static uint32_t chartypeoffset[] = {
@@ -732,7 +772,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
 ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
 ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
 ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
-ERR91, ERR92, ERR93, ERR94 };
+ERR91, ERR92, ERR93, ERR94, ERR95 };

 /* This is a table of start-of-pattern options such as (*UTF) and settings such
 as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@@ -1447,9 +1487,9 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
 c = (uint32_t)i;
 if (cb != NULL && c == CHAR_CR &&
 (cb->cx->extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
 c = CHAR_LF;
 }
 else /* Negative table entry */
 {
 escape = -i; /* Else return a special escape */
 if (cb != NULL && (escape == ESC_P || escape == ESC_p || escape == ESC_X))
@@ -1499,7 +1539,7 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
 }
 }

 /* Escapes that need further processing, including those that are unknown, have
 a zero entry in the lookup table. When called from pcre2_substitute(), only \c,
 \o, and \x are recognized (and \u when BSUX is set). */

@@ -2133,9 +2173,10 @@ return -1;
 *************************************************/

 /* This function is called from parse_regex() below whenever it needs to read
-the name of a subpattern or a (*VERB). The initial pointer must be to the
-character before the name. If that character is '*' we are reading a verb name.
-The pointer is updated to point after the name, for a VERB, or after tha name's
+the name of a subpattern or a (*VERB) or an (*alpha_assertion). The initial
+pointer must be to the character before the name. If that character is '*' we
+are reading a verb or alpha assertion name. The pointer is updated to point
+after the name, for a VERB or alpha assertion name, or after the name's
 terminator for a subpattern name. Returning both the offset and the name
 pointer is redundant information, but some callers use one and some the other,
 so it is simplest just to return both.
@@ -2160,27 +2201,29 @@ read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t terminator,
 int *errorcodeptr, compile_block *cb)
 {
 PCRE2_SPTR ptr = *ptrptr;
-BOOL is_verb = (*ptr == CHAR_ASTERISK);
+BOOL is_group = (*ptr != CHAR_ASTERISK);
 uint32_t namelen = 0;
-uint32_t ctype = is_verb? ctype_letter : ctype_word;

-if (++ptr >= ptrend)
+if (++ptr >= ptrend) /* No characters in name */
 {
-*errorcodeptr = is_verb? ERR60: /* Verb not recognized or malformed */
-ERR62; /* Subpattern name expected */
+*errorcodeptr = is_group? ERR62: /* Subpattern name expected */
+ERR60; /* Verb not recognized or malformed */
 goto FAILED;
 }

+/* A group name must not start with a digit. If either of the others start with
+a digit it just won't be recognized. */
+
+if (is_group && IS_DIGIT(*ptr))
+{
+*errorcodeptr = ERR44;
+goto FAILED;
+}
+
 *nameptr = ptr;
 *offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern);

-if (IS_DIGIT(*ptr))
-{
-*errorcodeptr = ERR44; /* Group name must not start with digit */
-goto FAILED;
-}
-
-while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype) != 0)
+while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0)
 {
 ptr++;
 namelen++;
@@ -2192,9 +2235,9 @@ while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype) != 0)
 }

 /* Subpattern names must not be empty, and their terminator is checked here.
-(What follows a verb name is checked separately.) */
+(What follows a verb or alpha assertion name is checked separately.) */

-if (!is_verb)
+if (is_group)
 {
 if (namelen == 0)
 {
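After this change, the name-scanning rules in read_name() can be summarised with a standalone model. Group, verb and alpha assertion names are all scanned as runs of "word" characters, and only a group name is rejected for starting with a digit. The function below is a simplified, hypothetical model rather than the real function: it handles ASCII only, uses isalnum() instead of the ctypes table, and skips terminator checks and error codes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the new read_name() scanning rules. `p` points at the
   character before the name ('*' for a verb or alpha assertion name, any
   other character for a group name). Returns the name length, or -1 where
   the real code would report an error. */
long scan_name(const char *p)
{
  bool is_group = (*p != '*');
  p++;
  if (*p == '\0') return -1;                   /* no characters in name */
  if (is_group && isdigit((unsigned char)*p))
    return -1;                                 /* group name starts with digit */

  long len = 0;
  while (isalnum((unsigned char)p[len]) || p[len] == '_') len++;
  if (is_group && len == 0) return -1;         /* group names must not be empty */
  return len;
}
```

For example, scan_name("*positive_lookahead:x") returns 18, which matches that name's length in the alasmeta table. A verb-style name that starts with a digit, such as "*1x", is scanned normally (length 2) and simply fails to match any table entry later. Before this commit, verb names were scanned with ctype_letter, so digits and underscores stopped the scan.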
@@ -2652,24 +2695,31 @@ while (ptr < ptrend)
 if (expect_cond_assert > 0)
 {
 BOOL ok = c == CHAR_LEFT_PARENTHESIS && ptrend - ptr >= 3 &&
-ptr[0] == CHAR_QUESTION_MARK;
-if (ok) switch(ptr[1])
+(ptr[0] == CHAR_QUESTION_MARK || ptr[0] == CHAR_ASTERISK);
+if (ok)
 {
-case CHAR_C:
-ok = expect_cond_assert == 2;
-break;
-
-case CHAR_EQUALS_SIGN:
-case CHAR_EXCLAMATION_MARK:
-break;
-
-case CHAR_LESS_THAN_SIGN:
-ok = ptr[2] == CHAR_EQUALS_SIGN || ptr[2] == CHAR_EXCLAMATION_MARK;
-break;
-
-default:
-ok = FALSE;
-}
+if (ptr[0] == CHAR_ASTERISK) /* New alpha assertion format, possibly */
+{
+ok = MAX_255(ptr[1]) && (cb->ctypes[ptr[1]] & ctype_lcletter) != 0;
+}
+else switch(ptr[1]) /* Traditional symbolic format */
+{
+case CHAR_C:
+ok = expect_cond_assert == 2;
+break;
+
+case CHAR_EQUALS_SIGN:
+case CHAR_EXCLAMATION_MARK:
+break;
+
+case CHAR_LESS_THAN_SIGN:
+ok = ptr[2] == CHAR_EQUALS_SIGN || ptr[2] == CHAR_EXCLAMATION_MARK;
+break;
+
+default:
+ok = FALSE;
+}
+}

 if (!ok)
 {
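The extended check on a conditional group's assertion can be modelled in isolation. This sketch is hypothetical. `p` points just after the opening parenthesis, as ptr does in the code above, and `avail` stands for ptrend - ptr. islower() replaces the ctype_lcletter table bit, and expect_cond_assert is passed as a plain int. The model accepts the same shapes as the patched code: (?=, (?!, (?<=, (?<!, (?C only when expect_cond_assert is 2, and (* followed by a lower case letter. The actual name is checked later, when the alpha assertion itself is parsed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the patched test: can the text at p (just after '(') start an
   assertion condition? `avail` is the number of characters available
   from p onwards; the real code requires at least three. */
bool may_start_condition(const char *p, size_t avail, int expect_cond_assert)
{
  if (avail < 3 || (p[0] != '?' && p[0] != '*')) return false;

  if (p[0] == '*')   /* alphabetic form such as (*pla: */
    return islower((unsigned char)p[1]) != 0;

  switch (p[1])      /* traditional symbolic form */
  {
    case 'C': return expect_cond_assert == 2;   /* callout before assertion */
    case '=':
    case '!': return true;
    case '<': return p[2] == '=' || p[2] == '!';
    default:  return false;
  }
}
```

So "(*pla:" passes this early test, but "(*MARK:" does not, because an upper case letter cannot start an alphabetic assertion name.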
@@ -3453,7 +3503,8 @@ while (ptr < ptrend)
 case CHAR_LEFT_PARENTHESIS:
 if (ptr >= ptrend) goto UNCLOSED_PARENTHESIS;

-/* If ( is not followed by ? it is either a capture or a special verb. */
+/* If ( is not followed by ? it is either a capture or a special verb or an
+alpha assertion. */

 if (*ptr != CHAR_QUESTION_MARK)
 {
@@ -3473,13 +3524,88 @@ while (ptr < ptrend)
 else *parsed_pattern++ = META_NOCAPTURE;
 }

+/* Do nothing for (* followed by end of pattern or ) so it gives a "bad
+quantifier" error rather than "(*MARK) must have an argument". */
+
+else if (ptrend - ptr <= 1 || (c = ptr[1]) == CHAR_RIGHT_PARENTHESIS)
+break;
+
+/* Handle "alpha assertions" such as (*pla:...). Most of these are
+synonyms for the historical symbolic assertions, but the script run ones
+are new. They are distinguished by starting with a lower case letter.
+Checking both ends of the alphabet makes this work in all character
+codes. */
+
+else if (CHMAX_255(c) && (cb->ctypes[c] & ctype_lcletter) != 0)
+{
+uint32_t meta;
+
+vn = alasnames;
+if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
+cb)) goto FAILED;
+if (ptr >= ptrend || *ptr != CHAR_COLON)
+{
+errorcode = ERR95; /* Malformed */
+goto FAILED;
+}
+
+/* Scan the table of alpha assertion names */
+
+for (i = 0; i < alascount; i++)
+{
+if (namelen == alasmeta[i].len &&
+PRIV(strncmp_c8)(name, vn, namelen) == 0)
+break;
+vn += alasmeta[i].len + 1;
+}
+
+if (i >= alascount)
+{
+errorcode = ERR95; /* Alpha assertion not recognized */
+goto FAILED;
+}
+
+/* Check for expecting an assertion condition. If so, only lookaround
+assertions are valid. */
+
+meta = alasmeta[i].meta;
+if (prev_expect_cond_assert > 0 &&
+(meta < META_LOOKAHEAD || meta > META_LOOKBEHINDNOT))
+{
+errorcode = ERR28; /* Assertion expected */
+goto FAILED;
+}
+
+switch(meta)
+{
+case META_ATOMIC:
+goto ATOMIC_GROUP;
+
+case META_LOOKAHEAD:
+goto POSITIVE_LOOK_AHEAD;
+
+case META_LOOKAHEADNOT:
+goto NEGATIVE_LOOK_AHEAD;
+
+case META_LOOKBEHIND:
+case META_LOOKBEHINDNOT:
+*parsed_pattern++ = meta;
+ptr--;
+goto LOOKBEHIND;
+
+/* FIXME: Script Run stuff ... */
+
+
+
+
+
+}
+}
+

 /* ---- Handle (*VERB) and (*VERB:NAME) ---- */

-/* Do nothing for (*) so it gives a "bad quantifier" error rather than
-"(*MARK) must have an argument". */
-
-else if (ptrend - ptr > 1 && ptr[1] != CHAR_RIGHT_PARENTHESIS)
+else
 {
 vn = verbnames;
 if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
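The name lookup added above walks two parallel tables. alasnames is a single string of NUL-terminated names, and alasmeta records each name's length, so the pointer can skip to the next name without calling strlen(). Below is a standalone sketch of the same technique with a cut-down table. The integer codes standing in for the META_ values and the use of memcmp() in place of PRIV(strncmp_c8) are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Packed name list: each name ends with "\0", like STRING_pla0 etc.; the
   last one relies on the literal's own terminating NUL. */
static const char alias_names[] = "pla\0" "plb\0" "nla\0" "nlb\0" "atomic";

/* Parallel table of name lengths and the (hypothetical) codes they map to. */
static const struct { unsigned int len; int code; } alias_meta[] = {
  { 3, 1 }, { 3, 2 }, { 3, 3 }, { 3, 4 }, { 6, 5 }
};
static const int alias_count = sizeof(alias_meta) / sizeof(alias_meta[0]);

/* Return the code for the `namelen` characters at `name`, or -1 if the
   name is not in the table (the real code reports ERR95 in that case). */
int lookup_alpha_name(const char *name, size_t namelen)
{
  const char *vn = alias_names;
  for (int i = 0; i < alias_count; i++)
  {
    if (namelen == alias_meta[i].len && memcmp(name, vn, namelen) == 0)
      return alias_meta[i].code;
    vn += alias_meta[i].len + 1;     /* skip the name and its terminator */
  }
  return -1;
}
```

Because the caller passes an explicit length (as read_name() does), "atomic:x" with namelen 6 matches "atomic" without needing a terminator in the input.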
@@ -3946,14 +4072,15 @@ while (ptr < ptrend)
 if (++ptr >= ptrend) goto UNCLOSED_PARENTHESIS;
 nest_depth++;

-/* If the next character is ? there must be an assertion next (optionally
-preceded by a callout). We do not check this here, but instead we set
-expect_cond_assert to 2. If this is still greater than zero (callouts
-decrement it) when the next assertion is read, it will be marked as a
-condition that must not be repeated. A value greater than zero also
-causes checking that an assertion (possibly with callout) follows. */
+/* If the next character is ? or * there must be an assertion next
+(optionally preceded by a callout). We do not check this here, but
+instead we set expect_cond_assert to 2. If this is still greater than
+zero (callouts decrement it) when the next assertion is read, it will be
+marked as a condition that must not be repeated. A value greater than
+zero also causes checking that an assertion (possibly with callout)
+follows. */

-if (*ptr == CHAR_QUESTION_MARK)
+if (*ptr == CHAR_QUESTION_MARK || *ptr == CHAR_ASTERISK)
 {
 *parsed_pattern++ = META_COND_ASSERT;
 ptr--; /* Pull pointer back to the opening parenthesis. */
@@ -4099,6 +4226,7 @@ while (ptr < ptrend)
 /* ---- Atomic group ---- */

 case CHAR_GREATER_THAN_SIGN:
+ATOMIC_GROUP: /* Come from (*atomic: */
 *parsed_pattern++ = META_ATOMIC;
 nest_depth++;
 ptr++;
@@ -4108,11 +4236,13 @@ while (ptr < ptrend)
 /* ---- Lookahead assertions ---- */

 case CHAR_EQUALS_SIGN:
+POSITIVE_LOOK_AHEAD: /* Come from (*pla: */
 *parsed_pattern++ = META_LOOKAHEAD;
 ptr++;
 goto POST_ASSERTION;

 case CHAR_EXCLAMATION_MARK:
+NEGATIVE_LOOK_AHEAD: /* Come from (*nla: */
 *parsed_pattern++ = META_LOOKAHEADNOT;
 ptr++;
 goto POST_ASSERTION;
@ -4132,6 +4262,8 @@ while (ptr < ptrend)
}
*parsed_pattern++ = (ptr[1] == CHAR_EQUALS_SIGN)?
META_LOOKBEHIND : META_LOOKBEHINDNOT;

LOOKBEHIND: /* Come from (*plb: and (*nlb: */
*has_lookbehind = TRUE;
offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 2);
PUTOFFSET(offset, parsed_pattern);
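The hunks above send (*atomic:, (*pla:, (*nla:, (*plb: and (*nlb: to the same code as (?>, (?=, (?!, (?<= and (?<!. The sketch below shows one way the name lookup could work, assuming a linear table of the short and long spellings listed in the STRING_* macros of this commit. The enum, `alpha_table`, `classify_alpha()` and `valid_condition()` are hypothetical illustrations, not PCRE2's internal API. `valid_condition()` follows the new testoutput2 cases, where (*atomic: and (*script_run: are rejected after (?( with error 128.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical group kinds; PCRE2 itself emits META_* codes instead. */
typedef enum {
  ALPHA_UNKNOWN = 0,
  ALPHA_ATOMIC,
  ALPHA_POS_LOOKAHEAD,
  ALPHA_NEG_LOOKAHEAD,
  ALPHA_POS_LOOKBEHIND,
  ALPHA_NEG_LOOKBEHIND,
  ALPHA_SCRIPT_RUN,
  ALPHA_ATOMIC_SCRIPT_RUN
} alpha_kind;

typedef struct {
  const char *name;
  alpha_kind kind;
} alpha_entry;

/* Short and long spellings, taken from the STRING_* names in this commit. */
static const alpha_entry alpha_table[] = {
  { "pla", ALPHA_POS_LOOKAHEAD },  { "positive_lookahead", ALPHA_POS_LOOKAHEAD },
  { "nla", ALPHA_NEG_LOOKAHEAD },  { "negative_lookahead", ALPHA_NEG_LOOKAHEAD },
  { "plb", ALPHA_POS_LOOKBEHIND }, { "positive_lookbehind", ALPHA_POS_LOOKBEHIND },
  { "nlb", ALPHA_NEG_LOOKBEHIND }, { "negative_lookbehind", ALPHA_NEG_LOOKBEHIND },
  { "atomic", ALPHA_ATOMIC },
  { "sr", ALPHA_SCRIPT_RUN },      { "script_run", ALPHA_SCRIPT_RUN },
  { "asr", ALPHA_ATOMIC_SCRIPT_RUN }, { "atomic_script_run", ALPHA_ATOMIC_SCRIPT_RUN },
};

/* Classify text starting with "(*". Returns ALPHA_UNKNOWN for verbs such
   as (*ACCEPT) and for unrecognized names; on success, *body (if not NULL)
   is set to the character after the colon. */
static alpha_kind classify_alpha(const char *p, const char **body)
{
  const char *name, *end;
  size_t len, i;

  assert(p != NULL);
  /* Alphabetic names start with a lower case letter; verbs do not. */
  if (p[0] != '(' || p[1] != '*' || !islower((unsigned char)p[2]))
    return ALPHA_UNKNOWN;

  /* Scan lower case letters and underscores, then require a colon. */
  name = end = p + 2;
  while (islower((unsigned char)*end) || *end == '_') end++;
  if (*end != ':') return ALPHA_UNKNOWN;
  len = (size_t)(end - name);

  for (i = 0; i < sizeof(alpha_table) / sizeof(alpha_table[0]); i++)
  {
    if (strlen(alpha_table[i].name) == len &&
        strncmp(alpha_table[i].name, name, len) == 0)
    {
      if (body != NULL) *body = end + 1;
      return alpha_table[i].kind;
    }
  }
  return ALPHA_UNKNOWN;
}

/* Only lookaround kinds are accepted as a condition after "(?(". */
static int valid_condition(alpha_kind k)
{
  return k == ALPHA_POS_LOOKAHEAD || k == ALPHA_NEG_LOOKAHEAD ||
         k == ALPHA_POS_LOOKBEHIND || k == ALPHA_NEG_LOOKBEHIND;
}
```

For example, `classify_alpha("(*pla:foo).{6}", &body)` returns ALPHA_POS_LOOKAHEAD and leaves `body` pointing at "foo).{6}", while "(*ACCEPT)" is left alone because an upper case letter follows (*.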
@ -181,6 +181,8 @@ static const unsigned char compile_error_texts[] =
"invalid option bits with PCRE2_LITERAL\0"
"\\N{U+dddd} is supported only in Unicode (UTF) mode\0"
"invalid hyphen in option setting\0"
/* 95 */
"(*alpha_assertion) not recognized\0"
;

/* Match-time and UTF error texts are in the same format. */
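compile_error_texts is one long array in which each message ends with "\0"; comment markers such as /* 95 */ only help a human count entries and have no effect at run time. A message is therefore found by skipping terminators rather than by indexing an array of pointers. A minimal sketch of that lookup, assuming a 0-based index; `demo_error_texts` and `nth_message()` are hypothetical names, and the demo table holds only the four texts visible in this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as compile_error_texts: adjacent literals are concatenated,
   and each message ends with an explicit "\0". */
static const unsigned char demo_error_texts[] =
  "invalid option bits with PCRE2_LITERAL\0"
  "\\N{U+dddd} is supported only in Unicode (UTF) mode\0"
  "invalid hyphen in option setting\0"
  "(*alpha_assertion) not recognized\0";

/* Return the n-th message (0-based), or NULL if n is past the end. */
static const char *nth_message(const unsigned char *texts, int n)
{
  const unsigned char *p = texts;

  assert(n >= 0);
  for (; n > 0; n--)
  {
    if (*p == 0) return NULL;  /* empty entry: already at the end */
    while (*p != 0) p++;       /* skip the current message */
    p++;                       /* step over its terminator */
  }
  return (*p == 0) ? NULL : (const char *)p;
}
```

Because the last literal also contributes the array's own terminator, the table ends in two NULs, which lets the lookup return NULL for an out-of-range index instead of running past the array.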
@ -569,11 +569,11 @@ these tables. */
without checking pcre2_jit_compile.c, which has an assertion to ensure that
ctype_word has the value 16. */

#define ctype_space 0x01
#define ctype_letter 0x02
#define ctype_digit 0x04
#define ctype_xdigit 0x08 /* not actually used any more */
#define ctype_word 0x10 /* alphanumeric or '_' */
#define ctype_space 0x01
#define ctype_letter 0x02
#define ctype_lcletter 0x04
#define ctype_digit 0x08
#define ctype_word 0x10 /* alphanumeric or '_' */

/* Offsets of the various tables from the base tables pointer, and
total length of the tables. */
@ -874,34 +874,48 @@ a positive value. */
#define STR_RIGHT_CURLY_BRACKET "}"
#define STR_TILDE "~"

#define STRING_ACCEPT0 "ACCEPT\0"
#define STRING_COMMIT0 "COMMIT\0"
#define STRING_F0 "F\0"
#define STRING_FAIL0 "FAIL\0"
#define STRING_MARK0 "MARK\0"
#define STRING_PRUNE0 "PRUNE\0"
#define STRING_SKIP0 "SKIP\0"
#define STRING_THEN "THEN"
#define STRING_ACCEPT0 "ACCEPT\0"
#define STRING_COMMIT0 "COMMIT\0"
#define STRING_F0 "F\0"
#define STRING_FAIL0 "FAIL\0"
#define STRING_MARK0 "MARK\0"
#define STRING_PRUNE0 "PRUNE\0"
#define STRING_SKIP0 "SKIP\0"
#define STRING_THEN "THEN"

#define STRING_alpha0 "alpha\0"
#define STRING_lower0 "lower\0"
#define STRING_upper0 "upper\0"
#define STRING_alnum0 "alnum\0"
#define STRING_ascii0 "ascii\0"
#define STRING_blank0 "blank\0"
#define STRING_cntrl0 "cntrl\0"
#define STRING_digit0 "digit\0"
#define STRING_graph0 "graph\0"
#define STRING_print0 "print\0"
#define STRING_punct0 "punct\0"
#define STRING_space0 "space\0"
#define STRING_word0 "word\0"
#define STRING_xdigit "xdigit"
#define STRING_atomic0 "atomic\0"
#define STRING_pla0 "pla\0"
#define STRING_plb0 "plb\0"
#define STRING_nla0 "nla\0"
#define STRING_nlb0 "nlb\0"
#define STRING_sr0 "sr\0"
#define STRING_asr0 "asr\0"
#define STRING_positive_lookahead0 "positive_lookahead\0"
#define STRING_positive_lookbehind0 "positive_lookbehind\0"
#define STRING_negative_lookahead0 "negative_lookahead\0"
#define STRING_negative_lookbehind0 "negative_lookbehind\0"
#define STRING_script_run0 "script_run\0"
#define STRING_atomic_script_run "atomic_script_run"

#define STRING_DEFINE "DEFINE"
#define STRING_VERSION "VERSION"
#define STRING_WEIRD_STARTWORD "[:<:]]"
#define STRING_WEIRD_ENDWORD "[:>:]]"
#define STRING_alpha0 "alpha\0"
#define STRING_lower0 "lower\0"
#define STRING_upper0 "upper\0"
#define STRING_alnum0 "alnum\0"
#define STRING_ascii0 "ascii\0"
#define STRING_blank0 "blank\0"
#define STRING_cntrl0 "cntrl\0"
#define STRING_digit0 "digit\0"
#define STRING_graph0 "graph\0"
#define STRING_print0 "print\0"
#define STRING_punct0 "punct\0"
#define STRING_space0 "space\0"
#define STRING_word0 "word\0"
#define STRING_xdigit "xdigit"

#define STRING_DEFINE "DEFINE"
#define STRING_VERSION "VERSION"
#define STRING_WEIRD_STARTWORD "[:<:]]"
#define STRING_WEIRD_ENDWORD "[:>:]]"

#define STRING_CR_RIGHTPAR "CR)"
#define STRING_LF_RIGHTPAR "LF)"
@ -1150,34 +1164,48 @@ only. */
#define STR_RIGHT_CURLY_BRACKET "\175"
#define STR_TILDE "\176"

#define STRING_ACCEPT0 STR_A STR_C STR_C STR_E STR_P STR_T "\0"
#define STRING_COMMIT0 STR_C STR_O STR_M STR_M STR_I STR_T "\0"
#define STRING_F0 STR_F "\0"
#define STRING_FAIL0 STR_F STR_A STR_I STR_L "\0"
#define STRING_MARK0 STR_M STR_A STR_R STR_K "\0"
#define STRING_PRUNE0 STR_P STR_R STR_U STR_N STR_E "\0"
#define STRING_SKIP0 STR_S STR_K STR_I STR_P "\0"
#define STRING_THEN STR_T STR_H STR_E STR_N
#define STRING_ACCEPT0 STR_A STR_C STR_C STR_E STR_P STR_T "\0"
#define STRING_COMMIT0 STR_C STR_O STR_M STR_M STR_I STR_T "\0"
#define STRING_F0 STR_F "\0"
#define STRING_FAIL0 STR_F STR_A STR_I STR_L "\0"
#define STRING_MARK0 STR_M STR_A STR_R STR_K "\0"
#define STRING_PRUNE0 STR_P STR_R STR_U STR_N STR_E "\0"
#define STRING_SKIP0 STR_S STR_K STR_I STR_P "\0"
#define STRING_THEN STR_T STR_H STR_E STR_N

#define STRING_alpha0 STR_a STR_l STR_p STR_h STR_a "\0"
#define STRING_lower0 STR_l STR_o STR_w STR_e STR_r "\0"
#define STRING_upper0 STR_u STR_p STR_p STR_e STR_r "\0"
#define STRING_alnum0 STR_a STR_l STR_n STR_u STR_m "\0"
#define STRING_ascii0 STR_a STR_s STR_c STR_i STR_i "\0"
#define STRING_blank0 STR_b STR_l STR_a STR_n STR_k "\0"
#define STRING_cntrl0 STR_c STR_n STR_t STR_r STR_l "\0"
#define STRING_digit0 STR_d STR_i STR_g STR_i STR_t "\0"
#define STRING_graph0 STR_g STR_r STR_a STR_p STR_h "\0"
#define STRING_print0 STR_p STR_r STR_i STR_n STR_t "\0"
#define STRING_punct0 STR_p STR_u STR_n STR_c STR_t "\0"
#define STRING_space0 STR_s STR_p STR_a STR_c STR_e "\0"
#define STRING_word0 STR_w STR_o STR_r STR_d "\0"
#define STRING_xdigit STR_x STR_d STR_i STR_g STR_i STR_t
#define STRING_atomic0 STR_a STR_t STR_o STR_m STR_i STR_c "\0"
#define STRING_pla0 STR_p STR_l STR_a "\0"
#define STRING_plb0 STR_p STR_l STR_b "\0"
#define STRING_nla0 STR_n STR_l STR_a "\0"
#define STRING_nlb0 STR_n STR_l STR_b "\0"
#define STRING_sr0 STR_s STR_r "\0"
#define STRING_asr0 STR_a STR_s STR_r "\0"
#define STRING_positive_lookahead0 STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
#define STRING_positive_lookbehind0 STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
#define STRING_negative_lookahead0 STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
#define STRING_negative_lookbehind0 STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
#define STRING_script_run0 STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n "\0"
#define STRING_atomic_script_run STR_a STR_t STR_o STR_m STR_i STR_c STR_UNDERSCORE STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n

#define STRING_DEFINE STR_D STR_E STR_F STR_I STR_N STR_E
#define STRING_VERSION STR_V STR_E STR_R STR_S STR_I STR_O STR_N
#define STRING_WEIRD_STARTWORD STR_LEFT_SQUARE_BRACKET STR_COLON STR_LESS_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET
#define STRING_WEIRD_ENDWORD STR_LEFT_SQUARE_BRACKET STR_COLON STR_GREATER_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET
#define STRING_alpha0 STR_a STR_l STR_p STR_h STR_a "\0"
#define STRING_lower0 STR_l STR_o STR_w STR_e STR_r "\0"
#define STRING_upper0 STR_u STR_p STR_p STR_e STR_r "\0"
#define STRING_alnum0 STR_a STR_l STR_n STR_u STR_m "\0"
#define STRING_ascii0 STR_a STR_s STR_c STR_i STR_i "\0"
#define STRING_blank0 STR_b STR_l STR_a STR_n STR_k "\0"
#define STRING_cntrl0 STR_c STR_n STR_t STR_r STR_l "\0"
#define STRING_digit0 STR_d STR_i STR_g STR_i STR_t "\0"
#define STRING_graph0 STR_g STR_r STR_a STR_p STR_h "\0"
#define STRING_print0 STR_p STR_r STR_i STR_n STR_t "\0"
#define STRING_punct0 STR_p STR_u STR_n STR_c STR_t "\0"
#define STRING_space0 STR_s STR_p STR_a STR_c STR_e "\0"
#define STRING_word0 STR_w STR_o STR_r STR_d "\0"
#define STRING_xdigit STR_x STR_d STR_i STR_g STR_i STR_t

#define STRING_DEFINE STR_D STR_E STR_F STR_I STR_N STR_E
#define STRING_VERSION STR_V STR_E STR_R STR_S STR_I STR_O STR_N
#define STRING_WEIRD_STARTWORD STR_LEFT_SQUARE_BRACKET STR_COLON STR_LESS_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET
#define STRING_WEIRD_ENDWORD STR_LEFT_SQUARE_BRACKET STR_COLON STR_GREATER_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET

#define STRING_CR_RIGHTPAR STR_C STR_R STR_RIGHT_PARENTHESIS
#define STRING_LF_RIGHTPAR STR_L STR_F STR_RIGHT_PARENTHESIS
@ -138,8 +138,8 @@ for (i = 0; i < 256; i++)
int x = 0;
if (isspace(i)) x += ctype_space;
if (isalpha(i)) x += ctype_letter;
if (islower(i)) x += ctype_lcletter;
if (isdigit(i)) x += ctype_digit;
if (isxdigit(i)) x += ctype_xdigit;
if (isalnum(i) || i == '_') x += ctype_word;
*p++ = x;
}
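The table-building loop above sets one bit per character class. With ctype_xdigit gone, bit 0x04 now means "lower case letter" and ctype_digit moves to 0x08, while ctype_word stays at 0x10 as the JIT assertion requires. A self-contained sketch that builds the same kind of 256-entry table with <ctype.h> and tests entries with a mask; `build_ctypes()` and `has_ctype()` are hypothetical helper names, not functions from pcre2_maketables.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Bit values as defined after this commit. */
#define ctype_space    0x01
#define ctype_letter   0x02
#define ctype_lcletter 0x04
#define ctype_digit    0x08
#define ctype_word     0x10   /* alphanumeric or '_' */

/* Fill a 256-entry table, one bit per class, using the C library's
   classification for the current locale. */
static void build_ctypes(unsigned char table[256])
{
  int i;
  for (i = 0; i < 256; i++)
  {
    int x = 0;
    if (isspace(i)) x += ctype_space;
    if (isalpha(i)) x += ctype_letter;
    if (islower(i)) x += ctype_lcletter;
    if (isdigit(i)) x += ctype_digit;
    if (isalnum(i) || i == '_') x += ctype_word;
    table[i] = (unsigned char)x;
  }
}

/* Nonzero if character c has any of the bits in mask. */
static int has_ctype(const unsigned char table[256], unsigned char c, int mask)
{
  assert(table != NULL);
  return (table[c] & mask) != 0;
}
```

Since the loop uses the C library classification functions, the table reflects whatever locale is active when it runs; without a setlocale() call that is the "C" locale, where 'f' is a hex digit but, with ctype_xdigit removed, carries no digit bit.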
@ -6263,4 +6263,69 @@ ef) x/x,mark
aBCDEF
AbCDe f

/(*pla:foo).{6}/
abcfoobarxyz
\= Expect no match
abcfooba

/(*positive_lookahead:foo).{6}/
abcfoobarxyz

/(?(*pla:foo).{6}|a..)/
foobarbaz
abcfoobar

/(?(*positive_lookahead:foo).{6}|a..)/
foobarbaz
abcfoobar

/(*plb:foo)bar/
abcfoobar
\= Expect no match
abcbarfoo

/(*positive_lookbehind:foo)bar/
abcfoobar
\= Expect no match
abcbarfoo

/(?(*plb:foo)bar|baz)/
abcfoobar
bazfoobar
abcbazfoobar
foobazfoobar

/(?(*positive_lookbehind:foo)bar|baz)/
abcfoobar
bazfoobar
abcbazfoobar
foobazfoobar

/(*nlb:foo)bar/
abcbarfoo
\= Expect no match
abcfoobar

/(*negative_lookbehind:foo)bar/
abcbarfoo
\= Expect no match
abcfoobar

/(?(*nlb:foo)bar|baz)/
abcfoobaz
abcbarbaz
\= Expect no match
abcfoobar

/(?(*negative_lookbehind:foo)bar|baz)/
abcfoobaz
abcbarbaz
\= Expect no match
abcfoobar

/(*atomic:a+)\w/
aaab
\= Expect no match
aaaa

# End of testinput1
@ -5525,4 +5525,10 @@ a)"xI
\= Expect no match
abc\ndef\nxyz

/(?(*ACCEPT)xxx)/

/(?(*atomic:xx)xxx)/

/(?(*script_run:xxx)zzz)/

# End of testinput2
@ -9929,4 +9929,100 @@ No match
AbCDe f
No match

/(*pla:foo).{6}/
abcfoobarxyz
0: foobar
\= Expect no match
abcfooba
No match

/(*positive_lookahead:foo).{6}/
abcfoobarxyz
0: foobar

/(?(*pla:foo).{6}|a..)/
foobarbaz
0: foobar
abcfoobar
0: abc

/(?(*positive_lookahead:foo).{6}|a..)/
foobarbaz
0: foobar
abcfoobar
0: abc

/(*plb:foo)bar/
abcfoobar
0: bar
\= Expect no match
abcbarfoo
No match

/(*positive_lookbehind:foo)bar/
abcfoobar
0: bar
\= Expect no match
abcbarfoo
No match

/(?(*plb:foo)bar|baz)/
abcfoobar
0: bar
bazfoobar
0: baz
abcbazfoobar
0: baz
foobazfoobar
0: bar

/(?(*positive_lookbehind:foo)bar|baz)/
abcfoobar
0: bar
bazfoobar
0: baz
abcbazfoobar
0: baz
foobazfoobar
0: bar

/(*nlb:foo)bar/
abcbarfoo
0: bar
\= Expect no match
abcfoobar
No match

/(*negative_lookbehind:foo)bar/
abcbarfoo
0: bar
\= Expect no match
abcfoobar
No match

/(?(*nlb:foo)bar|baz)/
abcfoobaz
0: baz
abcbarbaz
0: bar
\= Expect no match
abcfoobar
No match

/(?(*negative_lookbehind:foo)bar|baz)/
abcfoobaz
0: baz
abcbarbaz
0: bar
\= Expect no match
abcfoobar
No match

/(*atomic:a+)\w/
aaab
0: aaab
\= Expect no match
aaaa
No match

# End of testinput1
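The (*atomic:a+)\w case matches "aaab" but not "aaaa": once the atomic group has taken every 'a', it never gives one back, so \w has nothing left to match at any starting position. A hand-written sketch of that difference for this single pattern only (not a regex engine); `match_a_plus_w()` and `is_word()` are hypothetical, and \w is simplified to ASCII letters, digits and underscore.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified ASCII \w. */
static int is_word(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}

/* Unanchored match of a+\w in s. If atomic is nonzero, a+ behaves like
   (*atomic:a+) and keeps only its first (longest) choice. Returns the
   match length and sets *start, or returns 0 if there is no match. */
static size_t match_a_plus_w(const char *s, int atomic, size_t *start)
{
  size_t len = strlen(s), i;

  for (i = 0; i < len; i++)                /* try each start position */
  {
    size_t run = 0, k;
    while (i + run < len && s[i + run] == 'a') run++;
    if (run == 0) continue;                /* a+ needs at least one 'a' */

    /* Greedy: start with the longest run. A backtracking a+ may give
       characters back one at a time; an atomic group may not. */
    for (k = run; k >= 1; k--)
    {
      if (i + k < len && is_word(s[i + k]))
      {
        *start = i;
        return k + 1;
      }
      if (atomic) break;
    }
  }
  return 0;
}
```

With `atomic` set to 0, "aaaa" matches all four characters because a+ gives back the last 'a' for \w to consume; with `atomic` set to 1 the same subject gives no match, which is what the "Expect no match" line in the test above records.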
@ -575,7 +575,7 @@ Last code unit = 'b'
Subject length lower bound = 3

/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
Failed: error 160 at offset 12: (*VERB) not recognized or malformed
Failed: error 160 at offset 14: (*VERB) not recognized or malformed

/\h/I,utf
Capturing subpattern count = 0
@ -538,7 +538,7 @@ No match
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2

/(*UTF16)\x{11234}/
Failed: error 160 at offset 5: (*VERB) not recognized or malformed
Failed: error 160 at offset 7: (*VERB) not recognized or malformed
abcd\x{11234}pqr

/(*UTF)\x{11234}/I
@ -559,7 +559,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
abcd\x{11234}pqr

/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I
Failed: error 160 at offset 12: (*VERB) not recognized or malformed
Failed: error 160 at offset 14: (*VERB) not recognized or malformed

/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
Capturing subpattern count = 0
@ -16812,6 +16812,15 @@ No match
abc\ndef\nxyz
No match

/(?(*ACCEPT)xxx)/
Failed: error 128 at offset 2: assertion expected after (?( or (?(?C)

/(?(*atomic:xx)xxx)/
Failed: error 128 at offset 10: assertion expected after (?( or (?(?C)

/(?(*script_run:xxx)zzz)/
Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)

# End of testinput2
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data