Add support for (?^) as now supported by Perl.
parent
27337495dc
commit
6e245572b8
|
@ -131,6 +131,8 @@ present.
|
||||||
terminated by (*ACCEPT).
|
terminated by (*ACCEPT).
|
||||||
|
|
||||||
29. Add support for \N{U+dddd}, but not in EBCDIC environments.
|
29. Add support for \N{U+dddd}, but not in EBCDIC environments.
|
||||||
|
|
||||||
|
30. Add support for (?^) for unsetting all imnsx options.
|
||||||
|
|
||||||
|
|
||||||
Version 10.31 12-February-2018
|
Version 10.31 12-February-2018
|
||||||
|
|
|
@ -1466,7 +1466,8 @@ character, even if newlines are coded as CRLF. Without this option, a dot does
|
||||||
not match when the current position in the subject is at a newline. This option
|
not match when the current position in the subject is at a newline. This option
|
||||||
is equivalent to Perl's /s option, and it can be changed within a pattern by a
|
is equivalent to Perl's /s option, and it can be changed within a pattern by a
|
||||||
(?s) option setting. A negative class such as [^a] always matches newline
|
(?s) option setting. A negative class such as [^a] always matches newline
|
||||||
characters, independent of the setting of this option.
|
characters, and the \N escape sequence always matches a non-newline character,
|
||||||
|
independent of the setting of PCRE2_DOTALL.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_DUPNAMES
|
PCRE2_DUPNAMES
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3634,7 +3635,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 02 July 2018
|
Last updated: 27 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -42,13 +42,14 @@ assertion is a condition that has a matching branch (that is, the condition is
|
||||||
false).
|
false).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
4. The following Perl escape sequences are not supported: \l, \u, \L,
|
4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
|
||||||
\U, and \N when followed by a character name or Unicode value. (\N on its
|
\U, and \N when followed by a character name. \N on its own, matching a
|
||||||
own, matching a non-newline character, is supported.) In fact these are
|
non-newline character, and \N{U+dd..}, matching a Unicode code point, are
|
||||||
|
supported. The escapes that modify the case of following letters are
|
||||||
implemented by Perl's general string-handling and are not part of its pattern
|
implemented by Perl's general string-handling and are not part of its pattern
|
||||||
matching engine. If any of these are encountered by PCRE2, an error is
|
matching engine. If any of these are encountered by PCRE2, an error is
|
||||||
generated by default. However, if the PCRE2_ALT_BSUX option is set,
|
generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
|
||||||
\U and \u are interpreted as ECMAScript interprets them.
|
are interpreted as ECMAScript interprets them.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
|
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
|
||||||
|
@ -61,17 +62,22 @@ internal representation of Unicode characters, there is no need to implement
|
||||||
the somewhat messy concept of surrogates."
|
the somewhat messy concept of surrogates."
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
6. PCRE2 does support the \Q...\E escape for quoting substrings. Characters
|
6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
|
||||||
in between are treated as literals. This is slightly different from Perl in
|
in between are treated as literals. However, this is slightly different from
|
||||||
that $ and @ are also handled as literals inside the quotes. In Perl, they
|
Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
|
||||||
cause variable interpolation (but of course PCRE2 does not have variables).
|
they cause variable interpolation (but of course PCRE2 does not have
|
||||||
Note the following examples:
|
variables). Also, Perl does "double-quotish backslash interpolation" on any
|
||||||
|
backslashes between \Q and \E which, its documentation says, "may lead to
|
||||||
|
confusing results". PCRE2 treats a backslash between \Q and \E just like any
|
||||||
|
other character. Note the following examples:
|
||||||
<pre>
|
<pre>
|
||||||
Pattern PCRE2 matches Perl matches
|
Pattern PCRE2 matches Perl matches
|
||||||
|
|
||||||
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
|
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
|
||||||
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
||||||
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
||||||
|
\QA\B\E A\B A\B
|
||||||
|
\Q\\E \ \\E
|
||||||
</pre>
|
</pre>
|
||||||
The \Q...\E sequence is recognized both inside and outside character classes.
|
The \Q...\E sequence is recognized both inside and outside character classes.
|
||||||
</P>
|
</P>
|
||||||
|
@ -229,9 +235,9 @@ Cambridge, England.
|
||||||
REVISION
|
REVISION
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 18 April 2017
|
Last updated: 28 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -357,13 +357,18 @@ of the pattern.
|
||||||
If you want to remove the special meaning from a sequence of characters, you
|
If you want to remove the special meaning from a sequence of characters, you
|
||||||
can do so by putting them between \Q and \E. This is different from Perl in
|
can do so by putting them between \Q and \E. This is different from Perl in
|
||||||
that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas
|
that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas
|
||||||
in Perl, $ and @ cause variable interpolation. Note the following examples:
|
in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish
|
||||||
|
backslash interpolation" on any backslashes between \Q and \E which, its
|
||||||
|
documentation says, "may lead to confusing results". PCRE2 treats a backslash
|
||||||
|
between \Q and \E just like any other character. Note the following examples:
|
||||||
<pre>
|
<pre>
|
||||||
Pattern PCRE2 matches Perl matches
|
Pattern PCRE2 matches Perl matches
|
||||||
|
|
||||||
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
|
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
|
||||||
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
||||||
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
||||||
|
\QA\B\E A\B A\B
|
||||||
|
\Q\\E \ \\E
|
||||||
</pre>
|
</pre>
|
||||||
The \Q...\E sequence is recognized both inside and outside character classes.
|
The \Q...\E sequence is recognized both inside and outside character classes.
|
||||||
An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
|
An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
|
||||||
|
@ -545,7 +550,7 @@ character class, these sequences have different meanings.
|
||||||
Unsupported escape sequences
|
Unsupported escape sequences
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
In Perl, the sequences \l, \L, \u, and \U are recognized by its string
|
In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
|
||||||
handler and used to modify the case of following characters. By default, PCRE2
|
handler and used to modify the case of following characters. By default, PCRE2
|
||||||
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
|
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
|
||||||
is set, \U matches a "U" character, and \u can be used to define a character
|
is set, \U matches a "U" character, and \u can be used to define a character
|
||||||
|
@ -1635,21 +1640,27 @@ Perl option letters enclosed between "(?" and ")". The option letters are
|
||||||
xx for PCRE2_EXTENDED_MORE
|
xx for PCRE2_EXTENDED_MORE
|
||||||
</pre>
|
</pre>
|
||||||
For example, (?im) sets caseless, multiline matching. It is also possible to
|
For example, (?im) sets caseless, multiline matching. It is also possible to
|
||||||
unset these options by preceding the letter with a hyphen. The two "extended"
|
unset these options by preceding the relevant letters with a hyphen, for
|
||||||
options are not independent; unsetting either one cancels the effects of both
|
example (?-im). The two "extended" options are not independent; unsetting either
|
||||||
of them.
|
one cancels the effects of both of them.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
|
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
|
||||||
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
||||||
permitted. If a letter appears both before and after the hyphen, the option is
|
permitted. Only one hyphen may appear in the options string. If a letter
|
||||||
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
|
appears both before and after the hyphen, the option is unset. An empty options
|
||||||
effect.
|
setting "(?)" is allowed. Needless to say, it has no effect.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If the first character following (? is a circumflex, it causes all of the above
|
||||||
|
options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
|
||||||
|
the circumflex to cause some options to be re-instated, but a hyphen may not
|
||||||
|
appear.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
||||||
the same way as the Perl-compatible options by using the characters J and U
|
the same way as the Perl-compatible options by using the characters J and U
|
||||||
respectively.
|
respectively. However, these are not unset by (?^).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When one of these option changes occurs at top level (that is, not inside
|
When one of these option changes occurs at top level (that is, not inside
|
||||||
|
@ -3579,7 +3590,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 27 July 2018
|
Last updated: 28 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -456,7 +456,15 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
(?x) extended: ignore white space except in classes
|
(?x) extended: ignore white space except in classes
|
||||||
(?xx) as (?x) but also ignore space and tab in classes
|
(?xx) as (?x) but also ignore space and tab in classes
|
||||||
(?-...) unset option(s)
|
(?-...) unset option(s)
|
||||||
|
(?^) unset imnsx options
|
||||||
</pre>
|
</pre>
|
||||||
|
Unsetting x or xx unsets both. Several options may be set at once, and a
|
||||||
|
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
|
||||||
|
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
|
||||||
|
(?^in). An option setting may appear at the start of a non-capturing group, for
|
||||||
|
example (?i:...).
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
The following are recognized only at the very start of a pattern or after one
|
The following are recognized only at the very start of a pattern or after one
|
||||||
of the newline or \R options with similar syntax. More than one of them may
|
of the newline or \R options with similar syntax. More than one of them may
|
||||||
appear. For the first three, d is a decimal number.
|
appear. For the first three, d is a decimal number.
|
||||||
|
@ -624,7 +632,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 27 July 2018
|
Last updated: 28 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
4176
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1639,19 +1639,24 @@ Perl option letters enclosed between "(?" and ")". The option letters are
|
||||||
xx for PCRE2_EXTENDED_MORE
|
xx for PCRE2_EXTENDED_MORE
|
||||||
.sp
|
.sp
|
||||||
For example, (?im) sets caseless, multiline matching. It is also possible to
|
For example, (?im) sets caseless, multiline matching. It is also possible to
|
||||||
unset these options by preceding the letter with a hyphen. The two "extended"
|
unset these options by preceding the relevant letters with a hyphen, for
|
||||||
options are not independent; unsetting either one cancels the effects of both
|
example (?-im). The two "extended" options are not independent; unsetting either
|
||||||
of them.
|
one cancels the effects of both of them.
|
||||||
.P
|
.P
|
||||||
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
|
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
|
||||||
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
||||||
permitted. If a letter appears both before and after the hyphen, the option is
|
permitted. Only one hyphen may appear in the options string. If a letter
|
||||||
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
|
appears both before and after the hyphen, the option is unset. An empty options
|
||||||
effect.
|
setting "(?)" is allowed. Needless to say, it has no effect.
|
||||||
|
.P
|
||||||
|
If the first character following (? is a circumflex, it causes all of the above
|
||||||
|
options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
|
||||||
|
the circumflex to cause some options to be re-instated, but a hyphen may not
|
||||||
|
appear.
|
||||||
.P
|
.P
|
||||||
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
||||||
the same way as the Perl-compatible options by using the characters J and U
|
the same way as the Perl-compatible options by using the characters J and U
|
||||||
respectively.
|
respectively. However, these are not unset by (?^).
|
||||||
.P
|
.P
|
||||||
When one of these option changes occurs at top level (that is, not inside
|
When one of these option changes occurs at top level (that is, not inside
|
||||||
subpattern parentheses), the change applies to the remainder of the pattern
|
subpattern parentheses), the change applies to the remainder of the pattern
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "27 July 2018" "PCRE2 10.32"
|
.TH PCRE2SYNTAX 3 "28 July 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -431,7 +431,14 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
(?x) extended: ignore white space except in classes
|
(?x) extended: ignore white space except in classes
|
||||||
(?xx) as (?x) but also ignore space and tab in classes
|
(?xx) as (?x) but also ignore space and tab in classes
|
||||||
(?-...) unset option(s)
|
(?-...) unset option(s)
|
||||||
|
(?^) unset imnsx options
|
||||||
.sp
|
.sp
|
||||||
|
Unsetting x or xx unsets both. Several options may be set at once, and a
|
||||||
|
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
|
||||||
|
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
|
||||||
|
(?^in). An option setting may appear at the start of a non-capturing group, for
|
||||||
|
example (?i:...).
|
||||||
|
.P
|
||||||
The following are recognized only at the very start of a pattern or after one
|
The following are recognized only at the very start of a pattern or after one
|
||||||
of the newline or \eR options with similar syntax. More than one of them may
|
of the newline or \eR options with similar syntax. More than one of them may
|
||||||
appear. For the first three, d is a decimal number.
|
appear. For the first three, d is a decimal number.
|
||||||
|
@ -612,6 +619,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 27 July 2018
|
Last updated: 28 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -317,6 +317,7 @@ pcre2_pattern_convert(). */
|
||||||
#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191
|
#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191
|
||||||
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
|
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
|
||||||
#define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193
|
#define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193
|
||||||
|
#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194
|
||||||
|
|
||||||
|
|
||||||
/* "Expected" matching error codes: no match and partial match. */
|
/* "Expected" matching error codes: no match and partial match. */
|
||||||
|
|
|
@ -263,7 +263,7 @@ versions. */
|
||||||
#define META_SKIP 0x802d0000u /* kept */
|
#define META_SKIP 0x802d0000u /* kept */
|
||||||
#define META_SKIP_ARG 0x802e0000u /* in */
|
#define META_SKIP_ARG 0x802e0000u /* in */
|
||||||
#define META_THEN 0x802f0000u /* this */
|
#define META_THEN 0x802f0000u /* this */
|
||||||
#define META_THEN_ARG 0x80300000u /* order */
|
#define META_THEN_ARG 0x80300000u /* order */
|
||||||
|
|
||||||
/* These must be kept in groups of adjacent 3 values, and all together. */
|
/* These must be kept in groups of adjacent 3 values, and all together. */
|
||||||
|
|
||||||
|
@ -330,7 +330,7 @@ static unsigned char meta_extra_lengths[] = {
|
||||||
0, /* META_ACCEPT */
|
0, /* META_ACCEPT */
|
||||||
0, /* META_FAIL */
|
0, /* META_FAIL */
|
||||||
0, /* META_COMMIT */
|
0, /* META_COMMIT */
|
||||||
1, /* META_COMMIT_ARG - plus the string length */
|
1, /* META_COMMIT_ARG - plus the string length */
|
||||||
0, /* META_PRUNE */
|
0, /* META_PRUNE */
|
||||||
1, /* META_PRUNE_ARG - plus the string length */
|
1, /* META_PRUNE_ARG - plus the string length */
|
||||||
0, /* META_SKIP */
|
0, /* META_SKIP */
|
||||||
|
@ -612,7 +612,7 @@ static const int verbcount = sizeof(verbs)/sizeof(verbitem);
|
||||||
/* Verb opcodes, indexed by their META code offset from META_MARK. */
|
/* Verb opcodes, indexed by their META code offset from META_MARK. */
|
||||||
|
|
||||||
static const uint32_t verbops[] = {
|
static const uint32_t verbops[] = {
|
||||||
OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
|
OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
|
||||||
OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
|
OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
|
||||||
|
|
||||||
/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
|
/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
|
||||||
|
@ -731,7 +731,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
|
||||||
ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
|
ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
|
||||||
ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
|
ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
|
||||||
ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
|
ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
|
||||||
ERR91, ERR92, ERR93 };
|
ERR91, ERR92, ERR93, ERR94 };
|
||||||
|
|
||||||
/* This is a table of start-of-pattern options such as (*UTF) and settings such
|
/* This is a table of start-of-pattern options such as (*UTF) and settings such
|
||||||
as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
|
as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
|
||||||
|
@ -1441,41 +1441,41 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
|
||||||
escape = -i; /* Else return a special escape */
|
escape = -i; /* Else return a special escape */
|
||||||
if (cb != NULL && (escape == ESC_P || escape == ESC_p || escape == ESC_X))
|
if (cb != NULL && (escape == ESC_P || escape == ESC_p || escape == ESC_X))
|
||||||
cb->external_flags |= PCRE2_HASBKPORX; /* Note \P, \p, or \X */
|
cb->external_flags |= PCRE2_HASBKPORX; /* Note \P, \p, or \X */
|
||||||
|
|
||||||
/* Perl supports \N{name} for character names and \N{U+dddd} for numerical
|
/* Perl supports \N{name} for character names and \N{U+dddd} for numerical
|
||||||
Unicode code points, as well as plain \N for "not newline". PCRE does not
|
Unicode code points, as well as plain \N for "not newline". PCRE does not
|
||||||
support \N{name}. However, it does support quantification such as \N{2,3},
|
support \N{name}. However, it does support quantification such as \N{2,3},
|
||||||
so if \N{ is not followed by U+dddd we check for a quantifier. */
|
so if \N{ is not followed by U+dddd we check for a quantifier. */
|
||||||
|
|
||||||
if (escape == ESC_N && ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET)
|
if (escape == ESC_N && ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET)
|
||||||
{
|
{
|
||||||
PCRE2_SPTR p = ptr + 1;
|
PCRE2_SPTR p = ptr + 1;
|
||||||
|
|
||||||
/* \N{U+ can be handled by the \x{ code. However, this construction is
|
/* \N{U+ can be handled by the \x{ code. However, this construction is
|
||||||
not valid in EBCDIC environments because it specifies a Unicode
|
not valid in EBCDIC environments because it specifies a Unicode
|
||||||
character, not a codepoint in the local code. For example \N{U+0041}
|
character, not a codepoint in the local code. For example \N{U+0041}
|
||||||
must be "A" in all environments. */
|
must be "A" in all environments. */
|
||||||
|
|
||||||
if (ptrend - p > 1 && *p == CHAR_U && p[1] == CHAR_PLUS)
|
if (ptrend - p > 1 && *p == CHAR_U && p[1] == CHAR_PLUS)
|
||||||
{
|
{
|
||||||
#ifdef EBCDIC
|
#ifdef EBCDIC
|
||||||
*errorcodeptr = ERR93;
|
*errorcodeptr = ERR93;
|
||||||
#else
|
#else
|
||||||
ptr = p + 1;
|
ptr = p + 1;
|
||||||
escape = 0; /* Not a fancy escape after all */
|
escape = 0; /* Not a fancy escape after all */
|
||||||
goto COME_FROM_NU;
|
goto COME_FROM_NU;
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Give an error if what follows is not a quantifier, but don't override
|
/* Give an error if what follows is not a quantifier, but don't override
|
||||||
an error set by the quantifier reader (e.g. number overflow). */
|
an error set by the quantifier reader (e.g. number overflow). */
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
if (!read_repeat_counts(&p, ptrend, NULL, NULL, errorcodeptr) &&
|
if (!read_repeat_counts(&p, ptrend, NULL, NULL, errorcodeptr) &&
|
||||||
*errorcodeptr == 0)
|
*errorcodeptr == 0)
|
||||||
*errorcodeptr = ERR37;
|
*errorcodeptr = ERR37;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -1762,9 +1762,9 @@ else
|
||||||
{
|
{
|
||||||
if (ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET)
|
if (ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET)
|
||||||
{
|
{
|
||||||
#ifndef EBCDIC
|
#ifndef EBCDIC
|
||||||
COME_FROM_NU:
|
COME_FROM_NU:
|
||||||
#endif
|
#endif
|
||||||
if (++ptr >= ptrend || *ptr == CHAR_RIGHT_CURLY_BRACKET)
|
if (++ptr >= ptrend || *ptr == CHAR_RIGHT_CURLY_BRACKET)
|
||||||
{
|
{
|
||||||
*errorcodeptr = ERR78;
|
*errorcodeptr = ERR78;
|
||||||
|
@ -2495,15 +2495,15 @@ while (ptr < ptrend)
|
||||||
goto FAILED;
|
goto FAILED;
|
||||||
}
|
}
|
||||||
*verblengthptr = (uint32_t)verbnamelength;
|
*verblengthptr = (uint32_t)verbnamelength;
|
||||||
|
|
||||||
/* If this name was on a verb such as (*ACCEPT) which does not continue,
|
/* If this name was on a verb such as (*ACCEPT) which does not continue,
|
||||||
a (*MARK) was generated for the name. We now add the original verb as the
|
a (*MARK) was generated for the name. We now add the original verb as the
|
||||||
next item. */
|
next item. */
|
||||||
|
|
||||||
if (add_after_mark != 0)
|
if (add_after_mark != 0)
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = add_after_mark;
|
*parsed_pattern++ = add_after_mark;
|
||||||
add_after_mark = 0;
|
add_after_mark = 0;
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
|
|
||||||
|
@ -3498,22 +3498,22 @@ while (ptr < ptrend)
|
||||||
if (*ptr++ == CHAR_COLON) /* Skip past : or ) */
|
if (*ptr++ == CHAR_COLON) /* Skip past : or ) */
|
||||||
{
|
{
|
||||||
/* Some optional arguments can be treated as a preceding (*MARK) */
|
/* Some optional arguments can be treated as a preceding (*MARK) */
|
||||||
|
|
||||||
if (verbs[i].has_arg < 0)
|
if (verbs[i].has_arg < 0)
|
||||||
{
|
{
|
||||||
add_after_mark = verbs[i].meta;
|
add_after_mark = verbs[i].meta;
|
||||||
*parsed_pattern++ = META_MARK;
|
*parsed_pattern++ = META_MARK;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* The remaining verbs with arguments (except *MARK) need a different
|
/* The remaining verbs with arguments (except *MARK) need a different
|
||||||
opcode. */
|
opcode. */
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = verbs[i].meta +
|
*parsed_pattern++ = verbs[i].meta +
|
||||||
((verbs[i].meta != META_MARK)? 0x00010000u:0);
|
((verbs[i].meta != META_MARK)? 0x00010000u:0);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Set up for reading the name in the main loop. */
|
/* Set up for reading the name in the main loop. */
|
||||||
|
|
||||||
verblengthptr = parsed_pattern++;
|
verblengthptr = parsed_pattern++;
|
||||||
|
@ -3576,17 +3576,37 @@ while (ptr < ptrend)
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
|
BOOL hyphenok = TRUE;
|
||||||
top_nest->reset_group = 0;
|
top_nest->reset_group = 0;
|
||||||
top_nest->max_group = 0;
|
top_nest->max_group = 0;
|
||||||
set = unset = 0;
|
set = unset = 0;
|
||||||
optset = &set;
|
optset = &set;
|
||||||
|
|
||||||
|
/* ^ at the start unsets imnsx and disables the subsequent use of - */
|
||||||
|
|
||||||
|
if (ptr < ptrend && *ptr == CHAR_CIRCUMFLEX_ACCENT)
|
||||||
|
{
|
||||||
|
options &= ~(PCRE2_CASELESS|PCRE2_MULTILINE|PCRE2_NO_AUTO_CAPTURE|
|
||||||
|
PCRE2_DOTALL|PCRE2_EXTENDED|PCRE2_EXTENDED_MORE);
|
||||||
|
hyphenok = FALSE;
|
||||||
|
ptr++;
|
||||||
|
}
|
||||||
|
|
||||||
while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS &&
|
while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS &&
|
||||||
*ptr != CHAR_COLON)
|
*ptr != CHAR_COLON)
|
||||||
{
|
{
|
||||||
switch (*ptr++)
|
switch (*ptr++)
|
||||||
{
|
{
|
||||||
case CHAR_MINUS: optset = &unset; break;
|
case CHAR_MINUS:
|
||||||
|
if (!hyphenok)
|
||||||
|
{
|
||||||
|
errorcode = ERR94;
|
||||||
|
ptr--; /* Correct the offset */
|
||||||
|
goto FAILED;
|
||||||
|
}
|
||||||
|
optset = &unset;
|
||||||
|
hyphenok = FALSE;
|
||||||
|
break;
|
||||||
|
|
||||||
case CHAR_J: /* Record that it changed in the external options */
|
case CHAR_J: /* Record that it changed in the external options */
|
||||||
*optset |= PCRE2_DUPNAMES;
|
*optset |= PCRE2_DUPNAMES;
|
||||||
|
@ -3644,9 +3664,10 @@ while (ptr < ptrend)
|
||||||
}
|
}
|
||||||
else *parsed_pattern++ = META_NOCAPTURE;
|
else *parsed_pattern++ = META_NOCAPTURE;
|
||||||
|
|
||||||
/* If nothing changed, no need to record. */
|
/* If nothing changed, no need to record. The check of hyphenok catches
|
||||||
|
the (?^) case. */
|
||||||
|
|
||||||
if (set != 0 || unset != 0)
|
if (set != 0 || unset != 0 || !hyphenok)
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = META_OPTIONS;
|
*parsed_pattern++ = META_OPTIONS;
|
||||||
*parsed_pattern++ = options;
|
*parsed_pattern++ = options;
|
||||||
|
@ -3952,7 +3973,7 @@ while (ptr < ptrend)
|
||||||
{
|
{
|
||||||
if (++ptr >= ptrend || !IS_DIGIT(*ptr)) goto BAD_VERSION_CONDITION;
|
if (++ptr >= ptrend || !IS_DIGIT(*ptr)) goto BAD_VERSION_CONDITION;
|
||||||
minor = (*ptr++ - CHAR_0) * 10;
|
minor = (*ptr++ - CHAR_0) * 10;
|
||||||
if (IS_DIGIT(*ptr)) minor += *ptr++ - CHAR_0;
|
if (IS_DIGIT(*ptr)) minor += *ptr++ - CHAR_0;
|
||||||
if (ptr >= ptrend || *ptr != CHAR_RIGHT_PARENTHESIS)
|
if (ptr >= ptrend || *ptr != CHAR_RIGHT_PARENTHESIS)
|
||||||
goto BAD_VERSION_CONDITION;
|
goto BAD_VERSION_CONDITION;
|
||||||
}
|
}
|
||||||
|
@ -5709,7 +5730,7 @@ for (;; pptr++)
|
||||||
cb->had_pruneorskip = TRUE;
|
cb->had_pruneorskip = TRUE;
|
||||||
/* Fall through */
|
/* Fall through */
|
||||||
case META_MARK:
|
case META_MARK:
|
||||||
case META_COMMIT_ARG:
|
case META_COMMIT_ARG:
|
||||||
VERB_ARG:
|
VERB_ARG:
|
||||||
*code++ = verbops[(meta - META_MARK) >> 16];
|
*code++ = verbops[(meta - META_MARK) >> 16];
|
||||||
/* The length is in characters. */
|
/* The length is in characters. */
|
||||||
|
@ -8058,7 +8079,7 @@ for (;;)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case OP_MARK:
|
case OP_MARK:
|
||||||
case OP_COMMIT_ARG:
|
case OP_COMMIT_ARG:
|
||||||
case OP_PRUNE_ARG:
|
case OP_PRUNE_ARG:
|
||||||
case OP_SKIP_ARG:
|
case OP_SKIP_ARG:
|
||||||
case OP_THEN_ARG:
|
case OP_THEN_ARG:
|
||||||
|
@ -8367,7 +8388,7 @@ for (;; pptr++)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case META_MARK: /* Add the length of the name. */
|
case META_MARK: /* Add the length of the name. */
|
||||||
case META_COMMIT_ARG:
|
case META_COMMIT_ARG:
|
||||||
case META_PRUNE_ARG:
|
case META_PRUNE_ARG:
|
||||||
case META_SKIP_ARG:
|
case META_SKIP_ARG:
|
||||||
case META_THEN_ARG:
|
case META_THEN_ARG:
|
||||||
|
@ -8558,7 +8579,7 @@ for (;; pptr++)
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
|
|
||||||
case META_MARK:
|
case META_MARK:
|
||||||
case META_COMMIT_ARG:
|
case META_COMMIT_ARG:
|
||||||
case META_PRUNE_ARG:
|
case META_PRUNE_ARG:
|
||||||
case META_SKIP_ARG:
|
case META_SKIP_ARG:
|
||||||
case META_THEN_ARG:
|
case META_THEN_ARG:
|
||||||
|
@ -8630,31 +8651,31 @@ for (;; pptr++)
|
||||||
case META_LOOKAHEADNOT:
|
case META_LOOKAHEADNOT:
|
||||||
pptr = parsed_skip(pptr + 1, PSKIP_KET);
|
pptr = parsed_skip(pptr + 1, PSKIP_KET);
|
||||||
if (pptr == NULL) goto PARSED_SKIP_FAILED;
|
if (pptr == NULL) goto PARSED_SKIP_FAILED;
|
||||||
|
|
||||||
/* Also ignore any qualifiers that follow a lookahead assertion. */
|
/* Also ignore any qualifiers that follow a lookahead assertion. */
|
||||||
|
|
||||||
switch (pptr[1])
|
switch (pptr[1])
|
||||||
{
|
{
|
||||||
case META_ASTERISK:
|
case META_ASTERISK:
|
||||||
case META_ASTERISK_PLUS:
|
case META_ASTERISK_PLUS:
|
||||||
case META_ASTERISK_QUERY:
|
case META_ASTERISK_QUERY:
|
||||||
case META_PLUS:
|
case META_PLUS:
|
||||||
case META_PLUS_PLUS:
|
case META_PLUS_PLUS:
|
||||||
case META_PLUS_QUERY:
|
case META_PLUS_QUERY:
|
||||||
case META_QUERY:
|
case META_QUERY:
|
||||||
case META_QUERY_PLUS:
|
case META_QUERY_PLUS:
|
||||||
case META_QUERY_QUERY:
|
case META_QUERY_QUERY:
|
||||||
pptr++;
|
pptr++;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case META_MINMAX:
|
case META_MINMAX:
|
||||||
case META_MINMAX_PLUS:
|
case META_MINMAX_PLUS:
|
||||||
case META_MINMAX_QUERY:
|
case META_MINMAX_QUERY:
|
||||||
pptr += 3;
|
pptr += 3;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
default:
|
default:
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
|
|
||||||
|
@ -9026,7 +9047,7 @@ for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case META_MARK:
|
case META_MARK:
|
||||||
case META_COMMIT_ARG:
|
case META_COMMIT_ARG:
|
||||||
case META_PRUNE_ARG:
|
case META_PRUNE_ARG:
|
||||||
case META_SKIP_ARG:
|
case META_SKIP_ARG:
|
||||||
case META_THEN_ARG:
|
case META_THEN_ARG:
|
||||||
|
|
|
@ -179,7 +179,8 @@ static const unsigned char compile_error_texts[] =
|
||||||
"internal error: bad code value in parsed_skip()\0"
|
"internal error: bad code value in parsed_skip()\0"
|
||||||
"PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
|
"PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
|
||||||
"invalid option bits with PCRE2_LITERAL\0"
|
"invalid option bits with PCRE2_LITERAL\0"
|
||||||
"\\N{U+dddd} is not supported in EBCDIC mode\0"
|
"\\N{U+dddd} is not supported in EBCDIC mode\0"
|
||||||
|
"invalid hyphen in option setting\0"
|
||||||
;
|
;
|
||||||
|
|
||||||
/* Match-time and UTF error texts are in the same format. */
|
/* Match-time and UTF error texts are in the same format. */
|
||||||
|
|
|
@ -6252,4 +6252,10 @@ ef) x/x,mark
|
||||||
|
|
||||||
/(*COMMIT:]w)/
|
/(*COMMIT:]w)/
|
||||||
|
|
||||||
|
/(?i)A(?^)B(?^x:C D)(?^i)e f/
|
||||||
|
aBCDE F
|
||||||
|
\= Expect no match
|
||||||
|
aBCDEF
|
||||||
|
AbCDe f
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -5453,4 +5453,10 @@ a)"xI
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
axy
|
axy
|
||||||
|
|
||||||
|
/(?^x-i)AB/
|
||||||
|
|
||||||
|
/(?^-i)AB/
|
||||||
|
|
||||||
|
/(?x-i-i)/
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -9912,4 +9912,13 @@ No match, mark = X
|
||||||
|
|
||||||
/(*COMMIT:]w)/
|
/(*COMMIT:]w)/
|
||||||
|
|
||||||
|
/(?i)A(?^)B(?^x:C D)(?^i)e f/
|
||||||
|
aBCDE F
|
||||||
|
0: aBCDE F
|
||||||
|
\= Expect no match
|
||||||
|
aBCDEF
|
||||||
|
No match
|
||||||
|
AbCDe f
|
||||||
|
No match
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -16622,6 +16622,15 @@ No match, mark = X
|
||||||
axy
|
axy
|
||||||
No match, mark = X
|
No match, mark = X
|
||||||
|
|
||||||
|
/(?^x-i)AB/
|
||||||
|
Failed: error 194 at offset 4: invalid hyphen in option setting
|
||||||
|
|
||||||
|
/(?^-i)AB/
|
||||||
|
Failed: error 194 at offset 3: invalid hyphen in option setting
|
||||||
|
|
||||||
|
/(?x-i-i)/
|
||||||
|
Failed: error 194 at offset 5: invalid hyphen in option setting
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
||||||
Error -62: bad serialized data
|
Error -62: bad serialized data
|
||||||
|
|
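The option-setting rules this commit introduces (a leading ^ unsets imnsx, no hyphen may follow ^, and at most one hyphen may appear) can be illustrated with a small standalone model. This is only a sketch that follows the shape of the parsing loop shown in the diff; it is not the PCRE2 source. The function name sketch_parse_option_setting, the OPT_* bit values, and the SK_* return codes other than 194 are hypothetical names introduced here for illustration, and the J and U letters are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option bits for this sketch; not the real PCRE2_* values. */
enum {
  OPT_CASELESS        = 1u << 0,  /* i  */
  OPT_MULTILINE       = 1u << 1,  /* m  */
  OPT_NO_AUTO_CAPTURE = 1u << 2,  /* n  */
  OPT_DOTALL          = 1u << 3,  /* s  */
  OPT_EXTENDED        = 1u << 4,  /* x  */
  OPT_EXTENDED_MORE   = 1u << 5   /* xx */
};

#define OPT_IMNSX ((unsigned)(OPT_CASELESS | OPT_MULTILINE | \
  OPT_NO_AUTO_CAPTURE | OPT_DOTALL | OPT_EXTENDED | OPT_EXTENDED_MORE))

/* Return codes: 194 mirrors PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS from the
   diff; the other values are made up for this sketch. */
enum { SK_OK = 0, SK_BAD_LETTER = 1, SK_UNTERMINATED = 2,
       SK_INVALID_HYPHEN = 194 };

/* Parse an option setting such as "(?im-sx)" or "(?^x:" at the start of pat.
   On success, update *options and return SK_OK. On failure, leave *options
   untouched and store the offset of the offending character in *erroffset. */
int sketch_parse_option_setting(const char *pat, unsigned *options,
                                size_t *erroffset)
{
  const char *p = pat + 2;
  unsigned set = 0, unset = 0;
  unsigned *optset = &set;
  unsigned opts = *options;
  int hyphenok = 1;

  assert(pat[0] == '(' && pat[1] == '?');

  /* ^ at the start unsets imnsx and disables the subsequent use of - */
  if (*p == '^')
  {
    opts &= ~OPT_IMNSX;
    hyphenok = 0;
    p++;
  }

  while (*p != '\0' && *p != ')' && *p != ':')
  {
    switch (*p++)
    {
      case '-':
        if (!hyphenok)
        {
          *erroffset = (size_t)(p - 1 - pat);  /* point at the hyphen */
          return SK_INVALID_HYPHEN;
        }
        optset = &unset;
        hyphenok = 0;  /* only one hyphen is allowed */
        break;
      case 'i': *optset |= OPT_CASELESS; break;
      case 'm': *optset |= OPT_MULTILINE; break;
      case 'n': *optset |= OPT_NO_AUTO_CAPTURE; break;
      case 's': *optset |= OPT_DOTALL; break;
      case 'x':
        *optset |= OPT_EXTENDED;
        if (*p == 'x') { *optset |= OPT_EXTENDED_MORE; p++; }
        break;
      default:
        *erroffset = (size_t)(p - 1 - pat);
        return SK_BAD_LETTER;
    }
  }

  if (*p == '\0')
  {
    *erroffset = (size_t)(p - pat);
    return SK_UNTERMINATED;
  }

  /* Setting x alone clears xx; unsetting either x or xx unsets both. */
  if ((set & (OPT_EXTENDED | OPT_EXTENDED_MORE)) == OPT_EXTENDED ||
      (unset & (OPT_EXTENDED | OPT_EXTENDED_MORE)) != 0)
    unset |= OPT_EXTENDED_MORE;
  if ((unset & OPT_EXTENDED_MORE) != 0 && (set & OPT_EXTENDED) == 0)
    unset |= OPT_EXTENDED;

  /* A letter on both sides of the hyphen ends up unset. */
  *options = (opts | set) & ~unset;
  return SK_OK;
}
```

Under these assumptions, the three patterns added to testinput2 fail at the same offsets the test output reports (4, 3, and 5), and "(?^in)" leaves only the caseless and no-auto-capture bits set.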