Add support for (?^) as now supported by Perl.
parent 27337495dc
commit 6e245572b8
@@ -132,6 +132,8 @@ terminated by (*ACCEPT).

29. Add support for \N{U+dddd}, but not in EBCDIC environments.

30. Add support for (?^) for unsetting all imnsx options.


Version 10.31 12-February-2018
------------------------------
@@ -1466,7 +1466,8 @@ character, even if newlines are coded as CRLF. Without this option, a dot does
not match when the current position in the subject is at a newline. This option
is equivalent to Perl's /s option, and it can be changed within a pattern by a
(?s) option setting. A negative class such as [^a] always matches newline
characters, independent of the setting of this option.
characters, and the \N escape sequence always matches a non-newline character,
independent of the setting of PCRE2_DOTALL.
<pre>
PCRE2_DUPNAMES
</pre>
@@ -3634,7 +3635,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 02 July 2018
Last updated: 27 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
@@ -42,13 +42,14 @@ assertion is a condition that has a matching branch (that is, the condition is
false).
</P>
<P>
4. The following Perl escape sequences are not supported: \l, \u, \L,
\U, and \N when followed by a character name or Unicode value. (\N on its
own, matching a non-newline character, is supported.) In fact these are
4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
\U, and \N when followed by a character name. \N on its own, matching a
non-newline character, and \N{U+dd..}, matching a Unicode code point, are
supported. The escapes that modify the case of following letters are
implemented by Perl's general string-handling and are not part of its pattern
matching engine. If any of these are encountered by PCRE2, an error is
generated by default. However, if the PCRE2_ALT_BSUX option is set,
\U and \u are interpreted as ECMAScript interprets them.
generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
are interpreted as ECMAScript interprets them.
</P>
<P>
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
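
Item 4 above separates two uses of \N with braces. \N{U+dd..} (a Unicode code point) is now accepted, while \N followed by a character name is still rejected. The C sketch below shows that distinction at the lexical level only: it recognizes the \N{U+hex} form and nothing else. The name parse_N_codepoint is hypothetical and this is not PCRE2's lexer. Its only range check is against 0x10FFFF, and it does not consider surrogates, UTF mode, or EBCDIC.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not PCRE2 code. If p starts with \N{U+hhh..},
   store the code point in *cp and return the number of characters
   consumed; otherwise return 0 (named characters are not handled). */
static size_t parse_N_codepoint(const char *p, uint32_t *cp)
{
  const char *start = p;
  uint32_t value = 0;
  int ndigits = 0;

  assert(p != NULL && cp != NULL);
  if (p[0] != '\\' || p[1] != 'N' || p[2] != '{' || p[3] != 'U' || p[4] != '+')
    return 0;
  p += 5;

  while (isxdigit((unsigned char)*p))
    {
    int d = isdigit((unsigned char)*p) ? *p - '0'
                                       : tolower((unsigned char)*p) - 'a' + 10;
    value = value * 16 + (uint32_t)d;
    if (value > 0x10ffff) return 0;    /* beyond the Unicode range */
    ndigits++;
    p++;
    }

  if (ndigits == 0 || *p != '}') return 0;
  *cp = value;
  return (size_t)(p + 1 - start);      /* include the closing brace */
}
```

In C source each pattern backslash is written as `\\`. For example, `"\\N{U+41}"` yields 8 consumed characters and the code point 0x41.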
|
@@ -61,17 +62,22 @@ internal representation of Unicode characters, there is no need to implement
the somewhat messy concept of surrogates."
</P>
<P>
6. PCRE2 does support the \Q...\E escape for quoting substrings. Characters
in between are treated as literals. This is slightly different from Perl in
that $ and @ are also handled as literals inside the quotes. In Perl, they
cause variable interpolation (but of course PCRE2 does not have variables).
Note the following examples:
6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
in between are treated as literals. However, this is slightly different from
Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
they cause variable interpolation (but of course PCRE2 does not have
variables). Also, Perl does "double-quotish backslash interpolation" on any
backslashes between \Q and \E which, its documentation says, "may lead to
confusing results". PCRE2 treats a backslash between \Q and \E just like any
other character. Note the following examples:
<pre>
Pattern PCRE2 matches Perl matches
Pattern PCRE2 matches Perl matches

\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
\Qabc\$xyz\E abc\$xyz abc\$xyz
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
\QA\B\E A\B A\B
\Q\\E \ \\E
</pre>
The \Q...\E sequence is recognized both inside and outside character classes.
</P>
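
The PCRE2 column of the table above follows from one rule: a backslash between \Q and \E is an ordinary character, and the first \E ends the quote. The C sketch below models that rule. quoted_literal is a hypothetical helper, not a PCRE2 function. It deliberately handles only a small subset of patterns: \Q...\E sections, escaped non-alphanumerics such as \$, and plain characters that are not metacharacters. For such a pattern it returns the literal string the pattern matches. Ignoring an isolated \E follows the pcre2pattern text in this commit. Letting an unterminated \Q quote to the end of the pattern is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2. Writes to out the literal text
   matched by a pattern built only from \Q...\E sections, escaped
   non-alphanumerics and plain non-meta characters. Returns 0 on success,
   or -1 for anything outside that subset or if out is too small. */
static int quoted_literal(const char *pat, char *out, size_t outsize)
{
  size_t n = 0;
  int inquote = 0;

  assert(pat != NULL && out != NULL && outsize > 0);
  while (*pat != '\0')
    {
    char c;

    if (pat[0] == '\\' && pat[1] == 'E')   /* ends a quote; ignored if isolated */
      {
      inquote = 0;
      pat += 2;
      continue;
      }
    if (inquote)
      c = *pat++;                          /* even a backslash is literal here */
    else if (pat[0] == '\\' && pat[1] == 'Q')
      {
      inquote = 1;
      pat += 2;
      continue;
      }
    else if (pat[0] == '\\')
      {
      if (pat[1] == '\0' || isalnum((unsigned char)pat[1])) return -1;
      c = pat[1];                          /* e.g. \$ is a literal $ */
      pat += 2;
      }
    else if (strchr("^$.|?*+()[]{}", pat[0]) != NULL)
      return -1;                           /* possible metacharacter: unsupported */
    else
      c = *pat++;

    if (n + 1 >= outsize) return -1;       /* keep room for the NUL */
    out[n++] = c;
    }

  /* An unterminated \Q simply quotes to the end in this sketch. */
  out[n] = '\0';
  return 0;
}
```

Applied to the five table patterns (with each backslash doubled in C string literals), it produces abc$xyz, abc\$xyz, abc$xyz, A\B and \. These match the PCRE2 column.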
|
@@ -229,9 +235,9 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 18 April 2017
Last updated: 28 July 2018
<br>
Copyright &copy; 1997-2017 University of Cambridge.
Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
@@ -357,13 +357,18 @@ of the pattern.
If you want to remove the special meaning from a sequence of characters, you
can do so by putting them between \Q and \E. This is different from Perl in
that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas
in Perl, $ and @ cause variable interpolation. Note the following examples:
in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish
backslash interpolation" on any backslashes between \Q and \E which, its
documentation says, "may lead to confusing results". PCRE2 treats a backslash
between \Q and \E just like any other character. Note the following examples:
<pre>
Pattern PCRE2 matches Perl matches

\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
\Qabc\$xyz\E abc\$xyz abc\$xyz
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
\QA\B\E A\B A\B
\Q\\E \ \\E
</pre>
The \Q...\E sequence is recognized both inside and outside character classes.
An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
@@ -545,7 +550,7 @@ character class, these sequences have different meanings.
Unsupported escape sequences
</b><br>
<P>
In Perl, the sequences \l, \L, \u, and \U are recognized by its string
In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
handler and used to modify the case of following characters. By default, PCRE2
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
is set, \U matches a "U" character, and \u can be used to define a character
@@ -1635,21 +1640,27 @@ Perl option letters enclosed between "(?" and ")". The option letters are
xx for PCRE2_EXTENDED_MORE
</pre>
For example, (?im) sets caseless, multiline matching. It is also possible to
unset these options by preceding the letter with a hyphen. The two "extended"
options are not independent; unsetting either one cancels the effects of both
of them.
unset these options by preceding the relevant letters with a hyphen, for
example (?-im). The two "extended" options are not independent; unsetting either
one cancels the effects of both of them.
</P>
<P>
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
permitted. If a letter appears both before and after the hyphen, the option is
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
effect.
permitted. Only one hyphen may appear in the options string. If a letter
appears both before and after the hyphen, the option is unset. An empty options
setting "(?)" is allowed. Needless to say, it has no effect.
</P>
<P>
If the first character following (? is a circumflex, it causes all of the above
options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
the circumflex to cause some options to be re-instated, but a hyphen may not
appear.
</P>
<P>
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
the same way as the Perl-compatible options by using the characters J and U
respectively.
respectively. However, these are not unset by (?^).
</P>
<P>
When one of these option changes occurs at top level (that is, not inside
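
The option rules described above can be modeled as operations on a bit mask. In this C sketch the OPT_* bit values and the apply_options function are hypothetical stand-ins for the PCRE2_* option bits. The sketch assumes the option string is already syntactically valid, so it does not check hyphens. Treating xx as setting both extended bits is also an assumption. The sketch shows four behaviours: (?^) clears the imnsx group, including the xx bit, but leaves J and U alone; letters after ^ are set again; unsetting x cancels both extended bits; and a letter on both sides of the hyphen ends up unset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical option bits for this sketch; not the real PCRE2_* values. */
enum {
  OPT_I  = 1 << 0,   /* i   caseless        */
  OPT_M  = 1 << 1,   /* m   multiline       */
  OPT_N  = 1 << 2,   /* n   no auto capture */
  OPT_S  = 1 << 3,   /* s   dotall          */
  OPT_X  = 1 << 4,   /* x   extended        */
  OPT_XX = 1 << 5,   /* xx  extended more   */
  OPT_J  = 1 << 6,   /* J   duplicate names */
  OPT_U  = 1 << 7    /* U   ungreedy        */
};

/* The group that a leading ^ unsets: imnsx (x includes xx). */
#define IMNSX (OPT_I|OPT_M|OPT_N|OPT_S|OPT_X|OPT_XX)

/* Apply a valid option string (the text between "(?" and ")") to opts. */
static uint32_t apply_options(uint32_t opts, const char *s)
{
  uint32_t set = 0, unset = 0;
  uint32_t *target = &set;

  if (*s == '^')                        /* clear imnsx; J and U are kept */
    {
    opts &= ~(uint32_t)IMNSX;
    s++;
    }

  for (; *s != '\0'; s++)
    {
    switch (*s)
      {
      case '-': target = &unset; break;
      case 'i': *target |= OPT_I; break;
      case 'm': *target |= OPT_M; break;
      case 'n': *target |= OPT_N; break;
      case 's': *target |= OPT_S; break;
      case 'x':                         /* x, or xx for both extended bits */
        *target |= OPT_X;
        if (s[1] == 'x') { *target |= OPT_XX; s++; }
        break;
      case 'J': *target |= OPT_J; break;
      case 'U': *target |= OPT_U; break;
      default:  assert(!"unexpected option letter"); break;
      }
    }

  /* Unsetting either extended option cancels both of them. */
  if (unset & (OPT_X|OPT_XX)) unset |= OPT_X|OPT_XX;

  /* Unset wins, so a letter on both sides of the hyphen ends up unset. */
  return (opts | set) & ~unset;
}
```

For instance, starting from s and U, the setting (?^in) leaves i, n and U set. The circumflex clears s but not U.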
|
@@ -3579,7 +3590,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 27 July 2018
Last updated: 28 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
@@ -456,7 +456,15 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s)
(?^) unset imnsx options
</pre>
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capturing group, for
example (?i:...).
</P>
<P>
The following are recognized only at the very start of a pattern or after one
of the newline or \R options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number.
@@ -624,7 +632,7 @@ Cambridge, England.
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 27 July 2018
Last updated: 28 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
doc/pcre2.txt
File diff suppressed because it is too large
@@ -1639,19 +1639,24 @@ Perl option letters enclosed between "(?" and ")". The option letters are
xx for PCRE2_EXTENDED_MORE
.sp
For example, (?im) sets caseless, multiline matching. It is also possible to
unset these options by preceding the letter with a hyphen. The two "extended"
options are not independent; unsetting either one cancels the effects of both
of them.
unset these options by preceding the relevant letters with a hyphen, for
example (?-im). The two "extended" options are not independent; unsetting either
one cancels the effects of both of them.
.P
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
permitted. If a letter appears both before and after the hyphen, the option is
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
effect.
permitted. Only one hyphen may appear in the options string. If a letter
appears both before and after the hyphen, the option is unset. An empty options
setting "(?)" is allowed. Needless to say, it has no effect.
.P
If the first character following (? is a circumflex, it causes all of the above
options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
the circumflex to cause some options to be re-instated, but a hyphen may not
appear.
.P
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
the same way as the Perl-compatible options by using the characters J and U
respectively.
respectively. However, these are not unset by (?^).
.P
When one of these option changes occurs at top level (that is, not inside
subpattern parentheses), the change applies to the remainder of the pattern
@@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "27 July 2018" "PCRE2 10.32"
.TH PCRE2SYNTAX 3 "28 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -431,7 +431,14 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s)
(?^) unset imnsx options
.sp
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capturing group, for
example (?i:...).
.P
The following are recognized only at the very start of a pattern or after one
of the newline or \eR options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number.
@@ -612,6 +619,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 27 July 2018
Last updated: 28 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi
@@ -317,6 +317,7 @@ pcre2_pattern_convert(). */
#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
#define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193
#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194


/* "Expected" matching error codes: no match and partial match. */
@@ -731,7 +731,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
       ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
       ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
       ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
       ERR91, ERR92, ERR93 };
       ERR91, ERR92, ERR93, ERR94 };

/* This is a table of start-of-pattern options such as (*UTF) and settings such
as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@@ -3576,17 +3576,37 @@ while (ptr < ptrend)

    else
      {
      BOOL hyphenok = TRUE;
      top_nest->reset_group = 0;
      top_nest->max_group = 0;
      set = unset = 0;
      optset = &set;

      /* ^ at the start unsets imnsx and disables the subsequent use of - */

      if (ptr < ptrend && *ptr == CHAR_CIRCUMFLEX_ACCENT)
        {
        options &= ~(PCRE2_CASELESS|PCRE2_MULTILINE|PCRE2_NO_AUTO_CAPTURE|
                     PCRE2_DOTALL|PCRE2_EXTENDED|PCRE2_EXTENDED_MORE);
        hyphenok = FALSE;
        ptr++;
        }

      while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS &&
                             *ptr != CHAR_COLON)
        {
        switch (*ptr++)
          {
          case CHAR_MINUS: optset = &unset; break;
          case CHAR_MINUS:
          if (!hyphenok)
            {
            errorcode = ERR94;
            ptr--;  /* Correct the offset */
            goto FAILED;
            }
          optset = &unset;
          hyphenok = FALSE;
          break;

          case CHAR_J:  /* Record that it changed in the external options */
          *optset |= PCRE2_DUPNAMES;
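
The hyphenok flag in the hunk above enforces two rules: at most one hyphen may appear, and none may appear after a leading circumflex. The C sketch below isolates just that check so the error offsets in the test output can be reproduced. bad_hyphen_offset is a hypothetical name. Unlike the real parser, it neither validates option letters nor tracks set and unset bits. Like the ptr-- correction in the hunk, it reports the offset of the offending hyphen itself rather than the character after it.

```c
#include <assert.h>

/* Hypothetical helper: check only the hyphen rules for an option group at
   the start of pattern, which must begin with "(?". Returns -1 if the
   hyphens are acceptable, otherwise the offset of the offending hyphen. */
static long bad_hyphen_offset(const char *pattern)
{
  const char *ptr = pattern;
  int hyphenok = 1;

  assert(pattern[0] == '(' && pattern[1] == '?');
  ptr += 2;

  if (*ptr == '^')                      /* a leading ^ forbids any hyphen */
    {
    hyphenok = 0;
    ptr++;
    }

  while (*ptr != '\0' && *ptr != ')' && *ptr != ':')
    {
    if (*ptr == '-')
      {
      if (!hyphenok) return (long)(ptr - pattern);  /* ERR94 in PCRE2 */
      hyphenok = 0;                     /* a second hyphen is an error */
      }
    ptr++;
    }
  return -1;
}
```

On the three new testinput2 patterns this returns 4, 3 and 5, the offsets reported with error 194 in the testoutput2 hunk. Valid settings such as (?im-sx) and (?^in) return -1.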
|
@@ -3644,9 +3664,10 @@ while (ptr < ptrend)
        }
      else *parsed_pattern++ = META_NOCAPTURE;

      /* If nothing changed, no need to record. */
      /* If nothing changed, no need to record. The check of hyphenok catches
      the (?^) case. */

      if (set != 0 || unset != 0)
      if (set != 0 || unset != 0 || !hyphenok)
        {
        *parsed_pattern++ = META_OPTIONS;
        *parsed_pattern++ = options;
@@ -180,6 +180,7 @@ static const unsigned char compile_error_texts[] =
  "PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
  "invalid option bits with PCRE2_LITERAL\0"
  "\\N{U+dddd} is not supported in EBCDIC mode\0"
  "invalid hyphen in option setting\0"
  ;

/* Match-time and UTF error texts are in the same format. */
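
compile_error_texts is a single string made of NUL-terminated messages laid end to end. The new message is appended at the end, in step with the new ERR94 enum entry and PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS (194), so a lookup has to walk the list. The C sketch below shows that idea on a three-entry copy of the last messages from the hunk above, numbered from 0 rather than with their real error numbers. nth_message is a hypothetical helper, not pcre2_get_error_message(). Treating an empty string as the end of the table is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A three-entry copy of the end of compile_error_texts, numbered 0..2 here. */
static const unsigned char texts[] =
  "invalid option bits with PCRE2_LITERAL\0"
  "\\N{U+dddd} is not supported in EBCDIC mode\0"
  "invalid hyphen in option setting\0"
  ;

/* Hypothetical helper: return message n (counting from 0) of a table of
   NUL-terminated strings, or NULL if n is past the end. The compiler adds a
   final NUL after the last explicit \0, so an empty string marks the end. */
static const char *nth_message(const unsigned char *table, size_t tablesize,
                               unsigned n)
{
  const unsigned char *p = table;
  const unsigned char *end = table + tablesize;

  assert(table != NULL && tablesize > 0);
  while (n-- > 0)
    {
    p += strlen((const char *)p) + 1;   /* skip one message and its NUL */
    if (p >= end || *p == '\0') return NULL;
    }
  return (const char *)p;
}
```

Because each message is found by counting, the order of the texts must stay in step with the ERRnn enum. That is why the commit adds the message and ERR94 at the end of their respective lists.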
|
|
|
@@ -6252,4 +6252,10 @@ ef) x/x,mark

/(*COMMIT:]w)/

/(?i)A(?^)B(?^x:C D)(?^i)e f/
    aBCDE F
\= Expect no match
    aBCDEF
    AbCDe f

# End of testinput1
@@ -5453,4 +5453,10 @@ a)"xI
\= Expect no match
    axy

/(?^x-i)AB/

/(?^-i)AB/

/(?x-i-i)/

# End of testinput2
@@ -9912,4 +9912,13 @@ No match, mark = X

/(*COMMIT:]w)/

/(?i)A(?^)B(?^x:C D)(?^i)e f/
    aBCDE F
 0: aBCDE F
\= Expect no match
    aBCDEF
No match
    AbCDe f
No match

# End of testinput1
@@ -16622,6 +16622,15 @@ No match, mark = X
    axy
No match, mark = X

/(?^x-i)AB/
Failed: error 194 at offset 4: invalid hyphen in option setting

/(?^-i)AB/
Failed: error 194 at offset 3: invalid hyphen in option setting

/(?x-i-i)/
Failed: error 194 at offset 5: invalid hyphen in option setting

# End of testinput2
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data