Add support for (?^) as now supported by Perl.
parent 27337495dc
commit 6e245572b8
|
@ -132,6 +132,8 @@ terminated by (*ACCEPT).
|
||||||
|
|
||||||
29. Add support for \N{U+dddd}, but not in EBCDIC environments.
|
29. Add support for \N{U+dddd}, but not in EBCDIC environments.
|
||||||
|
|
||||||
|
30. Add support for (?^) for unsetting all imnsx options.
|
||||||
|
|
||||||
|
|
||||||
Version 10.31 12-February-2018
|
Version 10.31 12-February-2018
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
|
@ -1466,7 +1466,8 @@ character, even if newlines are coded as CRLF. Without this option, a dot does
|
||||||
not match when the current position in the subject is at a newline. This option
|
not match when the current position in the subject is at a newline. This option
|
||||||
is equivalent to Perl's /s option, and it can be changed within a pattern by a
|
is equivalent to Perl's /s option, and it can be changed within a pattern by a
|
||||||
(?s) option setting. A negative class such as [^a] always matches newline
|
(?s) option setting. A negative class such as [^a] always matches newline
|
||||||
characters, independent of the setting of this option.
|
characters, and the \N escape sequence always matches a non-newline character,
|
||||||
|
independent of the setting of PCRE2_DOTALL.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_DUPNAMES
|
PCRE2_DUPNAMES
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3634,7 +3635,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 02 July 2018
|
Last updated: 27 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -42,13 +42,14 @@ assertion is a condition that has a matching branch (that is, the condition is
|
||||||
false).
|
false).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
4. The following Perl escape sequences are not supported: \l, \u, \L,
|
4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
|
||||||
\U, and \N when followed by a character name or Unicode value. (\N on its
|
\U, and \N when followed by a character name. \N on its own, matching a
|
||||||
own, matching a non-newline character, is supported.) In fact these are
|
non-newline character, and \N{U+dd..}, matching a Unicode code point, are
|
||||||
|
supported. The escapes that modify the case of following letters are
|
||||||
implemented by Perl's general string-handling and are not part of its pattern
|
implemented by Perl's general string-handling and are not part of its pattern
|
||||||
matching engine. If any of these are encountered by PCRE2, an error is
|
matching engine. If any of these are encountered by PCRE2, an error is
|
||||||
generated by default. However, if the PCRE2_ALT_BSUX option is set,
|
generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
|
||||||
\U and \u are interpreted as ECMAScript interprets them.
|
are interpreted as ECMAScript interprets them.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
|
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
|
||||||
|
@ -61,17 +62,22 @@ internal representation of Unicode characters, there is no need to implement
|
||||||
the somewhat messy concept of surrogates."
|
the somewhat messy concept of surrogates."
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
6. PCRE2 does support the \Q...\E escape for quoting substrings. Characters
|
6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
|
||||||
in between are treated as literals. This is slightly different from Perl in
|
in between are treated as literals. However, this is slightly different from
|
||||||
that $ and @ are also handled as literals inside the quotes. In Perl, they
|
Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
|
||||||
cause variable interpolation (but of course PCRE2 does not have variables).
|
they cause variable interpolation (but of course PCRE2 does not have
|
||||||
Note the following examples:
|
variables). Also, Perl does "double-quotish backslash interpolation" on any
|
||||||
|
backslashes between \Q and \E which, its documentation says, "may lead to
|
||||||
|
confusing results". PCRE2 treats a backslash between \Q and \E just like any
|
||||||
|
other character. Note the following examples:
|
||||||
<pre>
|
<pre>
|
||||||
Pattern PCRE2 matches Perl matches
|
Pattern PCRE2 matches Perl matches
|
||||||
|
|
||||||
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
|
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
|
||||||
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
||||||
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
||||||
|
\QA\B\E A\B A\B
|
||||||
|
\Q\\E \ \\E
|
||||||
</pre>
|
</pre>
|
||||||
The \Q...\E sequence is recognized both inside and outside character classes.
|
The \Q...\E sequence is recognized both inside and outside character classes.
|
||||||
</P>
|
</P>
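The table of \Q...\E examples above can be reproduced with a small stand-alone sketch. It is not PCRE2 code: the function name `pcre2_style_literal` and its return convention are hypothetical, and it only covers the constructs shown in the table (quoted runs, an isolated \E, and a backslash followed by a non-alphanumeric character). Unquoted characters are copied as-is, so the result is only meaningful for patterns with no unquoted metacharacters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes to `out` the literal text that a pattern built
   from \Q...\E runs and escaped punctuation would match under the PCRE2
   rules described above. Returns the length written, or (size_t)-1 if the
   buffer is too small or an unsupported escape is met. */
size_t pcre2_style_literal(const char *pat, char *out, size_t outsize)
{
  size_t len, i, n = 0;
  int quoting = 0;

  assert(pat != NULL && out != NULL);
  if (outsize == 0) return (size_t)-1;
  len = strlen(pat);

  for (i = 0; i < len; i++)
    {
    char c = pat[i];
    if (quoting)
      {
      /* Inside \Q...\E only the two-character sequence \E is special; any
         other backslash is just another literal character. */
      if (c == '\\' && pat[i + 1] == 'E') { quoting = 0; i++; continue; }
      }
    else if (c == '\\')
      {
      char next = pat[i + 1];
      if (next == 'Q') { quoting = 1; i++; continue; }
      if (next == 'E') { i++; continue; }   /* isolated \E is ignored */
      if (next == '\0' || isalnum((unsigned char)next))
        return (size_t)-1;                  /* outside this sketch's scope */
      c = next;                             /* e.g. \$ stands for a literal $ */
      i++;
      }
    if (n + 1 >= outsize) return (size_t)-1;
    out[n++] = c;
    }

  out[n] = '\0';
  return n;
}
```

Called on the C string `"\\Q\\\\E"` (the pattern \Q\\E), the helper yields a single backslash, which agrees with the "PCRE2 matches" column; the Perl column differs because Perl applies its own backslash interpolation first.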
|
||||||
|
@ -229,9 +235,9 @@ Cambridge, England.
|
||||||
REVISION
|
REVISION
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 18 April 2017
|
Last updated: 28 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -357,13 +357,18 @@ of the pattern.
|
||||||
If you want to remove the special meaning from a sequence of characters, you
|
If you want to remove the special meaning from a sequence of characters, you
|
||||||
can do so by putting them between \Q and \E. This is different from Perl in
|
can do so by putting them between \Q and \E. This is different from Perl in
|
||||||
that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas
|
that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas
|
||||||
in Perl, $ and @ cause variable interpolation. Note the following examples:
|
in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish
|
||||||
|
backslash interpolation" on any backslashes between \Q and \E which, its
|
||||||
|
documentation says, "may lead to confusing results". PCRE2 treats a backslash
|
||||||
|
between \Q and \E just like any other character. Note the following examples:
|
||||||
<pre>
|
<pre>
|
||||||
Pattern PCRE2 matches Perl matches
|
Pattern PCRE2 matches Perl matches
|
||||||
|
|
||||||
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
|
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
|
||||||
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
||||||
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
||||||
|
\QA\B\E A\B A\B
|
||||||
|
\Q\\E \ \\E
|
||||||
</pre>
|
</pre>
|
||||||
The \Q...\E sequence is recognized both inside and outside character classes.
|
The \Q...\E sequence is recognized both inside and outside character classes.
|
||||||
An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
|
An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
|
||||||
|
@ -545,7 +550,7 @@ character class, these sequences have different meanings.
|
||||||
Unsupported escape sequences
|
Unsupported escape sequences
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
In Perl, the sequences \l, \L, \u, and \U are recognized by its string
|
In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
|
||||||
handler and used to modify the case of following characters. By default, PCRE2
|
handler and used to modify the case of following characters. By default, PCRE2
|
||||||
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
|
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
|
||||||
is set, \U matches a "U" character, and \u can be used to define a character
|
is set, \U matches a "U" character, and \u can be used to define a character
|
||||||
|
@ -1635,21 +1640,27 @@ Perl option letters enclosed between "(?" and ")". The option letters are
|
||||||
xx for PCRE2_EXTENDED_MORE
|
xx for PCRE2_EXTENDED_MORE
|
||||||
</pre>
|
</pre>
|
||||||
For example, (?im) sets caseless, multiline matching. It is also possible to
|
For example, (?im) sets caseless, multiline matching. It is also possible to
|
||||||
unset these options by preceding the letter with a hyphen. The two "extended"
|
unset these options by preceding the relevant letters with a hyphen, for
|
||||||
options are not independent; unsetting either one cancels the effects of both
|
example (?-im). The two "extended" options are not independent; unsetting either
|
||||||
of them.
|
one cancels the effects of both of them.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
|
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
|
||||||
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
||||||
permitted. If a letter appears both before and after the hyphen, the option is
|
permitted. Only one hyphen may appear in the options string. If a letter
|
||||||
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
|
appears both before and after the hyphen, the option is unset. An empty options
|
||||||
effect.
|
setting "(?)" is allowed. Needless to say, it has no effect.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If the first character following (? is a circumflex, it causes all of the above
|
||||||
|
options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
|
||||||
|
the circumflex to cause some options to be re-instated, but a hyphen may not
|
||||||
|
appear.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
||||||
the same way as the Perl-compatible options by using the characters J and U
|
the same way as the Perl-compatible options by using the characters J and U
|
||||||
respectively.
|
respectively. However, these are not unset by (?^).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When one of these option changes occurs at top level (that is, not inside
|
When one of these option changes occurs at top level (that is, not inside
|
||||||
|
@ -3579,7 +3590,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 27 July 2018
|
Last updated: 28 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -456,7 +456,15 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
(?x) extended: ignore white space except in classes
|
(?x) extended: ignore white space except in classes
|
||||||
(?xx) as (?x) but also ignore space and tab in classes
|
(?xx) as (?x) but also ignore space and tab in classes
|
||||||
(?-...) unset option(s)
|
(?-...) unset option(s)
|
||||||
|
(?^) unset imnsx options
|
||||||
</pre>
|
</pre>
|
||||||
|
Unsetting x or xx unsets both. Several options may be set at once, and a
|
||||||
|
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
|
||||||
|
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
|
||||||
|
(?^in). An option setting may appear at the start of a non-capturing group, for
|
||||||
|
example (?i:...).
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
The following are recognized only at the very start of a pattern or after one
|
The following are recognized only at the very start of a pattern or after one
|
||||||
of the newline or \R options with similar syntax. More than one of them may
|
of the newline or \R options with similar syntax. More than one of them may
|
||||||
appear. For the first three, d is a decimal number.
|
appear. For the first three, d is a decimal number.
|
||||||
|
@ -624,7 +632,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 27 July 2018
|
Last updated: 28 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -1454,8 +1454,9 @@ COMPILING A PATTERN
|
||||||
this option, a dot does not match when the current position in the sub-
|
this option, a dot does not match when the current position in the sub-
|
||||||
ject is at a newline. This option is equivalent to Perl's /s option,
|
ject is at a newline. This option is equivalent to Perl's /s option,
|
||||||
and it can be changed within a pattern by a (?s) option setting. A neg-
|
and it can be changed within a pattern by a (?s) option setting. A neg-
|
||||||
ative class such as [^a] always matches newline characters, independent
|
ative class such as [^a] always matches newline characters, and the \N
|
||||||
of the setting of this option.
|
escape sequence always matches a non-newline character, independent of
|
||||||
|
the setting of PCRE2_DOTALL.
|
||||||
|
|
||||||
PCRE2_DUPNAMES
|
PCRE2_DUPNAMES
|
||||||
|
|
||||||
|
@ -3520,7 +3521,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 02 July 2018
|
Last updated: 27 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@ -4536,13 +4537,14 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
||||||
when a negative assertion is a condition that has a matching branch
|
when a negative assertion is a condition that has a matching branch
|
||||||
(that is, the condition is false).
|
(that is, the condition is false).
|
||||||
|
|
||||||
4. The following Perl escape sequences are not supported: \l, \u, \L,
|
4. The following Perl escape sequences are not supported: \F, \l, \L,
|
||||||
\U, and \N when followed by a character name or Unicode value. (\N on
|
\u, \U, and \N when followed by a character name. \N on its own, match-
|
||||||
its own, matching a non-newline character, is supported.) In fact these
|
ing a non-newline character, and \N{U+dd..}, matching a Unicode code
|
||||||
are implemented by Perl's general string-handling and are not part of
|
point, are supported. The escapes that modify the case of following
|
||||||
its pattern matching engine. If any of these are encountered by PCRE2,
|
letters are implemented by Perl's general string-handling and are not
|
||||||
an error is generated by default. However, if the PCRE2_ALT_BSUX option
|
part of its pattern matching engine. If any of these are encountered by
|
||||||
is set, \U and \u are interpreted as ECMAScript interprets them.
|
PCRE2, an error is generated by default. However, if the PCRE2_ALT_BSUX
|
||||||
|
option is set, \U and \u are interpreted as ECMAScript interprets them.
|
||||||
|
|
||||||
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
|
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
|
||||||
is built with Unicode support (the default). The properties that can be
|
is built with Unicode support (the default). The properties that can be
|
||||||
|
@ -4554,11 +4556,15 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
||||||
code characters, there is no need to implement the somewhat messy con-
|
code characters, there is no need to implement the somewhat messy con-
|
||||||
cept of surrogates."
|
cept of surrogates."
|
||||||
|
|
||||||
6. PCRE2 does support the \Q...\E escape for quoting substrings. Char-
|
6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
|
||||||
acters in between are treated as literals. This is slightly different
|
in between are treated as literals. However, this is slightly different
|
||||||
from Perl in that $ and @ are also handled as literals inside the
|
from Perl in that $ and @ are also handled as literals inside the
|
||||||
quotes. In Perl, they cause variable interpolation (but of course PCRE2
|
quotes. In Perl, they cause variable interpolation (but of course PCRE2
|
||||||
does not have variables). Note the following examples:
|
does not have variables). Also, Perl does "double-quotish backslash
|
||||||
|
interpolation" on any backslashes between \Q and \E which, its documen-
|
||||||
|
tation says, "may lead to confusing results". PCRE2 treats a backslash
|
||||||
|
between \Q and \E just like any other character. Note the following
|
||||||
|
examples:
|
||||||
|
|
||||||
Pattern PCRE2 matches Perl matches
|
Pattern PCRE2 matches Perl matches
|
||||||
|
|
||||||
|
@ -4566,6 +4572,8 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
||||||
contents of $xyz
|
contents of $xyz
|
||||||
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
||||||
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
||||||
|
\QA\B\E A\B A\B
|
||||||
|
\Q\\E \ \\E
|
||||||
|
|
||||||
The \Q...\E sequence is recognized both inside and outside character
|
The \Q...\E sequence is recognized both inside and outside character
|
||||||
classes.
|
classes.
|
||||||
|
@ -4699,8 +4707,8 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 18 April 2017
|
Last updated: 28 July 2018
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
@ -6121,7 +6129,10 @@ BACKSLASH
|
||||||
ters, you can do so by putting them between \Q and \E. This is differ-
|
ters, you can do so by putting them between \Q and \E. This is differ-
|
||||||
ent from Perl in that $ and @ are handled as literals in \Q...\E
|
ent from Perl in that $ and @ are handled as literals in \Q...\E
|
||||||
sequences in PCRE2, whereas in Perl, $ and @ cause variable interpola-
|
sequences in PCRE2, whereas in Perl, $ and @ cause variable interpola-
|
||||||
tion. Note the following examples:
|
tion. Also, Perl does "double-quotish backslash interpolation" on any
|
||||||
|
backslashes between \Q and \E which, its documentation says, "may lead
|
||||||
|
to confusing results". PCRE2 treats a backslash between \Q and \E just
|
||||||
|
like any other character. Note the following examples:
|
||||||
|
|
||||||
Pattern PCRE2 matches Perl matches
|
Pattern PCRE2 matches Perl matches
|
||||||
|
|
||||||
|
@ -6129,6 +6140,8 @@ BACKSLASH
|
||||||
contents of $xyz
|
contents of $xyz
|
||||||
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
||||||
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
||||||
|
\QA\B\E A\B A\B
|
||||||
|
\Q\\E \ \\E
|
||||||
|
|
||||||
The \Q...\E sequence is recognized both inside and outside character
|
The \Q...\E sequence is recognized both inside and outside character
|
||||||
classes. An isolated \E that is not preceded by \Q is ignored. If \Q
|
classes. An isolated \E that is not preceded by \Q is ignored. If \Q
|
||||||
|
@ -6295,8 +6308,8 @@ BACKSLASH
|
||||||
|
|
||||||
Unsupported escape sequences
|
Unsupported escape sequences
|
||||||
|
|
||||||
In Perl, the sequences \l, \L, \u, and \U are recognized by its string
|
In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its
|
||||||
handler and used to modify the case of following characters. By
|
string handler and used to modify the case of following characters. By
|
||||||
default, PCRE2 does not support these escape sequences. However, if the
|
default, PCRE2 does not support these escape sequences. However, if the
|
||||||
PCRE2_ALT_BSUX option is set, \U matches a "U" character, and \u can be
|
PCRE2_ALT_BSUX option is set, \U matches a "U" character, and \u can be
|
||||||
used to define a character by code point, as described above.
|
used to define a character by code point, as described above.
|
||||||
|
@ -7191,19 +7204,25 @@ INTERNAL OPTION SETTING
|
||||||
xx for PCRE2_EXTENDED_MORE
|
xx for PCRE2_EXTENDED_MORE
|
||||||
|
|
||||||
For example, (?im) sets caseless, multiline matching. It is also possi-
|
For example, (?im) sets caseless, multiline matching. It is also possi-
|
||||||
ble to unset these options by preceding the letter with a hyphen. The
|
ble to unset these options by preceding the relevant letters with a
|
||||||
two "extended" options are not independent; unsetting either one can-
|
hyphen, for example (?-im). The two "extended" options are not indepen-
|
||||||
cels the effects of both of them.
|
dent; unsetting either one cancels the effects of both of them.
|
||||||
|
|
||||||
A combined setting and unsetting such as (?im-sx), which sets
|
A combined setting and unsetting such as (?im-sx), which sets
|
||||||
PCRE2_CASELESS and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and
|
PCRE2_CASELESS and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and
|
||||||
PCRE2_EXTENDED, is also permitted. If a letter appears both before and
|
PCRE2_EXTENDED, is also permitted. Only one hyphen may appear in the
|
||||||
after the hyphen, the option is unset. An empty options setting "(?)"
|
options string. If a letter appears both before and after the hyphen,
|
||||||
is allowed. Needless to say, it has no effect.
|
the option is unset. An empty options setting "(?)" is allowed. Need-
|
||||||
|
less to say, it has no effect.
|
||||||
|
|
||||||
|
If the first character following (? is a circumflex, it causes all of
|
||||||
|
the above options to be unset. Thus, (?^) is equivalent to (?-imnsx).
|
||||||
|
Letters may follow the circumflex to cause some options to be re-
|
||||||
|
instated, but a hyphen may not appear.
|
||||||
|
|
||||||
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be
|
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be
|
||||||
changed in the same way as the Perl-compatible options by using the
|
changed in the same way as the Perl-compatible options by using the
|
||||||
characters J and U respectively.
|
characters J and U respectively. However, these are not unset by (?^).
|
||||||
|
|
||||||
When one of these option changes occurs at top level (that is, not
|
When one of these option changes occurs at top level (that is, not
|
||||||
inside subpattern parentheses), the change applies to the remainder of
|
inside subpattern parentheses), the change applies to the remainder of
|
||||||
|
@ -9030,7 +9049,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 27 July 2018
|
Last updated: 28 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@ -10142,6 +10161,13 @@ OPTION SETTING
|
||||||
(?x) extended: ignore white space except in classes
|
(?x) extended: ignore white space except in classes
|
||||||
(?xx) as (?x) but also ignore space and tab in classes
|
(?xx) as (?x) but also ignore space and tab in classes
|
||||||
(?-...) unset option(s)
|
(?-...) unset option(s)
|
||||||
|
(?^) unset imnsx options
|
||||||
|
|
||||||
|
Unsetting x or xx unsets both. Several options may be set at once, and
|
||||||
|
a mixture of setting and unsetting such as (?i-x) is allowed, but there
|
||||||
|
may be only one hyphen. Setting (but no unsetting) is allowed after (?^
|
||||||
|
for example (?^in). An option setting may appear at the start of a non-
|
||||||
|
capturing group, for example (?i:...).
|
||||||
|
|
||||||
The following are recognized only at the very start of a pattern or
|
The following are recognized only at the very start of a pattern or
|
||||||
after one of the newline or \R options with similar syntax. More than
|
after one of the newline or \R options with similar syntax. More than
|
||||||
|
@ -10311,7 +10337,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 27 July 2018
|
Last updated: 28 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -1639,19 +1639,24 @@ Perl option letters enclosed between "(?" and ")". The option letters are
|
||||||
xx for PCRE2_EXTENDED_MORE
|
xx for PCRE2_EXTENDED_MORE
|
||||||
.sp
|
.sp
|
||||||
For example, (?im) sets caseless, multiline matching. It is also possible to
|
For example, (?im) sets caseless, multiline matching. It is also possible to
|
||||||
unset these options by preceding the letter with a hyphen. The two "extended"
|
unset these options by preceding the relevant letters with a hyphen, for
|
||||||
options are not independent; unsetting either one cancels the effects of both
|
example (?-im). The two "extended" options are not independent; unsetting either
|
||||||
of them.
|
one cancels the effects of both of them.
|
||||||
.P
|
.P
|
||||||
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
|
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
|
||||||
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
||||||
permitted. If a letter appears both before and after the hyphen, the option is
|
permitted. Only one hyphen may appear in the options string. If a letter
|
||||||
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
|
appears both before and after the hyphen, the option is unset. An empty options
|
||||||
effect.
|
setting "(?)" is allowed. Needless to say, it has no effect.
|
||||||
|
.P
|
||||||
|
If the first character following (? is a circumflex, it causes all of the above
|
||||||
|
options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
|
||||||
|
the circumflex to cause some options to be re-instated, but a hyphen may not
|
||||||
|
appear.
|
||||||
.P
|
.P
|
||||||
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
||||||
the same way as the Perl-compatible options by using the characters J and U
|
the same way as the Perl-compatible options by using the characters J and U
|
||||||
respectively.
|
respectively. However, these are not unset by (?^).
|
||||||
.P
|
.P
|
||||||
When one of these option changes occurs at top level (that is, not inside
|
When one of these option changes occurs at top level (that is, not inside
|
||||||
subpattern parentheses), the change applies to the remainder of the pattern
|
subpattern parentheses), the change applies to the remainder of the pattern
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "27 July 2018" "PCRE2 10.32"
|
.TH PCRE2SYNTAX 3 "28 July 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -431,7 +431,14 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
(?x) extended: ignore white space except in classes
|
(?x) extended: ignore white space except in classes
|
||||||
(?xx) as (?x) but also ignore space and tab in classes
|
(?xx) as (?x) but also ignore space and tab in classes
|
||||||
(?-...) unset option(s)
|
(?-...) unset option(s)
|
||||||
|
(?^) unset imnsx options
|
||||||
.sp
|
.sp
|
||||||
|
Unsetting x or xx unsets both. Several options may be set at once, and a
|
||||||
|
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
|
||||||
|
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
|
||||||
|
(?^in). An option setting may appear at the start of a non-capturing group, for
|
||||||
|
example (?i:...).
|
||||||
|
.P
|
||||||
The following are recognized only at the very start of a pattern or after one
|
The following are recognized only at the very start of a pattern or after one
|
||||||
of the newline or \eR options with similar syntax. More than one of them may
|
of the newline or \eR options with similar syntax. More than one of them may
|
||||||
appear. For the first three, d is a decimal number.
|
appear. For the first three, d is a decimal number.
|
||||||
|
@ -612,6 +619,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 27 July 2018
|
Last updated: 28 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -317,6 +317,7 @@ pcre2_pattern_convert(). */
|
||||||
#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191
|
#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191
|
||||||
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
|
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
|
||||||
#define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193
|
#define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193
|
||||||
|
#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194
|
||||||
|
|
||||||
|
|
||||||
/* "Expected" matching error codes: no match and partial match. */
|
/* "Expected" matching error codes: no match and partial match. */
|
||||||
|
|
|
@ -731,7 +731,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
|
||||||
ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
|
ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
|
||||||
ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
|
ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
|
||||||
ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
|
ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
|
||||||
ERR91, ERR92, ERR93 };
|
ERR91, ERR92, ERR93, ERR94 };
|
||||||
|
|
||||||
/* This is a table of start-of-pattern options such as (*UTF) and settings such
|
/* This is a table of start-of-pattern options such as (*UTF) and settings such
|
||||||
as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
|
as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
|
||||||
|
@ -3576,17 +3576,37 @@ while (ptr < ptrend)
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
|
BOOL hyphenok = TRUE;
|
||||||
top_nest->reset_group = 0;
|
top_nest->reset_group = 0;
|
||||||
top_nest->max_group = 0;
|
top_nest->max_group = 0;
|
||||||
set = unset = 0;
|
set = unset = 0;
|
||||||
optset = &set;
|
optset = &set;
|
||||||
|
|
||||||
|
/* ^ at the start unsets imnsx and disables the subsequent use of - */
|
||||||
|
|
||||||
|
if (ptr < ptrend && *ptr == CHAR_CIRCUMFLEX_ACCENT)
|
||||||
|
{
|
||||||
|
options &= ~(PCRE2_CASELESS|PCRE2_MULTILINE|PCRE2_NO_AUTO_CAPTURE|
|
||||||
|
PCRE2_DOTALL|PCRE2_EXTENDED|PCRE2_EXTENDED_MORE);
|
||||||
|
hyphenok = FALSE;
|
||||||
|
ptr++;
|
||||||
|
}
|
||||||
|
|
||||||
while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS &&
|
while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS &&
|
||||||
*ptr != CHAR_COLON)
|
*ptr != CHAR_COLON)
|
||||||
{
|
{
|
||||||
switch (*ptr++)
|
switch (*ptr++)
|
||||||
{
|
{
|
||||||
case CHAR_MINUS: optset = &unset; break;
|
case CHAR_MINUS:
|
||||||
|
if (!hyphenok)
|
||||||
|
{
|
||||||
|
errorcode = ERR94;
|
||||||
|
ptr--; /* Correct the offset */
|
||||||
|
goto FAILED;
|
||||||
|
}
|
||||||
|
optset = &unset;
|
||||||
|
hyphenok = FALSE;
|
||||||
|
break;
|
||||||
|
|
||||||
case CHAR_J: /* Record that it changed in the external options */
|
case CHAR_J: /* Record that it changed in the external options */
|
||||||
*optset |= PCRE2_DUPNAMES;
|
*optset |= PCRE2_DUPNAMES;
|
||||||
|
@ -3644,9 +3664,10 @@ while (ptr < ptrend)
|
||||||
}
|
}
|
||||||
else *parsed_pattern++ = META_NOCAPTURE;
|
else *parsed_pattern++ = META_NOCAPTURE;
|
||||||
|
|
||||||
/* If nothing changed, no need to record. */
|
/* If nothing changed, no need to record. The check of hyphenok catches
|
||||||
|
the (?^) case. */
|
||||||
|
|
||||||
if (set != 0 || unset != 0)
|
if (set != 0 || unset != 0 || !hyphenok)
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = META_OPTIONS;
|
*parsed_pattern++ = META_OPTIONS;
|
||||||
*parsed_pattern++ = options;
|
*parsed_pattern++ = options;
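The hyphen handling in this hunk can be illustrated in isolation. The following is a minimal sketch, not the PCRE2 implementation: the `OPT_*` flags and the function `parse_option_letters` are hypothetical stand-ins, only the imnsx letters are handled (J and U are left out), and the returned offset is relative to the text after "(?" rather than to the whole pattern. It follows the same idea as `hyphenok`: a leading circumflex clears imnsx and forbids any hyphen, and a second hyphen is always rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the PCRE2_* option bits. */
#define OPT_CASELESS        0x01u  /* i  */
#define OPT_MULTILINE       0x02u  /* m  */
#define OPT_NO_AUTO_CAPTURE 0x04u  /* n  */
#define OPT_DOTALL          0x08u  /* s  */
#define OPT_EXTENDED        0x10u  /* x  */
#define OPT_EXTENDED_MORE   0x20u  /* xx */
#define OPT_IMNSX (OPT_CASELESS|OPT_MULTILINE|OPT_NO_AUTO_CAPTURE| \
                   OPT_DOTALL|OPT_EXTENDED|OPT_EXTENDED_MORE)

/* Parses the option letters that follow "(?" up to ')' or ':'.
   On success returns -1 and updates *options. On failure returns the offset
   (within s) of the offending character and leaves *options untouched. */
int parse_option_letters(const char *s, unsigned *options)
{
  unsigned set = 0, unset = 0;
  unsigned *optset = &set;
  unsigned opts;
  int hyphen_ok = 1;
  size_t i = 0;

  assert(s != NULL && options != NULL);
  opts = *options;

  /* ^ at the start unsets imnsx and disables the subsequent use of - */
  if (s[i] == '^')
    {
    opts &= ~OPT_IMNSX;
    hyphen_ok = 0;
    i++;
    }

  for (; s[i] != '\0' && s[i] != ')' && s[i] != ':'; i++)
    {
    switch (s[i])
      {
      case '-':
      if (!hyphen_ok) return (int)i;   /* after ^, or a second hyphen */
      optset = &unset;
      hyphen_ok = 0;
      break;

      case 'i': *optset |= OPT_CASELESS; break;
      case 'm': *optset |= OPT_MULTILINE; break;
      case 'n': *optset |= OPT_NO_AUTO_CAPTURE; break;
      case 's': *optset |= OPT_DOTALL; break;

      case 'x':
      *optset |= OPT_EXTENDED;
      if (s[i + 1] == 'x') { *optset |= OPT_EXTENDED_MORE; i++; }
      break;

      default: return (int)i;          /* letter not handled by this sketch */
      }
    }

  /* Unsetting x or xx unsets both. */
  if ((unset & OPT_EXTENDED) != 0) unset |= OPT_EXTENDED_MORE;
  *options = (opts | set) & ~unset;
  return -1;
}
```

With this sketch, `"^x-i)"` fails at offset 2 and `"x-i-i)"` at offset 3; adding 2 for the leading "(?" gives the offsets 4 and 5 reported for error 194 in the test output of this commit.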
|
||||||
|
|
|
@ -180,6 +180,7 @@ static const unsigned char compile_error_texts[] =
|
||||||
"PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
|
"PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
|
||||||
"invalid option bits with PCRE2_LITERAL\0"
|
"invalid option bits with PCRE2_LITERAL\0"
|
||||||
"\\N{U+dddd} is not supported in EBCDIC mode\0"
|
"\\N{U+dddd} is not supported in EBCDIC mode\0"
|
||||||
|
"invalid hyphen in option setting\0"
|
||||||
;
|
;
|
||||||
|
|
||||||
/* Match-time and UTF error texts are in the same format. */
|
/* Match-time and UTF error texts are in the same format. */
|
||||||
|
|
|
@ -6252,4 +6252,10 @@ ef) x/x,mark
|
||||||
|
|
||||||
/(*COMMIT:]w)/
|
/(*COMMIT:]w)/
|
||||||
|
|
||||||
|
/(?i)A(?^)B(?^x:C D)(?^i)e f/
|
||||||
|
aBCDE F
|
||||||
|
\= Expect no match
|
||||||
|
aBCDEF
|
||||||
|
AbCDe f
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -5453,4 +5453,10 @@ a)"xI
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
axy
|
axy
|
||||||
|
|
||||||
|
/(?^x-i)AB/
|
||||||
|
|
||||||
|
/(?^-i)AB/
|
||||||
|
|
||||||
|
/(?x-i-i)/
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -9912,4 +9912,13 @@ No match, mark = X
|
||||||
|
|
||||||
/(*COMMIT:]w)/
|
/(*COMMIT:]w)/
|
||||||
|
|
||||||
|
/(?i)A(?^)B(?^x:C D)(?^i)e f/
|
||||||
|
aBCDE F
|
||||||
|
0: aBCDE F
|
||||||
|
\= Expect no match
|
||||||
|
aBCDEF
|
||||||
|
No match
|
||||||
|
AbCDe f
|
||||||
|
No match
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -16622,6 +16622,15 @@ No match, mark = X
|
||||||
axy
|
axy
|
||||||
No match, mark = X
|
No match, mark = X
|
||||||
|
|
||||||
|
/(?^x-i)AB/
|
||||||
|
Failed: error 194 at offset 4: invalid hyphen in option setting
|
||||||
|
|
||||||
|
/(?^-i)AB/
|
||||||
|
Failed: error 194 at offset 3: invalid hyphen in option setting
|
||||||
|
|
||||||
|
/(?x-i-i)/
|
||||||
|
Failed: error 194 at offset 5: invalid hyphen in option setting
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
||||||
Error -62: bad serialized data
|
Error -62: bad serialized data
|
||||||
|
|