Add support for (?^) as now supported by Perl.

Philip.Hazel 2018-07-28 16:23:24 +00:00
parent 27337495dc
commit 6e245572b8
15 changed files with 2281 additions and 2162 deletions


@ -132,6 +132,8 @@ terminated by (*ACCEPT).
29. Add support for \N{U+dddd}, but not in EBCDIC environments. 29. Add support for \N{U+dddd}, but not in EBCDIC environments.
30. Add support for (?^) for unsetting all imnsx options.
Version 10.31 12-February-2018 Version 10.31 12-February-2018
------------------------------ ------------------------------


@ -1466,7 +1466,8 @@ character, even if newlines are coded as CRLF. Without this option, a dot does
not match when the current position in the subject is at a newline. This option not match when the current position in the subject is at a newline. This option
is equivalent to Perl's /s option, and it can be changed within a pattern by a is equivalent to Perl's /s option, and it can be changed within a pattern by a
(?s) option setting. A negative class such as [^a] always matches newline (?s) option setting. A negative class such as [^a] always matches newline
characters, independent of the setting of this option. characters, and the \N escape sequence always matches a non-newline character,
independent of the setting of PCRE2_DOTALL.
<pre> <pre>
PCRE2_DUPNAMES PCRE2_DUPNAMES
</pre> </pre>
@ -3634,7 +3635,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 02 July 2018 Last updated: 27 July 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -42,13 +42,14 @@ assertion is a condition that has a matching branch (that is, the condition is
false). false).
</P> </P>
<P> <P>
4. The following Perl escape sequences are not supported: \l, \u, \L, 4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
\U, and \N when followed by a character name or Unicode value. (\N on its \U, and \N when followed by a character name. \N on its own, matching a
own, matching a non-newline character, is supported.) In fact these are non-newline character, and \N{U+dd..}, matching a Unicode code point, are
supported. The escapes that modify the case of following letters are
implemented by Perl's general string-handling and are not part of its pattern implemented by Perl's general string-handling and are not part of its pattern
matching engine. If any of these are encountered by PCRE2, an error is matching engine. If any of these are encountered by PCRE2, an error is
generated by default. However, if the PCRE2_ALT_BSUX option is set, generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
\U and \u are interpreted as ECMAScript interprets them. are interpreted as ECMAScript interprets them.
</P> </P>
<P> <P>
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
@ -61,17 +62,22 @@ internal representation of Unicode characters, there is no need to implement
the somewhat messy concept of surrogates." the somewhat messy concept of surrogates."
</P> </P>
<P> <P>
6. PCRE2 does support the \Q...\E escape for quoting substrings. Characters 6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
in between are treated as literals. This is slightly different from Perl in in between are treated as literals. However, this is slightly different from
that $ and @ are also handled as literals inside the quotes. In Perl, they Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
cause variable interpolation (but of course PCRE2 does not have variables). they cause variable interpolation (but of course PCRE2 does not have
Note the following examples: variables). Also, Perl does "double-quotish backslash interpolation" on any
backslashes between \Q and \E which, its documentation says, "may lead to
confusing results". PCRE2 treats a backslash between \Q and \E just like any
other character. Note the following examples:
<pre> <pre>
Pattern PCRE2 matches Perl matches Pattern PCRE2 matches Perl matches
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz \Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
\Qabc\$xyz\E abc\$xyz abc\$xyz \Qabc\$xyz\E abc\$xyz abc\$xyz
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
\QA\B\E A\B A\B
\Q\\E \ \\E
</pre> </pre>
The \Q...\E sequence is recognized both inside and outside character classes. The \Q...\E sequence is recognized both inside and outside character classes.
</P> </P>
@ -229,9 +235,9 @@ Cambridge, England.
REVISION REVISION
</b><br> </b><br>
<P> <P>
Last updated: 18 April 2017 Last updated: 28 July 2018
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.


@ -357,13 +357,18 @@ of the pattern.
If you want to remove the special meaning from a sequence of characters, you If you want to remove the special meaning from a sequence of characters, you
can do so by putting them between \Q and \E. This is different from Perl in can do so by putting them between \Q and \E. This is different from Perl in
that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas
in Perl, $ and @ cause variable interpolation. Note the following examples: in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish
backslash interpolation" on any backslashes between \Q and \E which, its
documentation says, "may lead to confusing results". PCRE2 treats a backslash
between \Q and \E just like any other character. Note the following examples:
<pre> <pre>
Pattern PCRE2 matches Perl matches Pattern PCRE2 matches Perl matches
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz \Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
\Qabc\$xyz\E abc\$xyz abc\$xyz \Qabc\$xyz\E abc\$xyz abc\$xyz
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
\QA\B\E A\B A\B
\Q\\E \ \\E
</pre> </pre>
The \Q...\E sequence is recognized both inside and outside character classes. The \Q...\E sequence is recognized both inside and outside character classes.
An isolated \E that is not preceded by \Q is ignored. If \Q is not followed An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
@ -545,7 +550,7 @@ character class, these sequences have different meanings.
Unsupported escape sequences Unsupported escape sequences
</b><br> </b><br>
<P> <P>
In Perl, the sequences \l, \L, \u, and \U are recognized by its string In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
handler and used to modify the case of following characters. By default, PCRE2 handler and used to modify the case of following characters. By default, PCRE2
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
is set, \U matches a "U" character, and \u can be used to define a character is set, \U matches a "U" character, and \u can be used to define a character
@ -1635,21 +1640,27 @@ Perl option letters enclosed between "(?" and ")". The option letters are
xx for PCRE2_EXTENDED_MORE xx for PCRE2_EXTENDED_MORE
</pre> </pre>
For example, (?im) sets caseless, multiline matching. It is also possible to For example, (?im) sets caseless, multiline matching. It is also possible to
unset these options by preceding the letter with a hyphen. The two "extended" unset these options by preceding the relevant letters with a hyphen, for
options are not independent; unsetting either one cancels the effects of both example (?-im). The two "extended" options are not independent; unsetting either
of them. one cancels the effects of both of them.
</P> </P>
<P> <P>
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
permitted. If a letter appears both before and after the hyphen, the option is permitted. Only one hyphen may appear in the options string. If a letter
unset. An empty options setting "(?)" is allowed. Needless to say, it has no appears both before and after the hyphen, the option is unset. An empty options
effect. setting "(?)" is allowed. Needless to say, it has no effect.
</P>
<P>
If the first character following (? is a circumflex, it causes all of the above
options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
the circumflex to cause some options to be re-instated, but a hyphen may not
appear.
</P> </P>
<P> <P>
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
the same way as the Perl-compatible options by using the characters J and U the same way as the Perl-compatible options by using the characters J and U
respectively. respectively. However, these are not unset by (?^).
</P> </P>
<P> <P>
When one of these option changes occurs at top level (that is, not inside When one of these option changes occurs at top level (that is, not inside
@ -3579,7 +3590,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br> <br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 27 July 2018 Last updated: 28 July 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -456,7 +456,15 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?x) extended: ignore white space except in classes (?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes (?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s) (?-...) unset option(s)
(?^) unset imnsx options
</pre> </pre>
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capturing group, for
example (?i:...).
</P>
<P>
The following are recognized only at the very start of a pattern or after one The following are recognized only at the very start of a pattern or after one
of the newline or \R options with similar syntax. More than one of them may of the newline or \R options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number. appear. For the first three, d is a decimal number.
@ -624,7 +632,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br> <br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 27 July 2018 Last updated: 28 July 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>


@ -1454,8 +1454,9 @@ COMPILING A PATTERN
this option, a dot does not match when the current position in the sub- this option, a dot does not match when the current position in the sub-
ject is at a newline. This option is equivalent to Perl's /s option, ject is at a newline. This option is equivalent to Perl's /s option,
and it can be changed within a pattern by a (?s) option setting. A neg- and it can be changed within a pattern by a (?s) option setting. A neg-
ative class such as [^a] always matches newline characters, independent ative class such as [^a] always matches newline characters, and the \N
of the setting of this option. escape sequence always matches a non-newline character, independent of
the setting of PCRE2_DOTALL.
PCRE2_DUPNAMES PCRE2_DUPNAMES
@ -3520,7 +3521,7 @@ AUTHOR
REVISION REVISION
Last updated: 02 July 2018 Last updated: 27 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -4536,13 +4537,14 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
when a negative assertion is a condition that has a matching branch when a negative assertion is a condition that has a matching branch
(that is, the condition is false). (that is, the condition is false).
4. The following Perl escape sequences are not supported: \l, \u, \L, 4. The following Perl escape sequences are not supported: \F, \l, \L,
\U, and \N when followed by a character name or Unicode value. (\N on \u, \U, and \N when followed by a character name. \N on its own, match-
its own, matching a non-newline character, is supported.) In fact these ing a non-newline character, and \N{U+dd..}, matching a Unicode code
are implemented by Perl's general string-handling and are not part of point, are supported. The escapes that modify the case of following
its pattern matching engine. If any of these are encountered by PCRE2, letters are implemented by Perl's general string-handling and are not
an error is generated by default. However, if the PCRE2_ALT_BSUX option part of its pattern matching engine. If any of these are encountered by
is set, \U and \u are interpreted as ECMAScript interprets them. PCRE2, an error is generated by default. However, if the PCRE2_ALT_BSUX
option is set, \U and \u are interpreted as ECMAScript interprets them.
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
is built with Unicode support (the default). The properties that can be is built with Unicode support (the default). The properties that can be
@ -4554,11 +4556,15 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
code characters, there is no need to implement the somewhat messy con- code characters, there is no need to implement the somewhat messy con-
cept of surrogates." cept of surrogates."
6. PCRE2 does support the \Q...\E escape for quoting substrings. Char- 6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
acters in between are treated as literals. This is slightly different in between are treated as literals. However, this is slightly different
from Perl in that $ and @ are also handled as literals inside the from Perl in that $ and @ are also handled as literals inside the
quotes. In Perl, they cause variable interpolation (but of course PCRE2 quotes. In Perl, they cause variable interpolation (but of course PCRE2
does not have variables). Note the following examples: does not have variables). Also, Perl does "double-quotish backslash
interpolation" on any backslashes between \Q and \E which, its documen-
tation says, "may lead to confusing results". PCRE2 treats a backslash
between \Q and \E just like any other character. Note the following
examples:
Pattern PCRE2 matches Perl matches Pattern PCRE2 matches Perl matches
@ -4566,6 +4572,8 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
contents of $xyz contents of $xyz
\Qabc\$xyz\E abc\$xyz abc\$xyz \Qabc\$xyz\E abc\$xyz abc\$xyz
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
\QA\B\E A\B A\B
\Q\\E \ \\E
The \Q...\E sequence is recognized both inside and outside character The \Q...\E sequence is recognized both inside and outside character
classes. classes.
@ -4699,8 +4707,8 @@ AUTHOR
REVISION REVISION
Last updated: 18 April 2017 Last updated: 28 July 2018
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -6121,7 +6129,10 @@ BACKSLASH
ters, you can do so by putting them between \Q and \E. This is differ- ters, you can do so by putting them between \Q and \E. This is differ-
ent from Perl in that $ and @ are handled as literals in \Q...\E ent from Perl in that $ and @ are handled as literals in \Q...\E
sequences in PCRE2, whereas in Perl, $ and @ cause variable interpola- sequences in PCRE2, whereas in Perl, $ and @ cause variable interpola-
tion. Note the following examples: tion. Also, Perl does "double-quotish backslash interpolation" on any
backslashes between \Q and \E which, its documentation says, "may lead
to confusing results". PCRE2 treats a backslash between \Q and \E just
like any other character. Note the following examples:
Pattern PCRE2 matches Perl matches Pattern PCRE2 matches Perl matches
@ -6129,6 +6140,8 @@ BACKSLASH
contents of $xyz contents of $xyz
\Qabc\$xyz\E abc\$xyz abc\$xyz \Qabc\$xyz\E abc\$xyz abc\$xyz
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
\QA\B\E A\B A\B
\Q\\E \ \\E
The \Q...\E sequence is recognized both inside and outside character The \Q...\E sequence is recognized both inside and outside character
classes. An isolated \E that is not preceded by \Q is ignored. If \Q classes. An isolated \E that is not preceded by \Q is ignored. If \Q
@ -6295,8 +6308,8 @@ BACKSLASH
Unsupported escape sequences Unsupported escape sequences
In Perl, the sequences \l, \L, \u, and \U are recognized by its string In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its
handler and used to modify the case of following characters. By string handler and used to modify the case of following characters. By
default, PCRE2 does not support these escape sequences. However, if the default, PCRE2 does not support these escape sequences. However, if the
PCRE2_ALT_BSUX option is set, \U matches a "U" character, and \u can be PCRE2_ALT_BSUX option is set, \U matches a "U" character, and \u can be
used to define a character by code point, as described above. used to define a character by code point, as described above.
@ -7191,19 +7204,25 @@ INTERNAL OPTION SETTING
xx for PCRE2_EXTENDED_MORE xx for PCRE2_EXTENDED_MORE
For example, (?im) sets caseless, multiline matching. It is also possi- For example, (?im) sets caseless, multiline matching. It is also possi-
ble to unset these options by preceding the letter with a hyphen. The ble to unset these options by preceding the relevant letters with a
two "extended" options are not independent; unsetting either one can- hyphen, for example (?-im). The two "extended" options are not indepen-
cels the effects of both of them. dent; unsetting either one cancels the effects of both of them.
A combined setting and unsetting such as (?im-sx), which sets A combined setting and unsetting such as (?im-sx), which sets
PCRE2_CASELESS and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_CASELESS and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and
PCRE2_EXTENDED, is also permitted. If a letter appears both before and PCRE2_EXTENDED, is also permitted. Only one hyphen may appear in the
after the hyphen, the option is unset. An empty options setting "(?)" options string. If a letter appears both before and after the hyphen,
is allowed. Needless to say, it has no effect. the option is unset. An empty options setting "(?)" is allowed. Need-
less to say, it has no effect.
If the first character following (? is a circumflex, it causes all of
the above options to be unset. Thus, (?^) is equivalent to (?-imnsx).
Letters may follow the circumflex to cause some options to be re-
instated, but a hyphen may not appear.
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be
changed in the same way as the Perl-compatible options by using the changed in the same way as the Perl-compatible options by using the
characters J and U respectively. characters J and U respectively. However, these are not unset by (?^).
When one of these option changes occurs at top level (that is, not When one of these option changes occurs at top level (that is, not
inside subpattern parentheses), the change applies to the remainder of inside subpattern parentheses), the change applies to the remainder of
@ -9030,7 +9049,7 @@ AUTHOR
REVISION REVISION
Last updated: 27 July 2018 Last updated: 28 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -10142,6 +10161,13 @@ OPTION SETTING
(?x) extended: ignore white space except in classes (?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes (?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s) (?-...) unset option(s)
(?^) unset imnsx options
Unsetting x or xx unsets both. Several options may be set at once, and
a mixture of setting and unsetting such as (?i-x) is allowed, but there
may be only one hyphen. Setting (but no unsetting) is allowed after (?^
for example (?^in). An option setting may appear at the start of a non-
capturing group, for example (?i:...).
The following are recognized only at the very start of a pattern or The following are recognized only at the very start of a pattern or
after one of the newline or \R options with similar syntax. More than after one of the newline or \R options with similar syntax. More than
@ -10311,7 +10337,7 @@ AUTHOR
REVISION REVISION
Last updated: 27 July 2018 Last updated: 28 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1639,19 +1639,24 @@ Perl option letters enclosed between "(?" and ")". The option letters are
xx for PCRE2_EXTENDED_MORE xx for PCRE2_EXTENDED_MORE
.sp .sp
For example, (?im) sets caseless, multiline matching. It is also possible to For example, (?im) sets caseless, multiline matching. It is also possible to
unset these options by preceding the letter with a hyphen. The two "extended" unset these options by preceding the relevant letters with a hyphen, for
options are not independent; unsetting either one cancels the effects of both example (?-im). The two "extended" options are not independent; unsetting either
of them. one cancels the effects of both of them.
.P .P
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
permitted. If a letter appears both before and after the hyphen, the option is permitted. Only one hyphen may appear in the options string. If a letter
unset. An empty options setting "(?)" is allowed. Needless to say, it has no appears both before and after the hyphen, the option is unset. An empty options
effect. setting "(?)" is allowed. Needless to say, it has no effect.
.P
If the first character following (? is a circumflex, it causes all of the above
options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
the circumflex to cause some options to be re-instated, but a hyphen may not
appear.
.P .P
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
the same way as the Perl-compatible options by using the characters J and U the same way as the Perl-compatible options by using the characters J and U
respectively. respectively. However, these are not unset by (?^).
.P .P
When one of these option changes occurs at top level (that is, not inside When one of these option changes occurs at top level (that is, not inside
subpattern parentheses), the change applies to the remainder of the pattern subpattern parentheses), the change applies to the remainder of the pattern


@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "27 July 2018" "PCRE2 10.32" .TH PCRE2SYNTAX 3 "28 July 2018" "PCRE2 10.32"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -431,7 +431,14 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?x) extended: ignore white space except in classes (?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes (?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s) (?-...) unset option(s)
(?^) unset imnsx options
.sp .sp
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capturing group, for
example (?i:...).
.P
The following are recognized only at the very start of a pattern or after one The following are recognized only at the very start of a pattern or after one
of the newline or \eR options with similar syntax. More than one of them may of the newline or \eR options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number. appear. For the first three, d is a decimal number.
@ -612,6 +619,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 27 July 2018 Last updated: 28 July 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
.fi .fi


@ -317,6 +317,7 @@ pcre2_pattern_convert(). */
#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191 #define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191
#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192 #define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
#define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193 #define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193
#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194
/* "Expected" matching error codes: no match and partial match. */ /* "Expected" matching error codes: no match and partial match. */


@ -731,7 +731,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90, ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
ERR91, ERR92, ERR93 }; ERR91, ERR92, ERR93, ERR94 };
/* This is a table of start-of-pattern options such as (*UTF) and settings such /* This is a table of start-of-pattern options such as (*UTF) and settings such
as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@ -3576,17 +3576,37 @@ while (ptr < ptrend)
else else
{ {
BOOL hyphenok = TRUE;
top_nest->reset_group = 0; top_nest->reset_group = 0;
top_nest->max_group = 0; top_nest->max_group = 0;
set = unset = 0; set = unset = 0;
optset = &set; optset = &set;
/* ^ at the start unsets imnsx and disables the subsequent use of - */
if (ptr < ptrend && *ptr == CHAR_CIRCUMFLEX_ACCENT)
{
options &= ~(PCRE2_CASELESS|PCRE2_MULTILINE|PCRE2_NO_AUTO_CAPTURE|
PCRE2_DOTALL|PCRE2_EXTENDED|PCRE2_EXTENDED_MORE);
hyphenok = FALSE;
ptr++;
}
while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS && while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS &&
*ptr != CHAR_COLON) *ptr != CHAR_COLON)
{ {
switch (*ptr++) switch (*ptr++)
{ {
case CHAR_MINUS: optset = &unset; break; case CHAR_MINUS:
if (!hyphenok)
{
errorcode = ERR94;
ptr--; /* Correct the offset */
goto FAILED;
}
optset = &unset;
hyphenok = FALSE;
break;
case CHAR_J: /* Record that it changed in the external options */ case CHAR_J: /* Record that it changed in the external options */
*optset |= PCRE2_DUPNAMES; *optset |= PCRE2_DUPNAMES;
@ -3644,9 +3664,10 @@ while (ptr < ptrend)
} }
else *parsed_pattern++ = META_NOCAPTURE; else *parsed_pattern++ = META_NOCAPTURE;
/* If nothing changed, no need to record. */ /* If nothing changed, no need to record. The check of hyphenok catches
the (?^) case. */
if (set != 0 || unset != 0) if (set != 0 || unset != 0 || !hyphenok)
{ {
*parsed_pattern++ = META_OPTIONS; *parsed_pattern++ = META_OPTIONS;
*parsed_pattern++ = options; *parsed_pattern++ = options;


@ -180,6 +180,7 @@ static const unsigned char compile_error_texts[] =
"PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0" "PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
"invalid option bits with PCRE2_LITERAL\0" "invalid option bits with PCRE2_LITERAL\0"
"\\N{U+dddd} is not supported in EBCDIC mode\0" "\\N{U+dddd} is not supported in EBCDIC mode\0"
"invalid hyphen in option setting\0"
; ;
/* Match-time and UTF error texts are in the same format. */ /* Match-time and UTF error texts are in the same format. */

testdata/testinput1

@ -6252,4 +6252,10 @@ ef) x/x,mark
/(*COMMIT:]w)/ /(*COMMIT:]w)/
/(?i)A(?^)B(?^x:C D)(?^i)e f/
aBCDE F
\= Expect no match
aBCDEF
AbCDe f
# End of testinput1 # End of testinput1

testdata/testinput2

@ -5453,4 +5453,10 @@ a)"xI
\= Expect no match \= Expect no match
axy axy
/(?^x-i)AB/
/(?^-i)AB/
/(?x-i-i)/
# End of testinput2 # End of testinput2


@ -9912,4 +9912,13 @@ No match, mark = X
/(*COMMIT:]w)/ /(*COMMIT:]w)/
/(?i)A(?^)B(?^x:C D)(?^i)e f/
aBCDE F
0: aBCDE F
\= Expect no match
aBCDEF
No match
AbCDe f
No match
# End of testinput1 # End of testinput1


@ -16622,6 +16622,15 @@ No match, mark = X
axy axy
No match, mark = X No match, mark = X
/(?^x-i)AB/
Failed: error 194 at offset 4: invalid hyphen in option setting
/(?^-i)AB/
Failed: error 194 at offset 3: invalid hyphen in option setting
/(?x-i-i)/
Failed: error 194 at offset 5: invalid hyphen in option setting
# End of testinput2 # End of testinput2
Error -70: PCRE2_ERROR_BADDATA (unknown error number) Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data Error -62: bad serialized data
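
The option-setting rules described in this commit can be sketched in standalone C: a leading ^ unsets the imnsx options, at most one hyphen is allowed, and no hyphen may follow ^. This is a simplified illustration, not the PCRE2 source. The `sk_` function, the `SK_` option bits, and `SK_ERR_UNKNOWN_LETTER` are hypothetical names and values invented for the sketch. Only the number 194 and the reported offsets are taken from the test output above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option bits for this sketch; NOT the real PCRE2 values. */
#define SK_CASELESS        0x01u  /* i  */
#define SK_MULTILINE       0x02u  /* m  */
#define SK_NO_AUTO_CAPTURE 0x04u  /* n  */
#define SK_DOTALL          0x08u  /* s  */
#define SK_EXTENDED        0x10u  /* x  */
#define SK_EXTENDED_MORE   0x20u  /* xx */
#define SK_DUPNAMES        0x40u  /* J, not unset by (?^) */
#define SK_UNGREEDY        0x80u  /* U, not unset by (?^) */
#define SK_IMNSX (SK_CASELESS | SK_MULTILINE | SK_NO_AUTO_CAPTURE | \
                  SK_DOTALL | SK_EXTENDED | SK_EXTENDED_MORE)

#define SK_ERR_INVALID_HYPHEN 194 /* same number as the new error in the diff */
#define SK_ERR_UNKNOWN_LETTER 1   /* hypothetical; the real code differs */

/* Parse an option group that starts with "(?" and ends at ')' or ':'.
   On success, updates *options and returns 0. On failure, returns an error
   number, sets *offset to the index of the offending character, and leaves
   *options unchanged. */
int sk_parse_options(const char *group, unsigned *options, size_t *offset)
{
  unsigned set = 0, unset = 0, opts;
  unsigned *optset = &set;
  int hyphenok = 1;
  size_t i = 2; /* skip the leading "(?" */

  assert(group != NULL && options != NULL && offset != NULL);
  assert(group[0] == '(' && group[1] == '?');
  opts = *options;

  /* ^ at the start unsets imnsx and disables the subsequent use of - */
  if (group[i] == '^')
  {
    opts &= ~SK_IMNSX;
    hyphenok = 0;
    i++;
  }

  for (; group[i] != '\0' && group[i] != ')' && group[i] != ':'; i++)
  {
    switch (group[i])
    {
      case '-':
        if (!hyphenok) { *offset = i; return SK_ERR_INVALID_HYPHEN; }
        optset = &unset;
        hyphenok = 0; /* only one hyphen is allowed */
        break;
      case 'i': *optset |= SK_CASELESS; break;
      case 'm': *optset |= SK_MULTILINE; break;
      case 'n': *optset |= SK_NO_AUTO_CAPTURE; break;
      case 's': *optset |= SK_DOTALL; break;
      case 'J': *optset |= SK_DUPNAMES; break;
      case 'U': *optset |= SK_UNGREEDY; break;
      case 'x':
        *optset |= SK_EXTENDED;
        if (group[i + 1] == 'x') { *optset |= SK_EXTENDED_MORE; i++; }
        break;
      default:
        *offset = i;
        return SK_ERR_UNKNOWN_LETTER;
    }
  }

  /* Unsetting either "extended" option cancels both of them. */
  if (unset & (SK_EXTENDED | SK_EXTENDED_MORE))
    unset |= SK_EXTENDED | SK_EXTENDED_MORE;

  /* A letter on both sides of the hyphen ends up unset. */
  *options = (opts | set) & ~unset;
  return 0;
}
```

Called on the three failing patterns from testinput2, the sketch reports the hyphen at the same offsets as the expected output: 4 for (?^x-i), 3 for (?^-i), and 5 for (?x-i-i).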