Implement PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh..} syntax.
parent
d90de8b053
commit
8c8deae8eb
|
@ -125,6 +125,9 @@ processing or a crash could result.
|
||||||
names, as Perl does. There was a small bug in this new code, found by
|
names, as Perl does. There was a small bug in this new code, found by
|
||||||
ClusterFuzz 12950, fixed before release.
|
ClusterFuzz 12950, fixed before release.
|
||||||
|
|
||||||
|
31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh}
|
||||||
|
construct.
|
||||||
|
|
||||||
|
|
||||||
Version 10.32 10-September-2018
|
Version 10.32 10-September-2018
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
|
@ -86,7 +86,12 @@ PCRE2 must be built with Unicode support (the default) in order to use
|
||||||
PCRE2_UTF, PCRE2_UCP and related options.
|
PCRE2_UTF, PCRE2_UCP and related options.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The yield of the function is a pointer to a private data structure that
|
Additional options may be set in the compile context via the
|
||||||
|
<a href="pcre2_set_compile_extra_options.html"><b>pcre2_set_compile_extra_options</b></a>
|
||||||
|
function.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The yield of this function is a pointer to a private data structure that
|
||||||
contains the compiled pattern, or NULL if an error was detected.
|
contains the compiled pattern, or NULL if an error was detected.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
|
|
@ -20,7 +20,7 @@ SYNOPSIS
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
|
<b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>extra_options</i>);</b>
|
<b> uint32_t <i>extra_options</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
DESCRIPTION
|
DESCRIPTION
|
||||||
|
@ -31,6 +31,7 @@ housed in a compile context. It completely replaces all the bits. The extra
|
||||||
options are:
|
options are:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
|
||||||
|
PCRE2_EXTRA_ALT_BSUX Extended alternate \u, \U, and \x handling
|
||||||
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as a literal following character
|
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as a literal following character
|
||||||
PCRE2_EXTRA_ESCAPED_CR_IS_LF Interpret \r as \n
|
PCRE2_EXTRA_ESCAPED_CR_IS_LF Interpret \r as \n
|
||||||
PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines
|
PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines
|
||||||
|
|
|
@ -1298,7 +1298,7 @@ are needed. The <b>pcre2_code_copy_with_tables()</b> provides this facility.
|
||||||
Copies of both the code and the tables are made, with the new code pointing to
|
Copies of both the code and the tables are made, with the new code pointing to
|
||||||
the new tables. The memory for the new tables is automatically freed when
|
the new tables. The memory for the new tables is automatically freed when
|
||||||
<b>pcre2_code_free()</b> is called for the new copy of the compiled code. If
|
<b>pcre2_code_free()</b> is called for the new copy of the compiled code. If
|
||||||
<b>pcre2_code_copy_withy_tables()</b> is called with a NULL argument, it returns
|
<b>pcre2_code_copy_with_tables()</b> is called with a NULL argument, it returns
|
||||||
NULL.
|
NULL.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -1315,7 +1315,7 @@ PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
|
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
|
||||||
settings that affect the compilation. It should be zero if no options are
|
settings that affect the compilation. It should be zero if none of them are
|
||||||
required. The available options are described below. Some of them (in
|
required. The available options are described below. Some of them (in
|
||||||
particular, those that are compatible with Perl, but some others as well) can
|
particular, those that are compatible with Perl, but some others as well) can
|
||||||
also be set and unset from within the pattern (see the detailed description in
|
also be set and unset from within the pattern (see the detailed description in
|
||||||
|
@ -1330,8 +1330,9 @@ compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
|
||||||
options can be set at the time of matching as well as at compile time.
|
options can be set at the time of matching as well as at compile time.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Other, less frequently required compile-time parameters (for example, the
|
Some additional options and less frequently required compile-time parameters
|
||||||
newline setting) can be provided in a compile context (as described
|
(for example, the newline setting) can be provided in a compile context (as
|
||||||
|
described
|
||||||
<a href="#compilecontext">above).</a>
|
<a href="#compilecontext">above).</a>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -1384,7 +1385,13 @@ This code fragment shows a typical straightforward call to
|
||||||
&errorcode, /* for error code */
|
&errorcode, /* for error code */
|
||||||
&erroffset, /* for error offset */
|
&erroffset, /* for error offset */
|
||||||
NULL); /* no compile context */
|
NULL); /* no compile context */
|
||||||
</pre>
|
|
||||||
|
</PRE>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Main compile options
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
The following names for option bits are defined in the <b>pcre2.h</b> header
|
The following names for option bits are defined in the <b>pcre2.h</b> header
|
||||||
file:
|
file:
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -1424,6 +1431,14 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
|
||||||
to match. By default, as in Perl, a hexadecimal number is always expected after
|
to match. By default, as in Perl, a hexadecimal number is always expected after
|
||||||
\x, but it may have zero, one, or two digits (so, for example, \xz matches a
|
\x, but it may have zero, one, or two digits (so, for example, \xz matches a
|
||||||
binary zero character followed by z).
|
binary zero character followed by z).
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
ECMAScript 6 added additional functionality to \u. This can be accessed using
|
||||||
|
the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
|
||||||
|
<a href="#extracompileoptions">below).</a>
|
||||||
|
Note that this alternative escape handling applies only to patterns. Neither of
|
||||||
|
these options affects the processing of replacement strings passed to
|
||||||
|
<b>pcre2_substitute()</b>.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ALT_CIRCUMFLEX
|
PCRE2_ALT_CIRCUMFLEX
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1830,9 +1845,8 @@ characters with code points greater than 127.
|
||||||
Extra compile options
|
Extra compile options
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Unlike the main compile-time options, the extra options are not saved with the
|
The option bits that can be set in a compile context by calling the
|
||||||
compiled pattern. The option bits that can be set in a compile context by
|
<b>pcre2_set_compile_extra_options()</b> function are as follows:
|
||||||
calling the <b>pcre2_set_compile_extra_options()</b> function are as follows:
|
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1857,6 +1871,14 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
|
||||||
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
|
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
|
||||||
incorporated in the compiled pattern. However, they can only match subject
|
incorporated in the compiled pattern. However, they can only match subject
|
||||||
characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
|
characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
|
||||||
|
<pre>
|
||||||
|
PCRE2_EXTRA_ALT_BSUX
|
||||||
|
</pre>
|
||||||
|
The original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and \x in
|
||||||
|
the way that ECMAScript (aka JavaScript) does. Additional functionality was
|
||||||
|
defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
|
||||||
|
PCRE2_ALT_BSUX, but in addition it recognizes \u{hhh..} as a hexadecimal
|
||||||
|
character code, where hhh.. is any number of hexadecimal digits.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
|
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3382,7 +3404,8 @@ capture groups and letters within \Q...\E quoted sequences.
|
||||||
<P>
|
<P>
|
||||||
Note that case forcing sequences such as \U...\E do not nest. For example,
|
Note that case forcing sequences such as \U...\E do not nest. For example,
|
||||||
the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no
|
the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no
|
||||||
effect.
|
effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do
|
||||||
|
not apply to replacement strings.
|
||||||
</P>
|
</P>
|
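The non-nesting rule for case forcing can be illustrated with a small stand-alone sketch. This is not PCRE2's implementation: apply_case_forcing is a hypothetical helper that models only \U, \L, and \E (the single-character forms and all other replacement syntax are ignored), under the assumption that each \U or \L simply replaces the current mode and \E clears it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum force { FORCE_NONE, FORCE_UPPER, FORCE_LOWER };

/*
 * Hypothetical model of case forcing in a replacement string.
 * Copies `in` to `out`, applying \U, \L and \E. A new \U or \L replaces
 * the current mode rather than nesting inside it, and \E always returns
 * to "no forcing". Returns the output length, or (size_t)-1 if `out`
 * is too small.
 */
size_t apply_case_forcing(const char *in, char *out, size_t outsize)
{
    enum force mode = FORCE_NONE;
    size_t n = 0;

    assert(in != NULL && out != NULL && outsize > 0);

    for (; *in != '\0'; in++) {
        if (in[0] == '\\' && (in[1] == 'U' || in[1] == 'L' || in[1] == 'E')) {
            /* There is one mode variable, not a stack, so nothing nests. */
            mode = (in[1] == 'U') ? FORCE_UPPER
                 : (in[1] == 'L') ? FORCE_LOWER
                 : FORCE_NONE;
            in++;                      /* skip the letter after the backslash */
            continue;
        }
        if (n + 1 >= outsize) return (size_t)-1;

        unsigned char c = (unsigned char)*in;
        if (mode == FORCE_UPPER) c = (unsigned char)toupper(c);
        else if (mode == FORCE_LOWER) c = (unsigned char)tolower(c);
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Feeding this model the string \Uaa\LBB\Ecc\E gives AAbbcc: the \L switches from upper to lower case, the first \E ends all forcing, and the final \E finds nothing left to end.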
||||||
<P>
|
<P>
|
||||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||||
|
@ -3784,7 +3807,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 04 February 2019
|
Last updated: 12 February 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -47,8 +47,9 @@ non-newline character, and \N{U+dd..}, matching a Unicode code point, are
|
||||||
supported. The escapes that modify the case of following letters are
|
supported. The escapes that modify the case of following letters are
|
||||||
implemented by Perl's general string-handling and are not part of its pattern
|
implemented by Perl's general string-handling and are not part of its pattern
|
||||||
matching engine. If any of these are encountered by PCRE2, an error is
|
matching engine. If any of these are encountered by PCRE2, an error is
|
||||||
generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
|
generated by default. However, if either of the PCRE2_ALT_BSUX or
|
||||||
are interpreted as ECMAScript interprets them.
|
PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are interpreted as ECMAScript
|
||||||
|
interprets them.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
|
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
|
||||||
|
@ -233,7 +234,7 @@ Cambridge, England.
|
||||||
REVISION
|
REVISION
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 03 February 2019
|
Last updated: 12 February 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -399,12 +399,33 @@ environment, these escapes are as follows:
|
||||||
\xhh character with hex code hh
|
\xhh character with hex code hh
|
||||||
\x{hhh..} character with hex code hhh..
|
\x{hhh..} character with hex code hhh..
|
||||||
\N{U+hhh..} character with Unicode hex code point hhh..
|
\N{U+hhh..} character with Unicode hex code point hhh..
|
||||||
\uhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
|
|
||||||
</pre>
|
</pre>
|
||||||
There are some legacy applications where the escape sequence \r is expected to
|
By default, after \x that is not followed by {, from zero to two hexadecimal
|
||||||
match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
|
digits are read (letters can be in upper or lower case). Any number of
|
||||||
pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
|
hexadecimal digits may appear between \x{ and }. If a character other than a
|
||||||
(carriage return) character.
|
hexadecimal digit appears between \x{ and }, or if there is no terminating },
|
||||||
|
an error occurs.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Characters whose code points are less than 256 can be defined by either of the
|
||||||
|
two syntaxes for \x or by an octal sequence. There is no difference in the way
|
||||||
|
they are handled. For example, \xdc is exactly the same as \x{dc} or \334.
|
||||||
|
However, using the braced versions does make such sequences easier to read.
|
||||||
|
</P>
|
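The equivalence of \xdc, \x{dc}, and \334 is plain base arithmetic, which the following sketch checks. The helper escape_digits_value is hypothetical (it is not a PCRE2 function) and only converts the digit string of an escape in a given base.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: numeric value of an escape's digit string,
 * using base 16 for \x and \x{...} and base 8 for an octal escape.
 */
unsigned long escape_digits_value(const char *digits, int base)
{
    assert(digits != NULL);
    return strtoul(digits, NULL, base);
}
```

Both escape_digits_value("dc", 16) and escape_digits_value("334", 8) evaluate to 220, since 13*16 + 12 = 3*64 + 3*8 + 4 = 220.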
||||||
|
<P>
|
||||||
|
Support is available for some ECMAScript (aka JavaScript) escape sequences via
|
||||||
|
two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \x followed
|
||||||
|
by { is not recognized. Only if \x is followed by two hexadecimal digits is it
|
||||||
|
recognized as a character escape. Otherwise it is interpreted as a literal "x"
|
||||||
|
character. In this mode, support for code points greater than 256 is provided
|
||||||
|
by \u, which must be followed by four hexadecimal digits; otherwise it is
|
||||||
|
interpreted as a literal "u" character.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
|
||||||
|
\u{hhh..} is recognized as the character specified by hexadecimal code point.
|
||||||
|
There may be any number of hexadecimal digits. This syntax is from ECMAScript
|
||||||
|
6.
|
||||||
</P>
|
</P>
|
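The \u rules described above can be sketched as a small stand-alone decoder. This is a model of the documented behaviour, not code from PCRE2: decode_bsu and hex_value are hypothetical names, the extra argument stands in for PCRE2_EXTRA_ALT_BSUX being set, and treating an empty \u{} as invalid is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of one hexadecimal digit, or -1. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode the text that follows "\u".
 * extra == 0 models PCRE2_ALT_BSUX: exactly four hex digits are required.
 * extra != 0 models PCRE2_EXTRA_ALT_BSUX: \u{hhh..} is accepted as well.
 * On success the code point is stored in *cp and the number of characters
 * consumed after "\u" is returned. A return of 0 means the sequence is
 * not an escape, so the caller treats "\u" as a literal "u".
 */
size_t decode_bsu(const char *s, int extra, uint32_t *cp)
{
    uint32_t value = 0;
    size_t i;

    assert(s != NULL && cp != NULL);

    if (extra && s[0] == '{') {
        for (i = 1; hex_value(s[i]) >= 0; i++) {
            if (value > 0x0FFFFFFFu) return 0;   /* would not fit in 32 bits */
            value = (value << 4) | (uint32_t)hex_value(s[i]);
        }
        if (i == 1 || s[i] != '}') return 0;     /* no digits, or no closing brace */
        *cp = value;
        return i + 1;                            /* braces plus digits */
    }

    for (i = 0; i < 4; i++) {
        int d = hex_value(s[i]);
        if (d < 0) return 0;                     /* fewer than four hex digits */
        value = (value << 4) | (uint32_t)d;
    }
    *cp = value;
    return 4;
}
```

With extra set, the text {1F600} decodes to code point 0x1F600; with extra clear, the same text is rejected and \u falls back to a literal "u". Range checks on the resulting code point are left out of this sketch.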
||||||
<P>
|
<P>
|
||||||
The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
|
The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
|
||||||
|
@ -414,6 +435,12 @@ Note that when \N is not followed by an opening brace (curly bracket) it has
|
||||||
an entirely different meaning, matching any character that is not a newline.
|
an entirely different meaning, matching any character that is not a newline.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
There are some legacy applications where the escape sequence \r is expected to
|
||||||
|
match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
|
||||||
|
pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
|
||||||
|
(carriage return) character.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
The precise effect of \cx on ASCII characters is as follows: if x is a lower
|
The precise effect of \cx on ASCII characters is as follows: if x is a lower
|
||||||
case letter, it is converted to upper case. Then bit 6 of the character (hex
|
case letter, it is converted to upper case. Then bit 6 of the character (hex
|
||||||
40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
|
40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
|
||||||
|
@ -500,28 +527,6 @@ Note that octal values of 100 or greater that are specified using this syntax
|
||||||
must not be introduced by a leading zero, because no more than three octal
|
must not be introduced by a leading zero, because no more than three octal
|
||||||
digits are ever read.
|
digits are ever read.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
|
||||||
By default, after \x that is not followed by {, from zero to two hexadecimal
|
|
||||||
digits are read (letters can be in upper or lower case). Any number of
|
|
||||||
hexadecimal digits may appear between \x{ and }. If a character other than
|
|
||||||
a hexadecimal digit appears between \x{ and }, or if there is no terminating
|
|
||||||
}, an error occurs.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just
|
|
||||||
described only when it is followed by two hexadecimal digits. Otherwise, it
|
|
||||||
matches a literal "x" character. In this mode, support for code points greater
|
|
||||||
than 256 is provided by \u, which must be followed by four hexadecimal digits;
|
|
||||||
otherwise it matches a literal "u" character. This syntax makes PCRE2 behave
|
|
||||||
like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
|
|
||||||
supported.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
Characters whose value is less than 256 can be defined by either of the two
|
|
||||||
syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no difference in
|
|
||||||
the way they are handled. For example, \xdc is exactly the same as \x{dc} (or
|
|
||||||
\u00dc in PCRE2_ALT_BSUX mode).
|
|
||||||
</P>
|
|
||||||
<br><b>
|
<br><b>
|
||||||
Constraints on character values
|
Constraints on character values
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -560,9 +565,10 @@ Unsupported escape sequences
|
||||||
<P>
|
<P>
|
||||||
In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
|
In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
|
||||||
handler and used to modify the case of following characters. By default, PCRE2
|
handler and used to modify the case of following characters. By default, PCRE2
|
||||||
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
|
does not support these escape sequences in patterns. However, if either of the
|
||||||
is set, \U matches a "U" character, and \u can be used to define a character
|
PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U matches a "U"
|
||||||
by code point, as described above.
|
character, and \u can be used to define a character by code point, as
|
||||||
|
described above.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Absolute and relative backreferences
|
Absolute and relative backreferences
|
||||||
|
@ -3721,7 +3727,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 04 February 2019
|
Last updated: 12 February 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -58,7 +58,8 @@ documentation. This document contains a quick-reference summary of the syntax.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
|
<br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
|
||||||
<P>
|
<P>
|
||||||
This table applies to ASCII and Unicode environments.
|
This table applies to ASCII and Unicode environments. An unrecognized escape
|
||||||
|
sequence causes an error.
|
||||||
<pre>
|
<pre>
|
||||||
\a alarm, that is, the BEL character (hex 07)
|
\a alarm, that is, the BEL character (hex 07)
|
||||||
\cx "control-x", where x is any ASCII printing character
|
\cx "control-x", where x is any ASCII printing character
|
||||||
|
@ -70,12 +71,25 @@ This table applies to ASCII and Unicode environments.
|
||||||
\0dd character with octal code 0dd
|
\0dd character with octal code 0dd
|
||||||
\ddd character with octal code ddd, or backreference
|
\ddd character with octal code ddd, or backreference
|
||||||
\o{ddd..} character with octal code ddd..
|
\o{ddd..} character with octal code ddd..
|
||||||
\U "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
|
|
||||||
\N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
|
\N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
|
||||||
\uhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
|
|
||||||
\xhh character with hex code hh
|
\xhh character with hex code hh
|
||||||
\x{hh..} character with hex code hh..
|
\x{hh..} character with hex code hh..
|
||||||
</pre>
|
</pre>
|
||||||
|
If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
|
||||||
|
following are also recognized:
|
||||||
|
<pre>
|
||||||
|
\U the character "U"
|
||||||
|
\uhhhh character with hex code hhhh
|
||||||
|
\u{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX
|
||||||
|
</pre>
|
||||||
|
When \x is not followed by {, from zero to two hexadecimal digits are read,
|
||||||
|
but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
|
||||||
|
recognized as a hexadecimal escape; otherwise it matches a literal "x".
|
||||||
|
Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits
|
||||||
|
or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
|
||||||
|
matches a literal "u".
|
||||||
|
</P>
|
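The difference between default and ALT_BSUX handling of an unbraced \x can be sketched as follows. The function decode_bsx is a hypothetical model written for this illustration, not PCRE2 code; the alt_bsux argument stands in for either PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX being set, and the braced \x{hh..} form is deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of one hexadecimal digit, or -1. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode the text that follows an unbraced "\x".
 * Default mode: zero, one or two hex digits are read, so "\xz" yields
 * code 0 and consumes nothing.
 * ALT_BSUX mode: exactly two hex digits are required; otherwise -1 is
 * returned and the caller treats "\x" as a literal "x".
 * On success the code is stored in *code and the number of digits
 * consumed is returned.
 */
int decode_bsx(const char *s, int alt_bsux, unsigned *code)
{
    unsigned value = 0;
    int n = 0;

    assert(s != NULL && code != NULL);

    while (n < 2 && hex_digit(s[n]) >= 0) {
        value = value * 16 + (unsigned)hex_digit(s[n]);
        n++;
    }
    if (alt_bsux && n != 2) return -1;   /* literal "x" in ALT_BSUX mode */

    *code = value;
    return n;
}
```

In default mode the input z consumes no digits and yields code 0, so the pattern matches a binary zero followed by z. In ALT_BSUX mode the same input returns -1, so \xz is a literal "x" followed by z.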
||||||
|
<P>
|
||||||
Note that \0dd is always an octal code. The treatment of backslash followed by
|
Note that \0dd is always an octal code. The treatment of backslash followed by
|
||||||
a non-zero digit is complicated; for details see the section
|
a non-zero digit is complicated; for details see the section
|
||||||
<a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
|
<a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
|
||||||
|
@ -86,13 +100,6 @@ also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
|
||||||
supported in EBCDIC environments. Note that \N not followed by an opening
|
supported in EBCDIC environments. Note that \N not followed by an opening
|
||||||
curly bracket has a different meaning (see below).
|
curly bracket has a different meaning (see below).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
|
||||||
When \x is not followed by {, from zero to two hexadecimal digits are read,
|
|
||||||
but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadecimal digits to
|
|
||||||
be recognized as a hexadecimal escape; otherwise it matches a literal "x".
|
|
||||||
Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits,
|
|
||||||
it matches a literal "u".
|
|
||||||
</P>
|
|
||||||
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
|
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -660,7 +667,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 03 February 2019
|
Last updated: 11 February 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -609,6 +609,7 @@ for a description of the effects of these options.
|
||||||
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
||||||
/x extended set PCRE2_EXTENDED
|
/x extended set PCRE2_EXTENDED
|
||||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||||
|
extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
|
||||||
firstline set PCRE2_FIRSTLINE
|
firstline set PCRE2_FIRSTLINE
|
||||||
literal set PCRE2_LITERAL
|
literal set PCRE2_LITERAL
|
||||||
match_line set PCRE2_EXTRA_MATCH_LINE
|
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||||
|
@ -2075,7 +2076,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 03 February 2019
|
Last updated: 11 February 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
1865
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_COMPILE 3 "16 June 2017" "PCRE2 10.30"
|
.TH PCRE2_COMPILE 3 "11 February 2019" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -73,7 +73,13 @@ The option bits are:
|
||||||
PCRE2 must be built with Unicode support (the default) in order to use
|
PCRE2 must be built with Unicode support (the default) in order to use
|
||||||
PCRE2_UTF, PCRE2_UCP and related options.
|
PCRE2_UTF, PCRE2_UCP and related options.
|
||||||
.P
|
.P
|
||||||
The yield of the function is a pointer to a private data structure that
|
Additional options may be set in the compile context via the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2_set_compile_extra_options\fP
|
||||||
|
.\"
|
||||||
|
function.
|
||||||
|
.P
|
||||||
|
The yield of this function is a pointer to a private data structure that
|
||||||
contains the compiled pattern, or NULL if an error was detected.
|
contains the compiled pattern, or NULL if an error was detected.
|
||||||
.P
|
.P
|
||||||
There is a complete description of the PCRE2 native API, with more detail on
|
There is a complete description of the PCRE2 native API, with more detail on
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "21 September 2018" "PCRE2 10.33"
|
.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "11 February 2019" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -8,7 +8,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.PP
|
.PP
|
||||||
.nf
|
.nf
|
||||||
.B int pcre2_set_compile_extra_options(pcre2_compile_context *\fIccontext\fP,
|
.B int pcre2_set_compile_extra_options(pcre2_compile_context *\fIccontext\fP,
|
||||||
.B " PCRE2_SIZE \fIextra_options\fP);"
|
.B " uint32_t \fIextra_options\fP);"
|
||||||
.fi
|
.fi
|
||||||
.
|
.
|
||||||
.SH DESCRIPTION
|
.SH DESCRIPTION
|
||||||
|
@ -21,6 +21,9 @@ options are:
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \ex{d800} to \ex{dfff}
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \ex{d800} to \ex{dfff}
|
||||||
in UTF-8 and UTF-32 modes
|
in UTF-8 and UTF-32 modes
|
||||||
|
.\" JOIN
|
||||||
|
PCRE2_EXTRA_ALT_BSUX Extended alternate \eu, \eU, and \ex
|
||||||
|
handling
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as
|
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as
|
||||||
a literal following character
|
a literal following character
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "04 February 2019" "PCRE2 10.33"
|
.TH PCRE2API 3 "12 February 2019" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -1231,7 +1231,7 @@ are needed. The \fBpcre2_code_copy_with_tables()\fP provides this facility.
|
||||||
Copies of both the code and the tables are made, with the new code pointing to
|
Copies of both the code and the tables are made, with the new code pointing to
|
||||||
the new tables. The memory for the new tables is automatically freed when
|
the new tables. The memory for the new tables is automatically freed when
|
||||||
\fBpcre2_code_free()\fP is called for the new copy of the compiled code. If
|
\fBpcre2_code_free()\fP is called for the new copy of the compiled code. If
|
||||||
\fBpcre2_code_copy_withy_tables()\fP is called with a NULL argument, it returns
|
\fBpcre2_code_copy_with_tables()\fP is called with a NULL argument, it returns
|
||||||
NULL.
|
NULL.
|
||||||
.P
|
.P
|
||||||
NOTE: When one of the matching functions is called, pointers to the compiled
|
NOTE: When one of the matching functions is called, pointers to the compiled
|
||||||
|
@ -1252,7 +1252,7 @@ below.
|
||||||
.\"
|
.\"
|
||||||
.P
|
.P
|
||||||
The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
|
The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
|
||||||
settings that affect the compilation. It should be zero if no options are
|
settings that affect the compilation. It should be zero if none of them are
|
||||||
required. The available options are described below. Some of them (in
|
required. The available options are described below. Some of them (in
|
||||||
particular, those that are compatible with Perl, but some others as well) can
|
particular, those that are compatible with Perl, but some others as well) can
|
||||||
also be set and unset from within the pattern (see the detailed description in
|
also be set and unset from within the pattern (see the detailed description in
|
||||||
|
@ -1267,8 +1267,9 @@ contents of the \fIoptions\fP argument specifies their settings at the start of
|
||||||
compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
|
compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
|
||||||
options can be set at the time of matching as well as at compile time.
|
options can be set at the time of matching as well as at compile time.
|
||||||
.P
|
.P
|
||||||
Other, less frequently required compile-time parameters (for example, the
|
Some additional options and less frequently required compile-time parameters
|
||||||
newline setting) can be provided in a compile context (as described
|
(for example, the newline setting) can be provided in a compile context (as
|
||||||
|
described
|
||||||
.\" HTML <a href="#compilecontext">
|
.\" HTML <a href="#compilecontext">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
above).
|
above).
|
||||||
|
@ -1325,6 +1326,11 @@ This code fragment shows a typical straightforward call to
|
||||||
&erroffset, /* for error offset */
|
&erroffset, /* for error offset */
|
||||||
NULL); /* no compile context */
|
NULL); /* no compile context */
|
||||||
.sp
|
.sp
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Main compile options"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
The following names for option bits are defined in the \fBpcre2.h\fP header
|
The following names for option bits are defined in the \fBpcre2.h\fP header
|
||||||
file:
|
file:
|
||||||
.sp
|
.sp
|
||||||
|
@ -1361,6 +1367,16 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
|
||||||
to match. By default, as in Perl, a hexadecimal number is always expected after
|
to match. By default, as in Perl, a hexadecimal number is always expected after
|
||||||
\ex, but it may have zero, one, or two digits (so, for example, \exz matches a
|
\ex, but it may have zero, one, or two digits (so, for example, \exz matches a
|
||||||
binary zero character followed by z).
|
binary zero character followed by z).
|
||||||
|
.P
|
||||||
|
ECMAScript 6 added additional functionality to \eu. This can be accessed using
|
||||||
|
the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
|
||||||
|
.\" HTML <a href="#extracompileoptions">
|
||||||
|
.\" </a>
|
||||||
|
below).
|
||||||
|
.\"
|
||||||
|
Note that this alternative escape handling applies only to patterns. Neither of
|
||||||
|
these options affects the processing of replacement strings passed to
|
||||||
|
\fBpcre2_substitute()\fP.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ALT_CIRCUMFLEX
|
PCRE2_ALT_CIRCUMFLEX
|
||||||
.sp
|
.sp
|
||||||
|
@ -1788,9 +1804,8 @@ characters with code points greater than 127.
|
||||||
.SS "Extra compile options"
|
.SS "Extra compile options"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
Unlike the main compile-time options, the extra options are not saved with the
|
The option bits that can be set in a compile context by calling the
|
||||||
compiled pattern. The option bits that can be set in a compile context by
|
\fBpcre2_set_compile_extra_options()\fP function are as follows:
|
||||||
calling the \fBpcre2_set_compile_extra_options()\fP function are as follows:
|
|
||||||
.sp
|
.sp
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||||
.sp
|
.sp
|
||||||
|
@ -1813,6 +1828,14 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
|
||||||
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
|
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
|
||||||
incorporated in the compiled pattern. However, they can only match subject
|
incorporated in the compiled pattern. However, they can only match subject
|
||||||
characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
|
characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
|
||||||
|
.sp
|
||||||
|
PCRE2_EXTRA_ALT_BSUX
|
||||||
|
.sp
|
||||||
|
The original option PCRE2_ALT_BSUX causes PCRE2 to process \eU, \eu, and \ex in
|
||||||
|
the way that ECMAScript (aka JavaScript) does. Additional functionality was
|
||||||
|
defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
|
||||||
|
PCRE2_ALT_BSUX, but in addition it recognizes \eu{hhh..} as a hexadecimal
|
||||||
|
character code, where hhh.. is any number of hexadecimal digits.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
|
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
|
||||||
.sp
|
.sp
|
||||||
|
@ -3383,7 +3406,8 @@ capture groups and letters within \eQ...\eE quoted sequences.
|
||||||
.P
|
.P
|
||||||
Note that case forcing sequences such as \eU...\eE do not nest. For example,
|
Note that case forcing sequences such as \eU...\eE do not nest. For example,
|
||||||
the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
|
the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
|
||||||
effect.
|
effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do
|
||||||
|
not apply to replacement strings.
|
||||||
.P
|
.P
|
||||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||||
flexibility to capture group substitution. The syntax is similar to that used
|
flexibility to capture group substitution. The syntax is similar to that used
|
||||||
|
@ -3792,6 +3816,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 04 February 2019
|
Last updated: 12 February 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2COMPAT 3 "03 February 2019" "PCRE2 10.33"
|
.TH PCRE2COMPAT 3 "12 February 2019" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
||||||
|
@ -33,8 +33,9 @@ non-newline character, and \eN{U+dd..}, matching a Unicode code point, are
|
||||||
supported. The escapes that modify the case of following letters are
|
supported. The escapes that modify the case of following letters are
|
||||||
implemented by Perl's general string-handling and are not part of its pattern
|
implemented by Perl's general string-handling and are not part of its pattern
|
||||||
matching engine. If any of these are encountered by PCRE2, an error is
|
matching engine. If any of these are encountered by PCRE2, an error is
|
||||||
generated by default. However, if the PCRE2_ALT_BSUX option is set, \eU and \eu
|
generated by default. However, if either of the PCRE2_ALT_BSUX or
|
||||||
are interpreted as ECMAScript interprets them.
|
PCRE2_EXTRA_ALT_BSUX options is set, \eU and \eu are interpreted as ECMAScript
|
||||||
|
interprets them.
|
||||||
.P
|
.P
|
||||||
5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
|
5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
|
||||||
built with Unicode support (the default). The properties that can be tested
|
built with Unicode support (the default). The properties that can be tested
|
||||||
|
@ -198,6 +199,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 03 February 2019
|
Last updated: 12 February 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "04 February 2019" "PCRE2 10.33"
|
.TH PCRE2PATTERN 3 "12 February 2019" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -373,12 +373,30 @@ environment, these escapes are as follows:
|
||||||
\exhh character with hex code hh
|
\exhh character with hex code hh
|
||||||
\ex{hhh..} character with hex code hhh..
|
\ex{hhh..} character with hex code hhh..
|
||||||
\eN{U+hhh..} character with Unicode hex code point hhh..
|
\eN{U+hhh..} character with Unicode hex code point hhh..
|
||||||
\euhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
|
|
||||||
.sp
|
.sp
|
||||||
There are some legacy applications where the escape sequence \er is expected to
|
By default, after \ex that is not followed by {, from zero to two hexadecimal
|
||||||
match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a
|
digits are read (letters can be in upper or lower case). Any number of
|
||||||
pattern is converted to \en so that it matches a LF (linefeed) instead of a CR
|
hexadecimal digits may appear between \ex{ and }. If a character other than a
|
||||||
(carriage return) character.
|
hexadecimal digit appears between \ex{ and }, or if there is no terminating },
|
||||||
|
an error occurs.
|
||||||
|
.P
|
||||||
|
Characters whose code points are less than 256 can be defined by either of the
|
||||||
|
two syntaxes for \ex or by an octal sequence. There is no difference in the way
|
||||||
|
they are handled. For example, \exdc is exactly the same as \ex{dc} or \e334.
|
||||||
|
However, using the braced versions does make such sequences easier to read.
|
||||||
|
.P
|
||||||
|
Support is available for some ECMAScript (aka JavaScript) escape sequences via
|
||||||
|
two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \ex followed
|
||||||
|
by { is not recognized. Only if \ex is followed by two hexadecimal digits is it
|
||||||
|
recognized as a character escape. Otherwise it is interpreted as a literal "x"
|
||||||
|
character. In this mode, support for code points greater than 256 is provided
|
||||||
|
by \eu, which must be followed by four hexadecimal digits; otherwise it is
|
||||||
|
interpreted as a literal "u" character.
|
||||||
|
.P
|
||||||
|
PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
|
||||||
|
\eu{hhh..} is recognized as the character specified by hexadecimal code point.
|
||||||
|
There may be any number of hexadecimal digits. This syntax is from ECMAScript
|
||||||
|
6.
|
||||||
.P
|
.P
|
||||||
The \eN{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
|
The \eN{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
|
||||||
is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
|
is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
|
||||||
|
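The \eu rules documented in the hunk above (four hex digits in ALT_BSUX mode, plus a braced form when PCRE2_EXTRA_ALT_BSUX is set, otherwise a literal "u") can be sketched as a small stand-alone parser. This is only an illustration under stated assumptions: `hex_value` and `parse_bsu` are hypothetical names that do not exist in PCRE2, the sketch works on plain `char` strings rather than PCRE2 code units, and it treats an overflowing braced value as "not recognized", whereas the library reports an error in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): value of a hex digit, or -1. */
static int hex_value(char ch)
{
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  return -1;
}

/* Hypothetical sketch of \u handling in ALT_BSUX mode.
   p points just past "\u"; end is one past the last pattern character.
   extra is non-zero when the braced \u{hhh..} form is enabled.
   Returns the number of characters consumed after "\u" and stores the code
   point in *out. A return of 0 means "not an escape: treat as literal u". */
static size_t parse_bsu(const char *p, const char *end, int extra,
                        uint32_t *out)
{
  uint32_t cc = 0;
  assert(p != NULL && end != NULL && out != NULL);

  if (extra && p < end && *p == '{')
  {
    const char *h = p + 1;
    int v;
    while (h < end && (v = hex_value(*h)) >= 0)
    {
      /* Simplification: overflow is treated as "not recognized" here. */
      if ((cc & 0xf0000000u) != 0) return 0;
      cc = (cc << 4) | (uint32_t)v;
      h++;
    }
    /* No digits, ran off the end, or no closing brace. */
    if (h == p + 1 || h >= end || *h != '}') return 0;
    *out = cc;
    return (size_t)(h + 1 - p);
  }

  /* Unbraced form: exactly four hex digits are required. */
  if (end - p < 4) return 0;
  for (int i = 0; i < 4; i++)
  {
    int v = hex_value(p[i]);
    if (v < 0) return 0;
    cc = (cc << 4) | (uint32_t)v;
  }
  *out = cc;
  return 4;
}
```

Returning the consumed length lets a caller advance its pattern pointer only when the escape was recognized, which is how the "otherwise a literal u" fallback stays simple.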
@ -386,6 +404,11 @@ is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
|
||||||
Note that when \eN is not followed by an opening brace (curly bracket) it has
|
Note that when \eN is not followed by an opening brace (curly bracket) it has
|
||||||
an entirely different meaning, matching any character that is not a newline.
|
an entirely different meaning, matching any character that is not a newline.
|
||||||
.P
|
.P
|
||||||
|
There are some legacy applications where the escape sequence \er is expected to
|
||||||
|
match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a
|
||||||
|
pattern is converted to \en so that it matches a LF (linefeed) instead of a CR
|
||||||
|
(carriage return) character.
|
||||||
|
.P
|
||||||
The precise effect of \ecx on ASCII characters is as follows: if x is a lower
|
The precise effect of \ecx on ASCII characters is as follows: if x is a lower
|
||||||
case letter, it is converted to upper case. Then bit 6 of the character (hex
|
case letter, it is converted to upper case. Then bit 6 of the character (hex
|
||||||
40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A),
|
40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A),
|
||||||
|
@ -477,25 +500,6 @@ for themselves. For example, outside a character class:
|
||||||
Note that octal values of 100 or greater that are specified using this syntax
|
Note that octal values of 100 or greater that are specified using this syntax
|
||||||
must not be introduced by a leading zero, because no more than three octal
|
must not be introduced by a leading zero, because no more than three octal
|
||||||
digits are ever read.
|
digits are ever read.
|
||||||
.P
|
|
||||||
By default, after \ex that is not followed by {, from zero to two hexadecimal
|
|
||||||
digits are read (letters can be in upper or lower case). Any number of
|
|
||||||
hexadecimal digits may appear between \ex{ and }. If a character other than
|
|
||||||
a hexadecimal digit appears between \ex{ and }, or if there is no terminating
|
|
||||||
}, an error occurs.
|
|
||||||
.P
|
|
||||||
If the PCRE2_ALT_BSUX option is set, the interpretation of \ex is as just
|
|
||||||
described only when it is followed by two hexadecimal digits. Otherwise, it
|
|
||||||
matches a literal "x" character. In this mode, support for code points greater
|
|
||||||
than 256 is provided by \eu, which must be followed by four hexadecimal digits;
|
|
||||||
otherwise it matches a literal "u" character. This syntax makes PCRE2 behave
|
|
||||||
like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
|
|
||||||
supported.
|
|
||||||
.P
|
|
||||||
Characters whose value is less than 256 can be defined by either of the two
|
|
||||||
syntaxes for \ex (or by \eu in PCRE2_ALT_BSUX mode). There is no difference in
|
|
||||||
the way they are handled. For example, \exdc is exactly the same as \ex{dc} (or
|
|
||||||
\eu00dc in PCRE2_ALT_BSUX mode).
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Constraints on character values"
|
.SS "Constraints on character values"
|
||||||
|
@ -534,9 +538,10 @@ character class, these sequences have different meanings.
|
||||||
.sp
|
.sp
|
||||||
In Perl, the sequences \eF, \el, \eL, \eu, and \eU are recognized by its string
|
In Perl, the sequences \eF, \el, \eL, \eu, and \eU are recognized by its string
|
||||||
handler and used to modify the case of following characters. By default, PCRE2
|
handler and used to modify the case of following characters. By default, PCRE2
|
||||||
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
|
does not support these escape sequences in patterns. However, if either of the
|
||||||
is set, \eU matches a "U" character, and \eu can be used to define a character
|
PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \eU matches a "U"
|
||||||
by code point, as described above.
|
character, and \eu can be used to define a character by code point, as
|
||||||
|
described above.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Absolute and relative backreferences"
|
.SS "Absolute and relative backreferences"
|
||||||
|
@ -3758,6 +3763,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 04 February 2019
|
Last updated: 12 February 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "03 February 2019" "PCRE2 10.33"
|
.TH PCRE2SYNTAX 3 "11 February 2019" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -22,7 +22,8 @@ documentation. This document contains a quick-reference summary of the syntax.
|
||||||
.SH "ESCAPED CHARACTERS"
|
.SH "ESCAPED CHARACTERS"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
This table applies to ASCII and Unicode environments.
|
This table applies to ASCII and Unicode environments. An unrecognized escape
|
||||||
|
sequence causes an error.
|
||||||
.sp
|
.sp
|
||||||
\ea alarm, that is, the BEL character (hex 07)
|
\ea alarm, that is, the BEL character (hex 07)
|
||||||
\ecx "control-x", where x is any ASCII printing character
|
\ecx "control-x", where x is any ASCII printing character
|
||||||
|
@ -34,12 +35,24 @@ This table applies to ASCII and Unicode environments.
|
||||||
\e0dd character with octal code 0dd
|
\e0dd character with octal code 0dd
|
||||||
\eddd character with octal code ddd, or backreference
|
\eddd character with octal code ddd, or backreference
|
||||||
\eo{ddd..} character with octal code ddd..
|
\eo{ddd..} character with octal code ddd..
|
||||||
\eU "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
|
|
||||||
\eN{U+hh..} character with Unicode code point hh.. (Unicode mode only)
|
\eN{U+hh..} character with Unicode code point hh.. (Unicode mode only)
|
||||||
\euhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
|
|
||||||
\exhh character with hex code hh
|
\exhh character with hex code hh
|
||||||
\ex{hh..} character with hex code hh..
|
\ex{hh..} character with hex code hh..
|
||||||
.sp
|
.sp
|
||||||
|
If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
|
||||||
|
following are also recognized:
|
||||||
|
.sp
|
||||||
|
\eU the character "U"
|
||||||
|
\euhhhh character with hex code hhhh
|
||||||
|
\eu{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX
|
||||||
|
.sp
|
||||||
|
When \ex is not followed by {, from zero to two hexadecimal digits are read,
|
||||||
|
but in ALT_BSUX mode \ex must be followed by two hexadecimal digits to be
|
||||||
|
recognized as a hexadecimal escape; otherwise it matches a literal "x".
|
||||||
|
Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits
|
||||||
|
or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
|
||||||
|
matches a literal "u".
|
||||||
|
.P
|
||||||
Note that \e0dd is always an octal code. The treatment of backslash followed by
|
Note that \e0dd is always an octal code. The treatment of backslash followed by
|
||||||
a non-zero digit is complicated; for details see the section
|
a non-zero digit is complicated; for details see the section
|
||||||
.\" HTML <a href="pcre2pattern.html#digitsafterbackslash">
|
.\" HTML <a href="pcre2pattern.html#digitsafterbackslash">
|
||||||
|
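The difference between default and ALT_BSUX handling of an unbraced \ex, as summarized in the hunk above, may be easier to see in code. The following is a minimal sketch, not the library's implementation: `hex_digit` and `parse_bsx` are hypothetical names, the braced \ex{hh..} form is deliberately left out, and plain `char` strings stand in for PCRE2 code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): value of a hex digit, or -1. */
static int hex_digit(char ch)
{
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  return -1;
}

/* Hypothetical sketch of unbraced \x handling.
   p points just past "\x"; end is one past the last pattern character.
   Stores in *out the character the escape stands for and returns how many
   characters after "\x" were consumed.
   Default mode:  zero to two hex digits are read (\xz yields binary zero).
   ALT_BSUX mode: exactly two hex digits are needed, else a literal 'x'. */
static size_t parse_bsx(const char *p, const char *end, int alt_bsux,
                        uint32_t *out)
{
  uint32_t cc = 0;
  size_t n = 0;
  assert(p != NULL && end != NULL && out != NULL);

  if (alt_bsux)
  {
    int hi, lo;
    if (end - p < 2 || (hi = hex_digit(p[0])) < 0 ||
        (lo = hex_digit(p[1])) < 0)
    {
      *out = (uint32_t)'x';      /* Not an escape: literal x */
      return 0;
    }
    *out = ((uint32_t)hi << 4) | (uint32_t)lo;
    return 2;
  }

  while (n < 2 && p + n < end)
  {
    int v = hex_digit(p[n]);
    if (v < 0) break;
    cc = (cc << 4) | (uint32_t)v;
    n++;
  }
  *out = cc;
  return n;
}
```

With the input "z" the default mode yields code point 0 and the ALT_BSUX mode yields 'x', in both cases consuming nothing, so the "z" is then processed as an ordinary character.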
@ -54,12 +67,6 @@ documentation, where details of escape processing in EBCDIC environments are
|
||||||
also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not
|
also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not
|
||||||
supported in EBCDIC environments. Note that \eN not followed by an opening
|
supported in EBCDIC environments. Note that \eN not followed by an opening
|
||||||
curly bracket has a different meaning (see below).
|
curly bracket has a different meaning (see below).
|
||||||
.P
|
|
||||||
When \ex is not followed by {, from zero to two hexadecimal digits are read,
|
|
||||||
but if PCRE2_ALT_BSUX is set, \ex must be followed by two hexadecimal digits to
|
|
||||||
be recognized as a hexadecimal escape; otherwise it matches a literal "x".
|
|
||||||
Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits,
|
|
||||||
it matches a literal "u".
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "CHARACTER TYPES"
|
.SH "CHARACTER TYPES"
|
||||||
|
@ -647,6 +654,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 03 February 2019
|
Last updated: 11 February 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "03 February 2019" "PCRE 10.33"
|
.TH PCRE2TEST 1 "11 February 2019" "PCRE 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -568,6 +568,7 @@ for a description of the effects of these options.
|
||||||
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
||||||
/x extended set PCRE2_EXTENDED
|
/x extended set PCRE2_EXTENDED
|
||||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||||
|
extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
|
||||||
firstline set PCRE2_FIRSTLINE
|
firstline set PCRE2_FIRSTLINE
|
||||||
literal set PCRE2_LITERAL
|
literal set PCRE2_LITERAL
|
||||||
match_line set PCRE2_EXTRA_MATCH_LINE
|
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||||
|
@ -2056,6 +2057,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 03 February 2019
|
Last updated: 11 February 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -547,6 +547,7 @@ PATTERN MODIFIERS
|
||||||
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
||||||
/x extended set PCRE2_EXTENDED
|
/x extended set PCRE2_EXTENDED
|
||||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||||
|
extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
|
||||||
firstline set PCRE2_FIRSTLINE
|
firstline set PCRE2_FIRSTLINE
|
||||||
literal set PCRE2_LITERAL
|
literal set PCRE2_LITERAL
|
||||||
match_line set PCRE2_EXTRA_MATCH_LINE
|
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||||
|
@ -1887,5 +1888,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 03 February 2019
|
Last updated: 11 February 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
|
|
|
@ -150,6 +150,7 @@ D is inspected during pcre2_dfa_match() execution
|
||||||
#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */
|
#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */
|
||||||
#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */
|
#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */
|
||||||
#define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */
|
#define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */
|
||||||
|
#define PCRE2_EXTRA_ALT_BSUX 0x00000020u /* C */
|
||||||
|
|
||||||
/* These are for pcre2_jit_compile(). */
|
/* These are for pcre2_jit_compile(). */
|
||||||
|
|
||||||
|
|
|
@ -764,7 +764,7 @@ are allowed. */
|
||||||
#define PUBLIC_COMPILE_EXTRA_OPTIONS \
|
#define PUBLIC_COMPILE_EXTRA_OPTIONS \
|
||||||
(PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS| \
|
(PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS| \
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES|PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL| \
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES|PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL| \
|
||||||
PCRE2_EXTRA_ESCAPED_CR_IS_LF)
|
PCRE2_EXTRA_ESCAPED_CR_IS_LF|PCRE2_EXTRA_ALT_BSUX)
|
||||||
|
|
||||||
/* Compile time error code numbers. They are given names so that they can more
|
/* Compile time error code numbers. They are given names so that they can more
|
||||||
easily be tracked. When a new number is added, the tables called eint1 and
|
easily be tracked. When a new number is added, the tables called eint1 and
|
||||||
|
@ -1459,7 +1459,8 @@ Returns: zero => a data character
|
||||||
|
|
||||||
int
|
int
|
||||||
PRIV(check_escape)(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t *chptr,
|
PRIV(check_escape)(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t *chptr,
|
||||||
int *errorcodeptr, uint32_t options, BOOL isclass, compile_block *cb)
|
int *errorcodeptr, uint32_t options, uint32_t extra_options, BOOL isclass,
|
||||||
|
compile_block *cb)
|
||||||
{
|
{
|
||||||
BOOL utf = (options & PCRE2_UTF) != 0;
|
BOOL utf = (options & PCRE2_UTF) != 0;
|
||||||
PCRE2_SPTR ptr = *ptrptr;
|
PCRE2_SPTR ptr = *ptrptr;
|
||||||
|
@ -1495,8 +1496,7 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
|
||||||
if (i > 0)
|
if (i > 0)
|
||||||
{
|
{
|
||||||
c = (uint32_t)i;
|
c = (uint32_t)i;
|
||||||
if (cb != NULL && c == CHAR_CR &&
|
if (c == CHAR_CR && (extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
|
||||||
(cb->cx->extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
|
|
||||||
c = CHAR_LF;
|
c = CHAR_LF;
|
||||||
}
|
}
|
||||||
else /* Negative table entry */
|
else /* Negative table entry */
|
||||||
|
@ -1551,22 +1551,28 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
|
||||||
|
|
||||||
/* Escapes that need further processing, including those that are unknown, have
|
/* Escapes that need further processing, including those that are unknown, have
|
||||||
a zero entry in the lookup table. When called from pcre2_substitute(), only \c,
|
a zero entry in the lookup table. When called from pcre2_substitute(), only \c,
|
||||||
\o, and \x are recognized (and \u when BSUX is set). */
|
\o, and \x are recognized (\u and \U can never appear as they are used for case
|
||||||
|
forcing). */
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
|
int s;
|
||||||
PCRE2_SPTR oldptr;
|
PCRE2_SPTR oldptr;
|
||||||
BOOL overflow;
|
BOOL overflow;
|
||||||
int s;
|
BOOL alt_bsux =
|
||||||
|
((options & PCRE2_ALT_BSUX) | (extra_options & PCRE2_EXTRA_ALT_BSUX)) != 0;
|
||||||
|
|
||||||
/* Filter calls from pcre2_substitute(). */
|
/* Filter calls from pcre2_substitute(). */
|
||||||
|
|
||||||
if (cb == NULL && c != CHAR_c && c != CHAR_o && c != CHAR_x &&
|
if (cb == NULL)
|
||||||
(c != CHAR_u || (options & PCRE2_ALT_BSUX) != 0))
|
|
||||||
{
|
{
|
||||||
*errorcodeptr = ERR3;
|
if (c != CHAR_c && c != CHAR_o && c != CHAR_x)
|
||||||
return 0;
|
{
|
||||||
}
|
*errorcodeptr = ERR3;
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
alt_bsux = FALSE; /* Do not modify \x handling */
|
||||||
|
}
|
||||||
|
|
||||||
switch (c)
|
switch (c)
|
||||||
{
|
{
|
||||||
|
@ -1579,40 +1585,74 @@ else
|
||||||
*errorcodeptr = ERR37;
|
*errorcodeptr = ERR37;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
/* \u is unrecognized when PCRE2_ALT_BSUX is not set. When it is treated
|
/* \u is unrecognized when neither PCRE2_ALT_BSUX nor PCRE2_EXTRA_ALT_BSUX
|
||||||
specially, \u must be followed by four hex digits. Otherwise it is a
|
is set. Otherwise, \u must be followed by exactly four hex digits or, if
|
||||||
lowercase u letter. */
|
PCRE2_EXTRA_ALT_BSUX is set, by any number of hex digits in braces.
|
||||||
|
Otherwise it is a lowercase u letter. This gives some compatibility with
|
||||||
|
ECMAScript (aka JavaScript). */
|
||||||
|
|
||||||
case CHAR_u:
|
case CHAR_u:
|
||||||
if ((options & PCRE2_ALT_BSUX) == 0) *errorcodeptr = ERR37; else
|
if (!alt_bsux) *errorcodeptr = ERR37; else
|
||||||
{
|
{
|
||||||
uint32_t xc;
|
uint32_t xc;
|
||||||
if (ptrend - ptr < 4) break; /* Less than 4 chars */
|
|
||||||
if ((cc = XDIGIT(ptr[0])) == 0xff) break; /* Not a hex digit */
|
if (*ptr == CHAR_LEFT_CURLY_BRACKET &&
|
||||||
if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */
|
(extra_options & PCRE2_EXTRA_ALT_BSUX) != 0)
|
||||||
cc = (cc << 4) | xc;
|
{
|
||||||
if ((xc = XDIGIT(ptr[2])) == 0xff) break; /* Not a hex digit */
|
PCRE2_SPTR hptr = ptr + 1;
|
||||||
cc = (cc << 4) | xc;
|
cc = 0;
|
||||||
if ((xc = XDIGIT(ptr[3])) == 0xff) break; /* Not a hex digit */
|
|
||||||
c = (cc << 4) | xc;
|
while (hptr < ptrend && (xc = XDIGIT(*hptr)) != 0xff)
|
||||||
ptr += 4;
|
{
|
||||||
|
if ((cc & 0xf0000000) != 0) /* Test for 32-bit overflow */
|
||||||
|
{
|
||||||
|
*errorcodeptr = ERR77;
|
||||||
|
ptr = hptr; /* Show where */
|
||||||
|
break; /* *hptr != } will cause another break below */
|
||||||
|
}
|
||||||
|
cc = (cc << 4) | xc;
|
||||||
|
hptr++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (hptr == ptr + 1 || /* No hex digits */
|
||||||
|
hptr >= ptrend || /* Hit end of input */
|
||||||
|
*hptr != CHAR_RIGHT_CURLY_BRACKET) /* No } terminator */
|
||||||
|
break; /* Hex escape not recognized */
|
||||||
|
|
||||||
|
c = cc; /* Accept the code point */
|
||||||
|
ptr = hptr + 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
else /* Must be exactly 4 hex digits */
|
||||||
|
{
|
||||||
|
if (ptrend - ptr < 4) break; /* Less than 4 chars */
|
||||||
|
if ((cc = XDIGIT(ptr[0])) == 0xff) break; /* Not a hex digit */
|
||||||
|
if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */
|
||||||
|
cc = (cc << 4) | xc;
|
||||||
|
if ((xc = XDIGIT(ptr[2])) == 0xff) break; /* Not a hex digit */
|
||||||
|
cc = (cc << 4) | xc;
|
||||||
|
if ((xc = XDIGIT(ptr[3])) == 0xff) break; /* Not a hex digit */
|
||||||
|
c = (cc << 4) | xc;
|
||||||
|
ptr += 4;
|
||||||
|
}
|
||||||
|
|
||||||
if (utf)
|
if (utf)
|
||||||
{
|
{
|
||||||
if (c > 0x10ffffU) *errorcodeptr = ERR77;
|
if (c > 0x10ffffU) *errorcodeptr = ERR77;
|
||||||
else
|
else
|
||||||
if (c >= 0xd800 && c <= 0xdfff &&
|
if (c >= 0xd800 && c <= 0xdfff &&
|
||||||
(cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
|
(extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
|
||||||
*errorcodeptr = ERR73;
|
*errorcodeptr = ERR73;
|
||||||
}
|
}
|
||||||
else if (c > MAX_NON_UTF_CHAR) *errorcodeptr = ERR77;
|
else if (c > MAX_NON_UTF_CHAR) *errorcodeptr = ERR77;
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
|
|
||||||
/* \U is unrecognized unless PCRE2_ALT_BSUX is set, in which case it is an
|
/* \U is unrecognized unless PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set,
|
||||||
upper case letter. */
|
in which case it is an upper case letter. */
|
||||||
|
|
||||||
case CHAR_U:
|
case CHAR_U:
|
||||||
if ((options & PCRE2_ALT_BSUX) == 0) *errorcodeptr = ERR37;
|
if (!alt_bsux) *errorcodeptr = ERR37;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
/* In a character class, \g is just a literal "g". Outside a character
|
/* In a character class, \g is just a literal "g". Outside a character
|
||||||
|
@ -1791,8 +1831,8 @@ else
|
||||||
}
|
}
|
||||||
else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
|
else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
|
||||||
{
|
{
|
||||||
if (utf && c >= 0xd800 && c <= 0xdfff && (cb == NULL ||
|
if (utf && c >= 0xd800 && c <= 0xdfff &&
|
||||||
(cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0))
|
(extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
|
||||||
{
|
{
|
||||||
ptr--;
|
ptr--;
|
||||||
*errorcodeptr = ERR73;
|
*errorcodeptr = ERR73;
|
||||||
|
@ -1806,11 +1846,11 @@ else
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
|
|
||||||
/* \x is complicated. When PCRE2_ALT_BSUX is set, \x must be followed by
|
/* When PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set, \x must be followed
|
||||||
two hexadecimal digits. Otherwise it is a lowercase x letter. */
|
by two hexadecimal digits. Otherwise it is a lowercase x letter. */
|
||||||
|
|
||||||
case CHAR_x:
|
case CHAR_x:
|
||||||
if ((options & PCRE2_ALT_BSUX) != 0)
|
if (alt_bsux)
|
||||||
{
|
{
|
||||||
uint32_t xc;
|
uint32_t xc;
|
||||||
if (ptrend - ptr < 2) break; /* Less than 2 characters */
|
if (ptrend - ptr < 2) break; /* Less than 2 characters */
|
||||||
|
@ -1818,9 +1858,9 @@ else
|
||||||
if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */
|
if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */
|
||||||
c = (cc << 4) | xc;
|
c = (cc << 4) | xc;
|
||||||
ptr += 2;
|
ptr += 2;
|
||||||
} /* End PCRE2_ALT_BSUX handling */
|
}
|
||||||
|
|
||||||
/* Handle \x in Perl's style. \x{ddd} is a character number which can be
|
/* Handle \x in Perl's style. \x{ddd} is a character code which can be
|
||||||
greater than 0xff in UTF-8 or non-8bit mode, but only if the ddd are hex
|
greater than 0xff in UTF-8 or non-8bit mode, but only if the ddd are hex
|
||||||
digits. If not, { used to be treated as a data character. However, Perl
|
digits. If not, { used to be treated as a data character. However, Perl
|
||||||
seems to read hex digits up to the first non-such, and ignore the rest, so
|
seems to read hex digits up to the first non-such, and ignore the rest, so
|
||||||
|
@ -1864,8 +1904,8 @@ else
|
||||||
}
|
}
|
||||||
else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
|
else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
|
||||||
{
|
{
|
||||||
if (utf && c >= 0xd800 && c <= 0xdfff && (cb == NULL ||
|
if (utf && c >= 0xd800 && c <= 0xdfff &&
|
||||||
(cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0))
|
(extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
|
||||||
{
|
{
|
||||||
ptr--;
|
ptr--;
|
||||||
*errorcodeptr = ERR73;
|
*errorcodeptr = ERR73;
|
||||||
|
@ -2438,6 +2478,7 @@ uint32_t *parsed_pattern = cb->parsed_pattern;
|
||||||
uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
|
uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
|
||||||
uint32_t meta_quantifier = 0;
|
uint32_t meta_quantifier = 0;
|
||||||
uint32_t add_after_mark = 0;
|
uint32_t add_after_mark = 0;
|
||||||
|
uint32_t extra_options = cb->cx->extra_options;
|
||||||
uint16_t nest_depth = 0;
|
uint16_t nest_depth = 0;
|
||||||
int after_manual_callout = 0;
|
int after_manual_callout = 0;
|
||||||
int expect_cond_assert = 0;
|
int expect_cond_assert = 0;
|
||||||
|
@ -2461,12 +2502,12 @@ nest_save *top_nest, *end_nests;
|
||||||
/* Insert leading items for word and line matching (features provided for the
|
/* Insert leading items for word and line matching (features provided for the
|
||||||
benefit of pcre2grep). */
|
benefit of pcre2grep). */
|
||||||
|
|
||||||
if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
|
if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = META_CIRCUMFLEX;
|
*parsed_pattern++ = META_CIRCUMFLEX;
|
||||||
*parsed_pattern++ = META_NOCAPTURE;
|
*parsed_pattern++ = META_NOCAPTURE;
|
||||||
}
|
}
|
||||||
else if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
|
else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = META_ESCAPE + ESC_b;
|
*parsed_pattern++ = META_ESCAPE + ESC_b;
|
||||||
*parsed_pattern++ = META_NOCAPTURE;
|
*parsed_pattern++ = META_NOCAPTURE;
|
||||||
|
@ -2631,7 +2672,7 @@ while (ptr < ptrend)
|
||||||
if ((options & PCRE2_ALT_VERBNAMES) != 0)
|
if ((options & PCRE2_ALT_VERBNAMES) != 0)
|
||||||
{
|
{
|
||||||
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
|
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
|
||||||
FALSE, cb);
|
cb->cx->extra_options, FALSE, cb);
|
||||||
if (errorcode != 0) goto FAILED;
|
if (errorcode != 0) goto FAILED;
|
||||||
}
|
}
|
||||||
else escape = 0; /* Treat all as literal */
|
else escape = 0; /* Treat all as literal */
|
||||||
|
@ -2821,11 +2862,11 @@ while (ptr < ptrend)
|
||||||
case CHAR_BACKSLASH:
|
case CHAR_BACKSLASH:
|
||||||
tempptr = ptr;
|
tempptr = ptr;
|
||||||
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
|
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
|
||||||
FALSE, cb);
|
cb->cx->extra_options, FALSE, cb);
|
||||||
if (errorcode != 0)
|
if (errorcode != 0)
|
||||||
{
|
{
|
||||||
ESCAPE_FAILED:
|
ESCAPE_FAILED:
|
||||||
if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
|
if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
|
||||||
goto FAILED;
|
goto FAILED;
|
||||||
ptr = tempptr;
|
ptr = tempptr;
|
||||||
if (ptr >= ptrend) c = CHAR_BACKSLASH; else
|
if (ptr >= ptrend) c = CHAR_BACKSLASH; else
|
||||||
|
@ -3382,12 +3423,12 @@ while (ptr < ptrend)
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
tempptr = ptr;
|
tempptr = ptr;
|
||||||
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode,
|
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
|
||||||
options, TRUE, cb);
|
cb->cx->extra_options, TRUE, cb);
|
||||||
|
|
||||||
if (errorcode != 0)
|
if (errorcode != 0)
|
||||||
{
|
{
|
||||||
if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
|
if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
|
||||||
goto FAILED;
|
goto FAILED;
|
||||||
ptr = tempptr;
|
ptr = tempptr;
|
||||||
if (ptr >= ptrend) c = CHAR_BACKSLASH; else
|
if (ptr >= ptrend) c = CHAR_BACKSLASH; else
|
||||||
|
@ -4545,12 +4586,12 @@ parsed_pattern = manage_callouts(ptr, &previous_callout, auto_callout,
|
||||||
/* Insert trailing items for word and line matching (features provided for the
|
/* Insert trailing items for word and line matching (features provided for the
|
||||||
benefit of pcre2grep). */
|
benefit of pcre2grep). */
|
||||||
|
|
||||||
if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
|
if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = META_KET;
|
*parsed_pattern++ = META_KET;
|
||||||
*parsed_pattern++ = META_DOLLAR;
|
*parsed_pattern++ = META_DOLLAR;
|
||||||
}
|
}
|
||||||
else if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
|
else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
|
||||||
{
|
{
|
||||||
*parsed_pattern++ = META_KET;
|
*parsed_pattern++ = META_KET;
|
||||||
*parsed_pattern++ = META_ESCAPE + ESC_b;
|
*parsed_pattern++ = META_ESCAPE + ESC_b;
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2018 University of Cambridge
|
New API code Copyright (c) 2016-2019 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -1942,7 +1942,7 @@ is available. */
|
||||||
extern int _pcre2_auto_possessify(PCRE2_UCHAR *, BOOL,
|
extern int _pcre2_auto_possessify(PCRE2_UCHAR *, BOOL,
|
||||||
const compile_block *);
|
const compile_block *);
|
||||||
extern int _pcre2_check_escape(PCRE2_SPTR *, PCRE2_SPTR, uint32_t *,
|
extern int _pcre2_check_escape(PCRE2_SPTR *, PCRE2_SPTR, uint32_t *,
|
||||||
int *, uint32_t, BOOL, compile_block *);
|
int *, uint32_t, uint32_t, BOOL, compile_block *);
|
||||||
extern PCRE2_SPTR _pcre2_extuni(uint32_t, PCRE2_SPTR, PCRE2_SPTR, PCRE2_SPTR,
|
extern PCRE2_SPTR _pcre2_extuni(uint32_t, PCRE2_SPTR, PCRE2_SPTR, PCRE2_SPTR,
|
||||||
BOOL, int *);
|
BOOL, int *);
|
||||||
extern PCRE2_SPTR _pcre2_find_bracket(PCRE2_SPTR, BOOL, int);
|
extern PCRE2_SPTR _pcre2_find_bracket(PCRE2_SPTR, BOOL, int);
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2018 University of Cambridge
|
New API code Copyright (c) 2016-2019 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -129,7 +129,7 @@ for (; ptr < ptrend; ptr++)
|
||||||
|
|
||||||
ptr += 1; /* Must point after \ */
|
ptr += 1; /* Must point after \ */
|
||||||
erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
|
erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
|
||||||
code->overall_options, FALSE, NULL);
|
code->overall_options, code->extra_options, FALSE, NULL);
|
||||||
ptr -= 1; /* Back to last code unit of escape */
|
ptr -= 1; /* Back to last code unit of escape */
|
||||||
if (errorcode != 0)
|
if (errorcode != 0)
|
||||||
{
|
{
|
||||||
|
@ -774,7 +774,7 @@ do
|
||||||
|
|
||||||
ptr++; /* Point after \ */
|
ptr++; /* Point after \ */
|
||||||
rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
|
rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
|
||||||
code->overall_options, FALSE, NULL);
|
code->overall_options, code->extra_options, FALSE, NULL);
|
||||||
if (errorcode != 0) goto BADESCAPE;
|
if (errorcode != 0) goto BADESCAPE;
|
||||||
|
|
||||||
switch(rc)
|
switch(rc)
|
||||||
|
|
|
@ -646,6 +646,7 @@ static modstruct modlist[] = {
|
||||||
{ "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
|
{ "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
|
||||||
{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
|
{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
|
||||||
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
|
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
|
||||||
|
{ "extra_alt_bsux", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALT_BSUX, CO(extra_options) },
|
||||||
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
|
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
|
||||||
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
|
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
|
||||||
{ "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) },
|
{ "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) },
|
||||||
|
@ -4189,10 +4190,11 @@ show_compile_extra_options(uint32_t options, const char *before,
|
||||||
const char *after)
|
const char *after)
|
||||||
{
|
{
|
||||||
if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
|
if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
|
||||||
else fprintf(outfile, "%s%s%s%s%s%s%s",
|
else fprintf(outfile, "%s%s%s%s%s%s%s%s",
|
||||||
before,
|
before,
|
||||||
((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
|
((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
|
||||||
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
|
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
|
||||||
|
((options & PCRE2_EXTRA_ALT_BSUX) != 0)? " extra_alt_bsux" : "",
|
||||||
((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
|
((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
|
||||||
((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
|
((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
|
||||||
((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "",
|
((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "",
|
||||||
|
|
|
@ -2408,13 +2408,13 @@
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
cat
|
cat
|
||||||
|
|
||||||
/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
|
||||||
cat
|
cat
|
||||||
|
|
||||||
/TA]/
|
/TA]/
|
||||||
The ACTA] comes
|
The ACTA] comes
|
||||||
|
|
||||||
/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/TA]/allow_empty_class,match_unset_backref,dupnames
|
||||||
The ACTA] comes
|
The ACTA] comes
|
||||||
|
|
||||||
/(?2)[]a()b](abc)/
|
/(?2)[]a()b](abc)/
|
||||||
|
@ -2446,25 +2446,25 @@
|
||||||
|
|
||||||
/a[^]b/
|
/a[^]b/
|
||||||
|
|
||||||
/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[]b/allow_empty_class,match_unset_backref,dupnames
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
ab
|
ab
|
||||||
|
|
||||||
/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[]+b/allow_empty_class,match_unset_backref,dupnames
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
ab
|
ab
|
||||||
|
|
||||||
/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[]*+b/allow_empty_class,match_unset_backref,dupnames
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
ab
|
ab
|
||||||
|
|
||||||
/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[^]b/allow_empty_class,match_unset_backref,dupnames
|
||||||
aXb
|
aXb
|
||||||
a\nb
|
a\nb
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
ab
|
ab
|
||||||
|
|
||||||
/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[^]+b/allow_empty_class,match_unset_backref,dupnames
|
||||||
aXb
|
aXb
|
||||||
a\nX\nXb
|
a\nX\nXb
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
|
@ -2903,10 +2903,10 @@
|
||||||
xxxxabcde\=ps
|
xxxxabcde\=ps
|
||||||
xxxxabcde\=ph
|
xxxxabcde\=ph
|
||||||
|
|
||||||
/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
|
||||||
cat
|
cat
|
||||||
|
|
||||||
/(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames
|
||||||
cat
|
cat
|
||||||
|
|
||||||
/(\3)(\1)(a)/I
|
/(\3)(\1)(a)/I
|
||||||
|
@ -3418,6 +3418,14 @@
|
||||||
aU0041z
|
aU0041z
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
aAz
|
aAz
|
||||||
|
|
||||||
|
/^\u{7a}/alt_bsux
|
||||||
|
u{7a}
|
||||||
|
\= Expect no match
|
||||||
|
zoo
|
||||||
|
|
||||||
|
/^\u{7a}/extra_alt_bsux
|
||||||
|
zoo
|
||||||
|
|
||||||
/(?(?=c)c|d)++Y/B
|
/(?(?=c)c|d)++Y/B
|
||||||
|
|
||||||
|
|
|
@ -333,13 +333,13 @@
|
||||||
|
|
||||||
/[[:a\x{100}b:]]/utf
|
/[[:a\x{100}b:]]/utf
|
||||||
|
|
||||||
/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
|
/a[^]b/utf,allow_empty_class,match_unset_backref
|
||||||
a\x{1234}b
|
a\x{1234}b
|
||||||
a\nb
|
a\nb
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
ab
|
ab
|
||||||
|
|
||||||
/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
|
/a[^]+b/utf,allow_empty_class,match_unset_backref
|
||||||
aXb
|
aXb
|
||||||
a\nX\nX\x{1234}b
|
a\nX\nX\x{1234}b
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
|
@ -814,6 +814,9 @@
|
||||||
|
|
||||||
/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
|
/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
|
||||||
|
|
||||||
|
/^\u{0000000000010ffff}/utf,extra_alt_bsux
|
||||||
|
\x{10ffff}
|
||||||
|
|
||||||
/^a+[a\x{200}]/B,utf
|
/^a+[a\x{200}]/B,utf
|
||||||
aa
|
aa
|
||||||
|
|
||||||
|
|
|
@ -8774,7 +8774,7 @@ No match
|
||||||
cat
|
cat
|
||||||
No match
|
No match
|
||||||
|
|
||||||
/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
|
||||||
cat
|
cat
|
||||||
0: a
|
0: a
|
||||||
1:
|
1:
|
||||||
|
@ -8785,7 +8785,7 @@ No match
|
||||||
The ACTA] comes
|
The ACTA] comes
|
||||||
0: TA]
|
0: TA]
|
||||||
|
|
||||||
/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/TA]/allow_empty_class,match_unset_backref,dupnames
|
||||||
The ACTA] comes
|
The ACTA] comes
|
||||||
0: TA]
|
0: TA]
|
||||||
|
|
||||||
|
@ -8833,22 +8833,22 @@ Failed: error 106 at offset 4: missing terminating ] for character class
|
||||||
/a[^]b/
|
/a[^]b/
|
||||||
Failed: error 106 at offset 5: missing terminating ] for character class
|
Failed: error 106 at offset 5: missing terminating ] for character class
|
||||||
|
|
||||||
/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[]b/allow_empty_class,match_unset_backref,dupnames
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
ab
|
ab
|
||||||
No match
|
No match
|
||||||
|
|
||||||
/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[]+b/allow_empty_class,match_unset_backref,dupnames
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
ab
|
ab
|
||||||
No match
|
No match
|
||||||
|
|
||||||
/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[]*+b/allow_empty_class,match_unset_backref,dupnames
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
ab
|
ab
|
||||||
No match
|
No match
|
||||||
|
|
||||||
/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[^]b/allow_empty_class,match_unset_backref,dupnames
|
||||||
aXb
|
aXb
|
||||||
0: aXb
|
0: aXb
|
||||||
a\nb
|
a\nb
|
||||||
|
@ -8857,7 +8857,7 @@ No match
|
||||||
ab
|
ab
|
||||||
No match
|
No match
|
||||||
|
|
||||||
/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/a[^]+b/allow_empty_class,match_unset_backref,dupnames
|
||||||
aXb
|
aXb
|
||||||
0: aXb
|
0: aXb
|
||||||
a\nX\nXb
|
a\nX\nXb
|
||||||
|
@ -9971,17 +9971,17 @@ Partial match: abca
|
||||||
xxxxabcde\=ph
|
xxxxabcde\=ph
|
||||||
Partial match: abcde
|
Partial match: abcde
|
||||||
|
|
||||||
/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
|
||||||
cat
|
cat
|
||||||
0: a
|
0: a
|
||||||
1:
|
1:
|
||||||
2:
|
2:
|
||||||
3: a
|
3: a
|
||||||
|
|
||||||
/(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames
|
||||||
Capture group count = 3
|
Capture group count = 3
|
||||||
Max back reference = 3
|
Max back reference = 3
|
||||||
Options: alt_bsux allow_empty_class dupnames match_unset_backref
|
Options: allow_empty_class dupnames match_unset_backref
|
||||||
Last code unit = 'a'
|
Last code unit = 'a'
|
||||||
Subject length lower bound = 1
|
Subject length lower bound = 1
|
||||||
cat
|
cat
|
||||||
|
@ -11364,6 +11364,17 @@ No match
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
aAz
|
aAz
|
||||||
No match
|
No match
|
||||||
|
|
||||||
|
/^\u{7a}/alt_bsux
|
||||||
|
u{7a}
|
||||||
|
0: u{7a}
|
||||||
|
\= Expect no match
|
||||||
|
zoo
|
||||||
|
No match
|
||||||
|
|
||||||
|
/^\u{7a}/extra_alt_bsux
|
||||||
|
zoo
|
||||||
|
0: z
|
||||||
|
|
||||||
/(?(?=c)c|d)++Y/B
|
/(?(?=c)c|d)++Y/B
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
@ -798,7 +798,7 @@ No match
|
||||||
/[[:a\x{100}b:]]/utf
|
/[[:a\x{100}b:]]/utf
|
||||||
Failed: error 130 at offset 3: unknown POSIX class name
|
Failed: error 130 at offset 3: unknown POSIX class name
|
||||||
|
|
||||||
/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
|
/a[^]b/utf,allow_empty_class,match_unset_backref
|
||||||
a\x{1234}b
|
a\x{1234}b
|
||||||
0: a\x{1234}b
|
0: a\x{1234}b
|
||||||
a\nb
|
a\nb
|
||||||
|
@ -807,7 +807,7 @@ Failed: error 130 at offset 3: unknown POSIX class name
|
||||||
ab
|
ab
|
||||||
No match
|
No match
|
||||||
|
|
||||||
/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
|
/a[^]+b/utf,allow_empty_class,match_unset_backref
|
||||||
aXb
|
aXb
|
||||||
0: aXb
|
0: aXb
|
||||||
a\nX\nX\x{1234}b
|
a\nX\nX\x{1234}b
|
||||||
|
@ -1734,6 +1734,10 @@ No match
|
||||||
/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
|
/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
|
||||||
Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
|
Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
|
||||||
|
|
||||||
|
/^\u{0000000000010ffff}/utf,extra_alt_bsux
|
||||||
|
\x{10ffff}
|
||||||
|
0: \x{10ffff}
|
||||||
|
|
||||||
/^a+[a\x{200}]/B,utf
|
/^a+[a\x{200}]/B,utf
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
Bra
|
Bra
|
||||||
|
|