Implement PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh..} syntax.
parent d90de8b053
commit 8c8deae8eb
@@ -125,6 +125,9 @@ processing or a crash could result.
 names, as Perl does. There was a small bug in this new code, found by
 ClusterFuzz 12950, fixed before release.

+31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh}
+construct.
+

 Version 10.32 10-September-2018
 -------------------------------
@@ -86,7 +86,12 @@ PCRE2 must be built with Unicode support (the default) in order to use
 PCRE2_UTF, PCRE2_UCP and related options.
 </P>
 <P>
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
+<a href="pcre2_set_compile_extra_options.html"><b>pcre2_set_compile_extra_options</b></a>
+function.
+</P>
+<P>
+The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 </P>
 <P>
@@ -20,7 +20,7 @@ SYNOPSIS
 </P>
 <P>
 <b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
-<b> PCRE2_SIZE <i>extra_options</i>);</b>
+<b> uint32_t <i>extra_options</i>);</b>
 </P>
 <br><b>
 DESCRIPTION
@@ -31,6 +31,7 @@ housed in a compile context. It completely replaces all the bits. The extra
 options are:
 <pre>
 PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
+PCRE2_EXTRA_ALT_BSUX Extended alternate \u, \U, and \x handling
 PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as a literal following character
 PCRE2_EXTRA_ESCAPED_CR_IS_LF Interpret \r as \n
 PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines
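The hunk header above says that pcre2_set_compile_extra_options() completely replaces all the extra option bits, and this commit changes the type of its argument to uint32_t. The sketch below models only that replace-not-merge behaviour. It is written under stated assumptions: the demo_compile_context struct, the demo_set_compile_extra_options() function and the DEMO_EXTRA_* bit values are hypothetical placeholders, not the definitions in pcre2.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder bits; real code must use the PCRE2_EXTRA_*
   macros from pcre2.h, whose values may differ. */
#define DEMO_EXTRA_ALT_BSUX         0x1u
#define DEMO_EXTRA_ESCAPED_CR_IS_LF 0x2u

/* Hypothetical stand-in for the extra-options field of a compile context. */
typedef struct {
  uint32_t extra_options;
} demo_compile_context;

/* Models the documented semantics: the new value replaces every previously
   set extra option bit instead of being OR-ed into the existing ones. */
int demo_set_compile_extra_options(demo_compile_context *ccontext,
                                   uint32_t extra_options)
{
  assert(ccontext != NULL);
  ccontext->extra_options = extra_options;
  return 0;
}
```

The practical consequence is that several extras, for example PCRE2_EXTRA_ALT_BSUX together with PCRE2_EXTRA_ESCAPED_CR_IS_LF, have to be OR-ed together and passed in a single call. A second call discards whatever the first one set.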
|
|
|
@@ -1298,7 +1298,7 @@ are needed. The <b>pcre2_code_copy_with_tables()</b> provides this facility.
 Copies of both the code and the tables are made, with the new code pointing to
 the new tables. The memory for the new tables is automatically freed when
 <b>pcre2_code_free()</b> is called for the new copy of the compiled code. If
-<b>pcre2_code_copy_withy_tables()</b> is called with a NULL argument, it returns
+<b>pcre2_code_copy_with_tables()</b> is called with a NULL argument, it returns
 NULL.
 </P>
 <P>
@@ -1315,7 +1315,7 @@ PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
 </P>
 <P>
 The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
-settings that affect the compilation. It should be zero if no options are
+settings that affect the compilation. It should be zero if none of them are
 required. The available options are described below. Some of them (in
 particular, those that are compatible with Perl, but some others as well) can
 also be set and unset from within the pattern (see the detailed description in
@@ -1330,8 +1330,9 @@ compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
 options can be set at the time of matching as well as at compile time.
 </P>
 <P>
-Other, less frequently required compile-time parameters (for example, the
-newline setting) can be provided in a compile context (as described
+Some additional options and less frequently required compile-time parameters
+(for example, the newline setting) can be provided in a compile context (as
+described
 <a href="#compilecontext">above).</a>
 </P>
 <P>
@@ -1384,7 +1385,13 @@ This code fragment shows a typical straightforward call to
 &errorcode, /* for error code */
 &erroffset, /* for error offset */
 NULL); /* no compile context */
-</pre>
+
+</PRE>
+</P>
+<br><b>
+Main compile options
+</b><br>
+<P>
 The following names for option bits are defined in the <b>pcre2.h</b> header
 file:
 <pre>
@@ -1424,6 +1431,14 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
 to match. By default, as in Perl, a hexadecimal number is always expected after
 \x, but it may have zero, one, or two digits (so, for example, \xz matches a
 binary zero character followed by z).
+</P>
+<P>
+ECMAScript 6 added additional functionality to \u. This can be accessed using
+the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
+<a href="#extracompileoptions">below).</a>
+Note that this alternative escape handling applies only to patterns. Neither of
+these options affects the processing of replacement strings passed to
+<b>pcre2_substitute()</b>.
 <pre>
 PCRE2_ALT_CIRCUMFLEX
 </pre>
@@ -1830,9 +1845,8 @@ characters with code points greater than 127.
 Extra compile options
 </b><br>
 <P>
-Unlike the main compile-time options, the extra options are not saved with the
-compiled pattern. The option bits that can be set in a compile context by
-calling the <b>pcre2_set_compile_extra_options()</b> function are as follows:
+The option bits that can be set in a compile context by calling the
+<b>pcre2_set_compile_extra_options()</b> function are as follows:
 <pre>
 PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 </pre>
@@ -1857,6 +1871,14 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
 point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
 incorporated in the compiled pattern. However, they can only match subject
 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
+<pre>
+PCRE2_EXTRA_ALT_BSUX
+</pre>
+The original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and \x in
+the way that ECMAScript (aka JavaScript) does. Additional functionality was
+defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
+PCRE2_ALT_BSUX, but in addition it recognizes \u{hhh..} as a hexadecimal
+character code, where hhh.. is any number of hexadecimal digits.
 <pre>
 PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
 </pre>
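The following is a minimal sketch of how the characters after a \u might be classified under PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX. It is modelled on the prose above, not on PCRE2's parser. The names read_bsu() and hex_digit_value() and the return convention are hypothetical. The 0x10FFFF cap is an assumed stand-in for the library's real limits, which depend on the code unit width and UTF mode.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller has already checked it. */
static unsigned long hex_digit_value(int c)
{
  if (c >= '0' && c <= '9') return (unsigned long)(c - '0');
  return (unsigned long)(tolower(c) - 'a' + 10);
}

/* Hypothetical model of \u handling. p points just after "\u"; extra is
   nonzero for EXTRA_ALT_BSUX and zero for plain ALT_BSUX.
   Returns the number of characters consumed after "\u" (code point in *cp),
   0 if \u stands for a literal "u", or -1 if a braced value exceeds
   0x10FFFF (an assumed error case). */
int read_bsu(const char *p, int extra, unsigned long *cp)
{
  unsigned long value = 0;
  size_t i;

  assert(p != NULL && cp != NULL);

  if (extra && p[0] == '{') {              /* ECMAScript 6 form \u{hhh..} */
    for (i = 1; isxdigit((unsigned char)p[i]); i++) {
      value = value * 16 + hex_digit_value((unsigned char)p[i]);
      if (value > 0x10FFFF) return -1;
    }
    if (i > 1 && p[i] == '}') {
      *cp = value;
      return (int)(i + 1);                 /* digits plus both braces */
    }
    return 0;                              /* no digits or no closing brace */
  }

  for (i = 0; i < 4; i++)                  /* original form: exactly \uhhhh */
    if (!isxdigit((unsigned char)p[i])) return 0;
  for (i = 0; i < 4; i++)
    value = value * 16 + hex_digit_value((unsigned char)p[i]);
  *cp = value;
  return 4;
}
```

Under this model, \u00dc gives 0xdc in both modes. With the extra option, \u{41} and \u{1F600} are also accepted, and the latter reaches beyond what four digits can express. Malformed forms such as \u{}, \u{12 or \u12 fall back to a literal "u".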
|
@@ -3382,7 +3404,8 @@ capture groups and letters within \Q...\E quoted sequences.
 <P>
 Note that case forcing sequences such as \U...\E do not nest. For example,
 the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no
-effect.
+effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do
+not apply to replacement strings.
 </P>
 <P>
 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
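The non-nesting rule for case forcing in extended replacement strings can be modelled with a single "current mode". A \U or \L overwrites that mode, and \E always clears it. This is a hypothetical sketch of the behaviour described above, not code from pcre2_substitute(). The name apply_case_forcing() is invented, and the sketch ignores \u, \l, capture references and every other escape.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of \U, \L and \E in a replacement string. There is no
   stack of modes: a new \U or \L replaces the current one, and \E returns
   to "no forcing" however many forcing escapes came before it. */
void apply_case_forcing(const char *in, char *out, size_t outsize)
{
  enum { FORCE_NONE, FORCE_UPPER, FORCE_LOWER } mode = FORCE_NONE;
  size_t n = 0;

  assert(in != NULL && out != NULL && outsize > 0);

  for (; *in != '\0' && n + 1 < outsize; in++) {
    if (in[0] == '\\' && (in[1] == 'U' || in[1] == 'L' || in[1] == 'E')) {
      mode = in[1] == 'U' ? FORCE_UPPER
           : in[1] == 'L' ? FORCE_LOWER : FORCE_NONE;
      in++;                          /* also skip the escape letter */
      continue;
    }
    int c = (unsigned char)*in;
    if (mode == FORCE_UPPER) c = toupper(c);
    else if (mode == FORCE_LOWER) c = tolower(c);
    out[n++] = (char)c;
  }
  out[n] = '\0';
}
```

Fed "\Uaa\LBB\Ecc\E", this model produces "AAbbcc", which matches the example in the text. The \L replaces the upper-case mode instead of nesting inside it, so the first \E already ends all forcing and the final \E has nothing left to undo.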
|
@@ -3784,7 +3807,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
@@ -47,8 +47,9 @@ non-newline character, and \N{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
-are interpreted as ECMAScript interprets them.
+generated by default. However, if either of the PCRE2_ALT_BSUX or
+PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are interpreted as ECMAScript
+interprets them.
 </P>
 <P>
 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
@@ -233,7 +234,7 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
@@ -399,12 +399,33 @@ environment, these escapes are as follows:
 \xhh character with hex code hh
 \x{hhh..} character with hex code hhh..
 \N{U+hhh..} character with Unicode hex code point hhh..
-\uhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 </pre>
-There are some legacy applications where the escape sequence \r is expected to
-match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
-pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
-(carriage return) character.
+By default, after \x that is not followed by {, from zero to two hexadecimal
+digits are read (letters can be in upper or lower case). Any number of
+hexadecimal digits may appear between \x{ and }. If a character other than a
+hexadecimal digit appears between \x{ and }, or if there is no terminating },
+an error occurs.
 </P>
 <P>
+Characters whose code points are less than 256 can be defined by either of the
+two syntaxes for \x or by an octal sequence. There is no difference in the way
+they are handled. For example, \xdc is exactly the same as \x{dc} or \334.
+However, using the braced versions does make such sequences easier to read.
+</P>
+<P>
+Support is available for some ECMAScript (aka JavaScript) escape sequences via
+two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \x followed
+by { is not recognized. Only if \x is followed by two hexadecimal digits is it
+recognized as a character escape. Otherwise it is interpreted as a literal "x"
+character. In this mode, support for code points greater than 256 is provided
+by \u, which must be followed by four hexadecimal digits; otherwise it is
+interpreted as a literal "u" character.
+</P>
+<P>
+PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
+\u{hhh..} is recognized as the character specified by hexadecimal code point.
+There may be any number of hexadecimal digits. This syntax is from ECMAScript
+6.
+</P>
+<P>
 The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
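The text above describes three ways of reading \x: the default unbraced form, the braced form, and the stricter ALT_BSUX form. They can be summarised as a small classifier. This is a hypothetical model of the documented rules, not PCRE2's parser. The names read_bsx() and struct bsx_result and the 0x10FFFF cap are illustrative assumptions. Treating empty braces as an error is also an assumption, because the text does not mention that case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum bsx_kind { BSX_CODE, BSX_LITERAL_X, BSX_ERROR };

struct bsx_result {
  enum bsx_kind kind;
  unsigned long code;   /* code point when kind == BSX_CODE */
  size_t used;          /* characters consumed after "\x" */
};

static unsigned long hexval(int c)
{
  if (c >= '0' && c <= '9') return (unsigned long)(c - '0');
  return (unsigned long)(tolower(c) - 'a' + 10);
}

/* Hypothetical model; p points just after "\x". */
struct bsx_result read_bsx(const char *p, int alt_bsux)
{
  struct bsx_result r = { BSX_CODE, 0, 0 };
  assert(p != NULL);

  if (alt_bsux) {
    /* ALT_BSUX: exactly two hex digits, otherwise a literal "x";
       \x{ is not recognized at all. */
    if (isxdigit((unsigned char)p[0]) && isxdigit((unsigned char)p[1])) {
      r.code = hexval((unsigned char)p[0]) * 16 + hexval((unsigned char)p[1]);
      r.used = 2;
    } else {
      r.kind = BSX_LITERAL_X;
    }
    return r;
  }

  if (p[0] == '{') {
    /* Braced form: hex digits then "}"; empty braces are assumed to be
       an error, and 0x10FFFF is an assumed upper bound. */
    size_t i = 1;
    while (isxdigit((unsigned char)p[i])) {
      r.code = r.code * 16 + hexval((unsigned char)p[i]);
      if (r.code > 0x10FFFF) { r.kind = BSX_ERROR; return r; }
      i++;
    }
    if (i == 1 || p[i] != '}') r.kind = BSX_ERROR;
    else r.used = i + 1;
    return r;
  }

  /* Default unbraced form: zero, one or two hex digits. */
  while (r.used < 2 && isxdigit((unsigned char)p[r.used])) {
    r.code = r.code * 16 + hexval((unsigned char)p[r.used]);
    r.used++;
  }
  return r;
}
```

In the default mode, "\xz" gives code 0 with nothing consumed, so the z that follows is matched literally, as the text says. In ALT_BSUX mode "\x{dc}" is not recognized, and the x is taken literally. The equivalence of \xdc and \334 is plain arithmetic: 3*64 + 3*8 + 4 = 220 = 0xdc.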
|
@@ -414,6 +435,12 @@ Note that when \N is not followed by an opening brace (curly bracket) it has
 an entirely different meaning, matching any character that is not a newline.
 </P>
 <P>
+There are some legacy applications where the escape sequence \r is expected to
+match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
+pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
+(carriage return) character.
+</P>
+<P>
 The precise effect of \cx on ASCII characters is as follows: if x is a lower
 case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
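The \cx rule in the context lines above is short enough to state as code: convert a lower-case letter to upper case, then invert bit 6 (hex 40). The sketch below covers only that arithmetic, for printing ASCII characters. The name control_escape() is hypothetical, and an assert stands in for PCRE2's own error handling of other characters.

```c
#include <assert.h>

/* Hypothetical model of the value denoted by \cx for printing ASCII x. */
int control_escape(int x)
{
  assert(x >= 0x20 && x <= 0x7e);  /* printing ASCII only */
  if (x >= 'a' && x <= 'z')
    x -= 'a' - 'A';                /* lower case -> upper case */
  return x ^ 0x40;                 /* invert bit 6 (hex 40) */
}
```

Only letters are case-folded, so the bit flip alone decides the result for punctuation. \c{ gives hex 3B ({ is 7B), \c; gives hex 7B, and \c? gives hex 7F.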
|
@@ -500,28 +527,6 @@ Note that octal values of 100 or greater that are specified using this syntax
 must not be introduced by a leading zero, because no more than three octal
 digits are ever read.
 </P>
-<P>
-By default, after \x that is not followed by {, from zero to two hexadecimal
-digits are read (letters can be in upper or lower case). Any number of
-hexadecimal digits may appear between \x{ and }. If a character other than
-a hexadecimal digit appears between \x{ and }, or if there is no terminating
-}, an error occurs.
-</P>
-<P>
-If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just
-described only when it is followed by two hexadecimal digits. Otherwise, it
-matches a literal "x" character. In this mode, support for code points greater
-than 256 is provided by \u, which must be followed by four hexadecimal digits;
-otherwise it matches a literal "u" character. This syntax makes PCRE2 behave
-like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
-supported.
-</P>
-<P>
-Characters whose value is less than 256 can be defined by either of the two
-syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no difference in
-the way they are handled. For example, \xdc is exactly the same as \x{dc} (or
-\u00dc in PCRE2_ALT_BSUX mode).
-</P>
 <br><b>
 Constraints on character values
 </b><br>
@@ -560,9 +565,10 @@ Unsupported escape sequences
 <P>
 In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
 handler and used to modify the case of following characters. By default, PCRE2
-does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
-is set, \U matches a "U" character, and \u can be used to define a character
-by code point, as described above.
+does not support these escape sequences in patterns. However, if either of the
+PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U matches a "U"
+character, and \u can be used to define a character by code point, as
+described above.
 </P>
 <br><b>
 Absolute and relative backreferences
@@ -3721,7 +3727,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
@@ -58,7 +58,8 @@ documentation. This document contains a quick-reference summary of the syntax.
 </P>
 <br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
 <P>
-This table applies to ASCII and Unicode environments.
+This table applies to ASCII and Unicode environments. An unrecognized escape
+sequence causes an error.
 <pre>
 \a alarm, that is, the BEL character (hex 07)
 \cx "control-x", where x is any ASCII printing character
@@ -70,12 +71,25 @@ This table applies to ASCII and Unicode environments.
 \0dd character with octal code 0dd
 \ddd character with octal code ddd, or backreference
 \o{ddd..} character with octal code ddd..
-\U "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
 \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
-\uhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
 \xhh character with hex code hh
 \x{hh..} character with hex code hh..
 </pre>
+If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
+following are also recognized:
+<pre>
+\U the character "U"
+\uhhhh character with hex code hhhh
+\u{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX
+</pre>
+When \x is not followed by {, from zero to two hexadecimal digits are read,
+but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
+recognized as a hexadecimal escape; otherwise it matches a literal "x".
+Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits
+or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
+matches a literal "u".
+</P>
+<P>
 Note that \0dd is always an octal code. The treatment of backslash followed by
 a non-zero digit is complicated; for details see the section
 <a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
@@ -86,13 +100,6 @@ also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
 supported in EBCDIC environments. Note that \N not followed by an opening
 curly bracket has a different meaning (see below).
 </P>
-<P>
-When \x is not followed by {, from zero to two hexadecimal digits are read,
-but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadecimal digits to
-be recognized as a hexadecimal escape; otherwise it matches a literal "x".
-Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits,
-it matches a literal "u".
-</P>
 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
 <P>
 <pre>
@@ -660,7 +667,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
@@ -609,6 +609,7 @@ for a description of the effects of these options.
 escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
 /x extended set PCRE2_EXTENDED
 /xx extended_more set PCRE2_EXTENDED_MORE
+extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
 firstline set PCRE2_FIRSTLINE
 literal set PCRE2_LITERAL
 match_line set PCRE2_EXTRA_MATCH_LINE
@@ -2075,7 +2076,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
doc/pcre2.txt: diff suppressed because it is too large
@@ -1,4 +1,4 @@
-.TH PCRE2_COMPILE 3 "16 June 2017" "PCRE2 10.30"
+.TH PCRE2_COMPILE 3 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -73,7 +73,13 @@ The option bits are:
 PCRE2 must be built with Unicode support (the default) in order to use
 PCRE2_UTF, PCRE2_UCP and related options.
 .P
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
+.\" HREF
+\fBpcre2_set_compile_extra_options\fP
+.\"
+function.
+.P
+The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 .P
 There is a complete description of the PCRE2 native API, with more detail on
@@ -1,4 +1,4 @@
-.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "21 September 2018" "PCRE2 10.33"
+.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -8,7 +8,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .PP
 .nf
 .B int pcre2_set_compile_extra_options(pcre2_compile_context *\fIccontext\fP,
-.B " PCRE2_SIZE \fIextra_options\fP);"
+.B " uint32_t \fIextra_options\fP);"
 .fi
 .
 .SH DESCRIPTION
@@ -21,6 +21,9 @@ options are:
 .\" JOIN
 PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \ex{d800} to \ex{dfff}
 in UTF-8 and UTF-32 modes
+.\" JOIN
+PCRE2_EXTRA_ALT_BSUX Extended alternate \eu, \eU, and \ex
+handling
 .\" JOIN
 PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as
 a literal following character
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "04 February 2019" "PCRE2 10.33"
+.TH PCRE2API 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1231,7 +1231,7 @@ are needed. The \fBpcre2_code_copy_with_tables()\fP provides this facility.
 Copies of both the code and the tables are made, with the new code pointing to
 the new tables. The memory for the new tables is automatically freed when
 \fBpcre2_code_free()\fP is called for the new copy of the compiled code. If
-\fBpcre2_code_copy_withy_tables()\fP is called with a NULL argument, it returns
+\fBpcre2_code_copy_with_tables()\fP is called with a NULL argument, it returns
 NULL.
 .P
 NOTE: When one of the matching functions is called, pointers to the compiled
@@ -1252,7 +1252,7 @@ below.
 .\"
 .P
 The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
-settings that affect the compilation. It should be zero if no options are
+settings that affect the compilation. It should be zero if none of them are
 required. The available options are described below. Some of them (in
 particular, those that are compatible with Perl, but some others as well) can
 also be set and unset from within the pattern (see the detailed description in
@@ -1267,8 +1267,9 @@ contents of the \fIoptions\fP argument specifies their settings at the start of
 compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
 options can be set at the time of matching as well as at compile time.
 .P
-Other, less frequently required compile-time parameters (for example, the
-newline setting) can be provided in a compile context (as described
+Some additional options and less frequently required compile-time parameters
+(for example, the newline setting) can be provided in a compile context (as
+described
 .\" HTML <a href="#compilecontext">
 .\" </a>
 above).
@@ -1325,6 +1326,11 @@ This code fragment shows a typical straightforward call to
 &erroffset, /* for error offset */
 NULL); /* no compile context */
 .sp
+.
+.
+.SS "Main compile options"
+.rs
+.sp
 The following names for option bits are defined in the \fBpcre2.h\fP header
 file:
 .sp
@@ -1361,6 +1367,16 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
 to match. By default, as in Perl, a hexadecimal number is always expected after
 \ex, but it may have zero, one, or two digits (so, for example, \exz matches a
 binary zero character followed by z).
+.P
+ECMAScript 6 added additional functionality to \eu. This can be accessed using
+the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
+.\" HTML <a href="#extracompileoptions">
+.\" </a>
+below).
+.\"
+Note that this alternative escape handling applies only to patterns. Neither of
+these options affects the processing of replacement strings passed to
+\fBpcre2_substitute()\fP.
 .sp
 PCRE2_ALT_CIRCUMFLEX
 .sp
@@ -1788,9 +1804,8 @@ characters with code points greater than 127.
 .SS "Extra compile options"
 .rs
 .sp
-Unlike the main compile-time options, the extra options are not saved with the
-compiled pattern. The option bits that can be set in a compile context by
-calling the \fBpcre2_set_compile_extra_options()\fP function are as follows:
+The option bits that can be set in a compile context by calling the
+\fBpcre2_set_compile_extra_options()\fP function are as follows:
 .sp
 PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 .sp
@@ -1813,6 +1828,14 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
 point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
 incorporated in the compiled pattern. However, they can only match subject
 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
+.sp
+PCRE2_EXTRA_ALT_BSUX
+.sp
+The original option PCRE2_ALT_BSUX causes PCRE2 to process \eU, \eu, and \ex in
+the way that ECMAScript (aka JavaScript) does. Additional functionality was
+defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
+PCRE2_ALT_BSUX, but in addition it recognizes \eu{hhh..} as a hexadecimal
+character code, where hhh.. is any number of hexadecimal digits.
 .sp
 PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
 .sp
@@ -3383,7 +3406,8 @@ capture groups and letters within \eQ...\eE quoted sequences.
 .P
 Note that case forcing sequences such as \eU...\eE do not nest. For example,
 the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
-effect.
+effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do
+not apply to replacement strings.
 .P
 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
 flexibility to capture group substitution. The syntax is similar to that used
@@ -3792,6 +3816,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "03 February 2019" "PCRE2 10.33"
+.TH PCRE2COMPAT 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@@ -33,8 +33,9 @@ non-newline character, and \eN{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set, \eU and \eu
-are interpreted as ECMAScript interprets them.
+generated by default. However, if either of the PCRE2_ALT_BSUX or
+PCRE2_EXTRA_ALT_BSUX options is set, \eU and \eu are interpreted as ECMAScript
+interprets them.
 .P
 5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
@@ -198,6 +199,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 February 2019
+Last updated: 12 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "04 February 2019" "PCRE2 10.33"
+.TH PCRE2PATTERN 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -373,12 +373,30 @@ environment, these escapes are as follows:
 \exhh character with hex code hh
 \ex{hhh..} character with hex code hhh..
 \eN{U+hhh..} character with Unicode hex code point hhh..
-\euhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 .sp
-There are some legacy applications where the escape sequence \er is expected to
-match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a
-pattern is converted to \en so that it matches a LF (linefeed) instead of a CR
-(carriage return) character.
+By default, after \ex that is not followed by {, from zero to two hexadecimal
+digits are read (letters can be in upper or lower case). Any number of
+hexadecimal digits may appear between \ex{ and }. If a character other than a
+hexadecimal digit appears between \ex{ and }, or if there is no terminating },
+an error occurs.
 .P
+Characters whose code points are less than 256 can be defined by either of the
+two syntaxes for \ex or by an octal sequence. There is no difference in the way
+they are handled. For example, \exdc is exactly the same as \ex{dc} or \e334.
+However, using the braced versions does make such sequences easier to read.
+.P
+Support is available for some ECMAScript (aka JavaScript) escape sequences via
+two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \ex followed
+by { is not recognized. Only if \ex is followed by two hexadecimal digits is it
+recognized as a character escape. Otherwise it is interpreted as a literal "x"
+character. In this mode, support for code points greater than 256 is provided
+by \eu, which must be followed by four hexadecimal digits; otherwise it is
+interpreted as a literal "u" character.
+.P
+PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
+\eu{hhh..} is recognized as the character specified by hexadecimal code point.
+There may be any number of hexadecimal digits. This syntax is from ECMAScript
+6.
+.P
 The \eN{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
 is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
@@ -386,6 +404,11 @@ is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
 Note that when \eN is not followed by an opening brace (curly bracket) it has
 an entirely different meaning, matching any character that is not a newline.
 .P
+There are some legacy applications where the escape sequence \er is expected to
+match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a
+pattern is converted to \en so that it matches a LF (linefeed) instead of a CR
+(carriage return) character.
+.P
 The precise effect of \ecx on ASCII characters is as follows: if x is a lower
 case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A),
@@ -477,25 +500,6 @@ for themselves. For example, outside a character class:
 Note that octal values of 100 or greater that are specified using this syntax
 must not be introduced by a leading zero, because no more than three octal
 digits are ever read.
-.P
-By default, after \ex that is not followed by {, from zero to two hexadecimal
-digits are read (letters can be in upper or lower case). Any number of
-hexadecimal digits may appear between \ex{ and }. If a character other than
-a hexadecimal digit appears between \ex{ and }, or if there is no terminating
-}, an error occurs.
-.P
-If the PCRE2_ALT_BSUX option is set, the interpretation of \ex is as just
-described only when it is followed by two hexadecimal digits. Otherwise, it
-matches a literal "x" character. In this mode, support for code points greater
-than 256 is provided by \eu, which must be followed by four hexadecimal digits;
-otherwise it matches a literal "u" character. This syntax makes PCRE2 behave
-like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
-supported.
-.P
-Characters whose value is less than 256 can be defined by either of the two
-syntaxes for \ex (or by \eu in PCRE2_ALT_BSUX mode). There is no difference in
-the way they are handled. For example, \exdc is exactly the same as \ex{dc} (or
-\eu00dc in PCRE2_ALT_BSUX mode).
 .
 .
 .SS "Constraints on character values"
@@ -534,9 +538,10 @@ character class, these sequences have different meanings.
 .sp
 In Perl, the sequences \eF, \el, \eL, \eu, and \eU are recognized by its string
 handler and used to modify the case of following characters. By default, PCRE2
-does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
-is set, \eU matches a "U" character, and \eu can be used to define a character
-by code point, as described above.
+does not support these escape sequences in patterns. However, if either of the
+PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \eU matches a "U"
+character, and \eu can be used to define a character by code point, as
+described above.
 .
 .
 .SS "Absolute and relative backreferences"
@@ -3758,6 +3763,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 04 February 2019
Last updated: 12 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi

@@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "03 February 2019" "PCRE2 10.33"
.TH PCRE2SYNTAX 3 "11 February 2019" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -22,7 +22,8 @@ documentation. This document contains a quick-reference summary of the syntax.
.SH "ESCAPED CHARACTERS"
.rs
.sp
This table applies to ASCII and Unicode environments.
This table applies to ASCII and Unicode environments. An unrecognized escape
sequence causes an error.
.sp
\ea alarm, that is, the BEL character (hex 07)
\ecx "control-x", where x is any ASCII printing character
@@ -34,12 +35,24 @@ This table applies to ASCII and Unicode environments.
\e0dd character with octal code 0dd
\eddd character with octal code ddd, or backreference
\eo{ddd..} character with octal code ddd..
\eU "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
\eN{U+hh..} character with Unicode code point hh.. (Unicode mode only)
\euhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
\exhh character with hex code hh
\ex{hh..} character with hex code hh..
.sp
If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
following are also recognized:
.sp
\eU the character "U"
\euhhhh character with hex code hhhh
\eu{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX
.sp
When \ex is not followed by {, from zero to two hexadecimal digits are read,
but in ALT_BSUX mode \ex must be followed by two hexadecimal digits to be
recognized as a hexadecimal escape; otherwise it matches a literal "x".
Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits
or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
matches a literal "u".
.P
Note that \e0dd is always an octal code. The treatment of backslash followed by
a non-zero digit is complicated; for details see the section
.\" HTML <a href="pcre2pattern.html#digitsafterbackslash">
@@ -54,12 +67,6 @@ documentation, where details of escape processing in EBCDIC environments are
also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not
supported in EBCDIC environments. Note that \eN not followed by an opening
curly bracket has a different meaning (see below).
.P
When \ex is not followed by {, from zero to two hexadecimal digits are read,
but if PCRE2_ALT_BSUX is set, \ex must be followed by two hexadecimal digits to
be recognized as a hexadecimal escape; otherwise it matches a literal "x".
Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits,
it matches a literal "u".
.
.
.SH "CHARACTER TYPES"
@@ -647,6 +654,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 03 February 2019
Last updated: 11 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi

@@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "03 February 2019" "PCRE 10.33"
.TH PCRE2TEST 1 "11 February 2019" "PCRE 10.33"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -568,6 +568,7 @@ for a description of the effects of these options.
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
@@ -2056,6 +2057,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 03 February 2019
Last updated: 11 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi

@@ -547,6 +547,7 @@ PATTERN MODIFIERS
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
@@ -1887,5 +1888,5 @@ AUTHOR

REVISION

Last updated: 03 February 2019
Last updated: 11 February 2019
Copyright (c) 1997-2019 University of Cambridge.

@@ -150,6 +150,7 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */
#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */
#define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */
#define PCRE2_EXTRA_ALT_BSUX 0x00000020u /* C */

/* These are for pcre2_jit_compile(). */


@@ -764,7 +764,7 @@ are allowed. */
#define PUBLIC_COMPILE_EXTRA_OPTIONS \
(PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS| \
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES|PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL| \
PCRE2_EXTRA_ESCAPED_CR_IS_LF)
PCRE2_EXTRA_ESCAPED_CR_IS_LF|PCRE2_EXTRA_ALT_BSUX)

/* Compile time error code numbers. They are given names so that they can more
easily be tracked. When a new number is added, the tables called eint1 and
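The hunk above adds PCRE2_EXTRA_ALT_BSUX to the mask of extra compile options that are allowed. A minimal sketch (not PCRE2's code) of treating such a mask as a whitelist follows: any set bit outside it marks an unknown option. The EX_* names and extra_options_are_valid() are hypothetical; the bit values are those defined in pcre2.h (the first two are not shown in this diff), and the composition of the "literal" subset is an assumption labeled in the code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the extra-option bits (values as in pcre2.h). */
#define EX_ALLOW_SURROGATE_ESCAPES 0x00000001u
#define EX_BAD_ESCAPE_IS_LITERAL   0x00000002u
#define EX_MATCH_WORD              0x00000004u
#define EX_MATCH_LINE              0x00000008u
#define EX_ESCAPED_CR_IS_LF        0x00000010u
#define EX_ALT_BSUX                0x00000020u

/* Assumption: the "literal" subset is the word/line matching pair. */
#define EX_LITERAL_MASK (EX_MATCH_WORD | EX_MATCH_LINE)

/* Same shape as PUBLIC_COMPILE_EXTRA_OPTIONS after this commit. */
#define EX_PUBLIC_MASK \
  (EX_LITERAL_MASK | EX_ALLOW_SURROGATE_ESCAPES | \
   EX_BAD_ESCAPE_IS_LITERAL | EX_ESCAPED_CR_IS_LF | EX_ALT_BSUX)

/* Hypothetical check: true when every set bit is a known public option. */
static bool extra_options_are_valid(uint32_t extra_options)
{
  return (extra_options & ~EX_PUBLIC_MASK) == 0;
}
```

With the mask as it stood before this commit (no EX_ALT_BSUX term), the value 0x20u would fall outside it, which is why the new #define and the mask change travel together.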
@@ -1459,7 +1459,8 @@ Returns: zero => a data character

int
PRIV(check_escape)(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t *chptr,
int *errorcodeptr, uint32_t options, BOOL isclass, compile_block *cb)
int *errorcodeptr, uint32_t options, uint32_t extra_options, BOOL isclass,
compile_block *cb)
{
BOOL utf = (options & PCRE2_UTF) != 0;
PCRE2_SPTR ptr = *ptrptr;
@@ -1495,8 +1496,7 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
if (i > 0)
{
c = (uint32_t)i;
if (cb != NULL && c == CHAR_CR &&
(cb->cx->extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
if (c == CHAR_CR && (extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
c = CHAR_LF;
}
else /* Negative table entry */
@@ -1551,22 +1551,28 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)

/* Escapes that need further processing, including those that are unknown, have
a zero entry in the lookup table. When called from pcre2_substitute(), only \c,
\o, and \x are recognized (and \u when BSUX is set). */
\o, and \x are recognized (\u and \U can never appear as they are used for case
forcing). */

else
{
int s;
PCRE2_SPTR oldptr;
BOOL overflow;
int s;
BOOL alt_bsux =
((options & PCRE2_ALT_BSUX) | (extra_options & PCRE2_EXTRA_ALT_BSUX)) != 0;

/* Filter calls from pcre2_substitute(). */

if (cb == NULL && c != CHAR_c && c != CHAR_o && c != CHAR_x &&
(c != CHAR_u || (options & PCRE2_ALT_BSUX) != 0))
if (cb == NULL)
{
*errorcodeptr = ERR3;
return 0;
}
if (c != CHAR_c && c != CHAR_o && c != CHAR_x)
{
*errorcodeptr = ERR3;
return 0;
}
alt_bsux = FALSE; /* Do not modify \x handling */
}

switch (c)
{
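The prologue above merges the two ways of requesting ALT_BSUX mode into one flag and, for calls from pcre2_substitute() (where cb is NULL), accepts only \c, \o and \x and then turns the flag off so that \x keeps its Perl meaning in replacement strings. A minimal sketch of that decision logic follows, under stated assumptions: escape_prologue() and the OPT_/EX_ names are hypothetical, the bit values are those from pcre2.h, and plain ASCII letters stand in for the CHAR_c, CHAR_o and CHAR_x macros the real code uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPT_ALT_BSUX 0x00000002u  /* PCRE2_ALT_BSUX, main options word */
#define EX_ALT_BSUX  0x00000020u  /* PCRE2_EXTRA_ALT_BSUX, extra options */

/* Hypothetical restatement of the prologue. Returns false when the escape
   letter c is rejected at this stage (ERR3 in the real code); otherwise
   stores in *alt_bsux the mode the rest of the escape switch would use.
   Later cases can still fail, e.g. ERR37 for \u when the mode is off. */
static bool escape_prologue(char c, uint32_t options, uint32_t extra_options,
                            bool from_substitute, bool *alt_bsux)
{
  /* Either option word can switch ALT_BSUX mode on. */
  *alt_bsux = ((options & OPT_ALT_BSUX) | (extra_options & EX_ALT_BSUX)) != 0;

  if (from_substitute)               /* cb == NULL in the real code */
  {
    if (c != 'c' && c != 'o' && c != 'x') return false;
    *alt_bsux = false;               /* keep Perl-style \x handling */
  }
  return true;
}
```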
@@ -1579,40 +1585,74 @@ else
*errorcodeptr = ERR37;
break;

/* \u is unrecognized when PCRE2_ALT_BSUX is not set. When it is treated
specially, \u must be followed by four hex digits. Otherwise it is a
lowercase u letter. */
/* \u is unrecognized when neither PCRE2_ALT_BSUX nor PCRE2_EXTRA_ALT_BSUX
is set. Otherwise, \u must be followed by exactly four hex digits or, if
PCRE2_EXTRA_ALT_BSUX is set, by any number of hex digits in braces.
Otherwise it is a lowercase u letter. This gives some compatibility with
ECMAScript (aka JavaScript). */

case CHAR_u:
if ((options & PCRE2_ALT_BSUX) == 0) *errorcodeptr = ERR37; else
if (!alt_bsux) *errorcodeptr = ERR37; else
{
uint32_t xc;
if (ptrend - ptr < 4) break; /* Less than 4 chars */
if ((cc = XDIGIT(ptr[0])) == 0xff) break; /* Not a hex digit */
if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */
cc = (cc << 4) | xc;
if ((xc = XDIGIT(ptr[2])) == 0xff) break; /* Not a hex digit */
cc = (cc << 4) | xc;
if ((xc = XDIGIT(ptr[3])) == 0xff) break; /* Not a hex digit */
c = (cc << 4) | xc;
ptr += 4;

if (*ptr == CHAR_LEFT_CURLY_BRACKET &&
(extra_options & PCRE2_EXTRA_ALT_BSUX) != 0)
{
PCRE2_SPTR hptr = ptr + 1;
cc = 0;

while (hptr < ptrend && (xc = XDIGIT(*hptr)) != 0xff)
{
if ((cc & 0xf0000000) != 0) /* Test for 32-bit overflow */
{
*errorcodeptr = ERR77;
ptr = hptr; /* Show where */
break; /* *hptr != } will cause another break below */
}
cc = (cc << 4) | xc;
hptr++;
}

if (hptr == ptr + 1 || /* No hex digits */
hptr >= ptrend || /* Hit end of input */
*hptr != CHAR_RIGHT_CURLY_BRACKET) /* No } terminator */
break; /* Hex escape not recognized */

c = cc; /* Accept the code point */
ptr = hptr + 1;
}

else /* Must be exactly 4 hex digits */
{
if (ptrend - ptr < 4) break; /* Less than 4 chars */
if ((cc = XDIGIT(ptr[0])) == 0xff) break; /* Not a hex digit */
if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */
cc = (cc << 4) | xc;
if ((xc = XDIGIT(ptr[2])) == 0xff) break; /* Not a hex digit */
cc = (cc << 4) | xc;
if ((xc = XDIGIT(ptr[3])) == 0xff) break; /* Not a hex digit */
c = (cc << 4) | xc;
ptr += 4;
}

if (utf)
{
if (c > 0x10ffffU) *errorcodeptr = ERR77;
else
if (c >= 0xd800 && c <= 0xdfff &&
(cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
*errorcodeptr = ERR73;
(extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
*errorcodeptr = ERR73;
}
else if (c > MAX_NON_UTF_CHAR) *errorcodeptr = ERR77;
}
break;

/* \U is unrecognized unless PCRE2_ALT_BSUX is set, in which case it is an
upper case letter. */
/* \U is unrecognized unless PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set,
in which case it is an upper case letter. */

case CHAR_U:
if ((options & PCRE2_ALT_BSUX) == 0) *errorcodeptr = ERR37;
if (!alt_bsux) *errorcodeptr = ERR37;
break;

/* In a character class, \g is just a literal "g". Outside a character
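The new \u case accepts either exactly four hex digits or, when PCRE2_EXTRA_ALT_BSUX is set, any number of hex digits in braces; anything else leaves \u as a literal "u". The sketch below restates that logic on a plain char buffer under stated assumptions: parse_bsu(), hexval() and the u_result values are hypothetical names, hexval() plays the role of the XDIGIT() lookup (0xff for a non-hex character), and the caller is assumed to have already established that ALT_BSUX mode is on. The hunk as shown tests *ptr for a brace without a preceding end-of-input check; the sketch tests p < end first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of a hex digit, or 0xff if ch is not one (the role XDIGIT() plays). */
static unsigned hexval(char ch)
{
  if (ch >= '0' && ch <= '9') return (unsigned)(ch - '0');
  if (ch >= 'a' && ch <= 'f') return (unsigned)(ch - 'a' + 10);
  if (ch >= 'A' && ch <= 'F') return (unsigned)(ch - 'A' + 10);
  return 0xff;
}

typedef enum { U_LITERAL, U_CODEPOINT, U_OVERFLOW } u_result;

/* Hypothetical sketch of \u in ALT_BSUX mode. p points just after "\u" and
   end is one past the last pattern character. braces_ok corresponds to
   PCRE2_EXTRA_ALT_BSUX. On U_CODEPOINT, *cp is the value and *used is the
   number of characters consumed; U_LITERAL means \u is a literal "u". */
static u_result parse_bsu(const char *p, const char *end, bool braces_ok,
                          uint32_t *cp, size_t *used)
{
  if (braces_ok && p < end && *p == '{')
  {
    const char *h = p + 1;
    uint32_t v = 0;
    unsigned d;
    while (h < end && (d = hexval(*h)) != 0xff)
    {
      if ((v & 0xf0000000u) != 0) return U_OVERFLOW;  /* shift would lose bits */
      v = (v << 4) | d;
      h++;
    }
    /* Need at least one digit and a closing brace inside the pattern. */
    if (h == p + 1 || h >= end || *h != '}') return U_LITERAL;
    *cp = v;
    *used = (size_t)(h + 1 - p);
    return U_CODEPOINT;
  }

  /* Otherwise exactly four hex digits are required. */
  if (end - p < 4) return U_LITERAL;
  uint32_t v = 0;
  for (int i = 0; i < 4; i++)
  {
    unsigned d = hexval(p[i]);
    if (d == 0xff) return U_LITERAL;
    v = (v << 4) | d;
  }
  *cp = v;
  *used = 4;
  return U_CODEPOINT;
}
```

Because the overflow test looks at the top four bits before each shift, leading zeros never trip it, which is consistent with the new test in which /^\u{0000000000010ffff}/utf,extra_alt_bsux matches \x{10ffff}. Range checks on the decoded value are a separate step.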
@@ -1791,8 +1831,8 @@ else
}
else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
{
if (utf && c >= 0xd800 && c <= 0xdfff && (cb == NULL ||
(cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0))
if (utf && c >= 0xd800 && c <= 0xdfff &&
(extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
{
ptr--;
*errorcodeptr = ERR73;
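After an escape is decoded, the hunks above validate the code point, and they now read the surrogate permission from the extra_options argument instead of cb->cx, so calls from pcre2_substitute() (which pass cb as NULL and code->extra_options as the argument) see the compile-time setting. The sketch below gathers those checks in one place. check_code_point() and cp_status are hypothetical, and max_non_utf is a parameter standing in for MAX_NON_UTF_CHAR, whose value depends on the code unit width of the build. The ERR73 case is the one the new test output reports as "error 173 ... disallowed Unicode code point" for /\ud800/.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_ALLOW_SURROGATE_ESCAPES 0x00000001u  /* value as in pcre2.h */

typedef enum { CP_OK, CP_TOO_LARGE, CP_SURROGATE } cp_status;

/* Hypothetical helper mirroring the checks made after \u, \o{} or \x{}
   has produced a value c. */
static cp_status check_code_point(uint32_t c, bool utf, uint32_t extra_options,
                                  uint32_t max_non_utf)
{
  if (utf)
  {
    if (c > 0x10ffffu) return CP_TOO_LARGE;                 /* ERR77 */
    if (c >= 0xd800u && c <= 0xdfffu &&
        (extra_options & EX_ALLOW_SURROGATE_ESCAPES) == 0)
      return CP_SURROGATE;                                  /* ERR73 */
    return CP_OK;
  }
  return (c > max_non_utf) ? CP_TOO_LARGE : CP_OK;         /* ERR77 */
}
```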
@@ -1806,11 +1846,11 @@ else
}
break;

/* \x is complicated. When PCRE2_ALT_BSUX is set, \x must be followed by
two hexadecimal digits. Otherwise it is a lowercase x letter. */
/* When PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set, \x must be followed
by two hexadecimal digits. Otherwise it is a lowercase x letter. */

case CHAR_x:
if ((options & PCRE2_ALT_BSUX) != 0)
if (alt_bsux)
{
uint32_t xc;
if (ptrend - ptr < 2) break; /* Less than 2 characters */
@@ -1818,9 +1858,9 @@ else
if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */
c = (cc << 4) | xc;
ptr += 2;
} /* End PCRE2_ALT_BSUX handling */
}

/* Handle \x in Perl's style. \x{ddd} is a character number which can be
/* Handle \x in Perl's style. \x{ddd} is a character code which can be
greater than 0xff in UTF-8 or non-8bit mode, but only if the ddd are hex
digits. If not, { used to be treated as a data character. However, Perl
seems to read hex digits up to the first non-such, and ignore the rest, so
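In ALT_BSUX mode, \x is recognized only when two hex digits follow; otherwise it is a literal "x", and the brace form is not available. A minimal sketch of that rule on a char buffer follows; parse_alt_bsux_x() and hexval() are hypothetical names, and hexval() stands in for XDIGIT().

```c
#include <stddef.h>
#include <stdint.h>

/* Value of a hex digit, or 0xff if ch is not one (the role XDIGIT() plays). */
static unsigned hexval(char ch)
{
  if (ch >= '0' && ch <= '9') return (unsigned)(ch - '0');
  if (ch >= 'a' && ch <= 'f') return (unsigned)(ch - 'a' + 10);
  if (ch >= 'A' && ch <= 'F') return (unsigned)(ch - 'A' + 10);
  return 0xff;
}

/* Hypothetical sketch of \x in ALT_BSUX mode. p points just after "\x".
   Returns 2 and stores the character code when two hex digits follow;
   returns 0 otherwise, meaning \x stands for a literal "x" (the \x{...}
   form is not recognized in this mode). */
static size_t parse_alt_bsux_x(const char *p, const char *end, uint32_t *cp)
{
  if (end - p < 2) return 0;                 /* fewer than 2 characters */
  unsigned hi = hexval(p[0]);
  unsigned lo = hexval(p[1]);
  if (hi == 0xff || lo == 0xff) return 0;    /* not two hex digits */
  *cp = (hi << 4) | lo;
  return 2;
}
```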
@@ -1864,8 +1904,8 @@ else
}
else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
{
if (utf && c >= 0xd800 && c <= 0xdfff && (cb == NULL ||
(cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0))
if (utf && c >= 0xd800 && c <= 0xdfff &&
(extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
{
ptr--;
*errorcodeptr = ERR73;
@@ -2438,6 +2478,7 @@ uint32_t *parsed_pattern = cb->parsed_pattern;
uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
uint32_t meta_quantifier = 0;
uint32_t add_after_mark = 0;
uint32_t extra_options = cb->cx->extra_options;
uint16_t nest_depth = 0;
int after_manual_callout = 0;
int expect_cond_assert = 0;
@@ -2461,12 +2502,12 @@ nest_save *top_nest, *end_nests;
/* Insert leading items for word and line matching (features provided for the
benefit of pcre2grep). */

if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
{
*parsed_pattern++ = META_CIRCUMFLEX;
*parsed_pattern++ = META_NOCAPTURE;
}
else if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
{
*parsed_pattern++ = META_ESCAPE + ESC_b;
*parsed_pattern++ = META_NOCAPTURE;
@@ -2631,7 +2672,7 @@ while (ptr < ptrend)
if ((options & PCRE2_ALT_VERBNAMES) != 0)
{
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
FALSE, cb);
cb->cx->extra_options, FALSE, cb);
if (errorcode != 0) goto FAILED;
}
else escape = 0; /* Treat all as literal */
@@ -2821,11 +2862,11 @@ while (ptr < ptrend)
case CHAR_BACKSLASH:
tempptr = ptr;
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
FALSE, cb);
cb->cx->extra_options, FALSE, cb);
if (errorcode != 0)
{
ESCAPE_FAILED:
if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
goto FAILED;
ptr = tempptr;
if (ptr >= ptrend) c = CHAR_BACKSLASH; else
@@ -3382,12 +3423,12 @@ while (ptr < ptrend)
else
{
tempptr = ptr;
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode,
options, TRUE, cb);
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
cb->cx->extra_options, TRUE, cb);

if (errorcode != 0)
{
if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
goto FAILED;
ptr = tempptr;
if (ptr >= ptrend) c = CHAR_BACKSLASH; else
@@ -4545,12 +4586,12 @@ parsed_pattern = manage_callouts(ptr, &previous_callout, auto_callout,
/* Insert trailing items for word and line matching (features provided for the
benefit of pcre2grep). */

if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
{
*parsed_pattern++ = META_KET;
*parsed_pattern++ = META_DOLLAR;
}
else if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
{
*parsed_pattern++ = META_KET;
*parsed_pattern++ = META_ESCAPE + ESC_b;

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.

Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
New API code Copyright (c) 2016-2019 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -1942,7 +1942,7 @@ is available. */
extern int _pcre2_auto_possessify(PCRE2_UCHAR *, BOOL,
const compile_block *);
extern int _pcre2_check_escape(PCRE2_SPTR *, PCRE2_SPTR, uint32_t *,
int *, uint32_t, BOOL, compile_block *);
int *, uint32_t, uint32_t, BOOL, compile_block *);
extern PCRE2_SPTR _pcre2_extuni(uint32_t, PCRE2_SPTR, PCRE2_SPTR, PCRE2_SPTR,
BOOL, int *);
extern PCRE2_SPTR _pcre2_find_bracket(PCRE2_SPTR, BOOL, int);

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.

Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
New API code Copyright (c) 2016-2019 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -129,7 +129,7 @@ for (; ptr < ptrend; ptr++)

ptr += 1; /* Must point after \ */
erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
code->overall_options, FALSE, NULL);
code->overall_options, code->extra_options, FALSE, NULL);
ptr -= 1; /* Back to last code unit of escape */
if (errorcode != 0)
{
@@ -774,7 +774,7 @@ do

ptr++; /* Point after \ */
rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
code->overall_options, FALSE, NULL);
code->overall_options, code->extra_options, FALSE, NULL);
if (errorcode != 0) goto BADESCAPE;

switch(rc)

@@ -646,6 +646,7 @@ static modstruct modlist[] = {
{ "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
{ "extra_alt_bsux", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALT_BSUX, CO(extra_options) },
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
{ "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) },
@@ -4189,10 +4190,11 @@ show_compile_extra_options(uint32_t options, const char *before,
const char *after)
{
if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
else fprintf(outfile, "%s%s%s%s%s%s%s",
else fprintf(outfile, "%s%s%s%s%s%s%s%s",
before,
((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
((options & PCRE2_EXTRA_ALT_BSUX) != 0)? " extra_alt_bsux" : "",
((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "",

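This hunk has to add one %s to the fprintf() format string for the new option name, and the format and the argument list must stay in step by hand. A table-driven alternative, sketched here as a possible design and not as pcre2test's code, removes that coupling: a new option becomes one new table row. format_extra_options() and extra_names are hypothetical; the bit values are those from pcre2.h, and the names and their order are those printed in the hunk. Writing into a caller-supplied buffer (which must have a non-zero size) instead of outfile is an assumption made so the result can be checked, and the before/after arguments are omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit values as in pcre2.h; names and order as printed in the hunk. */
static const struct { uint32_t bit; const char *name; } extra_names[] = {
  { 0x00000001u, "allow_surrogate_escapes" },
  { 0x00000002u, "bad_escape_is_literal" },
  { 0x00000020u, "extra_alt_bsux" },
  { 0x00000004u, "match_word" },
  { 0x00000008u, "match_line" },
  { 0x00000010u, "escaped_cr_is_lf" },
};

/* Hypothetical table-driven variant: writes " name" for each set bit, or
   " <none>" when no bit is set. Output is truncated if buf is too small. */
static void format_extra_options(uint32_t options, char *buf, size_t size)
{
  size_t len = 0;
  buf[0] = '\0';
  if (options == 0)
  {
    snprintf(buf, size, " <none>");
    return;
  }
  for (size_t i = 0; i < sizeof extra_names / sizeof extra_names[0]; i++)
  {
    if ((options & extra_names[i].bit) == 0) continue;
    int n = snprintf(buf + len, size - len, " %s", extra_names[i].name);
    if (n < 0 || (size_t)n >= size - len) break;   /* buffer full */
    len += (size_t)n;
  }
}
```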
@@ -2408,13 +2408,13 @@
\= Expect no match
cat

/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
cat

/TA]/
The ACTA] comes

/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/TA]/allow_empty_class,match_unset_backref,dupnames
The ACTA] comes

/(?2)[]a()b](abc)/
@@ -2446,25 +2446,25 @@

/a[^]b/

/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[]b/allow_empty_class,match_unset_backref,dupnames
\= Expect no match
ab

/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[]+b/allow_empty_class,match_unset_backref,dupnames
\= Expect no match
ab

/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[]*+b/allow_empty_class,match_unset_backref,dupnames
\= Expect no match
ab

/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[^]b/allow_empty_class,match_unset_backref,dupnames
aXb
a\nb
\= Expect no match
ab

/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[^]+b/allow_empty_class,match_unset_backref,dupnames
aXb
a\nX\nXb
\= Expect no match
@@ -2903,10 +2903,10 @@
xxxxabcde\=ps
xxxxabcde\=ph

/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
cat

/(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames
cat

/(\3)(\1)(a)/I
@@ -3418,6 +3418,14 @@
aU0041z
\= Expect no match
aAz

/^\u{7a}/alt_bsux
u{7a}
\= Expect no match
zoo

/^\u{7a}/extra_alt_bsux
zoo

/(?(?=c)c|d)++Y/B


@@ -333,13 +333,13 @@

/[[:a\x{100}b:]]/utf

/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
/a[^]b/utf,allow_empty_class,match_unset_backref
a\x{1234}b
a\nb
\= Expect no match
ab

/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
/a[^]+b/utf,allow_empty_class,match_unset_backref
aXb
a\nX\nX\x{1234}b
\= Expect no match
@@ -814,6 +814,9 @@

/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref

/^\u{0000000000010ffff}/utf,extra_alt_bsux
\x{10ffff}

/^a+[a\x{200}]/B,utf
aa


@@ -8774,7 +8774,7 @@ No match
cat
No match

/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
cat
0: a
1:
@@ -8785,7 +8785,7 @@ No match
The ACTA] comes
0: TA]

/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/TA]/allow_empty_class,match_unset_backref,dupnames
The ACTA] comes
0: TA]

@@ -8833,22 +8833,22 @@ Failed: error 106 at offset 4: missing terminating ] for character class
/a[^]b/
Failed: error 106 at offset 5: missing terminating ] for character class

/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[]b/allow_empty_class,match_unset_backref,dupnames
\= Expect no match
ab
No match

/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[]+b/allow_empty_class,match_unset_backref,dupnames
\= Expect no match
ab
No match

/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[]*+b/allow_empty_class,match_unset_backref,dupnames
\= Expect no match
ab
No match

/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[^]b/allow_empty_class,match_unset_backref,dupnames
aXb
0: aXb
a\nb
@@ -8857,7 +8857,7 @@ No match
ab
No match

/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/a[^]+b/allow_empty_class,match_unset_backref,dupnames
aXb
0: aXb
a\nX\nXb
@@ -9971,17 +9971,17 @@ Partial match: abca
xxxxabcde\=ph
Partial match: abcde

/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
cat
0: a
1:
2:
3: a

/(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames
Capture group count = 3
Max back reference = 3
Options: alt_bsux allow_empty_class dupnames match_unset_backref
Options: allow_empty_class dupnames match_unset_backref
Last code unit = 'a'
Subject length lower bound = 1
cat
@@ -11364,6 +11364,17 @@ No match
\= Expect no match
aAz
No match

/^\u{7a}/alt_bsux
u{7a}
0: u{7a}
\= Expect no match
zoo
No match

/^\u{7a}/extra_alt_bsux
zoo
0: z

/(?(?=c)c|d)++Y/B
------------------------------------------------------------------

@@ -798,7 +798,7 @@ No match
/[[:a\x{100}b:]]/utf
Failed: error 130 at offset 3: unknown POSIX class name

/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
/a[^]b/utf,allow_empty_class,match_unset_backref
a\x{1234}b
0: a\x{1234}b
a\nb
@@ -807,7 +807,7 @@ Failed: error 130 at offset 3: unknown POSIX class name
ab
No match

/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
/a[^]+b/utf,allow_empty_class,match_unset_backref
aXb
0: aXb
a\nX\nX\x{1234}b
@@ -1734,6 +1734,10 @@ No match
/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)

/^\u{0000000000010ffff}/utf,extra_alt_bsux
\x{10ffff}
0: \x{10ffff}

/^a+[a\x{200}]/B,utf
------------------------------------------------------------------
Bra