Add support for \N{U+dd...}, for ASCII and Unicode modes only.
parent 775481293a
commit e9aa3c0a21
@ -130,6 +130,8 @@ present.
|
|||
28. A (*MARK) name was not being passed back for positive assertions that were
|
||||
terminated by (*ACCEPT).
|
||||
|
||||
29. Add support for \N{U+dddd}, but not in EBCDIC environments.
|
||||
|
||||
|
||||
Version 10.31 12-February-2018
|
||||
------------------------------
|
||||
|
|
|
@ -249,10 +249,11 @@ is used.
|
|||
<P>
|
||||
The newline convention affects where the circumflex and dollar assertions are
|
||||
true. It also affects the interpretation of the dot metacharacter when
|
||||
PCRE2_DOTALL is not set, and the behaviour of \N. However, it does not affect
|
||||
what the \R escape sequence matches. By default, this is any Unicode newline
|
||||
sequence, for Perl compatibility. However, this can be changed; see the next
|
||||
section and the description of \R in the section entitled
|
||||
PCRE2_DOTALL is not set, and the behaviour of \N when not followed by an
|
||||
opening brace. However, it does not affect what the \R escape sequence
|
||||
matches. By default, this is any Unicode newline sequence, for Perl
|
||||
compatibility. However, this can be changed; see the next section and the
|
||||
description of \R in the section entitled
|
||||
<a href="#newlineseq">"Newline sequences"</a>
|
||||
below. A change of \R setting can be combined with a change of newline
|
||||
convention.
|
||||
|
@ -394,8 +395,15 @@ these escapes are as follows:
|
|||
\o{ddd..} character with octal code ddd..
|
||||
\xhh character with hex code hh
|
||||
\x{hhh..} character with hex code hhh.. (default mode)
|
||||
\N{U+hhh..} character with Unicode code point hhh..
|
||||
\uhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
|
||||
</pre>
|
||||
Note that when \N is not followed by an opening brace (curly bracket) it has
|
||||
an entirely different meaning, matching any character that is not a newline.
|
||||
Perl also uses \N{name} to specify characters by Unicode name; PCRE2 does not
|
||||
support this.
|
||||
</P>
|
||||
<P>
|
||||
The precise effect of \cx on ASCII characters is as follows: if x is a lower
|
||||
case letter, it is converted to upper case. Then bit 6 of the character (hex
|
||||
40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
|
||||
|
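The three readings of \N described in this hunk (bare \N, \N{U+hhh..}, and Perl's \N{name}) can be sketched as a small classifier. This is only an illustration in Python, not PCRE2's parser: the function name and the returned labels are hypothetical, and details such as limits on the number of hex digits are not modelled.

```python
import re

# Hypothetical helper, not part of PCRE2. It looks at the text that follows
# a "\N" in a pattern and reports which of the documented readings applies.
_CODE_POINT = re.compile(r"\{U\+([0-9A-Fa-f]+)\}")


def classify_after_backslash_n(rest: str):
    """Classify the pattern text that comes straight after '\\N'."""
    if not rest.startswith("{"):
        # No opening brace: \N matches any character that is not a newline.
        return ("non-newline", None)
    m = _CODE_POINT.match(rest)
    if m:
        # \N{U+hhh..}: a character given by its Unicode code point.
        return ("code-point", int(m.group(1), 16))
    # Anything else in braces, such as Perl's \N{name}, is not supported.
    return ("unsupported", None)


if __name__ == "__main__":
    for tail in ["{U+41}", "abc", "{LATIN SMALL LETTER A}"]:
        print(repr(tail), classify_after_backslash_n(tail))
```

Keeping the brace check first mirrors the wording of the note: the presence or absence of the opening brace alone decides whether \N is a character type or a code point escape.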
@ -404,14 +412,14 @@ code unit following \c has a value less than 32 or greater than 126, a
|
|||
compile-time error occurs.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE2 is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
|
||||
generate the appropriate EBCDIC code values. The \c escape is processed
|
||||
as specified for Perl in the <b>perlebcdic</b> document. The only characters
|
||||
that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
|
||||
other character provokes a compile-time error. The sequence \c@ encodes
|
||||
character code 0; after \c the letters (in either case) encode characters 1-26
|
||||
(hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex
|
||||
1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F).
|
||||
When PCRE2 is compiled in EBCDIC mode, \N{U+hhh..} is not supported. \a, \e,
|
||||
\f, \n, \r, and \t generate the appropriate EBCDIC code values. The \c
|
||||
escape is processed as specified for Perl in the <b>perlebcdic</b> document. The
|
||||
only characters that are allowed after \c are A-Z, a-z, or one of @, [, \, ],
|
||||
^, _, or ?. Any other character provokes a compile-time error. The sequence
|
||||
\c@ encodes character code 0; after \c the letters (in either case) encode
|
||||
characters 1-26 (hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31
|
||||
(hex 1B to hex 1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F).
|
||||
</P>
|
||||
<P>
|
||||
Thus, apart from \c?, these escapes generate the same character code values as
|
||||
|
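The ASCII rule for \cx quoted in the context lines (convert a lower case letter to upper case, then invert bit 6, hex 40) is short enough to write out. The following Python sketch uses a hypothetical function name and raises ValueError where the documentation says a compile-time error occurs; it covers the ASCII case only, not the EBCDIC rules.

```python
def control_escape(x: str) -> int:
    """Return the character code that \\cx yields for one ASCII character x."""
    code = ord(x)
    # The documentation says a code unit below 32 or above 126 is an error.
    if code < 32 or code > 126:
        raise ValueError("character after \\c must have a code in 32..126")
    # Lower case letters are converted to upper case first.
    if "a" <= x <= "z":
        code = ord(x.upper())
    # Then bit 6 (hex 40) is inverted.
    return code ^ 0x40


if __name__ == "__main__":
    for ch in "AZa{;":
        print(f"\\c{ch} -> hex {control_escape(ch):02X}")
```

Because the operation is a bit inversion rather than a subtraction, characters below hex 40 move up (\c; gives hex 7B) while characters above it move down (\c{ gives hex 3B).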
@ -443,9 +451,9 @@ to be unambiguously specified.
|
|||
</P>
|
||||
<P>
|
||||
For greater clarity and unambiguity, it is best to avoid following \ by a
|
||||
digit greater than zero. Instead, use \o{} or \x{} to specify character
|
||||
numbers, and \g{} to specify backreferences. The following paragraphs
|
||||
describe the old, ambiguous syntax.
|
||||
digit greater than zero. Instead, use \o{} or \x{} to specify numerical
|
||||
character code points, and \g{} to specify backreferences. The following
|
||||
paragraphs describe the old, ambiguous syntax.
|
||||
</P>
|
||||
<P>
|
||||
The handling of a backslash followed by a digit other than 0 is complicated,
|
||||
|
@ -528,10 +536,10 @@ and outside character classes. In addition, inside a character class, \b is
|
|||
interpreted as the backspace character (hex 08).
|
||||
</P>
|
||||
<P>
|
||||
\N is not allowed in a character class. \B, \R, and \X are not special
|
||||
inside a character class. Like other unrecognized alphabetic escape sequences,
|
||||
they cause an error. Outside a character class, these sequences have different
|
||||
meanings.
|
||||
When not followed by an opening brace, \N is not allowed in a character class.
|
||||
\B, \R, and \X are not special inside a character class. Like other
|
||||
unrecognized alphabetic escape sequences, they cause an error. Outside a
|
||||
character class, these sequences have different meanings.
|
||||
</P>
|
||||
<br><b>
|
||||
Unsupported escape sequences
|
||||
|
@ -577,6 +585,7 @@ Another use of backslash is for specifying generic character types:
|
|||
\D any character that is not a decimal digit
|
||||
\h any horizontal white space character
|
||||
\H any character that is not a horizontal white space character
|
||||
\N any character that is not a newline
|
||||
\s any white space character
|
||||
\S any character that is not a white space character
|
||||
\v any vertical white space character
|
||||
|
@ -584,11 +593,14 @@ Another use of backslash is for specifying generic character types:
|
|||
\w any "word" character
|
||||
\W any "non-word" character
|
||||
</pre>
|
||||
There is also the single sequence \N, which matches a non-newline character.
|
||||
This is the same as
|
||||
The \N escape sequence has the same meaning as
|
||||
<a href="#fullstopdot">the "." metacharacter</a>
|
||||
when PCRE2_DOTALL is not set. Perl also uses \N to match characters by name;
|
||||
PCRE2 does not support this.
|
||||
when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the
|
||||
meaning of \N. Note that when \N is followed by an opening brace it has a
|
||||
different meaning. See the section entitled
|
||||
<a href="#digitsafterbackslash">"Non-printing characters"</a>
|
||||
above for details. Perl also uses \N{name} to specify characters by Unicode
|
||||
name; PCRE2 does not support this.
|
||||
</P>
|
||||
<P>
|
||||
Each pair of lower and upper case escape sequences partitions the complete set
|
||||
|
@ -1297,9 +1309,15 @@ dollar, the only relationship being that they both involve newlines. Dot has no
|
|||
special meaning in a character class.
|
||||
</P>
|
||||
<P>
|
||||
The escape sequence \N behaves like a dot, except that it is not affected by
|
||||
the PCRE2_DOTALL option. In other words, it matches any character except one
|
||||
that signifies the end of a line. Perl also uses \N to match characters by
|
||||
The escape sequence \N when not followed by an opening brace behaves like a
|
||||
dot, except that it is not affected by the PCRE2_DOTALL option. In other words,
|
||||
it matches any character except one that signifies the end of a line.
|
||||
</P>
|
||||
<P>
|
||||
When \N is followed by an opening brace it has a different meaning. See the
|
||||
section entitled
|
||||
<a href="#digitsafterbackslash">"Non-printing characters"</a>
|
||||
above for details. Perl also uses \N{name} to specify characters by Unicode
|
||||
name; PCRE2 does not support this.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">MATCHING A SINGLE CODE UNIT</a><br>
|
||||
|
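The difference between dot and bare \N can be illustrated without PCRE2 itself. The sketch below uses Python's re module as a stand-in and emulates \N with the class [^\n], which assumes a newline convention of LF only; it is an analogy, not a demonstration of PCRE2's own behaviour.

```python
import re

SUBJECT = "a\nb"

# Dot without DOTALL: does not match the newline between "a" and "b".
dot_default = re.compile(r"a.b")

# Dot with DOTALL: matches any character, including the newline.
dot_dotall = re.compile(r"a.b", re.DOTALL)

# Emulation of \N (assumption: newline is LF only). Setting DOTALL makes
# no difference, because the class never matches "\n".
non_newline = re.compile(r"a[^\n]b", re.DOTALL)

if __name__ == "__main__":
    print("dot, default :", bool(dot_default.fullmatch(SUBJECT)))
    print("dot, DOTALL  :", bool(dot_dotall.fullmatch(SUBJECT)))
    print("\\N emulation :", bool(non_newline.fullmatch(SUBJECT)))
```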
@ -1385,10 +1403,11 @@ string, and therefore it fails if the current pointer is at the end of the
|
|||
string.
|
||||
</P>
|
||||
<P>
|
||||
When caseless matching is set, any letters in a class represent both their
|
||||
upper case and lower case versions, so for example, a caseless [aeiou] matches
|
||||
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
|
||||
caseful version would.
|
||||
Characters in a class may be specified by their code points using \o, \x, or
|
||||
\N{U+hh..} in the usual way. When caseless matching is set, any letters in a
|
||||
class represent both their upper case and lower case versions, so for example,
|
||||
a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not
|
||||
match "A", whereas a caseful version would.
|
||||
</P>
|
||||
<P>
|
||||
Characters that might indicate line breaks are never treated in any special way
|
||||
|
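The caseless class examples in this hunk can be checked with any regex engine that follows the same convention. The sketch below uses Python's re module, whose IGNORECASE flag plays the role of caseless matching here. Python does not accept \o{...}, \x{...} or \N{U+...}, so the code point case is shown with the two-digit \xhh form only.

```python
import re

# Caseless positive class: letters stand for both cases.
caseless_vowel = re.compile(r"[aeiou]", re.IGNORECASE)

# Caseless negated class: "A" counts as a vowel, so it is excluded.
caseless_non_vowel = re.compile(r"[^aeiou]", re.IGNORECASE)

# Caseful negated class: "A" is not in the set, so it matches.
caseful_non_vowel = re.compile(r"[^aeiou]")

# A class member given by its code point (hex 41 is "A").
by_code_point = re.compile(r"[\x41]")

if __name__ == "__main__":
    print(bool(caseless_vowel.fullmatch("A")))
    print(bool(caseless_non_vowel.fullmatch("A")))
    print(bool(caseful_non_vowel.fullmatch("A")))
    print(bool(by_code_point.fullmatch("A")))
```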
@ -1397,17 +1416,18 @@ whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A
|
|||
class such as [^a] always matches one of these characters.
|
||||
</P>
|
||||
<P>
|
||||
The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,
|
||||
\V, \w, and \W may appear in a character class, and add the characters that
|
||||
they match to the class. For example, [\dABCDEF] matches any hexadecimal
|
||||
digit. In UTF modes, the PCRE2_UCP option affects the meanings of \d, \s, \w
|
||||
and their upper case partners, just as it does when they appear outside a
|
||||
character class, as described in the section entitled
|
||||
The generic character type escape sequences \d, \D, \h, \H, \p, \P, \s,
|
||||
\S, \v, \V, \w, and \W may appear in a character class, and add the
|
||||
characters that they match to the class. For example, [\dABCDEF] matches any
|
||||
hexadecimal digit. In UTF modes, the PCRE2_UCP option affects the meanings of
|
||||
\d, \s, \w and their upper case partners, just as it does when they appear
|
||||
outside a character class, as described in the section entitled
|
||||
<a href="#genericchartypes">"Generic character types"</a>
|
||||
above. The escape sequence \b has a different meaning inside a character
|
||||
class; it matches the backspace character. The sequences \B, \N, \R, and \X
|
||||
are not special inside a character class. Like any other unrecognized escape
|
||||
sequences, they cause an error.
|
||||
class; it matches the backspace character. The sequences \B, \R, and \X are
|
||||
not special inside a character class. Like any other unrecognized escape
|
||||
sequences, they cause an error. The same is true for \N when not followed by
|
||||
an opening brace.
|
||||
</P>
|
||||
<P>
|
||||
The minus (hyphen) character can be used to specify a range of characters in a
|
||||
|
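The rules for escapes inside a character class can be tried out in a similar way. Python's re module is used below purely as a stand-in that behaves the same way for these particular cases: \d adds digits to a class, \b inside a class is the backspace character, and \B, \R and \X inside a class are rejected. The helper name is hypothetical, and none of this exercises PCRE2 itself.

```python
import re

# A generic character type inside a class adds its characters to the class.
hex_digit = re.compile(r"[\dABCDEF]")

# Inside a class, \b is the backspace character (hex 08).
backspace = re.compile(r"[\b]")


def class_escape_is_error(escape: str) -> bool:
    """Return True if the escape is rejected inside a character class."""
    try:
        re.compile("[" + escape + "]")
    except re.error:
        return True
    return False


if __name__ == "__main__":
    print(bool(hex_digit.fullmatch("7")), bool(hex_digit.fullmatch("G")))
    print(bool(backspace.fullmatch("\x08")))
    for esc in (r"\B", r"\R", r"\X"):
        print(esc, class_escape_is_error(esc))
```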
@ -3559,7 +3579,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 20 July 2018
|
||||
Last updated: 27 July 2018
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -70,9 +70,10 @@ This table applies to ASCII and Unicode environments.
|
|||
\ddd character with octal code ddd, or backreference
|
||||
\o{ddd..} character with octal code ddd..
|
||||
\U "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
|
||||
\N{U+hh..} character with Unicode code point hh..
|
||||
\uhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
|
||||
\xhh character with hex code hh
|
||||
\x{hhh..} character with hex code hhh..
|
||||
\x{hh..} character with hex code hh..
|
||||
</pre>
|
||||
Note that \0dd is always an octal code. The treatment of backslash followed by
|
||||
a non-zero digit is complicated; for details see the section
|
||||
|
@ -80,7 +81,9 @@ a non-zero digit is complicated; for details see the section
|
|||
in the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
documentation, where details of escape processing in EBCDIC environments are
|
||||
also given.
|
||||
also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
|
||||
supported in EBCDIC environments. Note that \N not followed by an opening
|
||||
curly bracket has a different meaning (see below).
|
||||
</P>
|
||||
<P>
|
||||
When \x is not followed by {, from zero to two hexadecimal digits are read,
|
||||
|
@ -621,7 +624,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 21 July 2018
|
||||
Last updated: 27 July 2018
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
|
|
118
doc/pcre2.txt
|
@ -6015,12 +6015,13 @@ SPECIAL START-OF-PATTERN ITEMS
|
|||
|
||||
The newline convention affects where the circumflex and dollar asser-
|
||||
tions are true. It also affects the interpretation of the dot metachar-
|
||||
acter when PCRE2_DOTALL is not set, and the behaviour of \N. However,
|
||||
it does not affect what the \R escape sequence matches. By default,
|
||||
this is any Unicode newline sequence, for Perl compatibility. However,
|
||||
this can be changed; see the next section and the description of \R in
|
||||
the section entitled "Newline sequences" below. A change of \R setting
|
||||
can be combined with a change of newline convention.
|
||||
acter when PCRE2_DOTALL is not set, and the behaviour of \N when not
|
||||
followed by an opening brace. However, it does not affect what the \R
|
||||
escape sequence matches. By default, this is any Unicode newline
|
||||
sequence, for Perl compatibility. However, this can be changed; see the
|
||||
next section and the description of \R in the section entitled "Newline
|
||||
sequences" below. A change of \R setting can be combined with a change
|
||||
of newline convention.
|
||||
|
||||
Specifying what \R matches
|
||||
|
||||
|
@ -6158,8 +6159,14 @@ BACKSLASH
|
|||
\o{ddd..} character with octal code ddd..
|
||||
\xhh character with hex code hh
|
||||
\x{hhh..} character with hex code hhh.. (default mode)
|
||||
\N{U+hhh..} character with Unicode code point hhh..
|
||||
\uhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
|
||||
|
||||
Note that when \N is not followed by an opening brace (curly bracket)
|
||||
it has an entirely different meaning, matching any character that is
|
||||
not a newline. Perl also uses \N{name} to specify characters by Uni-
|
||||
code name; PCRE2 does not support this.
|
||||
|
||||
The precise effect of \cx on ASCII characters is as follows: if x is a
|
||||
lower case letter, it is converted to upper case. Then bit 6 of the
|
||||
character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A
|
||||
|
@ -6167,15 +6174,15 @@ BACKSLASH
|
|||
hex 7B (; is 3B). If the code unit following \c has a value less than
|
||||
32 or greater than 126, a compile-time error occurs.
|
||||
|
||||
When PCRE2 is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t gen-
|
||||
erate the appropriate EBCDIC code values. The \c escape is processed as
|
||||
specified for Perl in the perlebcdic document. The only characters that
|
||||
are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?.
|
||||
Any other character provokes a compile-time error. The sequence \c@
|
||||
encodes character code 0; after \c the letters (in either case) encode
|
||||
characters 1-26 (hex 01 to hex 1A); [, \, ], ^, and _ encode characters
|
||||
27-31 (hex 1B to hex 1F), and \c? becomes either 255 (hex FF) or 95
|
||||
(hex 5F).
|
||||
When PCRE2 is compiled in EBCDIC mode, \N{U+hhh..} is not supported.
|
||||
\a, \e, \f, \n, \r, and \t generate the appropriate EBCDIC code values.
|
||||
The \c escape is processed as specified for Perl in the perlebcdic doc-
|
||||
ument. The only characters that are allowed after \c are A-Z, a-z, or
|
||||
one of @, [, \, ], ^, _, or ?. Any other character provokes a compile-
|
||||
time error. The sequence \c@ encodes character code 0; after \c the
|
||||
letters (in either case) encode characters 1-26 (hex 01 to hex 1A); [,
|
||||
\, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and \c?
|
||||
becomes either 255 (hex FF) or 95 (hex 5F).
|
||||
|
||||
Thus, apart from \c?, these escapes generate the same character code
|
||||
values as they do in an ASCII environment, though the meanings of the
|
||||
|
@ -6203,9 +6210,9 @@ BACKSLASH
|
|||
numbers and backreferences to be unambiguously specified.
|
||||
|
||||
For greater clarity and unambiguity, it is best to avoid following \ by
|
||||
a digit greater than zero. Instead, use \o{} or \x{} to specify charac-
|
||||
ter numbers, and \g{} to specify backreferences. The following para-
|
||||
graphs describe the old, ambiguous syntax.
|
||||
a digit greater than zero. Instead, use \o{} or \x{} to specify numeri-
|
||||
cal character code points, and \g{} to specify backreferences. The fol-
|
||||
lowing paragraphs describe the old, ambiguous syntax.
|
||||
|
||||
The handling of a backslash followed by a digit other than 0 is compli-
|
||||
cated, and Perl has changed over time, causing PCRE2 also to change.
|
||||
|
@ -6281,10 +6288,10 @@ BACKSLASH
|
|||
inside and outside character classes. In addition, inside a character
|
||||
class, \b is interpreted as the backspace character (hex 08).
|
||||
|
||||
\N is not allowed in a character class. \B, \R, and \X are not special
|
||||
inside a character class. Like other unrecognized alphabetic escape
|
||||
sequences, they cause an error. Outside a character class, these
|
||||
sequences have different meanings.
|
||||
When not followed by an opening brace, \N is not allowed in a character
|
||||
class. \B, \R, and \X are not special inside a character class. Like
|
||||
other unrecognized alphabetic escape sequences, they cause an error.
|
||||
Outside a character class, these sequences have different meanings.
|
||||
|
||||
Unsupported escape sequences
|
||||
|
||||
|
@ -6318,6 +6325,7 @@ BACKSLASH
|
|||
\D any character that is not a decimal digit
|
||||
\h any horizontal white space character
|
||||
\H any character that is not a horizontal white space character
|
||||
\N any character that is not a newline
|
||||
\s any white space character
|
||||
\S any character that is not a white space character
|
||||
\v any vertical white space character
|
||||
|
@ -6325,10 +6333,12 @@ BACKSLASH
|
|||
\w any "word" character
|
||||
\W any "non-word" character
|
||||
|
||||
There is also the single sequence \N, which matches a non-newline char-
|
||||
acter. This is the same as the "." metacharacter when PCRE2_DOTALL is
|
||||
not set. Perl also uses \N to match characters by name; PCRE2 does not
|
||||
support this.
|
||||
The \N escape sequence has the same meaning as the "." metacharacter
|
||||
when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change
|
||||
the meaning of \N. Note that when \N is followed by an opening brace it
|
||||
has a different meaning. See the section entitled "Non-printing charac-
|
||||
ters" above for details. Perl also uses \N{name} to specify characters
|
||||
by Unicode name; PCRE2 does not support this.
|
||||
|
||||
Each pair of lower and upper case escape sequences partitions the com-
|
||||
plete set of characters into two disjoint sets. Any given character
|
||||
|
@ -6867,10 +6877,15 @@ FULL STOP (PERIOD, DOT) AND \N
|
|||
flex and dollar, the only relationship being that they both involve
|
||||
newlines. Dot has no special meaning in a character class.
|
||||
|
||||
The escape sequence \N behaves like a dot, except that it is not
|
||||
affected by the PCRE2_DOTALL option. In other words, it matches any
|
||||
character except one that signifies the end of a line. Perl also uses
|
||||
\N to match characters by name; PCRE2 does not support this.
|
||||
The escape sequence \N when not followed by an opening brace behaves
|
||||
like a dot, except that it is not affected by the PCRE2_DOTALL option.
|
||||
In other words, it matches any character except one that signifies the
|
||||
end of a line.
|
||||
|
||||
When \N is followed by an opening brace it has a different meaning. See
|
||||
the section entitled "Non-printing characters" above for details. Perl
|
||||
also uses \N{name} to specify characters by Unicode name; PCRE2 does
|
||||
not support this.
|
||||
|
||||
|
||||
MATCHING A SINGLE CODE UNIT
|
||||
|
@ -6951,10 +6966,12 @@ SQUARE BRACKETS AND CHARACTER CLASSES
|
|||
sumes a character from the subject string, and therefore it fails if
|
||||
the current pointer is at the end of the string.
|
||||
|
||||
When caseless matching is set, any letters in a class represent both
|
||||
their upper case and lower case versions, so for example, a caseless
|
||||
[aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not
|
||||
match "A", whereas a caseful version would.
|
||||
Characters in a class may be specified by their code points using \o,
|
||||
\x, or \N{U+hh..} in the usual way. When caseless matching is set, any
|
||||
letters in a class represent both their upper case and lower case ver-
|
||||
sions, so for example, a caseless [aeiou] matches "A" as well as "a",
|
||||
and a caseless [^aeiou] does not match "A", whereas a caseful version
|
||||
would.
|
||||
|
||||
Characters that might indicate line breaks are never treated in any
|
||||
special way when matching character classes, whatever line-ending
|
||||
|
@ -6962,17 +6979,18 @@ SQUARE BRACKETS AND CHARACTER CLASSES
|
|||
PCRE2_MULTILINE options is used. A class such as [^a] always matches
|
||||
one of these characters.
|
||||
|
||||
The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, \V,
|
||||
\w, and \W may appear in a character class, and add the characters that
|
||||
they match to the class. For example, [\dABCDEF] matches any hexadeci-
|
||||
mal digit. In UTF modes, the PCRE2_UCP option affects the meanings of
|
||||
\d, \s, \w and their upper case partners, just as it does when they
|
||||
appear outside a character class, as described in the section entitled
|
||||
"Generic character types" above. The escape sequence \b has a different
|
||||
meaning inside a character class; it matches the backspace character.
|
||||
The sequences \B, \N, \R, and \X are not special inside a character
|
||||
class. Like any other unrecognized escape sequences, they cause an
|
||||
error.
|
||||
The generic character type escape sequences \d, \D, \h, \H, \p, \P, \s,
|
||||
\S, \v, \V, \w, and \W may appear in a character class, and add the
|
||||
characters that they match to the class. For example, [\dABCDEF]
|
||||
matches any hexadecimal digit. In UTF modes, the PCRE2_UCP option
|
||||
affects the meanings of \d, \s, \w and their upper case partners, just
|
||||
as it does when they appear outside a character class, as described in
|
||||
the section entitled "Generic character types" above. The escape
|
||||
sequence \b has a different meaning inside a character class; it
|
||||
matches the backspace character. The sequences \B, \R, and \X are not
|
||||
special inside a character class. Like any other unrecognized escape
|
||||
sequences, they cause an error. The same is true for \N when not fol-
|
||||
lowed by an opening brace.
|
||||
|
||||
The minus (hyphen) character can be used to specify a range of charac-
|
||||
ters in a character class. For example, [d-m] matches any letter
|
||||
|
@ -9012,7 +9030,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 20 July 2018
|
||||
Last updated: 27 July 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -9873,14 +9891,18 @@ ESCAPED CHARACTERS
|
|||
\ddd character with octal code ddd, or backreference
|
||||
\o{ddd..} character with octal code ddd..
|
||||
\U "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
|
||||
\N{U+hh..} character with Unicode code point hh..
|
||||
\uhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
|
||||
\xhh character with hex code hh
|
||||
\x{hhh..} character with hex code hhh..
|
||||
\x{hh..} character with hex code hh..
|
||||
|
||||
Note that \0dd is always an octal code. The treatment of backslash fol-
|
||||
lowed by a non-zero digit is complicated; for details see the section
|
||||
"Non-printing characters" in the pcre2pattern documentation, where
|
||||
details of escape processing in EBCDIC environments are also given.
|
||||
\N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not supported in
|
||||
EBCDIC environments. Note that \N not followed by an opening curly
|
||||
bracket has a different meaning (see below).
|
||||
|
||||
When \x is not followed by {, from zero to two hexadecimal digits are
|
||||
read, but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadec-
|
||||
|
@ -10289,7 +10311,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 21 July 2018
|
||||
Last updated: 27 July 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "02 July 2018" "PCRE2 10.32"
|
||||
.TH PCRE2API 3 "27 July 2018" "PCRE2 10.32"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -1400,7 +1400,8 @@ character, even if newlines are coded as CRLF. Without this option, a dot does
|
|||
not match when the current position in the subject is at a newline. This option
|
||||
is equivalent to Perl's /s option, and it can be changed within a pattern by a
|
||||
(?s) option setting. A negative class such as [^a] always matches newline
|
||||
characters, independent of the setting of this option.
|
||||
characters, and the \eN escape sequence always matches a non-newline character,
|
||||
independent of the setting of PCRE2_DOTALL.
|
||||
.sp
|
||||
PCRE2_DUPNAMES
|
||||
.sp
|
||||
|
@ -3640,6 +3641,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 02 July 2018
|
||||
Last updated: 27 July 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PATTERN 3 "20 July 2018" "PCRE2 10.32"
|
||||
.TH PCRE2PATTERN 3 "27 July 2018" "PCRE2 10.32"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||
|
@ -218,10 +218,11 @@ is used.
|
|||
.P
|
||||
The newline convention affects where the circumflex and dollar assertions are
|
||||
true. It also affects the interpretation of the dot metacharacter when
|
||||
PCRE2_DOTALL is not set, and the behaviour of \eN. However, it does not affect
|
||||
what the \eR escape sequence matches. By default, this is any Unicode newline
|
||||
sequence, for Perl compatibility. However, this can be changed; see the next
|
||||
section and the description of \eR in the section entitled
|
||||
PCRE2_DOTALL is not set, and the behaviour of \eN when not followed by an
|
||||
opening brace. However, it does not affect what the \eR escape sequence
|
||||
matches. By default, this is any Unicode newline sequence, for Perl
|
||||
compatibility. However, this can be changed; see the next section and the
|
||||
description of \eR in the section entitled
|
||||
.\" HTML <a href="#newlineseq">
|
||||
.\" </a>
|
||||
"Newline sequences"
|
||||
|
@ -371,8 +372,14 @@ these escapes are as follows:
|
|||
\eo{ddd..} character with octal code ddd..
|
||||
\exhh character with hex code hh
|
||||
\ex{hhh..} character with hex code hhh.. (default mode)
|
||||
\eN{U+hhh..} character with Unicode code point hhh..
|
||||
\euhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
|
||||
.sp
|
||||
Note that when \eN is not followed by an opening brace (curly bracket) it has
|
||||
an entirely different meaning, matching any character that is not a newline.
|
||||
Perl also uses \eN{name} to specify characters by Unicode name; PCRE2 does not
|
||||
support this.
|
||||
.P
|
||||
The precise effect of \ecx on ASCII characters is as follows: if x is a lower
|
||||
case letter, it is converted to upper case. Then bit 6 of the character (hex
|
||||
40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A),
|
||||
|
@ -380,14 +387,14 @@ but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the
|
|||
code unit following \ec has a value less than 32 or greater than 126, a
|
||||
compile-time error occurs.
|
||||
.P
|
||||
When PCRE2 is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
|
||||
generate the appropriate EBCDIC code values. The \ec escape is processed
|
||||
as specified for Perl in the \fBperlebcdic\fP document. The only characters
|
||||
that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
|
||||
other character provokes a compile-time error. The sequence \ec@ encodes
|
||||
character code 0; after \ec the letters (in either case) encode characters 1-26
|
||||
(hex 01 to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex
|
||||
1F), and \ec? becomes either 255 (hex FF) or 95 (hex 5F).
|
||||
When PCRE2 is compiled in EBCDIC mode, \eN{U+hhh..} is not supported. \ea, \ee,
|
||||
\ef, \en, \er, and \et generate the appropriate EBCDIC code values. The \ec
|
||||
escape is processed as specified for Perl in the \fBperlebcdic\fP document. The
|
||||
only characters that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ],
|
||||
^, _, or ?. Any other character provokes a compile-time error. The sequence
|
||||
\ec@ encodes character code 0; after \ec the letters (in either case) encode
|
||||
characters 1-26 (hex 01 to hex 1A); [, \e, ], ^, and _ encode characters 27-31
|
||||
(hex 1B to hex 1F), and \ec? becomes either 255 (hex FF) or 95 (hex 5F).
|
||||
.P
|
||||
Thus, apart from \ec?, these escapes generate the same character code values as
|
||||
they do in an ASCII environment, though the meanings of the values mostly
|
||||
|
@ -414,9 +421,9 @@ numbers greater than 0777, and it also allows octal numbers and backreferences
|
|||
to be unambiguously specified.
|
||||
.P
|
||||
For greater clarity and unambiguity, it is best to avoid following \e by a
|
||||
digit greater than zero. Instead, use \eo{} or \ex{} to specify character
|
||||
numbers, and \eg{} to specify backreferences. The following paragraphs
|
||||
describe the old, ambiguous syntax.
|
||||
digit greater than zero. Instead, use \eo{} or \ex{} to specify numerical
|
||||
character code points, and \eg{} to specify backreferences. The following
|
||||
paragraphs describe the old, ambiguous syntax.
|
||||
.P
|
||||
The handling of a backslash followed by a digit other than 0 is complicated,
|
||||
and Perl has changed over time, causing PCRE2 also to change.
|
||||
|
@ -507,10 +514,10 @@ All the sequences that define a single character value can be used both inside
|
|||
and outside character classes. In addition, inside a character class, \eb is
|
||||
interpreted as the backspace character (hex 08).
|
||||
.P
|
||||
\eN is not allowed in a character class. \eB, \eR, and \eX are not special
|
||||
inside a character class. Like other unrecognized alphabetic escape sequences,
|
||||
they cause an error. Outside a character class, these sequences have different
|
||||
meanings.
|
||||
When not followed by an opening brace, \eN is not allowed in a character class.
|
||||
\eB, \eR, and \eX are not special inside a character class. Like other
|
||||
unrecognized alphabetic escape sequences, they cause an error. Outside a
|
||||
character class, these sequences have different meanings.
|
||||
.
|
||||
.
|
||||
.SS "Unsupported escape sequences"
|
||||
|
@@ -569,6 +576,7 @@ Another use of backslash is for specifying generic character types:
 \eD any character that is not a decimal digit
 \eh any horizontal white space character
 \eH any character that is not a horizontal white space character
+\eN any character that is not a newline
 \es any white space character
 \eS any character that is not a white space character
 \ev any vertical white space character
@@ -576,14 +584,20 @@ Another use of backslash is for specifying generic character types:
 \ew any "word" character
 \eW any "non-word" character
 .sp
-There is also the single sequence \eN, which matches a non-newline character.
-This is the same as
+The \eN escape sequence has the same meaning as
 .\" HTML <a href="#fullstopdot">
 .\" </a>
 the "." metacharacter
 .\"
-when PCRE2_DOTALL is not set. Perl also uses \eN to match characters by name;
-PCRE2 does not support this.
+when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the
+meaning of \eN. Note that when \eN is followed by an opening brace it has a
+different meaning. See the section entitled
+.\" HTML <a href="#digitsafterbackslash">
+.\" </a>
+"Non-printing characters"
+.\"
+above for details. Perl also uses \eN{name} to specify characters by Unicode
+name; PCRE2 does not support this.
 .P
 Each pair of lower and upper case escape sequences partitions the complete set
 of characters into two disjoint sets. Any given character matches one, and only
@@ -1289,9 +1303,17 @@ The handling of dot is entirely independent of the handling of circumflex and
 dollar, the only relationship being that they both involve newlines. Dot has no
 special meaning in a character class.
 .P
-The escape sequence \eN behaves like a dot, except that it is not affected by
-the PCRE2_DOTALL option. In other words, it matches any character except one
-that signifies the end of a line. Perl also uses \eN to match characters by
+The escape sequence \eN when not followed by an opening brace behaves like a
+dot, except that it is not affected by the PCRE2_DOTALL option. In other words,
+it matches any character except one that signifies the end of a line.
+.P
+When \eN is followed by an opening brace it has a different meaning. See the
+section entitled
+.\" HTML <a href="#digitsafterbackslash">
+.\" </a>
+"Non-printing characters"
+.\"
+above for details. Perl also uses \eN{name} to specify characters by Unicode
 name; PCRE2 does not support this.
 .
 .
@@ -1380,30 +1402,32 @@ circumflex is not an assertion; it still consumes a character from the subject
 string, and therefore it fails if the current pointer is at the end of the
 string.
 .P
-When caseless matching is set, any letters in a class represent both their
-upper case and lower case versions, so for example, a caseless [aeiou] matches
-"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
-caseful version would.
+Characters in a class may be specified by their code points using \eo, \ex, or
+\eN{U+hh..} in the usual way. When caseless matching is set, any letters in a
+class represent both their upper case and lower case versions, so for example,
+a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not
+match "A", whereas a caseful version would.
 .P
 Characters that might indicate line breaks are never treated in any special way
 when matching character classes, whatever line-ending sequence is in use, and
 whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A
 class such as [^a] always matches one of these characters.
 .P
-The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev,
-\eV, \ew, and \eW may appear in a character class, and add the characters that
-they match to the class. For example, [\edABCDEF] matches any hexadecimal
-digit. In UTF modes, the PCRE2_UCP option affects the meanings of \ed, \es, \ew
-and their upper case partners, just as it does when they appear outside a
-character class, as described in the section entitled
+The generic character type escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es,
+\eS, \ev, \eV, \ew, and \eW may appear in a character class, and add the
+characters that they match to the class. For example, [\edABCDEF] matches any
+hexadecimal digit. In UTF modes, the PCRE2_UCP option affects the meanings of
+\ed, \es, \ew and their upper case partners, just as it does when they appear
+outside a character class, as described in the section entitled
 .\" HTML <a href="#genericchartypes">
 .\" </a>
 "Generic character types"
 .\"
 above. The escape sequence \eb has a different meaning inside a character
-class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX
-are not special inside a character class. Like any other unrecognized escape
-sequences, they cause an error.
+class; it matches the backspace character. The sequences \eB, \eR, and \eX are
+not special inside a character class. Like any other unrecognized escape
+sequences, they cause an error. The same is true for \eN when not followed by
+an opening brace.
 .P
 The minus (hyphen) character can be used to specify a range of characters in a
 character class. For example, [d-m] matches any letter between d and m,
@@ -3580,6 +3604,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 July 2018
+Last updated: 27 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
|
|
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "21 July 2018" "PCRE2 10.32"
+.TH PCRE2SYNTAX 3 "27 July 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -35,9 +35,10 @@ This table applies to ASCII and Unicode environments.
 \eddd character with octal code ddd, or backreference
 \eo{ddd..} character with octal code ddd..
 \eU "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
+\eN{U+hh..} character with Unicode code point hh..
 \euhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
 \exhh character with hex code hh
-\ex{hhh..} character with hex code hhh..
+\ex{hh..} character with hex code hh..
 .sp
 Note that \e0dd is always an octal code. The treatment of backslash followed by
 a non-zero digit is complicated; for details see the section
@@ -50,7 +51,9 @@ in the
 \fBpcre2pattern\fP
 .\"
 documentation, where details of escape processing in EBCDIC environments are
-also given.
+also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not
+supported in EBCDIC environments. Note that \eN not followed by an opening
+curly bracket has a different meaning (see below).
 .P
 When \ex is not followed by {, from zero to two hexadecimal digits are read,
 but if PCRE2_ALT_BSUX is set, \ex must be followed by two hexadecimal digits to
@@ -609,6 +612,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 July 2018
+Last updated: 27 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
|
|
@@ -316,6 +316,7 @@ pcre2_pattern_convert(). */
 #define PCRE2_ERROR_INTERNAL_BAD_CODE_IN_SKIP 190
 #define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191
 #define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192
+#define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC 193


 /* "Expected" matching error codes: no match and partial match. */
|
|
@@ -731,7 +731,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
        ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
        ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
        ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
-       ERR91, ERR92};
+       ERR91, ERR92, ERR93 };

 /* This is a table of start-of-pattern options such as (*UTF) and settings such
 as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@@ -1441,6 +1441,42 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
     escape = -i; /* Else return a special escape */
     if (cb != NULL && (escape == ESC_P || escape == ESC_p || escape == ESC_X))
       cb->external_flags |= PCRE2_HASBKPORX; /* Note \P, \p, or \X */
+
+    /* Perl supports \N{name} for character names and \N{U+dddd} for numerical
+    Unicode code points, as well as plain \N for "not newline". PCRE does not
+    support \N{name}. However, it does support quantification such as \N{2,3},
+    so if \N{ is not followed by U+dddd we check for a quantifier. */
+
+    if (escape == ESC_N && ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET)
+      {
+      PCRE2_SPTR p = ptr + 1;
+
+      /* \N{U+ can be handled by the \x{ code. However, this construction is
+      not valid in EBCDIC environments because it specifies a Unicode
+      character, not a codepoint in the local code. For example \N{U+0041}
+      must be "A" in all environments. */
+
+      if (ptrend - p > 1 && *p == CHAR_U && p[1] == CHAR_PLUS)
+        {
+#ifdef EBCDIC
+        *errorcodeptr = ERR93;
+#else
+        ptr = p + 1;
+        escape = 0; /* Not a fancy escape after all */
+        goto COME_FROM_NU;
+#endif
+        }
+
+      /* Give an error if what follows is not a quantifier, but don't override
+      an error set by the quantifier reader (e.g. number overflow). */
+
+      else
+        {
+        if (!read_repeat_counts(&p, ptrend, NULL, NULL, errorcodeptr) &&
+             *errorcodeptr == 0)
+          *errorcodeptr = ERR37;
+        }
+      }
     }
   }

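The dispatch added in the hunk above can be sketched in isolation. The following is a minimal, self-contained illustration and not PCRE2's code: the enum, the function names, and the NUL-terminated string input are all hypothetical, the quantifier check is a simplified stand-in for read_repeat_counts(), and the EBCDIC branch (ERR93) is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result kinds; these names are illustrative, not PCRE2's. */
typedef enum {
  N_NOT_NEWLINE,      /* plain \N: any character that is not a newline */
  N_CODE_POINT,       /* \N{U+hh..}: a literal Unicode code point */
  N_QUANTIFIED,       /* \N{2,3}: \N followed by a quantifier */
  N_ERR_NO_DIGITS,    /* \N{U+}: compare error 178 in the test output */
  N_ERR_BAD_HEX,      /* non-hex character or missing closing brace */
  N_ERR_TOO_LARGE,    /* value above 0x10FFFF (assumed limit for this sketch) */
  N_ERR_UNSUPPORTED   /* \N{name} and the like: compare error 137 */
} n_kind;

/* Simplified quantifier check: accepts n}, n,} and n,m} only. */
static int is_quantifier(const char *p)
{
  if (!isdigit((unsigned char)*p)) return 0;
  while (isdigit((unsigned char)*p)) p++;
  if (*p == ',')
    {
    p++;
    while (isdigit((unsigned char)*p)) p++;
    }
  return *p == '}';
}

/* s points at the text immediately after "\N" in a NUL-terminated pattern. */
n_kind classify_after_N(const char *s, unsigned long *code_point)
{
  const char *p;
  unsigned long value = 0;

  assert(s != NULL && code_point != NULL);
  if (*s != '{') return N_NOT_NEWLINE;
  p = s + 1;

  /* "U+" selects the code point form; anything else must be a quantifier. */
  if (p[0] == 'U' && p[1] == '+')
    {
    p += 2;
    if (*p == '}' || *p == '\0') return N_ERR_NO_DIGITS;
    while (isxdigit((unsigned char)*p))
      {
      int c = tolower((unsigned char)*p);
      value = value * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
      if (value > 0x10FFFFul) return N_ERR_TOO_LARGE;
      p++;
      }
    if (*p != '}') return N_ERR_BAD_HEX;
    *code_point = value;
    return N_CODE_POINT;
    }

  return is_quantifier(p) ? N_QUANTIFIED : N_ERR_UNSUPPORTED;
}
```

Under these assumptions, "{U+1234}" yields N_CODE_POINT with the value 0x1234, "{U+}" yields N_ERR_NO_DIGITS, and "{U}" falls through to the quantifier check and is rejected, which mirrors the two new cases in testinput5.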
@@ -1725,6 +1761,9 @@ else
       {
       if (ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET)
         {
+#ifndef EBCDIC
+        COME_FROM_NU:
+#endif
         if (++ptr >= ptrend || *ptr == CHAR_RIGHT_CURLY_BRACKET)
           {
           *errorcodeptr = ERR78;
@@ -1858,19 +1897,6 @@ else
     }
   }

-/* Perl supports \N{name} for character names, as well as plain \N for "not
-newline". PCRE does not support \N{name}. However, it does support
-quantification such as \N{2,3}. */
-
-if (escape == ESC_N && ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET &&
-     ptrend - ptr > 2)
-  {
-  PCRE2_SPTR p = ptr + 1;
-  if (!read_repeat_counts(&p, ptrend, NULL, NULL, errorcodeptr) &&
-       *errorcodeptr == 0)
-    *errorcodeptr = ERR37;
-  }
-
 /* Set the pointer to the next character before returning. */

 *ptrptr = ptr;
@@ -3223,7 +3249,6 @@ while (ptr < ptrend)
         tempptr = ptr;
         escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode,
           options, TRUE, cb);
-
         if (errorcode != 0)
           {
           CLASS_ESCAPE_FAILED:
|
|
@@ -161,7 +161,7 @@ static const unsigned char compile_error_texts[] =
   "using UCP is disabled by the application\0"
   "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
   "character code point value in \\u.... sequence is too large\0"
-  "digits missing in \\x{} or \\o{}\0"
+  "digits missing in \\x{} or \\o{} or \\N{U+}\0"
   "syntax error or number too big in (?(VERSION condition\0"
   /* 80 */
   "internal error: unknown opcode in auto_possessify()\0"
@@ -179,6 +179,7 @@ static const unsigned char compile_error_texts[] =
   "internal error: bad code value in parsed_skip()\0"
   "PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
   "invalid option bits with PCRE2_LITERAL\0"
+  "\\N{U+dddd} is not supported in EBCDIC mode\0"
   ;

 /* Match-time and UTF error texts are in the same format. */
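The error texts above are stored as one array of consecutive NUL-terminated strings, which is why the text for the new ERR93 has to be appended in the same position as its enum entry. The sketch below shows one way such a table can be indexed. It is an assumption made for illustration: the lookup code PCRE2 actually uses is not part of this diff, and demo_texts and nth_message are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table in the same layout as the diff: adjacent string literals
   are concatenated into one array, and each message ends with an explicit \0.
   The implicit terminator after the last message marks the end of the table. */
const char demo_texts[] =
  "no error\0"
  "invalid option bits with PCRE2_LITERAL\0"
  "\\N{U+dddd} is not supported in EBCDIC mode\0"
  ;

/* Return the n-th message (0-based), or NULL if n is past the end. */
const char *nth_message(const char *table, size_t table_size, size_t n)
{
  const char *p = table;
  const char *end = table + table_size;

  assert(table != NULL);
  while (p < end && *p != '\0')
    {
    if (n == 0) return p;
    p += strlen(p) + 1;   /* skip this message and its terminating NUL */
    n--;
    }
  return NULL;            /* reached the empty string that ends the table */
}
```

With this layout, nth_message(demo_texts, sizeof demo_texts, 2) returns the EBCDIC message, and index 3 returns NULL. The design relies purely on ordering, so a message inserted in the middle would shift every later error number.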
|
|
@@ -2288,4 +2288,10 @@
 \= Expect no match
 \x{123}\x{124}\x{123}

+/\N{U+1234}/utf
+\x{1234}
+
+/[\N{U+1234}]/utf
+\x{1234}
+
 # End of testinput4
|
|
@@ -2087,4 +2087,8 @@
 \x{655}
 \x{1D1AA}

+/\N{U+}/
+
+/\N{U}/
+
 # End of testinput5
|
|
@@ -13194,7 +13194,7 @@ Failed: error 167 at offset 5: non-hex character in \x{} (closing brace missing?
 Failed: error 167 at offset 7: non-hex character in \x{} (closing brace missing?)

 /^A\x{/
-Failed: error 178 at offset 5: digits missing in \x{} or \o{}
+Failed: error 178 at offset 5: digits missing in \x{} or \o{} or \N{U+}

 /[ab]++/B,no_auto_possess
 ------------------------------------------------------------------
@@ -13408,7 +13408,7 @@ Failed: error 133 at offset 7: parentheses are too deeply nested (stack check)
 Failed: error 155 at offset 2: missing opening brace after \o

 /\o{}/
-Failed: error 178 at offset 3: digits missing in \x{} or \o{}
+Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+}

 /\o{whatever}/
 Failed: error 164 at offset 3: non-octal character in \o{} (closing brace missing?)
@@ -13416,7 +13416,7 @@ Failed: error 164 at offset 3: non-octal character in \o{} (closing brace missin
 /\xthing/

 /\x{}/
-Failed: error 178 at offset 3: digits missing in \x{} or \o{}
+Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+}

 /\x{whatever}/
 Failed: error 167 at offset 3: non-hex character in \x{} (closing brace missing?)
|
|
@@ -3704,4 +3704,12 @@ No match
 \x{123}\x{124}\x{123}
 No match

+/\N{U+1234}/utf
+\x{1234}
+0: \x{1234}
+
+/[\N{U+1234}]/utf
+\x{1234}
+0: \x{1234}
+
 # End of testinput4
|
|
@@ -4750,4 +4750,10 @@ No match
 \x{1D1AA}
 0: \x{1d1aa}

+/\N{U+}/
+Failed: error 178 at offset 5: digits missing in \x{} or \o{} or \N{U+}
+
+/\N{U}/
+Failed: error 137 at offset 2: PCRE does not support \L, \l, \N{name}, \U, or \u
+
 # End of testinput5
|
|
@@ -1,3 +1,4 @@
+PCRE2 version 10.32-RC1 2018-02-19
 # This is a specialized test for checking, when PCRE2 is compiled with the
 # EBCDIC option but in an ASCII environment, that newline, white space, and \c
 # functionality is working. It catches cases where explicit values such as 0x0a
@@ -200,6 +201,6 @@ No match
 0: \xff

 /\ƒ&/
-Failed: error 168 at offset 2: \c\x20must\x20be\x20followed\x20by\x20a\x20letter\x20or\x20one\x20of\x20[\]^_\x3f
+Failed: error 168 at offset 3: \c\x20must\x20be\x20followed\x20by\x20a\x20letter\x20or\x20one\x20of\x20[\]^_\x3f

 # End