Documentation for Bidi_Control and Bidi_Class
parent 0246c6bf64
commit 30abd0ac8d
|
@ -39,6 +39,8 @@ pcre2_substitute(), and the replacement argument of the latter, if the pointer
|
|||
is NULL and the length is zero, treat as an empty string. Apparently a number
|
||||
of applications treat NULL/0 in this way.
|
||||
|
||||
14. Added support for Bidi_Class and Bidi_Control Unicode properties.
|
||||
|
||||
|
||||
Version 10.39 29-October-2021
|
||||
-----------------------------
|
||||
|
|
|
@ -2055,8 +2055,8 @@ point. However, this applies only to characters whose code points are less than
|
|||
\d.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE2 is built with Unicode support (the default), the Unicode properties
|
||||
of all characters can be tested with \p and \P, or, alternatively, the
|
||||
When PCRE2 is built with Unicode support (the default), certain Unicode
|
||||
character properties can be tested with \p and \P, or, alternatively, the
|
||||
PCRE2_UCP option can be set when a pattern is compiled; this causes \w and
|
||||
friends to use Unicode property support instead of the built-in tables.
|
||||
PCRE2_UCP also causes upper/lower casing operations on characters with code
|
||||
|
@ -4018,7 +4018,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 30 November 2021
|
||||
Last updated: 08 December 2021
|
||||
<br>
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -142,8 +142,9 @@ locked this out by setting PCRE2_NEVER_UTF.
|
|||
UTF support allows the libraries to process character code points up to
|
||||
0x10ffff in the strings that they handle. Unicode support also gives access to
|
||||
the Unicode properties of characters, using pattern escapes such as \P, \p,
|
||||
and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
|
||||
supported. Details are given in the
|
||||
and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i>,
|
||||
script names, and some bi-directional properties are supported. Details are
|
||||
given in the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
|
@ -615,9 +616,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC26" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 20 March 2020
|
||||
Last updated: 08 December 2021
|
||||
<br>
|
||||
Copyright © 1997-2020 University of Cambridge.
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -66,9 +66,9 @@ interprets them.
|
|||
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
|
||||
built with Unicode support (the default). The properties that can be tested
|
||||
with \p and \P are limited to the general category properties such as Lu and
|
||||
Nd, script names such as Greek or Han, and the derived properties Any and L&.
|
||||
Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
|
||||
is limited. See the
|
||||
Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
|
||||
derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
|
||||
(surrogate) property, but in PCRE2 its use is limited. See the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
documentation for details. The long synonyms for property names that Perl
|
||||
supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted
|
||||
|
@ -257,7 +257,7 @@ Cambridge, England.
|
|||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 01 December 2021
|
||||
Last updated: 08 December 2021
|
||||
<br>
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -783,11 +783,13 @@ escape sequences are:
|
|||
\P{<i>xx</i>} a character without the <i>xx</i> property
|
||||
\X a Unicode extended grapheme cluster
|
||||
</pre>
|
||||
The property names represented by <i>xx</i> above are case-sensitive. There is
|
||||
support for Unicode script names, Unicode general category properties, "Any",
|
||||
which matches any character (including newline), and some special PCRE2
|
||||
properties (described in the
|
||||
<a href="#extraprops">next section).</a>
|
||||
The property names represented by <i>xx</i> above are not case-sensitive, and in
|
||||
accordance with Unicode's "loose matching" rules, spaces, hyphens, and
|
||||
underscores are ignored. There is support for Unicode script names, Unicode
|
||||
general category properties, "Any", which matches any character (including
|
||||
newline), Bidi_Control, Bidi_Class, and some special PCRE2 properties
|
||||
(described
|
||||
<a href="#extraprops">below).</a>
|
||||
Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
|
||||
Note that \P{Any} does not match any characters, so always causes a match
|
||||
failure.
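The loose-matching rule described in this hunk (case, spaces, hyphens, and underscores are ignored in property names) can be sketched in a few lines. This is only an illustration of the rule as worded above, written in Python. The helper name `loose_key` is hypothetical and is not part of PCRE2, and the sketch makes no claim about how PCRE2 implements the comparison internally.

```python
def loose_key(name: str) -> str:
    """Hypothetical helper: fold a property name for loose comparison."""
    # Drop spaces, hyphens, and underscores, then ignore case.
    return "".join(ch for ch in name if ch not in " -_").lower()


# Under the rule described above, these spellings all name the same property.
spellings = ["Bidi_Class", "bidi class", "BIDI-CLASS", "bidiclass"]
assert len({loose_key(s) for s in spellings}) == 1
```

Folding both the name in the pattern and the names in the lookup table with the same function is one simple way to get this behaviour. Note that the rule does not make long synonyms or an "Is" prefix acceptable.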
|
||||
|
@ -1030,9 +1032,9 @@ The following general category property codes are supported:
|
|||
Zp Paragraph separator
|
||||
Zs Space separator
|
||||
</pre>
|
||||
The special property L& is also supported: it matches a character that has
|
||||
the Lu, Ll, or Lt property, in other words, a letter that is not classified as
|
||||
a modifier or "other".
|
||||
The special property LC, which has the synonym L&, is also supported: it
|
||||
matches a character that has the Lu, Ll, or Lt property, in other words, a
|
||||
letter that is not classified as a modifier or "other".
|
||||
</P>
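As a rough illustration of what LC (synonym L&) selects, the sketch below checks the general category of a character using Python's standard `unicodedata` module. It is not PCRE2 code. The function name `is_cased_letter` is an assumption made for this example, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# The three general categories that the text above lists for LC / L&.
CASED_LETTER = {"Lu", "Ll", "Lt"}


def is_cased_letter(ch: str) -> bool:
    """Hypothetical helper: True if ch has the Lu, Ll, or Lt property."""
    return unicodedata.category(ch) in CASED_LETTER


for ch in ("A", "a", "\u01C5", "\u02B0", "\u05D0", "1"):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {is_cased_letter(ch)}")
```

Modifier letters (Lm) and "other" letters (Lo) are letters under \p{L}, but they fall outside this set, which is the distinction the paragraph above draws.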
|
||||
<P>
|
||||
The Cs (Surrogate) property applies only to characters whose code points are in
|
||||
|
@ -1067,6 +1069,45 @@ properties in PCRE2 by default, though you can make them do so by setting the
|
|||
PCRE2_UCP option or by starting the pattern with (*UCP).
|
||||
</P>
|
||||
<br><b>
|
||||
Bi-directional properties for \p and \P
|
||||
</b><br>
|
||||
<P>
|
||||
Two properties relating to bi-directional text are supported:
|
||||
<pre>
|
||||
\p{Bidi_Control} matches a Bidi control character
|
||||
\p{Bidi_Class:<class>} matches a character with the given class
|
||||
</pre>
|
||||
The recognized classes are:
|
||||
<pre>
|
||||
AL Arabic letter
|
||||
AN Arabic number
|
||||
B paragraph separator
|
||||
BN boundary neutral
|
||||
CS common separator
|
||||
EN European number
|
||||
ES European separator
|
||||
ET European terminator
|
||||
FSI first strong isolate
|
||||
L left-to-right
|
||||
LRE left-to-right embedding
|
||||
LRI left-to-right isolate
|
||||
LRO left-to-right override
|
||||
NSM non-spacing mark
|
||||
ON other neutral
|
||||
PDF pop directional format
|
||||
PDI pop directional isolate
|
||||
R right-to-left
|
||||
RLE right-to-left embedding
|
||||
RLI right-to-left isolate
|
||||
RLO right-to-left override
|
||||
S segment separator
|
||||
WS white space
|
||||
</pre>
|
||||
For Bidi_Class, an equals sign may be used instead of a colon. The class names
|
||||
are case-insensitive. As for other properties, only the short names are
|
||||
recognized.
|
||||
</P>
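To make the two properties concrete, the following sketch looks up the same Unicode data from Python's standard `unicodedata` module. It is not PCRE2 code and does not use \p. It only shows which class codes from the list above apply to a few sample characters, and which code points carry the Bidi_Control property. The names `bidi_class`, `is_bidi_control`, and `BIDI_CONTROL` are assumptions made for this example, and the class values depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Code points that carry the Unicode Bidi_Control property.
BIDI_CONTROL = (
    {0x061C, 0x200E, 0x200F}        # ALM, LRM, RLM
    | set(range(0x202A, 0x202F))    # LRE, RLE, PDF, LRO, RLO
    | set(range(0x2066, 0x206A))    # LRI, RLI, FSI, PDI
)


def bidi_class(ch: str) -> str:
    """Return the short Bidi_Class code (AL, EN, WS, ...) for one character."""
    return unicodedata.bidirectional(ch)


def is_bidi_control(ch: str) -> bool:
    """Hypothetical helper mirroring what \\p{Bidi_Control} tests."""
    return ord(ch) in BIDI_CONTROL


for ch in ("A", "\u05D0", "\u0627", "7", "\u0661", " ", "\u202E"):
    print(f"U+{ord(ch):04X} {bidi_class(ch):<3} control={is_bidi_control(ch)}")
```

The two properties are independent: U+200F (right-to-left mark) is a Bidi control character whose class is R, while U+202E is a Bidi control character whose class is RLO.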
|
||||
<br><b>
|
||||
Extended grapheme clusters
|
||||
</b><br>
|
||||
<P>
|
||||
|
@ -3861,7 +3902,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 01 December 2021
|
||||
Last updated: 08 December 2021
|
||||
<br>
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -20,28 +20,29 @@ please consult the man page, in case the conversion went wrong.
|
|||
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
|
||||
<li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
|
||||
<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
|
||||
<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
|
||||
<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
|
||||
<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
|
||||
<li><a name="TOC11" href="#SEC11">REPORTED MATCH POINT SETTING</a>
|
||||
<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
|
||||
<li><a name="TOC13" href="#SEC13">CAPTURING</a>
|
||||
<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
|
||||
<li><a name="TOC15" href="#SEC15">COMMENT</a>
|
||||
<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
|
||||
<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
|
||||
<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
|
||||
<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
|
||||
<li><a name="TOC20" href="#SEC20">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
|
||||
<li><a name="TOC21" href="#SEC21">SCRIPT RUNS</a>
|
||||
<li><a name="TOC22" href="#SEC22">BACKREFERENCES</a>
|
||||
<li><a name="TOC23" href="#SEC23">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
|
||||
<li><a name="TOC24" href="#SEC24">CONDITIONAL PATTERNS</a>
|
||||
<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>
|
||||
<li><a name="TOC26" href="#SEC26">CALLOUTS</a>
|
||||
<li><a name="TOC27" href="#SEC27">SEE ALSO</a>
|
||||
<li><a name="TOC28" href="#SEC28">AUTHOR</a>
|
||||
<li><a name="TOC29" href="#SEC29">REVISION</a>
|
||||
<li><a name="TOC8" href="#SEC8">BIDI_PROPERTIES FOR \p AND \P</a>
|
||||
<li><a name="TOC9" href="#SEC9">CHARACTER CLASSES</a>
|
||||
<li><a name="TOC10" href="#SEC10">QUANTIFIERS</a>
|
||||
<li><a name="TOC11" href="#SEC11">ANCHORS AND SIMPLE ASSERTIONS</a>
|
||||
<li><a name="TOC12" href="#SEC12">REPORTED MATCH POINT SETTING</a>
|
||||
<li><a name="TOC13" href="#SEC13">ALTERNATION</a>
|
||||
<li><a name="TOC14" href="#SEC14">CAPTURING</a>
|
||||
<li><a name="TOC15" href="#SEC15">ATOMIC GROUPS</a>
|
||||
<li><a name="TOC16" href="#SEC16">COMMENT</a>
|
||||
<li><a name="TOC17" href="#SEC17">OPTION SETTING</a>
|
||||
<li><a name="TOC18" href="#SEC18">NEWLINE CONVENTION</a>
|
||||
<li><a name="TOC19" href="#SEC19">WHAT \R MATCHES</a>
|
||||
<li><a name="TOC20" href="#SEC20">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
|
||||
<li><a name="TOC21" href="#SEC21">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
|
||||
<li><a name="TOC22" href="#SEC22">SCRIPT RUNS</a>
|
||||
<li><a name="TOC23" href="#SEC23">BACKREFERENCES</a>
|
||||
<li><a name="TOC24" href="#SEC24">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
|
||||
<li><a name="TOC25" href="#SEC25">CONDITIONAL PATTERNS</a>
|
||||
<li><a name="TOC26" href="#SEC26">BACKTRACKING CONTROL</a>
|
||||
<li><a name="TOC27" href="#SEC27">CALLOUTS</a>
|
||||
<li><a name="TOC28" href="#SEC28">SEE ALSO</a>
|
||||
<li><a name="TOC29" href="#SEC29">AUTHOR</a>
|
||||
<li><a name="TOC30" href="#SEC30">REVISION</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
|
||||
<P>
|
||||
|
@ -362,7 +363,40 @@ Yezidi,
|
|||
Yi,
|
||||
Zanabazar_Square.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
|
||||
<br><a name="SEC8" href="#TOC1">BIDI_PROPERTIES FOR \p AND \P</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
\p{Bidi_Control} matches a Bidi control character
|
||||
\p{Bidi_Class:<class>} matches a character with the given class
|
||||
</pre>
|
||||
The recognized classes are:
|
||||
<pre>
|
||||
AL Arabic letter
|
||||
AN Arabic number
|
||||
B paragraph separator
|
||||
BN boundary neutral
|
||||
CS common separator
|
||||
EN European number
|
||||
ES European separator
|
||||
ET European terminator
|
||||
FSI first strong isolate
|
||||
L left-to-right
|
||||
LRE left-to-right embedding
|
||||
LRI left-to-right isolate
|
||||
LRO left-to-right override
|
||||
NSM non-spacing mark
|
||||
ON other neutral
|
||||
PDF pop directional format
|
||||
PDI pop directional isolate
|
||||
R right-to-left
|
||||
RLE right-to-left embedding
|
||||
RLI right-to-left isolate
|
||||
RLO right-to-left override
|
||||
S segment separator
|
||||
WS white space
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">CHARACTER CLASSES</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
[...] positive character class
|
||||
|
@ -390,7 +424,7 @@ In PCRE2, POSIX character set names recognize only ASCII characters by default,
|
|||
but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||
\Q...\E inside a character class.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
|
||||
<br><a name="SEC10" href="#TOC1">QUANTIFIERS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
? 0 or 1, greedy
|
||||
|
@ -411,7 +445,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
|||
{n,}? n or more, lazy
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
|
||||
<br><a name="SEC11" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
\b word boundary
|
||||
|
@ -429,7 +463,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
|||
\G first matching position in subject
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
|
||||
<br><a name="SEC12" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
\K set reported start of match
|
||||
|
@ -439,13 +473,13 @@ for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
|||
option is set, the previous behaviour is re-enabled. When this option is set,
|
||||
\K is honoured in positive assertions, but ignored in negative ones.
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
|
||||
<br><a name="SEC13" href="#TOC1">ALTERNATION</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
expr|expr|expr...
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
|
||||
<br><a name="SEC14" href="#TOC1">CAPTURING</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(...) capture group
|
||||
|
@ -460,20 +494,20 @@ In non-UTF modes, names may contain underscores and ASCII letters and digits;
|
|||
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
|
||||
both cases, a name must not start with a digit.
|
||||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
|
||||
<br><a name="SEC15" href="#TOC1">ATOMIC GROUPS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?>...) atomic non-capture group
|
||||
(*atomic:...) atomic non-capture group
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
|
||||
<br><a name="SEC16" href="#TOC1">COMMENT</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?#....) comment (not nestable)
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
|
||||
<br><a name="SEC17" href="#TOC1">OPTION SETTING</a><br>
|
||||
<P>
|
||||
Changes of these options within a group are automatically cancelled at the end
|
||||
of the group.
|
||||
|
@ -518,7 +552,7 @@ not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
|
|||
application can lock out the use of (*UTF) and (*UCP) by setting the
|
||||
PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
|
||||
</P>
|
||||
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
|
||||
<br><a name="SEC18" href="#TOC1">NEWLINE CONVENTION</a><br>
|
||||
<P>
|
||||
These are recognized only at the very start of the pattern or after option
|
||||
settings with a similar syntax.
|
||||
|
@ -531,7 +565,7 @@ settings with a similar syntax.
|
|||
(*NUL) the NUL character (binary zero)
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
|
||||
<br><a name="SEC19" href="#TOC1">WHAT \R MATCHES</a><br>
|
||||
<P>
|
||||
These are recognized only at the very start of the pattern or after option
|
||||
setting with a similar syntax.
|
||||
|
@ -540,7 +574,7 @@ setting with a similar syntax.
|
|||
(*BSR_UNICODE) any Unicode newline sequence
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
|
||||
<br><a name="SEC20" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?=...) )
|
||||
|
@ -561,7 +595,7 @@ setting with a similar syntax.
|
|||
</pre>
|
||||
Each top-level branch of a lookbehind must be of a fixed length.
|
||||
</P>
|
||||
<br><a name="SEC20" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
|
||||
<br><a name="SEC21" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
|
||||
<P>
|
||||
These assertions are specific to PCRE2 and are not Perl-compatible.
|
||||
<pre>
|
||||
|
@ -574,7 +608,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
|
|||
(*non_atomic_positive_lookbehind:...) )
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC21" href="#TOC1">SCRIPT RUNS</a><br>
|
||||
<br><a name="SEC22" href="#TOC1">SCRIPT RUNS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(*script_run:...) ) script run, can be backtracked into
|
||||
|
@ -584,7 +618,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
|
|||
(*asr:...) )
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC22" href="#TOC1">BACKREFERENCES</a><br>
|
||||
<br><a name="SEC23" href="#TOC1">BACKREFERENCES</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
\n reference by number (can be ambiguous)
|
||||
|
@ -601,7 +635,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
|
|||
(?P=name) reference by name (Python)
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC23" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
|
||||
<br><a name="SEC24" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?R) recurse whole pattern
|
||||
|
@ -620,7 +654,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
|
|||
\g'-n' call subroutine by relative number (PCRE2 extension)
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC24" href="#TOC1">CONDITIONAL PATTERNS</a><br>
|
||||
<br><a name="SEC25" href="#TOC1">CONDITIONAL PATTERNS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?(condition)yes-pattern)
|
||||
|
@ -643,7 +677,7 @@ Note the ambiguity of (?(R) and (?(Rn) which might be named reference
|
|||
conditions or recursion tests. Such a condition is interpreted as a reference
|
||||
condition if the relevant named group exists.
|
||||
</P>
|
||||
<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
|
||||
<br><a name="SEC26" href="#TOC1">BACKTRACKING CONTROL</a><br>
|
||||
<P>
|
||||
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
|
||||
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
|
||||
|
@ -670,7 +704,7 @@ pattern is not anchored.
|
|||
The effect of one of these verbs in a group called as a subroutine is confined
|
||||
to the subroutine call.
|
||||
</P>
|
||||
<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
|
||||
<br><a name="SEC27" href="#TOC1">CALLOUTS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?C) callout (assumed number 0)
|
||||
|
@ -681,12 +715,12 @@ The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
|
|||
start and the end), and the starting delimiter { matched with the ending
|
||||
delimiter }. To encode the ending delimiter within the string, double it.
|
||||
</P>
|
||||
<br><a name="SEC27" href="#TOC1">SEE ALSO</a><br>
|
||||
<br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
<b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
||||
<b>pcre2matching</b>(3), <b>pcre2</b>(3).
|
||||
</P>
|
||||
<br><a name="SEC28" href="#TOC1">AUTHOR</a><br>
|
||||
<br><a name="SEC29" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
|
@ -695,9 +729,9 @@ Retired from University Computing Service
|
|||
Cambridge, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC29" href="#TOC1">REVISION</a><br>
|
||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 30 August 2021
|
||||
Last updated: 08 December 2021
|
||||
<br>
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -52,13 +52,13 @@ When PCRE2 is built with Unicode support, the escape sequences \p{..},
|
|||
\P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
|
||||
The Unicode properties that can be tested are limited to the general category
|
||||
properties such as Lu for an upper case letter or Nd for a decimal number, the
|
||||
Unicode script names such as Arabic or Han, and the derived properties Any and
|
||||
L&. Full lists are given in the
|
||||
Unicode script names such as Arabic or Han, Bidi_Class, Bidi_Control, and the
|
||||
derived properties Any and LC (synonym L&). Full lists are given in the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
and
|
||||
<a href="pcre2syntax.html"><b>pcre2syntax</b></a>
|
||||
documentation. Only the short names for properties are supported. For example,
|
||||
\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
|
||||
\p{L} matches a letter. Its longer synonym, \p{Letter}, is not supported.
|
||||
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
|
||||
compatibility with Perl 5.6. PCRE2 does not support this.
|
||||
</P>
|
||||
|
@ -486,9 +486,9 @@ Cambridge, England.
|
|||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 23 February 2020
|
||||
Last updated: 08 December 2021
|
||||
<br>
|
||||
Copyright © 1997-2020 University of Cambridge.
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
150
doc/pcre2.txt
|
@ -2012,13 +2012,13 @@ LOCALE SUPPORT
|
|||
code points are less than 256. By default, higher-valued code points
|
||||
never match escapes such as \w or \d.
|
||||
|
||||
When PCRE2 is built with Unicode support (the default), the Unicode
|
||||
properties of all characters can be tested with \p and \P, or, alterna-
|
||||
tively, the PCRE2_UCP option can be set when a pattern is compiled;
|
||||
this causes \w and friends to use Unicode property support instead of
|
||||
the built-in tables. PCRE2_UCP also causes upper/lower casing opera-
|
||||
tions on characters with code points greater than 127 to use Unicode
|
||||
properties. These effects apply even when PCRE2_UTF is not set.
|
||||
When PCRE2 is built with Unicode support (the default), certain Unicode
|
||||
character properties can be tested with \p and \P, or, alternatively,
|
||||
the PCRE2_UCP option can be set when a pattern is compiled; this causes
|
||||
\w and friends to use Unicode property support instead of the built-in
|
||||
tables. PCRE2_UCP also causes upper/lower casing operations on charac-
|
||||
ters with code points greater than 127 to use Unicode properties. These
|
||||
effects apply even when PCRE2_UTF is not set.
|
||||
|
||||
The use of locales with Unicode is discouraged. If you are handling
|
||||
characters with code points greater than 127, you should either use
|
||||
|
@ -3857,7 +3857,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 30 November 2021
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -3970,8 +3970,8 @@ UNICODE AND UTF SUPPORT
|
|||
0x10ffff in the strings that they handle. Unicode support also gives
|
||||
access to the Unicode properties of characters, using pattern escapes
|
||||
such as \P, \p, and \X. Only the general category properties such as Lu
|
||||
and Nd are supported. Details are given in the pcre2pattern documenta-
|
||||
tion.
|
||||
and Nd, script names, and some bi-directional properties are supported.
|
||||
Details are given in the pcre2pattern documentation.
|
||||
|
||||
Pattern escapes such as \d and \w do not by default make use of Unicode
|
||||
properties. The application can request that they do by setting the
|
||||
|
@ -4453,8 +4453,8 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 20 March 2020
|
||||
Copyright (c) 1997-2020 University of Cambridge.
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
@ -4941,12 +4941,13 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
|||
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
|
||||
is built with Unicode support (the default). The properties that can be
|
||||
tested with \p and \P are limited to the general category properties
|
||||
such as Lu and Nd, script names such as Greek or Han, and the derived
|
||||
properties Any and L&. Both PCRE2 and Perl support the Cs (surrogate)
|
||||
property, but in PCRE2 its use is limited. See the pcre2pattern docu-
|
||||
mentation for details. The long synonyms for property names that Perl
|
||||
supports (such as \p{Letter}) are not supported by PCRE2, nor is it
|
||||
permitted to prefix any of these properties with "Is".
|
||||
such as Lu and Nd, script names such as Greek or Han, Bidi_Class,
|
||||
Bidi_Control, and the derived properties Any and LC (synonym L&). Both
|
||||
PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its
|
||||
use is limited. See the pcre2pattern documentation for details. The
|
||||
long synonyms for property names that Perl supports (such as \p{Let-
|
||||
ter}) are not supported by PCRE2, nor is it permitted to prefix any of
|
||||
these properties with "Is".
|
||||
|
||||
7. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
|
||||
in between are treated as literals. However, this is slightly different
|
||||
|
@ -5105,7 +5106,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 01 December 2021
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -6894,13 +6895,14 @@ BACKSLASH
|
|||
\P{xx} a character without the xx property
|
||||
\X a Unicode extended grapheme cluster
|
||||
|
||||
The property names represented by xx above are case-sensitive. There is
|
||||
support for Unicode script names, Unicode general category properties,
|
||||
"Any", which matches any character (including newline), and some spe-
|
||||
cial PCRE2 properties (described in the next section). Other Perl
|
||||
properties such as "InMusicalSymbols" are not supported by PCRE2. Note
|
||||
that \P{Any} does not match any characters, so always causes a match
|
||||
failure.
|
||||
The property names represented by xx above are not case-sensitive, and
|
||||
in accordance with Unicode's "loose matching" rules, spaces, hyphens,
|
||||
and underscores are ignored. There is support for Unicode script names,
|
||||
Unicode general category properties, "Any", which matches any character
|
||||
(including newline), Bidi_Control, Bidi_Class, and some special PCRE2
|
||||
properties (described below). Other Perl properties such as "InMusi-
|
||||
calSymbols" are not supported by PCRE2. Note that \P{Any} does not
|
||||
match any characters, so always causes a match failure.
|
||||
|
||||
Sets of Unicode characters are defined as belonging to certain scripts.
|
||||
A character from one of these sets can be matched using a script name.
|
||||
|
@ -7000,9 +7002,9 @@ BACKSLASH
|
|||
Zp Paragraph separator
|
||||
Zs Space separator
|
||||
|
||||
The special property L& is also supported: it matches a character that
|
||||
has the Lu, Ll, or Lt property, in other words, a letter that is not
|
||||
classified as a modifier or "other".
|
||||
The special property LC, which has the synonym L&, is also supported:
|
||||
it matches a character that has the Lu, Ll, or Lt property, in other
|
||||
words, a letter that is not classified as a modifier or "other".
|
||||
|
||||
The Cs (Surrogate) property applies only to characters whose code
|
||||
points are in the range U+D800 to U+DFFF. These characters are no dif-
|
||||
|
@ -7031,6 +7033,43 @@ BACKSLASH
|
|||
them do so by setting the PCRE2_UCP option or by starting the pattern
|
||||
with (*UCP).
|
||||
|
||||
Bi-directional properties for \p and \P
|
||||
|
||||
Two properties relating to bi-directional text are supported:
|
||||
|
||||
\p{Bidi_Control} matches a Bidi control character
|
||||
\p{Bidi_Class:<class>} matches a character with the given class
|
||||
|
||||
The recognized classes are:
|
||||
|
||||
AL Arabic letter
|
||||
AN Arabic number
|
||||
B paragraph separator
|
||||
BN boundary neutral
|
||||
CS common separator
|
||||
EN European number
|
||||
ES European separator
|
||||
ET European terminator
|
||||
FSI first strong isolate
|
||||
L left-to-right
|
||||
LRE left-to-right embedding
|
||||
LRI left-to-right isolate
|
||||
LRO left-to-right override
|
||||
NSM non-spacing mark
|
||||
ON other neutral
|
||||
PDF pop directional format
|
||||
PDI pop directional isolate
|
||||
R right-to-left
|
||||
RLE right-to-left embedding
|
||||
RLI right-to-left isolate
|
||||
RLO right-to-left override
|
||||
S segment separator
|
||||
WS white space
|
||||
|
||||
For Bidi_Class, an equals sign may be used instead of a colon. The
|
||||
class names are case-insensitive. As for other properties, only the
|
||||
short names are recognized.
|
||||
|
||||
Extended grapheme clusters
|
||||
|
||||
The \X escape matches any number of Unicode characters that form an
|
||||
|
@ -9659,7 +9698,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 01 December 2021
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -10698,6 +10737,38 @@ SCRIPT NAMES FOR \p AND \P
|
|||
Warang_Citi, Yezidi, Yi, Zanabazar_Square.
|
||||
|
||||
|
||||
BIDI_PROPERTIES FOR \p AND \P
|
||||
|
||||
\p{Bidi_Control} matches a Bidi control character
|
||||
\p{Bidi_Class:<class>} matches a character with the given class
|
||||
|
||||
The recognized classes are:
|
||||
|
||||
AL Arabic letter
|
||||
AN Arabic number
|
||||
B paragraph separator
|
||||
BN boundary neutral
|
||||
CS common separator
|
||||
EN European number
|
||||
ES European separator
|
||||
ET European terminator
|
||||
FSI first strong isolate
|
||||
L left-to-right
|
||||
LRE left-to-right embedding
|
||||
LRI left-to-right isolate
|
||||
LRO left-to-right override
|
||||
NSM non-spacing mark
|
||||
ON other neutral
|
||||
PDF pop directional format
|
||||
PDI pop directional isolate
|
||||
R right-to-left
|
||||
RLE right-to-left embedding
|
||||
RLI right-to-left isolate
|
||||
RLO right-to-left override
|
||||
S segment separator
|
||||
WS white space
|
||||
|
||||
|
||||
CHARACTER CLASSES
|
||||
|
||||
[...] positive character class
|
||||
|
@ -11027,7 +11098,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 30 August 2021
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -11073,12 +11144,13 @@ UNICODE PROPERTY SUPPORT
|
|||
ting. The Unicode properties that can be tested are limited to the
|
||||
general category properties such as Lu for an upper case letter or Nd
|
||||
for a decimal number, the Unicode script names such as Arabic or Han,
|
||||
and the derived properties Any and L&. Full lists are given in the
|
||||
pcre2pattern and pcre2syntax documentation. Only the short names for
|
||||
properties are supported. For example, \p{L} matches a letter. Its Perl
|
||||
synonym, \p{Letter}, is not supported. Furthermore, in Perl, many
|
||||
properties may optionally be prefixed by "Is", for compatibility with
|
||||
Perl 5.6. PCRE2 does not support this.
|
||||
Bidi_Class, Bidi_Control, and the derived properties Any and LC (syn-
|
||||
onym L&). Full lists are given in the pcre2pattern and pcre2syntax doc-
|
||||
umentation. Only the short names for properties are supported. For ex-
|
||||
ample, \p{L} matches a letter. Its longer synonym, \p{Letter}, is not
|
||||
supported. Furthermore, in Perl, many properties may optionally be
|
||||
prefixed by "Is", for compatibility with Perl 5.6. PCRE2 does not sup-
|
||||
port this.
|
||||
|
||||
|
||||
WIDE CHARACTERS AND UTF MODES
|
||||
|
@ -11462,8 +11534,8 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 23 February 2020
|
||||
Copyright (c) 1997-2020 University of Cambridge.
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "30 November 2021" "PCRE2 10.40"
|
||||
.TH PCRE2API 3 "08 December 2021" "PCRE2 10.40"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -2015,8 +2015,8 @@ point. However, this applies only to characters whose code points are less than
|
|||
256. By default, higher-valued code points never match escapes such as \ew or
|
||||
\ed.
|
||||
.P
|
||||
When PCRE2 is built with Unicode support (the default), the Unicode properties
|
||||
of all characters can be tested with \ep and \eP, or, alternatively, the
|
||||
When PCRE2 is built with Unicode support (the default), certain Unicode
|
||||
character properties can be tested with \ep and \eP, or, alternatively, the
|
||||
PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and
|
||||
friends to use Unicode property support instead of the built-in tables.
|
||||
PCRE2_UCP also causes upper/lower casing operations on characters with code
|
||||
|
@ -4025,6 +4025,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 30 November 2021
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2BUILD 3 "20 March 2020" "PCRE2 10.35"
|
||||
.TH PCRE2BUILD 3 "08 December 2021" "PCRE2 10.40"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.
|
||||
|
@ -122,8 +122,9 @@ locked this out by setting PCRE2_NEVER_UTF.
|
|||
UTF support allows the libraries to process character code points up to
|
||||
0x10ffff in the strings that they handle. Unicode support also gives access to
|
||||
the Unicode properties of characters, using pattern escapes such as \eP, \ep,
|
||||
and \eX. Only the general category properties such as \fILu\fP and \fINd\fP are
|
||||
supported. Details are given in the
|
||||
and \eX. Only the general category properties such as \fILu\fP and \fINd\fP,
|
||||
script names, and some bi-directional properties are supported. Details are
|
||||
given in the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
|
@ -633,6 +634,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 20 March 2020
|
||||
Copyright (c) 1997-2020 University of Cambridge.
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2COMPAT 3 "01 December 2021" "PCRE2 10.40"
|
||||
.TH PCRE2COMPAT 3 "08 December 2021" "PCRE2 10.40"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
||||
|
@ -50,9 +50,9 @@ interprets them.
|
|||
6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
|
||||
built with Unicode support (the default). The properties that can be tested
|
||||
with \ep and \eP are limited to the general category properties such as Lu and
|
||||
Nd, script names such as Greek or Han, and the derived properties Any and L&.
|
||||
Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
|
||||
is limited. See the
|
||||
Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
|
||||
derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
|
||||
(surrogate) property, but in PCRE2 its use is limited. See the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
|
@ -222,6 +222,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 01 December 2021
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PATTERN 3 "01 December 2021" "PCRE2 10.40"
|
||||
.TH PCRE2PATTERN 3 "08 December 2021" "PCRE2 10.40"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||
|
@ -779,13 +779,15 @@ escape sequences are:
|
|||
\eP{\fIxx\fP} a character without the \fIxx\fP property
|
||||
\eX a Unicode extended grapheme cluster
|
||||
.sp
|
||||
The property names represented by \fIxx\fP above are case-sensitive. There is
|
||||
support for Unicode script names, Unicode general category properties, "Any",
|
||||
which matches any character (including newline), and some special PCRE2
|
||||
properties (described in the
|
||||
The property names represented by \fIxx\fP above are not case-sensitive, and in
|
||||
accordance with Unicode's "loose matching" rules, spaces, hyphens, and
|
||||
underscores are ignored. There is support for Unicode script names, Unicode
|
||||
general category properties, "Any", which matches any character (including
|
||||
newline), Bidi_Control, Bidi_Class, and some special PCRE2 properties
|
||||
(described
|
||||
.\" HTML <a href="#extraprops">
|
||||
.\" </a>
|
||||
next section).
|
||||
below).
|
||||
.\"
|
||||
Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
|
||||
Note that \eP{Any} does not match any characters, so always causes a match
|
||||
|
@ -1025,9 +1027,9 @@ The following general category property codes are supported:
|
|||
Zp Paragraph separator
|
||||
Zs Space separator
|
||||
.sp
|
||||
The special property L& is also supported: it matches a character that has
|
||||
the Lu, Ll, or Lt property, in other words, a letter that is not classified as
|
||||
a modifier or "other".
|
||||
The special property LC, which has the synonym L&, is also supported: it
|
||||
matches a character that has the Lu, Ll, or Lt property, in other words, a
|
||||
letter that is not classified as a modifier or "other".
|
||||
.P
|
||||
The Cs (Surrogate) property applies only to characters whose code points are in
|
||||
the range U+D800 to U+DFFF. These characters are no different to any other
|
||||
|
@ -1059,6 +1061,45 @@ properties in PCRE2 by default, though you can make them do so by setting the
|
|||
PCRE2_UCP option or by starting the pattern with (*UCP).
|
||||
.
|
||||
.
|
||||
.SS "Bi-directional properties for \ep and \eP"
.rs
.sp
Two properties relating to bi-directional text are supported:
.sp
  \ep{Bidi_Control}        matches a Bidi control character
  \ep{Bidi_Class:<class>}  matches a character with the given class
.sp
The recognized classes are:
.sp
  AL   Arabic letter
  AN   Arabic number
  B    paragraph separator
  BN   boundary neutral
  CS   common separator
  EN   European number
  ES   European separator
  ET   European terminator
  FSI  first strong isolate
  L    left-to-right
  LRE  left-to-right embedding
  LRI  left-to-right isolate
  LRO  left-to-right override
  NSM  non-spacing mark
  ON   other neutral
  PDF  pop directional format
  PDI  pop directional isolate
  R    right-to-left
  RLE  right-to-left embedding
  RLI  right-to-left isolate
  RLO  right-to-left override
  S    segment separator
  WS   white space
.sp
For Bidi_Class, an equals sign may be used instead of a colon. The class names
are case-insensitive. As for other properties, only the short names are
recognized.
.
|
||||
.
|
||||
.SS Extended grapheme clusters
|
||||
.rs
|
||||
.sp
|
||||
|
@ -3909,6 +3950,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 01 December 2021
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2SYNTAX 3 "30 August 2021" "PCRE2 10.38"
|
||||
.TH PCRE2SYNTAX 3 "08 December 2021" "PCRE2 10.40"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||
|
@ -333,6 +333,39 @@ Yi,
|
|||
Zanabazar_Square.
|
||||
.
|
||||
.
|
||||
.SH "BIDI_PROPERTIES FOR \ep AND \eP"
.rs
.sp
  \ep{Bidi_Control}        matches a Bidi control character
  \ep{Bidi_Class:<class>}  matches a character with the given class
.sp
The recognized classes are:
.sp
  AL   Arabic letter
  AN   Arabic number
  B    paragraph separator
  BN   boundary neutral
  CS   common separator
  EN   European number
  ES   European separator
  ET   European terminator
  FSI  first strong isolate
  L    left-to-right
  LRE  left-to-right embedding
  LRI  left-to-right isolate
  LRO  left-to-right override
  NSM  non-spacing mark
  ON   other neutral
  PDF  pop directional format
  PDI  pop directional isolate
  R    right-to-left
  RLE  right-to-left embedding
  RLI  right-to-left isolate
  RLO  right-to-left override
  S    segment separator
  WS   white space
.
|
||||
.
|
||||
.SH "CHARACTER CLASSES"
|
||||
.rs
|
||||
.sp
|
||||
|
@ -684,6 +717,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 30 August 2021
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2UNICODE 3 "23 February 2020" "PCRE2 10.35"
|
||||
.TH PCRE2UNICODE 3 "08 December 2021" "PCRE2 10.40"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions (revised API)
|
||||
.SH "UNICODE AND UTF SUPPORT"
|
||||
|
@ -42,8 +42,8 @@ When PCRE2 is built with Unicode support, the escape sequences \ep{..},
|
|||
\eP{..}, and \eX can be used. This is not dependent on the PCRE2_UTF setting.
|
||||
The Unicode properties that can be tested are limited to the general category
|
||||
properties such as Lu for an upper case letter or Nd for a decimal number, the
|
||||
Unicode script names such as Arabic or Han, and the derived properties Any and
|
||||
L&. Full lists are given in the
|
||||
Unicode script names such as Arabic or Han, Bidi_Class, Bidi_Control, and the
|
||||
derived properties Any and LC (synonym L&). Full lists are given in the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
|
@ -52,7 +52,7 @@ and
|
|||
\fBpcre2syntax\fP
|
||||
.\"
|
||||
documentation. Only the short names for properties are supported. For example,
|
||||
\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
|
||||
\ep{L} matches a letter. Its longer synonym, \ep{Letter}, is not supported.
|
||||
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
|
||||
compatibility with Perl 5.6. PCRE2 does not support this.
|
||||
.
|
||||
|
@ -457,6 +457,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 23 February 2020
|
||||
Copyright (c) 1997-2020 University of Cambridge.
|
||||
Last updated: 08 December 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
.fi
|
||||
|
|
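The Bidi_Class short names documented in this commit are the standard Unicode ones, so they can be explored outside PCRE2. The following is a minimal sketch, not PCRE2 code: it uses Python's standard `unicodedata.bidirectional()` to approximate which characters `\p{Bidi_Class:<class>}` and `\p{Bidi_Control}` would be expected to match. The helper names (`normalize`, `has_bidi_class`, `is_bidi_control`) are hypothetical, and the Bidi_Control set is an assumption taken from Unicode's PropList.txt rather than from the PCRE2 sources.

```python
import unicodedata

# Assumption: the Bidi_Control code points as listed in Unicode's PropList.txt
# (ALM, LRM, RLM, LRE..RLO, LRI..PDI). Not read from PCRE2 itself.
BIDI_CONTROL = {0x061C, 0x200E, 0x200F,
                *range(0x202A, 0x202F),   # U+202A..U+202E
                *range(0x2066, 0x206A)}   # U+2066..U+2069


def normalize(name: str) -> str:
    """Loose matching: ignore case, spaces, hyphens, and underscores."""
    return "".join(ch for ch in name if ch not in " -_").upper()


def has_bidi_class(ch: str, cls: str) -> bool:
    """True if the character's Bidi_Class short name equals cls."""
    return unicodedata.bidirectional(ch) == normalize(cls)


def is_bidi_control(ch: str) -> bool:
    """True if the character is in the assumed Bidi_Control set."""
    return ord(ch) in BIDI_CONTROL


if __name__ == "__main__":
    # Print the Bidi_Class of a few sample characters.
    for ch in ["a", "\u05d0", "\u0627", "7", " ", "\u202e"]:
        print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch),
              "control" if is_bidi_control(ch) else "")
```

Because the class name is normalized before comparison, `has_bidi_class(ch, "al")` and `has_bidi_class(ch, "AL")` behave the same, mirroring the case-insensitive class names described in the new documentation. Results depend on the Unicode version bundled with the Python interpreter, which may differ from the one PCRE2 was built with.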