Documentation update for binary property support
parent bf35c0518c
commit 7f7d3e8521
|
@ -776,8 +776,17 @@ can be used in any mode, though in 8-bit and 16-bit non-UTF modes these
|
||||||
sequences are of course limited to testing characters whose code points are
|
sequences are of course limited to testing characters whose code points are
|
||||||
less than U+0100 and U+10000, respectively. In 32-bit non-UTF mode, code points
|
less than U+0100 and U+10000, respectively. In 32-bit non-UTF mode, code points
|
||||||
greater than 0x10ffff (the Unicode limit) may be encountered. These are all
|
greater than 0x10ffff (the Unicode limit) may be encountered. These are all
|
||||||
treated as being in the Unknown script and with an unassigned type. The extra
|
treated as being in the Unknown script and with an unassigned type.
|
||||||
escape sequences are:
|
</P>
|
||||||
|
<P>
|
||||||
|
Matching characters by Unicode property is not fast, because PCRE2 has to do a
|
||||||
|
multistage table lookup in order to find a character's property. That is why
|
||||||
|
the traditional escape sequences such as \d and \w do not use Unicode
|
||||||
|
properties in PCRE2 by default, though you can make them do so by setting the
|
||||||
|
PCRE2_UCP option or by starting the pattern with (*UCP).
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The extra escape sequences that provide property support are:
|
||||||
<pre>
|
<pre>
|
||||||
\p{<i>xx</i>} a character with the <i>xx</i> property
|
\p{<i>xx</i>} a character with the <i>xx</i> property
|
||||||
\P{<i>xx</i>} a character without the <i>xx</i> property
|
\P{<i>xx</i>} a character without the <i>xx</i> property
|
||||||
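The paragraph added in this hunk says that \d and \w do not use Unicode properties unless PCRE2_UCP or (*UCP) is set. As a loose analogy only (this is not PCRE2), Python's standard `re` module has the same distinction with the default reversed: `str` patterns are Unicode-aware unless `re.ASCII` is passed. The sketch below assumes Python 3, and the helper name `is_single_digit` is hypothetical.

```python
import re


def is_single_digit(ch: str, ascii_only: bool) -> bool:
    """Return True if ch is exactly one character matched by \\d.

    ascii_only=True is comparable to PCRE2 without PCRE2_UCP (only 0-9 match).
    ascii_only=False is comparable to PCRE2_UCP or (*UCP) (any decimal digit).
    """
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\d", ch, flags) is not None


arabic_indic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, general category Nd

print(is_single_digit("7", ascii_only=True))
print(is_single_digit(arabic_indic_three, ascii_only=True))
print(is_single_digit(arabic_indic_three, ascii_only=False))
```

The ASCII-only path avoids any property lookup, which is the cost the paragraph gives as the reason for PCRE2's default.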
|
@ -787,17 +796,20 @@ The property names represented by <i>xx</i> above are not case-sensitive, and in
|
||||||
accordance with Unicode's "loose matching" rules, spaces, hyphens, and
|
accordance with Unicode's "loose matching" rules, spaces, hyphens, and
|
||||||
underscores are ignored. There is support for Unicode script names, Unicode
|
underscores are ignored. There is support for Unicode script names, Unicode
|
||||||
general category properties, "Any", which matches any character (including
|
general category properties, "Any", which matches any character (including
|
||||||
newline), Bidi_Control, Bidi_Class, and some special PCRE2 properties
|
newline), Bidi_Class, a number of binary (yes/no) properties, and some special
|
||||||
(described
|
PCRE2 properties (described
|
||||||
<a href="#extraprops">below).</a>
|
<a href="#extraprops">below).</a>
|
||||||
Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
|
Certain other Perl properties such as "InMusicalSymbols" are not supported by
|
||||||
Note that \P{Any} does not match any characters, so always causes a match
|
PCRE2. Note that \P{Any} does not match any characters, so always causes a
|
||||||
failure.
|
match failure.
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Script properties for \p and \P
|
||||||
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
There are three different syntax forms for matching a script. Each Unicode
|
There are three different syntax forms for matching a script. Each Unicode
|
||||||
character has a basic script and, optionally, a list of other scripts ("Script
|
character has a basic script and, optionally, a list of other scripts ("Script
|
||||||
Extentions") with which it is commonly used. Using the Adlam script as an
|
Extensions") with which it is commonly used. Using the Adlam script as an
|
||||||
example, \p{sc:Adlam} matches characters whose basic script is Adlam, whereas
|
example, \p{sc:Adlam} matches characters whose basic script is Adlam, whereas
|
||||||
\p{scx:Adlam} matches, in addition, characters that have Adlam in their
|
\p{scx:Adlam} matches, in addition, characters that have Adlam in their
|
||||||
extensions list. The full names "script" and "script extensions" for the
|
extensions list. The full names "script" and "script extensions" for the
|
||||||
|
@ -810,171 +822,16 @@ interpretation at release 5.26 and PCRE2 changed at release 10.40.
|
||||||
Unassigned characters (and in non-UTF 32-bit mode, characters with code points
|
Unassigned characters (and in non-UTF 32-bit mode, characters with code points
|
||||||
greater than 0x10FFFF) are assigned the "Unknown" script. Others that are not
|
greater than 0x10FFFF) are assigned the "Unknown" script. Others that are not
|
||||||
part of an identified script are lumped together as "Common". The current list
|
part of an identified script are lumped together as "Common". The current list
|
||||||
of script names and their 4-letter abbreviations is:
|
of recognized script names and their 4-character abbreviations can be obtained
|
||||||
</P>
|
by running this command:
|
||||||
<P>
|
<pre>
|
||||||
Adlam (Adlm),
|
pcre2test -LS
|
||||||
Ahom (Ahom),
|
|
||||||
Anatolian_Hieroglyphs (Hluw),
|
</PRE>
|
||||||
Arabic (Arab),
|
|
||||||
Armenian (Armn),
|
|
||||||
Avestan (Avst),
|
|
||||||
Balinese (Bali),
|
|
||||||
Bamum (Bamu),
|
|
||||||
Bassa_Vah (Bass),
|
|
||||||
Batak (Batk),
|
|
||||||
Bengali (Beng),
|
|
||||||
Bhaiksuki (Bhks),
|
|
||||||
Bopomofo (Bopo),
|
|
||||||
Brahmi (Brah),
|
|
||||||
Braille (Brai),
|
|
||||||
Buginese (Bugi),
|
|
||||||
Buhid (Buhd),
|
|
||||||
Canadian_Aboriginal (Cans),
|
|
||||||
Carian (Cari),
|
|
||||||
Caucasian_Albanian (Aghb),
|
|
||||||
Chakma (Cakm),
|
|
||||||
Cham (Cham),
|
|
||||||
Cherokee (Cher),
|
|
||||||
Chorasmian (Chrs),
|
|
||||||
Common (Zyyy),
|
|
||||||
Coptic (Copt),
|
|
||||||
Cuneiform (Xsux),
|
|
||||||
Cypriot (Cprt),
|
|
||||||
Cypro_Minoan (Cpmn),
|
|
||||||
Cyrillic (Cyrl),
|
|
||||||
Deseret (Dsrt),
|
|
||||||
Devanagari (Deva),
|
|
||||||
Dives_Akuru (Diak),
|
|
||||||
Dogra (Dogr),
|
|
||||||
Duployan (Dupl),
|
|
||||||
Egyptian_Hieroglyphs (Egyp),
|
|
||||||
Elbasan (Elba),
|
|
||||||
Elymaic (Elym),
|
|
||||||
Ethiopic (Ethi),
|
|
||||||
Georgian (Geor),
|
|
||||||
Glagolitic (Glag),
|
|
||||||
Gothic (Goth),
|
|
||||||
Grantha (Gran),
|
|
||||||
Greek (Grek),
|
|
||||||
Gujarati (Gujr),
|
|
||||||
Gunjala_Gondi (Gong),
|
|
||||||
Gurmukhi (Guru),
|
|
||||||
Han (Hani),
|
|
||||||
Hangul (Hang),
|
|
||||||
Hanifi_Rohingya (Rohg),
|
|
||||||
Hanunoo (Hano),
|
|
||||||
Hatran (Hatr),
|
|
||||||
Hebrew (Hebr),
|
|
||||||
Hiragana (Hira),
|
|
||||||
Imperial_Aramaic (Armi),
|
|
||||||
Inherited (Zinh),
|
|
||||||
Inscriptional_Pahlavi (Phli),
|
|
||||||
Inscriptional_Parthian (Prti),
|
|
||||||
Javanese (Java),
|
|
||||||
Kaithi (Kthi),
|
|
||||||
Kannada (Knda),
|
|
||||||
Katakana (Kana),
|
|
||||||
Kayah_Li (Kali),
|
|
||||||
Kharoshthi (Khar),
|
|
||||||
Khitan_Small_Script (Kits),
|
|
||||||
Khmer (Khmr),
|
|
||||||
Khojki (Khoj),
|
|
||||||
Khudawadi (Sind),
|
|
||||||
Lao (Laoo),
|
|
||||||
Latin (Latn),
|
|
||||||
Lepcha (Lepc),
|
|
||||||
Limbu (Limb),
|
|
||||||
Linear_A (Lina),
|
|
||||||
Linear_B (Linb),
|
|
||||||
Lisu (Lisu),
|
|
||||||
Lycian (Lyci),
|
|
||||||
Lydian (Lydi),
|
|
||||||
Mahajani (Majh),
|
|
||||||
Makasar (Maka),
|
|
||||||
Malayalam (Mlym),
|
|
||||||
Mandaic (Mand),
|
|
||||||
Manichaean (Mani),
|
|
||||||
Marchen (Marc),
|
|
||||||
Masaram_Gondi (Gonm),
|
|
||||||
Medefaidrin (Medf),
|
|
||||||
Meetei_Mayek (Mtei),
|
|
||||||
Mende_Kikakui (Mend),
|
|
||||||
Meroitic_Cursive (Merc),
|
|
||||||
Meroitic_Hieroglyphs (Mero),
|
|
||||||
Miao (Miao),
|
|
||||||
Modi (Modi),
|
|
||||||
Mongolian (Mong),
|
|
||||||
Mro (Mroo),
|
|
||||||
Multani (Mult),
|
|
||||||
Myanmar (Mymr),
|
|
||||||
Nabataean (Nbar),
|
|
||||||
Nandinagari (Nand),
|
|
||||||
New_Tai_Lue (Talu),
|
|
||||||
Newa (Newa),
|
|
||||||
Nko (Nkoo),
|
|
||||||
Nushu (Nshu),
|
|
||||||
Nyiakeng_Puachue_Hmong (Hmnp),
|
|
||||||
Ogham (Ogam),
|
|
||||||
Ol_Chiki (Olck),
|
|
||||||
Old_Hungarian (Hung),
|
|
||||||
Old_Italic (Olck),
|
|
||||||
Old_North_Arabian (Narb),
|
|
||||||
Old_Permic (Perm),
|
|
||||||
Old_Persian (Orkh),
|
|
||||||
Old_Sogdian (Sogo),
|
|
||||||
Old_South_Arabian (Sarb),
|
|
||||||
Old_Turkic (Orkh),
|
|
||||||
Old_Uyghur (Ougr),
|
|
||||||
Oriya (Orya),
|
|
||||||
Osage (Osge),
|
|
||||||
Osmanya (Osma),
|
|
||||||
Pahawh_Hmong (Hmng),
|
|
||||||
Palmyrene (Palm),
|
|
||||||
Pau_Cin_Hau (Pauc),
|
|
||||||
Phags_Pa (Phag),
|
|
||||||
Phoenician (Phnx),
|
|
||||||
Psalter_Pahlavi (Phli),
|
|
||||||
Rejang (Rjng),
|
|
||||||
Runic (Runr),
|
|
||||||
Samaritan (Samr),
|
|
||||||
Saurashtra (Saur),
|
|
||||||
Sharada (Shrd),
|
|
||||||
Shavian (Shaw),
|
|
||||||
Siddham (Sidd),
|
|
||||||
SignWriting (Sgnw),
|
|
||||||
Sinhala (Sinh),
|
|
||||||
Sogdian (Sogd),
|
|
||||||
Sora_Sompeng (Sora),
|
|
||||||
Soyombo (Soyo),
|
|
||||||
Sundanese (Sund),
|
|
||||||
Syloti_Nagri (Sylo),
|
|
||||||
Syriac (Syrc),
|
|
||||||
Tagalog (Tglg),
|
|
||||||
Tagbanwa (Tagb),
|
|
||||||
Tai_Le (Tale),
|
|
||||||
Tai_Tham (Lana),
|
|
||||||
Tai_Viet (Tavt),
|
|
||||||
Takri (Takr),
|
|
||||||
Tamil (Taml),
|
|
||||||
Tangsa (Tngs),
|
|
||||||
Tangut (Tang),
|
|
||||||
Telugu (Telu),
|
|
||||||
Thaana (Thaa),
|
|
||||||
Thai (Thai),
|
|
||||||
Tibetan (Tibt),
|
|
||||||
Tifinagh (Tfng),
|
|
||||||
Tirhuta (Tirh),
|
|
||||||
Toto (Toto),
|
|
||||||
Ugaritic (Ugar),
|
|
||||||
Vai (Vaii),
|
|
||||||
Vithkuqi (Vith),
|
|
||||||
Wancho (Wcho),
|
|
||||||
Warang_Citi (Wara),
|
|
||||||
Yezidi (Yezi),
|
|
||||||
Yi (Yiii),
|
|
||||||
Zanabazar_Square (Zanb).
|
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
The general category property for \p and \P
|
||||||
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Each character has exactly one Unicode general category property, specified by
|
Each character has exactly one Unicode general category property, specified by
|
||||||
a two-letter abbreviation. For compatibility with Perl, negation can be
|
a two-letter abbreviation. For compatibility with Perl, negation can be
|
||||||
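To make the "exactly one general category, given as a two-letter abbreviation" statement concrete, here is a minimal sketch that uses Python's standard `unicodedata` module rather than PCRE2. The helper `has_general_category` is hypothetical, it does no loose matching of property names, and Python's Unicode tables may be a different version from those compiled into PCRE2.

```python
import unicodedata


def has_general_category(ch: str, prop: str) -> bool:
    """Approximate \\p{prop} for a single character, e.g. prop="Lu" or prop="L"."""
    category = unicodedata.category(ch)  # always a two-letter code such as "Lu"
    # A one-letter property such as "L" covers every category that starts with it.
    return category == prop or (len(prop) == 1 and category.startswith(prop))


print(unicodedata.category("A"))
print(has_general_category("A", "Lu"))
print(has_general_category("a", "Lu"))
print(has_general_category("a", "L"))
```

Negating the result corresponds to \P{prop}.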
|
@ -1065,22 +922,23 @@ Specifying caseless matching does not affect these escape sequences. For
|
||||||
example, \p{Lu} always matches only upper case letters. This is different from
|
example, \p{Lu} always matches only upper case letters. This is different from
|
||||||
the behaviour of current versions of Perl.
|
the behaviour of current versions of Perl.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
|
||||||
Matching characters by Unicode property is not fast, because PCRE2 has to do a
|
|
||||||
multistage table lookup in order to find a character's property. That is why
|
|
||||||
the traditional escape sequences such as \d and \w do not use Unicode
|
|
||||||
properties in PCRE2 by default, though you can make them do so by setting the
|
|
||||||
PCRE2_UCP option or by starting the pattern with (*UCP).
|
|
||||||
</P>
|
|
||||||
<br><b>
|
<br><b>
|
||||||
Bi-directional properties for \p and \P
|
Binary (yes/no) properties for \p and \P
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
Unicode defines a number of binary properties, that is, properties whose only
|
||||||
|
values are true or false. You can obtain a list of those that are recognized by
|
||||||
|
\p and \P, along with their abbreviations, by running this command:
|
||||||
|
<pre>
|
||||||
|
pcre2test -LP
|
||||||
|
|
||||||
|
</PRE>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
The Bidi_Class property for \p and \P
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Two properties relating to bi-directional text (each with a shorter synonym)
|
|
||||||
are supported:
|
|
||||||
<pre>
|
<pre>
|
||||||
\p{Bidi_Control} matches a Bidi control character
|
|
||||||
\p{Bidi_C} matches a Bidi control character
|
|
||||||
\p{Bidi_Class:<class>} matches a character with the given class
|
\p{Bidi_Class:<class>} matches a character with the given class
|
||||||
\p{BC:<class>} matches a character with the given class
|
\p{BC:<class>} matches a character with the given class
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1110,8 +968,8 @@ The recognized classes are:
|
||||||
S segment separator
|
S segment separator
|
||||||
WS white space
|
WS white space
|
||||||
</pre>
|
</pre>
|
||||||
For Bidi_Class, an equals sign may be used instead of a colon. The class names
|
An equals sign may be used instead of a colon. The class names are
|
||||||
are case-insensitive; only the short names listed above are recognized.
|
case-insensitive; only the short names listed above are recognized.
|
||||||
</P>
|
</P>
|
||||||
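The Bidi_Class test described above can be approximated outside PCRE2 with Python's standard `unicodedata` module, which reports the same short class names. This is a sketch under that assumption, not PCRE2's implementation, and the helper name `has_bidi_class` is hypothetical.

```python
import unicodedata


def has_bidi_class(ch: str, bidi_class: str) -> bool:
    """Approximate \\p{Bidi_Class:<class>} (or \\p{BC:<class>}) for one character."""
    # unicodedata.bidirectional returns the short class name, e.g. "L", "AL", "WS".
    # Comparing upper-cased strings mirrors the case-insensitive class names.
    return unicodedata.bidirectional(ch).upper() == bidi_class.upper()


print(has_bidi_class("a", "L"))       # Latin small letter a
print(has_bidi_class("\u05d0", "r"))  # HEBREW LETTER ALEF, lower-case class name
print(has_bidi_class(" ", "WS"))      # SPACE
```

As in the documentation, only the short class names are meaningful here. `unicodedata.bidirectional` returns an empty string when no class is defined, so such characters match no class.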
<br><b>
|
<br><b>
|
||||||
Extended grapheme clusters
|
Extended grapheme clusters
|
||||||
|
@ -3908,9 +3766,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 December 2021
|
Last updated: 12 January 2022
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2021 University of Cambridge.
|
Copyright © 1997-2022 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -19,30 +19,31 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
|
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
|
||||||
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
|
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
|
||||||
<li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
|
<li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
|
||||||
<li><a name="TOC7" href="#SEC7">SCRIPT MATCHING WITH \p AND \P</a>
|
<li><a name="TOC7" href="#SEC7">BINARY PROPERTIES FOR \p AND \P</a>
|
||||||
<li><a name="TOC8" href="#SEC8">BIDI_PROPERTIES FOR \p AND \P</a>
|
<li><a name="TOC8" href="#SEC8">SCRIPT MATCHING WITH \p AND \P</a>
|
||||||
<li><a name="TOC9" href="#SEC9">CHARACTER CLASSES</a>
|
<li><a name="TOC9" href="#SEC9">THE BIDI_CLASS PROPERTY FOR \p AND \P</a>
|
||||||
<li><a name="TOC10" href="#SEC10">QUANTIFIERS</a>
|
<li><a name="TOC10" href="#SEC10">CHARACTER CLASSES</a>
|
||||||
<li><a name="TOC11" href="#SEC11">ANCHORS AND SIMPLE ASSERTIONS</a>
|
<li><a name="TOC11" href="#SEC11">QUANTIFIERS</a>
|
||||||
<li><a name="TOC12" href="#SEC12">REPORTED MATCH POINT SETTING</a>
|
<li><a name="TOC12" href="#SEC12">ANCHORS AND SIMPLE ASSERTIONS</a>
|
||||||
<li><a name="TOC13" href="#SEC13">ALTERNATION</a>
|
<li><a name="TOC13" href="#SEC13">REPORTED MATCH POINT SETTING</a>
|
||||||
<li><a name="TOC14" href="#SEC14">CAPTURING</a>
|
<li><a name="TOC14" href="#SEC14">ALTERNATION</a>
|
||||||
<li><a name="TOC15" href="#SEC15">ATOMIC GROUPS</a>
|
<li><a name="TOC15" href="#SEC15">CAPTURING</a>
|
||||||
<li><a name="TOC16" href="#SEC16">COMMENT</a>
|
<li><a name="TOC16" href="#SEC16">ATOMIC GROUPS</a>
|
||||||
<li><a name="TOC17" href="#SEC17">OPTION SETTING</a>
|
<li><a name="TOC17" href="#SEC17">COMMENT</a>
|
||||||
<li><a name="TOC18" href="#SEC18">NEWLINE CONVENTION</a>
|
<li><a name="TOC18" href="#SEC18">OPTION SETTING</a>
|
||||||
<li><a name="TOC19" href="#SEC19">WHAT \R MATCHES</a>
|
<li><a name="TOC19" href="#SEC19">NEWLINE CONVENTION</a>
|
||||||
<li><a name="TOC20" href="#SEC20">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
|
<li><a name="TOC20" href="#SEC20">WHAT \R MATCHES</a>
|
||||||
<li><a name="TOC21" href="#SEC21">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
|
<li><a name="TOC21" href="#SEC21">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
|
||||||
<li><a name="TOC22" href="#SEC22">SCRIPT RUNS</a>
|
<li><a name="TOC22" href="#SEC22">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
|
||||||
<li><a name="TOC23" href="#SEC23">BACKREFERENCES</a>
|
<li><a name="TOC23" href="#SEC23">SCRIPT RUNS</a>
|
||||||
<li><a name="TOC24" href="#SEC24">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
|
<li><a name="TOC24" href="#SEC24">BACKREFERENCES</a>
|
||||||
<li><a name="TOC25" href="#SEC25">CONDITIONAL PATTERNS</a>
|
<li><a name="TOC25" href="#SEC25">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
|
||||||
<li><a name="TOC26" href="#SEC26">BACKTRACKING CONTROL</a>
|
<li><a name="TOC26" href="#SEC26">CONDITIONAL PATTERNS</a>
|
||||||
<li><a name="TOC27" href="#SEC27">CALLOUTS</a>
|
<li><a name="TOC27" href="#SEC27">BACKTRACKING CONTROL</a>
|
||||||
<li><a name="TOC28" href="#SEC28">SEE ALSO</a>
|
<li><a name="TOC28" href="#SEC28">CALLOUTS</a>
|
||||||
<li><a name="TOC29" href="#SEC29">AUTHOR</a>
|
<li><a name="TOC29" href="#SEC29">SEE ALSO</a>
|
||||||
<li><a name="TOC30" href="#SEC30">REVISION</a>
|
<li><a name="TOC30" href="#SEC30">AUTHOR</a>
|
||||||
|
<li><a name="TOC31" href="#SEC31">REVISION</a>
|
||||||
</ul>
|
</ul>
|
||||||
<br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
|
<br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -205,180 +206,27 @@ matching" rules.
|
||||||
Perl and POSIX space are now the same. Perl added VT to its space character set
|
Perl and POSIX space are now the same. Perl added VT to its space character set
|
||||||
at release 5.18.
|
at release 5.18.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC7" href="#TOC1">SCRIPT MATCHING WITH \p AND \P</a><br>
|
<br><a name="SEC7" href="#TOC1">BINARY PROPERTIES FOR \p AND \P</a><br>
|
||||||
<P>
|
<P>
|
||||||
The following script names and their 4-letter abbreviations are recognized in
|
Unicode defines a number of binary properties, that is, properties whose only
|
||||||
|
values are true or false. You can obtain a list of those that are recognized by
|
||||||
|
\p and \P, along with their abbreviations, by running this command:
|
||||||
|
<pre>
|
||||||
|
pcre2test -LP
|
||||||
|
</PRE>
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC8" href="#TOC1">SCRIPT MATCHING WITH \p AND \P</a><br>
|
||||||
|
<P>
|
||||||
|
Many script names and their 4-letter abbreviations are recognized in
|
||||||
\p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P of
|
\p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P of
|
||||||
course):
|
course). You can obtain a list of these scripts by running this command:
|
||||||
|
<pre>
|
||||||
|
pcre2test -LS
|
||||||
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<br><a name="SEC9" href="#TOC1">THE BIDI_CLASS PROPERTY FOR \p AND \P</a><br>
|
||||||
Adlam (Adlm),
|
|
||||||
Ahom (Ahom),
|
|
||||||
Anatolian_Hieroglyphs (Hluw),
|
|
||||||
Arabic (Arab),
|
|
||||||
Armenian (Armn),
|
|
||||||
Avestan (Avst),
|
|
||||||
Balinese (Bali),
|
|
||||||
Bamum (Bamu),
|
|
||||||
Bassa_Vah (Bass),
|
|
||||||
Batak (Batk),
|
|
||||||
Bengali (Beng),
|
|
||||||
Bhaiksuki (Bhks),
|
|
||||||
Bopomofo (Bopo),
|
|
||||||
Brahmi (Brah),
|
|
||||||
Braille (Brai),
|
|
||||||
Buginese (Bugi),
|
|
||||||
Buhid (Buhd),
|
|
||||||
Canadian_Aboriginal (Cans),
|
|
||||||
Carian (Cari),
|
|
||||||
Caucasian_Albanian (Aghb),
|
|
||||||
Chakma (Cakm),
|
|
||||||
Cham (Cham),
|
|
||||||
Cherokee (Cher),
|
|
||||||
Chorasmian (Chrs),
|
|
||||||
Common (Zyyy),
|
|
||||||
Coptic (Copt),
|
|
||||||
Cuneiform (Xsux),
|
|
||||||
Cypriot (Cprt),
|
|
||||||
Cypro_Minoan (Cpmn),
|
|
||||||
Cyrillic (Cyrl),
|
|
||||||
Deseret (Dsrt),
|
|
||||||
Devanagari (Deva),
|
|
||||||
Dives_Akuru (Diak),
|
|
||||||
Dogra (Dogr),
|
|
||||||
Duployan (Dupl),
|
|
||||||
Egyptian_Hieroglyphs (Egyp),
|
|
||||||
Elbasan (Elba),
|
|
||||||
Elymaic (Elym),
|
|
||||||
Ethiopic (Ethi),
|
|
||||||
Georgian (Geor),
|
|
||||||
Glagolitic (Glag),
|
|
||||||
Gothic (Goth),
|
|
||||||
Grantha (Gran),
|
|
||||||
Greek (Grek),
|
|
||||||
Gujarati (Gujr),
|
|
||||||
Gunjala_Gondi (Gong),
|
|
||||||
Gurmukhi (Guru),
|
|
||||||
Han (Hani),
|
|
||||||
Hangul (Hang),
|
|
||||||
Hanifi_Rohingya (Rohg),
|
|
||||||
Hanunoo (Hano),
|
|
||||||
Hatran (Hatr),
|
|
||||||
Hebrew (Hebr),
|
|
||||||
Hiragana (Hira),
|
|
||||||
Imperial_Aramaic (Armi),
|
|
||||||
Inherited (Zinh),
|
|
||||||
Inscriptional_Pahlavi (Phli),
|
|
||||||
Inscriptional_Parthian (Prti),
|
|
||||||
Javanese (Java),
|
|
||||||
Kaithi (Kthi),
|
|
||||||
Kannada (Knda),
|
|
||||||
Katakana (Kana),
|
|
||||||
Kayah_Li (Kali),
|
|
||||||
Kharoshthi (Khar),
|
|
||||||
Khitan_Small_Script (Kits),
|
|
||||||
Khmer (Khmr),
|
|
||||||
Khojki (Khoj),
|
|
||||||
Khudawadi (Sind),
|
|
||||||
Lao (Laoo),
|
|
||||||
Latin (Latn),
|
|
||||||
Lepcha (Lepc),
|
|
||||||
Limbu (Limb),
|
|
||||||
Linear_A (Lina),
|
|
||||||
Linear_B (Linb),
|
|
||||||
Lisu (Lisu),
|
|
||||||
Lycian (Lyci),
|
|
||||||
Lydian (Lydi),
|
|
||||||
Mahajani (Majh),
|
|
||||||
Makasar (Maka),
|
|
||||||
Malayalam (Mlym),
|
|
||||||
Mandaic (Mand),
|
|
||||||
Manichaean (Mani),
|
|
||||||
Marchen (Marc),
|
|
||||||
Masaram_Gondi (Gonm),
|
|
||||||
Medefaidrin (Medf),
|
|
||||||
Meetei_Mayek (Mtei),
|
|
||||||
Mende_Kikakui (Mend),
|
|
||||||
Meroitic_Cursive (Merc),
|
|
||||||
Meroitic_Hieroglyphs (Mero),
|
|
||||||
Miao (Miao),
|
|
||||||
Modi (Modi),
|
|
||||||
Mongolian (Mong),
|
|
||||||
Mro (Mroo),
|
|
||||||
Multani (Mult),
|
|
||||||
Myanmar (Mymr),
|
|
||||||
Nabataean (Nbar),
|
|
||||||
Nandinagari (Nand),
|
|
||||||
New_Tai_Lue (Talu),
|
|
||||||
Newa (Newa),
|
|
||||||
Nko (Nkoo),
|
|
||||||
Nushu (Nshu),
|
|
||||||
Nyiakeng_Puachue_Hmong (Hmnp),
|
|
||||||
Ogham (Ogam),
|
|
||||||
Ol_Chiki (Olck),
|
|
||||||
Old_Hungarian (Hung),
|
|
||||||
Old_Italic (Olck),
|
|
||||||
Old_North_Arabian (Narb),
|
|
||||||
Old_Permic (Perm),
|
|
||||||
Old_Persian (Orkh),
|
|
||||||
Old_Sogdian (Sogo),
|
|
||||||
Old_South_Arabian (Sarb),
|
|
||||||
Old_Turkic (Orkh),
|
|
||||||
Old_Uyghur (Ougr),
|
|
||||||
Oriya (Orya),
|
|
||||||
Osage (Osge),
|
|
||||||
Osmanya (Osma),
|
|
||||||
Pahawh_Hmong (Hmng),
|
|
||||||
Palmyrene (Palm),
|
|
||||||
Pau_Cin_Hau (Pauc),
|
|
||||||
Phags_Pa (Phag),
|
|
||||||
Phoenician (Phnx),
|
|
||||||
Psalter_Pahlavi (Phli),
|
|
||||||
Rejang (Rjng),
|
|
||||||
Runic (Runr),
|
|
||||||
Samaritan (Samr),
|
|
||||||
Saurashtra (Saur),
|
|
||||||
Sharada (Shrd),
|
|
||||||
Shavian (Shaw),
|
|
||||||
Siddham (Sidd),
|
|
||||||
SignWriting (Sgnw),
|
|
||||||
Sinhala (Sinh),
|
|
||||||
Sogdian (Sogd),
|
|
||||||
Sora_Sompeng (Sora),
|
|
||||||
Soyombo (Soyo),
|
|
||||||
Sundanese (Sund),
|
|
||||||
Syloti_Nagri (Sylo),
|
|
||||||
Syriac (Syrc),
|
|
||||||
Tagalog (Tglg),
|
|
||||||
Tagbanwa (Tagb),
|
|
||||||
Tai_Le (Tale),
|
|
||||||
Tai_Tham (Lana),
|
|
||||||
Tai_Viet (Tavt),
|
|
||||||
Takri (Takr),
|
|
||||||
Tamil (Taml),
|
|
||||||
Tangsa (Tngs),
|
|
||||||
Tangut (Tang),
|
|
||||||
Telugu (Telu),
|
|
||||||
Thaana (Thaa),
|
|
||||||
Thai (Thai),
|
|
||||||
Tibetan (Tibt),
|
|
||||||
Tifinagh (Tfng),
|
|
||||||
Tirhuta (Tirh),
|
|
||||||
Toto (Toto),
|
|
||||||
Ugaritic (Ugar),
|
|
||||||
Vai (Vaii),
|
|
||||||
Vithkuqi (Vith),
|
|
||||||
Wancho (Wcho),
|
|
||||||
Warang_Citi (Wara),
|
|
||||||
Yezidi (Yezi),
|
|
||||||
Yi (Yiii),
|
|
||||||
Zanabazar_Square (Zanb).
|
|
||||||
</P>
|
|
||||||
<br><a name="SEC8" href="#TOC1">BIDI_PROPERTIES FOR \p AND \P</a><br>
|
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
\p{Bidi_Control} matches a Bidi control character
|
|
||||||
\p{Bidi_C} matches a Bidi control character
|
|
||||||
\p{Bidi_Class:<class>} matches a character with the given class
|
\p{Bidi_Class:<class>} matches a character with the given class
|
||||||
\p{BC:<class>} matches a character with the given class
|
\p{BC:<class>} matches a character with the given class
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -409,7 +257,7 @@ The recognized classes are:
|
||||||
WS white space
|
WS white space
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC9" href="#TOC1">CHARACTER CLASSES</a><br>
|
<br><a name="SEC10" href="#TOC1">CHARACTER CLASSES</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
[...] positive character class
|
[...] positive character class
|
||||||
|
@ -437,7 +285,7 @@ In PCRE2, POSIX character set names recognize only ASCII characters by default,
|
||||||
but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
\Q...\E inside a character class.
|
\Q...\E inside a character class.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC10" href="#TOC1">QUANTIFIERS</a><br>
|
<br><a name="SEC11" href="#TOC1">QUANTIFIERS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
? 0 or 1, greedy
|
? 0 or 1, greedy
|
||||||
|
@ -458,7 +306,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
{n,}? n or more, lazy
|
{n,}? n or more, lazy
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC11" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
|
<br><a name="SEC12" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
\b word boundary
|
\b word boundary
|
||||||
|
@ -476,7 +324,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
\G first matching position in subject
|
\G first matching position in subject
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC12" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
|
<br><a name="SEC13" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
\K set reported start of match
|
\K set reported start of match
|
||||||
|
@ -486,13 +334,13 @@ for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||||
option is set, the previous behaviour is re-enabled. When this option is set,
|
option is set, the previous behaviour is re-enabled. When this option is set,
|
||||||
\K is honoured in positive assertions, but ignored in negative ones.
|
\K is honoured in positive assertions, but ignored in negative ones.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC13" href="#TOC1">ALTERNATION</a><br>
|
<br><a name="SEC14" href="#TOC1">ALTERNATION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
expr|expr|expr...
|
expr|expr|expr...
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC14" href="#TOC1">CAPTURING</a><br>
|
<br><a name="SEC15" href="#TOC1">CAPTURING</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(...) capture group
|
(...) capture group
|
||||||
|
@ -507,20 +355,20 @@ In non-UTF modes, names may contain underscores and ASCII letters and digits;
|
||||||
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
|
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
|
||||||
both cases, a name must not start with a digit.
|
both cases, a name must not start with a digit.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC15" href="#TOC1">ATOMIC GROUPS</a><br>
|
<br><a name="SEC16" href="#TOC1">ATOMIC GROUPS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(?>...) atomic non-capture group
|
(?>...) atomic non-capture group
|
||||||
(*atomic:...) atomic non-capture group
|
(*atomic:...) atomic non-capture group
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC16" href="#TOC1">COMMENT</a><br>
|
<br><a name="SEC17" href="#TOC1">COMMENT</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(?#....) comment (not nestable)
|
(?#....) comment (not nestable)
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC17" href="#TOC1">OPTION SETTING</a><br>
|
<br><a name="SEC18" href="#TOC1">OPTION SETTING</a><br>
|
||||||
<P>
|
<P>
|
||||||
Changes of these options within a group are automatically cancelled at the end
|
Changes of these options within a group are automatically cancelled at the end
|
||||||
of the group.
|
of the group.
|
||||||
|
@ -565,7 +413,7 @@ not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
|
||||||
application can lock out the use of (*UTF) and (*UCP) by setting the
|
application can lock out the use of (*UTF) and (*UCP) by setting the
|
||||||
PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
|
PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC18" href="#TOC1">NEWLINE CONVENTION</a><br>
|
<br><a name="SEC19" href="#TOC1">NEWLINE CONVENTION</a><br>
|
||||||
<P>
|
<P>
|
||||||
These are recognized only at the very start of the pattern or after option
|
These are recognized only at the very start of the pattern or after option
|
||||||
settings with a similar syntax.
|
settings with a similar syntax.
|
||||||
|
@ -578,7 +426,7 @@ settings with a similar syntax.
|
||||||
(*NUL) the NUL character (binary zero)
|
(*NUL) the NUL character (binary zero)
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC19" href="#TOC1">WHAT \R MATCHES</a><br>
|
<br><a name="SEC20" href="#TOC1">WHAT \R MATCHES</a><br>
|
||||||
<P>
|
<P>
|
||||||
These are recognized only at the very start of the pattern or after option
|
These are recognized only at the very start of the pattern or after option
|
||||||
setting with a similar syntax.
|
setting with a similar syntax.
|
||||||
|
@ -587,7 +435,7 @@ setting with a similar syntax.
|
||||||
(*BSR_UNICODE) any Unicode newline sequence
|
(*BSR_UNICODE) any Unicode newline sequence
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC20" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
|
<br><a name="SEC21" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(?=...) )
|
(?=...) )
|
||||||
|
@ -608,7 +456,7 @@ setting with a similar syntax.
|
||||||
</pre>
|
</pre>
|
||||||
Each top-level branch of a lookbehind must be of a fixed length.
|
Each top-level branch of a lookbehind must be of a fixed length.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
|
<br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
These assertions are specific to PCRE2 and are not Perl-compatible.
|
These assertions are specific to PCRE2 and are not Perl-compatible.
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -621,7 +469,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
|
||||||
(*non_atomic_positive_lookbehind:...) )
|
(*non_atomic_positive_lookbehind:...) )
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC22" href="#TOC1">SCRIPT RUNS</a><br>
|
<br><a name="SEC23" href="#TOC1">SCRIPT RUNS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(*script_run:...) ) script run, can be backtracked into
|
(*script_run:...) ) script run, can be backtracked into
|
||||||
|
@ -631,7 +479,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
|
||||||
(*asr:...) )
|
(*asr:...) )
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC23" href="#TOC1">BACKREFERENCES</a><br>
|
<br><a name="SEC24" href="#TOC1">BACKREFERENCES</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
\n reference by number (can be ambiguous)
|
\n reference by number (can be ambiguous)
|
||||||
|
@ -648,7 +496,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
|
||||||
(?P=name) reference by name (Python)
|
(?P=name) reference by name (Python)
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC24" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
|
<br><a name="SEC25" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(?R) recurse whole pattern
|
(?R) recurse whole pattern
|
||||||
|
@ -667,7 +515,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
|
||||||
\g'-n' call subroutine by relative number (PCRE2 extension)
|
\g'-n' call subroutine by relative number (PCRE2 extension)
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC25" href="#TOC1">CONDITIONAL PATTERNS</a><br>
|
<br><a name="SEC26" href="#TOC1">CONDITIONAL PATTERNS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(?(condition)yes-pattern)
|
(?(condition)yes-pattern)
|
||||||
|
@ -690,7 +538,7 @@ Note the ambiguity of (?(R) and (?(Rn) which might be named reference
|
||||||
conditions or recursion tests. Such a condition is interpreted as a reference
|
conditions or recursion tests. Such a condition is interpreted as a reference
|
||||||
condition if the relevant named group exists.
|
condition if the relevant named group exists.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC26" href="#TOC1">BACKTRACKING CONTROL</a><br>
|
<br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br>
|
||||||
<P>
|
<P>
|
||||||
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
|
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
|
||||||
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
|
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
|
||||||
|
@ -717,7 +565,7 @@ pattern is not anchored.
|
||||||
The effect of one of these verbs in a group called as a subroutine is confined
|
The effect of one of these verbs in a group called as a subroutine is confined
|
||||||
to the subroutine call.
|
to the subroutine call.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC27" href="#TOC1">CALLOUTS</a><br>
|
<br><a name="SEC28" href="#TOC1">CALLOUTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
(?C) callout (assumed number 0)
|
(?C) callout (assumed number 0)
|
||||||
|
@ -728,12 +576,12 @@ The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
|
||||||
start and the end), and the starting delimiter { matched with the ending
|
start and the end), and the starting delimiter { matched with the ending
|
||||||
delimiter }. To encode the ending delimiter within the string, double it.
|
delimiter }. To encode the ending delimiter within the string, double it.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
|
<br><a name="SEC29" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
<b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
||||||
<b>pcre2matching</b>(3), <b>pcre2</b>(3).
|
<b>pcre2matching</b>(3), <b>pcre2</b>(3).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC29" href="#TOC1">AUTHOR</a><br>
|
<br><a name="SEC30" href="#TOC1">AUTHOR</a><br>
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
|
@ -742,11 +590,11 @@ Retired from University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 December 2021
|
Last updated: 12 January 2022
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2021 University of Cambridge.
|
Copyright © 1997-2022 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -253,7 +253,19 @@ available, and the use of JIT for matching is verified.
|
||||||
<b>-LM</b>
|
<b>-LM</b>
|
||||||
List modifiers: write a list of available pattern and subject modifiers to the
|
List modifiers: write a list of available pattern and subject modifiers to the
|
||||||
standard output, then exit with zero exit code. All other options are ignored.
|
standard output, then exit with zero exit code. All other options are ignored.
|
||||||
If both -C and -LM are present, whichever is first is recognized.
|
If both -C and any -Lx options are present, whichever is first is recognized.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>-LP</b>
|
||||||
|
List properties: write a list of recognized Unicode properties to the standard
|
||||||
|
output, then exit with zero exit code. All other options are ignored. If both
|
||||||
|
-C and any -Lx options are present, whichever is first is recognized.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>-LS</b>
|
||||||
|
List scripts: write a list of recognized Unicode script names to the standard
|
||||||
|
output, then exit with zero exit code. All other options are ignored. If both
|
||||||
|
-C and any -Lx options are present, whichever is first is recognized.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-pattern</b> <i>modifier-list</i>
|
<b>-pattern</b> <i>modifier-list</i>
|
||||||
|
@ -2129,9 +2141,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 November 2021
|
Last updated: 12 January 2022
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2021 University of Cambridge.
|
Copyright © 1997-2022 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
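The -LP and -LS options documented in the hunk above can also be driven from a script. The following is a minimal sketch, not part of the commit: the helper name `list_pcre2_names` is hypothetical, it assumes a `pcre2test` binary built from a release that includes these options is on PATH, and it makes no assumption about the layout of the output beyond it being text.

```python
import shutil
import subprocess


def list_pcre2_names(flag):
    """Run `pcre2test -LS` or `pcre2test -LP` and return the raw output text.

    Returns None when pcre2test is not on PATH or exits with a non-zero
    code (for example, a build that predates these options).
    """
    if flag not in ("-LS", "-LP"):
        raise ValueError("flag must be -LS or -LP")
    exe = shutil.which("pcre2test")
    if exe is None:
        return None
    # stdin is closed so that a build which ignores the flag cannot sit
    # waiting for interactive input.
    proc = subprocess.run(
        [exe, flag],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    return proc.stdout


if __name__ == "__main__":
    for flag in ("-LS", "-LP"):
        text = list_pcre2_names(flag)
        print(flag, "unavailable" if text is None else f"{len(text)} characters of output")
```

Returning the raw text keeps the sketch independent of the exact listing format, which the documentation in this commit does not specify.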
172
doc/pcre2.txt
|
@ -6889,8 +6889,16 @@ BACKSLASH
|
||||||
ters whose code points are less than U+0100 and U+10000, respectively.
|
ters whose code points are less than U+0100 and U+10000, respectively.
|
||||||
In 32-bit non-UTF mode, code points greater than 0x10ffff (the Unicode
|
In 32-bit non-UTF mode, code points greater than 0x10ffff (the Unicode
|
||||||
limit) may be encountered. These are all treated as being in the Un-
|
limit) may be encountered. These are all treated as being in the Un-
|
||||||
known script and with an unassigned type. The extra escape sequences
|
known script and with an unassigned type.
|
||||||
are:
|
|
||||||
|
Matching characters by Unicode property is not fast, because PCRE2 has
|
||||||
|
to do a multistage table lookup in order to find a character's prop-
|
||||||
|
erty. That is why the traditional escape sequences such as \d and \w do
|
||||||
|
not use Unicode properties in PCRE2 by default, though you can make
|
||||||
|
them do so by setting the PCRE2_UCP option or by starting the pattern
|
||||||
|
with (*UCP).
|
||||||
|
|
||||||
|
The extra escape sequences that provide property support are:
|
||||||
|
|
||||||
\p{xx} a character with the xx property
|
\p{xx} a character with the xx property
|
||||||
\P{xx} a character without the xx property
|
\P{xx} a character without the xx property
|
||||||
|
@ -6900,14 +6908,17 @@ BACKSLASH
|
||||||
in accordance with Unicode's "loose matching" rules, spaces, hyphens,
|
in accordance with Unicode's "loose matching" rules, spaces, hyphens,
|
||||||
and underscores are ignored. There is support for Unicode script names,
|
and underscores are ignored. There is support for Unicode script names,
|
||||||
Unicode general category properties, "Any", which matches any character
|
Unicode general category properties, "Any", which matches any character
|
||||||
(including newline), Bidi_Control, Bidi_Class, and some special PCRE2
|
(including newline), Bidi_Class, a number of binary (yes/no) proper-
|
||||||
properties (described below). Other Perl properties such as "InMusi-
|
ties, and some special PCRE2 properties (described below). Certain
|
||||||
calSymbols" are not supported by PCRE2. Note that \P{Any} does not
|
other Perl properties such as "InMusicalSymbols" are not supported by
|
||||||
match any characters, so always causes a match failure.
|
PCRE2. Note that \P{Any} does not match any characters, so always
|
||||||
|
causes a match failure.
|
||||||
|
|
||||||
|
Script properties for \p and \P
|
||||||
|
|
||||||
There are three different syntax forms for matching a script. Each Uni-
|
There are three different syntax forms for matching a script. Each Uni-
|
||||||
code character has a basic script and, optionally, a list of other
|
code character has a basic script and, optionally, a list of other
|
||||||
scripts ("Script Extentions") with which it is commonly used. Using the
|
scripts ("Script Extensions") with which it is commonly used. Using the
|
||||||
Adlam script as an example, \p{sc:Adlam} matches characters whose basic
|
Adlam script as an example, \p{sc:Adlam} matches characters whose basic
|
||||||
script is Adlam, whereas \p{scx:Adlam} matches, in addition, characters
|
script is Adlam, whereas \p{scx:Adlam} matches, in addition, characters
|
||||||
that have Adlam in their extensions list. The full names "script" and
|
that have Adlam in their extensions list. The full names "script" and
|
||||||
|
@ -6920,51 +6931,13 @@ BACKSLASH
|
||||||
Unassigned characters (and in non-UTF 32-bit mode, characters with code
|
Unassigned characters (and in non-UTF 32-bit mode, characters with code
|
||||||
points greater than 0x10FFFF) are assigned the "Unknown" script. Others
|
points greater than 0x10FFFF) are assigned the "Unknown" script. Others
|
||||||
that are not part of an identified script are lumped together as "Com-
|
that are not part of an identified script are lumped together as "Com-
|
||||||
mon". The current list of script names and their 4-letter abbreviations
|
mon". The current list of recognized script names and their 4-character
|
||||||
is:
|
abbreviations can be obtained by running this command:
|
||||||
|
|
||||||
Adlam (Adlm), Ahom (Ahom), Anatolian_Hieroglyphs (Hluw), Arabic (Arab),
|
pcre2test -LS
|
||||||
Armenian (Armn), Avestan (Avst), Balinese (Bali), Bamum (Bamu),
|
|
||||||
Bassa_Vah (Bass), Batak (Batk), Bengali (Beng), Bhaiksuki (Bhks), Bopo-
|
|
||||||
mofo (Bopo), Brahmi (Brah), Braille (Brai), Buginese (Bugi), Buhid
|
The general category property for \p and \P
|
||||||
(Buhd), Canadian_Aboriginal (Cans), Carian (Cari), Caucasian_Albanian
|
|
||||||
(Aghb), Chakma (Cakm), Cham (Cham), Cherokee (Cher), Chorasmian (Chrs),
|
|
||||||
Common (Zyyy), Coptic (Copt), Cuneiform (Xsux), Cypriot (Cprt),
|
|
||||||
Cypro_Minoan (Cpmn), Cyrillic (Cyrl), Deseret (Dsrt), Devanagari
|
|
||||||
(Deva), Dives_Akuru (Diak), Dogra (Dogr), Duployan (Dupl), Egyptian_Hi-
|
|
||||||
eroglyphs (Egyp), Elbasan (Elba), Elymaic (Elym), Ethiopic (Ethi),
|
|
||||||
Georgian (Geor), Glagolitic (Glag), Gothic (Goth), Grantha (Gran),
|
|
||||||
Greek (Grek), Gujarati (Gujr), Gunjala_Gondi (Gong), Gurmukhi (Guru),
|
|
||||||
Han (Hani), Hangul (Hang), Hanifi_Rohingya (Rohg), Hanunoo (Hano), Ha-
|
|
||||||
tran (Hatr), Hebrew (Hebr), Hiragana (Hira), Imperial_Aramaic (Armi),
|
|
||||||
Inherited (Zinh), Inscriptional_Pahlavi (Phli), Inscriptional_Parthian
|
|
||||||
(Prti), Javanese (Java), Kaithi (Kthi), Kannada (Knda), Katakana
|
|
||||||
(Kana), Kayah_Li (Kali), Kharoshthi (Khar), Khitan_Small_Script (Kits),
|
|
||||||
Khmer (Khmr), Khojki (Khoj), Khudawadi (Sind), Lao (Laoo), Latin
|
|
||||||
(Latn), Lepcha (Lepc), Limbu (Limb), Linear_A (Lina), Linear_B (Linb),
|
|
||||||
Lisu (Lisu), Lycian (Lyci), Lydian (Lydi), Mahajani (Majh), Makasar
|
|
||||||
(Maka), Malayalam (Mlym), Mandaic (Mand), Manichaean (Mani), Marchen
|
|
||||||
(Marc), Masaram_Gondi (Gonm), Medefaidrin (Medf), Meetei_Mayek (Mtei),
|
|
||||||
Mende_Kikakui (Mend), Meroitic_Cursive (Merc), Meroitic_Hieroglyphs
|
|
||||||
(Mero), Miao (Miao), Modi (Modi), Mongolian (Mong), Mro (Mroo), Multani
|
|
||||||
(Mult), Myanmar (Mymr), Nabataean (Nbar), Nandinagari (Nand),
|
|
||||||
New_Tai_Lue (Talu), Newa (Newa), Nko (Nkoo), Nushu (Nshu), Nyiak-
|
|
||||||
eng_Puachue_Hmong (Hmnp), Ogham (Ogam), Ol_Chiki (Olck), Old_Hungarian
|
|
||||||
(Hung), Old_Italic (Olck), Old_North_Arabian (Narb), Old_Permic (Perm),
|
|
||||||
Old_Persian (Orkh), Old_Sogdian (Sogo), Old_South_Arabian (Sarb),
|
|
||||||
Old_Turkic (Orkh), Old_Uyghur (Ougr), Oriya (Orya), Osage (Osge), Os-
|
|
||||||
manya (Osma), Pahawh_Hmong (Hmng), Palmyrene (Palm), Pau_Cin_Hau
|
|
||||||
(Pauc), Phags_Pa (Phag), Phoenician (Phnx), Psalter_Pahlavi (Phli), Re-
|
|
||||||
jang (Rjng), Runic (Runr), Samaritan (Samr), Saurashtra (Saur), Sharada
|
|
||||||
(Shrd), Shavian (Shaw), Siddham (Sidd), SignWriting (Sgnw), Sinhala
|
|
||||||
(Sinh), Sogdian (Sogd), Sora_Sompeng (Sora), Soyombo (Soyo), Sundanese
|
|
||||||
(Sund), Syloti_Nagri (Sylo), Syriac (Syrc), Tagalog (Tglg), Tagbanwa
|
|
||||||
(Tagb), Tai_Le (Tale), Tai_Tham (Lana), Tai_Viet (Tavt), Takri (Takr),
|
|
||||||
Tamil (Taml), Tangsa (Tngs), Tangut (Tang), Telugu (Telu), Thaana
|
|
||||||
(Thaa), Thai (Thai), Tibetan (Tibt), Tifinagh (Tfng), Tirhuta (Tirh),
|
|
||||||
Toto (Toto), Ugaritic (Ugar), Vai (Vaii), Vithkuqi (Vith), Wancho
|
|
||||||
(Wcho), Warang_Citi (Wara), Yezidi (Yezi), Yi (Yiii), Zanabazar_Square
|
|
||||||
(Zanb).
|
|
||||||
|
|
||||||
Each character has exactly one Unicode general category property, spec-
|
Each character has exactly one Unicode general category property, spec-
|
||||||
ified by a two-letter abbreviation. For compatibility with Perl, nega-
|
ified by a two-letter abbreviation. For compatibility with Perl, nega-
|
||||||
|
@ -7050,20 +7023,18 @@ BACKSLASH
|
||||||
For example, \p{Lu} always matches only upper case letters. This is
|
For example, \p{Lu} always matches only upper case letters. This is
|
||||||
different from the behaviour of current versions of Perl.
|
different from the behaviour of current versions of Perl.
|
||||||
|
|
||||||
Matching characters by Unicode property is not fast, because PCRE2 has
|
Binary (yes/no) properties for \p and \P
|
||||||
to do a multistage table lookup in order to find a character's prop-
|
|
||||||
erty. That is why the traditional escape sequences such as \d and \w do
|
|
||||||
not use Unicode properties in PCRE2 by default, though you can make
|
|
||||||
them do so by setting the PCRE2_UCP option or by starting the pattern
|
|
||||||
with (*UCP).
|
|
||||||
|
|
||||||
Bi-directional properties for \p and \P
|
Unicode defines a number of binary properties, that is, properties
|
||||||
|
whose only values are true or false. You can obtain a list of those
|
||||||
|
that are recognized by \p and \P, along with their abbreviations, by
|
||||||
|
running this command:
|
||||||
|
|
||||||
Two properties relating to bi-directional text (each with a shorter
|
pcre2test -LP
|
||||||
synonym) are supported:
|
|
||||||
|
|
||||||
|
The Bidi_Class property for \p and \P
|
||||||
|
|
||||||
\p{Bidi_Control} matches a Bidi control character
|
|
||||||
\p{Bidi_C} matches a Bidi control character
|
|
||||||
\p{Bidi_Class:<class>} matches a character with the given class
|
\p{Bidi_Class:<class>} matches a character with the given class
|
||||||
\p{BC:<class>} matches a character with the given class
|
\p{BC:<class>} matches a character with the given class
|
||||||
|
|
||||||
|
@ -7093,9 +7064,8 @@ BACKSLASH
|
||||||
S segment separator
|
S segment separator
|
||||||
WS white space
|
WS white space
|
||||||
|
|
||||||
For Bidi_Class, an equals sign may be used instead of a colon. The
|
An equals sign may be used instead of a colon. The class names are
|
||||||
class names are case-insensitive; only the short names listed above are
|
case-insensitive; only the short names listed above are recognized.
|
||||||
recognized.
|
|
||||||
|
|
||||||
Extended grapheme clusters
|
Extended grapheme clusters
|
||||||
|
|
||||||
|
@ -9725,8 +9695,8 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 28 December 2021
|
Last updated: 12 January 2022
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
@ -10739,60 +10709,28 @@ PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P
|
||||||
acter set at release 5.18.
|
acter set at release 5.18.
|
||||||
|
|
||||||
|
|
||||||
|
BINARY PROPERTIES FOR \p AND \P
|
||||||
|
|
||||||
|
Unicode defines a number of binary properties, that is, properties
|
||||||
|
whose only values are true or false. You can obtain a list of those
|
||||||
|
that are recognized by \p and \P, along with their abbreviations, by
|
||||||
|
running this command:
|
||||||
|
|
||||||
|
pcre2test -LP
|
||||||
|
|
||||||
|
|
||||||
SCRIPT MATCHING WITH \p AND \P
|
SCRIPT MATCHING WITH \p AND \P
|
||||||
|
|
||||||
The following script names and their 4-letter abbreviations are recog-
|
Many script names and their 4-letter abbreviations are recognized in
|
||||||
nized in \p{sc:...} or \p{scx:...} items, or on their own with \p (and
|
\p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P
|
||||||
also \P of course):
|
of course). You can obtain a list of these scripts by running this com-
|
||||||
|
mand:
|
||||||
|
|
||||||
Adlam (Adlm), Ahom (Ahom), Anatolian_Hieroglyphs (Hluw), Arabic (Arab),
|
pcre2test -LS
|
||||||
Armenian (Armn), Avestan (Avst), Balinese (Bali), Bamum (Bamu),
|
|
||||||
Bassa_Vah (Bass), Batak (Batk), Bengali (Beng), Bhaiksuki (Bhks), Bopo-
|
|
||||||
mofo (Bopo), Brahmi (Brah), Braille (Brai), Buginese (Bugi), Buhid
|
|
||||||
(Buhd), Canadian_Aboriginal (Cans), Carian (Cari), Caucasian_Albanian
|
|
||||||
(Aghb), Chakma (Cakm), Cham (Cham), Cherokee (Cher), Chorasmian (Chrs),
|
|
||||||
Common (Zyyy), Coptic (Copt), Cuneiform (Xsux), Cypriot (Cprt),
|
|
||||||
Cypro_Minoan (Cpmn), Cyrillic (Cyrl), Deseret (Dsrt), Devanagari
|
|
||||||
(Deva), Dives_Akuru (Diak), Dogra (Dogr), Duployan (Dupl), Egyptian_Hi-
|
|
||||||
eroglyphs (Egyp), Elbasan (Elba), Elymaic (Elym), Ethiopic (Ethi),
|
|
||||||
Georgian (Geor), Glagolitic (Glag), Gothic (Goth), Grantha (Gran),
|
|
||||||
Greek (Grek), Gujarati (Gujr), Gunjala_Gondi (Gong), Gurmukhi (Guru),
|
|
||||||
Han (Hani), Hangul (Hang), Hanifi_Rohingya (Rohg), Hanunoo (Hano), Ha-
|
|
||||||
tran (Hatr), Hebrew (Hebr), Hiragana (Hira), Imperial_Aramaic (Armi),
|
|
||||||
Inherited (Zinh), Inscriptional_Pahlavi (Phli), Inscriptional_Parthian
|
|
||||||
(Prti), Javanese (Java), Kaithi (Kthi), Kannada (Knda), Katakana
|
|
||||||
(Kana), Kayah_Li (Kali), Kharoshthi (Khar), Khitan_Small_Script (Kits),
|
|
||||||
Khmer (Khmr), Khojki (Khoj), Khudawadi (Sind), Lao (Laoo), Latin
|
|
||||||
(Latn), Lepcha (Lepc), Limbu (Limb), Linear_A (Lina), Linear_B (Linb),
|
|
||||||
Lisu (Lisu), Lycian (Lyci), Lydian (Lydi), Mahajani (Majh), Makasar
|
|
||||||
(Maka), Malayalam (Mlym), Mandaic (Mand), Manichaean (Mani), Marchen
|
|
||||||
(Marc), Masaram_Gondi (Gonm), Medefaidrin (Medf), Meetei_Mayek (Mtei),
|
|
||||||
Mende_Kikakui (Mend), Meroitic_Cursive (Merc), Meroitic_Hieroglyphs
|
|
||||||
(Mero), Miao (Miao), Modi (Modi), Mongolian (Mong), Mro (Mroo), Multani
|
|
||||||
(Mult), Myanmar (Mymr), Nabataean (Nbar), Nandinagari (Nand),
|
|
||||||
New_Tai_Lue (Talu), Newa (Newa), Nko (Nkoo), Nushu (Nshu), Nyiak-
|
|
||||||
eng_Puachue_Hmong (Hmnp), Ogham (Ogam), Ol_Chiki (Olck), Old_Hungarian
|
|
||||||
(Hung), Old_Italic (Olck), Old_North_Arabian (Narb), Old_Permic (Perm),
|
|
||||||
Old_Persian (Orkh), Old_Sogdian (Sogo), Old_South_Arabian (Sarb),
|
|
||||||
Old_Turkic (Orkh), Old_Uyghur (Ougr), Oriya (Orya), Osage (Osge), Os-
|
|
||||||
manya (Osma), Pahawh_Hmong (Hmng), Palmyrene (Palm), Pau_Cin_Hau
|
|
||||||
(Pauc), Phags_Pa (Phag), Phoenician (Phnx), Psalter_Pahlavi (Phli), Re-
|
|
||||||
jang (Rjng), Runic (Runr), Samaritan (Samr), Saurashtra (Saur), Sharada
|
|
||||||
(Shrd), Shavian (Shaw), Siddham (Sidd), SignWriting (Sgnw), Sinhala
|
|
||||||
(Sinh), Sogdian (Sogd), Sora_Sompeng (Sora), Soyombo (Soyo), Sundanese
|
|
||||||
(Sund), Syloti_Nagri (Sylo), Syriac (Syrc), Tagalog (Tglg), Tagbanwa
|
|
||||||
(Tagb), Tai_Le (Tale), Tai_Tham (Lana), Tai_Viet (Tavt), Takri (Takr),
|
|
||||||
Tamil (Taml), Tangsa (Tngs), Tangut (Tang), Telugu (Telu), Thaana
|
|
||||||
(Thaa), Thai (Thai), Tibetan (Tibt), Tifinagh (Tfng), Tirhuta (Tirh),
|
|
||||||
Toto (Toto), Ugaritic (Ugar), Vai (Vaii), Vithkuqi (Vith), Wancho
|
|
||||||
(Wcho), Warang_Citi (Wara), Yezidi (Yezi), Yi (Yiii), Zanabazar_Square
|
|
||||||
(Zanb).
|
|
||||||
|
|
||||||
|
|
||||||
BIDI_PROPERTIES FOR \p AND \P
|
THE BIDI_CLASS PROPERTY FOR \p AND \P
|
||||||
|
|
||||||
\p{Bidi_Control} matches a Bidi control character
|
|
||||||
\p{Bidi_C} matches a Bidi control character
|
|
||||||
\p{Bidi_Class:<class>} matches a character with the given class
|
\p{Bidi_Class:<class>} matches a character with the given class
|
||||||
\p{BC:<class>} matches a character with the given class
|
\p{BC:<class>} matches a character with the given class
|
||||||
|
|
||||||
|
@ -11152,8 +11090,8 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 28 December 2021
|
Last updated: 12 January 2022
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
|
|
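The doc/pcre2.txt hunks above state that each character has exactly one general category, given by a two-letter abbreviation, and that Bidi_Class values are matched by their short names. A rough way to inspect both values for a character is sketched below. It uses Python's standard `unicodedata` module rather than PCRE2 itself, so it is only an illustration: the Unicode version bundled with Python may differ from the tables PCRE2 was built with, and the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch):
    """Return (general category, bidi class) for a single character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


if __name__ == "__main__":
    # An upper case letter, a lower case letter, a digit, a space,
    # and HEBREW LETTER ALEF (a right-to-left letter).
    for ch in ("A", "a", "5", " ", "\u05d0"):
        gc, bc = describe(ch)
        print(f"U+{ord(ch):04X} general category={gc} bidi class={bc}")
```

For example, `describe("A")` returns `("Lu", "L")`, which corresponds to the character being matched by \p{Lu} and \p{BC:L}, while `describe(" ")` returns `("Zs", "WS")`.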
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "28 December 2021" "PCRE2 10.40"
|
.TH PCRE2PATTERN 3 "12 January 2022" "PCRE2 10.40"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -772,8 +772,15 @@ can be used in any mode, though in 8-bit and 16-bit non-UTF modes these
|
||||||
sequences are of course limited to testing characters whose code points are
|
sequences are of course limited to testing characters whose code points are
|
||||||
less than U+0100 and U+10000, respectively. In 32-bit non-UTF mode, code points
|
less than U+0100 and U+10000, respectively. In 32-bit non-UTF mode, code points
|
||||||
greater than 0x10ffff (the Unicode limit) may be encountered. These are all
|
greater than 0x10ffff (the Unicode limit) may be encountered. These are all
|
||||||
treated as being in the Unknown script and with an unassigned type. The extra
|
treated as being in the Unknown script and with an unassigned type.
|
||||||
escape sequences are:
|
.P
|
||||||
|
Matching characters by Unicode property is not fast, because PCRE2 has to do a
|
||||||
|
multistage table lookup in order to find a character's property. That is why
|
||||||
|
the traditional escape sequences such as \ed and \ew do not use Unicode
|
||||||
|
properties in PCRE2 by default, though you can make them do so by setting the
|
||||||
|
PCRE2_UCP option or by starting the pattern with (*UCP).
|
||||||
|
.P
|
||||||
|
The extra escape sequences that provide property support are:
|
||||||
.sp
|
.sp
|
||||||
\ep{\fIxx\fP} a character with the \fIxx\fP property
|
\ep{\fIxx\fP} a character with the \fIxx\fP property
|
||||||
\eP{\fIxx\fP} a character without the \fIxx\fP property
|
\eP{\fIxx\fP} a character without the \fIxx\fP property
|
||||||
|
@ -783,19 +790,24 @@ The property names represented by \fIxx\fP above are not case-sensitive, and in
|
||||||
accordance with Unicode's "loose matching" rules, spaces, hyphens, and
|
accordance with Unicode's "loose matching" rules, spaces, hyphens, and
|
||||||
underscores are ignored. There is support for Unicode script names, Unicode
|
underscores are ignored. There is support for Unicode script names, Unicode
|
||||||
general category properties, "Any", which matches any character (including
|
general category properties, "Any", which matches any character (including
|
||||||
newline), Bidi_Control, Bidi_Class, and some special PCRE2 properties
|
newline), Bidi_Class, a number of binary (yes/no) properties, and some special
|
||||||
(described
|
PCRE2 properties (described
|
||||||
.\" HTML <a href="#extraprops">
|
.\" HTML <a href="#extraprops">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
below).
|
below).
|
||||||
.\"
|
.\"
|
||||||
Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
|
Certain other Perl properties such as "InMusicalSymbols" are not supported by
|
||||||
Note that \eP{Any} does not match any characters, so always causes a match
|
PCRE2. Note that \eP{Any} does not match any characters, so always causes a
|
||||||
failure.
|
match failure.
|
||||||
.P
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Script properties for \ep and \eP"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
There are three different syntax forms for matching a script. Each Unicode
|
There are three different syntax forms for matching a script. Each Unicode
|
||||||
character has a basic script and, optionally, a list of other scripts ("Script
|
character has a basic script and, optionally, a list of other scripts ("Script
|
||||||
Extentions") with which it is commonly used. Using the Adlam script as an
|
Extensions") with which it is commonly used. Using the Adlam script as an
|
||||||
example, \ep{sc:Adlam} matches characters whose basic script is Adlam, whereas
|
example, \ep{sc:Adlam} matches characters whose basic script is Adlam, whereas
|
||||||
\ep{scx:Adlam} matches, in addition, characters that have Adlam in their
|
\ep{scx:Adlam} matches, in addition, characters that have Adlam in their
|
||||||
extensions list. The full names "script" and "script extensions" for the
|
extensions list. The full names "script" and "script extensions" for the
|
||||||
|
@ -807,170 +819,17 @@ interpretation at release 5.26 and PCRE2 changed at release 10.40.
|
||||||
Unassigned characters (and in non-UTF 32-bit mode, characters with code points
|
Unassigned characters (and in non-UTF 32-bit mode, characters with code points
|
||||||
greater than 0x10FFFF) are assigned the "Unknown" script. Others that are not
|
greater than 0x10FFFF) are assigned the "Unknown" script. Others that are not
|
||||||
part of an identified script are lumped together as "Common". The current list
|
part of an identified script are lumped together as "Common". The current list
|
||||||
of script names and their 4-letter abbreviations is:
|
of recognized script names and their 4-character abbreviations can be obtained
|
||||||
.P
|
by running this command:
|
||||||
Adlam (Adlm),
|
.sp
|
||||||
Ahom (Ahom),
|
pcre2test -LS
|
||||||
Anatolian_Hieroglyphs (Hluw),
|
.sp
|
||||||
Arabic (Arab),
|
.
|
||||||
Armenian (Armn),
|
.
|
||||||
Avestan (Avst),
|
.
|
||||||
Balinese (Bali),
|
.SS "The general category property for \ep and \eP"
|
||||||
Bamum (Bamu),
|
.rs
|
||||||
Bassa_Vah (Bass),
|
.sp
|
||||||
Batak (Batk),
|
|
||||||
Bengali (Beng),
|
|
||||||
Bhaiksuki (Bhks),
|
|
||||||
Bopomofo (Bopo),
|
|
||||||
Brahmi (Brah),
|
|
||||||
Braille (Brai),
|
|
||||||
Buginese (Bugi),
|
|
||||||
Buhid (Buhd),
|
|
||||||
Canadian_Aboriginal (Cans),
|
|
||||||
Carian (Cari),
|
|
||||||
Caucasian_Albanian (Aghb),
|
|
||||||
Chakma (Cakm),
|
|
||||||
Cham (Cham),
|
|
||||||
Cherokee (Cher),
|
|
||||||
Chorasmian (Chrs),
|
|
||||||
Common (Zyyy),
|
|
||||||
Coptic (Copt),
|
|
||||||
Cuneiform (Xsux),
|
|
||||||
Cypriot (Cprt),
|
|
||||||
Cypro_Minoan (Cpmn),
|
|
||||||
Cyrillic (Cyrl),
|
|
||||||
Deseret (Dsrt),
|
|
||||||
Devanagari (Deva),
|
|
||||||
Dives_Akuru (Diak),
|
|
||||||
Dogra (Dogr),
|
|
||||||
Duployan (Dupl),
|
|
||||||
Egyptian_Hieroglyphs (Egyp),
|
|
||||||
Elbasan (Elba),
|
|
||||||
Elymaic (Elym),
|
|
||||||
Ethiopic (Ethi),
|
|
||||||
Georgian (Geor),
|
|
||||||
Glagolitic (Glag),
|
|
||||||
Gothic (Goth),
|
|
||||||
Grantha (Gran),
|
|
||||||
Greek (Grek),
|
|
||||||
Gujarati (Gujr),
|
|
||||||
Gunjala_Gondi (Gong),
|
|
||||||
Gurmukhi (Guru),
|
|
||||||
Han (Hani),
|
|
||||||
Hangul (Hang),
|
|
||||||
Hanifi_Rohingya (Rohg),
|
|
||||||
Hanunoo (Hano),
|
|
||||||
Hatran (Hatr),
|
|
||||||
Hebrew (Hebr),
|
|
||||||
Hiragana (Hira),
|
|
||||||
Imperial_Aramaic (Armi),
|
|
||||||
Inherited (Zinh),
|
|
||||||
Inscriptional_Pahlavi (Phli),
|
|
||||||
Inscriptional_Parthian (Prti),
|
|
||||||
Javanese (Java),
|
|
||||||
Kaithi (Kthi),
|
|
||||||
Kannada (Knda),
|
|
||||||
Katakana (Kana),
|
|
||||||
Kayah_Li (Kali),
|
|
||||||
Kharoshthi (Khar),
|
|
||||||
Khitan_Small_Script (Kits),
|
|
||||||
Khmer (Khmr),
|
|
||||||
Khojki (Khoj),
|
|
||||||
Khudawadi (Sind),
|
|
||||||
Lao (Laoo),
|
|
||||||
Latin (Latn),
|
|
||||||
Lepcha (Lepc),
|
|
||||||
Limbu (Limb),
|
|
||||||
Linear_A (Lina),
|
|
||||||
Linear_B (Linb),
|
|
||||||
Lisu (Lisu),
|
|
||||||
Lycian (Lyci),
|
|
||||||
Lydian (Lydi),
|
|
||||||
Mahajani (Majh),
|
|
||||||
Makasar (Maka),
|
|
||||||
Malayalam (Mlym),
|
|
||||||
Mandaic (Mand),
|
|
||||||
Manichaean (Mani),
|
|
||||||
Marchen (Marc),
|
|
||||||
Masaram_Gondi (Gonm),
|
|
||||||
Medefaidrin (Medf),
|
|
||||||
Meetei_Mayek (Mtei),
|
|
||||||
Mende_Kikakui (Mend),
|
|
||||||
Meroitic_Cursive (Merc),
|
|
||||||
Meroitic_Hieroglyphs (Mero),
|
|
||||||
Miao (Miao),
|
|
||||||
Modi (Modi),
|
|
||||||
Mongolian (Mong),
|
|
||||||
Mro (Mroo),
|
|
||||||
Multani (Mult),
|
|
||||||
Myanmar (Mymr),
|
|
||||||
Nabataean (Nbar),
|
|
||||||
Nandinagari (Nand),
|
|
||||||
New_Tai_Lue (Talu),
|
|
||||||
Newa (Newa),
|
|
||||||
Nko (Nkoo),
|
|
||||||
Nushu (Nshu),
|
|
||||||
Nyiakeng_Puachue_Hmong (Hmnp),
|
|
||||||
Ogham (Ogam),
|
|
||||||
Ol_Chiki (Olck),
|
|
||||||
Old_Hungarian (Hung),
|
|
||||||
Old_Italic (Olck),
|
|
||||||
Old_North_Arabian (Narb),
|
|
||||||
Old_Permic (Perm),
|
|
||||||
Old_Persian (Orkh),
|
|
||||||
Old_Sogdian (Sogo),
|
|
||||||
Old_South_Arabian (Sarb),
|
|
||||||
Old_Turkic (Orkh),
|
|
||||||
Old_Uyghur (Ougr),
|
|
||||||
Oriya (Orya),
|
|
||||||
Osage (Osge),
|
|
||||||
Osmanya (Osma),
|
|
||||||
Pahawh_Hmong (Hmng),
|
|
||||||
Palmyrene (Palm),
|
|
||||||
Pau_Cin_Hau (Pauc),
|
|
||||||
Phags_Pa (Phag),
|
|
||||||
Phoenician (Phnx),
|
|
||||||
Psalter_Pahlavi (Phli),
|
|
||||||
Rejang (Rjng),
|
|
||||||
Runic (Runr),
|
|
||||||
Samaritan (Samr),
|
|
||||||
Saurashtra (Saur),
|
|
||||||
Sharada (Shrd),
|
|
||||||
Shavian (Shaw),
|
|
||||||
Siddham (Sidd),
|
|
||||||
SignWriting (Sgnw),
|
|
||||||
Sinhala (Sinh),
|
|
||||||
Sogdian (Sogd),
|
|
||||||
Sora_Sompeng (Sora),
|
|
||||||
Soyombo (Soyo),
|
|
||||||
Sundanese (Sund),
|
|
||||||
Syloti_Nagri (Sylo),
|
|
||||||
Syriac (Syrc),
|
|
||||||
Tagalog (Tglg),
|
|
||||||
Tagbanwa (Tagb),
|
|
||||||
Tai_Le (Tale),
|
|
||||||
Tai_Tham (Lana),
|
|
||||||
Tai_Viet (Tavt),
|
|
||||||
Takri (Takr),
|
|
||||||
Tamil (Taml),
|
|
||||||
Tangsa (Tngs),
|
|
||||||
Tangut (Tang),
|
|
||||||
Telugu (Telu),
|
|
||||||
Thaana (Thaa),
|
|
||||||
Thai (Thai),
|
|
||||||
Tibetan (Tibt),
|
|
||||||
Tifinagh (Tfng),
|
|
||||||
Tirhuta (Tirh),
|
|
||||||
Toto (Toto),
|
|
||||||
Ugaritic (Ugar),
|
|
||||||
Vai (Vaii),
|
|
||||||
Vithkuqi (Vith),
|
|
||||||
Wancho (Wcho),
|
|
||||||
Warang_Citi (Wara),
|
|
||||||
Yezidi (Yezi),
|
|
||||||
Yi (Yiii),
|
|
||||||
Zanabazar_Square (Zanb).
|
|
||||||
.P
|
|
||||||
Each character has exactly one Unicode general category property, specified by
|
Each character has exactly one Unicode general category property, specified by
|
||||||
a two-letter abbreviation. For compatibility with Perl, negation can be
|
a two-letter abbreviation. For compatibility with Perl, negation can be
|
||||||
specified by including a circumflex between the opening brace and the property
|
specified by including a circumflex between the opening brace and the property
|
||||||
|
@ -1056,22 +915,22 @@ Unicode table.
|
||||||
Specifying caseless matching does not affect these escape sequences. For
|
Specifying caseless matching does not affect these escape sequences. For
|
||||||
example, \ep{Lu} always matches only upper case letters. This is different from
|
example, \ep{Lu} always matches only upper case letters. This is different from
|
||||||
the behaviour of current versions of Perl.
|
the behaviour of current versions of Perl.
|
||||||
.P
|
|
||||||
Matching characters by Unicode property is not fast, because PCRE2 has to do a
|
|
||||||
multistage table lookup in order to find a character's property. That is why
|
|
||||||
the traditional escape sequences such as \ed and \ew do not use Unicode
|
|
||||||
properties in PCRE2 by default, though you can make them do so by setting the
|
|
||||||
PCRE2_UCP option or by starting the pattern with (*UCP).
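
The "multistage table lookup" mentioned above can be pictured with a small two-stage table. The following is only a sketch under stated assumptions: the block size, the function names, and the use of Python's unicodedata module as the property source are all choices made for this illustration, and it does not reproduce PCRE2's actual tables or code.

```python
import unicodedata

BLOCK_SIZE = 128  # assumed block size for this toy example


def build_tables(prop_of, limit):
    """Build a two-stage table for code points 0 .. limit-1.

    stage2 holds each distinct block of property values once;
    stage1 maps a block number to the index of its block in stage2.
    limit is assumed to be a multiple of BLOCK_SIZE.
    """
    stage1, stage2, seen = [], [], {}
    for start in range(0, limit, BLOCK_SIZE):
        block = tuple(prop_of(cp) for cp in range(start, start + BLOCK_SIZE))
        if block not in seen:
            seen[block] = len(stage2)
            stage2.append(block)
        stage1.append(seen[block])
    return stage1, stage2


def lookup(stage1, stage2, cp):
    """Two dependent table reads: block index first, then the value."""
    return stage2[stage1[cp // BLOCK_SIZE]][cp % BLOCK_SIZE]


# General category for the Basic Multilingual Plane only, to keep it quick.
stage1, stage2 = build_tables(lambda cp: unicodedata.category(chr(cp)), 0x10000)
```

Sharing identical blocks keeps the tables small, but every property test then costs two dependent reads, whereas an ASCII-only test for \ed or \ew can be a single comparison or one small table read.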
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Bi-directional properties for \ep and \eP"
|
.SS "Binary (yes/no) properties for \ep and \eP"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
Two properties relating to bi-directional text (each with a shorter synonym)
|
Unicode defines a number of binary properties, that is, properties whose only
|
||||||
are supported:
|
values are true or false. You can obtain a list of those that are recognized by
|
||||||
|
\ep and \eP, along with their abbreviations, by running this command:
|
||||||
|
.sp
|
||||||
|
pcre2test -LP
|
||||||
|
.sp
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "The Bidi_Class property for \ep and \eP"
|
||||||
|
.rs
|
||||||
.sp
|
.sp
|
||||||
\ep{Bidi_Control} matches a Bidi control character
|
|
||||||
\ep{Bidi_C} matches a Bidi control character
|
|
||||||
\ep{Bidi_Class:<class>} matches a character with the given class
|
\ep{Bidi_Class:<class>} matches a character with the given class
|
||||||
\ep{BC:<class>} matches a character with the given class
|
\ep{BC:<class>} matches a character with the given class
|
||||||
.sp
|
.sp
|
||||||
|
@ -1101,8 +960,8 @@ The recognized classes are:
|
||||||
S segment separator
|
S segment separator
|
||||||
WS white space
|
WS white space
|
||||||
.sp
|
.sp
|
||||||
For Bidi_Class, an equals sign may be used instead of a colon. The class names
|
An equals sign may be used instead of a colon. The class names are
|
||||||
are case-insensitive; only the short names listed above are recognized.
|
case-insensitive; only the short names listed above are recognized.
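
The rules just stated for Bidi_Class items (colon or equals sign, case-insensitive names, short class names only) can be sketched outside PCRE2. This is a hedged illustration, not PCRE2's parser: parse_bidi_item and has_bidi_class are hypothetical helper names, the class lookup uses Python's unicodedata module (whose Unicode version may differ from the one PCRE2 was built with), and the sketch does not strip spaces, hyphens, or underscores from the item.

```python
import re
import unicodedata

# Accepts "Bidi_Class" or "BC", then ":" or "=", then a short class name.
_ITEM = re.compile(r"(?:Bidi_Class|BC)[:=]([A-Za-z]+)", re.IGNORECASE)


def parse_bidi_item(item):
    """Return the upper-cased short class name from a Bidi_Class item."""
    m = _ITEM.fullmatch(item)
    if m is None:
        raise ValueError(f"not a Bidi_Class item: {item!r}")
    return m.group(1).upper()


def has_bidi_class(ch, item):
    """True if the single character ch has the class named in item."""
    return unicodedata.bidirectional(ch) == parse_bidi_item(item)
```

For example, has_bidi_class("\u05d0", "bc=r") tests whether HEBREW LETTER ALEF has the right-to-left class, which is the kind of question a pattern item such as \ep{BC:R} asks of each subject character.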
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS Extended grapheme clusters
|
.SS Extended grapheme clusters
|
||||||
|
@ -3955,6 +3814,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 28 December 2021
|
Last updated: 12 January 2022
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "28 December 2021" "PCRE2 10.40"
|
.TH PCRE2SYNTAX 3 "12 January 2022" "PCRE2 10.40"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -172,181 +172,31 @@ Perl and POSIX space are now the same. Perl added VT to its space character set
|
||||||
at release 5.18.
|
at release 5.18.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SH "BINARY PROPERTIES FOR \ep AND \eP"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
Unicode defines a number of binary properties, that is, properties whose only
|
||||||
|
values are true or false. You can obtain a list of those that are recognized by
|
||||||
|
\ep and \eP, along with their abbreviations, by running this command:
|
||||||
|
.sp
|
||||||
|
pcre2test -LP
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
.SH "SCRIPT MATCHING WITH \ep AND \eP"
|
.SH "SCRIPT MATCHING WITH \ep AND \eP"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The following script names and their 4-letter abbreviations are recognized in
|
Many script names and their 4-letter abbreviations are recognized in
|
||||||
\ep{sc:...} or \ep{scx:...} items, or on their own with \ep (and also \eP of
|
\ep{sc:...} or \ep{scx:...} items, or on their own with \ep (and also \eP of
|
||||||
course):
|
course). You can obtain a list of these scripts by running this command:
|
||||||
.P
|
.sp
|
||||||
Adlam (Adlm),
|
pcre2test -LS
|
||||||
Ahom (Ahom),
|
|
||||||
Anatolian_Hieroglyphs (Hluw),
|
|
||||||
Arabic (Arab),
|
|
||||||
Armenian (Armn),
|
|
||||||
Avestan (Avst),
|
|
||||||
Balinese (Bali),
|
|
||||||
Bamum (Bamu),
|
|
||||||
Bassa_Vah (Bass),
|
|
||||||
Batak (Batk),
|
|
||||||
Bengali (Beng),
|
|
||||||
Bhaiksuki (Bhks),
|
|
||||||
Bopomofo (Bopo),
|
|
||||||
Brahmi (Brah),
|
|
||||||
Braille (Brai),
|
|
||||||
Buginese (Bugi),
|
|
||||||
Buhid (Buhd),
|
|
||||||
Canadian_Aboriginal (Cans),
|
|
||||||
Carian (Cari),
|
|
||||||
Caucasian_Albanian (Aghb),
|
|
||||||
Chakma (Cakm),
|
|
||||||
Cham (Cham),
|
|
||||||
Cherokee (Cher),
|
|
||||||
Chorasmian (Chrs),
|
|
||||||
Common (Zyyy),
|
|
||||||
Coptic (Copt),
|
|
||||||
Cuneiform (Xsux),
|
|
||||||
Cypriot (Cprt),
|
|
||||||
Cypro_Minoan (Cpmn),
|
|
||||||
Cyrillic (Cyrl),
|
|
||||||
Deseret (Dsrt),
|
|
||||||
Devanagari (Deva),
|
|
||||||
Dives_Akuru (Diak),
|
|
||||||
Dogra (Dogr),
|
|
||||||
Duployan (Dupl),
|
|
||||||
Egyptian_Hieroglyphs (Egyp),
|
|
||||||
Elbasan (Elba),
|
|
||||||
Elymaic (Elym),
|
|
||||||
Ethiopic (Ethi),
|
|
||||||
Georgian (Geor),
|
|
||||||
Glagolitic (Glag),
|
|
||||||
Gothic (Goth),
|
|
||||||
Grantha (Gran),
|
|
||||||
Greek (Grek),
|
|
||||||
Gujarati (Gujr),
|
|
||||||
Gunjala_Gondi (Gong),
|
|
||||||
Gurmukhi (Guru),
|
|
||||||
Han (Hani),
|
|
||||||
Hangul (Hang),
|
|
||||||
Hanifi_Rohingya (Rohg),
|
|
||||||
Hanunoo (Hano),
|
|
||||||
Hatran (Hatr),
|
|
||||||
Hebrew (Hebr),
|
|
||||||
Hiragana (Hira),
|
|
||||||
Imperial_Aramaic (Armi),
|
|
||||||
Inherited (Zinh),
|
|
||||||
Inscriptional_Pahlavi (Phli),
|
|
||||||
Inscriptional_Parthian (Prti),
|
|
||||||
Javanese (Java),
|
|
||||||
Kaithi (Kthi),
|
|
||||||
Kannada (Knda),
|
|
||||||
Katakana (Kana),
|
|
||||||
Kayah_Li (Kali),
|
|
||||||
Kharoshthi (Khar),
|
|
||||||
Khitan_Small_Script (Kits),
|
|
||||||
Khmer (Khmr),
|
|
||||||
Khojki (Khoj),
|
|
||||||
Khudawadi (Sind),
|
|
||||||
Lao (Laoo),
|
|
||||||
Latin (Latn),
|
|
||||||
Lepcha (Lepc),
|
|
||||||
Limbu (Limb),
|
|
||||||
Linear_A (Lina),
|
|
||||||
Linear_B (Linb),
|
|
||||||
Lisu (Lisu),
|
|
||||||
Lycian (Lyci),
|
|
||||||
Lydian (Lydi),
|
|
||||||
Mahajani (Mahj),
|
|
||||||
Makasar (Maka),
|
|
||||||
Malayalam (Mlym),
|
|
||||||
Mandaic (Mand),
|
|
||||||
Manichaean (Mani),
|
|
||||||
Marchen (Marc),
|
|
||||||
Masaram_Gondi (Gonm),
|
|
||||||
Medefaidrin (Medf),
|
|
||||||
Meetei_Mayek (Mtei),
|
|
||||||
Mende_Kikakui (Mend),
|
|
||||||
Meroitic_Cursive (Merc),
|
|
||||||
Meroitic_Hieroglyphs (Mero),
|
|
||||||
Miao (Plrd),
|
|
||||||
Modi (Modi),
|
|
||||||
Mongolian (Mong),
|
|
||||||
Mro (Mroo),
|
|
||||||
Multani (Mult),
|
|
||||||
Myanmar (Mymr),
|
|
||||||
Nabataean (Nbat),
|
|
||||||
Nandinagari (Nand),
|
|
||||||
New_Tai_Lue (Talu),
|
|
||||||
Newa (Newa),
|
|
||||||
Nko (Nkoo),
|
|
||||||
Nushu (Nshu),
|
|
||||||
Nyiakeng_Puachue_Hmong (Hmnp),
|
|
||||||
Ogham (Ogam),
|
|
||||||
Ol_Chiki (Olck),
|
|
||||||
Old_Hungarian (Hung),
|
|
||||||
Old_Italic (Ital),
|
|
||||||
Old_North_Arabian (Narb),
|
|
||||||
Old_Permic (Perm),
|
|
||||||
Old_Persian (Xpeo),
|
|
||||||
Old_Sogdian (Sogo),
|
|
||||||
Old_South_Arabian (Sarb),
|
|
||||||
Old_Turkic (Orkh),
|
|
||||||
Old_Uyghur (Ougr),
|
|
||||||
Oriya (Orya),
|
|
||||||
Osage (Osge),
|
|
||||||
Osmanya (Osma),
|
|
||||||
Pahawh_Hmong (Hmng),
|
|
||||||
Palmyrene (Palm),
|
|
||||||
Pau_Cin_Hau (Pauc),
|
|
||||||
Phags_Pa (Phag),
|
|
||||||
Phoenician (Phnx),
|
|
||||||
Psalter_Pahlavi (Phlp),
|
|
||||||
Rejang (Rjng),
|
|
||||||
Runic (Runr),
|
|
||||||
Samaritan (Samr),
|
|
||||||
Saurashtra (Saur),
|
|
||||||
Sharada (Shrd),
|
|
||||||
Shavian (Shaw),
|
|
||||||
Siddham (Sidd),
|
|
||||||
SignWriting (Sgnw),
|
|
||||||
Sinhala (Sinh),
|
|
||||||
Sogdian (Sogd),
|
|
||||||
Sora_Sompeng (Sora),
|
|
||||||
Soyombo (Soyo),
|
|
||||||
Sundanese (Sund),
|
|
||||||
Syloti_Nagri (Sylo),
|
|
||||||
Syriac (Syrc),
|
|
||||||
Tagalog (Tglg),
|
|
||||||
Tagbanwa (Tagb),
|
|
||||||
Tai_Le (Tale),
|
|
||||||
Tai_Tham (Lana),
|
|
||||||
Tai_Viet (Tavt),
|
|
||||||
Takri (Takr),
|
|
||||||
Tamil (Taml),
|
|
||||||
Tangsa (Tngs),
|
|
||||||
Tangut (Tang),
|
|
||||||
Telugu (Telu),
|
|
||||||
Thaana (Thaa),
|
|
||||||
Thai (Thai),
|
|
||||||
Tibetan (Tibt),
|
|
||||||
Tifinagh (Tfng),
|
|
||||||
Tirhuta (Tirh),
|
|
||||||
Toto (Toto),
|
|
||||||
Ugaritic (Ugar),
|
|
||||||
Vai (Vaii),
|
|
||||||
Vithkuqi (Vith),
|
|
||||||
Wancho (Wcho),
|
|
||||||
Warang_Citi (Wara),
|
|
||||||
Yezidi (Yezi),
|
|
||||||
Yi (Yiii),
|
|
||||||
Zanabazar_Square (Zanb).
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "BIDI_PROPERTIES FOR \ep AND \eP"
|
.
|
||||||
|
.SH "THE BIDI_CLASS PROPERTY FOR \ep AND \eP"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
\ep{Bidi_Control} matches a Bidi control character
|
|
||||||
\ep{Bidi_C} matches a Bidi control character
|
|
||||||
\ep{Bidi_Class:<class>} matches a character with the given class
|
\ep{Bidi_Class:<class>} matches a character with the given class
|
||||||
\ep{BC:<class>} matches a character with the given class
|
\ep{BC:<class>} matches a character with the given class
|
||||||
.sp
|
.sp
|
||||||
|
@ -728,6 +578,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 28 December 2021
|
Last updated: 12 January 2022
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -197,7 +197,17 @@ COMMAND LINE OPTIONS
|
||||||
|
|
||||||
-LM List modifiers: write a list of available pattern and subject
|
-LM List modifiers: write a list of available pattern and subject
|
||||||
modifiers to the standard output, then exit with zero exit
|
modifiers to the standard output, then exit with zero exit
|
||||||
code. All other options are ignored. If both -C and -LM are
|
code. All other options are ignored. If both -C and any -Lx
|
||||||
|
options are present, whichever is first is recognized.
|
||||||
|
|
||||||
|
-LP List properties: write a list of recognized Unicode proper-
|
||||||
|
ties to the standard output, then exit with zero exit code.
|
||||||
|
All other options are ignored. If both -C and any -Lx options
|
||||||
|
are present, whichever is first is recognized.
|
||||||
|
|
||||||
|
-LS List scripts: write a list of recognized Unicode script names
|
||||||
|
to the standard output, then exit with zero exit code. All
|
||||||
|
other options are ignored. If both -C and any -Lx options are
|
||||||
present, whichever is first is recognized.
|
present, whichever is first is recognized.
|
||||||
|
|
||||||
-pattern modifier-list
|
-pattern modifier-list
|
||||||
|
@ -1939,5 +1949,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 28 November 2021
|
Last updated: 12 January 2022
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
|
|