Update to Unicode 11.0.0

Philip.Hazel 2018-07-07 16:10:29 +00:00
parent 50aa69657e
commit 937617f343
28 changed files with 6015 additions and 4080 deletions

@ -107,6 +107,10 @@ to an incorrect "lookbehind assertion is not fixed length" error.
23. The VERSION condition test was reading fractional PCRE2 version numbers
such as the 04 in 10.04 incorrectly and hence giving wrong results.
24. Updated to Unicode version 11.0.0. As well as the usual addition of new
scripts and characters, this involved re-jigging the grapheme break property
algorithm because Unicode has changed the way emojis are handled.
Version 10.31 12-February-2018

@ -789,6 +789,7 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Dogra,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
@ -799,9 +800,11 @@ Gothic,
Grantha,
Greek,
Gujarati,
Gunjala_Gondi,
Gurmukhi,
Han,
Hangul,
Hanifi_Rohingya,
Hanunoo,
Hatran,
Hebrew,
@ -829,11 +832,13 @@ Lisu,
Lycian,
Lydian,
Mahajani,
Makasar,
Malayalam,
Mandaic,
Manichaean,
Marchen,
Masaram_Gondi,
Medefaidrin,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
@ -856,6 +861,7 @@ Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_Sogdian,
Old_South_Arabian,
Old_Turkic,
Oriya,
@ -876,6 +882,7 @@ Shavian,
Siddham,
SignWriting,
Sinhala,
Sogdian,
Sora_Sompeng,
Soyombo,
Sundanese,
@ -1006,7 +1013,10 @@ grapheme cluster", and treats the sequence as an atomic group
Unicode supports various kinds of composite character by giving each character
a grapheme breaking property, and having rules that use these properties to
define the boundaries of extended grapheme clusters. The rules are defined in
Unicode Standard Annex 29, "Unicode Text Segmentation".
Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0
abandoned the use of some previous properties that had been used for emojis.
Instead it introduced various emoji-specific properties. PCRE2 uses only the
Extended Pictographic property.
</P>
<P>
\X always matches at least one character. Then it decides whether to add
@ -1026,27 +1036,24 @@ character; an LVT or T character may be followed only by a T character.
</P>
<P>
4. Do not end before extending characters or spacing marks or the "zero-width
joiner" characters. Characters with the "mark" property always have the
joiner" character. Characters with the "mark" property always have the
"extend" grapheme breaking property.
</P>
<P>
5. Do not end after prepend characters.
</P>
<P>
6. Do not break within emoji modifier sequences (a base character followed by a
modifier). Extending characters are allowed before the modifier.
6. Do not break within emoji modifier sequences or emoji zwj sequences. That
is, do not break between characters with the Extended_Pictographic property.
Extend and ZWJ characters are allowed between the characters.
</P>
<P>
7. Do not break within emoji zwj sequences (zero-width joiner followed by
"glue after ZWJ" or "base glue after ZWJ").
</P>
<P>
8. Do not break within emoji flag sequences. That is, do not break between
7. Do not break within emoji flag sequences. That is, do not break between
regional indicator (RI) characters if there are an odd number of RI characters
before the break point.
</P>
<P>
6. Otherwise, end the cluster.
8. Otherwise, end the cluster.
<a name="extraprops"></a></P>
<br><b>
PCRE2's additional properties
@ -1119,8 +1126,8 @@ lead to odd effects. For example, consider this pattern:
<pre>
(?&#60;=\Kfoo)bar
</pre>
If the subject is "foobar", a call to <b>pcre2_match()</b> with a starting
offset of 3 succeeds and reports the matching string as "foobar", that is, the
If the subject is "foobar", a call to <b>pcre2_match()</b> with a starting
offset of 3 succeeds and reports the matching string as "foobar", that is, the
start of the reported match is earlier than where the match started.
<a name="smallassertions"></a></P>
<br><b>
@ -3490,7 +3497,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 30 June 2018
Last updated: 07 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

@ -188,6 +188,7 @@ at release 5.18.
</P>
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Adlam,
Ahom,
Anatolian_Hieroglyphs,
Arabic,
@ -198,6 +199,7 @@ Bamum,
Bassa_Vah,
Batak,
Bengali,
Bhaiksuki,
Bopomofo,
Brahmi,
Braille,
@ -216,6 +218,7 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Dogra,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
@ -226,9 +229,11 @@ Gothic,
Grantha,
Greek,
Gujarati,
Gunjala_Gondi,
Gurmukhi,
Han,
Hangul,
Hanifi_Rohingya,
Hanunoo,
Hatran,
Hebrew,
@ -256,9 +261,13 @@ Lisu,
Lycian,
Lydian,
Mahajani,
Makasar,
Malayalam,
Mandaic,
Manichaean,
Marchen,
Masaram_Gondi,
Medefaidrin,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
@ -271,7 +280,9 @@ Multani,
Myanmar,
Nabataean,
New_Tai_Lue,
Newa,
Nko,
Nushu,
Ogham,
Ol_Chiki,
Old_Hungarian,
@ -279,9 +290,11 @@ Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_Sogdian,
Old_South_Arabian,
Old_Turkic,
Oriya,
Osage,
Osmanya,
Pahawh_Hmong,
Palmyrene,
@ -298,7 +311,9 @@ Shavian,
Siddham,
SignWriting,
Sinhala,
Sogdian,
Sora_Sompeng,
Soyombo,
Sundanese,
Syloti_Nagri,
Syriac,
@ -309,6 +324,7 @@ Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Tangut,
Telugu,
Thaana,
Thai,
@ -318,7 +334,8 @@ Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
Yi,
Zanabazar_Square.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
@ -600,7 +617,7 @@ Cambridge, England.
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 June 2018
Last updated: 07 July 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

@ -6483,34 +6483,35 @@ BACKSLASH
nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba-
nian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot,
Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan,
Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gur-
mukhi, Han, Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Ara-
maic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian,
Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Kho-
jki, Khudawadi, Lao, Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu,
Lycian, Lydian, Mahajani, Malayalam, Mandaic, Manichaean, Marchen,
Masaram_Gondi, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar,
Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar-
ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian,
Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, Pahawh_Hmong,
Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang,
Runic, Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting,
Sinhala, Sora_Sompeng, Soyombo, Sundanese, Syloti_Nagri, Syriac, Taga-
log, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Tangut, Tel-
ugu, Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai,
Warang_Citi, Yi, Zanabazar_Square.
Cyrillic, Deseret, Devanagari, Dogra, Duployan, Egyptian_Hieroglyphs,
Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek,
Gujarati, Gunjala_Gondi, Gurmukhi, Han, Hangul, Hanifi_Rohingya,
Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited,
Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan-
nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao,
Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha-
jani, Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi,
Medefaidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar,
Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar-
ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, Old_Sog-
dian, Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, SignWriting, Sinhala, Sogdian, Sora_Sompeng, Soyombo,
Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham,
Tai_Viet, Takri, Tamil, Tangut, Telugu, Thaana, Thai, Tibetan, Tifi-
nagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi, Zanabazar_Square.
Each character has exactly one Unicode general category property, spec-
ified by a two-letter abbreviation. For compatibility with Perl, nega-
tion can be specified by including a circumflex between the opening
brace and the property name. For example, \p{^Lu} is the same as
ified by a two-letter abbreviation. For compatibility with Perl, nega-
tion can be specified by including a circumflex between the opening
brace and the property name. For example, \p{^Lu} is the same as
\P{Lu}.
If only one letter is specified with \p or \P, it includes all the gen-
eral category properties that start with that letter. In this case, in
the absence of negation, the curly brackets in the escape sequence are
eral category properties that start with that letter. In this case, in
the absence of negation, the curly brackets in the escape sequence are
optional; these two examples have the same effect:
\p{L}
@ -6562,44 +6563,47 @@ BACKSLASH
Zp Paragraph separator
Zs Space separator
The special property L& is also supported: it matches a character that
has the Lu, Ll, or Lt property, in other words, a letter that is not
The special property L& is also supported: it matches a character that
has the Lu, Ll, or Lt property, in other words, a letter that is not
classified as a modifier or "other".
The Cs (Surrogate) property applies only to characters in the range
U+D800 to U+DFFF. Such characters are not valid in Unicode strings and
so cannot be tested by PCRE2, unless UTF validity checking has been
turned off (see the discussion of PCRE2_NO_UTF_CHECK in the pcre2api
The Cs (Surrogate) property applies only to characters in the range
U+D800 to U+DFFF. Such characters are not valid in Unicode strings and
so cannot be tested by PCRE2, unless UTF validity checking has been
turned off (see the discussion of PCRE2_NO_UTF_CHECK in the pcre2api
page). Perl does not support the Cs property.
The long synonyms for property names that Perl supports (such as
\p{Letter}) are not supported by PCRE2, nor is it permitted to prefix
The long synonyms for property names that Perl supports (such as
\p{Letter}) are not supported by PCRE2, nor is it permitted to prefix
any of these properties with "Is".
No character that is in the Unicode table has the Cn (unassigned) prop-
erty. Instead, this property is assumed for any code point that is not
in the Unicode table.
Specifying caseless matching does not affect these escape sequences.
For example, \p{Lu} always matches only upper case letters. This is
Specifying caseless matching does not affect these escape sequences.
For example, \p{Lu} always matches only upper case letters. This is
different from the behaviour of current versions of Perl.
Matching characters by Unicode property is not fast, because PCRE2 has
to do a multistage table lookup in order to find a character's prop-
Matching characters by Unicode property is not fast, because PCRE2 has
to do a multistage table lookup in order to find a character's prop-
erty. That is why the traditional escape sequences such as \d and \w do
not use Unicode properties in PCRE2 by default, though you can make
them do so by setting the PCRE2_UCP option or by starting the pattern
not use Unicode properties in PCRE2 by default, though you can make
them do so by setting the PCRE2_UCP option or by starting the pattern
with (*UCP).
Extended grapheme clusters
The \X escape matches any number of Unicode characters that form an
The \X escape matches any number of Unicode characters that form an
"extended grapheme cluster", and treats the sequence as an atomic group
(see below). Unicode supports various kinds of composite character by
giving each character a grapheme breaking property, and having rules
(see below). Unicode supports various kinds of composite character by
giving each character a grapheme breaking property, and having rules
that use these properties to define the boundaries of extended grapheme
clusters. The rules are defined in Unicode Standard Annex 29, "Unicode
Text Segmentation".
clusters. The rules are defined in Unicode Standard Annex 29, "Unicode
Text Segmentation". Unicode 11.0.0 abandoned the use of some previous
properties that had been used for emojis. Instead it introduced vari-
ous emoji-specific properties. PCRE2 uses only the Extended Picto-
graphic property.
\X always matches at least one character. Then it decides whether to
add additional characters according to the following rules for ending a
@ -6617,23 +6621,21 @@ BACKSLASH
only by a T character.
4. Do not end before extending characters or spacing marks or the
"zero-width joiner" characters. Characters with the "mark" property
"zero-width joiner" character. Characters with the "mark" property
always have the "extend" grapheme breaking property.
5. Do not end after prepend characters.
6. Do not break within emoji modifier sequences (a base character fol-
lowed by a modifier). Extending characters are allowed before the modi-
fier.
6. Do not break within emoji modifier sequences or emoji zwj sequences.
That is, do not break between characters with the Extended_Pictographic
property. Extend and ZWJ characters are allowed between the charac-
ters.
7. Do not break within emoji zwj sequences (zero-width joiner followed
by "glue after ZWJ" or "base glue after ZWJ").
8. Do not break within emoji flag sequences. That is, do not break
7. Do not break within emoji flag sequences. That is, do not break
between regional indicator (RI) characters if there are an odd number
of RI characters before the break point.
6. Otherwise, end the cluster.
8. Otherwise, end the cluster.
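
The revised rule 6 can be read as a single "no break" test on a sequence of grapheme-break classes. The following is a minimal Python sketch, not PCRE2's internal code: the class labels and the `no_break_in_emoji_sequence` helper are hypothetical names chosen for illustration, and the test follows the form of the rule in Unicode Standard Annex 29, where a ZWJ must sit between the two pictographic characters and Extend characters may precede the ZWJ.

```python
# Hypothetical class labels for illustration; PCRE2's internal names differ.
EXTEND = "Extend"
ZWJ = "ZWJ"
EXT_PICT = "Extended_Pictographic"
OTHER = "Other"

def no_break_in_emoji_sequence(classes, i):
    """Return True if a break between classes[i-1] and classes[i] is forbidden.

    The break is forbidden when classes[i] is Extended_Pictographic and it is
    preceded by: Extended_Pictographic, any number of Extend, then one ZWJ.
    """
    if i < 2 or classes[i] != EXT_PICT or classes[i - 1] != ZWJ:
        return False
    j = i - 2
    # Skip back over any Extend characters that sit before the ZWJ.
    while j >= 0 and classes[j] == EXTEND:
        j -= 1
    return j >= 0 and classes[j] == EXT_PICT
```

Under these assumptions, a pictograph followed by ZWJ and another pictograph stays in one cluster, while a ZWJ that follows an ordinary character does not bind the next pictograph.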
PCRE2's additional properties
@ -8941,7 +8943,7 @@ AUTHOR
REVISION
Last updated: 30 June 2018
Last updated: 07 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
@ -9915,26 +9917,29 @@ PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P
SCRIPT NAMES FOR \p AND \P
Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Balinese,
Bamum, Bassa_Vah, Batak, Bengali, Bopomofo, Brahmi, Braille, Buginese,
Buhid, Canadian_Aboriginal, Carian, Caucasian_Albanian, Chakma, Cham,
Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan, Ethiopic, Geor-
gian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gurmukhi, Han,
Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited,
Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan-
nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao,
Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha-
jani, Malayalam, Mandaic, Manichaean, Meetei_Mayek, Mende_Kikakui,
Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro,
Multani, Myanmar, Nabataean, New_Tai_Lue, Nko, Ogham, Ol_Chiki,
Old_Hungarian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian,
Old_South_Arabian, Old_Turkic, Oriya, Osmanya, Pahawh_Hmong, Palmyrene,
Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang, Runic,
Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting, Sinhala,
Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa,
Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, Thaana, Thai,
Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi.
Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Bali-
nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba-
nian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot,
Cyrillic, Deseret, Devanagari, Dogra, Duployan, Egyptian_Hieroglyphs,
Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek,
Gujarati, Gunjala_Gondi, Gurmukhi, Han, Hangul, Hanifi_Rohingya,
Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited,
Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan-
nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao,
Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha-
jani, Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi,
Medefaidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar,
Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar-
ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, Old_Sog-
dian, Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, SignWriting, Sinhala, Sogdian, Sora_Sompeng, Soyombo,
Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham,
Tai_Viet, Takri, Tamil, Tangut, Telugu, Thaana, Thai, Tibetan, Tifi-
nagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi, Zanabazar_Square.
CHARACTER CLASSES
@ -9960,8 +9965,8 @@ CHARACTER CLASSES
word same as \w
xdigit hexadecimal digit
In PCRE2, POSIX character set names recognize only ASCII characters by
default, but some of them use Unicode properties if PCRE2_UCP is set.
In PCRE2, POSIX character set names recognize only ASCII characters by
default, but some of them use Unicode properties if PCRE2_UCP is set.
You can use \Q...\E inside a character class.
@ -10047,8 +10052,8 @@ OPTION SETTING
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s)
The following are recognized only at the very start of a pattern or
after one of the newline or \R options with similar syntax. More than
The following are recognized only at the very start of a pattern or
after one of the newline or \R options with similar syntax. More than
one of them may appear. For the first three, d is a decimal number.
(*LIMIT_DEPTH=d) set the backtracking limit to d
@ -10063,17 +10068,17 @@ OPTION SETTING
(*UTF) set appropriate UTF mode for the library in use
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the
value of the limits set by the caller of pcre2_match() or
pcre2_dfa_match(), not increase them. LIMIT_RECURSION is an obsolete
Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the
value of the limits set by the caller of pcre2_match() or
pcre2_dfa_match(), not increase them. LIMIT_RECURSION is an obsolete
synonym for LIMIT_DEPTH. The application can lock out the use of (*UTF)
and (*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options,
and (*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options,
respectively, at compile time.
NEWLINE CONVENTION
These are recognized only at the very start of the pattern or after
These are recognized only at the very start of the pattern or after
option settings with a similar syntax.
(*CR) carriage return only
@ -10086,7 +10091,7 @@ NEWLINE CONVENTION
WHAT \R MATCHES
These are recognized only at the very start of the pattern or after
These are recognized only at the very start of the pattern or after
option setting with a similar syntax.
(*BSR_ANYCRLF) CR, LF, or CRLF
@ -10155,8 +10160,8 @@ CONDITIONAL PATTERNS
(?(VERSION[>]=n.m) test PCRE2 version
(?(assert) assertion condition
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
conditions or recursion tests. Such a condition is interpreted as a
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
conditions or recursion tests. Such a condition is interpreted as a
reference condition if the relevant named group exists.
@ -10168,7 +10173,7 @@ BACKTRACKING CONTROL
(*FAIL) force backtrack; synonym (*F)
(*MARK:NAME) set name to be passed back; synonym (*:NAME)
The following act only when a subsequent match failure causes a back-
The following act only when a subsequent match failure causes a back-
track to reach them. They all force a match failure, but they differ in
what happens afterwards. Those that advance the start-of-match point do
so only if the pattern is not anchored.
@ -10190,14 +10195,14 @@ CALLOUTS
(?C"text") callout with string data
The allowed string delimiters are ` ' " ^ % # $ (which are the same for
the start and the end), and the starting delimiter { matched with the
ending delimiter }. To encode the ending delimiter within the string,
the start and the end), and the starting delimiter { matched with the
ending delimiter }. To encode the ending delimiter within the string,
double it.
SEE ALSO
pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
pcre2(3).
@ -10210,7 +10215,7 @@ AUTHOR
REVISION
Last updated: 28 June 2018
Last updated: 07 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "30 June 2018" "PCRE2 10.32"
.TH PCRE2PATTERN 3 "07 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -788,6 +788,7 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Dogra,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
@ -798,9 +799,11 @@ Gothic,
Grantha,
Greek,
Gujarati,
Gunjala_Gondi,
Gurmukhi,
Han,
Hangul,
Hanifi_Rohingya,
Hanunoo,
Hatran,
Hebrew,
@ -828,11 +831,13 @@ Lisu,
Lycian,
Lydian,
Mahajani,
Makasar,
Malayalam,
Mandaic,
Manichaean,
Marchen,
Masaram_Gondi,
Medefaidrin,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
@ -855,6 +860,7 @@ Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_Sogdian,
Old_South_Arabian,
Old_Turkic,
Oriya,
@ -875,6 +881,7 @@ Shavian,
Siddham,
SignWriting,
Sinhala,
Sogdian,
Sora_Sompeng,
Soyombo,
Sundanese,
@ -1003,7 +1010,10 @@ grapheme cluster", and treats the sequence as an atomic group
Unicode supports various kinds of composite character by giving each character
a grapheme breaking property, and having rules that use these properties to
define the boundaries of extended grapheme clusters. The rules are defined in
Unicode Standard Annex 29, "Unicode Text Segmentation".
Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0
abandoned the use of some previous properties that had been used for emojis.
Instead it introduced various emoji-specific properties. PCRE2 uses only the
Extended Pictographic property.
.P
\eX always matches at least one character. Then it decides whether to add
additional characters according to the following rules for ending a cluster:
@ -1018,22 +1028,20 @@ L, V, LV, or LVT character; an LV or V character may be followed by a V or T
character; an LVT or T character may be followed only by a T character.
.P
4. Do not end before extending characters or spacing marks or the "zero-width
joiner" characters. Characters with the "mark" property always have the
joiner" character. Characters with the "mark" property always have the
"extend" grapheme breaking property.
.P
5. Do not end after prepend characters.
.P
6. Do not break within emoji modifier sequences (a base character followed by a
modifier). Extending characters are allowed before the modifier.
6. Do not break within emoji modifier sequences or emoji zwj sequences. That
is, do not break between characters with the Extended_Pictographic property.
Extend and ZWJ characters are allowed between the characters.
.P
7. Do not break within emoji zwj sequences (zero-width joiner followed by
"glue after ZWJ" or "base glue after ZWJ").
.P
8. Do not break within emoji flag sequences. That is, do not break between
7. Do not break within emoji flag sequences. That is, do not break between
regional indicator (RI) characters if there are an odd number of RI characters
before the break point.
.P
6. Otherwise, end the cluster.
8. Otherwise, end the cluster.
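
The flag-sequence rule (rule 7 in the new numbering) depends only on how many regional indicator characters precede the candidate break point. The following is a minimal Python sketch under that reading; the helper names are hypothetical and the code is not taken from PCRE2.

```python
def is_regional_indicator(ch):
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF

def break_allowed_between_ri(text, i):
    """Apply only the flag-sequence rule to the gap before text[i].

    The caller is assumed to have checked that text[i-1] and text[i] are
    both regional indicators. Returns True if a cluster may end there.
    """
    count = 0
    j = i - 1
    # Count the unbroken run of RI characters before the break point.
    while j >= 0 and is_regional_indicator(text[j]):
        count += 1
        j -= 1
    # An odd count means text[i-1] is the first half of a flag pair.
    return count % 2 == 0
```

For a string of four regional indicators, this allows a break only in the middle, so the characters pair up into two flags.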
.
.
.\" HTML <a name="extraprops"></a>
@ -1112,8 +1120,8 @@ lead to odd effects. For example, consider this pattern:
.sp
(?<=\eKfoo)bar
.sp
If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting
offset of 3 succeeds and reports the matching string as "foobar", that is, the
If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting
offset of 3 succeeds and reports the matching string as "foobar", that is, the
start of the reported match is earlier than where the match started.
.
.
@ -3517,6 +3525,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 30 June 2018
Last updated: 07 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "28 June 2018" "PCRE2 10.32"
.TH PCRE2SYNTAX 3 "07 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -160,6 +160,7 @@ at release 5.18.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Adlam,
Ahom,
Anatolian_Hieroglyphs,
Arabic,
@ -170,6 +171,7 @@ Bamum,
Bassa_Vah,
Batak,
Bengali,
Bhaiksuki,
Bopomofo,
Brahmi,
Braille,
@ -188,6 +190,7 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Dogra,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
@ -198,9 +201,11 @@ Gothic,
Grantha,
Greek,
Gujarati,
Gunjala_Gondi,
Gurmukhi,
Han,
Hangul,
Hanifi_Rohingya,
Hanunoo,
Hatran,
Hebrew,
@ -228,9 +233,13 @@ Lisu,
Lycian,
Lydian,
Mahajani,
Makasar,
Malayalam,
Mandaic,
Manichaean,
Marchen,
Masaram_Gondi,
Medefaidrin,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
@ -243,7 +252,9 @@ Multani,
Myanmar,
Nabataean,
New_Tai_Lue,
Newa,
Nko,
Nushu,
Ogham,
Ol_Chiki,
Old_Hungarian,
@ -251,9 +262,11 @@ Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_Sogdian,
Old_South_Arabian,
Old_Turkic,
Oriya,
Osage,
Osmanya,
Pahawh_Hmong,
Palmyrene,
@ -270,7 +283,9 @@ Shavian,
Siddham,
SignWriting,
Sinhala,
Sogdian,
Sora_Sompeng,
Soyombo,
Sundanese,
Syloti_Nagri,
Syriac,
@ -281,6 +296,7 @@ Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Tangut,
Telugu,
Thaana,
Thai,
@ -290,7 +306,8 @@ Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
Yi,
Zanabazar_Square.
.
.
.SH "CHARACTER CLASSES"
@ -589,6 +606,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 28 June 2018
Last updated: 07 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

@ -24,6 +24,7 @@
# Added script names for Unicode 7.0.0, 20-June-2014.
# Added script names for Unicode 8.0.0, 19-June-2015.
# Added script names for Unicode 10.0.0, 02-July-2017.
# Added script names for Unicode 11.0.0, 03-July-2018.
script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \
'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \
@ -55,7 +56,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
'SignWriting',
# New for Unicode 10.0.0
'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
'Nushu', 'Soyombo', 'Zanabazar_Square'
'Nushu', 'Soyombo', 'Zanabazar_Square',
# New for Unicode 11.0.0
'Dogra', 'Gunjala_Gondi', 'Hanifi_Rohingya', 'Makasar', 'Medefaidrin',
'Old_Sogdian', 'Sogdian'
]
category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',

@ -7,20 +7,26 @@
# This script was submitted to the PCRE project by Peter Kankowski as part of
# the upgrading of Unicode property support. The new code speeds up property
# matching many times. The script is for the use of PCRE maintainers, to
# generate the pcre_ucd.c file that contains a digested form of the Unicode
# generate the pcre2_ucd.c file that contains a digested form of the Unicode
# data tables.
#
# The script has now been upgraded to Python 3 for PCRE2, and should be run in
# The script has now been upgraded to Python 3 for PCRE2, and should be run in
# the maint subdirectory, using the command
#
# [python3] ./MultiStage2.py >../src/pcre2_ucd.c
#
# It requires four Unicode data tables, DerivedGeneralCategory.txt,
# GraphemeBreakProperty.txt, Scripts.txt, and CaseFolding.txt, to be in the
# Unicode.tables subdirectory. The first of these is found in the "extracted"
# subdirectory of the Unicode database (UCD) on the Unicode web site; the
# second is in the "auxiliary" subdirectory; the other two are directly in the
# UCD directory.
# It requires five Unicode data tables: DerivedGeneralCategory.txt,
# GraphemeBreakProperty.txt, Scripts.txt, CaseFolding.txt, and emoji-data.txt.
# These must be in the maint/Unicode.tables subdirectory.
#
# DerivedGeneralCategory.txt is found in the "extracted" subdirectory of the
# Unicode database (UCD) on the Unicode web site; GraphemeBreakProperty.txt is
# in the "auxiliary" subdirectory. Scripts.txt and CaseFolding.txt are directly
# in the UCD directory. The emoji-data.txt file is in files associated with
# Unicode Technical Standard #51 ("Unicode Emoji"), for example:
#
# http://unicode.org/Public/emoji/11.0/emoji-data.txt
#
#
# Minor modifications made to this script:
# Added #! line at start
@ -41,7 +47,8 @@
# Added code to search for sets of more than two characters that must match
# each other caselessly. A new table is output containing these sets, and
# offsets into the table are added to the main output records. This new
# code scans CaseFolding.txt instead of UnicodeData.txt.
# code scans CaseFolding.txt instead of UnicodeData.txt, which is no longer
# used.
#
# Update for Python3:
# . Processed with 2to3, but that didn't fix everything
@ -50,8 +57,13 @@
# . Inserted 'int' before blocksize/ELEMS_PER_LINE because an int is
# required and the result of the division is a float
#
# Added code to scan the emoji-data.txt file to find the Extended Pictographic
# property, which is used by PCRE2 as a grapheme breaking property. This was
# done when updating to Unicode 11.0.0 (July 2018).
#
#
# The main tables generated by this script are used by macros defined in
# pcre2_internal.h. They look up Unicode character properties using short
# pcre2_internal.h. They look up Unicode character properties using short
# sequences of code that contain no branches, which makes for greater speed.
#
# Conceptually, there is a table of records (of type ucd_record), containing a
@ -75,43 +87,48 @@
# table of "virtual" blocks; each block is indexed by the offset of a character
# within its own block, and the result is the offset of the required record.
#
# The following examples are correct for the Unicode 11.0.0 database. Future
# updates may change the actual lookup values.
#
# Example: lowercase "a" (U+0061) is in block 0
# lookup 0 in stage1 table yields 0
# lookup 97 in the first table in stage2 yields 16
# record 17 is { 33, 5, 11, 0, -32 }
# record 17 is { 33, 5, 11, 0, -32 }
# 33 = ucp_Latin => Latin script
# 5 = ucp_Ll => Lower case letter
# 11 = ucp_gbOther => Grapheme break property "Other"
# 12 = ucp_gbOther => Grapheme break property "Other"
# 0 => not part of a caseless set
# -32 => Other case is U+0041
#
#
# Almost all lowercase latin characters resolve to the same record. One or two
# are different because they are part of a multi-character caseless set (for
# example, k, K and the Kelvin symbol are such a set).
#
# Example: hiragana letter A (U+3042) is in block 96 (0x60)
# lookup 96 in stage1 table yields 88
# lookup 66 in the 88th table in stage2 yields 467
# record 470 is { 26, 7, 11, 0, 0 }
# lookup 96 in stage1 table yields 90
# lookup 66 in the 90th table in stage2 yields 515
# record 515 is { 26, 7, 11, 0, 0 }
# 26 = ucp_Hiragana => Hiragana script
# 7 = ucp_Lo => Other letter
# 11 = ucp_gbOther => Grapheme break property "Other"
# 12 = ucp_gbOther => Grapheme break property "Other"
# 0 => not part of a caseless set
# 0 => No other case
# 0 => No other case
#
# In these examples, no other blocks resolve to the same "virtual" block, as it
# happens, but plenty of other blocks do share "virtual" blocks.
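
The two-stage lookup described in these comments can be sketched in a few lines of Python. This is an illustration only: `build_two_stage` and `lookup` are hypothetical names, and the block size is kept tiny so the sharing is easy to see. The examples above imply 128 characters per block in the real tables, since U+3042 is offset 66 in block 96.

```python
BLOCK_SIZE = 4  # tiny value for illustration only

def build_two_stage(record_index):
    """Split a flat per-character list of record numbers into two stages.

    stage2 holds each distinct block once; stage1 maps a block number to
    the position of its (possibly shared) "virtual" block in stage2.
    """
    stage1, stage2, seen = [], [], {}
    for start in range(0, len(record_index), BLOCK_SIZE):
        block = tuple(record_index[start:start + BLOCK_SIZE])
        if block not in seen:
            seen[block] = len(stage2)
            stage2.append(block)
        stage1.append(seen[block])
    return stage1, stage2

def lookup(stage1, stage2, code_point):
    """Return the record number for a code point: two indexings, no branches."""
    block = stage2[stage1[code_point // BLOCK_SIZE]]
    return block[code_point % BLOCK_SIZE]
```

Identical blocks are stored once, which is why many character blocks can resolve to the same "virtual" block while every lookup still costs the same two table accesses.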
#
# There is a fourth table, maintained by hand, which translates from the
# individual character types such as ucp_Cc to the general types like ucp_C.
#
# Philip Hazel, 03 July 2008
# Last Updated: 07 July 2018
#
#
# 01-March-2010: Updated list of scripts for Unicode 5.2.0
# 30-April-2011: Updated list of scripts for Unicode 6.0.0
# July-2012: Updated list of scripts for Unicode 6.1.0
# 20-August-2012: Added scan of GraphemeBreakProperty.txt and added a new
# field in the record to hold the value. Luckily, the
# structure had a hole in it, so the resulting table is
# not much bigger than before.
# 18-September-2012: Added code for multiple caseless sets. This uses the
@ -123,6 +140,9 @@
# 12-August-2014: Updated to put Unicode version into the file
# 19-June-2015: Updated for Unicode 8.0.0
# 02-July-2017: Updated for Unicode 10.0.0
# 03-July-2018: Updated for Unicode 11.0.0
# 07-July-2018: Added code to scan emoji-data.txt for the Extended
# Pictographic property.
##############################################################################
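The two-stage lookup described in the header comment can be illustrated with a toy table. This is a minimal sketch, not the code in MultiStage2.py: the names `BLOCK_SIZE`, `build_two_stage` and `lookup`, the block size of 128 and the sample values are all assumptions made for the example.

```python
# Hypothetical toy illustration of a two-stage table; not taken from
# MultiStage2.py. BLOCK_SIZE and the sample values are assumptions.
BLOCK_SIZE = 128

def build_two_stage(table, block_size=BLOCK_SIZE):
    """Split table into blocks and store each distinct block only once."""
    stage1, stage2, seen = [], [], {}
    for start in range(0, len(table), block_size):
        block = tuple(table[start:start + block_size])
        if block not in seen:
            # Number of the new "virtual" block inside stage2
            seen[block] = len(stage2) // block_size
            stage2.extend(block)
        stage1.append(seen[block])
    return stage1, stage2

def lookup(stage1, stage2, c, block_size=BLOCK_SIZE):
    """Branch-free lookup: block number first, then offset within the block."""
    return stage2[stage1[c // block_size] * block_size + c % block_size]

# A 1024-entry table in which only two entries differ from the default 0
toy = [0] * 1024
toy[0x61] = 17
toy[0x3A1] = 5
stage1, stage2 = build_two_stage(toy)
```

In this toy case six of the eight blocks are identical, so they share one virtual block and stage2 holds three blocks instead of eight.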
@ -148,7 +168,7 @@ def get_other_case(chardata):
# Read the whole table in memory, setting/checking the Unicode version
def read_table(file_name, get_value, default_value):
global unicode_version
f = re.match(r'^[^/]+/([^.]+)\.txt$', file_name)
file_base = f.group(1)
version_pat = r"^# " + re.escape(file_base) + r"-(\d+\.\d+\.\d+)\.txt$"
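As a small illustration of the version check, the sketch below applies the same two regular expressions to a header line such as "# CaseFolding-11.0.0.txt". The helper name `version_from_header` is hypothetical and exists only for this example.

```python
import re

def version_from_header(file_name, first_line):
    """Return the Unicode version named in a data file's first line, or None.

    Hypothetical helper for illustration; it reuses the patterns shown in
    read_table but is not part of the script.
    """
    f = re.match(r'^[^/]+/([^.]+)\.txt$', file_name)
    file_base = f.group(1)
    version_pat = r"^# " + re.escape(file_base) + r"-(\d+\.\d+\.\d+)\.txt$"
    m = re.match(version_pat, first_line)
    return m.group(1) if m else None

version = version_from_header('Unicode.tables/CaseFolding.txt',
                              '# CaseFolding-11.0.0.txt')
```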
@ -159,7 +179,7 @@ def read_table(file_name, get_value, default_value):
unicode_version = version
elif unicode_version != version:
print("WARNING: Unicode version differs in %s" % file_name, file=sys.stderr)
table = [default_value] * MAX_UNICODE
for line in file:
line = re.sub(r'#.*', '', line)
@ -172,14 +192,14 @@ def read_table(file_name, get_value, default_value):
if m.group(3) is None:
last = char
else:
last = int(m.group(3), 16)
for i in range(char, last + 1):
# It is important not to overwrite a previously set
# value because in the CaseFolding file there are lines
# to be ignored (returning the default value of 0)
# which often come after a line which has already set
# data.
if table[i] == default_value:
table[i] = value
file.close()
return table
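The "do not overwrite a previously set value" rule in read_table can be seen in isolation in the following sketch. It is a simplified stand-in, assuming the data arrives as a list of lines and that the second field is stored as-is; the name `fill_table` and the sample lines are made up for the example.

```python
import re

def fill_table(lines, default_value, size):
    """Fill a table from 'XXXX[..YYYY] ; value' lines, first setting wins.

    Simplified illustration only; the real read_table also checks the
    Unicode version and maps the fields through a get_value callback.
    """
    table = [default_value] * size
    for line in lines:
        line = re.sub(r'#.*', '', line)            # strip comments
        fields = [f.strip() for f in line.split(';')]
        if len(fields) <= 1:
            continue                               # blank or comment-only line
        m = re.match(r'([0-9a-fA-F]+)(\.\.([0-9a-fA-F]+))?$', fields[0])
        if m is None:
            continue
        first = int(m.group(1), 16)
        last = first if m.group(3) is None else int(m.group(3), 16)
        for i in range(first, last + 1):
            if table[i] == default_value:          # keep an earlier value
                table[i] = fields[1]
    return table

sample = [
    "0041..0043 ; Lu # a range",
    "0042 ; Ll # later line, ignored for U+0042",
    "# a comment-only line",
    "0061 ; Ll",
]
categories = fill_table(sample, 'Cn', 0x80)
```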
@ -220,14 +240,14 @@ def compress_table(table, block_size):
stage2 += block
blocks[block] = start
stage1.append(start)
return stage1, stage2
# Print a table
def print_table(table, table_name, block_size = None):
type, size = get_type_size(table)
ELEMS_PER_LINE = 16
s = "const %s %s[] = { /* %d bytes" % (type, table_name, size * len(table))
if block_size:
s += ", block = %d" % block_size
@ -237,7 +257,7 @@ def print_table(table, table_name, block_size = None):
fmt = "%3d," * ELEMS_PER_LINE + " /* U+%04X */"
mult = MAX_UNICODE / len(table)
for i in range(0, len(table), ELEMS_PER_LINE):
print(fmt % (table[i:i+ELEMS_PER_LINE] +
(int(i * mult),)))
else:
if block_size > ELEMS_PER_LINE:
@ -274,15 +294,15 @@ def get_record_size_struct(records):
size = (size + slice_size - 1) & -slice_size
size += slice_size
structure += '%s property_%d;\n' % (slice_type, i)
# round up to the first item of the next structure in array
record_slice = [record[0] for record in records]
slice_type, slice_size = get_type_size(record_slice)
size = (size + slice_size - 1) & -slice_size
structure += '} ucd_record;\n*/\n\n'
return size, structure
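The expression `(size + slice_size - 1) & -slice_size` used above rounds a size up to the next multiple of `slice_size`, which works when `slice_size` is a power of two. A minimal sketch, with the hypothetical name `align_up`:

```python
def align_up(size, slice_size):
    """Round size up to a multiple of slice_size (assumed a power of two)."""
    return (size + slice_size - 1) & -slice_size

# A 1-byte field followed by a 4-byte field: the second starts at offset 4
offset_after_byte = align_up(1, 4)
```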
def test_record_size():
tests = [ \
( [(3,), (6,), (6,), (1,)], 1 ), \
@ -339,16 +359,23 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
'SignWriting',
# New for Unicode 10.0.0
'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
'Nushu', 'Soyombo', 'Zanabazar_Square'
'Nushu', 'Soyombo', 'Zanabazar_Square',
# New for Unicode 11.0.0
'Dogra', 'Gunjala_Gondi', 'Hanifi_Rohingya', 'Makasar', 'Medefaidrin',
'Old_Sogdian', 'Sogdian'
]
category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
'Mc', 'Me', 'Mn', 'Nd', 'Nl', 'No', 'Pc', 'Pd', 'Pe', 'Pf', 'Pi', 'Po', 'Ps',
'Sc', 'Sk', 'Sm', 'So', 'Zl', 'Zp', 'Zs' ]
# The Extended_Pictographic property is not found in the file where all the
# others are (GraphemeBreakProperty.txt). It comes from the emoji-data.txt
# file, but we list it here so that the name has the correct index value.
break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend',
'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other',
'E_Base', 'E_Modifier', 'E_Base_GAZ', 'ZWJ', 'Glue_After_Zwj' ]
'ZWJ', 'Extended_Pictographic' ]
test_record_size()
unicode_version = ""
@ -358,21 +385,50 @@ category = read_table('Unicode.tables/DerivedGeneralCategory.txt', make_get_name
break_props = read_table('Unicode.tables/GraphemeBreakProperty.txt', make_get_names(break_property_names), break_property_names.index('Other'))
other_case = read_table('Unicode.tables/CaseFolding.txt', get_other_case, 0)
# The grapheme breaking rules were changed for Unicode 11.0.0 (June 2018). Now
# we need to find the Extended_Pictographic property for emoji characters. This
# can be set as an additional grapheme break property, because the default for
# all the emojis is "other". We scan the emoji-data.txt file and modify the
# break-props table.
file = open('Unicode.tables/emoji-data.txt', 'r', encoding='utf-8')
for line in file:
line = re.sub(r'#.*', '', line)
chardata = list(map(str.strip, line.split(';')))
if len(chardata) <= 1:
continue
if chardata[1] != "Extended_Pictographic":
continue
m = re.match(r'([0-9a-fA-F]+)(\.\.([0-9a-fA-F]+))?$', chardata[0])
char = int(m.group(1), 16)
if m.group(3) is None:
last = char
else:
last = int(m.group(3), 16)
for i in range(char, last + 1):
if break_props[i] != break_property_names.index('Other'):
print("WARNING: Emoji 0x%x has break property %s, not 'Other'" %
(i, break_property_names[break_props[i]]), file=sys.stderr)
break_props[i] = break_property_names.index('Extended_Pictographic')
file.close()
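The effect of the emoji scan can be tried on a few lines without the real data file. The sketch below is an approximation under stated assumptions: the function name `apply_extended_pictographic` is hypothetical, the sample lines were written for this example in the emoji-data.txt layout, and warnings are collected in a list instead of being printed.

```python
import re

break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend',
  'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other',
  'ZWJ', 'Extended_Pictographic']

def apply_extended_pictographic(break_props, lines):
    """Mark Extended_Pictographic code points; return any warnings raised."""
    other = break_property_names.index('Other')
    ext_pict = break_property_names.index('Extended_Pictographic')
    warnings = []
    for line in lines:
        line = re.sub(r'#.*', '', line)
        fields = [f.strip() for f in line.split(';')]
        if len(fields) <= 1 or fields[1] != 'Extended_Pictographic':
            continue
        m = re.match(r'([0-9a-fA-F]+)(\.\.([0-9a-fA-F]+))?$', fields[0])
        first = int(m.group(1), 16)
        last = first if m.group(3) is None else int(m.group(3), 16)
        for i in range(first, last + 1):
            if break_props[i] != other:
                warnings.append("Emoji 0x%x has break property %s, not 'Other'"
                                % (i, break_property_names[break_props[i]]))
            break_props[i] = ext_pict
    return warnings

# Sample input written for this sketch, not copied from emoji-data.txt
props = [break_property_names.index('Other')] * 0x1F700
props[0x1F601] = break_property_names.index('Extend')  # force one warning
emoji_warnings = apply_extended_pictographic(props, [
    "1F600..1F602 ; Extended_Pictographic # sample range",
    "1F603 ; Emoji # other properties are skipped",
])
```

With the list above, 'Other' has index 12 and 'Extended_Pictographic' has index 14.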
# This block of code was added by PH in September 2012. I am not a Python
# programmer, so the style is probably dreadful, but it does the job. It scans
# the other_case table to find sets of more than two characters that must all
# match each other caselessly. Later in this script a table of these sets is
# written out. However, we have to do this work here in order to compute the
# offsets in the table that are inserted into the main table.
# The CaseFolding.txt file lists pairs, but the common logic for reading data
# sets only one value, so first we go through the table and set "return"
# offsets for those that are not already set.
for c in range(0x10ffff):
if other_case[c] != 0 and other_case[c + other_case[c]] == 0:
other_case[c + other_case[c]] = -other_case[c]
# Now scan again and create equivalence sets.
@ -382,25 +438,25 @@ for c in range(0x10ffff):
o = c + other_case[c]
# Trigger when this character's other case does not point back here. We
# now have three characters that are case-equivalent.
if other_case[o] != -other_case[c]:
t = o + other_case[o]
# Scan the existing sets to see if any of the three characters are already
# part of a set. If so, unite the existing set with the new set.
appended = 0
for s in sets:
found = 0
for x in s:
if x == c or x == o or x == t:
found = 1
# Add new characters to an existing set
if found:
found = 0
for y in [c, o, t]:
for x in s:
if x == y:
@ -408,10 +464,10 @@ for c in range(0x10ffff):
if not found:
s.append(y)
appended = 1
# If we have not added to an existing set, create a new one.
if not appended:
sets.append([c, o, t])
# End of loop looking for caseless sets.
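The search for caseless sets can be followed on the k, K and Kelvin sign example mentioned in the header comment. What follows is a simplified variant, not the loop above: it keeps the offsets in a dict instead of a full-size list, merges a new trio into the first overlapping set only, and uses the hypothetical name `caseless_sets`.

```python
def caseless_sets(other_case):
    """Find groups of three or more mutually case-equivalent code points.

    other_case maps a code point to the signed offset of its other case.
    Simplified illustration; the script works on a list covering all of
    Unicode and unites sets slightly differently.
    """
    other_case = dict(other_case)
    # First pass: add "return" offsets where none has been set yet
    for c in sorted(other_case):
        o = c + other_case[c]
        if other_case.get(o, 0) == 0:
            other_case[o] = -other_case[c]
    # Second pass: a character whose other case does not point back at it
    # reveals a third equivalent character
    sets = []
    for c in sorted(other_case):
        o = c + other_case[c]
        if other_case[o] != -other_case[c]:
            t = o + other_case[o]
            trio = {c, o, t}
            for s in sets:
                if s & trio:
                    s |= trio
                    break
            else:
                sets.append(set(trio))
    return [sorted(s) for s in sets]

# K -> k, Kelvin sign -> k, plus an ordinary pair A -> a
found_sets = caseless_sets({0x4B: 0x20, 0x212A: 0x6B - 0x212A, 0x41: 0x20})
```

The ordinary pair produces no set, because each of its members points back at the other.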
@ -422,7 +478,7 @@ caseless_offsets = [0] * MAX_UNICODE
offset = 1
for s in sets:
for x in s:
caseless_offsets[x] = offset
offset += len(s) + 1
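The offset arithmetic above (start at 1, advance by the set length plus 1) is consistent with a flat list in which index 0 and the slot after each set hold a terminator. The sketch below shows that layout under those assumptions; the name `flatten_sets`, the leading terminator and the value chosen for `NOTACHAR` are assumptions of the example, not facts taken from the script.

```python
NOTACHAR = 0xFFFFFFFF  # assumed terminator value for this sketch

def flatten_sets(sets):
    """Lay the sets out in one list and record where each member's set starts."""
    flat = [NOTACHAR]                 # slot 0: "not part of a caseless set"
    offsets = {}
    for s in sets:
        start = len(flat)
        for x in s:
            offsets[x] = start
        flat.extend(sorted(s))
        flat.append(NOTACHAR)         # terminator after every set
    return flat, offsets

flat, offsets = flatten_sets([[0x4B, 0x6B, 0x212A], [0x53, 0x73, 0x17F]])
```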
@ -431,7 +487,7 @@ for s in sets:
# Combine the tables
table, records = combine_tables(script, category, break_props,
caseless_offsets, other_case)
record_size, record_struct = get_record_size_struct(list(records.keys()))
@ -473,7 +529,7 @@ print("/* This file was autogenerated by the MultiStage2.py script. */")
print("/* Total size: %d bytes, block size: %d. */" % (min_size, min_block_size))
print()
print("/* The tables herein are needed only when UCP support is built,")
print("and in PCRE2 that happens automatically with UTF support.")
print("This module should not be referenced otherwise, so")
print("it should not matter whether it is compiled or not. However")
print("a comment was received about space saving - maybe the guy linked")
@ -484,7 +540,7 @@ print("Instead, just supply small dummy tables. */")
print()
print("#ifndef SUPPORT_UNICODE")
print("const ucd_record PRIV(ucd_records)[] = {{0,0,0,0,0 }};")
print("const uint8_t PRIV(ucd_stage1)[] = {0};")
print("const uint16_t PRIV(ucd_stage1)[] = {0};")
print("const uint16_t PRIV(ucd_stage2)[] = {0};")
print("const uint32_t PRIV(ucd_caseless_sets)[] = {0};")
print("#else")
@ -515,7 +571,7 @@ for s in sets:
s = sorted(s)
for x in s:
print(' 0x%04x,' % x, end=' ')
print(' NOTACHAR,')
print('};')
print()


@ -23,7 +23,7 @@ GenerateUtt.py A Python script to generate part of the pcre2_tables.c file
ManyConfigTests A shell script that runs "configure, make, test" a number of
times with different configuration settings.
MultiStage2.py A Python script that generates the file pcre2_ucd.c from three
MultiStage2.py A Python script that generates the file pcre2_ucd.c from five
Unicode data tables, which are themselves downloaded from the
Unicode web site. Run this script in the "maint" directory.
The generated file contains the tables for a 2-stage lookup
@ -37,11 +37,17 @@ pcre2_chartables.c.non-standard
README This file.
Unicode.tables The files in this directory (CaseFolding.txt,
DerivedGeneralCategory.txt, GraphemeBreakProperty.txt,
Scripts.txt and UnicodeData.txt) were downloaded from the
Unicode web site. They contain information about Unicode
characters and scripts.
Unicode.tables The files in this directory were downloaded from the Unicode
web site. They contain information about Unicode characters
and scripts. The ones used by the MultiStage2.py script are
CaseFolding.txt, DerivedGeneralCategory.txt, Scripts.txt,
GraphemeBreakProperty.txt, and emoji-data.txt. I've kept
UnicodeData.txt (which is no longer used by the script)
because it is useful occasionally for manually looking up the
details of certain characters. However, note that character
names in this file such as "Arabic sign sanah" do NOT mean
that the character is in a particular script (in this case,
Arabic). Scripts.txt is where to look for script information.
ucptest.c A short C program for testing the Unicode property macros
that do lookups in the pcre2_ucd.c data, mainly useful after
@ -359,4 +365,4 @@ very sensible; some are rather wacky. Some have been on this list for years.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 20 May 2017
Last updated: 07 July 2018


@ -1,6 +1,6 @@
# CaseFolding-10.0.0.txt
# Date: 2017-04-14, 05:40:18 GMT
# © 2017 Unicode®, Inc.
# CaseFolding-11.0.0.txt
# Date: 2018-01-31, 08:20:09 GMT
# © 2018 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
@ -603,6 +603,52 @@
1C86; C; 044A; # CYRILLIC SMALL LETTER TALL HARD SIGN
1C87; C; 0463; # CYRILLIC SMALL LETTER TALL YAT
1C88; C; A64B; # CYRILLIC SMALL LETTER UNBLENDED UK
1C90; C; 10D0; # GEORGIAN MTAVRULI CAPITAL LETTER AN
1C91; C; 10D1; # GEORGIAN MTAVRULI CAPITAL LETTER BAN
1C92; C; 10D2; # GEORGIAN MTAVRULI CAPITAL LETTER GAN
1C93; C; 10D3; # GEORGIAN MTAVRULI CAPITAL LETTER DON
1C94; C; 10D4; # GEORGIAN MTAVRULI CAPITAL LETTER EN
1C95; C; 10D5; # GEORGIAN MTAVRULI CAPITAL LETTER VIN
1C96; C; 10D6; # GEORGIAN MTAVRULI CAPITAL LETTER ZEN
1C97; C; 10D7; # GEORGIAN MTAVRULI CAPITAL LETTER TAN
1C98; C; 10D8; # GEORGIAN MTAVRULI CAPITAL LETTER IN
1C99; C; 10D9; # GEORGIAN MTAVRULI CAPITAL LETTER KAN
1C9A; C; 10DA; # GEORGIAN MTAVRULI CAPITAL LETTER LAS
1C9B; C; 10DB; # GEORGIAN MTAVRULI CAPITAL LETTER MAN
1C9C; C; 10DC; # GEORGIAN MTAVRULI CAPITAL LETTER NAR
1C9D; C; 10DD; # GEORGIAN MTAVRULI CAPITAL LETTER ON
1C9E; C; 10DE; # GEORGIAN MTAVRULI CAPITAL LETTER PAR
1C9F; C; 10DF; # GEORGIAN MTAVRULI CAPITAL LETTER ZHAR
1CA0; C; 10E0; # GEORGIAN MTAVRULI CAPITAL LETTER RAE
1CA1; C; 10E1; # GEORGIAN MTAVRULI CAPITAL LETTER SAN
1CA2; C; 10E2; # GEORGIAN MTAVRULI CAPITAL LETTER TAR
1CA3; C; 10E3; # GEORGIAN MTAVRULI CAPITAL LETTER UN
1CA4; C; 10E4; # GEORGIAN MTAVRULI CAPITAL LETTER PHAR
1CA5; C; 10E5; # GEORGIAN MTAVRULI CAPITAL LETTER KHAR
1CA6; C; 10E6; # GEORGIAN MTAVRULI CAPITAL LETTER GHAN
1CA7; C; 10E7; # GEORGIAN MTAVRULI CAPITAL LETTER QAR
1CA8; C; 10E8; # GEORGIAN MTAVRULI CAPITAL LETTER SHIN
1CA9; C; 10E9; # GEORGIAN MTAVRULI CAPITAL LETTER CHIN
1CAA; C; 10EA; # GEORGIAN MTAVRULI CAPITAL LETTER CAN
1CAB; C; 10EB; # GEORGIAN MTAVRULI CAPITAL LETTER JIL
1CAC; C; 10EC; # GEORGIAN MTAVRULI CAPITAL LETTER CIL
1CAD; C; 10ED; # GEORGIAN MTAVRULI CAPITAL LETTER CHAR
1CAE; C; 10EE; # GEORGIAN MTAVRULI CAPITAL LETTER XAN
1CAF; C; 10EF; # GEORGIAN MTAVRULI CAPITAL LETTER JHAN
1CB0; C; 10F0; # GEORGIAN MTAVRULI CAPITAL LETTER HAE
1CB1; C; 10F1; # GEORGIAN MTAVRULI CAPITAL LETTER HE
1CB2; C; 10F2; # GEORGIAN MTAVRULI CAPITAL LETTER HIE
1CB3; C; 10F3; # GEORGIAN MTAVRULI CAPITAL LETTER WE
1CB4; C; 10F4; # GEORGIAN MTAVRULI CAPITAL LETTER HAR
1CB5; C; 10F5; # GEORGIAN MTAVRULI CAPITAL LETTER HOE
1CB6; C; 10F6; # GEORGIAN MTAVRULI CAPITAL LETTER FI
1CB7; C; 10F7; # GEORGIAN MTAVRULI CAPITAL LETTER YN
1CB8; C; 10F8; # GEORGIAN MTAVRULI CAPITAL LETTER ELIFI
1CB9; C; 10F9; # GEORGIAN MTAVRULI CAPITAL LETTER TURNED GAN
1CBA; C; 10FA; # GEORGIAN MTAVRULI CAPITAL LETTER AIN
1CBD; C; 10FD; # GEORGIAN MTAVRULI CAPITAL LETTER AEN
1CBE; C; 10FE; # GEORGIAN MTAVRULI CAPITAL LETTER HARD SIGN
1CBF; C; 10FF; # GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN
1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW
1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE
1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW
@ -1180,6 +1226,7 @@ A7B2; C; 029D; # LATIN CAPITAL LETTER J WITH CROSSED-TAIL
A7B3; C; AB53; # LATIN CAPITAL LETTER CHI
A7B4; C; A7B5; # LATIN CAPITAL LETTER BETA
A7B6; C; A7B7; # LATIN CAPITAL LETTER OMEGA
A7B8; C; A7B9; # LATIN CAPITAL LETTER U WITH STROKE
AB70; C; 13A0; # CHEROKEE SMALL LETTER A
AB71; C; 13A1; # CHEROKEE SMALL LETTER E
AB72; C; 13A2; # CHEROKEE SMALL LETTER I
@ -1457,6 +1504,38 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
118BD; C; 118DD; # WARANG CITI CAPITAL LETTER SSUU
118BE; C; 118DE; # WARANG CITI CAPITAL LETTER SII
118BF; C; 118DF; # WARANG CITI CAPITAL LETTER VIYO
16E40; C; 16E60; # MEDEFAIDRIN CAPITAL LETTER M
16E41; C; 16E61; # MEDEFAIDRIN CAPITAL LETTER S
16E42; C; 16E62; # MEDEFAIDRIN CAPITAL LETTER V
16E43; C; 16E63; # MEDEFAIDRIN CAPITAL LETTER W
16E44; C; 16E64; # MEDEFAIDRIN CAPITAL LETTER ATIU
16E45; C; 16E65; # MEDEFAIDRIN CAPITAL LETTER Z
16E46; C; 16E66; # MEDEFAIDRIN CAPITAL LETTER KP
16E47; C; 16E67; # MEDEFAIDRIN CAPITAL LETTER P
16E48; C; 16E68; # MEDEFAIDRIN CAPITAL LETTER T
16E49; C; 16E69; # MEDEFAIDRIN CAPITAL LETTER G
16E4A; C; 16E6A; # MEDEFAIDRIN CAPITAL LETTER F
16E4B; C; 16E6B; # MEDEFAIDRIN CAPITAL LETTER I
16E4C; C; 16E6C; # MEDEFAIDRIN CAPITAL LETTER K
16E4D; C; 16E6D; # MEDEFAIDRIN CAPITAL LETTER A
16E4E; C; 16E6E; # MEDEFAIDRIN CAPITAL LETTER J
16E4F; C; 16E6F; # MEDEFAIDRIN CAPITAL LETTER E
16E50; C; 16E70; # MEDEFAIDRIN CAPITAL LETTER B
16E51; C; 16E71; # MEDEFAIDRIN CAPITAL LETTER C
16E52; C; 16E72; # MEDEFAIDRIN CAPITAL LETTER U
16E53; C; 16E73; # MEDEFAIDRIN CAPITAL LETTER YU
16E54; C; 16E74; # MEDEFAIDRIN CAPITAL LETTER L
16E55; C; 16E75; # MEDEFAIDRIN CAPITAL LETTER Q
16E56; C; 16E76; # MEDEFAIDRIN CAPITAL LETTER HP
16E57; C; 16E77; # MEDEFAIDRIN CAPITAL LETTER NY
16E58; C; 16E78; # MEDEFAIDRIN CAPITAL LETTER X
16E59; C; 16E79; # MEDEFAIDRIN CAPITAL LETTER D
16E5A; C; 16E7A; # MEDEFAIDRIN CAPITAL LETTER OE
16E5B; C; 16E7B; # MEDEFAIDRIN CAPITAL LETTER N
16E5C; C; 16E7C; # MEDEFAIDRIN CAPITAL LETTER R
16E5D; C; 16E7D; # MEDEFAIDRIN CAPITAL LETTER O
16E5E; C; 16E7E; # MEDEFAIDRIN CAPITAL LETTER AI
16E5F; C; 16E7F; # MEDEFAIDRIN CAPITAL LETTER Y
1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF
1E901; C; 1E923; # ADLAM CAPITAL LETTER DAALI
1E902; C; 1E924; # ADLAM CAPITAL LETTER LAAM


@ -1,6 +1,6 @@
# DerivedGeneralCategory-10.0.0.txt
# Date: 2017-03-08, 08:41:49 GMT
# © 2017 Unicode®, Inc.
# DerivedGeneralCategory-11.0.0.txt
# Date: 2018-02-21, 05:34:04 GMT
# © 2018 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
@ -22,25 +22,23 @@
03A2 ; Cn # <reserved-03A2>
0530 ; Cn # <reserved-0530>
0557..0558 ; Cn # [2] <reserved-0557>..<reserved-0558>
0560 ; Cn # <reserved-0560>
0588 ; Cn # <reserved-0588>
058B..058C ; Cn # [2] <reserved-058B>..<reserved-058C>
0590 ; Cn # <reserved-0590>
05C8..05CF ; Cn # [8] <reserved-05C8>..<reserved-05CF>
05EB..05EF ; Cn # [5] <reserved-05EB>..<reserved-05EF>
05EB..05EE ; Cn # [4] <reserved-05EB>..<reserved-05EE>
05F5..05FF ; Cn # [11] <reserved-05F5>..<reserved-05FF>
061D ; Cn # <reserved-061D>
070E ; Cn # <reserved-070E>
074B..074C ; Cn # [2] <reserved-074B>..<reserved-074C>
07B2..07BF ; Cn # [14] <reserved-07B2>..<reserved-07BF>
07FB..07FF ; Cn # [5] <reserved-07FB>..<reserved-07FF>
07FB..07FC ; Cn # [2] <reserved-07FB>..<reserved-07FC>
082E..082F ; Cn # [2] <reserved-082E>..<reserved-082F>
083F ; Cn # <reserved-083F>
085C..085D ; Cn # [2] <reserved-085C>..<reserved-085D>
085F ; Cn # <reserved-085F>
086B..089F ; Cn # [53] <reserved-086B>..<reserved-089F>
08B5 ; Cn # <reserved-08B5>
08BE..08D3 ; Cn # [22] <reserved-08BE>..<reserved-08D3>
08BE..08D2 ; Cn # [21] <reserved-08BE>..<reserved-08D2>
0984 ; Cn # <reserved-0984>
098D..098E ; Cn # [2] <reserved-098D>..<reserved-098E>
0991..0992 ; Cn # [2] <reserved-0991>..<reserved-0992>
@ -54,7 +52,7 @@
09D8..09DB ; Cn # [4] <reserved-09D8>..<reserved-09DB>
09DE ; Cn # <reserved-09DE>
09E4..09E5 ; Cn # [2] <reserved-09E4>..<reserved-09E5>
09FE..0A00 ; Cn # [3] <reserved-09FE>..<reserved-0A00>
09FF..0A00 ; Cn # [2] <reserved-09FF>..<reserved-0A00>
0A04 ; Cn # <reserved-0A04>
0A0B..0A0E ; Cn # [4] <reserved-0A0B>..<reserved-0A0E>
0A11..0A12 ; Cn # [2] <reserved-0A11>..<reserved-0A12>
@ -70,7 +68,7 @@
0A52..0A58 ; Cn # [7] <reserved-0A52>..<reserved-0A58>
0A5D ; Cn # <reserved-0A5D>
0A5F..0A65 ; Cn # [7] <reserved-0A5F>..<reserved-0A65>
0A76..0A80 ; Cn # [11] <reserved-0A76>..<reserved-0A80>
0A77..0A80 ; Cn # [10] <reserved-0A77>..<reserved-0A80>
0A84 ; Cn # <reserved-0A84>
0A8E ; Cn # <reserved-0A8E>
0A92 ; Cn # <reserved-0A92>
@ -115,7 +113,6 @@
0BD1..0BD6 ; Cn # [6] <reserved-0BD1>..<reserved-0BD6>
0BD8..0BE5 ; Cn # [14] <reserved-0BD8>..<reserved-0BE5>
0BFB..0BFF ; Cn # [5] <reserved-0BFB>..<reserved-0BFF>
0C04 ; Cn # <reserved-0C04>
0C0D ; Cn # <reserved-0C0D>
0C11 ; Cn # <reserved-0C11>
0C29 ; Cn # <reserved-0C29>
@ -127,7 +124,6 @@
0C5B..0C5F ; Cn # [5] <reserved-0C5B>..<reserved-0C5F>
0C64..0C65 ; Cn # [2] <reserved-0C64>..<reserved-0C65>
0C70..0C77 ; Cn # [8] <reserved-0C70>..<reserved-0C77>
0C84 ; Cn # <reserved-0C84>
0C8D ; Cn # <reserved-0C8D>
0C91 ; Cn # <reserved-0C91>
0CA9 ; Cn # <reserved-0CA9>
@ -224,7 +220,7 @@
17FA..17FF ; Cn # [6] <reserved-17FA>..<reserved-17FF>
180F ; Cn # <reserved-180F>
181A..181F ; Cn # [6] <reserved-181A>..<reserved-181F>
1878..187F ; Cn # [8] <reserved-1878>..<reserved-187F>
1879..187F ; Cn # [7] <reserved-1879>..<reserved-187F>
18AB..18AF ; Cn # [5] <reserved-18AB>..<reserved-18AF>
18F6..18FF ; Cn # [10] <reserved-18F6>..<reserved-18FF>
191F ; Cn # <reserved-191F>
@ -248,7 +244,8 @@
1BF4..1BFB ; Cn # [8] <reserved-1BF4>..<reserved-1BFB>
1C38..1C3A ; Cn # [3] <reserved-1C38>..<reserved-1C3A>
1C4A..1C4C ; Cn # [3] <reserved-1C4A>..<reserved-1C4C>
1C89..1CBF ; Cn # [55] <reserved-1C89>..<reserved-1CBF>
1C89..1C8F ; Cn # [7] <reserved-1C89>..<reserved-1C8F>
1CBB..1CBC ; Cn # [2] <reserved-1CBB>..<reserved-1CBC>
1CC8..1CCF ; Cn # [8] <reserved-1CC8>..<reserved-1CCF>
1CFA..1CFF ; Cn # [6] <reserved-1CFA>..<reserved-1CFF>
1DFA ; Cn # <reserved-1DFA>
@ -279,10 +276,8 @@
244B..245F ; Cn # [21] <reserved-244B>..<reserved-245F>
2B74..2B75 ; Cn # [2] <reserved-2B74>..<reserved-2B75>
2B96..2B97 ; Cn # [2] <reserved-2B96>..<reserved-2B97>
2BBA..2BBC ; Cn # [3] <reserved-2BBA>..<reserved-2BBC>
2BC9 ; Cn # <reserved-2BC9>
2BD3..2BEB ; Cn # [25] <reserved-2BD3>..<reserved-2BEB>
2BF0..2BFF ; Cn # [16] <reserved-2BF0>..<reserved-2BFF>
2BFF ; Cn # <reserved-2BFF>
2C2F ; Cn # <reserved-2C2F>
2C5F ; Cn # <reserved-2C5F>
2CF4..2CF8 ; Cn # [5] <reserved-2CF4>..<reserved-2CF8>
@ -300,7 +295,7 @@
2DCF ; Cn # <reserved-2DCF>
2DD7 ; Cn # <reserved-2DD7>
2DDF ; Cn # <reserved-2DDF>
2E4A..2E7F ; Cn # [54] <reserved-2E4A>..<reserved-2E7F>
2E4F..2E7F ; Cn # [49] <reserved-2E4F>..<reserved-2E7F>
2E9A ; Cn # <reserved-2E9A>
2EF4..2EFF ; Cn # [12] <reserved-2EF4>..<reserved-2EFF>
2FD6..2FEF ; Cn # [26] <reserved-2FD6>..<reserved-2FEF>
@ -308,26 +303,24 @@
3040 ; Cn # <reserved-3040>
3097..3098 ; Cn # [2] <reserved-3097>..<reserved-3098>
3100..3104 ; Cn # [5] <reserved-3100>..<reserved-3104>
312F..3130 ; Cn # [2] <reserved-312F>..<reserved-3130>
3130 ; Cn # <reserved-3130>
318F ; Cn # <reserved-318F>
31BB..31BF ; Cn # [5] <reserved-31BB>..<reserved-31BF>
31E4..31EF ; Cn # [12] <reserved-31E4>..<reserved-31EF>
321F ; Cn # <reserved-321F>
32FF ; Cn # <reserved-32FF>
4DB6..4DBF ; Cn # [10] <reserved-4DB6>..<reserved-4DBF>
9FEB..9FFF ; Cn # [21] <reserved-9FEB>..<reserved-9FFF>
9FF0..9FFF ; Cn # [16] <reserved-9FF0>..<reserved-9FFF>
A48D..A48F ; Cn # [3] <reserved-A48D>..<reserved-A48F>
A4C7..A4CF ; Cn # [9] <reserved-A4C7>..<reserved-A4CF>
A62C..A63F ; Cn # [20] <reserved-A62C>..<reserved-A63F>
A6F8..A6FF ; Cn # [8] <reserved-A6F8>..<reserved-A6FF>
A7AF ; Cn # <reserved-A7AF>
A7B8..A7F6 ; Cn # [63] <reserved-A7B8>..<reserved-A7F6>
A7BA..A7F6 ; Cn # [61] <reserved-A7BA>..<reserved-A7F6>
A82C..A82F ; Cn # [4] <reserved-A82C>..<reserved-A82F>
A83A..A83F ; Cn # [6] <reserved-A83A>..<reserved-A83F>
A878..A87F ; Cn # [8] <reserved-A878>..<reserved-A87F>
A8C6..A8CD ; Cn # [8] <reserved-A8C6>..<reserved-A8CD>
A8DA..A8DF ; Cn # [6] <reserved-A8DA>..<reserved-A8DF>
A8FE..A8FF ; Cn # [2] <reserved-A8FE>..<reserved-A8FF>
A954..A95E ; Cn # [11] <reserved-A954>..<reserved-A95E>
A97D..A97F ; Cn # [3] <reserved-A97D>..<reserved-A97F>
A9CE ; Cn # <reserved-A9CE>
@ -429,9 +422,9 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
10A07..10A0B ; Cn # [5] <reserved-10A07>..<reserved-10A0B>
10A14 ; Cn # <reserved-10A14>
10A18 ; Cn # <reserved-10A18>
10A34..10A37 ; Cn # [4] <reserved-10A34>..<reserved-10A37>
10A36..10A37 ; Cn # [2] <reserved-10A36>..<reserved-10A37>
10A3B..10A3E ; Cn # [4] <reserved-10A3B>..<reserved-10A3E>
10A48..10A4F ; Cn # [8] <reserved-10A48>..<reserved-10A4F>
10A49..10A4F ; Cn # [7] <reserved-10A49>..<reserved-10A4F>
10A59..10A5F ; Cn # [7] <reserved-10A59>..<reserved-10A5F>
10AA0..10ABF ; Cn # [32] <reserved-10AA0>..<reserved-10ABF>
10AE7..10AEA ; Cn # [4] <reserved-10AE7>..<reserved-10AEA>
@ -445,15 +438,19 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
10C49..10C7F ; Cn # [55] <reserved-10C49>..<reserved-10C7F>
10CB3..10CBF ; Cn # [13] <reserved-10CB3>..<reserved-10CBF>
10CF3..10CF9 ; Cn # [7] <reserved-10CF3>..<reserved-10CF9>
10D00..10E5F ; Cn # [352] <reserved-10D00>..<reserved-10E5F>
10E7F..10FFF ; Cn # [385] <reserved-10E7F>..<reserved-10FFF>
10D28..10D2F ; Cn # [8] <reserved-10D28>..<reserved-10D2F>
10D3A..10E5F ; Cn # [294] <reserved-10D3A>..<reserved-10E5F>
10E7F..10EFF ; Cn # [129] <reserved-10E7F>..<reserved-10EFF>
10F28..10F2F ; Cn # [8] <reserved-10F28>..<reserved-10F2F>
10F5A..10FFF ; Cn # [166] <reserved-10F5A>..<reserved-10FFF>
1104E..11051 ; Cn # [4] <reserved-1104E>..<reserved-11051>
11070..1107E ; Cn # [15] <reserved-11070>..<reserved-1107E>
110C2..110CF ; Cn # [14] <reserved-110C2>..<reserved-110CF>
110C2..110CC ; Cn # [11] <reserved-110C2>..<reserved-110CC>
110CE..110CF ; Cn # [2] <reserved-110CE>..<reserved-110CF>
110E9..110EF ; Cn # [7] <reserved-110E9>..<reserved-110EF>
110FA..110FF ; Cn # [6] <reserved-110FA>..<reserved-110FF>
11135 ; Cn # <reserved-11135>
11144..1114F ; Cn # [12] <reserved-11144>..<reserved-1114F>
11147..1114F ; Cn # [9] <reserved-11147>..<reserved-1114F>
11177..1117F ; Cn # [9] <reserved-11177>..<reserved-1117F>
111CE..111CF ; Cn # [2] <reserved-111CE>..<reserved-111CF>
111E0 ; Cn # <reserved-111E0>
@ -473,7 +470,7 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
11329 ; Cn # <reserved-11329>
11331 ; Cn # <reserved-11331>
11334 ; Cn # <reserved-11334>
1133A..1133B ; Cn # [2] <reserved-1133A>..<reserved-1133B>
1133A ; Cn # <reserved-1133A>
11345..11346 ; Cn # [2] <reserved-11345>..<reserved-11346>
11349..1134A ; Cn # [2] <reserved-11349>..<reserved-1134A>
1134E..1134F ; Cn # [2] <reserved-1134E>..<reserved-1134F>
@ -484,7 +481,7 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
11375..113FF ; Cn # [139] <reserved-11375>..<reserved-113FF>
1145A ; Cn # <reserved-1145A>
1145C ; Cn # <reserved-1145C>
1145E..1147F ; Cn # [34] <reserved-1145E>..<reserved-1147F>
1145F..1147F ; Cn # [33] <reserved-1145F>..<reserved-1147F>
114C8..114CF ; Cn # [8] <reserved-114C8>..<reserved-114CF>
114DA..1157F ; Cn # [166] <reserved-114DA>..<reserved-1157F>
115B6..115B7 ; Cn # [2] <reserved-115B6>..<reserved-115B7>
@ -494,14 +491,14 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1166D..1167F ; Cn # [19] <reserved-1166D>..<reserved-1167F>
116B8..116BF ; Cn # [8] <reserved-116B8>..<reserved-116BF>
116CA..116FF ; Cn # [54] <reserved-116CA>..<reserved-116FF>
1171A..1171C ; Cn # [3] <reserved-1171A>..<reserved-1171C>
1171B..1171C ; Cn # [2] <reserved-1171B>..<reserved-1171C>
1172C..1172F ; Cn # [4] <reserved-1172C>..<reserved-1172F>
11740..1189F ; Cn # [352] <reserved-11740>..<reserved-1189F>
11740..117FF ; Cn # [192] <reserved-11740>..<reserved-117FF>
1183C..1189F ; Cn # [100] <reserved-1183C>..<reserved-1189F>
118F3..118FE ; Cn # [12] <reserved-118F3>..<reserved-118FE>
11900..119FF ; Cn # [256] <reserved-11900>..<reserved-119FF>
11A48..11A4F ; Cn # [8] <reserved-11A48>..<reserved-11A4F>
11A84..11A85 ; Cn # [2] <reserved-11A84>..<reserved-11A85>
11A9D ; Cn # <reserved-11A9D>
11AA3..11ABF ; Cn # [29] <reserved-11AA3>..<reserved-11ABF>
11AF9..11BFF ; Cn # [263] <reserved-11AF9>..<reserved-11BFF>
11C09 ; Cn # <reserved-11C09>
@ -517,7 +514,14 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
11D3B ; Cn # <reserved-11D3B>
11D3E ; Cn # <reserved-11D3E>
11D48..11D4F ; Cn # [8] <reserved-11D48>..<reserved-11D4F>
11D5A..11FFF ; Cn # [678] <reserved-11D5A>..<reserved-11FFF>
11D5A..11D5F ; Cn # [6] <reserved-11D5A>..<reserved-11D5F>
11D66 ; Cn # <reserved-11D66>
11D69 ; Cn # <reserved-11D69>
11D8F ; Cn # <reserved-11D8F>
11D92 ; Cn # <reserved-11D92>
11D99..11D9F ; Cn # [7] <reserved-11D99>..<reserved-11D9F>
11DAA..11EDF ; Cn # [310] <reserved-11DAA>..<reserved-11EDF>
11EF9..11FFF ; Cn # [263] <reserved-11EF9>..<reserved-11FFF>
1239A..123FF ; Cn # [102] <reserved-1239A>..<reserved-123FF>
1246F ; Cn # <reserved-1246F>
12475..1247F ; Cn # [11] <reserved-12475>..<reserved-1247F>
@ -534,12 +538,13 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
16B5A ; Cn # <reserved-16B5A>
16B62 ; Cn # <reserved-16B62>
16B78..16B7C ; Cn # [5] <reserved-16B78>..<reserved-16B7C>
16B90..16EFF ; Cn # [880] <reserved-16B90>..<reserved-16EFF>
16B90..16E3F ; Cn # [688] <reserved-16B90>..<reserved-16E3F>
16E9B..16EFF ; Cn # [101] <reserved-16E9B>..<reserved-16EFF>
16F45..16F4F ; Cn # [11] <reserved-16F45>..<reserved-16F4F>
16F7F..16F8E ; Cn # [16] <reserved-16F7F>..<reserved-16F8E>
16FA0..16FDF ; Cn # [64] <reserved-16FA0>..<reserved-16FDF>
16FE2..16FFF ; Cn # [30] <reserved-16FE2>..<reserved-16FFF>
187ED..187FF ; Cn # [19] <reserved-187ED>..<reserved-187FF>
187F2..187FF ; Cn # [14] <reserved-187F2>..<reserved-187FF>
18AF3..1AFFF ; Cn # [9485] <reserved-18AF3>..<reserved-1AFFF>
1B11F..1B16F ; Cn # [81] <reserved-1B11F>..<reserved-1B16F>
1B2FC..1BBFF ; Cn # [2308] <reserved-1B2FC>..<reserved-1BBFF>
@ -551,9 +556,10 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1D0F6..1D0FF ; Cn # [10] <reserved-1D0F6>..<reserved-1D0FF>
1D127..1D128 ; Cn # [2] <reserved-1D127>..<reserved-1D128>
1D1E9..1D1FF ; Cn # [23] <reserved-1D1E9>..<reserved-1D1FF>
1D246..1D2FF ; Cn # [186] <reserved-1D246>..<reserved-1D2FF>
1D246..1D2DF ; Cn # [154] <reserved-1D246>..<reserved-1D2DF>
1D2F4..1D2FF ; Cn # [12] <reserved-1D2F4>..<reserved-1D2FF>
1D357..1D35F ; Cn # [9] <reserved-1D357>..<reserved-1D35F>
1D372..1D3FF ; Cn # [142] <reserved-1D372>..<reserved-1D3FF>
1D379..1D3FF ; Cn # [135] <reserved-1D379>..<reserved-1D3FF>
1D455 ; Cn # <reserved-1D455>
1D49D ; Cn # <reserved-1D49D>
1D4A0..1D4A1 ; Cn # [2] <reserved-1D4A0>..<reserved-1D4A1>
@ -586,7 +592,8 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1E8D7..1E8FF ; Cn # [41] <reserved-1E8D7>..<reserved-1E8FF>
1E94B..1E94F ; Cn # [5] <reserved-1E94B>..<reserved-1E94F>
1E95A..1E95D ; Cn # [4] <reserved-1E95A>..<reserved-1E95D>
1E960..1EDFF ; Cn # [1184] <reserved-1E960>..<reserved-1EDFF>
1E960..1EC70 ; Cn # [785] <reserved-1E960>..<reserved-1EC70>
1ECB5..1EDFF ; Cn # [331] <reserved-1ECB5>..<reserved-1EDFF>
1EE04 ; Cn # <reserved-1EE04>
1EE20 ; Cn # <reserved-1EE20>
1EE23 ; Cn # <reserved-1EE23>
@ -628,7 +635,6 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1F0D0 ; Cn # <reserved-1F0D0>
1F0F6..1F0FF ; Cn # [10] <reserved-1F0F6>..<reserved-1F0FF>
1F10D..1F10F ; Cn # [3] <reserved-1F10D>..<reserved-1F10F>
1F12F ; Cn # <reserved-1F12F>
1F16C..1F16F ; Cn # [4] <reserved-1F16C>..<reserved-1F16F>
1F1AD..1F1E5 ; Cn # [57] <reserved-1F1AD>..<reserved-1F1E5>
1F203..1F20F ; Cn # [13] <reserved-1F203>..<reserved-1F20F>
@ -638,9 +644,9 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1F266..1F2FF ; Cn # [154] <reserved-1F266>..<reserved-1F2FF>
1F6D5..1F6DF ; Cn # [11] <reserved-1F6D5>..<reserved-1F6DF>
1F6ED..1F6EF ; Cn # [3] <reserved-1F6ED>..<reserved-1F6EF>
1F6F9..1F6FF ; Cn # [7] <reserved-1F6F9>..<reserved-1F6FF>
1F6FA..1F6FF ; Cn # [6] <reserved-1F6FA>..<reserved-1F6FF>
1F774..1F77F ; Cn # [12] <reserved-1F774>..<reserved-1F77F>
1F7D5..1F7FF ; Cn # [43] <reserved-1F7D5>..<reserved-1F7FF>
1F7D9..1F7FF ; Cn # [39] <reserved-1F7D9>..<reserved-1F7FF>
1F80C..1F80F ; Cn # [4] <reserved-1F80C>..<reserved-1F80F>
1F848..1F84F ; Cn # [8] <reserved-1F848>..<reserved-1F84F>
1F85A..1F85F ; Cn # [6] <reserved-1F85A>..<reserved-1F85F>
@ -648,11 +654,14 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1F8AE..1F8FF ; Cn # [82] <reserved-1F8AE>..<reserved-1F8FF>
1F90C..1F90F ; Cn # [4] <reserved-1F90C>..<reserved-1F90F>
1F93F ; Cn # <reserved-1F93F>
1F94D..1F94F ; Cn # [3] <reserved-1F94D>..<reserved-1F94F>
1F96C..1F97F ; Cn # [20] <reserved-1F96C>..<reserved-1F97F>
1F998..1F9BF ; Cn # [40] <reserved-1F998>..<reserved-1F9BF>
1F9C1..1F9CF ; Cn # [15] <reserved-1F9C1>..<reserved-1F9CF>
1F9E7..1FFFF ; Cn # [1561] <reserved-1F9E7>..<noncharacter-1FFFF>
1F971..1F972 ; Cn # [2] <reserved-1F971>..<reserved-1F972>
1F977..1F979 ; Cn # [3] <reserved-1F977>..<reserved-1F979>
1F97B ; Cn # <reserved-1F97B>
1F9A3..1F9AF ; Cn # [13] <reserved-1F9A3>..<reserved-1F9AF>
1F9BA..1F9BF ; Cn # [6] <reserved-1F9BA>..<reserved-1F9BF>
1F9C3..1F9CF ; Cn # [13] <reserved-1F9C3>..<reserved-1F9CF>
1FA00..1FA5F ; Cn # [96] <reserved-1FA00>..<reserved-1FA5F>
1FA6E..1FFFF ; Cn # [1426] <reserved-1FA6E>..<noncharacter-1FFFF>
2A6D7..2A6FF ; Cn # [41] <reserved-2A6D7>..<reserved-2A6FF>
2B735..2B73F ; Cn # [11] <reserved-2B735>..<reserved-2B73F>
2B81E..2B81F ; Cn # [2] <reserved-2B81E>..<reserved-2B81F>
@ -665,7 +674,7 @@ E01F0..EFFFF ; Cn # [65040] <reserved-E01F0>..<noncharacter-EFFFF>
FFFFE..FFFFF ; Cn # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
10FFFE..10FFFF; Cn # [2] <noncharacter-10FFFE>..<noncharacter-10FFFF>
# Total code points: 837841
# Total code points: 837157
# ================================================
@ -947,6 +956,8 @@ FFFFE..FFFFF ; Cn # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
10C7 ; Lu # GEORGIAN CAPITAL LETTER YN
10CD ; Lu # GEORGIAN CAPITAL LETTER AEN
13A0..13F5 ; Lu # [86] CHEROKEE LETTER A..CHEROKEE LETTER MV
1C90..1CBA ; Lu # [43] GEORGIAN MTAVRULI CAPITAL LETTER AN..GEORGIAN MTAVRULI CAPITAL LETTER AIN
1CBD..1CBF ; Lu # [3] GEORGIAN MTAVRULI CAPITAL LETTER AEN..GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN
1E00 ; Lu # LATIN CAPITAL LETTER A WITH RING BELOW
1E02 ; Lu # LATIN CAPITAL LETTER B WITH DOT ABOVE
1E04 ; Lu # LATIN CAPITAL LETTER B WITH DOT BELOW
@ -1261,11 +1272,13 @@ A7A8 ; Lu # LATIN CAPITAL LETTER S WITH OBLIQUE STROKE
A7AA..A7AE ; Lu # [5] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER SMALL CAPITAL I
A7B0..A7B4 ; Lu # [5] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER BETA
A7B6 ; Lu # LATIN CAPITAL LETTER OMEGA
A7B8 ; Lu # LATIN CAPITAL LETTER U WITH STROKE
FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
10400..10427 ; Lu # [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW
104B0..104D3 ; Lu # [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
10C80..10CB2 ; Lu # [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US
118A0..118BF ; Lu # [32] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI CAPITAL LETTER VIYO
16E40..16E5F ; Lu # [32] MEDEFAIDRIN CAPITAL LETTER M..MEDEFAIDRIN CAPITAL LETTER Y
1D400..1D419 ; Lu # [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z
1D434..1D44D ; Lu # [26] MATHEMATICAL ITALIC CAPITAL A..MATHEMATICAL ITALIC CAPITAL Z
1D468..1D481 ; Lu # [26] MATHEMATICAL BOLD ITALIC CAPITAL A..MATHEMATICAL BOLD ITALIC CAPITAL Z
@ -1299,7 +1312,7 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
1D7CA ; Lu # MATHEMATICAL BOLD CAPITAL DIGAMMA
1E900..1E921 ; Lu # [34] ADLAM CAPITAL LETTER ALIF..ADLAM CAPITAL LETTER SHA
# Total code points: 1702
# Total code points: 1781
# ================================================
@ -1574,7 +1587,9 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
052B ; Ll # CYRILLIC SMALL LETTER DZZHE
052D ; Ll # CYRILLIC SMALL LETTER DCHE
052F ; Ll # CYRILLIC SMALL LETTER EL WITH DESCENDER
0561..0587 ; Ll # [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
0560..0588 ; Ll # [41] ARMENIAN SMALL LETTER TURNED AYB..ARMENIAN SMALL LETTER YI WITH STROKE
10D0..10FA ; Ll # [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN
10FD..10FF ; Ll # [3] GEORGIAN LETTER AEN..GEORGIAN LETTER LABIAL SIGN
13F8..13FD ; Ll # [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV
1C80..1C88 ; Ll # [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
1D00..1D2B ; Ll # [44] LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTER SMALL CAPITAL EL
@ -1896,8 +1911,10 @@ A7A3 ; Ll # LATIN SMALL LETTER K WITH OBLIQUE STROKE
A7A5 ; Ll # LATIN SMALL LETTER N WITH OBLIQUE STROKE
A7A7 ; Ll # LATIN SMALL LETTER R WITH OBLIQUE STROKE
A7A9 ; Ll # LATIN SMALL LETTER S WITH OBLIQUE STROKE
A7AF ; Ll # LATIN LETTER SMALL CAPITAL Q
A7B5 ; Ll # LATIN SMALL LETTER BETA
A7B7 ; Ll # LATIN SMALL LETTER OMEGA
A7B9 ; Ll # LATIN SMALL LETTER U WITH STROKE
A7FA ; Ll # LATIN LETTER SMALL CAPITAL TURNED M
AB30..AB5A ; Ll # [43] LATIN SMALL LETTER BARRED ALPHA..LATIN SMALL LETTER Y WITH SHORT RIGHT LEG
AB60..AB65 ; Ll # [6] LATIN SMALL LETTER SAKHA YAT..GREEK LETTER SMALL CAPITAL OMEGA
@ -1909,6 +1926,7 @@ FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL
104D8..104FB ; Ll # [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
10CC0..10CF2 ; Ll # [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US
118C0..118DF ; Ll # [32] WARANG CITI SMALL LETTER NGAA..WARANG CITI SMALL LETTER VIYO
16E60..16E7F ; Ll # [32] MEDEFAIDRIN SMALL LETTER M..MEDEFAIDRIN SMALL LETTER Y
1D41A..1D433 ; Ll # [26] MATHEMATICAL BOLD SMALL A..MATHEMATICAL BOLD SMALL Z
1D44E..1D454 ; Ll # [7] MATHEMATICAL ITALIC SMALL A..MATHEMATICAL ITALIC SMALL G
1D456..1D467 ; Ll # [18] MATHEMATICAL ITALIC SMALL I..MATHEMATICAL ITALIC SMALL Z
@ -1939,7 +1957,7 @@ FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL
1D7CB ; Ll # MATHEMATICAL BOLD SMALL DIGAMMA
1E922..1E943 ; Ll # [34] ADLAM SMALL LETTER ALIF..ADLAM SMALL LETTER SHA
# Total code points: 2063
# Total code points: 2145
# ================================================
@ -2032,7 +2050,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
01C0..01C3 ; Lo # [4] LATIN LETTER DENTAL CLICK..LATIN LETTER RETROFLEX CLICK
0294 ; Lo # LATIN LETTER GLOTTAL STOP
05D0..05EA ; Lo # [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
05F0..05F2 ; Lo # [3] HEBREW LIGATURE YIDDISH DOUBLE VAV..HEBREW LIGATURE YIDDISH DOUBLE YOD
05EF..05F2 ; Lo # [4] HEBREW YOD TRIANGLE..HEBREW LIGATURE YIDDISH DOUBLE YOD
0620..063F ; Lo # [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE
0641..064A ; Lo # [10] ARABIC LETTER FEH..ARABIC LETTER YEH
066E..066F ; Lo # [2] ARABIC LETTER DOTLESS BEH..ARABIC LETTER DOTLESS QAF
@ -2171,8 +2189,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
106E..1070 ; Lo # [3] MYANMAR LETTER EASTERN PWO KAREN NNA..MYANMAR LETTER EASTERN PWO KAREN GHWA
1075..1081 ; Lo # [13] MYANMAR LETTER SHAN KA..MYANMAR LETTER SHAN HA
108E ; Lo # MYANMAR LETTER RUMAI PALAUNG FA
10D0..10FA ; Lo # [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN
10FD..1248 ; Lo # [332] GEORGIAN LETTER AEN..ETHIOPIC SYLLABLE QWA
1100..1248 ; Lo # [329] HANGUL CHOSEONG KIYEOK..ETHIOPIC SYLLABLE QWA
124A..124D ; Lo # [4] ETHIOPIC SYLLABLE QWI..ETHIOPIC SYLLABLE QWE
1250..1256 ; Lo # [7] ETHIOPIC SYLLABLE QHA..ETHIOPIC SYLLABLE QHO
1258 ; Lo # ETHIOPIC SYLLABLE QHWA
@ -2203,7 +2220,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
1780..17B3 ; Lo # [52] KHMER LETTER KA..KHMER INDEPENDENT VOWEL QAU
17DC ; Lo # KHMER SIGN AVAKRAHASANYA
1820..1842 ; Lo # [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
1844..1877 ; Lo # [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
1844..1878 ; Lo # [53] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER CHA WITH TWO DOTS
1880..1884 ; Lo # [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
1887..18A8 ; Lo # [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
18AA ; Lo # MONGOLIAN LETTER MANCHU ALI GALI LHA
@ -2243,12 +2260,12 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
309F ; Lo # HIRAGANA DIGRAPH YORI
30A1..30FA ; Lo # [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
30FF ; Lo # KATAKANA DIGRAPH KOTO
3105..312E ; Lo # [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
3105..312F ; Lo # [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
3131..318E ; Lo # [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
31A0..31BA ; Lo # [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
31F0..31FF ; Lo # [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
3400..4DB5 ; Lo # [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4E00..9FEA ; Lo # [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
4E00..9FEF ; Lo # [20976] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEF
A000..A014 ; Lo # [21] YI SYLLABLE IT..YI SYLLABLE E
A016..A48C ; Lo # [1143] YI SYLLABLE BIT..YI SYLLABLE YYR
A4D0..A4F7 ; Lo # [40] LISU LETTER BA..LISU LETTER OE
@ -2267,7 +2284,7 @@ A840..A873 ; Lo # [52] PHAGS-PA LETTER KA..PHAGS-PA LETTER CANDRABINDU
A882..A8B3 ; Lo # [50] SAURASHTRA LETTER A..SAURASHTRA LETTER LLA
A8F2..A8F7 ; Lo # [6] DEVANAGARI SIGN SPACING CANDRABINDU..DEVANAGARI SIGN CANDRABINDU AVAGRAHA
A8FB ; Lo # DEVANAGARI HEADSTROKE
A8FD ; Lo # DEVANAGARI JAIN OM
A8FD..A8FE ; Lo # [2] DEVANAGARI JAIN OM..DEVANAGARI LETTER AY
A90A..A925 ; Lo # [28] KAYAH LI LETTER KA..KAYAH LI LETTER OO
A930..A946 ; Lo # [23] REJANG LETTER KA..REJANG LETTER A
A960..A97C ; Lo # [29] HANGUL CHOSEONG TIKEUT-MIEUM..HANGUL CHOSEONG SSANGYEORINHIEUH
@ -2361,7 +2378,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
10A00 ; Lo # KHAROSHTHI LETTER A
10A10..10A13 ; Lo # [4] KHAROSHTHI LETTER KA..KHAROSHTHI LETTER GHA
10A15..10A17 ; Lo # [3] KHAROSHTHI LETTER CA..KHAROSHTHI LETTER JA
10A19..10A33 ; Lo # [27] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER TTTHA
10A19..10A35 ; Lo # [29] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER VHA
10A60..10A7C ; Lo # [29] OLD SOUTH ARABIAN LETTER HE..OLD SOUTH ARABIAN LETTER THETH
10A80..10A9C ; Lo # [29] OLD NORTH ARABIAN LETTER HEH..OLD NORTH ARABIAN LETTER ZAH
10AC0..10AC7 ; Lo # [8] MANICHAEAN LETTER ALEPH..MANICHAEAN LETTER WAW
@ -2371,10 +2388,15 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
10B60..10B72 ; Lo # [19] INSCRIPTIONAL PAHLAVI LETTER ALEPH..INSCRIPTIONAL PAHLAVI LETTER TAW
10B80..10B91 ; Lo # [18] PSALTER PAHLAVI LETTER ALEPH..PSALTER PAHLAVI LETTER TAW
10C00..10C48 ; Lo # [73] OLD TURKIC LETTER ORKHON A..OLD TURKIC LETTER ORKHON BASH
10D00..10D23 ; Lo # [36] HANIFI ROHINGYA LETTER A..HANIFI ROHINGYA MARK NA KHONNA
10F00..10F1C ; Lo # [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F27 ; Lo # OLD SOGDIAN LIGATURE AYIN-DALETH
10F30..10F45 ; Lo # [22] SOGDIAN LETTER ALEPH..SOGDIAN INDEPENDENT SHIN
11003..11037 ; Lo # [53] BRAHMI SIGN JIHVAMULIYA..BRAHMI LETTER OLD TAMIL NNNA
11083..110AF ; Lo # [45] KAITHI LETTER A..KAITHI LETTER HA
110D0..110E8 ; Lo # [25] SORA SOMPENG LETTER SAH..SORA SOMPENG LETTER MAE
11103..11126 ; Lo # [36] CHAKMA LETTER AA..CHAKMA LETTER HAA
11144 ; Lo # CHAKMA LETTER LHAA
11150..11172 ; Lo # [35] MAHAJANI LETTER A..MAHAJANI LETTER RRA
11176 ; Lo # MAHAJANI LIGATURE SHRI
11183..111B2 ; Lo # [48] SHARADA LETTER A..SHARADA LETTER HA
@ -2408,7 +2430,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
11600..1162F ; Lo # [48] MODI LETTER A..MODI LETTER LLA
11644 ; Lo # MODI SIGN HUVA
11680..116AA ; Lo # [43] TAKRI LETTER A..TAKRI LETTER RRA
11700..11719 ; Lo # [26] AHOM LETTER KA..AHOM LETTER JHA
11700..1171A ; Lo # [27] AHOM LETTER KA..AHOM LETTER ALTERNATE BA
11800..1182B ; Lo # [44] DOGRA LETTER A..DOGRA LETTER RRA
118FF ; Lo # WARANG CITI OM
11A00 ; Lo # ZANABAZAR SQUARE LETTER A
11A0B..11A32 ; Lo # [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
@ -2416,6 +2439,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
11A50 ; Lo # SOYOMBO LETTER A
11A5C..11A83 ; Lo # [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
11A86..11A89 ; Lo # [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
11A9D ; Lo # SOYOMBO MARK PLUTA
11AC0..11AF8 ; Lo # [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL
11C00..11C08 ; Lo # [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
11C0A..11C2E ; Lo # [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
@ -2425,6 +2449,11 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
11D08..11D09 ; Lo # [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
11D0B..11D30 ; Lo # [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
11D46 ; Lo # MASARAM GONDI REPHA
11D60..11D65 ; Lo # [6] GUNJALA GONDI LETTER A..GUNJALA GONDI LETTER UU
11D67..11D68 ; Lo # [2] GUNJALA GONDI LETTER EE..GUNJALA GONDI LETTER AI
11D6A..11D89 ; Lo # [32] GUNJALA GONDI LETTER OO..GUNJALA GONDI LETTER SA
11D98 ; Lo # GUNJALA GONDI OM
11EE0..11EF2 ; Lo # [19] MAKASAR LETTER KA..MAKASAR ANGKA
12000..12399 ; Lo # [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U
12480..12543 ; Lo # [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU
13000..1342E ; Lo # [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
@ -2437,7 +2466,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
16B7D..16B8F ; Lo # [19] PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG CLAN SIGN VWJ
16F00..16F44 ; Lo # [69] MIAO LETTER PA..MIAO LETTER HHA
16F50 ; Lo # MIAO LETTER NASALIZATION
17000..187EC ; Lo # [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
17000..187F1 ; Lo # [6130] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187F1
18800..18AF2 ; Lo # [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
1B000..1B11E ; Lo # [287] KATAKANA LETTER ARCHAIC E..HENTAIGANA LETTER N-MU-MO-2
1B170..1B2FB ; Lo # [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
@ -2486,7 +2515,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
2CEB0..2EBE0 ; Lo # [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
2F800..2FA1D ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
# Total code points: 121047
# Total code points: 121212
# ================================================
@ -2510,12 +2539,13 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
0730..074A ; Mn # [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
07A6..07B0 ; Mn # [11] THAANA ABAFILI..THAANA SUKUN
07EB..07F3 ; Mn # [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE
07FD ; Mn # NKO DANTAYALAN
0816..0819 ; Mn # [4] SAMARITAN MARK IN..SAMARITAN MARK DAGESH
081B..0823 ; Mn # [9] SAMARITAN MARK EPENTHETIC YUT..SAMARITAN VOWEL SIGN A
0825..0827 ; Mn # [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
0829..082D ; Mn # [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
0859..085B ; Mn # [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
08D4..08E1 ; Mn # [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08D3..08E1 ; Mn # [15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA
08E3..0902 ; Mn # [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
093A ; Mn # DEVANAGARI VOWEL SIGN OE
093C ; Mn # DEVANAGARI SIGN NUKTA
@ -2528,6 +2558,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
09C1..09C4 ; Mn # [4] BENGALI VOWEL SIGN U..BENGALI VOWEL SIGN VOCALIC RR
09CD ; Mn # BENGALI SIGN VIRAMA
09E2..09E3 ; Mn # [2] BENGALI VOWEL SIGN VOCALIC L..BENGALI VOWEL SIGN VOCALIC LL
09FE ; Mn # BENGALI SANDHI MARK
0A01..0A02 ; Mn # [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI
0A3C ; Mn # GURMUKHI SIGN NUKTA
0A41..0A42 ; Mn # [2] GURMUKHI VOWEL SIGN U..GURMUKHI VOWEL SIGN UU
@ -2554,6 +2585,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
0BC0 ; Mn # TAMIL VOWEL SIGN II
0BCD ; Mn # TAMIL SIGN VIRAMA
0C00 ; Mn # TELUGU SIGN COMBINING CANDRABINDU ABOVE
0C04 ; Mn # TELUGU SIGN COMBINING ANUSVARA ABOVE
0C3E..0C40 ; Mn # [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II
0C46..0C48 ; Mn # [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
0C4A..0C4D ; Mn # [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
@ -2670,6 +2702,7 @@ A80B ; Mn # SYLOTI NAGRI SIGN ANUSVARA
A825..A826 ; Mn # [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
A8C4..A8C5 ; Mn # [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
A8E0..A8F1 ; Mn # [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
A8FF ; Mn # DEVANAGARI VOWEL SIGN AY
A926..A92D ; Mn # [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
A947..A951 ; Mn # [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
A980..A982 ; Mn # [3] JAVANESE SIGN PANYANGGA..JAVANESE SIGN LAYAR
@ -2705,6 +2738,8 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
10A38..10A3A ; Mn # [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
10A3F ; Mn # KHAROSHTHI VIRAMA
10AE5..10AE6 ; Mn # [2] MANICHAEAN ABBREVIATION MARK ABOVE..MANICHAEAN ABBREVIATION MARK BELOW
10D24..10D27 ; Mn # [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
10F46..10F50 ; Mn # [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
11001 ; Mn # BRAHMI SIGN ANUSVARA
11038..11046 ; Mn # [15] BRAHMI VOWEL SIGN AA..BRAHMI VIRAMA
1107F..11081 ; Mn # [3] BRAHMI NUMBER JOINER..KAITHI SIGN ANUSVARA
@ -2716,7 +2751,7 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
11173 ; Mn # MAHAJANI SIGN NUKTA
11180..11181 ; Mn # [2] SHARADA SIGN CANDRABINDU..SHARADA SIGN ANUSVARA
111B6..111BE ; Mn # [9] SHARADA VOWEL SIGN U..SHARADA VOWEL SIGN O
111CA..111CC ; Mn # [3] SHARADA SIGN NUKTA..SHARADA EXTRA SHORT VOWEL MARK
111C9..111CC ; Mn # [4] SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOWEL MARK
1122F..11231 ; Mn # [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
11234 ; Mn # KHOJKI SIGN ANUSVARA
11236..11237 ; Mn # [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
@ -2724,13 +2759,14 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
112DF ; Mn # KHUDAWADI SIGN ANUSVARA
112E3..112EA ; Mn # [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
11300..11301 ; Mn # [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
1133C ; Mn # GRANTHA SIGN NUKTA
1133B..1133C ; Mn # [2] COMBINING BINDU BELOW..GRANTHA SIGN NUKTA
11340 ; Mn # GRANTHA VOWEL SIGN II
11366..1136C ; Mn # [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
11370..11374 ; Mn # [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
11438..1143F ; Mn # [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
11442..11444 ; Mn # [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
11446 ; Mn # NEWA SIGN NUKTA
1145E ; Mn # NEWA SANDHI MARK
114B3..114B8 ; Mn # [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
114BA ; Mn # TIRHUTA VOWEL SIGN SHORT E
114BF..114C0 ; Mn # [2] TIRHUTA SIGN CANDRABINDU..TIRHUTA SIGN ANUSVARA
@ -2749,8 +2785,9 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
1171D..1171F ; Mn # [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
11722..11725 ; Mn # [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
11727..1172B ; Mn # [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
11A01..11A06 ; Mn # [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A09..11A0A ; Mn # [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
1182F..11837 ; Mn # [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA
11839..1183A ; Mn # [2] DOGRA SIGN VIRAMA..DOGRA SIGN NUKTA
11A01..11A0A ; Mn # [10] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A33..11A38 ; Mn # [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A3B..11A3E ; Mn # [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
11A47 ; Mn # ZANABAZAR SQUARE SUBJOINER
@ -2770,6 +2807,10 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
11D3C..11D3D ; Mn # [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
11D3F..11D45 ; Mn # [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
11D47 ; Mn # MASARAM GONDI RA-KARA
11D90..11D91 ; Mn # [2] GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI VOWEL SIGN AI
11D95 ; Mn # GUNJALA GONDI SIGN ANUSVARA
11D97 ; Mn # GUNJALA GONDI VIRAMA
11EF3..11EF4 ; Mn # [2] MAKASAR VOWEL SIGN I..MAKASAR VOWEL SIGN U
16AF0..16AF4 ; Mn # [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
16B30..16B36 ; Mn # [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
16F8F..16F92 ; Mn # [4] MIAO TONE RIGHT..MIAO TONE BELOW
@ -2794,7 +2835,7 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
1E944..1E94A ; Mn # [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
E0100..E01EF ; Mn # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 1763
# Total code points: 1805
# ================================================
@ -2928,6 +2969,7 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
110B0..110B2 ; Mc # [3] KAITHI VOWEL SIGN AA..KAITHI VOWEL SIGN II
110B7..110B8 ; Mc # [2] KAITHI VOWEL SIGN O..KAITHI VOWEL SIGN AU
1112C ; Mc # CHAKMA VOWEL SIGN E
11145..11146 ; Mc # [2] CHAKMA VOWEL SIGN AA..CHAKMA VOWEL SIGN EI
11182 ; Mc # SHARADA SIGN VISARGA
111B3..111B5 ; Mc # [3] SHARADA VOWEL SIGN AA..SHARADA VOWEL SIGN II
111BF..111C0 ; Mc # [2] SHARADA VOWEL SIGN AU..SHARADA SIGN VIRAMA
@ -2960,7 +3002,8 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
116B6 ; Mc # TAKRI SIGN VIRAMA
11720..11721 ; Mc # [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
11726 ; Mc # AHOM VOWEL SIGN E
11A07..11A08 ; Mc # [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
1182C..1182E ; Mc # [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II
11838 ; Mc # DOGRA SIGN VISARGA
11A39 ; Mc # ZANABAZAR SQUARE SIGN VISARGA
11A57..11A58 ; Mc # [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
11A97 ; Mc # SOYOMBO SIGN VISARGA
@ -2969,11 +3012,15 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
11CA9 ; Mc # MARCHEN SUBJOINED LETTER YA
11CB1 ; Mc # MARCHEN VOWEL SIGN I
11CB4 ; Mc # MARCHEN VOWEL SIGN O
11D8A..11D8E ; Mc # [5] GUNJALA GONDI VOWEL SIGN AA..GUNJALA GONDI VOWEL SIGN UU
11D93..11D94 ; Mc # [2] GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI VOWEL SIGN AU
11D96 ; Mc # GUNJALA GONDI SIGN VISARGA
11EF5..11EF6 ; Mc # [2] MAKASAR VOWEL SIGN E..MAKASAR VOWEL SIGN O
16F51..16F7E ; Mc # [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
1D165..1D166 ; Mc # [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D..1D172 ; Mc # [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5
# Total code points: 401
# Total code points: 415
# ================================================
@ -3017,6 +3064,7 @@ AA50..AA59 ; Nd # [10] CHAM DIGIT ZERO..CHAM DIGIT NINE
ABF0..ABF9 ; Nd # [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DIGIT NINE
FF10..FF19 ; Nd # [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
104A0..104A9 ; Nd # [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE
10D30..10D39 ; Nd # [10] HANIFI ROHINGYA DIGIT ZERO..HANIFI ROHINGYA DIGIT NINE
11066..1106F ; Nd # [10] BRAHMI DIGIT ZERO..BRAHMI DIGIT NINE
110F0..110F9 ; Nd # [10] SORA SOMPENG DIGIT ZERO..SORA SOMPENG DIGIT NINE
11136..1113F ; Nd # [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE
@ -3030,12 +3078,13 @@ FF10..FF19 ; Nd # [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
118E0..118E9 ; Nd # [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE
11C50..11C59 ; Nd # [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
11D50..11D59 ; Nd # [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
11DA0..11DA9 ; Nd # [10] GUNJALA GONDI DIGIT ZERO..GUNJALA GONDI DIGIT NINE
16A60..16A69 ; Nd # [10] MRO DIGIT ZERO..MRO DIGIT NINE
16B50..16B59 ; Nd # [10] PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT NINE
1D7CE..1D7FF ; Nd # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
1E950..1E959 ; Nd # [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE
# Total code points: 590
# Total code points: 610
# ================================================
@ -3102,7 +3151,7 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
109BC..109BD ; No # [2] MEROITIC CURSIVE FRACTION ELEVEN TWELFTHS..MEROITIC CURSIVE FRACTION ONE HALF
109C0..109CF ; No # [16] MEROITIC CURSIVE NUMBER ONE..MEROITIC CURSIVE NUMBER SEVENTY
109D2..109FF ; No # [46] MEROITIC CURSIVE NUMBER ONE HUNDRED..MEROITIC CURSIVE FRACTION TEN TWELFTHS
10A40..10A47 ; No # [8] KHAROSHTHI DIGIT ONE..KHAROSHTHI NUMBER ONE THOUSAND
10A40..10A48 ; No # [9] KHAROSHTHI DIGIT ONE..KHAROSHTHI FRACTION ONE HALF
10A7D..10A7E ; No # [2] OLD SOUTH ARABIAN NUMBER ONE..OLD SOUTH ARABIAN NUMBER FIFTY
10A9D..10A9F ; No # [3] OLD NORTH ARABIAN NUMBER ONE..OLD NORTH ARABIAN NUMBER TWENTY
10AEB..10AEF ; No # [5] MANICHAEAN NUMBER ONE..MANICHAEAN NUMBER ONE HUNDRED
@ -3111,17 +3160,24 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
10BA9..10BAF ; No # [7] PSALTER PAHLAVI NUMBER ONE..PSALTER PAHLAVI NUMBER ONE HUNDRED
10CFA..10CFF ; No # [6] OLD HUNGARIAN NUMBER ONE..OLD HUNGARIAN NUMBER ONE THOUSAND
10E60..10E7E ; No # [31] RUMI DIGIT ONE..RUMI FRACTION TWO THIRDS
10F1D..10F26 ; No # [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF
10F51..10F54 ; No # [4] SOGDIAN NUMBER ONE..SOGDIAN NUMBER ONE HUNDRED
11052..11065 ; No # [20] BRAHMI NUMBER ONE..BRAHMI NUMBER ONE THOUSAND
111E1..111F4 ; No # [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND
1173A..1173B ; No # [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY
118EA..118F2 ; No # [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY
11C5A..11C6C ; No # [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
16B5B..16B61 ; No # [7] PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBER TRILLIONS
1D360..1D371 ; No # [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE
16E80..16E96 ; No # [23] MEDEFAIDRIN DIGIT ZERO..MEDEFAIDRIN DIGIT THREE ALTERNATE FORM
1D2E0..1D2F3 ; No # [20] MAYAN NUMERAL ZERO..MAYAN NUMERAL NINETEEN
1D360..1D378 ; No # [25] COUNTING ROD UNIT DIGIT ONE..TALLY MARK FIVE
1E8C7..1E8CF ; No # [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE
1EC71..1ECAB ; No # [59] INDIC SIYAQ NUMBER ONE..INDIC SIYAQ NUMBER PREFIXED NINE
1ECAD..1ECAF ; No # [3] INDIC SIYAQ FRACTION ONE QUARTER..INDIC SIYAQ FRACTION THREE QUARTERS
1ECB1..1ECB4 ; No # [4] INDIC SIYAQ NUMBER ALTERNATE ONE..INDIC SIYAQ ALTERNATE LAKH MARK
1F100..1F10C ; No # [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
# Total code points: 676
# Total code points: 807
# ================================================
@ -3180,12 +3236,13 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
FEFF ; Cf # ZERO WIDTH NO-BREAK SPACE
FFF9..FFFB ; Cf # [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
110BD ; Cf # KAITHI NUMBER SIGN
110CD ; Cf # KAITHI NUMBER SIGN ABOVE
1BCA0..1BCA3 ; Cf # [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
1D173..1D17A ; Cf # [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
E0001 ; Cf # LANGUAGE TAG
E0020..E007F ; Cf # [96] TAG SPACE..CANCEL TAG
# Total code points: 151
# Total code points: 152
# ================================================
@ -3440,7 +3497,9 @@ FF3F ; Pc # FULLWIDTH LOW LINE
0964..0965 ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0970 ; Po # DEVANAGARI ABBREVIATION SIGN
09FD ; Po # BENGALI ABBREVIATION SIGN
0A76 ; Po # GURMUKHI ABBREVIATION SIGN
0AF0 ; Po # GUJARATI ABBREVIATION SIGN
0C84 ; Po # KANNADA SIGN SIDDHAM
0DF4 ; Po # SINHALA PUNCTUATION KUNDDALIYA
0E4F ; Po # THAI CHARACTER FONGMAN
0E5A..0E5B ; Po # [2] THAI CHARACTER ANGKHANKHU..THAI CHARACTER KHOMUT
@ -3491,7 +3550,7 @@ FF3F ; Pc # FULLWIDTH LOW LINE
2E30..2E39 ; Po # [10] RING POINT..TOP HALF SECTION SIGN
2E3C..2E3F ; Po # [4] STENOGRAPHIC FULL STOP..CAPITULUM
2E41 ; Po # REVERSED COMMA
2E43..2E49 ; Po # [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
2E43..2E4E ; Po # [12] DASH WITH LEFT UPTURN..PUNCTUS ELEVATUS MARK
3001..3003 ; Po # [3] IDEOGRAPHIC COMMA..DITTO MARK
303D ; Po # PART ALTERNATION MARK
30FB ; Po # KATAKANA MIDDLE DOT
@ -3544,12 +3603,13 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
10AF0..10AF6 ; Po # [7] MANICHAEAN PUNCTUATION STAR..MANICHAEAN PUNCTUATION LINE FILLER
10B39..10B3F ; Po # [7] AVESTAN ABBREVIATION MARK..LARGE ONE RING OVER TWO RINGS PUNCTUATION
10B99..10B9C ; Po # [4] PSALTER PAHLAVI SECTION MARK..PSALTER PAHLAVI FOUR DOTS WITH DOT
10F55..10F59 ; Po # [5] SOGDIAN PUNCTUATION TWO VERTICAL BARS..SOGDIAN PUNCTUATION HALF CIRCLE WITH DOT
11047..1104D ; Po # [7] BRAHMI DANDA..BRAHMI PUNCTUATION LOTUS
110BB..110BC ; Po # [2] KAITHI ABBREVIATION SIGN..KAITHI ENUMERATION SIGN
110BE..110C1 ; Po # [4] KAITHI SECTION MARK..KAITHI DOUBLE DANDA
11140..11143 ; Po # [4] CHAKMA SECTION MARK..CHAKMA QUESTION MARK
11174..11175 ; Po # [2] MAHAJANI ABBREVIATION SIGN..MAHAJANI SECTION MARK
111C5..111C9 ; Po # [5] SHARADA DANDA..SHARADA SANDHI MARK
111C5..111C8 ; Po # [4] SHARADA DANDA..SHARADA SEPARATOR
111CD ; Po # SHARADA SUTRA MARK
111DB ; Po # SHARADA SIGN SIDDHAM
111DD..111DF ; Po # [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2
@ -3563,21 +3623,24 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
11641..11643 ; Po # [3] MODI DANDA..MODI ABBREVIATION SIGN
11660..1166C ; Po # [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
1173C..1173E ; Po # [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
1183B ; Po # DOGRA ABBREVIATION SIGN
11A3F..11A46 ; Po # [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
11A9A..11A9C ; Po # [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
11A9E..11AA2 ; Po # [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
11C41..11C45 ; Po # [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
11C70..11C71 ; Po # [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
11EF7..11EF8 ; Po # [2] MAKASAR PASSIMBANG..MAKASAR END OF SECTION
12470..12474 ; Po # [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON
16A6E..16A6F ; Po # [2] MRO DANDA..MRO DOUBLE DANDA
16AF5 ; Po # BASSA VAH FULL STOP
16B37..16B3B ; Po # [5] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN VOS FEEM
16B44 ; Po # PAHAWH HMONG SIGN XAUS
16E97..16E9A ; Po # [4] MEDEFAIDRIN COMMA..MEDEFAIDRIN EXCLAMATION OH
1BC9F ; Po # DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA87..1DA8B ; Po # [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS
1E95E..1E95F ; Po # [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK
# Total code points: 566
# Total code points: 584
# ================================================
@ -3658,6 +3721,7 @@ FFE9..FFEC ; Sm # [4] HALFWIDTH LEFTWARDS ARROW..HALFWIDTH DOWNWARDS ARROW
00A2..00A5 ; Sc # [4] CENT SIGN..YEN SIGN
058F ; Sc # ARMENIAN DRAM SIGN
060B ; Sc # AFGHANI SIGN
07FE..07FF ; Sc # [2] NKO DOROME SIGN..NKO TAMAN SIGN
09F2..09F3 ; Sc # [2] BENGALI RUPEE MARK..BENGALI RUPEE SIGN
09FB ; Sc # BENGALI GANDA MARK
0AF1 ; Sc # GUJARATI RUPEE SIGN
@ -3671,8 +3735,9 @@ FE69 ; Sc # SMALL DOLLAR SIGN
FF04 ; Sc # FULLWIDTH DOLLAR SIGN
FFE0..FFE1 ; Sc # [2] FULLWIDTH CENT SIGN..FULLWIDTH POUND SIGN
FFE5..FFE6 ; Sc # [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN
1ECB0 ; Sc # INDIC SIYAQ RUPEE MARK
# Total code points: 54
# Total code points: 57
# ================================================
@ -3793,10 +3858,8 @@ FFE3 ; Sk # FULLWIDTH MACRON
2B45..2B46 ; So # [2] LEFTWARDS QUADRUPLE ARROW..RIGHTWARDS QUADRUPLE ARROW
2B4D..2B73 ; So # [39] DOWNWARDS TRIANGLE-HEADED ZIGZAG ARROW..DOWNWARDS TRIANGLE-HEADED ARROW TO BAR
2B76..2B95 ; So # [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
2B98..2BB9 ; So # [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
2BBD..2BC8 ; So # [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BD2 ; So # [9] TOP HALF BLACK CIRCLE..GROUP MARK
2BEC..2BEF ; So # [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
2B98..2BC8 ; So # [49] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BFE ; So # [53] TOP HALF BLACK CIRCLE..REVERSED RIGHT ANGLE
2CE5..2CEA ; So # [6] COPTIC SYMBOL MI RO..COPTIC SYMBOL SHIMA SIMA
2E80..2E99 ; So # [26] CJK RADICAL REPEAT..CJK RADICAL RAP
2E9B..2EF3 ; So # [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
@ -3855,14 +3918,14 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
1DA6D..1DA74 ; So # [8] SIGNWRITING SHOULDER HIP SPINE..SIGNWRITING TORSO-FLOORPLANE TWISTING
1DA76..1DA83 ; So # [14] SIGNWRITING LIMB COMBINATION..SIGNWRITING LOCATION DEPTH
1DA85..1DA86 ; So # [2] SIGNWRITING LOCATION TORSO..SIGNWRITING LOCATION LIMBS DIGITS
1ECAC ; So # INDIC SIYAQ PLACEHOLDER
1F000..1F02B ; So # [44] MAHJONG TILE EAST WIND..MAHJONG TILE BACK
1F030..1F093 ; So # [100] DOMINO TILE HORIZONTAL BACK..DOMINO TILE VERTICAL-06-06
1F0A0..1F0AE ; So # [15] PLAYING CARD BACK..PLAYING CARD KING OF SPADES
1F0B1..1F0BF ; So # [15] PLAYING CARD ACE OF HEARTS..PLAYING CARD RED JOKER
1F0C1..1F0CF ; So # [15] PLAYING CARD ACE OF DIAMONDS..PLAYING CARD BLACK JOKER
1F0D1..1F0F5 ; So # [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21
1F110..1F12E ; So # [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
1F130..1F16B ; So # [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
1F110..1F16B ; So # [92] PARENTHESIZED LATIN CAPITAL LETTER A..RAISED MD SIGN
1F170..1F1AC ; So # [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
1F1E6..1F202 ; So # [29] REGIONAL INDICATOR SYMBOL LETTER A..SQUARED KATAKANA SA
1F210..1F23B ; So # [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
@ -3872,9 +3935,9 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
1F300..1F3FA ; So # [251] CYCLONE..AMPHORA
1F400..1F6D4 ; So # [725] RAT..PAGODA
1F6E0..1F6EC ; So # [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
1F6F0..1F6F8 ; So # [9] SATELLITE..FLYING SAUCER
1F6F0..1F6F9 ; So # [10] SATELLITE..SKATEBOARD
1F700..1F773 ; So # [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
1F780..1F7D4 ; So # [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
1F780..1F7D8 ; So # [89] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..NEGATIVE CIRCLED SQUARE
1F800..1F80B ; So # [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
1F810..1F847 ; So # [56] LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD..DOWNWARDS HEAVY ARROW
1F850..1F859 ; So # [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
@ -3882,13 +3945,16 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
1F890..1F8AD ; So # [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
1F900..1F90B ; So # [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
1F910..1F93E ; So # [47] ZIPPER-MOUTH FACE..HANDBALL
1F940..1F94C ; So # [13] WILTED FLOWER..CURLING STONE
1F950..1F96B ; So # [28] CROISSANT..CANNED FOOD
1F980..1F997 ; So # [24] CRAB..CRICKET
1F9C0 ; So # CHEESE WEDGE
1F9D0..1F9E6 ; So # [23] FACE WITH MONOCLE..SOCKS
1F940..1F970 ; So # [49] WILTED FLOWER..SMILING FACE WITH SMILING EYES AND THREE HEARTS
1F973..1F976 ; So # [4] FACE WITH PARTY HORN AND PARTY HAT..FREEZING FACE
1F97A ; So # FACE WITH PLEADING EYES
1F97C..1F9A2 ; So # [39] LAB COAT..SWAN
1F9B0..1F9B9 ; So # [10] EMOJI COMPONENT RED HAIR..SUPERVILLAIN
1F9C0..1F9C2 ; So # [3] CHEESE WEDGE..SALT SHAKER
1F9D0..1F9FF ; So # [48] FACE WITH MONOCLE..NAZAR AMULET
1FA60..1FA6D ; So # [14] XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER
# Total code points: 5855
# Total code points: 5984
# ================================================

@ -1,6 +1,6 @@
# GraphemeBreakProperty-10.0.0.txt
# Date: 2017-03-12, 07:03:41 GMT
# © 2017 Unicode®, Inc.
# GraphemeBreakProperty-11.0.0.txt
# Date: 2018-03-16, 20:34:02 GMT
# © 2018 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
@ -24,12 +24,13 @@
08E2 ; Prepend # Cf ARABIC DISPUTED END OF AYAH
0D4E ; Prepend # Lo MALAYALAM LETTER DOT REPH
110BD ; Prepend # Cf KAITHI NUMBER SIGN
110CD ; Prepend # Cf KAITHI NUMBER SIGN ABOVE
111C2..111C3 ; Prepend # Lo [2] SHARADA SIGN JIHVAMULIYA..SHARADA SIGN UPADHMANIYA
11A3A ; Prepend # Lo ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
11A86..11A89 ; Prepend # Lo [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
11D46 ; Prepend # Lo MASARAM GONDI REPHA
# Total code points: 19
# Total code points: 20
# ================================================
@ -95,12 +96,13 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
0730..074A ; Extend # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
07A6..07B0 ; Extend # Mn [11] THAANA ABAFILI..THAANA SUKUN
07EB..07F3 ; Extend # Mn [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE
07FD ; Extend # Mn NKO DANTAYALAN
0816..0819 ; Extend # Mn [4] SAMARITAN MARK IN..SAMARITAN MARK DAGESH
081B..0823 ; Extend # Mn [9] SAMARITAN MARK EPENTHETIC YUT..SAMARITAN VOWEL SIGN A
0825..0827 ; Extend # Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
0829..082D ; Extend # Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
0859..085B ; Extend # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
08D4..08E1 ; Extend # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08D3..08E1 ; Extend # Mn [15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA
08E3..0902 ; Extend # Mn [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
093A ; Extend # Mn DEVANAGARI VOWEL SIGN OE
093C ; Extend # Mn DEVANAGARI SIGN NUKTA
@ -115,6 +117,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
09CD ; Extend # Mn BENGALI SIGN VIRAMA
09D7 ; Extend # Mc BENGALI AU LENGTH MARK
09E2..09E3 ; Extend # Mn [2] BENGALI VOWEL SIGN VOCALIC L..BENGALI VOWEL SIGN VOCALIC LL
09FE ; Extend # Mn BENGALI SANDHI MARK
0A01..0A02 ; Extend # Mn [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI
0A3C ; Extend # Mn GURMUKHI SIGN NUKTA
0A41..0A42 ; Extend # Mn [2] GURMUKHI VOWEL SIGN U..GURMUKHI VOWEL SIGN UU
@ -145,6 +148,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
0BCD ; Extend # Mn TAMIL SIGN VIRAMA
0BD7 ; Extend # Mc TAMIL AU LENGTH MARK
0C00 ; Extend # Mn TELUGU SIGN COMBINING CANDRABINDU ABOVE
0C04 ; Extend # Mn TELUGU SIGN COMBINING ANUSVARA ABOVE
0C3E..0C40 ; Extend # Mn [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II
0C46..0C48 ; Extend # Mn [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
0C4A..0C4D ; Extend # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
@ -273,6 +277,7 @@ A80B ; Extend # Mn SYLOTI NAGRI SIGN ANUSVARA
A825..A826 ; Extend # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
A8C4..A8C5 ; Extend # Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
A8E0..A8F1 ; Extend # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
A8FF ; Extend # Mn DEVANAGARI VOWEL SIGN AY
A926..A92D ; Extend # Mn [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
A947..A951 ; Extend # Mn [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
A980..A982 ; Extend # Mn [3] JAVANESE SIGN PANYANGGA..JAVANESE SIGN LAYAR
@ -309,6 +314,8 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
10A38..10A3A ; Extend # Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
10A3F ; Extend # Mn KHAROSHTHI VIRAMA
10AE5..10AE6 ; Extend # Mn [2] MANICHAEAN ABBREVIATION MARK ABOVE..MANICHAEAN ABBREVIATION MARK BELOW
10D24..10D27 ; Extend # Mn [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
10F46..10F50 ; Extend # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
11001 ; Extend # Mn BRAHMI SIGN ANUSVARA
11038..11046 ; Extend # Mn [15] BRAHMI VOWEL SIGN AA..BRAHMI VIRAMA
1107F..11081 ; Extend # Mn [3] BRAHMI NUMBER JOINER..KAITHI SIGN ANUSVARA
@ -320,7 +327,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
11173 ; Extend # Mn MAHAJANI SIGN NUKTA
11180..11181 ; Extend # Mn [2] SHARADA SIGN CANDRABINDU..SHARADA SIGN ANUSVARA
111B6..111BE ; Extend # Mn [9] SHARADA VOWEL SIGN U..SHARADA VOWEL SIGN O
111CA..111CC ; Extend # Mn [3] SHARADA SIGN NUKTA..SHARADA EXTRA SHORT VOWEL MARK
111C9..111CC ; Extend # Mn [4] SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOWEL MARK
1122F..11231 ; Extend # Mn [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
11234 ; Extend # Mn KHOJKI SIGN ANUSVARA
11236..11237 ; Extend # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
@ -328,7 +335,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
112DF ; Extend # Mn KHUDAWADI SIGN ANUSVARA
112E3..112EA ; Extend # Mn [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
11300..11301 ; Extend # Mn [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
1133C ; Extend # Mn GRANTHA SIGN NUKTA
1133B..1133C ; Extend # Mn [2] COMBINING BINDU BELOW..GRANTHA SIGN NUKTA
1133E ; Extend # Mc GRANTHA VOWEL SIGN AA
11340 ; Extend # Mn GRANTHA VOWEL SIGN II
11357 ; Extend # Mc GRANTHA AU LENGTH MARK
@ -337,6 +344,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
11438..1143F ; Extend # Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
11442..11444 ; Extend # Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
11446 ; Extend # Mn NEWA SIGN NUKTA
1145E ; Extend # Mn NEWA SANDHI MARK
114B0 ; Extend # Mc TIRHUTA VOWEL SIGN AA
114B3..114B8 ; Extend # Mn [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
114BA ; Extend # Mn TIRHUTA VOWEL SIGN SHORT E
@ -358,8 +366,9 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
1171D..1171F ; Extend # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
11722..11725 ; Extend # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
11727..1172B ; Extend # Mn [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
11A01..11A06 ; Extend # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A09..11A0A ; Extend # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
1182F..11837 ; Extend # Mn [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA
11839..1183A ; Extend # Mn [2] DOGRA SIGN VIRAMA..DOGRA SIGN NUKTA
11A01..11A0A ; Extend # Mn [10] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A33..11A38 ; Extend # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A3B..11A3E ; Extend # Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
11A47 ; Extend # Mn ZANABAZAR SQUARE SUBJOINER
@ -379,6 +388,10 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
11D3C..11D3D ; Extend # Mn [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
11D3F..11D45 ; Extend # Mn [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
11D47 ; Extend # Mn MASARAM GONDI RA-KARA
11D90..11D91 ; Extend # Mn [2] GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI VOWEL SIGN AI
11D95 ; Extend # Mn GUNJALA GONDI SIGN ANUSVARA
11D97 ; Extend # Mn GUNJALA GONDI VIRAMA
11EF3..11EF4 ; Extend # Mn [2] MAKASAR VOWEL SIGN I..MAKASAR VOWEL SIGN U
16AF0..16AF4 ; Extend # Mn [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
16B30..16B36 ; Extend # Mn [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
16F8F..16F92 ; Extend # Mn [4] MIAO TONE RIGHT..MIAO TONE BELOW
@ -403,10 +416,11 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
1E026..1E02A ; Extend # Mn [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
1E8D0..1E8D6 ; Extend # Mn [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
1E944..1E94A ; Extend # Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
1F3FB..1F3FF ; Extend # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
E0020..E007F ; Extend # Cf [96] TAG SPACE..CANCEL TAG
E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 1901
# Total code points: 1948
# ================================================
@ -517,6 +531,7 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
110B0..110B2 ; SpacingMark # Mc [3] KAITHI VOWEL SIGN AA..KAITHI VOWEL SIGN II
110B7..110B8 ; SpacingMark # Mc [2] KAITHI VOWEL SIGN O..KAITHI VOWEL SIGN AU
1112C ; SpacingMark # Mc CHAKMA VOWEL SIGN E
11145..11146 ; SpacingMark # Mc [2] CHAKMA VOWEL SIGN AA..CHAKMA VOWEL SIGN EI
11182 ; SpacingMark # Mc SHARADA SIGN VISARGA
111B3..111B5 ; SpacingMark # Mc [3] SHARADA VOWEL SIGN AA..SHARADA VOWEL SIGN II
111BF..111C0 ; SpacingMark # Mc [2] SHARADA VOWEL SIGN AU..SHARADA SIGN VIRAMA
@ -549,7 +564,8 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
116B6 ; SpacingMark # Mc TAKRI SIGN VIRAMA
11720..11721 ; SpacingMark # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
11726 ; SpacingMark # Mc AHOM VOWEL SIGN E
11A07..11A08 ; SpacingMark # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
1182C..1182E ; SpacingMark # Mc [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II
11838 ; SpacingMark # Mc DOGRA SIGN VISARGA
11A39 ; SpacingMark # Mc ZANABAZAR SQUARE SIGN VISARGA
11A57..11A58 ; SpacingMark # Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
11A97 ; SpacingMark # Mc SOYOMBO SIGN VISARGA
@ -558,11 +574,15 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
11CA9 ; SpacingMark # Mc MARCHEN SUBJOINED LETTER YA
11CB1 ; SpacingMark # Mc MARCHEN VOWEL SIGN I
11CB4 ; SpacingMark # Mc MARCHEN VOWEL SIGN O
11D8A..11D8E ; SpacingMark # Mc [5] GUNJALA GONDI VOWEL SIGN AA..GUNJALA GONDI VOWEL SIGN UU
11D93..11D94 ; SpacingMark # Mc [2] GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI VOWEL SIGN AU
11D96 ; SpacingMark # Mc GUNJALA GONDI SIGN VISARGA
11EF5..11EF6 ; SpacingMark # Mc [2] MAKASAR VOWEL SIGN E..MAKASAR VOWEL SIGN O
16F51..16F7E ; SpacingMark # Mc [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
1D166 ; SpacingMark # Mc MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D ; SpacingMark # Mc MUSICAL SYMBOL COMBINING AUGMENTATION DOT
# Total code points: 348
# Total code points: 362
# ================================================
@ -1395,81 +1415,8 @@ D789..D7A3 ; LVT # Lo [27] HANGUL SYLLABLE HIG..HANGUL SYLLABLE HIH
# ================================================
261D ; E_Base # So WHITE UP POINTING INDEX
26F9 ; E_Base # So PERSON WITH BALL
270A..270D ; E_Base # So [4] RAISED FIST..WRITING HAND
1F385 ; E_Base # So FATHER CHRISTMAS
1F3C2..1F3C4 ; E_Base # So [3] SNOWBOARDER..SURFER
1F3C7 ; E_Base # So HORSE RACING
1F3CA..1F3CC ; E_Base # So [3] SWIMMER..GOLFER
1F442..1F443 ; E_Base # So [2] EAR..NOSE
1F446..1F450 ; E_Base # So [11] WHITE UP POINTING BACKHAND INDEX..OPEN HANDS SIGN
1F46E ; E_Base # So POLICE OFFICER
1F470..1F478 ; E_Base # So [9] BRIDE WITH VEIL..PRINCESS
1F47C ; E_Base # So BABY ANGEL
1F481..1F483 ; E_Base # So [3] INFORMATION DESK PERSON..DANCER
1F485..1F487 ; E_Base # So [3] NAIL POLISH..HAIRCUT
1F4AA ; E_Base # So FLEXED BICEPS
1F574..1F575 ; E_Base # So [2] MAN IN BUSINESS SUIT LEVITATING..SLEUTH OR SPY
1F57A ; E_Base # So MAN DANCING
1F590 ; E_Base # So RAISED HAND WITH FINGERS SPLAYED
1F595..1F596 ; E_Base # So [2] REVERSED HAND WITH MIDDLE FINGER EXTENDED..RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS
1F645..1F647 ; E_Base # So [3] FACE WITH NO GOOD GESTURE..PERSON BOWING DEEPLY
1F64B..1F64F ; E_Base # So [5] HAPPY PERSON RAISING ONE HAND..PERSON WITH FOLDED HANDS
1F6A3 ; E_Base # So ROWBOAT
1F6B4..1F6B6 ; E_Base # So [3] BICYCLIST..PEDESTRIAN
1F6C0 ; E_Base # So BATH
1F6CC ; E_Base # So SLEEPING ACCOMMODATION
1F918..1F91C ; E_Base # So [5] SIGN OF THE HORNS..RIGHT-FACING FIST
1F91E..1F91F ; E_Base # So [2] HAND WITH INDEX AND MIDDLE FINGERS CROSSED..I LOVE YOU HAND SIGN
1F926 ; E_Base # So FACE PALM
1F930..1F939 ; E_Base # So [10] PREGNANT WOMAN..JUGGLING
1F93D..1F93E ; E_Base # So [2] WATER POLO..HANDBALL
1F9D1..1F9DD ; E_Base # So [13] ADULT..ELF
# Total code points: 98
# ================================================
1F3FB..1F3FF ; E_Modifier # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
# Total code points: 5
# ================================================
200D ; ZWJ # Cf ZERO WIDTH JOINER
# Total code points: 1
# ================================================
2640 ; Glue_After_Zwj # So FEMALE SIGN
2642 ; Glue_After_Zwj # So MALE SIGN
2695..2696 ; Glue_After_Zwj # So [2] STAFF OF AESCULAPIUS..SCALES
2708 ; Glue_After_Zwj # So AIRPLANE
2764 ; Glue_After_Zwj # So HEAVY BLACK HEART
1F308 ; Glue_After_Zwj # So RAINBOW
1F33E ; Glue_After_Zwj # So EAR OF RICE
1F373 ; Glue_After_Zwj # So COOKING
1F393 ; Glue_After_Zwj # So GRADUATION CAP
1F3A4 ; Glue_After_Zwj # So MICROPHONE
1F3A8 ; Glue_After_Zwj # So ARTIST PALETTE
1F3EB ; Glue_After_Zwj # So SCHOOL
1F3ED ; Glue_After_Zwj # So FACTORY
1F48B ; Glue_After_Zwj # So KISS MARK
1F4BB..1F4BC ; Glue_After_Zwj # So [2] PERSONAL COMPUTER..BRIEFCASE
1F527 ; Glue_After_Zwj # So WRENCH
1F52C ; Glue_After_Zwj # So MICROSCOPE
1F5E8 ; Glue_After_Zwj # So LEFT SPEECH BUBBLE
1F680 ; Glue_After_Zwj # So ROCKET
1F692 ; Glue_After_Zwj # So FIRE ENGINE
# Total code points: 22
# ================================================
1F466..1F469 ; E_Base_GAZ # So [4] BOY..WOMAN
# Total code points: 4
# EOF

@ -1,6 +1,6 @@
# Scripts-10.0.0.txt
# Date: 2017-03-11, 06:40:37 GMT
# © 2017 Unicode®, Inc.
# Scripts-11.0.0.txt
# Date: 2018-02-21, 05:34:31 GMT
# © 2018 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
@ -308,10 +308,8 @@
2B47..2B4C ; Common # Sm [6] REVERSE TILDE OPERATOR ABOVE RIGHTWARDS ARROW..RIGHTWARDS ARROW ABOVE REVERSE TILDE OPERATOR
2B4D..2B73 ; Common # So [39] DOWNWARDS TRIANGLE-HEADED ZIGZAG ARROW..DOWNWARDS TRIANGLE-HEADED ARROW TO BAR
2B76..2B95 ; Common # So [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
2B98..2BB9 ; Common # So [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
2BBD..2BC8 ; Common # So [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BD2 ; Common # So [9] TOP HALF BLACK CIRCLE..GROUP MARK
2BEC..2BEF ; Common # So [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
2B98..2BC8 ; Common # So [49] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BFE ; Common # So [53] TOP HALF BLACK CIRCLE..REVERSED RIGHT ANGLE
2E00..2E01 ; Common # Po [2] RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER
2E02 ; Common # Pi LEFT SUBSTITUTION BRACKET
2E03 ; Common # Pf RIGHT SUBSTITUTION BRACKET
@ -349,7 +347,7 @@
2E40 ; Common # Pd DOUBLE HYPHEN
2E41 ; Common # Po REVERSED COMMA
2E42 ; Common # Ps DOUBLE LOW-REVERSED-9 QUOTATION MARK
2E43..2E49 ; Common # Po [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
2E43..2E4E ; Common # Po [12] DASH WITH LEFT UPTURN..PUNCTUS ELEVATUS MARK
2FF0..2FFB ; Common # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
3000 ; Common # Zs IDEOGRAPHIC SPACE
3001..3003 ; Common # Po [3] IDEOGRAPHIC COMMA..DITTO MARK
@ -522,8 +520,9 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1D183..1D184 ; Common # So [2] MUSICAL SYMBOL ARPEGGIATO UP..MUSICAL SYMBOL ARPEGGIATO DOWN
1D18C..1D1A9 ; Common # So [30] MUSICAL SYMBOL RINFORZANDO..MUSICAL SYMBOL DEGREE SLASH
1D1AE..1D1E8 ; Common # So [59] MUSICAL SYMBOL PEDAL MARK..MUSICAL SYMBOL KIEVAN FLAT SIGN
1D2E0..1D2F3 ; Common # No [20] MAYAN NUMERAL ZERO..MAYAN NUMERAL NINETEEN
1D300..1D356 ; Common # So [87] MONOGRAM FOR EARTH..TETRAGRAM FOR FOSTERING
1D360..1D371 ; Common # No [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE
1D360..1D378 ; Common # No [25] COUNTING ROD UNIT DIGIT ONE..TALLY MARK FIVE
1D400..1D454 ; Common # L& [85] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL ITALIC SMALL G
1D456..1D49C ; Common # L& [71] MATHEMATICAL ITALIC SMALL I..MATHEMATICAL SCRIPT CAPITAL A
1D49E..1D49F ; Common # L& [2] MATHEMATICAL SCRIPT CAPITAL C..MATHEMATICAL SCRIPT CAPITAL D
@ -565,6 +564,11 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1D7C3 ; Common # Sm MATHEMATICAL SANS-SERIF BOLD ITALIC PARTIAL DIFFERENTIAL
1D7C4..1D7CB ; Common # L& [8] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL BOLD SMALL DIGAMMA
1D7CE..1D7FF ; Common # Nd [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
1EC71..1ECAB ; Common # No [59] INDIC SIYAQ NUMBER ONE..INDIC SIYAQ NUMBER PREFIXED NINE
1ECAC ; Common # So INDIC SIYAQ PLACEHOLDER
1ECAD..1ECAF ; Common # No [3] INDIC SIYAQ FRACTION ONE QUARTER..INDIC SIYAQ FRACTION THREE QUARTERS
1ECB0 ; Common # Sc INDIC SIYAQ RUPEE MARK
1ECB1..1ECB4 ; Common # No [4] INDIC SIYAQ NUMBER ALTERNATE ONE..INDIC SIYAQ ALTERNATE LAKH MARK
1F000..1F02B ; Common # So [44] MAHJONG TILE EAST WIND..MAHJONG TILE BACK
1F030..1F093 ; Common # So [100] DOMINO TILE HORIZONTAL BACK..DOMINO TILE VERTICAL-06-06
1F0A0..1F0AE ; Common # So [15] PLAYING CARD BACK..PLAYING CARD KING OF SPADES
@ -572,8 +576,7 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F0C1..1F0CF ; Common # So [15] PLAYING CARD ACE OF DIAMONDS..PLAYING CARD BLACK JOKER
1F0D1..1F0F5 ; Common # So [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21
1F100..1F10C ; Common # No [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
1F110..1F12E ; Common # So [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
1F130..1F16B ; Common # So [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
1F110..1F16B ; Common # So [92] PARENTHESIZED LATIN CAPITAL LETTER A..RAISED MD SIGN
1F170..1F1AC ; Common # So [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
1F1E6..1F1FF ; Common # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
1F201..1F202 ; Common # So [2] SQUARED KATAKANA KOKO..SQUARED KATAKANA SA
@ -585,9 +588,9 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F3FB..1F3FF ; Common # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
1F400..1F6D4 ; Common # So [725] RAT..PAGODA
1F6E0..1F6EC ; Common # So [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
1F6F0..1F6F8 ; Common # So [9] SATELLITE..FLYING SAUCER
1F6F0..1F6F9 ; Common # So [10] SATELLITE..SKATEBOARD
1F700..1F773 ; Common # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
1F780..1F7D4 ; Common # So [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
1F780..1F7D8 ; Common # So [89] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..NEGATIVE CIRCLED SQUARE
1F800..1F80B ; Common # So [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
1F810..1F847 ; Common # So [56] LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD..DOWNWARDS HEAVY ARROW
1F850..1F859 ; Common # So [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
@ -595,15 +598,18 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F890..1F8AD ; Common # So [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
1F900..1F90B ; Common # So [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
1F910..1F93E ; Common # So [47] ZIPPER-MOUTH FACE..HANDBALL
1F940..1F94C ; Common # So [13] WILTED FLOWER..CURLING STONE
1F950..1F96B ; Common # So [28] CROISSANT..CANNED FOOD
1F980..1F997 ; Common # So [24] CRAB..CRICKET
1F9C0 ; Common # So CHEESE WEDGE
1F9D0..1F9E6 ; Common # So [23] FACE WITH MONOCLE..SOCKS
1F940..1F970 ; Common # So [49] WILTED FLOWER..SMILING FACE WITH SMILING EYES AND THREE HEARTS
1F973..1F976 ; Common # So [4] FACE WITH PARTY HORN AND PARTY HAT..FREEZING FACE
1F97A ; Common # So FACE WITH PLEADING EYES
1F97C..1F9A2 ; Common # So [39] LAB COAT..SWAN
1F9B0..1F9B9 ; Common # So [10] EMOJI COMPONENT RED HAIR..SUPERVILLAIN
1F9C0..1F9C2 ; Common # So [3] CHEESE WEDGE..SALT SHAKER
1F9D0..1F9FF ; Common # So [48] FACE WITH MONOCLE..NAZAR AMULET
1FA60..1FA6D ; Common # So [14] XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER
E0001 ; Common # Cf LANGUAGE TAG
E0020..E007F ; Common # Cf [96] TAG SPACE..CANCEL TAG
# Total code points: 7363
# Total code points: 7591
# ================================================
@ -646,8 +652,7 @@ A770 ; Latin # Lm MODIFIER LETTER US
A771..A787 ; Latin # L& [23] LATIN SMALL LETTER DUM..LATIN SMALL LETTER INSULAR T
A78B..A78E ; Latin # L& [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT
A78F ; Latin # Lo LATIN LETTER SINOLOGICAL DOT
A790..A7AE ; Latin # L& [31] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER SMALL CAPITAL I
A7B0..A7B7 ; Latin # L& [8] LATIN CAPITAL LETTER TURNED K..LATIN SMALL LETTER OMEGA
A790..A7B9 ; Latin # L& [42] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN SMALL LETTER U WITH STROKE
A7F7 ; Latin # Lo LATIN EPIGRAPHIC LETTER SIDEWAYS I
A7F8..A7F9 ; Latin # Lm [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE
A7FA ; Latin # L& LATIN LETTER SMALL CAPITAL TURNED M
@ -659,7 +664,7 @@ FB00..FB06 ; Latin # L& [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE S
FF21..FF3A ; Latin # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A ; Latin # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
# Total code points: 1350
# Total code points: 1353
# ================================================
@ -753,13 +758,13 @@ FE2E..FE2F ; Cyrillic # Mn [2] COMBINING CYRILLIC TITLO LEFT HALF..COMBININ
0531..0556 ; Armenian # L& [38] ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITAL LETTER FEH
0559 ; Armenian # Lm ARMENIAN MODIFIER LETTER LEFT HALF RING
055A..055F ; Armenian # Po [6] ARMENIAN APOSTROPHE..ARMENIAN ABBREVIATION MARK
0561..0587 ; Armenian # L& [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
0560..0588 ; Armenian # L& [41] ARMENIAN SMALL LETTER TURNED AYB..ARMENIAN SMALL LETTER YI WITH STROKE
058A ; Armenian # Pd ARMENIAN HYPHEN
058D..058E ; Armenian # So [2] RIGHT-FACING ARMENIAN ETERNITY SIGN..LEFT-FACING ARMENIAN ETERNITY SIGN
058F ; Armenian # Sc ARMENIAN DRAM SIGN
FB13..FB17 ; Armenian # L& [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH
# Total code points: 93
# Total code points: 95
# ================================================
@ -773,7 +778,7 @@ FB13..FB17 ; Armenian # L& [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SM
05C6 ; Hebrew # Po HEBREW PUNCTUATION NUN HAFUKHA
05C7 ; Hebrew # Mn HEBREW POINT QAMATS QATAN
05D0..05EA ; Hebrew # Lo [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
05F0..05F2 ; Hebrew # Lo [3] HEBREW LIGATURE YIDDISH DOUBLE VAV..HEBREW LIGATURE YIDDISH DOUBLE YOD
05EF..05F2 ; Hebrew # Lo [4] HEBREW YOD TRIANGLE..HEBREW LIGATURE YIDDISH DOUBLE YOD
05F3..05F4 ; Hebrew # Po [2] HEBREW PUNCTUATION GERESH..HEBREW PUNCTUATION GERSHAYIM
FB1D ; Hebrew # Lo HEBREW LETTER YOD WITH HIRIQ
FB1E ; Hebrew # Mn HEBREW POINT JUDEO-SPANISH VARIKA
@ -786,7 +791,7 @@ FB40..FB41 ; Hebrew # Lo [2] HEBREW LETTER NUN WITH DAGESH..HEBREW LETTER S
FB43..FB44 ; Hebrew # Lo [2] HEBREW LETTER FINAL PE WITH DAGESH..HEBREW LETTER PE WITH DAGESH
FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATURE ALEF LAMED
# Total code points: 133
# Total code points: 134
# ================================================
@ -823,7 +828,7 @@ FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
0750..077F ; Arabic # Lo [48] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS ABOVE
08A0..08B4 ; Arabic # Lo [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
08B6..08BD ; Arabic # Lo [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
08D4..08E1 ; Arabic # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08D3..08E1 ; Arabic # Mn [15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA
08E3..08FF ; Arabic # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
FB50..FBB1 ; Arabic # Lo [98] ARABIC LETTER ALEF WASLA ISOLATED FORM..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
FBB2..FBC1 ; Arabic # Sk [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW
@ -871,7 +876,7 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
1EEAB..1EEBB ; Arabic # Lo [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN
1EEF0..1EEF1 ; Arabic # Sm [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL
# Total code points: 1280
# Total code points: 1281
# ================================================
@ -921,9 +926,10 @@ A8F2..A8F7 ; Devanagari # Lo [6] DEVANAGARI SIGN SPACING CANDRABINDU..DEVAN
A8F8..A8FA ; Devanagari # Po [3] DEVANAGARI SIGN PUSHPIKA..DEVANAGARI CARET
A8FB ; Devanagari # Lo DEVANAGARI HEADSTROKE
A8FC ; Devanagari # Po DEVANAGARI SIGN SIDDHAM
A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
A8FD..A8FE ; Devanagari # Lo [2] DEVANAGARI JAIN OM..DEVANAGARI LETTER AY
A8FF ; Devanagari # Mn DEVANAGARI VOWEL SIGN AY
# Total code points: 154
# Total code points: 156
# ================================================
@ -956,8 +962,9 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
09FB ; Bengali # Sc BENGALI GANDA MARK
09FC ; Bengali # Lo BENGALI LETTER VEDIC ANUSVARA
09FD ; Bengali # Po BENGALI ABBREVIATION SIGN
09FE ; Bengali # Mn BENGALI SANDHI MARK
# Total code points: 95
# Total code points: 96
# ================================================
@ -982,8 +989,9 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0A70..0A71 ; Gurmukhi # Mn [2] GURMUKHI TIPPI..GURMUKHI ADDAK
0A72..0A74 ; Gurmukhi # Lo [3] GURMUKHI IRI..GURMUKHI EK ONKAR
0A75 ; Gurmukhi # Mn GURMUKHI SIGN YAKASH
0A76 ; Gurmukhi # Po GURMUKHI ABBREVIATION SIGN
# Total code points: 79
# Total code points: 80
# ================================================
@ -1078,6 +1086,7 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0C00 ; Telugu # Mn TELUGU SIGN COMBINING CANDRABINDU ABOVE
0C01..0C03 ; Telugu # Mc [3] TELUGU SIGN CANDRABINDU..TELUGU SIGN VISARGA
0C04 ; Telugu # Mn TELUGU SIGN COMBINING ANUSVARA ABOVE
0C05..0C0C ; Telugu # Lo [8] TELUGU LETTER A..TELUGU LETTER VOCALIC L
0C0E..0C10 ; Telugu # Lo [3] TELUGU LETTER E..TELUGU LETTER AI
0C12..0C28 ; Telugu # Lo [23] TELUGU LETTER O..TELUGU LETTER NA
@ -1095,13 +1104,14 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0C78..0C7E ; Telugu # No [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
0C7F ; Telugu # So TELUGU SIGN TUUMU
# Total code points: 96
# Total code points: 97
# ================================================
0C80 ; Kannada # Lo KANNADA SIGN SPACING CANDRABINDU
0C81 ; Kannada # Mn KANNADA SIGN CANDRABINDU
0C82..0C83 ; Kannada # Mc [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
0C84 ; Kannada # Po KANNADA SIGN SIDDHAM
0C85..0C8C ; Kannada # Lo [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
0C8E..0C90 ; Kannada # Lo [3] KANNADA LETTER E..KANNADA LETTER AI
0C92..0CA8 ; Kannada # Lo [23] KANNADA LETTER O..KANNADA LETTER NA
@ -1123,7 +1133,7 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0CE6..0CEF ; Kannada # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0CF1..0CF2 ; Kannada # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
# Total code points: 88
# Total code points: 89
# ================================================
@ -1317,14 +1327,16 @@ AA7E..AA7F ; Myanmar # Lo [2] MYANMAR LETTER SHWE PALAUNG CHA..MYANMAR LETT
10A0..10C5 ; Georgian # L& [38] GEORGIAN CAPITAL LETTER AN..GEORGIAN CAPITAL LETTER HOE
10C7 ; Georgian # L& GEORGIAN CAPITAL LETTER YN
10CD ; Georgian # L& GEORGIAN CAPITAL LETTER AEN
10D0..10FA ; Georgian # Lo [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN
10D0..10FA ; Georgian # L& [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN
10FC ; Georgian # Lm MODIFIER LETTER GEORGIAN NAR
10FD..10FF ; Georgian # Lo [3] GEORGIAN LETTER AEN..GEORGIAN LETTER LABIAL SIGN
10FD..10FF ; Georgian # L& [3] GEORGIAN LETTER AEN..GEORGIAN LETTER LABIAL SIGN
1C90..1CBA ; Georgian # L& [43] GEORGIAN MTAVRULI CAPITAL LETTER AN..GEORGIAN MTAVRULI CAPITAL LETTER AIN
1CBD..1CBF ; Georgian # L& [3] GEORGIAN MTAVRULI CAPITAL LETTER AEN..GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN
2D00..2D25 ; Georgian # L& [38] GEORGIAN SMALL LETTER AN..GEORGIAN SMALL LETTER HOE
2D27 ; Georgian # L& GEORGIAN SMALL LETTER YN
2D2D ; Georgian # L& GEORGIAN SMALL LETTER AEN
# Total code points: 127
# Total code points: 173
# ================================================
@ -1453,7 +1465,7 @@ AB70..ABBF ; Cherokee # L& [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
1810..1819 ; Mongolian # Nd [10] MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE
1820..1842 ; Mongolian # Lo [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
1843 ; Mongolian # Lm MONGOLIAN LETTER TODO LONG VOWEL SIGN
1844..1877 ; Mongolian # Lo [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
1844..1878 ; Mongolian # Lo [53] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER CHA WITH TWO DOTS
1880..1884 ; Mongolian # Lo [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
1885..1886 ; Mongolian # Mn [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
1887..18A8 ; Mongolian # Lo [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
@ -1461,7 +1473,7 @@ AB70..ABBF ; Cherokee # L& [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
18AA ; Mongolian # Lo MONGOLIAN LETTER MANCHU ALI GALI LHA
11660..1166C ; Mongolian # Po [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
# Total code points: 166
# Total code points: 167
# ================================================
@ -1490,10 +1502,10 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
# ================================================
02EA..02EB ; Bopomofo # Sk [2] MODIFIER LETTER YIN DEPARTING TONE MARK..MODIFIER LETTER YANG DEPARTING TONE MARK
3105..312E ; Bopomofo # Lo [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
3105..312F ; Bopomofo # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
31A0..31BA ; Bopomofo # Lo [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
# Total code points: 71
# Total code points: 72
# ================================================
@ -1506,7 +1518,7 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
3038..303A ; Han # Nl [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY
303B ; Han # Lm VERTICAL IDEOGRAPHIC ITERATION MARK
3400..4DB5 ; Han # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4E00..9FEA ; Han # Lo [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
4E00..9FEF ; Han # Lo [20976] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEF
F900..FA6D ; Han # Lo [366] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D
FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
20000..2A6D6 ; Han # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
@ -1516,7 +1528,7 @@ FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILI
2CEB0..2EBE0 ; Han # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
2F800..2FA1D ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
# Total code points: 89228
# Total code points: 89233
# ================================================
@ -1579,13 +1591,14 @@ FE00..FE0F ; Inherited # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
FE20..FE2D ; Inherited # Mn [14] COMBINING LIGATURE LEFT HALF..COMBINING CONJOINING MACRON BELOW
101FD ; Inherited # Mn PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE
102E0 ; Inherited # Mn COPTIC EPACT THOUSANDS MARK
1133B ; Inherited # Mn COMBINING BINDU BELOW
1D167..1D169 ; Inherited # Mn [3] MUSICAL SYMBOL COMBINING TREMOLO-1..MUSICAL SYMBOL COMBINING TREMOLO-3
1D17B..1D182 ; Inherited # Mn [8] MUSICAL SYMBOL COMBINING ACCENT..MUSICAL SYMBOL COMBINING LOURE
1D185..1D18B ; Inherited # Mn [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE
1D1AA..1D1AD ; Inherited # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 568
# Total code points: 569
# ================================================
@ -1778,13 +1791,13 @@ A828..A82B ; Syloti_Nagri # So [4] SYLOTI NAGRI POETRY MARK-1..SYLOTI NAGRI
10A0C..10A0F ; Kharoshthi # Mn [4] KHAROSHTHI VOWEL LENGTH MARK..KHAROSHTHI SIGN VISARGA
10A10..10A13 ; Kharoshthi # Lo [4] KHAROSHTHI LETTER KA..KHAROSHTHI LETTER GHA
10A15..10A17 ; Kharoshthi # Lo [3] KHAROSHTHI LETTER CA..KHAROSHTHI LETTER JA
10A19..10A33 ; Kharoshthi # Lo [27] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER TTTHA
10A19..10A35 ; Kharoshthi # Lo [29] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER VHA
10A38..10A3A ; Kharoshthi # Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
10A3F ; Kharoshthi # Mn KHAROSHTHI VIRAMA
10A40..10A47 ; Kharoshthi # No [8] KHAROSHTHI DIGIT ONE..KHAROSHTHI NUMBER ONE THOUSAND
10A40..10A48 ; Kharoshthi # No [9] KHAROSHTHI DIGIT ONE..KHAROSHTHI FRACTION ONE HALF
10A50..10A58 ; Kharoshthi # Po [9] KHAROSHTHI PUNCTUATION DOT..KHAROSHTHI PUNCTUATION LINES
# Total code points: 65
# Total code points: 68
# ================================================
@ -1841,8 +1854,10 @@ A874..A877 ; Phags_Pa # Po [4] PHAGS-PA SINGLE HEAD MARK..PHAGS-PA MARK DOU
07F6 ; Nko # So NKO SYMBOL OO DENNEN
07F7..07F9 ; Nko # Po [3] NKO SYMBOL GBAKURUNEN..NKO EXCLAMATION MARK
07FA ; Nko # Lm NKO LAJANYALAN
07FD ; Nko # Mn NKO DANTAYALAN
07FE..07FF ; Nko # Sc [2] NKO DOROME SIGN..NKO TAMAN SIGN
# Total code points: 59
# Total code points: 62
# ================================================
@ -2137,8 +2152,9 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
110BB..110BC ; Kaithi # Po [2] KAITHI ABBREVIATION SIGN..KAITHI ENUMERATION SIGN
110BD ; Kaithi # Cf KAITHI NUMBER SIGN
110BE..110C1 ; Kaithi # Po [4] KAITHI SECTION MARK..KAITHI DOUBLE DANDA
110CD ; Kaithi # Cf KAITHI NUMBER SIGN ABOVE
# Total code points: 66
# Total code points: 67
# ================================================
@ -2186,8 +2202,10 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
1112D..11134 ; Chakma # Mn [8] CHAKMA VOWEL SIGN AI..CHAKMA MAAYYAA
11136..1113F ; Chakma # Nd [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE
11140..11143 ; Chakma # Po [4] CHAKMA SECTION MARK..CHAKMA QUESTION MARK
11144 ; Chakma # Lo CHAKMA LETTER LHAA
11145..11146 ; Chakma # Mc [2] CHAKMA VOWEL SIGN AA..CHAKMA VOWEL SIGN EI
# Total code points: 67
# Total code points: 70
# ================================================
@ -2224,8 +2242,8 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
111B6..111BE ; Sharada # Mn [9] SHARADA VOWEL SIGN U..SHARADA VOWEL SIGN O
111BF..111C0 ; Sharada # Mc [2] SHARADA VOWEL SIGN AU..SHARADA SIGN VIRAMA
111C1..111C4 ; Sharada # Lo [4] SHARADA SIGN AVAGRAHA..SHARADA OM
111C5..111C9 ; Sharada # Po [5] SHARADA DANDA..SHARADA SANDHI MARK
111CA..111CC ; Sharada # Mn [3] SHARADA SIGN NUKTA..SHARADA EXTRA SHORT VOWEL MARK
111C5..111C8 ; Sharada # Po [4] SHARADA DANDA..SHARADA SEPARATOR
111C9..111CC ; Sharada # Mn [4] SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOWEL MARK
111CD ; Sharada # Po SHARADA SUTRA MARK
111D0..111D9 ; Sharada # Nd [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE
111DA ; Sharada # Lo SHARADA EKAM
@ -2502,7 +2520,7 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
# ================================================
11700..11719 ; Ahom # Lo [26] AHOM LETTER KA..AHOM LETTER JHA
11700..1171A ; Ahom # Lo [27] AHOM LETTER KA..AHOM LETTER ALTERNATE BA
1171D..1171F ; Ahom # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
11720..11721 ; Ahom # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
11722..11725 ; Ahom # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
@ -2513,7 +2531,7 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
1173C..1173E ; Ahom # Po [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
1173F ; Ahom # So AHOM SYMBOL VI
# Total code points: 57
# Total code points: 58
# ================================================
@ -2618,8 +2636,9 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
11450..11459 ; Newa # Nd [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
1145B ; Newa # Po NEWA PLACEHOLDER MARK
1145D ; Newa # Po NEWA INSERTION SIGN
1145E ; Newa # Mn NEWA SANDHI MARK
# Total code points: 92
# Total code points: 93
# ================================================
@ -2631,10 +2650,10 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
# ================================================
16FE0 ; Tangut # Lm TANGUT ITERATION MARK
17000..187EC ; Tangut # Lo [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
17000..187F1 ; Tangut # Lo [6130] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187F1
18800..18AF2 ; Tangut # Lo [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
# Total code points: 6881
# Total code points: 6886
# ================================================
@ -2670,16 +2689,15 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
11A97 ; Soyombo # Mc SOYOMBO SIGN VISARGA
11A98..11A99 ; Soyombo # Mn [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
11A9A..11A9C ; Soyombo # Po [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
11A9D ; Soyombo # Lo SOYOMBO MARK PLUTA
11A9E..11AA2 ; Soyombo # Po [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
# Total code points: 80
# Total code points: 81
# ================================================
11A00 ; Zanabazar_Square # Lo ZANABAZAR SQUARE LETTER A
11A01..11A06 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A07..11A08 ; Zanabazar_Square # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
11A09..11A0A ; Zanabazar_Square # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A01..11A0A ; Zanabazar_Square # Mn [10] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A0B..11A32 ; Zanabazar_Square # Lo [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
11A33..11A38 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A39 ; Zanabazar_Square # Mc ZANABAZAR SQUARE SIGN VISARGA
@ -2690,4 +2708,73 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
# Total code points: 72
# ================================================
11800..1182B ; Dogra # Lo [44] DOGRA LETTER A..DOGRA LETTER RRA
1182C..1182E ; Dogra # Mc [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II
1182F..11837 ; Dogra # Mn [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA
11838 ; Dogra # Mc DOGRA SIGN VISARGA
11839..1183A ; Dogra # Mn [2] DOGRA SIGN VIRAMA..DOGRA SIGN NUKTA
1183B ; Dogra # Po DOGRA ABBREVIATION SIGN
# Total code points: 60
# ================================================
11D60..11D65 ; Gunjala_Gondi # Lo [6] GUNJALA GONDI LETTER A..GUNJALA GONDI LETTER UU
11D67..11D68 ; Gunjala_Gondi # Lo [2] GUNJALA GONDI LETTER EE..GUNJALA GONDI LETTER AI
11D6A..11D89 ; Gunjala_Gondi # Lo [32] GUNJALA GONDI LETTER OO..GUNJALA GONDI LETTER SA
11D8A..11D8E ; Gunjala_Gondi # Mc [5] GUNJALA GONDI VOWEL SIGN AA..GUNJALA GONDI VOWEL SIGN UU
11D90..11D91 ; Gunjala_Gondi # Mn [2] GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI VOWEL SIGN AI
11D93..11D94 ; Gunjala_Gondi # Mc [2] GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI VOWEL SIGN AU
11D95 ; Gunjala_Gondi # Mn GUNJALA GONDI SIGN ANUSVARA
11D96 ; Gunjala_Gondi # Mc GUNJALA GONDI SIGN VISARGA
11D97 ; Gunjala_Gondi # Mn GUNJALA GONDI VIRAMA
11D98 ; Gunjala_Gondi # Lo GUNJALA GONDI OM
11DA0..11DA9 ; Gunjala_Gondi # Nd [10] GUNJALA GONDI DIGIT ZERO..GUNJALA GONDI DIGIT NINE
# Total code points: 63
# ================================================
11EE0..11EF2 ; Makasar # Lo [19] MAKASAR LETTER KA..MAKASAR ANGKA
11EF3..11EF4 ; Makasar # Mn [2] MAKASAR VOWEL SIGN I..MAKASAR VOWEL SIGN U
11EF5..11EF6 ; Makasar # Mc [2] MAKASAR VOWEL SIGN E..MAKASAR VOWEL SIGN O
11EF7..11EF8 ; Makasar # Po [2] MAKASAR PASSIMBANG..MAKASAR END OF SECTION
# Total code points: 25
# ================================================
16E40..16E7F ; Medefaidrin # L& [64] MEDEFAIDRIN CAPITAL LETTER M..MEDEFAIDRIN SMALL LETTER Y
16E80..16E96 ; Medefaidrin # No [23] MEDEFAIDRIN DIGIT ZERO..MEDEFAIDRIN DIGIT THREE ALTERNATE FORM
16E97..16E9A ; Medefaidrin # Po [4] MEDEFAIDRIN COMMA..MEDEFAIDRIN EXCLAMATION OH
# Total code points: 91
# ================================================
10D00..10D23 ; Hanifi_Rohingya # Lo [36] HANIFI ROHINGYA LETTER A..HANIFI ROHINGYA MARK NA KHONNA
10D24..10D27 ; Hanifi_Rohingya # Mn [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
10D30..10D39 ; Hanifi_Rohingya # Nd [10] HANIFI ROHINGYA DIGIT ZERO..HANIFI ROHINGYA DIGIT NINE
# Total code points: 50
# ================================================
10F30..10F45 ; Sogdian # Lo [22] SOGDIAN LETTER ALEPH..SOGDIAN INDEPENDENT SHIN
10F46..10F50 ; Sogdian # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
10F51..10F54 ; Sogdian # No [4] SOGDIAN NUMBER ONE..SOGDIAN NUMBER ONE HUNDRED
10F55..10F59 ; Sogdian # Po [5] SOGDIAN PUNCTUATION TWO VERTICAL BARS..SOGDIAN PUNCTUATION HALF CIRCLE WITH DOT
# Total code points: 42
# ================================================
10F00..10F1C ; Old_Sogdian # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F1D..10F26 ; Old_Sogdian # No [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF
10F27 ; Old_Sogdian # Lo OLD SOGDIAN LIGATURE AYIN-DALETH
# Total code points: 40
# EOF
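
The records above share one layout: a code point or a `first..last` range, a semicolon, the script name, then a `#` comment. Each block ends with a `# Total code points:` line that can be checked by summing the range sizes. The following is a minimal sketch of that check, not the tooling PCRE2 actually uses. The names `parse_record` and `count_code_points` are hypothetical, and the sample lines are the Dogra records copied from this diff.

```python
import re
from collections import Counter

# Hypothetical helper, not part of PCRE2 or of any Unicode tool.
# Matches "XXXX ; Value" or "XXXX..YYYY ; Value" with arbitrary whitespace.
RECORD = re.compile(r"^\s*([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def parse_record(line):
    """Return (first, last, value), or None for blank, comment or non-data lines."""
    data = line.split("#", 1)[0]  # drop the trailing comment, if any
    match = RECORD.match(data)
    if match is None:
        return None
    first = int(match.group(1), 16)
    last = int(match.group(2), 16) if match.group(2) else first
    return first, last, match.group(3)


def count_code_points(lines):
    """Sum the sizes of all ranges, keyed by property value."""
    totals = Counter()
    for line in lines:
        record = parse_record(line)
        if record:
            first, last, value = record
            totals[value] += last - first + 1
    return totals


dogra = [
    "11800..1182B ; Dogra # Lo [44] DOGRA LETTER A..DOGRA LETTER RRA",
    "1182C..1182E ; Dogra # Mc [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II",
    "1182F..11837 ; Dogra # Mn [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA",
    "11838 ; Dogra # Mc DOGRA SIGN VISARGA",
    "11839..1183A ; Dogra # Mn [2] DOGRA SIGN VIRAMA..DOGRA SIGN NUKTA",
    "1183B ; Dogra # Po DOGRA ABBREVIATION SIGN",
    "# Total code points: 60",
]

print(count_code_points(dogra)["Dogra"])  # 60, matching the total line
```

The regular expression tolerates any amount of whitespace around the range and the semicolon, and the comment is discarded before matching, so the sketch does not depend on the column layout of the data file.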

File diff suppressed because it is too large

@ -0,0 +1,714 @@
# emoji-data.txt
# Date: 2018-02-07, 07:55:18 GMT
# © 2018 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Emoji Data for UTS #51
# Version: 11.0
#
# For documentation and usage, see http://www.unicode.org/reports/tr51
#
# Format:
# <codepoint(s)> ; <property> # <comments>
# Note: there is no guarantee as to the structure of whitespace or comments
#
# Characters and sequences are listed in code point order. Users should be shown a more natural order.
# See the CLDR collation order for Emoji.
# ================================================
# All omitted code points have Emoji=No
# @missing: 0000..10FFFF ; Emoji ; No
0023 ; Emoji # 1.1 [1] (#) number sign
002A ; Emoji # 1.1 [1] (*) asterisk
0030..0039 ; Emoji # 1.1 [10] (0..9) digit zero..digit nine
00A9 ; Emoji # 1.1 [1] (©️) copyright
00AE ; Emoji # 1.1 [1] (®️) registered
203C ; Emoji # 1.1 [1] (‼️) double exclamation mark
2049 ; Emoji # 3.0 [1] (⁉️) exclamation question mark
2122 ; Emoji # 1.1 [1] (™️) trade mark
2139 ; Emoji # 3.0 [1] (ℹ️) information
2194..2199 ; Emoji # 1.1 [6] (↔️..↙️) left-right arrow..down-left arrow
21A9..21AA ; Emoji # 1.1 [2] (↩️..↪️) right arrow curving left..left arrow curving right
231A..231B ; Emoji # 1.1 [2] (⌚..⌛) watch..hourglass done
2328 ; Emoji # 1.1 [1] (⌨️) keyboard
23CF ; Emoji # 4.0 [1] (⏏️) eject button
23E9..23F3 ; Emoji # 6.0 [11] (⏩..⏳) fast-forward button..hourglass not done
23F8..23FA ; Emoji # 7.0 [3] (⏸️..⏺️) pause button..record button
24C2 ; Emoji # 1.1 [1] (Ⓜ️) circled M
25AA..25AB ; Emoji # 1.1 [2] (▪️..▫️) black small square..white small square
25B6 ; Emoji # 1.1 [1] (▶️) play button
25C0 ; Emoji # 1.1 [1] (◀️) reverse button
25FB..25FE ; Emoji # 3.2 [4] (◻️..◾) white medium square..black medium-small square
2600..2604 ; Emoji # 1.1 [5] (☀️..☄️) sun..comet
260E ; Emoji # 1.1 [1] (☎️) telephone
2611 ; Emoji # 1.1 [1] (☑️) ballot box with check
2614..2615 ; Emoji # 4.0 [2] (☔..☕) umbrella with rain drops..hot beverage
2618 ; Emoji # 4.1 [1] (☘️) shamrock
261D ; Emoji # 1.1 [1] (☝️) index pointing up
2620 ; Emoji # 1.1 [1] (☠️) skull and crossbones
2622..2623 ; Emoji # 1.1 [2] (☢️..☣️) radioactive..biohazard
2626 ; Emoji # 1.1 [1] (☦️) orthodox cross
262A ; Emoji # 1.1 [1] (☪️) star and crescent
262E..262F ; Emoji # 1.1 [2] (☮️..☯️) peace symbol..yin yang
2638..263A ; Emoji # 1.1 [3] (☸️..☺️) wheel of dharma..smiling face
2640 ; Emoji # 1.1 [1] (♀️) female sign
2642 ; Emoji # 1.1 [1] (♂️) male sign
2648..2653 ; Emoji # 1.1 [12] (♈..♓) Aries..Pisces
265F..2660 ; Emoji # 1.1 [2] (♟️..♠️) chess pawn..spade suit
2663 ; Emoji # 1.1 [1] (♣️) club suit
2665..2666 ; Emoji # 1.1 [2] (♥️..♦️) heart suit..diamond suit
2668 ; Emoji # 1.1 [1] (♨️) hot springs
267B ; Emoji # 3.2 [1] (♻️) recycling symbol
267E..267F ; Emoji # 4.1 [2] (♾️..♿) infinity..wheelchair symbol
2692..2697 ; Emoji # 4.1 [6] (⚒️..⚗️) hammer and pick..alembic
2699 ; Emoji # 4.1 [1] (⚙️) gear
269B..269C ; Emoji # 4.1 [2] (⚛️..⚜️) atom symbol..fleur-de-lis
26A0..26A1 ; Emoji # 4.0 [2] (⚠️..⚡) warning..high voltage
26AA..26AB ; Emoji # 4.1 [2] (⚪..⚫) white circle..black circle
26B0..26B1 ; Emoji # 4.1 [2] (⚰️..⚱️) coffin..funeral urn
26BD..26BE ; Emoji # 5.2 [2] (⚽..⚾) soccer ball..baseball
26C4..26C5 ; Emoji # 5.2 [2] (⛄..⛅) snowman without snow..sun behind cloud
26C8 ; Emoji # 5.2 [1] (⛈️) cloud with lightning and rain
26CE ; Emoji # 6.0 [1] (⛎) Ophiuchus
26CF ; Emoji # 5.2 [1] (⛏️) pick
26D1 ; Emoji # 5.2 [1] (⛑️) rescue worker’s helmet
26D3..26D4 ; Emoji # 5.2 [2] (⛓️..⛔) chains..no entry
26E9..26EA ; Emoji # 5.2 [2] (⛩️..⛪) shinto shrine..church
26F0..26F5 ; Emoji # 5.2 [6] (⛰️..⛵) mountain..sailboat
26F7..26FA ; Emoji # 5.2 [4] (⛷️..⛺) skier..tent
26FD ; Emoji # 5.2 [1] (⛽) fuel pump
2702 ; Emoji # 1.1 [1] (✂️) scissors
2705 ; Emoji # 6.0 [1] (✅) white heavy check mark
2708..2709 ; Emoji # 1.1 [2] (✈️..✉️) airplane..envelope
270A..270B ; Emoji # 6.0 [2] (✊..✋) raised fist..raised hand
270C..270D ; Emoji # 1.1 [2] (✌️..✍️) victory hand..writing hand
270F ; Emoji # 1.1 [1] (✏️) pencil
2712 ; Emoji # 1.1 [1] (✒️) black nib
2714 ; Emoji # 1.1 [1] (✔️) heavy check mark
2716 ; Emoji # 1.1 [1] (✖️) heavy multiplication x
271D ; Emoji # 1.1 [1] (✝️) latin cross
2721 ; Emoji # 1.1 [1] (✡️) star of David
2728 ; Emoji # 6.0 [1] (✨) sparkles
2733..2734 ; Emoji # 1.1 [2] (✳️..✴️) eight-spoked asterisk..eight-pointed star
2744 ; Emoji # 1.1 [1] (❄️) snowflake
2747 ; Emoji # 1.1 [1] (❇️) sparkle
274C ; Emoji # 6.0 [1] (❌) cross mark
274E ; Emoji # 6.0 [1] (❎) cross mark button
2753..2755 ; Emoji # 6.0 [3] (❓..❕) question mark..white exclamation mark
2757 ; Emoji # 5.2 [1] (❗) exclamation mark
2763..2764 ; Emoji # 1.1 [2] (❣️..❤️) heavy heart exclamation..red heart
2795..2797 ; Emoji # 6.0 [3] (➕..➗) heavy plus sign..heavy division sign
27A1 ; Emoji # 1.1 [1] (➡️) right arrow
27B0 ; Emoji # 6.0 [1] (➰) curly loop
27BF ; Emoji # 6.0 [1] (➿) double curly loop
2934..2935 ; Emoji # 3.2 [2] (⤴️..⤵️) right arrow curving up..right arrow curving down
2B05..2B07 ; Emoji # 4.0 [3] (⬅️..⬇️) left arrow..down arrow
2B1B..2B1C ; Emoji # 5.1 [2] (⬛..⬜) black large square..white large square
2B50 ; Emoji # 5.1 [1] (⭐) star
2B55 ; Emoji # 5.2 [1] (⭕) heavy large circle
3030 ; Emoji # 1.1 [1] (〰️) wavy dash
303D ; Emoji # 3.2 [1] (〽️) part alternation mark
3297 ; Emoji # 1.1 [1] (㊗️) Japanese “congratulations” button
3299 ; Emoji # 1.1 [1] (㊙️) Japanese “secret” button
1F004 ; Emoji # 5.1 [1] (🀄) mahjong red dragon
1F0CF ; Emoji # 6.0 [1] (🃏) joker
1F170..1F171 ; Emoji # 6.0 [2] (🅰️..🅱️) A button (blood type)..B button (blood type)
1F17E ; Emoji # 6.0 [1] (🅾️) O button (blood type)
1F17F ; Emoji # 5.2 [1] (🅿️) P button
1F18E ; Emoji # 6.0 [1] (🆎) AB button (blood type)
1F191..1F19A ; Emoji # 6.0 [10] (🆑..🆚) CL button..VS button
1F1E6..1F1FF ; Emoji # 6.0 [26] (🇦..🇿) regional indicator symbol letter a..regional indicator symbol letter z
1F201..1F202 ; Emoji # 6.0 [2] (🈁..🈂️) Japanese “here” button..Japanese “service charge” button
1F21A ; Emoji # 5.2 [1] (🈚) Japanese “free of charge” button
1F22F ; Emoji # 5.2 [1] (🈯) Japanese “reserved” button
1F232..1F23A ; Emoji # 6.0 [9] (🈲..🈺) Japanese “prohibited” button..Japanese “open for business” button
1F250..1F251 ; Emoji # 6.0 [2] (🉐..🉑) Japanese “bargain” button..Japanese “acceptable” button
1F300..1F320 ; Emoji # 6.0 [33] (🌀..🌠) cyclone..shooting star
1F321 ; Emoji # 7.0 [1] (🌡️) thermometer
1F324..1F32C ; Emoji # 7.0 [9] (🌤️..🌬️) sun behind small cloud..wind face
1F32D..1F32F ; Emoji # 8.0 [3] (🌭..🌯) hot dog..burrito
1F330..1F335 ; Emoji # 6.0 [6] (🌰..🌵) chestnut..cactus
1F336 ; Emoji # 7.0 [1] (🌶️) hot pepper
1F337..1F37C ; Emoji # 6.0 [70] (🌷..🍼) tulip..baby bottle
1F37D ; Emoji # 7.0 [1] (🍽️) fork and knife with plate
1F37E..1F37F ; Emoji # 8.0 [2] (🍾..🍿) bottle with popping cork..popcorn
1F380..1F393 ; Emoji # 6.0 [20] (🎀..🎓) ribbon..graduation cap
1F396..1F397 ; Emoji # 7.0 [2] (🎖️..🎗️) military medal..reminder ribbon
1F399..1F39B ; Emoji # 7.0 [3] (🎙️..🎛️) studio microphone..control knobs
1F39E..1F39F ; Emoji # 7.0 [2] (🎞️..🎟️) film frames..admission tickets
1F3A0..1F3C4 ; Emoji # 6.0 [37] (🎠..🏄) carousel horse..person surfing
1F3C5 ; Emoji # 7.0 [1] (🏅) sports medal
1F3C6..1F3CA ; Emoji # 6.0 [5] (🏆..🏊) trophy..person swimming
1F3CB..1F3CE ; Emoji # 7.0 [4] (🏋️..🏎️) person lifting weights..racing car
1F3CF..1F3D3 ; Emoji # 8.0 [5] (🏏..🏓) cricket game..ping pong
1F3D4..1F3DF ; Emoji # 7.0 [12] (🏔️..🏟️) snow-capped mountain..stadium
1F3E0..1F3F0 ; Emoji # 6.0 [17] (🏠..🏰) house..castle
1F3F3..1F3F5 ; Emoji # 7.0 [3] (🏳️..🏵️) white flag..rosette
1F3F7 ; Emoji # 7.0 [1] (🏷️) label
1F3F8..1F3FF ; Emoji # 8.0 [8] (🏸..🏿) badminton..dark skin tone
1F400..1F43E ; Emoji # 6.0 [63] (🐀..🐾) rat..paw prints
1F43F ; Emoji # 7.0 [1] (🐿️) chipmunk
1F440 ; Emoji # 6.0 [1] (👀) eyes
1F441 ; Emoji # 7.0 [1] (👁️) eye
1F442..1F4F7 ; Emoji # 6.0[182] (👂..📷) ear..camera
1F4F8 ; Emoji # 7.0 [1] (📸) camera with flash
1F4F9..1F4FC ; Emoji # 6.0 [4] (📹..📼) video camera..videocassette
1F4FD ; Emoji # 7.0 [1] (📽️) film projector
1F4FF ; Emoji # 8.0 [1] (📿) prayer beads
1F500..1F53D ; Emoji # 6.0 [62] (🔀..🔽) shuffle tracks button..downwards button
1F549..1F54A ; Emoji # 7.0 [2] (🕉️..🕊️) om..dove
1F54B..1F54E ; Emoji # 8.0 [4] (🕋..🕎) kaaba..menorah
1F550..1F567 ; Emoji # 6.0 [24] (🕐..🕧) one o’clock..twelve-thirty
1F56F..1F570 ; Emoji # 7.0 [2] (🕯️..🕰️) candle..mantelpiece clock
1F573..1F579 ; Emoji # 7.0 [7] (🕳️..🕹️) hole..joystick
1F57A ; Emoji # 9.0 [1] (🕺) man dancing
1F587 ; Emoji # 7.0 [1] (🖇️) linked paperclips
1F58A..1F58D ; Emoji # 7.0 [4] (🖊️..🖍️) pen..crayon
1F590 ; Emoji # 7.0 [1] (🖐️) hand with fingers splayed
1F595..1F596 ; Emoji # 7.0 [2] (🖕..🖖) middle finger..vulcan salute
1F5A4 ; Emoji # 9.0 [1] (🖤) black heart
1F5A5 ; Emoji # 7.0 [1] (🖥️) desktop computer
1F5A8 ; Emoji # 7.0 [1] (🖨️) printer
1F5B1..1F5B2 ; Emoji # 7.0 [2] (🖱️..🖲️) computer mouse..trackball
1F5BC ; Emoji # 7.0 [1] (🖼️) framed picture
1F5C2..1F5C4 ; Emoji # 7.0 [3] (🗂️..🗄️) card index dividers..file cabinet
1F5D1..1F5D3 ; Emoji # 7.0 [3] (🗑️..🗓️) wastebasket..spiral calendar
1F5DC..1F5DE ; Emoji # 7.0 [3] (🗜️..🗞️) clamp..rolled-up newspaper
1F5E1 ; Emoji # 7.0 [1] (🗡️) dagger
1F5E3 ; Emoji # 7.0 [1] (🗣️) speaking head
1F5E8 ; Emoji # 7.0 [1] (🗨️) left speech bubble
1F5EF ; Emoji # 7.0 [1] (🗯️) right anger bubble
1F5F3 ; Emoji # 7.0 [1] (🗳️) ballot box with ballot
1F5FA ; Emoji # 7.0 [1] (🗺️) world map
1F5FB..1F5FF ; Emoji # 6.0 [5] (🗻..🗿) mount fuji..moai
1F600 ; Emoji # 6.1 [1] (😀) grinning face
1F601..1F610 ; Emoji # 6.0 [16] (😁..😐) beaming face with smiling eyes..neutral face
1F611 ; Emoji # 6.1 [1] (😑) expressionless face
1F612..1F614 ; Emoji # 6.0 [3] (😒..😔) unamused face..pensive face
1F615 ; Emoji # 6.1 [1] (😕) confused face
1F616 ; Emoji # 6.0 [1] (😖) confounded face
1F617 ; Emoji # 6.1 [1] (😗) kissing face
1F618 ; Emoji # 6.0 [1] (😘) face blowing a kiss
1F619 ; Emoji # 6.1 [1] (😙) kissing face with smiling eyes
1F61A ; Emoji # 6.0 [1] (😚) kissing face with closed eyes
1F61B ; Emoji # 6.1 [1] (😛) face with tongue
1F61C..1F61E ; Emoji # 6.0 [3] (😜..😞) winking face with tongue..disappointed face
1F61F ; Emoji # 6.1 [1] (😟) worried face
1F620..1F625 ; Emoji # 6.0 [6] (😠..😥) angry face..sad but relieved face
1F626..1F627 ; Emoji # 6.1 [2] (😦..😧) frowning face with open mouth..anguished face
1F628..1F62B ; Emoji # 6.0 [4] (😨..😫) fearful face..tired face
1F62C ; Emoji # 6.1 [1] (😬) grimacing face
1F62D ; Emoji # 6.0 [1] (😭) loudly crying face
1F62E..1F62F ; Emoji # 6.1 [2] (😮..😯) face with open mouth..hushed face
1F630..1F633 ; Emoji # 6.0 [4] (😰..😳) anxious face with sweat..flushed face
1F634 ; Emoji # 6.1 [1] (😴) sleeping face
1F635..1F640 ; Emoji # 6.0 [12] (😵..🙀) dizzy face..weary cat face
1F641..1F642 ; Emoji # 7.0 [2] (🙁..🙂) slightly frowning face..slightly smiling face
1F643..1F644 ; Emoji # 8.0 [2] (🙃..🙄) upside-down face..face with rolling eyes
1F645..1F64F ; Emoji # 6.0 [11] (🙅..🙏) person gesturing NO..folded hands
1F680..1F6C5 ; Emoji # 6.0 [70] (🚀..🛅) rocket..left luggage
1F6CB..1F6CF ; Emoji # 7.0 [5] (🛋️..🛏️) couch and lamp..bed
1F6D0 ; Emoji # 8.0 [1] (🛐) place of worship
1F6D1..1F6D2 ; Emoji # 9.0 [2] (🛑..🛒) stop sign..shopping cart
1F6E0..1F6E5 ; Emoji # 7.0 [6] (🛠️..🛥️) hammer and wrench..motor boat
1F6E9 ; Emoji # 7.0 [1] (🛩️) small airplane
1F6EB..1F6EC ; Emoji # 7.0 [2] (🛫..🛬) airplane departure..airplane arrival
1F6F0 ; Emoji # 7.0 [1] (🛰️) satellite
1F6F3 ; Emoji # 7.0 [1] (🛳️) passenger ship
1F6F4..1F6F6 ; Emoji # 9.0 [3] (🛴..🛶) kick scooter..canoe
1F6F7..1F6F8 ; Emoji # 10.0 [2] (🛷..🛸) sled..flying saucer
1F6F9 ; Emoji # 11.0 [1] (🛹) skateboard
1F910..1F918 ; Emoji # 8.0 [9] (🤐..🤘) zipper-mouth face..sign of the horns
1F919..1F91E ; Emoji # 9.0 [6] (🤙..🤞) call me hand..crossed fingers
1F91F ; Emoji # 10.0 [1] (🤟) love-you gesture
1F920..1F927 ; Emoji # 9.0 [8] (🤠..🤧) cowboy hat face..sneezing face
1F928..1F92F ; Emoji # 10.0 [8] (🤨..🤯) face with raised eyebrow..exploding head
1F930 ; Emoji # 9.0 [1] (🤰) pregnant woman
1F931..1F932 ; Emoji # 10.0 [2] (🤱..🤲) breast-feeding..palms up together
1F933..1F93A ; Emoji # 9.0 [8] (🤳..🤺) selfie..person fencing
1F93C..1F93E ; Emoji # 9.0 [3] (🤼..🤾) people wrestling..person playing handball
1F940..1F945 ; Emoji # 9.0 [6] (🥀..🥅) wilted flower..goal net
1F947..1F94B ; Emoji # 9.0 [5] (🥇..🥋) 1st place medal..martial arts uniform
1F94C ; Emoji # 10.0 [1] (🥌) curling stone
1F94D..1F94F ; Emoji # 11.0 [3] (🥍..🥏) lacrosse..flying disc
1F950..1F95E ; Emoji # 9.0 [15] (🥐..🥞) croissant..pancakes
1F95F..1F96B ; Emoji # 10.0 [13] (🥟..🥫) dumpling..canned food
1F96C..1F970 ; Emoji # 11.0 [5] (🥬..🥰) leafy green..smiling face with 3 hearts
1F973..1F976 ; Emoji # 11.0 [4] (🥳..🥶) partying face..cold face
1F97A ; Emoji # 11.0 [1] (🥺) pleading face
1F97C..1F97F ; Emoji # 11.0 [4] (🥼..🥿) lab coat..woman’s flat shoe
1F980..1F984 ; Emoji # 8.0 [5] (🦀..🦄) crab..unicorn face
1F985..1F991 ; Emoji # 9.0 [13] (🦅..🦑) eagle..squid
1F992..1F997 ; Emoji # 10.0 [6] (🦒..🦗) giraffe..cricket
1F998..1F9A2 ; Emoji # 11.0 [11] (🦘..🦢) kangaroo..swan
1F9B0..1F9B9 ; Emoji # 11.0 [10] (🦰..🦹) red-haired..supervillain
1F9C0 ; Emoji # 8.0 [1] (🧀) cheese wedge
1F9C1..1F9C2 ; Emoji # 11.0 [2] (🧁..🧂) cupcake..salt
1F9D0..1F9E6 ; Emoji # 10.0 [23] (🧐..🧦) face with monocle..socks
1F9E7..1F9FF ; Emoji # 11.0 [25] (🧧..🧿) red envelope..nazar amulet
# Total elements: 1250
# ================================================
# All omitted code points have Emoji_Presentation=No
# @missing: 0000..10FFFF ; Emoji_Presentation ; No
231A..231B ; Emoji_Presentation # 1.1 [2] (⌚..⌛) watch..hourglass done
23E9..23EC ; Emoji_Presentation # 6.0 [4] (⏩..⏬) fast-forward button..fast down button
23F0 ; Emoji_Presentation # 6.0 [1] (⏰) alarm clock
23F3 ; Emoji_Presentation # 6.0 [1] (⏳) hourglass not done
25FD..25FE ; Emoji_Presentation # 3.2 [2] (◽..◾) white medium-small square..black medium-small square
2614..2615 ; Emoji_Presentation # 4.0 [2] (☔..☕) umbrella with rain drops..hot beverage
2648..2653 ; Emoji_Presentation # 1.1 [12] (♈..♓) Aries..Pisces
267F ; Emoji_Presentation # 4.1 [1] (♿) wheelchair symbol
2693 ; Emoji_Presentation # 4.1 [1] (⚓) anchor
26A1 ; Emoji_Presentation # 4.0 [1] (⚡) high voltage
26AA..26AB ; Emoji_Presentation # 4.1 [2] (⚪..⚫) white circle..black circle
26BD..26BE ; Emoji_Presentation # 5.2 [2] (⚽..⚾) soccer ball..baseball
26C4..26C5 ; Emoji_Presentation # 5.2 [2] (⛄..⛅) snowman without snow..sun behind cloud
26CE ; Emoji_Presentation # 6.0 [1] (⛎) Ophiuchus
26D4 ; Emoji_Presentation # 5.2 [1] (⛔) no entry
26EA ; Emoji_Presentation # 5.2 [1] (⛪) church
26F2..26F3 ; Emoji_Presentation # 5.2 [2] (⛲..⛳) fountain..flag in hole
26F5 ; Emoji_Presentation # 5.2 [1] (⛵) sailboat
26FA ; Emoji_Presentation # 5.2 [1] (⛺) tent
26FD ; Emoji_Presentation # 5.2 [1] (⛽) fuel pump
2705 ; Emoji_Presentation # 6.0 [1] (✅) white heavy check mark
270A..270B ; Emoji_Presentation # 6.0 [2] (✊..✋) raised fist..raised hand
2728 ; Emoji_Presentation # 6.0 [1] (✨) sparkles
274C ; Emoji_Presentation # 6.0 [1] (❌) cross mark
274E ; Emoji_Presentation # 6.0 [1] (❎) cross mark button
2753..2755 ; Emoji_Presentation # 6.0 [3] (❓..❕) question mark..white exclamation mark
2757 ; Emoji_Presentation # 5.2 [1] (❗) exclamation mark
2795..2797 ; Emoji_Presentation # 6.0 [3] (➕..➗) heavy plus sign..heavy division sign
27B0 ; Emoji_Presentation # 6.0 [1] (➰) curly loop
27BF ; Emoji_Presentation # 6.0 [1] (➿) double curly loop
2B1B..2B1C ; Emoji_Presentation # 5.1 [2] (⬛..⬜) black large square..white large square
2B50 ; Emoji_Presentation # 5.1 [1] (⭐) star
2B55 ; Emoji_Presentation # 5.2 [1] (⭕) heavy large circle
1F004 ; Emoji_Presentation # 5.1 [1] (🀄) mahjong red dragon
1F0CF ; Emoji_Presentation # 6.0 [1] (🃏) joker
1F18E ; Emoji_Presentation # 6.0 [1] (🆎) AB button (blood type)
1F191..1F19A ; Emoji_Presentation # 6.0 [10] (🆑..🆚) CL button..VS button
1F1E6..1F1FF ; Emoji_Presentation # 6.0 [26] (🇦..🇿) regional indicator symbol letter a..regional indicator symbol letter z
1F201 ; Emoji_Presentation # 6.0 [1] (🈁) Japanese “here” button
1F21A ; Emoji_Presentation # 5.2 [1] (🈚) Japanese “free of charge” button
1F22F ; Emoji_Presentation # 5.2 [1] (🈯) Japanese “reserved” button
1F232..1F236 ; Emoji_Presentation # 6.0 [5] (🈲..🈶) Japanese “prohibited” button..Japanese “not free of charge” button
1F238..1F23A ; Emoji_Presentation # 6.0 [3] (🈸..🈺) Japanese “application” button..Japanese “open for business” button
1F250..1F251 ; Emoji_Presentation # 6.0 [2] (🉐..🉑) Japanese “bargain” button..Japanese “acceptable” button
1F300..1F320 ; Emoji_Presentation # 6.0 [33] (🌀..🌠) cyclone..shooting star
1F32D..1F32F ; Emoji_Presentation # 8.0 [3] (🌭..🌯) hot dog..burrito
1F330..1F335 ; Emoji_Presentation # 6.0 [6] (🌰..🌵) chestnut..cactus
1F337..1F37C ; Emoji_Presentation # 6.0 [70] (🌷..🍼) tulip..baby bottle
1F37E..1F37F ; Emoji_Presentation # 8.0 [2] (🍾..🍿) bottle with popping cork..popcorn
1F380..1F393 ; Emoji_Presentation # 6.0 [20] (🎀..🎓) ribbon..graduation cap
1F3A0..1F3C4 ; Emoji_Presentation # 6.0 [37] (🎠..🏄) carousel horse..person surfing
1F3C5 ; Emoji_Presentation # 7.0 [1] (🏅) sports medal
1F3C6..1F3CA ; Emoji_Presentation # 6.0 [5] (🏆..🏊) trophy..person swimming
1F3CF..1F3D3 ; Emoji_Presentation # 8.0 [5] (🏏..🏓) cricket game..ping pong
1F3E0..1F3F0 ; Emoji_Presentation # 6.0 [17] (🏠..🏰) house..castle
1F3F4 ; Emoji_Presentation # 7.0 [1] (🏴) black flag
1F3F8..1F3FF ; Emoji_Presentation # 8.0 [8] (🏸..🏿) badminton..dark skin tone
1F400..1F43E ; Emoji_Presentation # 6.0 [63] (🐀..🐾) rat..paw prints
1F440 ; Emoji_Presentation # 6.0 [1] (👀) eyes
1F442..1F4F7 ; Emoji_Presentation # 6.0[182] (👂..📷) ear..camera
1F4F8 ; Emoji_Presentation # 7.0 [1] (📸) camera with flash
1F4F9..1F4FC ; Emoji_Presentation # 6.0 [4] (📹..📼) video camera..videocassette
1F4FF ; Emoji_Presentation # 8.0 [1] (📿) prayer beads
1F500..1F53D ; Emoji_Presentation # 6.0 [62] (🔀..🔽) shuffle tracks button..downwards button
1F54B..1F54E ; Emoji_Presentation # 8.0 [4] (🕋..🕎) kaaba..menorah
1F550..1F567 ; Emoji_Presentation # 6.0 [24] (🕐..🕧) one o’clock..twelve-thirty
1F57A ; Emoji_Presentation # 9.0 [1] (🕺) man dancing
1F595..1F596 ; Emoji_Presentation # 7.0 [2] (🖕..🖖) middle finger..vulcan salute
1F5A4 ; Emoji_Presentation # 9.0 [1] (🖤) black heart
1F5FB..1F5FF ; Emoji_Presentation # 6.0 [5] (🗻..🗿) mount fuji..moai
1F600 ; Emoji_Presentation # 6.1 [1] (😀) grinning face
1F601..1F610 ; Emoji_Presentation # 6.0 [16] (😁..😐) beaming face with smiling eyes..neutral face
1F611 ; Emoji_Presentation # 6.1 [1] (😑) expressionless face
1F612..1F614 ; Emoji_Presentation # 6.0 [3] (😒..😔) unamused face..pensive face
1F615 ; Emoji_Presentation # 6.1 [1] (😕) confused face
1F616 ; Emoji_Presentation # 6.0 [1] (😖) confounded face
1F617 ; Emoji_Presentation # 6.1 [1] (😗) kissing face
1F618 ; Emoji_Presentation # 6.0 [1] (😘) face blowing a kiss
1F619 ; Emoji_Presentation # 6.1 [1] (😙) kissing face with smiling eyes
1F61A ; Emoji_Presentation # 6.0 [1] (😚) kissing face with closed eyes
1F61B ; Emoji_Presentation # 6.1 [1] (😛) face with tongue
1F61C..1F61E ; Emoji_Presentation # 6.0 [3] (😜..😞) winking face with tongue..disappointed face
1F61F ; Emoji_Presentation # 6.1 [1] (😟) worried face
1F620..1F625 ; Emoji_Presentation # 6.0 [6] (😠..😥) angry face..sad but relieved face
1F626..1F627 ; Emoji_Presentation # 6.1 [2] (😦..😧) frowning face with open mouth..anguished face
1F628..1F62B ; Emoji_Presentation # 6.0 [4] (😨..😫) fearful face..tired face
1F62C ; Emoji_Presentation # 6.1 [1] (😬) grimacing face
1F62D ; Emoji_Presentation # 6.0 [1] (😭) loudly crying face
1F62E..1F62F ; Emoji_Presentation # 6.1 [2] (😮..😯) face with open mouth..hushed face
1F630..1F633 ; Emoji_Presentation # 6.0 [4] (😰..😳) anxious face with sweat..flushed face
1F634 ; Emoji_Presentation # 6.1 [1] (😴) sleeping face
1F635..1F640 ; Emoji_Presentation # 6.0 [12] (😵..🙀) dizzy face..weary cat face
1F641..1F642 ; Emoji_Presentation # 7.0 [2] (🙁..🙂) slightly frowning face..slightly smiling face
1F643..1F644 ; Emoji_Presentation # 8.0 [2] (🙃..🙄) upside-down face..face with rolling eyes
1F645..1F64F ; Emoji_Presentation # 6.0 [11] (🙅..🙏) person gesturing NO..folded hands
1F680..1F6C5 ; Emoji_Presentation # 6.0 [70] (🚀..🛅) rocket..left luggage
1F6CC ; Emoji_Presentation # 7.0 [1] (🛌) person in bed
1F6D0 ; Emoji_Presentation # 8.0 [1] (🛐) place of worship
1F6D1..1F6D2 ; Emoji_Presentation # 9.0 [2] (🛑..🛒) stop sign..shopping cart
1F6EB..1F6EC ; Emoji_Presentation # 7.0 [2] (🛫..🛬) airplane departure..airplane arrival
1F6F4..1F6F6 ; Emoji_Presentation # 9.0 [3] (🛴..🛶) kick scooter..canoe
1F6F7..1F6F8 ; Emoji_Presentation # 10.0 [2] (🛷..🛸) sled..flying saucer
1F6F9 ; Emoji_Presentation # 11.0 [1] (🛹) skateboard
1F910..1F918 ; Emoji_Presentation # 8.0 [9] (🤐..🤘) zipper-mouth face..sign of the horns
1F919..1F91E ; Emoji_Presentation # 9.0 [6] (🤙..🤞) call me hand..crossed fingers
1F91F ; Emoji_Presentation # 10.0 [1] (🤟) love-you gesture
1F920..1F927 ; Emoji_Presentation # 9.0 [8] (🤠..🤧) cowboy hat face..sneezing face
1F928..1F92F ; Emoji_Presentation # 10.0 [8] (🤨..🤯) face with raised eyebrow..exploding head
1F930 ; Emoji_Presentation # 9.0 [1] (🤰) pregnant woman
1F931..1F932 ; Emoji_Presentation # 10.0 [2] (🤱..🤲) breast-feeding..palms up together
1F933..1F93A ; Emoji_Presentation # 9.0 [8] (🤳..🤺) selfie..person fencing
1F93C..1F93E ; Emoji_Presentation # 9.0 [3] (🤼..🤾) people wrestling..person playing handball
1F940..1F945 ; Emoji_Presentation # 9.0 [6] (🥀..🥅) wilted flower..goal net
1F947..1F94B ; Emoji_Presentation # 9.0 [5] (🥇..🥋) 1st place medal..martial arts uniform
1F94C ; Emoji_Presentation # 10.0 [1] (🥌) curling stone
1F94D..1F94F ; Emoji_Presentation # 11.0 [3] (🥍..🥏) lacrosse..flying disc
1F950..1F95E ; Emoji_Presentation # 9.0 [15] (🥐..🥞) croissant..pancakes
1F95F..1F96B ; Emoji_Presentation # 10.0 [13] (🥟..🥫) dumpling..canned food
1F96C..1F970 ; Emoji_Presentation # 11.0 [5] (🥬..🥰) leafy green..smiling face with 3 hearts
1F973..1F976 ; Emoji_Presentation # 11.0 [4] (🥳..🥶) partying face..cold face
1F97A ; Emoji_Presentation # 11.0 [1] (🥺) pleading face
1F97C..1F97F ; Emoji_Presentation # 11.0 [4] (🥼..🥿) lab coat..woman’s flat shoe
1F980..1F984 ; Emoji_Presentation # 8.0 [5] (🦀..🦄) crab..unicorn face
1F985..1F991 ; Emoji_Presentation # 9.0 [13] (🦅..🦑) eagle..squid
1F992..1F997 ; Emoji_Presentation # 10.0 [6] (🦒..🦗) giraffe..cricket
1F998..1F9A2 ; Emoji_Presentation # 11.0 [11] (🦘..🦢) kangaroo..swan
1F9B0..1F9B9 ; Emoji_Presentation # 11.0 [10] (🦰..🦹) red-haired..supervillain
1F9C0 ; Emoji_Presentation # 8.0 [1] (🧀) cheese wedge
1F9C1..1F9C2 ; Emoji_Presentation # 11.0 [2] (🧁..🧂) cupcake..salt
1F9D0..1F9E6 ; Emoji_Presentation # 10.0 [23] (🧐..🧦) face with monocle..socks
1F9E7..1F9FF ; Emoji_Presentation # 11.0 [25] (🧧..🧿) red envelope..nazar amulet
# Total elements: 1032
# ================================================
# All omitted code points have Emoji_Modifier=No
# @missing: 0000..10FFFF ; Emoji_Modifier ; No
1F3FB..1F3FF ; Emoji_Modifier # 8.0 [5] (🏻..🏿) light skin tone..dark skin tone
# Total elements: 5
# ================================================
# All omitted code points have Emoji_Modifier_Base=No
# @missing: 0000..10FFFF ; Emoji_Modifier_Base ; No
261D ; Emoji_Modifier_Base # 1.1 [1] (☝️) index pointing up
26F9 ; Emoji_Modifier_Base # 5.2 [1] (⛹️) person bouncing ball
270A..270B ; Emoji_Modifier_Base # 6.0 [2] (✊..✋) raised fist..raised hand
270C..270D ; Emoji_Modifier_Base # 1.1 [2] (✌️..✍️) victory hand..writing hand
1F385 ; Emoji_Modifier_Base # 6.0 [1] (🎅) Santa Claus
1F3C2..1F3C4 ; Emoji_Modifier_Base # 6.0 [3] (🏂..🏄) snowboarder..person surfing
1F3C7 ; Emoji_Modifier_Base # 6.0 [1] (🏇) horse racing
1F3CA ; Emoji_Modifier_Base # 6.0 [1] (🏊) person swimming
1F3CB..1F3CC ; Emoji_Modifier_Base # 7.0 [2] (🏋️..🏌️) person lifting weights..person golfing
1F442..1F443 ; Emoji_Modifier_Base # 6.0 [2] (👂..👃) ear..nose
1F446..1F450 ; Emoji_Modifier_Base # 6.0 [11] (👆..👐) backhand index pointing up..open hands
1F466..1F469 ; Emoji_Modifier_Base # 6.0 [4] (👦..👩) boy..woman
1F46E ; Emoji_Modifier_Base # 6.0 [1] (👮) police officer
1F470..1F478 ; Emoji_Modifier_Base # 6.0 [9] (👰..👸) bride with veil..princess
1F47C ; Emoji_Modifier_Base # 6.0 [1] (👼) baby angel
1F481..1F483 ; Emoji_Modifier_Base # 6.0 [3] (💁..💃) person tipping hand..woman dancing
1F485..1F487 ; Emoji_Modifier_Base # 6.0 [3] (💅..💇) nail polish..person getting haircut
1F4AA ; Emoji_Modifier_Base # 6.0 [1] (💪) flexed biceps
1F574..1F575 ; Emoji_Modifier_Base # 7.0 [2] (🕴️..🕵️) man in suit levitating..detective
1F57A ; Emoji_Modifier_Base # 9.0 [1] (🕺) man dancing
1F590 ; Emoji_Modifier_Base # 7.0 [1] (🖐️) hand with fingers splayed
1F595..1F596 ; Emoji_Modifier_Base # 7.0 [2] (🖕..🖖) middle finger..vulcan salute
1F645..1F647 ; Emoji_Modifier_Base # 6.0 [3] (🙅..🙇) person gesturing NO..person bowing
1F64B..1F64F ; Emoji_Modifier_Base # 6.0 [5] (🙋..🙏) person raising hand..folded hands
1F6A3 ; Emoji_Modifier_Base # 6.0 [1] (🚣) person rowing boat
1F6B4..1F6B6 ; Emoji_Modifier_Base # 6.0 [3] (🚴..🚶) person biking..person walking
1F6C0 ; Emoji_Modifier_Base # 6.0 [1] (🛀) person taking bath
1F6CC ; Emoji_Modifier_Base # 7.0 [1] (🛌) person in bed
1F918 ; Emoji_Modifier_Base # 8.0 [1] (🤘) sign of the horns
1F919..1F91C ; Emoji_Modifier_Base # 9.0 [4] (🤙..🤜) call me hand..right-facing fist
1F91E ; Emoji_Modifier_Base # 9.0 [1] (🤞) crossed fingers
1F91F ; Emoji_Modifier_Base # 10.0 [1] (🤟) love-you gesture
1F926 ; Emoji_Modifier_Base # 9.0 [1] (🤦) person facepalming
1F930 ; Emoji_Modifier_Base # 9.0 [1] (🤰) pregnant woman
1F931..1F932 ; Emoji_Modifier_Base # 10.0 [2] (🤱..🤲) breast-feeding..palms up together
1F933..1F939 ; Emoji_Modifier_Base # 9.0 [7] (🤳..🤹) selfie..person juggling
1F93D..1F93E ; Emoji_Modifier_Base # 9.0 [2] (🤽..🤾) person playing water polo..person playing handball
1F9B5..1F9B6 ; Emoji_Modifier_Base # 11.0 [2] (🦵..🦶) leg..foot
1F9B8..1F9B9 ; Emoji_Modifier_Base # 11.0 [2] (🦸..🦹) superhero..supervillain
1F9D1..1F9DD ; Emoji_Modifier_Base # 10.0 [13] (🧑..🧝) adult..elf
# Total elements: 106
# ================================================
# All omitted code points have Emoji_Component=No
# @missing: 0000..10FFFF ; Emoji_Component ; No
0023 ; Emoji_Component # 1.1 [1] (#) number sign
002A ; Emoji_Component # 1.1 [1] (*) asterisk
0030..0039 ; Emoji_Component # 1.1 [10] (0..9) digit zero..digit nine
200D ; Emoji_Component # 1.1 [1] () zero width joiner
20E3 ; Emoji_Component # 3.0 [1] (⃣) combining enclosing keycap
FE0F ; Emoji_Component # 3.2 [1] () VARIATION SELECTOR-16
1F1E6..1F1FF ; Emoji_Component # 6.0 [26] (🇦..🇿) regional indicator symbol letter a..regional indicator symbol letter z
1F3FB..1F3FF ; Emoji_Component # 8.0 [5] (🏻..🏿) light skin tone..dark skin tone
1F9B0..1F9B3 ; Emoji_Component # 11.0 [4] (🦰..🦳) red-haired..white-haired
E0020..E007F ; Emoji_Component # 3.1 [96] (󠀠..󠁿) tag space..cancel tag
# Total elements: 146
# ================================================
# All omitted code points have Extended_Pictographic=No
# @missing: 0000..10FFFF ; Extended_Pictographic ; No
00A9 ; Extended_Pictographic# 1.1 [1] (©️) copyright
00AE ; Extended_Pictographic# 1.1 [1] (®️) registered
203C ; Extended_Pictographic# 1.1 [1] (‼️) double exclamation mark
2049 ; Extended_Pictographic# 3.0 [1] (⁉️) exclamation question mark
2122 ; Extended_Pictographic# 1.1 [1] (™️) trade mark
2139 ; Extended_Pictographic# 3.0 [1] (ℹ️) information
2194..2199 ; Extended_Pictographic# 1.1 [6] (↔️..↙️) left-right arrow..down-left arrow
21A9..21AA ; Extended_Pictographic# 1.1 [2] (↩️..↪️) right arrow curving left..left arrow curving right
231A..231B ; Extended_Pictographic# 1.1 [2] (⌚..⌛) watch..hourglass done
2328 ; Extended_Pictographic# 1.1 [1] (⌨️) keyboard
2388 ; Extended_Pictographic# 3.0 [1] (⎈️) HELM SYMBOL
23CF ; Extended_Pictographic# 4.0 [1] (⏏️) eject button
23E9..23F3 ; Extended_Pictographic# 6.0 [11] (⏩..⏳) fast-forward button..hourglass not done
23F8..23FA ; Extended_Pictographic# 7.0 [3] (⏸️..⏺️) pause button..record button
24C2 ; Extended_Pictographic# 1.1 [1] (Ⓜ️) circled M
25AA..25AB ; Extended_Pictographic# 1.1 [2] (▪️..▫️) black small square..white small square
25B6 ; Extended_Pictographic# 1.1 [1] (▶️) play button
25C0 ; Extended_Pictographic# 1.1 [1] (◀️) reverse button
25FB..25FE ; Extended_Pictographic# 3.2 [4] (◻️..◾) white medium square..black medium-small square
2600..2605 ; Extended_Pictographic# 1.1 [6] (☀️..★️) sun..BLACK STAR
2607..2612 ; Extended_Pictographic# 1.1 [12] (☇️..☒️) LIGHTNING..BALLOT BOX WITH X
2614..2615 ; Extended_Pictographic# 4.0 [2] (☔..☕) umbrella with rain drops..hot beverage
2616..2617 ; Extended_Pictographic# 3.2 [2] (☖️..☗️) WHITE SHOGI PIECE..BLACK SHOGI PIECE
2618 ; Extended_Pictographic# 4.1 [1] (☘️) shamrock
2619 ; Extended_Pictographic# 3.0 [1] (☙️) REVERSED ROTATED FLORAL HEART BULLET
261A..266F ; Extended_Pictographic# 1.1 [86] (☚️..♯️) BLACK LEFT POINTING INDEX..MUSIC SHARP SIGN
2670..2671 ; Extended_Pictographic# 3.0 [2] (♰️..♱️) WEST SYRIAC CROSS..EAST SYRIAC CROSS
2672..267D ; Extended_Pictographic# 3.2 [12] (♲️..♽️) UNIVERSAL RECYCLING SYMBOL..PARTIALLY-RECYCLED PAPER SYMBOL
267E..267F ; Extended_Pictographic# 4.1 [2] (♾️..♿) infinity..wheelchair symbol
2680..2685 ; Extended_Pictographic# 3.2 [6] (⚀️..⚅️) DIE FACE-1..DIE FACE-6
2690..2691 ; Extended_Pictographic# 4.0 [2] (⚐️..⚑️) WHITE FLAG..BLACK FLAG
2692..269C ; Extended_Pictographic# 4.1 [11] (⚒️..⚜️) hammer and pick..fleur-de-lis
269D ; Extended_Pictographic# 5.1 [1] (⚝️) OUTLINED WHITE STAR
269E..269F ; Extended_Pictographic# 5.2 [2] (⚞️..⚟️) THREE LINES CONVERGING RIGHT..THREE LINES CONVERGING LEFT
26A0..26A1 ; Extended_Pictographic# 4.0 [2] (⚠️..⚡) warning..high voltage
26A2..26B1 ; Extended_Pictographic# 4.1 [16] (⚢️..⚱️) DOUBLED FEMALE SIGN..funeral urn
26B2 ; Extended_Pictographic# 5.0 [1] (⚲️) NEUTER
26B3..26BC ; Extended_Pictographic# 5.1 [10] (⚳️..⚼️) CERES..SESQUIQUADRATE
26BD..26BF ; Extended_Pictographic# 5.2 [3] (⚽..⚿️) soccer ball..SQUARED KEY
26C0..26C3 ; Extended_Pictographic# 5.1 [4] (⛀️..⛃️) WHITE DRAUGHTS MAN..BLACK DRAUGHTS KING
26C4..26CD ; Extended_Pictographic# 5.2 [10] (⛄..⛍️) snowman without snow..DISABLED CAR
26CE ; Extended_Pictographic# 6.0 [1] (⛎) Ophiuchus
26CF..26E1 ; Extended_Pictographic# 5.2 [19] (⛏️..⛡️) pick..RESTRICTED LEFT ENTRY-2
26E2 ; Extended_Pictographic# 6.0 [1] (⛢️) ASTRONOMICAL SYMBOL FOR URANUS
26E3 ; Extended_Pictographic# 5.2 [1] (⛣️) HEAVY CIRCLE WITH STROKE AND TWO DOTS ABOVE
26E4..26E7 ; Extended_Pictographic# 6.0 [4] (⛤️..⛧️) PENTAGRAM..INVERTED PENTAGRAM
26E8..26FF ; Extended_Pictographic# 5.2 [24] (⛨️..⛿️) BLACK CROSS ON SHIELD..WHITE FLAG WITH HORIZONTAL MIDDLE BLACK STRIPE
2700 ; Extended_Pictographic# 7.0 [1] (✀️) BLACK SAFETY SCISSORS
2701..2704 ; Extended_Pictographic# 1.1 [4] (✁️..✄️) UPPER BLADE SCISSORS..WHITE SCISSORS
2705 ; Extended_Pictographic# 6.0 [1] (✅) white heavy check mark
2708..2709 ; Extended_Pictographic# 1.1 [2] (✈️..✉️) airplane..envelope
270A..270B ; Extended_Pictographic# 6.0 [2] (✊..✋) raised fist..raised hand
270C..2712 ; Extended_Pictographic# 1.1 [7] (✌️..✒️) victory hand..black nib
2714 ; Extended_Pictographic# 1.1 [1] (✔️) heavy check mark
2716 ; Extended_Pictographic# 1.1 [1] (✖️) heavy multiplication x
271D ; Extended_Pictographic# 1.1 [1] (✝️) latin cross
2721 ; Extended_Pictographic# 1.1 [1] (✡️) star of David
2728 ; Extended_Pictographic# 6.0 [1] (✨) sparkles
2733..2734 ; Extended_Pictographic# 1.1 [2] (✳️..✴️) eight-spoked asterisk..eight-pointed star
2744 ; Extended_Pictographic# 1.1 [1] (❄️) snowflake
2747 ; Extended_Pictographic# 1.1 [1] (❇️) sparkle
274C ; Extended_Pictographic# 6.0 [1] (❌) cross mark
274E ; Extended_Pictographic# 6.0 [1] (❎) cross mark button
2753..2755 ; Extended_Pictographic# 6.0 [3] (❓..❕) question mark..white exclamation mark
2757 ; Extended_Pictographic# 5.2 [1] (❗) exclamation mark
2763..2767 ; Extended_Pictographic# 1.1 [5] (❣️..❧️) heavy heart exclamation..ROTATED FLORAL HEART BULLET
2795..2797 ; Extended_Pictographic# 6.0 [3] (➕..➗) heavy plus sign..heavy division sign
27A1 ; Extended_Pictographic# 1.1 [1] (➡️) right arrow
27B0 ; Extended_Pictographic# 6.0 [1] (➰) curly loop
27BF ; Extended_Pictographic# 6.0 [1] (➿) double curly loop
2934..2935 ; Extended_Pictographic# 3.2 [2] (⤴️..⤵️) right arrow curving up..right arrow curving down
2B05..2B07 ; Extended_Pictographic# 4.0 [3] (⬅️..⬇️) left arrow..down arrow
2B1B..2B1C ; Extended_Pictographic# 5.1 [2] (⬛..⬜) black large square..white large square
2B50 ; Extended_Pictographic# 5.1 [1] (⭐) star
2B55 ; Extended_Pictographic# 5.2 [1] (⭕) heavy large circle
3030 ; Extended_Pictographic# 1.1 [1] (〰️) wavy dash
303D ; Extended_Pictographic# 3.2 [1] (〽️) part alternation mark
3297 ; Extended_Pictographic# 1.1 [1] (㊗️) Japanese “congratulations” button
3299 ; Extended_Pictographic# 1.1 [1] (㊙️) Japanese “secret” button
1F000..1F02B ; Extended_Pictographic# 5.1 [44] (🀀️..🀫️) MAHJONG TILE EAST WIND..MAHJONG TILE BACK
1F02C..1F02F ; Extended_Pictographic# NA [4] (🀬️..🀯️) <reserved-1F02C>..<reserved-1F02F>
1F030..1F093 ; Extended_Pictographic# 5.1[100] (🀰️..🂓️) DOMINO TILE HORIZONTAL BACK..DOMINO TILE VERTICAL-06-06
1F094..1F09F ; Extended_Pictographic# NA [12] (🂔️..🂟️) <reserved-1F094>..<reserved-1F09F>
1F0A0..1F0AE ; Extended_Pictographic# 6.0 [15] (🂠️..🂮️) PLAYING CARD BACK..PLAYING CARD KING OF SPADES
1F0AF..1F0B0 ; Extended_Pictographic# NA [2] (🂯️..🂰️) <reserved-1F0AF>..<reserved-1F0B0>
1F0B1..1F0BE ; Extended_Pictographic# 6.0 [14] (🂱️..🂾️) PLAYING CARD ACE OF HEARTS..PLAYING CARD KING OF HEARTS
1F0BF ; Extended_Pictographic# 7.0 [1] (🂿️) PLAYING CARD RED JOKER
1F0C0 ; Extended_Pictographic# NA [1] (🃀️) <reserved-1F0C0>
1F0C1..1F0CF ; Extended_Pictographic# 6.0 [15] (🃁️..🃏) PLAYING CARD ACE OF DIAMONDS..joker
1F0D0 ; Extended_Pictographic# NA [1] (🃐️) <reserved-1F0D0>
1F0D1..1F0DF ; Extended_Pictographic# 6.0 [15] (🃑️..🃟️) PLAYING CARD ACE OF CLUBS..PLAYING CARD WHITE JOKER
1F0E0..1F0F5 ; Extended_Pictographic# 7.0 [22] (🃠️..🃵️) PLAYING CARD FOOL..PLAYING CARD TRUMP-21
1F0F6..1F0FF ; Extended_Pictographic# NA [10] (🃶️..🃿️) <reserved-1F0F6>..<reserved-1F0FF>
1F10D..1F10F ; Extended_Pictographic# NA [3] (🄍️..🄏️) <reserved-1F10D>..<reserved-1F10F>
1F12F ; Extended_Pictographic# 11.0 [1] (🄯️) COPYLEFT SYMBOL
1F16C..1F16F ; Extended_Pictographic# NA [4] (🅬️..🅯️) <reserved-1F16C>..<reserved-1F16F>
1F170..1F171 ; Extended_Pictographic# 6.0 [2] (🅰️..🅱️) A button (blood type)..B button (blood type)
1F17E ; Extended_Pictographic# 6.0 [1] (🅾️) O button (blood type)
1F17F ; Extended_Pictographic# 5.2 [1] (🅿️) P button
1F18E ; Extended_Pictographic# 6.0 [1] (🆎) AB button (blood type)
1F191..1F19A ; Extended_Pictographic# 6.0 [10] (🆑..🆚) CL button..VS button
1F1AD..1F1E5 ; Extended_Pictographic# NA [57] (🆭️..🇥️) <reserved-1F1AD>..<reserved-1F1E5>
1F201..1F202 ; Extended_Pictographic# 6.0 [2] (🈁..🈂️) Japanese “here” button..Japanese “service charge” button
1F203..1F20F ; Extended_Pictographic# NA [13] (🈃️..🈏️) <reserved-1F203>..<reserved-1F20F>
1F21A ; Extended_Pictographic# 5.2 [1] (🈚) Japanese “free of charge” button
1F22F ; Extended_Pictographic# 5.2 [1] (🈯) Japanese “reserved” button
1F232..1F23A ; Extended_Pictographic# 6.0 [9] (🈲..🈺) Japanese “prohibited” button..Japanese “open for business” button
1F23C..1F23F ; Extended_Pictographic# NA [4] (🈼️..🈿️) <reserved-1F23C>..<reserved-1F23F>
1F249..1F24F ; Extended_Pictographic# NA [7] (🉉️..🉏️) <reserved-1F249>..<reserved-1F24F>
1F250..1F251 ; Extended_Pictographic# 6.0 [2] (🉐..🉑) Japanese “bargain” button..Japanese “acceptable” button
1F252..1F25F ; Extended_Pictographic# NA [14] (🉒️..🉟️) <reserved-1F252>..<reserved-1F25F>
1F260..1F265 ; Extended_Pictographic# 10.0 [6] (🉠️..🉥️) ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
1F266..1F2FF ; Extended_Pictographic# NA[154] (🉦️..🋿️) <reserved-1F266>..<reserved-1F2FF>
1F300..1F320 ; Extended_Pictographic# 6.0 [33] (🌀..🌠) cyclone..shooting star
1F321..1F32C ; Extended_Pictographic# 7.0 [12] (🌡️..🌬️) thermometer..wind face
1F32D..1F32F ; Extended_Pictographic# 8.0 [3] (🌭..🌯) hot dog..burrito
1F330..1F335 ; Extended_Pictographic# 6.0 [6] (🌰..🌵) chestnut..cactus
1F336 ; Extended_Pictographic# 7.0 [1] (🌶️) hot pepper
1F337..1F37C ; Extended_Pictographic# 6.0 [70] (🌷..🍼) tulip..baby bottle
1F37D ; Extended_Pictographic# 7.0 [1] (🍽️) fork and knife with plate
1F37E..1F37F ; Extended_Pictographic# 8.0 [2] (🍾..🍿) bottle with popping cork..popcorn
1F380..1F393 ; Extended_Pictographic# 6.0 [20] (🎀..🎓) ribbon..graduation cap
1F394..1F39F ; Extended_Pictographic# 7.0 [12] (🎔️..🎟️) HEART WITH TIP ON THE LEFT..admission tickets
1F3A0..1F3C4 ; Extended_Pictographic# 6.0 [37] (🎠..🏄) carousel horse..person surfing
1F3C5 ; Extended_Pictographic# 7.0 [1] (🏅) sports medal
1F3C6..1F3CA ; Extended_Pictographic# 6.0 [5] (🏆..🏊) trophy..person swimming
1F3CB..1F3CE ; Extended_Pictographic# 7.0 [4] (🏋️..🏎️) person lifting weights..racing car
1F3CF..1F3D3 ; Extended_Pictographic# 8.0 [5] (🏏..🏓) cricket game..ping pong
1F3D4..1F3DF ; Extended_Pictographic# 7.0 [12] (🏔️..🏟️) snow-capped mountain..stadium
1F3E0..1F3F0 ; Extended_Pictographic# 6.0 [17] (🏠..🏰) house..castle
1F3F1..1F3F7 ; Extended_Pictographic# 7.0 [7] (🏱️..🏷️) WHITE PENNANT..label
1F3F8..1F3FA ; Extended_Pictographic# 8.0 [3] (🏸..🏺) badminton..amphora
1F400..1F43E ; Extended_Pictographic# 6.0 [63] (🐀..🐾) rat..paw prints
1F43F ; Extended_Pictographic# 7.0 [1] (🐿️) chipmunk
1F440 ; Extended_Pictographic# 6.0 [1] (👀) eyes
1F441 ; Extended_Pictographic# 7.0 [1] (👁️) eye
1F442..1F4F7 ; Extended_Pictographic# 6.0[182] (👂..📷) ear..camera
1F4F8 ; Extended_Pictographic# 7.0 [1] (📸) camera with flash
1F4F9..1F4FC ; Extended_Pictographic# 6.0 [4] (📹..📼) video camera..videocassette
1F4FD..1F4FE ; Extended_Pictographic# 7.0 [2] (📽️..📾️) film projector..PORTABLE STEREO
1F4FF ; Extended_Pictographic# 8.0 [1] (📿) prayer beads
1F500..1F53D ; Extended_Pictographic# 6.0 [62] (🔀..🔽) shuffle tracks button..downwards button
1F546..1F54A ; Extended_Pictographic# 7.0 [5] (🕆️..🕊️) WHITE LATIN CROSS..dove
1F54B..1F54F ; Extended_Pictographic# 8.0 [5] (🕋..🕏️) kaaba..BOWL OF HYGIEIA
1F550..1F567 ; Extended_Pictographic# 6.0 [24] (🕐..🕧) one o’clock..twelve-thirty
1F568..1F579 ; Extended_Pictographic# 7.0 [18] (🕨️..🕹️) RIGHT SPEAKER..joystick
1F57A ; Extended_Pictographic# 9.0 [1] (🕺) man dancing
1F57B..1F5A3 ; Extended_Pictographic# 7.0 [41] (🕻️..🖣️) LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
1F5A4 ; Extended_Pictographic# 9.0 [1] (🖤) black heart
1F5A5..1F5FA ; Extended_Pictographic# 7.0 [86] (🖥️..🗺️) desktop computer..world map
1F5FB..1F5FF ; Extended_Pictographic# 6.0 [5] (🗻..🗿) mount fuji..moai
1F600 ; Extended_Pictographic# 6.1 [1] (😀) grinning face
1F601..1F610 ; Extended_Pictographic# 6.0 [16] (😁..😐) beaming face with smiling eyes..neutral face
1F611 ; Extended_Pictographic# 6.1 [1] (😑) expressionless face
1F612..1F614 ; Extended_Pictographic# 6.0 [3] (😒..😔) unamused face..pensive face
1F615 ; Extended_Pictographic# 6.1 [1] (😕) confused face
1F616 ; Extended_Pictographic# 6.0 [1] (😖) confounded face
1F617 ; Extended_Pictographic# 6.1 [1] (😗) kissing face
1F618 ; Extended_Pictographic# 6.0 [1] (😘) face blowing a kiss
1F619 ; Extended_Pictographic# 6.1 [1] (😙) kissing face with smiling eyes
1F61A ; Extended_Pictographic# 6.0 [1] (😚) kissing face with closed eyes
1F61B ; Extended_Pictographic# 6.1 [1] (😛) face with tongue
1F61C..1F61E ; Extended_Pictographic# 6.0 [3] (😜..😞) winking face with tongue..disappointed face
1F61F ; Extended_Pictographic# 6.1 [1] (😟) worried face
1F620..1F625 ; Extended_Pictographic# 6.0 [6] (😠..😥) angry face..sad but relieved face
1F626..1F627 ; Extended_Pictographic# 6.1 [2] (😦..😧) frowning face with open mouth..anguished face
1F628..1F62B ; Extended_Pictographic# 6.0 [4] (😨..😫) fearful face..tired face
1F62C ; Extended_Pictographic# 6.1 [1] (😬) grimacing face
1F62D ; Extended_Pictographic# 6.0 [1] (😭) loudly crying face
1F62E..1F62F ; Extended_Pictographic# 6.1 [2] (😮..😯) face with open mouth..hushed face
1F630..1F633 ; Extended_Pictographic# 6.0 [4] (😰..😳) anxious face with sweat..flushed face
1F634 ; Extended_Pictographic# 6.1 [1] (😴) sleeping face
1F635..1F640 ; Extended_Pictographic# 6.0 [12] (😵..🙀) dizzy face..weary cat face
1F641..1F642 ; Extended_Pictographic# 7.0 [2] (🙁..🙂) slightly frowning face..slightly smiling face
1F643..1F644 ; Extended_Pictographic# 8.0 [2] (🙃..🙄) upside-down face..face with rolling eyes
1F645..1F64F ; Extended_Pictographic# 6.0 [11] (🙅..🙏) person gesturing NO..folded hands
1F680..1F6C5 ; Extended_Pictographic# 6.0 [70] (🚀..🛅) rocket..left luggage
1F6C6..1F6CF ; Extended_Pictographic# 7.0 [10] (🛆️..🛏️) TRIANGLE WITH ROUNDED CORNERS..bed
1F6D0 ; Extended_Pictographic# 8.0 [1] (🛐) place of worship
1F6D1..1F6D2 ; Extended_Pictographic# 9.0 [2] (🛑..🛒) stop sign..shopping cart
1F6D3..1F6D4 ; Extended_Pictographic# 10.0 [2] (🛓️..🛔️) STUPA..PAGODA
1F6D5..1F6DF ; Extended_Pictographic# NA [11] (🛕️..🛟️) <reserved-1F6D5>..<reserved-1F6DF>
1F6E0..1F6EC ; Extended_Pictographic# 7.0 [13] (🛠️..🛬) hammer and wrench..airplane arrival
1F6ED..1F6EF ; Extended_Pictographic# NA [3] (🛭️..🛯️) <reserved-1F6ED>..<reserved-1F6EF>
1F6F0..1F6F3 ; Extended_Pictographic# 7.0 [4] (🛰️..🛳️) satellite..passenger ship
1F6F4..1F6F6 ; Extended_Pictographic# 9.0 [3] (🛴..🛶) kick scooter..canoe
1F6F7..1F6F8 ; Extended_Pictographic# 10.0 [2] (🛷..🛸) sled..flying saucer
1F6F9 ; Extended_Pictographic# 11.0 [1] (🛹) skateboard
1F6FA..1F6FF ; Extended_Pictographic# NA [6] (🛺️..🛿️) <reserved-1F6FA>..<reserved-1F6FF>
1F774..1F77F ; Extended_Pictographic# NA [12] (🝴️..🝿️) <reserved-1F774>..<reserved-1F77F>
1F7D5..1F7D8 ; Extended_Pictographic# 11.0 [4] (🟕️..🟘️) CIRCLED TRIANGLE..NEGATIVE CIRCLED SQUARE
1F7D9..1F7FF ; Extended_Pictographic# NA [39] (🟙️..🟿️) <reserved-1F7D9>..<reserved-1F7FF>
1F80C..1F80F ; Extended_Pictographic# NA [4] (🠌️..🠏️) <reserved-1F80C>..<reserved-1F80F>
1F848..1F84F ; Extended_Pictographic# NA [8] (🡈️..🡏️) <reserved-1F848>..<reserved-1F84F>
1F85A..1F85F ; Extended_Pictographic# NA [6] (🡚️..🡟️) <reserved-1F85A>..<reserved-1F85F>
1F888..1F88F ; Extended_Pictographic# NA [8] (🢈️..🢏️) <reserved-1F888>..<reserved-1F88F>
1F8AE..1F8FF ; Extended_Pictographic# NA [82] (🢮️..🣿️) <reserved-1F8AE>..<reserved-1F8FF>
1F90C..1F90F ; Extended_Pictographic# NA [4] (🤌️..🤏️) <reserved-1F90C>..<reserved-1F90F>
1F910..1F918 ; Extended_Pictographic# 8.0 [9] (🤐..🤘) zipper-mouth face..sign of the horns
1F919..1F91E ; Extended_Pictographic# 9.0 [6] (🤙..🤞) call me hand..crossed fingers
1F91F ; Extended_Pictographic# 10.0 [1] (🤟) love-you gesture
1F920..1F927 ; Extended_Pictographic# 9.0 [8] (🤠..🤧) cowboy hat face..sneezing face
1F928..1F92F ; Extended_Pictographic# 10.0 [8] (🤨..🤯) face with raised eyebrow..exploding head
1F930 ; Extended_Pictographic# 9.0 [1] (🤰) pregnant woman
1F931..1F932 ; Extended_Pictographic# 10.0 [2] (🤱..🤲) breast-feeding..palms up together
1F933..1F93A ; Extended_Pictographic# 9.0 [8] (🤳..🤺) selfie..person fencing
1F93C..1F93E ; Extended_Pictographic# 9.0 [3] (🤼..🤾) people wrestling..person playing handball
1F93F ; Extended_Pictographic# NA [1] (🤿️) <reserved-1F93F>
1F940..1F945 ; Extended_Pictographic# 9.0 [6] (🥀..🥅) wilted flower..goal net
1F947..1F94B ; Extended_Pictographic# 9.0 [5] (🥇..🥋) 1st place medal..martial arts uniform
1F94C ; Extended_Pictographic# 10.0 [1] (🥌) curling stone
1F94D..1F94F ; Extended_Pictographic# 11.0 [3] (🥍..🥏) lacrosse..flying disc
1F950..1F95E ; Extended_Pictographic# 9.0 [15] (🥐..🥞) croissant..pancakes
1F95F..1F96B ; Extended_Pictographic# 10.0 [13] (🥟..🥫) dumpling..canned food
1F96C..1F970 ; Extended_Pictographic# 11.0 [5] (🥬..🥰) leafy green..smiling face with 3 hearts
1F971..1F972 ; Extended_Pictographic# NA [2] (🥱️..🥲️) <reserved-1F971>..<reserved-1F972>
1F973..1F976 ; Extended_Pictographic# 11.0 [4] (🥳..🥶) partying face..cold face
1F977..1F979 ; Extended_Pictographic# NA [3] (🥷️..🥹️) <reserved-1F977>..<reserved-1F979>
1F97A ; Extended_Pictographic# 11.0 [1] (🥺) pleading face
1F97B ; Extended_Pictographic# NA [1] (🥻️) <reserved-1F97B>
1F97C..1F97F ; Extended_Pictographic# 11.0 [4] (🥼..🥿) lab coat..woman’s flat shoe
1F980..1F984 ; Extended_Pictographic# 8.0 [5] (🦀..🦄) crab..unicorn face
1F985..1F991 ; Extended_Pictographic# 9.0 [13] (🦅..🦑) eagle..squid
1F992..1F997 ; Extended_Pictographic# 10.0 [6] (🦒..🦗) giraffe..cricket
1F998..1F9A2 ; Extended_Pictographic# 11.0 [11] (🦘..🦢) kangaroo..swan
1F9A3..1F9AF ; Extended_Pictographic# NA [13] (🦣️..🦯️) <reserved-1F9A3>..<reserved-1F9AF>
1F9B0..1F9B9 ; Extended_Pictographic# 11.0 [10] (🦰..🦹) red-haired..supervillain
1F9BA..1F9BF ; Extended_Pictographic# NA [6] (🦺️..🦿️) <reserved-1F9BA>..<reserved-1F9BF>
1F9C0 ; Extended_Pictographic# 8.0 [1] (🧀) cheese wedge
1F9C1..1F9C2 ; Extended_Pictographic# 11.0 [2] (🧁..🧂) cupcake..salt
1F9C3..1F9CF ; Extended_Pictographic# NA [13] (🧃️..🧏️) <reserved-1F9C3>..<reserved-1F9CF>
1F9D0..1F9E6 ; Extended_Pictographic# 10.0 [23] (🧐..🧦) face with monocle..socks
1F9E7..1F9FF ; Extended_Pictographic# 11.0 [25] (🧧..🧿) red envelope..nazar amulet
1FA00..1FA5F ; Extended_Pictographic# NA [96] (🨀️..🩟️) <reserved-1FA00>..<reserved-1FA5F>
1FA60..1FA6D ; Extended_Pictographic# 11.0 [14] (🩠️..🩭️) XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER
1FA6E..1FFFD ; Extended_Pictographic# NA[1424] (🩮️..🿽️) <reserved-1FA6E>..<reserved-1FFFD>
# Total elements: 3793
#EOF
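
Each data line in the file above has the shape "first[..last] ; Property # comment". The following is a minimal sketch, not code from this commit, of how one such line could be read in C. The function name parse_property_line and its signature are hypothetical; the only assumption about the format is what is visible in the lines above, including that the property name may be followed directly by '#' with no space.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one data line such as
     "1F3FB..1F3FF ; Emoji_Modifier # 8.0 [5] ..."
     "00A9 ; Extended_Pictographic# 1.1 [1] ..."
   Returns 1 and stores the inclusive range in *lo and *hi when the line
   assigns the property named by prop. Returns 0 for comments, blank lines,
   other properties and malformed input (*lo and *hi are then unspecified). */
static int parse_property_line(const char *line, const char *prop,
                               unsigned long *lo, unsigned long *hi)
{
  char *end;
  size_t n;

  assert(line != NULL && prop != NULL && lo != NULL && hi != NULL);

  while (isspace((unsigned char)*line)) line++;
  if (!isxdigit((unsigned char)*line)) return 0;  /* comment, blank, junk */

  *lo = strtoul(line, &end, 16);                  /* first code point */
  *hi = *lo;                                      /* single-value default */
  if (end[0] == '.' && end[1] == '.')
    {
    const char *p = end + 2;
    if (!isxdigit((unsigned char)*p)) return 0;
    *hi = strtoul(p, &end, 16);                   /* last code point */
    }

  while (isspace((unsigned char)*end)) end++;
  if (*end != ';') return 0;
  end++;
  while (isspace((unsigned char)*end)) end++;

  /* The property name ends at whitespace, '#', or end of line. */
  n = strcspn(end, " \t\r\n#");
  return n == strlen(prop) && strncmp(end, prop, n) == 0 && *lo <= *hi;
}
```

Comparing the full token length, rather than a prefix, keeps Emoji_Modifier from matching Emoji_Modifier_Base lines.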

View File

@ -2,7 +2,7 @@
* A program for testing the Unicode property table *
***************************************************/
/* Copyright (c) University of Cambridge 2008 - 2014 */
/* Copyright (c) University of Cambridge 2008 - 2018 */
/* Compile thus:
gcc -DHAVE_CONFIG_H -DPCRE2_CODE_UNIT_WIDTH=8 -o ucptest \
@ -123,7 +123,13 @@ switch(gbprop)
case ucp_gbT: graphbreak = US"Hangul syllable type T"; break;
case ucp_gbLV: graphbreak = US"Hangul syllable type LV"; break;
case ucp_gbLVT: graphbreak = US"Hangul syllable type LVT"; break;
case ucp_gbRegionalIndicator:
graphbreak = US"Regional Indicator"; break;
case ucp_gbOther: graphbreak = US"Other"; break;
case ucp_gbZWJ: graphbreak = US"Zero Width Joiner"; break;
case ucp_gbExtended_Pictographic:
graphbreak = US"Extended Pictographic"; break;
default: graphbreak = US"Unknown"; break;
}
switch(script)
@ -268,6 +274,27 @@ switch(script)
case ucp_Multani: scriptname = US"Multani"; break;
case ucp_Old_Hungarian: scriptname = US"Old_Hungarian"; break;
case ucp_SignWriting: scriptname = US"SignWriting"; break;
/* New for Unicode 10.0.0 (no update since 8.0.0) */
case ucp_Adlam: scriptname = US"Adlam"; break;
case ucp_Bhaiksuki: scriptname = US"Bhaiksuki"; break;
case ucp_Marchen: scriptname = US"Marchen"; break;
case ucp_Newa: scriptname = US"Newa"; break;
case ucp_Osage: scriptname = US"Osage"; break;
case ucp_Tangut: scriptname = US"Tangut"; break;
case ucp_Masaram_Gondi: scriptname = US"Masaram_Gondi"; break;
case ucp_Nushu: scriptname = US"Nushu"; break;
case ucp_Soyombo: scriptname = US"Soyombo"; break;
case ucp_Zanabazar_Square: scriptname = US"Zanabazar_Square"; break;
/* New for Unicode 11.0.0 */
case ucp_Dogra: scriptname = US"Dogra"; break;
case ucp_Gunjala_Gondi: scriptname = US"Gunjala_Gondi"; break;
case ucp_Hanifi_Rohingya: scriptname = US"Hanifi_Rohingya"; break;
case ucp_Makasar: scriptname = US"Makasar"; break;
case ucp_Medefaidrin: scriptname = US"Medefaidrin"; break;
case ucp_Old_Sogdian: scriptname = US"Old_Sogdian"; break;
case ucp_Sogdian: scriptname = US"Sogdian"; break;
}
printf("%04x %s: %s, %s, %s", c, typename, fulltypename, scriptname, graphbreak);

View File

@ -36,3 +36,5 @@ findprop 0d 0a 0e 0711 1b04 1111 1169 11fe ae4c ad89
findprop 118a0 11ac7 16ad0
findprop 11700 14400 108e0 11280 1d800
findprop 11800 1e903 11da9 10d27 11ee0 16e48 10f27 10f30

View File

@ -179,12 +179,12 @@ findprop a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
00a6 Symbol: Other symbol, Common, Other
00a7 Punctuation: Other punctuation, Common, Other
00a8 Symbol: Modifier symbol, Common, Other
00a9 Symbol: Other symbol, Common, Other
00a9 Symbol: Other symbol, Common, Extended Pictographic
00aa Letter: Other letter, Latin, Other
00ab Punctuation: Initial punctuation, Common, Other
00ac Symbol: Mathematical symbol, Common, Other
00ad Control: Format, Common, Control
00ae Symbol: Other symbol, Common, Other
00ae Symbol: Other symbol, Common, Extended Pictographic
00af Symbol: Modifier symbol, Common, Other
findprop b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
00b0 Symbol: Other symbol, Common, Other
@ -369,3 +369,13 @@ findprop 11700 14400 108e0 11280 1d800
108e0 Letter: Other letter, Hatran, Other
11280 Letter: Other letter, Multani, Other
1d800 Symbol: Other symbol, SignWriting, Other
findprop 11800 1e903 11da9 10d27 11ee0 16e48 10f27 10f30
11800 Letter: Other letter, Dogra, Other
1e903 Letter: Upper case letter, Adlam, Other, 1e925
11da9 Number: Decimal number, Gunjala_Gondi, Other
10d27 Mark: Non-spacing mark, Hanifi_Rohingya, Extend
11ee0 Letter: Other letter, Makasar, Other
16e48 Letter: Upper case letter, Medefaidrin, Other, 16e68
10f27 Letter: Other letter, Old_Sogdian, Other
10f30 Letter: Other letter, Sogdian, Other

View File

@ -129,13 +129,13 @@ while (eptr < end_subject)
if ((ricount & 1) != 0) break; /* Grapheme break required */
}
/* If Extend follows E_Base[_GAZ] do not update lgb; this allows
any number of Extend before a following E_Modifier. */
/* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this
allows any number of them before a following Extended_Pictographic. */
if (rgb != ucp_gbExtend ||
(lgb != ucp_gbE_Base && lgb != ucp_gbE_Base_GAZ))
if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) ||
lgb != ucp_gbExtended_Pictographic)
lgb = rgb;
eptr += len;
if (xcount != NULL) *xcount += 1;
}

View File

@ -1901,7 +1901,7 @@ extern const ucd_record PRIV(ucd_records)[];
#if PCRE2_CODE_UNIT_WIDTH == 32
extern const ucd_record PRIV(dummy_ucd_record)[];
#endif
extern const uint8_t PRIV(ucd_stage1)[];
extern const uint16_t PRIV(ucd_stage1)[];
extern const uint16_t PRIV(ucd_stage2)[];
extern const uint32_t PRIV(ucp_gbtable)[];
extern const uint32_t PRIV(ucp_gentype)[];

View File

@ -3666,7 +3666,8 @@ if (!common->utf)
#endif
OP2(SLJIT_LSHR, TMP2, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_SHIFT);
OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1));
OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP2, 0);
OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1));
OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_MASK);
OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCD_BLOCK_SHIFT);
OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0);
@ -6627,7 +6628,8 @@ if (needstype || needsscript)
#endif
OP2(SLJIT_LSHR, TMP2, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_SHIFT);
OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1));
OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP2, 0);
OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1));
OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_MASK);
OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCD_BLOCK_SHIFT);
OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0);
@ -7254,12 +7256,13 @@ while (cc < end_subject)
if ((ricount & 1) != 0) break; /* Grapheme break required */
}
/* If Extend follows E_Base[_GAZ] do not update lgb; this allows
any number of Extend before a following E_Modifier. */
if (rgb != ucp_gbExtend || (lgb != ucp_gbE_Base && lgb != ucp_gbE_Base_GAZ))
/* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this
allows any number of them before a following Extended_Pictographic. */
if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) ||
lgb != ucp_gbExtended_Pictographic)
lgb = rgb;
prevcc = cc;
cc += len;
}
@ -7309,12 +7312,13 @@ while (cc < end_subject)
if ((ricount & 1) != 0) break; /* Grapheme break required */
}
/* If Extend follows E_Base[_GAZ] do not update lgb; this allows
any number of Extend before a following E_Modifier. */
if (rgb != ucp_gbExtend || (lgb != ucp_gbE_Base && lgb != ucp_gbE_Base_GAZ))
/* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this
allows any number of them before a following Extended_Pictographic. */
if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) ||
lgb != ucp_gbExtended_Pictographic)
lgb = rgb;
cc++;
}
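
Note on the ucd_stage1 change above: the stage-1 table is widened from uint8_t to uint16_t, which is why the JIT hunks now double the block index (the added SLJIT_ADD of TMP2 to itself) before a 16-bit load. A minimal sketch of the same two-stage index calculation in plain C follows. It is not the PCRE2 source: the names BLOCK_SHIFT, stage2_index and stage1_byte_offset are hypothetical, the block size of 128 is an assumption, and the idea that the widening is needed because a block number no longer fits in a byte is an inference from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed block geometry: 128 code points per stage-2 block. */
enum { BLOCK_SHIFT = 7, BLOCK_SIZE = 1 << BLOCK_SHIFT, BLOCK_MASK = BLOCK_SIZE - 1 };

/* Stage 1 maps (code point / block size) to a block number; the block number
   times the block size, plus the offset within the block, indexes stage 2. */
size_t stage2_index(const uint16_t *stage1, uint32_t c)
{
  size_t block = stage1[c >> BLOCK_SHIFT];
  return (block << BLOCK_SHIFT) + (c & BLOCK_MASK);
}

/* Byte offset of the stage-1 entry. With 16-bit entries the element index
   must be multiplied by two, which is what the added JIT instruction does. */
size_t stage1_byte_offset(uint32_t c)
{
  size_t index = c >> BLOCK_SHIFT;
  assert(sizeof(uint16_t) == 2);
  return index + index;
}
```

With a toy stage-1 table of {0, 300, 1}, code point 0x85 falls in entry 1, so its stage-2 index is 300 * 128 + 5. A block number such as 300 cannot be stored in a uint8_t entry.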

View File

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2017 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -137,9 +137,10 @@ const uint32_t PRIV(ucp_gentype)[] = {
/* This table encodes the rules for finding the end of an extended grapheme
cluster. Every code point has a grapheme break property which is one of the
ucp_gbXX values defined in pcre2_ucp.h. The 2-dimensional table is indexed by
the properties of two adjacent code points. The left property selects a word
from the table, and the right property selects a bit from that word like this:
ucp_gbXX values defined in pcre2_ucp.h. These changed between Unicode versions
10 and 11. The 2-dimensional table is indexed by the properties of two adjacent
code points. The left property selects a word from the table, and the right
property selects a bit from that word like this:
PRIV(ucp_gbtable)[left-property] & (1 << right-property)
@ -166,49 +167,41 @@ are implementing).
6. Do not break after Prepend characters.
7. Do not break within emoji modifier sequences (E_Base or E_Base_GAZ followed
by E_Modifier). Extend characters are allowed before the modifier; this
cannot be represented in this table, the code has to deal with it.
7. Do not break within emoji modifier sequences or emoji zwj sequences. That
is, do not break between characters with the Extended_Pictographic property.
Extend and ZWJ characters are allowed between the characters; this cannot be
represented in this table, the code has to deal with it.
8. Do not break within emoji zwj sequences (ZWJ followed by Glue_After_Zwj or
E_Base_GAZ).
9. Do not break within emoji flag sequences. That is, do not break between
8. Do not break within emoji flag sequences. That is, do not break between
regional indicator (RI) symbols if there are an odd number of RI characters
before the break point. This table encodes "join RI characters"; the code
has to deal with checking for previous adjoining RIs.
10. Otherwise, break everywhere.
9. Otherwise, break everywhere.
*/
#define ESZ (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbZWJ)
const uint32_t PRIV(ucp_gbtable)[] = {
(1<<ucp_gbLF), /* 0 CR */
0, /* 1 LF */
0, /* 2 Control */
ESZ, /* 3 Extend */
ESZ|(1<<ucp_gbPrepend)| /* 4 Prepend */
(1<<ucp_gbLF), /* 0 CR */
0, /* 1 LF */
0, /* 2 Control */
ESZ, /* 3 Extend */
ESZ|(1<<ucp_gbPrepend)| /* 4 Prepend */
(1<<ucp_gbL)|(1<<ucp_gbV)|(1<<ucp_gbT)|
(1<<ucp_gbLV)|(1<<ucp_gbLVT)|(1<<ucp_gbOther)|
(1<<ucp_gbRegionalIndicator)|
(1<<ucp_gbE_Base)|(1<<ucp_gbE_Modifier)|
(1<<ucp_gbE_Base_GAZ)|
(1<<ucp_gbZWJ)|(1<<ucp_gbGlue_After_Zwj),
ESZ, /* 5 SpacingMark */
ESZ|(1<<ucp_gbL)|(1<<ucp_gbV)|(1<<ucp_gbLV)| /* 6 L */
(1<<ucp_gbRegionalIndicator),
ESZ, /* 5 SpacingMark */
ESZ|(1<<ucp_gbL)|(1<<ucp_gbV)|(1<<ucp_gbLV)| /* 6 L */
(1<<ucp_gbLVT),
ESZ|(1<<ucp_gbV)|(1<<ucp_gbT), /* 7 V */
ESZ|(1<<ucp_gbT), /* 8 T */
ESZ|(1<<ucp_gbV)|(1<<ucp_gbT), /* 9 LV */
ESZ|(1<<ucp_gbT), /* 10 LVT */
(1<<ucp_gbRegionalIndicator), /* 11 RegionalIndicator */
ESZ, /* 12 Other */
ESZ|(1<<ucp_gbE_Modifier), /* 13 E_Base */
ESZ, /* 14 E_Modifier */
ESZ|(1<<ucp_gbE_Modifier), /* 15 E_Base_GAZ */
ESZ|(1<<ucp_gbGlue_After_Zwj)|(1<<ucp_gbE_Base_GAZ), /* 16 ZWJ */
ESZ /* 12 Glue_After_Zwj */
ESZ|(1<<ucp_gbV)|(1<<ucp_gbT), /* 7 V */
ESZ|(1<<ucp_gbT), /* 8 T */
ESZ|(1<<ucp_gbV)|(1<<ucp_gbT), /* 9 LV */
ESZ|(1<<ucp_gbT), /* 10 LVT */
(1<<ucp_gbRegionalIndicator), /* 11 RegionalIndicator */
ESZ, /* 12 Other */
ESZ, /* 13 ZWJ */
ESZ|(1<<ucp_gbExtended_Pictographic) /* 14 Extended Pictographic */
};
#undef ESZ
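
Note on the table above: a minimal sketch, not the PCRE2 source, of how the bit table, the regional indicator count and the lgb update rule combine to find the end of an extended grapheme cluster. It works on an array of grapheme break property values instead of code points, so it needs no Unicode data. The names BIT, gbtable and cluster_end are hypothetical; the table rows are copied from the new version of the table in this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Grapheme break properties, numbered as in the new enum in this commit. */
enum {
  gbCR, gbLF, gbControl, gbExtend, gbPrepend, gbSpacingMark,
  gbL, gbV, gbT, gbLV, gbLVT, gbRegionalIndicator, gbOther,
  gbZWJ, gbExtended_Pictographic, gbCOUNT
};

#define BIT(p) (UINT32_C(1) << (p))
#define ESZ (BIT(gbExtend) | BIT(gbSpacingMark) | BIT(gbZWJ))

/* Row = left property. A set bit means "do not break before this right
   property". */
static const uint32_t gbtable[gbCOUNT] = {
  BIT(gbLF),                                            /* CR */
  0,                                                    /* LF */
  0,                                                    /* Control */
  ESZ,                                                  /* Extend */
  ESZ | BIT(gbPrepend) | BIT(gbL) | BIT(gbV) | BIT(gbT) |
    BIT(gbLV) | BIT(gbLVT) | BIT(gbOther) |
    BIT(gbRegionalIndicator),                           /* Prepend */
  ESZ,                                                  /* SpacingMark */
  ESZ | BIT(gbL) | BIT(gbV) | BIT(gbLV) | BIT(gbLVT),   /* L */
  ESZ | BIT(gbV) | BIT(gbT),                            /* V */
  ESZ | BIT(gbT),                                       /* T */
  ESZ | BIT(gbV) | BIT(gbT),                            /* LV */
  ESZ | BIT(gbT),                                       /* LVT */
  BIT(gbRegionalIndicator),                             /* RegionalIndicator */
  ESZ,                                                  /* Other */
  ESZ,                                                  /* ZWJ */
  ESZ | BIT(gbExtended_Pictographic)                    /* Extended_Pictographic */
};

/* Return the index one past the end of the cluster that starts at props[start]. */
size_t cluster_end(const int *props, size_t start, size_t n)
{
  assert(start < n);
  int lgb = props[start];
  size_t i = start + 1;

  while (i < n) {
    int rgb = props[i];
    if ((gbtable[lgb] & BIT(rgb)) == 0) break;   /* table says break here */

    /* Two RIs join only if an even number of RIs precede the left one. */
    if (lgb == gbRegionalIndicator && rgb == gbRegionalIndicator) {
      size_t ricount = 0;
      size_t b = i - 1;                          /* the left-hand RI */
      while (b > 0 && props[b - 1] == gbRegionalIndicator) { ricount++; b--; }
      if ((ricount & 1) != 0) break;
    }

    /* Keep lgb at Extended_Pictographic across any run of Extend or ZWJ. */
    if ((rgb != gbExtend && rgb != gbZWJ) || lgb != gbExtended_Pictographic)
      lgb = rgb;
    i++;
  }
  return i;
}
```

For the property sequence Extended_Pictographic, Extend, ZWJ, Extended_Pictographic, Other the sketch returns 4, which corresponds to the last new data line of the /^\X/utf test in testinput5. Without the final lgb rule the loop would stop at the second Extended_Pictographic, because the Extend and ZWJ rows of the table do not allow a following Extended_Pictographic.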
@ -282,6 +275,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Cyrillic0 STR_C STR_y STR_r STR_i STR_l STR_l STR_i STR_c "\0"
#define STRING_Deseret0 STR_D STR_e STR_s STR_e STR_r STR_e STR_t "\0"
#define STRING_Devanagari0 STR_D STR_e STR_v STR_a STR_n STR_a STR_g STR_a STR_r STR_i "\0"
#define STRING_Dogra0 STR_D STR_o STR_g STR_r STR_a "\0"
#define STRING_Duployan0 STR_D STR_u STR_p STR_l STR_o STR_y STR_a STR_n "\0"
#define STRING_Egyptian_Hieroglyphs0 STR_E STR_g STR_y STR_p STR_t STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Elbasan0 STR_E STR_l STR_b STR_a STR_s STR_a STR_n "\0"
@ -292,9 +286,11 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Grantha0 STR_G STR_r STR_a STR_n STR_t STR_h STR_a "\0"
#define STRING_Greek0 STR_G STR_r STR_e STR_e STR_k "\0"
#define STRING_Gujarati0 STR_G STR_u STR_j STR_a STR_r STR_a STR_t STR_i "\0"
#define STRING_Gunjala_Gondi0 STR_G STR_u STR_n STR_j STR_a STR_l STR_a STR_UNDERSCORE STR_G STR_o STR_n STR_d STR_i "\0"
#define STRING_Gurmukhi0 STR_G STR_u STR_r STR_m STR_u STR_k STR_h STR_i "\0"
#define STRING_Han0 STR_H STR_a STR_n "\0"
#define STRING_Hangul0 STR_H STR_a STR_n STR_g STR_u STR_l "\0"
#define STRING_Hanifi_Rohingya0 STR_H STR_a STR_n STR_i STR_f STR_i STR_UNDERSCORE STR_R STR_o STR_h STR_i STR_n STR_g STR_y STR_a "\0"
#define STRING_Hanunoo0 STR_H STR_a STR_n STR_u STR_n STR_o STR_o "\0"
#define STRING_Hatran0 STR_H STR_a STR_t STR_r STR_a STR_n "\0"
#define STRING_Hebrew0 STR_H STR_e STR_b STR_r STR_e STR_w "\0"
@ -330,6 +326,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Lydian0 STR_L STR_y STR_d STR_i STR_a STR_n "\0"
#define STRING_M0 STR_M "\0"
#define STRING_Mahajani0 STR_M STR_a STR_h STR_a STR_j STR_a STR_n STR_i "\0"
#define STRING_Makasar0 STR_M STR_a STR_k STR_a STR_s STR_a STR_r "\0"
#define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0"
#define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0"
#define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0"
@ -337,6 +334,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Masaram_Gondi0 STR_M STR_a STR_s STR_a STR_r STR_a STR_m STR_UNDERSCORE STR_G STR_o STR_n STR_d STR_i "\0"
#define STRING_Mc0 STR_M STR_c "\0"
#define STRING_Me0 STR_M STR_e "\0"
#define STRING_Medefaidrin0 STR_M STR_e STR_d STR_e STR_f STR_a STR_i STR_d STR_r STR_i STR_n "\0"
#define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0"
#define STRING_Mende_Kikakui0 STR_M STR_e STR_n STR_d STR_e STR_UNDERSCORE STR_K STR_i STR_k STR_a STR_k STR_u STR_i "\0"
#define STRING_Meroitic_Cursive0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_C STR_u STR_r STR_s STR_i STR_v STR_e "\0"
@ -364,6 +362,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Old_North_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_N STR_o STR_r STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Permic0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_m STR_i STR_c "\0"
#define STRING_Old_Persian0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_s STR_i STR_a STR_n "\0"
#define STRING_Old_Sogdian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_g STR_d STR_i STR_a STR_n "\0"
#define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0"
#define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0"
@ -397,6 +396,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Sk0 STR_S STR_k "\0"
#define STRING_Sm0 STR_S STR_m "\0"
#define STRING_So0 STR_S STR_o "\0"
#define STRING_Sogdian0 STR_S STR_o STR_g STR_d STR_i STR_a STR_n "\0"
#define STRING_Sora_Sompeng0 STR_S STR_o STR_r STR_a STR_UNDERSCORE STR_S STR_o STR_m STR_p STR_e STR_n STR_g "\0"
#define STRING_Soyombo0 STR_S STR_o STR_y STR_o STR_m STR_b STR_o "\0"
#define STRING_Sundanese0 STR_S STR_u STR_n STR_d STR_a STR_n STR_e STR_s STR_e "\0"
@ -469,6 +469,7 @@ const char PRIV(utt_names)[] =
STRING_Cyrillic0
STRING_Deseret0
STRING_Devanagari0
STRING_Dogra0
STRING_Duployan0
STRING_Egyptian_Hieroglyphs0
STRING_Elbasan0
@ -479,9 +480,11 @@ const char PRIV(utt_names)[] =
STRING_Grantha0
STRING_Greek0
STRING_Gujarati0
STRING_Gunjala_Gondi0
STRING_Gurmukhi0
STRING_Han0
STRING_Hangul0
STRING_Hanifi_Rohingya0
STRING_Hanunoo0
STRING_Hatran0
STRING_Hebrew0
@ -517,6 +520,7 @@ const char PRIV(utt_names)[] =
STRING_Lydian0
STRING_M0
STRING_Mahajani0
STRING_Makasar0
STRING_Malayalam0
STRING_Mandaic0
STRING_Manichaean0
@ -524,6 +528,7 @@ const char PRIV(utt_names)[] =
STRING_Masaram_Gondi0
STRING_Mc0
STRING_Me0
STRING_Medefaidrin0
STRING_Meetei_Mayek0
STRING_Mende_Kikakui0
STRING_Meroitic_Cursive0
@ -551,6 +556,7 @@ const char PRIV(utt_names)[] =
STRING_Old_North_Arabian0
STRING_Old_Permic0
STRING_Old_Persian0
STRING_Old_Sogdian0
STRING_Old_South_Arabian0
STRING_Old_Turkic0
STRING_Oriya0
@ -584,6 +590,7 @@ const char PRIV(utt_names)[] =
STRING_Sk0
STRING_Sm0
STRING_So0
STRING_Sogdian0
STRING_Sora_Sompeng0
STRING_Soyombo0
STRING_Sundanese0
@ -656,154 +663,161 @@ const ucp_type_table PRIV(utt)[] = {
{ 265, PT_SC, ucp_Cyrillic },
{ 274, PT_SC, ucp_Deseret },
{ 282, PT_SC, ucp_Devanagari },
{ 293, PT_SC, ucp_Duployan },
{ 302, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 323, PT_SC, ucp_Elbasan },
{ 331, PT_SC, ucp_Ethiopic },
{ 340, PT_SC, ucp_Georgian },
{ 349, PT_SC, ucp_Glagolitic },
{ 360, PT_SC, ucp_Gothic },
{ 367, PT_SC, ucp_Grantha },
{ 375, PT_SC, ucp_Greek },
{ 381, PT_SC, ucp_Gujarati },
{ 390, PT_SC, ucp_Gurmukhi },
{ 399, PT_SC, ucp_Han },
{ 403, PT_SC, ucp_Hangul },
{ 410, PT_SC, ucp_Hanunoo },
{ 418, PT_SC, ucp_Hatran },
{ 425, PT_SC, ucp_Hebrew },
{ 432, PT_SC, ucp_Hiragana },
{ 441, PT_SC, ucp_Imperial_Aramaic },
{ 458, PT_SC, ucp_Inherited },
{ 468, PT_SC, ucp_Inscriptional_Pahlavi },
{ 490, PT_SC, ucp_Inscriptional_Parthian },
{ 513, PT_SC, ucp_Javanese },
{ 522, PT_SC, ucp_Kaithi },
{ 529, PT_SC, ucp_Kannada },
{ 537, PT_SC, ucp_Katakana },
{ 546, PT_SC, ucp_Kayah_Li },
{ 555, PT_SC, ucp_Kharoshthi },
{ 566, PT_SC, ucp_Khmer },
{ 572, PT_SC, ucp_Khojki },
{ 579, PT_SC, ucp_Khudawadi },
{ 589, PT_GC, ucp_L },
{ 591, PT_LAMP, 0 },
{ 594, PT_SC, ucp_Lao },
{ 598, PT_SC, ucp_Latin },
{ 604, PT_SC, ucp_Lepcha },
{ 611, PT_SC, ucp_Limbu },
{ 617, PT_SC, ucp_Linear_A },
{ 626, PT_SC, ucp_Linear_B },
{ 635, PT_SC, ucp_Lisu },
{ 640, PT_PC, ucp_Ll },
{ 643, PT_PC, ucp_Lm },
{ 646, PT_PC, ucp_Lo },
{ 649, PT_PC, ucp_Lt },
{ 652, PT_PC, ucp_Lu },
{ 655, PT_SC, ucp_Lycian },
{ 662, PT_SC, ucp_Lydian },
{ 669, PT_GC, ucp_M },
{ 671, PT_SC, ucp_Mahajani },
{ 680, PT_SC, ucp_Malayalam },
{ 690, PT_SC, ucp_Mandaic },
{ 698, PT_SC, ucp_Manichaean },
{ 709, PT_SC, ucp_Marchen },
{ 717, PT_SC, ucp_Masaram_Gondi },
{ 731, PT_PC, ucp_Mc },
{ 734, PT_PC, ucp_Me },
{ 737, PT_SC, ucp_Meetei_Mayek },
{ 750, PT_SC, ucp_Mende_Kikakui },
{ 764, PT_SC, ucp_Meroitic_Cursive },
{ 781, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 802, PT_SC, ucp_Miao },
{ 807, PT_PC, ucp_Mn },
{ 810, PT_SC, ucp_Modi },
{ 815, PT_SC, ucp_Mongolian },
{ 825, PT_SC, ucp_Mro },
{ 829, PT_SC, ucp_Multani },
{ 837, PT_SC, ucp_Myanmar },
{ 845, PT_GC, ucp_N },
{ 847, PT_SC, ucp_Nabataean },
{ 857, PT_PC, ucp_Nd },
{ 860, PT_SC, ucp_New_Tai_Lue },
{ 872, PT_SC, ucp_Newa },
{ 877, PT_SC, ucp_Nko },
{ 881, PT_PC, ucp_Nl },
{ 884, PT_PC, ucp_No },
{ 887, PT_SC, ucp_Nushu },
{ 893, PT_SC, ucp_Ogham },
{ 899, PT_SC, ucp_Ol_Chiki },
{ 908, PT_SC, ucp_Old_Hungarian },
{ 922, PT_SC, ucp_Old_Italic },
{ 933, PT_SC, ucp_Old_North_Arabian },
{ 951, PT_SC, ucp_Old_Permic },
{ 962, PT_SC, ucp_Old_Persian },
{ 974, PT_SC, ucp_Old_South_Arabian },
{ 992, PT_SC, ucp_Old_Turkic },
{ 1003, PT_SC, ucp_Oriya },
{ 1009, PT_SC, ucp_Osage },
{ 1015, PT_SC, ucp_Osmanya },
{ 1023, PT_GC, ucp_P },
{ 1025, PT_SC, ucp_Pahawh_Hmong },
{ 1038, PT_SC, ucp_Palmyrene },
{ 1048, PT_SC, ucp_Pau_Cin_Hau },
{ 1060, PT_PC, ucp_Pc },
{ 1063, PT_PC, ucp_Pd },
{ 1066, PT_PC, ucp_Pe },
{ 1069, PT_PC, ucp_Pf },
{ 1072, PT_SC, ucp_Phags_Pa },
{ 1081, PT_SC, ucp_Phoenician },
{ 1092, PT_PC, ucp_Pi },
{ 1095, PT_PC, ucp_Po },
{ 1098, PT_PC, ucp_Ps },
{ 1101, PT_SC, ucp_Psalter_Pahlavi },
{ 1117, PT_SC, ucp_Rejang },
{ 1124, PT_SC, ucp_Runic },
{ 1130, PT_GC, ucp_S },
{ 1132, PT_SC, ucp_Samaritan },
{ 1142, PT_SC, ucp_Saurashtra },
{ 1153, PT_PC, ucp_Sc },
{ 1156, PT_SC, ucp_Sharada },
{ 1164, PT_SC, ucp_Shavian },
{ 1172, PT_SC, ucp_Siddham },
{ 1180, PT_SC, ucp_SignWriting },
{ 1192, PT_SC, ucp_Sinhala },
{ 1200, PT_PC, ucp_Sk },
{ 1203, PT_PC, ucp_Sm },
{ 1206, PT_PC, ucp_So },
{ 1209, PT_SC, ucp_Sora_Sompeng },
{ 1222, PT_SC, ucp_Soyombo },
{ 1230, PT_SC, ucp_Sundanese },
{ 1240, PT_SC, ucp_Syloti_Nagri },
{ 1253, PT_SC, ucp_Syriac },
{ 1260, PT_SC, ucp_Tagalog },
{ 1268, PT_SC, ucp_Tagbanwa },
{ 1277, PT_SC, ucp_Tai_Le },
{ 1284, PT_SC, ucp_Tai_Tham },
{ 1293, PT_SC, ucp_Tai_Viet },
{ 1302, PT_SC, ucp_Takri },
{ 1308, PT_SC, ucp_Tamil },
{ 1314, PT_SC, ucp_Tangut },
{ 1321, PT_SC, ucp_Telugu },
{ 1328, PT_SC, ucp_Thaana },
{ 1335, PT_SC, ucp_Thai },
{ 1340, PT_SC, ucp_Tibetan },
{ 1348, PT_SC, ucp_Tifinagh },
{ 1357, PT_SC, ucp_Tirhuta },
{ 1365, PT_SC, ucp_Ugaritic },
{ 1374, PT_SC, ucp_Vai },
{ 1378, PT_SC, ucp_Warang_Citi },
{ 1390, PT_ALNUM, 0 },
{ 1394, PT_PXSPACE, 0 },
{ 1398, PT_SPACE, 0 },
{ 1402, PT_UCNC, 0 },
{ 1406, PT_WORD, 0 },
{ 1410, PT_SC, ucp_Yi },
{ 1413, PT_GC, ucp_Z },
{ 1415, PT_SC, ucp_Zanabazar_Square },
{ 1432, PT_PC, ucp_Zl },
{ 1435, PT_PC, ucp_Zp },
{ 1438, PT_PC, ucp_Zs }
{ 293, PT_SC, ucp_Dogra },
{ 299, PT_SC, ucp_Duployan },
{ 308, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 329, PT_SC, ucp_Elbasan },
{ 337, PT_SC, ucp_Ethiopic },
{ 346, PT_SC, ucp_Georgian },
{ 355, PT_SC, ucp_Glagolitic },
{ 366, PT_SC, ucp_Gothic },
{ 373, PT_SC, ucp_Grantha },
{ 381, PT_SC, ucp_Greek },
{ 387, PT_SC, ucp_Gujarati },
{ 396, PT_SC, ucp_Gunjala_Gondi },
{ 410, PT_SC, ucp_Gurmukhi },
{ 419, PT_SC, ucp_Han },
{ 423, PT_SC, ucp_Hangul },
{ 430, PT_SC, ucp_Hanifi_Rohingya },
{ 446, PT_SC, ucp_Hanunoo },
{ 454, PT_SC, ucp_Hatran },
{ 461, PT_SC, ucp_Hebrew },
{ 468, PT_SC, ucp_Hiragana },
{ 477, PT_SC, ucp_Imperial_Aramaic },
{ 494, PT_SC, ucp_Inherited },
{ 504, PT_SC, ucp_Inscriptional_Pahlavi },
{ 526, PT_SC, ucp_Inscriptional_Parthian },
{ 549, PT_SC, ucp_Javanese },
{ 558, PT_SC, ucp_Kaithi },
{ 565, PT_SC, ucp_Kannada },
{ 573, PT_SC, ucp_Katakana },
{ 582, PT_SC, ucp_Kayah_Li },
{ 591, PT_SC, ucp_Kharoshthi },
{ 602, PT_SC, ucp_Khmer },
{ 608, PT_SC, ucp_Khojki },
{ 615, PT_SC, ucp_Khudawadi },
{ 625, PT_GC, ucp_L },
{ 627, PT_LAMP, 0 },
{ 630, PT_SC, ucp_Lao },
{ 634, PT_SC, ucp_Latin },
{ 640, PT_SC, ucp_Lepcha },
{ 647, PT_SC, ucp_Limbu },
{ 653, PT_SC, ucp_Linear_A },
{ 662, PT_SC, ucp_Linear_B },
{ 671, PT_SC, ucp_Lisu },
{ 676, PT_PC, ucp_Ll },
{ 679, PT_PC, ucp_Lm },
{ 682, PT_PC, ucp_Lo },
{ 685, PT_PC, ucp_Lt },
{ 688, PT_PC, ucp_Lu },
{ 691, PT_SC, ucp_Lycian },
{ 698, PT_SC, ucp_Lydian },
{ 705, PT_GC, ucp_M },
{ 707, PT_SC, ucp_Mahajani },
{ 716, PT_SC, ucp_Makasar },
{ 724, PT_SC, ucp_Malayalam },
{ 734, PT_SC, ucp_Mandaic },
{ 742, PT_SC, ucp_Manichaean },
{ 753, PT_SC, ucp_Marchen },
{ 761, PT_SC, ucp_Masaram_Gondi },
{ 775, PT_PC, ucp_Mc },
{ 778, PT_PC, ucp_Me },
{ 781, PT_SC, ucp_Medefaidrin },
{ 793, PT_SC, ucp_Meetei_Mayek },
{ 806, PT_SC, ucp_Mende_Kikakui },
{ 820, PT_SC, ucp_Meroitic_Cursive },
{ 837, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 858, PT_SC, ucp_Miao },
{ 863, PT_PC, ucp_Mn },
{ 866, PT_SC, ucp_Modi },
{ 871, PT_SC, ucp_Mongolian },
{ 881, PT_SC, ucp_Mro },
{ 885, PT_SC, ucp_Multani },
{ 893, PT_SC, ucp_Myanmar },
{ 901, PT_GC, ucp_N },
{ 903, PT_SC, ucp_Nabataean },
{ 913, PT_PC, ucp_Nd },
{ 916, PT_SC, ucp_New_Tai_Lue },
{ 928, PT_SC, ucp_Newa },
{ 933, PT_SC, ucp_Nko },
{ 937, PT_PC, ucp_Nl },
{ 940, PT_PC, ucp_No },
{ 943, PT_SC, ucp_Nushu },
{ 949, PT_SC, ucp_Ogham },
{ 955, PT_SC, ucp_Ol_Chiki },
{ 964, PT_SC, ucp_Old_Hungarian },
{ 978, PT_SC, ucp_Old_Italic },
{ 989, PT_SC, ucp_Old_North_Arabian },
{ 1007, PT_SC, ucp_Old_Permic },
{ 1018, PT_SC, ucp_Old_Persian },
{ 1030, PT_SC, ucp_Old_Sogdian },
{ 1042, PT_SC, ucp_Old_South_Arabian },
{ 1060, PT_SC, ucp_Old_Turkic },
{ 1071, PT_SC, ucp_Oriya },
{ 1077, PT_SC, ucp_Osage },
{ 1083, PT_SC, ucp_Osmanya },
{ 1091, PT_GC, ucp_P },
{ 1093, PT_SC, ucp_Pahawh_Hmong },
{ 1106, PT_SC, ucp_Palmyrene },
{ 1116, PT_SC, ucp_Pau_Cin_Hau },
{ 1128, PT_PC, ucp_Pc },
{ 1131, PT_PC, ucp_Pd },
{ 1134, PT_PC, ucp_Pe },
{ 1137, PT_PC, ucp_Pf },
{ 1140, PT_SC, ucp_Phags_Pa },
{ 1149, PT_SC, ucp_Phoenician },
{ 1160, PT_PC, ucp_Pi },
{ 1163, PT_PC, ucp_Po },
{ 1166, PT_PC, ucp_Ps },
{ 1169, PT_SC, ucp_Psalter_Pahlavi },
{ 1185, PT_SC, ucp_Rejang },
{ 1192, PT_SC, ucp_Runic },
{ 1198, PT_GC, ucp_S },
{ 1200, PT_SC, ucp_Samaritan },
{ 1210, PT_SC, ucp_Saurashtra },
{ 1221, PT_PC, ucp_Sc },
{ 1224, PT_SC, ucp_Sharada },
{ 1232, PT_SC, ucp_Shavian },
{ 1240, PT_SC, ucp_Siddham },
{ 1248, PT_SC, ucp_SignWriting },
{ 1260, PT_SC, ucp_Sinhala },
{ 1268, PT_PC, ucp_Sk },
{ 1271, PT_PC, ucp_Sm },
{ 1274, PT_PC, ucp_So },
{ 1277, PT_SC, ucp_Sogdian },
{ 1285, PT_SC, ucp_Sora_Sompeng },
{ 1298, PT_SC, ucp_Soyombo },
{ 1306, PT_SC, ucp_Sundanese },
{ 1316, PT_SC, ucp_Syloti_Nagri },
{ 1329, PT_SC, ucp_Syriac },
{ 1336, PT_SC, ucp_Tagalog },
{ 1344, PT_SC, ucp_Tagbanwa },
{ 1353, PT_SC, ucp_Tai_Le },
{ 1360, PT_SC, ucp_Tai_Tham },
{ 1369, PT_SC, ucp_Tai_Viet },
{ 1378, PT_SC, ucp_Takri },
{ 1384, PT_SC, ucp_Tamil },
{ 1390, PT_SC, ucp_Tangut },
{ 1397, PT_SC, ucp_Telugu },
{ 1404, PT_SC, ucp_Thaana },
{ 1411, PT_SC, ucp_Thai },
{ 1416, PT_SC, ucp_Tibetan },
{ 1424, PT_SC, ucp_Tifinagh },
{ 1433, PT_SC, ucp_Tirhuta },
{ 1441, PT_SC, ucp_Ugaritic },
{ 1450, PT_SC, ucp_Vai },
{ 1454, PT_SC, ucp_Warang_Citi },
{ 1466, PT_ALNUM, 0 },
{ 1470, PT_PXSPACE, 0 },
{ 1474, PT_SPACE, 0 },
{ 1478, PT_UCNC, 0 },
{ 1482, PT_WORD, 0 },
{ 1486, PT_SC, ucp_Yi },
{ 1489, PT_GC, ucp_Z },
{ 1491, PT_SC, ucp_Zanabazar_Square },
{ 1508, PT_PC, ucp_Zl },
{ 1511, PT_PC, ucp_Zp },
{ 1514, PT_PC, ucp_Zs }
};
const size_t PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
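
Note on the renumbered utt entries above: the first field of each entry is the offset of the name within the packed utt_names string, so inserting a name shifts every later offset by its length plus one for the terminating zero. This is why adding Dogra (5 + 1 bytes) moves Duployan from 293 to 299. A minimal sketch of that calculation follows; the function name name_offsets is hypothetical and the tables in the real source may be produced differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given n zero-terminated names packed back to back in blob, store the
   starting offset of each name in offsets[0..n-1]. */
void name_offsets(const char *blob, size_t n, size_t *offsets)
{
  size_t pos = 0;
  assert(blob != NULL && offsets != NULL);
  for (size_t i = 0; i < n; i++) {
    offsets[i] = pos;
    pos += strlen(blob + pos) + 1;   /* skip the name and its terminator */
  }
}
```

Applied to the three names Devanagari, Dogra, Duployan the relative offsets are 0, 11 and 17, which agrees with the absolute values 282, 293 and 299 in the new table.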

File diff suppressed because it is too large

View File

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -100,27 +100,25 @@ enum {
ucp_Zs /* Space separator */
};
/* These are grapheme break properties. */
/* These are grapheme break properties. The Extended Pictographic property
comes from the emoji-data.txt file. */
enum {
ucp_gbCR, /* 0 */
ucp_gbLF, /* 1 */
ucp_gbControl, /* 2 */
ucp_gbExtend, /* 3 */
ucp_gbPrepend, /* 4 */
ucp_gbSpacingMark, /* 5 */
ucp_gbL, /* 6 Hangul syllable type L */
ucp_gbV, /* 7 Hangul syllable type V */
ucp_gbT, /* 8 Hangul syllable type T */
ucp_gbLV, /* 9 Hangul syllable type LV */
ucp_gbLVT, /* 10 Hangul syllable type LVT */
ucp_gbRegionalIndicator, /* 11 */
ucp_gbOther, /* 12 */
ucp_gbE_Base, /* 13 */
ucp_gbE_Modifier, /* 14 */
ucp_gbE_Base_GAZ, /* 15 */
ucp_gbZWJ, /* 16 */
ucp_gbGlue_After_Zwj /* 17 */
ucp_gbCR, /* 0 */
ucp_gbLF, /* 1 */
ucp_gbControl, /* 2 */
ucp_gbExtend, /* 3 */
ucp_gbPrepend, /* 4 */
ucp_gbSpacingMark, /* 5 */
ucp_gbL, /* 6 Hangul syllable type L */
ucp_gbV, /* 7 Hangul syllable type V */
ucp_gbT, /* 8 Hangul syllable type T */
ucp_gbLV, /* 9 Hangul syllable type LV */
ucp_gbLVT, /* 10 Hangul syllable type LVT */
ucp_gbRegionalIndicator, /* 11 */
ucp_gbOther, /* 12 */
ucp_gbZWJ, /* 13 */
ucp_gbExtended_Pictographic /* 14 */
};
/* These are the script identifications. */
@ -274,7 +272,15 @@ enum {
ucp_Masaram_Gondi,
ucp_Nushu,
ucp_Soyombo,
ucp_Zanabazar_Square
ucp_Zanabazar_Square,
/* New for Unicode 11.0.0 */
ucp_Dogra,
ucp_Gunjala_Gondi,
ucp_Hanifi_Rohingya,
ucp_Makasar,
ucp_Medefaidrin,
ucp_Old_Sogdian,
ucp_Sogdian
};
#endif /* PCRE2_UCP_H_IDEMPOTENT_GUARD */

testdata/testinput4

@ -1394,28 +1394,15 @@
\x{6e9}
\x{6ef}
\x{6fa}
\= Expect no match
\x{650}
\x{651}
\x{652}
\x{653}
\x{654}
\x{655}
/^\p{Cyrillic}/utf
\x{1d2b}
/^\p{Common}/utf
\x{589}
\x{60c}
\x{61f}
\x{964}
\x{965}
\x{2116}
\x{1D183}
/^\p{Inherited}/utf
\x{64b}
\x{654}
\x{655}
\x{200c}
\= Expect no match
\x{64a}

testdata/testinput5

@ -2030,8 +2030,8 @@
# to test 4.
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}
/^\x{1E900}\x{104B0}/i,utf
@ -2041,23 +2041,50 @@
/^(?:(\X)(?C))+$/utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
# These two are here because JIT is not yet updated. Also, the very first data
# line is handled differently by Perl.
# Similarly for Unicode 11.0.0
/^(\p{Dogra}+)(\p{Gunjala_Gondi}+)(\p{Hanifi_Rohingya}+)(\p{Makasar}+)
(\p{Medefaidrin}+)(\p{Old_Sogdian}+)(\p{Sogdian}+)/x,utf
\x{11800}\x{11da9}\x{10d27}\x{11ee0}\x{16e48}\x{10f27}\x{10f30}
# These two are here because of differences from Perl.
/^\X/utf
A\x{200d}B A ZWJ
\x{261D}\x{1F3FB}B E_Base E_Modifier
\x{1F466}\x{1F3FF}B E_Base_GAZ E_Modifier
\x{200d}\x{1F3A4}B ZWJ Glue_After_ZWJ
\x{200d}\x{1F469}B ZWJ E_Base_GAZ
\x{261d}\x{261d}B Extended_Pictographic Extended_Pictographic
\x{261D}\x{1F3FB}B Extended_Pictographic Extend
\x{1F1E6}\x{1F1E7}B RegionalIndicator RegionalIndicator
\x{261D}\x{E0100}\x{1F3FB}B E_Base Extend E_Modifier
\x{261D}\x{1F3FB}\x{261d}B Extended_Pictographic Extend E-P
\x{261D}\x{1F3FB}\x{200d}\x{261d}B Extended_Pictographic Extend ZWJ E-P
# Regional indicators
/^(\X)(\X)/utf,aftertext
\x{1F1E6}\x{1F1E7}\x{1F1E7}B
\x{1F1E6}\x{1F1E7}\x{1F1E7}\x{1F1E6}B
# More differences from Perl
/^[\p{Arabic}]/utf
\= Expect no match
\x{650}
\x{651}
\x{652}
\x{653}
\x{654}
\x{655}
/^\p{Common}/utf
\x{589}
\x{60c}
\x{61f}
\x{964}
\x{965}
/^\p{Inherited}/utf
\x{64b}
\x{654}
\x{655}
\x{1D1AA}
# End of testinput5

testdata/testoutput4

@ -2293,43 +2293,18 @@ No match
0: \x{6ef}
\x{6fa}
0: \x{6fa}
\= Expect no match
\x{650}
No match
\x{651}
No match
\x{652}
No match
\x{653}
No match
\x{654}
No match
\x{655}
No match
/^\p{Cyrillic}/utf
\x{1d2b}
0: \x{1d2b}
/^\p{Common}/utf
\x{589}
0: \x{589}
\x{60c}
0: \x{60c}
\x{61f}
0: \x{61f}
\x{964}
0: \x{964}
\x{965}
0: \x{965}
\x{2116}
0: \x{2116}
\x{1D183}
0: \x{1d183}
/^\p{Inherited}/utf
\x{64b}
0: \x{64b}
\x{654}
0: \x{654}
\x{655}
0: \x{655}
\x{200c}
0: \x{200c}
\= Expect no match

testdata/testoutput5

@ -4593,8 +4593,8 @@ No match
# to test 4.
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}
0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
1: \x{1e900}\x{1e924}\x{1e953}
@ -4667,24 +4667,35 @@ Callout 0: last capture = 1
0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
1: \x{11a00}\x{11a07}\x{11a47}
# These two are here because JIT is not yet updated. Also, the very first data
# line is handled differently by Perl.
# Similarly for Unicode 11.0.0
/^(\p{Dogra}+)(\p{Gunjala_Gondi}+)(\p{Hanifi_Rohingya}+)(\p{Makasar}+)
(\p{Medefaidrin}+)(\p{Old_Sogdian}+)(\p{Sogdian}+)/x,utf
\x{11800}\x{11da9}\x{10d27}\x{11ee0}\x{16e48}\x{10f27}\x{10f30}
0: \x{11800}\x{11da9}\x{10d27}\x{11ee0}\x{16e48}\x{10f27}\x{10f30}
1: \x{11800}
2: \x{11da9}
3: \x{10d27}
4: \x{11ee0}
5: \x{16e48}
6: \x{10f27}
7: \x{10f30}
# These two are here because of differences from Perl.
/^\X/utf
A\x{200d}B A ZWJ
0: A\x{200d}
\x{261D}\x{1F3FB}B E_Base E_Modifier
\x{261d}\x{261d}B Extended_Pictographic Extended_Pictographic
0: \x{261d}\x{261d}
\x{261D}\x{1F3FB}B Extended_Pictographic Extend
0: \x{261d}\x{1f3fb}
\x{1F466}\x{1F3FF}B E_Base_GAZ E_Modifier
0: \x{1f466}\x{1f3ff}
\x{200d}\x{1F3A4}B ZWJ Glue_After_ZWJ
0: \x{200d}\x{1f3a4}
\x{200d}\x{1F469}B ZWJ E_Base_GAZ
0: \x{200d}\x{1f469}
\x{1F1E6}\x{1F1E7}B RegionalIndicator RegionalIndicator
0: \x{1f1e6}\x{1f1e7}
\x{261D}\x{E0100}\x{1F3FB}B E_Base Extend E_Modifier
0: \x{261d}\x{e0100}\x{1f3fb}
\x{261D}\x{1F3FB}\x{261d}B Extended_Pictographic Extend E-P
0: \x{261d}\x{1f3fb}\x{261d}
\x{261D}\x{1F3FB}\x{200d}\x{261d}B Extended_Pictographic Extend ZWJ E-P
0: \x{261d}\x{1f3fb}\x{200d}\x{261d}
# Regional indicators
@ -4699,6 +4710,44 @@ Callout 0: last capture = 1
0+ B
1: \x{1f1e6}\x{1f1e7}
2: \x{1f1e7}\x{1f1e6}
# More differences from Perl
/^[\p{Arabic}]/utf
\= Expect no match
\x{650}
No match
\x{651}
No match
\x{652}
No match
\x{653}
No match
\x{654}
No match
\x{655}
No match
/^\p{Common}/utf
\x{589}
0: \x{589}
\x{60c}
0: \x{60c}
\x{61f}
0: \x{61f}
\x{964}
0: \x{964}
\x{965}
0: \x{965}
/^\p{Inherited}/utf
\x{64b}
0: \x{64b}
\x{654}
0: \x{654}
\x{655}
0: \x{655}
\x{1D1AA}
0: \x{1d1aa}
# End of testinput5