diff --git a/ChangeLog b/ChangeLog index cf87b2d..a068002 100644 --- a/ChangeLog +++ b/ChangeLog @@ -107,6 +107,10 @@ to an incorrect "lookbehind assertion is not fixed length" error. 23. The VERSION condition test was reading fractional PCRE2 version numbers such as the 04 in 10.04 incorrectly and hence giving wrong results. + +24. Updated to Unicode version 11.0.0. As well as the usual addition of new +scripts and characters, this involved re-jigging the grapheme break property +algorithm because Unicode has changed the way emojis are handled. Version 10.31 12-February-2018 diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html index 9adc426..9d241b7 100644 --- a/doc/html/pcre2pattern.html +++ b/doc/html/pcre2pattern.html @@ -789,6 +789,7 @@ Cypriot, Cyrillic, Deseret, Devanagari, +Dogra, Duployan, Egyptian_Hieroglyphs, Elbasan, @@ -799,9 +800,11 @@ Gothic, Grantha, Greek, Gujarati, +Gunjala_Gondi, Gurmukhi, Han, Hangul, +Hanifi_Rohingya, Hanunoo, Hatran, Hebrew, @@ -829,11 +832,13 @@ Lisu, Lycian, Lydian, Mahajani, +Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi, +Medefaidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, @@ -856,6 +861,7 @@ Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, +Old_Sogdian, Old_South_Arabian, Old_Turkic, Oriya, @@ -876,6 +882,7 @@ Shavian, Siddham, SignWriting, Sinhala, +Sogdian, Sora_Sompeng, Soyombo, Sundanese, @@ -1006,7 +1013,10 @@ grapheme cluster", and treats the sequence as an atomic group Unicode supports various kinds of composite character by giving each character a grapheme breaking property, and having rules that use these properties to define the boundaries of extended grapheme clusters. The rules are defined in -Unicode Standard Annex 29, "Unicode Text Segmentation". +Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0 +abandoned the use of some previous properties that had been used for emojis. +Instead it introduced various emoji-specific properties. 
PCRE2 uses only the +Extended Pictographic property.

\X always matches at least one character. Then it decides whether to add @@ -1026,27 +1036,24 @@ character; an LVT or T character may be followed only by a T character.

4. Do not end before extending characters or spacing marks or the "zero-width -joiner" characters. Characters with the "mark" property always have the +joiner" character. Characters with the "mark" property always have the "extend" grapheme breaking property.

5. Do not end after prepend characters.

-6. Do not break within emoji modifier sequences (a base character followed by a -modifier). Extending characters are allowed before the modifier. +6. Do not break within emoji modifier sequences or emoji zwj sequences. That +is, do not break between characters with the Extended_Pictographic property. +Extend and ZWJ characters are allowed between the characters.

-7. Do not break within emoji zwj sequences (zero-width joiner followed by -"glue after ZWJ" or "base glue after ZWJ"). -

-

-8. Do not break within emoji flag sequences. That is, do not break between +7. Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) characters if there are an odd number of RI characters before the break point.

-6. Otherwise, end the cluster. +8. Otherwise, end the cluster.
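The flag-sequence rule (new rule 7) can be sketched in Python as follows. This is an illustrative reading of the rule as worded in the patch, not PCRE2's implementation: the helper names `is_ri` and `break_allowed_between_ri` are hypothetical, and the sketch assumes that "RI characters before the break point" means the contiguous run of regional indicators immediately preceding it.

```python
# Regional Indicator Symbol Letters A..Z occupy U+1F1E6..U+1F1FF.
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF


def is_ri(cp):
    """Return True if code point cp is a regional indicator (RI)."""
    return RI_FIRST <= cp <= RI_LAST


def break_allowed_between_ri(cps, pos):
    """Return True if a cluster may end between cps[pos-1] and cps[pos].

    Both neighbours must be RI characters. Hypothetical helper: the
    break is refused when an odd number of contiguous RI characters
    precedes the break point, so that flags stay paired.
    """
    if not (0 < pos < len(cps) and is_ri(cps[pos - 1]) and is_ri(cps[pos])):
        raise ValueError("pos must lie between two RI characters")
    count = 0
    i = pos - 1
    # Count the contiguous RI run that ends just before the break point.
    while i >= 0 and is_ri(cps[i]):
        count += 1
        i -= 1
    return count % 2 == 0


# Four RI code points: the letters F, R, D, E (two two-letter flags).
flags = [0x1F1EB, 0x1F1F7, 0x1F1E9, 0x1F1EA]
for pos in range(1, len(flags)):
    print(pos, break_allowed_between_ri(flags, pos))
```

With four consecutive RI characters, only the position after the second one permits a break, so \X would consume them as two clusters of two.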


PCRE2's additional properties @@ -1119,8 +1126,8 @@ lead to odd effects. For example, consider this pattern:
   (?<=\Kfoo)bar
 
-If the subject is "foobar", a call to pcre2_match() with a starting -offset of 3 succeeds and reports the matching string as "foobar", that is, the +If the subject is "foobar", a call to pcre2_match() with a starting +offset of 3 succeeds and reports the matching string as "foobar", that is, the start of the reported match is earlier than where the match started.


@@ -3490,7 +3497,7 @@ Cambridge, England.


REVISION

-Last updated: 30 June 2018 +Last updated: 07 July 2018
Copyright © 1997-2018 University of Cambridge.
diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html index c0d7b39..dee937e 100644 --- a/doc/html/pcre2syntax.html +++ b/doc/html/pcre2syntax.html @@ -188,6 +188,7 @@ at release 5.18.


SCRIPT NAMES FOR \p AND \P

+Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, @@ -198,6 +199,7 @@ Bamum, Bassa_Vah, Batak, Bengali, +Bhaiksuki, Bopomofo, Brahmi, Braille, @@ -216,6 +218,7 @@ Cypriot, Cyrillic, Deseret, Devanagari, +Dogra, Duployan, Egyptian_Hieroglyphs, Elbasan, @@ -226,9 +229,11 @@ Gothic, Grantha, Greek, Gujarati, +Gunjala_Gondi, Gurmukhi, Han, Hangul, +Hanifi_Rohingya, Hanunoo, Hatran, Hebrew, @@ -256,9 +261,13 @@ Lisu, Lycian, Lydian, Mahajani, +Makasar, Malayalam, Mandaic, Manichaean, +Marchen, +Masaram_Gondi, +Medefaidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, @@ -271,7 +280,9 @@ Multani, Myanmar, Nabataean, New_Tai_Lue, +Newa, Nko, +Nushu, Ogham, Ol_Chiki, Old_Hungarian, @@ -279,9 +290,11 @@ Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, +Old_Sogdian, Old_South_Arabian, Old_Turkic, Oriya, +Osage, Osmanya, Pahawh_Hmong, Palmyrene, @@ -298,7 +311,9 @@ Shavian, Siddham, SignWriting, Sinhala, +Sogdian, Sora_Sompeng, +Soyombo, Sundanese, Syloti_Nagri, Syriac, @@ -309,6 +324,7 @@ Tai_Tham, Tai_Viet, Takri, Tamil, +Tangut, Telugu, Thaana, Thai, @@ -318,7 +334,8 @@ Tirhuta, Ugaritic, Vai, Warang_Citi, -Yi. +Yi, +Zanabazar_Square.


CHARACTER CLASSES

@@ -600,7 +617,7 @@ Cambridge, England.


REVISION

-Last updated: 28 June 2018 +Last updated: 07 July 2018
Copyright © 1997-2018 University of Cambridge.
diff --git a/doc/pcre2.txt b/doc/pcre2.txt index d8c08a9..553ca20 100644 --- a/doc/pcre2.txt +++ b/doc/pcre2.txt @@ -6483,34 +6483,35 @@ BACKSLASH nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba- nian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, - Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan, - Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gur- - mukhi, Han, Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Ara- - maic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian, - Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Kho- - jki, Khudawadi, Lao, Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, - Lycian, Lydian, Mahajani, Malayalam, Mandaic, Manichaean, Marchen, - Masaram_Gondi, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, - Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar, - Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar- - ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, - Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, Pahawh_Hmong, - Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang, - Runic, Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting, - Sinhala, Sora_Sompeng, Soyombo, Sundanese, Syloti_Nagri, Syriac, Taga- - log, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Tangut, Tel- - ugu, Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, - Warang_Citi, Yi, Zanabazar_Square. 
+ Cyrillic, Deseret, Devanagari, Dogra, Duployan, Egyptian_Hieroglyphs, + Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek, + Gujarati, Gunjala_Gondi, Gurmukhi, Han, Hangul, Hanifi_Rohingya, + Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited, + Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan- + nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao, + Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha- + jani, Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi, + Medefaidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, + Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar, + Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar- + ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, Old_Sog- + dian, Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, + Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, + Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha- + vian, Siddham, SignWriting, Sinhala, Sogdian, Sora_Sompeng, Soyombo, + Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, + Tai_Viet, Takri, Tamil, Tangut, Telugu, Thaana, Thai, Tibetan, Tifi- + nagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi, Zanabazar_Square. Each character has exactly one Unicode general category property, spec- - ified by a two-letter abbreviation. For compatibility with Perl, nega- - tion can be specified by including a circumflex between the opening - brace and the property name. For example, \p{^Lu} is the same as + ified by a two-letter abbreviation. For compatibility with Perl, nega- + tion can be specified by including a circumflex between the opening + brace and the property name. For example, \p{^Lu} is the same as \P{Lu}. If only one letter is specified with \p or \P, it includes all the gen- - eral category properties that start with that letter. 
In this case, in - the absence of negation, the curly brackets in the escape sequence are + eral category properties that start with that letter. In this case, in + the absence of negation, the curly brackets in the escape sequence are optional; these two examples have the same effect: \p{L} @@ -6562,44 +6563,47 @@ BACKSLASH Zp Paragraph separator Zs Space separator - The special property L& is also supported: it matches a character that - has the Lu, Ll, or Lt property, in other words, a letter that is not + The special property L& is also supported: it matches a character that + has the Lu, Ll, or Lt property, in other words, a letter that is not classified as a modifier or "other". - The Cs (Surrogate) property applies only to characters in the range - U+D800 to U+DFFF. Such characters are not valid in Unicode strings and - so cannot be tested by PCRE2, unless UTF validity checking has been - turned off (see the discussion of PCRE2_NO_UTF_CHECK in the pcre2api + The Cs (Surrogate) property applies only to characters in the range + U+D800 to U+DFFF. Such characters are not valid in Unicode strings and + so cannot be tested by PCRE2, unless UTF validity checking has been + turned off (see the discussion of PCRE2_NO_UTF_CHECK in the pcre2api page). Perl does not support the Cs property. - The long synonyms for property names that Perl supports (such as - \p{Letter}) are not supported by PCRE2, nor is it permitted to prefix + The long synonyms for property names that Perl supports (such as + \p{Letter}) are not supported by PCRE2, nor is it permitted to prefix any of these properties with "Is". No character that is in the Unicode table has the Cn (unassigned) prop- erty. Instead, this property is assumed for any code point that is not in the Unicode table. - Specifying caseless matching does not affect these escape sequences. - For example, \p{Lu} always matches only upper case letters. This is + Specifying caseless matching does not affect these escape sequences. 
+ For example, \p{Lu} always matches only upper case letters. This is different from the behaviour of current versions of Perl. - Matching characters by Unicode property is not fast, because PCRE2 has - to do a multistage table lookup in order to find a character's prop- + Matching characters by Unicode property is not fast, because PCRE2 has + to do a multistage table lookup in order to find a character's prop- erty. That is why the traditional escape sequences such as \d and \w do - not use Unicode properties in PCRE2 by default, though you can make - them do so by setting the PCRE2_UCP option or by starting the pattern + not use Unicode properties in PCRE2 by default, though you can make + them do so by setting the PCRE2_UCP option or by starting the pattern with (*UCP). Extended grapheme clusters - The \X escape matches any number of Unicode characters that form an + The \X escape matches any number of Unicode characters that form an "extended grapheme cluster", and treats the sequence as an atomic group - (see below). Unicode supports various kinds of composite character by - giving each character a grapheme breaking property, and having rules + (see below). Unicode supports various kinds of composite character by + giving each character a grapheme breaking property, and having rules that use these properties to define the boundaries of extended grapheme - clusters. The rules are defined in Unicode Standard Annex 29, "Unicode - Text Segmentation". + clusters. The rules are defined in Unicode Standard Annex 29, "Unicode + Text Segmentation". Unicode 11.0.0 abandoned the use of some previous + properties that had been used for emojis. Instead it introduced vari- + ous emoji-specific properties. PCRE2 uses only the Extended Picto- + graphic property. \X always matches at least one character. Then it decides whether to add additional characters according to the following rules for ending a @@ -6617,23 +6621,21 @@ BACKSLASH only by a T character. 4. 
Do not end before extending characters or spacing marks or the - "zero-width joiner" characters. Characters with the "mark" property + "zero-width joiner" character. Characters with the "mark" property always have the "extend" grapheme breaking property. 5. Do not end after prepend characters. - 6. Do not break within emoji modifier sequences (a base character fol- - lowed by a modifier). Extending characters are allowed before the modi- - fier. + 6. Do not break within emoji modifier sequences or emoji zwj sequences. + That is, do not break between characters with the Extended_Pictographic + property. Extend and ZWJ characters are allowed between the charac- + ters. - 7. Do not break within emoji zwj sequences (zero-width joiner followed - by "glue after ZWJ" or "base glue after ZWJ"). - - 8. Do not break within emoji flag sequences. That is, do not break + 7. Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) characters if there are an odd number of RI characters before the break point. - 6. Otherwise, end the cluster. + 8. Otherwise, end the cluster. PCRE2's additional properties @@ -8941,7 +8943,7 @@ AUTHOR REVISION - Last updated: 30 June 2018 + Last updated: 07 July 2018 Copyright (c) 1997-2018 University of Cambridge. 
------------------------------------------------------------------------------ @@ -9915,26 +9917,29 @@ PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P SCRIPT NAMES FOR \p AND \P - Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Balinese, - Bamum, Bassa_Vah, Batak, Bengali, Bopomofo, Brahmi, Braille, Buginese, - Buhid, Canadian_Aboriginal, Carian, Caucasian_Albanian, Chakma, Cham, - Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, - Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan, Ethiopic, Geor- - gian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gurmukhi, Han, - Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited, - Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan- - nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao, - Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha- - jani, Malayalam, Mandaic, Manichaean, Meetei_Mayek, Mende_Kikakui, - Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, - Multani, Myanmar, Nabataean, New_Tai_Lue, Nko, Ogham, Ol_Chiki, - Old_Hungarian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, - Old_South_Arabian, Old_Turkic, Oriya, Osmanya, Pahawh_Hmong, Palmyrene, - Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang, Runic, - Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting, Sinhala, - Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, - Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, Thaana, Thai, - Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi. 
+ Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Bali- + nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi, + Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba- + nian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, + Cyrillic, Deseret, Devanagari, Dogra, Duployan, Egyptian_Hieroglyphs, + Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek, + Gujarati, Gunjala_Gondi, Gurmukhi, Han, Hangul, Hanifi_Rohingya, + Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited, + Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan- + nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao, + Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha- + jani, Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi, + Medefaidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, + Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar, + Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar- + ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, Old_Sog- + dian, Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, + Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, + Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha- + vian, Siddham, SignWriting, Sinhala, Sogdian, Sora_Sompeng, Soyombo, + Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, + Tai_Viet, Takri, Tamil, Tangut, Telugu, Thaana, Thai, Tibetan, Tifi- + nagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi, Zanabazar_Square. CHARACTER CLASSES @@ -9960,8 +9965,8 @@ CHARACTER CLASSES word same as \w xdigit hexadecimal digit - In PCRE2, POSIX character set names recognize only ASCII characters by - default, but some of them use Unicode properties if PCRE2_UCP is set. + In PCRE2, POSIX character set names recognize only ASCII characters by + default, but some of them use Unicode properties if PCRE2_UCP is set. 
You can use \Q...\E inside a character class. @@ -10047,8 +10052,8 @@ OPTION SETTING (?xx) as (?x) but also ignore space and tab in classes (?-...) unset option(s) - The following are recognized only at the very start of a pattern or - after one of the newline or \R options with similar syntax. More than + The following are recognized only at the very start of a pattern or + after one of the newline or \R options with similar syntax. More than one of them may appear. For the first three, d is a decimal number. (*LIMIT_DEPTH=d) set the backtracking limit to d @@ -10063,17 +10068,17 @@ OPTION SETTING (*UTF) set appropriate UTF mode for the library in use (*UCP) set PCRE2_UCP (use Unicode properties for \d etc) - Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the - value of the limits set by the caller of pcre2_match() or - pcre2_dfa_match(), not increase them. LIMIT_RECURSION is an obsolete + Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the + value of the limits set by the caller of pcre2_match() or + pcre2_dfa_match(), not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The application can lock out the use of (*UTF) - and (*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, + and (*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time. NEWLINE CONVENTION - These are recognized only at the very start of the pattern or after + These are recognized only at the very start of the pattern or after option settings with a similar syntax. (*CR) carriage return only @@ -10086,7 +10091,7 @@ NEWLINE CONVENTION WHAT \R MATCHES - These are recognized only at the very start of the pattern or after + These are recognized only at the very start of the pattern or after option setting with a similar syntax. 
(*BSR_ANYCRLF) CR, LF, or CRLF @@ -10155,8 +10160,8 @@ CONDITIONAL PATTERNS (?(VERSION[>]=n.m) test PCRE2 version (?(assert) assertion condition - Note the ambiguity of (?(R) and (?(Rn) which might be named reference - conditions or recursion tests. Such a condition is interpreted as a + Note the ambiguity of (?(R) and (?(Rn) which might be named reference + conditions or recursion tests. Such a condition is interpreted as a reference condition if the relevant named group exists. @@ -10168,7 +10173,7 @@ BACKTRACKING CONTROL (*FAIL) force backtrack; synonym (*F) (*MARK:NAME) set name to be passed back; synonym (*:NAME) - The following act only when a subsequent match failure causes a back- + The following act only when a subsequent match failure causes a back- track to reach them. They all force a match failure, but they differ in what happens afterwards. Those that advance the start-of-match point do so only if the pattern is not anchored. @@ -10190,14 +10195,14 @@ CALLOUTS (?C"text") callout with string data The allowed string delimiters are ` ' " ^ % # $ (which are the same for - the start and the end), and the starting delimiter { matched with the - ending delimiter }. To encode the ending delimiter within the string, + the start and the end), and the starting delimiter { matched with the + ending delimiter }. To encode the ending delimiter within the string, double it. SEE ALSO - pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3), + pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2(3). @@ -10210,7 +10215,7 @@ AUTHOR REVISION - Last updated: 28 June 2018 + Last updated: 07 July 2018 Copyright (c) 1997-2018 University of Cambridge. 
------------------------------------------------------------------------------ diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3 index 2b534f2..cd9a99c 100644 --- a/doc/pcre2pattern.3 +++ b/doc/pcre2pattern.3 @@ -1,4 +1,4 @@ -.TH PCRE2PATTERN 3 "30 June 2018" "PCRE2 10.32" +.TH PCRE2PATTERN 3 "07 July 2018" "PCRE2 10.32" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "PCRE2 REGULAR EXPRESSION DETAILS" @@ -788,6 +788,7 @@ Cypriot, Cyrillic, Deseret, Devanagari, +Dogra, Duployan, Egyptian_Hieroglyphs, Elbasan, @@ -798,9 +799,11 @@ Gothic, Grantha, Greek, Gujarati, +Gunjala_Gondi, Gurmukhi, Han, Hangul, +Hanifi_Rohingya, Hanunoo, Hatran, Hebrew, @@ -828,11 +831,13 @@ Lisu, Lycian, Lydian, Mahajani, +Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi, +Medefaidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, @@ -855,6 +860,7 @@ Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, +Old_Sogdian, Old_South_Arabian, Old_Turkic, Oriya, @@ -875,6 +881,7 @@ Shavian, Siddham, SignWriting, Sinhala, +Sogdian, Sora_Sompeng, Soyombo, Sundanese, @@ -1003,7 +1010,10 @@ grapheme cluster", and treats the sequence as an atomic group Unicode supports various kinds of composite character by giving each character a grapheme breaking property, and having rules that use these properties to define the boundaries of extended grapheme clusters. The rules are defined in -Unicode Standard Annex 29, "Unicode Text Segmentation". +Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0 +abandoned the use of some previous properties that had been used for emojis. +Instead it introduced various emoji-specific properties. PCRE2 uses only the +Extended Pictographic property. .P \eX always matches at least one character. 
Then it decides whether to add additional characters according to the following rules for ending a cluster: @@ -1018,22 +1028,20 @@ L, V, LV, or LVT character; an LV or V character may be followed by a V or T character; an LVT or T character may be follwed only by a T character. .P 4. Do not end before extending characters or spacing marks or the "zero-width -joiner" characters. Characters with the "mark" property always have the +joiner" character. Characters with the "mark" property always have the "extend" grapheme breaking property. .P 5. Do not end after prepend characters. .P -6. Do not break within emoji modifier sequences (a base character followed by a -modifier). Extending characters are allowed before the modifier. +6. Do not break within emoji modifier sequences or emoji zwj sequences. That +is, do not break between characters with the Extended_Pictographic property. +Extend and ZWJ characters are allowed between the characters. .P -7. Do not break within emoji zwj sequences (zero-width joiner followed by -"glue after ZWJ" or "base glue after ZWJ"). -.P -8. Do not break within emoji flag sequences. That is, do not break between +7. Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) characters if there are an odd number of RI characters before the break point. .P -6. Otherwise, end the cluster. +8. Otherwise, end the cluster. . . .\" HTML @@ -1112,8 +1120,8 @@ lead to odd effects. For example, consider this pattern: .sp (?<=\eKfoo)bar .sp -If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting -offset of 3 succeeds and reports the matching string as "foobar", that is, the +If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting +offset of 3 succeeds and reports the matching string as "foobar", that is, the start of the reported match is earlier than where the match started. . . @@ -3517,6 +3525,6 @@ Cambridge, England. 
.rs .sp .nf -Last updated: 30 June 2018 +Last updated: 07 July 2018 Copyright (c) 1997-2018 University of Cambridge. .fi diff --git a/doc/pcre2syntax.3 b/doc/pcre2syntax.3 index 4eec552..7e29beb 100644 --- a/doc/pcre2syntax.3 +++ b/doc/pcre2syntax.3 @@ -1,4 +1,4 @@ -.TH PCRE2SYNTAX 3 "28 June 2018" "PCRE2 10.32" +.TH PCRE2SYNTAX 3 "07 July 2018" "PCRE2 10.32" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" @@ -160,6 +160,7 @@ at release 5.18. .SH "SCRIPT NAMES FOR \ep AND \eP" .rs .sp +Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, @@ -170,6 +171,7 @@ Bamum, Bassa_Vah, Batak, Bengali, +Bhaiksuki, Bopomofo, Brahmi, Braille, @@ -188,6 +190,7 @@ Cypriot, Cyrillic, Deseret, Devanagari, +Dogra, Duployan, Egyptian_Hieroglyphs, Elbasan, @@ -198,9 +201,11 @@ Gothic, Grantha, Greek, Gujarati, +Gunjala_Gondi, Gurmukhi, Han, Hangul, +Hanifi_Rohingya, Hanunoo, Hatran, Hebrew, @@ -228,9 +233,13 @@ Lisu, Lycian, Lydian, Mahajani, +Makasar, Malayalam, Mandaic, Manichaean, +Marchen, +Masaram_Gondi, +Medefaidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, @@ -243,7 +252,9 @@ Multani, Myanmar, Nabataean, New_Tai_Lue, +Newa, Nko, +Nushu, Ogham, Ol_Chiki, Old_Hungarian, @@ -251,9 +262,11 @@ Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, +Old_Sogdian, Old_South_Arabian, Old_Turkic, Oriya, +Osage, Osmanya, Pahawh_Hmong, Palmyrene, @@ -270,7 +283,9 @@ Shavian, Siddham, SignWriting, Sinhala, +Sogdian, Sora_Sompeng, +Soyombo, Sundanese, Syloti_Nagri, Syriac, @@ -281,6 +296,7 @@ Tai_Tham, Tai_Viet, Takri, Tamil, +Tangut, Telugu, Thaana, Thai, @@ -290,7 +306,8 @@ Tirhuta, Ugaritic, Vai, Warang_Citi, -Yi. +Yi, +Zanabazar_Square. . . .SH "CHARACTER CLASSES" @@ -589,6 +606,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 28 June 2018 +Last updated: 07 July 2018 Copyright (c) 1997-2018 University of Cambridge. 
.fi diff --git a/maint/GenerateUtt.py b/maint/GenerateUtt.py index a152566..54a72e0 100755 --- a/maint/GenerateUtt.py +++ b/maint/GenerateUtt.py @@ -24,6 +24,7 @@ # Added script names for Unicode 7.0.0, 20-June-2014. # Added script names for Unicode 8.0.0, 19-June-2015. # Added script names for Unicode 10.0.0, 02-July-2017. +# Added script names for Unicode 11.0.0, 03-July-2018. script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \ 'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \ @@ -55,7 +56,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines 'SignWriting', # New for Unicode 10.0.0 'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi', - 'Nushu', 'Soyombo', 'Zanabazar_Square' + 'Nushu', 'Soyombo', 'Zanabazar_Square', +# New for Unicode 11.0.0 + 'Dogra', 'Gunjala_Gondi', 'Hanifi_Rohingya', 'Makasar', 'Medefaidrin', + 'Old_Sogdian', 'Sogdian' ] category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu', diff --git a/maint/MultiStage2.py b/maint/MultiStage2.py index f124538..e9ed694 100755 --- a/maint/MultiStage2.py +++ b/maint/MultiStage2.py @@ -7,20 +7,26 @@ # This script was submitted to the PCRE project by Peter Kankowski as part of # the upgrading of Unicode property support. The new code speeds up property # matching many times. The script is for the use of PCRE maintainers, to -# generate the pcre_ucd.c file that contains a digested form of the Unicode +# generate the pcre2_ucd.c file that contains a digested form of the Unicode # data tables. 
# -# The script has now been upgraded to Python 3 for PCRE2, and should be run in +# The script has now been upgraded to Python 3 for PCRE2, and should be run in # the maint subdirectory, using the command # # [python3] ./MultiStage2.py >../src/pcre2_ucd.c # -# It requires four Unicode data tables, DerivedGeneralCategory.txt, -# GraphemeBreakProperty.txt, Scripts.txt, and CaseFolding.txt, to be in the -# Unicode.tables subdirectory. The first of these is found in the "extracted" -# subdirectory of the Unicode database (UCD) on the Unicode web site; the -# second is in the "auxiliary" subdirectory; the other two are directly in the -# UCD directory. +# It requires five Unicode data tables: DerivedGeneralCategory.txt, +# GraphemeBreakProperty.txt, Scripts.txt, CaseFolding.txt, and emoji-data.txt. +# These must be in the maint/Unicode.tables subdirectory. +# +# DerivedGeneralCategory.txt is found in the "extracted" subdirectory of the +# Unicode database (UCD) on the Unicode web site; GraphemeBreakProperty.txt is +# in the "auxiliary" subdirectory. Scripts.txt and CaseFolding.txt are directly +# in the UCD directory. The emoji-data.txt file is in files associated with +# Unicode Technical Standard #51 ("Unicode Emoji"), for example: +# +# http://unicode.org/Public/emoji/11.0/emoji-data.txt +# # # Minor modifications made to this script: # Added #! line at start @@ -41,7 +47,8 @@ # Added code to search for sets of more than two characters that must match # each other caselessly. A new table is output containing these sets, and # offsets into the table are added to the main output records. This new -# code scans CaseFolding.txt instead of UnicodeData.txt. +# code scans CaseFolding.txt instead of UnicodeData.txt, which is no longer +# used. # # Update for Python3: # . Processed with 2to3, but that didn't fix everything @@ -50,8 +57,13 @@ # . 
Inserted 'int' before blocksize/ELEMS_PER_LINE because an int is # required and the result of the division is a float # +# Added code to scan the emoji-data.txt file to find the Extended Pictographic +# property, which is used by PCRE2 as a grapheme breaking property. This was +# done when updating to Unicode 11.0.0 (July 2018). +# +# # The main tables generated by this script are used by macros defined in -# pcre2_internal.h. They look up Unicode character properties using short +# pcre2_internal.h. They look up Unicode character properties using short # sequences of code that contains no branches, which makes for greater speed. # # Conceptually, there is a table of records (of type ucd_record), containing a @@ -75,43 +87,48 @@ # table of "virtual" blocks; each block is indexed by the offset of a character # within its own block, and the result is the offset of the required record. # +# The following examples are correct for the Unicode 11.0.0 database. Future +# updates may change the actual lookup values. +# # Example: lowercase "a" (U+0061) is in block 0 # lookup 0 in stage1 table yields 0 # lookup 97 in the first table in stage2 yields 16 -# record 17 is { 33, 5, 11, 0, -32 } +# record 17 is { 33, 5, 11, 0, -32 } # 33 = ucp_Latin => Latin script # 5 = ucp_Ll => Lower case letter -# 11 = ucp_gbOther => Grapheme break property "Other" +# 12 = ucp_gbOther => Grapheme break property "Other" # 0 => not part of a caseless set # -32 => Other case is U+0041 -# +# # Almost all lowercase latin characters resolve to the same record. One or two # are different because they are part of a multi-character caseless set (for # example, k, K and the Kelvin symbol are such a set).
# # Example: hiragana letter A (U+3042) is in block 96 (0x60) -# lookup 96 in stage1 table yields 88 -# lookup 66 in the 88th table in stage2 yields 467 -# record 470 is { 26, 7, 11, 0, 0 } +# lookup 96 in stage1 table yields 90 +# lookup 66 in the 90th table in stage2 yields 515 +# record 515 is { 26, 7, 11, 0, 0 } # 26 = ucp_Hiragana => Hiragana script # 7 = ucp_Lo => Other letter -# 11 = ucp_gbOther => Grapheme break property "Other" +# 12 = ucp_gbOther => Grapheme break property "Other" # 0 => not part of a caseless set -# 0 => No other case +# 0 => No other case # # In these examples, no other blocks resolve to the same "virtual" block, as it # happens, but plenty of other blocks do share "virtual" blocks. # -# There is a fourth table, maintained by hand, which translates from the +# There is a fourth table, maintained by hand, which translates from the # individual character types such as ucp_Cc to the general types like ucp_C. # # Philip Hazel, 03 July 2008 +# Last Updated: 07 July 2018 +# # # 01-March-2010: Updated list of scripts for Unicode 5.2.0 # 30-April-2011: Updated list of scripts for Unicode 6.0.0 # July-2012: Updated list of scripts for Unicode 6.1.0 -# 20-August-2012: Added scan of GraphemeBreakProperty.txt and added a new -# field in the record to hold the value. Luckily, the +# 20-August-2012: Added scan of GraphemeBreakProperty.txt and added a new +# field in the record to hold the value. Luckily, the # structure had a hole in it, so the resulting table is # not much bigger than before. # 18-September-2012: Added code for multiple caseless sets. This uses the @@ -123,6 +140,9 @@ # 12-August-2014: Updated to put Unicode version into the file # 19-June-2015: Updated for Unicode 8.0.0 # 02-July-2017: Updated for Unicode 10.0.0 +# 03-July-2018: Updated for Unicode 11.0.0 +# 07-July-2018: Added code to scan emoji-data.txt for the Extended +# Pictographic property. 
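The two-stage lookup described in the header comment above can be illustrated with a small sketch. This is not the script's own code: the function names `compress_blocks` and `lookup_record`, the block size of 4, and the toy data are all hypothetical, chosen only to show how identical blocks collapse into one shared "virtual" block and how a code point is then resolved in two table reads.

```python
def compress_blocks(table, block_size):
    """Split `table` into fixed-size blocks and store each distinct block once.

    Assumes len(table) is a multiple of block_size (hypothetical helper).
    """
    seen = {}    # block contents -> index of that block within stage2
    stage1 = []  # one entry per input block: which stage2 block to use
    stage2 = []  # the distinct ("virtual") blocks, concatenated
    for start in range(0, len(table), block_size):
        block = tuple(table[start:start + block_size])
        index = seen.get(block)
        if index is None:
            # First time this block is seen: append it to stage2.
            index = len(stage2) // block_size
            seen[block] = index
            stage2.extend(block)
        stage1.append(index)
    return stage1, stage2


def lookup_record(stage1, stage2, block_size, code_point):
    """Resolve a code point to its record index with two table reads."""
    block = stage1[code_point // block_size]
    return stage2[block * block_size + code_point % block_size]


# Toy data: 12 "code points", each mapped to a record index.
# The first and third blocks are identical, so they share one stage2 block.
record_index = [0, 0, 1, 1, 2, 2, 2, 2, 0, 0, 1, 1]
stage1, stage2 = compress_blocks(record_index, 4)
print(stage1)  # [0, 1, 0]
print(stage2)  # [0, 0, 1, 1, 2, 2, 2, 2]
```

The saving comes from the shared blocks: the larger the number of code-point ranges with identical properties, the fewer distinct blocks stage2 has to hold, at the cost of one extra index step per lookup.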
############################################################################## @@ -148,7 +168,7 @@ def get_other_case(chardata): # Read the whole table in memory, setting/checking the Unicode version def read_table(file_name, get_value, default_value): global unicode_version - + f = re.match(r'^[^/]+/([^.]+)\.txt$', file_name) file_base = f.group(1) version_pat = r"^# " + re.escape(file_base) + r"-(\d+\.\d+\.\d+)\.txt$" @@ -159,7 +179,7 @@ def read_table(file_name, get_value, default_value): unicode_version = version elif unicode_version != version: print("WARNING: Unicode version differs in %s", file_name, file=sys.stderr) - + table = [default_value] * MAX_UNICODE for line in file: line = re.sub(r'#.*', '', line) @@ -172,14 +192,14 @@ def read_table(file_name, get_value, default_value): if m.group(3) is None: last = char else: - last = int(m.group(3), 16) + last = int(m.group(3), 16) for i in range(char, last + 1): # It is important not to overwrite a previously set # value because in the CaseFolding file there are lines - # to be ignored (returning the default value of 0) - # which often come after a line which has already set - # data. - if table[i] == default_value: + # to be ignored (returning the default value of 0) + # which often come after a line which has already set + # data. 
+ if table[i] == default_value: table[i] = value file.close() return table @@ -220,14 +240,14 @@ def compress_table(table, block_size): stage2 += block blocks[block] = start stage1.append(start) - + return stage1, stage2 # Print a table def print_table(table, table_name, block_size = None): type, size = get_type_size(table) ELEMS_PER_LINE = 16 - + s = "const %s %s[] = { /* %d bytes" % (type, table_name, size * len(table)) if block_size: s += ", block = %d" % block_size @@ -237,7 +257,7 @@ def print_table(table, table_name, block_size = None): fmt = "%3d," * ELEMS_PER_LINE + " /* U+%04X */" mult = MAX_UNICODE / len(table) for i in range(0, len(table), ELEMS_PER_LINE): - print(fmt % (table[i:i+ELEMS_PER_LINE] + + print(fmt % (table[i:i+ELEMS_PER_LINE] + (int(i * mult),))) else: if block_size > ELEMS_PER_LINE: @@ -274,15 +294,15 @@ def get_record_size_struct(records): size = (size + slice_size - 1) & -slice_size size += slice_size structure += '%s property_%d;\n' % (slice_type, i) - + # round up to the first item of the next structure in array record_slice = [record[0] for record in records] slice_type, slice_size = get_type_size(record_slice) size = (size + slice_size - 1) & -slice_size - + structure += '} ucd_record;\n*/\n\n' return size, structure - + def test_record_size(): tests = [ \ ( [(3,), (6,), (6,), (1,)], 1 ), \ @@ -339,16 +359,23 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines 'SignWriting', # New for Unicode 10.0.0 'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi', - 'Nushu', 'Soyombo', 'Zanabazar_Square' + 'Nushu', 'Soyombo', 'Zanabazar_Square', +# New for Unicode 11.0.0 + 'Dogra', 'Gunjala_Gondi', 'Hanifi_Rohingya', 'Makasar', 'Medefaidrin', + 'Old_Sogdian', 'Sogdian' ] - + category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu', 'Mc', 'Me', 'Mn', 'Nd', 'Nl', 'No', 'Pc', 'Pd', 'Pe', 'Pf', 'Pi', 'Po', 'Ps', 'Sc', 'Sk', 'Sm', 'So', 'Zl', 'Zp', 'Zs' ] +# The 
Extended_Pictographic property is not found in the file where all the +# others are (GraphemeBreakProperty.txt). It comes from the emoji-data.txt +# file, but we list it here so that the name has the correct index value. + break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend', 'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other', - 'E_Base', 'E_Modifier', 'E_Base_GAZ', 'ZWJ', 'Glue_After_Zwj' ] + 'ZWJ', 'Extended_Pictographic' ] test_record_size() unicode_version = "" @@ -358,21 +385,50 @@ category = read_table('Unicode.tables/DerivedGeneralCategory.txt', make_get_name break_props = read_table('Unicode.tables/GraphemeBreakProperty.txt', make_get_names(break_property_names), break_property_names.index('Other')) other_case = read_table('Unicode.tables/CaseFolding.txt', get_other_case, 0) +# The grapheme breaking rules were changed for Unicode 11.0.0 (June 2018). Now +# we need to find the Extended_Pictographic property for emoji characters. This +# can be set as an additional grapheme break property, because the default for +# all the emojis is "other". We scan the emoji-data.txt file and modify the +# break-props table. -# This block of code was added by PH in September 2012. I am not a Python -# programmer, so the style is probably dreadful, but it does the job. It scans -# the other_case table to find sets of more than two characters that must all -# match each other caselessly. Later in this script a table of these sets is -# written out. 
However, we have to do this work here in order to compute the +file = open('Unicode.tables/emoji-data.txt', 'r', encoding='utf-8') +for line in file: + line = re.sub(r'#.*', '', line) + chardata = list(map(str.strip, line.split(';'))) + if len(chardata) <= 1: + continue + + if chardata[1] != "Extended_Pictographic": + continue + + m = re.match(r'([0-9a-fA-F]+)(\.\.([0-9a-fA-F]+))?$', chardata[0]) + char = int(m.group(1), 16) + if m.group(3) is None: + last = char + else: + last = int(m.group(3), 16) + for i in range(char, last + 1): + if break_props[i] != break_property_names.index('Other'): + print("WARNING: Emoji 0x%x has break property %s, not 'Other'", + i, break_property_names[break_props[i]], file=sys.stderr) + break_props[i] = break_property_names.index('Extended_Pictographic') +file.close() + + +# This block of code was added by PH in September 2012. I am not a Python +# programmer, so the style is probably dreadful, but it does the job. It scans +# the other_case table to find sets of more than two characters that must all +# match each other caselessly. Later in this script a table of these sets is +# written out. However, we have to do this work here in order to compute the # offsets in the table that are inserted into the main table. # The CaseFolding.txt file lists pairs, but the common logic for reading data -# sets only one value, so first we go through the table and set "return" +# sets only one value, so first we go through the table and set "return" # offsets for those that are not already set. for c in range(0x10ffff): if other_case[c] != 0 and other_case[c + other_case[c]] == 0: - other_case[c + other_case[c]] = -other_case[c] + other_case[c + other_case[c]] = -other_case[c] # Now scan again and create equivalence sets. @@ -382,25 +438,25 @@ for c in range(0x10ffff): o = c + other_case[c] # Trigger when this character's other case does not point back here. We - # now have three characters that are case-equivalent. 
- + # now have three characters that are case-equivalent. + if other_case[o] != -other_case[c]: t = o + other_case[o] - - # Scan the existing sets to see if any of the three characters are already + + # Scan the existing sets to see if any of the three characters are already # part of a set. If so, unite the existing set with the new set. - - appended = 0 + + appended = 0 for s in sets: - found = 0 + found = 0 for x in s: if x == c or x == o or x == t: found = 1 - + # Add new characters to an existing set - + if found: - found = 0 + found = 0 for y in [c, o, t]: for x in s: if x == y: @@ -408,10 +464,10 @@ for c in range(0x10ffff): if not found: s.append(y) appended = 1 - + # If we have not added to an existing set, create a new one. - if not appended: + if not appended: sets.append([c, o, t]) # End of loop looking for caseless sets. @@ -422,7 +478,7 @@ caseless_offsets = [0] * MAX_UNICODE offset = 1; for s in sets: - for x in s: + for x in s: caseless_offsets[x] = offset offset += len(s) + 1 @@ -431,7 +487,7 @@ for s in sets: # Combine the tables -table, records = combine_tables(script, category, break_props, +table, records = combine_tables(script, category, break_props, caseless_offsets, other_case) record_size, record_struct = get_record_size_struct(list(records.keys())) @@ -473,7 +529,7 @@ print("/* This file was autogenerated by the MultiStage2.py script. */") print("/* Total size: %d bytes, block size: %d. */" % (min_size, min_block_size)) print() print("/* The tables herein are needed only when UCP support is built,") -print("and in PCRE2 that happens automatically with UTF support.") +print("and in PCRE2 that happens automatically with UTF support.") print("This module should not be referenced otherwise, so") print("it should not matter whether it is compiled or not. However") print("a comment was received about space saving - maybe the guy linked") @@ -484,7 +540,7 @@ print("Instead, just supply small dummy tables. 
*/") print() print("#ifndef SUPPORT_UNICODE") print("const ucd_record PRIV(ucd_records)[] = {{0,0,0,0,0 }};") -print("const uint8_t PRIV(ucd_stage1)[] = {0};") +print("const uint16_t PRIV(ucd_stage1)[] = {0};") print("const uint16_t PRIV(ucd_stage2)[] = {0};") print("const uint32_t PRIV(ucd_caseless_sets)[] = {0};") print("#else") @@ -515,7 +571,7 @@ for s in sets: s = sorted(s) for x in s: print(' 0x%04x,' % x, end=' ') - print(' NOTACHAR,') + print(' NOTACHAR,') print('};') print() diff --git a/maint/README b/maint/README index fb9b7ee..d2de188 100644 --- a/maint/README +++ b/maint/README @@ -23,7 +23,7 @@ GenerateUtt.py A Python script to generate part of the pcre2_tables.c file ManyConfigTests A shell script that runs "configure, make, test" a number of times with different configuration settings. -MultiStage2.py A Python script that generates the file pcre2_ucd.c from three +MultiStage2.py A Python script that generates the file pcre2_ucd.c from five Unicode data tables, which are themselves downloaded from the Unicode web site. Run this script in the "maint" directory. The generated file contains the tables for a 2-stage lookup @@ -37,11 +37,17 @@ pcre2_chartables.c.non-standard README This file. -Unicode.tables The files in this directory (CaseFolding.txt, - DerivedGeneralCategory.txt, GraphemeBreakProperty.txt, - Scripts.txt and UnicodeData.txt) were downloaded from the - Unicode web site. They contain information about Unicode - characters and scripts. +Unicode.tables The files in this directory were downloaded from the Unicode + web site. They contain information about Unicode characters + and scripts. The ones used by the MultiStage2.py script are + CaseFolding.txt, DerivedGeneralCategory.txt, Scripts.txt, + GraphemeBreakProperty.txt, and emoji-data.txt. I've kept + UnicodeData.txt (which is no longer used by the script) + because it is useful occasionally for manually looking up the + details of certain characters. 
However, note that character + names in this file such as "Arabic sign sanah" do NOT mean + that the character is in a particular script (in this case, + Arabic). Scripts.txt is where to look for script information. ucptest.c A short C program for testing the Unicode property macros that do lookups in the pcre2_ucd.c data, mainly useful after @@ -359,4 +365,4 @@ very sensible; some are rather wacky. Some have been on this list for years. Philip Hazel Email local part: ph10 Email domain: cam.ac.uk -Last updated: 20 May 2017 +Last updated: 07 July 2018 diff --git a/maint/Unicode.tables/CaseFolding.txt b/maint/Unicode.tables/CaseFolding.txt index efdf18e..cce350f 100644 --- a/maint/Unicode.tables/CaseFolding.txt +++ b/maint/Unicode.tables/CaseFolding.txt @@ -1,6 +1,6 @@ -# CaseFolding-10.0.0.txt -# Date: 2017-04-14, 05:40:18 GMT -# © 2017 Unicode®, Inc. +# CaseFolding-11.0.0.txt +# Date: 2018-01-31, 08:20:09 GMT +# © 2018 Unicode®, Inc. # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. 
# For terms of use, see http://www.unicode.org/terms_of_use.html # @@ -603,6 +603,52 @@ 1C86; C; 044A; # CYRILLIC SMALL LETTER TALL HARD SIGN 1C87; C; 0463; # CYRILLIC SMALL LETTER TALL YAT 1C88; C; A64B; # CYRILLIC SMALL LETTER UNBLENDED UK +1C90; C; 10D0; # GEORGIAN MTAVRULI CAPITAL LETTER AN +1C91; C; 10D1; # GEORGIAN MTAVRULI CAPITAL LETTER BAN +1C92; C; 10D2; # GEORGIAN MTAVRULI CAPITAL LETTER GAN +1C93; C; 10D3; # GEORGIAN MTAVRULI CAPITAL LETTER DON +1C94; C; 10D4; # GEORGIAN MTAVRULI CAPITAL LETTER EN +1C95; C; 10D5; # GEORGIAN MTAVRULI CAPITAL LETTER VIN +1C96; C; 10D6; # GEORGIAN MTAVRULI CAPITAL LETTER ZEN +1C97; C; 10D7; # GEORGIAN MTAVRULI CAPITAL LETTER TAN +1C98; C; 10D8; # GEORGIAN MTAVRULI CAPITAL LETTER IN +1C99; C; 10D9; # GEORGIAN MTAVRULI CAPITAL LETTER KAN +1C9A; C; 10DA; # GEORGIAN MTAVRULI CAPITAL LETTER LAS +1C9B; C; 10DB; # GEORGIAN MTAVRULI CAPITAL LETTER MAN +1C9C; C; 10DC; # GEORGIAN MTAVRULI CAPITAL LETTER NAR +1C9D; C; 10DD; # GEORGIAN MTAVRULI CAPITAL LETTER ON +1C9E; C; 10DE; # GEORGIAN MTAVRULI CAPITAL LETTER PAR +1C9F; C; 10DF; # GEORGIAN MTAVRULI CAPITAL LETTER ZHAR +1CA0; C; 10E0; # GEORGIAN MTAVRULI CAPITAL LETTER RAE +1CA1; C; 10E1; # GEORGIAN MTAVRULI CAPITAL LETTER SAN +1CA2; C; 10E2; # GEORGIAN MTAVRULI CAPITAL LETTER TAR +1CA3; C; 10E3; # GEORGIAN MTAVRULI CAPITAL LETTER UN +1CA4; C; 10E4; # GEORGIAN MTAVRULI CAPITAL LETTER PHAR +1CA5; C; 10E5; # GEORGIAN MTAVRULI CAPITAL LETTER KHAR +1CA6; C; 10E6; # GEORGIAN MTAVRULI CAPITAL LETTER GHAN +1CA7; C; 10E7; # GEORGIAN MTAVRULI CAPITAL LETTER QAR +1CA8; C; 10E8; # GEORGIAN MTAVRULI CAPITAL LETTER SHIN +1CA9; C; 10E9; # GEORGIAN MTAVRULI CAPITAL LETTER CHIN +1CAA; C; 10EA; # GEORGIAN MTAVRULI CAPITAL LETTER CAN +1CAB; C; 10EB; # GEORGIAN MTAVRULI CAPITAL LETTER JIL +1CAC; C; 10EC; # GEORGIAN MTAVRULI CAPITAL LETTER CIL +1CAD; C; 10ED; # GEORGIAN MTAVRULI CAPITAL LETTER CHAR +1CAE; C; 10EE; # GEORGIAN MTAVRULI CAPITAL LETTER XAN +1CAF; C; 10EF; # GEORGIAN MTAVRULI CAPITAL LETTER 
JHAN +1CB0; C; 10F0; # GEORGIAN MTAVRULI CAPITAL LETTER HAE +1CB1; C; 10F1; # GEORGIAN MTAVRULI CAPITAL LETTER HE +1CB2; C; 10F2; # GEORGIAN MTAVRULI CAPITAL LETTER HIE +1CB3; C; 10F3; # GEORGIAN MTAVRULI CAPITAL LETTER WE +1CB4; C; 10F4; # GEORGIAN MTAVRULI CAPITAL LETTER HAR +1CB5; C; 10F5; # GEORGIAN MTAVRULI CAPITAL LETTER HOE +1CB6; C; 10F6; # GEORGIAN MTAVRULI CAPITAL LETTER FI +1CB7; C; 10F7; # GEORGIAN MTAVRULI CAPITAL LETTER YN +1CB8; C; 10F8; # GEORGIAN MTAVRULI CAPITAL LETTER ELIFI +1CB9; C; 10F9; # GEORGIAN MTAVRULI CAPITAL LETTER TURNED GAN +1CBA; C; 10FA; # GEORGIAN MTAVRULI CAPITAL LETTER AIN +1CBD; C; 10FD; # GEORGIAN MTAVRULI CAPITAL LETTER AEN +1CBE; C; 10FE; # GEORGIAN MTAVRULI CAPITAL LETTER HARD SIGN +1CBF; C; 10FF; # GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN 1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW 1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE 1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW @@ -1180,6 +1226,7 @@ A7B2; C; 029D; # LATIN CAPITAL LETTER J WITH CROSSED-TAIL A7B3; C; AB53; # LATIN CAPITAL LETTER CHI A7B4; C; A7B5; # LATIN CAPITAL LETTER BETA A7B6; C; A7B7; # LATIN CAPITAL LETTER OMEGA +A7B8; C; A7B9; # LATIN CAPITAL LETTER U WITH STROKE AB70; C; 13A0; # CHEROKEE SMALL LETTER A AB71; C; 13A1; # CHEROKEE SMALL LETTER E AB72; C; 13A2; # CHEROKEE SMALL LETTER I @@ -1457,6 +1504,38 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z 118BD; C; 118DD; # WARANG CITI CAPITAL LETTER SSUU 118BE; C; 118DE; # WARANG CITI CAPITAL LETTER SII 118BF; C; 118DF; # WARANG CITI CAPITAL LETTER VIYO +16E40; C; 16E60; # MEDEFAIDRIN CAPITAL LETTER M +16E41; C; 16E61; # MEDEFAIDRIN CAPITAL LETTER S +16E42; C; 16E62; # MEDEFAIDRIN CAPITAL LETTER V +16E43; C; 16E63; # MEDEFAIDRIN CAPITAL LETTER W +16E44; C; 16E64; # MEDEFAIDRIN CAPITAL LETTER ATIU +16E45; C; 16E65; # MEDEFAIDRIN CAPITAL LETTER Z +16E46; C; 16E66; # MEDEFAIDRIN CAPITAL LETTER KP +16E47; C; 16E67; # MEDEFAIDRIN CAPITAL LETTER P +16E48; C; 16E68; # MEDEFAIDRIN 
CAPITAL LETTER T +16E49; C; 16E69; # MEDEFAIDRIN CAPITAL LETTER G +16E4A; C; 16E6A; # MEDEFAIDRIN CAPITAL LETTER F +16E4B; C; 16E6B; # MEDEFAIDRIN CAPITAL LETTER I +16E4C; C; 16E6C; # MEDEFAIDRIN CAPITAL LETTER K +16E4D; C; 16E6D; # MEDEFAIDRIN CAPITAL LETTER A +16E4E; C; 16E6E; # MEDEFAIDRIN CAPITAL LETTER J +16E4F; C; 16E6F; # MEDEFAIDRIN CAPITAL LETTER E +16E50; C; 16E70; # MEDEFAIDRIN CAPITAL LETTER B +16E51; C; 16E71; # MEDEFAIDRIN CAPITAL LETTER C +16E52; C; 16E72; # MEDEFAIDRIN CAPITAL LETTER U +16E53; C; 16E73; # MEDEFAIDRIN CAPITAL LETTER YU +16E54; C; 16E74; # MEDEFAIDRIN CAPITAL LETTER L +16E55; C; 16E75; # MEDEFAIDRIN CAPITAL LETTER Q +16E56; C; 16E76; # MEDEFAIDRIN CAPITAL LETTER HP +16E57; C; 16E77; # MEDEFAIDRIN CAPITAL LETTER NY +16E58; C; 16E78; # MEDEFAIDRIN CAPITAL LETTER X +16E59; C; 16E79; # MEDEFAIDRIN CAPITAL LETTER D +16E5A; C; 16E7A; # MEDEFAIDRIN CAPITAL LETTER OE +16E5B; C; 16E7B; # MEDEFAIDRIN CAPITAL LETTER N +16E5C; C; 16E7C; # MEDEFAIDRIN CAPITAL LETTER R +16E5D; C; 16E7D; # MEDEFAIDRIN CAPITAL LETTER O +16E5E; C; 16E7E; # MEDEFAIDRIN CAPITAL LETTER AI +16E5F; C; 16E7F; # MEDEFAIDRIN CAPITAL LETTER Y 1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF 1E901; C; 1E923; # ADLAM CAPITAL LETTER DAALI 1E902; C; 1E924; # ADLAM CAPITAL LETTER LAAM diff --git a/maint/Unicode.tables/DerivedGeneralCategory.txt b/maint/Unicode.tables/DerivedGeneralCategory.txt index bc7f5e8..38c95e2 100644 --- a/maint/Unicode.tables/DerivedGeneralCategory.txt +++ b/maint/Unicode.tables/DerivedGeneralCategory.txt @@ -1,6 +1,6 @@ -# DerivedGeneralCategory-10.0.0.txt -# Date: 2017-03-08, 08:41:49 GMT -# © 2017 Unicode®, Inc. +# DerivedGeneralCategory-11.0.0.txt +# Date: 2018-02-21, 05:34:04 GMT +# © 2018 Unicode®, Inc. # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. # For terms of use, see http://www.unicode.org/terms_of_use.html # @@ -22,25 +22,23 @@ 03A2 ; Cn # 0530 ; Cn # 0557..0558 ; Cn # [2] .. 
-0560 ; Cn # -0588 ; Cn # 058B..058C ; Cn # [2] .. 0590 ; Cn # 05C8..05CF ; Cn # [8] .. -05EB..05EF ; Cn # [5] .. +05EB..05EE ; Cn # [4] .. 05F5..05FF ; Cn # [11] .. 061D ; Cn # 070E ; Cn # 074B..074C ; Cn # [2] .. 07B2..07BF ; Cn # [14] .. -07FB..07FF ; Cn # [5] .. +07FB..07FC ; Cn # [2] .. 082E..082F ; Cn # [2] .. 083F ; Cn # 085C..085D ; Cn # [2] .. 085F ; Cn # 086B..089F ; Cn # [53] .. 08B5 ; Cn # -08BE..08D3 ; Cn # [22] .. +08BE..08D2 ; Cn # [21] .. 0984 ; Cn # 098D..098E ; Cn # [2] .. 0991..0992 ; Cn # [2] .. @@ -54,7 +52,7 @@ 09D8..09DB ; Cn # [4] .. 09DE ; Cn # 09E4..09E5 ; Cn # [2] .. -09FE..0A00 ; Cn # [3] .. +09FF..0A00 ; Cn # [2] .. 0A04 ; Cn # 0A0B..0A0E ; Cn # [4] .. 0A11..0A12 ; Cn # [2] .. @@ -70,7 +68,7 @@ 0A52..0A58 ; Cn # [7] .. 0A5D ; Cn # 0A5F..0A65 ; Cn # [7] .. -0A76..0A80 ; Cn # [11] .. +0A77..0A80 ; Cn # [10] .. 0A84 ; Cn # 0A8E ; Cn # 0A92 ; Cn # @@ -115,7 +113,6 @@ 0BD1..0BD6 ; Cn # [6] .. 0BD8..0BE5 ; Cn # [14] .. 0BFB..0BFF ; Cn # [5] .. -0C04 ; Cn # 0C0D ; Cn # 0C11 ; Cn # 0C29 ; Cn # @@ -127,7 +124,6 @@ 0C5B..0C5F ; Cn # [5] .. 0C64..0C65 ; Cn # [2] .. 0C70..0C77 ; Cn # [8] .. -0C84 ; Cn # 0C8D ; Cn # 0C91 ; Cn # 0CA9 ; Cn # @@ -224,7 +220,7 @@ 17FA..17FF ; Cn # [6] .. 180F ; Cn # 181A..181F ; Cn # [6] .. -1878..187F ; Cn # [8] .. +1879..187F ; Cn # [7] .. 18AB..18AF ; Cn # [5] .. 18F6..18FF ; Cn # [10] .. 191F ; Cn # @@ -248,7 +244,8 @@ 1BF4..1BFB ; Cn # [8] .. 1C38..1C3A ; Cn # [3] .. 1C4A..1C4C ; Cn # [3] .. -1C89..1CBF ; Cn # [55] .. +1C89..1C8F ; Cn # [7] .. +1CBB..1CBC ; Cn # [2] .. 1CC8..1CCF ; Cn # [8] .. 1CFA..1CFF ; Cn # [6] .. 1DFA ; Cn # @@ -279,10 +276,8 @@ 244B..245F ; Cn # [21] .. 2B74..2B75 ; Cn # [2] .. 2B96..2B97 ; Cn # [2] .. -2BBA..2BBC ; Cn # [3] .. 2BC9 ; Cn # -2BD3..2BEB ; Cn # [25] .. -2BF0..2BFF ; Cn # [16] .. +2BFF ; Cn # 2C2F ; Cn # 2C5F ; Cn # 2CF4..2CF8 ; Cn # [5] .. @@ -300,7 +295,7 @@ 2DCF ; Cn # 2DD7 ; Cn # 2DDF ; Cn # -2E4A..2E7F ; Cn # [54] .. +2E4F..2E7F ; Cn # [49] .. 
2E9A ; Cn # 2EF4..2EFF ; Cn # [12] .. 2FD6..2FEF ; Cn # [26] .. @@ -308,26 +303,24 @@ 3040 ; Cn # 3097..3098 ; Cn # [2] .. 3100..3104 ; Cn # [5] .. -312F..3130 ; Cn # [2] .. +3130 ; Cn # 318F ; Cn # 31BB..31BF ; Cn # [5] .. 31E4..31EF ; Cn # [12] .. 321F ; Cn # 32FF ; Cn # 4DB6..4DBF ; Cn # [10] .. -9FEB..9FFF ; Cn # [21] .. +9FF0..9FFF ; Cn # [16] .. A48D..A48F ; Cn # [3] .. A4C7..A4CF ; Cn # [9] .. A62C..A63F ; Cn # [20] .. A6F8..A6FF ; Cn # [8] .. -A7AF ; Cn # -A7B8..A7F6 ; Cn # [63] .. +A7BA..A7F6 ; Cn # [61] .. A82C..A82F ; Cn # [4] .. A83A..A83F ; Cn # [6] .. A878..A87F ; Cn # [8] .. A8C6..A8CD ; Cn # [8] .. A8DA..A8DF ; Cn # [6] .. -A8FE..A8FF ; Cn # [2] .. A954..A95E ; Cn # [11] .. A97D..A97F ; Cn # [3] .. A9CE ; Cn # @@ -429,9 +422,9 @@ FFFE..FFFF ; Cn # [2] .. 10A07..10A0B ; Cn # [5] .. 10A14 ; Cn # 10A18 ; Cn # -10A34..10A37 ; Cn # [4] .. +10A36..10A37 ; Cn # [2] .. 10A3B..10A3E ; Cn # [4] .. -10A48..10A4F ; Cn # [8] .. +10A49..10A4F ; Cn # [7] .. 10A59..10A5F ; Cn # [7] .. 10AA0..10ABF ; Cn # [32] .. 10AE7..10AEA ; Cn # [4] .. @@ -445,15 +438,19 @@ FFFE..FFFF ; Cn # [2] .. 10C49..10C7F ; Cn # [55] .. 10CB3..10CBF ; Cn # [13] .. 10CF3..10CF9 ; Cn # [7] .. -10D00..10E5F ; Cn # [352] .. -10E7F..10FFF ; Cn # [385] .. +10D28..10D2F ; Cn # [8] .. +10D3A..10E5F ; Cn # [294] .. +10E7F..10EFF ; Cn # [129] .. +10F28..10F2F ; Cn # [8] .. +10F5A..10FFF ; Cn # [166] .. 1104E..11051 ; Cn # [4] .. 11070..1107E ; Cn # [15] .. -110C2..110CF ; Cn # [14] .. +110C2..110CC ; Cn # [11] .. +110CE..110CF ; Cn # [2] .. 110E9..110EF ; Cn # [7] .. 110FA..110FF ; Cn # [6] .. 11135 ; Cn # -11144..1114F ; Cn # [12] .. +11147..1114F ; Cn # [9] .. 11177..1117F ; Cn # [9] .. 111CE..111CF ; Cn # [2] .. 111E0 ; Cn # @@ -473,7 +470,7 @@ FFFE..FFFF ; Cn # [2] .. 11329 ; Cn # 11331 ; Cn # 11334 ; Cn # -1133A..1133B ; Cn # [2] .. +1133A ; Cn # 11345..11346 ; Cn # [2] .. 11349..1134A ; Cn # [2] .. 1134E..1134F ; Cn # [2] .. @@ -484,7 +481,7 @@ FFFE..FFFF ; Cn # [2] .. 
11375..113FF ; Cn # [139] .. 1145A ; Cn # 1145C ; Cn # -1145E..1147F ; Cn # [34] .. +1145F..1147F ; Cn # [33] .. 114C8..114CF ; Cn # [8] .. 114DA..1157F ; Cn # [166] .. 115B6..115B7 ; Cn # [2] .. @@ -494,14 +491,14 @@ FFFE..FFFF ; Cn # [2] .. 1166D..1167F ; Cn # [19] .. 116B8..116BF ; Cn # [8] .. 116CA..116FF ; Cn # [54] .. -1171A..1171C ; Cn # [3] .. +1171B..1171C ; Cn # [2] .. 1172C..1172F ; Cn # [4] .. -11740..1189F ; Cn # [352] .. +11740..117FF ; Cn # [192] .. +1183C..1189F ; Cn # [100] .. 118F3..118FE ; Cn # [12] .. 11900..119FF ; Cn # [256] .. 11A48..11A4F ; Cn # [8] .. 11A84..11A85 ; Cn # [2] .. -11A9D ; Cn # 11AA3..11ABF ; Cn # [29] .. 11AF9..11BFF ; Cn # [263] .. 11C09 ; Cn # @@ -517,7 +514,14 @@ FFFE..FFFF ; Cn # [2] .. 11D3B ; Cn # 11D3E ; Cn # 11D48..11D4F ; Cn # [8] .. -11D5A..11FFF ; Cn # [678] .. +11D5A..11D5F ; Cn # [6] .. +11D66 ; Cn # +11D69 ; Cn # +11D8F ; Cn # +11D92 ; Cn # +11D99..11D9F ; Cn # [7] .. +11DAA..11EDF ; Cn # [310] .. +11EF9..11FFF ; Cn # [263] .. 1239A..123FF ; Cn # [102] .. 1246F ; Cn # 12475..1247F ; Cn # [11] .. @@ -534,12 +538,13 @@ FFFE..FFFF ; Cn # [2] .. 16B5A ; Cn # 16B62 ; Cn # 16B78..16B7C ; Cn # [5] .. -16B90..16EFF ; Cn # [880] .. +16B90..16E3F ; Cn # [688] .. +16E9B..16EFF ; Cn # [101] .. 16F45..16F4F ; Cn # [11] .. 16F7F..16F8E ; Cn # [16] .. 16FA0..16FDF ; Cn # [64] .. 16FE2..16FFF ; Cn # [30] .. -187ED..187FF ; Cn # [19] .. +187F2..187FF ; Cn # [14] .. 18AF3..1AFFF ; Cn # [9485] .. 1B11F..1B16F ; Cn # [81] .. 1B2FC..1BBFF ; Cn # [2308] .. @@ -551,9 +556,10 @@ FFFE..FFFF ; Cn # [2] .. 1D0F6..1D0FF ; Cn # [10] .. 1D127..1D128 ; Cn # [2] .. 1D1E9..1D1FF ; Cn # [23] .. -1D246..1D2FF ; Cn # [186] .. +1D246..1D2DF ; Cn # [154] .. +1D2F4..1D2FF ; Cn # [12] .. 1D357..1D35F ; Cn # [9] .. -1D372..1D3FF ; Cn # [142] .. +1D379..1D3FF ; Cn # [135] .. 1D455 ; Cn # 1D49D ; Cn # 1D4A0..1D4A1 ; Cn # [2] .. @@ -586,7 +592,8 @@ FFFE..FFFF ; Cn # [2] .. 1E8D7..1E8FF ; Cn # [41] .. 1E94B..1E94F ; Cn # [5] .. 1E95A..1E95D ; Cn # [4] .. 
-1E960..1EDFF ; Cn # [1184] .. +1E960..1EC70 ; Cn # [785] .. +1ECB5..1EDFF ; Cn # [331] .. 1EE04 ; Cn # 1EE20 ; Cn # 1EE23 ; Cn # @@ -628,7 +635,6 @@ FFFE..FFFF ; Cn # [2] .. 1F0D0 ; Cn # 1F0F6..1F0FF ; Cn # [10] .. 1F10D..1F10F ; Cn # [3] .. -1F12F ; Cn # 1F16C..1F16F ; Cn # [4] .. 1F1AD..1F1E5 ; Cn # [57] .. 1F203..1F20F ; Cn # [13] .. @@ -638,9 +644,9 @@ FFFE..FFFF ; Cn # [2] .. 1F266..1F2FF ; Cn # [154] .. 1F6D5..1F6DF ; Cn # [11] .. 1F6ED..1F6EF ; Cn # [3] .. -1F6F9..1F6FF ; Cn # [7] .. +1F6FA..1F6FF ; Cn # [6] .. 1F774..1F77F ; Cn # [12] .. -1F7D5..1F7FF ; Cn # [43] .. +1F7D9..1F7FF ; Cn # [39] .. 1F80C..1F80F ; Cn # [4] .. 1F848..1F84F ; Cn # [8] .. 1F85A..1F85F ; Cn # [6] .. @@ -648,11 +654,14 @@ FFFE..FFFF ; Cn # [2] .. 1F8AE..1F8FF ; Cn # [82] .. 1F90C..1F90F ; Cn # [4] .. 1F93F ; Cn # -1F94D..1F94F ; Cn # [3] .. -1F96C..1F97F ; Cn # [20] .. -1F998..1F9BF ; Cn # [40] .. -1F9C1..1F9CF ; Cn # [15] .. -1F9E7..1FFFF ; Cn # [1561] .. +1F971..1F972 ; Cn # [2] .. +1F977..1F979 ; Cn # [3] .. +1F97B ; Cn # +1F9A3..1F9AF ; Cn # [13] .. +1F9BA..1F9BF ; Cn # [6] .. +1F9C3..1F9CF ; Cn # [13] .. +1FA00..1FA5F ; Cn # [96] .. +1FA6E..1FFFF ; Cn # [1426] .. 2A6D7..2A6FF ; Cn # [41] .. 2B735..2B73F ; Cn # [11] .. 2B81E..2B81F ; Cn # [2] .. @@ -665,7 +674,7 @@ E01F0..EFFFF ; Cn # [65040] .. FFFFE..FFFFF ; Cn # [2] .. 10FFFE..10FFFF; Cn # [2] .. -# Total code points: 837841 +# Total code points: 837157 # ================================================ @@ -947,6 +956,8 @@ FFFFE..FFFFF ; Cn # [2] .. 
10C7 ; Lu # GEORGIAN CAPITAL LETTER YN 10CD ; Lu # GEORGIAN CAPITAL LETTER AEN 13A0..13F5 ; Lu # [86] CHEROKEE LETTER A..CHEROKEE LETTER MV +1C90..1CBA ; Lu # [43] GEORGIAN MTAVRULI CAPITAL LETTER AN..GEORGIAN MTAVRULI CAPITAL LETTER AIN +1CBD..1CBF ; Lu # [3] GEORGIAN MTAVRULI CAPITAL LETTER AEN..GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN 1E00 ; Lu # LATIN CAPITAL LETTER A WITH RING BELOW 1E02 ; Lu # LATIN CAPITAL LETTER B WITH DOT ABOVE 1E04 ; Lu # LATIN CAPITAL LETTER B WITH DOT BELOW @@ -1261,11 +1272,13 @@ A7A8 ; Lu # LATIN CAPITAL LETTER S WITH OBLIQUE STROKE A7AA..A7AE ; Lu # [5] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER SMALL CAPITAL I A7B0..A7B4 ; Lu # [5] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER BETA A7B6 ; Lu # LATIN CAPITAL LETTER OMEGA +A7B8 ; Lu # LATIN CAPITAL LETTER U WITH STROKE FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z 10400..10427 ; Lu # [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW 104B0..104D3 ; Lu # [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA 10C80..10CB2 ; Lu # [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US 118A0..118BF ; Lu # [32] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI CAPITAL LETTER VIYO +16E40..16E5F ; Lu # [32] MEDEFAIDRIN CAPITAL LETTER M..MEDEFAIDRIN CAPITAL LETTER Y 1D400..1D419 ; Lu # [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z 1D434..1D44D ; Lu # [26] MATHEMATICAL ITALIC CAPITAL A..MATHEMATICAL ITALIC CAPITAL Z 1D468..1D481 ; Lu # [26] MATHEMATICAL BOLD ITALIC CAPITAL A..MATHEMATICAL BOLD ITALIC CAPITAL Z @@ -1299,7 +1312,7 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP 1D7CA ; Lu # MATHEMATICAL BOLD CAPITAL DIGAMMA 1E900..1E921 ; Lu # [34] ADLAM CAPITAL LETTER ALIF..ADLAM CAPITAL LETTER SHA -# Total code points: 1702 +# Total code points: 1781 # ================================================ @@ -1574,7 +1587,9 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN 
CAPITAL LETTER A..FULLWIDTH LATIN CAP 052B ; Ll # CYRILLIC SMALL LETTER DZZHE 052D ; Ll # CYRILLIC SMALL LETTER DCHE 052F ; Ll # CYRILLIC SMALL LETTER EL WITH DESCENDER -0561..0587 ; Ll # [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN +0560..0588 ; Ll # [41] ARMENIAN SMALL LETTER TURNED AYB..ARMENIAN SMALL LETTER YI WITH STROKE +10D0..10FA ; Ll # [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN +10FD..10FF ; Ll # [3] GEORGIAN LETTER AEN..GEORGIAN LETTER LABIAL SIGN 13F8..13FD ; Ll # [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV 1C80..1C88 ; Ll # [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK 1D00..1D2B ; Ll # [44] LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTER SMALL CAPITAL EL @@ -1896,8 +1911,10 @@ A7A3 ; Ll # LATIN SMALL LETTER K WITH OBLIQUE STROKE A7A5 ; Ll # LATIN SMALL LETTER N WITH OBLIQUE STROKE A7A7 ; Ll # LATIN SMALL LETTER R WITH OBLIQUE STROKE A7A9 ; Ll # LATIN SMALL LETTER S WITH OBLIQUE STROKE +A7AF ; Ll # LATIN LETTER SMALL CAPITAL Q A7B5 ; Ll # LATIN SMALL LETTER BETA A7B7 ; Ll # LATIN SMALL LETTER OMEGA +A7B9 ; Ll # LATIN SMALL LETTER U WITH STROKE A7FA ; Ll # LATIN LETTER SMALL CAPITAL TURNED M AB30..AB5A ; Ll # [43] LATIN SMALL LETTER BARRED ALPHA..LATIN SMALL LETTER Y WITH SHORT RIGHT LEG AB60..AB65 ; Ll # [6] LATIN SMALL LETTER SAKHA YAT..GREEK LETTER SMALL CAPITAL OMEGA @@ -1909,6 +1926,7 @@ FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL 104D8..104FB ; Ll # [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA 10CC0..10CF2 ; Ll # [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US 118C0..118DF ; Ll # [32] WARANG CITI SMALL LETTER NGAA..WARANG CITI SMALL LETTER VIYO +16E60..16E7F ; Ll # [32] MEDEFAIDRIN SMALL LETTER M..MEDEFAIDRIN SMALL LETTER Y 1D41A..1D433 ; Ll # [26] MATHEMATICAL BOLD SMALL A..MATHEMATICAL BOLD SMALL Z 1D44E..1D454 ; Ll # [7] MATHEMATICAL ITALIC SMALL A..MATHEMATICAL ITALIC SMALL G 1D456..1D467 ; Ll # [18] MATHEMATICAL ITALIC SMALL 
I..MATHEMATICAL ITALIC SMALL Z @@ -1939,7 +1957,7 @@ FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL 1D7CB ; Ll # MATHEMATICAL BOLD SMALL DIGAMMA 1E922..1E943 ; Ll # [34] ADLAM SMALL LETTER ALIF..ADLAM SMALL LETTER SHA -# Total code points: 2063 +# Total code points: 2145 # ================================================ @@ -2032,7 +2050,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK 01C0..01C3 ; Lo # [4] LATIN LETTER DENTAL CLICK..LATIN LETTER RETROFLEX CLICK 0294 ; Lo # LATIN LETTER GLOTTAL STOP 05D0..05EA ; Lo # [27] HEBREW LETTER ALEF..HEBREW LETTER TAV -05F0..05F2 ; Lo # [3] HEBREW LIGATURE YIDDISH DOUBLE VAV..HEBREW LIGATURE YIDDISH DOUBLE YOD +05EF..05F2 ; Lo # [4] HEBREW YOD TRIANGLE..HEBREW LIGATURE YIDDISH DOUBLE YOD 0620..063F ; Lo # [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE 0641..064A ; Lo # [10] ARABIC LETTER FEH..ARABIC LETTER YEH 066E..066F ; Lo # [2] ARABIC LETTER DOTLESS BEH..ARABIC LETTER DOTLESS QAF @@ -2171,8 +2189,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK 106E..1070 ; Lo # [3] MYANMAR LETTER EASTERN PWO KAREN NNA..MYANMAR LETTER EASTERN PWO KAREN GHWA 1075..1081 ; Lo # [13] MYANMAR LETTER SHAN KA..MYANMAR LETTER SHAN HA 108E ; Lo # MYANMAR LETTER RUMAI PALAUNG FA -10D0..10FA ; Lo # [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN -10FD..1248 ; Lo # [332] GEORGIAN LETTER AEN..ETHIOPIC SYLLABLE QWA +1100..1248 ; Lo # [329] HANGUL CHOSEONG KIYEOK..ETHIOPIC SYLLABLE QWA 124A..124D ; Lo # [4] ETHIOPIC SYLLABLE QWI..ETHIOPIC SYLLABLE QWE 1250..1256 ; Lo # [7] ETHIOPIC SYLLABLE QHA..ETHIOPIC SYLLABLE QHO 1258 ; Lo # ETHIOPIC SYLLABLE QHWA @@ -2203,7 +2220,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK 1780..17B3 ; Lo # [52] KHMER LETTER KA..KHMER INDEPENDENT VOWEL QAU 17DC ; Lo # KHMER SIGN AVAKRAHASANYA 1820..1842 ; Lo # [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI -1844..1877 ; Lo 
# [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA +1844..1878 ; Lo # [53] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER CHA WITH TWO DOTS 1880..1884 ; Lo # [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA 1887..18A8 ; Lo # [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA 18AA ; Lo # MONGOLIAN LETTER MANCHU ALI GALI LHA @@ -2243,12 +2260,12 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK 309F ; Lo # HIRAGANA DIGRAPH YORI 30A1..30FA ; Lo # [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO 30FF ; Lo # KATAKANA DIGRAPH KOTO -3105..312E ; Lo # [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE +3105..312F ; Lo # [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN 3131..318E ; Lo # [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE 31A0..31BA ; Lo # [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY 31F0..31FF ; Lo # [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO 3400..4DB5 ; Lo # [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5 -4E00..9FEA ; Lo # [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA +4E00..9FEF ; Lo # [20976] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEF A000..A014 ; Lo # [21] YI SYLLABLE IT..YI SYLLABLE E A016..A48C ; Lo # [1143] YI SYLLABLE BIT..YI SYLLABLE YYR A4D0..A4F7 ; Lo # [40] LISU LETTER BA..LISU LETTER OE @@ -2267,7 +2284,7 @@ A840..A873 ; Lo # [52] PHAGS-PA LETTER KA..PHAGS-PA LETTER CANDRABINDU A882..A8B3 ; Lo # [50] SAURASHTRA LETTER A..SAURASHTRA LETTER LLA A8F2..A8F7 ; Lo # [6] DEVANAGARI SIGN SPACING CANDRABINDU..DEVANAGARI SIGN CANDRABINDU AVAGRAHA A8FB ; Lo # DEVANAGARI HEADSTROKE -A8FD ; Lo # DEVANAGARI JAIN OM +A8FD..A8FE ; Lo # [2] DEVANAGARI JAIN OM..DEVANAGARI LETTER AY A90A..A925 ; Lo # [28] KAYAH LI LETTER KA..KAYAH LI LETTER OO A930..A946 ; Lo # [23] REJANG LETTER KA..REJANG LETTER A A960..A97C ; Lo # [29] HANGUL CHOSEONG TIKEUT-MIEUM..HANGUL CHOSEONG SSANGYEORINHIEUH @@ -2361,7 +2378,7 @@ FFDA..FFDC ; Lo 
# [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 10A00 ; Lo # KHAROSHTHI LETTER A 10A10..10A13 ; Lo # [4] KHAROSHTHI LETTER KA..KHAROSHTHI LETTER GHA 10A15..10A17 ; Lo # [3] KHAROSHTHI LETTER CA..KHAROSHTHI LETTER JA -10A19..10A33 ; Lo # [27] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER TTTHA +10A19..10A35 ; Lo # [29] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER VHA 10A60..10A7C ; Lo # [29] OLD SOUTH ARABIAN LETTER HE..OLD SOUTH ARABIAN LETTER THETH 10A80..10A9C ; Lo # [29] OLD NORTH ARABIAN LETTER HEH..OLD NORTH ARABIAN LETTER ZAH 10AC0..10AC7 ; Lo # [8] MANICHAEAN LETTER ALEPH..MANICHAEAN LETTER WAW @@ -2371,10 +2388,15 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 10B60..10B72 ; Lo # [19] INSCRIPTIONAL PAHLAVI LETTER ALEPH..INSCRIPTIONAL PAHLAVI LETTER TAW 10B80..10B91 ; Lo # [18] PSALTER PAHLAVI LETTER ALEPH..PSALTER PAHLAVI LETTER TAW 10C00..10C48 ; Lo # [73] OLD TURKIC LETTER ORKHON A..OLD TURKIC LETTER ORKHON BASH +10D00..10D23 ; Lo # [36] HANIFI ROHINGYA LETTER A..HANIFI ROHINGYA MARK NA KHONNA +10F00..10F1C ; Lo # [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL +10F27 ; Lo # OLD SOGDIAN LIGATURE AYIN-DALETH +10F30..10F45 ; Lo # [22] SOGDIAN LETTER ALEPH..SOGDIAN INDEPENDENT SHIN 11003..11037 ; Lo # [53] BRAHMI SIGN JIHVAMULIYA..BRAHMI LETTER OLD TAMIL NNNA 11083..110AF ; Lo # [45] KAITHI LETTER A..KAITHI LETTER HA 110D0..110E8 ; Lo # [25] SORA SOMPENG LETTER SAH..SORA SOMPENG LETTER MAE 11103..11126 ; Lo # [36] CHAKMA LETTER AA..CHAKMA LETTER HAA +11144 ; Lo # CHAKMA LETTER LHAA 11150..11172 ; Lo # [35] MAHAJANI LETTER A..MAHAJANI LETTER RRA 11176 ; Lo # MAHAJANI LIGATURE SHRI 11183..111B2 ; Lo # [48] SHARADA LETTER A..SHARADA LETTER HA @@ -2408,7 +2430,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 11600..1162F ; Lo # [48] MODI LETTER A..MODI LETTER LLA 11644 ; Lo # MODI SIGN HUVA 11680..116AA ; Lo # [43] TAKRI LETTER A..TAKRI LETTER RRA -11700..11719 ; Lo 
# [26] AHOM LETTER KA..AHOM LETTER JHA +11700..1171A ; Lo # [27] AHOM LETTER KA..AHOM LETTER ALTERNATE BA +11800..1182B ; Lo # [44] DOGRA LETTER A..DOGRA LETTER RRA 118FF ; Lo # WARANG CITI OM 11A00 ; Lo # ZANABAZAR SQUARE LETTER A 11A0B..11A32 ; Lo # [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA @@ -2416,6 +2439,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 11A50 ; Lo # SOYOMBO LETTER A 11A5C..11A83 ; Lo # [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA 11A86..11A89 ; Lo # [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA +11A9D ; Lo # SOYOMBO MARK PLUTA 11AC0..11AF8 ; Lo # [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL 11C00..11C08 ; Lo # [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L 11C0A..11C2E ; Lo # [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA @@ -2425,6 +2449,11 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 11D08..11D09 ; Lo # [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O 11D0B..11D30 ; Lo # [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA 11D46 ; Lo # MASARAM GONDI REPHA +11D60..11D65 ; Lo # [6] GUNJALA GONDI LETTER A..GUNJALA GONDI LETTER UU +11D67..11D68 ; Lo # [2] GUNJALA GONDI LETTER EE..GUNJALA GONDI LETTER AI +11D6A..11D89 ; Lo # [32] GUNJALA GONDI LETTER OO..GUNJALA GONDI LETTER SA +11D98 ; Lo # GUNJALA GONDI OM +11EE0..11EF2 ; Lo # [19] MAKASAR LETTER KA..MAKASAR ANGKA 12000..12399 ; Lo # [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U 12480..12543 ; Lo # [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU 13000..1342E ; Lo # [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032 @@ -2437,7 +2466,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 16B7D..16B8F ; Lo # [19] PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG CLAN SIGN VWJ 16F00..16F44 ; Lo # [69] MIAO LETTER PA..MIAO LETTER HHA 16F50 ; Lo # MIAO LETTER NASALIZATION -17000..187EC ; Lo # [6125] TANGUT 
IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC +17000..187F1 ; Lo # [6130] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187F1 18800..18AF2 ; Lo # [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755 1B000..1B11E ; Lo # [287] KATAKANA LETTER ARCHAIC E..HENTAIGANA LETTER N-MU-MO-2 1B170..1B2FB ; Lo # [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB @@ -2486,7 +2515,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 2CEB0..2EBE0 ; Lo # [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0 2F800..2FA1D ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D -# Total code points: 121047 +# Total code points: 121212 # ================================================ @@ -2510,12 +2539,13 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 0730..074A ; Mn # [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH 07A6..07B0 ; Mn # [11] THAANA ABAFILI..THAANA SUKUN 07EB..07F3 ; Mn # [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE +07FD ; Mn # NKO DANTAYALAN 0816..0819 ; Mn # [4] SAMARITAN MARK IN..SAMARITAN MARK DAGESH 081B..0823 ; Mn # [9] SAMARITAN MARK EPENTHETIC YUT..SAMARITAN VOWEL SIGN A 0825..0827 ; Mn # [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U 0829..082D ; Mn # [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA 0859..085B ; Mn # [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK -08D4..08E1 ; Mn # [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA +08D3..08E1 ; Mn # [15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA 08E3..0902 ; Mn # [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA 093A ; Mn # DEVANAGARI VOWEL SIGN OE 093C ; Mn # DEVANAGARI SIGN NUKTA @@ -2528,6 +2558,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 09C1..09C4 ; Mn # [4] BENGALI VOWEL SIGN U..BENGALI VOWEL SIGN VOCALIC RR 09CD ; Mn # BENGALI SIGN VIRAMA 09E2..09E3 ; Mn # [2] BENGALI VOWEL SIGN VOCALIC L..BENGALI VOWEL SIGN VOCALIC LL 
+09FE ; Mn # BENGALI SANDHI MARK 0A01..0A02 ; Mn # [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI 0A3C ; Mn # GURMUKHI SIGN NUKTA 0A41..0A42 ; Mn # [2] GURMUKHI VOWEL SIGN U..GURMUKHI VOWEL SIGN UU @@ -2554,6 +2585,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I 0BC0 ; Mn # TAMIL VOWEL SIGN II 0BCD ; Mn # TAMIL SIGN VIRAMA 0C00 ; Mn # TELUGU SIGN COMBINING CANDRABINDU ABOVE +0C04 ; Mn # TELUGU SIGN COMBINING ANUSVARA ABOVE 0C3E..0C40 ; Mn # [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II 0C46..0C48 ; Mn # [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI 0C4A..0C4D ; Mn # [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA @@ -2670,6 +2702,7 @@ A80B ; Mn # SYLOTI NAGRI SIGN ANUSVARA A825..A826 ; Mn # [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E A8C4..A8C5 ; Mn # [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU A8E0..A8F1 ; Mn # [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA +A8FF ; Mn # DEVANAGARI VOWEL SIGN AY A926..A92D ; Mn # [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU A947..A951 ; Mn # [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R A980..A982 ; Mn # [3] JAVANESE SIGN PANYANGGA..JAVANESE SIGN LAYAR @@ -2705,6 +2738,8 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL 10A38..10A3A ; Mn # [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW 10A3F ; Mn # KHAROSHTHI VIRAMA 10AE5..10AE6 ; Mn # [2] MANICHAEAN ABBREVIATION MARK ABOVE..MANICHAEAN ABBREVIATION MARK BELOW +10D24..10D27 ; Mn # [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI +10F46..10F50 ; Mn # [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW 11001 ; Mn # BRAHMI SIGN ANUSVARA 11038..11046 ; Mn # [15] BRAHMI VOWEL SIGN AA..BRAHMI VIRAMA 1107F..11081 ; Mn # [3] BRAHMI NUMBER JOINER..KAITHI SIGN ANUSVARA @@ -2716,7 +2751,7 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL 11173 ; Mn # MAHAJANI SIGN NUKTA 11180..11181 ; Mn # [2] SHARADA SIGN 
CANDRABINDU..SHARADA SIGN ANUSVARA 111B6..111BE ; Mn # [9] SHARADA VOWEL SIGN U..SHARADA VOWEL SIGN O -111CA..111CC ; Mn # [3] SHARADA SIGN NUKTA..SHARADA EXTRA SHORT VOWEL MARK +111C9..111CC ; Mn # [4] SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOWEL MARK 1122F..11231 ; Mn # [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI 11234 ; Mn # KHOJKI SIGN ANUSVARA 11236..11237 ; Mn # [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA @@ -2724,13 +2759,14 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL 112DF ; Mn # KHUDAWADI SIGN ANUSVARA 112E3..112EA ; Mn # [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA 11300..11301 ; Mn # [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU -1133C ; Mn # GRANTHA SIGN NUKTA +1133B..1133C ; Mn # [2] COMBINING BINDU BELOW..GRANTHA SIGN NUKTA 11340 ; Mn # GRANTHA VOWEL SIGN II 11366..1136C ; Mn # [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX 11370..11374 ; Mn # [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA 11438..1143F ; Mn # [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI 11442..11444 ; Mn # [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA 11446 ; Mn # NEWA SIGN NUKTA +1145E ; Mn # NEWA SANDHI MARK 114B3..114B8 ; Mn # [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL 114BA ; Mn # TIRHUTA VOWEL SIGN SHORT E 114BF..114C0 ; Mn # [2] TIRHUTA SIGN CANDRABINDU..TIRHUTA SIGN ANUSVARA @@ -2749,8 +2785,9 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL 1171D..1171F ; Mn # [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA 11722..11725 ; Mn # [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU 11727..1172B ; Mn # [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER -11A01..11A06 ; Mn # [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O -11A09..11A0A ; Mn # [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK +1182F..11837 ; Mn # [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA +11839..1183A ; Mn # [2] DOGRA SIGN VIRAMA..DOGRA 
SIGN NUKTA +11A01..11A0A ; Mn # [10] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL LENGTH MARK 11A33..11A38 ; Mn # [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA 11A3B..11A3E ; Mn # [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA 11A47 ; Mn # ZANABAZAR SQUARE SUBJOINER @@ -2770,6 +2807,10 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL 11D3C..11D3D ; Mn # [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O 11D3F..11D45 ; Mn # [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA 11D47 ; Mn # MASARAM GONDI RA-KARA +11D90..11D91 ; Mn # [2] GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI VOWEL SIGN AI +11D95 ; Mn # GUNJALA GONDI SIGN ANUSVARA +11D97 ; Mn # GUNJALA GONDI VIRAMA +11EF3..11EF4 ; Mn # [2] MAKASAR VOWEL SIGN I..MAKASAR VOWEL SIGN U 16AF0..16AF4 ; Mn # [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE 16B30..16B36 ; Mn # [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM 16F8F..16F92 ; Mn # [4] MIAO TONE RIGHT..MIAO TONE BELOW @@ -2794,7 +2835,7 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL 1E944..1E94A ; Mn # [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA E0100..E01EF ; Mn # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 -# Total code points: 1763 +# Total code points: 1805 # ================================================ @@ -2928,6 +2969,7 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK 110B0..110B2 ; Mc # [3] KAITHI VOWEL SIGN AA..KAITHI VOWEL SIGN II 110B7..110B8 ; Mc # [2] KAITHI VOWEL SIGN O..KAITHI VOWEL SIGN AU 1112C ; Mc # CHAKMA VOWEL SIGN E +11145..11146 ; Mc # [2] CHAKMA VOWEL SIGN AA..CHAKMA VOWEL SIGN EI 11182 ; Mc # SHARADA SIGN VISARGA 111B3..111B5 ; Mc # [3] SHARADA VOWEL SIGN AA..SHARADA VOWEL SIGN II 111BF..111C0 ; Mc # [2] SHARADA VOWEL SIGN AU..SHARADA SIGN VIRAMA @@ -2960,7 +3002,8 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK 116B6 ; Mc # TAKRI SIGN VIRAMA 11720..11721 ; Mc # [2] AHOM 
VOWEL SIGN A..AHOM VOWEL SIGN AA 11726 ; Mc # AHOM VOWEL SIGN E -11A07..11A08 ; Mc # [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU +1182C..1182E ; Mc # [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II +11838 ; Mc # DOGRA SIGN VISARGA 11A39 ; Mc # ZANABAZAR SQUARE SIGN VISARGA 11A57..11A58 ; Mc # [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU 11A97 ; Mc # SOYOMBO SIGN VISARGA @@ -2969,11 +3012,15 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK 11CA9 ; Mc # MARCHEN SUBJOINED LETTER YA 11CB1 ; Mc # MARCHEN VOWEL SIGN I 11CB4 ; Mc # MARCHEN VOWEL SIGN O +11D8A..11D8E ; Mc # [5] GUNJALA GONDI VOWEL SIGN AA..GUNJALA GONDI VOWEL SIGN UU +11D93..11D94 ; Mc # [2] GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI VOWEL SIGN AU +11D96 ; Mc # GUNJALA GONDI SIGN VISARGA +11EF5..11EF6 ; Mc # [2] MAKASAR VOWEL SIGN E..MAKASAR VOWEL SIGN O 16F51..16F7E ; Mc # [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG 1D165..1D166 ; Mc # [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM 1D16D..1D172 ; Mc # [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5 -# Total code points: 401 +# Total code points: 415 # ================================================ @@ -3017,6 +3064,7 @@ AA50..AA59 ; Nd # [10] CHAM DIGIT ZERO..CHAM DIGIT NINE ABF0..ABF9 ; Nd # [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DIGIT NINE FF10..FF19 ; Nd # [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE 104A0..104A9 ; Nd # [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE +10D30..10D39 ; Nd # [10] HANIFI ROHINGYA DIGIT ZERO..HANIFI ROHINGYA DIGIT NINE 11066..1106F ; Nd # [10] BRAHMI DIGIT ZERO..BRAHMI DIGIT NINE 110F0..110F9 ; Nd # [10] SORA SOMPENG DIGIT ZERO..SORA SOMPENG DIGIT NINE 11136..1113F ; Nd # [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE @@ -3030,12 +3078,13 @@ FF10..FF19 ; Nd # [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE 118E0..118E9 ; Nd # [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE 11C50..11C59 ; Nd # [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE 
11D50..11D59 ; Nd # [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE +11DA0..11DA9 ; Nd # [10] GUNJALA GONDI DIGIT ZERO..GUNJALA GONDI DIGIT NINE 16A60..16A69 ; Nd # [10] MRO DIGIT ZERO..MRO DIGIT NINE 16B50..16B59 ; Nd # [10] PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT NINE 1D7CE..1D7FF ; Nd # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE 1E950..1E959 ; Nd # [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE -# Total code points: 590 +# Total code points: 610 # ================================================ @@ -3102,7 +3151,7 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO 109BC..109BD ; No # [2] MEROITIC CURSIVE FRACTION ELEVEN TWELFTHS..MEROITIC CURSIVE FRACTION ONE HALF 109C0..109CF ; No # [16] MEROITIC CURSIVE NUMBER ONE..MEROITIC CURSIVE NUMBER SEVENTY 109D2..109FF ; No # [46] MEROITIC CURSIVE NUMBER ONE HUNDRED..MEROITIC CURSIVE FRACTION TEN TWELFTHS -10A40..10A47 ; No # [8] KHAROSHTHI DIGIT ONE..KHAROSHTHI NUMBER ONE THOUSAND +10A40..10A48 ; No # [9] KHAROSHTHI DIGIT ONE..KHAROSHTHI FRACTION ONE HALF 10A7D..10A7E ; No # [2] OLD SOUTH ARABIAN NUMBER ONE..OLD SOUTH ARABIAN NUMBER FIFTY 10A9D..10A9F ; No # [3] OLD NORTH ARABIAN NUMBER ONE..OLD NORTH ARABIAN NUMBER TWENTY 10AEB..10AEF ; No # [5] MANICHAEAN NUMBER ONE..MANICHAEAN NUMBER ONE HUNDRED @@ -3111,17 +3160,24 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO 10BA9..10BAF ; No # [7] PSALTER PAHLAVI NUMBER ONE..PSALTER PAHLAVI NUMBER ONE HUNDRED 10CFA..10CFF ; No # [6] OLD HUNGARIAN NUMBER ONE..OLD HUNGARIAN NUMBER ONE THOUSAND 10E60..10E7E ; No # [31] RUMI DIGIT ONE..RUMI FRACTION TWO THIRDS +10F1D..10F26 ; No # [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF +10F51..10F54 ; No # [4] SOGDIAN NUMBER ONE..SOGDIAN NUMBER ONE HUNDRED 11052..11065 ; No # [20] BRAHMI NUMBER ONE..BRAHMI NUMBER ONE THOUSAND 111E1..111F4 ; No # [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND 1173A..1173B ; No # [2] 
AHOM NUMBER TEN..AHOM NUMBER TWENTY 118EA..118F2 ; No # [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY 11C5A..11C6C ; No # [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK 16B5B..16B61 ; No # [7] PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBER TRILLIONS -1D360..1D371 ; No # [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE +16E80..16E96 ; No # [23] MEDEFAIDRIN DIGIT ZERO..MEDEFAIDRIN DIGIT THREE ALTERNATE FORM +1D2E0..1D2F3 ; No # [20] MAYAN NUMERAL ZERO..MAYAN NUMERAL NINETEEN +1D360..1D378 ; No # [25] COUNTING ROD UNIT DIGIT ONE..TALLY MARK FIVE 1E8C7..1E8CF ; No # [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE +1EC71..1ECAB ; No # [59] INDIC SIYAQ NUMBER ONE..INDIC SIYAQ NUMBER PREFIXED NINE +1ECAD..1ECAF ; No # [3] INDIC SIYAQ FRACTION ONE QUARTER..INDIC SIYAQ FRACTION THREE QUARTERS +1ECB1..1ECB4 ; No # [4] INDIC SIYAQ NUMBER ALTERNATE ONE..INDIC SIYAQ ALTERNATE LAKH MARK 1F100..1F10C ; No # [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO -# Total code points: 676 +# Total code points: 807 # ================================================ @@ -3180,12 +3236,13 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO FEFF ; Cf # ZERO WIDTH NO-BREAK SPACE FFF9..FFFB ; Cf # [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR 110BD ; Cf # KAITHI NUMBER SIGN +110CD ; Cf # KAITHI NUMBER SIGN ABOVE 1BCA0..1BCA3 ; Cf # [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP 1D173..1D17A ; Cf # [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE E0001 ; Cf # LANGUAGE TAG E0020..E007F ; Cf # [96] TAG SPACE..CANCEL TAG -# Total code points: 151 +# Total code points: 152 # ================================================ @@ -3440,7 +3497,9 @@ FF3F ; Pc # FULLWIDTH LOW LINE 0964..0965 ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA 0970 ; Po # DEVANAGARI ABBREVIATION SIGN 09FD ; Po # BENGALI ABBREVIATION SIGN +0A76 ; Po # GURMUKHI ABBREVIATION SIGN 0AF0 ; Po # 
GUJARATI ABBREVIATION SIGN +0C84 ; Po # KANNADA SIGN SIDDHAM 0DF4 ; Po # SINHALA PUNCTUATION KUNDDALIYA 0E4F ; Po # THAI CHARACTER FONGMAN 0E5A..0E5B ; Po # [2] THAI CHARACTER ANGKHANKHU..THAI CHARACTER KHOMUT @@ -3491,7 +3550,7 @@ FF3F ; Pc # FULLWIDTH LOW LINE 2E30..2E39 ; Po # [10] RING POINT..TOP HALF SECTION SIGN 2E3C..2E3F ; Po # [4] STENOGRAPHIC FULL STOP..CAPITULUM 2E41 ; Po # REVERSED COMMA -2E43..2E49 ; Po # [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA +2E43..2E4E ; Po # [12] DASH WITH LEFT UPTURN..PUNCTUS ELEVATUS MARK 3001..3003 ; Po # [3] IDEOGRAPHIC COMMA..DITTO MARK 303D ; Po # PART ALTERNATION MARK 30FB ; Po # KATAKANA MIDDLE DOT @@ -3544,12 +3603,13 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL 10AF0..10AF6 ; Po # [7] MANICHAEAN PUNCTUATION STAR..MANICHAEAN PUNCTUATION LINE FILLER 10B39..10B3F ; Po # [7] AVESTAN ABBREVIATION MARK..LARGE ONE RING OVER TWO RINGS PUNCTUATION 10B99..10B9C ; Po # [4] PSALTER PAHLAVI SECTION MARK..PSALTER PAHLAVI FOUR DOTS WITH DOT +10F55..10F59 ; Po # [5] SOGDIAN PUNCTUATION TWO VERTICAL BARS..SOGDIAN PUNCTUATION HALF CIRCLE WITH DOT 11047..1104D ; Po # [7] BRAHMI DANDA..BRAHMI PUNCTUATION LOTUS 110BB..110BC ; Po # [2] KAITHI ABBREVIATION SIGN..KAITHI ENUMERATION SIGN 110BE..110C1 ; Po # [4] KAITHI SECTION MARK..KAITHI DOUBLE DANDA 11140..11143 ; Po # [4] CHAKMA SECTION MARK..CHAKMA QUESTION MARK 11174..11175 ; Po # [2] MAHAJANI ABBREVIATION SIGN..MAHAJANI SECTION MARK -111C5..111C9 ; Po # [5] SHARADA DANDA..SHARADA SANDHI MARK +111C5..111C8 ; Po # [4] SHARADA DANDA..SHARADA SEPARATOR 111CD ; Po # SHARADA SUTRA MARK 111DB ; Po # SHARADA SIGN SIDDHAM 111DD..111DF ; Po # [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2 @@ -3563,21 +3623,24 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL 11641..11643 ; Po # [3] MODI DANDA..MODI ABBREVIATION SIGN 11660..1166C ; Po # [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT 
1173C..1173E ; Po # [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI +1183B ; Po # DOGRA ABBREVIATION SIGN 11A3F..11A46 ; Po # [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK 11A9A..11A9C ; Po # [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD 11A9E..11AA2 ; Po # [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2 11C41..11C45 ; Po # [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2 11C70..11C71 ; Po # [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD +11EF7..11EF8 ; Po # [2] MAKASAR PASSIMBANG..MAKASAR END OF SECTION 12470..12474 ; Po # [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON 16A6E..16A6F ; Po # [2] MRO DANDA..MRO DOUBLE DANDA 16AF5 ; Po # BASSA VAH FULL STOP 16B37..16B3B ; Po # [5] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN VOS FEEM 16B44 ; Po # PAHAWH HMONG SIGN XAUS +16E97..16E9A ; Po # [4] MEDEFAIDRIN COMMA..MEDEFAIDRIN EXCLAMATION OH 1BC9F ; Po # DUPLOYAN PUNCTUATION CHINOOK FULL STOP 1DA87..1DA8B ; Po # [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS 1E95E..1E95F ; Po # [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK -# Total code points: 566 +# Total code points: 584 # ================================================ @@ -3658,6 +3721,7 @@ FFE9..FFEC ; Sm # [4] HALFWIDTH LEFTWARDS ARROW..HALFWIDTH DOWNWARDS ARROW 00A2..00A5 ; Sc # [4] CENT SIGN..YEN SIGN 058F ; Sc # ARMENIAN DRAM SIGN 060B ; Sc # AFGHANI SIGN +07FE..07FF ; Sc # [2] NKO DOROME SIGN..NKO TAMAN SIGN 09F2..09F3 ; Sc # [2] BENGALI RUPEE MARK..BENGALI RUPEE SIGN 09FB ; Sc # BENGALI GANDA MARK 0AF1 ; Sc # GUJARATI RUPEE SIGN @@ -3671,8 +3735,9 @@ FE69 ; Sc # SMALL DOLLAR SIGN FF04 ; Sc # FULLWIDTH DOLLAR SIGN FFE0..FFE1 ; Sc # [2] FULLWIDTH CENT SIGN..FULLWIDTH POUND SIGN FFE5..FFE6 ; Sc # [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN +1ECB0 ; Sc # INDIC SIYAQ RUPEE MARK -# Total code points: 54 +# Total code points: 57 # ================================================ 
@@ -3793,10 +3858,8 @@ FFE3 ; Sk # FULLWIDTH MACRON 2B45..2B46 ; So # [2] LEFTWARDS QUADRUPLE ARROW..RIGHTWARDS QUADRUPLE ARROW 2B4D..2B73 ; So # [39] DOWNWARDS TRIANGLE-HEADED ZIGZAG ARROW..DOWNWARDS TRIANGLE-HEADED ARROW TO BAR 2B76..2B95 ; So # [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW -2B98..2BB9 ; So # [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX -2BBD..2BC8 ; So # [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED -2BCA..2BD2 ; So # [9] TOP HALF BLACK CIRCLE..GROUP MARK -2BEC..2BEF ; So # [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS +2B98..2BC8 ; So # [49] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED +2BCA..2BFE ; So # [53] TOP HALF BLACK CIRCLE..REVERSED RIGHT ANGLE 2CE5..2CEA ; So # [6] COPTIC SYMBOL MI RO..COPTIC SYMBOL SHIMA SIMA 2E80..2E99 ; So # [26] CJK RADICAL REPEAT..CJK RADICAL RAP 2E9B..2EF3 ; So # [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE @@ -3855,14 +3918,14 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER 1DA6D..1DA74 ; So # [8] SIGNWRITING SHOULDER HIP SPINE..SIGNWRITING TORSO-FLOORPLANE TWISTING 1DA76..1DA83 ; So # [14] SIGNWRITING LIMB COMBINATION..SIGNWRITING LOCATION DEPTH 1DA85..1DA86 ; So # [2] SIGNWRITING LOCATION TORSO..SIGNWRITING LOCATION LIMBS DIGITS +1ECAC ; So # INDIC SIYAQ PLACEHOLDER 1F000..1F02B ; So # [44] MAHJONG TILE EAST WIND..MAHJONG TILE BACK 1F030..1F093 ; So # [100] DOMINO TILE HORIZONTAL BACK..DOMINO TILE VERTICAL-06-06 1F0A0..1F0AE ; So # [15] PLAYING CARD BACK..PLAYING CARD KING OF SPADES 1F0B1..1F0BF ; So # [15] PLAYING CARD ACE OF HEARTS..PLAYING CARD RED JOKER 1F0C1..1F0CF ; So # [15] PLAYING CARD ACE OF DIAMONDS..PLAYING CARD BLACK JOKER 1F0D1..1F0F5 ; So # [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21 -1F110..1F12E ; So # [31] PARENTHESIZED LATIN CAPITAL 
LETTER A..CIRCLED WZ -1F130..1F16B ; So # [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN +1F110..1F16B ; So # [92] PARENTHESIZED LATIN CAPITAL LETTER A..RAISED MD SIGN 1F170..1F1AC ; So # [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD 1F1E6..1F202 ; So # [29] REGIONAL INDICATOR SYMBOL LETTER A..SQUARED KATAKANA SA 1F210..1F23B ; So # [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D @@ -3872,9 +3935,9 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER 1F300..1F3FA ; So # [251] CYCLONE..AMPHORA 1F400..1F6D4 ; So # [725] RAT..PAGODA 1F6E0..1F6EC ; So # [13] HAMMER AND WRENCH..AIRPLANE ARRIVING -1F6F0..1F6F8 ; So # [9] SATELLITE..FLYING SAUCER +1F6F0..1F6F9 ; So # [10] SATELLITE..SKATEBOARD 1F700..1F773 ; So # [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE -1F780..1F7D4 ; So # [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR +1F780..1F7D8 ; So # [89] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..NEGATIVE CIRCLED SQUARE 1F800..1F80B ; So # [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD 1F810..1F847 ; So # [56] LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD..DOWNWARDS HEAVY ARROW 1F850..1F859 ; So # [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW @@ -3882,13 +3945,16 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER 1F890..1F8AD ; So # [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS 1F900..1F90B ; So # [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT 1F910..1F93E ; So # [47] ZIPPER-MOUTH FACE..HANDBALL -1F940..1F94C ; So # [13] WILTED FLOWER..CURLING STONE -1F950..1F96B ; So # [28] CROISSANT..CANNED FOOD -1F980..1F997 ; So # [24] CRAB..CRICKET -1F9C0 ; So # CHEESE WEDGE -1F9D0..1F9E6 ; So # [23] FACE WITH MONOCLE..SOCKS +1F940..1F970 ; So # [49] WILTED FLOWER..SMILING FACE WITH SMILING EYES AND 
THREE HEARTS +1F973..1F976 ; So # [4] FACE WITH PARTY HORN AND PARTY HAT..FREEZING FACE +1F97A ; So # FACE WITH PLEADING EYES +1F97C..1F9A2 ; So # [39] LAB COAT..SWAN +1F9B0..1F9B9 ; So # [10] EMOJI COMPONENT RED HAIR..SUPERVILLAIN +1F9C0..1F9C2 ; So # [3] CHEESE WEDGE..SALT SHAKER +1F9D0..1F9FF ; So # [48] FACE WITH MONOCLE..NAZAR AMULET +1FA60..1FA6D ; So # [14] XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER -# Total code points: 5855 +# Total code points: 5984 # ================================================ diff --git a/maint/Unicode.tables/GraphemeBreakProperty.txt b/maint/Unicode.tables/GraphemeBreakProperty.txt index 32bb12e..52052e6 100644 --- a/maint/Unicode.tables/GraphemeBreakProperty.txt +++ b/maint/Unicode.tables/GraphemeBreakProperty.txt @@ -1,6 +1,6 @@ -# GraphemeBreakProperty-10.0.0.txt -# Date: 2017-03-12, 07:03:41 GMT -# © 2017 Unicode®, Inc. +# GraphemeBreakProperty-11.0.0.txt +# Date: 2018-03-16, 20:34:02 GMT +# © 2018 Unicode®, Inc. # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. # For terms of use, see http://www.unicode.org/terms_of_use.html # @@ -24,12 +24,13 @@ 08E2 ; Prepend # Cf ARABIC DISPUTED END OF AYAH 0D4E ; Prepend # Lo MALAYALAM LETTER DOT REPH 110BD ; Prepend # Cf KAITHI NUMBER SIGN +110CD ; Prepend # Cf KAITHI NUMBER SIGN ABOVE 111C2..111C3 ; Prepend # Lo [2] SHARADA SIGN JIHVAMULIYA..SHARADA SIGN UPADHMANIYA 11A3A ; Prepend # Lo ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA 11A86..11A89 ; Prepend # Lo [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA 11D46 ; Prepend # Lo MASARAM GONDI REPHA -# Total code points: 19 +# Total code points: 20 # ================================================ @@ -95,12 +96,13 @@ E01F0..E0FFF ; Control # Cn [3600] .. 
0730..074A ; Extend # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH 07A6..07B0 ; Extend # Mn [11] THAANA ABAFILI..THAANA SUKUN 07EB..07F3 ; Extend # Mn [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE +07FD ; Extend # Mn NKO DANTAYALAN 0816..0819 ; Extend # Mn [4] SAMARITAN MARK IN..SAMARITAN MARK DAGESH 081B..0823 ; Extend # Mn [9] SAMARITAN MARK EPENTHETIC YUT..SAMARITAN VOWEL SIGN A 0825..0827 ; Extend # Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U 0829..082D ; Extend # Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA 0859..085B ; Extend # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK -08D4..08E1 ; Extend # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA +08D3..08E1 ; Extend # Mn [15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA 08E3..0902 ; Extend # Mn [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA 093A ; Extend # Mn DEVANAGARI VOWEL SIGN OE 093C ; Extend # Mn DEVANAGARI SIGN NUKTA @@ -115,6 +117,7 @@ E01F0..E0FFF ; Control # Cn [3600] .. 09CD ; Extend # Mn BENGALI SIGN VIRAMA 09D7 ; Extend # Mc BENGALI AU LENGTH MARK 09E2..09E3 ; Extend # Mn [2] BENGALI VOWEL SIGN VOCALIC L..BENGALI VOWEL SIGN VOCALIC LL +09FE ; Extend # Mn BENGALI SANDHI MARK 0A01..0A02 ; Extend # Mn [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI 0A3C ; Extend # Mn GURMUKHI SIGN NUKTA 0A41..0A42 ; Extend # Mn [2] GURMUKHI VOWEL SIGN U..GURMUKHI VOWEL SIGN UU @@ -145,6 +148,7 @@ E01F0..E0FFF ; Control # Cn [3600] .. 
 0BCD ; Extend # Mn TAMIL SIGN VIRAMA
 0BD7 ; Extend # Mc TAMIL AU LENGTH MARK
 0C00 ; Extend # Mn TELUGU SIGN COMBINING CANDRABINDU ABOVE
+0C04 ; Extend # Mn TELUGU SIGN COMBINING ANUSVARA ABOVE
 0C3E..0C40 ; Extend # Mn [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II
 0C46..0C48 ; Extend # Mn [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
 0C4A..0C4D ; Extend # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
@@ -273,6 +277,7 @@ A80B ; Extend # Mn SYLOTI NAGRI SIGN ANUSVARA
 A825..A826 ; Extend # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
 A8C4..A8C5 ; Extend # Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
 A8E0..A8F1 ; Extend # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
+A8FF ; Extend # Mn DEVANAGARI VOWEL SIGN AY
 A926..A92D ; Extend # Mn [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
 A947..A951 ; Extend # Mn [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
 A980..A982 ; Extend # Mn [3] JAVANESE SIGN PANYANGGA..JAVANESE SIGN LAYAR
@@ -309,6 +314,8 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 10A38..10A3A ; Extend # Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
 10A3F ; Extend # Mn KHAROSHTHI VIRAMA
 10AE5..10AE6 ; Extend # Mn [2] MANICHAEAN ABBREVIATION MARK ABOVE..MANICHAEAN ABBREVIATION MARK BELOW
+10D24..10D27 ; Extend # Mn [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
+10F46..10F50 ; Extend # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
 11001 ; Extend # Mn BRAHMI SIGN ANUSVARA
 11038..11046 ; Extend # Mn [15] BRAHMI VOWEL SIGN AA..BRAHMI VIRAMA
 1107F..11081 ; Extend # Mn [3] BRAHMI NUMBER JOINER..KAITHI SIGN ANUSVARA
@@ -320,7 +327,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 11173 ; Extend # Mn MAHAJANI SIGN NUKTA
 11180..11181 ; Extend # Mn [2] SHARADA SIGN CANDRABINDU..SHARADA SIGN ANUSVARA
 111B6..111BE ; Extend # Mn [9] SHARADA VOWEL SIGN U..SHARADA VOWEL SIGN O
-111CA..111CC ; Extend # Mn [3] SHARADA SIGN NUKTA..SHARADA EXTRA SHORT VOWEL MARK
+111C9..111CC ; Extend # Mn [4] SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOWEL MARK
 1122F..11231 ; Extend # Mn [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
 11234 ; Extend # Mn KHOJKI SIGN ANUSVARA
 11236..11237 ; Extend # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
@@ -328,7 +335,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 112DF ; Extend # Mn KHUDAWADI SIGN ANUSVARA
 112E3..112EA ; Extend # Mn [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
 11300..11301 ; Extend # Mn [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
-1133C ; Extend # Mn GRANTHA SIGN NUKTA
+1133B..1133C ; Extend # Mn [2] COMBINING BINDU BELOW..GRANTHA SIGN NUKTA
 1133E ; Extend # Mc GRANTHA VOWEL SIGN AA
 11340 ; Extend # Mn GRANTHA VOWEL SIGN II
 11357 ; Extend # Mc GRANTHA AU LENGTH MARK
@@ -337,6 +344,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 11438..1143F ; Extend # Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
 11442..11444 ; Extend # Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
 11446 ; Extend # Mn NEWA SIGN NUKTA
+1145E ; Extend # Mn NEWA SANDHI MARK
 114B0 ; Extend # Mc TIRHUTA VOWEL SIGN AA
 114B3..114B8 ; Extend # Mn [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
 114BA ; Extend # Mn TIRHUTA VOWEL SIGN SHORT E
@@ -358,8 +366,9 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 1171D..1171F ; Extend # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
 11722..11725 ; Extend # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
 11727..1172B ; Extend # Mn [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
-11A01..11A06 ; Extend # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
-11A09..11A0A ; Extend # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
+1182F..11837 ; Extend # Mn [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA
+11839..1183A ; Extend # Mn [2] DOGRA SIGN VIRAMA..DOGRA SIGN NUKTA
+11A01..11A0A ; Extend # Mn [10] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL LENGTH MARK
 11A33..11A38 ; Extend # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
 11A3B..11A3E ; Extend # Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
 11A47 ; Extend # Mn ZANABAZAR SQUARE SUBJOINER
@@ -379,6 +388,10 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 11D3C..11D3D ; Extend # Mn [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
 11D3F..11D45 ; Extend # Mn [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
 11D47 ; Extend # Mn MASARAM GONDI RA-KARA
+11D90..11D91 ; Extend # Mn [2] GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI VOWEL SIGN AI
+11D95 ; Extend # Mn GUNJALA GONDI SIGN ANUSVARA
+11D97 ; Extend # Mn GUNJALA GONDI VIRAMA
+11EF3..11EF4 ; Extend # Mn [2] MAKASAR VOWEL SIGN I..MAKASAR VOWEL SIGN U
 16AF0..16AF4 ; Extend # Mn [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
 16B30..16B36 ; Extend # Mn [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
 16F8F..16F92 ; Extend # Mn [4] MIAO TONE RIGHT..MIAO TONE BELOW
@@ -403,10 +416,11 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 1E026..1E02A ; Extend # Mn [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
 1E8D0..1E8D6 ; Extend # Mn [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
 1E944..1E94A ; Extend # Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
+1F3FB..1F3FF ; Extend # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
 E0020..E007F ; Extend # Cf [96] TAG SPACE..CANCEL TAG
 E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

-# Total code points: 1901
+# Total code points: 1948

 # ================================================

@@ -517,6 +531,7 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
 110B0..110B2 ; SpacingMark # Mc [3] KAITHI VOWEL SIGN AA..KAITHI VOWEL SIGN II
 110B7..110B8 ; SpacingMark # Mc [2] KAITHI VOWEL SIGN O..KAITHI VOWEL SIGN AU
 1112C ; SpacingMark # Mc CHAKMA VOWEL SIGN E
+11145..11146 ; SpacingMark # Mc [2] CHAKMA VOWEL SIGN AA..CHAKMA VOWEL SIGN EI
 11182 ; SpacingMark # Mc SHARADA SIGN VISARGA
 111B3..111B5 ; SpacingMark # Mc [3] SHARADA VOWEL SIGN AA..SHARADA VOWEL SIGN II
 111BF..111C0 ; SpacingMark # Mc [2] SHARADA VOWEL SIGN AU..SHARADA SIGN VIRAMA
@@ -549,7 +564,8 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
 116B6 ; SpacingMark # Mc TAKRI SIGN VIRAMA
 11720..11721 ; SpacingMark # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
 11726 ; SpacingMark # Mc AHOM VOWEL SIGN E
-11A07..11A08 ; SpacingMark # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
+1182C..1182E ; SpacingMark # Mc [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II
+11838 ; SpacingMark # Mc DOGRA SIGN VISARGA
 11A39 ; SpacingMark # Mc ZANABAZAR SQUARE SIGN VISARGA
 11A57..11A58 ; SpacingMark # Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
 11A97 ; SpacingMark # Mc SOYOMBO SIGN VISARGA
@@ -558,11 +574,15 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
 11CA9 ; SpacingMark # Mc MARCHEN SUBJOINED LETTER YA
 11CB1 ; SpacingMark # Mc MARCHEN VOWEL SIGN I
 11CB4 ; SpacingMark # Mc MARCHEN VOWEL SIGN O
+11D8A..11D8E ; SpacingMark # Mc [5] GUNJALA GONDI VOWEL SIGN AA..GUNJALA GONDI VOWEL SIGN UU
+11D93..11D94 ; SpacingMark # Mc [2] GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI VOWEL SIGN AU
+11D96 ; SpacingMark # Mc GUNJALA GONDI SIGN VISARGA
+11EF5..11EF6 ; SpacingMark # Mc [2] MAKASAR VOWEL SIGN E..MAKASAR VOWEL SIGN O
 16F51..16F7E ; SpacingMark # Mc [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
 1D166 ; SpacingMark # Mc MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
 1D16D ; SpacingMark # Mc MUSICAL SYMBOL COMBINING AUGMENTATION DOT

-# Total code points: 348
+# Total code points: 362

 # ================================================

@@ -1395,81 +1415,8 @@ D789..D7A3 ; LVT # Lo [27] HANGUL SYLLABLE HIG..HANGUL SYLLABLE HIH

 # ================================================

-261D ; E_Base # So WHITE UP POINTING INDEX
-26F9 ; E_Base # So PERSON WITH BALL
-270A..270D ; E_Base # So [4] RAISED FIST..WRITING HAND
-1F385 ; E_Base # So FATHER CHRISTMAS
-1F3C2..1F3C4 ; E_Base # So [3] SNOWBOARDER..SURFER
-1F3C7 ; E_Base # So HORSE RACING
-1F3CA..1F3CC ; E_Base # So [3] SWIMMER..GOLFER
-1F442..1F443 ; E_Base # So [2] EAR..NOSE
-1F446..1F450 ; E_Base # So [11] WHITE UP POINTING BACKHAND INDEX..OPEN HANDS SIGN
-1F46E ; E_Base # So POLICE OFFICER
-1F470..1F478 ; E_Base # So [9] BRIDE WITH VEIL..PRINCESS
-1F47C ; E_Base # So BABY ANGEL
-1F481..1F483 ; E_Base # So [3] INFORMATION DESK PERSON..DANCER
-1F485..1F487 ; E_Base # So [3] NAIL POLISH..HAIRCUT
-1F4AA ; E_Base # So FLEXED BICEPS
-1F574..1F575 ; E_Base # So [2] MAN IN BUSINESS SUIT LEVITATING..SLEUTH OR SPY
-1F57A ; E_Base # So MAN DANCING
-1F590 ; E_Base # So RAISED HAND WITH FINGERS SPLAYED
-1F595..1F596 ; E_Base # So [2] REVERSED HAND WITH MIDDLE FINGER EXTENDED..RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS
-1F645..1F647 ; E_Base # So [3] FACE WITH NO GOOD GESTURE..PERSON BOWING DEEPLY
-1F64B..1F64F ; E_Base # So [5] HAPPY PERSON RAISING ONE HAND..PERSON WITH FOLDED HANDS
-1F6A3 ; E_Base # So ROWBOAT
-1F6B4..1F6B6 ; E_Base # So [3] BICYCLIST..PEDESTRIAN
-1F6C0 ; E_Base # So BATH
-1F6CC ; E_Base # So SLEEPING ACCOMMODATION
-1F918..1F91C ; E_Base # So [5] SIGN OF THE HORNS..RIGHT-FACING FIST
-1F91E..1F91F ; E_Base # So [2] HAND WITH INDEX AND MIDDLE FINGERS CROSSED..I LOVE YOU HAND SIGN
-1F926 ; E_Base # So FACE PALM
-1F930..1F939 ; E_Base # So [10] PREGNANT WOMAN..JUGGLING
-1F93D..1F93E ; E_Base # So [2] WATER POLO..HANDBALL
-1F9D1..1F9DD ; E_Base # So [13] ADULT..ELF
-
-# Total code points: 98
-
-# ================================================
-
-1F3FB..1F3FF ; E_Modifier # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
-
-# Total code points: 5
-
-# ================================================
-
 200D ; ZWJ # Cf ZERO WIDTH JOINER

 # Total code points: 1

-# ================================================
-
-2640 ; Glue_After_Zwj # So FEMALE SIGN
-2642 ; Glue_After_Zwj # So MALE SIGN
-2695..2696 ; Glue_After_Zwj # So [2] STAFF OF AESCULAPIUS..SCALES
-2708 ; Glue_After_Zwj # So AIRPLANE
-2764 ; Glue_After_Zwj # So HEAVY BLACK HEART
-1F308 ; Glue_After_Zwj # So RAINBOW
-1F33E ; Glue_After_Zwj # So EAR OF RICE
-1F373 ; Glue_After_Zwj # So COOKING
-1F393 ; Glue_After_Zwj # So GRADUATION CAP
-1F3A4 ; Glue_After_Zwj # So MICROPHONE
-1F3A8 ; Glue_After_Zwj # So ARTIST PALETTE
-1F3EB ; Glue_After_Zwj # So SCHOOL
-1F3ED ; Glue_After_Zwj # So FACTORY
-1F48B ; Glue_After_Zwj # So KISS MARK
-1F4BB..1F4BC ; Glue_After_Zwj # So [2] PERSONAL COMPUTER..BRIEFCASE
-1F527 ; Glue_After_Zwj # So WRENCH
-1F52C ; Glue_After_Zwj # So MICROSCOPE
-1F5E8 ; Glue_After_Zwj # So LEFT SPEECH BUBBLE
-1F680 ; Glue_After_Zwj # So ROCKET
-1F692 ; Glue_After_Zwj # So FIRE ENGINE
-
-# Total code points: 22
-
-# ================================================
-
-1F466..1F469 ; E_Base_GAZ # So [4] BOY..WOMAN
-
-# Total code points: 4
-
 # EOF
diff --git a/maint/Unicode.tables/Scripts.txt b/maint/Unicode.tables/Scripts.txt
index 7231944..def6310 100644
--- a/maint/Unicode.tables/Scripts.txt
+++ b/maint/Unicode.tables/Scripts.txt
@@ -1,6 +1,6 @@
-# Scripts-10.0.0.txt
-# Date: 2017-03-11, 06:40:37 GMT
-# © 2017 Unicode®, Inc.
+# Scripts-11.0.0.txt
+# Date: 2018-02-21, 05:34:31 GMT
+# © 2018 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see http://www.unicode.org/terms_of_use.html
 #
@@ -308,10 +308,8 @@
 2B47..2B4C ; Common # Sm [6] REVERSE TILDE OPERATOR ABOVE RIGHTWARDS ARROW..RIGHTWARDS ARROW ABOVE REVERSE TILDE OPERATOR
 2B4D..2B73 ; Common # So [39] DOWNWARDS TRIANGLE-HEADED ZIGZAG ARROW..DOWNWARDS TRIANGLE-HEADED ARROW TO BAR
 2B76..2B95 ; Common # So [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
-2B98..2BB9 ; Common # So [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
-2BBD..2BC8 ; Common # So [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
-2BCA..2BD2 ; Common # So [9] TOP HALF BLACK CIRCLE..GROUP MARK
-2BEC..2BEF ; Common # So [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
+2B98..2BC8 ; Common # So [49] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
+2BCA..2BFE ; Common # So [53] TOP HALF BLACK CIRCLE..REVERSED RIGHT ANGLE
 2E00..2E01 ; Common # Po [2] RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER
 2E02 ; Common # Pi LEFT SUBSTITUTION BRACKET
 2E03 ; Common # Pf RIGHT SUBSTITUTION BRACKET
@@ -349,7 +347,7 @@
 2E40 ; Common # Pd DOUBLE HYPHEN
 2E41 ; Common # Po REVERSED COMMA
 2E42 ; Common # Ps DOUBLE LOW-REVERSED-9 QUOTATION MARK
-2E43..2E49 ; Common # Po [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
+2E43..2E4E ; Common # Po [12] DASH WITH LEFT UPTURN..PUNCTUS ELEVATUS MARK
 2FF0..2FFB ; Common # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
 3000 ; Common # Zs IDEOGRAPHIC SPACE
 3001..3003 ; Common # Po [3] IDEOGRAPHIC COMMA..DITTO MARK
@@ -522,8 +520,9 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
 1D183..1D184 ; Common # So [2] MUSICAL SYMBOL ARPEGGIATO UP..MUSICAL SYMBOL ARPEGGIATO DOWN
 1D18C..1D1A9 ; Common # So [30] MUSICAL SYMBOL RINFORZANDO..MUSICAL SYMBOL DEGREE SLASH
 1D1AE..1D1E8 ; Common # So [59] MUSICAL SYMBOL PEDAL MARK..MUSICAL SYMBOL KIEVAN FLAT SIGN
+1D2E0..1D2F3 ; Common # No [20] MAYAN NUMERAL ZERO..MAYAN NUMERAL NINETEEN
 1D300..1D356 ; Common # So [87] MONOGRAM FOR EARTH..TETRAGRAM FOR FOSTERING
-1D360..1D371 ; Common # No [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE
+1D360..1D378 ; Common # No [25] COUNTING ROD UNIT DIGIT ONE..TALLY MARK FIVE
 1D400..1D454 ; Common # L& [85] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL ITALIC SMALL G
 1D456..1D49C ; Common # L& [71] MATHEMATICAL ITALIC SMALL I..MATHEMATICAL SCRIPT CAPITAL A
 1D49E..1D49F ; Common # L& [2] MATHEMATICAL SCRIPT CAPITAL C..MATHEMATICAL SCRIPT CAPITAL D
@@ -565,6 +564,11 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
 1D7C3 ; Common # Sm MATHEMATICAL SANS-SERIF BOLD ITALIC PARTIAL DIFFERENTIAL
 1D7C4..1D7CB ; Common # L& [8] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL BOLD SMALL DIGAMMA
 1D7CE..1D7FF ; Common # Nd [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
+1EC71..1ECAB ; Common # No [59] INDIC SIYAQ NUMBER ONE..INDIC SIYAQ NUMBER PREFIXED NINE
+1ECAC ; Common # So INDIC SIYAQ PLACEHOLDER
+1ECAD..1ECAF ; Common # No [3] INDIC SIYAQ FRACTION ONE QUARTER..INDIC SIYAQ FRACTION THREE QUARTERS
+1ECB0 ; Common # Sc INDIC SIYAQ RUPEE MARK
+1ECB1..1ECB4 ; Common # No [4] INDIC SIYAQ NUMBER ALTERNATE ONE..INDIC SIYAQ ALTERNATE LAKH MARK
 1F000..1F02B ; Common # So [44] MAHJONG TILE EAST WIND..MAHJONG TILE BACK
 1F030..1F093 ; Common # So [100] DOMINO TILE HORIZONTAL BACK..DOMINO TILE VERTICAL-06-06
 1F0A0..1F0AE ; Common # So [15] PLAYING CARD BACK..PLAYING CARD KING OF SPADES
@@ -572,8 +576,7 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
 1F0C1..1F0CF ; Common # So [15] PLAYING CARD ACE OF DIAMONDS..PLAYING CARD BLACK JOKER
 1F0D1..1F0F5 ; Common # So [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21
 1F100..1F10C ; Common # No [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
-1F110..1F12E ; Common # So [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
-1F130..1F16B ; Common # So [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
+1F110..1F16B ; Common # So [92] PARENTHESIZED LATIN CAPITAL LETTER A..RAISED MD SIGN
 1F170..1F1AC ; Common # So [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
 1F1E6..1F1FF ; Common # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
 1F201..1F202 ; Common # So [2] SQUARED KATAKANA KOKO..SQUARED KATAKANA SA
@@ -585,9 +588,9 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
 1F3FB..1F3FF ; Common # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
 1F400..1F6D4 ; Common # So [725] RAT..PAGODA
 1F6E0..1F6EC ; Common # So [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
-1F6F0..1F6F8 ; Common # So [9] SATELLITE..FLYING SAUCER
+1F6F0..1F6F9 ; Common # So [10] SATELLITE..SKATEBOARD
 1F700..1F773 ; Common # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
-1F780..1F7D4 ; Common # So [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
+1F780..1F7D8 ; Common # So [89] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..NEGATIVE CIRCLED SQUARE
 1F800..1F80B ; Common # So [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
 1F810..1F847 ; Common # So [56] LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD..DOWNWARDS HEAVY ARROW
 1F850..1F859 ; Common # So [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
@@ -595,15 +598,18 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
 1F890..1F8AD ; Common # So [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
 1F900..1F90B ; Common # So [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
 1F910..1F93E ; Common # So [47] ZIPPER-MOUTH FACE..HANDBALL
-1F940..1F94C ; Common # So [13] WILTED FLOWER..CURLING STONE
-1F950..1F96B ; Common # So [28] CROISSANT..CANNED FOOD
-1F980..1F997 ; Common # So [24] CRAB..CRICKET
-1F9C0 ; Common # So CHEESE WEDGE
-1F9D0..1F9E6 ; Common # So [23] FACE WITH MONOCLE..SOCKS
+1F940..1F970 ; Common # So [49] WILTED FLOWER..SMILING FACE WITH SMILING EYES AND THREE HEARTS
+1F973..1F976 ; Common # So [4] FACE WITH PARTY HORN AND PARTY HAT..FREEZING FACE
+1F97A ; Common # So FACE WITH PLEADING EYES
+1F97C..1F9A2 ; Common # So [39] LAB COAT..SWAN
+1F9B0..1F9B9 ; Common # So [10] EMOJI COMPONENT RED HAIR..SUPERVILLAIN
+1F9C0..1F9C2 ; Common # So [3] CHEESE WEDGE..SALT SHAKER
+1F9D0..1F9FF ; Common # So [48] FACE WITH MONOCLE..NAZAR AMULET
+1FA60..1FA6D ; Common # So [14] XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER
 E0001 ; Common # Cf LANGUAGE TAG
 E0020..E007F ; Common # Cf [96] TAG SPACE..CANCEL TAG

-# Total code points: 7363
+# Total code points: 7591

 # ================================================

@@ -646,8 +652,7 @@ A770 ; Latin # Lm MODIFIER LETTER US
 A771..A787 ; Latin # L& [23] LATIN SMALL LETTER DUM..LATIN SMALL LETTER INSULAR T
 A78B..A78E ; Latin # L& [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT
 A78F ; Latin # Lo LATIN LETTER SINOLOGICAL DOT
-A790..A7AE ; Latin # L& [31] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER SMALL CAPITAL I
-A7B0..A7B7 ; Latin # L& [8] LATIN CAPITAL LETTER TURNED K..LATIN SMALL LETTER OMEGA
+A790..A7B9 ; Latin # L& [42] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN SMALL LETTER U WITH STROKE
 A7F7 ; Latin # Lo LATIN EPIGRAPHIC LETTER SIDEWAYS I
 A7F8..A7F9 ; Latin # Lm [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE
 A7FA ; Latin # L& LATIN LETTER SMALL CAPITAL TURNED M
@@ -659,7 +664,7 @@ FB00..FB06 ; Latin # L& [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE S
 FF21..FF3A ; Latin # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
 FF41..FF5A ; Latin # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z

-# Total code points: 1350
+# Total code points: 1353

 # ================================================

@@ -753,13 +758,13 @@ FE2E..FE2F ; Cyrillic # Mn [2] COMBINING CYRILLIC TITLO LEFT HALF..COMBININ
 0531..0556 ; Armenian # L& [38] ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITAL LETTER FEH
 0559 ; Armenian # Lm ARMENIAN MODIFIER LETTER LEFT HALF RING
 055A..055F ; Armenian # Po [6] ARMENIAN APOSTROPHE..ARMENIAN ABBREVIATION MARK
-0561..0587 ; Armenian # L& [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
+0560..0588 ; Armenian # L& [41] ARMENIAN SMALL LETTER TURNED AYB..ARMENIAN SMALL LETTER YI WITH STROKE
 058A ; Armenian # Pd ARMENIAN HYPHEN
 058D..058E ; Armenian # So [2] RIGHT-FACING ARMENIAN ETERNITY SIGN..LEFT-FACING ARMENIAN ETERNITY SIGN
 058F ; Armenian # Sc ARMENIAN DRAM SIGN
 FB13..FB17 ; Armenian # L& [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH

-# Total code points: 93
+# Total code points: 95

 # ================================================

@@ -773,7 +778,7 @@ FB13..FB17 ; Armenian # L& [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SM
 05C6 ; Hebrew # Po HEBREW PUNCTUATION NUN HAFUKHA
 05C7 ; Hebrew # Mn HEBREW POINT QAMATS QATAN
 05D0..05EA ; Hebrew # Lo [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
-05F0..05F2 ; Hebrew # Lo [3] HEBREW LIGATURE YIDDISH DOUBLE VAV..HEBREW LIGATURE YIDDISH DOUBLE YOD
+05EF..05F2 ; Hebrew # Lo [4] HEBREW YOD TRIANGLE..HEBREW LIGATURE YIDDISH DOUBLE YOD
 05F3..05F4 ; Hebrew # Po [2] HEBREW PUNCTUATION GERESH..HEBREW PUNCTUATION GERSHAYIM
 FB1D ; Hebrew # Lo HEBREW LETTER YOD WITH HIRIQ
 FB1E ; Hebrew # Mn HEBREW POINT JUDEO-SPANISH VARIKA
@@ -786,7 +791,7 @@ FB40..FB41 ; Hebrew # Lo [2] HEBREW LETTER NUN WITH DAGESH..HEBREW LETTER S
 FB43..FB44 ; Hebrew # Lo [2] HEBREW LETTER FINAL PE WITH DAGESH..HEBREW LETTER PE WITH DAGESH
 FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATURE ALEF LAMED

-# Total code points: 133
+# Total code points: 134

 # ================================================

@@ -823,7 +828,7 @@ FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
 0750..077F ; Arabic # Lo [48] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS ABOVE
 08A0..08B4 ; Arabic # Lo [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
 08B6..08BD ; Arabic # Lo [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
-08D4..08E1 ; Arabic # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
+08D3..08E1 ; Arabic # Mn [15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA
 08E3..08FF ; Arabic # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
 FB50..FBB1 ; Arabic # Lo [98] ARABIC LETTER ALEF WASLA ISOLATED FORM..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
 FBB2..FBC1 ; Arabic # Sk [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW
@@ -871,7 +876,7 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
 1EEAB..1EEBB ; Arabic # Lo [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN
 1EEF0..1EEF1 ; Arabic # Sm [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL

-# Total code points: 1280
+# Total code points: 1281

 # ================================================

@@ -921,9 +926,10 @@ A8F2..A8F7 ; Devanagari # Lo [6] DEVANAGARI SIGN SPACING CANDRABINDU..DEVAN
 A8F8..A8FA ; Devanagari # Po [3] DEVANAGARI SIGN PUSHPIKA..DEVANAGARI CARET
 A8FB ; Devanagari # Lo DEVANAGARI HEADSTROKE
 A8FC ; Devanagari # Po DEVANAGARI SIGN SIDDHAM
-A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
+A8FD..A8FE ; Devanagari # Lo [2] DEVANAGARI JAIN OM..DEVANAGARI LETTER AY
+A8FF ; Devanagari # Mn DEVANAGARI VOWEL SIGN AY

-# Total code points: 154
+# Total code points: 156

 # ================================================

@@ -956,8 +962,9 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
 09FB ; Bengali # Sc BENGALI GANDA MARK
 09FC ; Bengali # Lo BENGALI LETTER VEDIC ANUSVARA
 09FD ; Bengali # Po BENGALI ABBREVIATION SIGN
+09FE ; Bengali # Mn BENGALI SANDHI MARK

-# Total code points: 95
+# Total code points: 96

 # ================================================

@@ -982,8 +989,9 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
 0A70..0A71 ; Gurmukhi # Mn [2] GURMUKHI TIPPI..GURMUKHI ADDAK
 0A72..0A74 ; Gurmukhi # Lo [3] GURMUKHI IRI..GURMUKHI EK ONKAR
 0A75 ; Gurmukhi # Mn GURMUKHI SIGN YAKASH
+0A76 ; Gurmukhi # Po GURMUKHI ABBREVIATION SIGN

-# Total code points: 79
+# Total code points: 80

 # ================================================

@@ -1078,6 +1086,7 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM

 0C00 ; Telugu # Mn TELUGU SIGN COMBINING CANDRABINDU ABOVE
 0C01..0C03 ; Telugu # Mc [3] TELUGU SIGN CANDRABINDU..TELUGU SIGN VISARGA
+0C04 ; Telugu # Mn TELUGU SIGN COMBINING ANUSVARA ABOVE
 0C05..0C0C ; Telugu # Lo [8] TELUGU LETTER A..TELUGU LETTER VOCALIC L
 0C0E..0C10 ; Telugu # Lo [3] TELUGU LETTER E..TELUGU LETTER AI
 0C12..0C28 ; Telugu # Lo [23] TELUGU LETTER O..TELUGU LETTER NA
@@ -1095,13 +1104,14 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
 0C78..0C7E ; Telugu # No [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
 0C7F ; Telugu # So TELUGU SIGN TUUMU

-# Total code points: 96
+# Total code points: 97

 # ================================================

 0C80 ; Kannada # Lo KANNADA SIGN SPACING CANDRABINDU
 0C81 ; Kannada # Mn KANNADA SIGN CANDRABINDU
 0C82..0C83 ; Kannada # Mc [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
+0C84 ; Kannada # Po KANNADA SIGN SIDDHAM
 0C85..0C8C ; Kannada # Lo [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
 0C8E..0C90 ; Kannada # Lo [3] KANNADA LETTER E..KANNADA LETTER AI
 0C92..0CA8 ; Kannada # Lo [23] KANNADA LETTER O..KANNADA LETTER NA
@@ -1123,7 +1133,7 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
 0CE6..0CEF ; Kannada # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
 0CF1..0CF2 ; Kannada # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA

-# Total code points: 88
+# Total code points: 89

 # ================================================

@@ -1317,14 +1327,16 @@ AA7E..AA7F ; Myanmar # Lo [2] MYANMAR LETTER SHWE PALAUNG CHA..MYANMAR LETT
 10A0..10C5 ; Georgian # L& [38] GEORGIAN CAPITAL LETTER AN..GEORGIAN CAPITAL LETTER HOE
 10C7 ; Georgian # L& GEORGIAN CAPITAL LETTER YN
 10CD ; Georgian # L& GEORGIAN CAPITAL LETTER AEN
-10D0..10FA ; Georgian # Lo [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN
+10D0..10FA ; Georgian # L& [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN
 10FC ; Georgian # Lm MODIFIER LETTER GEORGIAN NAR
-10FD..10FF ; Georgian # Lo [3] GEORGIAN LETTER AEN..GEORGIAN LETTER LABIAL SIGN
+10FD..10FF ; Georgian # L& [3] GEORGIAN LETTER AEN..GEORGIAN LETTER LABIAL SIGN
+1C90..1CBA ; Georgian # L& [43] GEORGIAN MTAVRULI CAPITAL LETTER AN..GEORGIAN MTAVRULI CAPITAL LETTER AIN
+1CBD..1CBF ; Georgian # L& [3] GEORGIAN MTAVRULI CAPITAL LETTER AEN..GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN
 2D00..2D25 ; Georgian # L& [38] GEORGIAN SMALL LETTER AN..GEORGIAN SMALL LETTER HOE
 2D27 ; Georgian # L& GEORGIAN SMALL LETTER YN
 2D2D ; Georgian # L& GEORGIAN SMALL LETTER AEN

-# Total code points: 127
+# Total code points: 173

 # ================================================

@@ -1453,7 +1465,7 @@ AB70..ABBF ; Cherokee # L& [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
 1810..1819 ; Mongolian # Nd [10] MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE
 1820..1842 ; Mongolian # Lo [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
 1843 ; Mongolian # Lm MONGOLIAN LETTER TODO LONG VOWEL SIGN
-1844..1877 ; Mongolian # Lo [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
+1844..1878 ; Mongolian # Lo [53] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER CHA WITH TWO DOTS
 1880..1884 ; Mongolian # Lo [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
 1885..1886 ; Mongolian # Mn [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
 1887..18A8 ; Mongolian # Lo [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
@@ -1461,7 +1473,7 @@ AB70..ABBF ; Cherokee # L& [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
 18AA ; Mongolian # Lo MONGOLIAN LETTER MANCHU ALI GALI LHA
 11660..1166C ; Mongolian # Po [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT

-# Total code points: 166
+# Total code points: 167

 # ================================================

@@ -1490,10 +1502,10 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
 # ================================================

 02EA..02EB ; Bopomofo # Sk [2] MODIFIER LETTER YIN DEPARTING TONE MARK..MODIFIER LETTER YANG DEPARTING TONE MARK
-3105..312E ; Bopomofo # Lo [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
+3105..312F ; Bopomofo # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
 31A0..31BA ; Bopomofo # Lo [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY

-# Total code points: 71
+# Total code points: 72

 # ================================================

@@ -1506,7 +1518,7 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
 3038..303A ; Han # Nl [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY
 303B ; Han # Lm VERTICAL IDEOGRAPHIC ITERATION MARK
 3400..4DB5 ; Han # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
-4E00..9FEA ; Han # Lo [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
+4E00..9FEF ; Han # Lo [20976] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEF
 F900..FA6D ; Han # Lo [366] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D
 FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
 20000..2A6D6 ; Han # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
@@ -1516,7 +1528,7 @@ FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILI
 2CEB0..2EBE0 ; Han # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
 2F800..2FA1D ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D

-# Total code points: 89228
+# Total code points: 89233

 # ================================================

@@ -1579,13 +1591,14 @@ FE00..FE0F ; Inherited # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
 FE20..FE2D ; Inherited # Mn [14] COMBINING LIGATURE LEFT HALF..COMBINING CONJOINING MACRON BELOW
 101FD ; Inherited # Mn PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE
 102E0 ; Inherited # Mn COPTIC EPACT THOUSANDS MARK
+1133B ; Inherited # Mn COMBINING BINDU BELOW
 1D167..1D169 ; Inherited # Mn [3] MUSICAL SYMBOL COMBINING TREMOLO-1..MUSICAL SYMBOL COMBINING TREMOLO-3
 1D17B..1D182 ; Inherited # Mn [8] MUSICAL SYMBOL COMBINING ACCENT..MUSICAL SYMBOL COMBINING LOURE
 1D185..1D18B ; Inherited # Mn [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE
 1D1AA..1D1AD ; Inherited # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
 E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

-# Total code points: 568
+# Total code points: 569

 # ================================================

@@ -1778,13 +1791,13 @@ A828..A82B ; Syloti_Nagri # So [4] SYLOTI NAGRI POETRY MARK-1..SYLOTI NAGRI
 10A0C..10A0F ; Kharoshthi # Mn [4] KHAROSHTHI VOWEL LENGTH MARK..KHAROSHTHI SIGN VISARGA
 10A10..10A13 ; Kharoshthi # Lo [4] KHAROSHTHI LETTER KA..KHAROSHTHI LETTER GHA
 10A15..10A17 ; Kharoshthi # Lo [3] KHAROSHTHI LETTER CA..KHAROSHTHI LETTER JA
-10A19..10A33 ; Kharoshthi # Lo [27] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER TTTHA
+10A19..10A35 ; Kharoshthi # Lo [29] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER VHA
 10A38..10A3A ; Kharoshthi # Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
 10A3F ; Kharoshthi # Mn KHAROSHTHI VIRAMA
-10A40..10A47 ; Kharoshthi # No [8] KHAROSHTHI DIGIT ONE..KHAROSHTHI NUMBER ONE THOUSAND
+10A40..10A48 ; Kharoshthi # No [9] KHAROSHTHI DIGIT ONE..KHAROSHTHI FRACTION ONE HALF
 10A50..10A58 ; Kharoshthi # Po [9] KHAROSHTHI PUNCTUATION DOT..KHAROSHTHI PUNCTUATION LINES

-# Total code points: 65
+# Total code points: 68

 # ================================================

@@ -1841,8 +1854,10 @@ A874..A877 ; Phags_Pa # Po [4] PHAGS-PA SINGLE HEAD MARK..PHAGS-PA MARK DOU
 07F6 ; Nko # So NKO SYMBOL OO DENNEN
 07F7..07F9 ; Nko # Po [3] NKO SYMBOL GBAKURUNEN..NKO EXCLAMATION MARK
 07FA ; Nko # Lm NKO LAJANYALAN
+07FD ; Nko # Mn NKO DANTAYALAN
+07FE..07FF ; Nko # Sc [2] NKO DOROME SIGN..NKO TAMAN SIGN

-# Total code points: 59
+# Total code points: 62

 # ================================================

@@ -2137,8 +2152,9 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
 110BB..110BC ; Kaithi # Po [2] KAITHI ABBREVIATION SIGN..KAITHI ENUMERATION SIGN
 110BD ; Kaithi # Cf KAITHI NUMBER SIGN
 110BE..110C1 ; Kaithi # Po [4] KAITHI SECTION MARK..KAITHI DOUBLE DANDA
+110CD ; Kaithi # Cf KAITHI NUMBER SIGN ABOVE

-# Total code points: 66
+# Total code points: 67

 # ================================================

@@ -2186,8 +2202,10 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
 1112D..11134 ; Chakma # Mn [8] CHAKMA VOWEL SIGN AI..CHAKMA MAAYYAA
 11136..1113F ; Chakma # Nd [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE
 11140..11143 ; Chakma # Po [4] CHAKMA SECTION MARK..CHAKMA QUESTION MARK
+11144 ; Chakma # Lo CHAKMA LETTER LHAA
+11145..11146 ; Chakma # Mc [2] CHAKMA VOWEL SIGN AA..CHAKMA VOWEL SIGN EI

-# Total code points: 67
+# Total code points: 70

 # ================================================

@@ -2224,8 +2242,8 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
 111B6..111BE ; Sharada # Mn [9] SHARADA VOWEL SIGN U..SHARADA VOWEL SIGN O
 111BF..111C0 ; Sharada # Mc [2] SHARADA VOWEL SIGN AU..SHARADA SIGN VIRAMA
 111C1..111C4 ; Sharada # Lo [4] SHARADA SIGN AVAGRAHA..SHARADA OM
-111C5..111C9 ; Sharada # Po [5] SHARADA DANDA..SHARADA SANDHI MARK
-111CA..111CC ; Sharada # Mn [3] SHARADA SIGN NUKTA..SHARADA EXTRA SHORT VOWEL MARK
+111C5..111C8 ; Sharada # Po [4] SHARADA DANDA..SHARADA SEPARATOR
+111C9..111CC ; Sharada # Mn [4] SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOWEL MARK
 111CD ; Sharada # Po SHARADA SUTRA MARK
 111D0..111D9 ; Sharada # Nd [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE
 111DA ; Sharada # Lo SHARADA EKAM
@@ -2502,7 +2520,7 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI

 # ================================================

-11700..11719 ; Ahom # Lo [26] AHOM LETTER KA..AHOM LETTER JHA
+11700..1171A ; Ahom # Lo [27] AHOM LETTER KA..AHOM LETTER ALTERNATE BA
 1171D..1171F ; Ahom # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
 11720..11721 ; Ahom # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
 11722..11725 ; Ahom # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
@@ -2513,7 +2531,7 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
 1173C..1173E ; Ahom # Po [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
 1173F ; Ahom # So AHOM SYMBOL VI

-# Total code points: 57
+# Total code points: 58

 # ================================================

@@ -2618,8 +2636,9 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
 11450..11459 ; Newa # Nd [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
 1145B ; Newa # Po NEWA PLACEHOLDER MARK
 1145D ; Newa # Po NEWA INSERTION SIGN
+1145E ; Newa # Mn NEWA SANDHI MARK

-# Total code points: 92
+# Total code points: 93

 # ================================================

@@ -2631,10 +2650,10 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
 # ================================================

 16FE0 ; Tangut # Lm TANGUT ITERATION MARK
-17000..187EC ; Tangut # Lo [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
+17000..187F1 ; Tangut # Lo [6130] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187F1
 18800..18AF2 ; Tangut # Lo [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755

-# Total code points: 6881
+# Total code points: 6886

 # ================================================

@@ -2670,16 +2689,15 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
 11A97 ; Soyombo # Mc SOYOMBO SIGN VISARGA
 11A98..11A99 ; Soyombo # Mn [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
 11A9A..11A9C ; Soyombo # Po [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
+11A9D ; Soyombo # Lo SOYOMBO MARK PLUTA
 11A9E..11AA2 ; Soyombo # Po [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2

-# Total code points: 80
+# Total code points: 81

 # ================================================

 11A00 ; Zanabazar_Square # Lo ZANABAZAR SQUARE LETTER A
-11A01..11A06 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
-11A07..11A08 ; Zanabazar_Square # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
-11A09..11A0A ; Zanabazar_Square # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
+11A01..11A0A ; Zanabazar_Square # Mn [10] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL LENGTH MARK
 11A0B..11A32 ; Zanabazar_Square # Lo [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
 11A33..11A38 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
 11A39 ; Zanabazar_Square # Mc ZANABAZAR SQUARE SIGN VISARGA
@@ -2690,4 +2708,73 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI

 # Total code points: 72

+# ================================================
+
+11800..1182B ; Dogra # Lo [44] DOGRA LETTER A..DOGRA LETTER RRA
+1182C..1182E ; Dogra # Mc [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II
+1182F..11837 ; Dogra # Mn [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA
+11838 ; Dogra # Mc DOGRA SIGN VISARGA
+11839..1183A ; Dogra # Mn [2] DOGRA SIGN VIRAMA..DOGRA SIGN NUKTA
+1183B ; Dogra # Po DOGRA ABBREVIATION SIGN
+
+# Total code points: 60
+
+# ================================================
+
+11D60..11D65 ; Gunjala_Gondi # Lo [6] GUNJALA GONDI LETTER A..GUNJALA GONDI LETTER UU
+11D67..11D68 ; Gunjala_Gondi # Lo [2] GUNJALA GONDI LETTER EE..GUNJALA GONDI LETTER AI
+11D6A..11D89 ; Gunjala_Gondi # Lo [32] GUNJALA GONDI LETTER OO..GUNJALA GONDI LETTER SA
+11D8A..11D8E ; Gunjala_Gondi # Mc [5] GUNJALA GONDI VOWEL SIGN AA..GUNJALA GONDI VOWEL SIGN UU
+11D90..11D91 ; Gunjala_Gondi # Mn [2] GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI VOWEL SIGN AI
+11D93..11D94 ; Gunjala_Gondi # Mc [2] GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI VOWEL SIGN AU
+11D95 ; Gunjala_Gondi # Mn GUNJALA GONDI SIGN ANUSVARA
+11D96 ; Gunjala_Gondi # Mc GUNJALA GONDI SIGN VISARGA
+11D97 ; Gunjala_Gondi # Mn GUNJALA GONDI VIRAMA
+11D98 ; Gunjala_Gondi # Lo GUNJALA GONDI OM
+11DA0..11DA9 ; Gunjala_Gondi # Nd [10] GUNJALA GONDI DIGIT ZERO..GUNJALA GONDI DIGIT NINE
+
+# Total code points: 63
+
+# ================================================
+
+11EE0..11EF2 ; Makasar # Lo [19] MAKASAR LETTER KA..MAKASAR ANGKA
+11EF3..11EF4 ; Makasar # Mn [2] MAKASAR VOWEL SIGN I..MAKASAR VOWEL SIGN U
+11EF5..11EF6 ; Makasar # Mc [2] MAKASAR VOWEL SIGN E..MAKASAR VOWEL SIGN O
+11EF7..11EF8 ; Makasar # Po [2] MAKASAR PASSIMBANG..MAKASAR END OF SECTION
+
+# Total code points: 25
+
+# ================================================
+
+16E40..16E7F ; Medefaidrin # L& [64] MEDEFAIDRIN CAPITAL LETTER M..MEDEFAIDRIN SMALL LETTER Y
+16E80..16E96 ; Medefaidrin # No [23] MEDEFAIDRIN DIGIT ZERO..MEDEFAIDRIN DIGIT THREE ALTERNATE FORM
+16E97..16E9A ; Medefaidrin # Po [4] MEDEFAIDRIN COMMA..MEDEFAIDRIN EXCLAMATION OH
+
+# Total code points: 91
+
+# ================================================
+
+10D00..10D23 ; Hanifi_Rohingya # Lo [36] HANIFI ROHINGYA LETTER A..HANIFI ROHINGYA MARK NA KHONNA
+10D24..10D27 ; Hanifi_Rohingya # Mn [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
+10D30..10D39 ; Hanifi_Rohingya # Nd [10] HANIFI ROHINGYA DIGIT ZERO..HANIFI ROHINGYA DIGIT NINE + +# Total code points: 50 + +# ================================================ + +10F30..10F45 ; Sogdian # Lo [22] SOGDIAN LETTER ALEPH..SOGDIAN INDEPENDENT SHIN +10F46..10F50 ; Sogdian # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW +10F51..10F54 ; Sogdian # No [4] SOGDIAN NUMBER ONE..SOGDIAN NUMBER ONE HUNDRED +10F55..10F59 ; Sogdian # Po [5] SOGDIAN PUNCTUATION TWO VERTICAL BARS..SOGDIAN PUNCTUATION HALF CIRCLE WITH DOT + +# Total code points: 42 + +# ================================================ + +10F00..10F1C ; Old_Sogdian # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL +10F1D..10F26 ; Old_Sogdian # No [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF +10F27 ; Old_Sogdian # Lo OLD SOGDIAN LIGATURE AYIN-DALETH + +# Total code points: 40 + # EOF diff --git a/maint/Unicode.tables/UnicodeData.txt b/maint/Unicode.tables/UnicodeData.txt index d89c64f..ec32faf 100644 --- a/maint/Unicode.tables/UnicodeData.txt +++ b/maint/Unicode.tables/UnicodeData.txt @@ -1362,6 +1362,7 @@ 055D;ARMENIAN COMMA;Po;0;L;;;;;N;;;;; 055E;ARMENIAN QUESTION MARK;Po;0;L;;;;;N;;;;; 055F;ARMENIAN ABBREVIATION MARK;Po;0;L;;;;;N;;;;; +0560;ARMENIAN SMALL LETTER TURNED AYB;Ll;0;L;;;;;N;;;;; 0561;ARMENIAN SMALL LETTER AYB;Ll;0;L;;;;;N;;;0531;;0531 0562;ARMENIAN SMALL LETTER BEN;Ll;0;L;;;;;N;;;0532;;0532 0563;ARMENIAN SMALL LETTER GIM;Ll;0;L;;;;;N;;;0533;;0533 @@ -1401,6 +1402,7 @@ 0585;ARMENIAN SMALL LETTER OH;Ll;0;L;;;;;N;;;0555;;0555 0586;ARMENIAN SMALL LETTER FEH;Ll;0;L;;;;;N;;;0556;;0556 0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L; 0565 0582;;;;N;;;;; +0588;ARMENIAN SMALL LETTER YI WITH STROKE;Ll;0;L;;;;;N;;;;; 0589;ARMENIAN FULL STOP;Po;0;L;;;;;N;ARMENIAN PERIOD;;;; 058A;ARMENIAN HYPHEN;Pd;0;ON;;;;;N;;;;; 058D;RIGHT-FACING ARMENIAN ETERNITY SIGN;So;0;ON;;;;;N;;;;; @@ -1488,6 +1490,7 @@ 05E8;HEBREW LETTER 
RESH;Lo;0;R;;;;;N;;;;; 05E9;HEBREW LETTER SHIN;Lo;0;R;;;;;N;;;;; 05EA;HEBREW LETTER TAV;Lo;0;R;;;;;N;;;;; +05EF;HEBREW YOD TRIANGLE;Lo;0;R;;;;;N;;;;; 05F0;HEBREW LIGATURE YIDDISH DOUBLE VAV;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE VAV;;;; 05F1;HEBREW LIGATURE YIDDISH VAV YOD;Lo;0;R;;;;;N;HEBREW LETTER VAV YOD;;;; 05F2;HEBREW LIGATURE YIDDISH DOUBLE YOD;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE YOD;;;; @@ -1982,6 +1985,9 @@ 07F8;NKO COMMA;Po;0;ON;;;;;N;;;;; 07F9;NKO EXCLAMATION MARK;Po;0;ON;;;;;N;;;;; 07FA;NKO LAJANYALAN;Lm;0;R;;;;;N;;;;; +07FD;NKO DANTAYALAN;Mn;220;NSM;;;;;N;;;;; +07FE;NKO DOROME SIGN;Sc;0;R;;;;;N;;;;; +07FF;NKO TAMAN SIGN;Sc;0;R;;;;;N;;;;; 0800;SAMARITAN LETTER ALAF;Lo;0;R;;;;;N;;;;; 0801;SAMARITAN LETTER BIT;Lo;0;R;;;;;N;;;;; 0802;SAMARITAN LETTER GAMAN;Lo;0;R;;;;;N;;;;; @@ -2112,6 +2118,7 @@ 08BB;ARABIC LETTER AFRICAN FEH;Lo;0;AL;;;;;N;;;;; 08BC;ARABIC LETTER AFRICAN QAF;Lo;0;AL;;;;;N;;;;; 08BD;ARABIC LETTER AFRICAN NOON;Lo;0;AL;;;;;N;;;;; +08D3;ARABIC SMALL LOW WAW;Mn;220;NSM;;;;;N;;;;; 08D4;ARABIC SMALL HIGH WORD AR-RUB;Mn;230;NSM;;;;;N;;;;; 08D5;ARABIC SMALL HIGH SAD;Mn;230;NSM;;;;;N;;;;; 08D6;ARABIC SMALL HIGH AIN;Mn;230;NSM;;;;;N;;;;; @@ -2379,6 +2386,7 @@ 09FB;BENGALI GANDA MARK;Sc;0;ET;;;;;N;;;;; 09FC;BENGALI LETTER VEDIC ANUSVARA;Lo;0;L;;;;;N;;;;; 09FD;BENGALI ABBREVIATION SIGN;Po;0;L;;;;;N;;;;; +09FE;BENGALI SANDHI MARK;Mn;230;NSM;;;;;N;;;;; 0A01;GURMUKHI SIGN ADAK BINDI;Mn;0;NSM;;;;;N;;;;; 0A02;GURMUKHI SIGN BINDI;Mn;0;NSM;;;;;N;;;;; 0A03;GURMUKHI SIGN VISARGA;Mc;0;L;;;;;N;;;;; @@ -2458,6 +2466,7 @@ 0A73;GURMUKHI URA;Lo;0;L;;;;;N;;;;; 0A74;GURMUKHI EK ONKAR;Lo;0;L;;;;;N;;;;; 0A75;GURMUKHI SIGN YAKASH;Mn;0;NSM;;;;;N;;;;; +0A76;GURMUKHI ABBREVIATION SIGN;Po;0;L;;;;;N;;;;; 0A81;GUJARATI SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;; 0A82;GUJARATI SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;; 0A83;GUJARATI SIGN VISARGA;Mc;0;L;;;;;N;;;;; @@ -2715,6 +2724,7 @@ 0C01;TELUGU SIGN CANDRABINDU;Mc;0;L;;;;;N;;;;; 0C02;TELUGU SIGN ANUSVARA;Mc;0;L;;;;;N;;;;; 0C03;TELUGU SIGN 
VISARGA;Mc;0;L;;;;;N;;;;; +0C04;TELUGU SIGN COMBINING ANUSVARA ABOVE;Mn;0;NSM;;;;;N;;;;; 0C05;TELUGU LETTER A;Lo;0;L;;;;;N;;;;; 0C06;TELUGU LETTER AA;Lo;0;L;;;;;N;;;;; 0C07;TELUGU LETTER I;Lo;0;L;;;;;N;;;;; @@ -2811,6 +2821,7 @@ 0C81;KANNADA SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;; 0C82;KANNADA SIGN ANUSVARA;Mc;0;L;;;;;N;;;;; 0C83;KANNADA SIGN VISARGA;Mc;0;L;;;;;N;;;;; +0C84;KANNADA SIGN SIDDHAM;Po;0;L;;;;;N;;;;; 0C85;KANNADA LETTER A;Lo;0;L;;;;;N;;;;; 0C86;KANNADA LETTER AA;Lo;0;L;;;;;N;;;;; 0C87;KANNADA LETTER I;Lo;0;L;;;;;N;;;;; @@ -3667,54 +3678,54 @@ 10C5;GEORGIAN CAPITAL LETTER HOE;Lu;0;L;;;;;N;;;;2D25; 10C7;GEORGIAN CAPITAL LETTER YN;Lu;0;L;;;;;N;;;;2D27; 10CD;GEORGIAN CAPITAL LETTER AEN;Lu;0;L;;;;;N;;;;2D2D; -10D0;GEORGIAN LETTER AN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER AN;;;; -10D1;GEORGIAN LETTER BAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER BAN;;;; -10D2;GEORGIAN LETTER GAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER GAN;;;; -10D3;GEORGIAN LETTER DON;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER DON;;;; -10D4;GEORGIAN LETTER EN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER EN;;;; -10D5;GEORGIAN LETTER VIN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER VIN;;;; -10D6;GEORGIAN LETTER ZEN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER ZEN;;;; -10D7;GEORGIAN LETTER TAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER TAN;;;; -10D8;GEORGIAN LETTER IN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER IN;;;; -10D9;GEORGIAN LETTER KAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER KAN;;;; -10DA;GEORGIAN LETTER LAS;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER LAS;;;; -10DB;GEORGIAN LETTER MAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER MAN;;;; -10DC;GEORGIAN LETTER NAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER NAR;;;; -10DD;GEORGIAN LETTER ON;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER ON;;;; -10DE;GEORGIAN LETTER PAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER PAR;;;; -10DF;GEORGIAN LETTER ZHAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER ZHAR;;;; -10E0;GEORGIAN LETTER RAE;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER RAE;;;; -10E1;GEORGIAN LETTER SAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER SAN;;;; -10E2;GEORGIAN LETTER 
TAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER TAR;;;; -10E3;GEORGIAN LETTER UN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER UN;;;; -10E4;GEORGIAN LETTER PHAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER PHAR;;;; -10E5;GEORGIAN LETTER KHAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER KHAR;;;; -10E6;GEORGIAN LETTER GHAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER GHAN;;;; -10E7;GEORGIAN LETTER QAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER QAR;;;; -10E8;GEORGIAN LETTER SHIN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER SHIN;;;; -10E9;GEORGIAN LETTER CHIN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER CHIN;;;; -10EA;GEORGIAN LETTER CAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER CAN;;;; -10EB;GEORGIAN LETTER JIL;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER JIL;;;; -10EC;GEORGIAN LETTER CIL;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER CIL;;;; -10ED;GEORGIAN LETTER CHAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER CHAR;;;; -10EE;GEORGIAN LETTER XAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER XAN;;;; -10EF;GEORGIAN LETTER JHAN;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER JHAN;;;; -10F0;GEORGIAN LETTER HAE;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER HAE;;;; -10F1;GEORGIAN LETTER HE;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER HE;;;; -10F2;GEORGIAN LETTER HIE;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER HIE;;;; -10F3;GEORGIAN LETTER WE;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER WE;;;; -10F4;GEORGIAN LETTER HAR;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER HAR;;;; -10F5;GEORGIAN LETTER HOE;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER HOE;;;; -10F6;GEORGIAN LETTER FI;Lo;0;L;;;;;N;GEORGIAN SMALL LETTER FI;;;; -10F7;GEORGIAN LETTER YN;Lo;0;L;;;;;N;;;;; -10F8;GEORGIAN LETTER ELIFI;Lo;0;L;;;;;N;;;;; -10F9;GEORGIAN LETTER TURNED GAN;Lo;0;L;;;;;N;;;;; -10FA;GEORGIAN LETTER AIN;Lo;0;L;;;;;N;;;;; +10D0;GEORGIAN LETTER AN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER AN;;1C90;;10D0 +10D1;GEORGIAN LETTER BAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER BAN;;1C91;;10D1 +10D2;GEORGIAN LETTER GAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER GAN;;1C92;;10D2 +10D3;GEORGIAN LETTER DON;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER DON;;1C93;;10D3 +10D4;GEORGIAN LETTER EN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER EN;;1C94;;10D4 
+10D5;GEORGIAN LETTER VIN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER VIN;;1C95;;10D5 +10D6;GEORGIAN LETTER ZEN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER ZEN;;1C96;;10D6 +10D7;GEORGIAN LETTER TAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER TAN;;1C97;;10D7 +10D8;GEORGIAN LETTER IN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER IN;;1C98;;10D8 +10D9;GEORGIAN LETTER KAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER KAN;;1C99;;10D9 +10DA;GEORGIAN LETTER LAS;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER LAS;;1C9A;;10DA +10DB;GEORGIAN LETTER MAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER MAN;;1C9B;;10DB +10DC;GEORGIAN LETTER NAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER NAR;;1C9C;;10DC +10DD;GEORGIAN LETTER ON;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER ON;;1C9D;;10DD +10DE;GEORGIAN LETTER PAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER PAR;;1C9E;;10DE +10DF;GEORGIAN LETTER ZHAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER ZHAR;;1C9F;;10DF +10E0;GEORGIAN LETTER RAE;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER RAE;;1CA0;;10E0 +10E1;GEORGIAN LETTER SAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER SAN;;1CA1;;10E1 +10E2;GEORGIAN LETTER TAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER TAR;;1CA2;;10E2 +10E3;GEORGIAN LETTER UN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER UN;;1CA3;;10E3 +10E4;GEORGIAN LETTER PHAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER PHAR;;1CA4;;10E4 +10E5;GEORGIAN LETTER KHAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER KHAR;;1CA5;;10E5 +10E6;GEORGIAN LETTER GHAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER GHAN;;1CA6;;10E6 +10E7;GEORGIAN LETTER QAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER QAR;;1CA7;;10E7 +10E8;GEORGIAN LETTER SHIN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER SHIN;;1CA8;;10E8 +10E9;GEORGIAN LETTER CHIN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER CHIN;;1CA9;;10E9 +10EA;GEORGIAN LETTER CAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER CAN;;1CAA;;10EA +10EB;GEORGIAN LETTER JIL;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER JIL;;1CAB;;10EB +10EC;GEORGIAN LETTER CIL;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER CIL;;1CAC;;10EC +10ED;GEORGIAN LETTER CHAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER CHAR;;1CAD;;10ED +10EE;GEORGIAN LETTER XAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER 
XAN;;1CAE;;10EE +10EF;GEORGIAN LETTER JHAN;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER JHAN;;1CAF;;10EF +10F0;GEORGIAN LETTER HAE;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER HAE;;1CB0;;10F0 +10F1;GEORGIAN LETTER HE;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER HE;;1CB1;;10F1 +10F2;GEORGIAN LETTER HIE;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER HIE;;1CB2;;10F2 +10F3;GEORGIAN LETTER WE;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER WE;;1CB3;;10F3 +10F4;GEORGIAN LETTER HAR;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER HAR;;1CB4;;10F4 +10F5;GEORGIAN LETTER HOE;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER HOE;;1CB5;;10F5 +10F6;GEORGIAN LETTER FI;Ll;0;L;;;;;N;GEORGIAN SMALL LETTER FI;;1CB6;;10F6 +10F7;GEORGIAN LETTER YN;Ll;0;L;;;;;N;;;1CB7;;10F7 +10F8;GEORGIAN LETTER ELIFI;Ll;0;L;;;;;N;;;1CB8;;10F8 +10F9;GEORGIAN LETTER TURNED GAN;Ll;0;L;;;;;N;;;1CB9;;10F9 +10FA;GEORGIAN LETTER AIN;Ll;0;L;;;;;N;;;1CBA;;10FA 10FB;GEORGIAN PARAGRAPH SEPARATOR;Po;0;L;;;;;N;;;;; 10FC;MODIFIER LETTER GEORGIAN NAR;Lm;0;L; 10DC;;;;N;;;;; -10FD;GEORGIAN LETTER AEN;Lo;0;L;;;;;N;;;;; -10FE;GEORGIAN LETTER HARD SIGN;Lo;0;L;;;;;N;;;;; -10FF;GEORGIAN LETTER LABIAL SIGN;Lo;0;L;;;;;N;;;;; +10FD;GEORGIAN LETTER AEN;Ll;0;L;;;;;N;;;1CBD;;10FD +10FE;GEORGIAN LETTER HARD SIGN;Ll;0;L;;;;;N;;;1CBE;;10FE +10FF;GEORGIAN LETTER LABIAL SIGN;Ll;0;L;;;;;N;;;1CBF;;10FF 1100;HANGUL CHOSEONG KIYEOK;Lo;0;L;;;;;N;;;;; 1101;HANGUL CHOSEONG SSANGKIYEOK;Lo;0;L;;;;;N;;;;; 1102;HANGUL CHOSEONG NIEUN;Lo;0;L;;;;;N;;;;; @@ -5513,6 +5524,7 @@ 1875;MONGOLIAN LETTER MANCHU RA;Lo;0;L;;;;;N;;;;; 1876;MONGOLIAN LETTER MANCHU FA;Lo;0;L;;;;;N;;;;; 1877;MONGOLIAN LETTER MANCHU ZHA;Lo;0;L;;;;;N;;;;; +1878;MONGOLIAN LETTER CHA WITH TWO DOTS;Lo;0;L;;;;;N;;;;; 1880;MONGOLIAN LETTER ALI GALI ANUSVARA ONE;Lo;0;L;;;;;N;;;;; 1881;MONGOLIAN LETTER ALI GALI VISARGA ONE;Lo;0;L;;;;;N;;;;; 1882;MONGOLIAN LETTER ALI GALI DAMARU;Lo;0;L;;;;;N;;;;; @@ -6388,6 +6400,52 @@ 1C86;CYRILLIC SMALL LETTER TALL HARD SIGN;Ll;0;L;;;;;N;;;042A;;042A 1C87;CYRILLIC SMALL LETTER TALL YAT;Ll;0;L;;;;;N;;;0462;;0462 1C88;CYRILLIC SMALL 
LETTER UNBLENDED UK;Ll;0;L;;;;;N;;;A64A;;A64A +1C90;GEORGIAN MTAVRULI CAPITAL LETTER AN;Lu;0;L;;;;;N;;;;10D0; +1C91;GEORGIAN MTAVRULI CAPITAL LETTER BAN;Lu;0;L;;;;;N;;;;10D1; +1C92;GEORGIAN MTAVRULI CAPITAL LETTER GAN;Lu;0;L;;;;;N;;;;10D2; +1C93;GEORGIAN MTAVRULI CAPITAL LETTER DON;Lu;0;L;;;;;N;;;;10D3; +1C94;GEORGIAN MTAVRULI CAPITAL LETTER EN;Lu;0;L;;;;;N;;;;10D4; +1C95;GEORGIAN MTAVRULI CAPITAL LETTER VIN;Lu;0;L;;;;;N;;;;10D5; +1C96;GEORGIAN MTAVRULI CAPITAL LETTER ZEN;Lu;0;L;;;;;N;;;;10D6; +1C97;GEORGIAN MTAVRULI CAPITAL LETTER TAN;Lu;0;L;;;;;N;;;;10D7; +1C98;GEORGIAN MTAVRULI CAPITAL LETTER IN;Lu;0;L;;;;;N;;;;10D8; +1C99;GEORGIAN MTAVRULI CAPITAL LETTER KAN;Lu;0;L;;;;;N;;;;10D9; +1C9A;GEORGIAN MTAVRULI CAPITAL LETTER LAS;Lu;0;L;;;;;N;;;;10DA; +1C9B;GEORGIAN MTAVRULI CAPITAL LETTER MAN;Lu;0;L;;;;;N;;;;10DB; +1C9C;GEORGIAN MTAVRULI CAPITAL LETTER NAR;Lu;0;L;;;;;N;;;;10DC; +1C9D;GEORGIAN MTAVRULI CAPITAL LETTER ON;Lu;0;L;;;;;N;;;;10DD; +1C9E;GEORGIAN MTAVRULI CAPITAL LETTER PAR;Lu;0;L;;;;;N;;;;10DE; +1C9F;GEORGIAN MTAVRULI CAPITAL LETTER ZHAR;Lu;0;L;;;;;N;;;;10DF; +1CA0;GEORGIAN MTAVRULI CAPITAL LETTER RAE;Lu;0;L;;;;;N;;;;10E0; +1CA1;GEORGIAN MTAVRULI CAPITAL LETTER SAN;Lu;0;L;;;;;N;;;;10E1; +1CA2;GEORGIAN MTAVRULI CAPITAL LETTER TAR;Lu;0;L;;;;;N;;;;10E2; +1CA3;GEORGIAN MTAVRULI CAPITAL LETTER UN;Lu;0;L;;;;;N;;;;10E3; +1CA4;GEORGIAN MTAVRULI CAPITAL LETTER PHAR;Lu;0;L;;;;;N;;;;10E4; +1CA5;GEORGIAN MTAVRULI CAPITAL LETTER KHAR;Lu;0;L;;;;;N;;;;10E5; +1CA6;GEORGIAN MTAVRULI CAPITAL LETTER GHAN;Lu;0;L;;;;;N;;;;10E6; +1CA7;GEORGIAN MTAVRULI CAPITAL LETTER QAR;Lu;0;L;;;;;N;;;;10E7; +1CA8;GEORGIAN MTAVRULI CAPITAL LETTER SHIN;Lu;0;L;;;;;N;;;;10E8; +1CA9;GEORGIAN MTAVRULI CAPITAL LETTER CHIN;Lu;0;L;;;;;N;;;;10E9; +1CAA;GEORGIAN MTAVRULI CAPITAL LETTER CAN;Lu;0;L;;;;;N;;;;10EA; +1CAB;GEORGIAN MTAVRULI CAPITAL LETTER JIL;Lu;0;L;;;;;N;;;;10EB; +1CAC;GEORGIAN MTAVRULI CAPITAL LETTER CIL;Lu;0;L;;;;;N;;;;10EC; +1CAD;GEORGIAN MTAVRULI CAPITAL LETTER CHAR;Lu;0;L;;;;;N;;;;10ED; 
+1CAE;GEORGIAN MTAVRULI CAPITAL LETTER XAN;Lu;0;L;;;;;N;;;;10EE; +1CAF;GEORGIAN MTAVRULI CAPITAL LETTER JHAN;Lu;0;L;;;;;N;;;;10EF; +1CB0;GEORGIAN MTAVRULI CAPITAL LETTER HAE;Lu;0;L;;;;;N;;;;10F0; +1CB1;GEORGIAN MTAVRULI CAPITAL LETTER HE;Lu;0;L;;;;;N;;;;10F1; +1CB2;GEORGIAN MTAVRULI CAPITAL LETTER HIE;Lu;0;L;;;;;N;;;;10F2; +1CB3;GEORGIAN MTAVRULI CAPITAL LETTER WE;Lu;0;L;;;;;N;;;;10F3; +1CB4;GEORGIAN MTAVRULI CAPITAL LETTER HAR;Lu;0;L;;;;;N;;;;10F4; +1CB5;GEORGIAN MTAVRULI CAPITAL LETTER HOE;Lu;0;L;;;;;N;;;;10F5; +1CB6;GEORGIAN MTAVRULI CAPITAL LETTER FI;Lu;0;L;;;;;N;;;;10F6; +1CB7;GEORGIAN MTAVRULI CAPITAL LETTER YN;Lu;0;L;;;;;N;;;;10F7; +1CB8;GEORGIAN MTAVRULI CAPITAL LETTER ELIFI;Lu;0;L;;;;;N;;;;10F8; +1CB9;GEORGIAN MTAVRULI CAPITAL LETTER TURNED GAN;Lu;0;L;;;;;N;;;;10F9; +1CBA;GEORGIAN MTAVRULI CAPITAL LETTER AIN;Lu;0;L;;;;;N;;;;10FA; +1CBD;GEORGIAN MTAVRULI CAPITAL LETTER AEN;Lu;0;L;;;;;N;;;;10FD; +1CBE;GEORGIAN MTAVRULI CAPITAL LETTER HARD SIGN;Lu;0;L;;;;;N;;;;10FE; +1CBF;GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN;Lu;0;L;;;;;N;;;;10FF; 1CC0;SUNDANESE PUNCTUATION BINDU SURYA;Po;0;L;;;;;N;;;;; 1CC1;SUNDANESE PUNCTUATION BINDU PANGLONG;Po;0;L;;;;;N;;;;; 1CC2;SUNDANESE PUNCTUATION BINDU PURNAMA;Po;0;L;;;;;N;;;;; @@ -9559,7 +9617,7 @@ 299E;ANGLE WITH S INSIDE;Sm;0;ON;;;;;Y;;;;; 299F;ACUTE ANGLE;Sm;0;ON;;;;;Y;;;;; 29A0;SPHERICAL ANGLE OPENING LEFT;Sm;0;ON;;;;;Y;;;;; -29A1;SPHERICAL ANGLE OPENING UP;Sm;0;ON;;;;;Y;;;;; +29A1;SPHERICAL ANGLE OPENING UP;Sm;0;ON;;;;;N;;;;; 29A2;TURNED ANGLE;Sm;0;ON;;;;;Y;;;;; 29A3;REVERSED ANGLE;Sm;0;ON;;;;;Y;;;;; 29A4;ANGLE WITH UNDERBAR;Sm;0;ON;;;;;Y;;;;; @@ -10092,6 +10150,9 @@ 2BB7;RIBBON ARROW RIGHT DOWN;So;0;ON;;;;;N;;;;; 2BB8;UPWARDS WHITE ARROW FROM BAR WITH HORIZONTAL BAR;So;0;ON;;;;;N;;;;; 2BB9;UP ARROWHEAD IN A RECTANGLE BOX;So;0;ON;;;;;N;;;;; +2BBA;OVERLAPPING WHITE SQUARES;So;0;ON;;;;;N;;;;; +2BBB;OVERLAPPING WHITE AND BLACK SQUARES;So;0;ON;;;;;N;;;;; +2BBC;OVERLAPPING BLACK SQUARES;So;0;ON;;;;;N;;;;; 2BBD;BALLOT BOX WITH 
LIGHT X;So;0;ON;;;;;N;;;;; 2BBE;CIRCLED X;So;0;ON;;;;;N;;;;; 2BBF;CIRCLED BOLD X;So;0;ON;;;;;N;;;;; @@ -10113,10 +10174,50 @@ 2BD0;SQUARE POSITION INDICATOR;So;0;ON;;;;;N;;;;; 2BD1;UNCERTAINTY SIGN;So;0;ON;;;;;N;;;;; 2BD2;GROUP MARK;So;0;ON;;;;;N;;;;; +2BD3;PLUTO FORM TWO;So;0;ON;;;;;N;;;;; +2BD4;PLUTO FORM THREE;So;0;ON;;;;;N;;;;; +2BD5;PLUTO FORM FOUR;So;0;ON;;;;;N;;;;; +2BD6;PLUTO FORM FIVE;So;0;ON;;;;;N;;;;; +2BD7;TRANSPLUTO;So;0;ON;;;;;N;;;;; +2BD8;PROSERPINA;So;0;ON;;;;;N;;;;; +2BD9;ASTRAEA;So;0;ON;;;;;N;;;;; +2BDA;HYGIEA;So;0;ON;;;;;N;;;;; +2BDB;PHOLUS;So;0;ON;;;;;N;;;;; +2BDC;NESSUS;So;0;ON;;;;;N;;;;; +2BDD;WHITE MOON SELENA;So;0;ON;;;;;N;;;;; +2BDE;BLACK DIAMOND ON CROSS;So;0;ON;;;;;N;;;;; +2BDF;TRUE LIGHT MOON ARTA;So;0;ON;;;;;N;;;;; +2BE0;CUPIDO;So;0;ON;;;;;N;;;;; +2BE1;HADES;So;0;ON;;;;;N;;;;; +2BE2;ZEUS;So;0;ON;;;;;N;;;;; +2BE3;KRONOS;So;0;ON;;;;;N;;;;; +2BE4;APOLLON;So;0;ON;;;;;N;;;;; +2BE5;ADMETOS;So;0;ON;;;;;N;;;;; +2BE6;VULCANUS;So;0;ON;;;;;N;;;;; +2BE7;POSEIDON;So;0;ON;;;;;N;;;;; +2BE8;LEFT HALF BLACK STAR;So;0;ON;;;;;N;;;;; +2BE9;RIGHT HALF BLACK STAR;So;0;ON;;;;;N;;;;; +2BEA;STAR WITH LEFT HALF BLACK;So;0;ON;;;;;N;;;;; +2BEB;STAR WITH RIGHT HALF BLACK;So;0;ON;;;;;N;;;;; 2BEC;LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS;So;0;ON;;;;;N;;;;; 2BED;UPWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS;So;0;ON;;;;;N;;;;; 2BEE;RIGHTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS;So;0;ON;;;;;N;;;;; 2BEF;DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS;So;0;ON;;;;;N;;;;; +2BF0;ERIS FORM ONE;So;0;ON;;;;;N;;;;; +2BF1;ERIS FORM TWO;So;0;ON;;;;;N;;;;; +2BF2;SEDNA;So;0;ON;;;;;N;;;;; +2BF3;RUSSIAN ASTROLOGICAL SYMBOL VIGINTILE;So;0;ON;;;;;N;;;;; +2BF4;RUSSIAN ASTROLOGICAL SYMBOL NOVILE;So;0;ON;;;;;N;;;;; +2BF5;RUSSIAN ASTROLOGICAL SYMBOL QUINTILE;So;0;ON;;;;;N;;;;; +2BF6;RUSSIAN ASTROLOGICAL SYMBOL BINOVILE;So;0;ON;;;;;N;;;;; +2BF7;RUSSIAN ASTROLOGICAL SYMBOL SENTAGON;So;0;ON;;;;;N;;;;; +2BF8;RUSSIAN ASTROLOGICAL SYMBOL TREDECILE;So;0;ON;;;;;N;;;;; 
+2BF9;EQUALS SIGN WITH INFINITY BELOW;So;0;ON;;;;;N;;;;; +2BFA;UNITED SYMBOL;So;0;ON;;;;;N;;;;; +2BFB;SEPARATED SYMBOL;So;0;ON;;;;;N;;;;; +2BFC;DOUBLED SYMBOL;So;0;ON;;;;;N;;;;; +2BFD;PASSED SYMBOL;So;0;ON;;;;;N;;;;; +2BFE;REVERSED RIGHT ANGLE;So;0;ON;;;;;Y;;;;; 2C00;GLAGOLITIC CAPITAL LETTER AZU;Lu;0;L;;;;;N;;;;2C30; 2C01;GLAGOLITIC CAPITAL LETTER BUKY;Lu;0;L;;;;;N;;;;2C31; 2C02;GLAGOLITIC CAPITAL LETTER VEDE;Lu;0;L;;;;;N;;;;2C32; @@ -10650,6 +10751,11 @@ 2E47;LOW KAVYKA;Po;0;ON;;;;;N;;;;; 2E48;LOW KAVYKA WITH DOT;Po;0;ON;;;;;N;;;;; 2E49;DOUBLE STACKED COMMA;Po;0;ON;;;;;N;;;;; +2E4A;DOTTED SOLIDUS;Po;0;ON;;;;;N;;;;; +2E4B;TRIPLE DAGGER;Po;0;ON;;;;;N;;;;; +2E4C;MEDIEVAL COMMA;Po;0;ON;;;;;N;;;;; +2E4D;PARAGRAPHUS MARK;Po;0;ON;;;;;N;;;;; +2E4E;PUNCTUS ELEVATUS MARK;Po;0;ON;;;;;N;;;;; 2E80;CJK RADICAL REPEAT;So;0;ON;;;;;N;;;;; 2E81;CJK RADICAL CLIFF;So;0;ON;;;;;N;;;;; 2E82;CJK RADICAL SECOND ONE;So;0;ON;;;;;N;;;;; @@ -11286,6 +11392,7 @@ 312C;BOPOMOFO LETTER GN;Lo;0;L;;;;;N;;;;; 312D;BOPOMOFO LETTER IH;Lo;0;L;;;;;N;;;;; 312E;BOPOMOFO LETTER O WITH DOT ABOVE;Lo;0;L;;;;;N;;;;; +312F;BOPOMOFO LETTER NN;Lo;0;L;;;;;N;;;;; 3131;HANGUL LETTER KIYEOK;Lo;0;L; 1100;;;;N;HANGUL LETTER GIYEOG;;;; 3132;HANGUL LETTER SSANGKIYEOK;Lo;0;L; 1101;;;;N;HANGUL LETTER SSANG GIYEOG;;;; 3133;HANGUL LETTER KIYEOK-SIOS;Lo;0;L; 11AA;;;;N;HANGUL LETTER GIYEOG SIOS;;;; @@ -12052,7 +12159,7 @@ 4DFE;HEXAGRAM FOR AFTER COMPLETION;So;0;ON;;;;;N;;;;; 4DFF;HEXAGRAM FOR BEFORE COMPLETION;So;0;ON;;;;;N;;;;; 4E00;;Lo;0;L;;;;;N;;;;; -9FEA;;Lo;0;L;;;;;N;;;;; +9FEF;;Lo;0;L;;;;;N;;;;; A000;YI SYLLABLE IT;Lo;0;L;;;;;N;;;;; A001;YI SYLLABLE IX;Lo;0;L;;;;;N;;;;; A002;YI SYLLABLE I;Lo;0;L;;;;;N;;;;; @@ -13980,6 +14087,7 @@ A7AB;LATIN CAPITAL LETTER REVERSED OPEN E;Lu;0;L;;;;;N;;;;025C; A7AC;LATIN CAPITAL LETTER SCRIPT G;Lu;0;L;;;;;N;;;;0261; A7AD;LATIN CAPITAL LETTER L WITH BELT;Lu;0;L;;;;;N;;;;026C; A7AE;LATIN CAPITAL LETTER SMALL CAPITAL I;Lu;0;L;;;;;N;;;;026A; +A7AF;LATIN LETTER SMALL CAPITAL 
Q;Ll;0;L;;;;;N;;;;; A7B0;LATIN CAPITAL LETTER TURNED K;Lu;0;L;;;;;N;;;;029E; A7B1;LATIN CAPITAL LETTER TURNED T;Lu;0;L;;;;;N;;;;0287; A7B2;LATIN CAPITAL LETTER J WITH CROSSED-TAIL;Lu;0;L;;;;;N;;;;029D; @@ -13988,6 +14096,8 @@ A7B4;LATIN CAPITAL LETTER BETA;Lu;0;L;;;;;N;;;;A7B5; A7B5;LATIN SMALL LETTER BETA;Ll;0;L;;;;;N;;;A7B4;;A7B4 A7B6;LATIN CAPITAL LETTER OMEGA;Lu;0;L;;;;;N;;;;A7B7; A7B7;LATIN SMALL LETTER OMEGA;Ll;0;L;;;;;N;;;A7B6;;A7B6 +A7B8;LATIN CAPITAL LETTER U WITH STROKE;Lu;0;L;;;;;N;;;;A7B9; +A7B9;LATIN SMALL LETTER U WITH STROKE;Ll;0;L;;;;;N;;;A7B8;;A7B8 A7F7;LATIN EPIGRAPHIC LETTER SIDEWAYS I;Lo;0;L;;;;;N;;;;; A7F8;MODIFIER LETTER CAPITAL H WITH STROKE;Lm;0;L; 0126;;;;N;;;;; A7F9;MODIFIER LETTER SMALL LIGATURE OE;Lm;0;L; 0153;;;;N;;;;; @@ -14219,6 +14329,8 @@ A8FA;DEVANAGARI CARET;Po;0;L;;;;;N;;;;; A8FB;DEVANAGARI HEADSTROKE;Lo;0;L;;;;;N;;;;; A8FC;DEVANAGARI SIGN SIDDHAM;Po;0;L;;;;;N;;;;; A8FD;DEVANAGARI JAIN OM;Lo;0;L;;;;;N;;;;; +A8FE;DEVANAGARI LETTER AY;Lo;0;L;;;;;N;;;;; +A8FF;DEVANAGARI VOWEL SIGN AY;Mn;0;NSM;;;;;N;;;;; A900;KAYAH LI DIGIT ZERO;Nd;0;L;;0;0;0;N;;;;; A901;KAYAH LI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;; A902;KAYAH LI DIGIT TWO;Nd;0;L;;2;2;2;N;;;;; @@ -18363,6 +18475,8 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 10A31;KHAROSHTHI LETTER HA;Lo;0;R;;;;;N;;;;; 10A32;KHAROSHTHI LETTER KKA;Lo;0;R;;;;;N;;;;; 10A33;KHAROSHTHI LETTER TTTHA;Lo;0;R;;;;;N;;;;; +10A34;KHAROSHTHI LETTER TTTA;Lo;0;R;;;;;N;;;;; +10A35;KHAROSHTHI LETTER VHA;Lo;0;R;;;;;N;;;;; 10A38;KHAROSHTHI SIGN BAR ABOVE;Mn;230;NSM;;;;;N;;;;; 10A39;KHAROSHTHI SIGN CAUDA;Mn;1;NSM;;;;;N;;;;; 10A3A;KHAROSHTHI SIGN DOT BELOW;Mn;220;NSM;;;;;N;;;;; @@ -18375,6 +18489,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 10A45;KHAROSHTHI NUMBER TWENTY;No;0;R;;;;20;N;;;;; 10A46;KHAROSHTHI NUMBER ONE HUNDRED;No;0;R;;;;100;N;;;;; 10A47;KHAROSHTHI NUMBER ONE THOUSAND;No;0;R;;;;1000;N;;;;; +10A48;KHAROSHTHI FRACTION ONE HALF;No;0;R;;;;1/2;N;;;;; 10A50;KHAROSHTHI PUNCTUATION DOT;Po;0;R;;;;;N;;;;; 
10A51;KHAROSHTHI PUNCTUATION SMALL CIRCLE;Po;0;R;;;;;N;;;;; 10A52;KHAROSHTHI PUNCTUATION CIRCLE;Po;0;R;;;;;N;;;;; @@ -18827,6 +18942,56 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 10CFD;OLD HUNGARIAN NUMBER FIFTY;No;0;R;;;;50;N;;;;; 10CFE;OLD HUNGARIAN NUMBER ONE HUNDRED;No;0;R;;;;100;N;;;;; 10CFF;OLD HUNGARIAN NUMBER ONE THOUSAND;No;0;R;;;;1000;N;;;;; +10D00;HANIFI ROHINGYA LETTER A;Lo;0;AL;;;;;N;;;;; +10D01;HANIFI ROHINGYA LETTER BA;Lo;0;AL;;;;;N;;;;; +10D02;HANIFI ROHINGYA LETTER PA;Lo;0;AL;;;;;N;;;;; +10D03;HANIFI ROHINGYA LETTER TA;Lo;0;AL;;;;;N;;;;; +10D04;HANIFI ROHINGYA LETTER TTA;Lo;0;AL;;;;;N;;;;; +10D05;HANIFI ROHINGYA LETTER JA;Lo;0;AL;;;;;N;;;;; +10D06;HANIFI ROHINGYA LETTER CA;Lo;0;AL;;;;;N;;;;; +10D07;HANIFI ROHINGYA LETTER HA;Lo;0;AL;;;;;N;;;;; +10D08;HANIFI ROHINGYA LETTER KHA;Lo;0;AL;;;;;N;;;;; +10D09;HANIFI ROHINGYA LETTER FA;Lo;0;AL;;;;;N;;;;; +10D0A;HANIFI ROHINGYA LETTER DA;Lo;0;AL;;;;;N;;;;; +10D0B;HANIFI ROHINGYA LETTER DDA;Lo;0;AL;;;;;N;;;;; +10D0C;HANIFI ROHINGYA LETTER RA;Lo;0;AL;;;;;N;;;;; +10D0D;HANIFI ROHINGYA LETTER RRA;Lo;0;AL;;;;;N;;;;; +10D0E;HANIFI ROHINGYA LETTER ZA;Lo;0;AL;;;;;N;;;;; +10D0F;HANIFI ROHINGYA LETTER SA;Lo;0;AL;;;;;N;;;;; +10D10;HANIFI ROHINGYA LETTER SHA;Lo;0;AL;;;;;N;;;;; +10D11;HANIFI ROHINGYA LETTER KA;Lo;0;AL;;;;;N;;;;; +10D12;HANIFI ROHINGYA LETTER GA;Lo;0;AL;;;;;N;;;;; +10D13;HANIFI ROHINGYA LETTER LA;Lo;0;AL;;;;;N;;;;; +10D14;HANIFI ROHINGYA LETTER MA;Lo;0;AL;;;;;N;;;;; +10D15;HANIFI ROHINGYA LETTER NA;Lo;0;AL;;;;;N;;;;; +10D16;HANIFI ROHINGYA LETTER WA;Lo;0;AL;;;;;N;;;;; +10D17;HANIFI ROHINGYA LETTER KINNA WA;Lo;0;AL;;;;;N;;;;; +10D18;HANIFI ROHINGYA LETTER YA;Lo;0;AL;;;;;N;;;;; +10D19;HANIFI ROHINGYA LETTER KINNA YA;Lo;0;AL;;;;;N;;;;; +10D1A;HANIFI ROHINGYA LETTER NGA;Lo;0;AL;;;;;N;;;;; +10D1B;HANIFI ROHINGYA LETTER NYA;Lo;0;AL;;;;;N;;;;; +10D1C;HANIFI ROHINGYA LETTER VA;Lo;0;AL;;;;;N;;;;; +10D1D;HANIFI ROHINGYA VOWEL A;Lo;0;AL;;;;;N;;;;; +10D1E;HANIFI ROHINGYA VOWEL I;Lo;0;AL;;;;;N;;;;; +10D1F;HANIFI 
ROHINGYA VOWEL U;Lo;0;AL;;;;;N;;;;; +10D20;HANIFI ROHINGYA VOWEL E;Lo;0;AL;;;;;N;;;;; +10D21;HANIFI ROHINGYA VOWEL O;Lo;0;AL;;;;;N;;;;; +10D22;HANIFI ROHINGYA MARK SAKIN;Lo;0;AL;;;;;N;;;;; +10D23;HANIFI ROHINGYA MARK NA KHONNA;Lo;0;AL;;;;;N;;;;; +10D24;HANIFI ROHINGYA SIGN HARBAHAY;Mn;230;NSM;;;;;N;;;;; +10D25;HANIFI ROHINGYA SIGN TAHALA;Mn;230;NSM;;;;;N;;;;; +10D26;HANIFI ROHINGYA SIGN TANA;Mn;230;NSM;;;;;N;;;;; +10D27;HANIFI ROHINGYA SIGN TASSI;Mn;230;NSM;;;;;N;;;;; +10D30;HANIFI ROHINGYA DIGIT ZERO;Nd;0;AN;;0;0;0;N;;;;; +10D31;HANIFI ROHINGYA DIGIT ONE;Nd;0;AN;;1;1;1;N;;;;; +10D32;HANIFI ROHINGYA DIGIT TWO;Nd;0;AN;;2;2;2;N;;;;; +10D33;HANIFI ROHINGYA DIGIT THREE;Nd;0;AN;;3;3;3;N;;;;; +10D34;HANIFI ROHINGYA DIGIT FOUR;Nd;0;AN;;4;4;4;N;;;;; +10D35;HANIFI ROHINGYA DIGIT FIVE;Nd;0;AN;;5;5;5;N;;;;; +10D36;HANIFI ROHINGYA DIGIT SIX;Nd;0;AN;;6;6;6;N;;;;; +10D37;HANIFI ROHINGYA DIGIT SEVEN;Nd;0;AN;;7;7;7;N;;;;; +10D38;HANIFI ROHINGYA DIGIT EIGHT;Nd;0;AN;;8;8;8;N;;;;; +10D39;HANIFI ROHINGYA DIGIT NINE;Nd;0;AN;;9;9;9;N;;;;; 10E60;RUMI DIGIT ONE;No;0;AN;;;1;1;N;;;;; 10E61;RUMI DIGIT TWO;No;0;AN;;;2;2;N;;;;; 10E62;RUMI DIGIT THREE;No;0;AN;;;3;3;N;;;;; @@ -18858,6 +19023,88 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 10E7C;RUMI FRACTION ONE QUARTER;No;0;AN;;;;1/4;N;;;;; 10E7D;RUMI FRACTION ONE THIRD;No;0;AN;;;;1/3;N;;;;; 10E7E;RUMI FRACTION TWO THIRDS;No;0;AN;;;;2/3;N;;;;; +10F00;OLD SOGDIAN LETTER ALEPH;Lo;0;R;;;;;N;;;;; +10F01;OLD SOGDIAN LETTER FINAL ALEPH;Lo;0;R;;;;;N;;;;; +10F02;OLD SOGDIAN LETTER BETH;Lo;0;R;;;;;N;;;;; +10F03;OLD SOGDIAN LETTER FINAL BETH;Lo;0;R;;;;;N;;;;; +10F04;OLD SOGDIAN LETTER GIMEL;Lo;0;R;;;;;N;;;;; +10F05;OLD SOGDIAN LETTER HE;Lo;0;R;;;;;N;;;;; +10F06;OLD SOGDIAN LETTER FINAL HE;Lo;0;R;;;;;N;;;;; +10F07;OLD SOGDIAN LETTER WAW;Lo;0;R;;;;;N;;;;; +10F08;OLD SOGDIAN LETTER ZAYIN;Lo;0;R;;;;;N;;;;; +10F09;OLD SOGDIAN LETTER HETH;Lo;0;R;;;;;N;;;;; +10F0A;OLD SOGDIAN LETTER YODH;Lo;0;R;;;;;N;;;;; +10F0B;OLD SOGDIAN LETTER KAPH;Lo;0;R;;;;;N;;;;; 
+10F0C;OLD SOGDIAN LETTER LAMEDH;Lo;0;R;;;;;N;;;;; +10F0D;OLD SOGDIAN LETTER MEM;Lo;0;R;;;;;N;;;;; +10F0E;OLD SOGDIAN LETTER NUN;Lo;0;R;;;;;N;;;;; +10F0F;OLD SOGDIAN LETTER FINAL NUN;Lo;0;R;;;;;N;;;;; +10F10;OLD SOGDIAN LETTER FINAL NUN WITH VERTICAL TAIL;Lo;0;R;;;;;N;;;;; +10F11;OLD SOGDIAN LETTER SAMEKH;Lo;0;R;;;;;N;;;;; +10F12;OLD SOGDIAN LETTER AYIN;Lo;0;R;;;;;N;;;;; +10F13;OLD SOGDIAN LETTER ALTERNATE AYIN;Lo;0;R;;;;;N;;;;; +10F14;OLD SOGDIAN LETTER PE;Lo;0;R;;;;;N;;;;; +10F15;OLD SOGDIAN LETTER SADHE;Lo;0;R;;;;;N;;;;; +10F16;OLD SOGDIAN LETTER FINAL SADHE;Lo;0;R;;;;;N;;;;; +10F17;OLD SOGDIAN LETTER FINAL SADHE WITH VERTICAL TAIL;Lo;0;R;;;;;N;;;;; +10F18;OLD SOGDIAN LETTER RESH-AYIN-DALETH;Lo;0;R;;;;;N;;;;; +10F19;OLD SOGDIAN LETTER SHIN;Lo;0;R;;;;;N;;;;; +10F1A;OLD SOGDIAN LETTER TAW;Lo;0;R;;;;;N;;;;; +10F1B;OLD SOGDIAN LETTER FINAL TAW;Lo;0;R;;;;;N;;;;; +10F1C;OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL;Lo;0;R;;;;;N;;;;; +10F1D;OLD SOGDIAN NUMBER ONE;No;0;R;;;;1;N;;;;; +10F1E;OLD SOGDIAN NUMBER TWO;No;0;R;;;;2;N;;;;; +10F1F;OLD SOGDIAN NUMBER THREE;No;0;R;;;;3;N;;;;; +10F20;OLD SOGDIAN NUMBER FOUR;No;0;R;;;;4;N;;;;; +10F21;OLD SOGDIAN NUMBER FIVE;No;0;R;;;;5;N;;;;; +10F22;OLD SOGDIAN NUMBER TEN;No;0;R;;;;10;N;;;;; +10F23;OLD SOGDIAN NUMBER TWENTY;No;0;R;;;;20;N;;;;; +10F24;OLD SOGDIAN NUMBER THIRTY;No;0;R;;;;30;N;;;;; +10F25;OLD SOGDIAN NUMBER ONE HUNDRED;No;0;R;;;;100;N;;;;; +10F26;OLD SOGDIAN FRACTION ONE HALF;No;0;R;;;;1/2;N;;;;; +10F27;OLD SOGDIAN LIGATURE AYIN-DALETH;Lo;0;R;;;;;N;;;;; +10F30;SOGDIAN LETTER ALEPH;Lo;0;AL;;;;;N;;;;; +10F31;SOGDIAN LETTER BETH;Lo;0;AL;;;;;N;;;;; +10F32;SOGDIAN LETTER GIMEL;Lo;0;AL;;;;;N;;;;; +10F33;SOGDIAN LETTER HE;Lo;0;AL;;;;;N;;;;; +10F34;SOGDIAN LETTER WAW;Lo;0;AL;;;;;N;;;;; +10F35;SOGDIAN LETTER ZAYIN;Lo;0;AL;;;;;N;;;;; +10F36;SOGDIAN LETTER HETH;Lo;0;AL;;;;;N;;;;; +10F37;SOGDIAN LETTER YODH;Lo;0;AL;;;;;N;;;;; +10F38;SOGDIAN LETTER KAPH;Lo;0;AL;;;;;N;;;;; +10F39;SOGDIAN LETTER LAMEDH;Lo;0;AL;;;;;N;;;;; 
+10F3A;SOGDIAN LETTER MEM;Lo;0;AL;;;;;N;;;;; +10F3B;SOGDIAN LETTER NUN;Lo;0;AL;;;;;N;;;;; +10F3C;SOGDIAN LETTER SAMEKH;Lo;0;AL;;;;;N;;;;; +10F3D;SOGDIAN LETTER AYIN;Lo;0;AL;;;;;N;;;;; +10F3E;SOGDIAN LETTER PE;Lo;0;AL;;;;;N;;;;; +10F3F;SOGDIAN LETTER SADHE;Lo;0;AL;;;;;N;;;;; +10F40;SOGDIAN LETTER RESH-AYIN;Lo;0;AL;;;;;N;;;;; +10F41;SOGDIAN LETTER SHIN;Lo;0;AL;;;;;N;;;;; +10F42;SOGDIAN LETTER TAW;Lo;0;AL;;;;;N;;;;; +10F43;SOGDIAN LETTER FETH;Lo;0;AL;;;;;N;;;;; +10F44;SOGDIAN LETTER LESH;Lo;0;AL;;;;;N;;;;; +10F45;SOGDIAN INDEPENDENT SHIN;Lo;0;AL;;;;;N;;;;; +10F46;SOGDIAN COMBINING DOT BELOW;Mn;220;NSM;;;;;N;;;;; +10F47;SOGDIAN COMBINING TWO DOTS BELOW;Mn;220;NSM;;;;;N;;;;; +10F48;SOGDIAN COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;;;;; +10F49;SOGDIAN COMBINING TWO DOTS ABOVE;Mn;230;NSM;;;;;N;;;;; +10F4A;SOGDIAN COMBINING CURVE ABOVE;Mn;230;NSM;;;;;N;;;;; +10F4B;SOGDIAN COMBINING CURVE BELOW;Mn;220;NSM;;;;;N;;;;; +10F4C;SOGDIAN COMBINING HOOK ABOVE;Mn;230;NSM;;;;;N;;;;; +10F4D;SOGDIAN COMBINING HOOK BELOW;Mn;220;NSM;;;;;N;;;;; +10F4E;SOGDIAN COMBINING LONG HOOK BELOW;Mn;220;NSM;;;;;N;;;;; +10F4F;SOGDIAN COMBINING RESH BELOW;Mn;220;NSM;;;;;N;;;;; +10F50;SOGDIAN COMBINING STROKE BELOW;Mn;220;NSM;;;;;N;;;;; +10F51;SOGDIAN NUMBER ONE;No;0;AL;;;;1;N;;;;; +10F52;SOGDIAN NUMBER TEN;No;0;AL;;;;10;N;;;;; +10F53;SOGDIAN NUMBER TWENTY;No;0;AL;;;;20;N;;;;; +10F54;SOGDIAN NUMBER ONE HUNDRED;No;0;AL;;;;100;N;;;;; +10F55;SOGDIAN PUNCTUATION TWO VERTICAL BARS;Po;0;AL;;;;;N;;;;; +10F56;SOGDIAN PUNCTUATION TWO VERTICAL BARS WITH DOTS;Po;0;AL;;;;;N;;;;; +10F57;SOGDIAN PUNCTUATION CIRCLE WITH DOT;Po;0;AL;;;;;N;;;;; +10F58;SOGDIAN PUNCTUATION TWO CIRCLES WITH DOTS;Po;0;AL;;;;;N;;;;; +10F59;SOGDIAN PUNCTUATION HALF CIRCLE WITH DOT;Po;0;AL;;;;;N;;;;; 11000;BRAHMI SIGN CANDRABINDU;Mc;0;L;;;;;N;;;;; 11001;BRAHMI SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;; 11002;BRAHMI SIGN VISARGA;Mc;0;L;;;;;N;;;;; @@ -19033,6 +19280,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 110BF;KAITHI DOUBLE SECTION 
MARK;Po;0;L;;;;;N;;;;; 110C0;KAITHI DANDA;Po;0;L;;;;;N;;;;; 110C1;KAITHI DOUBLE DANDA;Po;0;L;;;;;N;;;;; +110CD;KAITHI NUMBER SIGN ABOVE;Cf;0;L;;;;;N;;;;; 110D0;SORA SOMPENG LETTER SAH;Lo;0;L;;;;;N;;;;; 110D1;SORA SOMPENG LETTER TAH;Lo;0;L;;;;;N;;;;; 110D2;SORA SOMPENG LETTER BAH;Lo;0;L;;;;;N;;;;; @@ -19135,6 +19383,9 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 11141;CHAKMA DANDA;Po;0;L;;;;;N;;;;; 11142;CHAKMA DOUBLE DANDA;Po;0;L;;;;;N;;;;; 11143;CHAKMA QUESTION MARK;Po;0;L;;;;;N;;;;; +11144;CHAKMA LETTER LHAA;Lo;0;L;;;;;N;;;;; +11145;CHAKMA VOWEL SIGN AA;Mc;0;L;;;;;N;;;;; +11146;CHAKMA VOWEL SIGN EI;Mc;0;L;;;;;N;;;;; 11150;MAHAJANI LETTER A;Lo;0;L;;;;;N;;;;; 11151;MAHAJANI LETTER I;Lo;0;L;;;;;N;;;;; 11152;MAHAJANI LETTER U;Lo;0;L;;;;;N;;;;; @@ -19247,7 +19498,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 111C6;SHARADA DOUBLE DANDA;Po;0;L;;;;;N;;;;; 111C7;SHARADA ABBREVIATION SIGN;Po;0;L;;;;;N;;;;; 111C8;SHARADA SEPARATOR;Po;0;L;;;;;N;;;;; -111C9;SHARADA SANDHI MARK;Po;0;L;;;;;N;;;;; +111C9;SHARADA SANDHI MARK;Mn;0;NSM;;;;;N;;;;; 111CA;SHARADA SIGN NUKTA;Mn;7;NSM;;;;;N;;;;; 111CB;SHARADA VOWEL MODIFIER MARK;Mn;0;NSM;;;;;N;;;;; 111CC;SHARADA EXTRA SHORT VOWEL MARK;Mn;0;NSM;;;;;N;;;;; @@ -19507,6 +19758,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 11337;GRANTHA LETTER SSA;Lo;0;L;;;;;N;;;;; 11338;GRANTHA LETTER SA;Lo;0;L;;;;;N;;;;; 11339;GRANTHA LETTER HA;Lo;0;L;;;;;N;;;;; +1133B;COMBINING BINDU BELOW;Mn;7;NSM;;;;;N;;;;; 1133C;GRANTHA SIGN NUKTA;Mn;7;NSM;;;;;N;;;;; 1133D;GRANTHA SIGN AVAGRAHA;Lo;0;L;;;;;N;;;;; 1133E;GRANTHA VOWEL SIGN AA;Mc;0;L;;;;;N;;;;; @@ -19634,6 +19886,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 11459;NEWA DIGIT NINE;Nd;0;L;;9;9;9;N;;;;; 1145B;NEWA PLACEHOLDER MARK;Po;0;L;;;;;N;;;;; 1145D;NEWA INSERTION SIGN;Po;0;L;;;;;N;;;;; +1145E;NEWA SANDHI MARK;Mn;230;NSM;;;;;N;;;;; 11480;TIRHUTA ANJI;Lo;0;L;;;;;N;;;;; 11481;TIRHUTA LETTER A;Lo;0;L;;;;;N;;;;; 11482;TIRHUTA LETTER AA;Lo;0;L;;;;;N;;;;; @@ -19992,6 +20245,7 @@ 
FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 11717;AHOM LETTER GHA;Lo;0;L;;;;;N;;;;; 11718;AHOM LETTER BHA;Lo;0;L;;;;;N;;;;; 11719;AHOM LETTER JHA;Lo;0;L;;;;;N;;;;; +1171A;AHOM LETTER ALTERNATE BA;Lo;0;L;;;;;N;;;;; 1171D;AHOM CONSONANT SIGN MEDIAL LA;Mn;0;NSM;;;;;N;;;;; 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;; 1171F;AHOM CONSONANT SIGN MEDIAL LIGATING RA;Mn;0;NSM;;;;;N;;;;; @@ -20023,6 +20277,66 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1173D;AHOM SIGN SECTION;Po;0;L;;;;;N;;;;; 1173E;AHOM SIGN RULAI;Po;0;L;;;;;N;;;;; 1173F;AHOM SYMBOL VI;So;0;L;;;;;N;;;;; +11800;DOGRA LETTER A;Lo;0;L;;;;;N;;;;; +11801;DOGRA LETTER AA;Lo;0;L;;;;;N;;;;; +11802;DOGRA LETTER I;Lo;0;L;;;;;N;;;;; +11803;DOGRA LETTER II;Lo;0;L;;;;;N;;;;; +11804;DOGRA LETTER U;Lo;0;L;;;;;N;;;;; +11805;DOGRA LETTER UU;Lo;0;L;;;;;N;;;;; +11806;DOGRA LETTER E;Lo;0;L;;;;;N;;;;; +11807;DOGRA LETTER AI;Lo;0;L;;;;;N;;;;; +11808;DOGRA LETTER O;Lo;0;L;;;;;N;;;;; +11809;DOGRA LETTER AU;Lo;0;L;;;;;N;;;;; +1180A;DOGRA LETTER KA;Lo;0;L;;;;;N;;;;; +1180B;DOGRA LETTER KHA;Lo;0;L;;;;;N;;;;; +1180C;DOGRA LETTER GA;Lo;0;L;;;;;N;;;;; +1180D;DOGRA LETTER GHA;Lo;0;L;;;;;N;;;;; +1180E;DOGRA LETTER NGA;Lo;0;L;;;;;N;;;;; +1180F;DOGRA LETTER CA;Lo;0;L;;;;;N;;;;; +11810;DOGRA LETTER CHA;Lo;0;L;;;;;N;;;;; +11811;DOGRA LETTER JA;Lo;0;L;;;;;N;;;;; +11812;DOGRA LETTER JHA;Lo;0;L;;;;;N;;;;; +11813;DOGRA LETTER NYA;Lo;0;L;;;;;N;;;;; +11814;DOGRA LETTER TTA;Lo;0;L;;;;;N;;;;; +11815;DOGRA LETTER TTHA;Lo;0;L;;;;;N;;;;; +11816;DOGRA LETTER DDA;Lo;0;L;;;;;N;;;;; +11817;DOGRA LETTER DDHA;Lo;0;L;;;;;N;;;;; +11818;DOGRA LETTER NNA;Lo;0;L;;;;;N;;;;; +11819;DOGRA LETTER TA;Lo;0;L;;;;;N;;;;; +1181A;DOGRA LETTER THA;Lo;0;L;;;;;N;;;;; +1181B;DOGRA LETTER DA;Lo;0;L;;;;;N;;;;; +1181C;DOGRA LETTER DHA;Lo;0;L;;;;;N;;;;; +1181D;DOGRA LETTER NA;Lo;0;L;;;;;N;;;;; +1181E;DOGRA LETTER PA;Lo;0;L;;;;;N;;;;; +1181F;DOGRA LETTER PHA;Lo;0;L;;;;;N;;;;; +11820;DOGRA LETTER BA;Lo;0;L;;;;;N;;;;; +11821;DOGRA LETTER BHA;Lo;0;L;;;;;N;;;;; 
+11822;DOGRA LETTER MA;Lo;0;L;;;;;N;;;;; +11823;DOGRA LETTER YA;Lo;0;L;;;;;N;;;;; +11824;DOGRA LETTER RA;Lo;0;L;;;;;N;;;;; +11825;DOGRA LETTER LA;Lo;0;L;;;;;N;;;;; +11826;DOGRA LETTER VA;Lo;0;L;;;;;N;;;;; +11827;DOGRA LETTER SHA;Lo;0;L;;;;;N;;;;; +11828;DOGRA LETTER SSA;Lo;0;L;;;;;N;;;;; +11829;DOGRA LETTER SA;Lo;0;L;;;;;N;;;;; +1182A;DOGRA LETTER HA;Lo;0;L;;;;;N;;;;; +1182B;DOGRA LETTER RRA;Lo;0;L;;;;;N;;;;; +1182C;DOGRA VOWEL SIGN AA;Mc;0;L;;;;;N;;;;; +1182D;DOGRA VOWEL SIGN I;Mc;0;L;;;;;N;;;;; +1182E;DOGRA VOWEL SIGN II;Mc;0;L;;;;;N;;;;; +1182F;DOGRA VOWEL SIGN U;Mn;0;NSM;;;;;N;;;;; +11830;DOGRA VOWEL SIGN UU;Mn;0;NSM;;;;;N;;;;; +11831;DOGRA VOWEL SIGN VOCALIC R;Mn;0;NSM;;;;;N;;;;; +11832;DOGRA VOWEL SIGN VOCALIC RR;Mn;0;NSM;;;;;N;;;;; +11833;DOGRA VOWEL SIGN E;Mn;0;NSM;;;;;N;;;;; +11834;DOGRA VOWEL SIGN AI;Mn;0;NSM;;;;;N;;;;; +11835;DOGRA VOWEL SIGN O;Mn;0;NSM;;;;;N;;;;; +11836;DOGRA VOWEL SIGN AU;Mn;0;NSM;;;;;N;;;;; +11837;DOGRA SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;; +11838;DOGRA SIGN VISARGA;Mc;0;L;;;;;N;;;;; +11839;DOGRA SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;; +1183A;DOGRA SIGN NUKTA;Mn;7;NSM;;;;;N;;;;; +1183B;DOGRA ABBREVIATION SIGN;Po;0;L;;;;;N;;;;; 118A0;WARANG CITI CAPITAL LETTER NGAA;Lu;0;L;;;;;N;;;;118C0; 118A1;WARANG CITI CAPITAL LETTER A;Lu;0;L;;;;;N;;;;118C1; 118A2;WARANG CITI CAPITAL LETTER WI;Lu;0;L;;;;;N;;;;118C2; @@ -20114,8 +20428,8 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 11A04;ZANABAZAR SQUARE VOWEL SIGN E;Mn;0;NSM;;;;;N;;;;; 11A05;ZANABAZAR SQUARE VOWEL SIGN OE;Mn;0;NSM;;;;;N;;;;; 11A06;ZANABAZAR SQUARE VOWEL SIGN O;Mn;0;NSM;;;;;N;;;;; -11A07;ZANABAZAR SQUARE VOWEL SIGN AI;Mc;0;L;;;;;N;;;;; -11A08;ZANABAZAR SQUARE VOWEL SIGN AU;Mc;0;L;;;;;N;;;;; +11A07;ZANABAZAR SQUARE VOWEL SIGN AI;Mn;0;L;;;;;N;;;;; +11A08;ZANABAZAR SQUARE VOWEL SIGN AU;Mn;0;L;;;;;N;;;;; 11A09;ZANABAZAR SQUARE VOWEL SIGN REVERSED I;Mn;0;NSM;;;;;N;;;;; 11A0A;ZANABAZAR SQUARE VOWEL LENGTH MARK;Mn;0;NSM;;;;;N;;;;; 11A0B;ZANABAZAR SQUARE LETTER KA;Lo;0;L;;;;;N;;;;; @@ -20254,6 
+20568,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 11A9A;SOYOMBO MARK TSHEG;Po;0;L;;;;;N;;;;; 11A9B;SOYOMBO MARK SHAD;Po;0;L;;;;;N;;;;; 11A9C;SOYOMBO MARK DOUBLE SHAD;Po;0;L;;;;;N;;;;; +11A9D;SOYOMBO MARK PLUTA;Lo;0;L;;;;;N;;;;; 11A9E;SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME;Po;0;L;;;;;N;;;;; 11A9F;SOYOMBO HEAD MARK WITH MOON AND SUN AND FLAME;Po;0;L;;;;;N;;;;; 11AA0;SOYOMBO HEAD MARK WITH MOON AND SUN;Po;0;L;;;;;N;;;;; @@ -20556,6 +20871,94 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 11D57;MASARAM GONDI DIGIT SEVEN;Nd;0;L;;7;7;7;N;;;;; 11D58;MASARAM GONDI DIGIT EIGHT;Nd;0;L;;8;8;8;N;;;;; 11D59;MASARAM GONDI DIGIT NINE;Nd;0;L;;9;9;9;N;;;;; +11D60;GUNJALA GONDI LETTER A;Lo;0;L;;;;;N;;;;; +11D61;GUNJALA GONDI LETTER AA;Lo;0;L;;;;;N;;;;; +11D62;GUNJALA GONDI LETTER I;Lo;0;L;;;;;N;;;;; +11D63;GUNJALA GONDI LETTER II;Lo;0;L;;;;;N;;;;; +11D64;GUNJALA GONDI LETTER U;Lo;0;L;;;;;N;;;;; +11D65;GUNJALA GONDI LETTER UU;Lo;0;L;;;;;N;;;;; +11D67;GUNJALA GONDI LETTER EE;Lo;0;L;;;;;N;;;;; +11D68;GUNJALA GONDI LETTER AI;Lo;0;L;;;;;N;;;;; +11D6A;GUNJALA GONDI LETTER OO;Lo;0;L;;;;;N;;;;; +11D6B;GUNJALA GONDI LETTER AU;Lo;0;L;;;;;N;;;;; +11D6C;GUNJALA GONDI LETTER YA;Lo;0;L;;;;;N;;;;; +11D6D;GUNJALA GONDI LETTER VA;Lo;0;L;;;;;N;;;;; +11D6E;GUNJALA GONDI LETTER BA;Lo;0;L;;;;;N;;;;; +11D6F;GUNJALA GONDI LETTER BHA;Lo;0;L;;;;;N;;;;; +11D70;GUNJALA GONDI LETTER MA;Lo;0;L;;;;;N;;;;; +11D71;GUNJALA GONDI LETTER KA;Lo;0;L;;;;;N;;;;; +11D72;GUNJALA GONDI LETTER KHA;Lo;0;L;;;;;N;;;;; +11D73;GUNJALA GONDI LETTER TA;Lo;0;L;;;;;N;;;;; +11D74;GUNJALA GONDI LETTER THA;Lo;0;L;;;;;N;;;;; +11D75;GUNJALA GONDI LETTER LA;Lo;0;L;;;;;N;;;;; +11D76;GUNJALA GONDI LETTER GA;Lo;0;L;;;;;N;;;;; +11D77;GUNJALA GONDI LETTER GHA;Lo;0;L;;;;;N;;;;; +11D78;GUNJALA GONDI LETTER DA;Lo;0;L;;;;;N;;;;; +11D79;GUNJALA GONDI LETTER DHA;Lo;0;L;;;;;N;;;;; +11D7A;GUNJALA GONDI LETTER NA;Lo;0;L;;;;;N;;;;; +11D7B;GUNJALA GONDI LETTER CA;Lo;0;L;;;;;N;;;;; +11D7C;GUNJALA GONDI LETTER CHA;Lo;0;L;;;;;N;;;;; 
+11D7D;GUNJALA GONDI LETTER TTA;Lo;0;L;;;;;N;;;;; +11D7E;GUNJALA GONDI LETTER TTHA;Lo;0;L;;;;;N;;;;; +11D7F;GUNJALA GONDI LETTER LLA;Lo;0;L;;;;;N;;;;; +11D80;GUNJALA GONDI LETTER JA;Lo;0;L;;;;;N;;;;; +11D81;GUNJALA GONDI LETTER JHA;Lo;0;L;;;;;N;;;;; +11D82;GUNJALA GONDI LETTER DDA;Lo;0;L;;;;;N;;;;; +11D83;GUNJALA GONDI LETTER DDHA;Lo;0;L;;;;;N;;;;; +11D84;GUNJALA GONDI LETTER NGA;Lo;0;L;;;;;N;;;;; +11D85;GUNJALA GONDI LETTER PA;Lo;0;L;;;;;N;;;;; +11D86;GUNJALA GONDI LETTER PHA;Lo;0;L;;;;;N;;;;; +11D87;GUNJALA GONDI LETTER HA;Lo;0;L;;;;;N;;;;; +11D88;GUNJALA GONDI LETTER RA;Lo;0;L;;;;;N;;;;; +11D89;GUNJALA GONDI LETTER SA;Lo;0;L;;;;;N;;;;; +11D8A;GUNJALA GONDI VOWEL SIGN AA;Mc;0;L;;;;;N;;;;; +11D8B;GUNJALA GONDI VOWEL SIGN I;Mc;0;L;;;;;N;;;;; +11D8C;GUNJALA GONDI VOWEL SIGN II;Mc;0;L;;;;;N;;;;; +11D8D;GUNJALA GONDI VOWEL SIGN U;Mc;0;L;;;;;N;;;;; +11D8E;GUNJALA GONDI VOWEL SIGN UU;Mc;0;L;;;;;N;;;;; +11D90;GUNJALA GONDI VOWEL SIGN EE;Mn;0;NSM;;;;;N;;;;; +11D91;GUNJALA GONDI VOWEL SIGN AI;Mn;0;NSM;;;;;N;;;;; +11D93;GUNJALA GONDI VOWEL SIGN OO;Mc;0;L;;;;;N;;;;; +11D94;GUNJALA GONDI VOWEL SIGN AU;Mc;0;L;;;;;N;;;;; +11D95;GUNJALA GONDI SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;; +11D96;GUNJALA GONDI SIGN VISARGA;Mc;0;L;;;;;N;;;;; +11D97;GUNJALA GONDI VIRAMA;Mn;9;NSM;;;;;N;;;;; +11D98;GUNJALA GONDI OM;Lo;0;L;;;;;N;;;;; +11DA0;GUNJALA GONDI DIGIT ZERO;Nd;0;L;;0;0;0;N;;;;; +11DA1;GUNJALA GONDI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;; +11DA2;GUNJALA GONDI DIGIT TWO;Nd;0;L;;2;2;2;N;;;;; +11DA3;GUNJALA GONDI DIGIT THREE;Nd;0;L;;3;3;3;N;;;;; +11DA4;GUNJALA GONDI DIGIT FOUR;Nd;0;L;;4;4;4;N;;;;; +11DA5;GUNJALA GONDI DIGIT FIVE;Nd;0;L;;5;5;5;N;;;;; +11DA6;GUNJALA GONDI DIGIT SIX;Nd;0;L;;6;6;6;N;;;;; +11DA7;GUNJALA GONDI DIGIT SEVEN;Nd;0;L;;7;7;7;N;;;;; +11DA8;GUNJALA GONDI DIGIT EIGHT;Nd;0;L;;8;8;8;N;;;;; +11DA9;GUNJALA GONDI DIGIT NINE;Nd;0;L;;9;9;9;N;;;;; +11EE0;MAKASAR LETTER KA;Lo;0;L;;;;;N;;;;; +11EE1;MAKASAR LETTER GA;Lo;0;L;;;;;N;;;;; +11EE2;MAKASAR LETTER NGA;Lo;0;L;;;;;N;;;;; +11EE3;MAKASAR 
LETTER PA;Lo;0;L;;;;;N;;;;; +11EE4;MAKASAR LETTER BA;Lo;0;L;;;;;N;;;;; +11EE5;MAKASAR LETTER MA;Lo;0;L;;;;;N;;;;; +11EE6;MAKASAR LETTER TA;Lo;0;L;;;;;N;;;;; +11EE7;MAKASAR LETTER DA;Lo;0;L;;;;;N;;;;; +11EE8;MAKASAR LETTER NA;Lo;0;L;;;;;N;;;;; +11EE9;MAKASAR LETTER CA;Lo;0;L;;;;;N;;;;; +11EEA;MAKASAR LETTER JA;Lo;0;L;;;;;N;;;;; +11EEB;MAKASAR LETTER NYA;Lo;0;L;;;;;N;;;;; +11EEC;MAKASAR LETTER YA;Lo;0;L;;;;;N;;;;; +11EED;MAKASAR LETTER RA;Lo;0;L;;;;;N;;;;; +11EEE;MAKASAR LETTER LA;Lo;0;L;;;;;N;;;;; +11EEF;MAKASAR LETTER VA;Lo;0;L;;;;;N;;;;; +11EF0;MAKASAR LETTER SA;Lo;0;L;;;;;N;;;;; +11EF1;MAKASAR LETTER A;Lo;0;L;;;;;N;;;;; +11EF2;MAKASAR ANGKA;Lo;0;L;;;;;N;;;;; +11EF3;MAKASAR VOWEL SIGN I;Mn;0;NSM;;;;;N;;;;; +11EF4;MAKASAR VOWEL SIGN U;Mn;0;NSM;;;;;N;;;;; +11EF5;MAKASAR VOWEL SIGN E;Mc;0;L;;;;;N;;;;; +11EF6;MAKASAR VOWEL SIGN O;Mc;0;L;;;;;N;;;;; +11EF7;MAKASAR PASSIMBANG;Po;0;L;;;;;N;;;;; +11EF8;MAKASAR END OF SECTION;Po;0;L;;;;;N;;;;; 12000;CUNEIFORM SIGN A;Lo;0;L;;;;;N;;;;; 12001;CUNEIFORM SIGN A TIMES A;Lo;0;L;;;;;N;;;;; 12002;CUNEIFORM SIGN A TIMES BAD;Lo;0;L;;;;;N;;;;; @@ -24219,6 +24622,97 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 16B8D;PAHAWH HMONG CLAN SIGN TSWB;Lo;0;L;;;;;N;;;;; 16B8E;PAHAWH HMONG CLAN SIGN KWM;Lo;0;L;;;;;N;;;;; 16B8F;PAHAWH HMONG CLAN SIGN VWJ;Lo;0;L;;;;;N;;;;; +16E40;MEDEFAIDRIN CAPITAL LETTER M;Lu;0;L;;;;;N;;;;16E60; +16E41;MEDEFAIDRIN CAPITAL LETTER S;Lu;0;L;;;;;N;;;;16E61; +16E42;MEDEFAIDRIN CAPITAL LETTER V;Lu;0;L;;;;;N;;;;16E62; +16E43;MEDEFAIDRIN CAPITAL LETTER W;Lu;0;L;;;;;N;;;;16E63; +16E44;MEDEFAIDRIN CAPITAL LETTER ATIU;Lu;0;L;;;;;N;;;;16E64; +16E45;MEDEFAIDRIN CAPITAL LETTER Z;Lu;0;L;;;;;N;;;;16E65; +16E46;MEDEFAIDRIN CAPITAL LETTER KP;Lu;0;L;;;;;N;;;;16E66; +16E47;MEDEFAIDRIN CAPITAL LETTER P;Lu;0;L;;;;;N;;;;16E67; +16E48;MEDEFAIDRIN CAPITAL LETTER T;Lu;0;L;;;;;N;;;;16E68; +16E49;MEDEFAIDRIN CAPITAL LETTER G;Lu;0;L;;;;;N;;;;16E69; +16E4A;MEDEFAIDRIN CAPITAL LETTER F;Lu;0;L;;;;;N;;;;16E6A; +16E4B;MEDEFAIDRIN CAPITAL 
LETTER I;Lu;0;L;;;;;N;;;;16E6B; +16E4C;MEDEFAIDRIN CAPITAL LETTER K;Lu;0;L;;;;;N;;;;16E6C; +16E4D;MEDEFAIDRIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;16E6D; +16E4E;MEDEFAIDRIN CAPITAL LETTER J;Lu;0;L;;;;;N;;;;16E6E; +16E4F;MEDEFAIDRIN CAPITAL LETTER E;Lu;0;L;;;;;N;;;;16E6F; +16E50;MEDEFAIDRIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;16E70; +16E51;MEDEFAIDRIN CAPITAL LETTER C;Lu;0;L;;;;;N;;;;16E71; +16E52;MEDEFAIDRIN CAPITAL LETTER U;Lu;0;L;;;;;N;;;;16E72; +16E53;MEDEFAIDRIN CAPITAL LETTER YU;Lu;0;L;;;;;N;;;;16E73; +16E54;MEDEFAIDRIN CAPITAL LETTER L;Lu;0;L;;;;;N;;;;16E74; +16E55;MEDEFAIDRIN CAPITAL LETTER Q;Lu;0;L;;;;;N;;;;16E75; +16E56;MEDEFAIDRIN CAPITAL LETTER HP;Lu;0;L;;;;;N;;;;16E76; +16E57;MEDEFAIDRIN CAPITAL LETTER NY;Lu;0;L;;;;;N;;;;16E77; +16E58;MEDEFAIDRIN CAPITAL LETTER X;Lu;0;L;;;;;N;;;;16E78; +16E59;MEDEFAIDRIN CAPITAL LETTER D;Lu;0;L;;;;;N;;;;16E79; +16E5A;MEDEFAIDRIN CAPITAL LETTER OE;Lu;0;L;;;;;N;;;;16E7A; +16E5B;MEDEFAIDRIN CAPITAL LETTER N;Lu;0;L;;;;;N;;;;16E7B; +16E5C;MEDEFAIDRIN CAPITAL LETTER R;Lu;0;L;;;;;N;;;;16E7C; +16E5D;MEDEFAIDRIN CAPITAL LETTER O;Lu;0;L;;;;;N;;;;16E7D; +16E5E;MEDEFAIDRIN CAPITAL LETTER AI;Lu;0;L;;;;;N;;;;16E7E; +16E5F;MEDEFAIDRIN CAPITAL LETTER Y;Lu;0;L;;;;;N;;;;16E7F; +16E60;MEDEFAIDRIN SMALL LETTER M;Ll;0;L;;;;;N;;;16E40;;16E40 +16E61;MEDEFAIDRIN SMALL LETTER S;Ll;0;L;;;;;N;;;16E41;;16E41 +16E62;MEDEFAIDRIN SMALL LETTER V;Ll;0;L;;;;;N;;;16E42;;16E42 +16E63;MEDEFAIDRIN SMALL LETTER W;Ll;0;L;;;;;N;;;16E43;;16E43 +16E64;MEDEFAIDRIN SMALL LETTER ATIU;Ll;0;L;;;;;N;;;16E44;;16E44 +16E65;MEDEFAIDRIN SMALL LETTER Z;Ll;0;L;;;;;N;;;16E45;;16E45 +16E66;MEDEFAIDRIN SMALL LETTER KP;Ll;0;L;;;;;N;;;16E46;;16E46 +16E67;MEDEFAIDRIN SMALL LETTER P;Ll;0;L;;;;;N;;;16E47;;16E47 +16E68;MEDEFAIDRIN SMALL LETTER T;Ll;0;L;;;;;N;;;16E48;;16E48 +16E69;MEDEFAIDRIN SMALL LETTER G;Ll;0;L;;;;;N;;;16E49;;16E49 +16E6A;MEDEFAIDRIN SMALL LETTER F;Ll;0;L;;;;;N;;;16E4A;;16E4A +16E6B;MEDEFAIDRIN SMALL LETTER I;Ll;0;L;;;;;N;;;16E4B;;16E4B +16E6C;MEDEFAIDRIN SMALL LETTER 
K;Ll;0;L;;;;;N;;;16E4C;;16E4C +16E6D;MEDEFAIDRIN SMALL LETTER A;Ll;0;L;;;;;N;;;16E4D;;16E4D +16E6E;MEDEFAIDRIN SMALL LETTER J;Ll;0;L;;;;;N;;;16E4E;;16E4E +16E6F;MEDEFAIDRIN SMALL LETTER E;Ll;0;L;;;;;N;;;16E4F;;16E4F +16E70;MEDEFAIDRIN SMALL LETTER B;Ll;0;L;;;;;N;;;16E50;;16E50 +16E71;MEDEFAIDRIN SMALL LETTER C;Ll;0;L;;;;;N;;;16E51;;16E51 +16E72;MEDEFAIDRIN SMALL LETTER U;Ll;0;L;;;;;N;;;16E52;;16E52 +16E73;MEDEFAIDRIN SMALL LETTER YU;Ll;0;L;;;;;N;;;16E53;;16E53 +16E74;MEDEFAIDRIN SMALL LETTER L;Ll;0;L;;;;;N;;;16E54;;16E54 +16E75;MEDEFAIDRIN SMALL LETTER Q;Ll;0;L;;;;;N;;;16E55;;16E55 +16E76;MEDEFAIDRIN SMALL LETTER HP;Ll;0;L;;;;;N;;;16E56;;16E56 +16E77;MEDEFAIDRIN SMALL LETTER NY;Ll;0;L;;;;;N;;;16E57;;16E57 +16E78;MEDEFAIDRIN SMALL LETTER X;Ll;0;L;;;;;N;;;16E58;;16E58 +16E79;MEDEFAIDRIN SMALL LETTER D;Ll;0;L;;;;;N;;;16E59;;16E59 +16E7A;MEDEFAIDRIN SMALL LETTER OE;Ll;0;L;;;;;N;;;16E5A;;16E5A +16E7B;MEDEFAIDRIN SMALL LETTER N;Ll;0;L;;;;;N;;;16E5B;;16E5B +16E7C;MEDEFAIDRIN SMALL LETTER R;Ll;0;L;;;;;N;;;16E5C;;16E5C +16E7D;MEDEFAIDRIN SMALL LETTER O;Ll;0;L;;;;;N;;;16E5D;;16E5D +16E7E;MEDEFAIDRIN SMALL LETTER AI;Ll;0;L;;;;;N;;;16E5E;;16E5E +16E7F;MEDEFAIDRIN SMALL LETTER Y;Ll;0;L;;;;;N;;;16E5F;;16E5F +16E80;MEDEFAIDRIN DIGIT ZERO;No;0;L;;;;0;N;;;;; +16E81;MEDEFAIDRIN DIGIT ONE;No;0;L;;;;1;N;;;;; +16E82;MEDEFAIDRIN DIGIT TWO;No;0;L;;;;2;N;;;;; +16E83;MEDEFAIDRIN DIGIT THREE;No;0;L;;;;3;N;;;;; +16E84;MEDEFAIDRIN DIGIT FOUR;No;0;L;;;;4;N;;;;; +16E85;MEDEFAIDRIN DIGIT FIVE;No;0;L;;;;5;N;;;;; +16E86;MEDEFAIDRIN DIGIT SIX;No;0;L;;;;6;N;;;;; +16E87;MEDEFAIDRIN DIGIT SEVEN;No;0;L;;;;7;N;;;;; +16E88;MEDEFAIDRIN DIGIT EIGHT;No;0;L;;;;8;N;;;;; +16E89;MEDEFAIDRIN DIGIT NINE;No;0;L;;;;9;N;;;;; +16E8A;MEDEFAIDRIN NUMBER TEN;No;0;L;;;;10;N;;;;; +16E8B;MEDEFAIDRIN NUMBER ELEVEN;No;0;L;;;;11;N;;;;; +16E8C;MEDEFAIDRIN NUMBER TWELVE;No;0;L;;;;12;N;;;;; +16E8D;MEDEFAIDRIN NUMBER THIRTEEN;No;0;L;;;;13;N;;;;; +16E8E;MEDEFAIDRIN NUMBER FOURTEEN;No;0;L;;;;14;N;;;;; +16E8F;MEDEFAIDRIN NUMBER 
FIFTEEN;No;0;L;;;;15;N;;;;; +16E90;MEDEFAIDRIN NUMBER SIXTEEN;No;0;L;;;;16;N;;;;; +16E91;MEDEFAIDRIN NUMBER SEVENTEEN;No;0;L;;;;17;N;;;;; +16E92;MEDEFAIDRIN NUMBER EIGHTEEN;No;0;L;;;;18;N;;;;; +16E93;MEDEFAIDRIN NUMBER NINETEEN;No;0;L;;;;19;N;;;;; +16E94;MEDEFAIDRIN DIGIT ONE ALTERNATE FORM;No;0;L;;;;1;N;;;;; +16E95;MEDEFAIDRIN DIGIT TWO ALTERNATE FORM;No;0;L;;;;2;N;;;;; +16E96;MEDEFAIDRIN DIGIT THREE ALTERNATE FORM;No;0;L;;;;3;N;;;;; +16E97;MEDEFAIDRIN COMMA;Po;0;L;;;;;N;;;;; +16E98;MEDEFAIDRIN FULL STOP;Po;0;L;;;;;N;;;;; +16E99;MEDEFAIDRIN SYMBOL AIVA;Po;0;L;;;;;N;;;;; +16E9A;MEDEFAIDRIN EXCLAMATION OH;Po;0;L;;;;;N;;;;; 16F00;MIAO LETTER PA;Lo;0;L;;;;;N;;;;; 16F01;MIAO LETTER BA;Lo;0;L;;;;;N;;;;; 16F02;MIAO LETTER YI PA;Lo;0;L;;;;;N;;;;; @@ -24355,7 +24849,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 16FE0;TANGUT ITERATION MARK;Lm;0;L;;;;;N;;;;; 16FE1;NUSHU ITERATION MARK;Lm;0;L;;;;;N;;;;; 17000;;Lo;0;L;;;;;N;;;;; -187EC;;Lo;0;L;;;;;N;;;;; +187F1;;Lo;0;L;;;;;N;;;;; 18800;TANGUT COMPONENT-001;Lo;0;L;;;;;N;;;;; 18801;TANGUT COMPONENT-002;Lo;0;L;;;;;N;;;;; 18802;TANGUT COMPONENT-003;Lo;0;L;;;;;N;;;;; @@ -26488,6 +26982,26 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1D243;COMBINING GREEK MUSICAL TETRASEME;Mn;230;NSM;;;;;N;;;;; 1D244;COMBINING GREEK MUSICAL PENTASEME;Mn;230;NSM;;;;;N;;;;; 1D245;GREEK MUSICAL LEIMMA;So;0;ON;;;;;N;;;;; +1D2E0;MAYAN NUMERAL ZERO;No;0;L;;;;0;N;;;;; +1D2E1;MAYAN NUMERAL ONE;No;0;L;;;;1;N;;;;; +1D2E2;MAYAN NUMERAL TWO;No;0;L;;;;2;N;;;;; +1D2E3;MAYAN NUMERAL THREE;No;0;L;;;;3;N;;;;; +1D2E4;MAYAN NUMERAL FOUR;No;0;L;;;;4;N;;;;; +1D2E5;MAYAN NUMERAL FIVE;No;0;L;;;;5;N;;;;; +1D2E6;MAYAN NUMERAL SIX;No;0;L;;;;6;N;;;;; +1D2E7;MAYAN NUMERAL SEVEN;No;0;L;;;;7;N;;;;; +1D2E8;MAYAN NUMERAL EIGHT;No;0;L;;;;8;N;;;;; +1D2E9;MAYAN NUMERAL NINE;No;0;L;;;;9;N;;;;; +1D2EA;MAYAN NUMERAL TEN;No;0;L;;;;10;N;;;;; +1D2EB;MAYAN NUMERAL ELEVEN;No;0;L;;;;11;N;;;;; +1D2EC;MAYAN NUMERAL TWELVE;No;0;L;;;;12;N;;;;; +1D2ED;MAYAN NUMERAL 
THIRTEEN;No;0;L;;;;13;N;;;;; +1D2EE;MAYAN NUMERAL FOURTEEN;No;0;L;;;;14;N;;;;; +1D2EF;MAYAN NUMERAL FIFTEEN;No;0;L;;;;15;N;;;;; +1D2F0;MAYAN NUMERAL SIXTEEN;No;0;L;;;;16;N;;;;; +1D2F1;MAYAN NUMERAL SEVENTEEN;No;0;L;;;;17;N;;;;; +1D2F2;MAYAN NUMERAL EIGHTEEN;No;0;L;;;;18;N;;;;; +1D2F3;MAYAN NUMERAL NINETEEN;No;0;L;;;;19;N;;;;; 1D300;MONOGRAM FOR EARTH;So;0;ON;;;;;N;;;;; 1D301;DIGRAM FOR HEAVENLY EARTH;So;0;ON;;;;;N;;;;; 1D302;DIGRAM FOR HUMAN EARTH;So;0;ON;;;;;N;;;;; @@ -26593,6 +27107,13 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1D36F;COUNTING ROD TENS DIGIT SEVEN;No;0;L;;;;70;N;;;;; 1D370;COUNTING ROD TENS DIGIT EIGHT;No;0;L;;;;80;N;;;;; 1D371;COUNTING ROD TENS DIGIT NINE;No;0;L;;;;90;N;;;;; +1D372;IDEOGRAPHIC TALLY MARK ONE;No;0;L;;;;1;N;;;;; +1D373;IDEOGRAPHIC TALLY MARK TWO;No;0;L;;;;2;N;;;;; +1D374;IDEOGRAPHIC TALLY MARK THREE;No;0;L;;;;3;N;;;;; +1D375;IDEOGRAPHIC TALLY MARK FOUR;No;0;L;;;;4;N;;;;; +1D376;IDEOGRAPHIC TALLY MARK FIVE;No;0;L;;;;5;N;;;;; +1D377;TALLY MARK ONE;No;0;L;;;;1;N;;;;; +1D378;TALLY MARK FIVE;No;0;L;;;;5;N;;;;; 1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L; 0041;;;;N;;;;; 1D401;MATHEMATICAL BOLD CAPITAL B;Lu;0;L; 0042;;;;N;;;;; 1D402;MATHEMATICAL BOLD CAPITAL C;Lu;0;L; 0043;;;;N;;;;; @@ -28599,6 +29120,74 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1E959;ADLAM DIGIT NINE;Nd;0;R;;9;9;9;N;;;;; 1E95E;ADLAM INITIAL EXCLAMATION MARK;Po;0;R;;;;;N;;;;; 1E95F;ADLAM INITIAL QUESTION MARK;Po;0;R;;;;;N;;;;; +1EC71;INDIC SIYAQ NUMBER ONE;No;0;AL;;;;1;N;;;;; +1EC72;INDIC SIYAQ NUMBER TWO;No;0;AL;;;;2;N;;;;; +1EC73;INDIC SIYAQ NUMBER THREE;No;0;AL;;;;3;N;;;;; +1EC74;INDIC SIYAQ NUMBER FOUR;No;0;AL;;;;4;N;;;;; +1EC75;INDIC SIYAQ NUMBER FIVE;No;0;AL;;;;5;N;;;;; +1EC76;INDIC SIYAQ NUMBER SIX;No;0;AL;;;;6;N;;;;; +1EC77;INDIC SIYAQ NUMBER SEVEN;No;0;AL;;;;7;N;;;;; +1EC78;INDIC SIYAQ NUMBER EIGHT;No;0;AL;;;;8;N;;;;; +1EC79;INDIC SIYAQ NUMBER NINE;No;0;AL;;;;9;N;;;;; +1EC7A;INDIC SIYAQ NUMBER TEN;No;0;AL;;;;10;N;;;;; +1EC7B;INDIC SIYAQ NUMBER 
TWENTY;No;0;AL;;;;20;N;;;;; +1EC7C;INDIC SIYAQ NUMBER THIRTY;No;0;AL;;;;30;N;;;;; +1EC7D;INDIC SIYAQ NUMBER FORTY;No;0;AL;;;;40;N;;;;; +1EC7E;INDIC SIYAQ NUMBER FIFTY;No;0;AL;;;;50;N;;;;; +1EC7F;INDIC SIYAQ NUMBER SIXTY;No;0;AL;;;;60;N;;;;; +1EC80;INDIC SIYAQ NUMBER SEVENTY;No;0;AL;;;;70;N;;;;; +1EC81;INDIC SIYAQ NUMBER EIGHTY;No;0;AL;;;;80;N;;;;; +1EC82;INDIC SIYAQ NUMBER NINETY;No;0;AL;;;;90;N;;;;; +1EC83;INDIC SIYAQ NUMBER ONE HUNDRED;No;0;AL;;;;100;N;;;;; +1EC84;INDIC SIYAQ NUMBER TWO HUNDRED;No;0;AL;;;;200;N;;;;; +1EC85;INDIC SIYAQ NUMBER THREE HUNDRED;No;0;AL;;;;300;N;;;;; +1EC86;INDIC SIYAQ NUMBER FOUR HUNDRED;No;0;AL;;;;400;N;;;;; +1EC87;INDIC SIYAQ NUMBER FIVE HUNDRED;No;0;AL;;;;500;N;;;;; +1EC88;INDIC SIYAQ NUMBER SIX HUNDRED;No;0;AL;;;;600;N;;;;; +1EC89;INDIC SIYAQ NUMBER SEVEN HUNDRED;No;0;AL;;;;700;N;;;;; +1EC8A;INDIC SIYAQ NUMBER EIGHT HUNDRED;No;0;AL;;;;800;N;;;;; +1EC8B;INDIC SIYAQ NUMBER NINE HUNDRED;No;0;AL;;;;900;N;;;;; +1EC8C;INDIC SIYAQ NUMBER ONE THOUSAND;No;0;AL;;;;1000;N;;;;; +1EC8D;INDIC SIYAQ NUMBER TWO THOUSAND;No;0;AL;;;;2000;N;;;;; +1EC8E;INDIC SIYAQ NUMBER THREE THOUSAND;No;0;AL;;;;3000;N;;;;; +1EC8F;INDIC SIYAQ NUMBER FOUR THOUSAND;No;0;AL;;;;4000;N;;;;; +1EC90;INDIC SIYAQ NUMBER FIVE THOUSAND;No;0;AL;;;;5000;N;;;;; +1EC91;INDIC SIYAQ NUMBER SIX THOUSAND;No;0;AL;;;;6000;N;;;;; +1EC92;INDIC SIYAQ NUMBER SEVEN THOUSAND;No;0;AL;;;;7000;N;;;;; +1EC93;INDIC SIYAQ NUMBER EIGHT THOUSAND;No;0;AL;;;;8000;N;;;;; +1EC94;INDIC SIYAQ NUMBER NINE THOUSAND;No;0;AL;;;;9000;N;;;;; +1EC95;INDIC SIYAQ NUMBER TEN THOUSAND;No;0;AL;;;;10000;N;;;;; +1EC96;INDIC SIYAQ NUMBER TWENTY THOUSAND;No;0;AL;;;;20000;N;;;;; +1EC97;INDIC SIYAQ NUMBER THIRTY THOUSAND;No;0;AL;;;;30000;N;;;;; +1EC98;INDIC SIYAQ NUMBER FORTY THOUSAND;No;0;AL;;;;40000;N;;;;; +1EC99;INDIC SIYAQ NUMBER FIFTY THOUSAND;No;0;AL;;;;50000;N;;;;; +1EC9A;INDIC SIYAQ NUMBER SIXTY THOUSAND;No;0;AL;;;;60000;N;;;;; +1EC9B;INDIC SIYAQ NUMBER SEVENTY THOUSAND;No;0;AL;;;;70000;N;;;;; +1EC9C;INDIC SIYAQ 
NUMBER EIGHTY THOUSAND;No;0;AL;;;;80000;N;;;;; +1EC9D;INDIC SIYAQ NUMBER NINETY THOUSAND;No;0;AL;;;;90000;N;;;;; +1EC9E;INDIC SIYAQ NUMBER LAKH;No;0;AL;;;;100000;N;;;;; +1EC9F;INDIC SIYAQ NUMBER LAKHAN;No;0;AL;;;;200000;N;;;;; +1ECA0;INDIC SIYAQ LAKH MARK;No;0;AL;;;;100000;N;;;;; +1ECA1;INDIC SIYAQ NUMBER KAROR;No;0;AL;;;;10000000;N;;;;; +1ECA2;INDIC SIYAQ NUMBER KARORAN;No;0;AL;;;;20000000;N;;;;; +1ECA3;INDIC SIYAQ NUMBER PREFIXED ONE;No;0;AL;;;;1;N;;;;; +1ECA4;INDIC SIYAQ NUMBER PREFIXED TWO;No;0;AL;;;;2;N;;;;; +1ECA5;INDIC SIYAQ NUMBER PREFIXED THREE;No;0;AL;;;;3;N;;;;; +1ECA6;INDIC SIYAQ NUMBER PREFIXED FOUR;No;0;AL;;;;4;N;;;;; +1ECA7;INDIC SIYAQ NUMBER PREFIXED FIVE;No;0;AL;;;;5;N;;;;; +1ECA8;INDIC SIYAQ NUMBER PREFIXED SIX;No;0;AL;;;;6;N;;;;; +1ECA9;INDIC SIYAQ NUMBER PREFIXED SEVEN;No;0;AL;;;;7;N;;;;; +1ECAA;INDIC SIYAQ NUMBER PREFIXED EIGHT;No;0;AL;;;;8;N;;;;; +1ECAB;INDIC SIYAQ NUMBER PREFIXED NINE;No;0;AL;;;;9;N;;;;; +1ECAC;INDIC SIYAQ PLACEHOLDER;So;0;AL;;;;;N;;;;; +1ECAD;INDIC SIYAQ FRACTION ONE QUARTER;No;0;AL;;;;1/4;N;;;;; +1ECAE;INDIC SIYAQ FRACTION ONE HALF;No;0;AL;;;;1/2;N;;;;; +1ECAF;INDIC SIYAQ FRACTION THREE QUARTERS;No;0;AL;;;;3/4;N;;;;; +1ECB0;INDIC SIYAQ RUPEE MARK;Sc;0;AL;;;;;N;;;;; +1ECB1;INDIC SIYAQ NUMBER ALTERNATE ONE;No;0;AL;;;;1;N;;;;; +1ECB2;INDIC SIYAQ NUMBER ALTERNATE TWO;No;0;AL;;;;2;N;;;;; +1ECB3;INDIC SIYAQ NUMBER ALTERNATE TEN THOUSAND;No;0;AL;;;;10000;N;;;;; +1ECB4;INDIC SIYAQ ALTERNATE LAKH MARK;No;0;AL;;;;100000;N;;;;; 1EE00;ARABIC MATHEMATICAL ALEF;Lo;0;AL; 0627;;;;N;;;;; 1EE01;ARABIC MATHEMATICAL BEH;Lo;0;AL; 0628;;;;N;;;;; 1EE02;ARABIC MATHEMATICAL JEEM;Lo;0;AL; 062C;;;;N;;;;; @@ -29012,6 +29601,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1F12C;CIRCLED ITALIC LATIN CAPITAL LETTER R;So;0;L; 0052;;;;N;;;;; 1F12D;CIRCLED CD;So;0;L; 0043 0044;;;;N;;;;; 1F12E;CIRCLED WZ;So;0;L; 0057 005A;;;;N;;;;; +1F12F;COPYLEFT SYMBOL;So;0;ON;;;;;N;;;;; 1F130;SQUARED LATIN CAPITAL LETTER A;So;0;L; 0041;;;;N;;;;; 1F131;SQUARED LATIN 
CAPITAL LETTER B;So;0;L; 0042;;;;N;;;;; 1F132;SQUARED LATIN CAPITAL LETTER C;So;0;L; 0043;;;;N;;;;; @@ -30226,6 +30816,7 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1F6F6;CANOE;So;0;ON;;;;;N;;;;; 1F6F7;SLED;So;0;ON;;;;;N;;;;; 1F6F8;FLYING SAUCER;So;0;ON;;;;;N;;;;; +1F6F9;SKATEBOARD;So;0;ON;;;;;N;;;;; 1F700;ALCHEMICAL SYMBOL FOR QUINTESSENCE;So;0;ON;;;;;N;;;;; 1F701;ALCHEMICAL SYMBOL FOR AIR;So;0;ON;;;;;N;;;;; 1F702;ALCHEMICAL SYMBOL FOR FIRE;So;0;ON;;;;;N;;;;; @@ -30427,6 +31018,10 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1F7D2;LIGHT TWELVE POINTED BLACK STAR;So;0;ON;;;;;N;;;;; 1F7D3;HEAVY TWELVE POINTED BLACK STAR;So;0;ON;;;;;N;;;;; 1F7D4;HEAVY TWELVE POINTED PINWHEEL STAR;So;0;ON;;;;;N;;;;; +1F7D5;CIRCLED TRIANGLE;So;0;ON;;;;;N;;;;; +1F7D6;NEGATIVE CIRCLED TRIANGLE;So;0;ON;;;;;N;;;;; +1F7D7;CIRCLED SQUARE;So;0;ON;;;;;N;;;;; +1F7D8;NEGATIVE CIRCLED SQUARE;So;0;ON;;;;;N;;;;; 1F800;LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD;So;0;ON;;;;;N;;;;; 1F801;UPWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD;So;0;ON;;;;;N;;;;; 1F802;RIGHTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD;So;0;ON;;;;;N;;;;; @@ -30647,6 +31242,9 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1F94A;BOXING GLOVE;So;0;ON;;;;;N;;;;; 1F94B;MARTIAL ARTS UNIFORM;So;0;ON;;;;;N;;;;; 1F94C;CURLING STONE;So;0;ON;;;;;N;;;;; +1F94D;LACROSSE STICK AND BALL;So;0;ON;;;;;N;;;;; +1F94E;SOFTBALL;So;0;ON;;;;;N;;;;; +1F94F;FLYING DISC;So;0;ON;;;;;N;;;;; 1F950;CROISSANT;So;0;ON;;;;;N;;;;; 1F951;AVOCADO;So;0;ON;;;;;N;;;;; 1F952;CUCUMBER;So;0;ON;;;;;N;;;;; @@ -30675,6 +31273,20 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1F969;CUT OF MEAT;So;0;ON;;;;;N;;;;; 1F96A;SANDWICH;So;0;ON;;;;;N;;;;; 1F96B;CANNED FOOD;So;0;ON;;;;;N;;;;; +1F96C;LEAFY GREEN;So;0;ON;;;;;N;;;;; +1F96D;MANGO;So;0;ON;;;;;N;;;;; +1F96E;MOON CAKE;So;0;ON;;;;;N;;;;; +1F96F;BAGEL;So;0;ON;;;;;N;;;;; +1F970;SMILING FACE WITH SMILING EYES AND THREE HEARTS;So;0;ON;;;;;N;;;;; +1F973;FACE WITH PARTY HORN AND PARTY HAT;So;0;ON;;;;;N;;;;; +1F974;FACE WITH 
UNEVEN EYES AND WAVY MOUTH;So;0;ON;;;;;N;;;;; +1F975;OVERHEATED FACE;So;0;ON;;;;;N;;;;; +1F976;FREEZING FACE;So;0;ON;;;;;N;;;;; +1F97A;FACE WITH PLEADING EYES;So;0;ON;;;;;N;;;;; +1F97C;LAB COAT;So;0;ON;;;;;N;;;;; +1F97D;GOGGLES;So;0;ON;;;;;N;;;;; +1F97E;HIKING BOOT;So;0;ON;;;;;N;;;;; +1F97F;FLAT SHOE;So;0;ON;;;;;N;;;;; 1F980;CRAB;So;0;ON;;;;;N;;;;; 1F981;LION FACE;So;0;ON;;;;;N;;;;; 1F982;SCORPION;So;0;ON;;;;;N;;;;; @@ -30699,7 +31311,30 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1F995;SAUROPOD;So;0;ON;;;;;N;;;;; 1F996;T-REX;So;0;ON;;;;;N;;;;; 1F997;CRICKET;So;0;ON;;;;;N;;;;; +1F998;KANGAROO;So;0;ON;;;;;N;;;;; +1F999;LLAMA;So;0;ON;;;;;N;;;;; +1F99A;PEACOCK;So;0;ON;;;;;N;;;;; +1F99B;HIPPOPOTAMUS;So;0;ON;;;;;N;;;;; +1F99C;PARROT;So;0;ON;;;;;N;;;;; +1F99D;RACCOON;So;0;ON;;;;;N;;;;; +1F99E;LOBSTER;So;0;ON;;;;;N;;;;; +1F99F;MOSQUITO;So;0;ON;;;;;N;;;;; +1F9A0;MICROBE;So;0;ON;;;;;N;;;;; +1F9A1;BADGER;So;0;ON;;;;;N;;;;; +1F9A2;SWAN;So;0;ON;;;;;N;;;;; +1F9B0;EMOJI COMPONENT RED HAIR;So;0;ON;;;;;N;;;;; +1F9B1;EMOJI COMPONENT CURLY HAIR;So;0;ON;;;;;N;;;;; +1F9B2;EMOJI COMPONENT BALD;So;0;ON;;;;;N;;;;; +1F9B3;EMOJI COMPONENT WHITE HAIR;So;0;ON;;;;;N;;;;; +1F9B4;BONE;So;0;ON;;;;;N;;;;; +1F9B5;LEG;So;0;ON;;;;;N;;;;; +1F9B6;FOOT;So;0;ON;;;;;N;;;;; +1F9B7;TOOTH;So;0;ON;;;;;N;;;;; +1F9B8;SUPERHERO;So;0;ON;;;;;N;;;;; +1F9B9;SUPERVILLAIN;So;0;ON;;;;;N;;;;; 1F9C0;CHEESE WEDGE;So;0;ON;;;;;N;;;;; +1F9C1;CUPCAKE;So;0;ON;;;;;N;;;;; +1F9C2;SALT SHAKER;So;0;ON;;;;;N;;;;; 1F9D0;FACE WITH MONOCLE;So;0;ON;;;;;N;;;;; 1F9D1;ADULT;So;0;ON;;;;;N;;;;; 1F9D2;CHILD;So;0;ON;;;;;N;;;;; @@ -30723,6 +31358,45 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;; 1F9E4;GLOVES;So;0;ON;;;;;N;;;;; 1F9E5;COAT;So;0;ON;;;;;N;;;;; 1F9E6;SOCKS;So;0;ON;;;;;N;;;;; +1F9E7;RED GIFT ENVELOPE;So;0;ON;;;;;N;;;;; +1F9E8;FIRECRACKER;So;0;ON;;;;;N;;;;; +1F9E9;JIGSAW PUZZLE PIECE;So;0;ON;;;;;N;;;;; +1F9EA;TEST TUBE;So;0;ON;;;;;N;;;;; +1F9EB;PETRI DISH;So;0;ON;;;;;N;;;;; +1F9EC;DNA DOUBLE HELIX;So;0;ON;;;;;N;;;;; 
+1F9ED;COMPASS;So;0;ON;;;;;N;;;;; +1F9EE;ABACUS;So;0;ON;;;;;N;;;;; +1F9EF;FIRE EXTINGUISHER;So;0;ON;;;;;N;;;;; +1F9F0;TOOLBOX;So;0;ON;;;;;N;;;;; +1F9F1;BRICK;So;0;ON;;;;;N;;;;; +1F9F2;MAGNET;So;0;ON;;;;;N;;;;; +1F9F3;LUGGAGE;So;0;ON;;;;;N;;;;; +1F9F4;LOTION BOTTLE;So;0;ON;;;;;N;;;;; +1F9F5;SPOOL OF THREAD;So;0;ON;;;;;N;;;;; +1F9F6;BALL OF YARN;So;0;ON;;;;;N;;;;; +1F9F7;SAFETY PIN;So;0;ON;;;;;N;;;;; +1F9F8;TEDDY BEAR;So;0;ON;;;;;N;;;;; +1F9F9;BROOM;So;0;ON;;;;;N;;;;; +1F9FA;BASKET;So;0;ON;;;;;N;;;;; +1F9FB;ROLL OF PAPER;So;0;ON;;;;;N;;;;; +1F9FC;BAR OF SOAP;So;0;ON;;;;;N;;;;; +1F9FD;SPONGE;So;0;ON;;;;;N;;;;; +1F9FE;RECEIPT;So;0;ON;;;;;N;;;;; +1F9FF;NAZAR AMULET;So;0;ON;;;;;N;;;;; +1FA60;XIANGQI RED GENERAL;So;0;ON;;;;;N;;;;; +1FA61;XIANGQI RED MANDARIN;So;0;ON;;;;;N;;;;; +1FA62;XIANGQI RED ELEPHANT;So;0;ON;;;;;N;;;;; +1FA63;XIANGQI RED HORSE;So;0;ON;;;;;N;;;;; +1FA64;XIANGQI RED CHARIOT;So;0;ON;;;;;N;;;;; +1FA65;XIANGQI RED CANNON;So;0;ON;;;;;N;;;;; +1FA66;XIANGQI RED SOLDIER;So;0;ON;;;;;N;;;;; +1FA67;XIANGQI BLACK GENERAL;So;0;ON;;;;;N;;;;; +1FA68;XIANGQI BLACK MANDARIN;So;0;ON;;;;;N;;;;; +1FA69;XIANGQI BLACK ELEPHANT;So;0;ON;;;;;N;;;;; +1FA6A;XIANGQI BLACK HORSE;So;0;ON;;;;;N;;;;; +1FA6B;XIANGQI BLACK CHARIOT;So;0;ON;;;;;N;;;;; +1FA6C;XIANGQI BLACK CANNON;So;0;ON;;;;;N;;;;; +1FA6D;XIANGQI BLACK SOLDIER;So;0;ON;;;;;N;;;;; 20000;;Lo;0;L;;;;;N;;;;; 2A6D6;;Lo;0;L;;;;;N;;;;; 2A700;;Lo;0;L;;;;;N;;;;; diff --git a/maint/Unicode.tables/emoji-data.txt b/maint/Unicode.tables/emoji-data.txt new file mode 100644 index 0000000..6e66455 --- /dev/null +++ b/maint/Unicode.tables/emoji-data.txt @@ -0,0 +1,714 @@ +# emoji-data.txt +# Date: 2018-02-07, 07:55:18 GMT +# © 2018 Unicode®, Inc. +# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. 
+# For terms of use, see http://www.unicode.org/terms_of_use.html +# +# Emoji Data for UTS #51 +# Version: 11.0 +# +# For documentation and usage, see http://www.unicode.org/reports/tr51 +# +# Format: +# <codepoint(s)> ; <property> # <comments> +# Note: there is no guarantee as to the structure of whitespace or comments +# +# Characters and sequences are listed in code point order. Users should be shown a more natural order. +# See the CLDR collation order for Emoji. + + +# ================================================ + +# All omitted code points have Emoji=No +# @missing: 0000..10FFFF ; Emoji ; No + +0023 ; Emoji # 1.1 [1] (#️) number sign +002A ; Emoji # 1.1 [1] (*️) asterisk +0030..0039 ; Emoji # 1.1 [10] (0️..9️) digit zero..digit nine +00A9 ; Emoji # 1.1 [1] (©️) copyright +00AE ; Emoji # 1.1 [1] (®️) registered +203C ; Emoji # 1.1 [1] (‼️) double exclamation mark +2049 ; Emoji # 3.0 [1] (⁉️) exclamation question mark +2122 ; Emoji # 1.1 [1] (™️) trade mark +2139 ; Emoji # 3.0 [1] (ℹ️) information +2194..2199 ; Emoji # 1.1 [6] (↔️..↙️) left-right arrow..down-left arrow +21A9..21AA ; Emoji # 1.1 [2] (↩️..↪️) right arrow curving left..left arrow curving right +231A..231B ; Emoji # 1.1 [2] (⌚..⌛) watch..hourglass done +2328 ; Emoji # 1.1 [1] (⌨️) keyboard +23CF ; Emoji # 4.0 [1] (⏏️) eject button +23E9..23F3 ; Emoji # 6.0 [11] (⏩..⏳) fast-forward button..hourglass not done +23F8..23FA ; Emoji # 7.0 [3] (⏸️..⏺️) pause button..record button +24C2 ; Emoji # 1.1 [1] (Ⓜ️) circled M +25AA..25AB ; Emoji # 1.1 [2] (▪️..▫️) black small square..white small square +25B6 ; Emoji # 1.1 [1] (▶️) play button +25C0 ; Emoji # 1.1 [1] (◀️) reverse button +25FB..25FE ; Emoji # 3.2 [4] (◻️..◾) white medium square..black medium-small square +2600..2604 ; Emoji # 1.1 [5] (☀️..☄️) sun..comet +260E ; Emoji # 1.1 [1] (☎️) telephone +2611 ; Emoji # 1.1 [1] (☑️) ballot box with check +2614..2615 ; Emoji # 4.0 [2] (☔..☕) umbrella with rain drops..hot beverage +2618 ; Emoji # 4.1 [1] (☘️) shamrock +261D ; Emoji # 1.1 [1]
(☝️) index pointing up +2620 ; Emoji # 1.1 [1] (☠️) skull and crossbones +2622..2623 ; Emoji # 1.1 [2] (☢️..☣️) radioactive..biohazard +2626 ; Emoji # 1.1 [1] (☦️) orthodox cross +262A ; Emoji # 1.1 [1] (☪️) star and crescent +262E..262F ; Emoji # 1.1 [2] (☮️..☯️) peace symbol..yin yang +2638..263A ; Emoji # 1.1 [3] (☸️..☺️) wheel of dharma..smiling face +2640 ; Emoji # 1.1 [1] (♀️) female sign +2642 ; Emoji # 1.1 [1] (♂️) male sign +2648..2653 ; Emoji # 1.1 [12] (♈..♓) Aries..Pisces +265F..2660 ; Emoji # 1.1 [2] (♟️..♠️) chess pawn..spade suit +2663 ; Emoji # 1.1 [1] (♣️) club suit +2665..2666 ; Emoji # 1.1 [2] (♥️..♦️) heart suit..diamond suit +2668 ; Emoji # 1.1 [1] (♨️) hot springs +267B ; Emoji # 3.2 [1] (♻️) recycling symbol +267E..267F ; Emoji # 4.1 [2] (♾️..♿) infinity..wheelchair symbol +2692..2697 ; Emoji # 4.1 [6] (⚒️..⚗️) hammer and pick..alembic +2699 ; Emoji # 4.1 [1] (⚙️) gear +269B..269C ; Emoji # 4.1 [2] (⚛️..⚜️) atom symbol..fleur-de-lis +26A0..26A1 ; Emoji # 4.0 [2] (⚠️..⚡) warning..high voltage +26AA..26AB ; Emoji # 4.1 [2] (⚪..⚫) white circle..black circle +26B0..26B1 ; Emoji # 4.1 [2] (⚰️..⚱️) coffin..funeral urn +26BD..26BE ; Emoji # 5.2 [2] (⚽..⚾) soccer ball..baseball +26C4..26C5 ; Emoji # 5.2 [2] (⛄..⛅) snowman without snow..sun behind cloud +26C8 ; Emoji # 5.2 [1] (⛈️) cloud with lightning and rain +26CE ; Emoji # 6.0 [1] (⛎) Ophiuchus +26CF ; Emoji # 5.2 [1] (⛏️) pick +26D1 ; Emoji # 5.2 [1] (⛑️) rescue worker’s helmet +26D3..26D4 ; Emoji # 5.2 [2] (⛓️..⛔) chains..no entry +26E9..26EA ; Emoji # 5.2 [2] (⛩️..⛪) shinto shrine..church +26F0..26F5 ; Emoji # 5.2 [6] (⛰️..⛵) mountain..sailboat +26F7..26FA ; Emoji # 5.2 [4] (⛷️..⛺) skier..tent +26FD ; Emoji # 5.2 [1] (⛽) fuel pump +2702 ; Emoji # 1.1 [1] (✂️) scissors +2705 ; Emoji # 6.0 [1] (✅) white heavy check mark +2708..2709 ; Emoji # 1.1 [2] (✈️..✉️) airplane..envelope +270A..270B ; Emoji # 6.0 [2] (✊..✋) raised fist..raised hand +270C..270D ; Emoji # 1.1 [2] (✌️..✍️) victory 
hand..writing hand +270F ; Emoji # 1.1 [1] (✏️) pencil +2712 ; Emoji # 1.1 [1] (✒️) black nib +2714 ; Emoji # 1.1 [1] (✔️) heavy check mark +2716 ; Emoji # 1.1 [1] (✖️) heavy multiplication x +271D ; Emoji # 1.1 [1] (✝️) latin cross +2721 ; Emoji # 1.1 [1] (✡️) star of David +2728 ; Emoji # 6.0 [1] (✨) sparkles +2733..2734 ; Emoji # 1.1 [2] (✳️..✴️) eight-spoked asterisk..eight-pointed star +2744 ; Emoji # 1.1 [1] (❄️) snowflake +2747 ; Emoji # 1.1 [1] (❇️) sparkle +274C ; Emoji # 6.0 [1] (❌) cross mark +274E ; Emoji # 6.0 [1] (❎) cross mark button +2753..2755 ; Emoji # 6.0 [3] (❓..❕) question mark..white exclamation mark +2757 ; Emoji # 5.2 [1] (❗) exclamation mark +2763..2764 ; Emoji # 1.1 [2] (❣️..❤️) heavy heart exclamation..red heart +2795..2797 ; Emoji # 6.0 [3] (➕..➗) heavy plus sign..heavy division sign +27A1 ; Emoji # 1.1 [1] (➡️) right arrow +27B0 ; Emoji # 6.0 [1] (➰) curly loop +27BF ; Emoji # 6.0 [1] (➿) double curly loop +2934..2935 ; Emoji # 3.2 [2] (⤴️..⤵️) right arrow curving up..right arrow curving down +2B05..2B07 ; Emoji # 4.0 [3] (⬅️..⬇️) left arrow..down arrow +2B1B..2B1C ; Emoji # 5.1 [2] (⬛..⬜) black large square..white large square +2B50 ; Emoji # 5.1 [1] (⭐) star +2B55 ; Emoji # 5.2 [1] (⭕) heavy large circle +3030 ; Emoji # 1.1 [1] (〰️) wavy dash +303D ; Emoji # 3.2 [1] (〽️) part alternation mark +3297 ; Emoji # 1.1 [1] (㊗️) Japanese “congratulations” button +3299 ; Emoji # 1.1 [1] (㊙️) Japanese “secret” button +1F004 ; Emoji # 5.1 [1] (🀄) mahjong red dragon +1F0CF ; Emoji # 6.0 [1] (🃏) joker +1F170..1F171 ; Emoji # 6.0 [2] (🅰️..🅱️) A button (blood type)..B button (blood type) +1F17E ; Emoji # 6.0 [1] (🅾️) O button (blood type) +1F17F ; Emoji # 5.2 [1] (🅿️) P button +1F18E ; Emoji # 6.0 [1] (🆎) AB button (blood type) +1F191..1F19A ; Emoji # 6.0 [10] (🆑..🆚) CL button..VS button +1F1E6..1F1FF ; Emoji # 6.0 [26] (🇦..🇿) regional indicator symbol letter a..regional indicator symbol letter z +1F201..1F202 ; Emoji # 6.0 [2] (🈁..🈂️) Japanese 
“here” button..Japanese “service charge” button +1F21A ; Emoji # 5.2 [1] (🈚) Japanese “free of charge” button +1F22F ; Emoji # 5.2 [1] (🈯) Japanese “reserved” button +1F232..1F23A ; Emoji # 6.0 [9] (🈲..🈺) Japanese “prohibited” button..Japanese “open for business” button +1F250..1F251 ; Emoji # 6.0 [2] (🉐..🉑) Japanese “bargain” button..Japanese “acceptable” button +1F300..1F320 ; Emoji # 6.0 [33] (🌀..🌠) cyclone..shooting star +1F321 ; Emoji # 7.0 [1] (🌡️) thermometer +1F324..1F32C ; Emoji # 7.0 [9] (🌤️..🌬️) sun behind small cloud..wind face +1F32D..1F32F ; Emoji # 8.0 [3] (🌭..🌯) hot dog..burrito +1F330..1F335 ; Emoji # 6.0 [6] (🌰..🌵) chestnut..cactus +1F336 ; Emoji # 7.0 [1] (🌶️) hot pepper +1F337..1F37C ; Emoji # 6.0 [70] (🌷..🍼) tulip..baby bottle +1F37D ; Emoji # 7.0 [1] (🍽️) fork and knife with plate +1F37E..1F37F ; Emoji # 8.0 [2] (🍾..🍿) bottle with popping cork..popcorn +1F380..1F393 ; Emoji # 6.0 [20] (🎀..🎓) ribbon..graduation cap +1F396..1F397 ; Emoji # 7.0 [2] (🎖️..🎗️) military medal..reminder ribbon +1F399..1F39B ; Emoji # 7.0 [3] (🎙️..🎛️) studio microphone..control knobs +1F39E..1F39F ; Emoji # 7.0 [2] (🎞️..🎟️) film frames..admission tickets +1F3A0..1F3C4 ; Emoji # 6.0 [37] (🎠..🏄) carousel horse..person surfing +1F3C5 ; Emoji # 7.0 [1] (🏅) sports medal +1F3C6..1F3CA ; Emoji # 6.0 [5] (🏆..🏊) trophy..person swimming +1F3CB..1F3CE ; Emoji # 7.0 [4] (🏋️..🏎️) person lifting weights..racing car +1F3CF..1F3D3 ; Emoji # 8.0 [5] (🏏..🏓) cricket game..ping pong +1F3D4..1F3DF ; Emoji # 7.0 [12] (🏔️..🏟️) snow-capped mountain..stadium +1F3E0..1F3F0 ; Emoji # 6.0 [17] (🏠..🏰) house..castle +1F3F3..1F3F5 ; Emoji # 7.0 [3] (🏳️..🏵️) white flag..rosette +1F3F7 ; Emoji # 7.0 [1] (🏷️) label +1F3F8..1F3FF ; Emoji # 8.0 [8] (🏸..🏿) badminton..dark skin tone +1F400..1F43E ; Emoji # 6.0 [63] (🐀..🐾) rat..paw prints +1F43F ; Emoji # 7.0 [1] (🐿️) chipmunk +1F440 ; Emoji # 6.0 [1] (👀) eyes +1F441 ; Emoji # 7.0 [1] (👁️) eye +1F442..1F4F7 ; Emoji # 6.0[182] (👂..📷) ear..camera +1F4F8 ; 
Emoji # 7.0 [1] (📸) camera with flash +1F4F9..1F4FC ; Emoji # 6.0 [4] (📹..📼) video camera..videocassette +1F4FD ; Emoji # 7.0 [1] (📽️) film projector +1F4FF ; Emoji # 8.0 [1] (📿) prayer beads +1F500..1F53D ; Emoji # 6.0 [62] (🔀..🔽) shuffle tracks button..downwards button +1F549..1F54A ; Emoji # 7.0 [2] (🕉️..🕊️) om..dove +1F54B..1F54E ; Emoji # 8.0 [4] (🕋..🕎) kaaba..menorah +1F550..1F567 ; Emoji # 6.0 [24] (🕐..🕧) one o’clock..twelve-thirty +1F56F..1F570 ; Emoji # 7.0 [2] (🕯️..🕰️) candle..mantelpiece clock +1F573..1F579 ; Emoji # 7.0 [7] (🕳️..🕹️) hole..joystick +1F57A ; Emoji # 9.0 [1] (🕺) man dancing +1F587 ; Emoji # 7.0 [1] (🖇️) linked paperclips +1F58A..1F58D ; Emoji # 7.0 [4] (🖊️..🖍️) pen..crayon +1F590 ; Emoji # 7.0 [1] (🖐️) hand with fingers splayed +1F595..1F596 ; Emoji # 7.0 [2] (🖕..🖖) middle finger..vulcan salute +1F5A4 ; Emoji # 9.0 [1] (🖤) black heart +1F5A5 ; Emoji # 7.0 [1] (🖥️) desktop computer +1F5A8 ; Emoji # 7.0 [1] (🖨️) printer +1F5B1..1F5B2 ; Emoji # 7.0 [2] (🖱️..🖲️) computer mouse..trackball +1F5BC ; Emoji # 7.0 [1] (🖼️) framed picture +1F5C2..1F5C4 ; Emoji # 7.0 [3] (🗂️..🗄️) card index dividers..file cabinet +1F5D1..1F5D3 ; Emoji # 7.0 [3] (🗑️..🗓️) wastebasket..spiral calendar +1F5DC..1F5DE ; Emoji # 7.0 [3] (🗜️..🗞️) clamp..rolled-up newspaper +1F5E1 ; Emoji # 7.0 [1] (🗡️) dagger +1F5E3 ; Emoji # 7.0 [1] (🗣️) speaking head +1F5E8 ; Emoji # 7.0 [1] (🗨️) left speech bubble +1F5EF ; Emoji # 7.0 [1] (🗯️) right anger bubble +1F5F3 ; Emoji # 7.0 [1] (🗳️) ballot box with ballot +1F5FA ; Emoji # 7.0 [1] (🗺️) world map +1F5FB..1F5FF ; Emoji # 6.0 [5] (🗻..🗿) mount fuji..moai +1F600 ; Emoji # 6.1 [1] (😀) grinning face +1F601..1F610 ; Emoji # 6.0 [16] (😁..😐) beaming face with smiling eyes..neutral face +1F611 ; Emoji # 6.1 [1] (😑) expressionless face +1F612..1F614 ; Emoji # 6.0 [3] (😒..😔) unamused face..pensive face +1F615 ; Emoji # 6.1 [1] (😕) confused face +1F616 ; Emoji # 6.0 [1] (😖) confounded face +1F617 ; Emoji # 6.1 [1] (😗) kissing face +1F618 ; Emoji 
# 6.0 [1] (😘) face blowing a kiss +1F619 ; Emoji # 6.1 [1] (😙) kissing face with smiling eyes +1F61A ; Emoji # 6.0 [1] (😚) kissing face with closed eyes +1F61B ; Emoji # 6.1 [1] (😛) face with tongue +1F61C..1F61E ; Emoji # 6.0 [3] (😜..😞) winking face with tongue..disappointed face +1F61F ; Emoji # 6.1 [1] (😟) worried face +1F620..1F625 ; Emoji # 6.0 [6] (😠..😥) angry face..sad but relieved face +1F626..1F627 ; Emoji # 6.1 [2] (😦..😧) frowning face with open mouth..anguished face +1F628..1F62B ; Emoji # 6.0 [4] (😨..😫) fearful face..tired face +1F62C ; Emoji # 6.1 [1] (😬) grimacing face +1F62D ; Emoji # 6.0 [1] (😭) loudly crying face +1F62E..1F62F ; Emoji # 6.1 [2] (😮..😯) face with open mouth..hushed face +1F630..1F633 ; Emoji # 6.0 [4] (😰..😳) anxious face with sweat..flushed face +1F634 ; Emoji # 6.1 [1] (😴) sleeping face +1F635..1F640 ; Emoji # 6.0 [12] (😵..🙀) dizzy face..weary cat face +1F641..1F642 ; Emoji # 7.0 [2] (🙁..🙂) slightly frowning face..slightly smiling face +1F643..1F644 ; Emoji # 8.0 [2] (🙃..🙄) upside-down face..face with rolling eyes +1F645..1F64F ; Emoji # 6.0 [11] (🙅..🙏) person gesturing NO..folded hands +1F680..1F6C5 ; Emoji # 6.0 [70] (🚀..🛅) rocket..left luggage +1F6CB..1F6CF ; Emoji # 7.0 [5] (🛋️..🛏️) couch and lamp..bed +1F6D0 ; Emoji # 8.0 [1] (🛐) place of worship +1F6D1..1F6D2 ; Emoji # 9.0 [2] (🛑..🛒) stop sign..shopping cart +1F6E0..1F6E5 ; Emoji # 7.0 [6] (🛠️..🛥️) hammer and wrench..motor boat +1F6E9 ; Emoji # 7.0 [1] (🛩️) small airplane +1F6EB..1F6EC ; Emoji # 7.0 [2] (🛫..🛬) airplane departure..airplane arrival +1F6F0 ; Emoji # 7.0 [1] (🛰️) satellite +1F6F3 ; Emoji # 7.0 [1] (🛳️) passenger ship +1F6F4..1F6F6 ; Emoji # 9.0 [3] (🛴..🛶) kick scooter..canoe +1F6F7..1F6F8 ; Emoji # 10.0 [2] (🛷..🛸) sled..flying saucer +1F6F9 ; Emoji # 11.0 [1] (🛹) skateboard +1F910..1F918 ; Emoji # 8.0 [9] (🤐..🤘) zipper-mouth face..sign of the horns +1F919..1F91E ; Emoji # 9.0 [6] (🤙..🤞) call me hand..crossed fingers +1F91F ; Emoji # 10.0 [1] (🤟) love-you gesture 
+1F920..1F927 ; Emoji # 9.0 [8] (🤠..🤧) cowboy hat face..sneezing face +1F928..1F92F ; Emoji # 10.0 [8] (🤨..🤯) face with raised eyebrow..exploding head +1F930 ; Emoji # 9.0 [1] (🤰) pregnant woman +1F931..1F932 ; Emoji # 10.0 [2] (🤱..🤲) breast-feeding..palms up together +1F933..1F93A ; Emoji # 9.0 [8] (🤳..🤺) selfie..person fencing +1F93C..1F93E ; Emoji # 9.0 [3] (🤼..🤾) people wrestling..person playing handball +1F940..1F945 ; Emoji # 9.0 [6] (🥀..🥅) wilted flower..goal net +1F947..1F94B ; Emoji # 9.0 [5] (🥇..🥋) 1st place medal..martial arts uniform +1F94C ; Emoji # 10.0 [1] (🥌) curling stone +1F94D..1F94F ; Emoji # 11.0 [3] (🥍..🥏) lacrosse..flying disc +1F950..1F95E ; Emoji # 9.0 [15] (🥐..🥞) croissant..pancakes +1F95F..1F96B ; Emoji # 10.0 [13] (🥟..🥫) dumpling..canned food +1F96C..1F970 ; Emoji # 11.0 [5] (🥬..🥰) leafy green..smiling face with 3 hearts +1F973..1F976 ; Emoji # 11.0 [4] (🥳..🥶) partying face..cold face +1F97A ; Emoji # 11.0 [1] (🥺) pleading face +1F97C..1F97F ; Emoji # 11.0 [4] (🥼..🥿) lab coat..woman’s flat shoe +1F980..1F984 ; Emoji # 8.0 [5] (🦀..🦄) crab..unicorn face +1F985..1F991 ; Emoji # 9.0 [13] (🦅..🦑) eagle..squid +1F992..1F997 ; Emoji # 10.0 [6] (🦒..🦗) giraffe..cricket +1F998..1F9A2 ; Emoji # 11.0 [11] (🦘..🦢) kangaroo..swan +1F9B0..1F9B9 ; Emoji # 11.0 [10] (🦰..🦹) red-haired..supervillain +1F9C0 ; Emoji # 8.0 [1] (🧀) cheese wedge +1F9C1..1F9C2 ; Emoji # 11.0 [2] (🧁..🧂) cupcake..salt +1F9D0..1F9E6 ; Emoji # 10.0 [23] (🧐..🧦) face with monocle..socks +1F9E7..1F9FF ; Emoji # 11.0 [25] (🧧..🧿) red envelope..nazar amulet + +# Total elements: 1250 + +# ================================================ + +# All omitted code points have Emoji_Presentation=No +# @missing: 0000..10FFFF ; Emoji_Presentation ; No + +231A..231B ; Emoji_Presentation # 1.1 [2] (⌚..⌛) watch..hourglass done +23E9..23EC ; Emoji_Presentation # 6.0 [4] (⏩..⏬) fast-forward button..fast down button +23F0 ; Emoji_Presentation # 6.0 [1] (⏰) alarm clock +23F3 ; Emoji_Presentation # 6.0 [1] 
(⏳) hourglass not done +25FD..25FE ; Emoji_Presentation # 3.2 [2] (◽..◾) white medium-small square..black medium-small square +2614..2615 ; Emoji_Presentation # 4.0 [2] (☔..☕) umbrella with rain drops..hot beverage +2648..2653 ; Emoji_Presentation # 1.1 [12] (♈..♓) Aries..Pisces +267F ; Emoji_Presentation # 4.1 [1] (♿) wheelchair symbol +2693 ; Emoji_Presentation # 4.1 [1] (⚓) anchor +26A1 ; Emoji_Presentation # 4.0 [1] (⚡) high voltage +26AA..26AB ; Emoji_Presentation # 4.1 [2] (⚪..⚫) white circle..black circle +26BD..26BE ; Emoji_Presentation # 5.2 [2] (⚽..⚾) soccer ball..baseball +26C4..26C5 ; Emoji_Presentation # 5.2 [2] (⛄..⛅) snowman without snow..sun behind cloud +26CE ; Emoji_Presentation # 6.0 [1] (⛎) Ophiuchus +26D4 ; Emoji_Presentation # 5.2 [1] (⛔) no entry +26EA ; Emoji_Presentation # 5.2 [1] (⛪) church +26F2..26F3 ; Emoji_Presentation # 5.2 [2] (⛲..⛳) fountain..flag in hole +26F5 ; Emoji_Presentation # 5.2 [1] (⛵) sailboat +26FA ; Emoji_Presentation # 5.2 [1] (⛺) tent +26FD ; Emoji_Presentation # 5.2 [1] (⛽) fuel pump +2705 ; Emoji_Presentation # 6.0 [1] (✅) white heavy check mark +270A..270B ; Emoji_Presentation # 6.0 [2] (✊..✋) raised fist..raised hand +2728 ; Emoji_Presentation # 6.0 [1] (✨) sparkles +274C ; Emoji_Presentation # 6.0 [1] (❌) cross mark +274E ; Emoji_Presentation # 6.0 [1] (❎) cross mark button +2753..2755 ; Emoji_Presentation # 6.0 [3] (❓..❕) question mark..white exclamation mark +2757 ; Emoji_Presentation # 5.2 [1] (❗) exclamation mark +2795..2797 ; Emoji_Presentation # 6.0 [3] (➕..➗) heavy plus sign..heavy division sign +27B0 ; Emoji_Presentation # 6.0 [1] (➰) curly loop +27BF ; Emoji_Presentation # 6.0 [1] (➿) double curly loop +2B1B..2B1C ; Emoji_Presentation # 5.1 [2] (⬛..⬜) black large square..white large square +2B50 ; Emoji_Presentation # 5.1 [1] (⭐) star +2B55 ; Emoji_Presentation # 5.2 [1] (⭕) heavy large circle +1F004 ; Emoji_Presentation # 5.1 [1] (🀄) mahjong red dragon +1F0CF ; Emoji_Presentation # 6.0 [1] (🃏) joker 
+1F18E ; Emoji_Presentation # 6.0 [1] (🆎) AB button (blood type) +1F191..1F19A ; Emoji_Presentation # 6.0 [10] (🆑..🆚) CL button..VS button +1F1E6..1F1FF ; Emoji_Presentation # 6.0 [26] (🇦..🇿) regional indicator symbol letter a..regional indicator symbol letter z +1F201 ; Emoji_Presentation # 6.0 [1] (🈁) Japanese “here” button +1F21A ; Emoji_Presentation # 5.2 [1] (🈚) Japanese “free of charge” button +1F22F ; Emoji_Presentation # 5.2 [1] (🈯) Japanese “reserved” button +1F232..1F236 ; Emoji_Presentation # 6.0 [5] (🈲..🈶) Japanese “prohibited” button..Japanese “not free of charge” button +1F238..1F23A ; Emoji_Presentation # 6.0 [3] (🈸..🈺) Japanese “application” button..Japanese “open for business” button +1F250..1F251 ; Emoji_Presentation # 6.0 [2] (🉐..🉑) Japanese “bargain” button..Japanese “acceptable” button +1F300..1F320 ; Emoji_Presentation # 6.0 [33] (🌀..🌠) cyclone..shooting star +1F32D..1F32F ; Emoji_Presentation # 8.0 [3] (🌭..🌯) hot dog..burrito +1F330..1F335 ; Emoji_Presentation # 6.0 [6] (🌰..🌵) chestnut..cactus +1F337..1F37C ; Emoji_Presentation # 6.0 [70] (🌷..🍼) tulip..baby bottle +1F37E..1F37F ; Emoji_Presentation # 8.0 [2] (🍾..🍿) bottle with popping cork..popcorn +1F380..1F393 ; Emoji_Presentation # 6.0 [20] (🎀..🎓) ribbon..graduation cap +1F3A0..1F3C4 ; Emoji_Presentation # 6.0 [37] (🎠..🏄) carousel horse..person surfing +1F3C5 ; Emoji_Presentation # 7.0 [1] (🏅) sports medal +1F3C6..1F3CA ; Emoji_Presentation # 6.0 [5] (🏆..🏊) trophy..person swimming +1F3CF..1F3D3 ; Emoji_Presentation # 8.0 [5] (🏏..🏓) cricket game..ping pong +1F3E0..1F3F0 ; Emoji_Presentation # 6.0 [17] (🏠..🏰) house..castle +1F3F4 ; Emoji_Presentation # 7.0 [1] (🏴) black flag +1F3F8..1F3FF ; Emoji_Presentation # 8.0 [8] (🏸..🏿) badminton..dark skin tone +1F400..1F43E ; Emoji_Presentation # 6.0 [63] (🐀..🐾) rat..paw prints +1F440 ; Emoji_Presentation # 6.0 [1] (👀) eyes +1F442..1F4F7 ; Emoji_Presentation # 6.0[182] (👂..📷) ear..camera +1F4F8 ; Emoji_Presentation # 7.0 [1] (📸) camera with flash 
+1F4F9..1F4FC ; Emoji_Presentation # 6.0 [4] (📹..📼) video camera..videocassette +1F4FF ; Emoji_Presentation # 8.0 [1] (📿) prayer beads +1F500..1F53D ; Emoji_Presentation # 6.0 [62] (🔀..🔽) shuffle tracks button..downwards button +1F54B..1F54E ; Emoji_Presentation # 8.0 [4] (🕋..🕎) kaaba..menorah +1F550..1F567 ; Emoji_Presentation # 6.0 [24] (🕐..🕧) one o’clock..twelve-thirty +1F57A ; Emoji_Presentation # 9.0 [1] (🕺) man dancing +1F595..1F596 ; Emoji_Presentation # 7.0 [2] (🖕..🖖) middle finger..vulcan salute +1F5A4 ; Emoji_Presentation # 9.0 [1] (🖤) black heart +1F5FB..1F5FF ; Emoji_Presentation # 6.0 [5] (🗻..🗿) mount fuji..moai +1F600 ; Emoji_Presentation # 6.1 [1] (😀) grinning face +1F601..1F610 ; Emoji_Presentation # 6.0 [16] (😁..😐) beaming face with smiling eyes..neutral face +1F611 ; Emoji_Presentation # 6.1 [1] (😑) expressionless face +1F612..1F614 ; Emoji_Presentation # 6.0 [3] (😒..😔) unamused face..pensive face +1F615 ; Emoji_Presentation # 6.1 [1] (😕) confused face +1F616 ; Emoji_Presentation # 6.0 [1] (😖) confounded face +1F617 ; Emoji_Presentation # 6.1 [1] (😗) kissing face +1F618 ; Emoji_Presentation # 6.0 [1] (😘) face blowing a kiss +1F619 ; Emoji_Presentation # 6.1 [1] (😙) kissing face with smiling eyes +1F61A ; Emoji_Presentation # 6.0 [1] (😚) kissing face with closed eyes +1F61B ; Emoji_Presentation # 6.1 [1] (😛) face with tongue +1F61C..1F61E ; Emoji_Presentation # 6.0 [3] (😜..😞) winking face with tongue..disappointed face +1F61F ; Emoji_Presentation # 6.1 [1] (😟) worried face +1F620..1F625 ; Emoji_Presentation # 6.0 [6] (😠..😥) angry face..sad but relieved face +1F626..1F627 ; Emoji_Presentation # 6.1 [2] (😦..😧) frowning face with open mouth..anguished face +1F628..1F62B ; Emoji_Presentation # 6.0 [4] (😨..😫) fearful face..tired face +1F62C ; Emoji_Presentation # 6.1 [1] (😬) grimacing face +1F62D ; Emoji_Presentation # 6.0 [1] (😭) loudly crying face +1F62E..1F62F ; Emoji_Presentation # 6.1 [2] (😮..😯) face with open mouth..hushed face +1F630..1F633 ; 
Emoji_Presentation # 6.0 [4] (😰..😳) anxious face with sweat..flushed face +1F634 ; Emoji_Presentation # 6.1 [1] (😴) sleeping face +1F635..1F640 ; Emoji_Presentation # 6.0 [12] (😵..🙀) dizzy face..weary cat face +1F641..1F642 ; Emoji_Presentation # 7.0 [2] (🙁..🙂) slightly frowning face..slightly smiling face +1F643..1F644 ; Emoji_Presentation # 8.0 [2] (🙃..🙄) upside-down face..face with rolling eyes +1F645..1F64F ; Emoji_Presentation # 6.0 [11] (🙅..🙏) person gesturing NO..folded hands +1F680..1F6C5 ; Emoji_Presentation # 6.0 [70] (🚀..🛅) rocket..left luggage +1F6CC ; Emoji_Presentation # 7.0 [1] (🛌) person in bed +1F6D0 ; Emoji_Presentation # 8.0 [1] (🛐) place of worship +1F6D1..1F6D2 ; Emoji_Presentation # 9.0 [2] (🛑..🛒) stop sign..shopping cart +1F6EB..1F6EC ; Emoji_Presentation # 7.0 [2] (🛫..🛬) airplane departure..airplane arrival +1F6F4..1F6F6 ; Emoji_Presentation # 9.0 [3] (🛴..🛶) kick scooter..canoe +1F6F7..1F6F8 ; Emoji_Presentation # 10.0 [2] (🛷..🛸) sled..flying saucer +1F6F9 ; Emoji_Presentation # 11.0 [1] (🛹) skateboard +1F910..1F918 ; Emoji_Presentation # 8.0 [9] (🤐..🤘) zipper-mouth face..sign of the horns +1F919..1F91E ; Emoji_Presentation # 9.0 [6] (🤙..🤞) call me hand..crossed fingers +1F91F ; Emoji_Presentation # 10.0 [1] (🤟) love-you gesture +1F920..1F927 ; Emoji_Presentation # 9.0 [8] (🤠..🤧) cowboy hat face..sneezing face +1F928..1F92F ; Emoji_Presentation # 10.0 [8] (🤨..🤯) face with raised eyebrow..exploding head +1F930 ; Emoji_Presentation # 9.0 [1] (🤰) pregnant woman +1F931..1F932 ; Emoji_Presentation # 10.0 [2] (🤱..🤲) breast-feeding..palms up together +1F933..1F93A ; Emoji_Presentation # 9.0 [8] (🤳..🤺) selfie..person fencing +1F93C..1F93E ; Emoji_Presentation # 9.0 [3] (🤼..🤾) people wrestling..person playing handball +1F940..1F945 ; Emoji_Presentation # 9.0 [6] (🥀..🥅) wilted flower..goal net +1F947..1F94B ; Emoji_Presentation # 9.0 [5] (🥇..🥋) 1st place medal..martial arts uniform +1F94C ; Emoji_Presentation # 10.0 [1] (🥌) curling stone +1F94D..1F94F 
; Emoji_Presentation # 11.0 [3] (🥍..🥏) lacrosse..flying disc +1F950..1F95E ; Emoji_Presentation # 9.0 [15] (🥐..🥞) croissant..pancakes +1F95F..1F96B ; Emoji_Presentation # 10.0 [13] (🥟..🥫) dumpling..canned food +1F96C..1F970 ; Emoji_Presentation # 11.0 [5] (🥬..🥰) leafy green..smiling face with 3 hearts +1F973..1F976 ; Emoji_Presentation # 11.0 [4] (🥳..🥶) partying face..cold face +1F97A ; Emoji_Presentation # 11.0 [1] (🥺) pleading face +1F97C..1F97F ; Emoji_Presentation # 11.0 [4] (🥼..🥿) lab coat..woman’s flat shoe +1F980..1F984 ; Emoji_Presentation # 8.0 [5] (🦀..🦄) crab..unicorn face +1F985..1F991 ; Emoji_Presentation # 9.0 [13] (🦅..🦑) eagle..squid +1F992..1F997 ; Emoji_Presentation # 10.0 [6] (🦒..🦗) giraffe..cricket +1F998..1F9A2 ; Emoji_Presentation # 11.0 [11] (🦘..🦢) kangaroo..swan +1F9B0..1F9B9 ; Emoji_Presentation # 11.0 [10] (🦰..🦹) red-haired..supervillain +1F9C0 ; Emoji_Presentation # 8.0 [1] (🧀) cheese wedge +1F9C1..1F9C2 ; Emoji_Presentation # 11.0 [2] (🧁..🧂) cupcake..salt +1F9D0..1F9E6 ; Emoji_Presentation # 10.0 [23] (🧐..🧦) face with monocle..socks +1F9E7..1F9FF ; Emoji_Presentation # 11.0 [25] (🧧..🧿) red envelope..nazar amulet + +# Total elements: 1032 + +# ================================================ + +# All omitted code points have Emoji_Modifier=No +# @missing: 0000..10FFFF ; Emoji_Modifier ; No + +1F3FB..1F3FF ; Emoji_Modifier # 8.0 [5] (🏻..🏿) light skin tone..dark skin tone + +# Total elements: 5 + +# ================================================ + +# All omitted code points have Emoji_Modifier_Base=No +# @missing: 0000..10FFFF ; Emoji_Modifier_Base ; No + +261D ; Emoji_Modifier_Base # 1.1 [1] (☝️) index pointing up +26F9 ; Emoji_Modifier_Base # 5.2 [1] (⛹️) person bouncing ball +270A..270B ; Emoji_Modifier_Base # 6.0 [2] (✊..✋) raised fist..raised hand +270C..270D ; Emoji_Modifier_Base # 1.1 [2] (✌️..✍️) victory hand..writing hand +1F385 ; Emoji_Modifier_Base # 6.0 [1] (🎅) Santa Claus +1F3C2..1F3C4 ; Emoji_Modifier_Base # 6.0 [3] (🏂..🏄) 
snowboarder..person surfing +1F3C7 ; Emoji_Modifier_Base # 6.0 [1] (🏇) horse racing +1F3CA ; Emoji_Modifier_Base # 6.0 [1] (🏊) person swimming +1F3CB..1F3CC ; Emoji_Modifier_Base # 7.0 [2] (🏋️..🏌️) person lifting weights..person golfing +1F442..1F443 ; Emoji_Modifier_Base # 6.0 [2] (👂..👃) ear..nose +1F446..1F450 ; Emoji_Modifier_Base # 6.0 [11] (👆..👐) backhand index pointing up..open hands +1F466..1F469 ; Emoji_Modifier_Base # 6.0 [4] (👦..👩) boy..woman +1F46E ; Emoji_Modifier_Base # 6.0 [1] (👮) police officer +1F470..1F478 ; Emoji_Modifier_Base # 6.0 [9] (👰..👸) bride with veil..princess +1F47C ; Emoji_Modifier_Base # 6.0 [1] (👼) baby angel +1F481..1F483 ; Emoji_Modifier_Base # 6.0 [3] (💁..💃) person tipping hand..woman dancing +1F485..1F487 ; Emoji_Modifier_Base # 6.0 [3] (💅..💇) nail polish..person getting haircut +1F4AA ; Emoji_Modifier_Base # 6.0 [1] (💪) flexed biceps +1F574..1F575 ; Emoji_Modifier_Base # 7.0 [2] (🕴️..🕵️) man in suit levitating..detective +1F57A ; Emoji_Modifier_Base # 9.0 [1] (🕺) man dancing +1F590 ; Emoji_Modifier_Base # 7.0 [1] (🖐️) hand with fingers splayed +1F595..1F596 ; Emoji_Modifier_Base # 7.0 [2] (🖕..🖖) middle finger..vulcan salute +1F645..1F647 ; Emoji_Modifier_Base # 6.0 [3] (🙅..🙇) person gesturing NO..person bowing +1F64B..1F64F ; Emoji_Modifier_Base # 6.0 [5] (🙋..🙏) person raising hand..folded hands +1F6A3 ; Emoji_Modifier_Base # 6.0 [1] (🚣) person rowing boat +1F6B4..1F6B6 ; Emoji_Modifier_Base # 6.0 [3] (🚴..🚶) person biking..person walking +1F6C0 ; Emoji_Modifier_Base # 6.0 [1] (🛀) person taking bath +1F6CC ; Emoji_Modifier_Base # 7.0 [1] (🛌) person in bed +1F918 ; Emoji_Modifier_Base # 8.0 [1] (🤘) sign of the horns +1F919..1F91C ; Emoji_Modifier_Base # 9.0 [4] (🤙..🤜) call me hand..right-facing fist +1F91E ; Emoji_Modifier_Base # 9.0 [1] (🤞) crossed fingers +1F91F ; Emoji_Modifier_Base # 10.0 [1] (🤟) love-you gesture +1F926 ; Emoji_Modifier_Base # 9.0 [1] (🤦) person facepalming +1F930 ; Emoji_Modifier_Base # 9.0 [1] (🤰) pregnant 
woman +1F931..1F932 ; Emoji_Modifier_Base # 10.0 [2] (🤱..🤲) breast-feeding..palms up together +1F933..1F939 ; Emoji_Modifier_Base # 9.0 [7] (🤳..🤹) selfie..person juggling +1F93D..1F93E ; Emoji_Modifier_Base # 9.0 [2] (🤽..🤾) person playing water polo..person playing handball +1F9B5..1F9B6 ; Emoji_Modifier_Base # 11.0 [2] (🦵..🦶) leg..foot +1F9B8..1F9B9 ; Emoji_Modifier_Base # 11.0 [2] (🦸..🦹) superhero..supervillain +1F9D1..1F9DD ; Emoji_Modifier_Base # 10.0 [13] (🧑..🧝) adult..elf + +# Total elements: 106 + +# ================================================ + +# All omitted code points have Emoji_Component=No +# @missing: 0000..10FFFF ; Emoji_Component ; No + +0023 ; Emoji_Component # 1.1 [1] (#️) number sign +002A ; Emoji_Component # 1.1 [1] (*️) asterisk +0030..0039 ; Emoji_Component # 1.1 [10] (0️..9️) digit zero..digit nine +200D ; Emoji_Component # 1.1 [1] (‍) zero width joiner +20E3 ; Emoji_Component # 3.0 [1] (⃣) combining enclosing keycap +FE0F ; Emoji_Component # 3.2 [1] () VARIATION SELECTOR-16 +1F1E6..1F1FF ; Emoji_Component # 6.0 [26] (🇦..🇿) regional indicator symbol letter a..regional indicator symbol letter z +1F3FB..1F3FF ; Emoji_Component # 8.0 [5] (🏻..🏿) light skin tone..dark skin tone +1F9B0..1F9B3 ; Emoji_Component # 11.0 [4] (🦰..🦳) red-haired..white-haired +E0020..E007F ; Emoji_Component # 3.1 [96] (󠀠..󠁿) tag space..cancel tag + +# Total elements: 146 + +# ================================================ + +# All omitted code points have Extended_Pictographic=No +# @missing: 0000..10FFFF ; Extended_Pictographic ; No + +00A9 ; Extended_Pictographic# 1.1 [1] (©️) copyright +00AE ; Extended_Pictographic# 1.1 [1] (®️) registered +203C ; Extended_Pictographic# 1.1 [1] (‼️) double exclamation mark +2049 ; Extended_Pictographic# 3.0 [1] (⁉️) exclamation question mark +2122 ; Extended_Pictographic# 1.1 [1] (™️) trade mark +2139 ; Extended_Pictographic# 3.0 [1] (ℹ️) information +2194..2199 ; Extended_Pictographic# 1.1 [6] (↔️..↙️) left-right 
arrow..down-left arrow +21A9..21AA ; Extended_Pictographic# 1.1 [2] (↩️..↪️) right arrow curving left..left arrow curving right +231A..231B ; Extended_Pictographic# 1.1 [2] (⌚..⌛) watch..hourglass done +2328 ; Extended_Pictographic# 1.1 [1] (⌨️) keyboard +2388 ; Extended_Pictographic# 3.0 [1] (⎈️) HELM SYMBOL +23CF ; Extended_Pictographic# 4.0 [1] (⏏️) eject button +23E9..23F3 ; Extended_Pictographic# 6.0 [11] (⏩..⏳) fast-forward button..hourglass not done +23F8..23FA ; Extended_Pictographic# 7.0 [3] (⏸️..⏺️) pause button..record button +24C2 ; Extended_Pictographic# 1.1 [1] (Ⓜ️) circled M +25AA..25AB ; Extended_Pictographic# 1.1 [2] (▪️..▫️) black small square..white small square +25B6 ; Extended_Pictographic# 1.1 [1] (▶️) play button +25C0 ; Extended_Pictographic# 1.1 [1] (◀️) reverse button +25FB..25FE ; Extended_Pictographic# 3.2 [4] (◻️..◾) white medium square..black medium-small square +2600..2605 ; Extended_Pictographic# 1.1 [6] (☀️..★️) sun..BLACK STAR +2607..2612 ; Extended_Pictographic# 1.1 [12] (☇️..☒️) LIGHTNING..BALLOT BOX WITH X +2614..2615 ; Extended_Pictographic# 4.0 [2] (☔..☕) umbrella with rain drops..hot beverage +2616..2617 ; Extended_Pictographic# 3.2 [2] (☖️..☗️) WHITE SHOGI PIECE..BLACK SHOGI PIECE +2618 ; Extended_Pictographic# 4.1 [1] (☘️) shamrock +2619 ; Extended_Pictographic# 3.0 [1] (☙️) REVERSED ROTATED FLORAL HEART BULLET +261A..266F ; Extended_Pictographic# 1.1 [86] (☚️..♯️) BLACK LEFT POINTING INDEX..MUSIC SHARP SIGN +2670..2671 ; Extended_Pictographic# 3.0 [2] (♰️..♱️) WEST SYRIAC CROSS..EAST SYRIAC CROSS +2672..267D ; Extended_Pictographic# 3.2 [12] (♲️..♽️) UNIVERSAL RECYCLING SYMBOL..PARTIALLY-RECYCLED PAPER SYMBOL +267E..267F ; Extended_Pictographic# 4.1 [2] (♾️..♿) infinity..wheelchair symbol +2680..2685 ; Extended_Pictographic# 3.2 [6] (⚀️..⚅️) DIE FACE-1..DIE FACE-6 +2690..2691 ; Extended_Pictographic# 4.0 [2] (⚐️..⚑️) WHITE FLAG..BLACK FLAG +2692..269C ; Extended_Pictographic# 4.1 [11] (⚒️..⚜️) hammer and pick..fleur-de-lis 
+269D ; Extended_Pictographic# 5.1 [1] (⚝️) OUTLINED WHITE STAR +269E..269F ; Extended_Pictographic# 5.2 [2] (⚞️..⚟️) THREE LINES CONVERGING RIGHT..THREE LINES CONVERGING LEFT +26A0..26A1 ; Extended_Pictographic# 4.0 [2] (⚠️..⚡) warning..high voltage +26A2..26B1 ; Extended_Pictographic# 4.1 [16] (⚢️..⚱️) DOUBLED FEMALE SIGN..funeral urn +26B2 ; Extended_Pictographic# 5.0 [1] (⚲️) NEUTER +26B3..26BC ; Extended_Pictographic# 5.1 [10] (⚳️..⚼️) CERES..SESQUIQUADRATE +26BD..26BF ; Extended_Pictographic# 5.2 [3] (⚽..⚿️) soccer ball..SQUARED KEY +26C0..26C3 ; Extended_Pictographic# 5.1 [4] (⛀️..⛃️) WHITE DRAUGHTS MAN..BLACK DRAUGHTS KING +26C4..26CD ; Extended_Pictographic# 5.2 [10] (⛄..⛍️) snowman without snow..DISABLED CAR +26CE ; Extended_Pictographic# 6.0 [1] (⛎) Ophiuchus +26CF..26E1 ; Extended_Pictographic# 5.2 [19] (⛏️..⛡️) pick..RESTRICTED LEFT ENTRY-2 +26E2 ; Extended_Pictographic# 6.0 [1] (⛢️) ASTRONOMICAL SYMBOL FOR URANUS +26E3 ; Extended_Pictographic# 5.2 [1] (⛣️) HEAVY CIRCLE WITH STROKE AND TWO DOTS ABOVE +26E4..26E7 ; Extended_Pictographic# 6.0 [4] (⛤️..⛧️) PENTAGRAM..INVERTED PENTAGRAM +26E8..26FF ; Extended_Pictographic# 5.2 [24] (⛨️..⛿️) BLACK CROSS ON SHIELD..WHITE FLAG WITH HORIZONTAL MIDDLE BLACK STRIPE +2700 ; Extended_Pictographic# 7.0 [1] (✀️) BLACK SAFETY SCISSORS +2701..2704 ; Extended_Pictographic# 1.1 [4] (✁️..✄️) UPPER BLADE SCISSORS..WHITE SCISSORS +2705 ; Extended_Pictographic# 6.0 [1] (✅) white heavy check mark +2708..2709 ; Extended_Pictographic# 1.1 [2] (✈️..✉️) airplane..envelope +270A..270B ; Extended_Pictographic# 6.0 [2] (✊..✋) raised fist..raised hand +270C..2712 ; Extended_Pictographic# 1.1 [7] (✌️..✒️) victory hand..black nib +2714 ; Extended_Pictographic# 1.1 [1] (✔️) heavy check mark +2716 ; Extended_Pictographic# 1.1 [1] (✖️) heavy multiplication x +271D ; Extended_Pictographic# 1.1 [1] (✝️) latin cross +2721 ; Extended_Pictographic# 1.1 [1] (✡️) star of David +2728 ; Extended_Pictographic# 6.0 [1] (✨) sparkles +2733..2734 ; 
Extended_Pictographic# 1.1 [2] (✳️..✴️) eight-spoked asterisk..eight-pointed star +2744 ; Extended_Pictographic# 1.1 [1] (❄️) snowflake +2747 ; Extended_Pictographic# 1.1 [1] (❇️) sparkle +274C ; Extended_Pictographic# 6.0 [1] (❌) cross mark +274E ; Extended_Pictographic# 6.0 [1] (❎) cross mark button +2753..2755 ; Extended_Pictographic# 6.0 [3] (❓..❕) question mark..white exclamation mark +2757 ; Extended_Pictographic# 5.2 [1] (❗) exclamation mark +2763..2767 ; Extended_Pictographic# 1.1 [5] (❣️..❧️) heavy heart exclamation..ROTATED FLORAL HEART BULLET +2795..2797 ; Extended_Pictographic# 6.0 [3] (➕..➗) heavy plus sign..heavy division sign +27A1 ; Extended_Pictographic# 1.1 [1] (➡️) right arrow +27B0 ; Extended_Pictographic# 6.0 [1] (➰) curly loop +27BF ; Extended_Pictographic# 6.0 [1] (➿) double curly loop +2934..2935 ; Extended_Pictographic# 3.2 [2] (⤴️..⤵️) right arrow curving up..right arrow curving down +2B05..2B07 ; Extended_Pictographic# 4.0 [3] (⬅️..⬇️) left arrow..down arrow +2B1B..2B1C ; Extended_Pictographic# 5.1 [2] (⬛..⬜) black large square..white large square +2B50 ; Extended_Pictographic# 5.1 [1] (⭐) star +2B55 ; Extended_Pictographic# 5.2 [1] (⭕) heavy large circle +3030 ; Extended_Pictographic# 1.1 [1] (〰️) wavy dash +303D ; Extended_Pictographic# 3.2 [1] (〽️) part alternation mark +3297 ; Extended_Pictographic# 1.1 [1] (㊗️) Japanese “congratulations” button +3299 ; Extended_Pictographic# 1.1 [1] (㊙️) Japanese “secret” button +1F000..1F02B ; Extended_Pictographic# 5.1 [44] (🀀️..🀫️) MAHJONG TILE EAST WIND..MAHJONG TILE BACK +1F02C..1F02F ; Extended_Pictographic# NA [4] (🀬️..🀯️) .. +1F030..1F093 ; Extended_Pictographic# 5.1[100] (🀰️..🂓️) DOMINO TILE HORIZONTAL BACK..DOMINO TILE VERTICAL-06-06 +1F094..1F09F ; Extended_Pictographic# NA [12] (🂔️..🂟️) .. +1F0A0..1F0AE ; Extended_Pictographic# 6.0 [15] (🂠️..🂮️) PLAYING CARD BACK..PLAYING CARD KING OF SPADES +1F0AF..1F0B0 ; Extended_Pictographic# NA [2] (🂯️..🂰️) .. 
+1F0B1..1F0BE ; Extended_Pictographic# 6.0 [14] (🂱️..🂾️) PLAYING CARD ACE OF HEARTS..PLAYING CARD KING OF HEARTS
+1F0BF ; Extended_Pictographic# 7.0 [1] (🂿️) PLAYING CARD RED JOKER
+1F0C0 ; Extended_Pictographic# NA [1] (🃀️)
+1F0C1..1F0CF ; Extended_Pictographic# 6.0 [15] (🃁️..🃏) PLAYING CARD ACE OF DIAMONDS..joker
+1F0D0 ; Extended_Pictographic# NA [1] (🃐️)
+1F0D1..1F0DF ; Extended_Pictographic# 6.0 [15] (🃑️..🃟️) PLAYING CARD ACE OF CLUBS..PLAYING CARD WHITE JOKER
+1F0E0..1F0F5 ; Extended_Pictographic# 7.0 [22] (🃠️..🃵️) PLAYING CARD FOOL..PLAYING CARD TRUMP-21
+1F0F6..1F0FF ; Extended_Pictographic# NA [10] (🃶️..🃿️) ..
+1F10D..1F10F ; Extended_Pictographic# NA [3] (🄍️..🄏️) ..
+1F12F ; Extended_Pictographic# 11.0 [1] (🄯️) COPYLEFT SYMBOL
+1F16C..1F16F ; Extended_Pictographic# NA [4] (🅬️..🅯️) ..
+1F170..1F171 ; Extended_Pictographic# 6.0 [2] (🅰️..🅱️) A button (blood type)..B button (blood type)
+1F17E ; Extended_Pictographic# 6.0 [1] (🅾️) O button (blood type)
+1F17F ; Extended_Pictographic# 5.2 [1] (🅿️) P button
+1F18E ; Extended_Pictographic# 6.0 [1] (🆎) AB button (blood type)
+1F191..1F19A ; Extended_Pictographic# 6.0 [10] (🆑..🆚) CL button..VS button
+1F1AD..1F1E5 ; Extended_Pictographic# NA [57] (🆭️..🇥️) ..
+1F201..1F202 ; Extended_Pictographic# 6.0 [2] (🈁..🈂️) Japanese “here” button..Japanese “service charge” button
+1F203..1F20F ; Extended_Pictographic# NA [13] (🈃️..🈏️) ..
+1F21A ; Extended_Pictographic# 5.2 [1] (🈚) Japanese “free of charge” button
+1F22F ; Extended_Pictographic# 5.2 [1] (🈯) Japanese “reserved” button
+1F232..1F23A ; Extended_Pictographic# 6.0 [9] (🈲..🈺) Japanese “prohibited” button..Japanese “open for business” button
+1F23C..1F23F ; Extended_Pictographic# NA [4] (🈼️..🈿️) ..
+1F249..1F24F ; Extended_Pictographic# NA [7] (🉉️..🉏️) ..
+1F250..1F251 ; Extended_Pictographic# 6.0 [2] (🉐..🉑) Japanese “bargain” button..Japanese “acceptable” button
+1F252..1F25F ; Extended_Pictographic# NA [14] (🉒️..🉟️) ..
+1F260..1F265 ; Extended_Pictographic# 10.0 [6] (🉠️..🉥️) ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
+1F266..1F2FF ; Extended_Pictographic# NA[154] (🉦️..🋿️) ..
+1F300..1F320 ; Extended_Pictographic# 6.0 [33] (🌀..🌠) cyclone..shooting star
+1F321..1F32C ; Extended_Pictographic# 7.0 [12] (🌡️..🌬️) thermometer..wind face
+1F32D..1F32F ; Extended_Pictographic# 8.0 [3] (🌭..🌯) hot dog..burrito
+1F330..1F335 ; Extended_Pictographic# 6.0 [6] (🌰..🌵) chestnut..cactus
+1F336 ; Extended_Pictographic# 7.0 [1] (🌶️) hot pepper
+1F337..1F37C ; Extended_Pictographic# 6.0 [70] (🌷..🍼) tulip..baby bottle
+1F37D ; Extended_Pictographic# 7.0 [1] (🍽️) fork and knife with plate
+1F37E..1F37F ; Extended_Pictographic# 8.0 [2] (🍾..🍿) bottle with popping cork..popcorn
+1F380..1F393 ; Extended_Pictographic# 6.0 [20] (🎀..🎓) ribbon..graduation cap
+1F394..1F39F ; Extended_Pictographic# 7.0 [12] (🎔️..🎟️) HEART WITH TIP ON THE LEFT..admission tickets
+1F3A0..1F3C4 ; Extended_Pictographic# 6.0 [37] (🎠..🏄) carousel horse..person surfing
+1F3C5 ; Extended_Pictographic# 7.0 [1] (🏅) sports medal
+1F3C6..1F3CA ; Extended_Pictographic# 6.0 [5] (🏆..🏊) trophy..person swimming
+1F3CB..1F3CE ; Extended_Pictographic# 7.0 [4] (🏋️..🏎️) person lifting weights..racing car
+1F3CF..1F3D3 ; Extended_Pictographic# 8.0 [5] (🏏..🏓) cricket game..ping pong
+1F3D4..1F3DF ; Extended_Pictographic# 7.0 [12] (🏔️..🏟️) snow-capped mountain..stadium
+1F3E0..1F3F0 ; Extended_Pictographic# 6.0 [17] (🏠..🏰) house..castle
+1F3F1..1F3F7 ; Extended_Pictographic# 7.0 [7] (🏱️..🏷️) WHITE PENNANT..label
+1F3F8..1F3FA ; Extended_Pictographic# 8.0 [3] (🏸..🏺) badminton..amphora
+1F400..1F43E ; Extended_Pictographic# 6.0 [63] (🐀..🐾) rat..paw prints
+1F43F ; Extended_Pictographic# 7.0 [1] (🐿️) chipmunk
+1F440 ; Extended_Pictographic# 6.0 [1] (👀) eyes
+1F441 ; Extended_Pictographic# 7.0 [1] (👁️) eye
+1F442..1F4F7 ; Extended_Pictographic# 6.0[182] (👂..📷) ear..camera
+1F4F8 ; Extended_Pictographic# 7.0 [1] (📸) camera with flash
+1F4F9..1F4FC ; Extended_Pictographic# 6.0 [4] (📹..📼) video camera..videocassette
+1F4FD..1F4FE ; Extended_Pictographic# 7.0 [2] (📽️..📾️) film projector..PORTABLE STEREO
+1F4FF ; Extended_Pictographic# 8.0 [1] (📿) prayer beads
+1F500..1F53D ; Extended_Pictographic# 6.0 [62] (🔀..🔽) shuffle tracks button..downwards button
+1F546..1F54A ; Extended_Pictographic# 7.0 [5] (🕆️..🕊️) WHITE LATIN CROSS..dove
+1F54B..1F54F ; Extended_Pictographic# 8.0 [5] (🕋..🕏️) kaaba..BOWL OF HYGIEIA
+1F550..1F567 ; Extended_Pictographic# 6.0 [24] (🕐..🕧) one o’clock..twelve-thirty
+1F568..1F579 ; Extended_Pictographic# 7.0 [18] (🕨️..🕹️) RIGHT SPEAKER..joystick
+1F57A ; Extended_Pictographic# 9.0 [1] (🕺) man dancing
+1F57B..1F5A3 ; Extended_Pictographic# 7.0 [41] (🕻️..🖣️) LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
+1F5A4 ; Extended_Pictographic# 9.0 [1] (🖤) black heart
+1F5A5..1F5FA ; Extended_Pictographic# 7.0 [86] (🖥️..🗺️) desktop computer..world map
+1F5FB..1F5FF ; Extended_Pictographic# 6.0 [5] (🗻..🗿) mount fuji..moai
+1F600 ; Extended_Pictographic# 6.1 [1] (😀) grinning face
+1F601..1F610 ; Extended_Pictographic# 6.0 [16] (😁..😐) beaming face with smiling eyes..neutral face
+1F611 ; Extended_Pictographic# 6.1 [1] (😑) expressionless face
+1F612..1F614 ; Extended_Pictographic# 6.0 [3] (😒..😔) unamused face..pensive face
+1F615 ; Extended_Pictographic# 6.1 [1] (😕) confused face
+1F616 ; Extended_Pictographic# 6.0 [1] (😖) confounded face
+1F617 ; Extended_Pictographic# 6.1 [1] (😗) kissing face
+1F618 ; Extended_Pictographic# 6.0 [1] (😘) face blowing a kiss
+1F619 ; Extended_Pictographic# 6.1 [1] (😙) kissing face with smiling eyes
+1F61A ; Extended_Pictographic# 6.0 [1] (😚) kissing face with closed eyes
+1F61B ; Extended_Pictographic# 6.1 [1] (😛) face with tongue
+1F61C..1F61E ; Extended_Pictographic# 6.0 [3] (😜..😞) winking face with tongue..disappointed face
+1F61F ; Extended_Pictographic# 6.1 [1] (😟) worried face
+1F620..1F625 ; Extended_Pictographic# 6.0 [6] (😠..😥) angry face..sad but relieved face
+1F626..1F627 ; Extended_Pictographic# 6.1 [2] (😦..😧) frowning face with open mouth..anguished face
+1F628..1F62B ; Extended_Pictographic# 6.0 [4] (😨..😫) fearful face..tired face
+1F62C ; Extended_Pictographic# 6.1 [1] (😬) grimacing face
+1F62D ; Extended_Pictographic# 6.0 [1] (😭) loudly crying face
+1F62E..1F62F ; Extended_Pictographic# 6.1 [2] (😮..😯) face with open mouth..hushed face
+1F630..1F633 ; Extended_Pictographic# 6.0 [4] (😰..😳) anxious face with sweat..flushed face
+1F634 ; Extended_Pictographic# 6.1 [1] (😴) sleeping face
+1F635..1F640 ; Extended_Pictographic# 6.0 [12] (😵..🙀) dizzy face..weary cat face
+1F641..1F642 ; Extended_Pictographic# 7.0 [2] (🙁..🙂) slightly frowning face..slightly smiling face
+1F643..1F644 ; Extended_Pictographic# 8.0 [2] (🙃..🙄) upside-down face..face with rolling eyes
+1F645..1F64F ; Extended_Pictographic# 6.0 [11] (🙅..🙏) person gesturing NO..folded hands
+1F680..1F6C5 ; Extended_Pictographic# 6.0 [70] (🚀..🛅) rocket..left luggage
+1F6C6..1F6CF ; Extended_Pictographic# 7.0 [10] (🛆️..🛏️) TRIANGLE WITH ROUNDED CORNERS..bed
+1F6D0 ; Extended_Pictographic# 8.0 [1] (🛐) place of worship
+1F6D1..1F6D2 ; Extended_Pictographic# 9.0 [2] (🛑..🛒) stop sign..shopping cart
+1F6D3..1F6D4 ; Extended_Pictographic# 10.0 [2] (🛓️..🛔️) STUPA..PAGODA
+1F6D5..1F6DF ; Extended_Pictographic# NA [11] (🛕️..🛟️) ..
+1F6E0..1F6EC ; Extended_Pictographic# 7.0 [13] (🛠️..🛬) hammer and wrench..airplane arrival
+1F6ED..1F6EF ; Extended_Pictographic# NA [3] (🛭️..🛯️) ..
+1F6F0..1F6F3 ; Extended_Pictographic# 7.0 [4] (🛰️..🛳️) satellite..passenger ship
+1F6F4..1F6F6 ; Extended_Pictographic# 9.0 [3] (🛴..🛶) kick scooter..canoe
+1F6F7..1F6F8 ; Extended_Pictographic# 10.0 [2] (🛷..🛸) sled..flying saucer
+1F6F9 ; Extended_Pictographic# 11.0 [1] (🛹) skateboard
+1F6FA..1F6FF ; Extended_Pictographic# NA [6] (🛺️..🛿️) ..
+1F774..1F77F ; Extended_Pictographic# NA [12] (🝴️..🝿️) ..
+1F7D5..1F7D8 ; Extended_Pictographic# 11.0 [4] (🟕️..🟘️) CIRCLED TRIANGLE..NEGATIVE CIRCLED SQUARE
+1F7D9..1F7FF ; Extended_Pictographic# NA [39] (🟙️..🟿️) ..
+1F80C..1F80F ; Extended_Pictographic# NA [4] (🠌️..🠏️) ..
+1F848..1F84F ; Extended_Pictographic# NA [8] (🡈️..🡏️) ..
+1F85A..1F85F ; Extended_Pictographic# NA [6] (🡚️..🡟️) ..
+1F888..1F88F ; Extended_Pictographic# NA [8] (🢈️..🢏️) ..
+1F8AE..1F8FF ; Extended_Pictographic# NA [82] (🢮️..🣿️) ..
+1F90C..1F90F ; Extended_Pictographic# NA [4] (🤌️..🤏️) ..
+1F910..1F918 ; Extended_Pictographic# 8.0 [9] (🤐..🤘) zipper-mouth face..sign of the horns
+1F919..1F91E ; Extended_Pictographic# 9.0 [6] (🤙..🤞) call me hand..crossed fingers
+1F91F ; Extended_Pictographic# 10.0 [1] (🤟) love-you gesture
+1F920..1F927 ; Extended_Pictographic# 9.0 [8] (🤠..🤧) cowboy hat face..sneezing face
+1F928..1F92F ; Extended_Pictographic# 10.0 [8] (🤨..🤯) face with raised eyebrow..exploding head
+1F930 ; Extended_Pictographic# 9.0 [1] (🤰) pregnant woman
+1F931..1F932 ; Extended_Pictographic# 10.0 [2] (🤱..🤲) breast-feeding..palms up together
+1F933..1F93A ; Extended_Pictographic# 9.0 [8] (🤳..🤺) selfie..person fencing
+1F93C..1F93E ; Extended_Pictographic# 9.0 [3] (🤼..🤾) people wrestling..person playing handball
+1F93F ; Extended_Pictographic# NA [1] (🤿️)
+1F940..1F945 ; Extended_Pictographic# 9.0 [6] (🥀..🥅) wilted flower..goal net
+1F947..1F94B ; Extended_Pictographic# 9.0 [5] (🥇..🥋) 1st place medal..martial arts uniform
+1F94C ; Extended_Pictographic# 10.0 [1] (🥌) curling stone
+1F94D..1F94F ; Extended_Pictographic# 11.0 [3] (🥍..🥏) lacrosse..flying disc
+1F950..1F95E ; Extended_Pictographic# 9.0 [15] (🥐..🥞) croissant..pancakes
+1F95F..1F96B ; Extended_Pictographic# 10.0 [13] (🥟..🥫) dumpling..canned food
+1F96C..1F970 ; Extended_Pictographic# 11.0 [5] (🥬..🥰) leafy green..smiling face with 3 hearts
+1F971..1F972 ; Extended_Pictographic# NA [2] (🥱️..🥲️) ..
+1F973..1F976 ; Extended_Pictographic# 11.0 [4] (🥳..🥶) partying face..cold face
+1F977..1F979 ; Extended_Pictographic# NA [3] (🥷️..🥹️) ..
+1F97A ; Extended_Pictographic# 11.0 [1] (🥺) pleading face
+1F97B ; Extended_Pictographic# NA [1] (🥻️)
+1F97C..1F97F ; Extended_Pictographic# 11.0 [4] (🥼..🥿) lab coat..woman’s flat shoe
+1F980..1F984 ; Extended_Pictographic# 8.0 [5] (🦀..🦄) crab..unicorn face
+1F985..1F991 ; Extended_Pictographic# 9.0 [13] (🦅..🦑) eagle..squid
+1F992..1F997 ; Extended_Pictographic# 10.0 [6] (🦒..🦗) giraffe..cricket
+1F998..1F9A2 ; Extended_Pictographic# 11.0 [11] (🦘..🦢) kangaroo..swan
+1F9A3..1F9AF ; Extended_Pictographic# NA [13] (🦣️..🦯️) ..
+1F9B0..1F9B9 ; Extended_Pictographic# 11.0 [10] (🦰..🦹) red-haired..supervillain
+1F9BA..1F9BF ; Extended_Pictographic# NA [6] (🦺️..🦿️) ..
+1F9C0 ; Extended_Pictographic# 8.0 [1] (🧀) cheese wedge
+1F9C1..1F9C2 ; Extended_Pictographic# 11.0 [2] (🧁..🧂) cupcake..salt
+1F9C3..1F9CF ; Extended_Pictographic# NA [13] (🧃️..🧏️) ..
+1F9D0..1F9E6 ; Extended_Pictographic# 10.0 [23] (🧐..🧦) face with monocle..socks
+1F9E7..1F9FF ; Extended_Pictographic# 11.0 [25] (🧧..🧿) red envelope..nazar amulet
+1FA00..1FA5F ; Extended_Pictographic# NA [96] (🨀️..🩟️) ..
+1FA60..1FA6D ; Extended_Pictographic# 11.0 [14] (🩠️..🩭️) XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER
+1FA6E..1FFFD ; Extended_Pictographic# NA[1424] (🩮️..🿽️) ..
+
+# Total elements: 3793
+
+#EOF
diff --git a/maint/ucptest.c b/maint/ucptest.c
index 8e305ba..a3dfd26 100644
--- a/maint/ucptest.c
+++ b/maint/ucptest.c
@@ -2,7 +2,7 @@
 * A program for testing the Unicode property table *
 ***************************************************/
 
-/* Copyright (c) University of Cambridge 2008 - 2014 */
+/* Copyright (c) University of Cambridge 2008 - 2018 */
 
 /* Compile thus:
    gcc -DHAVE_CONFIG_H -DPCRE2_CODE_UNIT_WIDTH=8 -o ucptest \
@@ -123,7 +123,13 @@ switch(gbprop)
   case ucp_gbT: graphbreak = US"Hangul syllable type T"; break;
   case ucp_gbLV: graphbreak = US"Hangul syllable type LV"; break;
   case ucp_gbLVT: graphbreak = US"Hangul syllable type LVT"; break;
+  case ucp_gbRegionalIndicator:
+    graphbreak = US"Regional Indicator"; break;
   case ucp_gbOther: graphbreak = US"Other"; break;
+  case ucp_gbZWJ: graphbreak = US"Zero Width Joiner"; break;
+  case ucp_gbExtended_Pictographic:
+    graphbreak = US"Extended Pictographic"; break;
+  default: graphbreak = US"Unknown"; break;
   }
 
 switch(script)
@@ -268,6 +274,27 @@ switch(script)
   case ucp_Multani: scriptname = US"Multani"; break;
   case ucp_Old_Hungarian: scriptname = US"Old_Hungarian"; break;
   case ucp_SignWriting: scriptname = US"SignWriting"; break;
+
+  /* New for Unicode 10.0.0 (no update since 8.0.0) */
+  case ucp_Adlam: scriptname = US"Adlam"; break;
+  case ucp_Bhaiksuki: scriptname = US"Bhaiksuki"; break;
+  case ucp_Marchen: scriptname = US"Marchen"; break;
+  case ucp_Newa: scriptname = US"Newa"; break;
+  case ucp_Osage: scriptname = US"Osage"; break;
+  case ucp_Tangut: scriptname = US"Tangut"; break;
+  case ucp_Masaram_Gondi: scriptname = US"Masaram_Gondi"; break;
+  case ucp_Nushu: scriptname = US"Nushu"; break;
+  case ucp_Soyombo: scriptname = US"Soyombo"; break;
+  case ucp_Zanabazar_Square: scriptname = US"Zanabazar_Square"; break;
+
+  /* New for Unicode 11.0.0 */
+  case ucp_Dogra: scriptname = US"Dogra"; break;
+  case ucp_Gunjala_Gondi: scriptname = US"Gunjala_Gondi"; break;
+  case ucp_Hanifi_Rohingya: scriptname = US"Hanifi_Rohingya"; break;
+  case ucp_Makasar: scriptname = US"Makasar"; break;
+  case ucp_Medefaidrin: scriptname = US"Medefaidrin"; break;
+  case ucp_Old_Sogdian: scriptname = US"Old_Sogdian"; break;
+  case ucp_Sogdian: scriptname = US"Sogdian"; break;
   }
 
 printf("%04x %s: %s, %s, %s", c, typename, fulltypename, scriptname, graphbreak);
diff --git a/maint/ucptestdata/testinput1 b/maint/ucptestdata/testinput1
index 5586867..c98c617 100644
--- a/maint/ucptestdata/testinput1
+++ b/maint/ucptestdata/testinput1
@@ -36,3 +36,5 @@ findprop 0d 0a 0e 0711 1b04 1111 1169 11fe ae4c ad89
 
 findprop 118a0 11ac7 16ad0
 findprop 11700 14400 108e0 11280 1d800
+
+findprop 11800 1e903 11da9 10d27 11ee0 16e48 10f27 10f30
diff --git a/maint/ucptestdata/testoutput1 b/maint/ucptestdata/testoutput1
index 7553714..67170dd 100644
--- a/maint/ucptestdata/testoutput1
+++ b/maint/ucptestdata/testoutput1
@@ -179,12 +179,12 @@ findprop a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
 00a6 Symbol: Other symbol, Common, Other
 00a7 Punctuation: Other punctuation, Common, Other
 00a8 Symbol: Modifier symbol, Common, Other
-00a9 Symbol: Other symbol, Common, Other
+00a9 Symbol: Other symbol, Common, Extended Pictographic
 00aa Letter: Other letter, Latin, Other
 00ab Punctuation: Initial punctuation, Common, Other
 00ac Symbol: Mathematical symbol, Common, Other
 00ad Control: Format, Common, Control
-00ae Symbol: Other symbol, Common, Other
+00ae Symbol: Other symbol, Common, Extended Pictographic
 00af Symbol: Modifier symbol, Common, Other
 findprop b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
 00b0 Symbol: Other symbol, Common, Other
@@ -369,3 +369,13 @@ findprop 11700 14400 108e0 11280 1d800
 108e0 Letter: Other letter, Hatran, Other
 11280 Letter: Other letter, Multani, Other
 1d800 Symbol: Other symbol, SignWriting, Other
+
+findprop 11800 1e903 11da9 10d27 11ee0 16e48 10f27 10f30
+11800 Letter: Other letter, Dogra, Other
+1e903 Letter: Upper case letter, Adlam, Other, 1e925
+11da9 Number: Decimal number, Gunjala_Gondi, Other
+10d27 Mark: Non-spacing mark, Hanifi_Rohingya, Extend
+11ee0 Letter: Other letter, Makasar, Other
+16e48 Letter: Upper case letter, Medefaidrin, Other, 16e68
+10f27 Letter: Other letter, Old_Sogdian, Other
+10f30 Letter: Other letter, Sogdian, Other
diff --git a/src/pcre2_extuni.c b/src/pcre2_extuni.c
index 11a0bfb..1741922 100644
--- a/src/pcre2_extuni.c
+++ b/src/pcre2_extuni.c
@@ -129,13 +129,13 @@ while (eptr < end_subject)
     if ((ricount & 1) != 0) break; /* Grapheme break required */
     }
 
-  /* If Extend follows E_Base[_GAZ] do not update lgb; this allows
-  any number of Extend before a following E_Modifier. */
+  /* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this
+  allows any number of them before a following Extended_Pictographic. */
 
-  if (rgb != ucp_gbExtend ||
-      (lgb != ucp_gbE_Base && lgb != ucp_gbE_Base_GAZ))
+  if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) ||
+       lgb != ucp_gbExtended_Pictographic)
     lgb = rgb;
-
+
   eptr += len;
   if (xcount != NULL) *xcount += 1;
   }
diff --git a/src/pcre2_internal.h b/src/pcre2_internal.h
index f837d15..de6d43b 100644
--- a/src/pcre2_internal.h
+++ b/src/pcre2_internal.h
@@ -1901,7 +1901,7 @@ extern const ucd_record PRIV(ucd_records)[];
 #if PCRE2_CODE_UNIT_WIDTH == 32
 extern const ucd_record PRIV(dummy_ucd_record)[];
 #endif
-extern const uint8_t PRIV(ucd_stage1)[];
+extern const uint16_t PRIV(ucd_stage1)[];
 extern const uint16_t PRIV(ucd_stage2)[];
 extern const uint32_t PRIV(ucp_gbtable)[];
 extern const uint32_t PRIV(ucp_gentype)[];
diff --git a/src/pcre2_jit_compile.c b/src/pcre2_jit_compile.c
index 80ed1c4..7875938 100644
--- a/src/pcre2_jit_compile.c
+++ b/src/pcre2_jit_compile.c
@@ -3666,7 +3666,8 @@ if (!common->utf)
 #endif
 
 OP2(SLJIT_LSHR, TMP2, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_SHIFT);
-OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1));
+OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP2, 0);
+OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1));
 OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_MASK);
 OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCD_BLOCK_SHIFT);
 OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0);
@@ -6627,7 +6628,8 @@ if (needstype || needsscript)
 #endif
 
   OP2(SLJIT_LSHR, TMP2, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_SHIFT);
-  OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1));
+  OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP2, 0);
+  OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1));
   OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_MASK);
   OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCD_BLOCK_SHIFT);
   OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0);
@@ -7254,12 +7256,13 @@ while (cc < end_subject)
     if ((ricount & 1) != 0) break; /* Grapheme break required */
     }
 
-  /* If Extend follows E_Base[_GAZ] do not update lgb; this allows
-  any number of Extend before a following E_Modifier. */
-
-  if (rgb != ucp_gbExtend || (lgb != ucp_gbE_Base && lgb != ucp_gbE_Base_GAZ))
+  /* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this
+  allows any number of them before a following Extended_Pictographic. */
+
+  if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) ||
+       lgb != ucp_gbExtended_Pictographic)
     lgb = rgb;
-
+
   prevcc = cc;
   cc += len;
   }
@@ -7309,12 +7312,13 @@ while (cc < end_subject)
     if ((ricount & 1) != 0) break; /* Grapheme break required */
     }
 
-  /* If Extend follows E_Base[_GAZ] do not update lgb; this allows
-  any number of Extend before a following E_Modifier. */
-
-  if (rgb != ucp_gbExtend || (lgb != ucp_gbE_Base && lgb != ucp_gbE_Base_GAZ))
+  /* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this
+  allows any number of them before a following Extended_Pictographic. */
+
+  if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) ||
+       lgb != ucp_gbExtended_Pictographic)
     lgb = rgb;
-
+
   cc++;
   }
diff --git a/src/pcre2_tables.c b/src/pcre2_tables.c
index 934692b..83d6f9d 100644
--- a/src/pcre2_tables.c
+++ b/src/pcre2_tables.c
@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 
                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-         New API code Copyright (c) 2016-2017 University of Cambridge
+          New API code Copyright (c) 2016-2018 University of Cambridge
 
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -137,9 +137,10 @@ const uint32_t PRIV(ucp_gentype)[] = {
 
 /* This table encodes the rules for finding the end of an extended grapheme
 cluster. Every code point has a grapheme break property which is one of the
-ucp_gbXX values defined in pcre2_ucp.h. The 2-dimensional table is indexed by
-the properties of two adjacent code points. The left property selects a word
-from the table, and the right property selects a bit from that word like this:
+ucp_gbXX values defined in pcre2_ucp.h. These changed between Unicode versions
+10 and 11. The 2-dimensional table is indexed by the properties of two adjacent
+code points. The left property selects a word from the table, and the right
+property selects a bit from that word like this:
 
   PRIV(ucp_gbtable)[left-property] & (1 << right-property)
 
@@ -166,49 +167,41 @@ are implementing).
 
 6. Do not break after Prepend characters.
 
-7. Do not break within emoji modifier sequences (E_Base or E_Base_GAZ followed
-   by E_Modifier). Extend characters are allowed before the modifier; this
-   cannot be represented in this table, the code has to deal with it.
+7. Do not break within emoji modifier sequences or emoji zwj sequences. That
+   is, do not break between characters with the Extended_Pictographic property.
+   Extend and ZWJ characters are allowed between the characters; this cannot be
+   represented in this table, the code has to deal with it.
 
-8. Do not break within emoji zwj sequences (ZWJ followed by Glue_After_Zwj or
-   E_Base_GAZ).
-
-9. Do not break within emoji flag sequences. That is, do not break between
+8. Do not break within emoji flag sequences. That is, do not break between
    regional indicator (RI) symbols if there are an odd number of RI characters
    before the break point. This table encodes "join RI characters"; the code
    has to deal with checking for previous adjoining RIs.
 
-10. Otherwise, break everywhere.
+9. Otherwise, break everywhere.
 */
 
 #define ESZ (1<