Update to Unicode 10.0.0 and add callout_no_where to pcre2test to aid testing.

Philip.Hazel 2017-07-02 16:32:01 +00:00
parent b7d5cee61f
commit 41bb787fb3
22 changed files with 6797 additions and 3360 deletions


@ -209,6 +209,9 @@ much faster.
because this can give a fast "no match" without searching for a "required code because this can give a fast "no match" without searching for a "required code
unit". Previously only non-anchored patterns did this. unit". Previously only non-anchored patterns did this.
47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
48. Add the callout_no_where modifier to pcre2test.
Version 10.23 14-February-2017 Version 10.23 14-February-2017


@ -171,7 +171,10 @@ library. They are also documented in the pcre2build man page.
give large performance improvements on certain platforms, add --enable-jit to give large performance improvements on certain platforms, add --enable-jit to
the "configure" command. This support is available only for certain hardware the "configure" command. This support is available only for certain hardware
architectures. If you try to enable it on an unsupported architecture, there architectures. If you try to enable it on an unsupported architecture, there
will be a compile time error. will be a compile time error. If you are running under SELinux you may also
want to add --enable-jit-sealloc, which enables the use of an execmem
allocator in JIT that is compatible with SELinux. This has no effect if JIT
is not enabled.
. If you do not want to make use of the default support for UTF-8 Unicode . If you do not want to make use of the default support for UTF-8 Unicode
character strings in the 8-bit library, UTF-16 Unicode character strings in character strings in the 8-bit library, UTF-16 Unicode character strings in
@ -874,4 +877,4 @@ The distribution should contain the files listed below.
Philip Hazel Philip Hazel
Email local part: ph10 Email local part: ph10
Email domain: cam.ac.uk Email domain: cam.ac.uk
Last updated: 11 April 2017 Last updated: 17 June 2017


@ -170,8 +170,13 @@ Just-in-time (JIT) compiler support is included in the build by specifying
--enable-jit --enable-jit
</pre> </pre>
This support is available only for certain hardware architectures. If this This support is available only for certain hardware architectures. If this
option is set for an unsupported architecture, a building error occurs. option is set for an unsupported architecture, a building error occurs. If you
See the are running under SELinux you may also want to add
<pre>
--enable-jit-sealloc
</pre>
which enables the use of an execmem allocator in JIT that is compatible with
SELinux. This has no effect if JIT is not enabled. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a> <a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for a discussion of JIT usage. When JIT support is enabled, documentation for a discussion of JIT usage. When JIT support is enabled,
pcre2grep automatically makes use of it, unless you add pcre2grep automatically makes use of it, unless you add
@ -516,7 +521,7 @@ contains a single function called LLVMFuzzerTestOneInput() whose arguments are
a pointer to a string and the length of the string. When called, this function a pointer to a string and the length of the string. When called, this function
tries to compile the string as a pattern, and if that succeeds, to match it. tries to compile the string as a pattern, and if that succeeds, to match it.
This is done both with no options and with some random options bits that are This is done both with no options and with some random options bits that are
generated from the string. generated from the string.
</P> </P>
<P> <P>
Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b> Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
@ -529,13 +534,13 @@ file are the test string.
</P> </P>
<br><a name="SEC22" href="#TOC1">OBSOLETE OPTION</a><br> <br><a name="SEC22" href="#TOC1">OBSOLETE OPTION</a><br>
<P> <P>
In versions of PCRE2 prior to 10.30, there were two ways of handling In versions of PCRE2 prior to 10.30, there were two ways of handling
backtracking in the <b>pcre2_match()</b> function. The default was to use the backtracking in the <b>pcre2_match()</b> function. The default was to use the
system stack, but if system stack, but if
<pre> <pre>
--disable-stack-for-recursion --disable-stack-for-recursion
</pre> </pre>
was set, memory on the heap was used. From release 10.30 onwards this has was set, memory on the heap was used. From release 10.30 onwards this has
changed (the stack is no longer used) and this option now does nothing except changed (the stack is no longer used) and this option now does nothing except
give a warning. give a warning.
</P> </P>
@ -554,7 +559,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br> <br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 30 May 2017 Last updated: 17 June 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>


@ -755,6 +755,7 @@ Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is: "Common". The current list of scripts is:
</P> </P>
<P> <P>
Adlam,
Ahom, Ahom,
Anatolian_Hieroglyphs, Anatolian_Hieroglyphs,
Arabic, Arabic,
@ -765,6 +766,7 @@ Bamum,
Bassa_Vah, Bassa_Vah,
Batak, Batak,
Bengali, Bengali,
Bhaiksuki,
Bopomofo, Bopomofo,
Brahmi, Brahmi,
Braille, Braille,
@ -826,6 +828,8 @@ Mahajani,
Malayalam, Malayalam,
Mandaic, Mandaic,
Manichaean, Manichaean,
Marchen,
Masaram_Gondi,
Meetei_Mayek, Meetei_Mayek,
Mende_Kikakui, Mende_Kikakui,
Meroitic_Cursive, Meroitic_Cursive,
@ -838,7 +842,9 @@ Multani,
Myanmar, Myanmar,
Nabataean, Nabataean,
New_Tai_Lue, New_Tai_Lue,
Newa,
Nko, Nko,
Nushu,
Ogham, Ogham,
Ol_Chiki, Ol_Chiki,
Old_Hungarian, Old_Hungarian,
@ -849,6 +855,7 @@ Old_Persian,
Old_South_Arabian, Old_South_Arabian,
Old_Turkic, Old_Turkic,
Oriya, Oriya,
Osage,
Osmanya, Osmanya,
Pahawh_Hmong, Pahawh_Hmong,
Palmyrene, Palmyrene,
@ -866,6 +873,7 @@ Siddham,
SignWriting, SignWriting,
Sinhala, Sinhala,
Sora_Sompeng, Sora_Sompeng,
Soyombo,
Sundanese, Sundanese,
Syloti_Nagri, Syloti_Nagri,
Syriac, Syriac,
@ -876,6 +884,7 @@ Tai_Tham,
Tai_Viet, Tai_Viet,
Takri, Takri,
Tamil, Tamil,
Tangut,
Telugu, Telugu,
Thaana, Thaana,
Thai, Thai,
@ -885,7 +894,8 @@ Tirhuta,
Ugaritic, Ugaritic,
Vai, Vai,
Warang_Citi, Warang_Citi,
Yi. Yi,
Zanabazar_Square.
</P> </P>
<P> <P>
Each character has exactly one Unicode general category property, specified by Each character has exactly one Unicode general category property, specified by
@ -3445,7 +3455,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br> <br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 30 May 2017 Last updated: 02 July 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>


@ -568,7 +568,7 @@ Setting compilation options
</b><br> </b><br>
<P> <P>
The following modifiers set options for <b>pcre2_compile()</b>. Most of them set The following modifiers set options for <b>pcre2_compile()</b>. Most of them set
bits in the options argument of that function, but those whose names start with bits in the options argument of that function, but those whose names start with
PCRE2_EXTRA are additional options that are set in the compile context. For the PCRE2_EXTRA are additional options that are set in the compile context. For the
main options, there are some single-letter abbreviations that are the same as main options, there are some single-letter abbreviations that are the same as
Perl options. There is special handling for /x: if a second x is present, Perl options. There is special handling for /x: if a second x is present,
@ -579,25 +579,25 @@ way <b>pcre2_compile()</b> behaves. See
for a description of the effects of these options. for a description of the effects of these options.
<pre> <pre>
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
alt_bsux set PCRE2_ALT_BSUX alt_bsux set PCRE2_ALT_BSUX
alt_circumflex set PCRE2_ALT_CIRCUMFLEX alt_circumflex set PCRE2_ALT_CIRCUMFLEX
alt_verbnames set PCRE2_ALT_VERBNAMES alt_verbnames set PCRE2_ALT_VERBNAMES
anchored set PCRE2_ANCHORED anchored set PCRE2_ANCHORED
auto_callout set PCRE2_AUTO_CALLOUT auto_callout set PCRE2_AUTO_CALLOUT
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
/i caseless set PCRE2_CASELESS /i caseless set PCRE2_CASELESS
dollar_endonly set PCRE2_DOLLAR_ENDONLY dollar_endonly set PCRE2_DOLLAR_ENDONLY
/s dotall set PCRE2_DOTALL /s dotall set PCRE2_DOTALL
dupnames set PCRE2_DUPNAMES dupnames set PCRE2_DUPNAMES
endanchored set PCRE2_ENDANCHORED endanchored set PCRE2_ENDANCHORED
/x extended set PCRE2_EXTENDED /x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE /xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE match_line set PCRE2_EXTRA_MATCH_LINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
match_word set PCRE2_EXTRA_MATCH_WORD match_word set PCRE2_EXTRA_MATCH_WORD
/m multiline set PCRE2_MULTILINE /m multiline set PCRE2_MULTILINE
never_backslash_c set PCRE2_NEVER_BACKSLASH_C never_backslash_c set PCRE2_NEVER_BACKSLASH_C
never_ucp set PCRE2_NEVER_UCP never_ucp set PCRE2_NEVER_UCP
@ -631,7 +631,7 @@ heavily used in the test files.
/B bincode show binary code without lengths /B bincode show binary code without lengths
callout_info show callout information callout_info show callout information
debug same as info,fullbincode debug same as info,fullbincode
framesize show matching frame size framesize show matching frame size
fullbincode show binary code with lengths fullbincode show binary code with lengths
/I info show info about compiled pattern /I info show info about compiled pattern
hex unquoted characters are hexadecimal hex unquoted characters are hexadecimal
@ -649,7 +649,7 @@ heavily used in the test files.
push push compiled pattern onto the stack push push compiled pattern onto the stack
pushcopy push a copy onto the stack pushcopy push a copy onto the stack
stackguard=&#60;number&#62; test the stackguard feature stackguard=&#60;number&#62; test the stackguard feature
subject_literal treat all subject lines as literal subject_literal treat all subject lines as literal
tables=[0|1|2] select internal tables tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8 utf8_input treat input as UTF-8
@ -720,7 +720,7 @@ not necessarily the last character. These lines are omitted if no starting or
ending code units are recorded. ending code units are recorded.
</P> </P>
<P> <P>
The <b>framesize</b> modifier shows the size, in bytes, of the storage frames The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
used by <b>pcre2_match()</b> for handling backtracking. The size depends on the used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
number of capturing parentheses in the pattern. number of capturing parentheses in the pattern.
</P> </P>
@ -972,8 +972,8 @@ below. All other modifiers are either ignored, with a warning message, or cause
an error. an error.
</P> </P>
<P> <P>
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
REG_PEND extension is used to pass it by length. REG_PEND extension is used to pass it by length.
</P> </P>
<br><b> <br><b>
@ -1013,7 +1013,7 @@ are mutually exclusive.
Setting certain match controls Setting certain match controls
</b><br> </b><br>
<P> <P>
The following modifiers are really subject modifiers, and are described under The following modifiers are really subject modifiers, and are described under
"Subject Modifiers" below. However, they may be included in a pattern's "Subject Modifiers" below. However, they may be included in a pattern's
modifier list, in which case they are applied to every subject line that is modifier list, in which case they are applied to every subject line that is
processed with that pattern. They may not appear in <b>#pattern</b> commands. processed with that pattern. They may not appear in <b>#pattern</b> commands.
@ -1040,9 +1040,9 @@ defaults, set them in a <b>#subject</b> command.
Specifying literal subject lines Specifying literal subject lines
</b><br> </b><br>
<P> <P>
If the <b>subject_literal</b> modifier is present on a pattern, all the subject If the <b>subject_literal</b> modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any backslashes. It is not possible to set subject modifiers on such lines, but any
that are set as defaults by a <b>#subject</b> command are recognized. that are set as defaults by a <b>#subject</b> command are recognized.
</P> </P>
<br><b> <br><b>
@ -1054,7 +1054,8 @@ pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
line to contain a new pattern (or a command) instead of a subject line. This line to contain a new pattern (or a command) instead of a subject line. This
facility is used when saving compiled patterns to a file, as described in the facility is used when saving compiled patterns to a file, as described in the
section entitled "Saving and restoring compiled patterns" section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below. If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled</a> <a href="#saverestore">below.</a>
If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
pattern is stacked, leaving the original as current, ready to match the pattern is stacked, leaving the original as current, ready to match the
following input lines. This provides a way of testing the following input lines. This provides a way of testing the
<b>pcre2_code_copy()</b> function. <b>pcre2_code_copy()</b> function.
@ -1103,18 +1104,18 @@ causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
<b>regexec()</b>. The other modifiers are ignored, with a warning message. <b>regexec()</b>. The other modifiers are ignored, with a warning message.
</P> </P>
<P> <P>
There is one additional modifier that can be used with the POSIX wrapper. It is There is one additional modifier that can be used with the POSIX wrapper. It is
ignored (with a warning) if used for non-POSIX matching. ignored (with a warning) if used for non-POSIX matching.
<pre> <pre>
posix_startend=&#60;n&#62;[:&#60;m&#62;] posix_startend=&#60;n&#62;[:&#60;m&#62;]
</pre> </pre>
This causes the subject string to be passed to <b>regexec()</b> using the This causes the subject string to be passed to <b>regexec()</b> using the
REG_STARTEND option, which uses offsets to specify which part of the string is REG_STARTEND option, which uses offsets to specify which part of the string is
searched. If only one number is given, the end offset is passed as the end of searched. If only one number is given, the end offset is passed as the end of
the subject string. For more detail of REG_STARTEND, see the the subject string. For more detail of REG_STARTEND, see the
<a href="pcre2posix.html"><b>pcre2posix</b></a> <a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation. If the subject string contains binary zeros (coded as escapes documentation. If the subject string contains binary zeros (coded as escapes
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
its input), you must use <b>posix_startend</b> to specify its length. its input), you must use <b>posix_startend</b> to specify its length.
</P> </P>
<br><b> <br><b>
@ -1135,6 +1136,7 @@ pattern.
callout_data=&#60;n&#62; set a value to pass via callouts callout_data=&#60;n&#62; set a value to pass via callouts
callout_error=&#60;n&#62;[:&#60;m&#62;] control callout error callout_error=&#60;n&#62;[:&#60;m&#62;] control callout error
callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function callout_none do not supply a callout function
copy=&#60;number or name&#62; copy captured substring copy=&#60;number or name&#62; copy captured substring
depth_limit=&#60;n&#62; set a depth limit depth_limit=&#60;n&#62; set a depth limit
@ -1230,29 +1232,10 @@ Testing callouts
</b><br> </b><br>
<P> <P>
A callout function is supplied when <b>pcre2test</b> calls the library matching A callout function is supplied when <b>pcre2test</b> calls the library matching
functions, unless <b>callout_none</b> is specified. If <b>callout_capture</b> is functions, unless <b>callout_none</b> is specified. Its behaviour can be
set, the current captured groups are output when a callout occurs. The default controlled by various modifiers listed above whose names begin with
return from the callout function is zero, which allows matching to continue. <b>callout_</b>. Details are given in the section entitled "Callouts"
</P> <a href="#callouts">below.</a>
<P>
The <b>callout_fail</b> modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 (causing matching to backtrack)
when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;) are given, 1
is returned when callout &#60;n&#62; is reached and there have been at least &#60;m&#62;
callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence.
</P>
<P>
Note that callouts with string arguments are always given the number zero. See
"Callouts" below for a description of the output when a callout is taken.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P> </P>
<br><b> <br><b>
Finding all matches in a string Finding all matches in a string
@ -1384,7 +1367,7 @@ that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Setting optimization is not being used. The value is a number of kilobytes. Setting
zero reverts to the default of 32K. Providing a stack that is larger than the zero reverts to the default of 32K. Providing a stack that is larger than the
default is necessary only for very complicated patterns. If <b>jitstack</b> is default is necessary only for very complicated patterns. If <b>jitstack</b> is
set non-zero on a subject line it overrides any value that was set on the set non-zero on a subject line it overrides any value that was set on the
pattern. pattern.
</P> </P>
<br><b> <br><b>
@ -1414,7 +1397,7 @@ The <i>match_limit</i> number is a measure of the amount of backtracking
that takes place, and learning the minimum value can be instructive. For most that takes place, and learning the minimum value can be instructive. For most
simple matches, the number is quite small, but for patterns with very large simple matches, the number is quite small, but for patterns with very large
numbers of matching possibilities, it can become large very quickly with numbers of matching possibilities, it can become large very quickly with
increasing length of subject string. increasing length of subject string.
</P> </P>
<P> <P>
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
@ -1660,7 +1643,7 @@ restart the match with additional subject data by means of the
For further information about partial matching, see the For further information about partial matching, see the
<a href="pcre2partial.html"><b>pcre2partial</b></a> <a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation. documentation.
</P> <a name="callouts"></a></P>
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br> <br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
<P> <P>
If the pattern contains any callout requests, <b>pcre2test</b>'s callout If the pattern contains any callout requests, <b>pcre2test</b>'s callout
@ -1669,8 +1652,33 @@ This works with both matching functions.
</P> </P>
<P> <P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line (as default, but you can use a <b>callout_fail</b> modifier in a subject line to
described above) to change this and other parameters of the callout. change this and other parameters of the callout.
</P>
<P>
If <b>callout_capture</b> is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
<b>callout_no_where</b> modifier is set.
</P>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P> </P>
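The return-value rules described above for <b>callout_fail</b> and <b>callout_error</b> can be sketched in C. This is a minimal model of the documented behaviour only, not the code in pcre2test: the names `callout_rule`, `sketch_callout_return` and `SKETCH_ERROR_CALLOUT` are hypothetical, the constant stands in for PCRE2_ERROR_CALLOUT from pcre2.h, and the count is assumed to include the current callout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PCRE2_ERROR_CALLOUT; real code takes it from pcre2.h. */
#define SKETCH_ERROR_CALLOUT (-1000)

/* One callout_fail or callout_error setting of the form <n>[:<m>]. */
typedef struct {
    bool     set;        /* true if the modifier was given at all */
    unsigned number;     /* <n>: the callout number to act on */
    unsigned min_count;  /* <m>: minimum callouts so far; 0 if <m> was omitted */
} callout_rule;

/* A rule fires when its callout number is reached and enough callouts
   have happened (assumption: the count includes the current callout). */
static bool rule_fires(const callout_rule *r, unsigned number, unsigned count)
{
    return r->set && number == r->number && count >= r->min_count;
}

/* Returns 0 to continue matching, 1 to force a backtrack, or the error
   code to abort the whole match. callout_error takes precedence when
   both rules apply to the same callout. */
int sketch_callout_return(unsigned callout_number, unsigned callouts_so_far,
                          const callout_rule *fail, const callout_rule *error)
{
    assert(fail != NULL && error != NULL);
    if (rule_fires(error, callout_number, callouts_so_far))
        return SKETCH_ERROR_CALLOUT;
    if (rule_fires(fail, callout_number, callouts_so_far))
        return 1;
    return 0;
}
```

The sketch leaves out <b>callout_data</b>, because the text does not say how its non-zero return value interacts with the other two modifiers.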
<P> <P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check Inserting callouts can be helpful when using <b>pcre2test</b> to check
@ -1858,7 +1866,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 16 June 2017 Last updated: 02 July 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -3543,9 +3543,14 @@ JUST-IN-TIME COMPILER SUPPORT
This support is available only for certain hardware architectures. If This support is available only for certain hardware architectures. If
this option is set for an unsupported architecture, a building error this option is set for an unsupported architecture, a building error
occurs. See the pcre2jit documentation for a discussion of JIT usage. occurs. If you are running under SELinux you may also want to add
When JIT support is enabled, pcre2grep automatically makes use of it,
unless you add --enable-jit-sealloc
which enables the use of an execmem allocator in JIT that is compatible
with SELinux. This has no effect if JIT is not enabled. See the
pcre2jit documentation for a discussion of JIT usage. When JIT support
is enabled, pcre2grep automatically makes use of it, unless you add
--disable-pcre2grep-jit --disable-pcre2grep-jit
@ -3554,14 +3559,14 @@ JUST-IN-TIME COMPILER SUPPORT
NEWLINE RECOGNITION NEWLINE RECOGNITION
By default, PCRE2 interprets the linefeed (LF) character as indicating By default, PCRE2 interprets the linefeed (LF) character as indicating
the end of a line. This is the normal newline character on Unix-like the end of a line. This is the normal newline character on Unix-like
systems. You can compile PCRE2 to use carriage return (CR) instead, by systems. You can compile PCRE2 to use carriage return (CR) instead, by
adding adding
--enable-newline-is-cr --enable-newline-is-cr
to the configure command. There is also an --enable-newline-is-lf to the configure command. There is also an --enable-newline-is-lf
option, which explicitly specifies linefeed as the newline character. option, which explicitly specifies linefeed as the newline character.
Alternatively, you can specify that line endings are to be indicated by Alternatively, you can specify that line endings are to be indicated by
@ -3574,104 +3579,104 @@ NEWLINE RECOGNITION
--enable-newline-is-anycrlf --enable-newline-is-anycrlf
which causes PCRE2 to recognize any of the three sequences CR, LF, or which causes PCRE2 to recognize any of the three sequences CR, LF, or
CRLF as indicating a line ending. Finally, a fifth option, specified by CRLF as indicating a line ending. Finally, a fifth option, specified by
--enable-newline-is-any --enable-newline-is-any
causes PCRE2 to recognize any Unicode newline sequence. The Unicode causes PCRE2 to recognize any Unicode newline sequence. The Unicode
newline sequences are the three just mentioned, plus the single charac- newline sequences are the three just mentioned, plus the single charac-
ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+0085), LS (line separator, U+2028), and PS (paragraph separator,
U+2029). U+2029).
Whatever default line ending convention is selected when PCRE2 is built Whatever default line ending convention is selected when PCRE2 is built
can be overridden by applications that use the library. At build time can be overridden by applications that use the library. At build time
it is conventional to use the standard for your operating system. it is conventional to use the standard for your operating system.
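The single-character cases in the list above can be written as a small C predicate. This is an illustrative sketch rather than PCRE2's own newline code, and the function names are hypothetical. The two-character CRLF sequence needs a look at the following character, so it is not covered by a per-character test.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the single characters accepted under the "anycrlf" convention. */
bool sketch_is_anycrlf_char(uint32_t c)
{
    return c == 0x000A /* LF */ || c == 0x000D /* CR */;
}

/* True for every single character that counts as a Unicode newline
   under the "any" convention described in the text. */
bool sketch_is_unicode_newline_char(uint32_t c)
{
    switch (c) {
    case 0x000A: /* LF, linefeed */
    case 0x000B: /* VT, vertical tab */
    case 0x000C: /* FF, form feed */
    case 0x000D: /* CR, carriage return */
    case 0x0085: /* NEL, next line */
    case 0x2028: /* LS, line separator */
    case 0x2029: /* PS, paragraph separator */
        return true;
    default:
        return false;
    }
}
```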
WHAT \R MATCHES WHAT \R MATCHES
By default, the sequence \R in a pattern matches any Unicode newline By default, the sequence \R in a pattern matches any Unicode newline
sequence, independently of what has been selected as the line ending sequence, independently of what has been selected as the line ending
sequence. If you specify sequence. If you specify
--enable-bsr-anycrlf --enable-bsr-anycrlf
the default is changed so that \R matches only CR, LF, or CRLF. What- the default is changed so that \R matches only CR, LF, or CRLF. What-
ever is selected when PCRE2 is built can be overridden by applications ever is selected when PCRE2 is built can be overridden by applications
that use the library. that use the library.
HANDLING VERY LARGE PATTERNS HANDLING VERY LARGE PATTERNS
Within a compiled pattern, offset values are used to point from one Within a compiled pattern, offset values are used to point from one
part to another (for example, from an opening parenthesis to an alter- part to another (for example, from an opening parenthesis to an alter-
nation metacharacter). By default, in the 8-bit and 16-bit libraries, nation metacharacter). By default, in the 8-bit and 16-bit libraries,
two-byte values are used for these offsets, leading to a maximum size two-byte values are used for these offsets, leading to a maximum size
for a compiled pattern of around 64K code units. This is sufficient to for a compiled pattern of around 64K code units. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do handle all but the most gigantic patterns. Nevertheless, some people do
want to process truly enormous patterns, so it is possible to compile want to process truly enormous patterns, so it is possible to compile
PCRE2 to use three-byte or four-byte offsets by adding a setting such PCRE2 to use three-byte or four-byte offsets by adding a setting such
as as
--with-link-size=3 --with-link-size=3
to the configure command. The value given must be 2, 3, or 4. For the to the configure command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. In these libraries, 16-bit library, a value of 3 is rounded up to 4. In these libraries,
using longer offsets slows down the operation of PCRE2 because it has using longer offsets slows down the operation of PCRE2 because it has
to load additional data when handling them. For the 32-bit library the to load additional data when handling them. For the 32-bit library the
value is always 4 and cannot be overridden; the value of --with-link- value is always 4 and cannot be overridden; the value of --with-link-
size is ignored. size is ignored.
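The link-size rules above reduce to a few lines of arithmetic. The C sketch below is an illustration of the documented rules, not code from the PCRE2 build system, and both function names are hypothetical. The second function only counts how many distinct values an offset of a given width can hold, which for two bytes gives the "around 64K" figure quoted above.

```c
#include <stdint.h>

/* Effective link size in bytes for a library with the given code unit
   width (8, 16 or 32 bits) and a requested --with-link-size value.
   Returns 0 for a value outside the documented 2, 3, 4 range or an
   unknown code unit width. */
unsigned sketch_effective_link_size(unsigned code_unit_bits, unsigned requested)
{
    if (requested < 2 || requested > 4)
        return 0;
    switch (code_unit_bits) {
    case 8:
        return requested;
    case 16:
        return requested == 3 ? 4 : requested;  /* 3 is rounded up to 4 */
    case 32:
        return 4;                               /* always 4, request ignored */
    default:
        return 0;
    }
}

/* Number of distinct values representable in an offset of this many bytes. */
uint64_t sketch_offset_range(unsigned link_size_bytes)
{
    return (uint64_t)1 << (8 * link_size_bytes);
}
```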
LIMITING PCRE2 RESOURCE USAGE LIMITING PCRE2 RESOURCE USAGE
The pcre2_match() function increments a counter each time it goes round The pcre2_match() function increments a counter each time it goes round
its main loop. Putting a limit on this counter controls the amount of its main loop. Putting a limit on this counter controls the amount of
computing resource used by a single call to pcre2_match(). The limit computing resource used by a single call to pcre2_match(). The limit
can be changed at run time, as described in the pcre2api documentation. can be changed at run time, as described in the pcre2api documentation.
The default is 10 million, but this can be changed by adding a setting The default is 10 million, but this can be changed by adding a setting
such as such as
--with-match-limit=500000 --with-match-limit=500000
to the configure command. This setting also applies to the to the configure command. This setting also applies to the
pcre2_dfa_match() matching function, and to JIT matching (though the pcre2_dfa_match() matching function, and to JIT matching (though the
counting is done differently). counting is done differently).
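The counter-and-limit idea described above can be modelled in a few lines of C. This is a toy loop, not the pcre2_match() implementation: the function name and `SKETCH_ERROR_MATCHLIMIT` are hypothetical (the real error code is defined in pcre2.h), and the behaviour when the counter is exactly at the limit is an assumption.

```c
#include <stdint.h>

/* Default quoted in the text: 10 million. */
#define SKETCH_DEFAULT_MATCH_LIMIT 10000000u

/* Hypothetical stand-in for the library's match-limit error code. */
#define SKETCH_ERROR_MATCHLIMIT (-1001)

/* Simulates a matcher whose main loop needs iterations_needed passes.
   The counter goes up once per pass. As soon as a pass would exceed
   match_limit, the call gives up with an error instead of continuing. */
int sketch_run_main_loop(uint32_t iterations_needed, uint32_t match_limit)
{
    uint32_t counter = 0;
    for (uint32_t i = 0; i < iterations_needed; i++) {
        if (counter++ >= match_limit)
            return SKETCH_ERROR_MATCHLIMIT;
        /* one step of matching work would happen here */
    }
    return 0;
}
```

A lower limit makes a pathological pattern fail quickly instead of consuming CPU time, at the cost of rejecting some legitimate but expensive matches.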
The pcre2_match() function starts out using a 20K vector on the system The pcre2_match() function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking stack to record backtracking points. The more nested backtracking
points there are (that is, the deeper the search tree), the more memory points there are (that is, the deeper the search tree), the more memory
is needed. If the initial vector is not large enough, heap memory is is needed. If the initial vector is not large enough, heap memory is
used, up to a certain limit, which is specified in kilobytes. The limit used, up to a certain limit, which is specified in kilobytes. The limit
can be changed at run time, as described in the pcre2api documentation. can be changed at run time, as described in the pcre2api documentation.
The default limit (in effect unlimited) is 20 million. You can change The default limit (in effect unlimited) is 20 million. You can change
this by a setting such as this by a setting such as
--with-heap-limit=500 --with-heap-limit=500
which limits the amount of heap to 500 kilobytes. This limit applies which limits the amount of heap to 500 kilobytes. This limit applies
only to interpretive matching in pcre2_match(). It does not apply when only to interpretive matching in pcre2_match(). It does not apply when
JIT (which has its own memory arrangements) is used, nor does it apply JIT (which has its own memory arrangements) is used, nor does it apply
to pcre2_dfa_match(). to pcre2_dfa_match().
You can also explicitly limit the depth of nested backtracking in the You can also explicitly limit the depth of nested backtracking in the
pcre2_match() interpreter. This limit defaults to the value that is set pcre2_match() interpreter. This limit defaults to the value that is set
for --with-match-limit. You can set a lower default limit by adding, for --with-match-limit. You can set a lower default limit by adding,
for example, for example,
--with-match-limit-depth=10000 --with-match-limit-depth=10000
to the configure command. This value can be overridden at run time. to the configure command. This value can be overridden at run time.
This depth limit indirectly limits the amount of heap memory that is This depth limit indirectly limits the amount of heap memory that is
used, but because the size of each backtracking "frame" depends on the used, but because the size of each backtracking "frame" depends on the
number of capturing parentheses in a pattern, the amount of heap that number of capturing parentheses in a pattern, the amount of heap that
is used before the limit is reached varies from pattern to pattern. is used before the limit is reached varies from pattern to pattern.
This limit was more useful in versions before 10.30, where function This limit was more useful in versions before 10.30, where function
recursion was used for backtracking. However, as well as applying to recursion was used for backtracking. However, as well as applying to
pcre2_match(), this limit also controls the depth of recursive function pcre2_match(), this limit also controls the depth of recursive function
calls in pcre2_dfa_match(). These are used for lookaround assertions, calls in pcre2_dfa_match(). These are used for lookaround assertions,
atomic groups, and recursion within patterns. The limit does not apply atomic groups, and recursion within patterns. The limit does not apply
to JIT matching. to JIT matching.
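As a rough illustration of why one depth limit corresponds to different amounts of heap for different patterns, the sketch below multiplies a nesting depth by a frame size. The frame sizes are invented for illustration (the framesize modifier of pcre2test reports real ones), and the model ignores the initial 20K stack vector and the way the heap vector actually grows, so it is an estimate under those assumptions rather than a description of what pcre2_match() does internally.

```python
def estimated_heap_kib(depth, frame_size_bytes):
    """Approximate heap, in kibibytes, needed to hold `depth` backtracking frames."""
    return depth * frame_size_bytes / 1024


def depth_reached_within(heap_limit_kib, frame_size_bytes):
    """Largest nesting depth whose frames fit inside a heap limit given in kibibytes."""
    return (heap_limit_kib * 1024) // frame_size_bytes


# Hypothetical frame sizes: a pattern with more capturing parentheses
# is assumed to need a larger frame.
small_frame = 128
large_frame = 512

for frame in (small_frame, large_frame):
    print(frame, estimated_heap_kib(10000, frame), depth_reached_within(500, frame))
```

With these made-up numbers, a depth limit of 10000 allows four times as much heap for the larger frame as for the smaller one, while a 500-kilobyte heap limit is reached at a quarter of the depth.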
@ -3680,45 +3685,45 @@ CREATING CHARACTER TABLES AT BUILD TIME
PCRE2 uses fixed tables for processing characters whose code points are PCRE2 uses fixed tables for processing characters whose code points are
less than 256. By default, PCRE2 is built with a set of tables that are less than 256. By default, PCRE2 is built with a set of tables that are
distributed in the file src/pcre2_chartables.c.dist. These tables are distributed in the file src/pcre2_chartables.c.dist. These tables are
for ASCII codes only. If you add for ASCII codes only. If you add
--enable-rebuild-chartables --enable-rebuild-chartables
to the configure command, the distributed tables are no longer used. to the configure command, the distributed tables are no longer used.
Instead, a program called dftables is compiled and run. This outputs Instead, a program called dftables is compiled and run. This outputs
the source for a new set of tables, created in the default locale of your the source for a new set of tables, created in the default locale of your
C run-time system. This method of replacing the tables does not work if C run-time system. This method of replacing the tables does not work if
you are cross compiling, because dftables is run on the local host. If you are cross compiling, because dftables is run on the local host. If
you need to create alternative tables when cross compiling, you will you need to create alternative tables when cross compiling, you will
have to do so "by hand". have to do so "by hand".
USING EBCDIC CODE USING EBCDIC CODE
PCRE2 assumes by default that it will run in an environment where the PCRE2 assumes by default that it will run in an environment where the
character code is ASCII or Unicode, which is a superset of ASCII. This character code is ASCII or Unicode, which is a superset of ASCII. This
is the case for most computer operating systems. PCRE2 can, however, be is the case for most computer operating systems. PCRE2 can, however, be
compiled to run in an 8-bit EBCDIC environment by adding compiled to run in an 8-bit EBCDIC environment by adding
--enable-ebcdic --disable-unicode --enable-ebcdic --disable-unicode
to the configure command. This setting implies --enable-rebuild-charta- to the configure command. This setting implies --enable-rebuild-charta-
bles. You should only use it if you know that you are in an EBCDIC bles. You should only use it if you know that you are in an EBCDIC
environment (for example, an IBM mainframe operating system). environment (for example, an IBM mainframe operating system).
It is not possible to support both EBCDIC and UTF-8 codes in the same It is not possible to support both EBCDIC and UTF-8 codes in the same
version of the library. Consequently, --enable-unicode and --enable- version of the library. Consequently, --enable-unicode and --enable-
ebcdic are mutually exclusive. ebcdic are mutually exclusive.
The EBCDIC character that corresponds to an ASCII LF is assumed to have The EBCDIC character that corresponds to an ASCII LF is assumed to have
the value 0x15 by default. However, in some EBCDIC environments, 0x25 the value 0x15 by default. However, in some EBCDIC environments, 0x25
is used. In such an environment you should use is used. In such an environment you should use
--enable-ebcdic-nl25 --enable-ebcdic-nl25
as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
0x25 is not chosen as LF is made to correspond to the Unicode NEL char- 0x25 is not chosen as LF is made to correspond to the Unicode NEL char-
acter (which, in Unicode, is 0x85). acter (which, in Unicode, is 0x85).
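The newline assignments described above can be summarised in a small lookup. This is only a sketch of the rule as stated in the text; the function name and the dictionary layout are hypothetical and are not part of PCRE2.

```python
def ebcdic_newline_codes(nl25=False):
    """Return the EBCDIC code points used for CR, LF and NEL.

    nl25=True models a build configured with --enable-ebcdic-nl25.
    """
    # Whichever of 0x15 and 0x25 is not chosen as LF stands in for NEL.
    lf, nel = (0x25, 0x15) if nl25 else (0x15, 0x25)
    return {"CR": 0x0D, "LF": lf, "NEL": nel}


for flag in (False, True):
    codes = ebcdic_newline_codes(flag)
    print(flag, {name: hex(value) for name, value in codes.items()})
```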
@ -3731,34 +3736,34 @@ PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS
By default, on non-Windows systems, pcre2grep supports the use of call- By default, on non-Windows systems, pcre2grep supports the use of call-
outs with string arguments within the patterns it is matching, in order outs with string arguments within the patterns it is matching, in order
to run external scripts. For details, see the pcre2grep documentation. to run external scripts. For details, see the pcre2grep documentation.
This support can be disabled by adding --disable-pcre2grep-callout to This support can be disabled by adding --disable-pcre2grep-callout to
the configure command. the configure command.
PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT
By default, pcre2grep reads all files as plain text. You can build it By default, pcre2grep reads all files as plain text. You can build it
so that it recognizes files whose names end in .gz or .bz2, and reads so that it recognizes files whose names end in .gz or .bz2, and reads
them with libz or libbz2, respectively, by adding one or both of them with libz or libbz2, respectively, by adding one or both of
--enable-pcre2grep-libz --enable-pcre2grep-libz
--enable-pcre2grep-libbz2 --enable-pcre2grep-libbz2
to the configure command. These options naturally require that the rel- to the configure command. These options naturally require that the rel-
evant libraries are installed on your system. Configuration will fail evant libraries are installed on your system. Configuration will fail
if they are not. if they are not.
PCRE2GREP BUFFER SIZE PCRE2GREP BUFFER SIZE
pcre2grep uses an internal buffer to hold a "window" on the file it is pcre2grep uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when scanning, in order to be able to output "before" and "after" lines when
it finds a match. The starting size of the buffer is controlled by a it finds a match. The starting size of the buffer is controlled by a
parameter whose default value is 20K. The buffer itself is three times parameter whose default value is 20K. The buffer itself is three times
this size, but because of the way it is used for holding "before" this size, but because of the way it is used for holding "before"
lines, the longest line that is guaranteed to be processable is the lines, the longest line that is guaranteed to be processable is the
parameter size. If a longer line is encountered, pcre2grep automati- parameter size. If a longer line is encountered, pcre2grep automati-
cally expands the buffer, up to a specified maximum size, whose default cally expands the buffer, up to a specified maximum size, whose default
is 1M or the starting size, whichever is the larger. You can change the is 1M or the starting size, whichever is the larger. You can change the
default parameter values by adding, for example, default parameter values by adding, for example,
@ -3766,8 +3771,8 @@ PCRE2GREP BUFFER SIZE
--with-pcre2grep-bufsize=51200 --with-pcre2grep-bufsize=51200
--with-pcre2grep-max-bufsize=2097152 --with-pcre2grep-max-bufsize=2097152
to the configure command. The caller of pcre2grep can override these to the configure command. The caller of pcre2grep can override these
values by using --buffer-size and --max-buffer-size on the command values by using --buffer-size and --max-buffer-size on the command
line. line.
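To make the relationship between the two parameters concrete, here is a minimal sketch of the sizes that follow from a given setting. It assumes that "20K" means 20480 bytes and "1M" means 1048576 bytes; the function and key names are made up for this example.

```python
DEFAULT_BUFSIZE = 20 * 1024       # "20K", assumed to mean 20480 bytes
DEFAULT_MAX_FLOOR = 1024 * 1024   # "1M", assumed to mean 1048576 bytes


def pcre2grep_buffer_plan(bufsize=DEFAULT_BUFSIZE, max_bufsize=None):
    """Derive the sizes implied by the bufsize and max-bufsize parameters."""
    if max_bufsize is None:
        # Default maximum: 1M or the starting size, whichever is the larger.
        max_bufsize = max(DEFAULT_MAX_FLOOR, bufsize)
    return {
        "guaranteed_line_length": bufsize,    # longest line sure to be processable
        "initial_buffer_bytes": 3 * bufsize,  # the buffer is three times the parameter
        "max_bufsize": max_bufsize,
    }


print(pcre2grep_buffer_plan())
print(pcre2grep_buffer_plan(bufsize=51200, max_bufsize=2097152))
```

The second call corresponds to the configure settings shown above: lines of up to 51200 bytes are guaranteed to be processable, and the initial buffer occupies 153600 bytes.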
@ -3778,26 +3783,26 @@ PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
--enable-pcre2test-libreadline --enable-pcre2test-libreadline
--enable-pcre2test-libedit --enable-pcre2test-libedit
to the configure command, pcre2test is linked with the libreadline to the configure command, pcre2test is linked with the libreadline
or libedit library, respectively, and when its input is from a terminal, or libedit library, respectively, and when its input is from a terminal,
it reads it using the readline() function. This provides line-editing it reads it using the readline() function. This provides line-editing
and history facilities. Note that libreadline is GPL-licensed, so if and history facilities. Note that libreadline is GPL-licensed, so if
you distribute a binary of pcre2test linked in this way, there may be you distribute a binary of pcre2test linked in this way, there may be
licensing issues. These can be avoided by linking instead with libedit, licensing issues. These can be avoided by linking instead with libedit,
which has a BSD licence. which has a BSD licence.
Setting --enable-pcre2test-libreadline causes the -lreadline option to Setting --enable-pcre2test-libreadline causes the -lreadline option to
be added to the pcre2test build. In many operating environments with a be added to the pcre2test build. In many operating environments with a
system-installed readline library this is sufficient. However, in some system-installed readline library this is sufficient. However, in some
environments (e.g. if an unmodified distribution version of readline is environments (e.g. if an unmodified distribution version of readline is
in use), some extra configuration may be necessary. The INSTALL file in use), some extra configuration may be necessary. The INSTALL file
for libreadline says this: for libreadline says this:
"Readline uses the termcap functions, but does not link with "Readline uses the termcap functions, but does not link with
the termcap or curses library itself, allowing applications the termcap or curses library itself, allowing applications
which link with readline the to choose an appropriate library." which link with readline the to choose an appropriate library."
If your environment has not been set up so that an appropriate library If your environment has not been set up so that an appropriate library
is automatically included, you may need to add something like is automatically included, you may need to add something like
LIBS="-lncurses" LIBS="-lncurses"
@ -3811,7 +3816,7 @@ INCLUDING DEBUGGING CODE
--enable-debug --enable-debug
to the configure command, additional debugging code is included in the to the configure command, additional debugging code is included in the
build. This feature is intended for use by the PCRE2 maintainers. build. This feature is intended for use by the PCRE2 maintainers.
@ -3821,15 +3826,15 @@ DEBUGGING WITH VALGRIND SUPPORT
--enable-valgrind --enable-valgrind
to the configure command, PCRE2 will use valgrind annotations to mark to the configure command, PCRE2 will use valgrind annotations to mark
certain memory regions as unaddressable. This allows it to detect certain memory regions as unaddressable. This allows it to detect
invalid memory accesses, and is mostly useful for debugging PCRE2 invalid memory accesses, and is mostly useful for debugging PCRE2
itself. itself.
CODE COVERAGE REPORTING CODE COVERAGE REPORTING
If your C compiler is gcc, you can build a version of PCRE2 that can If your C compiler is gcc, you can build a version of PCRE2 that can
generate a code coverage report for its test suite. To enable this, you generate a code coverage report for its test suite. To enable this, you
must install lcov version 1.6 or above. Then specify must install lcov version 1.6 or above. Then specify
@ -3838,20 +3843,20 @@ CODE COVERAGE REPORTING
to the configure command and build PCRE2 in the usual way. to the configure command and build PCRE2 in the usual way.
Note that using ccache (a caching C compiler) is incompatible with code Note that using ccache (a caching C compiler) is incompatible with code
coverage reporting. If you have configured ccache to run automatically coverage reporting. If you have configured ccache to run automatically
on your system, you must set the environment variable on your system, you must set the environment variable
CCACHE_DISABLE=1 CCACHE_DISABLE=1
before running make to build PCRE2, so that ccache is not used. before running make to build PCRE2, so that ccache is not used.
When --enable-coverage is used, the following additional targets are When --enable-coverage is used, the following additional targets are
added to the Makefile: added to the Makefile:
make coverage make coverage
This creates a fresh coverage report for the PCRE2 test suite. It is This creates a fresh coverage report for the PCRE2 test suite. It is
equivalent to running "make coverage-reset", "make coverage-baseline", equivalent to running "make coverage-reset", "make coverage-baseline",
"make check", and then "make coverage-report". "make check", and then "make coverage-report".
make coverage-reset make coverage-reset
@ -3868,56 +3873,56 @@ CODE COVERAGE REPORTING
make coverage-clean-report make coverage-clean-report
This removes the generated coverage report without cleaning the cover- This removes the generated coverage report without cleaning the cover-
age data itself. age data itself.
make coverage-clean-data make coverage-clean-data
This removes the captured coverage data without removing the coverage This removes the captured coverage data without removing the coverage
files created at compile time (*.gcno). files created at compile time (*.gcno).
make coverage-clean make coverage-clean
This cleans all coverage data including the generated coverage report. This cleans all coverage data including the generated coverage report.
For more information about code coverage, see the gcov and lcov docu- For more information about code coverage, see the gcov and lcov docu-
mentation. mentation.
SUPPORT FOR FUZZERS SUPPORT FOR FUZZERS
There is a special option for use by people who want to run fuzzing There is a special option for use by people who want to run fuzzing
tests on PCRE2: tests on PCRE2:
--enable-fuzz-support --enable-fuzz-support
At present this applies only to the 8-bit library. If set, it causes an At present this applies only to the 8-bit library. If set, it causes an
extra library called libpcre2-fuzzsupport.a to be built, but not extra library called libpcre2-fuzzsupport.a to be built, but not
installed. This contains a single function called LLVMFuzzerTestOneIn- installed. This contains a single function called LLVMFuzzerTestOneIn-
put() whose arguments are a pointer to a string and the length of the put() whose arguments are a pointer to a string and the length of the
string. When called, this function tries to compile the string as a string. When called, this function tries to compile the string as a
pattern, and if that succeeds, to match it. This is done both with no pattern, and if that succeeds, to match it. This is done both with no
options and with some random option bits that are generated from the options and with some random option bits that are generated from the
string. string.
Setting --enable-fuzz-support also causes a binary called pcre2fuz- Setting --enable-fuzz-support also causes a binary called pcre2fuz-
zcheck to be created. This is normally run under valgrind or used when zcheck to be created. This is normally run under valgrind or used when
PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
function and outputs information about what it is doing. The input strings function and outputs information about what it is doing. The input strings
are specified by arguments: if an argument starts with "=" the rest of are specified by arguments: if an argument starts with "=" the rest of
it is a literal input string. Otherwise, it is assumed to be a file it is a literal input string. Otherwise, it is assumed to be a file
name, and the contents of the file are the test string. name, and the contents of the file are the test string.
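The argument convention described above (a leading "=" marks a literal string, anything else is a file name) could be modelled as follows. This is an illustrative sketch, not the code of pcre2fuzzcheck; the function name is hypothetical, and encoding literal arguments as UTF-8 is an assumption.

```python
import os
import tempfile


def load_test_string(argument):
    """Return the input bytes selected by one command-line argument."""
    if argument.startswith("="):
        # The rest of the argument is the literal input string.
        return argument[1:].encode()
    # Otherwise the argument names a file whose contents are the test string.
    with open(argument, "rb") as handle:
        return handle.read()


# Literal form.
print(load_test_string("=a(b)c"))

# File form, demonstrated with a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"(?i)hello")
path = tmp.name
print(load_test_string(path))
os.unlink(path)
```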
OBSOLETE OPTION OBSOLETE OPTION
In versions of PCRE2 prior to 10.30, there were two ways of handling In versions of PCRE2 prior to 10.30, there were two ways of handling
backtracking in the pcre2_match() function. The default was to use the backtracking in the pcre2_match() function. The default was to use the
system stack, but if system stack, but if
--disable-stack-for-recursion --disable-stack-for-recursion
was set, memory on the heap was used. From release 10.30 onwards this was set, memory on the heap was used. From release 10.30 onwards this
has changed (the stack is no longer used) and this option now does has changed (the stack is no longer used) and this option now does
nothing except give a warning. nothing except give a warning.
@ -3935,7 +3940,7 @@ AUTHOR
REVISION REVISION
Last updated: 30 May 2017 Last updated: 17 June 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -6309,26 +6314,28 @@ BACKSLASH
Those that are not part of an identified script are lumped together as Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is: "Common". The current list of scripts is:
Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Balinese, Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Bali-
Bamum, Bassa_Vah, Batak, Bengali, Bopomofo, Brahmi, Braille, Buginese, nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
Buhid, Canadian_Aboriginal, Carian, Caucasian_Albanian, Chakma, Cham, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba-
Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, nian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot,
Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan, Ethiopic, Geor- Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan,
gian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gurmukhi, Han, Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gur-
Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited, mukhi, Han, Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Ara-
Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan- maic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian,
nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Kho-
Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha- jki, Khudawadi, Lao, Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu,
jani, Malayalam, Mandaic, Manichaean, Meetei_Mayek, Mende_Kikakui, Lycian, Lydian, Mahajani, Malayalam, Mandaic, Manichaean, Marchen,
Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Masaram_Gondi, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Multani, Myanmar, Nabataean, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar,
Old_Hungarian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar-
Old_South_Arabian, Old_Turkic, Oriya, Osmanya, Pahawh_Hmong, Palmyrene, ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian,
Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang, Runic, Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, Pahawh_Hmong,
Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting, Sinhala, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang,
Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Runic, Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting,
Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, Thaana, Thai, Sinhala, Sora_Sompeng, Soyombo, Sundanese, Syloti_Nagri, Syriac, Taga-
Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi. log, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Tangut, Tel-
ugu, Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai,
Warang_Citi, Yi, Zanabazar_Square.
Each character has exactly one Unicode general category property, spec- Each character has exactly one Unicode general category property, spec-
ified by a two-letter abbreviation. For compatibility with Perl, nega- ified by a two-letter abbreviation. For compatibility with Perl, nega-
@ -8737,7 +8744,7 @@ AUTHOR
REVISION REVISION
Last updated: 30 May 2017 Last updated: 02 July 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "30 May 2017" "PCRE2 10.30" .TH PCRE2PATTERN 3 "02 July 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS" .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -754,6 +754,7 @@ example:
Those that are not part of an identified script are lumped together as Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is: "Common". The current list of scripts is:
.P .P
Adlam,
Ahom, Ahom,
Anatolian_Hieroglyphs, Anatolian_Hieroglyphs,
Arabic, Arabic,
@ -764,6 +765,7 @@ Bamum,
Bassa_Vah, Bassa_Vah,
Batak, Batak,
Bengali, Bengali,
Bhaiksuki,
Bopomofo, Bopomofo,
Brahmi, Brahmi,
Braille, Braille,
@ -825,6 +827,8 @@ Mahajani,
Malayalam, Malayalam,
Mandaic, Mandaic,
Manichaean, Manichaean,
Marchen,
Masaram_Gondi,
Meetei_Mayek, Meetei_Mayek,
Mende_Kikakui, Mende_Kikakui,
Meroitic_Cursive, Meroitic_Cursive,
@ -837,7 +841,9 @@ Multani,
Myanmar, Myanmar,
Nabataean, Nabataean,
New_Tai_Lue, New_Tai_Lue,
Newa,
Nko, Nko,
Nushu,
Ogham, Ogham,
Ol_Chiki, Ol_Chiki,
Old_Hungarian, Old_Hungarian,
@ -848,6 +854,7 @@ Old_Persian,
Old_South_Arabian, Old_South_Arabian,
Old_Turkic, Old_Turkic,
Oriya, Oriya,
Osage,
Osmanya, Osmanya,
Pahawh_Hmong, Pahawh_Hmong,
Palmyrene, Palmyrene,
@ -865,6 +872,7 @@ Siddham,
SignWriting, SignWriting,
Sinhala, Sinhala,
Sora_Sompeng, Sora_Sompeng,
Soyombo,
Sundanese, Sundanese,
Syloti_Nagri, Syloti_Nagri,
Syriac, Syriac,
@ -875,6 +883,7 @@ Tai_Tham,
Tai_Viet, Tai_Viet,
Takri, Takri,
Tamil, Tamil,
Tangut,
Telugu, Telugu,
Thaana, Thaana,
Thai, Thai,
@ -884,7 +893,8 @@ Tirhuta,
Ugaritic, Ugaritic,
Vai, Vai,
Warang_Citi, Warang_Citi,
Yi. Yi,
Zanabazar_Square.
.P .P
Each character has exactly one Unicode general category property, specified by Each character has exactly one Unicode general category property, specified by
a two-letter abbreviation. For compatibility with Perl, negation can be a two-letter abbreviation. For compatibility with Perl, negation can be
@ -3475,6 +3485,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 30 May 2017 Last updated: 02 July 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "16 June 2017" "PCRE 10.30" .TH PCRE2TEST 1 "02 July 2017" "PCRE 10.30"
.SH NAME .SH NAME
pcre2test - a program for testing Perl-compatible regular expressions. pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
@ -527,7 +527,7 @@ by a previous \fB#pattern\fP command.
.rs .rs
.sp .sp
The following modifiers set options for \fBpcre2_compile()\fP. Most of them set The following modifiers set options for \fBpcre2_compile()\fP. Most of them set
bits in the options argument of that function, but those whose names start with bits in the options argument of that function, but those whose names start with
PCRE2_EXTRA are additional options that are set in the compile context. For the PCRE2_EXTRA are additional options that are set in the compile context. For the
main options, there are some single-letter abbreviations that are the same as main options, there are some single-letter abbreviations that are the same as
Perl options. There is special handling for /x: if a second x is present, Perl options. There is special handling for /x: if a second x is present,
@ -540,25 +540,25 @@ way \fBpcre2_compile()\fP behaves. See
for a description of the effects of these options. for a description of the effects of these options.
.sp .sp
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
alt_bsux set PCRE2_ALT_BSUX alt_bsux set PCRE2_ALT_BSUX
alt_circumflex set PCRE2_ALT_CIRCUMFLEX alt_circumflex set PCRE2_ALT_CIRCUMFLEX
alt_verbnames set PCRE2_ALT_VERBNAMES alt_verbnames set PCRE2_ALT_VERBNAMES
anchored set PCRE2_ANCHORED anchored set PCRE2_ANCHORED
auto_callout set PCRE2_AUTO_CALLOUT auto_callout set PCRE2_AUTO_CALLOUT
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
/i caseless set PCRE2_CASELESS /i caseless set PCRE2_CASELESS
dollar_endonly set PCRE2_DOLLAR_ENDONLY dollar_endonly set PCRE2_DOLLAR_ENDONLY
/s dotall set PCRE2_DOTALL /s dotall set PCRE2_DOTALL
dupnames set PCRE2_DUPNAMES dupnames set PCRE2_DUPNAMES
endanchored set PCRE2_ENDANCHORED endanchored set PCRE2_ENDANCHORED
/x extended set PCRE2_EXTENDED /x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE /xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE match_line set PCRE2_EXTRA_MATCH_LINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
match_word set PCRE2_EXTRA_MATCH_WORD match_word set PCRE2_EXTRA_MATCH_WORD
/m multiline set PCRE2_MULTILINE /m multiline set PCRE2_MULTILINE
never_backslash_c set PCRE2_NEVER_BACKSLASH_C never_backslash_c set PCRE2_NEVER_BACKSLASH_C
never_ucp set PCRE2_NEVER_UCP never_ucp set PCRE2_NEVER_UCP
@ -593,7 +593,7 @@ heavily used in the test files.
/B bincode show binary code without lengths /B bincode show binary code without lengths
callout_info show callout information callout_info show callout information
debug same as info,fullbincode debug same as info,fullbincode
framesize show matching frame size framesize show matching frame size
fullbincode show binary code with lengths fullbincode show binary code with lengths
/I info show info about compiled pattern /I info show info about compiled pattern
hex unquoted characters are hexadecimal hex unquoted characters are hexadecimal
@ -611,7 +611,7 @@ heavily used in the test files.
push push compiled pattern onto the stack push push compiled pattern onto the stack
pushcopy push a copy onto the stack pushcopy push a copy onto the stack
stackguard=<number> test the stackguard feature stackguard=<number> test the stackguard feature
subject_literal treat all subject lines as literal subject_literal treat all subject lines as literal
tables=[0|1|2] select internal tables tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8 utf8_input treat input as UTF-8
@ -677,7 +677,7 @@ unit" is the last literal code unit that must be present in any match. This is
not necessarily the last character. These lines are omitted if no starting or not necessarily the last character. These lines are omitted if no starting or
ending code units are recorded. ending code units are recorded.
.P .P
The \fBframesize\fP modifier shows the size, in bytes, of the storage frames The \fBframesize\fP modifier shows the size, in bytes, of the storage frames
used by \fBpcre2_match()\fP for handling backtracking. The size depends on the used by \fBpcre2_match()\fP for handling backtracking. The size depends on the
number of capturing parentheses in the pattern. number of capturing parentheses in the pattern.
.P .P
@ -934,8 +934,8 @@ The \fBaftertext\fP and \fBallaftertext\fP subject modifiers work as described
below. All other modifiers are either ignored, with a warning message, or cause below. All other modifiers are either ignored, with a warning message, or cause
an error. an error.
.P .P
The pattern is passed to \fBregcomp()\fP as a zero-terminated string by The pattern is passed to \fBregcomp()\fP as a zero-terminated string by
default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the
REG_PEND extension is used to pass it by length. REG_PEND extension is used to pass it by length.
. .
. .
@ -977,7 +977,7 @@ are mutually exclusive.
.SS "Setting certain match controls" .SS "Setting certain match controls"
.rs .rs
.sp .sp
The following modifiers are really subject modifiers, and are described under The following modifiers are really subject modifiers, and are described under
"Subject Modifiers" below. However, they may be included in a pattern's "Subject Modifiers" below. However, they may be included in a pattern's
modifier list, in which case they are applied to every subject line that is modifier list, in which case they are applied to every subject line that is
processed with that pattern. They may not appear in \fB#pattern\fP commands. processed with that pattern. They may not appear in \fB#pattern\fP commands.
@ -1004,9 +1004,9 @@ defaults, set them in a \fB#subject\fP command.
.SS "Specifying literal subject lines" .SS "Specifying literal subject lines"
.rs .rs
.sp .sp
If the \fBsubject_literal\fP modifier is present on a pattern, all the subject If the \fBsubject_literal\fP modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any backslashes. It is not possible to set subject modifiers on such lines, but any
that are set as defaults by a \fB#subject\fP command are recognized. that are set as defaults by a \fB#subject\fP command are recognized.
. .
. .
@ -1020,7 +1020,9 @@ facility is used when saving compiled patterns to a file, as described in the
section entitled "Saving and restoring compiled patterns" section entitled "Saving and restoring compiled patterns"
.\" HTML <a href="#saverestore"> .\" HTML <a href="#saverestore">
.\" </a> .\" </a>
below. If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled below.
.\"
If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
pattern is stacked, leaving the original as current, ready to match the pattern is stacked, leaving the original as current, ready to match the
following input lines. This provides a way of testing the following input lines. This provides a way of testing the
\fBpcre2_code_copy()\fP function. \fBpcre2_code_copy()\fP function.
@ -1073,10 +1075,10 @@ that have any effect are \fBnotbol\fP, \fBnotempty\fP, and \fBnoteol\fP,
causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
\fBregexec()\fP. The other modifiers are ignored, with a warning message. \fBregexec()\fP. The other modifiers are ignored, with a warning message.
.P .P
There is one additional modifier that can be used with the POSIX wrapper. It is There is one additional modifier that can be used with the POSIX wrapper. It is
ignored (with a warning) if used for non-POSIX matching. ignored (with a warning) if used for non-POSIX matching.
.sp .sp
posix_startend=<n>[:<m>] posix_startend=<n>[:<m>]
.sp .sp
This causes the subject string to be passed to \fBregexec()\fP using the This causes the subject string to be passed to \fBregexec()\fP using the
REG_STARTEND option, which uses offsets to specify which part of the string is REG_STARTEND option, which uses offsets to specify which part of the string is
@ -1085,8 +1087,8 @@ the subject string. For more detail of REG_STARTEND, see the
.\" HREF .\" HREF
\fBpcre2posix\fP \fBpcre2posix\fP
.\" .\"
documentation. If the subject string contains binary zeros (coded as escapes documentation. If the subject string contains binary zeros (coded as escapes
such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in
its input), you must use \fBposix_startend\fP to specify its length. its input), you must use \fBposix_startend\fP to specify its length.
. .
. .
@ -1107,6 +1109,7 @@ pattern.
callout_data=<n> set a value to pass via callouts callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error callout_error=<n>[:<m>] control callout error
callout_fail=<n>[:<m>] control callout failure callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function callout_none do not supply a callout function
copy=<number or name> copy captured substring copy=<number or name> copy captured substring
depth_limit=<n> set a depth limit depth_limit=<n> set a depth limit
@ -1200,26 +1203,13 @@ does no capturing); it is ignored, with a warning message, if present.
.rs .rs
.sp .sp
A callout function is supplied when \fBpcre2test\fP calls the library matching A callout function is supplied when \fBpcre2test\fP calls the library matching
functions, unless \fBcallout_none\fP is specified. If \fBcallout_capture\fP is functions, unless \fBcallout_none\fP is specified. Its behaviour can be
set, the current captured groups are output when a callout occurs. The default controlled by various modifiers listed above whose names begin with
return from the callout function is zero, which allows matching to continue. \fBcallout_\fP. Details are given in the section entitled "Callouts"
.P .\" HTML <a href="#callouts">
The \fBcallout_fail\fP modifier can be given one or two numbers. If there is .\" </a>
only one number, 1 is returned instead of 0 (causing matching to backtrack) below.
when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1 .\"
is returned when callout <n> is reached and there have been at least <m>
callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence.
.P
Note that callouts with string arguments are always given the number zero. See
"Callouts" below for a description of the output when a callout is taken.
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
. .
. .
.SS "Finding all matches in a string" .SS "Finding all matches in a string"
@ -1344,7 +1334,7 @@ that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Setting optimization is not being used. The value is a number of kilobytes. Setting
zero reverts to the default of 32K. Providing a stack that is larger than the zero reverts to the default of 32K. Providing a stack that is larger than the
default is necessary only for very complicated patterns. If \fBjitstack\fP is default is necessary only for very complicated patterns. If \fBjitstack\fP is
set non-zero on a subject line it overrides any value that was set on the set non-zero on a subject line it overrides any value that was set on the
pattern. pattern.
. .
. .
@ -1372,7 +1362,7 @@ The \fImatch_limit\fP number is a measure of the amount of backtracking
that takes place, and learning the minimum value can be instructive. For most that takes place, and learning the minimum value can be instructive. For most
simple matches, the number is quite small, but for patterns with very large simple matches, the number is quite small, but for patterns with very large
numbers of matching possibilities, it can become large very quickly with numbers of matching possibilities, it can become large very quickly with
increasing length of subject string. increasing length of subject string.
.P .P
For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
much nested backtracking happens (that is, how deeply the pattern's tree is much nested backtracking happens (that is, how deeply the pattern's tree is
@ -1625,6 +1615,7 @@ For further information about partial matching, see the
documentation. documentation.
. .
. .
.\" HTML <a name="callouts"></a>
.SH CALLOUTS .SH CALLOUTS
.rs .rs
.sp .sp
@ -1633,8 +1624,30 @@ function is called during matching unless \fBcallout_none\fP is specified.
This works with both matching functions. This works with both matching functions.
.P .P
The callout function in \fBpcre2test\fP returns zero (carry on matching) by The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as default, but you can use a \fBcallout_fail\fP modifier in a subject line to
described above) to change this and other parameters of the callout. change this and other parameters of the callout.
.P
If \fBcallout_capture\fP is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
\fBcallout_no_where\fP modifier is set.
.P
The default return from the callout function is zero, which allows matching to
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
are always given the number zero.
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
.P .P
Inserting callouts can be helpful when using \fBpcre2test\fP to check Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see complicated regular expressions. For further information about callouts, see
@ -1837,6 +1850,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 16 June 2017 Last updated: 02 July 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi

View File

@ -943,7 +943,7 @@ PATTERN MODIFIERS
next line to contain a new pattern (or a command) instead of a subject next line to contain a new pattern (or a command) instead of a subject
line. This facility is used when saving compiled patterns to a file, as line. This facility is used when saving compiled patterns to a file, as
described in the section entitled "Saving and restoring compiled pat- described in the section entitled "Saving and restoring compiled pat-
terns" below. If pushcopy is used instead of push, a copy of the com- terns" below. If pushcopy is used instead of push, a copy of the com-
piled pattern is stacked, leaving the original as current, ready to piled pattern is stacked, leaving the original as current, ready to
match the following input lines. This provides a way of testing the match the following input lines. This provides a way of testing the
pcre2_code_copy() function. The push and pushcopy modifiers are pcre2_code_copy() function. The push and pushcopy modifiers are
@ -1016,6 +1016,7 @@ SUBJECT MODIFIERS
callout_data=<n> set a value to pass via callouts callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error callout_error=<n>[:<m>] control callout error
callout_fail=<n>[:<m>] control callout failure callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function callout_none do not supply a callout function
copy=<number or name> copy captured substring copy=<number or name> copy captured substring
depth_limit=<n> set a depth limit depth_limit=<n> set a depth limit
@ -1107,29 +1108,9 @@ SUBJECT MODIFIERS
Testing callouts Testing callouts
A callout function is supplied when pcre2test calls the library match- A callout function is supplied when pcre2test calls the library match-
ing functions, unless callout_none is specified. If callout_capture is ing functions, unless callout_none is specified. Its behaviour can be
set, the current captured groups are output when a callout occurs. The controlled by various modifiers listed above whose names begin with
default return from the callout function is zero, which allows matching callout_. Details are given in the section entitled "Callouts" below.
to continue.
The callout_fail modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 (causing matching to back-
track) when a callout of that number is reached. If two numbers
(<n>:<m>) are given, 1 is returned when callout <n> is reached and
there have been at least <m> callouts. The callout_error modifier is
similar, except that PCRE2_ERROR_CALLOUT is returned, causing the
entire matching process to be aborted. If both these modifiers are set
for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero. See "Callouts" below for a description of the output when a
callout is taken.
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.
Finding all matches in a string Finding all matches in a string
@ -1511,8 +1492,32 @@ CALLOUTS
works with both matching functions. works with both matching functions.
The callout function in pcre2test returns zero (carry on matching) by The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line (as default, but you can use a callout_fail modifier in a subject line to
described above) to change this and other parameters of the callout. change this and other parameters of the callout.
If callout_capture is set, the current captured groups are output when
a callout occurs. By default, the callout function then generates out-
put that indicates where the current match start and matching points
are in the subject, and what the next pattern item is. This output is
suppressed if the callout_no_where modifier is set.
The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (caus-
ing matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
ing the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero.
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.
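The return-value rules described above for callout_fail and callout_error can be modelled with a short Python sketch. This is only an illustration of the documented behaviour, not pcre2test's actual code. The function name, the (n, m) tuple form, the symbolic PCRE2_ERROR_CALLOUT stand-in, and the assumption that the callout count includes the current callout are all hypothetical. The callout_data modifier is deliberately left out, because the text does not say how it interacts with the other two.

```python
# Symbolic stand-in for the library's error code; the real value is a
# negative integer defined in pcre2.h.
PCRE2_ERROR_CALLOUT = "PCRE2_ERROR_CALLOUT"


def callout_return(number, count, fail=None, error=None):
    """Model the value returned by the test callout function.

    number -- the number of the callout that has been reached
    count  -- how many callouts have happened so far (assumed to
              include this one)
    fail   -- (n, m) from callout_fail=<n>[:<m>], or None if unset
    error  -- (n, m) from callout_error=<n>[:<m>], or None if unset

    When only <n> is given, pass m as 0 so the count test always passes.
    """
    def triggered(setting):
        if setting is None:
            return False
        n, m = setting
        return number == n and count >= m

    # callout_error takes precedence when both are set for the same number.
    if triggered(error):
        return PCRE2_ERROR_CALLOUT
    if triggered(fail):
        return 1      # forces the matcher to backtrack
    return 0          # default: carry on matching


if __name__ == "__main__":
    print(callout_return(2, 1, fail=(2, 0)))
    print(callout_return(2, 1, fail=(2, 3)))
    print(callout_return(2, 3, fail=(2, 0), error=(2, 0)))
```

Under these assumptions, a subject modifier such as callout_fail=2:3 corresponds to fail=(2, 3): the callout numbered 2 only starts returning 1 once three callouts have occurred.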
Inserting callouts can be helpful when using pcre2test to check compli- Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see cated regular expressions. For further information about callouts, see
@ -1687,5 +1692,5 @@ AUTHOR
REVISION REVISION
Last updated: 16 June 2017 Last updated: 02 July 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.

View File

@ -23,6 +23,7 @@
# Script updated to Python 3 by running it through the 2to3 converter. # Script updated to Python 3 by running it through the 2to3 converter.
# Added script names for Unicode 7.0.0, 20-June-2014. # Added script names for Unicode 7.0.0, 20-June-2014.
# Added script names for Unicode 8.0.0, 19-June-2015. # Added script names for Unicode 8.0.0, 19-June-2015.
# Added script names for Unicode 10.0.0, 02-July-2017.
script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \
'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \ 'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \
@ -51,7 +52,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi', 'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi',
# New for Unicode 8.0.0 # New for Unicode 8.0.0
'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian', 'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian',
'SignWriting' 'SignWriting',
# New for Unicode 10.0.0
'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
'Nushu', 'Soyombo', 'Zanabazar_Square'
] ]
category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu', category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',

View File

@ -122,6 +122,7 @@
# 20-June-2014: Updated for Unicode 7.0.0 # 20-June-2014: Updated for Unicode 7.0.0
# 12-August-2014: Updated to put Unicode version into the file # 12-August-2014: Updated to put Unicode version into the file
# 19-June-2015: Updated for Unicode 8.0.0 # 19-June-2015: Updated for Unicode 8.0.0
# 02-July-2017: Updated for Unicode 10.0.0
############################################################################## ##############################################################################
@ -335,7 +336,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi', 'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi',
# New for Unicode 8.0.0 # New for Unicode 8.0.0
'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian', 'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian',
'SignWriting' 'SignWriting',
# New for Unicode 10.0.0
'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
'Nushu', 'Soyombo', 'Zanabazar_Square'
] ]
category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu', category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
@ -343,7 +347,8 @@ category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
'Sc', 'Sk', 'Sm', 'So', 'Zl', 'Zp', 'Zs' ] 'Sc', 'Sk', 'Sm', 'So', 'Zl', 'Zp', 'Zs' ]
break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend', break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend',
'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other' ] 'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other',
'E_Base', 'E_Modifier', 'E_Base_GAZ', 'ZWJ', 'Glue_After_Zwj' ]
test_record_size() test_record_size()
unicode_version = "" unicode_version = ""

View File

@ -1,10 +1,11 @@
# CaseFolding-8.0.0.txt # CaseFolding-10.0.0.txt
# Date: 2015-01-13, 18:16:36 GMT [MD] # Date: 2017-04-14, 05:40:18 GMT
# © 2017 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# #
# Unicode Character Database # Unicode Character Database
# Copyright (c) 1991-2015 Unicode, Inc. # For documentation, see http://www.unicode.org/reports/tr44/
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
# #
# Case Folding Properties # Case Folding Properties
# #
@ -23,7 +24,7 @@
# #
# NOTE: case folding does not preserve normalization formats! # NOTE: case folding does not preserve normalization formats!
# #
# For information on case folding, including how to have case folding # For information on case folding, including how to have case folding
# preserve normalization formats, see Section 3.13 Default Case Algorithms in # preserve normalization formats, see Section 3.13 Default Case Algorithms in
# The Unicode Standard. # The Unicode Standard.
# #
@ -593,6 +594,15 @@
13FB; C; 13F3; # CHEROKEE SMALL LETTER YU 13FB; C; 13F3; # CHEROKEE SMALL LETTER YU
13FC; C; 13F4; # CHEROKEE SMALL LETTER YV 13FC; C; 13F4; # CHEROKEE SMALL LETTER YV
13FD; C; 13F5; # CHEROKEE SMALL LETTER MV 13FD; C; 13F5; # CHEROKEE SMALL LETTER MV
1C80; C; 0432; # CYRILLIC SMALL LETTER ROUNDED VE
1C81; C; 0434; # CYRILLIC SMALL LETTER LONG-LEGGED DE
1C82; C; 043E; # CYRILLIC SMALL LETTER NARROW O
1C83; C; 0441; # CYRILLIC SMALL LETTER WIDE ES
1C84; C; 0442; # CYRILLIC SMALL LETTER TALL TE
1C85; C; 0442; # CYRILLIC SMALL LETTER THREE-LEGGED TE
1C86; C; 044A; # CYRILLIC SMALL LETTER TALL HARD SIGN
1C87; C; 0463; # CYRILLIC SMALL LETTER TALL YAT
1C88; C; A64B; # CYRILLIC SMALL LETTER UNBLENDED UK
1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW 1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW
1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE 1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE
1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW 1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW
@ -1163,6 +1173,7 @@ A7AA; C; 0266; # LATIN CAPITAL LETTER H WITH HOOK
A7AB; C; 025C; # LATIN CAPITAL LETTER REVERSED OPEN E A7AB; C; 025C; # LATIN CAPITAL LETTER REVERSED OPEN E
A7AC; C; 0261; # LATIN CAPITAL LETTER SCRIPT G A7AC; C; 0261; # LATIN CAPITAL LETTER SCRIPT G
A7AD; C; 026C; # LATIN CAPITAL LETTER L WITH BELT A7AD; C; 026C; # LATIN CAPITAL LETTER L WITH BELT
A7AE; C; 026A; # LATIN CAPITAL LETTER SMALL CAPITAL I
A7B0; C; 029E; # LATIN CAPITAL LETTER TURNED K A7B0; C; 029E; # LATIN CAPITAL LETTER TURNED K
A7B1; C; 0287; # LATIN CAPITAL LETTER TURNED T A7B1; C; 0287; # LATIN CAPITAL LETTER TURNED T
A7B2; C; 029D; # LATIN CAPITAL LETTER J WITH CROSSED-TAIL A7B2; C; 029D; # LATIN CAPITAL LETTER J WITH CROSSED-TAIL
@ -1327,6 +1338,42 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
10425; C; 1044D; # DESERET CAPITAL LETTER ENG 10425; C; 1044D; # DESERET CAPITAL LETTER ENG
10426; C; 1044E; # DESERET CAPITAL LETTER OI 10426; C; 1044E; # DESERET CAPITAL LETTER OI
10427; C; 1044F; # DESERET CAPITAL LETTER EW 10427; C; 1044F; # DESERET CAPITAL LETTER EW
104B0; C; 104D8; # OSAGE CAPITAL LETTER A
104B1; C; 104D9; # OSAGE CAPITAL LETTER AI
104B2; C; 104DA; # OSAGE CAPITAL LETTER AIN
104B3; C; 104DB; # OSAGE CAPITAL LETTER AH
104B4; C; 104DC; # OSAGE CAPITAL LETTER BRA
104B5; C; 104DD; # OSAGE CAPITAL LETTER CHA
104B6; C; 104DE; # OSAGE CAPITAL LETTER EHCHA
104B7; C; 104DF; # OSAGE CAPITAL LETTER E
104B8; C; 104E0; # OSAGE CAPITAL LETTER EIN
104B9; C; 104E1; # OSAGE CAPITAL LETTER HA
104BA; C; 104E2; # OSAGE CAPITAL LETTER HYA
104BB; C; 104E3; # OSAGE CAPITAL LETTER I
104BC; C; 104E4; # OSAGE CAPITAL LETTER KA
104BD; C; 104E5; # OSAGE CAPITAL LETTER EHKA
104BE; C; 104E6; # OSAGE CAPITAL LETTER KYA
104BF; C; 104E7; # OSAGE CAPITAL LETTER LA
104C0; C; 104E8; # OSAGE CAPITAL LETTER MA
104C1; C; 104E9; # OSAGE CAPITAL LETTER NA
104C2; C; 104EA; # OSAGE CAPITAL LETTER O
104C3; C; 104EB; # OSAGE CAPITAL LETTER OIN
104C4; C; 104EC; # OSAGE CAPITAL LETTER PA
104C5; C; 104ED; # OSAGE CAPITAL LETTER EHPA
104C6; C; 104EE; # OSAGE CAPITAL LETTER SA
104C7; C; 104EF; # OSAGE CAPITAL LETTER SHA
104C8; C; 104F0; # OSAGE CAPITAL LETTER TA
104C9; C; 104F1; # OSAGE CAPITAL LETTER EHTA
104CA; C; 104F2; # OSAGE CAPITAL LETTER TSA
104CB; C; 104F3; # OSAGE CAPITAL LETTER EHTSA
104CC; C; 104F4; # OSAGE CAPITAL LETTER TSHA
104CD; C; 104F5; # OSAGE CAPITAL LETTER DHA
104CE; C; 104F6; # OSAGE CAPITAL LETTER U
104CF; C; 104F7; # OSAGE CAPITAL LETTER WA
104D0; C; 104F8; # OSAGE CAPITAL LETTER KHA
104D1; C; 104F9; # OSAGE CAPITAL LETTER GHA
104D2; C; 104FA; # OSAGE CAPITAL LETTER ZA
104D3; C; 104FB; # OSAGE CAPITAL LETTER ZHA
10C80; C; 10CC0; # OLD HUNGARIAN CAPITAL LETTER A 10C80; C; 10CC0; # OLD HUNGARIAN CAPITAL LETTER A
10C81; C; 10CC1; # OLD HUNGARIAN CAPITAL LETTER AA 10C81; C; 10CC1; # OLD HUNGARIAN CAPITAL LETTER AA
10C82; C; 10CC2; # OLD HUNGARIAN CAPITAL LETTER EB 10C82; C; 10CC2; # OLD HUNGARIAN CAPITAL LETTER EB
@ -1410,5 +1457,39 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
118BD; C; 118DD; # WARANG CITI CAPITAL LETTER SSUU 118BD; C; 118DD; # WARANG CITI CAPITAL LETTER SSUU
118BE; C; 118DE; # WARANG CITI CAPITAL LETTER SII 118BE; C; 118DE; # WARANG CITI CAPITAL LETTER SII
118BF; C; 118DF; # WARANG CITI CAPITAL LETTER VIYO 118BF; C; 118DF; # WARANG CITI CAPITAL LETTER VIYO
1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF
1E901; C; 1E923; # ADLAM CAPITAL LETTER DAALI
1E902; C; 1E924; # ADLAM CAPITAL LETTER LAAM
1E903; C; 1E925; # ADLAM CAPITAL LETTER MIIM
1E904; C; 1E926; # ADLAM CAPITAL LETTER BA
1E905; C; 1E927; # ADLAM CAPITAL LETTER SINNYIIYHE
1E906; C; 1E928; # ADLAM CAPITAL LETTER PE
1E907; C; 1E929; # ADLAM CAPITAL LETTER BHE
1E908; C; 1E92A; # ADLAM CAPITAL LETTER RA
1E909; C; 1E92B; # ADLAM CAPITAL LETTER E
1E90A; C; 1E92C; # ADLAM CAPITAL LETTER FA
1E90B; C; 1E92D; # ADLAM CAPITAL LETTER I
1E90C; C; 1E92E; # ADLAM CAPITAL LETTER O
1E90D; C; 1E92F; # ADLAM CAPITAL LETTER DHA
1E90E; C; 1E930; # ADLAM CAPITAL LETTER YHE
1E90F; C; 1E931; # ADLAM CAPITAL LETTER WAW
1E910; C; 1E932; # ADLAM CAPITAL LETTER NUN
1E911; C; 1E933; # ADLAM CAPITAL LETTER KAF
1E912; C; 1E934; # ADLAM CAPITAL LETTER YA
1E913; C; 1E935; # ADLAM CAPITAL LETTER U
1E914; C; 1E936; # ADLAM CAPITAL LETTER JIIM
1E915; C; 1E937; # ADLAM CAPITAL LETTER CHI
1E916; C; 1E938; # ADLAM CAPITAL LETTER HA
1E917; C; 1E939; # ADLAM CAPITAL LETTER QAAF
1E918; C; 1E93A; # ADLAM CAPITAL LETTER GA
1E919; C; 1E93B; # ADLAM CAPITAL LETTER NYA
1E91A; C; 1E93C; # ADLAM CAPITAL LETTER TU
1E91B; C; 1E93D; # ADLAM CAPITAL LETTER NHA
1E91C; C; 1E93E; # ADLAM CAPITAL LETTER VA
1E91D; C; 1E93F; # ADLAM CAPITAL LETTER KHA
1E91E; C; 1E940; # ADLAM CAPITAL LETTER GBE
1E91F; C; 1E941; # ADLAM CAPITAL LETTER ZAL
1E920; C; 1E942; # ADLAM CAPITAL LETTER KPO
1E921; C; 1E943; # ADLAM CAPITAL LETTER SHA
# #
# EOF # EOF

View File

@ -1,10 +1,11 @@
# DerivedGeneralCategory-8.0.0.txt # DerivedGeneralCategory-10.0.0.txt
# Date: 2015-02-13, 13:47:11 GMT [MD] # Date: 2017-03-08, 08:41:49 GMT
# © 2017 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# #
# Unicode Character Database # Unicode Character Database
# Copyright (c) 1991-2015 Unicode, Inc. # For documentation, see http://www.unicode.org/reports/tr44/
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
# ================================================ # ================================================
@ -36,8 +37,10 @@
082E..082F ; Cn # [2] <reserved-082E>..<reserved-082F> 082E..082F ; Cn # [2] <reserved-082E>..<reserved-082F>
083F ; Cn # <reserved-083F> 083F ; Cn # <reserved-083F>
085C..085D ; Cn # [2] <reserved-085C>..<reserved-085D> 085C..085D ; Cn # [2] <reserved-085C>..<reserved-085D>
085F..089F ; Cn # [65] <reserved-085F>..<reserved-089F> 085F ; Cn # <reserved-085F>
08B5..08E2 ; Cn # [46] <reserved-08B5>..<reserved-08E2> 086B..089F ; Cn # [53] <reserved-086B>..<reserved-089F>
08B5 ; Cn # <reserved-08B5>
08BE..08D3 ; Cn # [22] <reserved-08BE>..<reserved-08D3>
0984 ; Cn # <reserved-0984> 0984 ; Cn # <reserved-0984>
098D..098E ; Cn # [2] <reserved-098D>..<reserved-098E> 098D..098E ; Cn # [2] <reserved-098D>..<reserved-098E>
0991..0992 ; Cn # [2] <reserved-0991>..<reserved-0992> 0991..0992 ; Cn # [2] <reserved-0991>..<reserved-0992>
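The range entries in this file can be checked mechanically: the bracketed count in each comment should equal the number of code points in the range. A minimal Python sketch of such a check follows, assuming the "first..last ; category # comment" layout seen in the lines above. The helper name parse_category_line is hypothetical.

```python
def parse_category_line(line):
    """Parse one DerivedGeneralCategory.txt entry.

    Returns (start, end, category) with inclusive code point bounds,
    or None for blank and comment-only lines.
    """
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data:
        return None
    code_range, category = [part.strip() for part in data.split(";")]
    first, _, last = code_range.partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start  # single code point entry
    return start, end, category


if __name__ == "__main__":
    start, end, category = parse_category_line(
        "086B..089F ; Cn # [53] <reserved-086B>..<reserved-089F>")
    # Number of code points covered; compare with the [53] in the comment.
    print(category, end - start + 1)
```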
@ -51,7 +54,7 @@
09D8..09DB ; Cn # [4] <reserved-09D8>..<reserved-09DB> 09D8..09DB ; Cn # [4] <reserved-09D8>..<reserved-09DB>
09DE ; Cn # <reserved-09DE> 09DE ; Cn # <reserved-09DE>
09E4..09E5 ; Cn # [2] <reserved-09E4>..<reserved-09E5> 09E4..09E5 ; Cn # [2] <reserved-09E4>..<reserved-09E5>
09FC..0A00 ; Cn # [5] <reserved-09FC>..<reserved-0A00> 09FE..0A00 ; Cn # [3] <reserved-09FE>..<reserved-0A00>
0A04 ; Cn # <reserved-0A04> 0A04 ; Cn # <reserved-0A04>
0A0B..0A0E ; Cn # [4] <reserved-0A0B>..<reserved-0A0E> 0A0B..0A0E ; Cn # [4] <reserved-0A0B>..<reserved-0A0E>
0A11..0A12 ; Cn # [2] <reserved-0A11>..<reserved-0A12> 0A11..0A12 ; Cn # [2] <reserved-0A11>..<reserved-0A12>
@ -81,7 +84,7 @@
0AD1..0ADF ; Cn # [15] <reserved-0AD1>..<reserved-0ADF> 0AD1..0ADF ; Cn # [15] <reserved-0AD1>..<reserved-0ADF>
0AE4..0AE5 ; Cn # [2] <reserved-0AE4>..<reserved-0AE5> 0AE4..0AE5 ; Cn # [2] <reserved-0AE4>..<reserved-0AE5>
0AF2..0AF8 ; Cn # [7] <reserved-0AF2>..<reserved-0AF8> 0AF2..0AF8 ; Cn # [7] <reserved-0AF2>..<reserved-0AF8>
0AFA..0B00 ; Cn # [7] <reserved-0AFA>..<reserved-0B00> 0B00 ; Cn # <reserved-0B00>
0B04 ; Cn # <reserved-0B04> 0B04 ; Cn # <reserved-0B04>
0B0D..0B0E ; Cn # [2] <reserved-0B0D>..<reserved-0B0E> 0B0D..0B0E ; Cn # [2] <reserved-0B0D>..<reserved-0B0E>
0B11..0B12 ; Cn # [2] <reserved-0B11>..<reserved-0B12> 0B11..0B12 ; Cn # [2] <reserved-0B11>..<reserved-0B12>
@ -124,7 +127,6 @@
0C5B..0C5F ; Cn # [5] <reserved-0C5B>..<reserved-0C5F> 0C5B..0C5F ; Cn # [5] <reserved-0C5B>..<reserved-0C5F>
0C64..0C65 ; Cn # [2] <reserved-0C64>..<reserved-0C65> 0C64..0C65 ; Cn # [2] <reserved-0C64>..<reserved-0C65>
0C70..0C77 ; Cn # [8] <reserved-0C70>..<reserved-0C77> 0C70..0C77 ; Cn # [8] <reserved-0C70>..<reserved-0C77>
0C80 ; Cn # <reserved-0C80>
0C84 ; Cn # <reserved-0C84> 0C84 ; Cn # <reserved-0C84>
0C8D ; Cn # <reserved-0C8D> 0C8D ; Cn # <reserved-0C8D>
0C91 ; Cn # <reserved-0C91> 0C91 ; Cn # <reserved-0C91>
@ -138,17 +140,14 @@
0CDF ; Cn # <reserved-0CDF> 0CDF ; Cn # <reserved-0CDF>
0CE4..0CE5 ; Cn # [2] <reserved-0CE4>..<reserved-0CE5> 0CE4..0CE5 ; Cn # [2] <reserved-0CE4>..<reserved-0CE5>
0CF0 ; Cn # <reserved-0CF0> 0CF0 ; Cn # <reserved-0CF0>
0CF3..0D00 ; Cn # [14] <reserved-0CF3>..<reserved-0D00> 0CF3..0CFF ; Cn # [13] <reserved-0CF3>..<reserved-0CFF>
0D04 ; Cn # <reserved-0D04> 0D04 ; Cn # <reserved-0D04>
0D0D ; Cn # <reserved-0D0D> 0D0D ; Cn # <reserved-0D0D>
0D11 ; Cn # <reserved-0D11> 0D11 ; Cn # <reserved-0D11>
0D3B..0D3C ; Cn # [2] <reserved-0D3B>..<reserved-0D3C>
0D45 ; Cn # <reserved-0D45> 0D45 ; Cn # <reserved-0D45>
0D49 ; Cn # <reserved-0D49> 0D49 ; Cn # <reserved-0D49>
0D4F..0D56 ; Cn # [8] <reserved-0D4F>..<reserved-0D56> 0D50..0D53 ; Cn # [4] <reserved-0D50>..<reserved-0D53>
0D58..0D5E ; Cn # [7] <reserved-0D58>..<reserved-0D5E>
0D64..0D65 ; Cn # [2] <reserved-0D64>..<reserved-0D65> 0D64..0D65 ; Cn # [2] <reserved-0D64>..<reserved-0D65>
0D76..0D78 ; Cn # [3] <reserved-0D76>..<reserved-0D78>
0D80..0D81 ; Cn # [2] <reserved-0D80>..<reserved-0D81> 0D80..0D81 ; Cn # [2] <reserved-0D80>..<reserved-0D81>
0D84 ; Cn # <reserved-0D84> 0D84 ; Cn # <reserved-0D84>
0D97..0D99 ; Cn # [3] <reserved-0D97>..<reserved-0D99> 0D97..0D99 ; Cn # [3] <reserved-0D97>..<reserved-0D99>
@ -249,11 +248,10 @@
1BF4..1BFB ; Cn # [8] <reserved-1BF4>..<reserved-1BFB> 1BF4..1BFB ; Cn # [8] <reserved-1BF4>..<reserved-1BFB>
1C38..1C3A ; Cn # [3] <reserved-1C38>..<reserved-1C3A> 1C38..1C3A ; Cn # [3] <reserved-1C38>..<reserved-1C3A>
1C4A..1C4C ; Cn # [3] <reserved-1C4A>..<reserved-1C4C> 1C4A..1C4C ; Cn # [3] <reserved-1C4A>..<reserved-1C4C>
1C80..1CBF ; Cn # [64] <reserved-1C80>..<reserved-1CBF> 1C89..1CBF ; Cn # [55] <reserved-1C89>..<reserved-1CBF>
1CC8..1CCF ; Cn # [8] <reserved-1CC8>..<reserved-1CCF> 1CC8..1CCF ; Cn # [8] <reserved-1CC8>..<reserved-1CCF>
1CF7 ; Cn # <reserved-1CF7>
1CFA..1CFF ; Cn # [6] <reserved-1CFA>..<reserved-1CFF> 1CFA..1CFF ; Cn # [6] <reserved-1CFA>..<reserved-1CFF>
1DF6..1DFB ; Cn # [6] <reserved-1DF6>..<reserved-1DFB> 1DFA ; Cn # <reserved-1DFA>
1F16..1F17 ; Cn # [2] <reserved-1F16>..<reserved-1F17> 1F16..1F17 ; Cn # [2] <reserved-1F16>..<reserved-1F17>
1F1E..1F1F ; Cn # [2] <reserved-1F1E>..<reserved-1F1F> 1F1E..1F1F ; Cn # [2] <reserved-1F1E>..<reserved-1F1F>
1F46..1F47 ; Cn # [2] <reserved-1F46>..<reserved-1F47> 1F46..1F47 ; Cn # [2] <reserved-1F46>..<reserved-1F47>
@ -274,17 +272,16 @@
2072..2073 ; Cn # [2] <reserved-2072>..<reserved-2073> 2072..2073 ; Cn # [2] <reserved-2072>..<reserved-2073>
208F ; Cn # <reserved-208F> 208F ; Cn # <reserved-208F>
209D..209F ; Cn # [3] <reserved-209D>..<reserved-209F> 209D..209F ; Cn # [3] <reserved-209D>..<reserved-209F>
20BF..20CF ; Cn # [17] <reserved-20BF>..<reserved-20CF> 20C0..20CF ; Cn # [16] <reserved-20C0>..<reserved-20CF>
20F1..20FF ; Cn # [15] <reserved-20F1>..<reserved-20FF> 20F1..20FF ; Cn # [15] <reserved-20F1>..<reserved-20FF>
218C..218F ; Cn # [4] <reserved-218C>..<reserved-218F> 218C..218F ; Cn # [4] <reserved-218C>..<reserved-218F>
23FB..23FF ; Cn # [5] <reserved-23FB>..<reserved-23FF>
2427..243F ; Cn # [25] <reserved-2427>..<reserved-243F> 2427..243F ; Cn # [25] <reserved-2427>..<reserved-243F>
244B..245F ; Cn # [21] <reserved-244B>..<reserved-245F> 244B..245F ; Cn # [21] <reserved-244B>..<reserved-245F>
2B74..2B75 ; Cn # [2] <reserved-2B74>..<reserved-2B75> 2B74..2B75 ; Cn # [2] <reserved-2B74>..<reserved-2B75>
2B96..2B97 ; Cn # [2] <reserved-2B96>..<reserved-2B97> 2B96..2B97 ; Cn # [2] <reserved-2B96>..<reserved-2B97>
2BBA..2BBC ; Cn # [3] <reserved-2BBA>..<reserved-2BBC> 2BBA..2BBC ; Cn # [3] <reserved-2BBA>..<reserved-2BBC>
2BC9 ; Cn # <reserved-2BC9> 2BC9 ; Cn # <reserved-2BC9>
2BD2..2BEB ; Cn # [26] <reserved-2BD2>..<reserved-2BEB> 2BD3..2BEB ; Cn # [25] <reserved-2BD3>..<reserved-2BEB>
2BF0..2BFF ; Cn # [16] <reserved-2BF0>..<reserved-2BFF> 2BF0..2BFF ; Cn # [16] <reserved-2BF0>..<reserved-2BFF>
2C2F ; Cn # <reserved-2C2F> 2C2F ; Cn # <reserved-2C2F>
2C5F ; Cn # <reserved-2C5F> 2C5F ; Cn # <reserved-2C5F>
@ -303,7 +300,7 @@
2DCF ; Cn # <reserved-2DCF> 2DCF ; Cn # <reserved-2DCF>
2DD7 ; Cn # <reserved-2DD7> 2DD7 ; Cn # <reserved-2DD7>
2DDF ; Cn # <reserved-2DDF> 2DDF ; Cn # <reserved-2DDF>
2E43..2E7F ; Cn # [61] <reserved-2E43>..<reserved-2E7F> 2E4A..2E7F ; Cn # [54] <reserved-2E4A>..<reserved-2E7F>
2E9A ; Cn # <reserved-2E9A> 2E9A ; Cn # <reserved-2E9A>
2EF4..2EFF ; Cn # [12] <reserved-2EF4>..<reserved-2EFF> 2EF4..2EFF ; Cn # [12] <reserved-2EF4>..<reserved-2EFF>
2FD6..2FEF ; Cn # [26] <reserved-2FD6>..<reserved-2FEF> 2FD6..2FEF ; Cn # [26] <reserved-2FD6>..<reserved-2FEF>
@ -311,24 +308,24 @@
3040 ; Cn # <reserved-3040> 3040 ; Cn # <reserved-3040>
3097..3098 ; Cn # [2] <reserved-3097>..<reserved-3098> 3097..3098 ; Cn # [2] <reserved-3097>..<reserved-3098>
3100..3104 ; Cn # [5] <reserved-3100>..<reserved-3104> 3100..3104 ; Cn # [5] <reserved-3100>..<reserved-3104>
312E..3130 ; Cn # [3] <reserved-312E>..<reserved-3130> 312F..3130 ; Cn # [2] <reserved-312F>..<reserved-3130>
318F ; Cn # <reserved-318F> 318F ; Cn # <reserved-318F>
31BB..31BF ; Cn # [5] <reserved-31BB>..<reserved-31BF> 31BB..31BF ; Cn # [5] <reserved-31BB>..<reserved-31BF>
31E4..31EF ; Cn # [12] <reserved-31E4>..<reserved-31EF> 31E4..31EF ; Cn # [12] <reserved-31E4>..<reserved-31EF>
321F ; Cn # <reserved-321F> 321F ; Cn # <reserved-321F>
32FF ; Cn # <reserved-32FF> 32FF ; Cn # <reserved-32FF>
4DB6..4DBF ; Cn # [10] <reserved-4DB6>..<reserved-4DBF> 4DB6..4DBF ; Cn # [10] <reserved-4DB6>..<reserved-4DBF>
9FD6..9FFF ; Cn # [42] <reserved-9FD6>..<reserved-9FFF> 9FEB..9FFF ; Cn # [21] <reserved-9FEB>..<reserved-9FFF>
A48D..A48F ; Cn # [3] <reserved-A48D>..<reserved-A48F> A48D..A48F ; Cn # [3] <reserved-A48D>..<reserved-A48F>
A4C7..A4CF ; Cn # [9] <reserved-A4C7>..<reserved-A4CF> A4C7..A4CF ; Cn # [9] <reserved-A4C7>..<reserved-A4CF>
A62C..A63F ; Cn # [20] <reserved-A62C>..<reserved-A63F> A62C..A63F ; Cn # [20] <reserved-A62C>..<reserved-A63F>
A6F8..A6FF ; Cn # [8] <reserved-A6F8>..<reserved-A6FF> A6F8..A6FF ; Cn # [8] <reserved-A6F8>..<reserved-A6FF>
A7AE..A7AF ; Cn # [2] <reserved-A7AE>..<reserved-A7AF> A7AF ; Cn # <reserved-A7AF>
A7B8..A7F6 ; Cn # [63] <reserved-A7B8>..<reserved-A7F6> A7B8..A7F6 ; Cn # [63] <reserved-A7B8>..<reserved-A7F6>
A82C..A82F ; Cn # [4] <reserved-A82C>..<reserved-A82F> A82C..A82F ; Cn # [4] <reserved-A82C>..<reserved-A82F>
A83A..A83F ; Cn # [6] <reserved-A83A>..<reserved-A83F> A83A..A83F ; Cn # [6] <reserved-A83A>..<reserved-A83F>
A878..A87F ; Cn # [8] <reserved-A878>..<reserved-A87F> A878..A87F ; Cn # [8] <reserved-A878>..<reserved-A87F>
A8C5..A8CD ; Cn # [9] <reserved-A8C5>..<reserved-A8CD> A8C6..A8CD ; Cn # [8] <reserved-A8C6>..<reserved-A8CD>
A8DA..A8DF ; Cn # [6] <reserved-A8DA>..<reserved-A8DF> A8DA..A8DF ; Cn # [6] <reserved-A8DA>..<reserved-A8DF>
A8FE..A8FF ; Cn # [2] <reserved-A8FE>..<reserved-A8FF> A8FE..A8FF ; Cn # [2] <reserved-A8FE>..<reserved-A8FF>
A954..A95E ; Cn # [11] <reserved-A954>..<reserved-A95E> A954..A95E ; Cn # [11] <reserved-A954>..<reserved-A95E>
@ -390,21 +387,23 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
100FB..100FF ; Cn # [5] <reserved-100FB>..<reserved-100FF> 100FB..100FF ; Cn # [5] <reserved-100FB>..<reserved-100FF>
10103..10106 ; Cn # [4] <reserved-10103>..<reserved-10106> 10103..10106 ; Cn # [4] <reserved-10103>..<reserved-10106>
10134..10136 ; Cn # [3] <reserved-10134>..<reserved-10136> 10134..10136 ; Cn # [3] <reserved-10134>..<reserved-10136>
1018D..1018F ; Cn # [3] <reserved-1018D>..<reserved-1018F> 1018F ; Cn # <reserved-1018F>
1019C..1019F ; Cn # [4] <reserved-1019C>..<reserved-1019F> 1019C..1019F ; Cn # [4] <reserved-1019C>..<reserved-1019F>
101A1..101CF ; Cn # [47] <reserved-101A1>..<reserved-101CF> 101A1..101CF ; Cn # [47] <reserved-101A1>..<reserved-101CF>
101FE..1027F ; Cn # [130] <reserved-101FE>..<reserved-1027F> 101FE..1027F ; Cn # [130] <reserved-101FE>..<reserved-1027F>
1029D..1029F ; Cn # [3] <reserved-1029D>..<reserved-1029F> 1029D..1029F ; Cn # [3] <reserved-1029D>..<reserved-1029F>
102D1..102DF ; Cn # [15] <reserved-102D1>..<reserved-102DF> 102D1..102DF ; Cn # [15] <reserved-102D1>..<reserved-102DF>
102FC..102FF ; Cn # [4] <reserved-102FC>..<reserved-102FF> 102FC..102FF ; Cn # [4] <reserved-102FC>..<reserved-102FF>
10324..1032F ; Cn # [12] <reserved-10324>..<reserved-1032F> 10324..1032C ; Cn # [9] <reserved-10324>..<reserved-1032C>
1034B..1034F ; Cn # [5] <reserved-1034B>..<reserved-1034F> 1034B..1034F ; Cn # [5] <reserved-1034B>..<reserved-1034F>
1037B..1037F ; Cn # [5] <reserved-1037B>..<reserved-1037F> 1037B..1037F ; Cn # [5] <reserved-1037B>..<reserved-1037F>
1039E ; Cn # <reserved-1039E> 1039E ; Cn # <reserved-1039E>
103C4..103C7 ; Cn # [4] <reserved-103C4>..<reserved-103C7> 103C4..103C7 ; Cn # [4] <reserved-103C4>..<reserved-103C7>
103D6..103FF ; Cn # [42] <reserved-103D6>..<reserved-103FF> 103D6..103FF ; Cn # [42] <reserved-103D6>..<reserved-103FF>
1049E..1049F ; Cn # [2] <reserved-1049E>..<reserved-1049F> 1049E..1049F ; Cn # [2] <reserved-1049E>..<reserved-1049F>
104AA..104FF ; Cn # [86] <reserved-104AA>..<reserved-104FF> 104AA..104AF ; Cn # [6] <reserved-104AA>..<reserved-104AF>
104D4..104D7 ; Cn # [4] <reserved-104D4>..<reserved-104D7>
104FC..104FF ; Cn # [4] <reserved-104FC>..<reserved-104FF>
10528..1052F ; Cn # [8] <reserved-10528>..<reserved-1052F> 10528..1052F ; Cn # [8] <reserved-10528>..<reserved-1052F>
10564..1056E ; Cn # [11] <reserved-10564>..<reserved-1056E> 10564..1056E ; Cn # [11] <reserved-10564>..<reserved-1056E>
10570..105FF ; Cn # [144] <reserved-10570>..<reserved-105FF> 10570..105FF ; Cn # [144] <reserved-10570>..<reserved-105FF>
@ -460,7 +459,7 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
111E0 ; Cn # <reserved-111E0> 111E0 ; Cn # <reserved-111E0>
111F5..111FF ; Cn # [11] <reserved-111F5>..<reserved-111FF> 111F5..111FF ; Cn # [11] <reserved-111F5>..<reserved-111FF>
11212 ; Cn # <reserved-11212> 11212 ; Cn # <reserved-11212>
1123E..1127F ; Cn # [66] <reserved-1123E>..<reserved-1127F> 1123F..1127F ; Cn # [65] <reserved-1123F>..<reserved-1127F>
11287 ; Cn # <reserved-11287> 11287 ; Cn # <reserved-11287>
11289 ; Cn # <reserved-11289> 11289 ; Cn # <reserved-11289>
1128E ; Cn # <reserved-1128E> 1128E ; Cn # <reserved-1128E>
@ -482,21 +481,43 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
11358..1135C ; Cn # [5] <reserved-11358>..<reserved-1135C> 11358..1135C ; Cn # [5] <reserved-11358>..<reserved-1135C>
11364..11365 ; Cn # [2] <reserved-11364>..<reserved-11365> 11364..11365 ; Cn # [2] <reserved-11364>..<reserved-11365>
1136D..1136F ; Cn # [3] <reserved-1136D>..<reserved-1136F> 1136D..1136F ; Cn # [3] <reserved-1136D>..<reserved-1136F>
11375..1147F ; Cn # [267] <reserved-11375>..<reserved-1147F> 11375..113FF ; Cn # [139] <reserved-11375>..<reserved-113FF>
1145A ; Cn # <reserved-1145A>
1145C ; Cn # <reserved-1145C>
1145E..1147F ; Cn # [34] <reserved-1145E>..<reserved-1147F>
114C8..114CF ; Cn # [8] <reserved-114C8>..<reserved-114CF> 114C8..114CF ; Cn # [8] <reserved-114C8>..<reserved-114CF>
114DA..1157F ; Cn # [166] <reserved-114DA>..<reserved-1157F> 114DA..1157F ; Cn # [166] <reserved-114DA>..<reserved-1157F>
115B6..115B7 ; Cn # [2] <reserved-115B6>..<reserved-115B7> 115B6..115B7 ; Cn # [2] <reserved-115B6>..<reserved-115B7>
115DE..115FF ; Cn # [34] <reserved-115DE>..<reserved-115FF> 115DE..115FF ; Cn # [34] <reserved-115DE>..<reserved-115FF>
11645..1164F ; Cn # [11] <reserved-11645>..<reserved-1164F> 11645..1164F ; Cn # [11] <reserved-11645>..<reserved-1164F>
1165A..1167F ; Cn # [38] <reserved-1165A>..<reserved-1167F> 1165A..1165F ; Cn # [6] <reserved-1165A>..<reserved-1165F>
1166D..1167F ; Cn # [19] <reserved-1166D>..<reserved-1167F>
116B8..116BF ; Cn # [8] <reserved-116B8>..<reserved-116BF> 116B8..116BF ; Cn # [8] <reserved-116B8>..<reserved-116BF>
116CA..116FF ; Cn # [54] <reserved-116CA>..<reserved-116FF> 116CA..116FF ; Cn # [54] <reserved-116CA>..<reserved-116FF>
1171A..1171C ; Cn # [3] <reserved-1171A>..<reserved-1171C> 1171A..1171C ; Cn # [3] <reserved-1171A>..<reserved-1171C>
1172C..1172F ; Cn # [4] <reserved-1172C>..<reserved-1172F> 1172C..1172F ; Cn # [4] <reserved-1172C>..<reserved-1172F>
11740..1189F ; Cn # [352] <reserved-11740>..<reserved-1189F> 11740..1189F ; Cn # [352] <reserved-11740>..<reserved-1189F>
118F3..118FE ; Cn # [12] <reserved-118F3>..<reserved-118FE> 118F3..118FE ; Cn # [12] <reserved-118F3>..<reserved-118FE>
11900..11ABF ; Cn # [448] <reserved-11900>..<reserved-11ABF> 11900..119FF ; Cn # [256] <reserved-11900>..<reserved-119FF>
11AF9..11FFF ; Cn # [1287] <reserved-11AF9>..<reserved-11FFF> 11A48..11A4F ; Cn # [8] <reserved-11A48>..<reserved-11A4F>
11A84..11A85 ; Cn # [2] <reserved-11A84>..<reserved-11A85>
11A9D ; Cn # <reserved-11A9D>
11AA3..11ABF ; Cn # [29] <reserved-11AA3>..<reserved-11ABF>
11AF9..11BFF ; Cn # [263] <reserved-11AF9>..<reserved-11BFF>
11C09 ; Cn # <reserved-11C09>
11C37 ; Cn # <reserved-11C37>
11C46..11C4F ; Cn # [10] <reserved-11C46>..<reserved-11C4F>
11C6D..11C6F ; Cn # [3] <reserved-11C6D>..<reserved-11C6F>
11C90..11C91 ; Cn # [2] <reserved-11C90>..<reserved-11C91>
11CA8 ; Cn # <reserved-11CA8>
11CB7..11CFF ; Cn # [73] <reserved-11CB7>..<reserved-11CFF>
11D07 ; Cn # <reserved-11D07>
11D0A ; Cn # <reserved-11D0A>
11D37..11D39 ; Cn # [3] <reserved-11D37>..<reserved-11D39>
11D3B ; Cn # <reserved-11D3B>
11D3E ; Cn # <reserved-11D3E>
11D48..11D4F ; Cn # [8] <reserved-11D48>..<reserved-11D4F>
11D5A..11FFF ; Cn # [678] <reserved-11D5A>..<reserved-11FFF>
1239A..123FF ; Cn # [102] <reserved-1239A>..<reserved-123FF> 1239A..123FF ; Cn # [102] <reserved-1239A>..<reserved-123FF>
1246F ; Cn # <reserved-1246F> 1246F ; Cn # <reserved-1246F>
12475..1247F ; Cn # [11] <reserved-12475>..<reserved-1247F> 12475..1247F ; Cn # [11] <reserved-12475>..<reserved-1247F>
@ -516,8 +537,12 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
16B90..16EFF ; Cn # [880] <reserved-16B90>..<reserved-16EFF> 16B90..16EFF ; Cn # [880] <reserved-16B90>..<reserved-16EFF>
16F45..16F4F ; Cn # [11] <reserved-16F45>..<reserved-16F4F> 16F45..16F4F ; Cn # [11] <reserved-16F45>..<reserved-16F4F>
16F7F..16F8E ; Cn # [16] <reserved-16F7F>..<reserved-16F8E> 16F7F..16F8E ; Cn # [16] <reserved-16F7F>..<reserved-16F8E>
16FA0..1AFFF ; Cn # [16480] <reserved-16FA0>..<reserved-1AFFF> 16FA0..16FDF ; Cn # [64] <reserved-16FA0>..<reserved-16FDF>
1B002..1BBFF ; Cn # [3070] <reserved-1B002>..<reserved-1BBFF> 16FE2..16FFF ; Cn # [30] <reserved-16FE2>..<reserved-16FFF>
187ED..187FF ; Cn # [19] <reserved-187ED>..<reserved-187FF>
18AF3..1AFFF ; Cn # [9485] <reserved-18AF3>..<reserved-1AFFF>
1B11F..1B16F ; Cn # [81] <reserved-1B11F>..<reserved-1B16F>
1B2FC..1BBFF ; Cn # [2308] <reserved-1B2FC>..<reserved-1BBFF>
1BC6B..1BC6F ; Cn # [5] <reserved-1BC6B>..<reserved-1BC6F> 1BC6B..1BC6F ; Cn # [5] <reserved-1BC6B>..<reserved-1BC6F>
1BC7D..1BC7F ; Cn # [3] <reserved-1BC7D>..<reserved-1BC7F> 1BC7D..1BC7F ; Cn # [3] <reserved-1BC7D>..<reserved-1BC7F>
1BC89..1BC8F ; Cn # [7] <reserved-1BC89>..<reserved-1BC8F> 1BC89..1BC8F ; Cn # [7] <reserved-1BC89>..<reserved-1BC8F>
@ -551,9 +576,17 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1D7CC..1D7CD ; Cn # [2] <reserved-1D7CC>..<reserved-1D7CD> 1D7CC..1D7CD ; Cn # [2] <reserved-1D7CC>..<reserved-1D7CD>
1DA8C..1DA9A ; Cn # [15] <reserved-1DA8C>..<reserved-1DA9A> 1DA8C..1DA9A ; Cn # [15] <reserved-1DA8C>..<reserved-1DA9A>
1DAA0 ; Cn # <reserved-1DAA0> 1DAA0 ; Cn # <reserved-1DAA0>
1DAB0..1E7FF ; Cn # [3408] <reserved-1DAB0>..<reserved-1E7FF> 1DAB0..1DFFF ; Cn # [1360] <reserved-1DAB0>..<reserved-1DFFF>
1E007 ; Cn # <reserved-1E007>
1E019..1E01A ; Cn # [2] <reserved-1E019>..<reserved-1E01A>
1E022 ; Cn # <reserved-1E022>
1E025 ; Cn # <reserved-1E025>
1E02B..1E7FF ; Cn # [2005] <reserved-1E02B>..<reserved-1E7FF>
1E8C5..1E8C6 ; Cn # [2] <reserved-1E8C5>..<reserved-1E8C6> 1E8C5..1E8C6 ; Cn # [2] <reserved-1E8C5>..<reserved-1E8C6>
1E8D7..1EDFF ; Cn # [1321] <reserved-1E8D7>..<reserved-1EDFF> 1E8D7..1E8FF ; Cn # [41] <reserved-1E8D7>..<reserved-1E8FF>
1E94B..1E94F ; Cn # [5] <reserved-1E94B>..<reserved-1E94F>
1E95A..1E95D ; Cn # [4] <reserved-1E95A>..<reserved-1E95D>
1E960..1EDFF ; Cn # [1184] <reserved-1E960>..<reserved-1EDFF>
1EE04 ; Cn # <reserved-1EE04> 1EE04 ; Cn # <reserved-1EE04>
1EE20 ; Cn # <reserved-1EE20> 1EE20 ; Cn # <reserved-1EE20>
1EE23 ; Cn # <reserved-1EE23> 1EE23 ; Cn # <reserved-1EE23>
@ -597,30 +630,34 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1F10D..1F10F ; Cn # [3] <reserved-1F10D>..<reserved-1F10F> 1F10D..1F10F ; Cn # [3] <reserved-1F10D>..<reserved-1F10F>
1F12F ; Cn # <reserved-1F12F> 1F12F ; Cn # <reserved-1F12F>
1F16C..1F16F ; Cn # [4] <reserved-1F16C>..<reserved-1F16F> 1F16C..1F16F ; Cn # [4] <reserved-1F16C>..<reserved-1F16F>
1F19B..1F1E5 ; Cn # [75] <reserved-1F19B>..<reserved-1F1E5> 1F1AD..1F1E5 ; Cn # [57] <reserved-1F1AD>..<reserved-1F1E5>
1F203..1F20F ; Cn # [13] <reserved-1F203>..<reserved-1F20F> 1F203..1F20F ; Cn # [13] <reserved-1F203>..<reserved-1F20F>
1F23B..1F23F ; Cn # [5] <reserved-1F23B>..<reserved-1F23F> 1F23C..1F23F ; Cn # [4] <reserved-1F23C>..<reserved-1F23F>
1F249..1F24F ; Cn # [7] <reserved-1F249>..<reserved-1F24F> 1F249..1F24F ; Cn # [7] <reserved-1F249>..<reserved-1F24F>
1F252..1F2FF ; Cn # [174] <reserved-1F252>..<reserved-1F2FF> 1F252..1F25F ; Cn # [14] <reserved-1F252>..<reserved-1F25F>
1F57A ; Cn # <reserved-1F57A> 1F266..1F2FF ; Cn # [154] <reserved-1F266>..<reserved-1F2FF>
1F5A4 ; Cn # <reserved-1F5A4> 1F6D5..1F6DF ; Cn # [11] <reserved-1F6D5>..<reserved-1F6DF>
1F6D1..1F6DF ; Cn # [15] <reserved-1F6D1>..<reserved-1F6DF>
1F6ED..1F6EF ; Cn # [3] <reserved-1F6ED>..<reserved-1F6EF> 1F6ED..1F6EF ; Cn # [3] <reserved-1F6ED>..<reserved-1F6EF>
1F6F4..1F6FF ; Cn # [12] <reserved-1F6F4>..<reserved-1F6FF> 1F6F9..1F6FF ; Cn # [7] <reserved-1F6F9>..<reserved-1F6FF>
1F774..1F77F ; Cn # [12] <reserved-1F774>..<reserved-1F77F> 1F774..1F77F ; Cn # [12] <reserved-1F774>..<reserved-1F77F>
1F7D5..1F7FF ; Cn # [43] <reserved-1F7D5>..<reserved-1F7FF> 1F7D5..1F7FF ; Cn # [43] <reserved-1F7D5>..<reserved-1F7FF>
1F80C..1F80F ; Cn # [4] <reserved-1F80C>..<reserved-1F80F> 1F80C..1F80F ; Cn # [4] <reserved-1F80C>..<reserved-1F80F>
1F848..1F84F ; Cn # [8] <reserved-1F848>..<reserved-1F84F> 1F848..1F84F ; Cn # [8] <reserved-1F848>..<reserved-1F84F>
1F85A..1F85F ; Cn # [6] <reserved-1F85A>..<reserved-1F85F> 1F85A..1F85F ; Cn # [6] <reserved-1F85A>..<reserved-1F85F>
1F888..1F88F ; Cn # [8] <reserved-1F888>..<reserved-1F88F> 1F888..1F88F ; Cn # [8] <reserved-1F888>..<reserved-1F88F>
1F8AE..1F90F ; Cn # [98] <reserved-1F8AE>..<reserved-1F90F> 1F8AE..1F8FF ; Cn # [82] <reserved-1F8AE>..<reserved-1F8FF>
1F919..1F97F ; Cn # [103] <reserved-1F919>..<reserved-1F97F> 1F90C..1F90F ; Cn # [4] <reserved-1F90C>..<reserved-1F90F>
1F985..1F9BF ; Cn # [59] <reserved-1F985>..<reserved-1F9BF> 1F93F ; Cn # <reserved-1F93F>
1F9C1..1FFFF ; Cn # [1599] <reserved-1F9C1>..<noncharacter-1FFFF> 1F94D..1F94F ; Cn # [3] <reserved-1F94D>..<reserved-1F94F>
1F96C..1F97F ; Cn # [20] <reserved-1F96C>..<reserved-1F97F>
1F998..1F9BF ; Cn # [40] <reserved-1F998>..<reserved-1F9BF>
1F9C1..1F9CF ; Cn # [15] <reserved-1F9C1>..<reserved-1F9CF>
1F9E7..1FFFF ; Cn # [1561] <reserved-1F9E7>..<noncharacter-1FFFF>
2A6D7..2A6FF ; Cn # [41] <reserved-2A6D7>..<reserved-2A6FF> 2A6D7..2A6FF ; Cn # [41] <reserved-2A6D7>..<reserved-2A6FF>
2B735..2B73F ; Cn # [11] <reserved-2B735>..<reserved-2B73F> 2B735..2B73F ; Cn # [11] <reserved-2B735>..<reserved-2B73F>
2B81E..2B81F ; Cn # [2] <reserved-2B81E>..<reserved-2B81F> 2B81E..2B81F ; Cn # [2] <reserved-2B81E>..<reserved-2B81F>
2CEA2..2F7FF ; Cn # [10590] <reserved-2CEA2>..<reserved-2F7FF> 2CEA2..2CEAF ; Cn # [14] <reserved-2CEA2>..<reserved-2CEAF>
2EBE1..2F7FF ; Cn # [3103] <reserved-2EBE1>..<reserved-2F7FF>
2FA1E..E0000 ; Cn # [722403] <reserved-2FA1E>..<reserved-E0000> 2FA1E..E0000 ; Cn # [722403] <reserved-2FA1E>..<reserved-E0000>
E0002..E001F ; Cn # [30] <reserved-E0002>..<reserved-E001F> E0002..E001F ; Cn # [30] <reserved-E0002>..<reserved-E001F>
E0080..E00FF ; Cn # [128] <reserved-E0080>..<reserved-E00FF> E0080..E00FF ; Cn # [128] <reserved-E0080>..<reserved-E00FF>
@ -628,7 +665,7 @@ E01F0..EFFFF ; Cn # [65040] <reserved-E01F0>..<noncharacter-EFFFF>
FFFFE..FFFFF ; Cn # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF> FFFFE..FFFFF ; Cn # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
10FFFE..10FFFF; Cn # [2] <noncharacter-10FFFE>..<noncharacter-10FFFF> 10FFFE..10FFFF; Cn # [2] <noncharacter-10FFFE>..<noncharacter-10FFFF>
# Total code points: 853859 # Total code points: 837841
# ================================================ # ================================================
@ -1221,11 +1258,12 @@ A7A2 ; Lu # LATIN CAPITAL LETTER K WITH OBLIQUE STROKE
A7A4 ; Lu # LATIN CAPITAL LETTER N WITH OBLIQUE STROKE A7A4 ; Lu # LATIN CAPITAL LETTER N WITH OBLIQUE STROKE
A7A6 ; Lu # LATIN CAPITAL LETTER R WITH OBLIQUE STROKE A7A6 ; Lu # LATIN CAPITAL LETTER R WITH OBLIQUE STROKE
A7A8 ; Lu # LATIN CAPITAL LETTER S WITH OBLIQUE STROKE A7A8 ; Lu # LATIN CAPITAL LETTER S WITH OBLIQUE STROKE
A7AA..A7AD ; Lu # [4] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER L WITH BELT A7AA..A7AE ; Lu # [5] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER SMALL CAPITAL I
A7B0..A7B4 ; Lu # [5] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER BETA A7B0..A7B4 ; Lu # [5] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER BETA
A7B6 ; Lu # LATIN CAPITAL LETTER OMEGA A7B6 ; Lu # LATIN CAPITAL LETTER OMEGA
FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
10400..10427 ; Lu # [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW 10400..10427 ; Lu # [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW
104B0..104D3 ; Lu # [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
10C80..10CB2 ; Lu # [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US 10C80..10CB2 ; Lu # [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US
118A0..118BF ; Lu # [32] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI CAPITAL LETTER VIYO 118A0..118BF ; Lu # [32] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI CAPITAL LETTER VIYO
1D400..1D419 ; Lu # [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z 1D400..1D419 ; Lu # [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z
@ -1259,8 +1297,9 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
1D756..1D76E ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD CAPITAL OMEGA 1D756..1D76E ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD CAPITAL OMEGA
1D790..1D7A8 ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL OMEGA 1D790..1D7A8 ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL OMEGA
1D7CA ; Lu # MATHEMATICAL BOLD CAPITAL DIGAMMA 1D7CA ; Lu # MATHEMATICAL BOLD CAPITAL DIGAMMA
1E900..1E921 ; Lu # [34] ADLAM CAPITAL LETTER ALIF..ADLAM CAPITAL LETTER SHA
# Total code points: 1631 # Total code points: 1702
# ================================================ # ================================================
@ -1537,6 +1576,7 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
052F ; Ll # CYRILLIC SMALL LETTER EL WITH DESCENDER 052F ; Ll # CYRILLIC SMALL LETTER EL WITH DESCENDER
0561..0587 ; Ll # [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN 0561..0587 ; Ll # [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
13F8..13FD ; Ll # [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV 13F8..13FD ; Ll # [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV
1C80..1C88 ; Ll # [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
1D00..1D2B ; Ll # [44] LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTER SMALL CAPITAL EL 1D00..1D2B ; Ll # [44] LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTER SMALL CAPITAL EL
1D6B..1D77 ; Ll # [13] LATIN SMALL LETTER UE..LATIN SMALL LETTER TURNED G 1D6B..1D77 ; Ll # [13] LATIN SMALL LETTER UE..LATIN SMALL LETTER TURNED G
1D79..1D9A ; Ll # [34] LATIN SMALL LETTER INSULAR G..LATIN SMALL LETTER EZH WITH RETROFLEX HOOK 1D79..1D9A ; Ll # [34] LATIN SMALL LETTER INSULAR G..LATIN SMALL LETTER EZH WITH RETROFLEX HOOK
@ -1866,6 +1906,7 @@ FB00..FB06 ; Ll # [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE ST
FB13..FB17 ; Ll # [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH FB13..FB17 ; Ll # [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH
FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
10428..1044F ; Ll # [40] DESERET SMALL LETTER LONG I..DESERET SMALL LETTER EW 10428..1044F ; Ll # [40] DESERET SMALL LETTER LONG I..DESERET SMALL LETTER EW
104D8..104FB ; Ll # [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
10CC0..10CF2 ; Ll # [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US 10CC0..10CF2 ; Ll # [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US
118C0..118DF ; Ll # [32] WARANG CITI SMALL LETTER NGAA..WARANG CITI SMALL LETTER VIYO 118C0..118DF ; Ll # [32] WARANG CITI SMALL LETTER NGAA..WARANG CITI SMALL LETTER VIYO
1D41A..1D433 ; Ll # [26] MATHEMATICAL BOLD SMALL A..MATHEMATICAL BOLD SMALL Z 1D41A..1D433 ; Ll # [26] MATHEMATICAL BOLD SMALL A..MATHEMATICAL BOLD SMALL Z
@ -1896,8 +1937,9 @@ FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL
1D7AA..1D7C2 ; Ll # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL OMEGA 1D7AA..1D7C2 ; Ll # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL OMEGA
1D7C4..1D7C9 ; Ll # [6] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL 1D7C4..1D7C9 ; Ll # [6] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL
1D7CB ; Ll # MATHEMATICAL BOLD SMALL DIGAMMA 1D7CB ; Ll # MATHEMATICAL BOLD SMALL DIGAMMA
1E922..1E943 ; Ll # [34] ADLAM SMALL LETTER ALIF..ADLAM SMALL LETTER SHA
# Total code points: 1984 # Total code points: 2063
# ================================================ # ================================================
@ -1976,8 +2018,9 @@ FF70 ; Lm # HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
16B40..16B43 ; Lm # [4] PAHAWH HMONG SIGN VOS SEEV..PAHAWH HMONG SIGN IB YAM 16B40..16B43 ; Lm # [4] PAHAWH HMONG SIGN VOS SEEV..PAHAWH HMONG SIGN IB YAM
16F93..16F9F ; Lm # [13] MIAO LETTER TONE-2..MIAO LETTER REFORMED TONE-8 16F93..16F9F ; Lm # [13] MIAO LETTER TONE-2..MIAO LETTER REFORMED TONE-8
16FE0..16FE1 ; Lm # [2] TANGUT ITERATION MARK..NUSHU ITERATION MARK
# Total code points: 248 # Total code points: 250
# ================================================ # ================================================
@ -2005,7 +2048,9 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
07CA..07EA ; Lo # [33] NKO LETTER A..NKO LETTER JONA RA 07CA..07EA ; Lo # [33] NKO LETTER A..NKO LETTER JONA RA
0800..0815 ; Lo # [22] SAMARITAN LETTER ALAF..SAMARITAN LETTER TAAF 0800..0815 ; Lo # [22] SAMARITAN LETTER ALAF..SAMARITAN LETTER TAAF
0840..0858 ; Lo # [25] MANDAIC LETTER HALQA..MANDAIC LETTER AIN 0840..0858 ; Lo # [25] MANDAIC LETTER HALQA..MANDAIC LETTER AIN
0860..086A ; Lo # [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
08A0..08B4 ; Lo # [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW 08A0..08B4 ; Lo # [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
08B6..08BD ; Lo # [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
0904..0939 ; Lo # [54] DEVANAGARI LETTER SHORT A..DEVANAGARI LETTER HA 0904..0939 ; Lo # [54] DEVANAGARI LETTER SHORT A..DEVANAGARI LETTER HA
093D ; Lo # DEVANAGARI SIGN AVAGRAHA 093D ; Lo # DEVANAGARI SIGN AVAGRAHA
0950 ; Lo # DEVANAGARI OM 0950 ; Lo # DEVANAGARI OM
@ -2022,6 +2067,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
09DC..09DD ; Lo # [2] BENGALI LETTER RRA..BENGALI LETTER RHA 09DC..09DD ; Lo # [2] BENGALI LETTER RRA..BENGALI LETTER RHA
09DF..09E1 ; Lo # [3] BENGALI LETTER YYA..BENGALI LETTER VOCALIC LL 09DF..09E1 ; Lo # [3] BENGALI LETTER YYA..BENGALI LETTER VOCALIC LL
09F0..09F1 ; Lo # [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL 09F0..09F1 ; Lo # [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL
09FC ; Lo # BENGALI LETTER VEDIC ANUSVARA
0A05..0A0A ; Lo # [6] GURMUKHI LETTER A..GURMUKHI LETTER UU 0A05..0A0A ; Lo # [6] GURMUKHI LETTER A..GURMUKHI LETTER UU
0A0F..0A10 ; Lo # [2] GURMUKHI LETTER EE..GURMUKHI LETTER AI 0A0F..0A10 ; Lo # [2] GURMUKHI LETTER EE..GURMUKHI LETTER AI
0A13..0A28 ; Lo # [22] GURMUKHI LETTER OO..GURMUKHI LETTER NA 0A13..0A28 ; Lo # [22] GURMUKHI LETTER OO..GURMUKHI LETTER NA
@ -2070,6 +2116,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
0C3D ; Lo # TELUGU SIGN AVAGRAHA 0C3D ; Lo # TELUGU SIGN AVAGRAHA
0C58..0C5A ; Lo # [3] TELUGU LETTER TSA..TELUGU LETTER RRRA 0C58..0C5A ; Lo # [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
0C60..0C61 ; Lo # [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL 0C60..0C61 ; Lo # [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
0C80 ; Lo # KANNADA SIGN SPACING CANDRABINDU
0C85..0C8C ; Lo # [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L 0C85..0C8C ; Lo # [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
0C8E..0C90 ; Lo # [3] KANNADA LETTER E..KANNADA LETTER AI 0C8E..0C90 ; Lo # [3] KANNADA LETTER E..KANNADA LETTER AI
0C92..0CA8 ; Lo # [23] KANNADA LETTER O..KANNADA LETTER NA 0C92..0CA8 ; Lo # [23] KANNADA LETTER O..KANNADA LETTER NA
@ -2084,6 +2131,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
0D12..0D3A ; Lo # [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA 0D12..0D3A ; Lo # [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
0D3D ; Lo # MALAYALAM SIGN AVAGRAHA 0D3D ; Lo # MALAYALAM SIGN AVAGRAHA
0D4E ; Lo # MALAYALAM LETTER DOT REPH 0D4E ; Lo # MALAYALAM LETTER DOT REPH
0D54..0D56 ; Lo # [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL
0D5F..0D61 ; Lo # [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL 0D5F..0D61 ; Lo # [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
0D7A..0D7F ; Lo # [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K 0D7A..0D7F ; Lo # [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
0D85..0D96 ; Lo # [18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA 0D85..0D96 ; Lo # [18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA
@ -2156,7 +2204,8 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
17DC ; Lo # KHMER SIGN AVAKRAHASANYA 17DC ; Lo # KHMER SIGN AVAKRAHASANYA
1820..1842 ; Lo # [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI 1820..1842 ; Lo # [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
1844..1877 ; Lo # [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA 1844..1877 ; Lo # [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
1880..18A8 ; Lo # [41] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER MANCHU ALI GALI BHA 1880..1884 ; Lo # [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
1887..18A8 ; Lo # [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
18AA ; Lo # MONGOLIAN LETTER MANCHU ALI GALI LHA 18AA ; Lo # MONGOLIAN LETTER MANCHU ALI GALI LHA
18B0..18F5 ; Lo # [70] CANADIAN SYLLABICS OY..CANADIAN SYLLABICS CARRIER DENTAL S 18B0..18F5 ; Lo # [70] CANADIAN SYLLABICS OY..CANADIAN SYLLABICS CARRIER DENTAL S
1900..191E ; Lo # [31] LIMBU VOWEL-CARRIER LETTER..LIMBU LETTER TRA 1900..191E ; Lo # [31] LIMBU VOWEL-CARRIER LETTER..LIMBU LETTER TRA
@ -2194,12 +2243,12 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
309F ; Lo # HIRAGANA DIGRAPH YORI 309F ; Lo # HIRAGANA DIGRAPH YORI
30A1..30FA ; Lo # [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO 30A1..30FA ; Lo # [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
30FF ; Lo # KATAKANA DIGRAPH KOTO 30FF ; Lo # KATAKANA DIGRAPH KOTO
3105..312D ; Lo # [41] BOPOMOFO LETTER B..BOPOMOFO LETTER IH 3105..312E ; Lo # [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
3131..318E ; Lo # [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE 3131..318E ; Lo # [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
31A0..31BA ; Lo # [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY 31A0..31BA ; Lo # [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
31F0..31FF ; Lo # [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO 31F0..31FF ; Lo # [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
3400..4DB5 ; Lo # [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5 3400..4DB5 ; Lo # [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4E00..9FD5 ; Lo # [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5 4E00..9FEA ; Lo # [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
A000..A014 ; Lo # [21] YI SYLLABLE IT..YI SYLLABLE E A000..A014 ; Lo # [21] YI SYLLABLE IT..YI SYLLABLE E
A016..A48C ; Lo # [1143] YI SYLLABLE BIT..YI SYLLABLE YYR A016..A48C ; Lo # [1143] YI SYLLABLE BIT..YI SYLLABLE YYR
A4D0..A4F7 ; Lo # [40] LISU LETTER BA..LISU LETTER OE A4D0..A4F7 ; Lo # [40] LISU LETTER BA..LISU LETTER OE
@ -2283,7 +2332,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
10280..1029C ; Lo # [29] LYCIAN LETTER A..LYCIAN LETTER X 10280..1029C ; Lo # [29] LYCIAN LETTER A..LYCIAN LETTER X
102A0..102D0 ; Lo # [49] CARIAN LETTER A..CARIAN LETTER UUU3 102A0..102D0 ; Lo # [49] CARIAN LETTER A..CARIAN LETTER UUU3
10300..1031F ; Lo # [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS 10300..1031F ; Lo # [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS
10330..10340 ; Lo # [17] GOTHIC LETTER AHSA..GOTHIC LETTER PAIRTHRA 1032D..10340 ; Lo # [20] OLD ITALIC LETTER YE..GOTHIC LETTER PAIRTHRA
10342..10349 ; Lo # [8] GOTHIC LETTER RAIDA..GOTHIC LETTER OTHAL 10342..10349 ; Lo # [8] GOTHIC LETTER RAIDA..GOTHIC LETTER OTHAL
10350..10375 ; Lo # [38] OLD PERMIC LETTER AN..OLD PERMIC LETTER IA 10350..10375 ; Lo # [38] OLD PERMIC LETTER AN..OLD PERMIC LETTER IA
10380..1039D ; Lo # [30] UGARITIC LETTER ALPA..UGARITIC LETTER SSU 10380..1039D ; Lo # [30] UGARITIC LETTER ALPA..UGARITIC LETTER SSU
@ -2349,6 +2398,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
1133D ; Lo # GRANTHA SIGN AVAGRAHA 1133D ; Lo # GRANTHA SIGN AVAGRAHA
11350 ; Lo # GRANTHA OM 11350 ; Lo # GRANTHA OM
1135D..11361 ; Lo # [5] GRANTHA SIGN PLUTA..GRANTHA LETTER VOCALIC LL 1135D..11361 ; Lo # [5] GRANTHA SIGN PLUTA..GRANTHA LETTER VOCALIC LL
11400..11434 ; Lo # [53] NEWA LETTER A..NEWA LETTER HA
11447..1144A ; Lo # [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
11480..114AF ; Lo # [48] TIRHUTA ANJI..TIRHUTA LETTER HA 11480..114AF ; Lo # [48] TIRHUTA ANJI..TIRHUTA LETTER HA
114C4..114C5 ; Lo # [2] TIRHUTA SIGN AVAGRAHA..TIRHUTA GVANG 114C4..114C5 ; Lo # [2] TIRHUTA SIGN AVAGRAHA..TIRHUTA GVANG
114C7 ; Lo # TIRHUTA OM 114C7 ; Lo # TIRHUTA OM
@ -2359,7 +2410,21 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
11680..116AA ; Lo # [43] TAKRI LETTER A..TAKRI LETTER RRA 11680..116AA ; Lo # [43] TAKRI LETTER A..TAKRI LETTER RRA
11700..11719 ; Lo # [26] AHOM LETTER KA..AHOM LETTER JHA 11700..11719 ; Lo # [26] AHOM LETTER KA..AHOM LETTER JHA
118FF ; Lo # WARANG CITI OM 118FF ; Lo # WARANG CITI OM
11A00 ; Lo # ZANABAZAR SQUARE LETTER A
11A0B..11A32 ; Lo # [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
11A3A ; Lo # ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
11A50 ; Lo # SOYOMBO LETTER A
11A5C..11A83 ; Lo # [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
11A86..11A89 ; Lo # [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
11AC0..11AF8 ; Lo # [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL 11AC0..11AF8 ; Lo # [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL
11C00..11C08 ; Lo # [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
11C0A..11C2E ; Lo # [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
11C40 ; Lo # BHAIKSUKI SIGN AVAGRAHA
11C72..11C8F ; Lo # [30] MARCHEN LETTER KA..MARCHEN LETTER A
11D00..11D06 ; Lo # [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E
11D08..11D09 ; Lo # [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
11D0B..11D30 ; Lo # [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
11D46 ; Lo # MASARAM GONDI REPHA
12000..12399 ; Lo # [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U 12000..12399 ; Lo # [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U
12480..12543 ; Lo # [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU 12480..12543 ; Lo # [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU
13000..1342E ; Lo # [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032 13000..1342E ; Lo # [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
@ -2372,7 +2437,10 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
16B7D..16B8F ; Lo # [19] PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG CLAN SIGN VWJ 16B7D..16B8F ; Lo # [19] PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG CLAN SIGN VWJ
16F00..16F44 ; Lo # [69] MIAO LETTER PA..MIAO LETTER HHA 16F00..16F44 ; Lo # [69] MIAO LETTER PA..MIAO LETTER HHA
16F50 ; Lo # MIAO LETTER NASALIZATION 16F50 ; Lo # MIAO LETTER NASALIZATION
1B000..1B001 ; Lo # [2] KATAKANA LETTER ARCHAIC E..HIRAGANA LETTER ARCHAIC YE 17000..187EC ; Lo # [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
18800..18AF2 ; Lo # [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
1B000..1B11E ; Lo # [287] KATAKANA LETTER ARCHAIC E..HENTAIGANA LETTER N-MU-MO-2
1B170..1B2FB ; Lo # [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
1BC00..1BC6A ; Lo # [107] DUPLOYAN LETTER H..DUPLOYAN LETTER VOCALIC M 1BC00..1BC6A ; Lo # [107] DUPLOYAN LETTER H..DUPLOYAN LETTER VOCALIC M
1BC70..1BC7C ; Lo # [13] DUPLOYAN AFFIX LEFT HORIZONTAL SECANT..DUPLOYAN AFFIX ATTACHED TANGENT HOOK 1BC70..1BC7C ; Lo # [13] DUPLOYAN AFFIX LEFT HORIZONTAL SECANT..DUPLOYAN AFFIX ATTACHED TANGENT HOOK
1BC80..1BC88 ; Lo # [9] DUPLOYAN AFFIX HIGH ACUTE..DUPLOYAN AFFIX HIGH VERTICAL 1BC80..1BC88 ; Lo # [9] DUPLOYAN AFFIX HIGH ACUTE..DUPLOYAN AFFIX HIGH VERTICAL
@ -2415,9 +2483,10 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
2A700..2B734 ; Lo # [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734 2A700..2B734 ; Lo # [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
2B740..2B81D ; Lo # [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D 2B740..2B81D ; Lo # [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; Lo # [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 2B820..2CEA1 ; Lo # [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; Lo # [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
2F800..2FA1D ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D 2F800..2FA1D ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
# Total code points: 105697 # Total code points: 121047
# ================================================ # ================================================
@ -2446,6 +2515,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
0825..0827 ; Mn # [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U 0825..0827 ; Mn # [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
0829..082D ; Mn # [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA 0829..082D ; Mn # [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
0859..085B ; Mn # [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK 0859..085B ; Mn # [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
08D4..08E1 ; Mn # [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08E3..0902 ; Mn # [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA 08E3..0902 ; Mn # [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
093A ; Mn # DEVANAGARI VOWEL SIGN OE 093A ; Mn # DEVANAGARI VOWEL SIGN OE
093C ; Mn # DEVANAGARI SIGN NUKTA 093C ; Mn # DEVANAGARI SIGN NUKTA
@ -2472,6 +2542,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
0AC7..0AC8 ; Mn # [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI 0AC7..0AC8 ; Mn # [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
0ACD ; Mn # GUJARATI SIGN VIRAMA 0ACD ; Mn # GUJARATI SIGN VIRAMA
0AE2..0AE3 ; Mn # [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL 0AE2..0AE3 ; Mn # [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
0AFA..0AFF ; Mn # [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
0B01 ; Mn # ORIYA SIGN CANDRABINDU 0B01 ; Mn # ORIYA SIGN CANDRABINDU
0B3C ; Mn # ORIYA SIGN NUKTA 0B3C ; Mn # ORIYA SIGN NUKTA
0B3F ; Mn # ORIYA VOWEL SIGN I 0B3F ; Mn # ORIYA VOWEL SIGN I
@ -2494,7 +2565,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
0CC6 ; Mn # KANNADA VOWEL SIGN E 0CC6 ; Mn # KANNADA VOWEL SIGN E
0CCC..0CCD ; Mn # [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA 0CCC..0CCD ; Mn # [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
0CE2..0CE3 ; Mn # [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL 0CE2..0CE3 ; Mn # [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
0D01 ; Mn # MALAYALAM SIGN CANDRABINDU 0D00..0D01 ; Mn # [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
0D3B..0D3C ; Mn # [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
0D41..0D44 ; Mn # [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR 0D41..0D44 ; Mn # [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
0D4D ; Mn # MALAYALAM SIGN VIRAMA 0D4D ; Mn # MALAYALAM SIGN VIRAMA
0D62..0D63 ; Mn # [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL 0D62..0D63 ; Mn # [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
@ -2540,6 +2612,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
17C9..17D3 ; Mn # [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT 17C9..17D3 ; Mn # [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
17DD ; Mn # KHMER SIGN ATTHACAN 17DD ; Mn # KHMER SIGN ATTHACAN
180B..180D ; Mn # [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE 180B..180D ; Mn # [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
1885..1886 ; Mn # [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
18A9 ; Mn # MONGOLIAN LETTER ALI GALI DAGALGA 18A9 ; Mn # MONGOLIAN LETTER ALI GALI DAGALGA
1920..1922 ; Mn # [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U 1920..1922 ; Mn # [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
1927..1928 ; Mn # [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O 1927..1928 ; Mn # [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
@ -2577,8 +2650,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
1CED ; Mn # VEDIC SIGN TIRYAK 1CED ; Mn # VEDIC SIGN TIRYAK
1CF4 ; Mn # VEDIC TONE CANDRA ABOVE 1CF4 ; Mn # VEDIC TONE CANDRA ABOVE
1CF8..1CF9 ; Mn # [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE 1CF8..1CF9 ; Mn # [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
1DC0..1DF5 ; Mn # [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE 1DC0..1DF9 ; Mn # [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
1DFC..1DFF ; Mn # [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW 1DFB..1DFF ; Mn # [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
20D0..20DC ; Mn # [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE 20D0..20DC ; Mn # [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20E1 ; Mn # COMBINING LEFT RIGHT ARROW ABOVE 20E1 ; Mn # COMBINING LEFT RIGHT ARROW ABOVE
20E5..20F0 ; Mn # [12] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING ASTERISK ABOVE 20E5..20F0 ; Mn # [12] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING ASTERISK ABOVE
@ -2595,7 +2668,7 @@ A802 ; Mn # SYLOTI NAGRI SIGN DVISVARA
A806 ; Mn # SYLOTI NAGRI SIGN HASANTA A806 ; Mn # SYLOTI NAGRI SIGN HASANTA
A80B ; Mn # SYLOTI NAGRI SIGN ANUSVARA A80B ; Mn # SYLOTI NAGRI SIGN ANUSVARA
A825..A826 ; Mn # [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E A825..A826 ; Mn # [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
A8C4 ; Mn # SAURASHTRA SIGN VIRAMA A8C4..A8C5 ; Mn # [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
A8E0..A8F1 ; Mn # [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA A8E0..A8F1 ; Mn # [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
A926..A92D ; Mn # [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU A926..A92D ; Mn # [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
A947..A951 ; Mn # [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R A947..A951 ; Mn # [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
@ -2647,6 +2720,7 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
1122F..11231 ; Mn # [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI 1122F..11231 ; Mn # [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
11234 ; Mn # KHOJKI SIGN ANUSVARA 11234 ; Mn # KHOJKI SIGN ANUSVARA
11236..11237 ; Mn # [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA 11236..11237 ; Mn # [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
1123E ; Mn # KHOJKI SIGN SUKUN
112DF ; Mn # KHUDAWADI SIGN ANUSVARA 112DF ; Mn # KHUDAWADI SIGN ANUSVARA
112E3..112EA ; Mn # [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA 112E3..112EA ; Mn # [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
11300..11301 ; Mn # [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU 11300..11301 ; Mn # [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
@ -2654,6 +2728,9 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
11340 ; Mn # GRANTHA VOWEL SIGN II 11340 ; Mn # GRANTHA VOWEL SIGN II
11366..1136C ; Mn # [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX 11366..1136C ; Mn # [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
11370..11374 ; Mn # [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA 11370..11374 ; Mn # [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
11438..1143F ; Mn # [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
11442..11444 ; Mn # [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
11446 ; Mn # NEWA SIGN NUKTA
114B3..114B8 ; Mn # [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL 114B3..114B8 ; Mn # [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
114BA ; Mn # TIRHUTA VOWEL SIGN SHORT E 114BA ; Mn # TIRHUTA VOWEL SIGN SHORT E
114BF..114C0 ; Mn # [2] TIRHUTA SIGN CANDRABINDU..TIRHUTA SIGN ANUSVARA 114BF..114C0 ; Mn # [2] TIRHUTA SIGN CANDRABINDU..TIRHUTA SIGN ANUSVARA
@ -2672,6 +2749,27 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
1171D..1171F ; Mn # [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA 1171D..1171F ; Mn # [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
11722..11725 ; Mn # [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU 11722..11725 ; Mn # [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
11727..1172B ; Mn # [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER 11727..1172B ; Mn # [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
11A01..11A06 ; Mn # [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A09..11A0A ; Mn # [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A33..11A38 ; Mn # [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A3B..11A3E ; Mn # [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
11A47 ; Mn # ZANABAZAR SQUARE SUBJOINER
11A51..11A56 ; Mn # [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
11A59..11A5B ; Mn # [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
11A8A..11A96 ; Mn # [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
11A98..11A99 ; Mn # [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
11C30..11C36 ; Mn # [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
11C38..11C3D ; Mn # [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
11C3F ; Mn # BHAIKSUKI SIGN VIRAMA
11C92..11CA7 ; Mn # [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
11CAA..11CB0 ; Mn # [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
11CB2..11CB3 ; Mn # [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
11CB5..11CB6 ; Mn # [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
11D31..11D36 ; Mn # [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
11D3A ; Mn # MASARAM GONDI VOWEL SIGN E
11D3C..11D3D ; Mn # [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
11D3F..11D45 ; Mn # [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
11D47 ; Mn # MASARAM GONDI RA-KARA
16AF0..16AF4 ; Mn # [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE 16AF0..16AF4 ; Mn # [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
16B30..16B36 ; Mn # [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM 16B30..16B36 ; Mn # [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
16F8F..16F92 ; Mn # [4] MIAO TONE RIGHT..MIAO TONE BELOW 16F8F..16F92 ; Mn # [4] MIAO TONE RIGHT..MIAO TONE BELOW
@ -2687,10 +2785,16 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
1DA84 ; Mn # SIGNWRITING LOCATION HEAD NECK 1DA84 ; Mn # SIGNWRITING LOCATION HEAD NECK
1DA9B..1DA9F ; Mn # [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6 1DA9B..1DA9F ; Mn # [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
1DAA1..1DAAF ; Mn # [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16 1DAA1..1DAAF ; Mn # [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
1E000..1E006 ; Mn # [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
1E008..1E018 ; Mn # [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
1E01B..1E021 ; Mn # [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
1E023..1E024 ; Mn # [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
1E026..1E02A ; Mn # [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
1E8D0..1E8D6 ; Mn # [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS 1E8D0..1E8D6 ; Mn # [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
1E944..1E94A ; Mn # [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
E0100..E01EF ; Mn # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 E0100..E01EF ; Mn # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 1567 # Total code points: 1763
# ================================================ # ================================================
@ -2795,6 +2899,7 @@ A670..A672 ; Me # [3] COMBINING CYRILLIC TEN MILLIONS SIGN..COMBINING CYRIL
1C34..1C35 ; Mc # [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG 1C34..1C35 ; Mc # [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG
1CE1 ; Mc # VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA 1CE1 ; Mc # VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA
1CF2..1CF3 ; Mc # [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA 1CF2..1CF3 ; Mc # [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
1CF7 ; Mc # VEDIC SIGN ATIKRAMA
302E..302F ; Mc # [2] HANGUL SINGLE DOT TONE MARK..HANGUL DOUBLE DOT TONE MARK 302E..302F ; Mc # [2] HANGUL SINGLE DOT TONE MARK..HANGUL DOUBLE DOT TONE MARK
A823..A824 ; Mc # [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I A823..A824 ; Mc # [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I
A827 ; Mc # SYLOTI NAGRI VOWEL SIGN OO A827 ; Mc # SYLOTI NAGRI VOWEL SIGN OO
@ -2837,6 +2942,9 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
1134B..1134D ; Mc # [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA 1134B..1134D ; Mc # [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
11357 ; Mc # GRANTHA AU LENGTH MARK 11357 ; Mc # GRANTHA AU LENGTH MARK
11362..11363 ; Mc # [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL 11362..11363 ; Mc # [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
11435..11437 ; Mc # [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
11440..11441 ; Mc # [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
11445 ; Mc # NEWA SIGN VISARGA
114B0..114B2 ; Mc # [3] TIRHUTA VOWEL SIGN AA..TIRHUTA VOWEL SIGN II 114B0..114B2 ; Mc # [3] TIRHUTA VOWEL SIGN AA..TIRHUTA VOWEL SIGN II
114B9 ; Mc # TIRHUTA VOWEL SIGN E 114B9 ; Mc # TIRHUTA VOWEL SIGN E
114BB..114BE ; Mc # [4] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN AU 114BB..114BE ; Mc # [4] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN AU
@ -2852,11 +2960,20 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
116B6 ; Mc # TAKRI SIGN VIRAMA 116B6 ; Mc # TAKRI SIGN VIRAMA
11720..11721 ; Mc # [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA 11720..11721 ; Mc # [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
11726 ; Mc # AHOM VOWEL SIGN E 11726 ; Mc # AHOM VOWEL SIGN E
11A07..11A08 ; Mc # [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
11A39 ; Mc # ZANABAZAR SQUARE SIGN VISARGA
11A57..11A58 ; Mc # [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
11A97 ; Mc # SOYOMBO SIGN VISARGA
11C2F ; Mc # BHAIKSUKI VOWEL SIGN AA
11C3E ; Mc # BHAIKSUKI SIGN VISARGA
11CA9 ; Mc # MARCHEN SUBJOINED LETTER YA
11CB1 ; Mc # MARCHEN VOWEL SIGN I
11CB4 ; Mc # MARCHEN VOWEL SIGN O
16F51..16F7E ; Mc # [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG 16F51..16F7E ; Mc # [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
1D165..1D166 ; Mc # [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM 1D165..1D166 ; Mc # [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D..1D172 ; Mc # [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5 1D16D..1D172 ; Mc # [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5
# Total code points: 383 # Total code points: 401
# ================================================ # ================================================
@ -2905,16 +3022,20 @@ FF10..FF19 ; Nd # [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
11136..1113F ; Nd # [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE 11136..1113F ; Nd # [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE
111D0..111D9 ; Nd # [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE 111D0..111D9 ; Nd # [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE
112F0..112F9 ; Nd # [10] KHUDAWADI DIGIT ZERO..KHUDAWADI DIGIT NINE 112F0..112F9 ; Nd # [10] KHUDAWADI DIGIT ZERO..KHUDAWADI DIGIT NINE
11450..11459 ; Nd # [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
114D0..114D9 ; Nd # [10] TIRHUTA DIGIT ZERO..TIRHUTA DIGIT NINE 114D0..114D9 ; Nd # [10] TIRHUTA DIGIT ZERO..TIRHUTA DIGIT NINE
11650..11659 ; Nd # [10] MODI DIGIT ZERO..MODI DIGIT NINE 11650..11659 ; Nd # [10] MODI DIGIT ZERO..MODI DIGIT NINE
116C0..116C9 ; Nd # [10] TAKRI DIGIT ZERO..TAKRI DIGIT NINE 116C0..116C9 ; Nd # [10] TAKRI DIGIT ZERO..TAKRI DIGIT NINE
11730..11739 ; Nd # [10] AHOM DIGIT ZERO..AHOM DIGIT NINE 11730..11739 ; Nd # [10] AHOM DIGIT ZERO..AHOM DIGIT NINE
118E0..118E9 ; Nd # [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE 118E0..118E9 ; Nd # [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE
11C50..11C59 ; Nd # [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
11D50..11D59 ; Nd # [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
16A60..16A69 ; Nd # [10] MRO DIGIT ZERO..MRO DIGIT NINE 16A60..16A69 ; Nd # [10] MRO DIGIT ZERO..MRO DIGIT NINE
16B50..16B59 ; Nd # [10] PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT NINE 16B50..16B59 ; Nd # [10] PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT NINE
1D7CE..1D7FF ; Nd # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE 1D7CE..1D7FF ; Nd # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
1E950..1E959 ; Nd # [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE
# Total code points: 550 # Total code points: 590
# ================================================ # ================================================
@ -2946,7 +3067,8 @@ A6E6..A6EF ; Nl # [10] BAMUM LETTER MO..BAMUM LETTER KOGHOM
0B72..0B77 ; No # [6] ORIYA FRACTION ONE QUARTER..ORIYA FRACTION THREE SIXTEENTHS 0B72..0B77 ; No # [6] ORIYA FRACTION ONE QUARTER..ORIYA FRACTION THREE SIXTEENTHS
0BF0..0BF2 ; No # [3] TAMIL NUMBER TEN..TAMIL NUMBER ONE THOUSAND 0BF0..0BF2 ; No # [3] TAMIL NUMBER TEN..TAMIL NUMBER ONE THOUSAND
0C78..0C7E ; No # [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR 0C78..0C7E ; No # [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
0D70..0D75 ; No # [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS 0D58..0D5E ; No # [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH
0D70..0D78 ; No # [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
0F2A..0F33 ; No # [10] TIBETAN DIGIT HALF ONE..TIBETAN DIGIT HALF ZERO 0F2A..0F33 ; No # [10] TIBETAN DIGIT HALF ONE..TIBETAN DIGIT HALF ZERO
1369..137C ; No # [20] ETHIOPIC DIGIT ONE..ETHIOPIC NUMBER TEN THOUSAND 1369..137C ; No # [20] ETHIOPIC DIGIT ONE..ETHIOPIC NUMBER TEN THOUSAND
17F0..17F9 ; No # [10] KHMER SYMBOL LEK ATTAK SON..KHMER SYMBOL LEK ATTAK PRAM-BUON 17F0..17F9 ; No # [10] KHMER SYMBOL LEK ATTAK SON..KHMER SYMBOL LEK ATTAK PRAM-BUON
@ -2993,12 +3115,13 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
111E1..111F4 ; No # [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND 111E1..111F4 ; No # [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND
1173A..1173B ; No # [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY 1173A..1173B ; No # [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY
118EA..118F2 ; No # [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY 118EA..118F2 ; No # [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY
11C5A..11C6C ; No # [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
16B5B..16B61 ; No # [7] PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBER TRILLIONS 16B5B..16B61 ; No # [7] PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBER TRILLIONS
1D360..1D371 ; No # [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE 1D360..1D371 ; No # [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE
1E8C7..1E8CF ; No # [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE 1E8C7..1E8CF ; No # [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE
1F100..1F10C ; No # [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO 1F100..1F10C ; No # [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
# Total code points: 647 # Total code points: 676
# ================================================ # ================================================
@ -3048,6 +3171,7 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
061C ; Cf # ARABIC LETTER MARK 061C ; Cf # ARABIC LETTER MARK
06DD ; Cf # ARABIC END OF AYAH 06DD ; Cf # ARABIC END OF AYAH
070F ; Cf # SYRIAC ABBREVIATION MARK 070F ; Cf # SYRIAC ABBREVIATION MARK
08E2 ; Cf # ARABIC DISPUTED END OF AYAH
180E ; Cf # MONGOLIAN VOWEL SEPARATOR 180E ; Cf # MONGOLIAN VOWEL SEPARATOR
200B..200F ; Cf # [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK 200B..200F ; Cf # [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
202A..202E ; Cf # [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE 202A..202E ; Cf # [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
@ -3061,7 +3185,7 @@ FFF9..FFFB ; Cf # [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION
E0001 ; Cf # LANGUAGE TAG E0001 ; Cf # LANGUAGE TAG
E0020..E007F ; Cf # [96] TAG SPACE..CANCEL TAG E0020..E007F ; Cf # [96] TAG SPACE..CANCEL TAG
# Total code points: 150 # Total code points: 151
# ================================================ # ================================================
@ -3315,6 +3439,7 @@ FF3F ; Pc # FULLWIDTH LOW LINE
085E ; Po # MANDAIC PUNCTUATION 085E ; Po # MANDAIC PUNCTUATION
0964..0965 ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA 0964..0965 ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0970 ; Po # DEVANAGARI ABBREVIATION SIGN 0970 ; Po # DEVANAGARI ABBREVIATION SIGN
09FD ; Po # BENGALI ABBREVIATION SIGN
0AF0 ; Po # GUJARATI ABBREVIATION SIGN 0AF0 ; Po # GUJARATI ABBREVIATION SIGN
0DF4 ; Po # SINHALA PUNCTUATION KUNDDALIYA 0DF4 ; Po # SINHALA PUNCTUATION KUNDDALIYA
0E4F ; Po # THAI CHARACTER FONGMAN 0E4F ; Po # THAI CHARACTER FONGMAN
@ -3366,6 +3491,7 @@ FF3F ; Pc # FULLWIDTH LOW LINE
2E30..2E39 ; Po # [10] RING POINT..TOP HALF SECTION SIGN 2E30..2E39 ; Po # [10] RING POINT..TOP HALF SECTION SIGN
2E3C..2E3F ; Po # [4] STENOGRAPHIC FULL STOP..CAPITULUM 2E3C..2E3F ; Po # [4] STENOGRAPHIC FULL STOP..CAPITULUM
2E41 ; Po # REVERSED COMMA 2E41 ; Po # REVERSED COMMA
2E43..2E49 ; Po # [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
3001..3003 ; Po # [3] IDEOGRAPHIC COMMA..DITTO MARK 3001..3003 ; Po # [3] IDEOGRAPHIC COMMA..DITTO MARK
303D ; Po # PART ALTERNATION MARK 303D ; Po # PART ALTERNATION MARK
30FB ; Po # KATAKANA MIDDLE DOT 30FB ; Po # KATAKANA MIDDLE DOT
@ -3429,10 +3555,19 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
111DD..111DF ; Po # [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2 111DD..111DF ; Po # [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2
11238..1123D ; Po # [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN 11238..1123D ; Po # [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
112A9 ; Po # MULTANI SECTION MARK 112A9 ; Po # MULTANI SECTION MARK
1144B..1144F ; Po # [5] NEWA DANDA..NEWA ABBREVIATION SIGN
1145B ; Po # NEWA PLACEHOLDER MARK
1145D ; Po # NEWA INSERTION SIGN
114C6 ; Po # TIRHUTA ABBREVIATION SIGN 114C6 ; Po # TIRHUTA ABBREVIATION SIGN
115C1..115D7 ; Po # [23] SIDDHAM SIGN SIDDHAM..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES 115C1..115D7 ; Po # [23] SIDDHAM SIGN SIDDHAM..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
11641..11643 ; Po # [3] MODI DANDA..MODI ABBREVIATION SIGN 11641..11643 ; Po # [3] MODI DANDA..MODI ABBREVIATION SIGN
11660..1166C ; Po # [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
1173C..1173E ; Po # [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI 1173C..1173E ; Po # [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
11A3F..11A46 ; Po # [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
11A9A..11A9C ; Po # [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
11A9E..11AA2 ; Po # [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
11C41..11C45 ; Po # [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
11C70..11C71 ; Po # [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
12470..12474 ; Po # [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON 12470..12474 ; Po # [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON
16A6E..16A6F ; Po # [2] MRO DANDA..MRO DOUBLE DANDA 16A6E..16A6F ; Po # [2] MRO DANDA..MRO DOUBLE DANDA
16AF5 ; Po # BASSA VAH FULL STOP 16AF5 ; Po # BASSA VAH FULL STOP
@ -3440,8 +3575,9 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
16B44 ; Po # PAHAWH HMONG SIGN XAUS 16B44 ; Po # PAHAWH HMONG SIGN XAUS
1BC9F ; Po # DUPLOYAN PUNCTUATION CHINOOK FULL STOP 1BC9F ; Po # DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA87..1DA8B ; Po # [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS 1DA87..1DA8B ; Po # [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS
1E95E..1E95F ; Po # [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK
# Total code points: 513 # Total code points: 566
# ================================================ # ================================================
@ -3528,7 +3664,7 @@ FFE9..FFEC ; Sm # [4] HALFWIDTH LEFTWARDS ARROW..HALFWIDTH DOWNWARDS ARROW
0BF9 ; Sc # TAMIL RUPEE SIGN 0BF9 ; Sc # TAMIL RUPEE SIGN
0E3F ; Sc # THAI CURRENCY SYMBOL BAHT 0E3F ; Sc # THAI CURRENCY SYMBOL BAHT
17DB ; Sc # KHMER CURRENCY SYMBOL RIEL 17DB ; Sc # KHMER CURRENCY SYMBOL RIEL
20A0..20BE ; Sc # [31] EURO-CURRENCY SIGN..LARI SIGN 20A0..20BF ; Sc # [32] EURO-CURRENCY SIGN..BITCOIN SIGN
A838 ; Sc # NORTH INDIC RUPEE MARK A838 ; Sc # NORTH INDIC RUPEE MARK
FDFC ; Sc # RIAL SIGN FDFC ; Sc # RIAL SIGN
FE69 ; Sc # SMALL DOLLAR SIGN FE69 ; Sc # SMALL DOLLAR SIGN
@ -3536,7 +3672,7 @@ FF04 ; Sc # FULLWIDTH DOLLAR SIGN
FFE0..FFE1 ; Sc # [2] FULLWIDTH CENT SIGN..FULLWIDTH POUND SIGN FFE0..FFE1 ; Sc # [2] FULLWIDTH CENT SIGN..FULLWIDTH POUND SIGN
FFE5..FFE6 ; Sc # [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN FFE5..FFE6 ; Sc # [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN
# Total code points: 53 # Total code points: 54
# ================================================ # ================================================
@ -3594,6 +3730,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
0BF3..0BF8 ; So # [6] TAMIL DAY SIGN..TAMIL AS ABOVE SIGN 0BF3..0BF8 ; So # [6] TAMIL DAY SIGN..TAMIL AS ABOVE SIGN
0BFA ; So # TAMIL NUMBER SIGN 0BFA ; So # TAMIL NUMBER SIGN
0C7F ; So # TELUGU SIGN TUUMU 0C7F ; So # TELUGU SIGN TUUMU
0D4F ; So # MALAYALAM SIGN PARA
0D79 ; So # MALAYALAM DATE MARK 0D79 ; So # MALAYALAM DATE MARK
0F01..0F03 ; So # [3] TIBETAN MARK GTER YIG MGO TRUNCATED A..TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA 0F01..0F03 ; So # [3] TIBETAN MARK GTER YIG MGO TRUNCATED A..TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA
0F13 ; So # TIBETAN MARK CARET -DZUD RTAGS ME LONG CAN 0F13 ; So # TIBETAN MARK CARET -DZUD RTAGS ME LONG CAN
@ -3642,8 +3779,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
232B..237B ; So # [81] ERASE TO THE LEFT..NOT CHECK MARK 232B..237B ; So # [81] ERASE TO THE LEFT..NOT CHECK MARK
237D..239A ; So # [30] SHOULDERED OPEN BOX..CLEAR SCREEN SYMBOL 237D..239A ; So # [30] SHOULDERED OPEN BOX..CLEAR SCREEN SYMBOL
23B4..23DB ; So # [40] TOP SQUARE BRACKET..FUSE 23B4..23DB ; So # [40] TOP SQUARE BRACKET..FUSE
23E2..23FA ; So # [25] WHITE TRAPEZIUM..BLACK CIRCLE FOR RECORD 23E2..2426 ; So # [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO
2400..2426 ; So # [39] SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM TWO
2440..244A ; So # [11] OCR HOOK..OCR DOUBLE BACKSLASH 2440..244A ; So # [11] OCR HOOK..OCR DOUBLE BACKSLASH
249C..24E9 ; So # [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z 249C..24E9 ; So # [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
2500..25B6 ; So # [183] BOX DRAWINGS LIGHT HORIZONTAL..BLACK RIGHT-POINTING TRIANGLE 2500..25B6 ; So # [183] BOX DRAWINGS LIGHT HORIZONTAL..BLACK RIGHT-POINTING TRIANGLE
@ -3659,7 +3795,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
2B76..2B95 ; So # [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW 2B76..2B95 ; So # [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
2B98..2BB9 ; So # [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX 2B98..2BB9 ; So # [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
2BBD..2BC8 ; So # [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED 2BBD..2BC8 ; So # [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BD1 ; So # [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN 2BCA..2BD2 ; So # [9] TOP HALF BLACK CIRCLE..GROUP MARK
2BEC..2BEF ; So # [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS 2BEC..2BEF ; So # [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
2CE5..2CEA ; So # [6] COPTIC SYMBOL MI RO..COPTIC SYMBOL SHIMA SIMA 2CE5..2CEA ; So # [6] COPTIC SYMBOL MI RO..COPTIC SYMBOL SHIMA SIMA
2E80..2E99 ; So # [26] CJK RADICAL REPEAT..CJK RADICAL RAP 2E80..2E99 ; So # [26] CJK RADICAL REPEAT..CJK RADICAL RAP
@ -3694,7 +3830,7 @@ FFED..FFEE ; So # [2] HALFWIDTH BLACK SQUARE..HALFWIDTH WHITE CIRCLE
FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
10137..1013F ; So # [9] AEGEAN WEIGHT BASE UNIT..AEGEAN MEASURE THIRD SUBUNIT 10137..1013F ; So # [9] AEGEAN WEIGHT BASE UNIT..AEGEAN MEASURE THIRD SUBUNIT
10179..10189 ; So # [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN 10179..10189 ; So # [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
1018C ; So # GREEK SINUSOID SIGN 1018C..1018E ; So # [3] GREEK SINUSOID SIGN..NOMISMA SIGN
10190..1019B ; So # [12] ROMAN SEXTANS SIGN..ROMAN CENTURIAL SIGN 10190..1019B ; So # [12] ROMAN SEXTANS SIGN..ROMAN CENTURIAL SIGN
101A0 ; So # GREEK SYMBOL TAU RHO 101A0 ; So # GREEK SYMBOL TAU RHO
101D0..101FC ; So # [45] PHAISTOS DISC SIGN PEDESTRIAN..PHAISTOS DISC SIGN WAVY BAND 101D0..101FC ; So # [45] PHAISTOS DISC SIGN PEDESTRIAN..PHAISTOS DISC SIGN WAVY BAND
@ -3727,17 +3863,16 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
1F0D1..1F0F5 ; So # [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21 1F0D1..1F0F5 ; So # [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21
1F110..1F12E ; So # [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ 1F110..1F12E ; So # [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
1F130..1F16B ; So # [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN 1F130..1F16B ; So # [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
1F170..1F19A ; So # [43] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VS 1F170..1F1AC ; So # [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
1F1E6..1F202 ; So # [29] REGIONAL INDICATOR SYMBOL LETTER A..SQUARED KATAKANA SA 1F1E6..1F202 ; So # [29] REGIONAL INDICATOR SYMBOL LETTER A..SQUARED KATAKANA SA
1F210..1F23A ; So # [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6 1F210..1F23B ; So # [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
1F240..1F248 ; So # [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557 1F240..1F248 ; So # [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
1F250..1F251 ; So # [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT 1F250..1F251 ; So # [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
1F260..1F265 ; So # [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
1F300..1F3FA ; So # [251] CYCLONE..AMPHORA 1F300..1F3FA ; So # [251] CYCLONE..AMPHORA
1F400..1F579 ; So # [378] RAT..JOYSTICK 1F400..1F6D4 ; So # [725] RAT..PAGODA
1F57B..1F5A3 ; So # [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
1F5A5..1F6D0 ; So # [300] DESKTOP COMPUTER..PLACE OF WORSHIP
1F6E0..1F6EC ; So # [13] HAMMER AND WRENCH..AIRPLANE ARRIVING 1F6E0..1F6EC ; So # [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
1F6F0..1F6F3 ; So # [4] SATELLITE..PASSENGER SHIP 1F6F0..1F6F8 ; So # [9] SATELLITE..FLYING SAUCER
1F700..1F773 ; So # [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE 1F700..1F773 ; So # [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
1F780..1F7D4 ; So # [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR 1F780..1F7D4 ; So # [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
1F800..1F80B ; So # [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD 1F800..1F80B ; So # [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
@ -3745,11 +3880,15 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
1F850..1F859 ; So # [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW 1F850..1F859 ; So # [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
1F860..1F887 ; So # [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW 1F860..1F887 ; So # [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
1F890..1F8AD ; So # [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS 1F890..1F8AD ; So # [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
1F910..1F918 ; So # [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS 1F900..1F90B ; So # [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
1F980..1F984 ; So # [5] CRAB..UNICORN FACE 1F910..1F93E ; So # [47] ZIPPER-MOUTH FACE..HANDBALL
1F940..1F94C ; So # [13] WILTED FLOWER..CURLING STONE
1F950..1F96B ; So # [28] CROISSANT..CANNED FOOD
1F980..1F997 ; So # [24] CRAB..CRICKET
1F9C0 ; So # CHEESE WEDGE 1F9C0 ; So # CHEESE WEDGE
1F9D0..1F9E6 ; So # [23] FACE WITH MONOCLE..SOCKS
# Total code points: 5677 # Total code points: 5855
# ================================================ # ================================================


@ -1,10 +1,11 @@
# GraphemeBreakProperty-8.0.0.txt # GraphemeBreakProperty-10.0.0.txt
# Date: 2015-02-13, 13:47:14 GMT [MD] # Date: 2017-03-12, 07:03:41 GMT
# © 2017 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# #
# Unicode Character Database # Unicode Character Database
# Copyright (c) 1991-2015 Unicode, Inc. # For documentation, see http://www.unicode.org/reports/tr44/
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
# ================================================ # ================================================
@ -17,6 +18,21 @@
# ================================================ # ================================================
0600..0605 ; Prepend # Cf [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
06DD ; Prepend # Cf ARABIC END OF AYAH
070F ; Prepend # Cf SYRIAC ABBREVIATION MARK
08E2 ; Prepend # Cf ARABIC DISPUTED END OF AYAH
0D4E ; Prepend # Lo MALAYALAM LETTER DOT REPH
110BD ; Prepend # Cf KAITHI NUMBER SIGN
111C2..111C3 ; Prepend # Lo [2] SHARADA SIGN JIHVAMULIYA..SHARADA SIGN UPADHMANIYA
11A3A ; Prepend # Lo ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
11A86..11A89 ; Prepend # Lo [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
11D46 ; Prepend # Lo MASARAM GONDI REPHA
# Total code points: 19
# ================================================
000D ; CR # Cc <control-000D> 000D ; CR # Cc <control-000D>
# Total code points: 1 # Total code points: 1
@ -34,10 +50,7 @@
000E..001F ; Control # Cc [18] <control-000E>..<control-001F> 000E..001F ; Control # Cc [18] <control-000E>..<control-001F>
007F..009F ; Control # Cc [33] <control-007F>..<control-009F> 007F..009F ; Control # Cc [33] <control-007F>..<control-009F>
00AD ; Control # Cf SOFT HYPHEN 00AD ; Control # Cf SOFT HYPHEN
0600..0605 ; Control # Cf [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
061C ; Control # Cf ARABIC LETTER MARK 061C ; Control # Cf ARABIC LETTER MARK
06DD ; Control # Cf ARABIC END OF AYAH
070F ; Control # Cf SYRIAC ABBREVIATION MARK
180E ; Control # Cf MONGOLIAN VOWEL SEPARATOR 180E ; Control # Cf MONGOLIAN VOWEL SEPARATOR
200B ; Control # Cf ZERO WIDTH SPACE 200B ; Control # Cf ZERO WIDTH SPACE
200E..200F ; Control # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK 200E..200F ; Control # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
@ -51,17 +64,15 @@ D800..DFFF ; Control # Cs [2048] <surrogate-D800>..<surrogate-DFFF>
FEFF ; Control # Cf ZERO WIDTH NO-BREAK SPACE FEFF ; Control # Cf ZERO WIDTH NO-BREAK SPACE
FFF0..FFF8 ; Control # Cn [9] <reserved-FFF0>..<reserved-FFF8> FFF0..FFF8 ; Control # Cn [9] <reserved-FFF0>..<reserved-FFF8>
FFF9..FFFB ; Control # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR FFF9..FFFB ; Control # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
110BD ; Control # Cf KAITHI NUMBER SIGN
1BCA0..1BCA3 ; Control # Cf [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP 1BCA0..1BCA3 ; Control # Cf [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
1D173..1D17A ; Control # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE 1D173..1D17A ; Control # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
E0000 ; Control # Cn <reserved-E0000> E0000 ; Control # Cn <reserved-E0000>
E0001 ; Control # Cf LANGUAGE TAG E0001 ; Control # Cf LANGUAGE TAG
E0002..E001F ; Control # Cn [30] <reserved-E0002>..<reserved-E001F> E0002..E001F ; Control # Cn [30] <reserved-E0002>..<reserved-E001F>
E0020..E007F ; Control # Cf [96] TAG SPACE..CANCEL TAG
E0080..E00FF ; Control # Cn [128] <reserved-E0080>..<reserved-E00FF> E0080..E00FF ; Control # Cn [128] <reserved-E0080>..<reserved-E00FF>
E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF> E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
# Total code points: 6030 # Total code points: 5925
# ================================================ # ================================================
@ -89,6 +100,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
0825..0827 ; Extend # Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U 0825..0827 ; Extend # Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
0829..082D ; Extend # Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA 0829..082D ; Extend # Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
0859..085B ; Extend # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK 0859..085B ; Extend # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
08D4..08E1 ; Extend # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08E3..0902 ; Extend # Mn [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA 08E3..0902 ; Extend # Mn [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
093A ; Extend # Mn DEVANAGARI VOWEL SIGN OE 093A ; Extend # Mn DEVANAGARI VOWEL SIGN OE
093C ; Extend # Mn DEVANAGARI SIGN NUKTA 093C ; Extend # Mn DEVANAGARI SIGN NUKTA
@ -117,6 +129,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
0AC7..0AC8 ; Extend # Mn [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI 0AC7..0AC8 ; Extend # Mn [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
0ACD ; Extend # Mn GUJARATI SIGN VIRAMA 0ACD ; Extend # Mn GUJARATI SIGN VIRAMA
0AE2..0AE3 ; Extend # Mn [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL 0AE2..0AE3 ; Extend # Mn [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
0AFA..0AFF ; Extend # Mn [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
0B01 ; Extend # Mn ORIYA SIGN CANDRABINDU 0B01 ; Extend # Mn ORIYA SIGN CANDRABINDU
0B3C ; Extend # Mn ORIYA SIGN NUKTA 0B3C ; Extend # Mn ORIYA SIGN NUKTA
0B3E ; Extend # Mc ORIYA VOWEL SIGN AA 0B3E ; Extend # Mc ORIYA VOWEL SIGN AA
@ -145,7 +158,8 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
0CCC..0CCD ; Extend # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA 0CCC..0CCD ; Extend # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
0CD5..0CD6 ; Extend # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK 0CD5..0CD6 ; Extend # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
0CE2..0CE3 ; Extend # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL 0CE2..0CE3 ; Extend # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
0D01 ; Extend # Mn MALAYALAM SIGN CANDRABINDU 0D00..0D01 ; Extend # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
0D3B..0D3C ; Extend # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
0D3E ; Extend # Mc MALAYALAM VOWEL SIGN AA 0D3E ; Extend # Mc MALAYALAM VOWEL SIGN AA
0D41..0D44 ; Extend # Mn [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR 0D41..0D44 ; Extend # Mn [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
0D4D ; Extend # Mn MALAYALAM SIGN VIRAMA 0D4D ; Extend # Mn MALAYALAM SIGN VIRAMA
@ -195,6 +209,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
17C9..17D3 ; Extend # Mn [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT 17C9..17D3 ; Extend # Mn [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
17DD ; Extend # Mn KHMER SIGN ATTHACAN 17DD ; Extend # Mn KHMER SIGN ATTHACAN
180B..180D ; Extend # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE 180B..180D ; Extend # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
1885..1886 ; Extend # Mn [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
18A9 ; Extend # Mn MONGOLIAN LETTER ALI GALI DAGALGA 18A9 ; Extend # Mn MONGOLIAN LETTER ALI GALI DAGALGA
1920..1922 ; Extend # Mn [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U 1920..1922 ; Extend # Mn [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
1927..1928 ; Extend # Mn [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O 1927..1928 ; Extend # Mn [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
@ -233,9 +248,9 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
1CED ; Extend # Mn VEDIC SIGN TIRYAK 1CED ; Extend # Mn VEDIC SIGN TIRYAK
1CF4 ; Extend # Mn VEDIC TONE CANDRA ABOVE 1CF4 ; Extend # Mn VEDIC TONE CANDRA ABOVE
1CF8..1CF9 ; Extend # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE 1CF8..1CF9 ; Extend # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
1DC0..1DF5 ; Extend # Mn [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE 1DC0..1DF9 ; Extend # Mn [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
1DFC..1DFF ; Extend # Mn [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW 1DFB..1DFF ; Extend # Mn [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
200C..200D ; Extend # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER 200C ; Extend # Cf ZERO WIDTH NON-JOINER
20D0..20DC ; Extend # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE 20D0..20DC ; Extend # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20DD..20E0 ; Extend # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH 20DD..20E0 ; Extend # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
20E1 ; Extend # Mn COMBINING LEFT RIGHT ARROW ABOVE 20E1 ; Extend # Mn COMBINING LEFT RIGHT ARROW ABOVE
@ -256,7 +271,7 @@ A802 ; Extend # Mn SYLOTI NAGRI SIGN DVISVARA
A806 ; Extend # Mn SYLOTI NAGRI SIGN HASANTA A806 ; Extend # Mn SYLOTI NAGRI SIGN HASANTA
A80B ; Extend # Mn SYLOTI NAGRI SIGN ANUSVARA A80B ; Extend # Mn SYLOTI NAGRI SIGN ANUSVARA
A825..A826 ; Extend # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E A825..A826 ; Extend # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
A8C4 ; Extend # Mn SAURASHTRA SIGN VIRAMA A8C4..A8C5 ; Extend # Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
A8E0..A8F1 ; Extend # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA A8E0..A8F1 ; Extend # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
A926..A92D ; Extend # Mn [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU A926..A92D ; Extend # Mn [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
A947..A951 ; Extend # Mn [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R A947..A951 ; Extend # Mn [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
@ -309,6 +324,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
1122F..11231 ; Extend # Mn [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI 1122F..11231 ; Extend # Mn [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
11234 ; Extend # Mn KHOJKI SIGN ANUSVARA 11234 ; Extend # Mn KHOJKI SIGN ANUSVARA
11236..11237 ; Extend # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA 11236..11237 ; Extend # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
1123E ; Extend # Mn KHOJKI SIGN SUKUN
112DF ; Extend # Mn KHUDAWADI SIGN ANUSVARA 112DF ; Extend # Mn KHUDAWADI SIGN ANUSVARA
112E3..112EA ; Extend # Mn [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA 112E3..112EA ; Extend # Mn [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
11300..11301 ; Extend # Mn [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU 11300..11301 ; Extend # Mn [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
@ -318,6 +334,9 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
11357 ; Extend # Mc GRANTHA AU LENGTH MARK 11357 ; Extend # Mc GRANTHA AU LENGTH MARK
11366..1136C ; Extend # Mn [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX 11366..1136C ; Extend # Mn [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
11370..11374 ; Extend # Mn [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA 11370..11374 ; Extend # Mn [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
11438..1143F ; Extend # Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
11442..11444 ; Extend # Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
11446 ; Extend # Mn NEWA SIGN NUKTA
114B0 ; Extend # Mc TIRHUTA VOWEL SIGN AA 114B0 ; Extend # Mc TIRHUTA VOWEL SIGN AA
114B3..114B8 ; Extend # Mn [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL 114B3..114B8 ; Extend # Mn [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
114BA ; Extend # Mn TIRHUTA VOWEL SIGN SHORT E 114BA ; Extend # Mn TIRHUTA VOWEL SIGN SHORT E
@ -339,6 +358,27 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
1171D..1171F ; Extend # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA 1171D..1171F ; Extend # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
11722..11725 ; Extend # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU 11722..11725 ; Extend # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
11727..1172B ; Extend # Mn [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER 11727..1172B ; Extend # Mn [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
11A01..11A06 ; Extend # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A09..11A0A ; Extend # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A33..11A38 ; Extend # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A3B..11A3E ; Extend # Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
11A47 ; Extend # Mn ZANABAZAR SQUARE SUBJOINER
11A51..11A56 ; Extend # Mn [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
11A59..11A5B ; Extend # Mn [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
11A8A..11A96 ; Extend # Mn [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
11A98..11A99 ; Extend # Mn [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
11C30..11C36 ; Extend # Mn [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
11C38..11C3D ; Extend # Mn [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
11C3F ; Extend # Mn BHAIKSUKI SIGN VIRAMA
11C92..11CA7 ; Extend # Mn [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
11CAA..11CB0 ; Extend # Mn [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
11CB2..11CB3 ; Extend # Mn [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
11CB5..11CB6 ; Extend # Mn [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
11D31..11D36 ; Extend # Mn [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
11D3A ; Extend # Mn MASARAM GONDI VOWEL SIGN E
11D3C..11D3D ; Extend # Mn [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
11D3F..11D45 ; Extend # Mn [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
11D47 ; Extend # Mn MASARAM GONDI RA-KARA
16AF0..16AF4 ; Extend # Mn [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE 16AF0..16AF4 ; Extend # Mn [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
16B30..16B36 ; Extend # Mn [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM 16B30..16B36 ; Extend # Mn [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
16F8F..16F92 ; Extend # Mn [4] MIAO TONE RIGHT..MIAO TONE BELOW 16F8F..16F92 ; Extend # Mn [4] MIAO TONE RIGHT..MIAO TONE BELOW
@ -356,10 +396,17 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
1DA84 ; Extend # Mn SIGNWRITING LOCATION HEAD NECK 1DA84 ; Extend # Mn SIGNWRITING LOCATION HEAD NECK
1DA9B..1DA9F ; Extend # Mn [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6 1DA9B..1DA9F ; Extend # Mn [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
1DAA1..1DAAF ; Extend # Mn [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16 1DAA1..1DAAF ; Extend # Mn [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
1E000..1E006 ; Extend # Mn [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
1E008..1E018 ; Extend # Mn [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
1E01B..1E021 ; Extend # Mn [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
1E023..1E024 ; Extend # Mn [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
1E026..1E02A ; Extend # Mn [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
1E8D0..1E8D6 ; Extend # Mn [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS 1E8D0..1E8D6 ; Extend # Mn [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
1E944..1E94A ; Extend # Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
E0020..E007F ; Extend # Cf [96] TAG SPACE..CANCEL TAG
E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 1610 # Total code points: 1901
# ================================================ # ================================================
@ -444,6 +491,7 @@ E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
1C34..1C35 ; SpacingMark # Mc [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG 1C34..1C35 ; SpacingMark # Mc [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG
1CE1 ; SpacingMark # Mc VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA 1CE1 ; SpacingMark # Mc VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA
1CF2..1CF3 ; SpacingMark # Mc [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA 1CF2..1CF3 ; SpacingMark # Mc [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
1CF7 ; SpacingMark # Mc VEDIC SIGN ATIKRAMA
A823..A824 ; SpacingMark # Mc [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I A823..A824 ; SpacingMark # Mc [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I
A827 ; SpacingMark # Mc SYLOTI NAGRI VOWEL SIGN OO A827 ; SpacingMark # Mc SYLOTI NAGRI VOWEL SIGN OO
A880..A881 ; SpacingMark # Mc [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA A880..A881 ; SpacingMark # Mc [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA
@ -482,6 +530,9 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
11347..11348 ; SpacingMark # Mc [2] GRANTHA VOWEL SIGN EE..GRANTHA VOWEL SIGN AI 11347..11348 ; SpacingMark # Mc [2] GRANTHA VOWEL SIGN EE..GRANTHA VOWEL SIGN AI
1134B..1134D ; SpacingMark # Mc [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA 1134B..1134D ; SpacingMark # Mc [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
11362..11363 ; SpacingMark # Mc [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL 11362..11363 ; SpacingMark # Mc [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
11435..11437 ; SpacingMark # Mc [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
11440..11441 ; SpacingMark # Mc [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
11445 ; SpacingMark # Mc NEWA SIGN VISARGA
114B1..114B2 ; SpacingMark # Mc [2] TIRHUTA VOWEL SIGN I..TIRHUTA VOWEL SIGN II 114B1..114B2 ; SpacingMark # Mc [2] TIRHUTA VOWEL SIGN I..TIRHUTA VOWEL SIGN II
114B9 ; SpacingMark # Mc TIRHUTA VOWEL SIGN E 114B9 ; SpacingMark # Mc TIRHUTA VOWEL SIGN E
114BB..114BC ; SpacingMark # Mc [2] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN O 114BB..114BC ; SpacingMark # Mc [2] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN O
@ -498,11 +549,20 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
116B6 ; SpacingMark # Mc TAKRI SIGN VIRAMA 116B6 ; SpacingMark # Mc TAKRI SIGN VIRAMA
11720..11721 ; SpacingMark # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA 11720..11721 ; SpacingMark # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
11726 ; SpacingMark # Mc AHOM VOWEL SIGN E 11726 ; SpacingMark # Mc AHOM VOWEL SIGN E
11A07..11A08 ; SpacingMark # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
11A39 ; SpacingMark # Mc ZANABAZAR SQUARE SIGN VISARGA
11A57..11A58 ; SpacingMark # Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
11A97 ; SpacingMark # Mc SOYOMBO SIGN VISARGA
11C2F ; SpacingMark # Mc BHAIKSUKI VOWEL SIGN AA
11C3E ; SpacingMark # Mc BHAIKSUKI SIGN VISARGA
11CA9 ; SpacingMark # Mc MARCHEN SUBJOINED LETTER YA
11CB1 ; SpacingMark # Mc MARCHEN VOWEL SIGN I
11CB4 ; SpacingMark # Mc MARCHEN VOWEL SIGN O
16F51..16F7E ; SpacingMark # Mc [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG 16F51..16F7E ; SpacingMark # Mc [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
1D166 ; SpacingMark # Mc MUSICAL SYMBOL COMBINING SPRECHGESANG STEM 1D166 ; SpacingMark # Mc MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D ; SpacingMark # Mc MUSICAL SYMBOL COMBINING AUGMENTATION DOT 1D16D ; SpacingMark # Mc MUSICAL SYMBOL COMBINING AUGMENTATION DOT
# Total code points: 330 # Total code points: 348
# ================================================ # ================================================
@ -1333,4 +1393,83 @@ D789..D7A3 ; LVT # Lo [27] HANGUL SYLLABLE HIG..HANGUL SYLLABLE HIH
# Total code points: 10773 # Total code points: 10773
# ================================================
261D ; E_Base # So WHITE UP POINTING INDEX
26F9 ; E_Base # So PERSON WITH BALL
270A..270D ; E_Base # So [4] RAISED FIST..WRITING HAND
1F385 ; E_Base # So FATHER CHRISTMAS
1F3C2..1F3C4 ; E_Base # So [3] SNOWBOARDER..SURFER
1F3C7 ; E_Base # So HORSE RACING
1F3CA..1F3CC ; E_Base # So [3] SWIMMER..GOLFER
1F442..1F443 ; E_Base # So [2] EAR..NOSE
1F446..1F450 ; E_Base # So [11] WHITE UP POINTING BACKHAND INDEX..OPEN HANDS SIGN
1F46E ; E_Base # So POLICE OFFICER
1F470..1F478 ; E_Base # So [9] BRIDE WITH VEIL..PRINCESS
1F47C ; E_Base # So BABY ANGEL
1F481..1F483 ; E_Base # So [3] INFORMATION DESK PERSON..DANCER
1F485..1F487 ; E_Base # So [3] NAIL POLISH..HAIRCUT
1F4AA ; E_Base # So FLEXED BICEPS
1F574..1F575 ; E_Base # So [2] MAN IN BUSINESS SUIT LEVITATING..SLEUTH OR SPY
1F57A ; E_Base # So MAN DANCING
1F590 ; E_Base # So RAISED HAND WITH FINGERS SPLAYED
1F595..1F596 ; E_Base # So [2] REVERSED HAND WITH MIDDLE FINGER EXTENDED..RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS
1F645..1F647 ; E_Base # So [3] FACE WITH NO GOOD GESTURE..PERSON BOWING DEEPLY
1F64B..1F64F ; E_Base # So [5] HAPPY PERSON RAISING ONE HAND..PERSON WITH FOLDED HANDS
1F6A3 ; E_Base # So ROWBOAT
1F6B4..1F6B6 ; E_Base # So [3] BICYCLIST..PEDESTRIAN
1F6C0 ; E_Base # So BATH
1F6CC ; E_Base # So SLEEPING ACCOMMODATION
1F918..1F91C ; E_Base # So [5] SIGN OF THE HORNS..RIGHT-FACING FIST
1F91E..1F91F ; E_Base # So [2] HAND WITH INDEX AND MIDDLE FINGERS CROSSED..I LOVE YOU HAND SIGN
1F926 ; E_Base # So FACE PALM
1F930..1F939 ; E_Base # So [10] PREGNANT WOMAN..JUGGLING
1F93D..1F93E ; E_Base # So [2] WATER POLO..HANDBALL
1F9D1..1F9DD ; E_Base # So [13] ADULT..ELF
# Total code points: 98
# ================================================
1F3FB..1F3FF ; E_Modifier # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
# Total code points: 5
# ================================================
200D ; ZWJ # Cf ZERO WIDTH JOINER
# Total code points: 1
# ================================================
2640 ; Glue_After_Zwj # So FEMALE SIGN
2642 ; Glue_After_Zwj # So MALE SIGN
2695..2696 ; Glue_After_Zwj # So [2] STAFF OF AESCULAPIUS..SCALES
2708 ; Glue_After_Zwj # So AIRPLANE
2764 ; Glue_After_Zwj # So HEAVY BLACK HEART
1F308 ; Glue_After_Zwj # So RAINBOW
1F33E ; Glue_After_Zwj # So EAR OF RICE
1F373 ; Glue_After_Zwj # So COOKING
1F393 ; Glue_After_Zwj # So GRADUATION CAP
1F3A4 ; Glue_After_Zwj # So MICROPHONE
1F3A8 ; Glue_After_Zwj # So ARTIST PALETTE
1F3EB ; Glue_After_Zwj # So SCHOOL
1F3ED ; Glue_After_Zwj # So FACTORY
1F48B ; Glue_After_Zwj # So KISS MARK
1F4BB..1F4BC ; Glue_After_Zwj # So [2] PERSONAL COMPUTER..BRIEFCASE
1F527 ; Glue_After_Zwj # So WRENCH
1F52C ; Glue_After_Zwj # So MICROSCOPE
1F5E8 ; Glue_After_Zwj # So LEFT SPEECH BUBBLE
1F680 ; Glue_After_Zwj # So ROCKET
1F692 ; Glue_After_Zwj # So FIRE ENGINE
# Total code points: 22
# ================================================
1F466..1F469 ; E_Base_GAZ # So [4] BOY..WOMAN
# Total code points: 4
# EOF # EOF


@ -1,10 +1,11 @@
# Scripts-8.0.0.txt # Scripts-10.0.0.txt
# Date: 2015-03-11, 22:29:42 GMT [MD] # Date: 2017-03-11, 06:40:37 GMT
# © 2017 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# #
# Unicode Character Database # Unicode Character Database
# Copyright (c) 1991-2015 Unicode, Inc. # For documentation, see http://www.unicode.org/reports/tr44/
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
# For more information, see: # For more information, see:
# UAX #24, Unicode Script Property: http://www.unicode.org/reports/tr24/ # UAX #24, Unicode Script Property: http://www.unicode.org/reports/tr24/
# Especially the sections: # Especially the sections:
@ -92,10 +93,10 @@
0605 ; Common # Cf ARABIC NUMBER MARK ABOVE 0605 ; Common # Cf ARABIC NUMBER MARK ABOVE
060C ; Common # Po ARABIC COMMA 060C ; Common # Po ARABIC COMMA
061B ; Common # Po ARABIC SEMICOLON 061B ; Common # Po ARABIC SEMICOLON
061C ; Common # Cf ARABIC LETTER MARK
061F ; Common # Po ARABIC QUESTION MARK 061F ; Common # Po ARABIC QUESTION MARK
0640 ; Common # Lm ARABIC TATWEEL 0640 ; Common # Lm ARABIC TATWEEL
06DD ; Common # Cf ARABIC END OF AYAH 06DD ; Common # Cf ARABIC END OF AYAH
08E2 ; Common # Cf ARABIC DISPUTED END OF AYAH
0964..0965 ; Common # Po [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA 0964..0965 ; Common # Po [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0E3F ; Common # Sc THAI CURRENCY SYMBOL BAHT 0E3F ; Common # Sc THAI CURRENCY SYMBOL BAHT
0FD5..0FD8 ; Common # So [4] RIGHT-FACING SVASTI SIGN..LEFT-FACING SVASTI SIGN WITH DOTS 0FD5..0FD8 ; Common # So [4] RIGHT-FACING SVASTI SIGN..LEFT-FACING SVASTI SIGN WITH DOTS
@ -110,6 +111,7 @@
1CEE..1CF1 ; Common # Lo [4] VEDIC SIGN HEXIFORM LONG ANUSVARA..VEDIC SIGN ANUSVARA UBHAYATO MUKHA 1CEE..1CF1 ; Common # Lo [4] VEDIC SIGN HEXIFORM LONG ANUSVARA..VEDIC SIGN ANUSVARA UBHAYATO MUKHA
1CF2..1CF3 ; Common # Mc [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA 1CF2..1CF3 ; Common # Mc [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
1CF5..1CF6 ; Common # Lo [2] VEDIC SIGN JIHVAMULIYA..VEDIC SIGN UPADHMANIYA 1CF5..1CF6 ; Common # Lo [2] VEDIC SIGN JIHVAMULIYA..VEDIC SIGN UPADHMANIYA
1CF7 ; Common # Mc VEDIC SIGN ATIKRAMA
2000..200A ; Common # Zs [11] EN QUAD..HAIR SPACE 2000..200A ; Common # Zs [11] EN QUAD..HAIR SPACE
200B ; Common # Cf ZERO WIDTH SPACE 200B ; Common # Cf ZERO WIDTH SPACE
200E..200F ; Common # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK 200E..200F ; Common # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
@ -153,7 +155,7 @@
208A..208C ; Common # Sm [3] SUBSCRIPT PLUS SIGN..SUBSCRIPT EQUALS SIGN 208A..208C ; Common # Sm [3] SUBSCRIPT PLUS SIGN..SUBSCRIPT EQUALS SIGN
208D ; Common # Ps SUBSCRIPT LEFT PARENTHESIS 208D ; Common # Ps SUBSCRIPT LEFT PARENTHESIS
208E ; Common # Pe SUBSCRIPT RIGHT PARENTHESIS 208E ; Common # Pe SUBSCRIPT RIGHT PARENTHESIS
20A0..20BE ; Common # Sc [31] EURO-CURRENCY SIGN..LARI SIGN 20A0..20BF ; Common # Sc [32] EURO-CURRENCY SIGN..BITCOIN SIGN
2100..2101 ; Common # So [2] ACCOUNT OF..ADDRESSED TO THE SUBJECT 2100..2101 ; Common # So [2] ACCOUNT OF..ADDRESSED TO THE SUBJECT
2102 ; Common # L& DOUBLE-STRUCK CAPITAL C 2102 ; Common # L& DOUBLE-STRUCK CAPITAL C
2103..2106 ; Common # So [4] DEGREE CELSIUS..CADA UNA 2103..2106 ; Common # So [4] DEGREE CELSIUS..CADA UNA
@ -223,8 +225,7 @@
239B..23B3 ; Common # Sm [25] LEFT PARENTHESIS UPPER HOOK..SUMMATION BOTTOM 239B..23B3 ; Common # Sm [25] LEFT PARENTHESIS UPPER HOOK..SUMMATION BOTTOM
23B4..23DB ; Common # So [40] TOP SQUARE BRACKET..FUSE 23B4..23DB ; Common # So [40] TOP SQUARE BRACKET..FUSE
23DC..23E1 ; Common # Sm [6] TOP PARENTHESIS..BOTTOM TORTOISE SHELL BRACKET 23DC..23E1 ; Common # Sm [6] TOP PARENTHESIS..BOTTOM TORTOISE SHELL BRACKET
23E2..23FA ; Common # So [25] WHITE TRAPEZIUM..BLACK CIRCLE FOR RECORD 23E2..2426 ; Common # So [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO
2400..2426 ; Common # So [39] SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM TWO
2440..244A ; Common # So [11] OCR HOOK..OCR DOUBLE BACKSLASH 2440..244A ; Common # So [11] OCR HOOK..OCR DOUBLE BACKSLASH
2460..249B ; Common # No [60] CIRCLED DIGIT ONE..NUMBER TWENTY FULL STOP 2460..249B ; Common # No [60] CIRCLED DIGIT ONE..NUMBER TWENTY FULL STOP
249C..24E9 ; Common # So [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z 249C..24E9 ; Common # So [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
@ -309,7 +310,7 @@
2B76..2B95 ; Common # So [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW 2B76..2B95 ; Common # So [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
2B98..2BB9 ; Common # So [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX 2B98..2BB9 ; Common # So [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
2BBD..2BC8 ; Common # So [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED 2BBD..2BC8 ; Common # So [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BD1 ; Common # So [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN 2BCA..2BD2 ; Common # So [9] TOP HALF BLACK CIRCLE..GROUP MARK
2BEC..2BEF ; Common # So [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS 2BEC..2BEF ; Common # So [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
2E00..2E01 ; Common # Po [2] RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER 2E00..2E01 ; Common # Po [2] RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER
2E02 ; Common # Pi LEFT SUBSTITUTION BRACKET 2E02 ; Common # Pi LEFT SUBSTITUTION BRACKET
@ -348,6 +349,7 @@
2E40 ; Common # Pd DOUBLE HYPHEN 2E40 ; Common # Pd DOUBLE HYPHEN
2E41 ; Common # Po REVERSED COMMA 2E41 ; Common # Po REVERSED COMMA
2E42 ; Common # Ps DOUBLE LOW-REVERSED-9 QUOTATION MARK 2E42 ; Common # Ps DOUBLE LOW-REVERSED-9 QUOTATION MARK
2E43..2E49 ; Common # Po [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
2FF0..2FFB ; Common # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID 2FF0..2FFB ; Common # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
3000 ; Common # Zs IDEOGRAPHIC SPACE 3000 ; Common # Zs IDEOGRAPHIC SPACE
3001..3003 ; Common # Po [3] IDEOGRAPHIC COMMA..DITTO MARK 3001..3003 ; Common # Po [3] IDEOGRAPHIC COMMA..DITTO MARK
@ -572,19 +574,18 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F100..1F10C ; Common # No [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO 1F100..1F10C ; Common # No [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
1F110..1F12E ; Common # So [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ 1F110..1F12E ; Common # So [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
1F130..1F16B ; Common # So [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN 1F130..1F16B ; Common # So [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
1F170..1F19A ; Common # So [43] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VS 1F170..1F1AC ; Common # So [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
1F1E6..1F1FF ; Common # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z 1F1E6..1F1FF ; Common # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
1F201..1F202 ; Common # So [2] SQUARED KATAKANA KOKO..SQUARED KATAKANA SA 1F201..1F202 ; Common # So [2] SQUARED KATAKANA KOKO..SQUARED KATAKANA SA
1F210..1F23A ; Common # So [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6 1F210..1F23B ; Common # So [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
1F240..1F248 ; Common # So [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557 1F240..1F248 ; Common # So [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
1F250..1F251 ; Common # So [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT 1F250..1F251 ; Common # So [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
1F260..1F265 ; Common # So [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
1F300..1F3FA ; Common # So [251] CYCLONE..AMPHORA 1F300..1F3FA ; Common # So [251] CYCLONE..AMPHORA
1F3FB..1F3FF ; Common # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6 1F3FB..1F3FF ; Common # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
1F400..1F579 ; Common # So [378] RAT..JOYSTICK 1F400..1F6D4 ; Common # So [725] RAT..PAGODA
1F57B..1F5A3 ; Common # So [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
1F5A5..1F6D0 ; Common # So [300] DESKTOP COMPUTER..PLACE OF WORSHIP
1F6E0..1F6EC ; Common # So [13] HAMMER AND WRENCH..AIRPLANE ARRIVING 1F6E0..1F6EC ; Common # So [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
1F6F0..1F6F3 ; Common # So [4] SATELLITE..PASSENGER SHIP 1F6F0..1F6F8 ; Common # So [9] SATELLITE..FLYING SAUCER
1F700..1F773 ; Common # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE 1F700..1F773 ; Common # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
1F780..1F7D4 ; Common # So [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR 1F780..1F7D4 ; Common # So [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
1F800..1F80B ; Common # So [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD 1F800..1F80B ; Common # So [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
@ -592,13 +593,17 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F850..1F859 ; Common # So [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW 1F850..1F859 ; Common # So [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
1F860..1F887 ; Common # So [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW 1F860..1F887 ; Common # So [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
1F890..1F8AD ; Common # So [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS 1F890..1F8AD ; Common # So [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
1F910..1F918 ; Common # So [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS 1F900..1F90B ; Common # So [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
1F980..1F984 ; Common # So [5] CRAB..UNICORN FACE 1F910..1F93E ; Common # So [47] ZIPPER-MOUTH FACE..HANDBALL
1F940..1F94C ; Common # So [13] WILTED FLOWER..CURLING STONE
1F950..1F96B ; Common # So [28] CROISSANT..CANNED FOOD
1F980..1F997 ; Common # So [24] CRAB..CRICKET
1F9C0 ; Common # So CHEESE WEDGE 1F9C0 ; Common # So CHEESE WEDGE
1F9D0..1F9E6 ; Common # So [23] FACE WITH MONOCLE..SOCKS
E0001 ; Common # Cf LANGUAGE TAG E0001 ; Common # Cf LANGUAGE TAG
E0020..E007F ; Common # Cf [96] TAG SPACE..CANCEL TAG E0020..E007F ; Common # Cf [96] TAG SPACE..CANCEL TAG
# Total code points: 7179 # Total code points: 7363
# ================================================ # ================================================
@ -641,7 +646,7 @@ A770 ; Latin # Lm MODIFIER LETTER US
A771..A787 ; Latin # L& [23] LATIN SMALL LETTER DUM..LATIN SMALL LETTER INSULAR T A771..A787 ; Latin # L& [23] LATIN SMALL LETTER DUM..LATIN SMALL LETTER INSULAR T
A78B..A78E ; Latin # L& [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT A78B..A78E ; Latin # L& [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT
A78F ; Latin # Lo LATIN LETTER SINOLOGICAL DOT A78F ; Latin # Lo LATIN LETTER SINOLOGICAL DOT
A790..A7AD ; Latin # L& [30] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER L WITH BELT A790..A7AE ; Latin # L& [31] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER SMALL CAPITAL I
A7B0..A7B7 ; Latin # L& [8] LATIN CAPITAL LETTER TURNED K..LATIN SMALL LETTER OMEGA A7B0..A7B7 ; Latin # L& [8] LATIN CAPITAL LETTER TURNED K..LATIN SMALL LETTER OMEGA
A7F7 ; Latin # Lo LATIN EPIGRAPHIC LETTER SIDEWAYS I A7F7 ; Latin # Lo LATIN EPIGRAPHIC LETTER SIDEWAYS I
A7F8..A7F9 ; Latin # Lm [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE A7F8..A7F9 ; Latin # Lm [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE
@ -654,7 +659,7 @@ FB00..FB06 ; Latin # L& [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE S
FF21..FF3A ; Latin # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z FF21..FF3A ; Latin # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A ; Latin # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z FF41..FF5A ; Latin # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
# Total code points: 1349 # Total code points: 1350
# ================================================ # ================================================
@ -708,13 +713,13 @@ AB65 ; Greek # L& GREEK LETTER SMALL CAPITAL OMEGA
10175..10178 ; Greek # No [4] GREEK ONE HALF SIGN..GREEK THREE QUARTERS SIGN 10175..10178 ; Greek # No [4] GREEK ONE HALF SIGN..GREEK THREE QUARTERS SIGN
10179..10189 ; Greek # So [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN 10179..10189 ; Greek # So [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
1018A..1018B ; Greek # No [2] GREEK ZERO SIGN..GREEK ONE QUARTER SIGN 1018A..1018B ; Greek # No [2] GREEK ZERO SIGN..GREEK ONE QUARTER SIGN
1018C ; Greek # So GREEK SINUSOID SIGN 1018C..1018E ; Greek # So [3] GREEK SINUSOID SIGN..NOMISMA SIGN
101A0 ; Greek # So GREEK SYMBOL TAU RHO 101A0 ; Greek # So GREEK SYMBOL TAU RHO
1D200..1D241 ; Greek # So [66] GREEK VOCAL NOTATION SYMBOL-1..GREEK INSTRUMENTAL NOTATION SYMBOL-54 1D200..1D241 ; Greek # So [66] GREEK VOCAL NOTATION SYMBOL-1..GREEK INSTRUMENTAL NOTATION SYMBOL-54
1D242..1D244 ; Greek # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME 1D242..1D244 ; Greek # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME
1D245 ; Greek # So GREEK MUSICAL LEIMMA 1D245 ; Greek # So GREEK MUSICAL LEIMMA
# Total code points: 516 # Total code points: 518
# ================================================ # ================================================
@ -724,6 +729,7 @@ AB65 ; Greek # L& GREEK LETTER SMALL CAPITAL OMEGA
0487 ; Cyrillic # Mn COMBINING CYRILLIC POKRYTIE 0487 ; Cyrillic # Mn COMBINING CYRILLIC POKRYTIE
0488..0489 ; Cyrillic # Me [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN 0488..0489 ; Cyrillic # Me [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN
048A..052F ; Cyrillic # L& [166] CYRILLIC CAPITAL LETTER SHORT I WITH TAIL..CYRILLIC SMALL LETTER EL WITH DESCENDER 048A..052F ; Cyrillic # L& [166] CYRILLIC CAPITAL LETTER SHORT I WITH TAIL..CYRILLIC SMALL LETTER EL WITH DESCENDER
1C80..1C88 ; Cyrillic # L& [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
1D2B ; Cyrillic # L& CYRILLIC LETTER SMALL CAPITAL EL 1D2B ; Cyrillic # L& CYRILLIC LETTER SMALL CAPITAL EL
1D78 ; Cyrillic # Lm MODIFIER LETTER CYRILLIC EN 1D78 ; Cyrillic # Lm MODIFIER LETTER CYRILLIC EN
2DE0..2DFF ; Cyrillic # Mn [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS 2DE0..2DFF ; Cyrillic # Mn [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS
@ -740,7 +746,7 @@ A69C..A69D ; Cyrillic # Lm [2] MODIFIER LETTER CYRILLIC HARD SIGN..MODIFIER
A69E..A69F ; Cyrillic # Mn [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E A69E..A69F ; Cyrillic # Mn [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E
FE2E..FE2F ; Cyrillic # Mn [2] COMBINING CYRILLIC TITLO LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF FE2E..FE2F ; Cyrillic # Mn [2] COMBINING CYRILLIC TITLO LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF
# Total code points: 434 # Total code points: 443
# ================================================ # ================================================
@ -791,6 +797,7 @@ FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
060D ; Arabic # Po ARABIC DATE SEPARATOR 060D ; Arabic # Po ARABIC DATE SEPARATOR
060E..060F ; Arabic # So [2] ARABIC POETIC VERSE SIGN..ARABIC SIGN MISRA 060E..060F ; Arabic # So [2] ARABIC POETIC VERSE SIGN..ARABIC SIGN MISRA
0610..061A ; Arabic # Mn [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA 0610..061A ; Arabic # Mn [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA
061C ; Arabic # Cf ARABIC LETTER MARK
061E ; Arabic # Po ARABIC TRIPLE DOT PUNCTUATION MARK 061E ; Arabic # Po ARABIC TRIPLE DOT PUNCTUATION MARK
0620..063F ; Arabic # Lo [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE 0620..063F ; Arabic # Lo [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE
0641..064A ; Arabic # Lo [10] ARABIC LETTER FEH..ARABIC LETTER YEH 0641..064A ; Arabic # Lo [10] ARABIC LETTER FEH..ARABIC LETTER YEH
@ -815,6 +822,8 @@ FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
06FF ; Arabic # Lo ARABIC LETTER HEH WITH INVERTED V 06FF ; Arabic # Lo ARABIC LETTER HEH WITH INVERTED V
0750..077F ; Arabic # Lo [48] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS ABOVE 0750..077F ; Arabic # Lo [48] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS ABOVE
08A0..08B4 ; Arabic # Lo [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW 08A0..08B4 ; Arabic # Lo [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
08B6..08BD ; Arabic # Lo [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
08D4..08E1 ; Arabic # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08E3..08FF ; Arabic # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA 08E3..08FF ; Arabic # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
FB50..FBB1 ; Arabic # Lo [98] ARABIC LETTER ALEF WASLA ISOLATED FORM..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM FB50..FBB1 ; Arabic # Lo [98] ARABIC LETTER ALEF WASLA ISOLATED FORM..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
FBB2..FBC1 ; Arabic # Sk [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW FBB2..FBC1 ; Arabic # Sk [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW
@ -862,7 +871,7 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
1EEAB..1EEBB ; Arabic # Lo [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN 1EEAB..1EEBB ; Arabic # Lo [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN
1EEF0..1EEF1 ; Arabic # Sm [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL 1EEF0..1EEF1 ; Arabic # Sm [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL
# Total code points: 1257 # Total code points: 1280
# ================================================ # ================================================
@ -873,8 +882,9 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
0712..072F ; Syriac # Lo [30] SYRIAC LETTER BETH..SYRIAC LETTER PERSIAN DHALATH 0712..072F ; Syriac # Lo [30] SYRIAC LETTER BETH..SYRIAC LETTER PERSIAN DHALATH
0730..074A ; Syriac # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH 0730..074A ; Syriac # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
074D..074F ; Syriac # Lo [3] SYRIAC LETTER SOGDIAN ZHAIN..SYRIAC LETTER SOGDIAN FE 074D..074F ; Syriac # Lo [3] SYRIAC LETTER SOGDIAN ZHAIN..SYRIAC LETTER SOGDIAN FE
0860..086A ; Syriac # Lo [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
# Total code points: 77 # Total code points: 88
# ================================================ # ================================================
@ -944,8 +954,10 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
09F4..09F9 ; Bengali # No [6] BENGALI CURRENCY NUMERATOR ONE..BENGALI CURRENCY DENOMINATOR SIXTEEN 09F4..09F9 ; Bengali # No [6] BENGALI CURRENCY NUMERATOR ONE..BENGALI CURRENCY DENOMINATOR SIXTEEN
09FA ; Bengali # So BENGALI ISSHAR 09FA ; Bengali # So BENGALI ISSHAR
09FB ; Bengali # Sc BENGALI GANDA MARK 09FB ; Bengali # Sc BENGALI GANDA MARK
09FC ; Bengali # Lo BENGALI LETTER VEDIC ANUSVARA
09FD ; Bengali # Po BENGALI ABBREVIATION SIGN
# Total code points: 93 # Total code points: 95
# ================================================ # ================================================
@ -998,8 +1010,9 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0AF0 ; Gujarati # Po GUJARATI ABBREVIATION SIGN 0AF0 ; Gujarati # Po GUJARATI ABBREVIATION SIGN
0AF1 ; Gujarati # Sc GUJARATI RUPEE SIGN 0AF1 ; Gujarati # Sc GUJARATI RUPEE SIGN
0AF9 ; Gujarati # Lo GUJARATI LETTER ZHA 0AF9 ; Gujarati # Lo GUJARATI LETTER ZHA
0AFA..0AFF ; Gujarati # Mn [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
# Total code points: 85 # Total code points: 91
# ================================================ # ================================================
@ -1086,6 +1099,7 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
# ================================================ # ================================================
0C80 ; Kannada # Lo KANNADA SIGN SPACING CANDRABINDU
0C81 ; Kannada # Mn KANNADA SIGN CANDRABINDU 0C81 ; Kannada # Mn KANNADA SIGN CANDRABINDU
0C82..0C83 ; Kannada # Mc [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA 0C82..0C83 ; Kannada # Mc [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
0C85..0C8C ; Kannada # Lo [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L 0C85..0C8C ; Kannada # Lo [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
@ -1109,15 +1123,16 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0CE6..0CEF ; Kannada # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE 0CE6..0CEF ; Kannada # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0CF1..0CF2 ; Kannada # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA 0CF1..0CF2 ; Kannada # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
# Total code points: 87 # Total code points: 88
# ================================================ # ================================================
0D01 ; Malayalam # Mn MALAYALAM SIGN CANDRABINDU 0D00..0D01 ; Malayalam # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
0D02..0D03 ; Malayalam # Mc [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA 0D02..0D03 ; Malayalam # Mc [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA
0D05..0D0C ; Malayalam # Lo [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L 0D05..0D0C ; Malayalam # Lo [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L
0D0E..0D10 ; Malayalam # Lo [3] MALAYALAM LETTER E..MALAYALAM LETTER AI 0D0E..0D10 ; Malayalam # Lo [3] MALAYALAM LETTER E..MALAYALAM LETTER AI
0D12..0D3A ; Malayalam # Lo [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA 0D12..0D3A ; Malayalam # Lo [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
0D3B..0D3C ; Malayalam # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
0D3D ; Malayalam # Lo MALAYALAM SIGN AVAGRAHA 0D3D ; Malayalam # Lo MALAYALAM SIGN AVAGRAHA
0D3E..0D40 ; Malayalam # Mc [3] MALAYALAM VOWEL SIGN AA..MALAYALAM VOWEL SIGN II 0D3E..0D40 ; Malayalam # Mc [3] MALAYALAM VOWEL SIGN AA..MALAYALAM VOWEL SIGN II
0D41..0D44 ; Malayalam # Mn [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR 0D41..0D44 ; Malayalam # Mn [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
@ -1125,15 +1140,18 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0D4A..0D4C ; Malayalam # Mc [3] MALAYALAM VOWEL SIGN O..MALAYALAM VOWEL SIGN AU 0D4A..0D4C ; Malayalam # Mc [3] MALAYALAM VOWEL SIGN O..MALAYALAM VOWEL SIGN AU
0D4D ; Malayalam # Mn MALAYALAM SIGN VIRAMA 0D4D ; Malayalam # Mn MALAYALAM SIGN VIRAMA
0D4E ; Malayalam # Lo MALAYALAM LETTER DOT REPH 0D4E ; Malayalam # Lo MALAYALAM LETTER DOT REPH
0D4F ; Malayalam # So MALAYALAM SIGN PARA
0D54..0D56 ; Malayalam # Lo [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL
0D57 ; Malayalam # Mc MALAYALAM AU LENGTH MARK 0D57 ; Malayalam # Mc MALAYALAM AU LENGTH MARK
0D58..0D5E ; Malayalam # No [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH
0D5F..0D61 ; Malayalam # Lo [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL 0D5F..0D61 ; Malayalam # Lo [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
0D62..0D63 ; Malayalam # Mn [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL 0D62..0D63 ; Malayalam # Mn [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
0D66..0D6F ; Malayalam # Nd [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE 0D66..0D6F ; Malayalam # Nd [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
0D70..0D75 ; Malayalam # No [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS 0D70..0D78 ; Malayalam # No [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
0D79 ; Malayalam # So MALAYALAM DATE MARK 0D79 ; Malayalam # So MALAYALAM DATE MARK
0D7A..0D7F ; Malayalam # Lo [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K 0D7A..0D7F ; Malayalam # Lo [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
# Total code points: 100 # Total code points: 117
# ================================================ # ================================================
@ -1436,21 +1454,24 @@ AB70..ABBF ; Cherokee # L& [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
1820..1842 ; Mongolian # Lo [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI 1820..1842 ; Mongolian # Lo [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
1843 ; Mongolian # Lm MONGOLIAN LETTER TODO LONG VOWEL SIGN 1843 ; Mongolian # Lm MONGOLIAN LETTER TODO LONG VOWEL SIGN
1844..1877 ; Mongolian # Lo [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA 1844..1877 ; Mongolian # Lo [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
1880..18A8 ; Mongolian # Lo [41] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER MANCHU ALI GALI BHA 1880..1884 ; Mongolian # Lo [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
1885..1886 ; Mongolian # Mn [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
1887..18A8 ; Mongolian # Lo [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
18A9 ; Mongolian # Mn MONGOLIAN LETTER ALI GALI DAGALGA 18A9 ; Mongolian # Mn MONGOLIAN LETTER ALI GALI DAGALGA
18AA ; Mongolian # Lo MONGOLIAN LETTER MANCHU ALI GALI LHA 18AA ; Mongolian # Lo MONGOLIAN LETTER MANCHU ALI GALI LHA
11660..1166C ; Mongolian # Po [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
# Total code points: 153 # Total code points: 166
# ================================================ # ================================================
3041..3096 ; Hiragana # Lo [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE 3041..3096 ; Hiragana # Lo [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE
309D..309E ; Hiragana # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK 309D..309E ; Hiragana # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
309F ; Hiragana # Lo HIRAGANA DIGRAPH YORI 309F ; Hiragana # Lo HIRAGANA DIGRAPH YORI
1B001 ; Hiragana # Lo HIRAGANA LETTER ARCHAIC YE 1B001..1B11E ; Hiragana # Lo [286] HIRAGANA LETTER ARCHAIC YE..HENTAIGANA LETTER N-MU-MO-2
1F200 ; Hiragana # So SQUARE HIRAGANA HOKA 1F200 ; Hiragana # So SQUARE HIRAGANA HOKA
# Total code points: 91 # Total code points: 376
# ================================================ # ================================================
@ -1469,10 +1490,10 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
# ================================================ # ================================================
02EA..02EB ; Bopomofo # Sk [2] MODIFIER LETTER YIN DEPARTING TONE MARK..MODIFIER LETTER YANG DEPARTING TONE MARK 02EA..02EB ; Bopomofo # Sk [2] MODIFIER LETTER YIN DEPARTING TONE MARK..MODIFIER LETTER YANG DEPARTING TONE MARK
3105..312D ; Bopomofo # Lo [41] BOPOMOFO LETTER B..BOPOMOFO LETTER IH 3105..312E ; Bopomofo # Lo [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
31A0..31BA ; Bopomofo # Lo [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY 31A0..31BA ; Bopomofo # Lo [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
# Total code points: 70 # Total code points: 71
# ================================================ # ================================================
@ -1485,16 +1506,17 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
3038..303A ; Han # Nl [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY 3038..303A ; Han # Nl [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY
303B ; Han # Lm VERTICAL IDEOGRAPHIC ITERATION MARK 303B ; Han # Lm VERTICAL IDEOGRAPHIC ITERATION MARK
3400..4DB5 ; Han # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5 3400..4DB5 ; Han # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4E00..9FD5 ; Han # Lo [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5 4E00..9FEA ; Han # Lo [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
F900..FA6D ; Han # Lo [366] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D F900..FA6D ; Han # Lo [366] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D
FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9 FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
20000..2A6D6 ; Han # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6 20000..2A6D6 ; Han # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
2A700..2B734 ; Han # Lo [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734 2A700..2B734 ; Han # Lo [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
2B740..2B81D ; Han # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D 2B740..2B81D ; Han # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; Han # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 2B820..2CEA1 ; Han # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; Han # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
2F800..2FA1D ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D 2F800..2FA1D ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
# Total code points: 81734 # Total code points: 89228
# ================================================ # ================================================
@ -1509,8 +1531,9 @@ A490..A4C6 ; Yi # So [55] YI RADICAL QOT..YI RADICAL KE
10300..1031F ; Old_Italic # Lo [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS 10300..1031F ; Old_Italic # Lo [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS
10320..10323 ; Old_Italic # No [4] OLD ITALIC NUMERAL ONE..OLD ITALIC NUMERAL FIFTY 10320..10323 ; Old_Italic # No [4] OLD ITALIC NUMERAL ONE..OLD ITALIC NUMERAL FIFTY
1032D..1032F ; Old_Italic # Lo [3] OLD ITALIC LETTER YE..OLD ITALIC LETTER SOUTHERN TSE
# Total code points: 36 # Total code points: 39
# ================================================ # ================================================
@ -1542,8 +1565,8 @@ A490..A4C6 ; Yi # So [55] YI RADICAL QOT..YI RADICAL KE
1CED ; Inherited # Mn VEDIC SIGN TIRYAK 1CED ; Inherited # Mn VEDIC SIGN TIRYAK
1CF4 ; Inherited # Mn VEDIC TONE CANDRA ABOVE 1CF4 ; Inherited # Mn VEDIC TONE CANDRA ABOVE
1CF8..1CF9 ; Inherited # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE 1CF8..1CF9 ; Inherited # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
1DC0..1DF5 ; Inherited # Mn [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE 1DC0..1DF9 ; Inherited # Mn [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
1DFC..1DFF ; Inherited # Mn [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW 1DFB..1DFF ; Inherited # Mn [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
200C..200D ; Inherited # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER 200C..200D ; Inherited # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
20D0..20DC ; Inherited # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE 20D0..20DC ; Inherited # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20DD..20E0 ; Inherited # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH 20DD..20E0 ; Inherited # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
@ -1562,7 +1585,7 @@ FE20..FE2D ; Inherited # Mn [14] COMBINING LIGATURE LEFT HALF..COMBINING CON
1D1AA..1D1AD ; Inherited # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO 1D1AA..1D1AD ; Inherited # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 563 # Total code points: 568
# ================================================ # ================================================
@ -1705,8 +1728,13 @@ E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-2
2C00..2C2E ; Glagolitic # L& [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE 2C00..2C2E ; Glagolitic # L& [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
2C30..2C5E ; Glagolitic # L& [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE 2C30..2C5E ; Glagolitic # L& [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE
1E000..1E006 ; Glagolitic # Mn [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
1E008..1E018 ; Glagolitic # Mn [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
1E01B..1E021 ; Glagolitic # Mn [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
1E023..1E024 ; Glagolitic # Mn [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
1E026..1E02A ; Glagolitic # Mn [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
# Total code points: 94 # Total code points: 132
# ================================================ # ================================================
@ -1872,11 +1900,11 @@ A62A..A62B ; Vai # Lo [2] VAI SYLLABLE NDOLE MA..VAI SYLLABLE NDOLE DO
A880..A881 ; Saurashtra # Mc [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA A880..A881 ; Saurashtra # Mc [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA
A882..A8B3 ; Saurashtra # Lo [50] SAURASHTRA LETTER A..SAURASHTRA LETTER LLA A882..A8B3 ; Saurashtra # Lo [50] SAURASHTRA LETTER A..SAURASHTRA LETTER LLA
A8B4..A8C3 ; Saurashtra # Mc [16] SAURASHTRA CONSONANT SIGN HAARU..SAURASHTRA VOWEL SIGN AU A8B4..A8C3 ; Saurashtra # Mc [16] SAURASHTRA CONSONANT SIGN HAARU..SAURASHTRA VOWEL SIGN AU
A8C4 ; Saurashtra # Mn SAURASHTRA SIGN VIRAMA A8C4..A8C5 ; Saurashtra # Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
A8CE..A8CF ; Saurashtra # Po [2] SAURASHTRA DANDA..SAURASHTRA DOUBLE DANDA A8CE..A8CF ; Saurashtra # Po [2] SAURASHTRA DANDA..SAURASHTRA DOUBLE DANDA
A8D0..A8D9 ; Saurashtra # Nd [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE A8D0..A8D9 ; Saurashtra # Nd [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE
# Total code points: 81 # Total code points: 82
# ================================================ # ================================================
@ -2314,8 +2342,9 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
11235 ; Khojki # Mc KHOJKI SIGN VIRAMA 11235 ; Khojki # Mc KHOJKI SIGN VIRAMA
11236..11237 ; Khojki # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA 11236..11237 ; Khojki # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
11238..1123D ; Khojki # Po [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN 11238..1123D ; Khojki # Po [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
1123E ; Khojki # Mn KHOJKI SIGN SUKUN
# Total code points: 61 # Total code points: 62
# ================================================ # ================================================
@ -2536,4 +2565,129 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
# Total code points: 672 # Total code points: 672
# ================================================
1E900..1E943 ; Adlam # L& [68] ADLAM CAPITAL LETTER ALIF..ADLAM SMALL LETTER SHA
1E944..1E94A ; Adlam # Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
1E950..1E959 ; Adlam # Nd [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE
1E95E..1E95F ; Adlam # Po [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK
# Total code points: 87
# ================================================
11C00..11C08 ; Bhaiksuki # Lo [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
11C0A..11C2E ; Bhaiksuki # Lo [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
11C2F ; Bhaiksuki # Mc BHAIKSUKI VOWEL SIGN AA
11C30..11C36 ; Bhaiksuki # Mn [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
11C38..11C3D ; Bhaiksuki # Mn [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
11C3E ; Bhaiksuki # Mc BHAIKSUKI SIGN VISARGA
11C3F ; Bhaiksuki # Mn BHAIKSUKI SIGN VIRAMA
11C40 ; Bhaiksuki # Lo BHAIKSUKI SIGN AVAGRAHA
11C41..11C45 ; Bhaiksuki # Po [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
11C50..11C59 ; Bhaiksuki # Nd [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
11C5A..11C6C ; Bhaiksuki # No [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
# Total code points: 97
# ================================================
11C70..11C71 ; Marchen # Po [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
11C72..11C8F ; Marchen # Lo [30] MARCHEN LETTER KA..MARCHEN LETTER A
11C92..11CA7 ; Marchen # Mn [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
11CA9 ; Marchen # Mc MARCHEN SUBJOINED LETTER YA
11CAA..11CB0 ; Marchen # Mn [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
11CB1 ; Marchen # Mc MARCHEN VOWEL SIGN I
11CB2..11CB3 ; Marchen # Mn [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
11CB4 ; Marchen # Mc MARCHEN VOWEL SIGN O
11CB5..11CB6 ; Marchen # Mn [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
# Total code points: 68
# ================================================
11400..11434 ; Newa # Lo [53] NEWA LETTER A..NEWA LETTER HA
11435..11437 ; Newa # Mc [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
11438..1143F ; Newa # Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
11440..11441 ; Newa # Mc [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
11442..11444 ; Newa # Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
11445 ; Newa # Mc NEWA SIGN VISARGA
11446 ; Newa # Mn NEWA SIGN NUKTA
11447..1144A ; Newa # Lo [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
1144B..1144F ; Newa # Po [5] NEWA DANDA..NEWA ABBREVIATION SIGN
11450..11459 ; Newa # Nd [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
1145B ; Newa # Po NEWA PLACEHOLDER MARK
1145D ; Newa # Po NEWA INSERTION SIGN
# Total code points: 92
# ================================================
104B0..104D3 ; Osage # L& [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
104D8..104FB ; Osage # L& [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
# Total code points: 72
# ================================================
16FE0 ; Tangut # Lm TANGUT ITERATION MARK
17000..187EC ; Tangut # Lo [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
18800..18AF2 ; Tangut # Lo [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
# Total code points: 6881
# ================================================
11D00..11D06 ; Masaram_Gondi # Lo [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E
11D08..11D09 ; Masaram_Gondi # Lo [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
11D0B..11D30 ; Masaram_Gondi # Lo [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
11D31..11D36 ; Masaram_Gondi # Mn [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
11D3A ; Masaram_Gondi # Mn MASARAM GONDI VOWEL SIGN E
11D3C..11D3D ; Masaram_Gondi # Mn [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
11D3F..11D45 ; Masaram_Gondi # Mn [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
11D46 ; Masaram_Gondi # Lo MASARAM GONDI REPHA
11D47 ; Masaram_Gondi # Mn MASARAM GONDI RA-KARA
11D50..11D59 ; Masaram_Gondi # Nd [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
# Total code points: 75
# ================================================
16FE1 ; Nushu # Lm NUSHU ITERATION MARK
1B170..1B2FB ; Nushu # Lo [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
# Total code points: 397
# ================================================
11A50 ; Soyombo # Lo SOYOMBO LETTER A
11A51..11A56 ; Soyombo # Mn [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
11A57..11A58 ; Soyombo # Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
11A59..11A5B ; Soyombo # Mn [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
11A5C..11A83 ; Soyombo # Lo [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
11A86..11A89 ; Soyombo # Lo [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
11A8A..11A96 ; Soyombo # Mn [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
11A97 ; Soyombo # Mc SOYOMBO SIGN VISARGA
11A98..11A99 ; Soyombo # Mn [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
11A9A..11A9C ; Soyombo # Po [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
11A9E..11AA2 ; Soyombo # Po [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
# Total code points: 80
# ================================================
11A00 ; Zanabazar_Square # Lo ZANABAZAR SQUARE LETTER A
11A01..11A06 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A07..11A08 ; Zanabazar_Square # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
11A09..11A0A ; Zanabazar_Square # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A0B..11A32 ; Zanabazar_Square # Lo [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
11A33..11A38 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A39 ; Zanabazar_Square # Mc ZANABAZAR SQUARE SIGN VISARGA
11A3A ; Zanabazar_Square # Lo ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
11A3B..11A3E ; Zanabazar_Square # Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
11A3F..11A46 ; Zanabazar_Square # Po [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
11A47 ; Zanabazar_Square # Mn ZANABAZAR SQUARE SUBJOINER
# Total code points: 72
# EOF # EOF
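
Note on the data format above: each data line gives a single code point or an inclusive `first..last` range, then the script name, and each script block ends with a "Total code points" comment. A minimal sketch of checking such a total follows. The function name `demo_range_count` and its interface are hypothetical, chosen for illustration; this is not the script that PCRE2 uses to build its tables.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line of the form
     "XXXX..YYYY ; Script # ..."   or   "XXXX ; Script # ..."
   Copies the script name into 'script' and returns the number of code
   points covered. Returns 0 for comment lines and anything else that does
   not parse as a data line. */
unsigned long demo_range_count(const char *line, char *script,
                               size_t script_size)
{
  unsigned long first, last;
  char name[64];

  assert(line != NULL && script != NULL && script_size > 0);

  if (sscanf(line, "%lx..%lx ; %63[^ \t#]", &first, &last, name) == 3) {
    /* Range form: both ends were read. */
  } else if (sscanf(line, "%lx ; %63[^ \t#]", &first, name) == 2) {
    last = first;  /* Single code point. */
  } else {
    return 0;      /* Comment, separator, or blank line. */
  }

  if (last < first) return 0;
  snprintf(script, script_size, "%s", name);
  return last - first + 1;  /* Ranges are inclusive at both ends. */
}
```

Summing the return values over the four Adlam lines reproduces the 87 stated in that block's total comment.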

File diff suppressed because it is too large

@ -192,7 +192,12 @@ const uint32_t PRIV(ucp_gbtable)[] = {
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbT), /* 10 LVT */ (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbT), /* 10 LVT */
(1<<ucp_gbRegionalIndicator), /* 11 RegionalIndicator */ (1<<ucp_gbRegionalIndicator), /* 11 RegionalIndicator */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark) /* 12 Other */ (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 12 Other */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 13 E_Base */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 14 E_Modifier */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 15 E_Base_GAZ */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 16 ZWJ */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark) /* 17 Glue_After_Zwj */
}; };
#ifdef SUPPORT_JIT #ifdef SUPPORT_JIT
@ -227,6 +232,7 @@ version. Like all other character and string literals that are compared against
the regular expression pattern, we must use STR_ macros instead of literal the regular expression pattern, we must use STR_ macros instead of literal
strings to make sure that UTF-8 support works on EBCDIC platforms. */ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Adlam0 STR_A STR_d STR_l STR_a STR_m "\0"
#define STRING_Ahom0 STR_A STR_h STR_o STR_m "\0" #define STRING_Ahom0 STR_A STR_h STR_o STR_m "\0"
#define STRING_Anatolian_Hieroglyphs0 STR_A STR_n STR_a STR_t STR_o STR_l STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0" #define STRING_Anatolian_Hieroglyphs0 STR_A STR_n STR_a STR_t STR_o STR_l STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Any0 STR_A STR_n STR_y "\0" #define STRING_Any0 STR_A STR_n STR_y "\0"
@ -238,6 +244,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Bassa_Vah0 STR_B STR_a STR_s STR_s STR_a STR_UNDERSCORE STR_V STR_a STR_h "\0" #define STRING_Bassa_Vah0 STR_B STR_a STR_s STR_s STR_a STR_UNDERSCORE STR_V STR_a STR_h "\0"
#define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0" #define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0"
#define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0" #define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0"
#define STRING_Bhaiksuki0 STR_B STR_h STR_a STR_i STR_k STR_s STR_u STR_k STR_i "\0"
#define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0" #define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0"
#define STRING_Brahmi0 STR_B STR_r STR_a STR_h STR_m STR_i "\0" #define STRING_Brahmi0 STR_B STR_r STR_a STR_h STR_m STR_i "\0"
#define STRING_Braille0 STR_B STR_r STR_a STR_i STR_l STR_l STR_e "\0" #define STRING_Braille0 STR_B STR_r STR_a STR_i STR_l STR_l STR_e "\0"
@ -313,6 +320,8 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0" #define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0"
#define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0" #define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0"
#define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0" #define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0"
#define STRING_Marchen0 STR_M STR_a STR_r STR_c STR_h STR_e STR_n "\0"
#define STRING_Masaram_Gondi0 STR_M STR_a STR_s STR_a STR_r STR_a STR_m STR_UNDERSCORE STR_G STR_o STR_n STR_d STR_i "\0"
#define STRING_Mc0 STR_M STR_c "\0" #define STRING_Mc0 STR_M STR_c "\0"
#define STRING_Me0 STR_M STR_e "\0" #define STRING_Me0 STR_M STR_e "\0"
#define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0" #define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0"
@ -330,9 +339,11 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Nabataean0 STR_N STR_a STR_b STR_a STR_t STR_a STR_e STR_a STR_n "\0" #define STRING_Nabataean0 STR_N STR_a STR_b STR_a STR_t STR_a STR_e STR_a STR_n "\0"
#define STRING_Nd0 STR_N STR_d "\0" #define STRING_Nd0 STR_N STR_d "\0"
#define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0" #define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0"
#define STRING_Newa0 STR_N STR_e STR_w STR_a "\0"
#define STRING_Nko0 STR_N STR_k STR_o "\0" #define STRING_Nko0 STR_N STR_k STR_o "\0"
#define STRING_Nl0 STR_N STR_l "\0" #define STRING_Nl0 STR_N STR_l "\0"
#define STRING_No0 STR_N STR_o "\0" #define STRING_No0 STR_N STR_o "\0"
#define STRING_Nushu0 STR_N STR_u STR_s STR_h STR_u "\0"
#define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0" #define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0"
#define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0" #define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0"
#define STRING_Old_Hungarian0 STR_O STR_l STR_d STR_UNDERSCORE STR_H STR_u STR_n STR_g STR_a STR_r STR_i STR_a STR_n "\0" #define STRING_Old_Hungarian0 STR_O STR_l STR_d STR_UNDERSCORE STR_H STR_u STR_n STR_g STR_a STR_r STR_i STR_a STR_n "\0"
@ -343,6 +354,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0" #define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0" #define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0"
#define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0" #define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0"
#define STRING_Osage0 STR_O STR_s STR_a STR_g STR_e "\0"
#define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0" #define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0"
#define STRING_P0 STR_P "\0" #define STRING_P0 STR_P "\0"
#define STRING_Pahawh_Hmong0 STR_P STR_a STR_h STR_a STR_w STR_h STR_UNDERSCORE STR_H STR_m STR_o STR_n STR_g "\0" #define STRING_Pahawh_Hmong0 STR_P STR_a STR_h STR_a STR_w STR_h STR_UNDERSCORE STR_H STR_m STR_o STR_n STR_g "\0"
@ -373,6 +385,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Sm0 STR_S STR_m "\0" #define STRING_Sm0 STR_S STR_m "\0"
#define STRING_So0 STR_S STR_o "\0" #define STRING_So0 STR_S STR_o "\0"
#define STRING_Sora_Sompeng0 STR_S STR_o STR_r STR_a STR_UNDERSCORE STR_S STR_o STR_m STR_p STR_e STR_n STR_g "\0" #define STRING_Sora_Sompeng0 STR_S STR_o STR_r STR_a STR_UNDERSCORE STR_S STR_o STR_m STR_p STR_e STR_n STR_g "\0"
#define STRING_Soyombo0 STR_S STR_o STR_y STR_o STR_m STR_b STR_o "\0"
#define STRING_Sundanese0 STR_S STR_u STR_n STR_d STR_a STR_n STR_e STR_s STR_e "\0" #define STRING_Sundanese0 STR_S STR_u STR_n STR_d STR_a STR_n STR_e STR_s STR_e "\0"
#define STRING_Syloti_Nagri0 STR_S STR_y STR_l STR_o STR_t STR_i STR_UNDERSCORE STR_N STR_a STR_g STR_r STR_i "\0" #define STRING_Syloti_Nagri0 STR_S STR_y STR_l STR_o STR_t STR_i STR_UNDERSCORE STR_N STR_a STR_g STR_r STR_i "\0"
#define STRING_Syriac0 STR_S STR_y STR_r STR_i STR_a STR_c "\0" #define STRING_Syriac0 STR_S STR_y STR_r STR_i STR_a STR_c "\0"
@ -383,6 +396,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Tai_Viet0 STR_T STR_a STR_i STR_UNDERSCORE STR_V STR_i STR_e STR_t "\0" #define STRING_Tai_Viet0 STR_T STR_a STR_i STR_UNDERSCORE STR_V STR_i STR_e STR_t "\0"
#define STRING_Takri0 STR_T STR_a STR_k STR_r STR_i "\0" #define STRING_Takri0 STR_T STR_a STR_k STR_r STR_i "\0"
#define STRING_Tamil0 STR_T STR_a STR_m STR_i STR_l "\0" #define STRING_Tamil0 STR_T STR_a STR_m STR_i STR_l "\0"
#define STRING_Tangut0 STR_T STR_a STR_n STR_g STR_u STR_t "\0"
#define STRING_Telugu0 STR_T STR_e STR_l STR_u STR_g STR_u "\0" #define STRING_Telugu0 STR_T STR_e STR_l STR_u STR_g STR_u "\0"
#define STRING_Thaana0 STR_T STR_h STR_a STR_a STR_n STR_a "\0" #define STRING_Thaana0 STR_T STR_h STR_a STR_a STR_n STR_a "\0"
#define STRING_Thai0 STR_T STR_h STR_a STR_i "\0" #define STRING_Thai0 STR_T STR_h STR_a STR_i "\0"
@ -399,11 +413,13 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Xwd0 STR_X STR_w STR_d "\0" #define STRING_Xwd0 STR_X STR_w STR_d "\0"
#define STRING_Yi0 STR_Y STR_i "\0" #define STRING_Yi0 STR_Y STR_i "\0"
#define STRING_Z0 STR_Z "\0" #define STRING_Z0 STR_Z "\0"
#define STRING_Zanabazar_Square0 STR_Z STR_a STR_n STR_a STR_b STR_a STR_z STR_a STR_r STR_UNDERSCORE STR_S STR_q STR_u STR_a STR_r STR_e "\0"
#define STRING_Zl0 STR_Z STR_l "\0" #define STRING_Zl0 STR_Z STR_l "\0"
#define STRING_Zp0 STR_Z STR_p "\0" #define STRING_Zp0 STR_Z STR_p "\0"
#define STRING_Zs0 STR_Z STR_s "\0" #define STRING_Zs0 STR_Z STR_s "\0"
const char PRIV(utt_names)[] = const char PRIV(utt_names)[] =
STRING_Adlam0
STRING_Ahom0 STRING_Ahom0
STRING_Anatolian_Hieroglyphs0 STRING_Anatolian_Hieroglyphs0
STRING_Any0 STRING_Any0
@ -415,6 +431,7 @@ const char PRIV(utt_names)[] =
STRING_Bassa_Vah0 STRING_Bassa_Vah0
STRING_Batak0 STRING_Batak0
STRING_Bengali0 STRING_Bengali0
STRING_Bhaiksuki0
STRING_Bopomofo0 STRING_Bopomofo0
STRING_Brahmi0 STRING_Brahmi0
STRING_Braille0 STRING_Braille0
@ -490,6 +507,8 @@ const char PRIV(utt_names)[] =
STRING_Malayalam0 STRING_Malayalam0
STRING_Mandaic0 STRING_Mandaic0
STRING_Manichaean0 STRING_Manichaean0
STRING_Marchen0
STRING_Masaram_Gondi0
STRING_Mc0 STRING_Mc0
STRING_Me0 STRING_Me0
STRING_Meetei_Mayek0 STRING_Meetei_Mayek0
@ -507,9 +526,11 @@ const char PRIV(utt_names)[] =
STRING_Nabataean0 STRING_Nabataean0
STRING_Nd0 STRING_Nd0
STRING_New_Tai_Lue0 STRING_New_Tai_Lue0
STRING_Newa0
STRING_Nko0 STRING_Nko0
STRING_Nl0 STRING_Nl0
STRING_No0 STRING_No0
STRING_Nushu0
STRING_Ogham0 STRING_Ogham0
STRING_Ol_Chiki0 STRING_Ol_Chiki0
STRING_Old_Hungarian0 STRING_Old_Hungarian0
@ -520,6 +541,7 @@ const char PRIV(utt_names)[] =
STRING_Old_South_Arabian0 STRING_Old_South_Arabian0
STRING_Old_Turkic0 STRING_Old_Turkic0
STRING_Oriya0 STRING_Oriya0
STRING_Osage0
STRING_Osmanya0 STRING_Osmanya0
STRING_P0 STRING_P0
STRING_Pahawh_Hmong0 STRING_Pahawh_Hmong0
@ -550,6 +572,7 @@ const char PRIV(utt_names)[] =
STRING_Sm0 STRING_Sm0
STRING_So0 STRING_So0
STRING_Sora_Sompeng0 STRING_Sora_Sompeng0
STRING_Soyombo0
STRING_Sundanese0 STRING_Sundanese0
STRING_Syloti_Nagri0 STRING_Syloti_Nagri0
STRING_Syriac0 STRING_Syriac0
@ -560,6 +583,7 @@ const char PRIV(utt_names)[] =
STRING_Tai_Viet0 STRING_Tai_Viet0
STRING_Takri0 STRING_Takri0
STRING_Tamil0 STRING_Tamil0
STRING_Tangut0
STRING_Telugu0 STRING_Telugu0
STRING_Thaana0 STRING_Thaana0
STRING_Thai0 STRING_Thai0
@ -576,186 +600,197 @@ const char PRIV(utt_names)[] =
STRING_Xwd0 STRING_Xwd0
STRING_Yi0 STRING_Yi0
STRING_Z0 STRING_Z0
STRING_Zanabazar_Square0
STRING_Zl0 STRING_Zl0
STRING_Zp0 STRING_Zp0
STRING_Zs0; STRING_Zs0;
const ucp_type_table PRIV(utt)[] = { const ucp_type_table PRIV(utt)[] = {
{ 0, PT_SC, ucp_Ahom }, { 0, PT_SC, ucp_Adlam },
{ 5, PT_SC, ucp_Anatolian_Hieroglyphs }, { 6, PT_SC, ucp_Ahom },
{ 27, PT_ANY, 0 }, { 11, PT_SC, ucp_Anatolian_Hieroglyphs },
{ 31, PT_SC, ucp_Arabic }, { 33, PT_ANY, 0 },
{ 38, PT_SC, ucp_Armenian }, { 37, PT_SC, ucp_Arabic },
{ 47, PT_SC, ucp_Avestan }, { 44, PT_SC, ucp_Armenian },
{ 55, PT_SC, ucp_Balinese }, { 53, PT_SC, ucp_Avestan },
{ 64, PT_SC, ucp_Bamum }, { 61, PT_SC, ucp_Balinese },
{ 70, PT_SC, ucp_Bassa_Vah }, { 70, PT_SC, ucp_Bamum },
{ 80, PT_SC, ucp_Batak }, { 76, PT_SC, ucp_Bassa_Vah },
{ 86, PT_SC, ucp_Bengali }, { 86, PT_SC, ucp_Batak },
{ 94, PT_SC, ucp_Bopomofo }, { 92, PT_SC, ucp_Bengali },
{ 103, PT_SC, ucp_Brahmi }, { 100, PT_SC, ucp_Bhaiksuki },
{ 110, PT_SC, ucp_Braille }, { 110, PT_SC, ucp_Bopomofo },
{ 118, PT_SC, ucp_Buginese }, { 119, PT_SC, ucp_Brahmi },
{ 127, PT_SC, ucp_Buhid }, { 126, PT_SC, ucp_Braille },
{ 133, PT_GC, ucp_C }, { 134, PT_SC, ucp_Buginese },
{ 135, PT_SC, ucp_Canadian_Aboriginal }, { 143, PT_SC, ucp_Buhid },
{ 155, PT_SC, ucp_Carian }, { 149, PT_GC, ucp_C },
{ 162, PT_SC, ucp_Caucasian_Albanian }, { 151, PT_SC, ucp_Canadian_Aboriginal },
{ 181, PT_PC, ucp_Cc }, { 171, PT_SC, ucp_Carian },
{ 184, PT_PC, ucp_Cf }, { 178, PT_SC, ucp_Caucasian_Albanian },
{ 187, PT_SC, ucp_Chakma }, { 197, PT_PC, ucp_Cc },
{ 194, PT_SC, ucp_Cham }, { 200, PT_PC, ucp_Cf },
{ 199, PT_SC, ucp_Cherokee }, { 203, PT_SC, ucp_Chakma },
{ 208, PT_PC, ucp_Cn }, { 210, PT_SC, ucp_Cham },
{ 211, PT_PC, ucp_Co }, { 215, PT_SC, ucp_Cherokee },
{ 214, PT_SC, ucp_Common }, { 224, PT_PC, ucp_Cn },
{ 221, PT_SC, ucp_Coptic }, { 227, PT_PC, ucp_Co },
{ 228, PT_PC, ucp_Cs }, { 230, PT_SC, ucp_Common },
{ 231, PT_SC, ucp_Cuneiform }, { 237, PT_SC, ucp_Coptic },
{ 241, PT_SC, ucp_Cypriot }, { 244, PT_PC, ucp_Cs },
{ 249, PT_SC, ucp_Cyrillic }, { 247, PT_SC, ucp_Cuneiform },
{ 258, PT_SC, ucp_Deseret }, { 257, PT_SC, ucp_Cypriot },
{ 266, PT_SC, ucp_Devanagari }, { 265, PT_SC, ucp_Cyrillic },
{ 277, PT_SC, ucp_Duployan }, { 274, PT_SC, ucp_Deseret },
{ 286, PT_SC, ucp_Egyptian_Hieroglyphs }, { 282, PT_SC, ucp_Devanagari },
{ 307, PT_SC, ucp_Elbasan }, { 293, PT_SC, ucp_Duployan },
{ 315, PT_SC, ucp_Ethiopic }, { 302, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 324, PT_SC, ucp_Georgian }, { 323, PT_SC, ucp_Elbasan },
{ 333, PT_SC, ucp_Glagolitic }, { 331, PT_SC, ucp_Ethiopic },
{ 344, PT_SC, ucp_Gothic }, { 340, PT_SC, ucp_Georgian },
{ 351, PT_SC, ucp_Grantha }, { 349, PT_SC, ucp_Glagolitic },
{ 359, PT_SC, ucp_Greek }, { 360, PT_SC, ucp_Gothic },
{ 365, PT_SC, ucp_Gujarati }, { 367, PT_SC, ucp_Grantha },
{ 374, PT_SC, ucp_Gurmukhi }, { 375, PT_SC, ucp_Greek },
{ 383, PT_SC, ucp_Han }, { 381, PT_SC, ucp_Gujarati },
{ 387, PT_SC, ucp_Hangul }, { 390, PT_SC, ucp_Gurmukhi },
{ 394, PT_SC, ucp_Hanunoo }, { 399, PT_SC, ucp_Han },
{ 402, PT_SC, ucp_Hatran }, { 403, PT_SC, ucp_Hangul },
{ 409, PT_SC, ucp_Hebrew }, { 410, PT_SC, ucp_Hanunoo },
{ 416, PT_SC, ucp_Hiragana }, { 418, PT_SC, ucp_Hatran },
{ 425, PT_SC, ucp_Imperial_Aramaic }, { 425, PT_SC, ucp_Hebrew },
{ 442, PT_SC, ucp_Inherited }, { 432, PT_SC, ucp_Hiragana },
{ 452, PT_SC, ucp_Inscriptional_Pahlavi }, { 441, PT_SC, ucp_Imperial_Aramaic },
{ 474, PT_SC, ucp_Inscriptional_Parthian }, { 458, PT_SC, ucp_Inherited },
{ 497, PT_SC, ucp_Javanese }, { 468, PT_SC, ucp_Inscriptional_Pahlavi },
{ 506, PT_SC, ucp_Kaithi }, { 490, PT_SC, ucp_Inscriptional_Parthian },
{ 513, PT_SC, ucp_Kannada }, { 513, PT_SC, ucp_Javanese },
{ 521, PT_SC, ucp_Katakana }, { 522, PT_SC, ucp_Kaithi },
{ 530, PT_SC, ucp_Kayah_Li }, { 529, PT_SC, ucp_Kannada },
{ 539, PT_SC, ucp_Kharoshthi }, { 537, PT_SC, ucp_Katakana },
{ 550, PT_SC, ucp_Khmer }, { 546, PT_SC, ucp_Kayah_Li },
{ 556, PT_SC, ucp_Khojki }, { 555, PT_SC, ucp_Kharoshthi },
{ 563, PT_SC, ucp_Khudawadi }, { 566, PT_SC, ucp_Khmer },
{ 573, PT_GC, ucp_L }, { 572, PT_SC, ucp_Khojki },
{ 575, PT_LAMP, 0 }, { 579, PT_SC, ucp_Khudawadi },
{ 578, PT_SC, ucp_Lao }, { 589, PT_GC, ucp_L },
{ 582, PT_SC, ucp_Latin }, { 591, PT_LAMP, 0 },
{ 588, PT_SC, ucp_Lepcha }, { 594, PT_SC, ucp_Lao },
{ 595, PT_SC, ucp_Limbu }, { 598, PT_SC, ucp_Latin },
{ 601, PT_SC, ucp_Linear_A }, { 604, PT_SC, ucp_Lepcha },
{ 610, PT_SC, ucp_Linear_B }, { 611, PT_SC, ucp_Limbu },
{ 619, PT_SC, ucp_Lisu }, { 617, PT_SC, ucp_Linear_A },
{ 624, PT_PC, ucp_Ll }, { 626, PT_SC, ucp_Linear_B },
{ 627, PT_PC, ucp_Lm }, { 635, PT_SC, ucp_Lisu },
{ 630, PT_PC, ucp_Lo }, { 640, PT_PC, ucp_Ll },
{ 633, PT_PC, ucp_Lt }, { 643, PT_PC, ucp_Lm },
{ 636, PT_PC, ucp_Lu }, { 646, PT_PC, ucp_Lo },
{ 639, PT_SC, ucp_Lycian }, { 649, PT_PC, ucp_Lt },
{ 646, PT_SC, ucp_Lydian }, { 652, PT_PC, ucp_Lu },
{ 653, PT_GC, ucp_M }, { 655, PT_SC, ucp_Lycian },
{ 655, PT_SC, ucp_Mahajani }, { 662, PT_SC, ucp_Lydian },
{ 664, PT_SC, ucp_Malayalam }, { 669, PT_GC, ucp_M },
{ 674, PT_SC, ucp_Mandaic }, { 671, PT_SC, ucp_Mahajani },
{ 682, PT_SC, ucp_Manichaean }, { 680, PT_SC, ucp_Malayalam },
{ 693, PT_PC, ucp_Mc }, { 690, PT_SC, ucp_Mandaic },
{ 696, PT_PC, ucp_Me }, { 698, PT_SC, ucp_Manichaean },
{ 699, PT_SC, ucp_Meetei_Mayek }, { 709, PT_SC, ucp_Marchen },
{ 712, PT_SC, ucp_Mende_Kikakui }, { 717, PT_SC, ucp_Masaram_Gondi },
{ 726, PT_SC, ucp_Meroitic_Cursive }, { 731, PT_PC, ucp_Mc },
{ 743, PT_SC, ucp_Meroitic_Hieroglyphs }, { 734, PT_PC, ucp_Me },
{ 764, PT_SC, ucp_Miao }, { 737, PT_SC, ucp_Meetei_Mayek },
{ 769, PT_PC, ucp_Mn }, { 750, PT_SC, ucp_Mende_Kikakui },
{ 772, PT_SC, ucp_Modi }, { 764, PT_SC, ucp_Meroitic_Cursive },
{ 777, PT_SC, ucp_Mongolian }, { 781, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 787, PT_SC, ucp_Mro }, { 802, PT_SC, ucp_Miao },
{ 791, PT_SC, ucp_Multani }, { 807, PT_PC, ucp_Mn },
{ 799, PT_SC, ucp_Myanmar }, { 810, PT_SC, ucp_Modi },
{ 807, PT_GC, ucp_N }, { 815, PT_SC, ucp_Mongolian },
{ 809, PT_SC, ucp_Nabataean }, { 825, PT_SC, ucp_Mro },
{ 819, PT_PC, ucp_Nd }, { 829, PT_SC, ucp_Multani },
{ 822, PT_SC, ucp_New_Tai_Lue }, { 837, PT_SC, ucp_Myanmar },
{ 834, PT_SC, ucp_Nko }, { 845, PT_GC, ucp_N },
{ 838, PT_PC, ucp_Nl }, { 847, PT_SC, ucp_Nabataean },
{ 841, PT_PC, ucp_No }, { 857, PT_PC, ucp_Nd },
{ 844, PT_SC, ucp_Ogham }, { 860, PT_SC, ucp_New_Tai_Lue },
{ 850, PT_SC, ucp_Ol_Chiki }, { 872, PT_SC, ucp_Newa },
{ 859, PT_SC, ucp_Old_Hungarian }, { 877, PT_SC, ucp_Nko },
{ 873, PT_SC, ucp_Old_Italic }, { 881, PT_PC, ucp_Nl },
{ 884, PT_SC, ucp_Old_North_Arabian }, { 884, PT_PC, ucp_No },
{ 902, PT_SC, ucp_Old_Permic }, { 887, PT_SC, ucp_Nushu },
{ 913, PT_SC, ucp_Old_Persian }, { 893, PT_SC, ucp_Ogham },
{ 925, PT_SC, ucp_Old_South_Arabian }, { 899, PT_SC, ucp_Ol_Chiki },
{ 943, PT_SC, ucp_Old_Turkic }, { 908, PT_SC, ucp_Old_Hungarian },
{ 954, PT_SC, ucp_Oriya }, { 922, PT_SC, ucp_Old_Italic },
{ 960, PT_SC, ucp_Osmanya }, { 933, PT_SC, ucp_Old_North_Arabian },
{ 968, PT_GC, ucp_P }, { 951, PT_SC, ucp_Old_Permic },
{ 970, PT_SC, ucp_Pahawh_Hmong }, { 962, PT_SC, ucp_Old_Persian },
{ 983, PT_SC, ucp_Palmyrene }, { 974, PT_SC, ucp_Old_South_Arabian },
{ 993, PT_SC, ucp_Pau_Cin_Hau }, { 992, PT_SC, ucp_Old_Turkic },
{ 1005, PT_PC, ucp_Pc }, { 1003, PT_SC, ucp_Oriya },
{ 1008, PT_PC, ucp_Pd }, { 1009, PT_SC, ucp_Osage },
{ 1011, PT_PC, ucp_Pe }, { 1015, PT_SC, ucp_Osmanya },
{ 1014, PT_PC, ucp_Pf }, { 1023, PT_GC, ucp_P },
{ 1017, PT_SC, ucp_Phags_Pa }, { 1025, PT_SC, ucp_Pahawh_Hmong },
{ 1026, PT_SC, ucp_Phoenician }, { 1038, PT_SC, ucp_Palmyrene },
{ 1037, PT_PC, ucp_Pi }, { 1048, PT_SC, ucp_Pau_Cin_Hau },
{ 1040, PT_PC, ucp_Po }, { 1060, PT_PC, ucp_Pc },
{ 1043, PT_PC, ucp_Ps }, { 1063, PT_PC, ucp_Pd },
{ 1046, PT_SC, ucp_Psalter_Pahlavi }, { 1066, PT_PC, ucp_Pe },
{ 1062, PT_SC, ucp_Rejang }, { 1069, PT_PC, ucp_Pf },
{ 1069, PT_SC, ucp_Runic }, { 1072, PT_SC, ucp_Phags_Pa },
{ 1075, PT_GC, ucp_S }, { 1081, PT_SC, ucp_Phoenician },
{ 1077, PT_SC, ucp_Samaritan }, { 1092, PT_PC, ucp_Pi },
{ 1087, PT_SC, ucp_Saurashtra }, { 1095, PT_PC, ucp_Po },
{ 1098, PT_PC, ucp_Sc }, { 1098, PT_PC, ucp_Ps },
{ 1101, PT_SC, ucp_Sharada }, { 1101, PT_SC, ucp_Psalter_Pahlavi },
{ 1109, PT_SC, ucp_Shavian }, { 1117, PT_SC, ucp_Rejang },
{ 1117, PT_SC, ucp_Siddham }, { 1124, PT_SC, ucp_Runic },
{ 1125, PT_SC, ucp_SignWriting }, { 1130, PT_GC, ucp_S },
{ 1137, PT_SC, ucp_Sinhala }, { 1132, PT_SC, ucp_Samaritan },
{ 1145, PT_PC, ucp_Sk }, { 1142, PT_SC, ucp_Saurashtra },
{ 1148, PT_PC, ucp_Sm }, { 1153, PT_PC, ucp_Sc },
{ 1151, PT_PC, ucp_So }, { 1156, PT_SC, ucp_Sharada },
{ 1154, PT_SC, ucp_Sora_Sompeng }, { 1164, PT_SC, ucp_Shavian },
{ 1167, PT_SC, ucp_Sundanese }, { 1172, PT_SC, ucp_Siddham },
{ 1177, PT_SC, ucp_Syloti_Nagri }, { 1180, PT_SC, ucp_SignWriting },
{ 1190, PT_SC, ucp_Syriac }, { 1192, PT_SC, ucp_Sinhala },
{ 1197, PT_SC, ucp_Tagalog }, { 1200, PT_PC, ucp_Sk },
{ 1205, PT_SC, ucp_Tagbanwa }, { 1203, PT_PC, ucp_Sm },
{ 1214, PT_SC, ucp_Tai_Le }, { 1206, PT_PC, ucp_So },
{ 1221, PT_SC, ucp_Tai_Tham }, { 1209, PT_SC, ucp_Sora_Sompeng },
{ 1230, PT_SC, ucp_Tai_Viet }, { 1222, PT_SC, ucp_Soyombo },
{ 1239, PT_SC, ucp_Takri }, { 1230, PT_SC, ucp_Sundanese },
{ 1245, PT_SC, ucp_Tamil }, { 1240, PT_SC, ucp_Syloti_Nagri },
{ 1251, PT_SC, ucp_Telugu }, { 1253, PT_SC, ucp_Syriac },
{ 1258, PT_SC, ucp_Thaana }, { 1260, PT_SC, ucp_Tagalog },
{ 1265, PT_SC, ucp_Thai }, { 1268, PT_SC, ucp_Tagbanwa },
{ 1270, PT_SC, ucp_Tibetan }, { 1277, PT_SC, ucp_Tai_Le },
{ 1278, PT_SC, ucp_Tifinagh }, { 1284, PT_SC, ucp_Tai_Tham },
{ 1287, PT_SC, ucp_Tirhuta }, { 1293, PT_SC, ucp_Tai_Viet },
{ 1295, PT_SC, ucp_Ugaritic }, { 1302, PT_SC, ucp_Takri },
{ 1304, PT_SC, ucp_Vai }, { 1308, PT_SC, ucp_Tamil },
{ 1308, PT_SC, ucp_Warang_Citi }, { 1314, PT_SC, ucp_Tangut },
{ 1320, PT_ALNUM, 0 }, { 1321, PT_SC, ucp_Telugu },
{ 1324, PT_PXSPACE, 0 }, { 1328, PT_SC, ucp_Thaana },
{ 1328, PT_SPACE, 0 }, { 1335, PT_SC, ucp_Thai },
{ 1332, PT_UCNC, 0 }, { 1340, PT_SC, ucp_Tibetan },
{ 1336, PT_WORD, 0 }, { 1348, PT_SC, ucp_Tifinagh },
{ 1340, PT_SC, ucp_Yi }, { 1357, PT_SC, ucp_Tirhuta },
{ 1343, PT_GC, ucp_Z }, { 1365, PT_SC, ucp_Ugaritic },
{ 1345, PT_PC, ucp_Zl }, { 1374, PT_SC, ucp_Vai },
{ 1348, PT_PC, ucp_Zp }, { 1378, PT_SC, ucp_Warang_Citi },
{ 1351, PT_PC, ucp_Zs } { 1390, PT_ALNUM, 0 },
{ 1394, PT_PXSPACE, 0 },
{ 1398, PT_SPACE, 0 },
{ 1402, PT_UCNC, 0 },
{ 1406, PT_WORD, 0 },
{ 1410, PT_SC, ucp_Yi },
{ 1413, PT_GC, ucp_Z },
{ 1415, PT_SC, ucp_Zanabazar_Square },
{ 1432, PT_PC, ucp_Zl },
{ 1435, PT_PC, ucp_Zp },
{ 1438, PT_PC, ucp_Zs }
}; };
const size_t PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table); const size_t PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
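
Note on the offsets above: the first field of each `utt` entry is the byte offset of that entry's name within the concatenated, NUL-separated `utt_names` string. Inserting "Adlam" (five characters plus the terminating NUL) at the front therefore moves every later offset up by 6, and each further new name moves the ones after it again. The sketch below shows the general technique on a three-name miniature. All `demo_` names are hypothetical, the `value` field is a stand-in for the real type and value fields, and the search uses plain `strcmp` rather than whatever comparison PCRE2 itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a names/offsets pair: names are stored back to
   back, each terminated by NUL, in sorted order. */
static const char demo_names[] =
  "Adlam\0"
  "Ahom\0"
  "Any\0";

typedef struct {
  unsigned short name_offset;  /* Offset of the name within demo_names */
  int value;                   /* Stand-in for the real payload fields */
} demo_entry;

static const demo_entry demo_table[] = {
  {  0, 100 },  /* Adlam: starts at 0 */
  {  6, 101 },  /* Ahom:  0 + strlen("Adlam") + 1 */
  { 11, 102 }   /* Any:   6 + strlen("Ahom") + 1 */
};

static const size_t demo_table_size =
  sizeof(demo_table) / sizeof(demo_table[0]);

/* Binary search over the sorted table. Returns the index of the matching
   entry, or -1 if the name is not present. */
int demo_lookup(const char *name)
{
  size_t bot = 0, top = demo_table_size;

  assert(name != NULL);
  while (bot < top) {
    size_t mid = bot + (top - bot) / 2;
    int c = strcmp(name, demo_names + demo_table[mid].name_offset);
    if (c == 0) return (int)mid;
    if (c > 0) bot = mid + 1; else top = mid;
  }
  return -1;
}
```

Because the search relies on sorted order, a new name has to be added at its sorted position in both the string and the table, which is why the added scripts appear mid-list rather than at the end.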

File diff suppressed because it is too large


@ -100,9 +100,7 @@ enum {
ucp_Zs /* Space separator */ ucp_Zs /* Space separator */
}; };
/* These are grapheme break properties. Note that the code for processing them /* These are grapheme break properties. */
assumes that the values are less than 16. If more values are added that take
the number to 16 or more, the code will have to be rewritten. */
enum { enum {
ucp_gbCR, /* 0 */ ucp_gbCR, /* 0 */
@ -117,7 +115,12 @@ enum {
ucp_gbLV, /* 9 Hangul syllable type LV */ ucp_gbLV, /* 9 Hangul syllable type LV */
ucp_gbLVT, /* 10 Hangul syllable type LVT */ ucp_gbLVT, /* 10 Hangul syllable type LVT */
ucp_gbRegionalIndicator, /* 11 */ ucp_gbRegionalIndicator, /* 11 */
ucp_gbOther /* 12 */ ucp_gbOther, /* 12 */
ucp_gbE_Base, /* 13 */
ucp_gbE_Modifier, /* 14 */
ucp_gbE_Base_GAZ, /* 15 */
ucp_gbZWJ, /* 16 */
ucp_gbGlue_After_Zwj /* 17 */
}; };
/* These are the script identifications. */ /* These are the script identifications. */
@ -184,13 +187,13 @@ enum {
ucp_Tifinagh, ucp_Tifinagh,
ucp_Ugaritic, ucp_Ugaritic,
ucp_Yi, ucp_Yi,
/* New for Unicode 5.0: */ /* New for Unicode 5.0 */
ucp_Balinese, ucp_Balinese,
ucp_Cuneiform, ucp_Cuneiform,
ucp_Nko, ucp_Nko,
ucp_Phags_Pa, ucp_Phags_Pa,
ucp_Phoenician, ucp_Phoenician,
/* New for Unicode 5.1: */ /* New for Unicode 5.1 */
ucp_Carian, ucp_Carian,
ucp_Cham, ucp_Cham,
ucp_Kayah_Li, ucp_Kayah_Li,
@ -202,7 +205,7 @@ enum {
ucp_Saurashtra, ucp_Saurashtra,
ucp_Sundanese, ucp_Sundanese,
ucp_Vai, ucp_Vai,
/* New for Unicode 5.2: */ /* New for Unicode 5.2 */
ucp_Avestan, ucp_Avestan,
ucp_Bamum, ucp_Bamum,
ucp_Egyptian_Hieroglyphs, ucp_Egyptian_Hieroglyphs,
@ -218,11 +221,11 @@ enum {
ucp_Samaritan, ucp_Samaritan,
ucp_Tai_Tham, ucp_Tai_Tham,
ucp_Tai_Viet, ucp_Tai_Viet,
/* New for Unicode 6.0.0: */ /* New for Unicode 6.0.0 */
ucp_Batak, ucp_Batak,
ucp_Brahmi, ucp_Brahmi,
ucp_Mandaic, ucp_Mandaic,
/* New for Unicode 6.1.0: */ /* New for Unicode 6.1.0 */
ucp_Chakma, ucp_Chakma,
ucp_Meroitic_Cursive, ucp_Meroitic_Cursive,
ucp_Meroitic_Hieroglyphs, ucp_Meroitic_Hieroglyphs,
@ -230,7 +233,7 @@ enum {
ucp_Sharada, ucp_Sharada,
ucp_Sora_Sompeng, ucp_Sora_Sompeng,
ucp_Takri, ucp_Takri,
/* New for Unicode 7.0.0: */ /* New for Unicode 7.0.0 */
ucp_Bassa_Vah, ucp_Bassa_Vah,
ucp_Caucasian_Albanian, ucp_Caucasian_Albanian,
ucp_Duployan, ucp_Duployan,
@ -254,13 +257,24 @@ enum {
ucp_Siddham, ucp_Siddham,
ucp_Tirhuta, ucp_Tirhuta,
ucp_Warang_Citi, ucp_Warang_Citi,
/* New for Unicode 8.0.0: */ /* New for Unicode 8.0.0 */
ucp_Ahom, ucp_Ahom,
ucp_Anatolian_Hieroglyphs, ucp_Anatolian_Hieroglyphs,
ucp_Hatran, ucp_Hatran,
ucp_Multani, ucp_Multani,
ucp_Old_Hungarian, ucp_Old_Hungarian,
ucp_SignWriting ucp_SignWriting,
/* New for Unicode 10.0.0 (no update since 8.0.0) */
ucp_Adlam,
ucp_Bhaiksuki,
ucp_Marchen,
ucp_Newa,
ucp_Osage,
ucp_Tangut,
ucp_Masaram_Gondi,
ucp_Nushu,
ucp_Soyombo,
ucp_Zanabazar_Square
}; };
#endif /* PCRE2_UCP_H_IDEMPOTENT_GUARD */ #endif /* PCRE2_UCP_H_IDEMPOTENT_GUARD */
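
Note on the grapheme break enum above: the removed comment warned that the processing code assumed values below 16, and the new values run up to 17. The diff does not show that code, so the following is only a guess at the kind of constraint involved: if "no break" rules are stored as one bitmask per left-hand property, 18 values need at least 18 bits. All DEMO_ names and functions are hypothetical. The only rule encoded is the Unicode 10 rule that there is no break between ZWJ and a following Glue_After_Zwj or E_Base_GAZ character.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demo enum; only values visible in the diff are listed. */
enum {
  DEMO_GB_CR = 0,
  DEMO_GB_E_BASE_GAZ = 15,
  DEMO_GB_ZWJ = 16,
  DEMO_GB_GLUE_AFTER_ZWJ = 17,
  DEMO_GB_COUNT = 18
};

/* One mask per left-hand property: bit n set means "do not break before a
   character whose property value is n". */
uint32_t demo_mask_for(int left)
{
  switch (left)
    {
    case DEMO_GB_ZWJ:
    return (UINT32_C(1) << DEMO_GB_GLUE_AFTER_ZWJ) |
           (UINT32_C(1) << DEMO_GB_E_BASE_GAZ);

    default:
    return 0;
    }
}

/* Non-zero when no break is allowed between the two properties. */
int demo_no_break(int left, int right)
{
  assert(right >= 0 && right < DEMO_GB_COUNT);
  return (demo_mask_for(left) & (UINT32_C(1) << right)) != 0;
}

/* What would survive if the same mask were stored in only 16 bits. */
uint16_t demo_truncated_mask(int left)
{
  return (uint16_t)demo_mask_for(left);
}
```

In this sketch a 16-bit mask keeps bit 15 (E_Base_GAZ) but silently drops bit 17 (Glue_After_Zwj), which is the sort of failure a "values less than 16" assumption would guard against.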


@ -473,6 +473,12 @@ so many of them that they are split into two fields. */
#define CTL_UTF8_INPUT 0x40000000u #define CTL_UTF8_INPUT 0x40000000u
#define CTL_ZERO_TERMINATE 0x80000000u #define CTL_ZERO_TERMINATE 0x80000000u
/* Combinations */
#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */
#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
#define CTL_ANYGLOB (CTL_ALTGLOBAL|CTL_GLOBAL)
/* Second control word */ /* Second control word */
#define CTL2_SUBSTITUTE_EXTENDED 0x00000001u #define CTL2_SUBSTITUTE_EXTENDED 0x00000001u
@ -480,15 +486,10 @@ so many of them that they are split into two fields. */
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000004u #define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000004u
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u #define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
#define CTL2_SUBJECT_LITERAL 0x00000010u #define CTL2_SUBJECT_LITERAL 0x00000010u
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
#define CTL_NL_SET 0x40000000u /* Informational */ #define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL_BSR_SET 0x80000000u /* Informational */ #define CTL2_BSR_SET 0x80000000u /* Informational */
/* Combinations */
#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */
#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
#define CTL_ANYGLOB (CTL_ALTGLOBAL|CTL_GLOBAL)
/* These are the matching controls that may be set either on a pattern or on a /* These are the matching controls that may be set either on a pattern or on a
data line. They are copied from the pattern controls as initial settings for data line. They are copied from the pattern controls as initial settings for
@ -601,6 +602,7 @@ static modstruct modlist[] = {
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) }, { "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) }, { "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) }, { "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
{ "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) }, { "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
{ "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) }, { "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
{ "convert", MOD_PAT, MOD_CON, 0, PO(convert_type) }, { "convert", MOD_PAT, MOD_CON, 0, PO(convert_type) },
@ -723,7 +725,7 @@ static modstruct modlist[] = {
CTL_JITVERIFY|CTL_MEMORY|CTL_FRAMESIZE|CTL_PUSH|CTL_PUSHCOPY| \ CTL_JITVERIFY|CTL_MEMORY|CTL_FRAMESIZE|CTL_PUSH|CTL_PUSHCOPY| \
CTL_PUSHTABLESCOPY|CTL_USE_LENGTH) CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL_BSR_SET|CTL_NL_SET) #define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_NL_SET)
/* Controls that apply only at compile time with 'push'. */ /* Controls that apply only at compile time with 'push'. */
@ -3688,8 +3690,8 @@ for (;;)
#else #else
*((uint16_t *)field) = PCRE2_BSR_UNICODE; *((uint16_t *)field) = PCRE2_BSR_UNICODE;
#endif #endif
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_BSR_SET; if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_BSR_SET;
else dctl->control2 &= ~CTL_BSR_SET; else dctl->control2 &= ~CTL2_BSR_SET;
} }
else else
{ {
@ -3698,8 +3700,8 @@ for (;;)
else if (len == 7 && strncmpic(pp, (const uint8_t *)"unicode", 7) == 0) else if (len == 7 && strncmpic(pp, (const uint8_t *)"unicode", 7) == 0)
*((uint16_t *)field) = PCRE2_BSR_UNICODE; *((uint16_t *)field) = PCRE2_BSR_UNICODE;
else goto INVALID_VALUE; else goto INVALID_VALUE;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_BSR_SET; if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_BSR_SET;
else dctl->control2 |= CTL_BSR_SET; else dctl->control2 |= CTL2_BSR_SET;
} }
pp = ep; pp = ep;
break; break;
@ -3792,14 +3794,14 @@ for (;;)
if (i == 0) if (i == 0)
{ {
*((uint16_t *)field) = NEWLINE_DEFAULT; *((uint16_t *)field) = NEWLINE_DEFAULT;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_NL_SET; if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_NL_SET;
else dctl->control2 &= ~CTL_NL_SET; else dctl->control2 &= ~CTL2_NL_SET;
} }
else else
{ {
*((uint16_t *)field) = i; *((uint16_t *)field) = i;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_NL_SET; if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_NL_SET;
else dctl->control2 |= CTL_NL_SET; else dctl->control2 |= CTL2_NL_SET;
} }
pp = ep; pp = ep;
break; break;
@ -3971,7 +3973,7 @@ Returns: nothing
static void static void
show_controls(uint32_t controls, uint32_t controls2, const char *before) show_controls(uint32_t controls, uint32_t controls2, const char *before)
{ {
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s", fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before, before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "", ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "", ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -3979,10 +3981,11 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_ALLUSEDTEXT) != 0)? " allusedtext" : "", ((controls & CTL_ALLUSEDTEXT) != 0)? " allusedtext" : "",
((controls & CTL_ALTGLOBAL) != 0)? " altglobal" : "", ((controls & CTL_ALTGLOBAL) != 0)? " altglobal" : "",
((controls & CTL_BINCODE) != 0)? " bincode" : "", ((controls & CTL_BINCODE) != 0)? " bincode" : "",
((controls2 & CTL_BSR_SET) != 0)? " bsr" : "", ((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "", ((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "", ((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "", ((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",
((controls & CTL_DFA) != 0)? " dfa" : "", ((controls & CTL_DFA) != 0)? " dfa" : "",
((controls & CTL_EXPAND) != 0)? " expand" : "", ((controls & CTL_EXPAND) != 0)? " expand" : "",
((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "", ((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "",
@ -3996,7 +3999,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_JITVERIFY) != 0)? " jitverify" : "", ((controls & CTL_JITVERIFY) != 0)? " jitverify" : "",
((controls & CTL_MARK) != 0)? " mark" : "", ((controls & CTL_MARK) != 0)? " mark" : "",
((controls & CTL_MEMORY) != 0)? " memory" : "", ((controls & CTL_MEMORY) != 0)? " memory" : "",
((controls2 & CTL_NL_SET) != 0)? " newline" : "", ((controls2 & CTL2_NL_SET) != 0)? " newline" : "",
((controls & CTL_NULLCONTEXT) != 0)? " null_context" : "", ((controls & CTL_NULLCONTEXT) != 0)? " null_context" : "",
((controls & CTL_POSIX) != 0)? " posix" : "", ((controls & CTL_POSIX) != 0)? " posix" : "",
((controls & CTL_POSIX_NOSUB) != 0)? " posix_nosub" : "", ((controls & CTL_POSIX_NOSUB) != 0)? " posix_nosub" : "",
@ -4435,7 +4438,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
if (jchanged) fprintf(outfile, "Duplicate name status changes\n"); if (jchanged) fprintf(outfile, "Duplicate name status changes\n");
if ((pat_patctl.control2 & CTL_BSR_SET) != 0 || if ((pat_patctl.control2 & CTL2_BSR_SET) != 0 ||
(FLD(compiled_code, flags) & PCRE2_BSR_SET) != 0) (FLD(compiled_code, flags) & PCRE2_BSR_SET) != 0)
fprintf(outfile, "\\R matches %s\n", (bsr_convention == PCRE2_BSR_UNICODE)? fprintf(outfile, "\\R matches %s\n", (bsr_convention == PCRE2_BSR_UNICODE)?
"any Unicode newline" : "CR, LF, or CRLF"); "any Unicode newline" : "CR, LF, or CRLF");
@ -5268,7 +5271,7 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0) cflags |= REG_NOSUB; if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0) cflags |= REG_NOSUB;
if ((pat_patctl.options & PCRE2_UCP) != 0) cflags |= REG_UCP; if ((pat_patctl.options & PCRE2_UCP) != 0) cflags |= REG_UCP;
if ((pat_patctl.options & PCRE2_CASELESS) != 0) cflags |= REG_ICASE; if ((pat_patctl.options & PCRE2_CASELESS) != 0) cflags |= REG_ICASE;
if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC; if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC;
if ((pat_patctl.options & PCRE2_MULTILINE) != 0) cflags |= REG_NEWLINE; if ((pat_patctl.options & PCRE2_MULTILINE) != 0) cflags |= REG_NEWLINE;
if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL; if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY; if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;
@ -5276,8 +5279,8 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0) if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0)
{ {
preg.re_endp = (char *)pbuffer8 + patlen; preg.re_endp = (char *)pbuffer8 + patlen;
cflags |= REG_PEND; cflags |= REG_PEND;
} }
rc = regcomp(&preg, (char *)pbuffer8, cflags); rc = regcomp(&preg, (char *)pbuffer8, cflags);
@ -5530,7 +5533,7 @@ if (test_mode == PCRE32_MODE && pbuffer32 != NULL)
appropriate default newline setting, local_newline_default will be non-zero. We appropriate default newline setting, local_newline_default will be non-zero. We
use this if there is no explicit newline modifier. */ use this if there is no explicit newline modifier. */
if ((pat_patctl.control2 & CTL_NL_SET) == 0 && local_newline_default != 0) if ((pat_patctl.control2 & CTL2_NL_SET) == 0 && local_newline_default != 0)
{ {
SETFLD(pat_context, newline_convention, local_newline_default); SETFLD(pat_context, newline_convention, local_newline_default);
} }
@ -5540,11 +5543,11 @@ NULL context. */
use_pat_context = ((pat_patctl.control & CTL_NULLCONTEXT) != 0)? use_pat_context = ((pat_patctl.control & CTL_NULLCONTEXT) != 0)?
NULL : PTR(pat_context); NULL : PTR(pat_context);
/* If PCRE2_LITERAL is set, set use_forbid_utf zero because PCRE2_NEVER_UTF /* If PCRE2_LITERAL is set, set use_forbid_utf zero because PCRE2_NEVER_UTF
and PCRE2_NEVER_UCP are invalid with it. */ and PCRE2_NEVER_UCP are invalid with it. */
if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0; if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0;
/* Compile many times when timing. */ /* Compile many times when timing. */
@ -5556,7 +5559,7 @@ if (timeit > 0)
{ {
clock_t start_time = clock(); clock_t start_time = clock();
PCRE2_COMPILE(compiled_code, pbuffer, patlen, PCRE2_COMPILE(compiled_code, pbuffer, patlen,
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset, pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset,
use_pat_context); use_pat_context);
time_taken += clock() - start_time; time_taken += clock() - start_time;
if (TEST(compiled_code, !=, NULL)) if (TEST(compiled_code, !=, NULL))
@ -5665,7 +5668,7 @@ if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
/* If an explicit newline modifier was given, set the information flag in the /* If an explicit newline modifier was given, set the information flag in the
pattern so that it is preserved over push/pop. */ pattern so that it is preserved over push/pop. */
if ((pat_patctl.control2 & CTL_NL_SET) != 0) if ((pat_patctl.control2 & CTL2_NL_SET) != 0)
{ {
SETFLD(compiled_code, flags, FLD(compiled_code, flags) | PCRE2_NL_SET); SETFLD(compiled_code, flags, FLD(compiled_code, flags) | PCRE2_NL_SET);
} }
@ -5822,11 +5825,11 @@ return capcount;
*************************************************/ *************************************************/
/* Called from a PCRE2 library as a result of the (?C) item. We print out where /* Called from a PCRE2 library as a result of the (?C) item. We print out where
we are in the match. Yield zero unless more callouts than the fail count, or we are in the match (unless suppressed). Yield zero unless more callouts than
the callout data is not zero. The only differences in the callout block for the fail count, or the callout data is not zero. The only differences in the
different code unit widths are that the pointers to the subject, the most callout block for different code unit widths are that the pointers to the
recent MARK, and a callout argument string point to strings of the appropriate subject, the most recent MARK, and a callout argument string point to strings
width. Casts can be used to deal with this. of the appropriate width. Casts can be used to deal with this.
Argument: a pointer to a callout block Argument: a pointer to a callout block
Return: Return:
@ -5839,6 +5842,7 @@ uint32_t i, pre_start, post_start, subject_length;
PCRE2_SIZE current_position; PCRE2_SIZE current_position;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0; BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0; BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;
/* This FILE is used for echoing the subject. This is done only once in simple /* This FILE is used for echoing the subject. This is done only once in simple
cases. */ cases. */
@ -5887,75 +5891,82 @@ if (callout_capture)
} }
} }
/* Re-print the subject in canonical form (with escapes for non-printing /* Unless suppressed, re-print the subject in canonical form (with escapes for
characters), the first time, or if giving full details. On subsequent calls in non-printing characters), the first time, or if giving full details. On
the same match, we use PCHARS() just to find the printed lengths of the subsequent calls in the same match, we use PCHARS() just to find the printed
substrings. */ lengths of the substrings. */
if (f != NULL) fprintf(f, "--->"); if (callout_where)
/* The subject before the match start. */
PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f);
/* If a lookbehind is involved, the current position may be earlier than the
match start. If so, use the match start instead. */
current_position = (cb->current_position >= cb->start_match)?
cb->current_position : cb->start_match;
/* The subject between the match start and the current position. */
PCHARS(post_start, cb->subject, cb->start_match,
current_position - cb->start_match, utf, f);
/* Print from the current position to the end. */
PCHARSV(cb->subject, current_position, cb->subject_length - current_position,
utf, f);
/* Calculate the total subject printed length (no print). */
PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL);
if (f != NULL) fprintf(f, "\n");
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
callout whose number has not already been shown with captured strings, show the
number here. A callout with a string argument has been displayed above. */
if (cb->callout_number == 255)
{ {
fprintf(outfile, "%+3d ", (int)cb->pattern_position);
if (cb->pattern_position > 99) fprintf(outfile, "\n ");
}
else
{
if (callout_capture || cb->callout_string != NULL) fprintf(outfile, " ");
else fprintf(outfile, "%3d ", cb->callout_number);
}
/* Now show position indicators */ if (f != NULL) fprintf(f, "--->");
for (i = 0; i < pre_start; i++) fprintf(outfile, " "); /* The subject before the match start. */
fprintf(outfile, "^");
if (post_start > 0) PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f);
{
for (i = 0; i < post_start - 1; i++) fprintf(outfile, " "); /* If a lookbehind is involved, the current position may be earlier than the
match start. If so, use the match start instead. */
current_position = (cb->current_position >= cb->start_match)?
cb->current_position : cb->start_match;
/* The subject between the match start and the current position. */
PCHARS(post_start, cb->subject, cb->start_match,
current_position - cb->start_match, utf, f);
/* Print from the current position to the end. */
PCHARSV(cb->subject, current_position, cb->subject_length - current_position,
utf, f);
/* Calculate the total subject printed length (no print). */
PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL);
if (f != NULL) fprintf(f, "\n");
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
callout whose number has not already been shown with captured strings, show the
number here. A callout with a string argument has been displayed above. */
if (cb->callout_number == 255)
{
fprintf(outfile, "%+3d ", (int)cb->pattern_position);
if (cb->pattern_position > 99) fprintf(outfile, "\n ");
}
else
{
if (callout_capture || cb->callout_string != NULL) fprintf(outfile, " ");
else fprintf(outfile, "%3d ", cb->callout_number);
}
/* Now show position indicators */
for (i = 0; i < pre_start; i++) fprintf(outfile, " ");
fprintf(outfile, "^"); fprintf(outfile, "^");
if (post_start > 0)
{
for (i = 0; i < post_start - 1; i++) fprintf(outfile, " ");
fprintf(outfile, "^");
}
for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
fprintf(outfile, " ");
if (cb->next_item_length != 0)
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
pbuffer8 + cb->pattern_position);
fprintf(outfile, "\n");
} }
for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
fprintf(outfile, " ");
if (cb->next_item_length != 0)
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
pbuffer8 + cb->pattern_position);
fprintf(outfile, "\n");
first_callout = FALSE; first_callout = FALSE;
/* Show any mark info */
if (cb->mark != last_callout_mark) if (cb->mark != last_callout_mark)
{ {
if (cb->mark == NULL) if (cb->mark == NULL)
@ -5969,6 +5980,8 @@ if (cb->mark != last_callout_mark)
last_callout_mark = cb->mark; last_callout_mark = cb->mark;
} }
/* Show callout data */
if (callout_data_ptr != NULL) if (callout_data_ptr != NULL)
{ {
int callout_data = *((int32_t *)callout_data_ptr); int callout_data = *((int32_t *)callout_data_ptr);
@ -5979,6 +5992,8 @@ if (callout_data_ptr != NULL)
} }
} }
/* Keep count and give the appropriate return code */
callout_count++; callout_count++;
if (cb->callout_number == dat_datctl.cerror[0] && if (cb->callout_number == dat_datctl.cerror[0] &&
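
Note on the control-word changes in this file: the rename from CTL_NL_SET and CTL_BSR_SET to CTL2_NL_SET and CTL2_BSR_SET makes it visible which of the two 32-bit words a flag belongs to. That matters because bit values repeat across the words: CTL_UTF8_INPUT and CTL2_NL_SET are both 0x40000000u. The sketch below copies those three values from the diff. The demo_datctl structure and the demo_ functions are hypothetical stand-ins, not pcre2test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the diff. The first and third share a bit pattern but
   belong to different control words. */
#define CTL_UTF8_INPUT        0x40000000u
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
#define CTL2_NL_SET           0x40000000u

/* Hypothetical stand-in for a data control block with two control words. */
typedef struct {
  uint32_t control;
  uint32_t control2;
} demo_datctl;

/* Mirrors the new callout_where test: position output is shown unless the
   callout_no_where bit is set in the second word. */
int demo_callout_where(const demo_datctl *d)
{
  assert(d != NULL);
  return (d->control2 & CTL2_CALLOUT_NO_WHERE) == 0;
}

/* Correct test: a CTL2_ flag is looked up in control2. */
int demo_newline_was_set(const demo_datctl *d)
{
  assert(d != NULL);
  return (d->control2 & CTL2_NL_SET) != 0;
}

/* Deliberately wrong test: the same flag looked up in the first word, where
   the bit actually means CTL_UTF8_INPUT. */
int demo_newline_wrong_word(const demo_datctl *d)
{
  assert(d != NULL);
  return (d->control & CTL2_NL_SET) != 0;
}
```

With only CTL_UTF8_INPUT set in the first word, demo_newline_was_set() reports false while demo_newline_wrong_word() reports true, so the CTL2_ prefix is a cheap guard against testing a flag in the wrong word.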

30
testdata/testinput5 vendored

@ -6,14 +6,16 @@
#newline_default lf any anycrlf #newline_default lf any anycrlf
# PCRE2 and Perl disagree about the characteristics of certain Unicode # PCRE2 and Perl disagree about the characteristics of certain Unicode
# characters. For example, 061C is considered by Perl to be Arabic, though # characters. For example, 061C was considered by Perl to be Arabic, though
# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are # it was not listed as such in the Unicode Scripts.txt file for Unicode 8.
# graphic and printable according to Perl, though they are actually "isolate" # However, it *is* in that file for Unicode 10, but when I came to re-check,
# control characters. That is why the following tests are here rather than in # Perl had changed in the meantime, with 5.026 not recognizing it as Arabic.
# test 4.
# 2066-2069 are graphic and printable according to Perl, though they are
# actually "isolate" control characters. That is why the following tests are
# here rather than in test 4.
/^[\p{Arabic}]/utf /^[\p{Arabic}]/utf
\= Expect no match
\x{061c} \x{061c}
/^[[:graph:]]+$/utf,ucp /^[[:graph:]]+$/utf,ucp
@ -2022,5 +2024,21 @@
/Aሴ+B/literal,utf,no_utf_check /Aሴ+B/literal,utf,no_utf_check
Aሴ+B Aሴ+B
# These are here because I upgraded to Unicode 10.0.0 before Perl did, so it
# doesn't recognize all these scripts. In time these three tests can be moved
# to test 4.
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}
/^\x{1E900}\x{104B0}/i,utf
\x{1E900}\x{104B0}
\x{1E922}\x{104D8}
/^(?:(\X)(?C))+$/utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
# End of testinput5 # End of testinput5

95
testdata/testoutput5 vendored

@ -6,16 +6,18 @@
#newline_default lf any anycrlf #newline_default lf any anycrlf
# PCRE2 and Perl disagree about the characteristics of certain Unicode # PCRE2 and Perl disagree about the characteristics of certain Unicode
# characters. For example, 061C is considered by Perl to be Arabic, though # characters. For example, 061C was considered by Perl to be Arabic, though
# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are # it was not listed as such in the Unicode Scripts.txt file for Unicode 8.
# graphic and printable according to Perl, though they are actually "isolate" # However, it *is* in that file for Unicode 10, but when I came to re-check,
# control characters. That is why the following tests are here rather than in # Perl had changed in the meantime, with 5.026 not recognizing it as Arabic.
# test 4.
# 2066-2069 are graphic and printable according to Perl, though they are
# actually "isolate" control characters. That is why the following tests are
# here rather than in test 4.
/^[\p{Arabic}]/utf /^[\p{Arabic}]/utf
\= Expect no match
\x{061c} \x{061c}
No match 0: \x{61c}
/^[[:graph:]]+$/utf,ucp /^[[:graph:]]+$/utf,ucp
\= Expect no match \= Expect no match
@ -4585,5 +4587,84 @@ No match
/Aሴ+B/literal,utf,no_utf_check /Aሴ+B/literal,utf,no_utf_check
Aሴ+B Aሴ+B
0: A\x{1234}+B 0: A\x{1234}+B
# These are here because I upgraded to Unicode 10.0.0 before Perl did, so it
# doesn't recognize all these scripts. In time these three tests can be moved
# to test 4.
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}
0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
1: \x{1e900}\x{1e924}\x{1e953}
2: \x{11c00}\x{11c2d}\x{11c3e}
3: \x{11c70}\x{11c77}\x{11cab}
4: \x{11400}\x{1142f}\x{11455}
5: \x{104b0}\x{104d8}\x{104fb}
6: \x{16fe0}\x{18800}\x{18af2}
7: \x{11d00}\x{11d3a}\x{11d59}
8: \x{16fe1}\x{1b170}\x{1b2fb}
9: \x{11a50}\x{11a58}\x{11aa2}
10: \x{11a00}\x{11a07}\x{11a47}
/^\x{1E900}\x{104B0}/i,utf
\x{1E900}\x{104B0}
0: \x{1e900}\x{104b0}
\x{1E922}\x{104D8}
0: \x{1e922}\x{104d8}
/^(?:(\X)(?C))+$/utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
Callout 0: last capture = 1
1: \x{1e900}
Callout 0: last capture = 1
1: \x{1e924}
Callout 0: last capture = 1
1: \x{1e953}
Callout 0: last capture = 1
1: \x{11c00}
Callout 0: last capture = 1
1: \x{11c2d}\x{11c3e}
Callout 0: last capture = 1
1: \x{11c70}
Callout 0: last capture = 1
1: \x{11c77}\x{11cab}
Callout 0: last capture = 1
1: \x{11400}
Callout 0: last capture = 1
1: \x{1142f}
Callout 0: last capture = 1
1: \x{11455}
Callout 0: last capture = 1
1: \x{104b0}
Callout 0: last capture = 1
1: \x{104d8}
Callout 0: last capture = 1
1: \x{104fb}
Callout 0: last capture = 1
1: \x{16fe0}
Callout 0: last capture = 1
1: \x{18800}
Callout 0: last capture = 1
1: \x{18af2}
Callout 0: last capture = 1
1: \x{11d00}\x{11d3a}
Callout 0: last capture = 1
1: \x{11d59}
Callout 0: last capture = 1
1: \x{16fe1}
Callout 0: last capture = 1
1: \x{1b170}
Callout 0: last capture = 1
1: \x{1b2fb}
Callout 0: last capture = 1
1: \x{11a50}\x{11a58}
Callout 0: last capture = 1
1: \x{11aa2}
Callout 0: last capture = 1
1: \x{11a00}\x{11a07}\x{11a47}
0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
1: \x{11a00}\x{11a07}\x{11a47}
# End of testinput5 # End of testinput5