Update to Unicode 10.0.0 and add callout_no_where to pcre2test to aid testing.

Philip.Hazel 2017-07-02 16:32:01 +00:00
parent b7d5cee61f
commit 41bb787fb3
22 changed files with 6797 additions and 3360 deletions

@ -209,6 +209,9 @@ much faster.
because this can give a fast "no match" without searching for a "required code
unit". Previously only non-anchored patterns did this.
47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
48. Add the callout_no_where modifier to pcre2test.
Version 10.23 14-February-2017

@ -171,7 +171,10 @@ library. They are also documented in the pcre2build man page.
give large performance improvements on certain platforms, add --enable-jit to
the "configure" command. This support is available only for certain hardware
architectures. If you try to enable it on an unsupported architecture, there
will be a compile time error.
will be a compile time error. If you are running under SELinux you may also
want to add --enable-jit-sealloc, which enables the use of an execmem
allocator in JIT that is compatible with SELinux. This has no effect if JIT
is not enabled.
. If you do not want to make use of the default support for UTF-8 Unicode
character strings in the 8-bit library, UTF-16 Unicode character strings in
@ -874,4 +877,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 11 April 2017
Last updated: 17 June 2017

@ -170,8 +170,13 @@ Just-in-time (JIT) compiler support is included in the build by specifying
--enable-jit
</pre>
This support is available only for certain hardware architectures. If this
option is set for an unsupported architecture, a building error occurs.
See the
option is set for an unsupported architecture, a building error occurs. If you
are running under SELinux you may also want to add
<pre>
--enable-jit-sealloc
</pre>
which enables the use of an execmem allocator in JIT that is compatible with
SELinux. This has no effect if JIT is not enabled. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for a discussion of JIT usage. When JIT support is enabled,
pcre2grep automatically makes use of it, unless you add
@ -516,7 +521,7 @@ contains a single function called LLVMFuzzerTestOneInput() whose arguments are
a pointer to a string and the length of the string. When called, this function
tries to compile the string as a pattern, and if that succeeds, to match it.
This is done both with no options and with some random options bits that are
generated from the string.
generated from the string.
</P>
<P>
Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
@ -529,13 +534,13 @@ file are the test string.
</P>
<br><a name="SEC22" href="#TOC1">OBSOLETE OPTION</a><br>
<P>
In versions of PCRE2 prior to 10.30, there were two ways of handling
backtracking in the <b>pcre2_match()</b> function. The default was to use the
In versions of PCRE2 prior to 10.30, there were two ways of handling
backtracking in the <b>pcre2_match()</b> function. The default was to use the
system stack, but if
<pre>
--disable-stack-for-recursion
</pre>
was set, memory on the heap was used. From release 10.30 onwards this has
was set, memory on the heap was used. From release 10.30 onwards this has
changed (the stack is no longer used) and this option now does nothing except
give a warning.
</P>
@ -554,7 +559,7 @@ Cambridge, England.
</P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P>
Last updated: 30 May 2017
Last updated: 17 June 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -755,6 +755,7 @@ Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is:
</P>
<P>
Adlam,
Ahom,
Anatolian_Hieroglyphs,
Arabic,
@ -765,6 +766,7 @@ Bamum,
Bassa_Vah,
Batak,
Bengali,
Bhaiksuki,
Bopomofo,
Brahmi,
Braille,
@ -826,6 +828,8 @@ Mahajani,
Malayalam,
Mandaic,
Manichaean,
Marchen,
Masaram_Gondi,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
@ -838,7 +842,9 @@ Multani,
Myanmar,
Nabataean,
New_Tai_Lue,
Newa,
Nko,
Nushu,
Ogham,
Ol_Chiki,
Old_Hungarian,
@ -849,6 +855,7 @@ Old_Persian,
Old_South_Arabian,
Old_Turkic,
Oriya,
Osage,
Osmanya,
Pahawh_Hmong,
Palmyrene,
@ -866,6 +873,7 @@ Siddham,
SignWriting,
Sinhala,
Sora_Sompeng,
Soyombo,
Sundanese,
Syloti_Nagri,
Syriac,
@ -876,6 +884,7 @@ Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Tangut,
Telugu,
Thaana,
Thai,
@ -885,7 +894,8 @@ Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
Yi,
Zanabazar_Square.
</P>
<P>
Each character has exactly one Unicode general category property, specified by
@ -3445,7 +3455,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 30 May 2017
Last updated: 02 July 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -568,7 +568,7 @@ Setting compilation options
</b><br>
<P>
The following modifiers set options for <b>pcre2_compile()</b>. Most of them set
bits in the options argument of that function, but those whose names start with
bits in the options argument of that function, but those whose names start with
PCRE2_EXTRA are additional options that are set in the compile context. For the
main options, there are some single-letter abbreviations that are the same as
Perl options. There is special handling for /x: if a second x is present,
@ -579,25 +579,25 @@ way <b>pcre2_compile()</b> behaves. See
for a description of the effects of these options.
<pre>
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
alt_bsux set PCRE2_ALT_BSUX
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
alt_verbnames set PCRE2_ALT_VERBNAMES
anchored set PCRE2_ANCHORED
auto_callout set PCRE2_AUTO_CALLOUT
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
/i caseless set PCRE2_CASELESS
dollar_endonly set PCRE2_DOLLAR_ENDONLY
/s dotall set PCRE2_DOTALL
dupnames set PCRE2_DUPNAMES
endanchored set PCRE2_ENDANCHORED
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
/xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
match_word set PCRE2_EXTRA_MATCH_WORD
match_word set PCRE2_EXTRA_MATCH_WORD
/m multiline set PCRE2_MULTILINE
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
never_ucp set PCRE2_NEVER_UCP
@ -631,7 +631,7 @@ heavily used in the test files.
/B bincode show binary code without lengths
callout_info show callout information
debug same as info,fullbincode
framesize show matching frame size
framesize show matching frame size
fullbincode show binary code with lengths
/I info show info about compiled pattern
hex unquoted characters are hexadecimal
@ -649,7 +649,7 @@ heavily used in the test files.
push push compiled pattern onto the stack
pushcopy push a copy onto the stack
stackguard=&#60;number&#62; test the stackguard feature
subject_literal treat all subject lines as literal
subject_literal treat all subject lines as literal
tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
@ -720,7 +720,7 @@ not necessarily the last character. These lines are omitted if no starting or
ending code units are recorded.
</P>
<P>
The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
number of capturing parentheses in the pattern.
</P>
@ -972,8 +972,8 @@ below. All other modifiers are either ignored, with a warning message, or cause
an error.
</P>
<P>
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
REG_PEND extension is used to pass it by length.
</P>
<br><b>
@ -1013,7 +1013,7 @@ are mutually exclusive.
Setting certain match controls
</b><br>
<P>
The following modifiers are really subject modifiers, and are described under
The following modifiers are really subject modifiers, and are described under
"Subject Modifiers" below. However, they may be included in a pattern's
modifier list, in which case they are applied to every subject line that is
processed with that pattern. They may not appear in <b>#pattern</b> commands.
@ -1040,9 +1040,9 @@ defaults, set them in a <b>#subject</b> command.
Specifying literal subject lines
</b><br>
<P>
If the <b>subject_literal</b> modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any
If the <b>subject_literal</b> modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any
that are set as defaults by a <b>#subject</b> command are recognized.
</P>
<br><b>
@ -1054,7 +1054,8 @@ pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
line to contain a new pattern (or a command) instead of a subject line. This
facility is used when saving compiled patterns to a file, as described in the
section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below. If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled</a>
<a href="#saverestore">below.</a>
If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
pattern is stacked, leaving the original as current, ready to match the
following input lines. This provides a way of testing the
<b>pcre2_code_copy()</b> function.
@ -1103,18 +1104,18 @@ causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
<b>regexec()</b>. The other modifiers are ignored, with a warning message.
</P>
<P>
There is one additional modifier that can be used with the POSIX wrapper. It is
There is one additional modifier that can be used with the POSIX wrapper. It is
ignored (with a warning) if used for non-POSIX matching.
<pre>
posix_startend=&#60;n&#62;[:&#60;m&#62;]
posix_startend=&#60;n&#62;[:&#60;m&#62;]
</pre>
This causes the subject string to be passed to <b>regexec()</b> using the
REG_STARTEND option, which uses offsets to specify which part of the string is
searched. If only one number is given, the end offset is passed as the end of
the subject string. For more detail of REG_STARTEND, see the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation. If the subject string contains binary zeros (coded as escapes
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
documentation. If the subject string contains binary zeros (coded as escapes
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
its input), you must use <b>posix_startend</b> to specify its length.
</P>
<br><b>
@ -1135,6 +1136,7 @@ pattern.
callout_data=&#60;n&#62; set a value to pass via callouts
callout_error=&#60;n&#62;[:&#60;m&#62;] control callout error
callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
copy=&#60;number or name&#62; copy captured substring
depth_limit=&#60;n&#62; set a depth limit
@ -1230,29 +1232,10 @@ Testing callouts
</b><br>
<P>
A callout function is supplied when <b>pcre2test</b> calls the library matching
functions, unless <b>callout_none</b> is specified. If <b>callout_capture</b> is
set, the current captured groups are output when a callout occurs. The default
return from the callout function is zero, which allows matching to continue.
</P>
<P>
The <b>callout_fail</b> modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 (causing matching to backtrack)
when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;) are given, 1
is returned when callout &#60;n&#62; is reached and there have been at least &#60;m&#62;
callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence.
</P>
<P>
Note that callouts with string arguments are always given the number zero. See
"Callouts" below for a description of the output when a callout is taken.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
functions, unless <b>callout_none</b> is specified. Its behaviour can be
controlled by various modifiers listed above whose names begin with
<b>callout_</b>. Details are given in the section entitled "Callouts"
<a href="#callouts">below.</a>
</P>
<br><b>
Finding all matches in a string
@ -1384,7 +1367,7 @@ that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Setting
zero reverts to the default of 32K. Providing a stack that is larger than the
default is necessary only for very complicated patterns. If <b>jitstack</b> is
set non-zero on a subject line it overrides any value that was set on the
set non-zero on a subject line it overrides any value that was set on the
pattern.
</P>
<br><b>
@ -1414,7 +1397,7 @@ The <i>match_limit</i> number is a measure of the amount of backtracking
that takes place, and learning the minimum value can be instructive. For most
simple matches, the number is quite small, but for patterns with very large
numbers of matching possibilities, it can become large very quickly with
increasing length of subject string.
increasing length of subject string.
</P>
<P>
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
@ -1660,7 +1643,7 @@ restart the match with additional subject data by means of the
For further information about partial matching, see the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation.
</P>
<a name="callouts"></a></P>
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
@ -1669,8 +1652,33 @@ This works with both matching functions.
</P>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line (as
described above) to change this and other parameters of the callout.
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this and other parameters of the callout.
</P>
<P>
If <b>callout_capture</b> is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
<b>callout_no_where</b> modifier is set.
</P>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P>
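<P>
The return-value rules for <b>callout_fail</b> and <b>callout_error</b> can be
modelled in a few lines. The Python sketch below only illustrates the behaviour
as documented here; it is not taken from <b>pcre2test</b>. The names
parse_setting and callout_return are hypothetical, a string stands in for the
real PCRE2_ERROR_CALLOUT code, and "count" is assumed to be the number of
callouts so far as the caller counts them.
</P>

```python
from typing import Optional, Tuple

# Placeholder standing in for the library's PCRE2_ERROR_CALLOUT code; the
# real numeric value is defined in pcre2.h and is not reproduced here.
PCRE2_ERROR_CALLOUT = "PCRE2_ERROR_CALLOUT"

Setting = Optional[Tuple[int, int]]  # (callout number n, minimum count m)


def parse_setting(text: str) -> Tuple[int, int]:
    """Parse "<n>" or "<n>:<m>"; a missing <m> is treated as 0 (assumption)."""
    n, _, m = text.partition(":")
    return int(n), (int(m) if m else 0)


def callout_return(number: int, count: int,
                   fail: Setting = None, error: Setting = None):
    """Model the value returned for callout `number` after `count` callouts."""
    def triggered(setting: Setting) -> bool:
        return (setting is not None
                and number == setting[0]
                and count >= setting[1])

    if triggered(error):   # callout_error takes precedence over callout_fail
        return PCRE2_ERROR_CALLOUT
    if triggered(fail):    # 1 makes the matcher backtrack
        return 1
    return 0               # default: carry on matching
```

<P>
For example, with callout_fail=3:2 modelled as fail=(3, 2), callout 3 yields 0
on its first occurrence and 1 once at least two callouts have been counted.
</P>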
<P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check
@ -1858,7 +1866,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 16 June 2017
Last updated: 02 July 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -3543,9 +3543,14 @@ JUST-IN-TIME COMPILER SUPPORT
This support is available only for certain hardware architectures. If
this option is set for an unsupported architecture, a building error
occurs. See the pcre2jit documentation for a discussion of JIT usage.
When JIT support is enabled, pcre2grep automatically makes use of it,
unless you add
occurs. If you are running under SELinux you may also want to add
--enable-jit-sealloc
which enables the use of an execmem allocator in JIT that is compatible
with SELinux. This has no effect if JIT is not enabled. See the
pcre2jit documentation for a discussion of JIT usage. When JIT support
is enabled, pcre2grep automatically makes use of it, unless you add
--disable-pcre2grep-jit
@ -3554,14 +3559,14 @@ JUST-IN-TIME COMPILER SUPPORT
NEWLINE RECOGNITION
By default, PCRE2 interprets the linefeed (LF) character as indicating
the end of a line. This is the normal newline character on Unix-like
systems. You can compile PCRE2 to use carriage return (CR) instead, by
By default, PCRE2 interprets the linefeed (LF) character as indicating
the end of a line. This is the normal newline character on Unix-like
systems. You can compile PCRE2 to use carriage return (CR) instead, by
adding
--enable-newline-is-cr
to the configure command. There is also an --enable-newline-is-lf
to the configure command. There is also an --enable-newline-is-lf
option, which explicitly specifies linefeed as the newline character.
Alternatively, you can specify that line endings are to be indicated by
@ -3574,104 +3579,104 @@ NEWLINE RECOGNITION
--enable-newline-is-anycrlf
which causes PCRE2 to recognize any of the three sequences CR, LF, or
which causes PCRE2 to recognize any of the three sequences CR, LF, or
CRLF as indicating a line ending. Finally, a fifth option, specified by
--enable-newline-is-any
causes PCRE2 to recognize any Unicode newline sequence. The Unicode
causes PCRE2 to recognize any Unicode newline sequence. The Unicode
newline sequences are the three just mentioned, plus the single charac-
ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
U+0085), LS (line separator, U+2028), and PS (paragraph separator,
U+0085), LS (line separator, U+2028), and PS (paragraph separator,
U+2029).
Whatever default line ending convention is selected when PCRE2 is built
can be overridden by applications that use the library. At build time
can be overridden by applications that use the library. At build time
it is conventional to use the standard for your operating system.
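
The five conventions described above differ only in which sequences count
as a line ending. The following Python sketch is an illustration of that
difference, not PCRE2 code: the function name newline_length and the
convention strings are assumptions made for the example, and it works on
Python characters rather than on code units.

```python
# Single characters that the "any" convention accepts in addition to
# CR, LF and CRLF: VT, FF, NEL, LS and PS.
EXTRA_NEWLINES = {"\u000b", "\u000c", "\u0085", "\u2028", "\u2029"}


def newline_length(subject: str, pos: int, convention: str) -> int:
    """Return the length of the line ending at `pos`, or 0 if there is none."""
    ch = subject[pos:pos + 1]     # empty string at end of subject
    pair = subject[pos:pos + 2]
    if convention == "cr":
        return 1 if ch == "\r" else 0
    if convention == "lf":
        return 1 if ch == "\n" else 0
    if convention == "crlf":
        return 2 if pair == "\r\n" else 0
    if convention in ("anycrlf", "any"):
        if pair == "\r\n":        # prefer the two-character sequence
            return 2
        if ch in ("\r", "\n"):
            return 1
        if convention == "any" and ch in EXTRA_NEWLINES:
            return 1
        return 0
    raise ValueError(f"unknown convention: {convention}")
```

With the subject "a\r\nb" and position 1, the sketch reports a two-character
ending under "any" or "anycrlf", a one-character ending under "cr", and
no ending at all under "lf".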
WHAT \R MATCHES
By default, the sequence \R in a pattern matches any Unicode newline
sequence, independently of what has been selected as the line ending
By default, the sequence \R in a pattern matches any Unicode newline
sequence, independently of what has been selected as the line ending
sequence. If you specify
--enable-bsr-anycrlf
the default is changed so that \R matches only CR, LF, or CRLF. What-
ever is selected when PCRE2 is built can be overridden by applications
the default is changed so that \R matches only CR, LF, or CRLF. What-
ever is selected when PCRE2 is built can be overridden by applications
that use the library.
HANDLING VERY LARGE PATTERNS
Within a compiled pattern, offset values are used to point from one
part to another (for example, from an opening parenthesis to an alter-
nation metacharacter). By default, in the 8-bit and 16-bit libraries,
two-byte values are used for these offsets, leading to a maximum size
for a compiled pattern of around 64K code units. This is sufficient to
Within a compiled pattern, offset values are used to point from one
part to another (for example, from an opening parenthesis to an alter-
nation metacharacter). By default, in the 8-bit and 16-bit libraries,
two-byte values are used for these offsets, leading to a maximum size
for a compiled pattern of around 64K code units. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do
want to process truly enormous patterns, so it is possible to compile
PCRE2 to use three-byte or four-byte offsets by adding a setting such
want to process truly enormous patterns, so it is possible to compile
PCRE2 to use three-byte or four-byte offsets by adding a setting such
as
--with-link-size=3
to the configure command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. In these libraries,
using longer offsets slows down the operation of PCRE2 because it has
to load additional data when handling them. For the 32-bit library the
value is always 4 and cannot be overridden; the value of --with-link-
to the configure command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. In these libraries,
using longer offsets slows down the operation of PCRE2 because it has
to load additional data when handling them. For the 32-bit library the
value is always 4 and cannot be overridden; the value of --with-link-
size is ignored.
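
The way the requested link size is adjusted for each library can be
summarised as a small function. This Python sketch only restates the rules
given above; the names effective_link_size and offset_values are
hypothetical, and offset_values gives the number of distinct values an
offset field of that size can hold, which is an upper bound and not
necessarily the library's actual pattern size limit.

```python
def effective_link_size(code_unit_bits: int, requested: int = 2) -> int:
    """Return the link size in bytes that would apply to one library."""
    if code_unit_bits == 32:
        return 4                  # always 4; the requested value is ignored
    if code_unit_bits not in (8, 16):
        raise ValueError("code unit width must be 8, 16, or 32")
    if requested not in (2, 3, 4):
        raise ValueError("link size must be 2, 3, or 4")
    if code_unit_bits == 16 and requested == 3:
        return 4                  # 3 is rounded up to 4 in the 16-bit library
    return requested


def offset_values(link_size_bytes: int) -> int:
    """Number of distinct values an offset of this many bytes can hold."""
    return 1 << (8 * link_size_bytes)
```

For the default two-byte offsets, offset_values(2) is 65536, which is the
"around 64K code units" figure quoted above.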
LIMITING PCRE2 RESOURCE USAGE
The pcre2_match() function increments a counter each time it goes round
its main loop. Putting a limit on this counter controls the amount of
computing resource used by a single call to pcre2_match(). The limit
its main loop. Putting a limit on this counter controls the amount of
computing resource used by a single call to pcre2_match(). The limit
can be changed at run time, as described in the pcre2api documentation.
The default is 10 million, but this can be changed by adding a setting
The default is 10 million, but this can be changed by adding a setting
such as
--with-match-limit=500000
to the configure command. This setting also applies to the
pcre2_dfa_match() matching function, and to JIT matching (though the
to the configure command. This setting also applies to the
pcre2_dfa_match() matching function, and to JIT matching (though the
counting is done differently).
The pcre2_match() function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking
The pcre2_match() function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking
points there are (that is, the deeper the search tree), the more memory
is needed. If the initial vector is not large enough, heap memory is
is needed. If the initial vector is not large enough, heap memory is
used, up to a certain limit, which is specified in kilobytes. The limit
can be changed at run time, as described in the pcre2api documentation.
The default limit (in effect unlimited) is 20 million. You can change
The default limit (in effect unlimited) is 20 million. You can change
this by a setting such as
--with-heap-limit=500
which limits the amount of heap to 500 kilobytes. This limit applies
only to interpretive matching in pcre2_match(). It does not apply when
JIT (which has its own memory arrangements) is used, nor does it apply
which limits the amount of heap to 500 kilobytes. This limit applies
only to interpretive matching in pcre2_match(). It does not apply when
JIT (which has its own memory arrangements) is used, nor does it apply
to pcre2_dfa_match().
You can also explicitly limit the depth of nested backtracking in the
You can also explicitly limit the depth of nested backtracking in the
pcre2_match() interpreter. This limit defaults to the value that is set
for --with-match-limit. You can set a lower default limit by adding,
for --with-match-limit. You can set a lower default limit by adding,
for example,
--with-match-limit-depth=10000
to the configure command. This value can be overridden at run time.
This depth limit indirectly limits the amount of heap memory that is
used, but because the size of each backtracking "frame" depends on the
number of capturing parentheses in a pattern, the amount of heap that
is used before the limit is reached varies from pattern to pattern.
This limit was more useful in versions before 10.30, where function
recursion was used for backtracking. However, as well as applying to
to the configure command. This value can be overridden at run time.
This depth limit indirectly limits the amount of heap memory that is
used, but because the size of each backtracking "frame" depends on the
number of capturing parentheses in a pattern, the amount of heap that
is used before the limit is reached varies from pattern to pattern.
This limit was more useful in versions before 10.30, where function
recursion was used for backtracking. However, as well as applying to
pcre2_match(), this limit also controls the depth of recursive function
calls in pcre2_dfa_match(). These are used for lookaround assertions,
calls in pcre2_dfa_match(). These are used for lookaround assertions,
atomic groups, and recursion within patterns. The limit does not apply
to JIT matching.
@ -3680,45 +3685,45 @@ CREATING CHARACTER TABLES AT BUILD TIME
PCRE2 uses fixed tables for processing characters whose code points are
less than 256. By default, PCRE2 is built with a set of tables that are
distributed in the file src/pcre2_chartables.c.dist. These tables are
distributed in the file src/pcre2_chartables.c.dist. These tables are
for ASCII codes only. If you add
--enable-rebuild-chartables
to the configure command, the distributed tables are no longer used.
Instead, a program called dftables is compiled and run. This outputs
to the configure command, the distributed tables are no longer used.
Instead, a program called dftables is compiled and run. This outputs
the source for a new set of tables, created in the default locale of your
C run-time system. This method of replacing the tables does not work if
you are cross compiling, because dftables is run on the local host. If
you need to create alternative tables when cross compiling, you will
you are cross compiling, because dftables is run on the local host. If
you need to create alternative tables when cross compiling, you will
have to do so "by hand".
USING EBCDIC CODE
PCRE2 assumes by default that it will run in an environment where the
character code is ASCII or Unicode, which is a superset of ASCII. This
PCRE2 assumes by default that it will run in an environment where the
character code is ASCII or Unicode, which is a superset of ASCII. This
is the case for most computer operating systems. PCRE2 can, however, be
compiled to run in an 8-bit EBCDIC environment by adding
--enable-ebcdic --disable-unicode
to the configure command. This setting implies --enable-rebuild-charta-
bles. You should only use it if you know that you are in an EBCDIC
bles. You should only use it if you know that you are in an EBCDIC
environment (for example, an IBM mainframe operating system).
It is not possible to support both EBCDIC and UTF-8 codes in the same
version of the library. Consequently, --enable-unicode and --enable-
It is not possible to support both EBCDIC and UTF-8 codes in the same
version of the library. Consequently, --enable-unicode and --enable-
ebcdic are mutually exclusive.
The EBCDIC character that corresponds to an ASCII LF is assumed to have
the value 0x15 by default. However, in some EBCDIC environments, 0x25
the value 0x15 by default. However, in some EBCDIC environments, 0x25
is used. In such an environment you should use
--enable-ebcdic-nl25
as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
0x25 is not chosen as LF is made to correspond to the Unicode NEL char-
acter (which, in Unicode, is 0x85).
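
The assignment of the two candidate code values to LF and NEL can be shown
as a tiny lookup. This Python sketch is an illustration only; the function
name ebcdic_newline_codes and its nl25 parameter (standing for
--enable-ebcdic-nl25) are assumptions made for the example.

```python
def ebcdic_newline_codes(nl25: bool = False) -> dict:
    """Return the EBCDIC code values used for CR, LF and NEL."""
    lf = 0x25 if nl25 else 0x15    # LF is 0x15 unless nl25 is selected
    nel = 0x15 if nl25 else 0x25   # NEL takes whichever value LF did not
    return {"CR": 0x0D, "LF": lf, "NEL": nel}
```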
@ -3731,34 +3736,34 @@ PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS
By default, on non-Windows systems, pcre2grep supports the use of call-
outs with string arguments within the patterns it is matching, in order
to run external scripts. For details, see the pcre2grep documentation.
This support can be disabled by adding --disable-pcre2grep-callout to
to run external scripts. For details, see the pcre2grep documentation.
This support can be disabled by adding --disable-pcre2grep-callout to
the configure command.
PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT
By default, pcre2grep reads all files as plain text. You can build it
so that it recognizes files whose names end in .gz or .bz2, and reads
By default, pcre2grep reads all files as plain text. You can build it
so that it recognizes files whose names end in .gz or .bz2, and reads
them with libz or libbz2, respectively, by adding one or both of
--enable-pcre2grep-libz
--enable-pcre2grep-libbz2
to the configure command. These options naturally require that the rel-
evant libraries are installed on your system. Configuration will fail
evant libraries are installed on your system. Configuration will fail
if they are not.
PCRE2GREP BUFFER SIZE
pcre2grep uses an internal buffer to hold a "window" on the file it is
pcre2grep uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when
it finds a match. The starting size of the buffer is controlled by a
parameter whose default value is 20K. The buffer itself is three times
this size, but because of the way it is used for holding "before"
lines, the longest line that is guaranteed to be processable is the
parameter size. If a longer line is encountered, pcre2grep automati-
it finds a match. The starting size of the buffer is controlled by a
parameter whose default value is 20K. The buffer itself is three times
this size, but because of the way it is used for holding "before"
lines, the longest line that is guaranteed to be processable is the
parameter size. If a longer line is encountered, pcre2grep automati-
cally expands the buffer, up to a specified maximum size, whose default
is 1M or the starting size, whichever is the larger. You can change the
default parameter values by adding, for example,
@ -3766,8 +3771,8 @@ PCRE2GREP BUFFER SIZE
--with-pcre2grep-bufsize=51200
--with-pcre2grep-max-bufsize=2097152
to the configure command. The caller of pcre2grep can override these
values by using --buffer-size and --max-buffer-size on the command
to the configure command. The caller of pcre2grep can override these
values by using --buffer-size and --max-buffer-size on the command
line.
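
The relationship between the size parameter, the allocated buffer, and
the maximum can be worked through numerically. The Python sketch below is
an illustration, not pcre2grep code: the function name buffer_plan and its
result keys are hypothetical, "20K" and "1M" are assumed to mean 20*1024
and 1024*1024 bytes, and the "whichever is the larger" rule is applied
only when no maximum is given explicitly.

```python
ONE_MEGABYTE = 1024 * 1024


def buffer_plan(bufsize: int = 20 * 1024, max_bufsize=None) -> dict:
    """Derive the buffer figures described in the text from the parameters."""
    if max_bufsize is None:
        # Default maximum: 1M or the starting size, whichever is the larger.
        max_bufsize = max(ONE_MEGABYTE, bufsize)
    return {
        "allocated_bytes": 3 * bufsize,   # the buffer is three times the size
        "guaranteed_line": bufsize,       # longest line sure to be processable
        "max_parameter": max_bufsize,     # limit for automatic expansion
    }
```

With the defaults this gives a 61440-byte buffer and a guaranteed line
length of 20480 bytes; with the example settings 51200 and 2097152 the
initial buffer is 153600 bytes.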
@ -3778,26 +3783,26 @@ PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
--enable-pcre2test-libreadline
--enable-pcre2test-libedit
to the configure command, pcre2test is linked with the libreadline
to the configure command, pcre2test is linked with the libreadline
or libedit library, respectively, and when its input is from a terminal,
it reads it using the readline() function. This provides line-editing
and history facilities. Note that libreadline is GPL-licensed, so if
you distribute a binary of pcre2test linked in this way, there may be
it reads it using the readline() function. This provides line-editing
and history facilities. Note that libreadline is GPL-licensed, so if
you distribute a binary of pcre2test linked in this way, there may be
licensing issues. These can be avoided by linking instead with libedit,
which has a BSD licence.
Setting --enable-pcre2test-libreadline causes the -lreadline option to
be added to the pcre2test build. In many operating environments with a
system-installed readline library this is sufficient. However, in some
Setting --enable-pcre2test-libreadline causes the -lreadline option to
be added to the pcre2test build. In many operating environments with a
system-installed readline library this is sufficient. However, in some
environments (e.g. if an unmodified distribution version of readline is
in use), some extra configuration may be necessary. The INSTALL file
in use), some extra configuration may be necessary. The INSTALL file
for libreadline says this:
"Readline uses the termcap functions, but does not link with
the termcap or curses library itself, allowing applications
which link with readline the to choose an appropriate library."
If your environment has not been set up so that an appropriate library
If your environment has not been set up so that an appropriate library
is automatically included, you may need to add something like
LIBS="-lncurses"
@ -3811,7 +3816,7 @@ INCLUDING DEBUGGING CODE
--enable-debug
to the configure command, additional debugging code is included in the
to the configure command, additional debugging code is included in the
build. This feature is intended for use by the PCRE2 maintainers.
@ -3821,15 +3826,15 @@ DEBUGGING WITH VALGRIND SUPPORT
--enable-valgrind
to the configure command, PCRE2 will use valgrind annotations to mark
certain memory regions as unaddressable. This allows it to detect
invalid memory accesses, and is mostly useful for debugging PCRE2
to the configure command, PCRE2 will use valgrind annotations to mark
certain memory regions as unaddressable. This allows it to detect
invalid memory accesses, and is mostly useful for debugging PCRE2
itself.
CODE COVERAGE REPORTING
If your C compiler is gcc, you can build a version of PCRE2 that can
If your C compiler is gcc, you can build a version of PCRE2 that can
generate a code coverage report for its test suite. To enable this, you
must install lcov version 1.6 or above. Then specify
@ -3838,20 +3843,20 @@ CODE COVERAGE REPORTING
to the configure command and build PCRE2 in the usual way.
Note that using ccache (a caching C compiler) is incompatible with code
coverage reporting. If you have configured ccache to run automatically
coverage reporting. If you have configured ccache to run automatically
on your system, you must set the environment variable
CCACHE_DISABLE=1
before running make to build PCRE2, so that ccache is not used.
When --enable-coverage is used, the following additional targets are
When --enable-coverage is used, the following additional targets are
added to the Makefile:
make coverage
This creates a fresh coverage report for the PCRE2 test suite. It is
equivalent to running "make coverage-reset", "make coverage-baseline",
This creates a fresh coverage report for the PCRE2 test suite. It is
equivalent to running "make coverage-reset", "make coverage-baseline",
"make check", and then "make coverage-report".
make coverage-reset
@ -3868,56 +3873,56 @@ CODE COVERAGE REPORTING
make coverage-clean-report
This removes the generated coverage report without cleaning the cover-
This removes the generated coverage report without cleaning the cover-
age data itself.
make coverage-clean-data
This removes the captured coverage data without removing the coverage
This removes the captured coverage data without removing the coverage
files created at compile time (*.gcno).
make coverage-clean
This cleans all coverage data including the generated coverage report.
For more information about code coverage, see the gcov and lcov docu-
This cleans all coverage data including the generated coverage report.
For more information about code coverage, see the gcov and lcov docu-
mentation.
SUPPORT FOR FUZZERS
There is a special option for use by people who want to run fuzzing
There is a special option for use by people who want to run fuzzing
tests on PCRE2:
--enable-fuzz-support
At present this applies only to the 8-bit library. If set, it causes an
extra library called libpcre2-fuzzsupport.a to be built, but not
installed. This contains a single function called LLVMFuzzerTestOneIn-
put() whose arguments are a pointer to a string and the length of the
string. When called, this function tries to compile the string as a
pattern, and if that succeeds, to match it. This is done both with no
options and with some random options bits that are generated from the
extra library called libpcre2-fuzzsupport.a to be built, but not
installed. This contains a single function called LLVMFuzzerTestOneIn-
put() whose arguments are a pointer to a string and the length of the
string. When called, this function tries to compile the string as a
pattern, and if that succeeds, to match it. This is done both with no
options and with some random options bits that are generated from the
string.
Setting --enable-fuzz-support also causes a binary called pcre2fuz-
zcheck to be created. This is normally run under valgrind or used when
Setting --enable-fuzz-support also causes a binary called pcre2fuz-
zcheck to be created. This is normally run under valgrind or used when
PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
function and outputs information about what it is doing. The input strings
are specified by arguments: if an argument starts with "=" the rest of
it is a literal input string. Otherwise, it is assumed to be a file
function and outputs information about what it is doing. The input strings
are specified by arguments: if an argument starts with "=" the rest of
it is a literal input string. Otherwise, it is assumed to be a file
name, and the contents of the file are the test string.
OBSOLETE OPTION
In versions of PCRE2 prior to 10.30, there were two ways of handling
backtracking in the pcre2_match() function. The default was to use the
In versions of PCRE2 prior to 10.30, there were two ways of handling
backtracking in the pcre2_match() function. The default was to use the
system stack, but if
--disable-stack-for-recursion
was set, memory on the heap was used. From release 10.30 onwards this
has changed (the stack is no longer used) and this option now does
was set, memory on the heap was used. From release 10.30 onwards this
has changed (the stack is no longer used) and this option now does
nothing except give a warning.
@ -3935,7 +3940,7 @@ AUTHOR
REVISION
Last updated: 30 May 2017
Last updated: 17 June 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
@ -6309,26 +6314,28 @@ BACKSLASH
Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is:
Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Balinese,
Bamum, Bassa_Vah, Batak, Bengali, Bopomofo, Brahmi, Braille, Buginese,
Buhid, Canadian_Aboriginal, Carian, Caucasian_Albanian, Chakma, Cham,
Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan, Ethiopic, Geor-
gian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gurmukhi, Han,
Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited,
Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan-
nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao,
Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha-
jani, Malayalam, Mandaic, Manichaean, Meetei_Mayek, Mende_Kikakui,
Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro,
Multani, Myanmar, Nabataean, New_Tai_Lue, Nko, Ogham, Ol_Chiki,
Old_Hungarian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian,
Old_South_Arabian, Old_Turkic, Oriya, Osmanya, Pahawh_Hmong, Palmyrene,
Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang, Runic,
Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting, Sinhala,
Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa,
Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, Thaana, Thai,
Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi.
Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Bali-
nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba-
nian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot,
Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan,
Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gur-
mukhi, Han, Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Ara-
maic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian,
Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Kho-
jki, Khudawadi, Lao, Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu,
Lycian, Lydian, Mahajani, Malayalam, Mandaic, Manichaean, Marchen,
Masaram_Gondi, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar,
Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar-
ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian,
Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, Pahawh_Hmong,
Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang,
Runic, Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting,
Sinhala, Sora_Sompeng, Soyombo, Sundanese, Syloti_Nagri, Syriac, Taga-
log, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Tangut, Tel-
ugu, Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai,
Warang_Citi, Yi, Zanabazar_Square.
Each character has exactly one Unicode general category property, spec-
ified by a two-letter abbreviation. For compatibility with Perl, nega-
@ -8737,7 +8744,7 @@ AUTHOR
REVISION
Last updated: 30 May 2017
Last updated: 02 July 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

View File

@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "30 May 2017" "PCRE2 10.30"
.TH PCRE2PATTERN 3 "02 July 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -754,6 +754,7 @@ example:
Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is:
.P
Adlam,
Ahom,
Anatolian_Hieroglyphs,
Arabic,
@ -764,6 +765,7 @@ Bamum,
Bassa_Vah,
Batak,
Bengali,
Bhaiksuki,
Bopomofo,
Brahmi,
Braille,
@ -825,6 +827,8 @@ Mahajani,
Malayalam,
Mandaic,
Manichaean,
Marchen,
Masaram_Gondi,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
@ -837,7 +841,9 @@ Multani,
Myanmar,
Nabataean,
New_Tai_Lue,
Newa,
Nko,
Nushu,
Ogham,
Ol_Chiki,
Old_Hungarian,
@ -848,6 +854,7 @@ Old_Persian,
Old_South_Arabian,
Old_Turkic,
Oriya,
Osage,
Osmanya,
Pahawh_Hmong,
Palmyrene,
@ -865,6 +872,7 @@ Siddham,
SignWriting,
Sinhala,
Sora_Sompeng,
Soyombo,
Sundanese,
Syloti_Nagri,
Syriac,
@ -875,6 +883,7 @@ Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Tangut,
Telugu,
Thaana,
Thai,
@ -884,7 +893,8 @@ Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
Yi,
Zanabazar_Square.
.P
Each character has exactly one Unicode general category property, specified by
a two-letter abbreviation. For compatibility with Perl, negation can be
@ -3475,6 +3485,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 30 May 2017
Last updated: 02 July 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "16 June 2017" "PCRE 10.30"
.TH PCRE2TEST 1 "02 July 2017" "PCRE 10.30"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -527,7 +527,7 @@ by a previous \fB#pattern\fP command.
.rs
.sp
The following modifiers set options for \fBpcre2_compile()\fP. Most of them set
bits in the options argument of that function, but those whose names start with
bits in the options argument of that function, but those whose names start with
PCRE2_EXTRA are additional options that are set in the compile context. For the
main options, there are some single-letter abbreviations that are the same as
Perl options. There is special handling for /x: if a second x is present,
@ -540,25 +540,25 @@ way \fBpcre2_compile()\fP behaves. See
for a description of the effects of these options.
.sp
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
alt_bsux set PCRE2_ALT_BSUX
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
alt_verbnames set PCRE2_ALT_VERBNAMES
anchored set PCRE2_ANCHORED
auto_callout set PCRE2_AUTO_CALLOUT
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
/i caseless set PCRE2_CASELESS
dollar_endonly set PCRE2_DOLLAR_ENDONLY
/s dotall set PCRE2_DOTALL
dupnames set PCRE2_DUPNAMES
endanchored set PCRE2_ENDANCHORED
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
/xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
match_word set PCRE2_EXTRA_MATCH_WORD
match_word set PCRE2_EXTRA_MATCH_WORD
/m multiline set PCRE2_MULTILINE
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
never_ucp set PCRE2_NEVER_UCP
@ -593,7 +593,7 @@ heavily used in the test files.
/B bincode show binary code without lengths
callout_info show callout information
debug same as info,fullbincode
framesize show matching frame size
framesize show matching frame size
fullbincode show binary code with lengths
/I info show info about compiled pattern
hex unquoted characters are hexadecimal
@ -611,7 +611,7 @@ heavily used in the test files.
push push compiled pattern onto the stack
pushcopy push a copy onto the stack
stackguard=<number> test the stackguard feature
subject_literal treat all subject lines as literal
subject_literal treat all subject lines as literal
tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
@ -677,7 +677,7 @@ unit" is the last literal code unit that must be present in any match. This is
not necessarily the last character. These lines are omitted if no starting or
ending code units are recorded.
.P
The \fBframesize\fP modifier shows the size, in bytes, of the storage frames
The \fBframesize\fP modifier shows the size, in bytes, of the storage frames
used by \fBpcre2_match()\fP for handling backtracking. The size depends on the
number of capturing parentheses in the pattern.
.P
@ -934,8 +934,8 @@ The \fBaftertext\fP and \fBallaftertext\fP subject modifiers work as described
below. All other modifiers are either ignored, with a warning message, or cause
an error.
.P
The pattern is passed to \fBregcomp()\fP as a zero-terminated string by
default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the
The pattern is passed to \fBregcomp()\fP as a zero-terminated string by
default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the
REG_PEND extension is used to pass it by length.
.
.
@ -977,7 +977,7 @@ are mutually exclusive.
.SS "Setting certain match controls"
.rs
.sp
The following modifiers are really subject modifiers, and are described under
The following modifiers are really subject modifiers, and are described under
"Subject Modifiers" below. However, they may be included in a pattern's
modifier list, in which case they are applied to every subject line that is
processed with that pattern. They may not appear in \fB#pattern\fP commands.
@ -1004,9 +1004,9 @@ defaults, set them in a \fB#subject\fP command.
.SS "Specifying literal subject lines"
.rs
.sp
If the \fBsubject_literal\fP modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any
If the \fBsubject_literal\fP modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any
that are set as defaults by a \fB#subject\fP command are recognized.
.
.
@ -1020,7 +1020,9 @@ facility is used when saving compiled patterns to a file, as described in the
section entitled "Saving and restoring compiled patterns"
.\" HTML <a href="#saverestore">
.\" </a>
below. If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
below.
.\"
If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
pattern is stacked, leaving the original as current, ready to match the
following input lines. This provides a way of testing the
\fBpcre2_code_copy()\fP function.
@ -1073,10 +1075,10 @@ that have any effect are \fBnotbol\fP, \fBnotempty\fP, and \fBnoteol\fP,
causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
\fBregexec()\fP. The other modifiers are ignored, with a warning message.
.P
There is one additional modifier that can be used with the POSIX wrapper. It is
There is one additional modifier that can be used with the POSIX wrapper. It is
ignored (with a warning) if used for non-POSIX matching.
.sp
posix_startend=<n>[:<m>]
posix_startend=<n>[:<m>]
.sp
This causes the subject string to be passed to \fBregexec()\fP using the
REG_STARTEND option, which uses offsets to specify which part of the string is
@ -1085,8 +1087,8 @@ the subject string. For more detail of REG_STARTEND, see the
.\" HREF
\fBpcre2posix\fP
.\"
documentation. If the subject string contains binary zeros (coded as escapes
such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in
documentation. If the subject string contains binary zeros (coded as escapes
such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in
its input), you must use \fBposix_startend\fP to specify its length.
.
.
@ -1107,6 +1109,7 @@ pattern.
callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error
callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
copy=<number or name> copy captured substring
depth_limit=<n> set a depth limit
@ -1200,26 +1203,13 @@ does no capturing); it is ignored, with a warning message, if present.
.rs
.sp
A callout function is supplied when \fBpcre2test\fP calls the library matching
functions, unless \fBcallout_none\fP is specified. If \fBcallout_capture\fP is
set, the current captured groups are output when a callout occurs. The default
return from the callout function is zero, which allows matching to continue.
.P
The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 (causing matching to backtrack)
when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1
is returned when callout <n> is reached and there have been at least <m>
callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence.
.P
Note that callouts with string arguments are always given the number zero. See
"Callouts" below for a description of the output when a callout is taken.
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
functions, unless \fBcallout_none\fP is specified. Its behaviour can be
controlled by various modifiers listed above whose names begin with
\fBcallout_\fP. Details are given in the section entitled "Callouts"
.\" HTML <a href="#callouts">
.\" </a>
below.
.\"
.
.
.SS "Finding all matches in a string"
@ -1344,7 +1334,7 @@ that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Setting
zero reverts to the default of 32K. Providing a stack that is larger than the
default is necessary only for very complicated patterns. If \fBjitstack\fP is
set non-zero on a subject line it overrides any value that was set on the
set non-zero on a subject line it overrides any value that was set on the
pattern.
.
.
@ -1372,7 +1362,7 @@ The \fImatch_limit\fP number is a measure of the amount of backtracking
that takes place, and learning the minimum value can be instructive. For most
simple matches, the number is quite small, but for patterns with very large
numbers of matching possibilities, it can become large very quickly with
increasing length of subject string.
increasing length of subject string.
.P
For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
much nested backtracking happens (that is, how deeply the pattern's tree is
@ -1625,6 +1615,7 @@ For further information about partial matching, see the
documentation.
.
.
.\" HTML <a name="callouts"></a>
.SH CALLOUTS
.rs
.sp
@ -1633,8 +1624,30 @@ function is called during matching unless \fBcallout_none\fP is specified.
This works with both matching functions.
.P
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as
described above) to change this and other parameters of the callout.
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
change this and other parameters of the callout.
.P
If \fBcallout_capture\fP is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
\fBcallout_no_where\fP modifier is set.
.P
The default return from the callout function is zero, which allows matching to
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
are always given the number zero.
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
.P
Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see
@ -1837,6 +1850,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 16 June 2017
Last updated: 02 July 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

View File

@ -943,7 +943,7 @@ PATTERN MODIFIERS
next line to contain a new pattern (or a command) instead of a subject
line. This facility is used when saving compiled patterns to a file, as
described in the section entitled "Saving and restoring compiled pat-
terns" below. If pushcopy is used instead of push, a copy of the com-
terns" below. If pushcopy is used instead of push, a copy of the com-
piled pattern is stacked, leaving the original as current, ready to
match the following input lines. This provides a way of testing the
pcre2_code_copy() function. The push and pushcopy modifiers are
@ -1016,6 +1016,7 @@ SUBJECT MODIFIERS
callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error
callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
copy=<number or name> copy captured substring
depth_limit=<n> set a depth limit
@ -1107,29 +1108,9 @@ SUBJECT MODIFIERS
Testing callouts
A callout function is supplied when pcre2test calls the library match-
ing functions, unless callout_none is specified. If callout_capture is
set, the current captured groups are output when a callout occurs. The
default return from the callout function is zero, which allows matching
to continue.
The callout_fail modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 (causing matching to back-
track) when a callout of that number is reached. If two numbers
(<n>:<m>) are given, 1 is returned when callout <n> is reached and
there have been at least <m> callouts. The callout_error modifier is
similar, except that PCRE2_ERROR_CALLOUT is returned, causing the
entire matching process to be aborted. If both these modifiers are set
for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero. See "Callouts" below for a description of the output when a call-
out is taken.
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.
ing functions, unless callout_none is specified. Its behaviour can be
controlled by various modifiers listed above whose names begin with
callout_. Details are given in the section entitled "Callouts" below.
Finding all matches in a string
@ -1511,8 +1492,32 @@ CALLOUTS
works with both matching functions.
The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line (as
described above) to change this and other parameters of the callout.
default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout.
If callout_capture is set, the current captured groups are output when
a callout occurs. By default, the callout function then generates out-
put that indicates where the current match start and matching points
are in the subject, and what the next pattern item is. This output is
suppressed if the callout_no_where modifier is set.
The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (caus-
ing matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
ing the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero.
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.
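
The callout_fail and callout_error rules above can be modelled with a short sketch. This is not pcre2test's implementation: the function names are hypothetical, the numeric value used for PCRE2_ERROR_CALLOUT is an assumption that should be checked against pcre2.h, a missing <m> is treated as zero, and the effect of callout_data is deliberately left out.

```python
# Illustrative model of the callout_fail / callout_error rules described above.
# Assumption: the numeric value below; check pcre2.h for the real definition.
PCRE2_ERROR_CALLOUT = -37


def parse_spec(text):
    """Parse '<n>' or '<n>:<m>' into (n, m); a missing <m> is taken as 0."""
    n, _, m = text.partition(":")
    return int(n), (int(m) if m else 0)


def callout_return(number, count, fail=None, error=None):
    """Return the value a callout would yield under the rules in the text.

    number -- the callout number that has been reached
    count  -- how many callouts have happened so far (assumed to include
              the current one)
    fail   -- callout_fail setting such as "2" or "2:3", or None
    error  -- callout_error setting in the same format, or None
    """
    # callout_error is checked first because it takes precedence.
    if error is not None:
        n, m = parse_spec(error)
        if number == n and count >= m:
            return PCRE2_ERROR_CALLOUT  # abort the whole match
    if fail is not None:
        n, m = parse_spec(fail)
        if number == n and count >= m:
            return 1  # force backtracking at this point
    return 0  # default: carry on matching


# callout_fail=2:3 only takes effect once three callouts have occurred.
print(callout_return(2, 1, fail="2:3"), callout_return(2, 3, fail="2:3"))
```

Checking callout_error before callout_fail is what gives it precedence when both name the same callout number.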
Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see
@ -1687,5 +1692,5 @@ AUTHOR
REVISION
Last updated: 16 June 2017
Last updated: 02 July 2017
Copyright (c) 1997-2017 University of Cambridge.

View File

@ -23,6 +23,7 @@
# Script updated to Python 3 by running it through the 2to3 converter.
# Added script names for Unicode 7.0.0, 20-June-2014.
# Added script names for Unicode 8.0.0, 19-June-2015.
# Added script names for Unicode 10.0.0, 02-July-2017.
script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \
'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \
@ -51,7 +52,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi',
# New for Unicode 8.0.0
'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian',
'SignWriting'
'SignWriting',
# New for Unicode 10.0.0
'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
'Nushu', 'Soyombo', 'Zanabazar_Square'
]
category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',

View File

@ -122,6 +122,7 @@
# 20-June-2014: Updated for Unicode 7.0.0
# 12-August-2014: Updated to put Unicode version into the file
# 19-June-2015: Updated for Unicode 8.0.0
# 02-July-2017: Updated for Unicode 10.0.0
##############################################################################
@ -335,7 +336,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi',
# New for Unicode 8.0.0
'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian',
'SignWriting'
'SignWriting',
# New for Unicode 10.0.0
'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
'Nushu', 'Soyombo', 'Zanabazar_Square'
]
category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
@ -343,7 +347,8 @@ category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
'Sc', 'Sk', 'Sm', 'So', 'Zl', 'Zp', 'Zs' ]
break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend',
'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other' ]
'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other',
'E_Base', 'E_Modifier', 'E_Base_GAZ', 'ZWJ', 'Glue_After_Zwj' ]
test_record_size()
unicode_version = ""

View File

@ -1,10 +1,11 @@
# CaseFolding-8.0.0.txt
# Date: 2015-01-13, 18:16:36 GMT [MD]
# CaseFolding-10.0.0.txt
# Date: 2017-04-14, 05:40:18 GMT
# © 2017 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# Copyright (c) 1991-2015 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
# For documentation, see http://www.unicode.org/reports/tr44/
#
# Case Folding Properties
#
@ -23,7 +24,7 @@
#
# NOTE: case folding does not preserve normalization formats!
#
# For information on case folding, including how to have case folding
# For information on case folding, including how to have case folding
# preserve normalization formats, see Section 3.13 Default Case Algorithms in
# The Unicode Standard.
#
@ -593,6 +594,15 @@
13FB; C; 13F3; # CHEROKEE SMALL LETTER YU
13FC; C; 13F4; # CHEROKEE SMALL LETTER YV
13FD; C; 13F5; # CHEROKEE SMALL LETTER MV
1C80; C; 0432; # CYRILLIC SMALL LETTER ROUNDED VE
1C81; C; 0434; # CYRILLIC SMALL LETTER LONG-LEGGED DE
1C82; C; 043E; # CYRILLIC SMALL LETTER NARROW O
1C83; C; 0441; # CYRILLIC SMALL LETTER WIDE ES
1C84; C; 0442; # CYRILLIC SMALL LETTER TALL TE
1C85; C; 0442; # CYRILLIC SMALL LETTER THREE-LEGGED TE
1C86; C; 044A; # CYRILLIC SMALL LETTER TALL HARD SIGN
1C87; C; 0463; # CYRILLIC SMALL LETTER TALL YAT
1C88; C; A64B; # CYRILLIC SMALL LETTER UNBLENDED UK
1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW
1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE
1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW
@ -1163,6 +1173,7 @@ A7AA; C; 0266; # LATIN CAPITAL LETTER H WITH HOOK
A7AB; C; 025C; # LATIN CAPITAL LETTER REVERSED OPEN E
A7AC; C; 0261; # LATIN CAPITAL LETTER SCRIPT G
A7AD; C; 026C; # LATIN CAPITAL LETTER L WITH BELT
A7AE; C; 026A; # LATIN CAPITAL LETTER SMALL CAPITAL I
A7B0; C; 029E; # LATIN CAPITAL LETTER TURNED K
A7B1; C; 0287; # LATIN CAPITAL LETTER TURNED T
A7B2; C; 029D; # LATIN CAPITAL LETTER J WITH CROSSED-TAIL
@ -1327,6 +1338,42 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
10425; C; 1044D; # DESERET CAPITAL LETTER ENG
10426; C; 1044E; # DESERET CAPITAL LETTER OI
10427; C; 1044F; # DESERET CAPITAL LETTER EW
104B0; C; 104D8; # OSAGE CAPITAL LETTER A
104B1; C; 104D9; # OSAGE CAPITAL LETTER AI
104B2; C; 104DA; # OSAGE CAPITAL LETTER AIN
104B3; C; 104DB; # OSAGE CAPITAL LETTER AH
104B4; C; 104DC; # OSAGE CAPITAL LETTER BRA
104B5; C; 104DD; # OSAGE CAPITAL LETTER CHA
104B6; C; 104DE; # OSAGE CAPITAL LETTER EHCHA
104B7; C; 104DF; # OSAGE CAPITAL LETTER E
104B8; C; 104E0; # OSAGE CAPITAL LETTER EIN
104B9; C; 104E1; # OSAGE CAPITAL LETTER HA
104BA; C; 104E2; # OSAGE CAPITAL LETTER HYA
104BB; C; 104E3; # OSAGE CAPITAL LETTER I
104BC; C; 104E4; # OSAGE CAPITAL LETTER KA
104BD; C; 104E5; # OSAGE CAPITAL LETTER EHKA
104BE; C; 104E6; # OSAGE CAPITAL LETTER KYA
104BF; C; 104E7; # OSAGE CAPITAL LETTER LA
104C0; C; 104E8; # OSAGE CAPITAL LETTER MA
104C1; C; 104E9; # OSAGE CAPITAL LETTER NA
104C2; C; 104EA; # OSAGE CAPITAL LETTER O
104C3; C; 104EB; # OSAGE CAPITAL LETTER OIN
104C4; C; 104EC; # OSAGE CAPITAL LETTER PA
104C5; C; 104ED; # OSAGE CAPITAL LETTER EHPA
104C6; C; 104EE; # OSAGE CAPITAL LETTER SA
104C7; C; 104EF; # OSAGE CAPITAL LETTER SHA
104C8; C; 104F0; # OSAGE CAPITAL LETTER TA
104C9; C; 104F1; # OSAGE CAPITAL LETTER EHTA
104CA; C; 104F2; # OSAGE CAPITAL LETTER TSA
104CB; C; 104F3; # OSAGE CAPITAL LETTER EHTSA
104CC; C; 104F4; # OSAGE CAPITAL LETTER TSHA
104CD; C; 104F5; # OSAGE CAPITAL LETTER DHA
104CE; C; 104F6; # OSAGE CAPITAL LETTER U
104CF; C; 104F7; # OSAGE CAPITAL LETTER WA
104D0; C; 104F8; # OSAGE CAPITAL LETTER KHA
104D1; C; 104F9; # OSAGE CAPITAL LETTER GHA
104D2; C; 104FA; # OSAGE CAPITAL LETTER ZA
104D3; C; 104FB; # OSAGE CAPITAL LETTER ZHA
10C80; C; 10CC0; # OLD HUNGARIAN CAPITAL LETTER A
10C81; C; 10CC1; # OLD HUNGARIAN CAPITAL LETTER AA
10C82; C; 10CC2; # OLD HUNGARIAN CAPITAL LETTER EB
@ -1410,5 +1457,39 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
118BD; C; 118DD; # WARANG CITI CAPITAL LETTER SSUU
118BE; C; 118DE; # WARANG CITI CAPITAL LETTER SII
118BF; C; 118DF; # WARANG CITI CAPITAL LETTER VIYO
1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF
1E901; C; 1E923; # ADLAM CAPITAL LETTER DAALI
1E902; C; 1E924; # ADLAM CAPITAL LETTER LAAM
1E903; C; 1E925; # ADLAM CAPITAL LETTER MIIM
1E904; C; 1E926; # ADLAM CAPITAL LETTER BA
1E905; C; 1E927; # ADLAM CAPITAL LETTER SINNYIIYHE
1E906; C; 1E928; # ADLAM CAPITAL LETTER PE
1E907; C; 1E929; # ADLAM CAPITAL LETTER BHE
1E908; C; 1E92A; # ADLAM CAPITAL LETTER RA
1E909; C; 1E92B; # ADLAM CAPITAL LETTER E
1E90A; C; 1E92C; # ADLAM CAPITAL LETTER FA
1E90B; C; 1E92D; # ADLAM CAPITAL LETTER I
1E90C; C; 1E92E; # ADLAM CAPITAL LETTER O
1E90D; C; 1E92F; # ADLAM CAPITAL LETTER DHA
1E90E; C; 1E930; # ADLAM CAPITAL LETTER YHE
1E90F; C; 1E931; # ADLAM CAPITAL LETTER WAW
1E910; C; 1E932; # ADLAM CAPITAL LETTER NUN
1E911; C; 1E933; # ADLAM CAPITAL LETTER KAF
1E912; C; 1E934; # ADLAM CAPITAL LETTER YA
1E913; C; 1E935; # ADLAM CAPITAL LETTER U
1E914; C; 1E936; # ADLAM CAPITAL LETTER JIIM
1E915; C; 1E937; # ADLAM CAPITAL LETTER CHI
1E916; C; 1E938; # ADLAM CAPITAL LETTER HA
1E917; C; 1E939; # ADLAM CAPITAL LETTER QAAF
1E918; C; 1E93A; # ADLAM CAPITAL LETTER GA
1E919; C; 1E93B; # ADLAM CAPITAL LETTER NYA
1E91A; C; 1E93C; # ADLAM CAPITAL LETTER TU
1E91B; C; 1E93D; # ADLAM CAPITAL LETTER NHA
1E91C; C; 1E93E; # ADLAM CAPITAL LETTER VA
1E91D; C; 1E93F; # ADLAM CAPITAL LETTER KHA
1E91E; C; 1E940; # ADLAM CAPITAL LETTER GBE
1E91F; C; 1E941; # ADLAM CAPITAL LETTER ZAL
1E920; C; 1E942; # ADLAM CAPITAL LETTER KPO
1E921; C; 1E943; # ADLAM CAPITAL LETTER SHA
#
# EOF

@ -1,10 +1,11 @@
# DerivedGeneralCategory-8.0.0.txt
# Date: 2015-02-13, 13:47:11 GMT [MD]
# DerivedGeneralCategory-10.0.0.txt
# Date: 2017-03-08, 08:41:49 GMT
# © 2017 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# Copyright (c) 1991-2015 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
# For documentation, see http://www.unicode.org/reports/tr44/
# ================================================
@ -36,8 +37,10 @@
082E..082F ; Cn # [2] <reserved-082E>..<reserved-082F>
083F ; Cn # <reserved-083F>
085C..085D ; Cn # [2] <reserved-085C>..<reserved-085D>
085F..089F ; Cn # [65] <reserved-085F>..<reserved-089F>
08B5..08E2 ; Cn # [46] <reserved-08B5>..<reserved-08E2>
085F ; Cn # <reserved-085F>
086B..089F ; Cn # [53] <reserved-086B>..<reserved-089F>
08B5 ; Cn # <reserved-08B5>
08BE..08D3 ; Cn # [22] <reserved-08BE>..<reserved-08D3>
0984 ; Cn # <reserved-0984>
098D..098E ; Cn # [2] <reserved-098D>..<reserved-098E>
0991..0992 ; Cn # [2] <reserved-0991>..<reserved-0992>
@ -51,7 +54,7 @@
09D8..09DB ; Cn # [4] <reserved-09D8>..<reserved-09DB>
09DE ; Cn # <reserved-09DE>
09E4..09E5 ; Cn # [2] <reserved-09E4>..<reserved-09E5>
09FC..0A00 ; Cn # [5] <reserved-09FC>..<reserved-0A00>
09FE..0A00 ; Cn # [3] <reserved-09FE>..<reserved-0A00>
0A04 ; Cn # <reserved-0A04>
0A0B..0A0E ; Cn # [4] <reserved-0A0B>..<reserved-0A0E>
0A11..0A12 ; Cn # [2] <reserved-0A11>..<reserved-0A12>
@ -81,7 +84,7 @@
0AD1..0ADF ; Cn # [15] <reserved-0AD1>..<reserved-0ADF>
0AE4..0AE5 ; Cn # [2] <reserved-0AE4>..<reserved-0AE5>
0AF2..0AF8 ; Cn # [7] <reserved-0AF2>..<reserved-0AF8>
0AFA..0B00 ; Cn # [7] <reserved-0AFA>..<reserved-0B00>
0B00 ; Cn # <reserved-0B00>
0B04 ; Cn # <reserved-0B04>
0B0D..0B0E ; Cn # [2] <reserved-0B0D>..<reserved-0B0E>
0B11..0B12 ; Cn # [2] <reserved-0B11>..<reserved-0B12>
@ -124,7 +127,6 @@
0C5B..0C5F ; Cn # [5] <reserved-0C5B>..<reserved-0C5F>
0C64..0C65 ; Cn # [2] <reserved-0C64>..<reserved-0C65>
0C70..0C77 ; Cn # [8] <reserved-0C70>..<reserved-0C77>
0C80 ; Cn # <reserved-0C80>
0C84 ; Cn # <reserved-0C84>
0C8D ; Cn # <reserved-0C8D>
0C91 ; Cn # <reserved-0C91>
@ -138,17 +140,14 @@
0CDF ; Cn # <reserved-0CDF>
0CE4..0CE5 ; Cn # [2] <reserved-0CE4>..<reserved-0CE5>
0CF0 ; Cn # <reserved-0CF0>
0CF3..0D00 ; Cn # [14] <reserved-0CF3>..<reserved-0D00>
0CF3..0CFF ; Cn # [13] <reserved-0CF3>..<reserved-0CFF>
0D04 ; Cn # <reserved-0D04>
0D0D ; Cn # <reserved-0D0D>
0D11 ; Cn # <reserved-0D11>
0D3B..0D3C ; Cn # [2] <reserved-0D3B>..<reserved-0D3C>
0D45 ; Cn # <reserved-0D45>
0D49 ; Cn # <reserved-0D49>
0D4F..0D56 ; Cn # [8] <reserved-0D4F>..<reserved-0D56>
0D58..0D5E ; Cn # [7] <reserved-0D58>..<reserved-0D5E>
0D50..0D53 ; Cn # [4] <reserved-0D50>..<reserved-0D53>
0D64..0D65 ; Cn # [2] <reserved-0D64>..<reserved-0D65>
0D76..0D78 ; Cn # [3] <reserved-0D76>..<reserved-0D78>
0D80..0D81 ; Cn # [2] <reserved-0D80>..<reserved-0D81>
0D84 ; Cn # <reserved-0D84>
0D97..0D99 ; Cn # [3] <reserved-0D97>..<reserved-0D99>
@ -249,11 +248,10 @@
1BF4..1BFB ; Cn # [8] <reserved-1BF4>..<reserved-1BFB>
1C38..1C3A ; Cn # [3] <reserved-1C38>..<reserved-1C3A>
1C4A..1C4C ; Cn # [3] <reserved-1C4A>..<reserved-1C4C>
1C80..1CBF ; Cn # [64] <reserved-1C80>..<reserved-1CBF>
1C89..1CBF ; Cn # [55] <reserved-1C89>..<reserved-1CBF>
1CC8..1CCF ; Cn # [8] <reserved-1CC8>..<reserved-1CCF>
1CF7 ; Cn # <reserved-1CF7>
1CFA..1CFF ; Cn # [6] <reserved-1CFA>..<reserved-1CFF>
1DF6..1DFB ; Cn # [6] <reserved-1DF6>..<reserved-1DFB>
1DFA ; Cn # <reserved-1DFA>
1F16..1F17 ; Cn # [2] <reserved-1F16>..<reserved-1F17>
1F1E..1F1F ; Cn # [2] <reserved-1F1E>..<reserved-1F1F>
1F46..1F47 ; Cn # [2] <reserved-1F46>..<reserved-1F47>
@ -274,17 +272,16 @@
2072..2073 ; Cn # [2] <reserved-2072>..<reserved-2073>
208F ; Cn # <reserved-208F>
209D..209F ; Cn # [3] <reserved-209D>..<reserved-209F>
20BF..20CF ; Cn # [17] <reserved-20BF>..<reserved-20CF>
20C0..20CF ; Cn # [16] <reserved-20C0>..<reserved-20CF>
20F1..20FF ; Cn # [15] <reserved-20F1>..<reserved-20FF>
218C..218F ; Cn # [4] <reserved-218C>..<reserved-218F>
23FB..23FF ; Cn # [5] <reserved-23FB>..<reserved-23FF>
2427..243F ; Cn # [25] <reserved-2427>..<reserved-243F>
244B..245F ; Cn # [21] <reserved-244B>..<reserved-245F>
2B74..2B75 ; Cn # [2] <reserved-2B74>..<reserved-2B75>
2B96..2B97 ; Cn # [2] <reserved-2B96>..<reserved-2B97>
2BBA..2BBC ; Cn # [3] <reserved-2BBA>..<reserved-2BBC>
2BC9 ; Cn # <reserved-2BC9>
2BD2..2BEB ; Cn # [26] <reserved-2BD2>..<reserved-2BEB>
2BD3..2BEB ; Cn # [25] <reserved-2BD3>..<reserved-2BEB>
2BF0..2BFF ; Cn # [16] <reserved-2BF0>..<reserved-2BFF>
2C2F ; Cn # <reserved-2C2F>
2C5F ; Cn # <reserved-2C5F>
@ -303,7 +300,7 @@
2DCF ; Cn # <reserved-2DCF>
2DD7 ; Cn # <reserved-2DD7>
2DDF ; Cn # <reserved-2DDF>
2E43..2E7F ; Cn # [61] <reserved-2E43>..<reserved-2E7F>
2E4A..2E7F ; Cn # [54] <reserved-2E4A>..<reserved-2E7F>
2E9A ; Cn # <reserved-2E9A>
2EF4..2EFF ; Cn # [12] <reserved-2EF4>..<reserved-2EFF>
2FD6..2FEF ; Cn # [26] <reserved-2FD6>..<reserved-2FEF>
@ -311,24 +308,24 @@
3040 ; Cn # <reserved-3040>
3097..3098 ; Cn # [2] <reserved-3097>..<reserved-3098>
3100..3104 ; Cn # [5] <reserved-3100>..<reserved-3104>
312E..3130 ; Cn # [3] <reserved-312E>..<reserved-3130>
312F..3130 ; Cn # [2] <reserved-312F>..<reserved-3130>
318F ; Cn # <reserved-318F>
31BB..31BF ; Cn # [5] <reserved-31BB>..<reserved-31BF>
31E4..31EF ; Cn # [12] <reserved-31E4>..<reserved-31EF>
321F ; Cn # <reserved-321F>
32FF ; Cn # <reserved-32FF>
4DB6..4DBF ; Cn # [10] <reserved-4DB6>..<reserved-4DBF>
9FD6..9FFF ; Cn # [42] <reserved-9FD6>..<reserved-9FFF>
9FEB..9FFF ; Cn # [21] <reserved-9FEB>..<reserved-9FFF>
A48D..A48F ; Cn # [3] <reserved-A48D>..<reserved-A48F>
A4C7..A4CF ; Cn # [9] <reserved-A4C7>..<reserved-A4CF>
A62C..A63F ; Cn # [20] <reserved-A62C>..<reserved-A63F>
A6F8..A6FF ; Cn # [8] <reserved-A6F8>..<reserved-A6FF>
A7AE..A7AF ; Cn # [2] <reserved-A7AE>..<reserved-A7AF>
A7AF ; Cn # <reserved-A7AF>
A7B8..A7F6 ; Cn # [63] <reserved-A7B8>..<reserved-A7F6>
A82C..A82F ; Cn # [4] <reserved-A82C>..<reserved-A82F>
A83A..A83F ; Cn # [6] <reserved-A83A>..<reserved-A83F>
A878..A87F ; Cn # [8] <reserved-A878>..<reserved-A87F>
A8C5..A8CD ; Cn # [9] <reserved-A8C5>..<reserved-A8CD>
A8C6..A8CD ; Cn # [8] <reserved-A8C6>..<reserved-A8CD>
A8DA..A8DF ; Cn # [6] <reserved-A8DA>..<reserved-A8DF>
A8FE..A8FF ; Cn # [2] <reserved-A8FE>..<reserved-A8FF>
A954..A95E ; Cn # [11] <reserved-A954>..<reserved-A95E>
@ -390,21 +387,23 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
100FB..100FF ; Cn # [5] <reserved-100FB>..<reserved-100FF>
10103..10106 ; Cn # [4] <reserved-10103>..<reserved-10106>
10134..10136 ; Cn # [3] <reserved-10134>..<reserved-10136>
1018D..1018F ; Cn # [3] <reserved-1018D>..<reserved-1018F>
1018F ; Cn # <reserved-1018F>
1019C..1019F ; Cn # [4] <reserved-1019C>..<reserved-1019F>
101A1..101CF ; Cn # [47] <reserved-101A1>..<reserved-101CF>
101FE..1027F ; Cn # [130] <reserved-101FE>..<reserved-1027F>
1029D..1029F ; Cn # [3] <reserved-1029D>..<reserved-1029F>
102D1..102DF ; Cn # [15] <reserved-102D1>..<reserved-102DF>
102FC..102FF ; Cn # [4] <reserved-102FC>..<reserved-102FF>
10324..1032F ; Cn # [12] <reserved-10324>..<reserved-1032F>
10324..1032C ; Cn # [9] <reserved-10324>..<reserved-1032C>
1034B..1034F ; Cn # [5] <reserved-1034B>..<reserved-1034F>
1037B..1037F ; Cn # [5] <reserved-1037B>..<reserved-1037F>
1039E ; Cn # <reserved-1039E>
103C4..103C7 ; Cn # [4] <reserved-103C4>..<reserved-103C7>
103D6..103FF ; Cn # [42] <reserved-103D6>..<reserved-103FF>
1049E..1049F ; Cn # [2] <reserved-1049E>..<reserved-1049F>
104AA..104FF ; Cn # [86] <reserved-104AA>..<reserved-104FF>
104AA..104AF ; Cn # [6] <reserved-104AA>..<reserved-104AF>
104D4..104D7 ; Cn # [4] <reserved-104D4>..<reserved-104D7>
104FC..104FF ; Cn # [4] <reserved-104FC>..<reserved-104FF>
10528..1052F ; Cn # [8] <reserved-10528>..<reserved-1052F>
10564..1056E ; Cn # [11] <reserved-10564>..<reserved-1056E>
10570..105FF ; Cn # [144] <reserved-10570>..<reserved-105FF>
@ -460,7 +459,7 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
111E0 ; Cn # <reserved-111E0>
111F5..111FF ; Cn # [11] <reserved-111F5>..<reserved-111FF>
11212 ; Cn # <reserved-11212>
1123E..1127F ; Cn # [66] <reserved-1123E>..<reserved-1127F>
1123F..1127F ; Cn # [65] <reserved-1123F>..<reserved-1127F>
11287 ; Cn # <reserved-11287>
11289 ; Cn # <reserved-11289>
1128E ; Cn # <reserved-1128E>
@ -482,21 +481,43 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
11358..1135C ; Cn # [5] <reserved-11358>..<reserved-1135C>
11364..11365 ; Cn # [2] <reserved-11364>..<reserved-11365>
1136D..1136F ; Cn # [3] <reserved-1136D>..<reserved-1136F>
11375..1147F ; Cn # [267] <reserved-11375>..<reserved-1147F>
11375..113FF ; Cn # [139] <reserved-11375>..<reserved-113FF>
1145A ; Cn # <reserved-1145A>
1145C ; Cn # <reserved-1145C>
1145E..1147F ; Cn # [34] <reserved-1145E>..<reserved-1147F>
114C8..114CF ; Cn # [8] <reserved-114C8>..<reserved-114CF>
114DA..1157F ; Cn # [166] <reserved-114DA>..<reserved-1157F>
115B6..115B7 ; Cn # [2] <reserved-115B6>..<reserved-115B7>
115DE..115FF ; Cn # [34] <reserved-115DE>..<reserved-115FF>
11645..1164F ; Cn # [11] <reserved-11645>..<reserved-1164F>
1165A..1167F ; Cn # [38] <reserved-1165A>..<reserved-1167F>
1165A..1165F ; Cn # [6] <reserved-1165A>..<reserved-1165F>
1166D..1167F ; Cn # [19] <reserved-1166D>..<reserved-1167F>
116B8..116BF ; Cn # [8] <reserved-116B8>..<reserved-116BF>
116CA..116FF ; Cn # [54] <reserved-116CA>..<reserved-116FF>
1171A..1171C ; Cn # [3] <reserved-1171A>..<reserved-1171C>
1172C..1172F ; Cn # [4] <reserved-1172C>..<reserved-1172F>
11740..1189F ; Cn # [352] <reserved-11740>..<reserved-1189F>
118F3..118FE ; Cn # [12] <reserved-118F3>..<reserved-118FE>
11900..11ABF ; Cn # [448] <reserved-11900>..<reserved-11ABF>
11AF9..11FFF ; Cn # [1287] <reserved-11AF9>..<reserved-11FFF>
11900..119FF ; Cn # [256] <reserved-11900>..<reserved-119FF>
11A48..11A4F ; Cn # [8] <reserved-11A48>..<reserved-11A4F>
11A84..11A85 ; Cn # [2] <reserved-11A84>..<reserved-11A85>
11A9D ; Cn # <reserved-11A9D>
11AA3..11ABF ; Cn # [29] <reserved-11AA3>..<reserved-11ABF>
11AF9..11BFF ; Cn # [263] <reserved-11AF9>..<reserved-11BFF>
11C09 ; Cn # <reserved-11C09>
11C37 ; Cn # <reserved-11C37>
11C46..11C4F ; Cn # [10] <reserved-11C46>..<reserved-11C4F>
11C6D..11C6F ; Cn # [3] <reserved-11C6D>..<reserved-11C6F>
11C90..11C91 ; Cn # [2] <reserved-11C90>..<reserved-11C91>
11CA8 ; Cn # <reserved-11CA8>
11CB7..11CFF ; Cn # [73] <reserved-11CB7>..<reserved-11CFF>
11D07 ; Cn # <reserved-11D07>
11D0A ; Cn # <reserved-11D0A>
11D37..11D39 ; Cn # [3] <reserved-11D37>..<reserved-11D39>
11D3B ; Cn # <reserved-11D3B>
11D3E ; Cn # <reserved-11D3E>
11D48..11D4F ; Cn # [8] <reserved-11D48>..<reserved-11D4F>
11D5A..11FFF ; Cn # [678] <reserved-11D5A>..<reserved-11FFF>
1239A..123FF ; Cn # [102] <reserved-1239A>..<reserved-123FF>
1246F ; Cn # <reserved-1246F>
12475..1247F ; Cn # [11] <reserved-12475>..<reserved-1247F>
@ -516,8 +537,12 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
16B90..16EFF ; Cn # [880] <reserved-16B90>..<reserved-16EFF>
16F45..16F4F ; Cn # [11] <reserved-16F45>..<reserved-16F4F>
16F7F..16F8E ; Cn # [16] <reserved-16F7F>..<reserved-16F8E>
16FA0..1AFFF ; Cn # [16480] <reserved-16FA0>..<reserved-1AFFF>
1B002..1BBFF ; Cn # [3070] <reserved-1B002>..<reserved-1BBFF>
16FA0..16FDF ; Cn # [64] <reserved-16FA0>..<reserved-16FDF>
16FE2..16FFF ; Cn # [30] <reserved-16FE2>..<reserved-16FFF>
187ED..187FF ; Cn # [19] <reserved-187ED>..<reserved-187FF>
18AF3..1AFFF ; Cn # [9485] <reserved-18AF3>..<reserved-1AFFF>
1B11F..1B16F ; Cn # [81] <reserved-1B11F>..<reserved-1B16F>
1B2FC..1BBFF ; Cn # [2308] <reserved-1B2FC>..<reserved-1BBFF>
1BC6B..1BC6F ; Cn # [5] <reserved-1BC6B>..<reserved-1BC6F>
1BC7D..1BC7F ; Cn # [3] <reserved-1BC7D>..<reserved-1BC7F>
1BC89..1BC8F ; Cn # [7] <reserved-1BC89>..<reserved-1BC8F>
@ -551,9 +576,17 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1D7CC..1D7CD ; Cn # [2] <reserved-1D7CC>..<reserved-1D7CD>
1DA8C..1DA9A ; Cn # [15] <reserved-1DA8C>..<reserved-1DA9A>
1DAA0 ; Cn # <reserved-1DAA0>
1DAB0..1E7FF ; Cn # [3408] <reserved-1DAB0>..<reserved-1E7FF>
1DAB0..1DFFF ; Cn # [1360] <reserved-1DAB0>..<reserved-1DFFF>
1E007 ; Cn # <reserved-1E007>
1E019..1E01A ; Cn # [2] <reserved-1E019>..<reserved-1E01A>
1E022 ; Cn # <reserved-1E022>
1E025 ; Cn # <reserved-1E025>
1E02B..1E7FF ; Cn # [2005] <reserved-1E02B>..<reserved-1E7FF>
1E8C5..1E8C6 ; Cn # [2] <reserved-1E8C5>..<reserved-1E8C6>
1E8D7..1EDFF ; Cn # [1321] <reserved-1E8D7>..<reserved-1EDFF>
1E8D7..1E8FF ; Cn # [41] <reserved-1E8D7>..<reserved-1E8FF>
1E94B..1E94F ; Cn # [5] <reserved-1E94B>..<reserved-1E94F>
1E95A..1E95D ; Cn # [4] <reserved-1E95A>..<reserved-1E95D>
1E960..1EDFF ; Cn # [1184] <reserved-1E960>..<reserved-1EDFF>
1EE04 ; Cn # <reserved-1EE04>
1EE20 ; Cn # <reserved-1EE20>
1EE23 ; Cn # <reserved-1EE23>
@ -597,30 +630,34 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1F10D..1F10F ; Cn # [3] <reserved-1F10D>..<reserved-1F10F>
1F12F ; Cn # <reserved-1F12F>
1F16C..1F16F ; Cn # [4] <reserved-1F16C>..<reserved-1F16F>
1F19B..1F1E5 ; Cn # [75] <reserved-1F19B>..<reserved-1F1E5>
1F1AD..1F1E5 ; Cn # [57] <reserved-1F1AD>..<reserved-1F1E5>
1F203..1F20F ; Cn # [13] <reserved-1F203>..<reserved-1F20F>
1F23B..1F23F ; Cn # [5] <reserved-1F23B>..<reserved-1F23F>
1F23C..1F23F ; Cn # [4] <reserved-1F23C>..<reserved-1F23F>
1F249..1F24F ; Cn # [7] <reserved-1F249>..<reserved-1F24F>
1F252..1F2FF ; Cn # [174] <reserved-1F252>..<reserved-1F2FF>
1F57A ; Cn # <reserved-1F57A>
1F5A4 ; Cn # <reserved-1F5A4>
1F6D1..1F6DF ; Cn # [15] <reserved-1F6D1>..<reserved-1F6DF>
1F252..1F25F ; Cn # [14] <reserved-1F252>..<reserved-1F25F>
1F266..1F2FF ; Cn # [154] <reserved-1F266>..<reserved-1F2FF>
1F6D5..1F6DF ; Cn # [11] <reserved-1F6D5>..<reserved-1F6DF>
1F6ED..1F6EF ; Cn # [3] <reserved-1F6ED>..<reserved-1F6EF>
1F6F4..1F6FF ; Cn # [12] <reserved-1F6F4>..<reserved-1F6FF>
1F6F9..1F6FF ; Cn # [7] <reserved-1F6F9>..<reserved-1F6FF>
1F774..1F77F ; Cn # [12] <reserved-1F774>..<reserved-1F77F>
1F7D5..1F7FF ; Cn # [43] <reserved-1F7D5>..<reserved-1F7FF>
1F80C..1F80F ; Cn # [4] <reserved-1F80C>..<reserved-1F80F>
1F848..1F84F ; Cn # [8] <reserved-1F848>..<reserved-1F84F>
1F85A..1F85F ; Cn # [6] <reserved-1F85A>..<reserved-1F85F>
1F888..1F88F ; Cn # [8] <reserved-1F888>..<reserved-1F88F>
1F8AE..1F90F ; Cn # [98] <reserved-1F8AE>..<reserved-1F90F>
1F919..1F97F ; Cn # [103] <reserved-1F919>..<reserved-1F97F>
1F985..1F9BF ; Cn # [59] <reserved-1F985>..<reserved-1F9BF>
1F9C1..1FFFF ; Cn # [1599] <reserved-1F9C1>..<noncharacter-1FFFF>
1F8AE..1F8FF ; Cn # [82] <reserved-1F8AE>..<reserved-1F8FF>
1F90C..1F90F ; Cn # [4] <reserved-1F90C>..<reserved-1F90F>
1F93F ; Cn # <reserved-1F93F>
1F94D..1F94F ; Cn # [3] <reserved-1F94D>..<reserved-1F94F>
1F96C..1F97F ; Cn # [20] <reserved-1F96C>..<reserved-1F97F>
1F998..1F9BF ; Cn # [40] <reserved-1F998>..<reserved-1F9BF>
1F9C1..1F9CF ; Cn # [15] <reserved-1F9C1>..<reserved-1F9CF>
1F9E7..1FFFF ; Cn # [1561] <reserved-1F9E7>..<noncharacter-1FFFF>
2A6D7..2A6FF ; Cn # [41] <reserved-2A6D7>..<reserved-2A6FF>
2B735..2B73F ; Cn # [11] <reserved-2B735>..<reserved-2B73F>
2B81E..2B81F ; Cn # [2] <reserved-2B81E>..<reserved-2B81F>
2CEA2..2F7FF ; Cn # [10590] <reserved-2CEA2>..<reserved-2F7FF>
2CEA2..2CEAF ; Cn # [14] <reserved-2CEA2>..<reserved-2CEAF>
2EBE1..2F7FF ; Cn # [3103] <reserved-2EBE1>..<reserved-2F7FF>
2FA1E..E0000 ; Cn # [722403] <reserved-2FA1E>..<reserved-E0000>
E0002..E001F ; Cn # [30] <reserved-E0002>..<reserved-E001F>
E0080..E00FF ; Cn # [128] <reserved-E0080>..<reserved-E00FF>
@ -628,7 +665,7 @@ E01F0..EFFFF ; Cn # [65040] <reserved-E01F0>..<noncharacter-EFFFF>
FFFFE..FFFFF ; Cn # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
10FFFE..10FFFF; Cn # [2] <noncharacter-10FFFE>..<noncharacter-10FFFF>
# Total code points: 853859
# Total code points: 837841
# ================================================
@ -1221,11 +1258,12 @@ A7A2 ; Lu # LATIN CAPITAL LETTER K WITH OBLIQUE STROKE
A7A4 ; Lu # LATIN CAPITAL LETTER N WITH OBLIQUE STROKE
A7A6 ; Lu # LATIN CAPITAL LETTER R WITH OBLIQUE STROKE
A7A8 ; Lu # LATIN CAPITAL LETTER S WITH OBLIQUE STROKE
A7AA..A7AD ; Lu # [4] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER L WITH BELT
A7AA..A7AE ; Lu # [5] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER SMALL CAPITAL I
A7B0..A7B4 ; Lu # [5] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER BETA
A7B6 ; Lu # LATIN CAPITAL LETTER OMEGA
FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
10400..10427 ; Lu # [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW
104B0..104D3 ; Lu # [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
10C80..10CB2 ; Lu # [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US
118A0..118BF ; Lu # [32] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI CAPITAL LETTER VIYO
1D400..1D419 ; Lu # [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z
@ -1259,8 +1297,9 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
1D756..1D76E ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD CAPITAL OMEGA
1D790..1D7A8 ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL OMEGA
1D7CA ; Lu # MATHEMATICAL BOLD CAPITAL DIGAMMA
1E900..1E921 ; Lu # [34] ADLAM CAPITAL LETTER ALIF..ADLAM CAPITAL LETTER SHA
# Total code points: 1631
# Total code points: 1702
# ================================================
@ -1537,6 +1576,7 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
052F ; Ll # CYRILLIC SMALL LETTER EL WITH DESCENDER
0561..0587 ; Ll # [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
13F8..13FD ; Ll # [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV
1C80..1C88 ; Ll # [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
1D00..1D2B ; Ll # [44] LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTER SMALL CAPITAL EL
1D6B..1D77 ; Ll # [13] LATIN SMALL LETTER UE..LATIN SMALL LETTER TURNED G
1D79..1D9A ; Ll # [34] LATIN SMALL LETTER INSULAR G..LATIN SMALL LETTER EZH WITH RETROFLEX HOOK
@ -1866,6 +1906,7 @@ FB00..FB06 ; Ll # [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE ST
FB13..FB17 ; Ll # [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH
FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
10428..1044F ; Ll # [40] DESERET SMALL LETTER LONG I..DESERET SMALL LETTER EW
104D8..104FB ; Ll # [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
10CC0..10CF2 ; Ll # [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US
118C0..118DF ; Ll # [32] WARANG CITI SMALL LETTER NGAA..WARANG CITI SMALL LETTER VIYO
1D41A..1D433 ; Ll # [26] MATHEMATICAL BOLD SMALL A..MATHEMATICAL BOLD SMALL Z
@ -1896,8 +1937,9 @@ FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL
1D7AA..1D7C2 ; Ll # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL OMEGA
1D7C4..1D7C9 ; Ll # [6] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL
1D7CB ; Ll # MATHEMATICAL BOLD SMALL DIGAMMA
1E922..1E943 ; Ll # [34] ADLAM SMALL LETTER ALIF..ADLAM SMALL LETTER SHA
# Total code points: 1984
# Total code points: 2063
# ================================================
@ -1976,8 +2018,9 @@ FF70 ; Lm # HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
16B40..16B43 ; Lm # [4] PAHAWH HMONG SIGN VOS SEEV..PAHAWH HMONG SIGN IB YAM
16F93..16F9F ; Lm # [13] MIAO LETTER TONE-2..MIAO LETTER REFORMED TONE-8
16FE0..16FE1 ; Lm # [2] TANGUT ITERATION MARK..NUSHU ITERATION MARK
# Total code points: 248
# Total code points: 250
# ================================================
@ -2005,7 +2048,9 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
07CA..07EA ; Lo # [33] NKO LETTER A..NKO LETTER JONA RA
0800..0815 ; Lo # [22] SAMARITAN LETTER ALAF..SAMARITAN LETTER TAAF
0840..0858 ; Lo # [25] MANDAIC LETTER HALQA..MANDAIC LETTER AIN
0860..086A ; Lo # [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
08A0..08B4 ; Lo # [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
08B6..08BD ; Lo # [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
0904..0939 ; Lo # [54] DEVANAGARI LETTER SHORT A..DEVANAGARI LETTER HA
093D ; Lo # DEVANAGARI SIGN AVAGRAHA
0950 ; Lo # DEVANAGARI OM
@ -2022,6 +2067,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
09DC..09DD ; Lo # [2] BENGALI LETTER RRA..BENGALI LETTER RHA
09DF..09E1 ; Lo # [3] BENGALI LETTER YYA..BENGALI LETTER VOCALIC LL
09F0..09F1 ; Lo # [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL
09FC ; Lo # BENGALI LETTER VEDIC ANUSVARA
0A05..0A0A ; Lo # [6] GURMUKHI LETTER A..GURMUKHI LETTER UU
0A0F..0A10 ; Lo # [2] GURMUKHI LETTER EE..GURMUKHI LETTER AI
0A13..0A28 ; Lo # [22] GURMUKHI LETTER OO..GURMUKHI LETTER NA
@ -2070,6 +2116,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
0C3D ; Lo # TELUGU SIGN AVAGRAHA
0C58..0C5A ; Lo # [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
0C60..0C61 ; Lo # [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
0C80 ; Lo # KANNADA SIGN SPACING CANDRABINDU
0C85..0C8C ; Lo # [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
0C8E..0C90 ; Lo # [3] KANNADA LETTER E..KANNADA LETTER AI
0C92..0CA8 ; Lo # [23] KANNADA LETTER O..KANNADA LETTER NA
@ -2084,6 +2131,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
0D12..0D3A ; Lo # [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
0D3D ; Lo # MALAYALAM SIGN AVAGRAHA
0D4E ; Lo # MALAYALAM LETTER DOT REPH
0D54..0D56 ; Lo # [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL
0D5F..0D61 ; Lo # [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
0D7A..0D7F ; Lo # [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
0D85..0D96 ; Lo # [18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA
@ -2156,7 +2204,8 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
17DC ; Lo # KHMER SIGN AVAKRAHASANYA
1820..1842 ; Lo # [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
1844..1877 ; Lo # [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
1880..18A8 ; Lo # [41] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER MANCHU ALI GALI BHA
1880..1884 ; Lo # [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
1887..18A8 ; Lo # [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
18AA ; Lo # MONGOLIAN LETTER MANCHU ALI GALI LHA
18B0..18F5 ; Lo # [70] CANADIAN SYLLABICS OY..CANADIAN SYLLABICS CARRIER DENTAL S
1900..191E ; Lo # [31] LIMBU VOWEL-CARRIER LETTER..LIMBU LETTER TRA
@ -2194,12 +2243,12 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
309F ; Lo # HIRAGANA DIGRAPH YORI
30A1..30FA ; Lo # [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
30FF ; Lo # KATAKANA DIGRAPH KOTO
3105..312D ; Lo # [41] BOPOMOFO LETTER B..BOPOMOFO LETTER IH
3105..312E ; Lo # [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
3131..318E ; Lo # [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
31A0..31BA ; Lo # [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
31F0..31FF ; Lo # [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
3400..4DB5 ; Lo # [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4E00..9FD5 ; Lo # [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5
4E00..9FEA ; Lo # [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
A000..A014 ; Lo # [21] YI SYLLABLE IT..YI SYLLABLE E
A016..A48C ; Lo # [1143] YI SYLLABLE BIT..YI SYLLABLE YYR
A4D0..A4F7 ; Lo # [40] LISU LETTER BA..LISU LETTER OE
@ -2283,7 +2332,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
10280..1029C ; Lo # [29] LYCIAN LETTER A..LYCIAN LETTER X
102A0..102D0 ; Lo # [49] CARIAN LETTER A..CARIAN LETTER UUU3
10300..1031F ; Lo # [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS
10330..10340 ; Lo # [17] GOTHIC LETTER AHSA..GOTHIC LETTER PAIRTHRA
1032D..10340 ; Lo # [20] OLD ITALIC LETTER YE..GOTHIC LETTER PAIRTHRA
10342..10349 ; Lo # [8] GOTHIC LETTER RAIDA..GOTHIC LETTER OTHAL
10350..10375 ; Lo # [38] OLD PERMIC LETTER AN..OLD PERMIC LETTER IA
10380..1039D ; Lo # [30] UGARITIC LETTER ALPA..UGARITIC LETTER SSU
@ -2349,6 +2398,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
1133D ; Lo # GRANTHA SIGN AVAGRAHA
11350 ; Lo # GRANTHA OM
1135D..11361 ; Lo # [5] GRANTHA SIGN PLUTA..GRANTHA LETTER VOCALIC LL
11400..11434 ; Lo # [53] NEWA LETTER A..NEWA LETTER HA
11447..1144A ; Lo # [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
11480..114AF ; Lo # [48] TIRHUTA ANJI..TIRHUTA LETTER HA
114C4..114C5 ; Lo # [2] TIRHUTA SIGN AVAGRAHA..TIRHUTA GVANG
114C7 ; Lo # TIRHUTA OM
@ -2359,7 +2410,21 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
11680..116AA ; Lo # [43] TAKRI LETTER A..TAKRI LETTER RRA
11700..11719 ; Lo # [26] AHOM LETTER KA..AHOM LETTER JHA
118FF ; Lo # WARANG CITI OM
11A00 ; Lo # ZANABAZAR SQUARE LETTER A
11A0B..11A32 ; Lo # [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
11A3A ; Lo # ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
11A50 ; Lo # SOYOMBO LETTER A
11A5C..11A83 ; Lo # [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
11A86..11A89 ; Lo # [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
11AC0..11AF8 ; Lo # [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL
11C00..11C08 ; Lo # [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
11C0A..11C2E ; Lo # [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
11C40 ; Lo # BHAIKSUKI SIGN AVAGRAHA
11C72..11C8F ; Lo # [30] MARCHEN LETTER KA..MARCHEN LETTER A
11D00..11D06 ; Lo # [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E
11D08..11D09 ; Lo # [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
11D0B..11D30 ; Lo # [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
11D46 ; Lo # MASARAM GONDI REPHA
12000..12399 ; Lo # [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U
12480..12543 ; Lo # [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU
13000..1342E ; Lo # [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
@ -2372,7 +2437,10 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
16B7D..16B8F ; Lo # [19] PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG CLAN SIGN VWJ
16F00..16F44 ; Lo # [69] MIAO LETTER PA..MIAO LETTER HHA
16F50 ; Lo # MIAO LETTER NASALIZATION
1B000..1B001 ; Lo # [2] KATAKANA LETTER ARCHAIC E..HIRAGANA LETTER ARCHAIC YE
17000..187EC ; Lo # [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
18800..18AF2 ; Lo # [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
1B000..1B11E ; Lo # [287] KATAKANA LETTER ARCHAIC E..HENTAIGANA LETTER N-MU-MO-2
1B170..1B2FB ; Lo # [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
1BC00..1BC6A ; Lo # [107] DUPLOYAN LETTER H..DUPLOYAN LETTER VOCALIC M
1BC70..1BC7C ; Lo # [13] DUPLOYAN AFFIX LEFT HORIZONTAL SECANT..DUPLOYAN AFFIX ATTACHED TANGENT HOOK
1BC80..1BC88 ; Lo # [9] DUPLOYAN AFFIX HIGH ACUTE..DUPLOYAN AFFIX HIGH VERTICAL
@ -2415,9 +2483,10 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
2A700..2B734 ; Lo # [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
2B740..2B81D ; Lo # [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; Lo # [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; Lo # [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
2F800..2FA1D ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
# Total code points: 105697
# Total code points: 121047
# ================================================
@ -2446,6 +2515,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
0825..0827 ; Mn # [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
0829..082D ; Mn # [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
0859..085B ; Mn # [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
08D4..08E1 ; Mn # [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08E3..0902 ; Mn # [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
093A ; Mn # DEVANAGARI VOWEL SIGN OE
093C ; Mn # DEVANAGARI SIGN NUKTA
@ -2472,6 +2542,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
0AC7..0AC8 ; Mn # [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
0ACD ; Mn # GUJARATI SIGN VIRAMA
0AE2..0AE3 ; Mn # [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
0AFA..0AFF ; Mn # [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
0B01 ; Mn # ORIYA SIGN CANDRABINDU
0B3C ; Mn # ORIYA SIGN NUKTA
0B3F ; Mn # ORIYA VOWEL SIGN I
@ -2494,7 +2565,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
0CC6 ; Mn # KANNADA VOWEL SIGN E
0CCC..0CCD ; Mn # [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
0CE2..0CE3 ; Mn # [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
0D01 ; Mn # MALAYALAM SIGN CANDRABINDU
0D00..0D01 ; Mn # [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
0D3B..0D3C ; Mn # [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
0D41..0D44 ; Mn # [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
0D4D ; Mn # MALAYALAM SIGN VIRAMA
0D62..0D63 ; Mn # [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
@ -2540,6 +2612,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
17C9..17D3 ; Mn # [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
17DD ; Mn # KHMER SIGN ATTHACAN
180B..180D ; Mn # [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
1885..1886 ; Mn # [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
18A9 ; Mn # MONGOLIAN LETTER ALI GALI DAGALGA
1920..1922 ; Mn # [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
1927..1928 ; Mn # [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
@ -2577,8 +2650,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
1CED ; Mn # VEDIC SIGN TIRYAK
1CF4 ; Mn # VEDIC TONE CANDRA ABOVE
1CF8..1CF9 ; Mn # [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
1DC0..1DF5 ; Mn # [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
1DFC..1DFF ; Mn # [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
1DC0..1DF9 ; Mn # [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
1DFB..1DFF ; Mn # [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
20D0..20DC ; Mn # [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20E1 ; Mn # COMBINING LEFT RIGHT ARROW ABOVE
20E5..20F0 ; Mn # [12] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING ASTERISK ABOVE
@ -2595,7 +2668,7 @@ A802 ; Mn # SYLOTI NAGRI SIGN DVISVARA
A806 ; Mn # SYLOTI NAGRI SIGN HASANTA
A80B ; Mn # SYLOTI NAGRI SIGN ANUSVARA
A825..A826 ; Mn # [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
A8C4 ; Mn # SAURASHTRA SIGN VIRAMA
A8C4..A8C5 ; Mn # [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
A8E0..A8F1 ; Mn # [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
A926..A92D ; Mn # [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
A947..A951 ; Mn # [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
@ -2647,6 +2720,7 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
1122F..11231 ; Mn # [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
11234 ; Mn # KHOJKI SIGN ANUSVARA
11236..11237 ; Mn # [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
1123E ; Mn # KHOJKI SIGN SUKUN
112DF ; Mn # KHUDAWADI SIGN ANUSVARA
112E3..112EA ; Mn # [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
11300..11301 ; Mn # [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
@ -2654,6 +2728,9 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
11340 ; Mn # GRANTHA VOWEL SIGN II
11366..1136C ; Mn # [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
11370..11374 ; Mn # [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
11438..1143F ; Mn # [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
11442..11444 ; Mn # [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
11446 ; Mn # NEWA SIGN NUKTA
114B3..114B8 ; Mn # [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
114BA ; Mn # TIRHUTA VOWEL SIGN SHORT E
114BF..114C0 ; Mn # [2] TIRHUTA SIGN CANDRABINDU..TIRHUTA SIGN ANUSVARA
@ -2672,6 +2749,27 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
1171D..1171F ; Mn # [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
11722..11725 ; Mn # [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
11727..1172B ; Mn # [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
11A01..11A06 ; Mn # [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A09..11A0A ; Mn # [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A33..11A38 ; Mn # [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A3B..11A3E ; Mn # [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
11A47 ; Mn # ZANABAZAR SQUARE SUBJOINER
11A51..11A56 ; Mn # [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
11A59..11A5B ; Mn # [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
11A8A..11A96 ; Mn # [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
11A98..11A99 ; Mn # [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
11C30..11C36 ; Mn # [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
11C38..11C3D ; Mn # [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
11C3F ; Mn # BHAIKSUKI SIGN VIRAMA
11C92..11CA7 ; Mn # [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
11CAA..11CB0 ; Mn # [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
11CB2..11CB3 ; Mn # [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
11CB5..11CB6 ; Mn # [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
11D31..11D36 ; Mn # [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
11D3A ; Mn # MASARAM GONDI VOWEL SIGN E
11D3C..11D3D ; Mn # [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
11D3F..11D45 ; Mn # [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
11D47 ; Mn # MASARAM GONDI RA-KARA
16AF0..16AF4 ; Mn # [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
16B30..16B36 ; Mn # [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
16F8F..16F92 ; Mn # [4] MIAO TONE RIGHT..MIAO TONE BELOW
@ -2687,10 +2785,16 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
1DA84 ; Mn # SIGNWRITING LOCATION HEAD NECK
1DA9B..1DA9F ; Mn # [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
1DAA1..1DAAF ; Mn # [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
1E000..1E006 ; Mn # [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
1E008..1E018 ; Mn # [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
1E01B..1E021 ; Mn # [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
1E023..1E024 ; Mn # [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
1E026..1E02A ; Mn # [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
1E8D0..1E8D6 ; Mn # [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
1E944..1E94A ; Mn # [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
E0100..E01EF ; Mn # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 1567
# Total code points: 1763
# ================================================
@ -2795,6 +2899,7 @@ A670..A672 ; Me # [3] COMBINING CYRILLIC TEN MILLIONS SIGN..COMBINING CYRIL
1C34..1C35 ; Mc # [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG
1CE1 ; Mc # VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA
1CF2..1CF3 ; Mc # [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
1CF7 ; Mc # VEDIC SIGN ATIKRAMA
302E..302F ; Mc # [2] HANGUL SINGLE DOT TONE MARK..HANGUL DOUBLE DOT TONE MARK
A823..A824 ; Mc # [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I
A827 ; Mc # SYLOTI NAGRI VOWEL SIGN OO
@ -2837,6 +2942,9 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
1134B..1134D ; Mc # [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
11357 ; Mc # GRANTHA AU LENGTH MARK
11362..11363 ; Mc # [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
11435..11437 ; Mc # [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
11440..11441 ; Mc # [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
11445 ; Mc # NEWA SIGN VISARGA
114B0..114B2 ; Mc # [3] TIRHUTA VOWEL SIGN AA..TIRHUTA VOWEL SIGN II
114B9 ; Mc # TIRHUTA VOWEL SIGN E
114BB..114BE ; Mc # [4] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN AU
@ -2852,11 +2960,20 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
116B6 ; Mc # TAKRI SIGN VIRAMA
11720..11721 ; Mc # [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
11726 ; Mc # AHOM VOWEL SIGN E
11A07..11A08 ; Mc # [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
11A39 ; Mc # ZANABAZAR SQUARE SIGN VISARGA
11A57..11A58 ; Mc # [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
11A97 ; Mc # SOYOMBO SIGN VISARGA
11C2F ; Mc # BHAIKSUKI VOWEL SIGN AA
11C3E ; Mc # BHAIKSUKI SIGN VISARGA
11CA9 ; Mc # MARCHEN SUBJOINED LETTER YA
11CB1 ; Mc # MARCHEN VOWEL SIGN I
11CB4 ; Mc # MARCHEN VOWEL SIGN O
16F51..16F7E ; Mc # [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
1D165..1D166 ; Mc # [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D..1D172 ; Mc # [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5
# Total code points: 383
# Total code points: 401
# ================================================
@ -2905,16 +3022,20 @@ FF10..FF19 ; Nd # [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
11136..1113F ; Nd # [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE
111D0..111D9 ; Nd # [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE
112F0..112F9 ; Nd # [10] KHUDAWADI DIGIT ZERO..KHUDAWADI DIGIT NINE
11450..11459 ; Nd # [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
114D0..114D9 ; Nd # [10] TIRHUTA DIGIT ZERO..TIRHUTA DIGIT NINE
11650..11659 ; Nd # [10] MODI DIGIT ZERO..MODI DIGIT NINE
116C0..116C9 ; Nd # [10] TAKRI DIGIT ZERO..TAKRI DIGIT NINE
11730..11739 ; Nd # [10] AHOM DIGIT ZERO..AHOM DIGIT NINE
118E0..118E9 ; Nd # [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE
11C50..11C59 ; Nd # [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
11D50..11D59 ; Nd # [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
16A60..16A69 ; Nd # [10] MRO DIGIT ZERO..MRO DIGIT NINE
16B50..16B59 ; Nd # [10] PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT NINE
1D7CE..1D7FF ; Nd # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
1E950..1E959 ; Nd # [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE
# Total code points: 550
# Total code points: 590
# ================================================
@ -2946,7 +3067,8 @@ A6E6..A6EF ; Nl # [10] BAMUM LETTER MO..BAMUM LETTER KOGHOM
0B72..0B77 ; No # [6] ORIYA FRACTION ONE QUARTER..ORIYA FRACTION THREE SIXTEENTHS
0BF0..0BF2 ; No # [3] TAMIL NUMBER TEN..TAMIL NUMBER ONE THOUSAND
0C78..0C7E ; No # [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
0D70..0D75 ; No # [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS
0D58..0D5E ; No # [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH
0D70..0D78 ; No # [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
0F2A..0F33 ; No # [10] TIBETAN DIGIT HALF ONE..TIBETAN DIGIT HALF ZERO
1369..137C ; No # [20] ETHIOPIC DIGIT ONE..ETHIOPIC NUMBER TEN THOUSAND
17F0..17F9 ; No # [10] KHMER SYMBOL LEK ATTAK SON..KHMER SYMBOL LEK ATTAK PRAM-BUON
@ -2993,12 +3115,13 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
111E1..111F4 ; No # [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND
1173A..1173B ; No # [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY
118EA..118F2 ; No # [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY
11C5A..11C6C ; No # [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
16B5B..16B61 ; No # [7] PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBER TRILLIONS
1D360..1D371 ; No # [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE
1E8C7..1E8CF ; No # [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE
1F100..1F10C ; No # [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
# Total code points: 647
# Total code points: 676
# ================================================
@ -3048,6 +3171,7 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
061C ; Cf # ARABIC LETTER MARK
06DD ; Cf # ARABIC END OF AYAH
070F ; Cf # SYRIAC ABBREVIATION MARK
08E2 ; Cf # ARABIC DISPUTED END OF AYAH
180E ; Cf # MONGOLIAN VOWEL SEPARATOR
200B..200F ; Cf # [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
202A..202E ; Cf # [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
@ -3061,7 +3185,7 @@ FFF9..FFFB ; Cf # [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION
E0001 ; Cf # LANGUAGE TAG
E0020..E007F ; Cf # [96] TAG SPACE..CANCEL TAG
# Total code points: 150
# Total code points: 151
# ================================================
@ -3315,6 +3439,7 @@ FF3F ; Pc # FULLWIDTH LOW LINE
085E ; Po # MANDAIC PUNCTUATION
0964..0965 ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0970 ; Po # DEVANAGARI ABBREVIATION SIGN
09FD ; Po # BENGALI ABBREVIATION SIGN
0AF0 ; Po # GUJARATI ABBREVIATION SIGN
0DF4 ; Po # SINHALA PUNCTUATION KUNDDALIYA
0E4F ; Po # THAI CHARACTER FONGMAN
@ -3366,6 +3491,7 @@ FF3F ; Pc # FULLWIDTH LOW LINE
2E30..2E39 ; Po # [10] RING POINT..TOP HALF SECTION SIGN
2E3C..2E3F ; Po # [4] STENOGRAPHIC FULL STOP..CAPITULUM
2E41 ; Po # REVERSED COMMA
2E43..2E49 ; Po # [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
3001..3003 ; Po # [3] IDEOGRAPHIC COMMA..DITTO MARK
303D ; Po # PART ALTERNATION MARK
30FB ; Po # KATAKANA MIDDLE DOT
@ -3429,10 +3555,19 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
111DD..111DF ; Po # [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2
11238..1123D ; Po # [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
112A9 ; Po # MULTANI SECTION MARK
1144B..1144F ; Po # [5] NEWA DANDA..NEWA ABBREVIATION SIGN
1145B ; Po # NEWA PLACEHOLDER MARK
1145D ; Po # NEWA INSERTION SIGN
114C6 ; Po # TIRHUTA ABBREVIATION SIGN
115C1..115D7 ; Po # [23] SIDDHAM SIGN SIDDHAM..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
11641..11643 ; Po # [3] MODI DANDA..MODI ABBREVIATION SIGN
11660..1166C ; Po # [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
1173C..1173E ; Po # [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
11A3F..11A46 ; Po # [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
11A9A..11A9C ; Po # [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
11A9E..11AA2 ; Po # [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
11C41..11C45 ; Po # [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
11C70..11C71 ; Po # [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
12470..12474 ; Po # [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON
16A6E..16A6F ; Po # [2] MRO DANDA..MRO DOUBLE DANDA
16AF5 ; Po # BASSA VAH FULL STOP
@ -3440,8 +3575,9 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
16B44 ; Po # PAHAWH HMONG SIGN XAUS
1BC9F ; Po # DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA87..1DA8B ; Po # [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS
1E95E..1E95F ; Po # [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK
# Total code points: 513
# Total code points: 566
# ================================================
@ -3528,7 +3664,7 @@ FFE9..FFEC ; Sm # [4] HALFWIDTH LEFTWARDS ARROW..HALFWIDTH DOWNWARDS ARROW
0BF9 ; Sc # TAMIL RUPEE SIGN
0E3F ; Sc # THAI CURRENCY SYMBOL BAHT
17DB ; Sc # KHMER CURRENCY SYMBOL RIEL
20A0..20BE ; Sc # [31] EURO-CURRENCY SIGN..LARI SIGN
20A0..20BF ; Sc # [32] EURO-CURRENCY SIGN..BITCOIN SIGN
A838 ; Sc # NORTH INDIC RUPEE MARK
FDFC ; Sc # RIAL SIGN
FE69 ; Sc # SMALL DOLLAR SIGN
@ -3536,7 +3672,7 @@ FF04 ; Sc # FULLWIDTH DOLLAR SIGN
FFE0..FFE1 ; Sc # [2] FULLWIDTH CENT SIGN..FULLWIDTH POUND SIGN
FFE5..FFE6 ; Sc # [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN
# Total code points: 53
# Total code points: 54
# ================================================
@ -3594,6 +3730,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
0BF3..0BF8 ; So # [6] TAMIL DAY SIGN..TAMIL AS ABOVE SIGN
0BFA ; So # TAMIL NUMBER SIGN
0C7F ; So # TELUGU SIGN TUUMU
0D4F ; So # MALAYALAM SIGN PARA
0D79 ; So # MALAYALAM DATE MARK
0F01..0F03 ; So # [3] TIBETAN MARK GTER YIG MGO TRUNCATED A..TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA
0F13 ; So # TIBETAN MARK CARET -DZUD RTAGS ME LONG CAN
@ -3642,8 +3779,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
232B..237B ; So # [81] ERASE TO THE LEFT..NOT CHECK MARK
237D..239A ; So # [30] SHOULDERED OPEN BOX..CLEAR SCREEN SYMBOL
23B4..23DB ; So # [40] TOP SQUARE BRACKET..FUSE
23E2..23FA ; So # [25] WHITE TRAPEZIUM..BLACK CIRCLE FOR RECORD
2400..2426 ; So # [39] SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM TWO
23E2..2426 ; So # [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO
2440..244A ; So # [11] OCR HOOK..OCR DOUBLE BACKSLASH
249C..24E9 ; So # [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
2500..25B6 ; So # [183] BOX DRAWINGS LIGHT HORIZONTAL..BLACK RIGHT-POINTING TRIANGLE
@ -3659,7 +3795,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
2B76..2B95 ; So # [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
2B98..2BB9 ; So # [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
2BBD..2BC8 ; So # [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BD1 ; So # [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN
2BCA..2BD2 ; So # [9] TOP HALF BLACK CIRCLE..GROUP MARK
2BEC..2BEF ; So # [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
2CE5..2CEA ; So # [6] COPTIC SYMBOL MI RO..COPTIC SYMBOL SHIMA SIMA
2E80..2E99 ; So # [26] CJK RADICAL REPEAT..CJK RADICAL RAP
@ -3694,7 +3830,7 @@ FFED..FFEE ; So # [2] HALFWIDTH BLACK SQUARE..HALFWIDTH WHITE CIRCLE
FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
10137..1013F ; So # [9] AEGEAN WEIGHT BASE UNIT..AEGEAN MEASURE THIRD SUBUNIT
10179..10189 ; So # [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
1018C ; So # GREEK SINUSOID SIGN
1018C..1018E ; So # [3] GREEK SINUSOID SIGN..NOMISMA SIGN
10190..1019B ; So # [12] ROMAN SEXTANS SIGN..ROMAN CENTURIAL SIGN
101A0 ; So # GREEK SYMBOL TAU RHO
101D0..101FC ; So # [45] PHAISTOS DISC SIGN PEDESTRIAN..PHAISTOS DISC SIGN WAVY BAND
@ -3727,17 +3863,16 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
1F0D1..1F0F5 ; So # [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21
1F110..1F12E ; So # [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
1F130..1F16B ; So # [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
1F170..1F19A ; So # [43] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VS
1F170..1F1AC ; So # [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
1F1E6..1F202 ; So # [29] REGIONAL INDICATOR SYMBOL LETTER A..SQUARED KATAKANA SA
1F210..1F23A ; So # [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6
1F210..1F23B ; So # [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
1F240..1F248 ; So # [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
1F250..1F251 ; So # [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
1F260..1F265 ; So # [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
1F300..1F3FA ; So # [251] CYCLONE..AMPHORA
1F400..1F579 ; So # [378] RAT..JOYSTICK
1F57B..1F5A3 ; So # [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
1F5A5..1F6D0 ; So # [300] DESKTOP COMPUTER..PLACE OF WORSHIP
1F400..1F6D4 ; So # [725] RAT..PAGODA
1F6E0..1F6EC ; So # [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
1F6F0..1F6F3 ; So # [4] SATELLITE..PASSENGER SHIP
1F6F0..1F6F8 ; So # [9] SATELLITE..FLYING SAUCER
1F700..1F773 ; So # [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
1F780..1F7D4 ; So # [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
1F800..1F80B ; So # [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
@ -3745,11 +3880,15 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
1F850..1F859 ; So # [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
1F860..1F887 ; So # [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
1F890..1F8AD ; So # [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
1F910..1F918 ; So # [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS
1F980..1F984 ; So # [5] CRAB..UNICORN FACE
1F900..1F90B ; So # [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
1F910..1F93E ; So # [47] ZIPPER-MOUTH FACE..HANDBALL
1F940..1F94C ; So # [13] WILTED FLOWER..CURLING STONE
1F950..1F96B ; So # [28] CROISSANT..CANNED FOOD
1F980..1F997 ; So # [24] CRAB..CRICKET
1F9C0 ; So # CHEESE WEDGE
1F9D0..1F9E6 ; So # [23] FACE WITH MONOCLE..SOCKS
# Total code points: 5677
# Total code points: 5855
# ================================================

@ -1,10 +1,11 @@
# GraphemeBreakProperty-8.0.0.txt
# Date: 2015-02-13, 13:47:14 GMT [MD]
# GraphemeBreakProperty-10.0.0.txt
# Date: 2017-03-12, 07:03:41 GMT
# © 2017 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# Copyright (c) 1991-2015 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
# For documentation, see http://www.unicode.org/reports/tr44/
# ================================================
@ -17,6 +18,21 @@
# ================================================
0600..0605 ; Prepend # Cf [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
06DD ; Prepend # Cf ARABIC END OF AYAH
070F ; Prepend # Cf SYRIAC ABBREVIATION MARK
08E2 ; Prepend # Cf ARABIC DISPUTED END OF AYAH
0D4E ; Prepend # Lo MALAYALAM LETTER DOT REPH
110BD ; Prepend # Cf KAITHI NUMBER SIGN
111C2..111C3 ; Prepend # Lo [2] SHARADA SIGN JIHVAMULIYA..SHARADA SIGN UPADHMANIYA
11A3A ; Prepend # Lo ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
11A86..11A89 ; Prepend # Lo [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
11D46 ; Prepend # Lo MASARAM GONDI REPHA
# Total code points: 19
# ================================================
000D ; CR # Cc <control-000D>
# Total code points: 1
@ -34,10 +50,7 @@
000E..001F ; Control # Cc [18] <control-000E>..<control-001F>
007F..009F ; Control # Cc [33] <control-007F>..<control-009F>
00AD ; Control # Cf SOFT HYPHEN
0600..0605 ; Control # Cf [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
061C ; Control # Cf ARABIC LETTER MARK
06DD ; Control # Cf ARABIC END OF AYAH
070F ; Control # Cf SYRIAC ABBREVIATION MARK
180E ; Control # Cf MONGOLIAN VOWEL SEPARATOR
200B ; Control # Cf ZERO WIDTH SPACE
200E..200F ; Control # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
@ -51,17 +64,15 @@ D800..DFFF ; Control # Cs [2048] <surrogate-D800>..<surrogate-DFFF>
FEFF ; Control # Cf ZERO WIDTH NO-BREAK SPACE
FFF0..FFF8 ; Control # Cn [9] <reserved-FFF0>..<reserved-FFF8>
FFF9..FFFB ; Control # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
110BD ; Control # Cf KAITHI NUMBER SIGN
1BCA0..1BCA3 ; Control # Cf [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
1D173..1D17A ; Control # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
E0000 ; Control # Cn <reserved-E0000>
E0001 ; Control # Cf LANGUAGE TAG
E0002..E001F ; Control # Cn [30] <reserved-E0002>..<reserved-E001F>
E0020..E007F ; Control # Cf [96] TAG SPACE..CANCEL TAG
E0080..E00FF ; Control # Cn [128] <reserved-E0080>..<reserved-E00FF>
E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
# Total code points: 6030
# Total code points: 5925
# ================================================
@ -89,6 +100,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
0825..0827 ; Extend # Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
0829..082D ; Extend # Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
0859..085B ; Extend # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
08D4..08E1 ; Extend # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08E3..0902 ; Extend # Mn [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
093A ; Extend # Mn DEVANAGARI VOWEL SIGN OE
093C ; Extend # Mn DEVANAGARI SIGN NUKTA
@ -117,6 +129,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
0AC7..0AC8 ; Extend # Mn [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
0ACD ; Extend # Mn GUJARATI SIGN VIRAMA
0AE2..0AE3 ; Extend # Mn [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
0AFA..0AFF ; Extend # Mn [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
0B01 ; Extend # Mn ORIYA SIGN CANDRABINDU
0B3C ; Extend # Mn ORIYA SIGN NUKTA
0B3E ; Extend # Mc ORIYA VOWEL SIGN AA
@ -145,7 +158,8 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
0CCC..0CCD ; Extend # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
0CD5..0CD6 ; Extend # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
0CE2..0CE3 ; Extend # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
0D01 ; Extend # Mn MALAYALAM SIGN CANDRABINDU
0D00..0D01 ; Extend # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
0D3B..0D3C ; Extend # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
0D3E ; Extend # Mc MALAYALAM VOWEL SIGN AA
0D41..0D44 ; Extend # Mn [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
0D4D ; Extend # Mn MALAYALAM SIGN VIRAMA
@ -195,6 +209,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
17C9..17D3 ; Extend # Mn [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
17DD ; Extend # Mn KHMER SIGN ATTHACAN
180B..180D ; Extend # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
1885..1886 ; Extend # Mn [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
18A9 ; Extend # Mn MONGOLIAN LETTER ALI GALI DAGALGA
1920..1922 ; Extend # Mn [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
1927..1928 ; Extend # Mn [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
@ -233,9 +248,9 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
1CED ; Extend # Mn VEDIC SIGN TIRYAK
1CF4 ; Extend # Mn VEDIC TONE CANDRA ABOVE
1CF8..1CF9 ; Extend # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
1DC0..1DF5 ; Extend # Mn [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
1DFC..1DFF ; Extend # Mn [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
200C..200D ; Extend # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
1DC0..1DF9 ; Extend # Mn [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
1DFB..1DFF ; Extend # Mn [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
200C ; Extend # Cf ZERO WIDTH NON-JOINER
20D0..20DC ; Extend # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20DD..20E0 ; Extend # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
20E1 ; Extend # Mn COMBINING LEFT RIGHT ARROW ABOVE
@ -256,7 +271,7 @@ A802 ; Extend # Mn SYLOTI NAGRI SIGN DVISVARA
A806 ; Extend # Mn SYLOTI NAGRI SIGN HASANTA
A80B ; Extend # Mn SYLOTI NAGRI SIGN ANUSVARA
A825..A826 ; Extend # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
A8C4 ; Extend # Mn SAURASHTRA SIGN VIRAMA
A8C4..A8C5 ; Extend # Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
A8E0..A8F1 ; Extend # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
A926..A92D ; Extend # Mn [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
A947..A951 ; Extend # Mn [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
@ -309,6 +324,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
1122F..11231 ; Extend # Mn [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
11234 ; Extend # Mn KHOJKI SIGN ANUSVARA
11236..11237 ; Extend # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
1123E ; Extend # Mn KHOJKI SIGN SUKUN
112DF ; Extend # Mn KHUDAWADI SIGN ANUSVARA
112E3..112EA ; Extend # Mn [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
11300..11301 ; Extend # Mn [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
@ -318,6 +334,9 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
11357 ; Extend # Mc GRANTHA AU LENGTH MARK
11366..1136C ; Extend # Mn [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
11370..11374 ; Extend # Mn [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
11438..1143F ; Extend # Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
11442..11444 ; Extend # Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
11446 ; Extend # Mn NEWA SIGN NUKTA
114B0 ; Extend # Mc TIRHUTA VOWEL SIGN AA
114B3..114B8 ; Extend # Mn [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
114BA ; Extend # Mn TIRHUTA VOWEL SIGN SHORT E
@ -339,6 +358,27 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
1171D..1171F ; Extend # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
11722..11725 ; Extend # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
11727..1172B ; Extend # Mn [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
11A01..11A06 ; Extend # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A09..11A0A ; Extend # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A33..11A38 ; Extend # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A3B..11A3E ; Extend # Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
11A47 ; Extend # Mn ZANABAZAR SQUARE SUBJOINER
11A51..11A56 ; Extend # Mn [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
11A59..11A5B ; Extend # Mn [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
11A8A..11A96 ; Extend # Mn [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
11A98..11A99 ; Extend # Mn [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
11C30..11C36 ; Extend # Mn [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
11C38..11C3D ; Extend # Mn [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
11C3F ; Extend # Mn BHAIKSUKI SIGN VIRAMA
11C92..11CA7 ; Extend # Mn [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
11CAA..11CB0 ; Extend # Mn [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
11CB2..11CB3 ; Extend # Mn [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
11CB5..11CB6 ; Extend # Mn [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
11D31..11D36 ; Extend # Mn [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
11D3A ; Extend # Mn MASARAM GONDI VOWEL SIGN E
11D3C..11D3D ; Extend # Mn [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
11D3F..11D45 ; Extend # Mn [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
11D47 ; Extend # Mn MASARAM GONDI RA-KARA
16AF0..16AF4 ; Extend # Mn [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
16B30..16B36 ; Extend # Mn [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
16F8F..16F92 ; Extend # Mn [4] MIAO TONE RIGHT..MIAO TONE BELOW
@ -356,10 +396,17 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
1DA84 ; Extend # Mn SIGNWRITING LOCATION HEAD NECK
1DA9B..1DA9F ; Extend # Mn [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
1DAA1..1DAAF ; Extend # Mn [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
1E000..1E006 ; Extend # Mn [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
1E008..1E018 ; Extend # Mn [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
1E01B..1E021 ; Extend # Mn [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
1E023..1E024 ; Extend # Mn [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
1E026..1E02A ; Extend # Mn [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
1E8D0..1E8D6 ; Extend # Mn [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
1E944..1E94A ; Extend # Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
E0020..E007F ; Extend # Cf [96] TAG SPACE..CANCEL TAG
E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 1610
# Total code points: 1901
# ================================================
@ -444,6 +491,7 @@ E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
1C34..1C35 ; SpacingMark # Mc [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG
1CE1 ; SpacingMark # Mc VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA
1CF2..1CF3 ; SpacingMark # Mc [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
1CF7 ; SpacingMark # Mc VEDIC SIGN ATIKRAMA
A823..A824 ; SpacingMark # Mc [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I
A827 ; SpacingMark # Mc SYLOTI NAGRI VOWEL SIGN OO
A880..A881 ; SpacingMark # Mc [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA
@ -482,6 +530,9 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
11347..11348 ; SpacingMark # Mc [2] GRANTHA VOWEL SIGN EE..GRANTHA VOWEL SIGN AI
1134B..1134D ; SpacingMark # Mc [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
11362..11363 ; SpacingMark # Mc [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
11435..11437 ; SpacingMark # Mc [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
11440..11441 ; SpacingMark # Mc [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
11445 ; SpacingMark # Mc NEWA SIGN VISARGA
114B1..114B2 ; SpacingMark # Mc [2] TIRHUTA VOWEL SIGN I..TIRHUTA VOWEL SIGN II
114B9 ; SpacingMark # Mc TIRHUTA VOWEL SIGN E
114BB..114BC ; SpacingMark # Mc [2] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN O
@ -498,11 +549,20 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
116B6 ; SpacingMark # Mc TAKRI SIGN VIRAMA
11720..11721 ; SpacingMark # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
11726 ; SpacingMark # Mc AHOM VOWEL SIGN E
11A07..11A08 ; SpacingMark # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
11A39 ; SpacingMark # Mc ZANABAZAR SQUARE SIGN VISARGA
11A57..11A58 ; SpacingMark # Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
11A97 ; SpacingMark # Mc SOYOMBO SIGN VISARGA
11C2F ; SpacingMark # Mc BHAIKSUKI VOWEL SIGN AA
11C3E ; SpacingMark # Mc BHAIKSUKI SIGN VISARGA
11CA9 ; SpacingMark # Mc MARCHEN SUBJOINED LETTER YA
11CB1 ; SpacingMark # Mc MARCHEN VOWEL SIGN I
11CB4 ; SpacingMark # Mc MARCHEN VOWEL SIGN O
16F51..16F7E ; SpacingMark # Mc [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
1D166 ; SpacingMark # Mc MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D ; SpacingMark # Mc MUSICAL SYMBOL COMBINING AUGMENTATION DOT
# Total code points: 330
# Total code points: 348
# ================================================
@ -1333,4 +1393,83 @@ D789..D7A3 ; LVT # Lo [27] HANGUL SYLLABLE HIG..HANGUL SYLLABLE HIH
# Total code points: 10773
# ================================================
261D ; E_Base # So WHITE UP POINTING INDEX
26F9 ; E_Base # So PERSON WITH BALL
270A..270D ; E_Base # So [4] RAISED FIST..WRITING HAND
1F385 ; E_Base # So FATHER CHRISTMAS
1F3C2..1F3C4 ; E_Base # So [3] SNOWBOARDER..SURFER
1F3C7 ; E_Base # So HORSE RACING
1F3CA..1F3CC ; E_Base # So [3] SWIMMER..GOLFER
1F442..1F443 ; E_Base # So [2] EAR..NOSE
1F446..1F450 ; E_Base # So [11] WHITE UP POINTING BACKHAND INDEX..OPEN HANDS SIGN
1F46E ; E_Base # So POLICE OFFICER
1F470..1F478 ; E_Base # So [9] BRIDE WITH VEIL..PRINCESS
1F47C ; E_Base # So BABY ANGEL
1F481..1F483 ; E_Base # So [3] INFORMATION DESK PERSON..DANCER
1F485..1F487 ; E_Base # So [3] NAIL POLISH..HAIRCUT
1F4AA ; E_Base # So FLEXED BICEPS
1F574..1F575 ; E_Base # So [2] MAN IN BUSINESS SUIT LEVITATING..SLEUTH OR SPY
1F57A ; E_Base # So MAN DANCING
1F590 ; E_Base # So RAISED HAND WITH FINGERS SPLAYED
1F595..1F596 ; E_Base # So [2] REVERSED HAND WITH MIDDLE FINGER EXTENDED..RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS
1F645..1F647 ; E_Base # So [3] FACE WITH NO GOOD GESTURE..PERSON BOWING DEEPLY
1F64B..1F64F ; E_Base # So [5] HAPPY PERSON RAISING ONE HAND..PERSON WITH FOLDED HANDS
1F6A3 ; E_Base # So ROWBOAT
1F6B4..1F6B6 ; E_Base # So [3] BICYCLIST..PEDESTRIAN
1F6C0 ; E_Base # So BATH
1F6CC ; E_Base # So SLEEPING ACCOMMODATION
1F918..1F91C ; E_Base # So [5] SIGN OF THE HORNS..RIGHT-FACING FIST
1F91E..1F91F ; E_Base # So [2] HAND WITH INDEX AND MIDDLE FINGERS CROSSED..I LOVE YOU HAND SIGN
1F926 ; E_Base # So FACE PALM
1F930..1F939 ; E_Base # So [10] PREGNANT WOMAN..JUGGLING
1F93D..1F93E ; E_Base # So [2] WATER POLO..HANDBALL
1F9D1..1F9DD ; E_Base # So [13] ADULT..ELF
# Total code points: 98
# ================================================
1F3FB..1F3FF ; E_Modifier # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
# Total code points: 5
# ================================================
200D ; ZWJ # Cf ZERO WIDTH JOINER
# Total code points: 1
# ================================================
2640 ; Glue_After_Zwj # So FEMALE SIGN
2642 ; Glue_After_Zwj # So MALE SIGN
2695..2696 ; Glue_After_Zwj # So [2] STAFF OF AESCULAPIUS..SCALES
2708 ; Glue_After_Zwj # So AIRPLANE
2764 ; Glue_After_Zwj # So HEAVY BLACK HEART
1F308 ; Glue_After_Zwj # So RAINBOW
1F33E ; Glue_After_Zwj # So EAR OF RICE
1F373 ; Glue_After_Zwj # So COOKING
1F393 ; Glue_After_Zwj # So GRADUATION CAP
1F3A4 ; Glue_After_Zwj # So MICROPHONE
1F3A8 ; Glue_After_Zwj # So ARTIST PALETTE
1F3EB ; Glue_After_Zwj # So SCHOOL
1F3ED ; Glue_After_Zwj # So FACTORY
1F48B ; Glue_After_Zwj # So KISS MARK
1F4BB..1F4BC ; Glue_After_Zwj # So [2] PERSONAL COMPUTER..BRIEFCASE
1F527 ; Glue_After_Zwj # So WRENCH
1F52C ; Glue_After_Zwj # So MICROSCOPE
1F5E8 ; Glue_After_Zwj # So LEFT SPEECH BUBBLE
1F680 ; Glue_After_Zwj # So ROCKET
1F692 ; Glue_After_Zwj # So FIRE ENGINE
# Total code points: 22
# ================================================
1F466..1F469 ; E_Base_GAZ # So [4] BOY..WOMAN
# Total code points: 4
# EOF
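
The data lines in these Unicode property files share one layout: a code point or `START..END` range, a semicolon, the property value, then a comment holding the general category, an optional `[N]` count for ranges, and the character names. As a minimal sketch (the function names `parse_line` and `range_size_ok` are hypothetical helpers, not part of PCRE2 or its table-generation scripts), a reviewer could sanity-check that each bracketed count agrees with the size of its range:

```python
import re

# Matches "START[..END] ; Property # Gc [N] NAMES" data lines.
# Comment lines ("# ...") and diff hunk headers ("@ -1,2 +3,4 @@") do not match.
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?"  # start and optional end
    r"\s*;\s*(\w+)"                              # property value, e.g. Extend
    r"\s*#\s*(\S+)"                              # general category, e.g. Mn or L&
    r"(?:\s+\[(\d+)\])?"                         # optional declared count
)


def parse_line(line):
    """Return (start, end, prop, category, declared_count) or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    start = int(m.group(1), 16)
    end = int(m.group(2), 16) if m.group(2) else start
    # Single code points carry no [N]; treat their declared count as 1.
    declared = int(m.group(5)) if m.group(5) else 1
    return start, end, m.group(3), m.group(4), declared


def range_size_ok(line):
    """True if the declared count equals end - start + 1; None for non-data lines."""
    parsed = parse_line(line)
    if parsed is None:
        return None
    start, end, _prop, _cat, declared = parsed
    return declared == end - start + 1


if __name__ == "__main__":
    sample = "1F400..1F6D4 ; Common # So [725] RAT..PAGODA"
    print(parse_line(sample), range_size_ok(sample))
```

This only checks internal consistency of a single line; it does not verify the per-property "Total code points" figures, which would need every line of the full file rather than the hunks shown in a diff.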


@ -1,10 +1,11 @@
# Scripts-8.0.0.txt
# Date: 2015-03-11, 22:29:42 GMT [MD]
# Scripts-10.0.0.txt
# Date: 2017-03-11, 06:40:37 GMT
# © 2017 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# Copyright (c) 1991-2015 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
# For documentation, see http://www.unicode.org/reports/tr44/
# For more information, see:
# UAX #24, Unicode Script Property: http://www.unicode.org/reports/tr24/
# Especially the sections:
@ -92,10 +93,10 @@
0605 ; Common # Cf ARABIC NUMBER MARK ABOVE
060C ; Common # Po ARABIC COMMA
061B ; Common # Po ARABIC SEMICOLON
061C ; Common # Cf ARABIC LETTER MARK
061F ; Common # Po ARABIC QUESTION MARK
0640 ; Common # Lm ARABIC TATWEEL
06DD ; Common # Cf ARABIC END OF AYAH
08E2 ; Common # Cf ARABIC DISPUTED END OF AYAH
0964..0965 ; Common # Po [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0E3F ; Common # Sc THAI CURRENCY SYMBOL BAHT
0FD5..0FD8 ; Common # So [4] RIGHT-FACING SVASTI SIGN..LEFT-FACING SVASTI SIGN WITH DOTS
@ -110,6 +111,7 @@
1CEE..1CF1 ; Common # Lo [4] VEDIC SIGN HEXIFORM LONG ANUSVARA..VEDIC SIGN ANUSVARA UBHAYATO MUKHA
1CF2..1CF3 ; Common # Mc [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
1CF5..1CF6 ; Common # Lo [2] VEDIC SIGN JIHVAMULIYA..VEDIC SIGN UPADHMANIYA
1CF7 ; Common # Mc VEDIC SIGN ATIKRAMA
2000..200A ; Common # Zs [11] EN QUAD..HAIR SPACE
200B ; Common # Cf ZERO WIDTH SPACE
200E..200F ; Common # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
@ -153,7 +155,7 @@
208A..208C ; Common # Sm [3] SUBSCRIPT PLUS SIGN..SUBSCRIPT EQUALS SIGN
208D ; Common # Ps SUBSCRIPT LEFT PARENTHESIS
208E ; Common # Pe SUBSCRIPT RIGHT PARENTHESIS
20A0..20BE ; Common # Sc [31] EURO-CURRENCY SIGN..LARI SIGN
20A0..20BF ; Common # Sc [32] EURO-CURRENCY SIGN..BITCOIN SIGN
2100..2101 ; Common # So [2] ACCOUNT OF..ADDRESSED TO THE SUBJECT
2102 ; Common # L& DOUBLE-STRUCK CAPITAL C
2103..2106 ; Common # So [4] DEGREE CELSIUS..CADA UNA
@ -223,8 +225,7 @@
239B..23B3 ; Common # Sm [25] LEFT PARENTHESIS UPPER HOOK..SUMMATION BOTTOM
23B4..23DB ; Common # So [40] TOP SQUARE BRACKET..FUSE
23DC..23E1 ; Common # Sm [6] TOP PARENTHESIS..BOTTOM TORTOISE SHELL BRACKET
23E2..23FA ; Common # So [25] WHITE TRAPEZIUM..BLACK CIRCLE FOR RECORD
2400..2426 ; Common # So [39] SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM TWO
23E2..2426 ; Common # So [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO
2440..244A ; Common # So [11] OCR HOOK..OCR DOUBLE BACKSLASH
2460..249B ; Common # No [60] CIRCLED DIGIT ONE..NUMBER TWENTY FULL STOP
249C..24E9 ; Common # So [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
@ -309,7 +310,7 @@
2B76..2B95 ; Common # So [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
2B98..2BB9 ; Common # So [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
2BBD..2BC8 ; Common # So [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BD1 ; Common # So [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN
2BCA..2BD2 ; Common # So [9] TOP HALF BLACK CIRCLE..GROUP MARK
2BEC..2BEF ; Common # So [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
2E00..2E01 ; Common # Po [2] RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER
2E02 ; Common # Pi LEFT SUBSTITUTION BRACKET
@ -348,6 +349,7 @@
2E40 ; Common # Pd DOUBLE HYPHEN
2E41 ; Common # Po REVERSED COMMA
2E42 ; Common # Ps DOUBLE LOW-REVERSED-9 QUOTATION MARK
2E43..2E49 ; Common # Po [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
2FF0..2FFB ; Common # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
3000 ; Common # Zs IDEOGRAPHIC SPACE
3001..3003 ; Common # Po [3] IDEOGRAPHIC COMMA..DITTO MARK
@ -572,19 +574,18 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F100..1F10C ; Common # No [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
1F110..1F12E ; Common # So [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
1F130..1F16B ; Common # So [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
1F170..1F19A ; Common # So [43] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VS
1F170..1F1AC ; Common # So [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
1F1E6..1F1FF ; Common # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
1F201..1F202 ; Common # So [2] SQUARED KATAKANA KOKO..SQUARED KATAKANA SA
1F210..1F23A ; Common # So [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6
1F210..1F23B ; Common # So [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
1F240..1F248 ; Common # So [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
1F250..1F251 ; Common # So [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
1F260..1F265 ; Common # So [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
1F300..1F3FA ; Common # So [251] CYCLONE..AMPHORA
1F3FB..1F3FF ; Common # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
1F400..1F579 ; Common # So [378] RAT..JOYSTICK
1F57B..1F5A3 ; Common # So [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
1F5A5..1F6D0 ; Common # So [300] DESKTOP COMPUTER..PLACE OF WORSHIP
1F400..1F6D4 ; Common # So [725] RAT..PAGODA
1F6E0..1F6EC ; Common # So [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
1F6F0..1F6F3 ; Common # So [4] SATELLITE..PASSENGER SHIP
1F6F0..1F6F8 ; Common # So [9] SATELLITE..FLYING SAUCER
1F700..1F773 ; Common # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
1F780..1F7D4 ; Common # So [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
1F800..1F80B ; Common # So [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
@ -592,13 +593,17 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
1F850..1F859 ; Common # So [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
1F860..1F887 ; Common # So [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
1F890..1F8AD ; Common # So [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
1F910..1F918 ; Common # So [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS
1F980..1F984 ; Common # So [5] CRAB..UNICORN FACE
1F900..1F90B ; Common # So [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
1F910..1F93E ; Common # So [47] ZIPPER-MOUTH FACE..HANDBALL
1F940..1F94C ; Common # So [13] WILTED FLOWER..CURLING STONE
1F950..1F96B ; Common # So [28] CROISSANT..CANNED FOOD
1F980..1F997 ; Common # So [24] CRAB..CRICKET
1F9C0 ; Common # So CHEESE WEDGE
1F9D0..1F9E6 ; Common # So [23] FACE WITH MONOCLE..SOCKS
E0001 ; Common # Cf LANGUAGE TAG
E0020..E007F ; Common # Cf [96] TAG SPACE..CANCEL TAG
# Total code points: 7179
# Total code points: 7363
# ================================================
@ -641,7 +646,7 @@ A770 ; Latin # Lm MODIFIER LETTER US
A771..A787 ; Latin # L& [23] LATIN SMALL LETTER DUM..LATIN SMALL LETTER INSULAR T
A78B..A78E ; Latin # L& [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT
A78F ; Latin # Lo LATIN LETTER SINOLOGICAL DOT
A790..A7AD ; Latin # L& [30] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER L WITH BELT
A790..A7AE ; Latin # L& [31] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER SMALL CAPITAL I
A7B0..A7B7 ; Latin # L& [8] LATIN CAPITAL LETTER TURNED K..LATIN SMALL LETTER OMEGA
A7F7 ; Latin # Lo LATIN EPIGRAPHIC LETTER SIDEWAYS I
A7F8..A7F9 ; Latin # Lm [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE
@ -654,7 +659,7 @@ FB00..FB06 ; Latin # L& [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE S
FF21..FF3A ; Latin # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A ; Latin # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
# Total code points: 1349
# Total code points: 1350
# ================================================
@ -708,13 +713,13 @@ AB65 ; Greek # L& GREEK LETTER SMALL CAPITAL OMEGA
10175..10178 ; Greek # No [4] GREEK ONE HALF SIGN..GREEK THREE QUARTERS SIGN
10179..10189 ; Greek # So [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
1018A..1018B ; Greek # No [2] GREEK ZERO SIGN..GREEK ONE QUARTER SIGN
1018C ; Greek # So GREEK SINUSOID SIGN
1018C..1018E ; Greek # So [3] GREEK SINUSOID SIGN..NOMISMA SIGN
101A0 ; Greek # So GREEK SYMBOL TAU RHO
1D200..1D241 ; Greek # So [66] GREEK VOCAL NOTATION SYMBOL-1..GREEK INSTRUMENTAL NOTATION SYMBOL-54
1D242..1D244 ; Greek # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME
1D245 ; Greek # So GREEK MUSICAL LEIMMA
# Total code points: 516
# Total code points: 518
# ================================================
@ -724,6 +729,7 @@ AB65 ; Greek # L& GREEK LETTER SMALL CAPITAL OMEGA
0487 ; Cyrillic # Mn COMBINING CYRILLIC POKRYTIE
0488..0489 ; Cyrillic # Me [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN
048A..052F ; Cyrillic # L& [166] CYRILLIC CAPITAL LETTER SHORT I WITH TAIL..CYRILLIC SMALL LETTER EL WITH DESCENDER
1C80..1C88 ; Cyrillic # L& [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
1D2B ; Cyrillic # L& CYRILLIC LETTER SMALL CAPITAL EL
1D78 ; Cyrillic # Lm MODIFIER LETTER CYRILLIC EN
2DE0..2DFF ; Cyrillic # Mn [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS
@ -740,7 +746,7 @@ A69C..A69D ; Cyrillic # Lm [2] MODIFIER LETTER CYRILLIC HARD SIGN..MODIFIER
A69E..A69F ; Cyrillic # Mn [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E
FE2E..FE2F ; Cyrillic # Mn [2] COMBINING CYRILLIC TITLO LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF
# Total code points: 434
# Total code points: 443
# ================================================
@ -791,6 +797,7 @@ FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
060D ; Arabic # Po ARABIC DATE SEPARATOR
060E..060F ; Arabic # So [2] ARABIC POETIC VERSE SIGN..ARABIC SIGN MISRA
0610..061A ; Arabic # Mn [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA
061C ; Arabic # Cf ARABIC LETTER MARK
061E ; Arabic # Po ARABIC TRIPLE DOT PUNCTUATION MARK
0620..063F ; Arabic # Lo [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE
0641..064A ; Arabic # Lo [10] ARABIC LETTER FEH..ARABIC LETTER YEH
@ -815,6 +822,8 @@ FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
06FF ; Arabic # Lo ARABIC LETTER HEH WITH INVERTED V
0750..077F ; Arabic # Lo [48] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS ABOVE
08A0..08B4 ; Arabic # Lo [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
08B6..08BD ; Arabic # Lo [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
08D4..08E1 ; Arabic # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
08E3..08FF ; Arabic # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
FB50..FBB1 ; Arabic # Lo [98] ARABIC LETTER ALEF WASLA ISOLATED FORM..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
FBB2..FBC1 ; Arabic # Sk [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW
@ -862,7 +871,7 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
1EEAB..1EEBB ; Arabic # Lo [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN
1EEF0..1EEF1 ; Arabic # Sm [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL
# Total code points: 1257
# Total code points: 1280
# ================================================
@ -873,8 +882,9 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
0712..072F ; Syriac # Lo [30] SYRIAC LETTER BETH..SYRIAC LETTER PERSIAN DHALATH
0730..074A ; Syriac # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
074D..074F ; Syriac # Lo [3] SYRIAC LETTER SOGDIAN ZHAIN..SYRIAC LETTER SOGDIAN FE
0860..086A ; Syriac # Lo [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
# Total code points: 77
# Total code points: 88
# ================================================
@ -944,8 +954,10 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
09F4..09F9 ; Bengali # No [6] BENGALI CURRENCY NUMERATOR ONE..BENGALI CURRENCY DENOMINATOR SIXTEEN
09FA ; Bengali # So BENGALI ISSHAR
09FB ; Bengali # Sc BENGALI GANDA MARK
09FC ; Bengali # Lo BENGALI LETTER VEDIC ANUSVARA
09FD ; Bengali # Po BENGALI ABBREVIATION SIGN
# Total code points: 93
# Total code points: 95
# ================================================
@ -998,8 +1010,9 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0AF0 ; Gujarati # Po GUJARATI ABBREVIATION SIGN
0AF1 ; Gujarati # Sc GUJARATI RUPEE SIGN
0AF9 ; Gujarati # Lo GUJARATI LETTER ZHA
0AFA..0AFF ; Gujarati # Mn [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
# Total code points: 85
# Total code points: 91
# ================================================
@ -1086,6 +1099,7 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
# ================================================
0C80 ; Kannada # Lo KANNADA SIGN SPACING CANDRABINDU
0C81 ; Kannada # Mn KANNADA SIGN CANDRABINDU
0C82..0C83 ; Kannada # Mc [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
0C85..0C8C ; Kannada # Lo [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
@ -1109,15 +1123,16 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0CE6..0CEF ; Kannada # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0CF1..0CF2 ; Kannada # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
# Total code points: 87
# Total code points: 88
# ================================================
0D01 ; Malayalam # Mn MALAYALAM SIGN CANDRABINDU
0D00..0D01 ; Malayalam # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
0D02..0D03 ; Malayalam # Mc [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA
0D05..0D0C ; Malayalam # Lo [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L
0D0E..0D10 ; Malayalam # Lo [3] MALAYALAM LETTER E..MALAYALAM LETTER AI
0D12..0D3A ; Malayalam # Lo [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
0D3B..0D3C ; Malayalam # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
0D3D ; Malayalam # Lo MALAYALAM SIGN AVAGRAHA
0D3E..0D40 ; Malayalam # Mc [3] MALAYALAM VOWEL SIGN AA..MALAYALAM VOWEL SIGN II
0D41..0D44 ; Malayalam # Mn [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
@ -1125,15 +1140,18 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
0D4A..0D4C ; Malayalam # Mc [3] MALAYALAM VOWEL SIGN O..MALAYALAM VOWEL SIGN AU
0D4D ; Malayalam # Mn MALAYALAM SIGN VIRAMA
0D4E ; Malayalam # Lo MALAYALAM LETTER DOT REPH
0D4F ; Malayalam # So MALAYALAM SIGN PARA
0D54..0D56 ; Malayalam # Lo [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL
0D57 ; Malayalam # Mc MALAYALAM AU LENGTH MARK
0D58..0D5E ; Malayalam # No [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH
0D5F..0D61 ; Malayalam # Lo [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
0D62..0D63 ; Malayalam # Mn [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
0D66..0D6F ; Malayalam # Nd [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
0D70..0D75 ; Malayalam # No [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS
0D70..0D78 ; Malayalam # No [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
0D79 ; Malayalam # So MALAYALAM DATE MARK
0D7A..0D7F ; Malayalam # Lo [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
# Total code points: 100
# Total code points: 117
# ================================================
@ -1436,21 +1454,24 @@ AB70..ABBF ; Cherokee # L& [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
1820..1842 ; Mongolian # Lo [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
1843 ; Mongolian # Lm MONGOLIAN LETTER TODO LONG VOWEL SIGN
1844..1877 ; Mongolian # Lo [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
1880..18A8 ; Mongolian # Lo [41] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER MANCHU ALI GALI BHA
1880..1884 ; Mongolian # Lo [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
1885..1886 ; Mongolian # Mn [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
1887..18A8 ; Mongolian # Lo [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
18A9 ; Mongolian # Mn MONGOLIAN LETTER ALI GALI DAGALGA
18AA ; Mongolian # Lo MONGOLIAN LETTER MANCHU ALI GALI LHA
11660..1166C ; Mongolian # Po [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
# Total code points: 153
# Total code points: 166
# ================================================
3041..3096 ; Hiragana # Lo [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE
309D..309E ; Hiragana # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
309F ; Hiragana # Lo HIRAGANA DIGRAPH YORI
1B001 ; Hiragana # Lo HIRAGANA LETTER ARCHAIC YE
1B001..1B11E ; Hiragana # Lo [286] HIRAGANA LETTER ARCHAIC YE..HENTAIGANA LETTER N-MU-MO-2
1F200 ; Hiragana # So SQUARE HIRAGANA HOKA
# Total code points: 91
# Total code points: 376
# ================================================
@ -1469,10 +1490,10 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
# ================================================
02EA..02EB ; Bopomofo # Sk [2] MODIFIER LETTER YIN DEPARTING TONE MARK..MODIFIER LETTER YANG DEPARTING TONE MARK
3105..312D ; Bopomofo # Lo [41] BOPOMOFO LETTER B..BOPOMOFO LETTER IH
3105..312E ; Bopomofo # Lo [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
31A0..31BA ; Bopomofo # Lo [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
# Total code points: 70
# Total code points: 71
# ================================================
@ -1485,16 +1506,17 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
3038..303A ; Han # Nl [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY
303B ; Han # Lm VERTICAL IDEOGRAPHIC ITERATION MARK
3400..4DB5 ; Han # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4E00..9FD5 ; Han # Lo [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5
4E00..9FEA ; Han # Lo [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
F900..FA6D ; Han # Lo [366] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D
FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
20000..2A6D6 ; Han # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
2A700..2B734 ; Han # Lo [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
2B740..2B81D ; Han # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; Han # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; Han # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
2F800..2FA1D ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
# Total code points: 81734
# Total code points: 89228
# ================================================
@ -1509,8 +1531,9 @@ A490..A4C6 ; Yi # So [55] YI RADICAL QOT..YI RADICAL KE
10300..1031F ; Old_Italic # Lo [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS
10320..10323 ; Old_Italic # No [4] OLD ITALIC NUMERAL ONE..OLD ITALIC NUMERAL FIFTY
1032D..1032F ; Old_Italic # Lo [3] OLD ITALIC LETTER YE..OLD ITALIC LETTER SOUTHERN TSE
# Total code points: 36
# Total code points: 39
# ================================================
@ -1542,8 +1565,8 @@ A490..A4C6 ; Yi # So [55] YI RADICAL QOT..YI RADICAL KE
1CED ; Inherited # Mn VEDIC SIGN TIRYAK
1CF4 ; Inherited # Mn VEDIC TONE CANDRA ABOVE
1CF8..1CF9 ; Inherited # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
1DC0..1DF5 ; Inherited # Mn [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
1DFC..1DFF ; Inherited # Mn [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
1DC0..1DF9 ; Inherited # Mn [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
1DFB..1DFF ; Inherited # Mn [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
200C..200D ; Inherited # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
20D0..20DC ; Inherited # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20DD..20E0 ; Inherited # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
@ -1562,7 +1585,7 @@ FE20..FE2D ; Inherited # Mn [14] COMBINING LIGATURE LEFT HALF..COMBINING CON
1D1AA..1D1AD ; Inherited # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# Total code points: 563
# Total code points: 568
# ================================================
@ -1705,8 +1728,13 @@ E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-2
2C00..2C2E ; Glagolitic # L& [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
2C30..2C5E ; Glagolitic # L& [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE
1E000..1E006 ; Glagolitic # Mn [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
1E008..1E018 ; Glagolitic # Mn [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
1E01B..1E021 ; Glagolitic # Mn [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
1E023..1E024 ; Glagolitic # Mn [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
1E026..1E02A ; Glagolitic # Mn [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
# Total code points: 94
# Total code points: 132
# ================================================
@ -1872,11 +1900,11 @@ A62A..A62B ; Vai # Lo [2] VAI SYLLABLE NDOLE MA..VAI SYLLABLE NDOLE DO
A880..A881 ; Saurashtra # Mc [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA
A882..A8B3 ; Saurashtra # Lo [50] SAURASHTRA LETTER A..SAURASHTRA LETTER LLA
A8B4..A8C3 ; Saurashtra # Mc [16] SAURASHTRA CONSONANT SIGN HAARU..SAURASHTRA VOWEL SIGN AU
A8C4 ; Saurashtra # Mn SAURASHTRA SIGN VIRAMA
A8C4..A8C5 ; Saurashtra # Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
A8CE..A8CF ; Saurashtra # Po [2] SAURASHTRA DANDA..SAURASHTRA DOUBLE DANDA
A8D0..A8D9 ; Saurashtra # Nd [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE
# Total code points: 81
# Total code points: 82
# ================================================
@ -2314,8 +2342,9 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
11235 ; Khojki # Mc KHOJKI SIGN VIRAMA
11236..11237 ; Khojki # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
11238..1123D ; Khojki # Po [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
1123E ; Khojki # Mn KHOJKI SIGN SUKUN
# Total code points: 61
# Total code points: 62
# ================================================
@ -2536,4 +2565,129 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
# Total code points: 672
# ================================================
1E900..1E943 ; Adlam # L& [68] ADLAM CAPITAL LETTER ALIF..ADLAM SMALL LETTER SHA
1E944..1E94A ; Adlam # Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
1E950..1E959 ; Adlam # Nd [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE
1E95E..1E95F ; Adlam # Po [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK
# Total code points: 87
# ================================================
11C00..11C08 ; Bhaiksuki # Lo [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
11C0A..11C2E ; Bhaiksuki # Lo [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
11C2F ; Bhaiksuki # Mc BHAIKSUKI VOWEL SIGN AA
11C30..11C36 ; Bhaiksuki # Mn [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
11C38..11C3D ; Bhaiksuki # Mn [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
11C3E ; Bhaiksuki # Mc BHAIKSUKI SIGN VISARGA
11C3F ; Bhaiksuki # Mn BHAIKSUKI SIGN VIRAMA
11C40 ; Bhaiksuki # Lo BHAIKSUKI SIGN AVAGRAHA
11C41..11C45 ; Bhaiksuki # Po [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
11C50..11C59 ; Bhaiksuki # Nd [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
11C5A..11C6C ; Bhaiksuki # No [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
# Total code points: 97
# ================================================
11C70..11C71 ; Marchen # Po [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
11C72..11C8F ; Marchen # Lo [30] MARCHEN LETTER KA..MARCHEN LETTER A
11C92..11CA7 ; Marchen # Mn [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
11CA9 ; Marchen # Mc MARCHEN SUBJOINED LETTER YA
11CAA..11CB0 ; Marchen # Mn [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
11CB1 ; Marchen # Mc MARCHEN VOWEL SIGN I
11CB2..11CB3 ; Marchen # Mn [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
11CB4 ; Marchen # Mc MARCHEN VOWEL SIGN O
11CB5..11CB6 ; Marchen # Mn [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
# Total code points: 68
# ================================================
11400..11434 ; Newa # Lo [53] NEWA LETTER A..NEWA LETTER HA
11435..11437 ; Newa # Mc [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
11438..1143F ; Newa # Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
11440..11441 ; Newa # Mc [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
11442..11444 ; Newa # Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
11445 ; Newa # Mc NEWA SIGN VISARGA
11446 ; Newa # Mn NEWA SIGN NUKTA
11447..1144A ; Newa # Lo [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
1144B..1144F ; Newa # Po [5] NEWA DANDA..NEWA ABBREVIATION SIGN
11450..11459 ; Newa # Nd [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
1145B ; Newa # Po NEWA PLACEHOLDER MARK
1145D ; Newa # Po NEWA INSERTION SIGN
# Total code points: 92
# ================================================
104B0..104D3 ; Osage # L& [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
104D8..104FB ; Osage # L& [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
# Total code points: 72
# ================================================
16FE0 ; Tangut # Lm TANGUT ITERATION MARK
17000..187EC ; Tangut # Lo [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
18800..18AF2 ; Tangut # Lo [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
# Total code points: 6881
# ================================================
11D00..11D06 ; Masaram_Gondi # Lo [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E
11D08..11D09 ; Masaram_Gondi # Lo [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
11D0B..11D30 ; Masaram_Gondi # Lo [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
11D31..11D36 ; Masaram_Gondi # Mn [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
11D3A ; Masaram_Gondi # Mn MASARAM GONDI VOWEL SIGN E
11D3C..11D3D ; Masaram_Gondi # Mn [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
11D3F..11D45 ; Masaram_Gondi # Mn [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
11D46 ; Masaram_Gondi # Lo MASARAM GONDI REPHA
11D47 ; Masaram_Gondi # Mn MASARAM GONDI RA-KARA
11D50..11D59 ; Masaram_Gondi # Nd [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
# Total code points: 75
# ================================================
16FE1 ; Nushu # Lm NUSHU ITERATION MARK
1B170..1B2FB ; Nushu # Lo [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
# Total code points: 397
# ================================================
11A50 ; Soyombo # Lo SOYOMBO LETTER A
11A51..11A56 ; Soyombo # Mn [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
11A57..11A58 ; Soyombo # Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
11A59..11A5B ; Soyombo # Mn [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
11A5C..11A83 ; Soyombo # Lo [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
11A86..11A89 ; Soyombo # Lo [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
11A8A..11A96 ; Soyombo # Mn [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
11A97 ; Soyombo # Mc SOYOMBO SIGN VISARGA
11A98..11A99 ; Soyombo # Mn [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
11A9A..11A9C ; Soyombo # Po [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
11A9E..11AA2 ; Soyombo # Po [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
# Total code points: 80
# ================================================
11A00 ; Zanabazar_Square # Lo ZANABAZAR SQUARE LETTER A
11A01..11A06 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
11A07..11A08 ; Zanabazar_Square # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
11A09..11A0A ; Zanabazar_Square # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
11A0B..11A32 ; Zanabazar_Square # Lo [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
11A33..11A38 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
11A39 ; Zanabazar_Square # Mc ZANABAZAR SQUARE SIGN VISARGA
11A3A ; Zanabazar_Square # Lo ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
11A3B..11A3E ; Zanabazar_Square # Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
11A3F..11A46 ; Zanabazar_Square # Po [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
11A47 ; Zanabazar_Square # Mn ZANABAZAR SQUARE SUBJOINER
# Total code points: 72
# EOF
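
The "Total code points" trailers in the script data above can be cross-checked by summing the ranges that precede them. The following is a minimal sketch, assuming only the line layout visible in this diff ("FIRST..LAST ; Script # ..." or "CODE ; Script # ..."). The helper names `scripts_line_count` and `scripts_total` are hypothetical and are not part of PCRE2 or of any Unicode tooling.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: number of code points covered by one data line.
   Returns 0 for comment lines, blank lines, or anything that does not start
   with a hexadecimal code point. */
static unsigned long scripts_line_count(const char *line)
{
  unsigned long first, last;
  assert(line != NULL);
  /* Range form: "104B0..104D3 ; Osage # ..." */
  if (sscanf(line, "%lx..%lx", &first, &last) == 2)
    return (last >= first) ? last - first + 1 : 0;
  /* Single code point form: "16FE0 ; Tangut # ..." */
  if (sscanf(line, "%lx", &first) == 1)
    return 1;
  return 0;
}

/* Hypothetical helper: sum the counts over all lines of one script block. */
static unsigned long scripts_total(const char *const *lines, size_t count)
{
  unsigned long total = 0;
  for (size_t i = 0; i < count; i++)
    total += scripts_line_count(lines[i]);
  return total;
}
```

Fed the two Osage lines, `scripts_total` should agree with the 72 stated in the trailer. The same check on the three Tangut lines should give 1 + 6125 + 755 = 6881.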

File diff suppressed because it is too large

@ -192,7 +192,12 @@ const uint32_t PRIV(ucp_gbtable)[] = {
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbT), /* 10 LVT */
(1<<ucp_gbRegionalIndicator), /* 11 RegionalIndicator */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark) /* 12 Other */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 12 Other */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 13 E_Base */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 14 E_Modifier */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 15 E_Base_GAZ */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 16 ZWJ */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark) /* 17 Glue_After_Zwj */
};
#ifdef SUPPORT_JIT
@ -227,6 +232,7 @@ version. Like all other character and string literals that are compared against
the regular expression pattern, we must use STR_ macros instead of literal
strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Adlam0 STR_A STR_d STR_l STR_a STR_m "\0"
#define STRING_Ahom0 STR_A STR_h STR_o STR_m "\0"
#define STRING_Anatolian_Hieroglyphs0 STR_A STR_n STR_a STR_t STR_o STR_l STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Any0 STR_A STR_n STR_y "\0"
@ -238,6 +244,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Bassa_Vah0 STR_B STR_a STR_s STR_s STR_a STR_UNDERSCORE STR_V STR_a STR_h "\0"
#define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0"
#define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0"
#define STRING_Bhaiksuki0 STR_B STR_h STR_a STR_i STR_k STR_s STR_u STR_k STR_i "\0"
#define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0"
#define STRING_Brahmi0 STR_B STR_r STR_a STR_h STR_m STR_i "\0"
#define STRING_Braille0 STR_B STR_r STR_a STR_i STR_l STR_l STR_e "\0"
@ -313,6 +320,8 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0"
#define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0"
#define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0"
#define STRING_Marchen0 STR_M STR_a STR_r STR_c STR_h STR_e STR_n "\0"
#define STRING_Masaram_Gondi0 STR_M STR_a STR_s STR_a STR_r STR_a STR_m STR_UNDERSCORE STR_G STR_o STR_n STR_d STR_i "\0"
#define STRING_Mc0 STR_M STR_c "\0"
#define STRING_Me0 STR_M STR_e "\0"
#define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0"
@ -330,9 +339,11 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Nabataean0 STR_N STR_a STR_b STR_a STR_t STR_a STR_e STR_a STR_n "\0"
#define STRING_Nd0 STR_N STR_d "\0"
#define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0"
#define STRING_Newa0 STR_N STR_e STR_w STR_a "\0"
#define STRING_Nko0 STR_N STR_k STR_o "\0"
#define STRING_Nl0 STR_N STR_l "\0"
#define STRING_No0 STR_N STR_o "\0"
#define STRING_Nushu0 STR_N STR_u STR_s STR_h STR_u "\0"
#define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0"
#define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0"
#define STRING_Old_Hungarian0 STR_O STR_l STR_d STR_UNDERSCORE STR_H STR_u STR_n STR_g STR_a STR_r STR_i STR_a STR_n "\0"
@ -343,6 +354,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0"
#define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0"
#define STRING_Osage0 STR_O STR_s STR_a STR_g STR_e "\0"
#define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0"
#define STRING_P0 STR_P "\0"
#define STRING_Pahawh_Hmong0 STR_P STR_a STR_h STR_a STR_w STR_h STR_UNDERSCORE STR_H STR_m STR_o STR_n STR_g "\0"
@ -373,6 +385,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Sm0 STR_S STR_m "\0"
#define STRING_So0 STR_S STR_o "\0"
#define STRING_Sora_Sompeng0 STR_S STR_o STR_r STR_a STR_UNDERSCORE STR_S STR_o STR_m STR_p STR_e STR_n STR_g "\0"
#define STRING_Soyombo0 STR_S STR_o STR_y STR_o STR_m STR_b STR_o "\0"
#define STRING_Sundanese0 STR_S STR_u STR_n STR_d STR_a STR_n STR_e STR_s STR_e "\0"
#define STRING_Syloti_Nagri0 STR_S STR_y STR_l STR_o STR_t STR_i STR_UNDERSCORE STR_N STR_a STR_g STR_r STR_i "\0"
#define STRING_Syriac0 STR_S STR_y STR_r STR_i STR_a STR_c "\0"
@ -383,6 +396,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Tai_Viet0 STR_T STR_a STR_i STR_UNDERSCORE STR_V STR_i STR_e STR_t "\0"
#define STRING_Takri0 STR_T STR_a STR_k STR_r STR_i "\0"
#define STRING_Tamil0 STR_T STR_a STR_m STR_i STR_l "\0"
#define STRING_Tangut0 STR_T STR_a STR_n STR_g STR_u STR_t "\0"
#define STRING_Telugu0 STR_T STR_e STR_l STR_u STR_g STR_u "\0"
#define STRING_Thaana0 STR_T STR_h STR_a STR_a STR_n STR_a "\0"
#define STRING_Thai0 STR_T STR_h STR_a STR_i "\0"
@ -399,11 +413,13 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Xwd0 STR_X STR_w STR_d "\0"
#define STRING_Yi0 STR_Y STR_i "\0"
#define STRING_Z0 STR_Z "\0"
#define STRING_Zanabazar_Square0 STR_Z STR_a STR_n STR_a STR_b STR_a STR_z STR_a STR_r STR_UNDERSCORE STR_S STR_q STR_u STR_a STR_r STR_e "\0"
#define STRING_Zl0 STR_Z STR_l "\0"
#define STRING_Zp0 STR_Z STR_p "\0"
#define STRING_Zs0 STR_Z STR_s "\0"
const char PRIV(utt_names)[] =
STRING_Adlam0
STRING_Ahom0
STRING_Anatolian_Hieroglyphs0
STRING_Any0
@ -415,6 +431,7 @@ const char PRIV(utt_names)[] =
STRING_Bassa_Vah0
STRING_Batak0
STRING_Bengali0
STRING_Bhaiksuki0
STRING_Bopomofo0
STRING_Brahmi0
STRING_Braille0
@ -490,6 +507,8 @@ const char PRIV(utt_names)[] =
STRING_Malayalam0
STRING_Mandaic0
STRING_Manichaean0
STRING_Marchen0
STRING_Masaram_Gondi0
STRING_Mc0
STRING_Me0
STRING_Meetei_Mayek0
@ -507,9 +526,11 @@ const char PRIV(utt_names)[] =
STRING_Nabataean0
STRING_Nd0
STRING_New_Tai_Lue0
STRING_Newa0
STRING_Nko0
STRING_Nl0
STRING_No0
STRING_Nushu0
STRING_Ogham0
STRING_Ol_Chiki0
STRING_Old_Hungarian0
@ -520,6 +541,7 @@ const char PRIV(utt_names)[] =
STRING_Old_South_Arabian0
STRING_Old_Turkic0
STRING_Oriya0
STRING_Osage0
STRING_Osmanya0
STRING_P0
STRING_Pahawh_Hmong0
@ -550,6 +572,7 @@ const char PRIV(utt_names)[] =
STRING_Sm0
STRING_So0
STRING_Sora_Sompeng0
STRING_Soyombo0
STRING_Sundanese0
STRING_Syloti_Nagri0
STRING_Syriac0
@ -560,6 +583,7 @@ const char PRIV(utt_names)[] =
STRING_Tai_Viet0
STRING_Takri0
STRING_Tamil0
STRING_Tangut0
STRING_Telugu0
STRING_Thaana0
STRING_Thai0
@ -576,186 +600,197 @@ const char PRIV(utt_names)[] =
STRING_Xwd0
STRING_Yi0
STRING_Z0
STRING_Zanabazar_Square0
STRING_Zl0
STRING_Zp0
STRING_Zs0;
const ucp_type_table PRIV(utt)[] = {
{ 0, PT_SC, ucp_Ahom },
{ 5, PT_SC, ucp_Anatolian_Hieroglyphs },
{ 27, PT_ANY, 0 },
{ 31, PT_SC, ucp_Arabic },
{ 38, PT_SC, ucp_Armenian },
{ 47, PT_SC, ucp_Avestan },
{ 55, PT_SC, ucp_Balinese },
{ 64, PT_SC, ucp_Bamum },
{ 70, PT_SC, ucp_Bassa_Vah },
{ 80, PT_SC, ucp_Batak },
{ 86, PT_SC, ucp_Bengali },
{ 94, PT_SC, ucp_Bopomofo },
{ 103, PT_SC, ucp_Brahmi },
{ 110, PT_SC, ucp_Braille },
{ 118, PT_SC, ucp_Buginese },
{ 127, PT_SC, ucp_Buhid },
{ 133, PT_GC, ucp_C },
{ 135, PT_SC, ucp_Canadian_Aboriginal },
{ 155, PT_SC, ucp_Carian },
{ 162, PT_SC, ucp_Caucasian_Albanian },
{ 181, PT_PC, ucp_Cc },
{ 184, PT_PC, ucp_Cf },
{ 187, PT_SC, ucp_Chakma },
{ 194, PT_SC, ucp_Cham },
{ 199, PT_SC, ucp_Cherokee },
{ 208, PT_PC, ucp_Cn },
{ 211, PT_PC, ucp_Co },
{ 214, PT_SC, ucp_Common },
{ 221, PT_SC, ucp_Coptic },
{ 228, PT_PC, ucp_Cs },
{ 231, PT_SC, ucp_Cuneiform },
{ 241, PT_SC, ucp_Cypriot },
{ 249, PT_SC, ucp_Cyrillic },
{ 258, PT_SC, ucp_Deseret },
{ 266, PT_SC, ucp_Devanagari },
{ 277, PT_SC, ucp_Duployan },
{ 286, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 307, PT_SC, ucp_Elbasan },
{ 315, PT_SC, ucp_Ethiopic },
{ 324, PT_SC, ucp_Georgian },
{ 333, PT_SC, ucp_Glagolitic },
{ 344, PT_SC, ucp_Gothic },
{ 351, PT_SC, ucp_Grantha },
{ 359, PT_SC, ucp_Greek },
{ 365, PT_SC, ucp_Gujarati },
{ 374, PT_SC, ucp_Gurmukhi },
{ 383, PT_SC, ucp_Han },
{ 387, PT_SC, ucp_Hangul },
{ 394, PT_SC, ucp_Hanunoo },
{ 402, PT_SC, ucp_Hatran },
{ 409, PT_SC, ucp_Hebrew },
{ 416, PT_SC, ucp_Hiragana },
{ 425, PT_SC, ucp_Imperial_Aramaic },
{ 442, PT_SC, ucp_Inherited },
{ 452, PT_SC, ucp_Inscriptional_Pahlavi },
{ 474, PT_SC, ucp_Inscriptional_Parthian },
{ 497, PT_SC, ucp_Javanese },
{ 506, PT_SC, ucp_Kaithi },
{ 513, PT_SC, ucp_Kannada },
{ 521, PT_SC, ucp_Katakana },
{ 530, PT_SC, ucp_Kayah_Li },
{ 539, PT_SC, ucp_Kharoshthi },
{ 550, PT_SC, ucp_Khmer },
{ 556, PT_SC, ucp_Khojki },
{ 563, PT_SC, ucp_Khudawadi },
{ 573, PT_GC, ucp_L },
{ 575, PT_LAMP, 0 },
{ 578, PT_SC, ucp_Lao },
{ 582, PT_SC, ucp_Latin },
{ 588, PT_SC, ucp_Lepcha },
{ 595, PT_SC, ucp_Limbu },
{ 601, PT_SC, ucp_Linear_A },
{ 610, PT_SC, ucp_Linear_B },
{ 619, PT_SC, ucp_Lisu },
{ 624, PT_PC, ucp_Ll },
{ 627, PT_PC, ucp_Lm },
{ 630, PT_PC, ucp_Lo },
{ 633, PT_PC, ucp_Lt },
{ 636, PT_PC, ucp_Lu },
{ 639, PT_SC, ucp_Lycian },
{ 646, PT_SC, ucp_Lydian },
{ 653, PT_GC, ucp_M },
{ 655, PT_SC, ucp_Mahajani },
{ 664, PT_SC, ucp_Malayalam },
{ 674, PT_SC, ucp_Mandaic },
{ 682, PT_SC, ucp_Manichaean },
{ 693, PT_PC, ucp_Mc },
{ 696, PT_PC, ucp_Me },
{ 699, PT_SC, ucp_Meetei_Mayek },
{ 712, PT_SC, ucp_Mende_Kikakui },
{ 726, PT_SC, ucp_Meroitic_Cursive },
{ 743, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 764, PT_SC, ucp_Miao },
{ 769, PT_PC, ucp_Mn },
{ 772, PT_SC, ucp_Modi },
{ 777, PT_SC, ucp_Mongolian },
{ 787, PT_SC, ucp_Mro },
{ 791, PT_SC, ucp_Multani },
{ 799, PT_SC, ucp_Myanmar },
{ 807, PT_GC, ucp_N },
{ 809, PT_SC, ucp_Nabataean },
{ 819, PT_PC, ucp_Nd },
{ 822, PT_SC, ucp_New_Tai_Lue },
{ 834, PT_SC, ucp_Nko },
{ 838, PT_PC, ucp_Nl },
{ 841, PT_PC, ucp_No },
{ 844, PT_SC, ucp_Ogham },
{ 850, PT_SC, ucp_Ol_Chiki },
{ 859, PT_SC, ucp_Old_Hungarian },
{ 873, PT_SC, ucp_Old_Italic },
{ 884, PT_SC, ucp_Old_North_Arabian },
{ 902, PT_SC, ucp_Old_Permic },
{ 913, PT_SC, ucp_Old_Persian },
{ 925, PT_SC, ucp_Old_South_Arabian },
{ 943, PT_SC, ucp_Old_Turkic },
{ 954, PT_SC, ucp_Oriya },
{ 960, PT_SC, ucp_Osmanya },
{ 968, PT_GC, ucp_P },
{ 970, PT_SC, ucp_Pahawh_Hmong },
{ 983, PT_SC, ucp_Palmyrene },
{ 993, PT_SC, ucp_Pau_Cin_Hau },
{ 1005, PT_PC, ucp_Pc },
{ 1008, PT_PC, ucp_Pd },
{ 1011, PT_PC, ucp_Pe },
{ 1014, PT_PC, ucp_Pf },
{ 1017, PT_SC, ucp_Phags_Pa },
{ 1026, PT_SC, ucp_Phoenician },
{ 1037, PT_PC, ucp_Pi },
{ 1040, PT_PC, ucp_Po },
{ 1043, PT_PC, ucp_Ps },
{ 1046, PT_SC, ucp_Psalter_Pahlavi },
{ 1062, PT_SC, ucp_Rejang },
{ 1069, PT_SC, ucp_Runic },
{ 1075, PT_GC, ucp_S },
{ 1077, PT_SC, ucp_Samaritan },
{ 1087, PT_SC, ucp_Saurashtra },
{ 1098, PT_PC, ucp_Sc },
{ 1101, PT_SC, ucp_Sharada },
{ 1109, PT_SC, ucp_Shavian },
{ 1117, PT_SC, ucp_Siddham },
{ 1125, PT_SC, ucp_SignWriting },
{ 1137, PT_SC, ucp_Sinhala },
{ 1145, PT_PC, ucp_Sk },
{ 1148, PT_PC, ucp_Sm },
{ 1151, PT_PC, ucp_So },
{ 1154, PT_SC, ucp_Sora_Sompeng },
{ 1167, PT_SC, ucp_Sundanese },
{ 1177, PT_SC, ucp_Syloti_Nagri },
{ 1190, PT_SC, ucp_Syriac },
{ 1197, PT_SC, ucp_Tagalog },
{ 1205, PT_SC, ucp_Tagbanwa },
{ 1214, PT_SC, ucp_Tai_Le },
{ 1221, PT_SC, ucp_Tai_Tham },
{ 1230, PT_SC, ucp_Tai_Viet },
{ 1239, PT_SC, ucp_Takri },
{ 1245, PT_SC, ucp_Tamil },
{ 1251, PT_SC, ucp_Telugu },
{ 1258, PT_SC, ucp_Thaana },
{ 1265, PT_SC, ucp_Thai },
{ 1270, PT_SC, ucp_Tibetan },
{ 1278, PT_SC, ucp_Tifinagh },
{ 1287, PT_SC, ucp_Tirhuta },
{ 1295, PT_SC, ucp_Ugaritic },
{ 1304, PT_SC, ucp_Vai },
{ 1308, PT_SC, ucp_Warang_Citi },
{ 1320, PT_ALNUM, 0 },
{ 1324, PT_PXSPACE, 0 },
{ 1328, PT_SPACE, 0 },
{ 1332, PT_UCNC, 0 },
{ 1336, PT_WORD, 0 },
{ 1340, PT_SC, ucp_Yi },
{ 1343, PT_GC, ucp_Z },
{ 1345, PT_PC, ucp_Zl },
{ 1348, PT_PC, ucp_Zp },
{ 1351, PT_PC, ucp_Zs }
{ 0, PT_SC, ucp_Adlam },
{ 6, PT_SC, ucp_Ahom },
{ 11, PT_SC, ucp_Anatolian_Hieroglyphs },
{ 33, PT_ANY, 0 },
{ 37, PT_SC, ucp_Arabic },
{ 44, PT_SC, ucp_Armenian },
{ 53, PT_SC, ucp_Avestan },
{ 61, PT_SC, ucp_Balinese },
{ 70, PT_SC, ucp_Bamum },
{ 76, PT_SC, ucp_Bassa_Vah },
{ 86, PT_SC, ucp_Batak },
{ 92, PT_SC, ucp_Bengali },
{ 100, PT_SC, ucp_Bhaiksuki },
{ 110, PT_SC, ucp_Bopomofo },
{ 119, PT_SC, ucp_Brahmi },
{ 126, PT_SC, ucp_Braille },
{ 134, PT_SC, ucp_Buginese },
{ 143, PT_SC, ucp_Buhid },
{ 149, PT_GC, ucp_C },
{ 151, PT_SC, ucp_Canadian_Aboriginal },
{ 171, PT_SC, ucp_Carian },
{ 178, PT_SC, ucp_Caucasian_Albanian },
{ 197, PT_PC, ucp_Cc },
{ 200, PT_PC, ucp_Cf },
{ 203, PT_SC, ucp_Chakma },
{ 210, PT_SC, ucp_Cham },
{ 215, PT_SC, ucp_Cherokee },
{ 224, PT_PC, ucp_Cn },
{ 227, PT_PC, ucp_Co },
{ 230, PT_SC, ucp_Common },
{ 237, PT_SC, ucp_Coptic },
{ 244, PT_PC, ucp_Cs },
{ 247, PT_SC, ucp_Cuneiform },
{ 257, PT_SC, ucp_Cypriot },
{ 265, PT_SC, ucp_Cyrillic },
{ 274, PT_SC, ucp_Deseret },
{ 282, PT_SC, ucp_Devanagari },
{ 293, PT_SC, ucp_Duployan },
{ 302, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 323, PT_SC, ucp_Elbasan },
{ 331, PT_SC, ucp_Ethiopic },
{ 340, PT_SC, ucp_Georgian },
{ 349, PT_SC, ucp_Glagolitic },
{ 360, PT_SC, ucp_Gothic },
{ 367, PT_SC, ucp_Grantha },
{ 375, PT_SC, ucp_Greek },
{ 381, PT_SC, ucp_Gujarati },
{ 390, PT_SC, ucp_Gurmukhi },
{ 399, PT_SC, ucp_Han },
{ 403, PT_SC, ucp_Hangul },
{ 410, PT_SC, ucp_Hanunoo },
{ 418, PT_SC, ucp_Hatran },
{ 425, PT_SC, ucp_Hebrew },
{ 432, PT_SC, ucp_Hiragana },
{ 441, PT_SC, ucp_Imperial_Aramaic },
{ 458, PT_SC, ucp_Inherited },
{ 468, PT_SC, ucp_Inscriptional_Pahlavi },
{ 490, PT_SC, ucp_Inscriptional_Parthian },
{ 513, PT_SC, ucp_Javanese },
{ 522, PT_SC, ucp_Kaithi },
{ 529, PT_SC, ucp_Kannada },
{ 537, PT_SC, ucp_Katakana },
{ 546, PT_SC, ucp_Kayah_Li },
{ 555, PT_SC, ucp_Kharoshthi },
{ 566, PT_SC, ucp_Khmer },
{ 572, PT_SC, ucp_Khojki },
{ 579, PT_SC, ucp_Khudawadi },
{ 589, PT_GC, ucp_L },
{ 591, PT_LAMP, 0 },
{ 594, PT_SC, ucp_Lao },
{ 598, PT_SC, ucp_Latin },
{ 604, PT_SC, ucp_Lepcha },
{ 611, PT_SC, ucp_Limbu },
{ 617, PT_SC, ucp_Linear_A },
{ 626, PT_SC, ucp_Linear_B },
{ 635, PT_SC, ucp_Lisu },
{ 640, PT_PC, ucp_Ll },
{ 643, PT_PC, ucp_Lm },
{ 646, PT_PC, ucp_Lo },
{ 649, PT_PC, ucp_Lt },
{ 652, PT_PC, ucp_Lu },
{ 655, PT_SC, ucp_Lycian },
{ 662, PT_SC, ucp_Lydian },
{ 669, PT_GC, ucp_M },
{ 671, PT_SC, ucp_Mahajani },
{ 680, PT_SC, ucp_Malayalam },
{ 690, PT_SC, ucp_Mandaic },
{ 698, PT_SC, ucp_Manichaean },
{ 709, PT_SC, ucp_Marchen },
{ 717, PT_SC, ucp_Masaram_Gondi },
{ 731, PT_PC, ucp_Mc },
{ 734, PT_PC, ucp_Me },
{ 737, PT_SC, ucp_Meetei_Mayek },
{ 750, PT_SC, ucp_Mende_Kikakui },
{ 764, PT_SC, ucp_Meroitic_Cursive },
{ 781, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 802, PT_SC, ucp_Miao },
{ 807, PT_PC, ucp_Mn },
{ 810, PT_SC, ucp_Modi },
{ 815, PT_SC, ucp_Mongolian },
{ 825, PT_SC, ucp_Mro },
{ 829, PT_SC, ucp_Multani },
{ 837, PT_SC, ucp_Myanmar },
{ 845, PT_GC, ucp_N },
{ 847, PT_SC, ucp_Nabataean },
{ 857, PT_PC, ucp_Nd },
{ 860, PT_SC, ucp_New_Tai_Lue },
{ 872, PT_SC, ucp_Newa },
{ 877, PT_SC, ucp_Nko },
{ 881, PT_PC, ucp_Nl },
{ 884, PT_PC, ucp_No },
{ 887, PT_SC, ucp_Nushu },
{ 893, PT_SC, ucp_Ogham },
{ 899, PT_SC, ucp_Ol_Chiki },
{ 908, PT_SC, ucp_Old_Hungarian },
{ 922, PT_SC, ucp_Old_Italic },
{ 933, PT_SC, ucp_Old_North_Arabian },
{ 951, PT_SC, ucp_Old_Permic },
{ 962, PT_SC, ucp_Old_Persian },
{ 974, PT_SC, ucp_Old_South_Arabian },
{ 992, PT_SC, ucp_Old_Turkic },
{ 1003, PT_SC, ucp_Oriya },
{ 1009, PT_SC, ucp_Osage },
{ 1015, PT_SC, ucp_Osmanya },
{ 1023, PT_GC, ucp_P },
{ 1025, PT_SC, ucp_Pahawh_Hmong },
{ 1038, PT_SC, ucp_Palmyrene },
{ 1048, PT_SC, ucp_Pau_Cin_Hau },
{ 1060, PT_PC, ucp_Pc },
{ 1063, PT_PC, ucp_Pd },
{ 1066, PT_PC, ucp_Pe },
{ 1069, PT_PC, ucp_Pf },
{ 1072, PT_SC, ucp_Phags_Pa },
{ 1081, PT_SC, ucp_Phoenician },
{ 1092, PT_PC, ucp_Pi },
{ 1095, PT_PC, ucp_Po },
{ 1098, PT_PC, ucp_Ps },
{ 1101, PT_SC, ucp_Psalter_Pahlavi },
{ 1117, PT_SC, ucp_Rejang },
{ 1124, PT_SC, ucp_Runic },
{ 1130, PT_GC, ucp_S },
{ 1132, PT_SC, ucp_Samaritan },
{ 1142, PT_SC, ucp_Saurashtra },
{ 1153, PT_PC, ucp_Sc },
{ 1156, PT_SC, ucp_Sharada },
{ 1164, PT_SC, ucp_Shavian },
{ 1172, PT_SC, ucp_Siddham },
{ 1180, PT_SC, ucp_SignWriting },
{ 1192, PT_SC, ucp_Sinhala },
{ 1200, PT_PC, ucp_Sk },
{ 1203, PT_PC, ucp_Sm },
{ 1206, PT_PC, ucp_So },
{ 1209, PT_SC, ucp_Sora_Sompeng },
{ 1222, PT_SC, ucp_Soyombo },
{ 1230, PT_SC, ucp_Sundanese },
{ 1240, PT_SC, ucp_Syloti_Nagri },
{ 1253, PT_SC, ucp_Syriac },
{ 1260, PT_SC, ucp_Tagalog },
{ 1268, PT_SC, ucp_Tagbanwa },
{ 1277, PT_SC, ucp_Tai_Le },
{ 1284, PT_SC, ucp_Tai_Tham },
{ 1293, PT_SC, ucp_Tai_Viet },
{ 1302, PT_SC, ucp_Takri },
{ 1308, PT_SC, ucp_Tamil },
{ 1314, PT_SC, ucp_Tangut },
{ 1321, PT_SC, ucp_Telugu },
{ 1328, PT_SC, ucp_Thaana },
{ 1335, PT_SC, ucp_Thai },
{ 1340, PT_SC, ucp_Tibetan },
{ 1348, PT_SC, ucp_Tifinagh },
{ 1357, PT_SC, ucp_Tirhuta },
{ 1365, PT_SC, ucp_Ugaritic },
{ 1374, PT_SC, ucp_Vai },
{ 1378, PT_SC, ucp_Warang_Citi },
{ 1390, PT_ALNUM, 0 },
{ 1394, PT_PXSPACE, 0 },
{ 1398, PT_SPACE, 0 },
{ 1402, PT_UCNC, 0 },
{ 1406, PT_WORD, 0 },
{ 1410, PT_SC, ucp_Yi },
{ 1413, PT_GC, ucp_Z },
{ 1415, PT_SC, ucp_Zanabazar_Square },
{ 1432, PT_PC, ucp_Zl },
{ 1435, PT_PC, ucp_Zp },
{ 1438, PT_PC, ucp_Zs }
};
const size_t PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
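
The first field of each `utt` entry appears to be the byte offset of that name within the concatenated `utt_names` string, where every name is followed by a NUL. That would explain why adding `Adlam` (five characters plus the terminator) at the front moves `Ahom` from 0 to 6 and shifts every later offset. The following is a minimal sketch of that bookkeeping, assuming plain ASCII strings rather than the `STR_` macros the real table uses; `name_offsets` is a hypothetical helper, not a PCRE2 function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill offsets[i] with the position at which names[i]
   would start in a table built by concatenating the names in order, each one
   followed by a single NUL byte. */
static void name_offsets(const char *const *names, size_t count,
  size_t *offsets)
{
  size_t pos = 0;
  assert(names != NULL && offsets != NULL);
  for (size_t i = 0; i < count; i++)
  {
    offsets[i] = pos;
    pos += strlen(names[i]) + 1;   /* +1 for the terminating NUL */
  }
}
```

For the names `Adlam`, `Ahom`, `Anatolian_Hieroglyphs`, `Any`, `Arabic` this yields 0, 6, 11, 33, 37, matching the first five entries of the new table. Because the offsets depend on every preceding name, the table has to be regenerated rather than patched by hand whenever a name is inserted.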

File diff suppressed because it is too large

@ -100,9 +100,7 @@ enum {
ucp_Zs /* Space separator */
};
/* These are grapheme break properties. Note that the code for processing them
assumes that the values are less than 16. If more values are added that take
the number to 16 or more, the code will have to be rewritten. */
/* These are grapheme break properties. */
enum {
ucp_gbCR, /* 0 */
@ -117,7 +115,12 @@ enum {
ucp_gbLV, /* 9 Hangul syllable type LV */
ucp_gbLVT, /* 10 Hangul syllable type LVT */
ucp_gbRegionalIndicator, /* 11 */
ucp_gbOther /* 12 */
ucp_gbOther, /* 12 */
ucp_gbE_Base, /* 13 */
ucp_gbE_Modifier, /* 14 */
ucp_gbE_Base_GAZ, /* 15 */
ucp_gbZWJ, /* 16 */
ucp_gbGlue_After_Zwj /* 17 */
};
/* These are the script identifications. */
@ -184,13 +187,13 @@ enum {
ucp_Tifinagh,
ucp_Ugaritic,
ucp_Yi,
/* New for Unicode 5.0: */
/* New for Unicode 5.0 */
ucp_Balinese,
ucp_Cuneiform,
ucp_Nko,
ucp_Phags_Pa,
ucp_Phoenician,
/* New for Unicode 5.1: */
/* New for Unicode 5.1 */
ucp_Carian,
ucp_Cham,
ucp_Kayah_Li,
@ -202,7 +205,7 @@ enum {
ucp_Saurashtra,
ucp_Sundanese,
ucp_Vai,
/* New for Unicode 5.2: */
/* New for Unicode 5.2 */
ucp_Avestan,
ucp_Bamum,
ucp_Egyptian_Hieroglyphs,
@ -218,11 +221,11 @@ enum {
ucp_Samaritan,
ucp_Tai_Tham,
ucp_Tai_Viet,
/* New for Unicode 6.0.0: */
/* New for Unicode 6.0.0 */
ucp_Batak,
ucp_Brahmi,
ucp_Mandaic,
/* New for Unicode 6.1.0: */
/* New for Unicode 6.1.0 */
ucp_Chakma,
ucp_Meroitic_Cursive,
ucp_Meroitic_Hieroglyphs,
@ -230,7 +233,7 @@ enum {
ucp_Sharada,
ucp_Sora_Sompeng,
ucp_Takri,
/* New for Unicode 7.0.0: */
/* New for Unicode 7.0.0 */
ucp_Bassa_Vah,
ucp_Caucasian_Albanian,
ucp_Duployan,
@ -254,13 +257,24 @@ enum {
ucp_Siddham,
ucp_Tirhuta,
ucp_Warang_Citi,
/* New for Unicode 8.0.0: */
/* New for Unicode 8.0.0 */
ucp_Ahom,
ucp_Anatolian_Hieroglyphs,
ucp_Hatran,
ucp_Multani,
ucp_Old_Hungarian,
ucp_SignWriting
ucp_SignWriting,
/* New for Unicode 10.0.0 (no update since 8.0.0, so this includes 9.0.0) */
ucp_Adlam,
ucp_Bhaiksuki,
ucp_Marchen,
ucp_Newa,
ucp_Osage,
ucp_Tangut,
ucp_Masaram_Gondi,
ucp_Nushu,
ucp_Soyombo,
ucp_Zanabazar_Square
};
#endif /* PCRE2_UCP_H_IDEMPOTENT_GUARD */
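
The grapheme break enumeration now runs to 17, so the old note that the processing code assumed values below 16 no longer applies. As I understand the table in pcre2_tables.c, each row is a bitmap indexed by the left-hand property, and each bit is indexed by the right-hand property; that scheme keeps working as long as a row is at least 18 bits wide. The following is a minimal sketch of such a lookup. `GB_COUNT` and `gb_may_follow` are hypothetical names, and the reading of a set bit as "no break between these two" is my assumption about the table's meaning, not something stated in this diff.

```c
#include <assert.h>
#include <stdint.h>

#define GB_COUNT 18u   /* assumed: property values 0..17, as in the enum */

/* Hypothetical lookup: non-zero if the bit for the right-hand property is set
   in the row selected by the left-hand property. Rows are 32 bits wide, so
   bit 17 is representable; it would be lost in a 16-bit row. */
static int gb_may_follow(const uint32_t table[GB_COUNT], unsigned left,
  unsigned right)
{
  assert(left < GB_COUNT && right < GB_COUNT);
  return (table[left] & (UINT32_C(1) << right)) != 0;
}
```

Using `UINT32_C(1)` instead of a plain `1` keeps the shift unsigned and 32 bits wide even on a platform where `int` is narrower.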


@ -473,6 +473,12 @@ so many of them that they are split into two fields. */
#define CTL_UTF8_INPUT 0x40000000u
#define CTL_ZERO_TERMINATE 0x80000000u
/* Combinations */
#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */
#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
#define CTL_ANYGLOB (CTL_ALTGLOBAL|CTL_GLOBAL)
/* Second control word */
#define CTL2_SUBSTITUTE_EXTENDED 0x00000001u
@ -480,15 +486,10 @@ so many of them that they are split into two fields. */
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000004u
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
#define CTL2_SUBJECT_LITERAL 0x00000010u
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
#define CTL_NL_SET 0x40000000u /* Informational */
#define CTL_BSR_SET 0x80000000u /* Informational */
/* Combinations */
#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */
#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
#define CTL_ANYGLOB (CTL_ALTGLOBAL|CTL_GLOBAL)
#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */
/* These are the matching controls that may be set either on a pattern or on a
data line. They are copied from the pattern controls as initial settings for
@ -601,6 +602,7 @@ static modstruct modlist[] = {
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
{ "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
{ "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
{ "convert", MOD_PAT, MOD_CON, 0, PO(convert_type) },
@ -723,7 +725,7 @@ static modstruct modlist[] = {
CTL_JITVERIFY|CTL_MEMORY|CTL_FRAMESIZE|CTL_PUSH|CTL_PUSHCOPY| \
CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL_BSR_SET|CTL_NL_SET)
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_NL_SET)
/* Controls that apply only at compile time with 'push'. */
@ -3688,8 +3690,8 @@ for (;;)
#else
*((uint16_t *)field) = PCRE2_BSR_UNICODE;
#endif
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_BSR_SET;
else dctl->control2 &= ~CTL_BSR_SET;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_BSR_SET;
else dctl->control2 &= ~CTL2_BSR_SET;
}
else
{
@ -3698,8 +3700,8 @@ for (;;)
else if (len == 7 && strncmpic(pp, (const uint8_t *)"unicode", 7) == 0)
*((uint16_t *)field) = PCRE2_BSR_UNICODE;
else goto INVALID_VALUE;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_BSR_SET;
else dctl->control2 |= CTL_BSR_SET;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_BSR_SET;
else dctl->control2 |= CTL2_BSR_SET;
}
pp = ep;
break;
@ -3792,14 +3794,14 @@ for (;;)
if (i == 0)
{
*((uint16_t *)field) = NEWLINE_DEFAULT;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_NL_SET;
else dctl->control2 &= ~CTL_NL_SET;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_NL_SET;
else dctl->control2 &= ~CTL2_NL_SET;
}
else
{
*((uint16_t *)field) = i;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_NL_SET;
else dctl->control2 |= CTL_NL_SET;
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_NL_SET;
else dctl->control2 |= CTL2_NL_SET;
}
pp = ep;
break;
@ -3971,7 +3973,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -3979,10 +3981,11 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_ALLUSEDTEXT) != 0)? " allusedtext" : "",
((controls & CTL_ALTGLOBAL) != 0)? " altglobal" : "",
((controls & CTL_BINCODE) != 0)? " bincode" : "",
((controls2 & CTL_BSR_SET) != 0)? " bsr" : "",
((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",
((controls & CTL_DFA) != 0)? " dfa" : "",
((controls & CTL_EXPAND) != 0)? " expand" : "",
((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "",
@ -3996,7 +3999,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_JITVERIFY) != 0)? " jitverify" : "",
((controls & CTL_MARK) != 0)? " mark" : "",
((controls & CTL_MEMORY) != 0)? " memory" : "",
((controls2 & CTL_NL_SET) != 0)? " newline" : "",
((controls2 & CTL2_NL_SET) != 0)? " newline" : "",
((controls & CTL_NULLCONTEXT) != 0)? " null_context" : "",
((controls & CTL_POSIX) != 0)? " posix" : "",
((controls & CTL_POSIX_NOSUB) != 0)? " posix_nosub" : "",
@ -4435,7 +4438,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
if (jchanged) fprintf(outfile, "Duplicate name status changes\n");
if ((pat_patctl.control2 & CTL_BSR_SET) != 0 ||
if ((pat_patctl.control2 & CTL2_BSR_SET) != 0 ||
(FLD(compiled_code, flags) & PCRE2_BSR_SET) != 0)
fprintf(outfile, "\\R matches %s\n", (bsr_convention == PCRE2_BSR_UNICODE)?
"any Unicode newline" : "CR, LF, or CRLF");
@ -5268,7 +5271,7 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0) cflags |= REG_NOSUB;
if ((pat_patctl.options & PCRE2_UCP) != 0) cflags |= REG_UCP;
if ((pat_patctl.options & PCRE2_CASELESS) != 0) cflags |= REG_ICASE;
if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC;
if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC;
if ((pat_patctl.options & PCRE2_MULTILINE) != 0) cflags |= REG_NEWLINE;
if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;
@ -5276,8 +5279,8 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0)
{
preg.re_endp = (char *)pbuffer8 + patlen;
cflags |= REG_PEND;
}
cflags |= REG_PEND;
}
rc = regcomp(&preg, (char *)pbuffer8, cflags);
@ -5530,7 +5533,7 @@ if (test_mode == PCRE32_MODE && pbuffer32 != NULL)
appropriate default newline setting, local_newline_default will be non-zero. We
use this if there is no explicit newline modifier. */
if ((pat_patctl.control2 & CTL_NL_SET) == 0 && local_newline_default != 0)
if ((pat_patctl.control2 & CTL2_NL_SET) == 0 && local_newline_default != 0)
{
SETFLD(pat_context, newline_convention, local_newline_default);
}
@ -5540,11 +5543,11 @@ NULL context. */
use_pat_context = ((pat_patctl.control & CTL_NULLCONTEXT) != 0)?
NULL : PTR(pat_context);
/* If PCRE2_LITERAL is set, set use_forbid_utf zero because PCRE2_NEVER_UTF
and PCRE2_NEVER_UCP are invalid with it. */
if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0;
if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0;
/* Compile many times when timing. */
@ -5556,7 +5559,7 @@ if (timeit > 0)
{
clock_t start_time = clock();
PCRE2_COMPILE(compiled_code, pbuffer, patlen,
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset,
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset,
use_pat_context);
time_taken += clock() - start_time;
if (TEST(compiled_code, !=, NULL))
@ -5665,7 +5668,7 @@ if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
/* If an explicit newline modifier was given, set the information flag in the
pattern so that it is preserved over push/pop. */
if ((pat_patctl.control2 & CTL_NL_SET) != 0)
if ((pat_patctl.control2 & CTL2_NL_SET) != 0)
{
SETFLD(compiled_code, flags, FLD(compiled_code, flags) | PCRE2_NL_SET);
}
@ -5822,11 +5825,11 @@ return capcount;
*************************************************/
/* Called from a PCRE2 library as a result of the (?C) item. We print out where
we are in the match. Yield zero unless more callouts than the fail count, or
the callout data is not zero. The only differences in the callout block for
different code unit widths are that the pointers to the subject, the most
recent MARK, and a callout argument string point to strings of the appropriate
width. Casts can be used to deal with this.
we are in the match (unless suppressed). Yield zero unless more callouts than
the fail count, or the callout data is not zero. The only differences in the
callout block for different code unit widths are that the pointers to the
subject, the most recent MARK, and a callout argument string point to strings
of the appropriate width. Casts can be used to deal with this.
Argument: a pointer to a callout block
Return:
@ -5839,6 +5842,7 @@ uint32_t i, pre_start, post_start, subject_length;
PCRE2_SIZE current_position;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;
/* This FILE is used for echoing the subject. This is done only once in simple
cases. */
@ -5887,75 +5891,82 @@ if (callout_capture)
}
}
/* Re-print the subject in canonical form (with escapes for non-printing
characters), the first time, or if giving full details. On subsequent calls in
the same match, we use PCHARS() just to find the printed lengths of the
substrings. */
/* Unless suppressed, re-print the subject in canonical form (with escapes for
non-printing characters), the first time, or if giving full details. On
subsequent calls in the same match, we use PCHARS() just to find the printed
lengths of the substrings. */
if (f != NULL) fprintf(f, "--->");
/* The subject before the match start. */
PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f);
/* If a lookbehind is involved, the current position may be earlier than the
match start. If so, use the match start instead. */
current_position = (cb->current_position >= cb->start_match)?
cb->current_position : cb->start_match;
/* The subject between the match start and the current position. */
PCHARS(post_start, cb->subject, cb->start_match,
current_position - cb->start_match, utf, f);
/* Print from the current position to the end. */
PCHARSV(cb->subject, current_position, cb->subject_length - current_position,
utf, f);
/* Calculate the total subject printed length (no print). */
PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL);
if (f != NULL) fprintf(f, "\n");
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
callout whose number has not already been shown with captured strings, show the
number here. A callout with a string argument has been displayed above. */
if (cb->callout_number == 255)
if (callout_where)
{
fprintf(outfile, "%+3d ", (int)cb->pattern_position);
if (cb->pattern_position > 99) fprintf(outfile, "\n ");
}
else
{
if (callout_capture || cb->callout_string != NULL) fprintf(outfile, " ");
else fprintf(outfile, "%3d ", cb->callout_number);
}
/* Now show position indicators */
if (f != NULL) fprintf(f, "--->");
for (i = 0; i < pre_start; i++) fprintf(outfile, " ");
fprintf(outfile, "^");
/* The subject before the match start. */
if (post_start > 0)
{
for (i = 0; i < post_start - 1; i++) fprintf(outfile, " ");
PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f);
/* If a lookbehind is involved, the current position may be earlier than the
match start. If so, use the match start instead. */
current_position = (cb->current_position >= cb->start_match)?
cb->current_position : cb->start_match;
/* The subject between the match start and the current position. */
PCHARS(post_start, cb->subject, cb->start_match,
current_position - cb->start_match, utf, f);
/* Print from the current position to the end. */
PCHARSV(cb->subject, current_position, cb->subject_length - current_position,
utf, f);
/* Calculate the total subject printed length (no print). */
PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL);
if (f != NULL) fprintf(f, "\n");
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
callout whose number has not already been shown with captured strings, show the
number here. A callout with a string argument has been displayed above. */
if (cb->callout_number == 255)
{
fprintf(outfile, "%+3d ", (int)cb->pattern_position);
if (cb->pattern_position > 99) fprintf(outfile, "\n ");
}
else
{
if (callout_capture || cb->callout_string != NULL) fprintf(outfile, " ");
else fprintf(outfile, "%3d ", cb->callout_number);
}
/* Now show position indicators */
for (i = 0; i < pre_start; i++) fprintf(outfile, " ");
fprintf(outfile, "^");
if (post_start > 0)
{
for (i = 0; i < post_start - 1; i++) fprintf(outfile, " ");
fprintf(outfile, "^");
}
for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
fprintf(outfile, " ");
if (cb->next_item_length != 0)
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
pbuffer8 + cb->pattern_position);
fprintf(outfile, "\n");
}
for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
fprintf(outfile, " ");
if (cb->next_item_length != 0)
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
pbuffer8 + cb->pattern_position);
fprintf(outfile, "\n");
first_callout = FALSE;
/* Show any mark info */
if (cb->mark != last_callout_mark)
{
if (cb->mark == NULL)
@ -5969,6 +5980,8 @@ if (cb->mark != last_callout_mark)
last_callout_mark = cb->mark;
}
/* Show callout data */
if (callout_data_ptr != NULL)
{
int callout_data = *((int32_t *)callout_data_ptr);
@ -5979,6 +5992,8 @@ if (callout_data_ptr != NULL)
}
}
/* Keep count and give the appropriate return code */
callout_count++;
if (cb->callout_number == dat_datctl.cerror[0] &&
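
Reviewer note on the hunk above: the new `callout_where` flag wraps the subject echo and the position-indicator output, so that `callout_no_where` leaves only the capture, mark, and callout-data lines. A minimal sketch of that gating follows. It is not the pcre2test code: the names `DEMO_CTL2_CALLOUT_NO_WHERE` and `demo_position_line` are hypothetical, the bit value is made up, and the line is built in a buffer instead of being written to `outfile`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit value for illustration; the real CTL2_CALLOUT_NO_WHERE
   is defined in pcre2test.c and may differ. */
#define DEMO_CTL2_CALLOUT_NO_WHERE 0x00000001u

/* Build the "where" indicator line: pre_start spaces and a '^' for the match
   start, then, if post_start > 0, post_start - 1 spaces and a second '^' for
   the current position. Returns the number of characters written, or 0 when
   the no-where bit is set in control2 (nothing is produced in that case). */
static size_t demo_position_line(uint32_t control2, uint32_t pre_start,
  uint32_t post_start, char *buf, size_t size)
{
size_t n = 0;
uint32_t i;

assert(buf != NULL && size > 0);
buf[0] = '\0';

/* Same sense as the diff: "where" output happens only if the bit is clear. */
if ((control2 & DEMO_CTL2_CALLOUT_NO_WHERE) != 0) return 0;

for (i = 0; i < pre_start && n + 1 < size; i++) buf[n++] = ' ';
if (n + 1 < size) buf[n++] = '^';

if (post_start > 0)
  {
  for (i = 0; i < post_start - 1 && n + 1 < size; i++) buf[n++] = ' ';
  if (n + 1 < size) buf[n++] = '^';
  }

buf[n] = '\0';
return n;
}
```

With the bit clear, `pre_start = 2` and `post_start = 3` give two spaces, a caret, two more spaces, and a second caret; with the bit set the buffer stays empty, which is the behaviour the new test in testinput5 relies on to keep its expected output short.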

30
testdata/testinput5 vendored

@ -6,14 +6,16 @@
#newline_default lf any anycrlf
# PCRE2 and Perl disagree about the characteristics of certain Unicode
# characters. For example, 061C is considered by Perl to be Arabic, though
# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are
# graphic and printable according to Perl, though they are actually "isolate"
# control characters. That is why the following tests are here rather than in
# test 4.
# characters. For example, 061C was considered by Perl to be Arabic, though
# it was not listed as such in the Unicode Scripts.txt file for Unicode 8.
# However, it *is* in that file for Unicode 10, but when I came to re-check,
# Perl had changed in the meantime, with 5.026 not recognizing it as Arabic.
# 2066-2069 are graphic and printable according to Perl, though they are
# actually "isolate" control characters. That is why the following tests are
# here rather than in test 4.
/^[\p{Arabic}]/utf
\= Expect no match
\x{061c}
/^[[:graph:]]+$/utf,ucp
@ -2022,5 +2024,21 @@
/Aሴ+B/literal,utf,no_utf_check
Aሴ+B
# These are here because I upgraded to Unicode 10.0.0 before Perl did, so it
# doesn't recognize all these scripts. In time these three tests can be moved
# to test 4.
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}
/^\x{1E900}\x{104B0}/i,utf
\x{1E900}\x{104B0}
\x{1E922}\x{104D8}
/^(?:(\X)(?C))+$/utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
# End of testinput5

95
testdata/testoutput5 vendored

@ -6,16 +6,18 @@
#newline_default lf any anycrlf
# PCRE2 and Perl disagree about the characteristics of certain Unicode
# characters. For example, 061C is considered by Perl to be Arabic, though
# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are
# graphic and printable according to Perl, though they are actually "isolate"
# control characters. That is why the following tests are here rather than in
# test 4.
# characters. For example, 061C was considered by Perl to be Arabic, though
# it was not listed as such in the Unicode Scripts.txt file for Unicode 8.
# However, it *is* in that file for Unicode 10, but when I came to re-check,
# Perl had changed in the meantime, with 5.026 not recognizing it as Arabic.
# 2066-2069 are graphic and printable according to Perl, though they are
# actually "isolate" control characters. That is why the following tests are
# here rather than in test 4.
/^[\p{Arabic}]/utf
\= Expect no match
\x{061c}
No match
0: \x{61c}
/^[[:graph:]]+$/utf,ucp
\= Expect no match
@ -4585,5 +4587,84 @@ No match
/Aሴ+B/literal,utf,no_utf_check
Aሴ+B
0: A\x{1234}+B
# These are here because I upgraded to Unicode 10.0.0 before Perl did, so it
# doesn't recognize all these scripts. In time these three tests can be moved
# to test 4.
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}
0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
1: \x{1e900}\x{1e924}\x{1e953}
2: \x{11c00}\x{11c2d}\x{11c3e}
3: \x{11c70}\x{11c77}\x{11cab}
4: \x{11400}\x{1142f}\x{11455}
5: \x{104b0}\x{104d8}\x{104fb}
6: \x{16fe0}\x{18800}\x{18af2}
7: \x{11d00}\x{11d3a}\x{11d59}
8: \x{16fe1}\x{1b170}\x{1b2fb}
9: \x{11a50}\x{11a58}\x{11aa2}
10: \x{11a00}\x{11a07}\x{11a47}
/^\x{1E900}\x{104B0}/i,utf
\x{1E900}\x{104B0}
0: \x{1e900}\x{104b0}
\x{1E922}\x{104D8}
0: \x{1e922}\x{104d8}
/^(?:(\X)(?C))+$/utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
Callout 0: last capture = 1
1: \x{1e900}
Callout 0: last capture = 1
1: \x{1e924}
Callout 0: last capture = 1
1: \x{1e953}
Callout 0: last capture = 1
1: \x{11c00}
Callout 0: last capture = 1
1: \x{11c2d}\x{11c3e}
Callout 0: last capture = 1
1: \x{11c70}
Callout 0: last capture = 1
1: \x{11c77}\x{11cab}
Callout 0: last capture = 1
1: \x{11400}
Callout 0: last capture = 1
1: \x{1142f}
Callout 0: last capture = 1
1: \x{11455}
Callout 0: last capture = 1
1: \x{104b0}
Callout 0: last capture = 1
1: \x{104d8}
Callout 0: last capture = 1
1: \x{104fb}
Callout 0: last capture = 1
1: \x{16fe0}
Callout 0: last capture = 1
1: \x{18800}
Callout 0: last capture = 1
1: \x{18af2}
Callout 0: last capture = 1
1: \x{11d00}\x{11d3a}
Callout 0: last capture = 1
1: \x{11d59}
Callout 0: last capture = 1
1: \x{16fe1}
Callout 0: last capture = 1
1: \x{1b170}
Callout 0: last capture = 1
1: \x{1b2fb}
Callout 0: last capture = 1
1: \x{11a50}\x{11a58}
Callout 0: last capture = 1
1: \x{11aa2}
Callout 0: last capture = 1
1: \x{11a00}\x{11a07}\x{11a47}
0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
1: \x{11a00}\x{11a07}\x{11a47}
# End of testinput5
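
Reviewer note on the caseless test above (`/^\x{1E900}\x{104B0}/i,utf`): it passes because Adlam and Osage both have case pairs, so `\x{1E922}\x{104D8}` matches as the lower-case forms. The sketch below shows that mapping with plain range arithmetic. It is illustrative only: PCRE2 looks case pairs up in its generated Unicode tables, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the other-case code point for Adlam and Osage letters, or the input
   unchanged for anything else. Illustration only, not how PCRE2 does it. */
static uint32_t demo_other_case(uint32_t c)
{
if (c >= 0x1E900 && c <= 0x1E921) return c + 0x22;  /* Adlam capital to small */
if (c >= 0x1E922 && c <= 0x1E943) return c - 0x22;  /* Adlam small to capital */
if (c >= 0x104B0 && c <= 0x104D3) return c + 0x28;  /* Osage capital to small */
if (c >= 0x104D8 && c <= 0x104FB) return c - 0x28;  /* Osage small to capital */
return c;
}

/* Caseless comparison of two code points under the mapping above. */
static int demo_caseless_equal(uint32_t a, uint32_t b)
{
return a == b || demo_other_case(a) == b;
}

/* Mirrors the two subject lines of the caseless test. */
static void demo_self_check(void)
{
assert(demo_caseless_equal(0x1E900, 0x1E900));
assert(demo_caseless_equal(0x104B0, 0x104B0));
assert(demo_caseless_equal(0x1E900, 0x1E922));
assert(demo_caseless_equal(0x104B0, 0x104D8));
}
```

The offsets are 0x22 for Adlam and 0x28 for Osage, which is why the second subject line in the test uses `\x{1E922}` and `\x{104D8}`.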