Update to Unicode 10.0.0 and add callout_no_where to pcre2test to aid testing.
parent b7d5cee61f
commit 41bb787fb3
|
@ -209,6 +209,9 @@ much faster.
|
|||
because this can give a fast "no match" without searching for a "required code
|
||||
unit". Previously only non-anchored patterns did this.
|
||||
|
||||
47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
|
||||
|
||||
48. Add the callout_no_where modifier to pcre2test.
|
||||
|
||||
|
||||
Version 10.23 14-February-2017
|
||||
|
|
|
@ -171,7 +171,10 @@ library. They are also documented in the pcre2build man page.
|
|||
give large performance improvements on certain platforms, add --enable-jit to
|
||||
the "configure" command. This support is available only for certain hardware
|
||||
architectures. If you try to enable it on an unsupported architecture, there
|
||||
will be a compile time error.
|
||||
will be a compile time error. If you are running under SELinux you may also
|
||||
want to add --enable-jit-sealloc, which enables the use of an execmem
|
||||
allocator in JIT that is compatible with SELinux. This has no effect if JIT
|
||||
is not enabled.
|
||||
|
||||
. If you do not want to make use of the default support for UTF-8 Unicode
|
||||
character strings in the 8-bit library, UTF-16 Unicode character strings in
|
||||
|
@ -874,4 +877,4 @@ The distribution should contain the files listed below.
|
|||
Philip Hazel
|
||||
Email local part: ph10
|
||||
Email domain: cam.ac.uk
|
||||
Last updated: 11 April 2017
|
||||
Last updated: 17 June 2017
|
||||
|
|
|
@ -170,8 +170,13 @@ Just-in-time (JIT) compiler support is included in the build by specifying
|
|||
--enable-jit
|
||||
</pre>
|
||||
This support is available only for certain hardware architectures. If this
|
||||
option is set for an unsupported architecture, a building error occurs.
|
||||
See the
|
||||
option is set for an unsupported architecture, a building error occurs. If you
|
||||
are running under SELinux you may also want to add
|
||||
<pre>
|
||||
--enable-jit-sealloc
|
||||
</pre>
|
||||
which enables the use of an execmem allocator in JIT that is compatible with
|
||||
SELinux. This has no effect if JIT is not enabled. See the
|
||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||
documentation for a discussion of JIT usage. When JIT support is enabled,
|
||||
pcre2grep automatically makes use of it, unless you add
|
||||
|
@ -516,7 +521,7 @@ contains a single function called LLVMFuzzerTestOneInput() whose arguments are
|
|||
a pointer to a string and the length of the string. When called, this function
|
||||
tries to compile the string as a pattern, and if that succeeds, to match it.
|
||||
This is done both with no options and with some random options bits that are
|
||||
generated from the string.
|
||||
generated from the string.
|
||||
</P>
|
||||
<P>
|
||||
Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
|
||||
|
@ -529,13 +534,13 @@ file are the test string.
|
|||
</P>
|
||||
<br><a name="SEC22" href="#TOC1">OBSOLETE OPTION</a><br>
|
||||
<P>
|
||||
In versions of PCRE2 prior to 10.30, there were two ways of handling
|
||||
backtracking in the <b>pcre2_match()</b> function. The default was to use the
|
||||
In versions of PCRE2 prior to 10.30, there were two ways of handling
|
||||
backtracking in the <b>pcre2_match()</b> function. The default was to use the
|
||||
system stack, but if
|
||||
<pre>
|
||||
--disable-stack-for-recursion
|
||||
</pre>
|
||||
was set, memory on the heap was used. From release 10.30 onwards this has
|
||||
was set, memory on the heap was used. From release 10.30 onwards this has
|
||||
changed (the stack is no longer used) and this option now does nothing except
|
||||
give a warning.
|
||||
</P>
|
||||
|
@ -554,7 +559,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 30 May 2017
|
||||
Last updated: 17 June 2017
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -755,6 +755,7 @@ Those that are not part of an identified script are lumped together as
|
|||
"Common". The current list of scripts is:
|
||||
</P>
|
||||
<P>
|
||||
Adlam,
|
||||
Ahom,
|
||||
Anatolian_Hieroglyphs,
|
||||
Arabic,
|
||||
|
@ -765,6 +766,7 @@ Bamum,
|
|||
Bassa_Vah,
|
||||
Batak,
|
||||
Bengali,
|
||||
Bhaiksuki,
|
||||
Bopomofo,
|
||||
Brahmi,
|
||||
Braille,
|
||||
|
@ -826,6 +828,8 @@ Mahajani,
|
|||
Malayalam,
|
||||
Mandaic,
|
||||
Manichaean,
|
||||
Marchen,
|
||||
Masaram_Gondi,
|
||||
Meetei_Mayek,
|
||||
Mende_Kikakui,
|
||||
Meroitic_Cursive,
|
||||
|
@ -838,7 +842,9 @@ Multani,
|
|||
Myanmar,
|
||||
Nabataean,
|
||||
New_Tai_Lue,
|
||||
Newa,
|
||||
Nko,
|
||||
Nushu,
|
||||
Ogham,
|
||||
Ol_Chiki,
|
||||
Old_Hungarian,
|
||||
|
@ -849,6 +855,7 @@ Old_Persian,
|
|||
Old_South_Arabian,
|
||||
Old_Turkic,
|
||||
Oriya,
|
||||
Osage,
|
||||
Osmanya,
|
||||
Pahawh_Hmong,
|
||||
Palmyrene,
|
||||
|
@ -866,6 +873,7 @@ Siddham,
|
|||
SignWriting,
|
||||
Sinhala,
|
||||
Sora_Sompeng,
|
||||
Soyombo,
|
||||
Sundanese,
|
||||
Syloti_Nagri,
|
||||
Syriac,
|
||||
|
@ -876,6 +884,7 @@ Tai_Tham,
|
|||
Tai_Viet,
|
||||
Takri,
|
||||
Tamil,
|
||||
Tangut,
|
||||
Telugu,
|
||||
Thaana,
|
||||
Thai,
|
||||
|
@ -885,7 +894,8 @@ Tirhuta,
|
|||
Ugaritic,
|
||||
Vai,
|
||||
Warang_Citi,
|
||||
Yi.
|
||||
Yi,
|
||||
Zanabazar_Square.
|
||||
</P>
|
||||
<P>
|
||||
Each character has exactly one Unicode general category property, specified by
|
||||
|
@ -3445,7 +3455,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 30 May 2017
|
||||
Last updated: 02 July 2017
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -568,7 +568,7 @@ Setting compilation options
|
|||
</b><br>
|
||||
<P>
|
||||
The following modifiers set options for <b>pcre2_compile()</b>. Most of them set
|
||||
bits in the options argument of that function, but those whose names start with
|
||||
bits in the options argument of that function, but those whose names start with
|
||||
PCRE2_EXTRA are additional options that are set in the compile context. For the
|
||||
main options, there are some single-letter abbreviations that are the same as
|
||||
Perl options. There is special handling for /x: if a second x is present,
|
||||
|
@ -579,25 +579,25 @@ way <b>pcre2_compile()</b> behaves. See
|
|||
for a description of the effects of these options.
|
||||
<pre>
|
||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
alt_bsux set PCRE2_ALT_BSUX
|
||||
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
||||
alt_verbnames set PCRE2_ALT_VERBNAMES
|
||||
anchored set PCRE2_ANCHORED
|
||||
auto_callout set PCRE2_AUTO_CALLOUT
|
||||
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
|
||||
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
|
||||
/i caseless set PCRE2_CASELESS
|
||||
dollar_endonly set PCRE2_DOLLAR_ENDONLY
|
||||
/s dotall set PCRE2_DOTALL
|
||||
dupnames set PCRE2_DUPNAMES
|
||||
endanchored set PCRE2_ENDANCHORED
|
||||
/x extended set PCRE2_EXTENDED
|
||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||
firstline set PCRE2_FIRSTLINE
|
||||
literal set PCRE2_LITERAL
|
||||
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||
literal set PCRE2_LITERAL
|
||||
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
||||
match_word set PCRE2_EXTRA_MATCH_WORD
|
||||
match_word set PCRE2_EXTRA_MATCH_WORD
|
||||
/m multiline set PCRE2_MULTILINE
|
||||
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
|
||||
never_ucp set PCRE2_NEVER_UCP
|
||||
|
@ -631,7 +631,7 @@ heavily used in the test files.
|
|||
/B bincode show binary code without lengths
|
||||
callout_info show callout information
|
||||
debug same as info,fullbincode
|
||||
framesize show matching frame size
|
||||
framesize show matching frame size
|
||||
fullbincode show binary code with lengths
|
||||
/I info show info about compiled pattern
|
||||
hex unquoted characters are hexadecimal
|
||||
|
@ -649,7 +649,7 @@ heavily used in the test files.
|
|||
push push compiled pattern onto the stack
|
||||
pushcopy push a copy onto the stack
|
||||
stackguard=<number> test the stackguard feature
|
||||
subject_literal treat all subject lines as literal
|
||||
subject_literal treat all subject lines as literal
|
||||
tables=[0|1|2] select internal tables
|
||||
use_length do not zero-terminate the pattern
|
||||
utf8_input treat input as UTF-8
|
||||
|
@ -720,7 +720,7 @@ not necessarily the last character. These lines are omitted if no starting or
|
|||
ending code units are recorded.
|
||||
</P>
|
||||
<P>
|
||||
The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
|
||||
The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
|
||||
used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
|
||||
number of capturing parentheses in the pattern.
|
||||
</P>
|
||||
|
@ -972,8 +972,8 @@ below. All other modifiers are either ignored, with a warning message, or cause
|
|||
an error.
|
||||
</P>
|
||||
<P>
|
||||
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
|
||||
default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
|
||||
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
|
||||
default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
|
||||
REG_PEND extension is used to pass it by length.
|
||||
</P>
|
||||
<br><b>
|
||||
|
@ -1013,7 +1013,7 @@ are mutually exclusive.
|
|||
Setting certain match controls
|
||||
</b><br>
|
||||
<P>
|
||||
The following modifiers are really subject modifiers, and are described under
|
||||
The following modifiers are really subject modifiers, and are described under
|
||||
"Subject Modifiers" below. However, they may be included in a pattern's
|
||||
modifier list, in which case they are applied to every subject line that is
|
||||
processed with that pattern. They may not appear in <b>#pattern</b> commands.
|
||||
|
@ -1040,9 +1040,9 @@ defaults, set them in a <b>#subject</b> command.
|
|||
Specifying literal subject lines
|
||||
</b><br>
|
||||
<P>
|
||||
If the <b>subject_literal</b> modifier is present on a pattern, all the subject
|
||||
lines that it matches are taken as literal strings, with no interpretation of
|
||||
backslashes. It is not possible to set subject modifiers on such lines, but any
|
||||
If the <b>subject_literal</b> modifier is present on a pattern, all the subject
|
||||
lines that it matches are taken as literal strings, with no interpretation of
|
||||
backslashes. It is not possible to set subject modifiers on such lines, but any
|
||||
that are set as defaults by a <b>#subject</b> command are recognized.
|
||||
</P>
|
||||
<br><b>
|
||||
|
@ -1054,7 +1054,8 @@ pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
|
|||
line to contain a new pattern (or a command) instead of a subject line. This
|
||||
facility is used when saving compiled patterns to a file, as described in the
|
||||
section entitled "Saving and restoring compiled patterns"
|
||||
<a href="#saverestore">below. If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled</a>
|
||||
<a href="#saverestore">below.</a>
|
||||
If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
|
||||
pattern is stacked, leaving the original as current, ready to match the
|
||||
following input lines. This provides a way of testing the
|
||||
<b>pcre2_code_copy()</b> function.
|
||||
|
@ -1103,18 +1104,18 @@ causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
|
|||
<b>regexec()</b>. The other modifiers are ignored, with a warning message.
|
||||
</P>
|
||||
<P>
|
||||
There is one additional modifier that can be used with the POSIX wrapper. It is
|
||||
There is one additional modifier that can be used with the POSIX wrapper. It is
|
||||
ignored (with a warning) if used for non-POSIX matching.
|
||||
<pre>
|
||||
posix_startend=<n>[:<m>]
|
||||
posix_startend=<n>[:<m>]
|
||||
</pre>
|
||||
This causes the subject string to be passed to <b>regexec()</b> using the
|
||||
REG_STARTEND option, which uses offsets to specify which part of the string is
|
||||
searched. If only one number is given, the end offset is passed as the end of
|
||||
the subject string. For more detail of REG_STARTEND, see the
|
||||
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||
documentation. If the subject string contains binary zeros (coded as escapes
|
||||
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
|
||||
documentation. If the subject string contains binary zeros (coded as escapes
|
||||
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
|
||||
its input), you must use <b>posix_startend</b> to specify its length.
|
||||
</P>
|
||||
<br><b>
|
||||
|
@ -1135,6 +1136,7 @@ pattern.
|
|||
callout_data=<n> set a value to pass via callouts
|
||||
callout_error=<n>[:<m>] control callout error
|
||||
callout_fail=<n>[:<m>] control callout failure
|
||||
callout_no_where do not show position of a callout
|
||||
callout_none do not supply a callout function
|
||||
copy=<number or name> copy captured substring
|
||||
depth_limit=<n> set a depth limit
|
||||
|
@ -1230,29 +1232,10 @@ Testing callouts
|
|||
</b><br>
|
||||
<P>
|
||||
A callout function is supplied when <b>pcre2test</b> calls the library matching
|
||||
functions, unless <b>callout_none</b> is specified. If <b>callout_capture</b> is
|
||||
set, the current captured groups are output when a callout occurs. The default
|
||||
return from the callout function is zero, which allows matching to continue.
|
||||
</P>
|
||||
<P>
|
||||
The <b>callout_fail</b> modifier can be given one or two numbers. If there is
|
||||
only one number, 1 is returned instead of 0 (causing matching to backtrack)
|
||||
when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1
|
||||
is returned when callout <n> is reached and there have been at least <m>
|
||||
callouts. The <b>callout_error</b> modifier is similar, except that
|
||||
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
|
||||
aborted. If both these modifiers are set for the same callout number,
|
||||
<b>callout_error</b> takes precedence.
|
||||
</P>
|
||||
<P>
|
||||
Note that callouts with string arguments are always given the number zero. See
|
||||
"Callouts" below for a description of the output when a callout is taken.
|
||||
</P>
|
||||
<P>
|
||||
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
|
||||
This is set as the "user data" that is passed to the matching function, and
|
||||
passed back when the callout function is invoked. Any value other than zero is
|
||||
used as a return from <b>pcre2test</b>'s callout function.
|
||||
functions, unless <b>callout_none</b> is specified. Its behaviour can be
|
||||
controlled by various modifiers listed above whose names begin with
|
||||
<b>callout_</b>. Details are given in the section entitled "Callouts"
|
||||
<a href="#callouts">below.</a>
|
||||
</P>
|
||||
<br><b>
|
||||
Finding all matches in a string
|
||||
|
@ -1384,7 +1367,7 @@ that is used by the just-in-time optimization code. It is ignored if JIT
|
|||
optimization is not being used. The value is a number of kilobytes. Setting
|
||||
zero reverts to the default of 32K. Providing a stack that is larger than the
|
||||
default is necessary only for very complicated patterns. If <b>jitstack</b> is
|
||||
set non-zero on a subject line it overrides any value that was set on the
|
||||
set non-zero on a subject line it overrides any value that was set on the
|
||||
pattern.
|
||||
</P>
|
||||
<br><b>
|
||||
|
@ -1414,7 +1397,7 @@ The <i>match_limit</i> number is a measure of the amount of backtracking
|
|||
that takes place, and learning the minimum value can be instructive. For most
|
||||
simple matches, the number is quite small, but for patterns with very large
|
||||
numbers of matching possibilities, it can become large very quickly with
|
||||
increasing length of subject string.
|
||||
increasing length of subject string.
|
||||
</P>
|
||||
<P>
|
||||
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
|
||||
|
@ -1660,7 +1643,7 @@ restart the match with additional subject data by means of the
|
|||
For further information about partial matching, see the
|
||||
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<a name="callouts"></a></P>
|
||||
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
|
||||
<P>
|
||||
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
|
||||
|
@ -1669,8 +1652,33 @@ This works with both matching functions.
|
|||
</P>
|
||||
<P>
|
||||
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
|
||||
default, but you can use a <b>callout_fail</b> modifier in a subject line (as
|
||||
described above) to change this and other parameters of the callout.
|
||||
default, but you can use a <b>callout_fail</b> modifier in a subject line to
|
||||
change this and other parameters of the callout.
|
||||
</P>
|
||||
<P>
|
||||
If <b>callout_capture</b> is set, the current captured groups are output when a
|
||||
callout occurs. By default, the callout function then generates output that
|
||||
indicates where the current match start and matching points are in the subject,
|
||||
and what the next pattern item is. This output is suppressed if the
|
||||
<b>callout_no_where</b> modifier is set.
|
||||
</P>
|
||||
<P>
|
||||
The default return from the callout function is zero, which allows matching to
|
||||
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
|
||||
there is only one number, 1 is returned instead of 0 (causing matching to
|
||||
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
|
||||
are given, 1 is returned when callout <n> is reached and there have been at
|
||||
least <m> callouts. The <b>callout_error</b> modifier is similar, except that
|
||||
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
|
||||
aborted. If both these modifiers are set for the same callout number,
|
||||
<b>callout_error</b> takes precedence. Note that callouts with string arguments
|
||||
are always given the number zero. See
|
||||
</P>
|
||||
<P>
|
||||
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
|
||||
This is set as the "user data" that is passed to the matching function, and
|
||||
passed back when the callout function is invoked. Any value other than zero is
|
||||
used as a return from <b>pcre2test</b>'s callout function.
|
||||
</P>
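The return-value rules for <b>callout_fail</b> and <b>callout_error</b> described above can be sketched as a small decision function. This is only a model of the documented behaviour, not code taken from pcre2test: the function name, its parameters, the convention that the count includes the current callout, and the use of m = 0 when only one number is given are all assumptions made for illustration. The effect of <b>callout_data</b> is left out because the text does not say how it interacts with the other two modifiers.

```python
# Illustrative model only; names and conventions here are hypothetical.
PCRE2_ERROR_CALLOUT = -37  # numeric value believed to match pcre2.h (an assumption)


def callout_return(number, count, fail=None, error=None):
    """Model the value returned by the test callout function.

    number -- the callout number just reached (0 for string-argument callouts)
    count  -- callouts seen so far, counting this one (assumed convention)
    fail   -- None, or an (n, m) pair standing for callout_fail=n:m
    error  -- None, or an (n, m) pair standing for callout_error=n:m
    Use m = 0 when only n was given on the subject line.
    """
    def triggered(setting):
        if setting is None:
            return False
        n, m = setting
        return number == n and count >= m

    if triggered(error):
        return PCRE2_ERROR_CALLOUT  # callout_error takes precedence
    if triggered(fail):
        return 1                    # forces the matcher to backtrack
    return 0                        # default: carry on matching
```

With this model, `callout_return(2, 1, fail=(2, 3))` gives 0 because fewer than three callouts have occurred, while `callout_return(2, 3, fail=(2, 3))` gives 1.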
|
||||
<P>
|
||||
Inserting callouts can be helpful when using <b>pcre2test</b> to check
|
||||
|
@ -1858,7 +1866,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 16 June 2017
|
||||
Last updated: 02 July 2017
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
<br>
|
||||
|
|
275
doc/pcre2.txt
|
@ -3543,9 +3543,14 @@ JUST-IN-TIME COMPILER SUPPORT
|
|||
|
||||
This support is available only for certain hardware architectures. If
|
||||
this option is set for an unsupported architecture, a building error
|
||||
occurs. See the pcre2jit documentation for a discussion of JIT usage.
|
||||
When JIT support is enabled, pcre2grep automatically makes use of it,
|
||||
unless you add
|
||||
occurs. If you are running under SELinux you may also want to add
|
||||
|
||||
--enable-jit-sealloc
|
||||
|
||||
which enables the use of an execmem allocator in JIT that is compatible
|
||||
with SELinux. This has no effect if JIT is not enabled. See the
|
||||
pcre2jit documentation for a discussion of JIT usage. When JIT support
|
||||
is enabled, pcre2grep automatically makes use of it, unless you add
|
||||
|
||||
--disable-pcre2grep-jit
|
||||
|
||||
|
@ -3554,14 +3559,14 @@ JUST-IN-TIME COMPILER SUPPORT
|
|||
|
||||
NEWLINE RECOGNITION
|
||||
|
||||
By default, PCRE2 interprets the linefeed (LF) character as indicating
|
||||
the end of a line. This is the normal newline character on Unix-like
|
||||
systems. You can compile PCRE2 to use carriage return (CR) instead, by
|
||||
By default, PCRE2 interprets the linefeed (LF) character as indicating
|
||||
the end of a line. This is the normal newline character on Unix-like
|
||||
systems. You can compile PCRE2 to use carriage return (CR) instead, by
|
||||
adding
|
||||
|
||||
--enable-newline-is-cr
|
||||
|
||||
to the configure command. There is also an --enable-newline-is-lf
|
||||
to the configure command. There is also an --enable-newline-is-lf
|
||||
option, which explicitly specifies linefeed as the newline character.
|
||||
|
||||
Alternatively, you can specify that line endings are to be indicated by
|
||||
|
@ -3574,104 +3579,104 @@ NEWLINE RECOGNITION
|
|||
|
||||
--enable-newline-is-anycrlf
|
||||
|
||||
which causes PCRE2 to recognize any of the three sequences CR, LF, or
|
||||
which causes PCRE2 to recognize any of the three sequences CR, LF, or
|
||||
CRLF as indicating a line ending. Finally, a fifth option, specified by
|
||||
|
||||
--enable-newline-is-any
|
||||
|
||||
causes PCRE2 to recognize any Unicode newline sequence. The Unicode
|
||||
causes PCRE2 to recognize any Unicode newline sequence. The Unicode
|
||||
newline sequences are the three just mentioned, plus the single charac-
|
||||
ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
|
||||
U+0085), LS (line separator, U+2028), and PS (paragraph separator,
|
||||
U+0085), LS (line separator, U+2028), and PS (paragraph separator,
|
||||
U+2029).
|
||||
|
||||
Whatever default line ending convention is selected when PCRE2 is built
|
||||
can be overridden by applications that use the library. At build time
|
||||
can be overridden by applications that use the library. At build time
|
||||
it is conventional to use the standard for your operating system.
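As a rough illustration of how the five newline conventions differ, the sketch below reports how many characters at a given position count as a line ending under each one. It is a model written for this note, not PCRE2 source code; the function name and the convention strings are hypothetical.

```python
# Illustrative model of the newline conventions; all names are hypothetical.

# Single characters recognized by the "any" convention:
# LF, VT, FF, CR, NEL, LS, PS.
ANY_SINGLE = {"\n", "\x0b", "\x0c", "\r", "\x85", "\u2028", "\u2029"}


def newline_length(text, pos, convention):
    """Return the length of the newline starting at text[pos], or 0 if none."""
    ch = text[pos:pos + 1]    # empty string at end of text
    two = text[pos:pos + 2]
    if convention == "cr":
        return 1 if ch == "\r" else 0
    if convention == "lf":
        return 1 if ch == "\n" else 0
    if convention == "crlf":
        return 2 if two == "\r\n" else 0
    if convention == "anycrlf":
        if two == "\r\n":
            return 2
        return 1 if ch in ("\r", "\n") else 0
    if convention == "any":
        if two == "\r\n":
            return 2
        return 1 if ch in ANY_SINGLE else 0
    raise ValueError(f"unknown convention: {convention}")
```

The two-character CRLF check comes first in the "anycrlf" and "any" branches so that CRLF is treated as one line ending rather than two.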
|
||||
|
||||
|
||||
WHAT \R MATCHES
|
||||
|
||||
By default, the sequence \R in a pattern matches any Unicode newline
|
||||
sequence, independently of what has been selected as the line ending
|
||||
By default, the sequence \R in a pattern matches any Unicode newline
|
||||
sequence, independently of what has been selected as the line ending
|
||||
sequence. If you specify
|
||||
|
||||
--enable-bsr-anycrlf
|
||||
|
||||
the default is changed so that \R matches only CR, LF, or CRLF. What-
|
||||
ever is selected when PCRE2 is built can be overridden by applications
|
||||
the default is changed so that \R matches only CR, LF, or CRLF. What-
|
||||
ever is selected when PCRE2 is built can be overridden by applications
|
||||
that use the library.
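The difference between the two \R defaults can be seen with hand-written equivalents. Python's re module has no \R escape, so the two patterns below are stand-ins assumed for this sketch; they only mirror the character sets listed above and say nothing about how PCRE2 implements \R internally.

```python
import re

# Stand-in for the default \R: CRLF, or any single Unicode newline character
# (LF, VT, FF, CR, NEL, LS, PS). CRLF is listed first so it matches as a unit.
BSR_UNICODE = re.compile(r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

# Stand-in for \R after building with --enable-bsr-anycrlf: CR, LF, or CRLF only.
BSR_ANYCRLF = re.compile(r"\r\n|\r|\n")

sample = "a\r\nb\x85c\nd"
unicode_hits = BSR_UNICODE.findall(sample)   # CRLF, NEL, and LF are all found
anycrlf_hits = BSR_ANYCRLF.findall(sample)   # NEL (U+0085) is not matched
```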
|
||||
|
||||
|
||||
HANDLING VERY LARGE PATTERNS
|
||||
|
||||
Within a compiled pattern, offset values are used to point from one
|
||||
part to another (for example, from an opening parenthesis to an alter-
|
||||
nation metacharacter). By default, in the 8-bit and 16-bit libraries,
|
||||
two-byte values are used for these offsets, leading to a maximum size
|
||||
for a compiled pattern of around 64K code units. This is sufficient to
|
||||
Within a compiled pattern, offset values are used to point from one
|
||||
part to another (for example, from an opening parenthesis to an alter-
|
||||
nation metacharacter). By default, in the 8-bit and 16-bit libraries,
|
||||
two-byte values are used for these offsets, leading to a maximum size
|
||||
for a compiled pattern of around 64K code units. This is sufficient to
|
||||
handle all but the most gigantic patterns. Nevertheless, some people do
|
||||
want to process truly enormous patterns, so it is possible to compile
|
||||
PCRE2 to use three-byte or four-byte offsets by adding a setting such
|
||||
want to process truly enormous patterns, so it is possible to compile
|
||||
PCRE2 to use three-byte or four-byte offsets by adding a setting such
|
||||
as
|
||||
|
||||
--with-link-size=3
|
||||
|
||||
to the configure command. The value given must be 2, 3, or 4. For the
|
||||
16-bit library, a value of 3 is rounded up to 4. In these libraries,
|
||||
using longer offsets slows down the operation of PCRE2 because it has
|
||||
to load additional data when handling them. For the 32-bit library the
|
||||
value is always 4 and cannot be overridden; the value of --with-link-
|
||||
to the configure command. The value given must be 2, 3, or 4. For the
|
||||
16-bit library, a value of 3 is rounded up to 4. In these libraries,
|
||||
using longer offsets slows down the operation of PCRE2 because it has
|
||||
to load additional data when handling them. For the 32-bit library the
|
||||
value is always 4 and cannot be overridden; the value of --with-link-
|
||||
size is ignored.
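The "around 64K" figure follows from the offset width: a two-byte offset can hold 2^16 = 65536 distinct values. The sketch below works through that arithmetic and the rounding rules stated above. The function names are hypothetical, and the range it computes is only an upper bound on what an offset can address, not a measured pattern-size limit.

```python
# Illustrative arithmetic only; function names are hypothetical.

def effective_link_size(requested, code_unit_bits):
    """Link size in bytes that a build would use, following the rules above."""
    if code_unit_bits == 32:
        return 4              # always 4; --with-link-size is ignored
    if requested not in (2, 3, 4):
        raise ValueError("link size must be 2, 3, or 4")
    if code_unit_bits == 16 and requested == 3:
        return 4              # 3 is rounded up to 4 in the 16-bit library
    return requested


def offset_range(link_size_bytes):
    """Number of distinct values an offset of this many bytes can hold."""
    return 1 << (8 * link_size_bytes)


two_byte_range = offset_range(2)                            # 65536, i.e. 64K
three_byte_range = offset_range(effective_link_size(3, 8))  # 16777216
```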
|
||||
|
||||
|
||||
LIMITING PCRE2 RESOURCE USAGE
|
||||
|
||||
The pcre2_match() function increments a counter each time it goes round
|
||||
its main loop. Putting a limit on this counter controls the amount of
|
||||
computing resource used by a single call to pcre2_match(). The limit
|
||||
its main loop. Putting a limit on this counter controls the amount of
|
||||
computing resource used by a single call to pcre2_match(). The limit
|
||||
can be changed at run time, as described in the pcre2api documentation.
|
||||
The default is 10 million, but this can be changed by adding a setting
|
||||
The default is 10 million, but this can be changed by adding a setting
|
||||
such as
|
||||
|
||||
--with-match-limit=500000
|
||||
|
||||
to the configure command. This setting also applies to the
|
||||
pcre2_dfa_match() matching function, and to JIT matching (though the
|
||||
to the configure command. This setting also applies to the
|
||||
pcre2_dfa_match() matching function, and to JIT matching (though the
|
||||
counting is done differently).
|
||||
|
||||
The pcre2_match() function starts out using a 20K vector on the system
|
||||
stack to record backtracking points. The more nested backtracking
|
||||
The pcre2_match() function starts out using a 20K vector on the system
|
||||
stack to record backtracking points. The more nested backtracking
|
||||
points there are (that is, the deeper the search tree), the more memory
|
||||
is needed. If the initial vector is not large enough, heap memory is
|
||||
is needed. If the initial vector is not large enough, heap memory is
|
||||
used, up to a certain limit, which is specified in kilobytes. The limit
|
||||
can be changed at run time, as described in the pcre2api documentation.
|
||||
The default limit (in effect unlimited) is 20 million. You can change
|
||||
The default limit (in effect unlimited) is 20 million. You can change
|
||||
this by a setting such as
|
||||
|
||||
--with-heap-limit=500
|
||||
|
||||
which limits the amount of heap to 500 kilobytes. This limit applies
|
||||
only to interpretive matching in pcre2_match(). It does not apply when
|
||||
JIT (which has its own memory arrangements) is used, nor does it apply
|
||||
which limits the amount of heap to 500 kilobytes. This limit applies
|
||||
only to interpretive matching in pcre2_match(). It does not apply when
|
||||
JIT (which has its own memory arrangements) is used, nor does it apply
|
||||
to pcre2_dfa_match().
|
||||
|
||||
You can also explicitly limit the depth of nested backtracking in the
|
||||
You can also explicitly limit the depth of nested backtracking in the
|
||||
pcre2_match() interpreter. This limit defaults to the value that is set
|
||||
for --with-match-limit. You can set a lower default limit by adding,
|
||||
for --with-match-limit. You can set a lower default limit by adding,
|
||||
for example,
|
||||
|
||||
--with-match-limit-depth=10000
|
||||
|
||||
to the configure command. This value can be overridden at run time.
|
||||
This depth limit indirectly limits the amount of heap memory that is
|
||||
used, but because the size of each backtracking "frame" depends on the
|
||||
number of capturing parentheses in a pattern, the amount of heap that
|
||||
is used before the limit is reached varies from pattern to pattern.
|
||||
This limit was more useful in versions before 10.30, where function
|
||||
recursion was used for backtracking. However, as well as applying to
|
||||
to the configure command. This value can be overridden at run time.
|
||||
This depth limit indirectly limits the amount of heap memory that is
|
||||
used, but because the size of each backtracking "frame" depends on the
|
||||
number of capturing parentheses in a pattern, the amount of heap that
|
||||
is used before the limit is reached varies from pattern to pattern.
|
||||
This limit was more useful in versions before 10.30, where function
|
||||
recursion was used for backtracking. However, as well as applying to
|
||||
pcre2_match(), this limit also controls the depth of recursive function
|
||||
calls in pcre2_dfa_match(). These are used for lookaround assertions,
|
||||
calls in pcre2_dfa_match(). These are used for lookaround assertions,
|
||||
atomic groups, and recursion within patterns. The limit does not apply
|
||||
to JIT matching.
|
||||
|
||||
|
@ -3680,45 +3685,45 @@ CREATING CHARACTER TABLES AT BUILD TIME
|
|||
|
||||
PCRE2 uses fixed tables for processing characters whose code points are
|
||||
less than 256. By default, PCRE2 is built with a set of tables that are
|
||||
distributed in the file src/pcre2_chartables.c.dist. These tables are
|
||||
distributed in the file src/pcre2_chartables.c.dist. These tables are
|
||||
for ASCII codes only. If you add
|
||||
|
||||
--enable-rebuild-chartables
|
||||
|
||||
to the configure command, the distributed tables are no longer used.
|
||||
Instead, a program called dftables is compiled and run. This outputs
|
||||
to the configure command, the distributed tables are no longer used.
|
||||
Instead, a program called dftables is compiled and run. This outputs
|
||||
the source for a new set of tables, created in the default locale of your
|
||||
C run-time system. This method of replacing the tables does not work if
|
||||
you are cross compiling, because dftables is run on the local host. If
|
||||
you need to create alternative tables when cross compiling, you will
|
||||
you are cross compiling, because dftables is run on the local host. If
|
||||
you need to create alternative tables when cross compiling, you will
|
||||
have to do so "by hand".
|
||||
|
||||
|
||||
USING EBCDIC CODE
|
||||
|
||||
PCRE2 assumes by default that it will run in an environment where the
|
||||
character code is ASCII or Unicode, which is a superset of ASCII. This
|
||||
PCRE2 assumes by default that it will run in an environment where the
|
||||
character code is ASCII or Unicode, which is a superset of ASCII. This
|
||||
is the case for most computer operating systems. PCRE2 can, however, be
|
||||
compiled to run in an 8-bit EBCDIC environment by adding
|
||||
|
||||
--enable-ebcdic --disable-unicode
|
||||
|
||||
to the configure command. This setting implies --enable-rebuild-charta-
|
||||
bles. You should only use it if you know that you are in an EBCDIC
|
||||
bles. You should only use it if you know that you are in an EBCDIC
|
||||
environment (for example, an IBM mainframe operating system).
|
||||
|
||||
It is not possible to support both EBCDIC and UTF-8 codes in the same
|
||||
version of the library. Consequently, --enable-unicode and --enable-
|
||||
It is not possible to support both EBCDIC and UTF-8 codes in the same
|
||||
version of the library. Consequently, --enable-unicode and --enable-
|
||||
ebcdic are mutually exclusive.
|
||||
|
||||
The EBCDIC character that corresponds to an ASCII LF is assumed to have
|
||||
the value 0x15 by default. However, in some EBCDIC environments, 0x25
|
||||
the value 0x15 by default. However, in some EBCDIC environments, 0x25
|
||||
is used. In such an environment you should use
|
||||
|
||||
--enable-ebcdic-nl25
|
||||
|
||||
as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
|
||||
has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
|
||||
has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
|
||||
0x25 is not chosen as LF is made to correspond to the Unicode NEL char-
|
||||
acter (which, in Unicode, is 0x85).
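
The LF/NEL rule above can be sketched in C. This is only an illustration, not PCRE2 source: the type ebcdic_newlines and the function ebcdic_newline_codes() are hypothetical names, and the nl25 flag merely stands in for whether --enable-ebcdic-nl25 was given. Only the code values 0x15 and 0x25 come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical holder for the two EBCDIC code points discussed above. */
typedef struct {
    unsigned char lf;   /* EBCDIC value treated as LF */
    unsigned char nel;  /* EBCDIC value treated as NEL */
} ebcdic_newlines;

/* nl25 == true models a build configured with --enable-ebcdic-nl25.
   Whichever of 0x15 and 0x25 is not chosen as LF becomes NEL. */
ebcdic_newlines ebcdic_newline_codes(bool nl25)
{
    ebcdic_newlines n;
    n.lf  = nl25 ? 0x25 : 0x15;
    n.nel = nl25 ? 0x15 : 0x25;
    assert(n.lf != n.nel);  /* the two roles never share a value */
    return n;
}
```

With nl25 false the sketch yields LF = 0x15 and NEL = 0x25; with nl25 true the two values are swapped.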
|
||||
|
||||
|
@ -3731,34 +3736,34 @@ PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS
|
|||
|
||||
By default, on non-Windows systems, pcre2grep supports the use of call-
|
||||
outs with string arguments within the patterns it is matching, in order
|
||||
to run external scripts. For details, see the pcre2grep documentation.
|
||||
This support can be disabled by adding --disable-pcre2grep-callout to
|
||||
to run external scripts. For details, see the pcre2grep documentation.
|
||||
This support can be disabled by adding --disable-pcre2grep-callout to
|
||||
the configure command.
|
||||
|
||||
|
||||
PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT
|
||||
|
||||
By default, pcre2grep reads all files as plain text. You can build it
|
||||
so that it recognizes files whose names end in .gz or .bz2, and reads
|
||||
By default, pcre2grep reads all files as plain text. You can build it
|
||||
so that it recognizes files whose names end in .gz or .bz2, and reads
|
||||
them with libz or libbz2, respectively, by adding one or both of
|
||||
|
||||
--enable-pcre2grep-libz
|
||||
--enable-pcre2grep-libbz2
|
||||
|
||||
to the configure command. These options naturally require that the rel-
|
||||
evant libraries are installed on your system. Configuration will fail
|
||||
evant libraries are installed on your system. Configuration will fail
|
||||
if they are not.
|
||||
|
||||
|
||||
PCRE2GREP BUFFER SIZE
|
||||
|
||||
pcre2grep uses an internal buffer to hold a "window" on the file it is
|
||||
pcre2grep uses an internal buffer to hold a "window" on the file it is
|
||||
scanning, in order to be able to output "before" and "after" lines when
|
||||
it finds a match. The starting size of the buffer is controlled by a
|
||||
parameter whose default value is 20K. The buffer itself is three times
|
||||
this size, but because of the way it is used for holding "before"
|
||||
lines, the longest line that is guaranteed to be processable is the
|
||||
parameter size. If a longer line is encountered, pcre2grep automati-
|
||||
it finds a match. The starting size of the buffer is controlled by a
|
||||
parameter whose default value is 20K. The buffer itself is three times
|
||||
this size, but because of the way it is used for holding "before"
|
||||
lines, the longest line that is guaranteed to be processable is the
|
||||
parameter size. If a longer line is encountered, pcre2grep automati-
|
||||
cally expands the buffer, up to a specified maximum size, whose default
|
||||
is 1M or the starting size, whichever is the larger. You can change the
|
||||
default parameter values by adding, for example,
|
||||
|
@ -3766,8 +3771,8 @@ PCRE2GREP BUFFER SIZE
|
|||
--with-pcre2grep-bufsize=51200
|
||||
--with-pcre2grep-max-bufsize=2097152
|
||||
|
||||
to the configure command. The caller of pcre2grep can override these
|
||||
values by using --buffer-size and --max-buffer-size on the command
|
||||
to the configure command. The caller of pcre2grep can override these
|
||||
values by using --buffer-size and --max-buffer-size on the command
|
||||
line.
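
The size relationships described above can be written out as a small C sketch. The function and macro names here are hypothetical and are not taken from the pcre2grep source; the sketch also assumes that "20K" means 20*1024 bytes and "1M" means 1024*1024 bytes, which matches the style of the example values 51200 and 2097152 but is not stated explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed byte values for the documented defaults (20K and 1M). */
#define DEFAULT_BUFSIZE     ((size_t)20 * 1024)
#define DEFAULT_MAX_BUFSIZE ((size_t)1024 * 1024)

/* The buffer itself is three times the starting-size parameter. */
size_t allocated_size(size_t bufsize)
{
    assert(bufsize > 0);
    return 3 * bufsize;
}

/* The longest line guaranteed to be processable without expanding
   the buffer is the parameter size itself. */
size_t guaranteed_line_length(size_t bufsize)
{
    return bufsize;
}

/* The default maximum is 1M or the starting size, whichever is larger. */
size_t default_max_bufsize(size_t bufsize)
{
    return bufsize > DEFAULT_MAX_BUFSIZE ? bufsize : DEFAULT_MAX_BUFSIZE;
}
```

Under these assumptions the default build allocates 61440 bytes at the start and may grow the buffer up to a limit of 1048576 bytes.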
|
||||
|
||||
|
||||
|
@ -3778,26 +3783,26 @@ PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
|
|||
--enable-pcre2test-libreadline
|
||||
--enable-pcre2test-libedit
|
||||
|
||||
to the configure command, pcre2test is linked with the libreadline
|
||||
to the configure command, pcre2test is linked with the libreadline
|
||||
or libedit library, respectively, and when its input is from a terminal,
|
||||
it reads it using the readline() function. This provides line-editing
|
||||
and history facilities. Note that libreadline is GPL-licensed, so if
|
||||
you distribute a binary of pcre2test linked in this way, there may be
|
||||
it reads it using the readline() function. This provides line-editing
|
||||
and history facilities. Note that libreadline is GPL-licensed, so if
|
||||
you distribute a binary of pcre2test linked in this way, there may be
|
||||
licensing issues. These can be avoided by linking instead with libedit,
|
||||
which has a BSD licence.
|
||||
|
||||
Setting --enable-pcre2test-libreadline causes the -lreadline option to
|
||||
be added to the pcre2test build. In many operating environments with a
|
||||
system-installed readline library this is sufficient. However, in some
|
||||
Setting --enable-pcre2test-libreadline causes the -lreadline option to
|
||||
be added to the pcre2test build. In many operating environments with a
|
||||
system-installed readline library this is sufficient. However, in some
|
||||
environments (e.g. if an unmodified distribution version of readline is
|
||||
in use), some extra configuration may be necessary. The INSTALL file
|
||||
in use), some extra configuration may be necessary. The INSTALL file
|
||||
for libreadline says this:
|
||||
|
||||
"Readline uses the termcap functions, but does not link with
|
||||
the termcap or curses library itself, allowing applications
|
||||
which link with readline the to choose an appropriate library."
|
||||
|
||||
If your environment has not been set up so that an appropriate library
|
||||
If your environment has not been set up so that an appropriate library
|
||||
is automatically included, you may need to add something like
|
||||
|
||||
LIBS="-lncurses"
|
||||
|
@ -3811,7 +3816,7 @@ INCLUDING DEBUGGING CODE
|
|||
|
||||
--enable-debug
|
||||
|
||||
to the configure command, additional debugging code is included in the
|
||||
to the configure command, additional debugging code is included in the
|
||||
build. This feature is intended for use by the PCRE2 maintainers.
|
||||
|
||||
|
||||
|
@ -3821,15 +3826,15 @@ DEBUGGING WITH VALGRIND SUPPORT
|
|||
|
||||
--enable-valgrind
|
||||
|
||||
to the configure command, PCRE2 will use valgrind annotations to mark
|
||||
certain memory regions as unaddressable. This allows it to detect
|
||||
invalid memory accesses, and is mostly useful for debugging PCRE2
|
||||
to the configure command, PCRE2 will use valgrind annotations to mark
|
||||
certain memory regions as unaddressable. This allows it to detect
|
||||
invalid memory accesses, and is mostly useful for debugging PCRE2
|
||||
itself.
|
||||
|
||||
|
||||
CODE COVERAGE REPORTING
|
||||
|
||||
If your C compiler is gcc, you can build a version of PCRE2 that can
|
||||
If your C compiler is gcc, you can build a version of PCRE2 that can
|
||||
generate a code coverage report for its test suite. To enable this, you
|
||||
must install lcov version 1.6 or above. Then specify
|
||||
|
||||
|
@ -3838,20 +3843,20 @@ CODE COVERAGE REPORTING
|
|||
to the configure command and build PCRE2 in the usual way.
|
||||
|
||||
Note that using ccache (a caching C compiler) is incompatible with code
|
||||
coverage reporting. If you have configured ccache to run automatically
|
||||
coverage reporting. If you have configured ccache to run automatically
|
||||
on your system, you must set the environment variable
|
||||
|
||||
CCACHE_DISABLE=1
|
||||
|
||||
before running make to build PCRE2, so that ccache is not used.
|
||||
|
||||
When --enable-coverage is used, the following additional targets are
|
||||
When --enable-coverage is used, the following additional targets are
|
||||
added to the Makefile:
|
||||
|
||||
make coverage
|
||||
|
||||
This creates a fresh coverage report for the PCRE2 test suite. It is
|
||||
equivalent to running "make coverage-reset", "make coverage-baseline",
|
||||
This creates a fresh coverage report for the PCRE2 test suite. It is
|
||||
equivalent to running "make coverage-reset", "make coverage-baseline",
|
||||
"make check", and then "make coverage-report".
|
||||
|
||||
make coverage-reset
|
||||
|
@ -3868,56 +3873,56 @@ CODE COVERAGE REPORTING
|
|||
|
||||
make coverage-clean-report
|
||||
|
||||
This removes the generated coverage report without cleaning the cover-
|
||||
This removes the generated coverage report without cleaning the cover-
|
||||
age data itself.
|
||||
|
||||
make coverage-clean-data
|
||||
|
||||
This removes the captured coverage data without removing the coverage
|
||||
This removes the captured coverage data without removing the coverage
|
||||
files created at compile time (*.gcno).
|
||||
|
||||
make coverage-clean
|
||||
|
||||
This cleans all coverage data including the generated coverage report.
|
||||
For more information about code coverage, see the gcov and lcov docu-
|
||||
This cleans all coverage data including the generated coverage report.
|
||||
For more information about code coverage, see the gcov and lcov docu-
|
||||
mentation.
|
||||
|
||||
|
||||
SUPPORT FOR FUZZERS
|
||||
|
||||
There is a special option for use by people who want to run fuzzing
|
||||
There is a special option for use by people who want to run fuzzing
|
||||
tests on PCRE2:
|
||||
|
||||
--enable-fuzz-support
|
||||
|
||||
At present this applies only to the 8-bit library. If set, it causes an
|
||||
extra library called libpcre2-fuzzsupport.a to be built, but not
|
||||
installed. This contains a single function called LLVMFuzzerTestOneIn-
|
||||
put() whose arguments are a pointer to a string and the length of the
|
||||
string. When called, this function tries to compile the string as a
|
||||
pattern, and if that succeeds, to match it. This is done both with no
|
||||
options and with some random options bits that are generated from the
|
||||
extra library called libpcre2-fuzzsupport.a to be built, but not
|
||||
installed. This contains a single function called LLVMFuzzerTestOneIn-
|
||||
put() whose arguments are a pointer to a string and the length of the
|
||||
string. When called, this function tries to compile the string as a
|
||||
pattern, and if that succeeds, to match it. This is done both with no
|
||||
options and with some random options bits that are generated from the
|
||||
string.
|
||||
|
||||
Setting --enable-fuzz-support also causes a binary called pcre2fuz-
|
||||
zcheck to be created. This is normally run under valgrind or used when
|
||||
Setting --enable-fuzz-support also causes a binary called pcre2fuz-
|
||||
zcheck to be created. This is normally run under valgrind or used when
|
||||
PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
|
||||
function and outputs information about what it is doing. The input strings
|
||||
are specified by arguments: if an argument starts with "=" the rest of
|
||||
it is a literal input string. Otherwise, it is assumed to be a file
|
||||
function and outputs information about what it is doing. The input strings
|
||||
are specified by arguments: if an argument starts with "=" the rest of
|
||||
it is a literal input string. Otherwise, it is assumed to be a file
|
||||
name, and the contents of the file are the test string.
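
The argument rule described above for pcre2fuzzcheck can be sketched as follows. This is a minimal illustration, not the program's actual code: the enum input_kind and the function classify_argument() are hypothetical names introduced here, and reading the file contents is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one command-line argument. */
typedef enum { INPUT_LITERAL, INPUT_FILE } input_kind;

/* If the argument starts with "=", the rest of it is a literal input
   string; otherwise the whole argument is taken to be a file name.
   *value is set to the literal text or to the file name. */
input_kind classify_argument(const char *arg, const char **value)
{
    assert(arg != NULL && value != NULL);
    if (arg[0] == '=') {
        *value = arg + 1;   /* skip the leading "=" */
        return INPUT_LITERAL;
    }
    *value = arg;
    return INPUT_FILE;
}
```

For example, the argument "=a+b" would be treated as the literal test string "a+b", while "pattern.txt" would be treated as the name of a file holding the test string.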
|
||||
|
||||
|
||||
OBSOLETE OPTION
|
||||
|
||||
In versions of PCRE2 prior to 10.30, there were two ways of handling
|
||||
backtracking in the pcre2_match() function. The default was to use the
|
||||
In versions of PCRE2 prior to 10.30, there were two ways of handling
|
||||
backtracking in the pcre2_match() function. The default was to use the
|
||||
system stack, but if
|
||||
|
||||
--disable-stack-for-recursion
|
||||
|
||||
was set, memory on the heap was used. From release 10.30 onwards this
|
||||
has changed (the stack is no longer used) and this option now does
|
||||
was set, memory on the heap was used. From release 10.30 onwards this
|
||||
has changed (the stack is no longer used) and this option now does
|
||||
nothing except give a warning.
|
||||
|
||||
|
||||
|
@ -3935,7 +3940,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 30 May 2017
|
||||
Last updated: 17 June 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -6309,26 +6314,28 @@ BACKSLASH
|
|||
Those that are not part of an identified script are lumped together as
|
||||
"Common". The current list of scripts is:
|
||||
|
||||
Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Balinese,
|
||||
Bamum, Bassa_Vah, Batak, Bengali, Bopomofo, Brahmi, Braille, Buginese,
|
||||
Buhid, Canadian_Aboriginal, Carian, Caucasian_Albanian, Chakma, Cham,
|
||||
Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
|
||||
Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan, Ethiopic, Geor-
|
||||
gian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gurmukhi, Han,
|
||||
Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited,
|
||||
Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kan-
|
||||
nada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao,
|
||||
Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha-
|
||||
jani, Malayalam, Mandaic, Manichaean, Meetei_Mayek, Mende_Kikakui,
|
||||
Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro,
|
||||
Multani, Myanmar, Nabataean, New_Tai_Lue, Nko, Ogham, Ol_Chiki,
|
||||
Old_Hungarian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian,
|
||||
Old_South_Arabian, Old_Turkic, Oriya, Osmanya, Pahawh_Hmong, Palmyrene,
|
||||
Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang, Runic,
|
||||
Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting, Sinhala,
|
||||
Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa,
|
||||
Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, Thaana, Thai,
|
||||
Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi.
|
||||
Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Bali-
|
||||
nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
|
||||
Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba-
|
||||
nian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot,
|
||||
Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan,
|
||||
Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gur-
|
||||
mukhi, Han, Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Ara-
|
||||
maic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian,
|
||||
Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Kho-
|
||||
jki, Khudawadi, Lao, Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu,
|
||||
Lycian, Lydian, Mahajani, Malayalam, Mandaic, Manichaean, Marchen,
|
||||
Masaram_Gondi, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
|
||||
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar,
|
||||
Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar-
|
||||
ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian,
|
||||
Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, Pahawh_Hmong,
|
||||
Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi, Rejang,
|
||||
Runic, Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting,
|
||||
Sinhala, Sora_Sompeng, Soyombo, Sundanese, Syloti_Nagri, Syriac, Taga-
|
||||
log, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Tangut, Tel-
|
||||
ugu, Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai,
|
||||
Warang_Citi, Yi, Zanabazar_Square.
|
||||
|
||||
Each character has exactly one Unicode general category property, spec-
|
||||
ified by a two-letter abbreviation. For compatibility with Perl, nega-
|
||||
|
@ -8737,7 +8744,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 30 May 2017
|
||||
Last updated: 02 July 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PATTERN 3 "30 May 2017" "PCRE2 10.30"
|
||||
.TH PCRE2PATTERN 3 "02 July 2017" "PCRE2 10.30"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||
|
@ -754,6 +754,7 @@ example:
|
|||
Those that are not part of an identified script are lumped together as
|
||||
"Common". The current list of scripts is:
|
||||
.P
|
||||
Adlam,
|
||||
Ahom,
|
||||
Anatolian_Hieroglyphs,
|
||||
Arabic,
|
||||
|
@ -764,6 +765,7 @@ Bamum,
|
|||
Bassa_Vah,
|
||||
Batak,
|
||||
Bengali,
|
||||
Bhaiksuki,
|
||||
Bopomofo,
|
||||
Brahmi,
|
||||
Braille,
|
||||
|
@ -825,6 +827,8 @@ Mahajani,
|
|||
Malayalam,
|
||||
Mandaic,
|
||||
Manichaean,
|
||||
Marchen,
|
||||
Masaram_Gondi,
|
||||
Meetei_Mayek,
|
||||
Mende_Kikakui,
|
||||
Meroitic_Cursive,
|
||||
|
@ -837,7 +841,9 @@ Multani,
|
|||
Myanmar,
|
||||
Nabataean,
|
||||
New_Tai_Lue,
|
||||
Newa,
|
||||
Nko,
|
||||
Nushu,
|
||||
Ogham,
|
||||
Ol_Chiki,
|
||||
Old_Hungarian,
|
||||
|
@ -848,6 +854,7 @@ Old_Persian,
|
|||
Old_South_Arabian,
|
||||
Old_Turkic,
|
||||
Oriya,
|
||||
Osage,
|
||||
Osmanya,
|
||||
Pahawh_Hmong,
|
||||
Palmyrene,
|
||||
|
@ -865,6 +872,7 @@ Siddham,
|
|||
SignWriting,
|
||||
Sinhala,
|
||||
Sora_Sompeng,
|
||||
Soyombo,
|
||||
Sundanese,
|
||||
Syloti_Nagri,
|
||||
Syriac,
|
||||
|
@ -875,6 +883,7 @@ Tai_Tham,
|
|||
Tai_Viet,
|
||||
Takri,
|
||||
Tamil,
|
||||
Tangut,
|
||||
Telugu,
|
||||
Thaana,
|
||||
Thai,
|
||||
|
@ -884,7 +893,8 @@ Tirhuta,
|
|||
Ugaritic,
|
||||
Vai,
|
||||
Warang_Citi,
|
||||
Yi.
|
||||
Yi,
|
||||
Zanabazar_Square.
|
||||
.P
|
||||
Each character has exactly one Unicode general category property, specified by
|
||||
a two-letter abbreviation. For compatibility with Perl, negation can be
|
||||
|
@ -3475,6 +3485,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 30 May 2017
|
||||
Last updated: 02 July 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
.fi
|
||||
|
|
107
doc/pcre2test.1
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2TEST 1 "16 June 2017" "PCRE 10.30"
|
||||
.TH PCRE2TEST 1 "02 July 2017" "PCRE 10.30"
|
||||
.SH NAME
|
||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
|
@ -527,7 +527,7 @@ by a previous \fB#pattern\fP command.
|
|||
.rs
|
||||
.sp
|
||||
The following modifiers set options for \fBpcre2_compile()\fP. Most of them set
|
||||
bits in the options argument of that function, but those whose names start with
|
||||
bits in the options argument of that function, but those whose names start with
|
||||
PCRE2_EXTRA are additional options that are set in the compile context. For the
|
||||
main options, there are some single-letter abbreviations that are the same as
|
||||
Perl options. There is special handling for /x: if a second x is present,
|
||||
|
@ -540,25 +540,25 @@ way \fBpcre2_compile()\fP behaves. See
|
|||
for a description of the effects of these options.
|
||||
.sp
|
||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
alt_bsux set PCRE2_ALT_BSUX
|
||||
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
||||
alt_verbnames set PCRE2_ALT_VERBNAMES
|
||||
anchored set PCRE2_ANCHORED
|
||||
auto_callout set PCRE2_AUTO_CALLOUT
|
||||
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
|
||||
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
|
||||
/i caseless set PCRE2_CASELESS
|
||||
dollar_endonly set PCRE2_DOLLAR_ENDONLY
|
||||
/s dotall set PCRE2_DOTALL
|
||||
dupnames set PCRE2_DUPNAMES
|
||||
endanchored set PCRE2_ENDANCHORED
|
||||
/x extended set PCRE2_EXTENDED
|
||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||
firstline set PCRE2_FIRSTLINE
|
||||
literal set PCRE2_LITERAL
|
||||
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||
literal set PCRE2_LITERAL
|
||||
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
||||
match_word set PCRE2_EXTRA_MATCH_WORD
|
||||
match_word set PCRE2_EXTRA_MATCH_WORD
|
||||
/m multiline set PCRE2_MULTILINE
|
||||
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
|
||||
never_ucp set PCRE2_NEVER_UCP
|
||||
|
@ -593,7 +593,7 @@ heavily used in the test files.
|
|||
/B bincode show binary code without lengths
|
||||
callout_info show callout information
|
||||
debug same as info,fullbincode
|
||||
framesize show matching frame size
|
||||
framesize show matching frame size
|
||||
fullbincode show binary code with lengths
|
||||
/I info show info about compiled pattern
|
||||
hex unquoted characters are hexadecimal
|
||||
|
@ -611,7 +611,7 @@ heavily used in the test files.
|
|||
push push compiled pattern onto the stack
|
||||
pushcopy push a copy onto the stack
|
||||
stackguard=<number> test the stackguard feature
|
||||
subject_literal treat all subject lines as literal
|
||||
subject_literal treat all subject lines as literal
|
||||
tables=[0|1|2] select internal tables
|
||||
use_length do not zero-terminate the pattern
|
||||
utf8_input treat input as UTF-8
|
||||
|
@ -677,7 +677,7 @@ unit" is the last literal code unit that must be present in any match. This is
|
|||
not necessarily the last character. These lines are omitted if no starting or
|
||||
ending code units are recorded.
|
||||
.P
|
||||
The \fBframesize\fP modifier shows the size, in bytes, of the storage frames
|
||||
The \fBframesize\fP modifier shows the size, in bytes, of the storage frames
|
||||
used by \fBpcre2_match()\fP for handling backtracking. The size depends on the
|
||||
number of capturing parentheses in the pattern.
|
||||
.P
|
||||
|
@ -934,8 +934,8 @@ The \fBaftertext\fP and \fBallaftertext\fP subject modifiers work as described
|
|||
below. All other modifiers are either ignored, with a warning message, or cause
|
||||
an error.
|
||||
.P
|
||||
The pattern is passed to \fBregcomp()\fP as a zero-terminated string by
|
||||
default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the
|
||||
The pattern is passed to \fBregcomp()\fP as a zero-terminated string by
|
||||
default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the
|
||||
REG_PEND extension is used to pass it by length.
|
||||
.
|
||||
.
|
||||
|
@ -977,7 +977,7 @@ are mutually exclusive.
|
|||
.SS "Setting certain match controls"
|
||||
.rs
|
||||
.sp
|
||||
The following modifiers are really subject modifiers, and are described under
|
||||
The following modifiers are really subject modifiers, and are described under
|
||||
"Subject Modifiers" below. However, they may be included in a pattern's
|
||||
modifier list, in which case they are applied to every subject line that is
|
||||
processed with that pattern. They may not appear in \fB#pattern\fP commands.
|
||||
|
@ -1004,9 +1004,9 @@ defaults, set them in a \fB#subject\fP command.
|
|||
.SS "Specifying literal subject lines"
|
||||
.rs
|
||||
.sp
|
||||
If the \fBsubject_literal\fP modifier is present on a pattern, all the subject
|
||||
lines that it matches are taken as literal strings, with no interpretation of
|
||||
backslashes. It is not possible to set subject modifiers on such lines, but any
|
||||
If the \fBsubject_literal\fP modifier is present on a pattern, all the subject
|
||||
lines that it matches are taken as literal strings, with no interpretation of
|
||||
backslashes. It is not possible to set subject modifiers on such lines, but any
|
||||
that are set as defaults by a \fB#subject\fP command are recognized.
|
||||
.
|
||||
.
|
||||
|
@ -1020,7 +1020,9 @@ facility is used when saving compiled patterns to a file, as described in the
|
|||
section entitled "Saving and restoring compiled patterns"
|
||||
.\" HTML <a href="#saverestore">
|
||||
.\" </a>
|
||||
below. If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
|
||||
below.
|
||||
.\"
|
||||
If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
|
||||
pattern is stacked, leaving the original as current, ready to match the
|
||||
following input lines. This provides a way of testing the
|
||||
\fBpcre2_code_copy()\fP function.
|
||||
|
@ -1073,10 +1075,10 @@ that have any effect are \fBnotbol\fP, \fBnotempty\fP, and \fBnoteol\fP,
|
|||
causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
|
||||
\fBregexec()\fP. The other modifiers are ignored, with a warning message.
|
||||
.P
|
||||
There is one additional modifier that can be used with the POSIX wrapper. It is
|
||||
There is one additional modifier that can be used with the POSIX wrapper. It is
|
||||
ignored (with a warning) if used for non-POSIX matching.
|
||||
.sp
|
||||
posix_startend=<n>[:<m>]
|
||||
posix_startend=<n>[:<m>]
|
||||
.sp
|
||||
This causes the subject string to be passed to \fBregexec()\fP using the
|
||||
REG_STARTEND option, which uses offsets to specify which part of the string is
|
||||
|
@ -1085,8 +1087,8 @@ the subject string. For more detail of REG_STARTEND, see the
|
|||
.\" HREF
|
||||
\fBpcre2posix\fP
|
||||
.\"
|
||||
documentation. If the subject string contains binary zeros (coded as escapes
|
||||
such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in
|
||||
documentation. If the subject string contains binary zeros (coded as escapes
|
||||
such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in
|
||||
its input), you must use \fBposix_startend\fP to specify its length.
|
||||
.
|
||||
.
|
||||
|
@ -1107,6 +1109,7 @@ pattern.
|
|||
callout_data=<n> set a value to pass via callouts
|
||||
callout_error=<n>[:<m>] control callout error
|
||||
callout_fail=<n>[:<m>] control callout failure
|
||||
callout_no_where do not show position of a callout
|
||||
callout_none do not supply a callout function
|
||||
copy=<number or name> copy captured substring
|
||||
depth_limit=<n> set a depth limit
|
||||
|
@ -1200,26 +1203,13 @@ does no capturing); it is ignored, with a warning message, if present.
|
|||
.rs
|
||||
.sp
|
||||
A callout function is supplied when \fBpcre2test\fP calls the library matching
|
||||
functions, unless \fBcallout_none\fP is specified. If \fBcallout_capture\fP is
|
||||
set, the current captured groups are output when a callout occurs. The default
|
||||
return from the callout function is zero, which allows matching to continue.
|
||||
.P
|
||||
The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
|
||||
only one number, 1 is returned instead of 0 (causing matching to backtrack)
|
||||
when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1
|
||||
is returned when callout <n> is reached and there have been at least <m>
|
||||
callouts. The \fBcallout_error\fP modifier is similar, except that
|
||||
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
|
||||
aborted. If both these modifiers are set for the same callout number,
|
||||
\fBcallout_error\fP takes precedence.
|
||||
.P
|
||||
Note that callouts with string arguments are always given the number zero. See
|
||||
"Callouts" below for a description of the output when a callout it taken.
|
||||
.P
|
||||
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
|
||||
This is set as the "user data" that is passed to the matching function, and
|
||||
passed back when the callout function is invoked. Any value other than zero is
|
||||
used as a return from \fBpcre2test\fP's callout function.
|
||||
functions, unless \fBcallout_none\fP is specified. Its behaviour can be
|
||||
controlled by various modifiers listed above whose names begin with
|
||||
\fBcallout_\fP. Details are given in the section entitled "Callouts"
|
||||
.\" HTML <a href="#callouts">
|
||||
.\" </a>
|
||||
below.
|
||||
.\"
|
||||
.
|
||||
.
|
||||
.SS "Finding all matches in a string"
|
||||
|
@ -1344,7 +1334,7 @@ that is used by the just-in-time optimization code. It is ignored if JIT
|
|||
optimization is not being used. The value is a number of kilobytes. Setting
|
||||
zero reverts to the default of 32K. Providing a stack that is larger than the
|
||||
default is necessary only for very complicated patterns. If \fBjitstack\fP is
|
||||
set non-zero on a subject line it overrides any value that was set on the
|
||||
set non-zero on a subject line it overrides any value that was set on the
|
||||
pattern.
|
||||
.
|
||||
.
|
||||
|
@ -1372,7 +1362,7 @@ The \fImatch_limit\fP number is a measure of the amount of backtracking
|
|||
that takes place, and learning the minimum value can be instructive. For most
|
||||
simple matches, the number is quite small, but for patterns with very large
|
||||
numbers of matching possibilities, it can become large very quickly with
|
||||
increasing length of subject string.
|
||||
increasing length of subject string.
|
||||
.P
|
||||
For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
|
||||
much nested backtracking happens (that is, how deeply the pattern's tree is
|
||||
|
@ -1625,6 +1615,7 @@ For further information about partial matching, see the
|
|||
documentation.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="callouts"></a>
|
||||
.SH CALLOUTS
|
||||
.rs
|
||||
.sp
|
||||
|
@ -1633,8 +1624,30 @@ function is called during matching unless \fBcallout_none\fP is specified.
|
|||
This works with both matching functions.
|
||||
.P
|
||||
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
|
||||
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as
|
||||
described above) to change this and other parameters of the callout.
|
||||
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
|
||||
change this and other parameters of the callout.
|
||||
.P
|
||||
If \fBcallout_capture\fP is set, the current captured groups are output when a
|
||||
callout occurs. By default, the callout function then generates output that
|
||||
indicates where the current match start and matching points are in the subject,
|
||||
and what the next pattern item is. This output is suppressed if the
|
||||
\fBcallout_no_where\fP modifier is set.
|
||||
.P
|
||||
The default return from the callout function is zero, which allows matching to
|
||||
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
|
||||
there is only one number, 1 is returned instead of 0 (causing matching to
|
||||
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
|
||||
are given, 1 is returned when callout <n> is reached and there have been at
|
||||
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
|
||||
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
|
||||
aborted. If both these modifiers are set for the same callout number,
|
||||
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
|
||||
are always given the number zero.
|
||||
.P
|
||||
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
|
||||
This is set as the "user data" that is passed to the matching function, and
|
||||
passed back when the callout function is invoked. Any value other than zero is
|
||||
used as a return from \fBpcre2test\fP's callout function.
|
||||
.P
|
||||
Inserting callouts can be helpful when using \fBpcre2test\fP to check
|
||||
complicated regular expressions. For further information about callouts, see
|
||||
|
@ -1837,6 +1850,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 16 June 2017
|
||||
Last updated: 02 July 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -943,7 +943,7 @@ PATTERN MODIFIERS
|
|||
next line to contain a new pattern (or a command) instead of a subject
|
||||
line. This facility is used when saving compiled patterns to a file, as
|
||||
described in the section entitled "Saving and restoring compiled pat-
|
||||
terns" below. If pushcopy is used instead of push, a copy of the com-
|
||||
terns" below. If pushcopy is used instead of push, a copy of the com-
|
||||
piled pattern is stacked, leaving the original as current, ready to
|
||||
match the following input lines. This provides a way of testing the
|
||||
pcre2_code_copy() function. The push and pushcopy modifiers are
|
||||
|
@ -1016,6 +1016,7 @@ SUBJECT MODIFIERS
|
|||
callout_data=<n> set a value to pass via callouts
|
||||
callout_error=<n>[:<m>] control callout error
|
||||
callout_fail=<n>[:<m>] control callout failure
|
||||
callout_no_where do not show position of a callout
|
||||
callout_none do not supply a callout function
|
||||
copy=<number or name> copy captured substring
|
||||
depth_limit=<n> set a depth limit
|
||||
|
@ -1107,29 +1108,9 @@ SUBJECT MODIFIERS
|
|||
Testing callouts
|
||||
|
||||
A callout function is supplied when pcre2test calls the library match-
|
||||
ing functions, unless callout_none is specified. If callout_capture is
|
||||
set, the current captured groups are output when a callout occurs. The
|
||||
default return from the callout function is zero, which allows matching
|
||||
to continue.
|
||||
|
||||
The callout_fail modifier can be given one or two numbers. If there is
|
||||
only one number, 1 is returned instead of 0 (causing matching to back-
|
||||
track) when a callout of that number is reached. If two numbers
|
||||
(<n>:<m>) are given, 1 is returned when callout <n> is reached and
|
||||
there have been at least <m> callouts. The callout_error modifier is
|
||||
similar, except that PCRE2_ERROR_CALLOUT is returned, causing the
|
||||
entire matching process to be aborted. If both these modifiers are set
|
||||
for the same callout number, callout_error takes precedence.
|
||||
|
||||
Note that callouts with string arguments are always given the number
|
||||
zero. See "Callouts" below for a description of the output when a call-
|
||||
out is taken.
|
||||
|
||||
The callout_data modifier can be given an unsigned or a negative num-
|
||||
ber. This is set as the "user data" that is passed to the matching
|
||||
function, and passed back when the callout function is invoked. Any
|
||||
value other than zero is used as a return from pcre2test's callout
|
||||
function.
|
||||
ing functions, unless callout_none is specified. Its behaviour can be
|
||||
controlled by various modifiers listed above whose names begin with
|
||||
callout_. Details are given in the section entitled "Callouts" below.
|
||||
|
||||
Finding all matches in a string
|
||||
|
||||
|
@ -1511,8 +1492,32 @@ CALLOUTS
|
|||
works with both matching functions.
|
||||
|
||||
The callout function in pcre2test returns zero (carry on matching) by
|
||||
default, but you can use a callout_fail modifier in a subject line (as
|
||||
described above) to change this and other parameters of the callout.
|
||||
default, but you can use a callout_fail modifier in a subject line to
|
||||
change this and other parameters of the callout.
|
||||
|
||||
If callout_capture is set, the current captured groups are output when
|
||||
a callout occurs. By default, the callout function then generates out-
|
||||
put that indicates where the current match start and matching points
|
||||
are in the subject, and what the next pattern item is. This output is
|
||||
suppressed if the callout_no_where modifier is set.
|
||||
|
||||
The default return from the callout function is zero, which allows
|
||||
matching to continue. The callout_fail modifier can be given one or two
|
||||
numbers. If there is only one number, 1 is returned instead of 0 (caus-
|
||||
ing matching to backtrack) when a callout of that number is reached. If
|
||||
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
|
||||
reached and there have been at least <m> callouts. The callout_error
|
||||
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
|
||||
ing the entire matching process to be aborted. If both these modifiers
|
||||
are set for the same callout number, callout_error takes precedence.
|
||||
Note that callouts with string arguments are always given the number
|
||||
zero. See
|
||||
|
||||
The callout_data modifier can be given an unsigned or a negative num-
|
||||
ber. This is set as the "user data" that is passed to the matching
|
||||
function, and passed back when the callout function is invoked. Any
|
||||
value other than zero is used as a return from pcre2test's callout
|
||||
function.
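
The return-value rules described above can be summarised in a short Python sketch. This is only an illustration of the documented behaviour, not code from pcre2test: the function name, its parameters, and the order in which callout_data is checked relative to the other modifiers are assumptions, and the numeric value given for PCRE2_ERROR_CALLOUT is assumed to match pcre2.h.

```python
# Sketch of the documented return-value rules; not pcre2test's actual code.
PCRE2_ERROR_CALLOUT = -37  # assumed to match the constant in pcre2.h


def callout_return(number, count, fail=None, error=None, data=0):
    """Return the value a pcre2test-style callout function would hand back.

    number: number of the callout being taken (0 for string-argument callouts)
    count:  how many callouts have happened so far, including this one
    fail, error: hypothetical (n, m) tuples for callout_fail / callout_error;
                 use m = 0 when only one number was given
    data:   the callout_data value (0 means "not set")
    """
    if data != 0:
        # Any non-zero callout_data value is used as the return value.
        return data
    # callout_error takes precedence over callout_fail for the same number.
    if error is not None and number == error[0] and count >= error[1]:
        return PCRE2_ERROR_CALLOUT  # abort the whole match
    if fail is not None and number == fail[0] and count >= fail[1]:
        return 1  # force a backtrack
    return 0  # default: carry on matching
```

With fail=(1, 3), for example, callout number 1 returns 0 until at least three callouts have been taken, and 1 from then on.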
|
||||
|
||||
Inserting callouts can be helpful when using pcre2test to check compli-
|
||||
cated regular expressions. For further information about callouts, see
|
||||
|
@ -1687,5 +1692,5 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 16 June 2017
|
||||
Last updated: 02 July 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
|
|
|
@ -23,6 +23,7 @@
|
|||
# Script updated to Python 3 by running it through the 2to3 converter.
|
||||
# Added script names for Unicode 7.0.0, 20-June-2014.
|
||||
# Added script names for Unicode 8.0.0, 19-June-2015.
|
||||
# Added script names for Unicode 10.0.0, 02-July-2017.
|
||||
|
||||
script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \
|
||||
'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \
|
||||
|
@ -51,7 +52,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
|
|||
'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi',
|
||||
# New for Unicode 8.0.0
|
||||
'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian',
|
||||
'SignWriting'
|
||||
'SignWriting',
|
||||
# New for Unicode 10.0.0
|
||||
'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
|
||||
'Nushu', 'Soyombo', 'Zanabazar_Square'
|
||||
]
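
The comma added after 'SignWriting' matters once more names follow it, because Python joins adjacent string literals into a single string. The sketch below uses hypothetical three-entry excerpts of the list (the variable names are not from the script) to show the effect.

```python
# Hypothetical excerpts of the list, without and with the separating comma.
without_comma = ['Old_Hungarian',
  'SignWriting'
  'Adlam']
with_comma = ['Old_Hungarian',
  'SignWriting',
  'Adlam']

# Adjacent literals are concatenated, so one entry silently disappears.
print(without_comma)  # ['Old_Hungarian', 'SignWritingAdlam']
print(len(without_comma), len(with_comma))
```

No error is raised in the first case, so a missing comma would only show up later as a wrong script count or an unknown script name.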
|
||||
|
||||
category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
|
||||
|
|
|
@ -122,6 +122,7 @@
|
|||
# 20-June-2014: Updated for Unicode 7.0.0
|
||||
# 12-August-2014: Updated to put Unicode version into the file
|
||||
# 19-June-2015: Updated for Unicode 8.0.0
|
||||
# 02-July-2017: Updated for Unicode 10.0.0
|
||||
##############################################################################
|
||||
|
||||
|
||||
|
@ -335,7 +336,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
|
|||
'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi',
|
||||
# New for Unicode 8.0.0
|
||||
'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian',
|
||||
'SignWriting'
|
||||
'SignWriting',
|
||||
# New for Unicode 10.0.0
|
||||
'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
|
||||
'Nushu', 'Soyombo', 'Zanabazar_Square'
|
||||
]
|
||||
|
||||
category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
|
||||
|
@ -343,7 +347,8 @@ category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
|
|||
'Sc', 'Sk', 'Sm', 'So', 'Zl', 'Zp', 'Zs' ]
|
||||
|
||||
break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend',
|
||||
'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other' ]
|
||||
'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other',
|
||||
'E_Base', 'E_Modifier', 'E_Base_GAZ', 'ZWJ', 'Glue_After_Zwj' ]
|
||||
|
||||
test_record_size()
|
||||
unicode_version = ""
|
||||
|
|
|
@ -1,10 +1,11 @@
|
|||
# CaseFolding-8.0.0.txt
|
||||
# Date: 2015-01-13, 18:16:36 GMT [MD]
|
||||
# CaseFolding-10.0.0.txt
|
||||
# Date: 2017-04-14, 05:40:18 GMT
|
||||
# © 2017 Unicode®, Inc.
|
||||
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
|
||||
# For terms of use, see http://www.unicode.org/terms_of_use.html
|
||||
#
|
||||
# Unicode Character Database
|
||||
# Copyright (c) 1991-2015 Unicode, Inc.
|
||||
# For terms of use, see http://www.unicode.org/terms_of_use.html
|
||||
# For documentation, see http://www.unicode.org/reports/tr44/
|
||||
# For documentation, see http://www.unicode.org/reports/tr44/
|
||||
#
|
||||
# Case Folding Properties
|
||||
#
|
||||
|
@ -23,7 +24,7 @@
|
|||
#
|
||||
# NOTE: case folding does not preserve normalization formats!
|
||||
#
|
||||
# For information on case folding, including how to have case folding
|
||||
# For information on case folding, including how to have case folding
|
||||
# preserve normalization formats, see Section 3.13 Default Case Algorithms in
|
||||
# The Unicode Standard.
|
||||
#
|
||||
|
@ -593,6 +594,15 @@
|
|||
13FB; C; 13F3; # CHEROKEE SMALL LETTER YU
|
||||
13FC; C; 13F4; # CHEROKEE SMALL LETTER YV
|
||||
13FD; C; 13F5; # CHEROKEE SMALL LETTER MV
|
||||
1C80; C; 0432; # CYRILLIC SMALL LETTER ROUNDED VE
|
||||
1C81; C; 0434; # CYRILLIC SMALL LETTER LONG-LEGGED DE
|
||||
1C82; C; 043E; # CYRILLIC SMALL LETTER NARROW O
|
||||
1C83; C; 0441; # CYRILLIC SMALL LETTER WIDE ES
|
||||
1C84; C; 0442; # CYRILLIC SMALL LETTER TALL TE
|
||||
1C85; C; 0442; # CYRILLIC SMALL LETTER THREE-LEGGED TE
|
||||
1C86; C; 044A; # CYRILLIC SMALL LETTER TALL HARD SIGN
|
||||
1C87; C; 0463; # CYRILLIC SMALL LETTER TALL YAT
|
||||
1C88; C; A64B; # CYRILLIC SMALL LETTER UNBLENDED UK
|
||||
1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW
|
||||
1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE
|
||||
1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW
|
||||
|
@ -1163,6 +1173,7 @@ A7AA; C; 0266; # LATIN CAPITAL LETTER H WITH HOOK
|
|||
A7AB; C; 025C; # LATIN CAPITAL LETTER REVERSED OPEN E
|
||||
A7AC; C; 0261; # LATIN CAPITAL LETTER SCRIPT G
|
||||
A7AD; C; 026C; # LATIN CAPITAL LETTER L WITH BELT
|
||||
A7AE; C; 026A; # LATIN CAPITAL LETTER SMALL CAPITAL I
|
||||
A7B0; C; 029E; # LATIN CAPITAL LETTER TURNED K
|
||||
A7B1; C; 0287; # LATIN CAPITAL LETTER TURNED T
|
||||
A7B2; C; 029D; # LATIN CAPITAL LETTER J WITH CROSSED-TAIL
|
||||
|
@ -1327,6 +1338,42 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
|
|||
10425; C; 1044D; # DESERET CAPITAL LETTER ENG
|
||||
10426; C; 1044E; # DESERET CAPITAL LETTER OI
|
||||
10427; C; 1044F; # DESERET CAPITAL LETTER EW
|
||||
104B0; C; 104D8; # OSAGE CAPITAL LETTER A
|
||||
104B1; C; 104D9; # OSAGE CAPITAL LETTER AI
|
||||
104B2; C; 104DA; # OSAGE CAPITAL LETTER AIN
|
||||
104B3; C; 104DB; # OSAGE CAPITAL LETTER AH
|
||||
104B4; C; 104DC; # OSAGE CAPITAL LETTER BRA
|
||||
104B5; C; 104DD; # OSAGE CAPITAL LETTER CHA
|
||||
104B6; C; 104DE; # OSAGE CAPITAL LETTER EHCHA
|
||||
104B7; C; 104DF; # OSAGE CAPITAL LETTER E
|
||||
104B8; C; 104E0; # OSAGE CAPITAL LETTER EIN
|
||||
104B9; C; 104E1; # OSAGE CAPITAL LETTER HA
|
||||
104BA; C; 104E2; # OSAGE CAPITAL LETTER HYA
|
||||
104BB; C; 104E3; # OSAGE CAPITAL LETTER I
|
||||
104BC; C; 104E4; # OSAGE CAPITAL LETTER KA
|
||||
104BD; C; 104E5; # OSAGE CAPITAL LETTER EHKA
|
||||
104BE; C; 104E6; # OSAGE CAPITAL LETTER KYA
|
||||
104BF; C; 104E7; # OSAGE CAPITAL LETTER LA
|
||||
104C0; C; 104E8; # OSAGE CAPITAL LETTER MA
|
||||
104C1; C; 104E9; # OSAGE CAPITAL LETTER NA
|
||||
104C2; C; 104EA; # OSAGE CAPITAL LETTER O
|
||||
104C3; C; 104EB; # OSAGE CAPITAL LETTER OIN
|
||||
104C4; C; 104EC; # OSAGE CAPITAL LETTER PA
|
||||
104C5; C; 104ED; # OSAGE CAPITAL LETTER EHPA
|
||||
104C6; C; 104EE; # OSAGE CAPITAL LETTER SA
|
||||
104C7; C; 104EF; # OSAGE CAPITAL LETTER SHA
|
||||
104C8; C; 104F0; # OSAGE CAPITAL LETTER TA
|
||||
104C9; C; 104F1; # OSAGE CAPITAL LETTER EHTA
|
||||
104CA; C; 104F2; # OSAGE CAPITAL LETTER TSA
|
||||
104CB; C; 104F3; # OSAGE CAPITAL LETTER EHTSA
|
||||
104CC; C; 104F4; # OSAGE CAPITAL LETTER TSHA
|
||||
104CD; C; 104F5; # OSAGE CAPITAL LETTER DHA
|
||||
104CE; C; 104F6; # OSAGE CAPITAL LETTER U
|
||||
104CF; C; 104F7; # OSAGE CAPITAL LETTER WA
|
||||
104D0; C; 104F8; # OSAGE CAPITAL LETTER KHA
|
||||
104D1; C; 104F9; # OSAGE CAPITAL LETTER GHA
|
||||
104D2; C; 104FA; # OSAGE CAPITAL LETTER ZA
|
||||
104D3; C; 104FB; # OSAGE CAPITAL LETTER ZHA
|
||||
10C80; C; 10CC0; # OLD HUNGARIAN CAPITAL LETTER A
|
||||
10C81; C; 10CC1; # OLD HUNGARIAN CAPITAL LETTER AA
|
||||
10C82; C; 10CC2; # OLD HUNGARIAN CAPITAL LETTER EB
|
||||
|
@ -1410,5 +1457,39 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
|
|||
118BD; C; 118DD; # WARANG CITI CAPITAL LETTER SSUU
|
||||
118BE; C; 118DE; # WARANG CITI CAPITAL LETTER SII
|
||||
118BF; C; 118DF; # WARANG CITI CAPITAL LETTER VIYO
|
||||
1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF
|
||||
1E901; C; 1E923; # ADLAM CAPITAL LETTER DAALI
|
||||
1E902; C; 1E924; # ADLAM CAPITAL LETTER LAAM
|
||||
1E903; C; 1E925; # ADLAM CAPITAL LETTER MIIM
|
||||
1E904; C; 1E926; # ADLAM CAPITAL LETTER BA
|
||||
1E905; C; 1E927; # ADLAM CAPITAL LETTER SINNYIIYHE
|
||||
1E906; C; 1E928; # ADLAM CAPITAL LETTER PE
|
||||
1E907; C; 1E929; # ADLAM CAPITAL LETTER BHE
|
||||
1E908; C; 1E92A; # ADLAM CAPITAL LETTER RA
|
||||
1E909; C; 1E92B; # ADLAM CAPITAL LETTER E
|
||||
1E90A; C; 1E92C; # ADLAM CAPITAL LETTER FA
|
||||
1E90B; C; 1E92D; # ADLAM CAPITAL LETTER I
|
||||
1E90C; C; 1E92E; # ADLAM CAPITAL LETTER O
|
||||
1E90D; C; 1E92F; # ADLAM CAPITAL LETTER DHA
|
||||
1E90E; C; 1E930; # ADLAM CAPITAL LETTER YHE
|
||||
1E90F; C; 1E931; # ADLAM CAPITAL LETTER WAW
|
||||
1E910; C; 1E932; # ADLAM CAPITAL LETTER NUN
|
||||
1E911; C; 1E933; # ADLAM CAPITAL LETTER KAF
|
||||
1E912; C; 1E934; # ADLAM CAPITAL LETTER YA
|
||||
1E913; C; 1E935; # ADLAM CAPITAL LETTER U
|
||||
1E914; C; 1E936; # ADLAM CAPITAL LETTER JIIM
|
||||
1E915; C; 1E937; # ADLAM CAPITAL LETTER CHI
|
||||
1E916; C; 1E938; # ADLAM CAPITAL LETTER HA
|
||||
1E917; C; 1E939; # ADLAM CAPITAL LETTER QAAF
|
||||
1E918; C; 1E93A; # ADLAM CAPITAL LETTER GA
|
||||
1E919; C; 1E93B; # ADLAM CAPITAL LETTER NYA
|
||||
1E91A; C; 1E93C; # ADLAM CAPITAL LETTER TU
|
||||
1E91B; C; 1E93D; # ADLAM CAPITAL LETTER NHA
|
||||
1E91C; C; 1E93E; # ADLAM CAPITAL LETTER VA
|
||||
1E91D; C; 1E93F; # ADLAM CAPITAL LETTER KHA
|
||||
1E91E; C; 1E940; # ADLAM CAPITAL LETTER GBE
|
||||
1E91F; C; 1E941; # ADLAM CAPITAL LETTER ZAL
|
||||
1E920; C; 1E942; # ADLAM CAPITAL LETTER KPO
|
||||
1E921; C; 1E943; # ADLAM CAPITAL LETTER SHA
|
||||
#
|
||||
# EOF
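
For readers unfamiliar with the CaseFolding.txt layout, each data line above has the form "code; status; mapping; # name". The following is a minimal Python sketch of reading such lines; the function name is hypothetical, and it keeps only the single-code-point foldings (status C and S), which is an assumption about what a caller would want rather than a description of the PCRE2 table generator.

```python
def parse_case_folding(lines):
    """Map each code point to its simple case folding (status C or S)."""
    folds = {}
    for line in lines:
        body = line.split('#', 1)[0].strip()  # drop the trailing comment
        if not body:
            continue  # blank line or comment-only line
        fields = [f.strip() for f in body.split(';')]
        code, status, mapping = fields[0], fields[1], fields[2]
        if status in ('C', 'S'):
            folds[int(code, 16)] = int(mapping, 16)
    return folds


sample = [
    "1C80; C; 0432; # CYRILLIC SMALL LETTER ROUNDED VE",
    "1C84; C; 0442; # CYRILLIC SMALL LETTER TALL TE",
    "1C85; C; 0442; # CYRILLIC SMALL LETTER THREE-LEGGED TE",
    "1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF",
    "#",
    "# EOF",
]
folds = parse_case_folding(sample)
```

Note that two different code points (1C84 and 1C85) fold to the same target, so a case set derived from this data can contain more than two members.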
|
||||
|
|
|
@ -1,10 +1,11 @@
|
|||
# DerivedGeneralCategory-8.0.0.txt
|
||||
# Date: 2015-02-13, 13:47:11 GMT [MD]
|
||||
# DerivedGeneralCategory-10.0.0.txt
|
||||
# Date: 2017-03-08, 08:41:49 GMT
|
||||
# © 2017 Unicode®, Inc.
|
||||
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
|
||||
# For terms of use, see http://www.unicode.org/terms_of_use.html
|
||||
#
|
||||
# Unicode Character Database
|
||||
# Copyright (c) 1991-2015 Unicode, Inc.
|
||||
# For terms of use, see http://www.unicode.org/terms_of_use.html
|
||||
# For documentation, see http://www.unicode.org/reports/tr44/
|
||||
# For documentation, see http://www.unicode.org/reports/tr44/
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -36,8 +37,10 @@
|
|||
082E..082F ; Cn # [2] <reserved-082E>..<reserved-082F>
|
||||
083F ; Cn # <reserved-083F>
|
||||
085C..085D ; Cn # [2] <reserved-085C>..<reserved-085D>
|
||||
085F..089F ; Cn # [65] <reserved-085F>..<reserved-089F>
|
||||
08B5..08E2 ; Cn # [46] <reserved-08B5>..<reserved-08E2>
|
||||
085F ; Cn # <reserved-085F>
|
||||
086B..089F ; Cn # [53] <reserved-086B>..<reserved-089F>
|
||||
08B5 ; Cn # <reserved-08B5>
|
||||
08BE..08D3 ; Cn # [22] <reserved-08BE>..<reserved-08D3>
|
||||
0984 ; Cn # <reserved-0984>
|
||||
098D..098E ; Cn # [2] <reserved-098D>..<reserved-098E>
|
||||
0991..0992 ; Cn # [2] <reserved-0991>..<reserved-0992>
|
||||
|
@ -51,7 +54,7 @@
|
|||
09D8..09DB ; Cn # [4] <reserved-09D8>..<reserved-09DB>
|
||||
09DE ; Cn # <reserved-09DE>
|
||||
09E4..09E5 ; Cn # [2] <reserved-09E4>..<reserved-09E5>
|
||||
09FC..0A00 ; Cn # [5] <reserved-09FC>..<reserved-0A00>
|
||||
09FE..0A00 ; Cn # [3] <reserved-09FE>..<reserved-0A00>
|
||||
0A04 ; Cn # <reserved-0A04>
|
||||
0A0B..0A0E ; Cn # [4] <reserved-0A0B>..<reserved-0A0E>
|
||||
0A11..0A12 ; Cn # [2] <reserved-0A11>..<reserved-0A12>
|
||||
|
@ -81,7 +84,7 @@
|
|||
0AD1..0ADF ; Cn # [15] <reserved-0AD1>..<reserved-0ADF>
|
||||
0AE4..0AE5 ; Cn # [2] <reserved-0AE4>..<reserved-0AE5>
|
||||
0AF2..0AF8 ; Cn # [7] <reserved-0AF2>..<reserved-0AF8>
|
||||
0AFA..0B00 ; Cn # [7] <reserved-0AFA>..<reserved-0B00>
|
||||
0B00 ; Cn # <reserved-0B00>
|
||||
0B04 ; Cn # <reserved-0B04>
|
||||
0B0D..0B0E ; Cn # [2] <reserved-0B0D>..<reserved-0B0E>
|
||||
0B11..0B12 ; Cn # [2] <reserved-0B11>..<reserved-0B12>
|
||||
|
@ -124,7 +127,6 @@
|
|||
0C5B..0C5F ; Cn # [5] <reserved-0C5B>..<reserved-0C5F>
|
||||
0C64..0C65 ; Cn # [2] <reserved-0C64>..<reserved-0C65>
|
||||
0C70..0C77 ; Cn # [8] <reserved-0C70>..<reserved-0C77>
|
||||
0C80 ; Cn # <reserved-0C80>
|
||||
0C84 ; Cn # <reserved-0C84>
|
||||
0C8D ; Cn # <reserved-0C8D>
|
||||
0C91 ; Cn # <reserved-0C91>
|
||||
|
@ -138,17 +140,14 @@
|
|||
0CDF ; Cn # <reserved-0CDF>
|
||||
0CE4..0CE5 ; Cn # [2] <reserved-0CE4>..<reserved-0CE5>
|
||||
0CF0 ; Cn # <reserved-0CF0>
|
||||
0CF3..0D00 ; Cn # [14] <reserved-0CF3>..<reserved-0D00>
|
||||
0CF3..0CFF ; Cn # [13] <reserved-0CF3>..<reserved-0CFF>
|
||||
0D04 ; Cn # <reserved-0D04>
|
||||
0D0D ; Cn # <reserved-0D0D>
|
||||
0D11 ; Cn # <reserved-0D11>
|
||||
0D3B..0D3C ; Cn # [2] <reserved-0D3B>..<reserved-0D3C>
|
||||
0D45 ; Cn # <reserved-0D45>
|
||||
0D49 ; Cn # <reserved-0D49>
|
||||
0D4F..0D56 ; Cn # [8] <reserved-0D4F>..<reserved-0D56>
|
||||
0D58..0D5E ; Cn # [7] <reserved-0D58>..<reserved-0D5E>
|
||||
0D50..0D53 ; Cn # [4] <reserved-0D50>..<reserved-0D53>
|
||||
0D64..0D65 ; Cn # [2] <reserved-0D64>..<reserved-0D65>
|
||||
0D76..0D78 ; Cn # [3] <reserved-0D76>..<reserved-0D78>
|
||||
0D80..0D81 ; Cn # [2] <reserved-0D80>..<reserved-0D81>
|
||||
0D84 ; Cn # <reserved-0D84>
|
||||
0D97..0D99 ; Cn # [3] <reserved-0D97>..<reserved-0D99>
|
||||
|
@ -249,11 +248,10 @@
|
|||
1BF4..1BFB ; Cn # [8] <reserved-1BF4>..<reserved-1BFB>
|
||||
1C38..1C3A ; Cn # [3] <reserved-1C38>..<reserved-1C3A>
|
||||
1C4A..1C4C ; Cn # [3] <reserved-1C4A>..<reserved-1C4C>
|
||||
1C80..1CBF ; Cn # [64] <reserved-1C80>..<reserved-1CBF>
|
||||
1C89..1CBF ; Cn # [55] <reserved-1C89>..<reserved-1CBF>
|
||||
1CC8..1CCF ; Cn # [8] <reserved-1CC8>..<reserved-1CCF>
|
||||
1CF7 ; Cn # <reserved-1CF7>
|
||||
1CFA..1CFF ; Cn # [6] <reserved-1CFA>..<reserved-1CFF>
|
||||
1DF6..1DFB ; Cn # [6] <reserved-1DF6>..<reserved-1DFB>
|
||||
1DFA ; Cn # <reserved-1DFA>
|
||||
1F16..1F17 ; Cn # [2] <reserved-1F16>..<reserved-1F17>
|
||||
1F1E..1F1F ; Cn # [2] <reserved-1F1E>..<reserved-1F1F>
|
||||
1F46..1F47 ; Cn # [2] <reserved-1F46>..<reserved-1F47>
|
||||
|
@ -274,17 +272,16 @@
|
|||
2072..2073 ; Cn # [2] <reserved-2072>..<reserved-2073>
|
||||
208F ; Cn # <reserved-208F>
|
||||
209D..209F ; Cn # [3] <reserved-209D>..<reserved-209F>
|
||||
20BF..20CF ; Cn # [17] <reserved-20BF>..<reserved-20CF>
|
||||
20C0..20CF ; Cn # [16] <reserved-20C0>..<reserved-20CF>
|
||||
20F1..20FF ; Cn # [15] <reserved-20F1>..<reserved-20FF>
|
||||
218C..218F ; Cn # [4] <reserved-218C>..<reserved-218F>
|
||||
23FB..23FF ; Cn # [5] <reserved-23FB>..<reserved-23FF>
|
||||
2427..243F ; Cn # [25] <reserved-2427>..<reserved-243F>
|
||||
244B..245F ; Cn # [21] <reserved-244B>..<reserved-245F>
|
||||
2B74..2B75 ; Cn # [2] <reserved-2B74>..<reserved-2B75>
|
||||
2B96..2B97 ; Cn # [2] <reserved-2B96>..<reserved-2B97>
|
||||
2BBA..2BBC ; Cn # [3] <reserved-2BBA>..<reserved-2BBC>
|
||||
2BC9 ; Cn # <reserved-2BC9>
|
||||
2BD2..2BEB ; Cn # [26] <reserved-2BD2>..<reserved-2BEB>
|
||||
2BD3..2BEB ; Cn # [25] <reserved-2BD3>..<reserved-2BEB>
|
||||
2BF0..2BFF ; Cn # [16] <reserved-2BF0>..<reserved-2BFF>
|
||||
2C2F ; Cn # <reserved-2C2F>
|
||||
2C5F ; Cn # <reserved-2C5F>
|
||||
|
@ -303,7 +300,7 @@
|
|||
2DCF ; Cn # <reserved-2DCF>
|
||||
2DD7 ; Cn # <reserved-2DD7>
|
||||
2DDF ; Cn # <reserved-2DDF>
|
||||
2E43..2E7F ; Cn # [61] <reserved-2E43>..<reserved-2E7F>
|
||||
2E4A..2E7F ; Cn # [54] <reserved-2E4A>..<reserved-2E7F>
|
||||
2E9A ; Cn # <reserved-2E9A>
|
||||
2EF4..2EFF ; Cn # [12] <reserved-2EF4>..<reserved-2EFF>
|
||||
2FD6..2FEF ; Cn # [26] <reserved-2FD6>..<reserved-2FEF>
|
||||
|
@ -311,24 +308,24 @@
|
|||
3040 ; Cn # <reserved-3040>
|
||||
3097..3098 ; Cn # [2] <reserved-3097>..<reserved-3098>
|
||||
3100..3104 ; Cn # [5] <reserved-3100>..<reserved-3104>
|
||||
312E..3130 ; Cn # [3] <reserved-312E>..<reserved-3130>
|
||||
312F..3130 ; Cn # [2] <reserved-312F>..<reserved-3130>
|
||||
318F ; Cn # <reserved-318F>
|
||||
31BB..31BF ; Cn # [5] <reserved-31BB>..<reserved-31BF>
|
||||
31E4..31EF ; Cn # [12] <reserved-31E4>..<reserved-31EF>
|
||||
321F ; Cn # <reserved-321F>
|
||||
32FF ; Cn # <reserved-32FF>
|
||||
4DB6..4DBF ; Cn # [10] <reserved-4DB6>..<reserved-4DBF>
|
||||
9FD6..9FFF ; Cn # [42] <reserved-9FD6>..<reserved-9FFF>
|
||||
9FEB..9FFF ; Cn # [21] <reserved-9FEB>..<reserved-9FFF>
|
||||
A48D..A48F ; Cn # [3] <reserved-A48D>..<reserved-A48F>
|
||||
A4C7..A4CF ; Cn # [9] <reserved-A4C7>..<reserved-A4CF>
|
||||
A62C..A63F ; Cn # [20] <reserved-A62C>..<reserved-A63F>
|
||||
A6F8..A6FF ; Cn # [8] <reserved-A6F8>..<reserved-A6FF>
|
||||
A7AE..A7AF ; Cn # [2] <reserved-A7AE>..<reserved-A7AF>
|
||||
A7AF ; Cn # <reserved-A7AF>
|
||||
A7B8..A7F6 ; Cn # [63] <reserved-A7B8>..<reserved-A7F6>
|
||||
A82C..A82F ; Cn # [4] <reserved-A82C>..<reserved-A82F>
|
||||
A83A..A83F ; Cn # [6] <reserved-A83A>..<reserved-A83F>
|
||||
A878..A87F ; Cn # [8] <reserved-A878>..<reserved-A87F>
|
||||
A8C5..A8CD ; Cn # [9] <reserved-A8C5>..<reserved-A8CD>
|
||||
A8C6..A8CD ; Cn # [8] <reserved-A8C6>..<reserved-A8CD>
|
||||
A8DA..A8DF ; Cn # [6] <reserved-A8DA>..<reserved-A8DF>
|
||||
A8FE..A8FF ; Cn # [2] <reserved-A8FE>..<reserved-A8FF>
|
||||
A954..A95E ; Cn # [11] <reserved-A954>..<reserved-A95E>
|
||||
|
@ -390,21 +387,23 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
|
|||
100FB..100FF ; Cn # [5] <reserved-100FB>..<reserved-100FF>
|
||||
10103..10106 ; Cn # [4] <reserved-10103>..<reserved-10106>
|
||||
10134..10136 ; Cn # [3] <reserved-10134>..<reserved-10136>
|
||||
1018D..1018F ; Cn # [3] <reserved-1018D>..<reserved-1018F>
|
||||
1018F ; Cn # <reserved-1018F>
|
||||
1019C..1019F ; Cn # [4] <reserved-1019C>..<reserved-1019F>
|
||||
101A1..101CF ; Cn # [47] <reserved-101A1>..<reserved-101CF>
|
||||
101FE..1027F ; Cn # [130] <reserved-101FE>..<reserved-1027F>
|
||||
1029D..1029F ; Cn # [3] <reserved-1029D>..<reserved-1029F>
|
||||
102D1..102DF ; Cn # [15] <reserved-102D1>..<reserved-102DF>
|
||||
102FC..102FF ; Cn # [4] <reserved-102FC>..<reserved-102FF>
|
||||
10324..1032F ; Cn # [12] <reserved-10324>..<reserved-1032F>
|
||||
10324..1032C ; Cn # [9] <reserved-10324>..<reserved-1032C>
|
||||
1034B..1034F ; Cn # [5] <reserved-1034B>..<reserved-1034F>
|
||||
1037B..1037F ; Cn # [5] <reserved-1037B>..<reserved-1037F>
|
||||
1039E ; Cn # <reserved-1039E>
|
||||
103C4..103C7 ; Cn # [4] <reserved-103C4>..<reserved-103C7>
|
||||
103D6..103FF ; Cn # [42] <reserved-103D6>..<reserved-103FF>
|
||||
1049E..1049F ; Cn # [2] <reserved-1049E>..<reserved-1049F>
|
||||
104AA..104FF ; Cn # [86] <reserved-104AA>..<reserved-104FF>
|
||||
104AA..104AF ; Cn # [6] <reserved-104AA>..<reserved-104AF>
|
||||
104D4..104D7 ; Cn # [4] <reserved-104D4>..<reserved-104D7>
|
||||
104FC..104FF ; Cn # [4] <reserved-104FC>..<reserved-104FF>
|
||||
10528..1052F ; Cn # [8] <reserved-10528>..<reserved-1052F>
|
||||
10564..1056E ; Cn # [11] <reserved-10564>..<reserved-1056E>
|
||||
10570..105FF ; Cn # [144] <reserved-10570>..<reserved-105FF>
|
||||
|
@ -460,7 +459,7 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
|
|||
111E0 ; Cn # <reserved-111E0>
|
||||
111F5..111FF ; Cn # [11] <reserved-111F5>..<reserved-111FF>
|
||||
11212 ; Cn # <reserved-11212>
|
||||
1123E..1127F ; Cn # [66] <reserved-1123E>..<reserved-1127F>
|
||||
1123F..1127F ; Cn # [65] <reserved-1123F>..<reserved-1127F>
|
||||
11287 ; Cn # <reserved-11287>
|
||||
11289 ; Cn # <reserved-11289>
|
||||
1128E ; Cn # <reserved-1128E>
|
||||
|
@ -482,21 +481,43 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
|
|||
11358..1135C ; Cn # [5] <reserved-11358>..<reserved-1135C>
|
||||
11364..11365 ; Cn # [2] <reserved-11364>..<reserved-11365>
|
||||
1136D..1136F ; Cn # [3] <reserved-1136D>..<reserved-1136F>
|
||||
11375..1147F ; Cn # [267] <reserved-11375>..<reserved-1147F>
|
||||
11375..113FF ; Cn # [139] <reserved-11375>..<reserved-113FF>
|
||||
1145A ; Cn # <reserved-1145A>
|
||||
1145C ; Cn # <reserved-1145C>
|
||||
1145E..1147F ; Cn # [34] <reserved-1145E>..<reserved-1147F>
|
||||
114C8..114CF ; Cn # [8] <reserved-114C8>..<reserved-114CF>
|
||||
114DA..1157F ; Cn # [166] <reserved-114DA>..<reserved-1157F>
|
||||
115B6..115B7 ; Cn # [2] <reserved-115B6>..<reserved-115B7>
|
||||
115DE..115FF ; Cn # [34] <reserved-115DE>..<reserved-115FF>
|
||||
11645..1164F ; Cn # [11] <reserved-11645>..<reserved-1164F>
|
||||
1165A..1167F ; Cn # [38] <reserved-1165A>..<reserved-1167F>
|
||||
1165A..1165F ; Cn # [6] <reserved-1165A>..<reserved-1165F>
|
||||
1166D..1167F ; Cn # [19] <reserved-1166D>..<reserved-1167F>
|
||||
116B8..116BF ; Cn # [8] <reserved-116B8>..<reserved-116BF>
|
||||
116CA..116FF ; Cn # [54] <reserved-116CA>..<reserved-116FF>
|
||||
1171A..1171C ; Cn # [3] <reserved-1171A>..<reserved-1171C>
|
||||
1172C..1172F ; Cn # [4] <reserved-1172C>..<reserved-1172F>
|
||||
11740..1189F ; Cn # [352] <reserved-11740>..<reserved-1189F>
|
||||
118F3..118FE ; Cn # [12] <reserved-118F3>..<reserved-118FE>
|
||||
11900..11ABF ; Cn # [448] <reserved-11900>..<reserved-11ABF>
|
||||
11AF9..11FFF ; Cn # [1287] <reserved-11AF9>..<reserved-11FFF>
|
||||
11900..119FF ; Cn # [256] <reserved-11900>..<reserved-119FF>
|
||||
11A48..11A4F ; Cn # [8] <reserved-11A48>..<reserved-11A4F>
|
||||
11A84..11A85 ; Cn # [2] <reserved-11A84>..<reserved-11A85>
|
||||
11A9D ; Cn # <reserved-11A9D>
|
||||
11AA3..11ABF ; Cn # [29] <reserved-11AA3>..<reserved-11ABF>
|
||||
11AF9..11BFF ; Cn # [263] <reserved-11AF9>..<reserved-11BFF>
|
||||
11C09 ; Cn # <reserved-11C09>
|
||||
11C37 ; Cn # <reserved-11C37>
|
||||
11C46..11C4F ; Cn # [10] <reserved-11C46>..<reserved-11C4F>
|
||||
11C6D..11C6F ; Cn # [3] <reserved-11C6D>..<reserved-11C6F>
|
||||
11C90..11C91 ; Cn # [2] <reserved-11C90>..<reserved-11C91>
|
||||
11CA8 ; Cn # <reserved-11CA8>
|
||||
11CB7..11CFF ; Cn # [73] <reserved-11CB7>..<reserved-11CFF>
|
||||
11D07 ; Cn # <reserved-11D07>
|
||||
11D0A ; Cn # <reserved-11D0A>
|
||||
11D37..11D39 ; Cn # [3] <reserved-11D37>..<reserved-11D39>
|
||||
11D3B ; Cn # <reserved-11D3B>
|
||||
11D3E ; Cn # <reserved-11D3E>
|
||||
11D48..11D4F ; Cn # [8] <reserved-11D48>..<reserved-11D4F>
|
||||
11D5A..11FFF ; Cn # [678] <reserved-11D5A>..<reserved-11FFF>
|
||||
1239A..123FF ; Cn # [102] <reserved-1239A>..<reserved-123FF>
|
||||
1246F ; Cn # <reserved-1246F>
|
||||
12475..1247F ; Cn # [11] <reserved-12475>..<reserved-1247F>
|
||||
|
@ -516,8 +537,12 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
|
|||
16B90..16EFF ; Cn # [880] <reserved-16B90>..<reserved-16EFF>
|
||||
16F45..16F4F ; Cn # [11] <reserved-16F45>..<reserved-16F4F>
|
||||
16F7F..16F8E ; Cn # [16] <reserved-16F7F>..<reserved-16F8E>
|
||||
16FA0..1AFFF ; Cn # [16480] <reserved-16FA0>..<reserved-1AFFF>
|
||||
1B002..1BBFF ; Cn # [3070] <reserved-1B002>..<reserved-1BBFF>
|
||||
16FA0..16FDF ; Cn # [64] <reserved-16FA0>..<reserved-16FDF>
|
||||
16FE2..16FFF ; Cn # [30] <reserved-16FE2>..<reserved-16FFF>
|
||||
187ED..187FF ; Cn # [19] <reserved-187ED>..<reserved-187FF>
|
||||
18AF3..1AFFF ; Cn # [9485] <reserved-18AF3>..<reserved-1AFFF>
|
||||
1B11F..1B16F ; Cn # [81] <reserved-1B11F>..<reserved-1B16F>
|
||||
1B2FC..1BBFF ; Cn # [2308] <reserved-1B2FC>..<reserved-1BBFF>
|
||||
1BC6B..1BC6F ; Cn # [5] <reserved-1BC6B>..<reserved-1BC6F>
|
||||
1BC7D..1BC7F ; Cn # [3] <reserved-1BC7D>..<reserved-1BC7F>
|
||||
1BC89..1BC8F ; Cn # [7] <reserved-1BC89>..<reserved-1BC8F>
|
||||
|
@ -551,9 +576,17 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
|
|||
1D7CC..1D7CD ; Cn # [2] <reserved-1D7CC>..<reserved-1D7CD>
|
||||
1DA8C..1DA9A ; Cn # [15] <reserved-1DA8C>..<reserved-1DA9A>
|
||||
1DAA0 ; Cn # <reserved-1DAA0>
|
||||
1DAB0..1E7FF ; Cn # [3408] <reserved-1DAB0>..<reserved-1E7FF>
|
||||
1DAB0..1DFFF ; Cn # [1360] <reserved-1DAB0>..<reserved-1DFFF>
|
||||
1E007 ; Cn # <reserved-1E007>
|
||||
1E019..1E01A ; Cn # [2] <reserved-1E019>..<reserved-1E01A>
|
||||
1E022 ; Cn # <reserved-1E022>
|
||||
1E025 ; Cn # <reserved-1E025>
|
||||
1E02B..1E7FF ; Cn # [2005] <reserved-1E02B>..<reserved-1E7FF>
|
||||
1E8C5..1E8C6 ; Cn # [2] <reserved-1E8C5>..<reserved-1E8C6>
|
||||
1E8D7..1EDFF ; Cn # [1321] <reserved-1E8D7>..<reserved-1EDFF>
|
||||
1E8D7..1E8FF ; Cn # [41] <reserved-1E8D7>..<reserved-1E8FF>
|
||||
1E94B..1E94F ; Cn # [5] <reserved-1E94B>..<reserved-1E94F>
|
||||
1E95A..1E95D ; Cn # [4] <reserved-1E95A>..<reserved-1E95D>
|
||||
1E960..1EDFF ; Cn # [1184] <reserved-1E960>..<reserved-1EDFF>
|
||||
1EE04 ; Cn # <reserved-1EE04>
|
||||
1EE20 ; Cn # <reserved-1EE20>
|
||||
1EE23 ; Cn # <reserved-1EE23>
|
||||
|
@ -597,30 +630,34 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
|
|||
1F10D..1F10F ; Cn # [3] <reserved-1F10D>..<reserved-1F10F>
|
||||
1F12F ; Cn # <reserved-1F12F>
|
||||
1F16C..1F16F ; Cn # [4] <reserved-1F16C>..<reserved-1F16F>
|
||||
1F19B..1F1E5 ; Cn # [75] <reserved-1F19B>..<reserved-1F1E5>
|
||||
1F1AD..1F1E5 ; Cn # [57] <reserved-1F1AD>..<reserved-1F1E5>
|
||||
1F203..1F20F ; Cn # [13] <reserved-1F203>..<reserved-1F20F>
|
||||
1F23B..1F23F ; Cn # [5] <reserved-1F23B>..<reserved-1F23F>
|
||||
1F23C..1F23F ; Cn # [4] <reserved-1F23C>..<reserved-1F23F>
|
||||
1F249..1F24F ; Cn # [7] <reserved-1F249>..<reserved-1F24F>
|
||||
1F252..1F2FF ; Cn # [174] <reserved-1F252>..<reserved-1F2FF>
|
||||
1F57A ; Cn # <reserved-1F57A>
|
||||
1F5A4 ; Cn # <reserved-1F5A4>
|
||||
1F6D1..1F6DF ; Cn # [15] <reserved-1F6D1>..<reserved-1F6DF>
|
||||
1F252..1F25F ; Cn # [14] <reserved-1F252>..<reserved-1F25F>
|
||||
1F266..1F2FF ; Cn # [154] <reserved-1F266>..<reserved-1F2FF>
|
||||
1F6D5..1F6DF ; Cn # [11] <reserved-1F6D5>..<reserved-1F6DF>
|
||||
1F6ED..1F6EF ; Cn # [3] <reserved-1F6ED>..<reserved-1F6EF>
|
||||
1F6F4..1F6FF ; Cn # [12] <reserved-1F6F4>..<reserved-1F6FF>
|
||||
1F6F9..1F6FF ; Cn # [7] <reserved-1F6F9>..<reserved-1F6FF>
|
||||
1F774..1F77F ; Cn # [12] <reserved-1F774>..<reserved-1F77F>
|
||||
1F7D5..1F7FF ; Cn # [43] <reserved-1F7D5>..<reserved-1F7FF>
|
||||
1F80C..1F80F ; Cn # [4] <reserved-1F80C>..<reserved-1F80F>
|
||||
1F848..1F84F ; Cn # [8] <reserved-1F848>..<reserved-1F84F>
|
||||
1F85A..1F85F ; Cn # [6] <reserved-1F85A>..<reserved-1F85F>
|
||||
1F888..1F88F ; Cn # [8] <reserved-1F888>..<reserved-1F88F>
|
||||
1F8AE..1F90F ; Cn # [98] <reserved-1F8AE>..<reserved-1F90F>
|
||||
1F919..1F97F ; Cn # [103] <reserved-1F919>..<reserved-1F97F>
|
||||
1F985..1F9BF ; Cn # [59] <reserved-1F985>..<reserved-1F9BF>
|
||||
1F9C1..1FFFF ; Cn # [1599] <reserved-1F9C1>..<noncharacter-1FFFF>
|
||||
1F8AE..1F8FF ; Cn # [82] <reserved-1F8AE>..<reserved-1F8FF>
|
||||
1F90C..1F90F ; Cn # [4] <reserved-1F90C>..<reserved-1F90F>
|
||||
1F93F ; Cn # <reserved-1F93F>
|
||||
1F94D..1F94F ; Cn # [3] <reserved-1F94D>..<reserved-1F94F>
|
||||
1F96C..1F97F ; Cn # [20] <reserved-1F96C>..<reserved-1F97F>
|
||||
1F998..1F9BF ; Cn # [40] <reserved-1F998>..<reserved-1F9BF>
|
||||
1F9C1..1F9CF ; Cn # [15] <reserved-1F9C1>..<reserved-1F9CF>
|
||||
1F9E7..1FFFF ; Cn # [1561] <reserved-1F9E7>..<noncharacter-1FFFF>
|
||||
2A6D7..2A6FF ; Cn # [41] <reserved-2A6D7>..<reserved-2A6FF>
|
||||
2B735..2B73F ; Cn # [11] <reserved-2B735>..<reserved-2B73F>
|
||||
2B81E..2B81F ; Cn # [2] <reserved-2B81E>..<reserved-2B81F>
|
||||
2CEA2..2F7FF ; Cn # [10590] <reserved-2CEA2>..<reserved-2F7FF>
|
||||
2CEA2..2CEAF ; Cn # [14] <reserved-2CEA2>..<reserved-2CEAF>
|
||||
2EBE1..2F7FF ; Cn # [3103] <reserved-2EBE1>..<reserved-2F7FF>
|
||||
2FA1E..E0000 ; Cn # [722403] <reserved-2FA1E>..<reserved-E0000>
|
||||
E0002..E001F ; Cn # [30] <reserved-E0002>..<reserved-E001F>
|
||||
E0080..E00FF ; Cn # [128] <reserved-E0080>..<reserved-E00FF>
|
||||
|
@ -628,7 +665,7 @@ E01F0..EFFFF ; Cn # [65040] <reserved-E01F0>..<noncharacter-EFFFF>
|
|||
FFFFE..FFFFF ; Cn # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
|
||||
10FFFE..10FFFF; Cn # [2] <noncharacter-10FFFE>..<noncharacter-10FFFF>
|
||||
|
||||
# Total code points: 853859
|
||||
# Total code points: 837841
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1221,11 +1258,12 @@ A7A2 ; Lu # LATIN CAPITAL LETTER K WITH OBLIQUE STROKE
|
|||
A7A4 ; Lu # LATIN CAPITAL LETTER N WITH OBLIQUE STROKE
|
||||
A7A6 ; Lu # LATIN CAPITAL LETTER R WITH OBLIQUE STROKE
|
||||
A7A8 ; Lu # LATIN CAPITAL LETTER S WITH OBLIQUE STROKE
|
||||
A7AA..A7AD ; Lu # [4] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER L WITH BELT
|
||||
A7AA..A7AE ; Lu # [5] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER SMALL CAPITAL I
|
||||
A7B0..A7B4 ; Lu # [5] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER BETA
|
||||
A7B6 ; Lu # LATIN CAPITAL LETTER OMEGA
|
||||
FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
|
||||
10400..10427 ; Lu # [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW
|
||||
104B0..104D3 ; Lu # [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
|
||||
10C80..10CB2 ; Lu # [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US
|
||||
118A0..118BF ; Lu # [32] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI CAPITAL LETTER VIYO
|
||||
1D400..1D419 ; Lu # [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z
|
||||
|
@ -1259,8 +1297,9 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
|
|||
1D756..1D76E ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD CAPITAL OMEGA
|
||||
1D790..1D7A8 ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL OMEGA
|
||||
1D7CA ; Lu # MATHEMATICAL BOLD CAPITAL DIGAMMA
|
||||
1E900..1E921 ; Lu # [34] ADLAM CAPITAL LETTER ALIF..ADLAM CAPITAL LETTER SHA
|
||||
|
||||
# Total code points: 1631
|
||||
# Total code points: 1702
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1537,6 +1576,7 @@ FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
|
|||
052F ; Ll # CYRILLIC SMALL LETTER EL WITH DESCENDER
|
||||
0561..0587 ; Ll # [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
|
||||
13F8..13FD ; Ll # [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV
|
||||
1C80..1C88 ; Ll # [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
|
||||
1D00..1D2B ; Ll # [44] LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTER SMALL CAPITAL EL
|
||||
1D6B..1D77 ; Ll # [13] LATIN SMALL LETTER UE..LATIN SMALL LETTER TURNED G
|
||||
1D79..1D9A ; Ll # [34] LATIN SMALL LETTER INSULAR G..LATIN SMALL LETTER EZH WITH RETROFLEX HOOK
|
||||
|
@ -1866,6 +1906,7 @@ FB00..FB06 ; Ll # [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE ST
|
|||
FB13..FB17 ; Ll # [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH
|
||||
FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
|
||||
10428..1044F ; Ll # [40] DESERET SMALL LETTER LONG I..DESERET SMALL LETTER EW
|
||||
104D8..104FB ; Ll # [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
|
||||
10CC0..10CF2 ; Ll # [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US
|
||||
118C0..118DF ; Ll # [32] WARANG CITI SMALL LETTER NGAA..WARANG CITI SMALL LETTER VIYO
|
||||
1D41A..1D433 ; Ll # [26] MATHEMATICAL BOLD SMALL A..MATHEMATICAL BOLD SMALL Z
|
||||
|
@ -1896,8 +1937,9 @@ FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL
|
|||
1D7AA..1D7C2 ; Ll # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL OMEGA
|
||||
1D7C4..1D7C9 ; Ll # [6] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL
|
||||
1D7CB ; Ll # MATHEMATICAL BOLD SMALL DIGAMMA
|
||||
1E922..1E943 ; Ll # [34] ADLAM SMALL LETTER ALIF..ADLAM SMALL LETTER SHA
|
||||
|
||||
# Total code points: 1984
|
||||
# Total code points: 2063
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1976,8 +2018,9 @@ FF70 ; Lm # HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
|
|||
FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
|
||||
16B40..16B43 ; Lm # [4] PAHAWH HMONG SIGN VOS SEEV..PAHAWH HMONG SIGN IB YAM
|
||||
16F93..16F9F ; Lm # [13] MIAO LETTER TONE-2..MIAO LETTER REFORMED TONE-8
|
||||
16FE0..16FE1 ; Lm # [2] TANGUT ITERATION MARK..NUSHU ITERATION MARK
|
||||
|
||||
# Total code points: 248
|
||||
# Total code points: 250
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -2005,7 +2048,9 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
|
|||
07CA..07EA ; Lo # [33] NKO LETTER A..NKO LETTER JONA RA
|
||||
0800..0815 ; Lo # [22] SAMARITAN LETTER ALAF..SAMARITAN LETTER TAAF
|
||||
0840..0858 ; Lo # [25] MANDAIC LETTER HALQA..MANDAIC LETTER AIN
|
||||
0860..086A ; Lo # [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
|
||||
08A0..08B4 ; Lo # [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
|
||||
08B6..08BD ; Lo # [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
|
||||
0904..0939 ; Lo # [54] DEVANAGARI LETTER SHORT A..DEVANAGARI LETTER HA
|
||||
093D ; Lo # DEVANAGARI SIGN AVAGRAHA
|
||||
0950 ; Lo # DEVANAGARI OM
|
||||
|
@ -2022,6 +2067,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
|
|||
09DC..09DD ; Lo # [2] BENGALI LETTER RRA..BENGALI LETTER RHA
|
||||
09DF..09E1 ; Lo # [3] BENGALI LETTER YYA..BENGALI LETTER VOCALIC LL
|
||||
09F0..09F1 ; Lo # [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL
|
||||
09FC ; Lo # BENGALI LETTER VEDIC ANUSVARA
|
||||
0A05..0A0A ; Lo # [6] GURMUKHI LETTER A..GURMUKHI LETTER UU
|
||||
0A0F..0A10 ; Lo # [2] GURMUKHI LETTER EE..GURMUKHI LETTER AI
|
||||
0A13..0A28 ; Lo # [22] GURMUKHI LETTER OO..GURMUKHI LETTER NA
|
||||
|
@ -2070,6 +2116,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
|
|||
0C3D ; Lo # TELUGU SIGN AVAGRAHA
|
||||
0C58..0C5A ; Lo # [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
|
||||
0C60..0C61 ; Lo # [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
|
||||
0C80 ; Lo # KANNADA SIGN SPACING CANDRABINDU
|
||||
0C85..0C8C ; Lo # [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
|
||||
0C8E..0C90 ; Lo # [3] KANNADA LETTER E..KANNADA LETTER AI
|
||||
0C92..0CA8 ; Lo # [23] KANNADA LETTER O..KANNADA LETTER NA
|
||||
|
@ -2084,6 +2131,7 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
|
|||
0D12..0D3A ; Lo # [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
|
||||
0D3D ; Lo # MALAYALAM SIGN AVAGRAHA
|
||||
0D4E ; Lo # MALAYALAM LETTER DOT REPH
|
||||
0D54..0D56 ; Lo # [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL
|
||||
0D5F..0D61 ; Lo # [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
|
||||
0D7A..0D7F ; Lo # [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
|
||||
0D85..0D96 ; Lo # [18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA
|
||||
|
@ -2156,7 +2204,8 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
|
|||
17DC ; Lo # KHMER SIGN AVAKRAHASANYA
|
||||
1820..1842 ; Lo # [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
|
||||
1844..1877 ; Lo # [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
|
||||
1880..18A8 ; Lo # [41] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER MANCHU ALI GALI BHA
|
||||
1880..1884 ; Lo # [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
|
||||
1887..18A8 ; Lo # [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
|
||||
18AA ; Lo # MONGOLIAN LETTER MANCHU ALI GALI LHA
|
||||
18B0..18F5 ; Lo # [70] CANADIAN SYLLABICS OY..CANADIAN SYLLABICS CARRIER DENTAL S
|
||||
1900..191E ; Lo # [31] LIMBU VOWEL-CARRIER LETTER..LIMBU LETTER TRA
|
||||
|
@ -2194,12 +2243,12 @@ FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
|
|||
309F ; Lo # HIRAGANA DIGRAPH YORI
|
||||
30A1..30FA ; Lo # [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
|
||||
30FF ; Lo # KATAKANA DIGRAPH KOTO
|
||||
3105..312D ; Lo # [41] BOPOMOFO LETTER B..BOPOMOFO LETTER IH
|
||||
3105..312E ; Lo # [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
|
||||
3131..318E ; Lo # [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
|
||||
31A0..31BA ; Lo # [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
|
||||
31F0..31FF ; Lo # [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
|
||||
3400..4DB5 ; Lo # [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
|
||||
4E00..9FD5 ; Lo # [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5
|
||||
4E00..9FEA ; Lo # [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
|
||||
A000..A014 ; Lo # [21] YI SYLLABLE IT..YI SYLLABLE E
|
||||
A016..A48C ; Lo # [1143] YI SYLLABLE BIT..YI SYLLABLE YYR
|
||||
A4D0..A4F7 ; Lo # [40] LISU LETTER BA..LISU LETTER OE
|
||||
|
@ -2283,7 +2332,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
10280..1029C ; Lo # [29] LYCIAN LETTER A..LYCIAN LETTER X
|
||||
102A0..102D0 ; Lo # [49] CARIAN LETTER A..CARIAN LETTER UUU3
|
||||
10300..1031F ; Lo # [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS
|
||||
10330..10340 ; Lo # [17] GOTHIC LETTER AHSA..GOTHIC LETTER PAIRTHRA
|
||||
1032D..10340 ; Lo # [20] OLD ITALIC LETTER YE..GOTHIC LETTER PAIRTHRA
|
||||
10342..10349 ; Lo # [8] GOTHIC LETTER RAIDA..GOTHIC LETTER OTHAL
|
||||
10350..10375 ; Lo # [38] OLD PERMIC LETTER AN..OLD PERMIC LETTER IA
|
||||
10380..1039D ; Lo # [30] UGARITIC LETTER ALPA..UGARITIC LETTER SSU
|
||||
|
@ -2349,6 +2398,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
1133D ; Lo # GRANTHA SIGN AVAGRAHA
|
||||
11350 ; Lo # GRANTHA OM
|
||||
1135D..11361 ; Lo # [5] GRANTHA SIGN PLUTA..GRANTHA LETTER VOCALIC LL
|
||||
11400..11434 ; Lo # [53] NEWA LETTER A..NEWA LETTER HA
|
||||
11447..1144A ; Lo # [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
|
||||
11480..114AF ; Lo # [48] TIRHUTA ANJI..TIRHUTA LETTER HA
|
||||
114C4..114C5 ; Lo # [2] TIRHUTA SIGN AVAGRAHA..TIRHUTA GVANG
|
||||
114C7 ; Lo # TIRHUTA OM
|
||||
|
@ -2359,7 +2410,21 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
11680..116AA ; Lo # [43] TAKRI LETTER A..TAKRI LETTER RRA
|
||||
11700..11719 ; Lo # [26] AHOM LETTER KA..AHOM LETTER JHA
|
||||
118FF ; Lo # WARANG CITI OM
|
||||
11A00 ; Lo # ZANABAZAR SQUARE LETTER A
|
||||
11A0B..11A32 ; Lo # [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
|
||||
11A3A ; Lo # ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
|
||||
11A50 ; Lo # SOYOMBO LETTER A
|
||||
11A5C..11A83 ; Lo # [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
|
||||
11A86..11A89 ; Lo # [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
|
||||
11AC0..11AF8 ; Lo # [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL
|
||||
11C00..11C08 ; Lo # [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
|
||||
11C0A..11C2E ; Lo # [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
|
||||
11C40 ; Lo # BHAIKSUKI SIGN AVAGRAHA
|
||||
11C72..11C8F ; Lo # [30] MARCHEN LETTER KA..MARCHEN LETTER A
|
||||
11D00..11D06 ; Lo # [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E
|
||||
11D08..11D09 ; Lo # [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
|
||||
11D0B..11D30 ; Lo # [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
|
||||
11D46 ; Lo # MASARAM GONDI REPHA
|
||||
12000..12399 ; Lo # [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U
|
||||
12480..12543 ; Lo # [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU
|
||||
13000..1342E ; Lo # [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
|
||||
|
@ -2372,7 +2437,10 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
16B7D..16B8F ; Lo # [19] PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG CLAN SIGN VWJ
|
||||
16F00..16F44 ; Lo # [69] MIAO LETTER PA..MIAO LETTER HHA
|
||||
16F50 ; Lo # MIAO LETTER NASALIZATION
|
||||
1B000..1B001 ; Lo # [2] KATAKANA LETTER ARCHAIC E..HIRAGANA LETTER ARCHAIC YE
|
||||
17000..187EC ; Lo # [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
|
||||
18800..18AF2 ; Lo # [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
|
||||
1B000..1B11E ; Lo # [287] KATAKANA LETTER ARCHAIC E..HENTAIGANA LETTER N-MU-MO-2
|
||||
1B170..1B2FB ; Lo # [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
|
||||
1BC00..1BC6A ; Lo # [107] DUPLOYAN LETTER H..DUPLOYAN LETTER VOCALIC M
|
||||
1BC70..1BC7C ; Lo # [13] DUPLOYAN AFFIX LEFT HORIZONTAL SECANT..DUPLOYAN AFFIX ATTACHED TANGENT HOOK
|
||||
1BC80..1BC88 ; Lo # [9] DUPLOYAN AFFIX HIGH ACUTE..DUPLOYAN AFFIX HIGH VERTICAL
|
||||
|
@ -2415,9 +2483,10 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
2A700..2B734 ; Lo # [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
|
||||
2B740..2B81D ; Lo # [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
|
||||
2B820..2CEA1 ; Lo # [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
|
||||
2CEB0..2EBE0 ; Lo # [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
|
||||
2F800..2FA1D ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
|
||||
|
||||
# Total code points: 105697
|
||||
# Total code points: 121047
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -2446,6 +2515,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
0825..0827 ; Mn # [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
|
||||
0829..082D ; Mn # [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
|
||||
0859..085B ; Mn # [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
|
||||
08D4..08E1 ; Mn # [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
|
||||
08E3..0902 ; Mn # [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
|
||||
093A ; Mn # DEVANAGARI VOWEL SIGN OE
|
||||
093C ; Mn # DEVANAGARI SIGN NUKTA
|
||||
|
@ -2472,6 +2542,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
0AC7..0AC8 ; Mn # [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
|
||||
0ACD ; Mn # GUJARATI SIGN VIRAMA
|
||||
0AE2..0AE3 ; Mn # [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
|
||||
0AFA..0AFF ; Mn # [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
|
||||
0B01 ; Mn # ORIYA SIGN CANDRABINDU
|
||||
0B3C ; Mn # ORIYA SIGN NUKTA
|
||||
0B3F ; Mn # ORIYA VOWEL SIGN I
|
||||
|
@ -2494,7 +2565,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
0CC6 ; Mn # KANNADA VOWEL SIGN E
|
||||
0CCC..0CCD ; Mn # [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
|
||||
0CE2..0CE3 ; Mn # [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
|
||||
0D01 ; Mn # MALAYALAM SIGN CANDRABINDU
|
||||
0D00..0D01 ; Mn # [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
|
||||
0D3B..0D3C ; Mn # [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
|
||||
0D41..0D44 ; Mn # [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
|
||||
0D4D ; Mn # MALAYALAM SIGN VIRAMA
|
||||
0D62..0D63 ; Mn # [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
|
||||
|
@ -2540,6 +2612,7 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
17C9..17D3 ; Mn # [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
|
||||
17DD ; Mn # KHMER SIGN ATTHACAN
|
||||
180B..180D ; Mn # [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
|
||||
1885..1886 ; Mn # [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
|
||||
18A9 ; Mn # MONGOLIAN LETTER ALI GALI DAGALGA
|
||||
1920..1922 ; Mn # [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
|
||||
1927..1928 ; Mn # [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
|
||||
|
@ -2577,8 +2650,8 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
|
|||
1CED ; Mn # VEDIC SIGN TIRYAK
|
||||
1CF4 ; Mn # VEDIC TONE CANDRA ABOVE
|
||||
1CF8..1CF9 ; Mn # [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
|
||||
1DC0..1DF5 ; Mn # [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
|
||||
1DFC..1DFF ; Mn # [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
|
||||
1DC0..1DF9 ; Mn # [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
|
||||
1DFB..1DFF ; Mn # [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
|
||||
20D0..20DC ; Mn # [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
|
||||
20E1 ; Mn # COMBINING LEFT RIGHT ARROW ABOVE
|
||||
20E5..20F0 ; Mn # [12] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING ASTERISK ABOVE
|
||||
|
@ -2595,7 +2668,7 @@ A802 ; Mn # SYLOTI NAGRI SIGN DVISVARA
|
|||
A806 ; Mn # SYLOTI NAGRI SIGN HASANTA
|
||||
A80B ; Mn # SYLOTI NAGRI SIGN ANUSVARA
|
||||
A825..A826 ; Mn # [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
|
||||
A8C4 ; Mn # SAURASHTRA SIGN VIRAMA
|
||||
A8C4..A8C5 ; Mn # [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
|
||||
A8E0..A8F1 ; Mn # [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
|
||||
A926..A92D ; Mn # [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
|
||||
A947..A951 ; Mn # [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
|
||||
|
@ -2647,6 +2720,7 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
|
|||
1122F..11231 ; Mn # [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
|
||||
11234 ; Mn # KHOJKI SIGN ANUSVARA
|
||||
11236..11237 ; Mn # [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
|
||||
1123E ; Mn # KHOJKI SIGN SUKUN
|
||||
112DF ; Mn # KHUDAWADI SIGN ANUSVARA
|
||||
112E3..112EA ; Mn # [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
|
||||
11300..11301 ; Mn # [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
|
||||
|
@ -2654,6 +2728,9 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
|
|||
11340 ; Mn # GRANTHA VOWEL SIGN II
|
||||
11366..1136C ; Mn # [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
|
||||
11370..11374 ; Mn # [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
|
||||
11438..1143F ; Mn # [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
|
||||
11442..11444 ; Mn # [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
|
||||
11446 ; Mn # NEWA SIGN NUKTA
|
||||
114B3..114B8 ; Mn # [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
|
||||
114BA ; Mn # TIRHUTA VOWEL SIGN SHORT E
|
||||
114BF..114C0 ; Mn # [2] TIRHUTA SIGN CANDRABINDU..TIRHUTA SIGN ANUSVARA
|
||||
|
@ -2672,6 +2749,27 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
|
|||
1171D..1171F ; Mn # [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
|
||||
11722..11725 ; Mn # [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
|
||||
11727..1172B ; Mn # [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
|
||||
11A01..11A06 ; Mn # [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
|
||||
11A09..11A0A ; Mn # [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
|
||||
11A33..11A38 ; Mn # [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
|
||||
11A3B..11A3E ; Mn # [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
|
||||
11A47 ; Mn # ZANABAZAR SQUARE SUBJOINER
|
||||
11A51..11A56 ; Mn # [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
|
||||
11A59..11A5B ; Mn # [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
|
||||
11A8A..11A96 ; Mn # [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
|
||||
11A98..11A99 ; Mn # [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
|
||||
11C30..11C36 ; Mn # [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
|
||||
11C38..11C3D ; Mn # [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
|
||||
11C3F ; Mn # BHAIKSUKI SIGN VIRAMA
|
||||
11C92..11CA7 ; Mn # [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
|
||||
11CAA..11CB0 ; Mn # [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
|
||||
11CB2..11CB3 ; Mn # [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
|
||||
11CB5..11CB6 ; Mn # [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
|
||||
11D31..11D36 ; Mn # [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
|
||||
11D3A ; Mn # MASARAM GONDI VOWEL SIGN E
|
||||
11D3C..11D3D ; Mn # [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
|
||||
11D3F..11D45 ; Mn # [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
|
||||
11D47 ; Mn # MASARAM GONDI RA-KARA
|
||||
16AF0..16AF4 ; Mn # [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
|
||||
16B30..16B36 ; Mn # [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
|
||||
16F8F..16F92 ; Mn # [4] MIAO TONE RIGHT..MIAO TONE BELOW
|
||||
|
@ -2687,10 +2785,16 @@ FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
|
|||
1DA84 ; Mn # SIGNWRITING LOCATION HEAD NECK
|
||||
1DA9B..1DA9F ; Mn # [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
|
||||
1DAA1..1DAAF ; Mn # [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
|
||||
1E000..1E006 ; Mn # [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
|
||||
1E008..1E018 ; Mn # [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
|
||||
1E01B..1E021 ; Mn # [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
|
||||
1E023..1E024 ; Mn # [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
|
||||
1E026..1E02A ; Mn # [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
|
||||
1E8D0..1E8D6 ; Mn # [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
|
||||
1E944..1E94A ; Mn # [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
|
||||
E0100..E01EF ; Mn # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
|
||||
|
||||
# Total code points: 1567
|
||||
# Total code points: 1763
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -2795,6 +2899,7 @@ A670..A672 ; Me # [3] COMBINING CYRILLIC TEN MILLIONS SIGN..COMBINING CYRIL
|
|||
1C34..1C35 ; Mc # [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG
|
||||
1CE1 ; Mc # VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA
|
||||
1CF2..1CF3 ; Mc # [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
|
||||
1CF7 ; Mc # VEDIC SIGN ATIKRAMA
|
||||
302E..302F ; Mc # [2] HANGUL SINGLE DOT TONE MARK..HANGUL DOUBLE DOT TONE MARK
|
||||
A823..A824 ; Mc # [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I
|
||||
A827 ; Mc # SYLOTI NAGRI VOWEL SIGN OO
|
||||
|
@ -2837,6 +2942,9 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
|
|||
1134B..1134D ; Mc # [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
|
||||
11357 ; Mc # GRANTHA AU LENGTH MARK
|
||||
11362..11363 ; Mc # [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
|
||||
11435..11437 ; Mc # [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
|
||||
11440..11441 ; Mc # [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
|
||||
11445 ; Mc # NEWA SIGN VISARGA
|
||||
114B0..114B2 ; Mc # [3] TIRHUTA VOWEL SIGN AA..TIRHUTA VOWEL SIGN II
|
||||
114B9 ; Mc # TIRHUTA VOWEL SIGN E
|
||||
114BB..114BE ; Mc # [4] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN AU
|
||||
|
@ -2852,11 +2960,20 @@ ABEC ; Mc # MEETEI MAYEK LUM IYEK
|
|||
116B6 ; Mc # TAKRI SIGN VIRAMA
|
||||
11720..11721 ; Mc # [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
|
||||
11726 ; Mc # AHOM VOWEL SIGN E
|
||||
11A07..11A08 ; Mc # [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
|
||||
11A39 ; Mc # ZANABAZAR SQUARE SIGN VISARGA
|
||||
11A57..11A58 ; Mc # [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
|
||||
11A97 ; Mc # SOYOMBO SIGN VISARGA
|
||||
11C2F ; Mc # BHAIKSUKI VOWEL SIGN AA
|
||||
11C3E ; Mc # BHAIKSUKI SIGN VISARGA
|
||||
11CA9 ; Mc # MARCHEN SUBJOINED LETTER YA
|
||||
11CB1 ; Mc # MARCHEN VOWEL SIGN I
|
||||
11CB4 ; Mc # MARCHEN VOWEL SIGN O
|
||||
16F51..16F7E ; Mc # [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
|
||||
1D165..1D166 ; Mc # [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
|
||||
1D16D..1D172 ; Mc # [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5
|
||||
|
||||
# Total code points: 383
|
||||
# Total code points: 401
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -2905,16 +3022,20 @@ FF10..FF19 ; Nd # [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
|
|||
11136..1113F ; Nd # [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE
|
||||
111D0..111D9 ; Nd # [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE
|
||||
112F0..112F9 ; Nd # [10] KHUDAWADI DIGIT ZERO..KHUDAWADI DIGIT NINE
|
||||
11450..11459 ; Nd # [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
|
||||
114D0..114D9 ; Nd # [10] TIRHUTA DIGIT ZERO..TIRHUTA DIGIT NINE
|
||||
11650..11659 ; Nd # [10] MODI DIGIT ZERO..MODI DIGIT NINE
|
||||
116C0..116C9 ; Nd # [10] TAKRI DIGIT ZERO..TAKRI DIGIT NINE
|
||||
11730..11739 ; Nd # [10] AHOM DIGIT ZERO..AHOM DIGIT NINE
|
||||
118E0..118E9 ; Nd # [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE
|
||||
11C50..11C59 ; Nd # [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
|
||||
11D50..11D59 ; Nd # [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
|
||||
16A60..16A69 ; Nd # [10] MRO DIGIT ZERO..MRO DIGIT NINE
|
||||
16B50..16B59 ; Nd # [10] PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT NINE
|
||||
1D7CE..1D7FF ; Nd # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
|
||||
1E950..1E959 ; Nd # [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE
|
||||
|
||||
# Total code points: 550
|
||||
# Total code points: 590
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -2946,7 +3067,8 @@ A6E6..A6EF ; Nl # [10] BAMUM LETTER MO..BAMUM LETTER KOGHOM
|
|||
0B72..0B77 ; No # [6] ORIYA FRACTION ONE QUARTER..ORIYA FRACTION THREE SIXTEENTHS
|
||||
0BF0..0BF2 ; No # [3] TAMIL NUMBER TEN..TAMIL NUMBER ONE THOUSAND
|
||||
0C78..0C7E ; No # [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
|
||||
0D70..0D75 ; No # [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS
|
||||
0D58..0D5E ; No # [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH
|
||||
0D70..0D78 ; No # [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
|
||||
0F2A..0F33 ; No # [10] TIBETAN DIGIT HALF ONE..TIBETAN DIGIT HALF ZERO
|
||||
1369..137C ; No # [20] ETHIOPIC DIGIT ONE..ETHIOPIC NUMBER TEN THOUSAND
|
||||
17F0..17F9 ; No # [10] KHMER SYMBOL LEK ATTAK SON..KHMER SYMBOL LEK ATTAK PRAM-BUON
|
||||
|
@ -2993,12 +3115,13 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
|
|||
111E1..111F4 ; No # [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND
|
||||
1173A..1173B ; No # [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY
|
||||
118EA..118F2 ; No # [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY
|
||||
11C5A..11C6C ; No # [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
|
||||
16B5B..16B61 ; No # [7] PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBER TRILLIONS
|
||||
1D360..1D371 ; No # [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE
|
||||
1E8C7..1E8CF ; No # [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE
|
||||
1F100..1F10C ; No # [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
|
||||
|
||||
# Total code points: 647
|
||||
# Total code points: 676
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -3048,6 +3171,7 @@ A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
|
|||
061C ; Cf # ARABIC LETTER MARK
|
||||
06DD ; Cf # ARABIC END OF AYAH
|
||||
070F ; Cf # SYRIAC ABBREVIATION MARK
|
||||
08E2 ; Cf # ARABIC DISPUTED END OF AYAH
|
||||
180E ; Cf # MONGOLIAN VOWEL SEPARATOR
|
||||
200B..200F ; Cf # [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
|
||||
202A..202E ; Cf # [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
|
||||
|
@ -3061,7 +3185,7 @@ FFF9..FFFB ; Cf # [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION
|
|||
E0001 ; Cf # LANGUAGE TAG
|
||||
E0020..E007F ; Cf # [96] TAG SPACE..CANCEL TAG
|
||||
|
||||
# Total code points: 150
|
||||
# Total code points: 151
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -3315,6 +3439,7 @@ FF3F ; Pc # FULLWIDTH LOW LINE
|
|||
085E ; Po # MANDAIC PUNCTUATION
|
||||
0964..0965 ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
|
||||
0970 ; Po # DEVANAGARI ABBREVIATION SIGN
|
||||
09FD ; Po # BENGALI ABBREVIATION SIGN
|
||||
0AF0 ; Po # GUJARATI ABBREVIATION SIGN
|
||||
0DF4 ; Po # SINHALA PUNCTUATION KUNDDALIYA
|
||||
0E4F ; Po # THAI CHARACTER FONGMAN
|
||||
|
@ -3366,6 +3491,7 @@ FF3F ; Pc # FULLWIDTH LOW LINE
|
|||
2E30..2E39 ; Po # [10] RING POINT..TOP HALF SECTION SIGN
|
||||
2E3C..2E3F ; Po # [4] STENOGRAPHIC FULL STOP..CAPITULUM
|
||||
2E41 ; Po # REVERSED COMMA
|
||||
2E43..2E49 ; Po # [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
|
||||
3001..3003 ; Po # [3] IDEOGRAPHIC COMMA..DITTO MARK
|
||||
303D ; Po # PART ALTERNATION MARK
|
||||
30FB ; Po # KATAKANA MIDDLE DOT
|
||||
|
@ -3429,10 +3555,19 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
|
|||
111DD..111DF ; Po # [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2
|
||||
11238..1123D ; Po # [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
|
||||
112A9 ; Po # MULTANI SECTION MARK
|
||||
1144B..1144F ; Po # [5] NEWA DANDA..NEWA ABBREVIATION SIGN
|
||||
1145B ; Po # NEWA PLACEHOLDER MARK
|
||||
1145D ; Po # NEWA INSERTION SIGN
|
||||
114C6 ; Po # TIRHUTA ABBREVIATION SIGN
|
||||
115C1..115D7 ; Po # [23] SIDDHAM SIGN SIDDHAM..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
|
||||
11641..11643 ; Po # [3] MODI DANDA..MODI ABBREVIATION SIGN
|
||||
11660..1166C ; Po # [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
|
||||
1173C..1173E ; Po # [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
|
||||
11A3F..11A46 ; Po # [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
|
||||
11A9A..11A9C ; Po # [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
|
||||
11A9E..11AA2 ; Po # [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
|
||||
11C41..11C45 ; Po # [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
|
||||
11C70..11C71 ; Po # [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
|
||||
12470..12474 ; Po # [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON
|
||||
16A6E..16A6F ; Po # [2] MRO DANDA..MRO DOUBLE DANDA
|
||||
16AF5 ; Po # BASSA VAH FULL STOP
|
||||
|
@ -3440,8 +3575,9 @@ FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
|
|||
16B44 ; Po # PAHAWH HMONG SIGN XAUS
|
||||
1BC9F ; Po # DUPLOYAN PUNCTUATION CHINOOK FULL STOP
|
||||
1DA87..1DA8B ; Po # [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS
|
||||
1E95E..1E95F ; Po # [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK
|
||||
|
||||
# Total code points: 513
|
||||
# Total code points: 566
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -3528,7 +3664,7 @@ FFE9..FFEC ; Sm # [4] HALFWIDTH LEFTWARDS ARROW..HALFWIDTH DOWNWARDS ARROW
|
|||
0BF9 ; Sc # TAMIL RUPEE SIGN
|
||||
0E3F ; Sc # THAI CURRENCY SYMBOL BAHT
|
||||
17DB ; Sc # KHMER CURRENCY SYMBOL RIEL
|
||||
20A0..20BE ; Sc # [31] EURO-CURRENCY SIGN..LARI SIGN
|
||||
20A0..20BF ; Sc # [32] EURO-CURRENCY SIGN..BITCOIN SIGN
|
||||
A838 ; Sc # NORTH INDIC RUPEE MARK
|
||||
FDFC ; Sc # RIAL SIGN
|
||||
FE69 ; Sc # SMALL DOLLAR SIGN
|
||||
|
@ -3536,7 +3672,7 @@ FF04 ; Sc # FULLWIDTH DOLLAR SIGN
|
|||
FFE0..FFE1 ; Sc # [2] FULLWIDTH CENT SIGN..FULLWIDTH POUND SIGN
|
||||
FFE5..FFE6 ; Sc # [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN
|
||||
|
||||
# Total code points: 53
|
||||
# Total code points: 54
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -3594,6 +3730,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
|
|||
0BF3..0BF8 ; So # [6] TAMIL DAY SIGN..TAMIL AS ABOVE SIGN
|
||||
0BFA ; So # TAMIL NUMBER SIGN
|
||||
0C7F ; So # TELUGU SIGN TUUMU
|
||||
0D4F ; So # MALAYALAM SIGN PARA
|
||||
0D79 ; So # MALAYALAM DATE MARK
|
||||
0F01..0F03 ; So # [3] TIBETAN MARK GTER YIG MGO TRUNCATED A..TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA
|
||||
0F13 ; So # TIBETAN MARK CARET -DZUD RTAGS ME LONG CAN
|
||||
|
@ -3642,8 +3779,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
|
|||
232B..237B ; So # [81] ERASE TO THE LEFT..NOT CHECK MARK
|
||||
237D..239A ; So # [30] SHOULDERED OPEN BOX..CLEAR SCREEN SYMBOL
|
||||
23B4..23DB ; So # [40] TOP SQUARE BRACKET..FUSE
|
||||
23E2..23FA ; So # [25] WHITE TRAPEZIUM..BLACK CIRCLE FOR RECORD
|
||||
2400..2426 ; So # [39] SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM TWO
|
||||
23E2..2426 ; So # [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO
|
||||
2440..244A ; So # [11] OCR HOOK..OCR DOUBLE BACKSLASH
|
||||
249C..24E9 ; So # [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
|
||||
2500..25B6 ; So # [183] BOX DRAWINGS LIGHT HORIZONTAL..BLACK RIGHT-POINTING TRIANGLE
|
||||
|
@ -3659,7 +3795,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
|
|||
2B76..2B95 ; So # [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
|
||||
2B98..2BB9 ; So # [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
|
||||
2BBD..2BC8 ; So # [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
|
||||
2BCA..2BD1 ; So # [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN
|
||||
2BCA..2BD2 ; So # [9] TOP HALF BLACK CIRCLE..GROUP MARK
|
||||
2BEC..2BEF ; So # [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
|
||||
2CE5..2CEA ; So # [6] COPTIC SYMBOL MI RO..COPTIC SYMBOL SHIMA SIMA
|
||||
2E80..2E99 ; So # [26] CJK RADICAL REPEAT..CJK RADICAL RAP
|
||||
|
@ -3694,7 +3830,7 @@ FFED..FFEE ; So # [2] HALFWIDTH BLACK SQUARE..HALFWIDTH WHITE CIRCLE
|
|||
FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
|
||||
10137..1013F ; So # [9] AEGEAN WEIGHT BASE UNIT..AEGEAN MEASURE THIRD SUBUNIT
|
||||
10179..10189 ; So # [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
|
||||
1018C ; So # GREEK SINUSOID SIGN
|
||||
1018C..1018E ; So # [3] GREEK SINUSOID SIGN..NOMISMA SIGN
|
||||
10190..1019B ; So # [12] ROMAN SEXTANS SIGN..ROMAN CENTURIAL SIGN
|
||||
101A0 ; So # GREEK SYMBOL TAU RHO
|
||||
101D0..101FC ; So # [45] PHAISTOS DISC SIGN PEDESTRIAN..PHAISTOS DISC SIGN WAVY BAND
|
||||
|
@ -3727,17 +3863,16 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
|
|||
1F0D1..1F0F5 ; So # [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21
|
||||
1F110..1F12E ; So # [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
|
||||
1F130..1F16B ; So # [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
|
||||
1F170..1F19A ; So # [43] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VS
|
||||
1F170..1F1AC ; So # [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
|
||||
1F1E6..1F202 ; So # [29] REGIONAL INDICATOR SYMBOL LETTER A..SQUARED KATAKANA SA
|
||||
1F210..1F23A ; So # [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6
|
||||
1F210..1F23B ; So # [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
|
||||
1F240..1F248 ; So # [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
|
||||
1F250..1F251 ; So # [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
|
||||
1F260..1F265 ; So # [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
|
||||
1F300..1F3FA ; So # [251] CYCLONE..AMPHORA
|
||||
1F400..1F579 ; So # [378] RAT..JOYSTICK
|
||||
1F57B..1F5A3 ; So # [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
|
||||
1F5A5..1F6D0 ; So # [300] DESKTOP COMPUTER..PLACE OF WORSHIP
|
||||
1F400..1F6D4 ; So # [725] RAT..PAGODA
|
||||
1F6E0..1F6EC ; So # [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
|
||||
1F6F0..1F6F3 ; So # [4] SATELLITE..PASSENGER SHIP
|
||||
1F6F0..1F6F8 ; So # [9] SATELLITE..FLYING SAUCER
|
||||
1F700..1F773 ; So # [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
|
||||
1F780..1F7D4 ; So # [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
|
||||
1F800..1F80B ; So # [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
|
||||
|
@ -3745,11 +3880,15 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
|
|||
1F850..1F859 ; So # [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
|
||||
1F860..1F887 ; So # [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
|
||||
1F890..1F8AD ; So # [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
|
||||
1F910..1F918 ; So # [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS
|
||||
1F980..1F984 ; So # [5] CRAB..UNICORN FACE
|
||||
1F900..1F90B ; So # [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
|
||||
1F910..1F93E ; So # [47] ZIPPER-MOUTH FACE..HANDBALL
|
||||
1F940..1F94C ; So # [13] WILTED FLOWER..CURLING STONE
|
||||
1F950..1F96B ; So # [28] CROISSANT..CANNED FOOD
|
||||
1F980..1F997 ; So # [24] CRAB..CRICKET
|
||||
1F9C0 ; So # CHEESE WEDGE
|
||||
1F9D0..1F9E6 ; So # [23] FACE WITH MONOCLE..SOCKS
|
||||
|
||||
# Total code points: 5677
|
||||
# Total code points: 5855
|
||||
|
||||
# ================================================
|
||||
|
||||
|
|
|
@ -1,10 +1,11 @@
|
|||
# GraphemeBreakProperty-8.0.0.txt
|
||||
# Date: 2015-02-13, 13:47:14 GMT [MD]
|
||||
# GraphemeBreakProperty-10.0.0.txt
|
||||
# Date: 2017-03-12, 07:03:41 GMT
|
||||
# © 2017 Unicode®, Inc.
|
||||
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
|
||||
# For terms of use, see http://www.unicode.org/terms_of_use.html
|
||||
#
|
||||
# Unicode Character Database
|
||||
# Copyright (c) 1991-2015 Unicode, Inc.
|
||||
# For terms of use, see http://www.unicode.org/terms_of_use.html
|
||||
# For documentation, see http://www.unicode.org/reports/tr44/
|
||||
# For documentation, see http://www.unicode.org/reports/tr44/
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -17,6 +18,21 @@
|
|||
|
||||
# ================================================
|
||||
|
||||
0600..0605 ; Prepend # Cf [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
|
||||
06DD ; Prepend # Cf ARABIC END OF AYAH
|
||||
070F ; Prepend # Cf SYRIAC ABBREVIATION MARK
|
||||
08E2 ; Prepend # Cf ARABIC DISPUTED END OF AYAH
|
||||
0D4E ; Prepend # Lo MALAYALAM LETTER DOT REPH
|
||||
110BD ; Prepend # Cf KAITHI NUMBER SIGN
|
||||
111C2..111C3 ; Prepend # Lo [2] SHARADA SIGN JIHVAMULIYA..SHARADA SIGN UPADHMANIYA
|
||||
11A3A ; Prepend # Lo ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
|
||||
11A86..11A89 ; Prepend # Lo [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
|
||||
11D46 ; Prepend # Lo MASARAM GONDI REPHA
|
||||
|
||||
# Total code points: 19
|
||||
|
||||
# ================================================
|
||||
|
||||
000D ; CR # Cc <control-000D>
|
||||
|
||||
# Total code points: 1
|
||||
|
@ -34,10 +50,7 @@
|
|||
000E..001F ; Control # Cc [18] <control-000E>..<control-001F>
|
||||
007F..009F ; Control # Cc [33] <control-007F>..<control-009F>
|
||||
00AD ; Control # Cf SOFT HYPHEN
|
||||
0600..0605 ; Control # Cf [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
|
||||
061C ; Control # Cf ARABIC LETTER MARK
|
||||
06DD ; Control # Cf ARABIC END OF AYAH
|
||||
070F ; Control # Cf SYRIAC ABBREVIATION MARK
|
||||
180E ; Control # Cf MONGOLIAN VOWEL SEPARATOR
|
||||
200B ; Control # Cf ZERO WIDTH SPACE
|
||||
200E..200F ; Control # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
|
||||
|
@ -51,17 +64,15 @@ D800..DFFF ; Control # Cs [2048] <surrogate-D800>..<surrogate-DFFF>
|
|||
FEFF ; Control # Cf ZERO WIDTH NO-BREAK SPACE
|
||||
FFF0..FFF8 ; Control # Cn [9] <reserved-FFF0>..<reserved-FFF8>
|
||||
FFF9..FFFB ; Control # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
|
||||
110BD ; Control # Cf KAITHI NUMBER SIGN
|
||||
1BCA0..1BCA3 ; Control # Cf [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
|
||||
1D173..1D17A ; Control # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
|
||||
E0000 ; Control # Cn <reserved-E0000>
|
||||
E0001 ; Control # Cf LANGUAGE TAG
|
||||
E0002..E001F ; Control # Cn [30] <reserved-E0002>..<reserved-E001F>
|
||||
E0020..E007F ; Control # Cf [96] TAG SPACE..CANCEL TAG
|
||||
E0080..E00FF ; Control # Cn [128] <reserved-E0080>..<reserved-E00FF>
|
||||
E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
|
||||
|
||||
# Total code points: 6030
|
||||
# Total code points: 5925
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -89,6 +100,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
|
|||
0825..0827 ; Extend # Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
|
||||
0829..082D ; Extend # Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
|
||||
0859..085B ; Extend # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
|
||||
08D4..08E1 ; Extend # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
|
||||
08E3..0902 ; Extend # Mn [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
|
||||
093A ; Extend # Mn DEVANAGARI VOWEL SIGN OE
|
||||
093C ; Extend # Mn DEVANAGARI SIGN NUKTA
|
||||
|
@ -117,6 +129,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
|
|||
0AC7..0AC8 ; Extend # Mn [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
|
||||
0ACD ; Extend # Mn GUJARATI SIGN VIRAMA
|
||||
0AE2..0AE3 ; Extend # Mn [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
|
||||
0AFA..0AFF ; Extend # Mn [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
|
||||
0B01 ; Extend # Mn ORIYA SIGN CANDRABINDU
|
||||
0B3C ; Extend # Mn ORIYA SIGN NUKTA
|
||||
0B3E ; Extend # Mc ORIYA VOWEL SIGN AA
|
||||
|
@ -145,7 +158,8 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
|
|||
0CCC..0CCD ; Extend # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
|
||||
0CD5..0CD6 ; Extend # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
|
||||
0CE2..0CE3 ; Extend # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
|
||||
0D01 ; Extend # Mn MALAYALAM SIGN CANDRABINDU
|
||||
0D00..0D01 ; Extend # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
|
||||
0D3B..0D3C ; Extend # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
|
||||
0D3E ; Extend # Mc MALAYALAM VOWEL SIGN AA
|
||||
0D41..0D44 ; Extend # Mn [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
|
||||
0D4D ; Extend # Mn MALAYALAM SIGN VIRAMA
|
||||
|
@ -195,6 +209,7 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
|
|||
17C9..17D3 ; Extend # Mn [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
|
||||
17DD ; Extend # Mn KHMER SIGN ATTHACAN
|
||||
180B..180D ; Extend # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
|
||||
1885..1886 ; Extend # Mn [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
|
||||
18A9 ; Extend # Mn MONGOLIAN LETTER ALI GALI DAGALGA
|
||||
1920..1922 ; Extend # Mn [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
|
||||
1927..1928 ; Extend # Mn [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
|
||||
|
@ -233,9 +248,9 @@ E01F0..E0FFF ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
|
|||
1CED ; Extend # Mn VEDIC SIGN TIRYAK
|
||||
1CF4 ; Extend # Mn VEDIC TONE CANDRA ABOVE
|
||||
1CF8..1CF9 ; Extend # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
|
||||
1DC0..1DF5 ; Extend # Mn [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
|
||||
1DFC..1DFF ; Extend # Mn [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
|
||||
200C..200D ; Extend # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
|
||||
1DC0..1DF9 ; Extend # Mn [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
|
||||
1DFB..1DFF ; Extend # Mn [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
|
||||
200C ; Extend # Cf ZERO WIDTH NON-JOINER
|
||||
20D0..20DC ; Extend # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
|
||||
20DD..20E0 ; Extend # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
|
||||
20E1 ; Extend # Mn COMBINING LEFT RIGHT ARROW ABOVE
|
||||
|
@ -256,7 +271,7 @@ A802 ; Extend # Mn SYLOTI NAGRI SIGN DVISVARA
|
|||
A806 ; Extend # Mn SYLOTI NAGRI SIGN HASANTA
|
||||
A80B ; Extend # Mn SYLOTI NAGRI SIGN ANUSVARA
|
||||
A825..A826 ; Extend # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
|
||||
A8C4 ; Extend # Mn SAURASHTRA SIGN VIRAMA
|
||||
A8C4..A8C5 ; Extend # Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
|
||||
A8E0..A8F1 ; Extend # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
|
||||
A926..A92D ; Extend # Mn [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
|
||||
A947..A951 ; Extend # Mn [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
|
||||
|
@ -309,6 +324,7 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
|
|||
1122F..11231 ; Extend # Mn [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
|
||||
11234 ; Extend # Mn KHOJKI SIGN ANUSVARA
|
||||
11236..11237 ; Extend # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
|
||||
1123E ; Extend # Mn KHOJKI SIGN SUKUN
|
||||
112DF ; Extend # Mn KHUDAWADI SIGN ANUSVARA
|
||||
112E3..112EA ; Extend # Mn [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
|
||||
11300..11301 ; Extend # Mn [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
|
||||
|
@ -318,6 +334,9 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
|
|||
11357 ; Extend # Mc GRANTHA AU LENGTH MARK
|
||||
11366..1136C ; Extend # Mn [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
|
||||
11370..11374 ; Extend # Mn [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
|
||||
11438..1143F ; Extend # Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
|
||||
11442..11444 ; Extend # Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
|
||||
11446 ; Extend # Mn NEWA SIGN NUKTA
|
||||
114B0 ; Extend # Mc TIRHUTA VOWEL SIGN AA
|
||||
114B3..114B8 ; Extend # Mn [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
|
||||
114BA ; Extend # Mn TIRHUTA VOWEL SIGN SHORT E
|
||||
|
@ -339,6 +358,27 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
|
|||
1171D..1171F ; Extend # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
|
||||
11722..11725 ; Extend # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
|
||||
11727..1172B ; Extend # Mn [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
|
||||
11A01..11A06 ; Extend # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
|
||||
11A09..11A0A ; Extend # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
|
||||
11A33..11A38 ; Extend # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
|
||||
11A3B..11A3E ; Extend # Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
|
||||
11A47 ; Extend # Mn ZANABAZAR SQUARE SUBJOINER
|
||||
11A51..11A56 ; Extend # Mn [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
|
||||
11A59..11A5B ; Extend # Mn [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
|
||||
11A8A..11A96 ; Extend # Mn [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
|
||||
11A98..11A99 ; Extend # Mn [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
|
||||
11C30..11C36 ; Extend # Mn [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
|
||||
11C38..11C3D ; Extend # Mn [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
|
||||
11C3F ; Extend # Mn BHAIKSUKI SIGN VIRAMA
|
||||
11C92..11CA7 ; Extend # Mn [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
|
||||
11CAA..11CB0 ; Extend # Mn [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
|
||||
11CB2..11CB3 ; Extend # Mn [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
|
||||
11CB5..11CB6 ; Extend # Mn [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
|
||||
11D31..11D36 ; Extend # Mn [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
|
||||
11D3A ; Extend # Mn MASARAM GONDI VOWEL SIGN E
|
||||
11D3C..11D3D ; Extend # Mn [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
|
||||
11D3F..11D45 ; Extend # Mn [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
|
||||
11D47 ; Extend # Mn MASARAM GONDI RA-KARA
|
||||
16AF0..16AF4 ; Extend # Mn [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
|
||||
16B30..16B36 ; Extend # Mn [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
|
||||
16F8F..16F92 ; Extend # Mn [4] MIAO TONE RIGHT..MIAO TONE BELOW
|
||||
|
@ -356,10 +396,17 @@ FF9E..FF9F ; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
|
|||
1DA84 ; Extend # Mn SIGNWRITING LOCATION HEAD NECK
|
||||
1DA9B..1DA9F ; Extend # Mn [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
|
||||
1DAA1..1DAAF ; Extend # Mn [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
|
||||
1E000..1E006 ; Extend # Mn [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
|
||||
1E008..1E018 ; Extend # Mn [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
|
||||
1E01B..1E021 ; Extend # Mn [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
|
||||
1E023..1E024 ; Extend # Mn [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
|
||||
1E026..1E02A ; Extend # Mn [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
|
||||
1E8D0..1E8D6 ; Extend # Mn [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
|
||||
1E944..1E94A ; Extend # Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
|
||||
E0020..E007F ; Extend # Cf [96] TAG SPACE..CANCEL TAG
|
||||
E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
|
||||
|
||||
# Total code points: 1610
|
||||
# Total code points: 1901
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -444,6 +491,7 @@ E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
|
|||
1C34..1C35 ; SpacingMark # Mc [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG
|
||||
1CE1 ; SpacingMark # Mc VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA
|
||||
1CF2..1CF3 ; SpacingMark # Mc [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
|
||||
1CF7 ; SpacingMark # Mc VEDIC SIGN ATIKRAMA
|
||||
A823..A824 ; SpacingMark # Mc [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I
|
||||
A827 ; SpacingMark # Mc SYLOTI NAGRI VOWEL SIGN OO
|
||||
A880..A881 ; SpacingMark # Mc [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA
|
||||
|
@ -482,6 +530,9 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
|
|||
11347..11348 ; SpacingMark # Mc [2] GRANTHA VOWEL SIGN EE..GRANTHA VOWEL SIGN AI
|
||||
1134B..1134D ; SpacingMark # Mc [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
|
||||
11362..11363 ; SpacingMark # Mc [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
|
||||
11435..11437 ; SpacingMark # Mc [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
|
||||
11440..11441 ; SpacingMark # Mc [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
|
||||
11445 ; SpacingMark # Mc NEWA SIGN VISARGA
|
||||
114B1..114B2 ; SpacingMark # Mc [2] TIRHUTA VOWEL SIGN I..TIRHUTA VOWEL SIGN II
|
||||
114B9 ; SpacingMark # Mc TIRHUTA VOWEL SIGN E
|
||||
114BB..114BC ; SpacingMark # Mc [2] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN O
|
||||
|
@ -498,11 +549,20 @@ ABEC ; SpacingMark # Mc MEETEI MAYEK LUM IYEK
|
|||
116B6 ; SpacingMark # Mc TAKRI SIGN VIRAMA
|
||||
11720..11721 ; SpacingMark # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
|
||||
11726 ; SpacingMark # Mc AHOM VOWEL SIGN E
|
||||
11A07..11A08 ; SpacingMark # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
|
||||
11A39 ; SpacingMark # Mc ZANABAZAR SQUARE SIGN VISARGA
|
||||
11A57..11A58 ; SpacingMark # Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
|
||||
11A97 ; SpacingMark # Mc SOYOMBO SIGN VISARGA
|
||||
11C2F ; SpacingMark # Mc BHAIKSUKI VOWEL SIGN AA
|
||||
11C3E ; SpacingMark # Mc BHAIKSUKI SIGN VISARGA
|
||||
11CA9 ; SpacingMark # Mc MARCHEN SUBJOINED LETTER YA
|
||||
11CB1 ; SpacingMark # Mc MARCHEN VOWEL SIGN I
|
||||
11CB4 ; SpacingMark # Mc MARCHEN VOWEL SIGN O
|
||||
16F51..16F7E ; SpacingMark # Mc [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
|
||||
1D166 ; SpacingMark # Mc MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
|
||||
1D16D ; SpacingMark # Mc MUSICAL SYMBOL COMBINING AUGMENTATION DOT
|
||||
|
||||
# Total code points: 330
|
||||
# Total code points: 348
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1333,4 +1393,83 @@ D789..D7A3 ; LVT # Lo [27] HANGUL SYLLABLE HIG..HANGUL SYLLABLE HIH
|
|||
|
||||
# Total code points: 10773
|
||||
|
||||
# ================================================
|
||||
|
||||
261D ; E_Base # So WHITE UP POINTING INDEX
|
||||
26F9 ; E_Base # So PERSON WITH BALL
|
||||
270A..270D ; E_Base # So [4] RAISED FIST..WRITING HAND
|
||||
1F385 ; E_Base # So FATHER CHRISTMAS
|
||||
1F3C2..1F3C4 ; E_Base # So [3] SNOWBOARDER..SURFER
|
||||
1F3C7 ; E_Base # So HORSE RACING
|
||||
1F3CA..1F3CC ; E_Base # So [3] SWIMMER..GOLFER
|
||||
1F442..1F443 ; E_Base # So [2] EAR..NOSE
|
||||
1F446..1F450 ; E_Base # So [11] WHITE UP POINTING BACKHAND INDEX..OPEN HANDS SIGN
|
||||
1F46E ; E_Base # So POLICE OFFICER
|
||||
1F470..1F478 ; E_Base # So [9] BRIDE WITH VEIL..PRINCESS
|
||||
1F47C ; E_Base # So BABY ANGEL
|
||||
1F481..1F483 ; E_Base # So [3] INFORMATION DESK PERSON..DANCER
|
||||
1F485..1F487 ; E_Base # So [3] NAIL POLISH..HAIRCUT
|
||||
1F4AA ; E_Base # So FLEXED BICEPS
|
||||
1F574..1F575 ; E_Base # So [2] MAN IN BUSINESS SUIT LEVITATING..SLEUTH OR SPY
|
||||
1F57A ; E_Base # So MAN DANCING
|
||||
1F590 ; E_Base # So RAISED HAND WITH FINGERS SPLAYED
|
||||
1F595..1F596 ; E_Base # So [2] REVERSED HAND WITH MIDDLE FINGER EXTENDED..RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS
|
||||
1F645..1F647 ; E_Base # So [3] FACE WITH NO GOOD GESTURE..PERSON BOWING DEEPLY
|
||||
1F64B..1F64F ; E_Base # So [5] HAPPY PERSON RAISING ONE HAND..PERSON WITH FOLDED HANDS
|
||||
1F6A3 ; E_Base # So ROWBOAT
|
||||
1F6B4..1F6B6 ; E_Base # So [3] BICYCLIST..PEDESTRIAN
|
||||
1F6C0 ; E_Base # So BATH
|
||||
1F6CC ; E_Base # So SLEEPING ACCOMMODATION
|
||||
1F918..1F91C ; E_Base # So [5] SIGN OF THE HORNS..RIGHT-FACING FIST
|
||||
1F91E..1F91F ; E_Base # So [2] HAND WITH INDEX AND MIDDLE FINGERS CROSSED..I LOVE YOU HAND SIGN
|
||||
1F926 ; E_Base # So FACE PALM
|
||||
1F930..1F939 ; E_Base # So [10] PREGNANT WOMAN..JUGGLING
|
||||
1F93D..1F93E ; E_Base # So [2] WATER POLO..HANDBALL
|
||||
1F9D1..1F9DD ; E_Base # So [13] ADULT..ELF
|
||||
|
||||
# Total code points: 98
|
||||
|
||||
# ================================================
|
||||
|
||||
1F3FB..1F3FF ; E_Modifier # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
|
||||
|
||||
# Total code points: 5
|
||||
|
||||
# ================================================
|
||||
|
||||
200D ; ZWJ # Cf ZERO WIDTH JOINER
|
||||
|
||||
# Total code points: 1
|
||||
|
||||
# ================================================
|
||||
|
||||
2640 ; Glue_After_Zwj # So FEMALE SIGN
|
||||
2642 ; Glue_After_Zwj # So MALE SIGN
|
||||
2695..2696 ; Glue_After_Zwj # So [2] STAFF OF AESCULAPIUS..SCALES
|
||||
2708 ; Glue_After_Zwj # So AIRPLANE
|
||||
2764 ; Glue_After_Zwj # So HEAVY BLACK HEART
|
||||
1F308 ; Glue_After_Zwj # So RAINBOW
|
||||
1F33E ; Glue_After_Zwj # So EAR OF RICE
|
||||
1F373 ; Glue_After_Zwj # So COOKING
|
||||
1F393 ; Glue_After_Zwj # So GRADUATION CAP
|
||||
1F3A4 ; Glue_After_Zwj # So MICROPHONE
|
||||
1F3A8 ; Glue_After_Zwj # So ARTIST PALETTE
|
||||
1F3EB ; Glue_After_Zwj # So SCHOOL
|
||||
1F3ED ; Glue_After_Zwj # So FACTORY
|
||||
1F48B ; Glue_After_Zwj # So KISS MARK
|
||||
1F4BB..1F4BC ; Glue_After_Zwj # So [2] PERSONAL COMPUTER..BRIEFCASE
|
||||
1F527 ; Glue_After_Zwj # So WRENCH
|
||||
1F52C ; Glue_After_Zwj # So MICROSCOPE
|
||||
1F5E8 ; Glue_After_Zwj # So LEFT SPEECH BUBBLE
|
||||
1F680 ; Glue_After_Zwj # So ROCKET
|
||||
1F692 ; Glue_After_Zwj # So FIRE ENGINE
|
||||
|
||||
# Total code points: 22
|
||||
|
||||
# ================================================
|
||||
|
||||
1F466..1F469 ; E_Base_GAZ # So [4] BOY..WOMAN
|
||||
|
||||
# Total code points: 4
|
||||
|
||||
# EOF
|
||||
|
|
|
@ -1,10 +1,11 @@
|
|||
# Scripts-8.0.0.txt
|
||||
# Date: 2015-03-11, 22:29:42 GMT [MD]
|
||||
# Scripts-10.0.0.txt
|
||||
# Date: 2017-03-11, 06:40:37 GMT
|
||||
# © 2017 Unicode®, Inc.
|
||||
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
|
||||
# For terms of use, see http://www.unicode.org/terms_of_use.html
|
||||
#
|
||||
# Unicode Character Database
|
||||
# Copyright (c) 1991-2015 Unicode, Inc.
|
||||
# For terms of use, see http://www.unicode.org/terms_of_use.html
|
||||
# For documentation, see http://www.unicode.org/reports/tr44/
|
||||
# For documentation, see http://www.unicode.org/reports/tr44/
|
||||
# For more information, see:
|
||||
# UAX #24, Unicode Script Property: http://www.unicode.org/reports/tr24/
|
||||
# Especially the sections:
|
||||
|
@ -92,10 +93,10 @@
|
|||
0605 ; Common # Cf ARABIC NUMBER MARK ABOVE
|
||||
060C ; Common # Po ARABIC COMMA
|
||||
061B ; Common # Po ARABIC SEMICOLON
|
||||
061C ; Common # Cf ARABIC LETTER MARK
|
||||
061F ; Common # Po ARABIC QUESTION MARK
|
||||
0640 ; Common # Lm ARABIC TATWEEL
|
||||
06DD ; Common # Cf ARABIC END OF AYAH
|
||||
08E2 ; Common # Cf ARABIC DISPUTED END OF AYAH
|
||||
0964..0965 ; Common # Po [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
|
||||
0E3F ; Common # Sc THAI CURRENCY SYMBOL BAHT
|
||||
0FD5..0FD8 ; Common # So [4] RIGHT-FACING SVASTI SIGN..LEFT-FACING SVASTI SIGN WITH DOTS
|
||||
|
@ -110,6 +111,7 @@
|
|||
1CEE..1CF1 ; Common # Lo [4] VEDIC SIGN HEXIFORM LONG ANUSVARA..VEDIC SIGN ANUSVARA UBHAYATO MUKHA
|
||||
1CF2..1CF3 ; Common # Mc [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
|
||||
1CF5..1CF6 ; Common # Lo [2] VEDIC SIGN JIHVAMULIYA..VEDIC SIGN UPADHMANIYA
|
||||
1CF7 ; Common # Mc VEDIC SIGN ATIKRAMA
|
||||
2000..200A ; Common # Zs [11] EN QUAD..HAIR SPACE
|
||||
200B ; Common # Cf ZERO WIDTH SPACE
|
||||
200E..200F ; Common # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
|
||||
|
@ -153,7 +155,7 @@
|
|||
208A..208C ; Common # Sm [3] SUBSCRIPT PLUS SIGN..SUBSCRIPT EQUALS SIGN
|
||||
208D ; Common # Ps SUBSCRIPT LEFT PARENTHESIS
|
||||
208E ; Common # Pe SUBSCRIPT RIGHT PARENTHESIS
|
||||
20A0..20BE ; Common # Sc [31] EURO-CURRENCY SIGN..LARI SIGN
|
||||
20A0..20BF ; Common # Sc [32] EURO-CURRENCY SIGN..BITCOIN SIGN
|
||||
2100..2101 ; Common # So [2] ACCOUNT OF..ADDRESSED TO THE SUBJECT
|
||||
2102 ; Common # L& DOUBLE-STRUCK CAPITAL C
|
||||
2103..2106 ; Common # So [4] DEGREE CELSIUS..CADA UNA
|
||||
|
@ -223,8 +225,7 @@
|
|||
239B..23B3 ; Common # Sm [25] LEFT PARENTHESIS UPPER HOOK..SUMMATION BOTTOM
|
||||
23B4..23DB ; Common # So [40] TOP SQUARE BRACKET..FUSE
|
||||
23DC..23E1 ; Common # Sm [6] TOP PARENTHESIS..BOTTOM TORTOISE SHELL BRACKET
|
||||
23E2..23FA ; Common # So [25] WHITE TRAPEZIUM..BLACK CIRCLE FOR RECORD
|
||||
2400..2426 ; Common # So [39] SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM TWO
|
||||
23E2..2426 ; Common # So [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO
|
||||
2440..244A ; Common # So [11] OCR HOOK..OCR DOUBLE BACKSLASH
|
||||
2460..249B ; Common # No [60] CIRCLED DIGIT ONE..NUMBER TWENTY FULL STOP
|
||||
249C..24E9 ; Common # So [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
|
||||
|
@ -309,7 +310,7 @@
|
|||
2B76..2B95 ; Common # So [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
|
||||
2B98..2BB9 ; Common # So [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
|
||||
2BBD..2BC8 ; Common # So [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
|
||||
2BCA..2BD1 ; Common # So [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN
|
||||
2BCA..2BD2 ; Common # So [9] TOP HALF BLACK CIRCLE..GROUP MARK
|
||||
2BEC..2BEF ; Common # So [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
|
||||
2E00..2E01 ; Common # Po [2] RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER
|
||||
2E02 ; Common # Pi LEFT SUBSTITUTION BRACKET
|
||||
|
@ -348,6 +349,7 @@
|
|||
2E40 ; Common # Pd DOUBLE HYPHEN
|
||||
2E41 ; Common # Po REVERSED COMMA
|
||||
2E42 ; Common # Ps DOUBLE LOW-REVERSED-9 QUOTATION MARK
|
||||
2E43..2E49 ; Common # Po [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
|
||||
2FF0..2FFB ; Common # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
|
||||
3000 ; Common # Zs IDEOGRAPHIC SPACE
|
||||
3001..3003 ; Common # Po [3] IDEOGRAPHIC COMMA..DITTO MARK
|
||||
|
@ -572,19 +574,18 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
|
|||
1F100..1F10C ; Common # No [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
|
||||
1F110..1F12E ; Common # So [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
|
||||
1F130..1F16B ; Common # So [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
|
||||
1F170..1F19A ; Common # So [43] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VS
|
||||
1F170..1F1AC ; Common # So [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
|
||||
1F1E6..1F1FF ; Common # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
|
||||
1F201..1F202 ; Common # So [2] SQUARED KATAKANA KOKO..SQUARED KATAKANA SA
|
||||
1F210..1F23A ; Common # So [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6
|
||||
1F210..1F23B ; Common # So [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
|
||||
1F240..1F248 ; Common # So [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
|
||||
1F250..1F251 ; Common # So [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
|
||||
1F260..1F265 ; Common # So [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
|
||||
1F300..1F3FA ; Common # So [251] CYCLONE..AMPHORA
|
||||
1F3FB..1F3FF ; Common # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
|
||||
1F400..1F579 ; Common # So [378] RAT..JOYSTICK
|
||||
1F57B..1F5A3 ; Common # So [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
|
||||
1F5A5..1F6D0 ; Common # So [300] DESKTOP COMPUTER..PLACE OF WORSHIP
|
||||
1F400..1F6D4 ; Common # So [725] RAT..PAGODA
|
||||
1F6E0..1F6EC ; Common # So [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
|
||||
1F6F0..1F6F3 ; Common # So [4] SATELLITE..PASSENGER SHIP
|
||||
1F6F0..1F6F8 ; Common # So [9] SATELLITE..FLYING SAUCER
|
||||
1F700..1F773 ; Common # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
|
||||
1F780..1F7D4 ; Common # So [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
|
||||
1F800..1F80B ; Common # So [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
|
||||
|
@ -592,13 +593,17 @@ FFFC..FFFD ; Common # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
|
|||
1F850..1F859 ; Common # So [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
|
||||
1F860..1F887 ; Common # So [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
|
||||
1F890..1F8AD ; Common # So [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
|
||||
1F910..1F918 ; Common # So [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS
|
||||
1F980..1F984 ; Common # So [5] CRAB..UNICORN FACE
|
||||
1F900..1F90B ; Common # So [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
|
||||
1F910..1F93E ; Common # So [47] ZIPPER-MOUTH FACE..HANDBALL
|
||||
1F940..1F94C ; Common # So [13] WILTED FLOWER..CURLING STONE
|
||||
1F950..1F96B ; Common # So [28] CROISSANT..CANNED FOOD
|
||||
1F980..1F997 ; Common # So [24] CRAB..CRICKET
|
||||
1F9C0 ; Common # So CHEESE WEDGE
|
||||
1F9D0..1F9E6 ; Common # So [23] FACE WITH MONOCLE..SOCKS
|
||||
E0001 ; Common # Cf LANGUAGE TAG
|
||||
E0020..E007F ; Common # Cf [96] TAG SPACE..CANCEL TAG
|
||||
|
||||
# Total code points: 7179
|
||||
# Total code points: 7363
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -641,7 +646,7 @@ A770 ; Latin # Lm MODIFIER LETTER US
|
|||
A771..A787 ; Latin # L& [23] LATIN SMALL LETTER DUM..LATIN SMALL LETTER INSULAR T
|
||||
A78B..A78E ; Latin # L& [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT
|
||||
A78F ; Latin # Lo LATIN LETTER SINOLOGICAL DOT
|
||||
A790..A7AD ; Latin # L& [30] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER L WITH BELT
|
||||
A790..A7AE ; Latin # L& [31] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER SMALL CAPITAL I
|
||||
A7B0..A7B7 ; Latin # L& [8] LATIN CAPITAL LETTER TURNED K..LATIN SMALL LETTER OMEGA
|
||||
A7F7 ; Latin # Lo LATIN EPIGRAPHIC LETTER SIDEWAYS I
|
||||
A7F8..A7F9 ; Latin # Lm [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE
|
||||
|
@ -654,7 +659,7 @@ FB00..FB06 ; Latin # L& [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE S
|
|||
FF21..FF3A ; Latin # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
|
||||
FF41..FF5A ; Latin # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
|
||||
|
||||
# Total code points: 1349
|
||||
# Total code points: 1350
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -708,13 +713,13 @@ AB65 ; Greek # L& GREEK LETTER SMALL CAPITAL OMEGA
|
|||
10175..10178 ; Greek # No [4] GREEK ONE HALF SIGN..GREEK THREE QUARTERS SIGN
|
||||
10179..10189 ; Greek # So [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
|
||||
1018A..1018B ; Greek # No [2] GREEK ZERO SIGN..GREEK ONE QUARTER SIGN
|
||||
1018C ; Greek # So GREEK SINUSOID SIGN
|
||||
1018C..1018E ; Greek # So [3] GREEK SINUSOID SIGN..NOMISMA SIGN
|
||||
101A0 ; Greek # So GREEK SYMBOL TAU RHO
|
||||
1D200..1D241 ; Greek # So [66] GREEK VOCAL NOTATION SYMBOL-1..GREEK INSTRUMENTAL NOTATION SYMBOL-54
|
||||
1D242..1D244 ; Greek # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME
|
||||
1D245 ; Greek # So GREEK MUSICAL LEIMMA
|
||||
|
||||
# Total code points: 516
|
||||
# Total code points: 518
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -724,6 +729,7 @@ AB65 ; Greek # L& GREEK LETTER SMALL CAPITAL OMEGA
|
|||
0487 ; Cyrillic # Mn COMBINING CYRILLIC POKRYTIE
|
||||
0488..0489 ; Cyrillic # Me [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN
|
||||
048A..052F ; Cyrillic # L& [166] CYRILLIC CAPITAL LETTER SHORT I WITH TAIL..CYRILLIC SMALL LETTER EL WITH DESCENDER
|
||||
1C80..1C88 ; Cyrillic # L& [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
|
||||
1D2B ; Cyrillic # L& CYRILLIC LETTER SMALL CAPITAL EL
|
||||
1D78 ; Cyrillic # Lm MODIFIER LETTER CYRILLIC EN
|
||||
2DE0..2DFF ; Cyrillic # Mn [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS
|
||||
|
@ -740,7 +746,7 @@ A69C..A69D ; Cyrillic # Lm [2] MODIFIER LETTER CYRILLIC HARD SIGN..MODIFIER
|
|||
A69E..A69F ; Cyrillic # Mn [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E
|
||||
FE2E..FE2F ; Cyrillic # Mn [2] COMBINING CYRILLIC TITLO LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF
|
||||
|
||||
# Total code points: 434
|
||||
# Total code points: 443
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -791,6 +797,7 @@ FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
|
|||
060D ; Arabic # Po ARABIC DATE SEPARATOR
|
||||
060E..060F ; Arabic # So [2] ARABIC POETIC VERSE SIGN..ARABIC SIGN MISRA
|
||||
0610..061A ; Arabic # Mn [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA
|
||||
061C ; Arabic # Cf ARABIC LETTER MARK
|
||||
061E ; Arabic # Po ARABIC TRIPLE DOT PUNCTUATION MARK
|
||||
0620..063F ; Arabic # Lo [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE
|
||||
0641..064A ; Arabic # Lo [10] ARABIC LETTER FEH..ARABIC LETTER YEH
|
||||
|
@ -815,6 +822,8 @@ FB46..FB4F ; Hebrew # Lo [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
|
|||
06FF ; Arabic # Lo ARABIC LETTER HEH WITH INVERTED V
|
||||
0750..077F ; Arabic # Lo [48] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS ABOVE
|
||||
08A0..08B4 ; Arabic # Lo [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
|
||||
08B6..08BD ; Arabic # Lo [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
|
||||
08D4..08E1 ; Arabic # Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
|
||||
08E3..08FF ; Arabic # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
|
||||
FB50..FBB1 ; Arabic # Lo [98] ARABIC LETTER ALEF WASLA ISOLATED FORM..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
|
||||
FBB2..FBC1 ; Arabic # Sk [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW
|
||||
|
@ -862,7 +871,7 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
|
|||
1EEAB..1EEBB ; Arabic # Lo [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN
|
||||
1EEF0..1EEF1 ; Arabic # Sm [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL
|
||||
|
||||
# Total code points: 1257
|
||||
# Total code points: 1280
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -873,8 +882,9 @@ FE76..FEFC ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
|
|||
0712..072F ; Syriac # Lo [30] SYRIAC LETTER BETH..SYRIAC LETTER PERSIAN DHALATH
|
||||
0730..074A ; Syriac # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
|
||||
074D..074F ; Syriac # Lo [3] SYRIAC LETTER SOGDIAN ZHAIN..SYRIAC LETTER SOGDIAN FE
|
||||
0860..086A ; Syriac # Lo [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
|
||||
|
||||
# Total code points: 77
|
||||
# Total code points: 88
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -944,8 +954,10 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
|
|||
09F4..09F9 ; Bengali # No [6] BENGALI CURRENCY NUMERATOR ONE..BENGALI CURRENCY DENOMINATOR SIXTEEN
|
||||
09FA ; Bengali # So BENGALI ISSHAR
|
||||
09FB ; Bengali # Sc BENGALI GANDA MARK
|
||||
09FC ; Bengali # Lo BENGALI LETTER VEDIC ANUSVARA
|
||||
09FD ; Bengali # Po BENGALI ABBREVIATION SIGN
|
||||
|
||||
# Total code points: 93
|
||||
# Total code points: 95
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -998,8 +1010,9 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
|
|||
0AF0 ; Gujarati # Po GUJARATI ABBREVIATION SIGN
|
||||
0AF1 ; Gujarati # Sc GUJARATI RUPEE SIGN
|
||||
0AF9 ; Gujarati # Lo GUJARATI LETTER ZHA
|
||||
0AFA..0AFF ; Gujarati # Mn [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
|
||||
|
||||
# Total code points: 85
|
||||
# Total code points: 91
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1086,6 +1099,7 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
|
|||
|
||||
# ================================================
|
||||
|
||||
0C80 ; Kannada # Lo KANNADA SIGN SPACING CANDRABINDU
|
||||
0C81 ; Kannada # Mn KANNADA SIGN CANDRABINDU
|
||||
0C82..0C83 ; Kannada # Mc [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
|
||||
0C85..0C8C ; Kannada # Lo [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
|
||||
|
@ -1109,15 +1123,16 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
|
|||
0CE6..0CEF ; Kannada # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
|
||||
0CF1..0CF2 ; Kannada # Lo [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA
|
||||
|
||||
# Total code points: 87
|
||||
# Total code points: 88
|
||||
|
||||
# ================================================
|
||||
|
||||
0D01 ; Malayalam # Mn MALAYALAM SIGN CANDRABINDU
|
||||
0D00..0D01 ; Malayalam # Mn [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
|
||||
0D02..0D03 ; Malayalam # Mc [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA
|
||||
0D05..0D0C ; Malayalam # Lo [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L
|
||||
0D0E..0D10 ; Malayalam # Lo [3] MALAYALAM LETTER E..MALAYALAM LETTER AI
|
||||
0D12..0D3A ; Malayalam # Lo [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
|
||||
0D3B..0D3C ; Malayalam # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
|
||||
0D3D ; Malayalam # Lo MALAYALAM SIGN AVAGRAHA
|
||||
0D3E..0D40 ; Malayalam # Mc [3] MALAYALAM VOWEL SIGN AA..MALAYALAM VOWEL SIGN II
|
||||
0D41..0D44 ; Malayalam # Mn [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
|
||||
|
@ -1125,15 +1140,18 @@ A8FD ; Devanagari # Lo DEVANAGARI JAIN OM
|
|||
0D4A..0D4C ; Malayalam # Mc [3] MALAYALAM VOWEL SIGN O..MALAYALAM VOWEL SIGN AU
|
||||
0D4D ; Malayalam # Mn MALAYALAM SIGN VIRAMA
|
||||
0D4E ; Malayalam # Lo MALAYALAM LETTER DOT REPH
|
||||
0D4F ; Malayalam # So MALAYALAM SIGN PARA
|
||||
0D54..0D56 ; Malayalam # Lo [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL
|
||||
0D57 ; Malayalam # Mc MALAYALAM AU LENGTH MARK
|
||||
0D58..0D5E ; Malayalam # No [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH
|
||||
0D5F..0D61 ; Malayalam # Lo [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
|
||||
0D62..0D63 ; Malayalam # Mn [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
|
||||
0D66..0D6F ; Malayalam # Nd [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
|
||||
0D70..0D75 ; Malayalam # No [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS
|
||||
0D70..0D78 ; Malayalam # No [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
|
||||
0D79 ; Malayalam # So MALAYALAM DATE MARK
|
||||
0D7A..0D7F ; Malayalam # Lo [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
|
||||
|
||||
# Total code points: 100
|
||||
# Total code points: 117
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1436,21 +1454,24 @@ AB70..ABBF ; Cherokee # L& [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
|
|||
1820..1842 ; Mongolian # Lo [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
|
||||
1843 ; Mongolian # Lm MONGOLIAN LETTER TODO LONG VOWEL SIGN
|
||||
1844..1877 ; Mongolian # Lo [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
|
||||
1880..18A8 ; Mongolian # Lo [41] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER MANCHU ALI GALI BHA
|
||||
1880..1884 ; Mongolian # Lo [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
|
||||
1885..1886 ; Mongolian # Mn [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
|
||||
1887..18A8 ; Mongolian # Lo [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
|
||||
18A9 ; Mongolian # Mn MONGOLIAN LETTER ALI GALI DAGALGA
|
||||
18AA ; Mongolian # Lo MONGOLIAN LETTER MANCHU ALI GALI LHA
|
||||
11660..1166C ; Mongolian # Po [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
|
||||
|
||||
# Total code points: 153
|
||||
# Total code points: 166
|
||||
|
||||
# ================================================
|
||||
|
||||
3041..3096 ; Hiragana # Lo [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE
|
||||
309D..309E ; Hiragana # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
|
||||
309F ; Hiragana # Lo HIRAGANA DIGRAPH YORI
|
||||
1B001 ; Hiragana # Lo HIRAGANA LETTER ARCHAIC YE
|
||||
1B001..1B11E ; Hiragana # Lo [286] HIRAGANA LETTER ARCHAIC YE..HENTAIGANA LETTER N-MU-MO-2
|
||||
1F200 ; Hiragana # So SQUARE HIRAGANA HOKA
|
||||
|
||||
# Total code points: 91
|
||||
# Total code points: 376
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1469,10 +1490,10 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
|
|||
# ================================================
|
||||
|
||||
02EA..02EB ; Bopomofo # Sk [2] MODIFIER LETTER YIN DEPARTING TONE MARK..MODIFIER LETTER YANG DEPARTING TONE MARK
|
||||
3105..312D ; Bopomofo # Lo [41] BOPOMOFO LETTER B..BOPOMOFO LETTER IH
|
||||
3105..312E ; Bopomofo # Lo [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
|
||||
31A0..31BA ; Bopomofo # Lo [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
|
||||
|
||||
# Total code points: 70
|
||||
# Total code points: 71
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1485,16 +1506,17 @@ FF71..FF9D ; Katakana # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
|
|||
3038..303A ; Han # Nl [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY
|
||||
303B ; Han # Lm VERTICAL IDEOGRAPHIC ITERATION MARK
|
||||
3400..4DB5 ; Han # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
|
||||
4E00..9FD5 ; Han # Lo [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5
|
||||
4E00..9FEA ; Han # Lo [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
|
||||
F900..FA6D ; Han # Lo [366] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D
|
||||
FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
|
||||
20000..2A6D6 ; Han # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
|
||||
2A700..2B734 ; Han # Lo [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
|
||||
2B740..2B81D ; Han # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
|
||||
2B820..2CEA1 ; Han # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
|
||||
2CEB0..2EBE0 ; Han # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
|
||||
2F800..2FA1D ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
|
||||
|
||||
# Total code points: 81734
|
||||
# Total code points: 89228
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1509,8 +1531,9 @@ A490..A4C6 ; Yi # So [55] YI RADICAL QOT..YI RADICAL KE
|
|||
|
||||
10300..1031F ; Old_Italic # Lo [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS
|
||||
10320..10323 ; Old_Italic # No [4] OLD ITALIC NUMERAL ONE..OLD ITALIC NUMERAL FIFTY
|
||||
1032D..1032F ; Old_Italic # Lo [3] OLD ITALIC LETTER YE..OLD ITALIC LETTER SOUTHERN TSE
|
||||
|
||||
# Total code points: 36
|
||||
# Total code points: 39
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1542,8 +1565,8 @@ A490..A4C6 ; Yi # So [55] YI RADICAL QOT..YI RADICAL KE
|
|||
1CED ; Inherited # Mn VEDIC SIGN TIRYAK
|
||||
1CF4 ; Inherited # Mn VEDIC TONE CANDRA ABOVE
|
||||
1CF8..1CF9 ; Inherited # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
|
||||
1DC0..1DF5 ; Inherited # Mn [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
|
||||
1DFC..1DFF ; Inherited # Mn [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
|
||||
1DC0..1DF9 ; Inherited # Mn [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
|
||||
1DFB..1DFF ; Inherited # Mn [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
|
||||
200C..200D ; Inherited # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
|
||||
20D0..20DC ; Inherited # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
|
||||
20DD..20E0 ; Inherited # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
|
||||
|
@ -1562,7 +1585,7 @@ FE20..FE2D ; Inherited # Mn [14] COMBINING LIGATURE LEFT HALF..COMBINING CON
|
|||
1D1AA..1D1AD ; Inherited # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
|
||||
E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
|
||||
|
||||
# Total code points: 563
|
||||
# Total code points: 568
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1705,8 +1728,13 @@ E0100..E01EF ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-2
|
|||
|
||||
2C00..2C2E ; Glagolitic # L& [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
|
||||
2C30..2C5E ; Glagolitic # L& [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE
|
||||
1E000..1E006 ; Glagolitic # Mn [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
|
||||
1E008..1E018 ; Glagolitic # Mn [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
|
||||
1E01B..1E021 ; Glagolitic # Mn [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
|
||||
1E023..1E024 ; Glagolitic # Mn [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
|
||||
1E026..1E02A ; Glagolitic # Mn [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
|
||||
|
||||
# Total code points: 94
|
||||
# Total code points: 132
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -1872,11 +1900,11 @@ A62A..A62B ; Vai # Lo [2] VAI SYLLABLE NDOLE MA..VAI SYLLABLE NDOLE DO
|
|||
A880..A881 ; Saurashtra # Mc [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA
|
||||
A882..A8B3 ; Saurashtra # Lo [50] SAURASHTRA LETTER A..SAURASHTRA LETTER LLA
|
||||
A8B4..A8C3 ; Saurashtra # Mc [16] SAURASHTRA CONSONANT SIGN HAARU..SAURASHTRA VOWEL SIGN AU
|
||||
A8C4 ; Saurashtra # Mn SAURASHTRA SIGN VIRAMA
|
||||
A8C4..A8C5 ; Saurashtra # Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
|
||||
A8CE..A8CF ; Saurashtra # Po [2] SAURASHTRA DANDA..SAURASHTRA DOUBLE DANDA
|
||||
A8D0..A8D9 ; Saurashtra # Nd [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE
|
||||
|
||||
# Total code points: 81
|
||||
# Total code points: 82
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -2314,8 +2342,9 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
|
|||
11235 ; Khojki # Mc KHOJKI SIGN VIRAMA
|
||||
11236..11237 ; Khojki # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
|
||||
11238..1123D ; Khojki # Po [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
|
||||
1123E ; Khojki # Mn KHOJKI SIGN SUKUN
|
||||
|
||||
# Total code points: 61
|
||||
# Total code points: 62
|
||||
|
||||
# ================================================
|
||||
|
||||
|
@ -2536,4 +2565,129 @@ ABF0..ABF9 ; Meetei_Mayek # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
|
|||
|
||||
# Total code points: 672
|
||||
|
||||
# ================================================
|
||||
|
||||
1E900..1E943 ; Adlam # L& [68] ADLAM CAPITAL LETTER ALIF..ADLAM SMALL LETTER SHA
|
||||
1E944..1E94A ; Adlam # Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
|
||||
1E950..1E959 ; Adlam # Nd [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE
|
||||
1E95E..1E95F ; Adlam # Po [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK
|
||||
|
||||
# Total code points: 87
|
||||
|
||||
# ================================================
|
||||
|
||||
11C00..11C08 ; Bhaiksuki # Lo [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
|
||||
11C0A..11C2E ; Bhaiksuki # Lo [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
|
||||
11C2F ; Bhaiksuki # Mc BHAIKSUKI VOWEL SIGN AA
|
||||
11C30..11C36 ; Bhaiksuki # Mn [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
|
||||
11C38..11C3D ; Bhaiksuki # Mn [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
|
||||
11C3E ; Bhaiksuki # Mc BHAIKSUKI SIGN VISARGA
|
||||
11C3F ; Bhaiksuki # Mn BHAIKSUKI SIGN VIRAMA
|
||||
11C40 ; Bhaiksuki # Lo BHAIKSUKI SIGN AVAGRAHA
|
||||
11C41..11C45 ; Bhaiksuki # Po [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
|
||||
11C50..11C59 ; Bhaiksuki # Nd [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
|
||||
11C5A..11C6C ; Bhaiksuki # No [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
|
||||
|
||||
# Total code points: 97
|
||||
|
||||
# ================================================
|
||||
|
||||
11C70..11C71 ; Marchen # Po [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
|
||||
11C72..11C8F ; Marchen # Lo [30] MARCHEN LETTER KA..MARCHEN LETTER A
|
||||
11C92..11CA7 ; Marchen # Mn [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
|
||||
11CA9 ; Marchen # Mc MARCHEN SUBJOINED LETTER YA
|
||||
11CAA..11CB0 ; Marchen # Mn [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
|
||||
11CB1 ; Marchen # Mc MARCHEN VOWEL SIGN I
|
||||
11CB2..11CB3 ; Marchen # Mn [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
|
||||
11CB4 ; Marchen # Mc MARCHEN VOWEL SIGN O
|
||||
11CB5..11CB6 ; Marchen # Mn [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
|
||||
|
||||
# Total code points: 68
|
||||
|
||||
# ================================================
|
||||
|
||||
11400..11434 ; Newa # Lo [53] NEWA LETTER A..NEWA LETTER HA
|
||||
11435..11437 ; Newa # Mc [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
|
||||
11438..1143F ; Newa # Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
|
||||
11440..11441 ; Newa # Mc [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
|
||||
11442..11444 ; Newa # Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
|
||||
11445 ; Newa # Mc NEWA SIGN VISARGA
|
||||
11446 ; Newa # Mn NEWA SIGN NUKTA
|
||||
11447..1144A ; Newa # Lo [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
|
||||
1144B..1144F ; Newa # Po [5] NEWA DANDA..NEWA ABBREVIATION SIGN
|
||||
11450..11459 ; Newa # Nd [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
|
||||
1145B ; Newa # Po NEWA PLACEHOLDER MARK
|
||||
1145D ; Newa # Po NEWA INSERTION SIGN
|
||||
|
||||
# Total code points: 92
|
||||
|
||||
# ================================================
|
||||
|
||||
104B0..104D3 ; Osage # L& [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
|
||||
104D8..104FB ; Osage # L& [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
|
||||
|
||||
# Total code points: 72
|
||||
|
||||
# ================================================
|
||||
|
||||
16FE0 ; Tangut # Lm TANGUT ITERATION MARK
|
||||
17000..187EC ; Tangut # Lo [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
|
||||
18800..18AF2 ; Tangut # Lo [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
|
||||
|
||||
# Total code points: 6881
|
||||
|
||||
# ================================================
|
||||
|
||||
11D00..11D06 ; Masaram_Gondi # Lo [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E
|
||||
11D08..11D09 ; Masaram_Gondi # Lo [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
|
||||
11D0B..11D30 ; Masaram_Gondi # Lo [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
|
||||
11D31..11D36 ; Masaram_Gondi # Mn [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
|
||||
11D3A ; Masaram_Gondi # Mn MASARAM GONDI VOWEL SIGN E
|
||||
11D3C..11D3D ; Masaram_Gondi # Mn [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
|
||||
11D3F..11D45 ; Masaram_Gondi # Mn [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
|
||||
11D46 ; Masaram_Gondi # Lo MASARAM GONDI REPHA
|
||||
11D47 ; Masaram_Gondi # Mn MASARAM GONDI RA-KARA
|
||||
11D50..11D59 ; Masaram_Gondi # Nd [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
|
||||
|
||||
# Total code points: 75
|
||||
|
||||
# ================================================
|
||||
|
||||
16FE1 ; Nushu # Lm NUSHU ITERATION MARK
|
||||
1B170..1B2FB ; Nushu # Lo [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
|
||||
|
||||
# Total code points: 397
|
||||
|
||||
# ================================================
|
||||
|
||||
11A50 ; Soyombo # Lo SOYOMBO LETTER A
|
||||
11A51..11A56 ; Soyombo # Mn [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
|
||||
11A57..11A58 ; Soyombo # Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
|
||||
11A59..11A5B ; Soyombo # Mn [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
|
||||
11A5C..11A83 ; Soyombo # Lo [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
|
||||
11A86..11A89 ; Soyombo # Lo [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
|
||||
11A8A..11A96 ; Soyombo # Mn [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
|
||||
11A97 ; Soyombo # Mc SOYOMBO SIGN VISARGA
|
||||
11A98..11A99 ; Soyombo # Mn [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
|
||||
11A9A..11A9C ; Soyombo # Po [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
|
||||
11A9E..11AA2 ; Soyombo # Po [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
|
||||
|
||||
# Total code points: 80
|
||||
|
||||
# ================================================
|
||||
|
||||
11A00 ; Zanabazar_Square # Lo ZANABAZAR SQUARE LETTER A
|
||||
11A01..11A06 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
|
||||
11A07..11A08 ; Zanabazar_Square # Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
|
||||
11A09..11A0A ; Zanabazar_Square # Mn [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
|
||||
11A0B..11A32 ; Zanabazar_Square # Lo [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
|
||||
11A33..11A38 ; Zanabazar_Square # Mn [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
|
||||
11A39 ; Zanabazar_Square # Mc ZANABAZAR SQUARE SIGN VISARGA
|
||||
11A3A ; Zanabazar_Square # Lo ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
|
||||
11A3B..11A3E ; Zanabazar_Square # Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
|
||||
11A3F..11A46 ; Zanabazar_Square # Po [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
|
||||
11A47 ; Zanabazar_Square # Mn ZANABAZAR SQUARE SUBJOINER
|
||||
|
||||
# Total code points: 72
|
||||
|
||||
# EOF
|
||||
|
|
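Each data line in the Scripts.txt hunks above has the shape `START[..END] ; Script # Gc [count] NAME(S)`, where the bracketed count is present only for ranges. The following is a minimal sketch, not part of the commit, of how such a line could be parsed and its bracketed count checked against the range width. The function names `parse_script_line` and `count_matches_range` are hypothetical, and the regular expression assumes only the layout visible in the lines above.

```python
import re

# Matches e.g. "1F6F0..1F6F8 ; Common # So [9] SATELLITE..FLYING SAUCER"
# and      "2E02 ; Common # Pi LEFT SUBSTITUTION BRACKET".
# Assumption: code points are 4-6 uppercase hex digits, as in the lines above.
_ENTRY = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)\s*#\s*(\S+)\s*(?:\[(\d+)\])?"
)


def parse_script_line(line):
    """Return (start, end, script, category, declared_count), or None
    for comments, blank lines, hunk headers, and other non-data lines."""
    m = _ENTRY.match(line.strip())
    if m is None:
        return None
    start = int(m.group(1), 16)
    end = int(m.group(2), 16) if m.group(2) else start
    # A single code point carries no "[n]" field; treat it as a count of 1.
    declared = int(m.group(5)) if m.group(5) else 1
    return start, end, m.group(3), m.group(4), declared


def count_matches_range(entry):
    """True if the declared count equals the inclusive width of the range."""
    start, end, _script, _category, declared = entry
    return end - start + 1 == declared


if __name__ == "__main__":
    sample = "1F400..1F6D4 ; Common # So [725] RAT..PAGODA"
    entry = parse_script_line(sample)
    print(entry, count_matches_range(entry))
```

Because this view interleaves removed and added lines without `+`/`-` markers, summing the parsed counts per script here would not reproduce the `# Total code points` figures. That comparison only makes sense against the complete Scripts.txt file.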
File diff suppressed because it is too large
|
@ -192,7 +192,12 @@ const uint32_t PRIV(ucp_gbtable)[] = {
|
|||
|
||||
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbT), /* 10 LVT */
|
||||
(1<<ucp_gbRegionalIndicator), /* 11 RegionalIndicator */
|
||||
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark) /* 12 Other */
|
||||
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 12 Other */
|
||||
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 13 E_Base */
|
||||
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 14 E_Modifier */
|
||||
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 15 E_Base_GAZ */
|
||||
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 16 ZWJ */
|
||||
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark) /* 17 Glue_After_Zwj */
|
||||
};
|
||||
|
||||
#ifdef SUPPORT_JIT
|
||||
|
@ -227,6 +232,7 @@ version. Like all other character and string literals that are compared against
|
|||
the regular expression pattern, we must use STR_ macros instead of literal
|
||||
strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
||||
|
||||
#define STRING_Adlam0 STR_A STR_d STR_l STR_a STR_m "\0"
|
||||
#define STRING_Ahom0 STR_A STR_h STR_o STR_m "\0"
|
||||
#define STRING_Anatolian_Hieroglyphs0 STR_A STR_n STR_a STR_t STR_o STR_l STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
|
||||
#define STRING_Any0 STR_A STR_n STR_y "\0"
|
||||
|
@ -238,6 +244,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
|||
#define STRING_Bassa_Vah0 STR_B STR_a STR_s STR_s STR_a STR_UNDERSCORE STR_V STR_a STR_h "\0"
|
||||
#define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0"
|
||||
#define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0"
|
||||
#define STRING_Bhaiksuki0 STR_B STR_h STR_a STR_i STR_k STR_s STR_u STR_k STR_i "\0"
|
||||
#define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0"
|
||||
#define STRING_Brahmi0 STR_B STR_r STR_a STR_h STR_m STR_i "\0"
|
||||
#define STRING_Braille0 STR_B STR_r STR_a STR_i STR_l STR_l STR_e "\0"
|
||||
|
@ -313,6 +320,8 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
|||
#define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0"
|
||||
#define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0"
|
||||
#define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0"
|
||||
#define STRING_Marchen0 STR_M STR_a STR_r STR_c STR_h STR_e STR_n "\0"
|
||||
#define STRING_Masaram_Gondi0 STR_M STR_a STR_s STR_a STR_r STR_a STR_m STR_UNDERSCORE STR_G STR_o STR_n STR_d STR_i "\0"
|
||||
#define STRING_Mc0 STR_M STR_c "\0"
|
||||
#define STRING_Me0 STR_M STR_e "\0"
|
||||
#define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0"
|
||||
|
@ -330,9 +339,11 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
|||
#define STRING_Nabataean0 STR_N STR_a STR_b STR_a STR_t STR_a STR_e STR_a STR_n "\0"
|
||||
#define STRING_Nd0 STR_N STR_d "\0"
|
||||
#define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0"
|
||||
#define STRING_Newa0 STR_N STR_e STR_w STR_a "\0"
|
||||
#define STRING_Nko0 STR_N STR_k STR_o "\0"
|
||||
#define STRING_Nl0 STR_N STR_l "\0"
|
||||
#define STRING_No0 STR_N STR_o "\0"
|
||||
#define STRING_Nushu0 STR_N STR_u STR_s STR_h STR_u "\0"
|
||||
#define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0"
|
||||
#define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0"
|
||||
#define STRING_Old_Hungarian0 STR_O STR_l STR_d STR_UNDERSCORE STR_H STR_u STR_n STR_g STR_a STR_r STR_i STR_a STR_n "\0"
|
||||
|
@ -343,6 +354,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
|||
#define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
|
||||
#define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0"
|
||||
#define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0"
|
||||
#define STRING_Osage0 STR_O STR_s STR_a STR_g STR_e "\0"
|
||||
#define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0"
|
||||
#define STRING_P0 STR_P "\0"
|
||||
#define STRING_Pahawh_Hmong0 STR_P STR_a STR_h STR_a STR_w STR_h STR_UNDERSCORE STR_H STR_m STR_o STR_n STR_g "\0"
|
||||
|
@ -373,6 +385,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
|||
#define STRING_Sm0 STR_S STR_m "\0"
|
||||
#define STRING_So0 STR_S STR_o "\0"
|
||||
#define STRING_Sora_Sompeng0 STR_S STR_o STR_r STR_a STR_UNDERSCORE STR_S STR_o STR_m STR_p STR_e STR_n STR_g "\0"
|
||||
#define STRING_Soyombo0 STR_S STR_o STR_y STR_o STR_m STR_b STR_o "\0"
|
||||
#define STRING_Sundanese0 STR_S STR_u STR_n STR_d STR_a STR_n STR_e STR_s STR_e "\0"
|
||||
#define STRING_Syloti_Nagri0 STR_S STR_y STR_l STR_o STR_t STR_i STR_UNDERSCORE STR_N STR_a STR_g STR_r STR_i "\0"
|
||||
#define STRING_Syriac0 STR_S STR_y STR_r STR_i STR_a STR_c "\0"
|
||||
|
@ -383,6 +396,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
|||
#define STRING_Tai_Viet0 STR_T STR_a STR_i STR_UNDERSCORE STR_V STR_i STR_e STR_t "\0"
|
||||
#define STRING_Takri0 STR_T STR_a STR_k STR_r STR_i "\0"
|
||||
#define STRING_Tamil0 STR_T STR_a STR_m STR_i STR_l "\0"
|
||||
#define STRING_Tangut0 STR_T STR_a STR_n STR_g STR_u STR_t "\0"
|
||||
#define STRING_Telugu0 STR_T STR_e STR_l STR_u STR_g STR_u "\0"
|
||||
#define STRING_Thaana0 STR_T STR_h STR_a STR_a STR_n STR_a "\0"
|
||||
#define STRING_Thai0 STR_T STR_h STR_a STR_i "\0"
|
||||
|
@ -399,11 +413,13 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
|||
#define STRING_Xwd0 STR_X STR_w STR_d "\0"
|
||||
#define STRING_Yi0 STR_Y STR_i "\0"
|
||||
#define STRING_Z0 STR_Z "\0"
|
||||
#define STRING_Zanabazar_Square0 STR_Z STR_a STR_n STR_a STR_b STR_a STR_z STR_a STR_r STR_UNDERSCORE STR_S STR_q STR_u STR_a STR_r STR_e "\0"
|
||||
#define STRING_Zl0 STR_Z STR_l "\0"
|
||||
#define STRING_Zp0 STR_Z STR_p "\0"
|
||||
#define STRING_Zs0 STR_Z STR_s "\0"
|
||||
|
||||
const char PRIV(utt_names)[] =
|
||||
STRING_Adlam0
|
||||
STRING_Ahom0
|
||||
STRING_Anatolian_Hieroglyphs0
|
||||
STRING_Any0
|
||||
|
@ -415,6 +431,7 @@ const char PRIV(utt_names)[] =
|
|||
STRING_Bassa_Vah0
|
||||
STRING_Batak0
|
||||
STRING_Bengali0
|
||||
STRING_Bhaiksuki0
|
||||
STRING_Bopomofo0
|
||||
STRING_Brahmi0
|
||||
STRING_Braille0
|
||||
|
@ -490,6 +507,8 @@ const char PRIV(utt_names)[] =
|
|||
STRING_Malayalam0
|
||||
STRING_Mandaic0
|
||||
STRING_Manichaean0
|
||||
STRING_Marchen0
|
||||
STRING_Masaram_Gondi0
|
||||
STRING_Mc0
|
||||
STRING_Me0
|
||||
STRING_Meetei_Mayek0
|
||||
|
@ -507,9 +526,11 @@ const char PRIV(utt_names)[] =
|
|||
STRING_Nabataean0
|
||||
STRING_Nd0
|
||||
STRING_New_Tai_Lue0
|
||||
STRING_Newa0
|
||||
STRING_Nko0
|
||||
STRING_Nl0
|
||||
STRING_No0
|
||||
STRING_Nushu0
|
||||
STRING_Ogham0
|
||||
STRING_Ol_Chiki0
|
||||
STRING_Old_Hungarian0
|
||||
|
@ -520,6 +541,7 @@ const char PRIV(utt_names)[] =
|
|||
STRING_Old_South_Arabian0
|
||||
STRING_Old_Turkic0
|
||||
STRING_Oriya0
|
||||
STRING_Osage0
|
||||
STRING_Osmanya0
|
||||
STRING_P0
|
||||
STRING_Pahawh_Hmong0
|
||||
|
@ -550,6 +572,7 @@ const char PRIV(utt_names)[] =
|
|||
STRING_Sm0
|
||||
STRING_So0
|
||||
STRING_Sora_Sompeng0
|
||||
STRING_Soyombo0
|
||||
STRING_Sundanese0
|
||||
STRING_Syloti_Nagri0
|
||||
STRING_Syriac0
|
||||
|
@ -560,6 +583,7 @@ const char PRIV(utt_names)[] =
|
|||
STRING_Tai_Viet0
|
||||
STRING_Takri0
|
||||
STRING_Tamil0
|
||||
STRING_Tangut0
|
||||
STRING_Telugu0
|
||||
STRING_Thaana0
|
||||
STRING_Thai0
|
||||
|
@ -576,186 +600,197 @@ const char PRIV(utt_names)[] =
|
|||
STRING_Xwd0
|
||||
STRING_Yi0
|
||||
STRING_Z0
|
||||
STRING_Zanabazar_Square0
|
||||
STRING_Zl0
|
||||
STRING_Zp0
|
||||
STRING_Zs0;
|
||||
|
||||
const ucp_type_table PRIV(utt)[] = {
|
||||
{ 0, PT_SC, ucp_Ahom },
|
||||
{ 5, PT_SC, ucp_Anatolian_Hieroglyphs },
|
||||
{ 27, PT_ANY, 0 },
|
||||
{ 31, PT_SC, ucp_Arabic },
|
||||
{ 38, PT_SC, ucp_Armenian },
|
||||
{ 47, PT_SC, ucp_Avestan },
|
||||
{ 55, PT_SC, ucp_Balinese },
|
||||
{ 64, PT_SC, ucp_Bamum },
|
||||
{ 70, PT_SC, ucp_Bassa_Vah },
|
||||
{ 80, PT_SC, ucp_Batak },
|
||||
{ 86, PT_SC, ucp_Bengali },
|
||||
{ 94, PT_SC, ucp_Bopomofo },
|
||||
{ 103, PT_SC, ucp_Brahmi },
|
||||
{ 110, PT_SC, ucp_Braille },
|
||||
{ 118, PT_SC, ucp_Buginese },
|
||||
{ 127, PT_SC, ucp_Buhid },
|
||||
{ 133, PT_GC, ucp_C },
|
||||
{ 135, PT_SC, ucp_Canadian_Aboriginal },
|
||||
{ 155, PT_SC, ucp_Carian },
|
||||
{ 162, PT_SC, ucp_Caucasian_Albanian },
|
||||
{ 181, PT_PC, ucp_Cc },
|
||||
{ 184, PT_PC, ucp_Cf },
|
||||
{ 187, PT_SC, ucp_Chakma },
|
||||
{ 194, PT_SC, ucp_Cham },
|
||||
{ 199, PT_SC, ucp_Cherokee },
|
||||
{ 208, PT_PC, ucp_Cn },
|
||||
{ 211, PT_PC, ucp_Co },
|
||||
{ 214, PT_SC, ucp_Common },
|
||||
{ 221, PT_SC, ucp_Coptic },
|
||||
{ 228, PT_PC, ucp_Cs },
|
||||
{ 231, PT_SC, ucp_Cuneiform },
|
||||
{ 241, PT_SC, ucp_Cypriot },
|
||||
{ 249, PT_SC, ucp_Cyrillic },
|
||||
{ 258, PT_SC, ucp_Deseret },
|
||||
{ 266, PT_SC, ucp_Devanagari },
|
||||
{ 277, PT_SC, ucp_Duployan },
|
||||
{ 286, PT_SC, ucp_Egyptian_Hieroglyphs },
|
||||
{ 307, PT_SC, ucp_Elbasan },
|
||||
{ 315, PT_SC, ucp_Ethiopic },
|
||||
{ 324, PT_SC, ucp_Georgian },
|
||||
{ 333, PT_SC, ucp_Glagolitic },
|
||||
{ 344, PT_SC, ucp_Gothic },
|
||||
{ 351, PT_SC, ucp_Grantha },
|
||||
{ 359, PT_SC, ucp_Greek },
|
||||
{ 365, PT_SC, ucp_Gujarati },
|
||||
{ 374, PT_SC, ucp_Gurmukhi },
|
||||
{ 383, PT_SC, ucp_Han },
|
||||
{ 387, PT_SC, ucp_Hangul },
|
||||
{ 394, PT_SC, ucp_Hanunoo },
|
||||
{ 402, PT_SC, ucp_Hatran },
|
||||
{ 409, PT_SC, ucp_Hebrew },
|
||||
{ 416, PT_SC, ucp_Hiragana },
|
||||
{ 425, PT_SC, ucp_Imperial_Aramaic },
|
||||
{ 442, PT_SC, ucp_Inherited },
|
||||
{ 452, PT_SC, ucp_Inscriptional_Pahlavi },
|
||||
{ 474, PT_SC, ucp_Inscriptional_Parthian },
|
||||
{ 497, PT_SC, ucp_Javanese },
|
||||
{ 506, PT_SC, ucp_Kaithi },
|
||||
{ 513, PT_SC, ucp_Kannada },
|
||||
{ 521, PT_SC, ucp_Katakana },
|
||||
{ 530, PT_SC, ucp_Kayah_Li },
|
||||
{ 539, PT_SC, ucp_Kharoshthi },
|
||||
{ 550, PT_SC, ucp_Khmer },
|
||||
{ 556, PT_SC, ucp_Khojki },
|
||||
{ 563, PT_SC, ucp_Khudawadi },
|
||||
{ 573, PT_GC, ucp_L },
|
||||
{ 575, PT_LAMP, 0 },
|
||||
{ 578, PT_SC, ucp_Lao },
|
||||
{ 582, PT_SC, ucp_Latin },
|
||||
{ 588, PT_SC, ucp_Lepcha },
|
||||
{ 595, PT_SC, ucp_Limbu },
|
||||
{ 601, PT_SC, ucp_Linear_A },
|
||||
{ 610, PT_SC, ucp_Linear_B },
|
||||
{ 619, PT_SC, ucp_Lisu },
|
||||
{ 624, PT_PC, ucp_Ll },
|
||||
{ 627, PT_PC, ucp_Lm },
|
||||
{ 630, PT_PC, ucp_Lo },
|
||||
{ 633, PT_PC, ucp_Lt },
|
||||
{ 636, PT_PC, ucp_Lu },
|
||||
{ 639, PT_SC, ucp_Lycian },
|
||||
{ 646, PT_SC, ucp_Lydian },
|
||||
{ 653, PT_GC, ucp_M },
|
||||
{ 655, PT_SC, ucp_Mahajani },
|
||||
{ 664, PT_SC, ucp_Malayalam },
|
||||
{ 674, PT_SC, ucp_Mandaic },
|
||||
{ 682, PT_SC, ucp_Manichaean },
|
||||
{ 693, PT_PC, ucp_Mc },
|
||||
{ 696, PT_PC, ucp_Me },
|
||||
{ 699, PT_SC, ucp_Meetei_Mayek },
|
||||
{ 712, PT_SC, ucp_Mende_Kikakui },
|
||||
{ 726, PT_SC, ucp_Meroitic_Cursive },
|
||||
{ 743, PT_SC, ucp_Meroitic_Hieroglyphs },
|
||||
{ 764, PT_SC, ucp_Miao },
|
||||
{ 769, PT_PC, ucp_Mn },
|
||||
{ 772, PT_SC, ucp_Modi },
|
||||
{ 777, PT_SC, ucp_Mongolian },
|
||||
{ 787, PT_SC, ucp_Mro },
|
||||
{ 791, PT_SC, ucp_Multani },
|
||||
{ 799, PT_SC, ucp_Myanmar },
|
||||
{ 807, PT_GC, ucp_N },
|
||||
{ 809, PT_SC, ucp_Nabataean },
|
||||
{ 819, PT_PC, ucp_Nd },
|
||||
{ 822, PT_SC, ucp_New_Tai_Lue },
|
||||
{ 834, PT_SC, ucp_Nko },
|
||||
{ 838, PT_PC, ucp_Nl },
|
||||
{ 841, PT_PC, ucp_No },
|
||||
{ 844, PT_SC, ucp_Ogham },
|
||||
{ 850, PT_SC, ucp_Ol_Chiki },
|
||||
{ 859, PT_SC, ucp_Old_Hungarian },
|
||||
{ 873, PT_SC, ucp_Old_Italic },
|
||||
{ 884, PT_SC, ucp_Old_North_Arabian },
|
||||
{ 902, PT_SC, ucp_Old_Permic },
|
||||
{ 913, PT_SC, ucp_Old_Persian },
|
||||
{ 925, PT_SC, ucp_Old_South_Arabian },
|
||||
{ 943, PT_SC, ucp_Old_Turkic },
|
||||
{ 954, PT_SC, ucp_Oriya },
|
||||
{ 960, PT_SC, ucp_Osmanya },
|
||||
{ 968, PT_GC, ucp_P },
|
||||
{ 970, PT_SC, ucp_Pahawh_Hmong },
|
||||
{ 983, PT_SC, ucp_Palmyrene },
|
||||
{ 993, PT_SC, ucp_Pau_Cin_Hau },
|
||||
{ 1005, PT_PC, ucp_Pc },
|
||||
{ 1008, PT_PC, ucp_Pd },
|
||||
{ 1011, PT_PC, ucp_Pe },
|
||||
{ 1014, PT_PC, ucp_Pf },
|
||||
{ 1017, PT_SC, ucp_Phags_Pa },
|
||||
{ 1026, PT_SC, ucp_Phoenician },
|
||||
{ 1037, PT_PC, ucp_Pi },
|
||||
{ 1040, PT_PC, ucp_Po },
|
||||
{ 1043, PT_PC, ucp_Ps },
|
||||
{ 1046, PT_SC, ucp_Psalter_Pahlavi },
|
||||
{ 1062, PT_SC, ucp_Rejang },
|
||||
{ 1069, PT_SC, ucp_Runic },
|
||||
{ 1075, PT_GC, ucp_S },
|
||||
{ 1077, PT_SC, ucp_Samaritan },
|
||||
{ 1087, PT_SC, ucp_Saurashtra },
|
||||
{ 1098, PT_PC, ucp_Sc },
|
||||
{ 1101, PT_SC, ucp_Sharada },
|
||||
{ 1109, PT_SC, ucp_Shavian },
|
||||
{ 1117, PT_SC, ucp_Siddham },
|
||||
{ 1125, PT_SC, ucp_SignWriting },
|
||||
{ 1137, PT_SC, ucp_Sinhala },
|
||||
{ 1145, PT_PC, ucp_Sk },
|
||||
{ 1148, PT_PC, ucp_Sm },
|
||||
{ 1151, PT_PC, ucp_So },
|
||||
{ 1154, PT_SC, ucp_Sora_Sompeng },
|
||||
{ 1167, PT_SC, ucp_Sundanese },
|
||||
{ 1177, PT_SC, ucp_Syloti_Nagri },
|
||||
{ 1190, PT_SC, ucp_Syriac },
|
||||
{ 1197, PT_SC, ucp_Tagalog },
|
||||
{ 1205, PT_SC, ucp_Tagbanwa },
|
||||
{ 1214, PT_SC, ucp_Tai_Le },
|
||||
{ 1221, PT_SC, ucp_Tai_Tham },
|
||||
{ 1230, PT_SC, ucp_Tai_Viet },
|
||||
{ 1239, PT_SC, ucp_Takri },
|
||||
{ 1245, PT_SC, ucp_Tamil },
|
||||
{ 1251, PT_SC, ucp_Telugu },
|
||||
{ 1258, PT_SC, ucp_Thaana },
|
||||
{ 1265, PT_SC, ucp_Thai },
|
||||
{ 1270, PT_SC, ucp_Tibetan },
|
||||
{ 1278, PT_SC, ucp_Tifinagh },
|
||||
{ 1287, PT_SC, ucp_Tirhuta },
|
||||
{ 1295, PT_SC, ucp_Ugaritic },
|
||||
{ 1304, PT_SC, ucp_Vai },
|
||||
{ 1308, PT_SC, ucp_Warang_Citi },
|
||||
{ 1320, PT_ALNUM, 0 },
|
||||
{ 1324, PT_PXSPACE, 0 },
|
||||
{ 1328, PT_SPACE, 0 },
|
||||
{ 1332, PT_UCNC, 0 },
|
||||
{ 1336, PT_WORD, 0 },
|
||||
{ 1340, PT_SC, ucp_Yi },
|
||||
{ 1343, PT_GC, ucp_Z },
|
||||
{ 1345, PT_PC, ucp_Zl },
|
||||
{ 1348, PT_PC, ucp_Zp },
|
||||
{ 1351, PT_PC, ucp_Zs }
|
||||
{ 0, PT_SC, ucp_Adlam },
|
||||
{ 6, PT_SC, ucp_Ahom },
|
||||
{ 11, PT_SC, ucp_Anatolian_Hieroglyphs },
|
||||
{ 33, PT_ANY, 0 },
|
||||
{ 37, PT_SC, ucp_Arabic },
|
||||
{ 44, PT_SC, ucp_Armenian },
|
||||
{ 53, PT_SC, ucp_Avestan },
|
||||
{ 61, PT_SC, ucp_Balinese },
|
||||
{ 70, PT_SC, ucp_Bamum },
|
||||
{ 76, PT_SC, ucp_Bassa_Vah },
|
||||
{ 86, PT_SC, ucp_Batak },
|
||||
{ 92, PT_SC, ucp_Bengali },
|
||||
{ 100, PT_SC, ucp_Bhaiksuki },
|
||||
{ 110, PT_SC, ucp_Bopomofo },
|
||||
{ 119, PT_SC, ucp_Brahmi },
|
||||
{ 126, PT_SC, ucp_Braille },
|
||||
{ 134, PT_SC, ucp_Buginese },
|
||||
{ 143, PT_SC, ucp_Buhid },
|
||||
{ 149, PT_GC, ucp_C },
|
||||
{ 151, PT_SC, ucp_Canadian_Aboriginal },
|
||||
{ 171, PT_SC, ucp_Carian },
|
||||
{ 178, PT_SC, ucp_Caucasian_Albanian },
|
||||
{ 197, PT_PC, ucp_Cc },
|
||||
{ 200, PT_PC, ucp_Cf },
|
||||
{ 203, PT_SC, ucp_Chakma },
|
||||
{ 210, PT_SC, ucp_Cham },
|
||||
{ 215, PT_SC, ucp_Cherokee },
|
||||
{ 224, PT_PC, ucp_Cn },
|
||||
{ 227, PT_PC, ucp_Co },
|
||||
{ 230, PT_SC, ucp_Common },
|
||||
{ 237, PT_SC, ucp_Coptic },
|
||||
{ 244, PT_PC, ucp_Cs },
|
||||
{ 247, PT_SC, ucp_Cuneiform },
|
||||
{ 257, PT_SC, ucp_Cypriot },
|
||||
{ 265, PT_SC, ucp_Cyrillic },
|
||||
{ 274, PT_SC, ucp_Deseret },
|
||||
{ 282, PT_SC, ucp_Devanagari },
|
||||
{ 293, PT_SC, ucp_Duployan },
|
||||
{ 302, PT_SC, ucp_Egyptian_Hieroglyphs },
|
||||
{ 323, PT_SC, ucp_Elbasan },
|
||||
{ 331, PT_SC, ucp_Ethiopic },
|
||||
{ 340, PT_SC, ucp_Georgian },
|
||||
{ 349, PT_SC, ucp_Glagolitic },
|
||||
{ 360, PT_SC, ucp_Gothic },
|
||||
{ 367, PT_SC, ucp_Grantha },
|
||||
{ 375, PT_SC, ucp_Greek },
|
||||
{ 381, PT_SC, ucp_Gujarati },
|
||||
{ 390, PT_SC, ucp_Gurmukhi },
|
||||
{ 399, PT_SC, ucp_Han },
|
||||
{ 403, PT_SC, ucp_Hangul },
|
||||
{ 410, PT_SC, ucp_Hanunoo },
|
||||
{ 418, PT_SC, ucp_Hatran },
|
||||
{ 425, PT_SC, ucp_Hebrew },
|
||||
{ 432, PT_SC, ucp_Hiragana },
|
||||
{ 441, PT_SC, ucp_Imperial_Aramaic },
|
||||
{ 458, PT_SC, ucp_Inherited },
|
||||
{ 468, PT_SC, ucp_Inscriptional_Pahlavi },
|
||||
{ 490, PT_SC, ucp_Inscriptional_Parthian },
|
||||
{ 513, PT_SC, ucp_Javanese },
|
||||
{ 522, PT_SC, ucp_Kaithi },
|
||||
{ 529, PT_SC, ucp_Kannada },
|
||||
{ 537, PT_SC, ucp_Katakana },
|
||||
{ 546, PT_SC, ucp_Kayah_Li },
|
||||
{ 555, PT_SC, ucp_Kharoshthi },
|
||||
{ 566, PT_SC, ucp_Khmer },
|
||||
{ 572, PT_SC, ucp_Khojki },
|
||||
{ 579, PT_SC, ucp_Khudawadi },
|
||||
{ 589, PT_GC, ucp_L },
|
||||
{ 591, PT_LAMP, 0 },
|
||||
{ 594, PT_SC, ucp_Lao },
|
||||
{ 598, PT_SC, ucp_Latin },
|
||||
{ 604, PT_SC, ucp_Lepcha },
|
||||
{ 611, PT_SC, ucp_Limbu },
|
||||
{ 617, PT_SC, ucp_Linear_A },
|
||||
{ 626, PT_SC, ucp_Linear_B },
|
||||
{ 635, PT_SC, ucp_Lisu },
|
||||
{ 640, PT_PC, ucp_Ll },
|
||||
{ 643, PT_PC, ucp_Lm },
|
||||
{ 646, PT_PC, ucp_Lo },
|
||||
{ 649, PT_PC, ucp_Lt },
|
||||
{ 652, PT_PC, ucp_Lu },
|
||||
{ 655, PT_SC, ucp_Lycian },
|
||||
{ 662, PT_SC, ucp_Lydian },
|
||||
{ 669, PT_GC, ucp_M },
|
||||
{ 671, PT_SC, ucp_Mahajani },
|
||||
{ 680, PT_SC, ucp_Malayalam },
|
||||
{ 690, PT_SC, ucp_Mandaic },
|
||||
{ 698, PT_SC, ucp_Manichaean },
|
||||
{ 709, PT_SC, ucp_Marchen },
|
||||
{ 717, PT_SC, ucp_Masaram_Gondi },
|
||||
{ 731, PT_PC, ucp_Mc },
|
||||
{ 734, PT_PC, ucp_Me },
|
||||
{ 737, PT_SC, ucp_Meetei_Mayek },
|
||||
{ 750, PT_SC, ucp_Mende_Kikakui },
|
||||
{ 764, PT_SC, ucp_Meroitic_Cursive },
|
||||
{ 781, PT_SC, ucp_Meroitic_Hieroglyphs },
|
||||
{ 802, PT_SC, ucp_Miao },
|
||||
{ 807, PT_PC, ucp_Mn },
|
||||
{ 810, PT_SC, ucp_Modi },
|
||||
{ 815, PT_SC, ucp_Mongolian },
|
||||
{ 825, PT_SC, ucp_Mro },
|
||||
{ 829, PT_SC, ucp_Multani },
|
||||
{ 837, PT_SC, ucp_Myanmar },
|
||||
{ 845, PT_GC, ucp_N },
|
||||
{ 847, PT_SC, ucp_Nabataean },
|
||||
{ 857, PT_PC, ucp_Nd },
|
||||
{ 860, PT_SC, ucp_New_Tai_Lue },
|
||||
{ 872, PT_SC, ucp_Newa },
|
||||
{ 877, PT_SC, ucp_Nko },
|
||||
{ 881, PT_PC, ucp_Nl },
|
||||
{ 884, PT_PC, ucp_No },
|
||||
{ 887, PT_SC, ucp_Nushu },
|
||||
{ 893, PT_SC, ucp_Ogham },
|
||||
{ 899, PT_SC, ucp_Ol_Chiki },
|
||||
{ 908, PT_SC, ucp_Old_Hungarian },
|
||||
{ 922, PT_SC, ucp_Old_Italic },
|
||||
{ 933, PT_SC, ucp_Old_North_Arabian },
|
||||
{ 951, PT_SC, ucp_Old_Permic },
|
||||
{ 962, PT_SC, ucp_Old_Persian },
|
||||
{ 974, PT_SC, ucp_Old_South_Arabian },
|
||||
{ 992, PT_SC, ucp_Old_Turkic },
|
||||
{ 1003, PT_SC, ucp_Oriya },
|
||||
{ 1009, PT_SC, ucp_Osage },
|
||||
{ 1015, PT_SC, ucp_Osmanya },
|
||||
{ 1023, PT_GC, ucp_P },
|
||||
{ 1025, PT_SC, ucp_Pahawh_Hmong },
|
||||
{ 1038, PT_SC, ucp_Palmyrene },
|
||||
{ 1048, PT_SC, ucp_Pau_Cin_Hau },
|
||||
{ 1060, PT_PC, ucp_Pc },
|
||||
{ 1063, PT_PC, ucp_Pd },
|
||||
{ 1066, PT_PC, ucp_Pe },
|
||||
{ 1069, PT_PC, ucp_Pf },
|
||||
{ 1072, PT_SC, ucp_Phags_Pa },
|
||||
{ 1081, PT_SC, ucp_Phoenician },
|
||||
{ 1092, PT_PC, ucp_Pi },
|
||||
{ 1095, PT_PC, ucp_Po },
|
||||
{ 1098, PT_PC, ucp_Ps },
|
||||
{ 1101, PT_SC, ucp_Psalter_Pahlavi },
|
||||
{ 1117, PT_SC, ucp_Rejang },
|
||||
{ 1124, PT_SC, ucp_Runic },
|
||||
{ 1130, PT_GC, ucp_S },
|
||||
{ 1132, PT_SC, ucp_Samaritan },
|
||||
{ 1142, PT_SC, ucp_Saurashtra },
|
||||
{ 1153, PT_PC, ucp_Sc },
|
||||
{ 1156, PT_SC, ucp_Sharada },
|
||||
{ 1164, PT_SC, ucp_Shavian },
|
||||
{ 1172, PT_SC, ucp_Siddham },
|
||||
{ 1180, PT_SC, ucp_SignWriting },
|
||||
{ 1192, PT_SC, ucp_Sinhala },
|
||||
{ 1200, PT_PC, ucp_Sk },
|
||||
{ 1203, PT_PC, ucp_Sm },
|
||||
{ 1206, PT_PC, ucp_So },
|
||||
{ 1209, PT_SC, ucp_Sora_Sompeng },
|
||||
{ 1222, PT_SC, ucp_Soyombo },
|
||||
{ 1230, PT_SC, ucp_Sundanese },
|
||||
{ 1240, PT_SC, ucp_Syloti_Nagri },
|
||||
{ 1253, PT_SC, ucp_Syriac },
|
||||
{ 1260, PT_SC, ucp_Tagalog },
|
||||
{ 1268, PT_SC, ucp_Tagbanwa },
|
||||
{ 1277, PT_SC, ucp_Tai_Le },
|
||||
{ 1284, PT_SC, ucp_Tai_Tham },
|
||||
{ 1293, PT_SC, ucp_Tai_Viet },
|
||||
{ 1302, PT_SC, ucp_Takri },
|
||||
{ 1308, PT_SC, ucp_Tamil },
|
||||
{ 1314, PT_SC, ucp_Tangut },
|
||||
{ 1321, PT_SC, ucp_Telugu },
|
||||
{ 1328, PT_SC, ucp_Thaana },
|
||||
{ 1335, PT_SC, ucp_Thai },
|
||||
{ 1340, PT_SC, ucp_Tibetan },
|
||||
{ 1348, PT_SC, ucp_Tifinagh },
|
||||
{ 1357, PT_SC, ucp_Tirhuta },
|
||||
{ 1365, PT_SC, ucp_Ugaritic },
|
||||
{ 1374, PT_SC, ucp_Vai },
|
||||
{ 1378, PT_SC, ucp_Warang_Citi },
|
||||
{ 1390, PT_ALNUM, 0 },
|
||||
{ 1394, PT_PXSPACE, 0 },
|
||||
{ 1398, PT_SPACE, 0 },
|
||||
{ 1402, PT_UCNC, 0 },
|
||||
{ 1406, PT_WORD, 0 },
|
||||
{ 1410, PT_SC, ucp_Yi },
|
||||
{ 1413, PT_GC, ucp_Z },
|
||||
{ 1415, PT_SC, ucp_Zanabazar_Square },
|
||||
{ 1432, PT_PC, ucp_Zl },
|
||||
{ 1435, PT_PC, ucp_Zp },
|
||||
{ 1438, PT_PC, ucp_Zs }
|
||||
};
|
||||
|
||||
const size_t PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
|
||||
|
|
5541
src/pcre2_ucd.c
File diff suppressed because it is too large
|
@ -100,9 +100,7 @@ enum {
|
|||
ucp_Zs /* Space separator */
|
||||
};
|
||||
|
||||
/* These are grapheme break properties. Note that the code for processing them
|
||||
assumes that the values are less than 16. If more values are added that take
|
||||
the number to 16 or more, the code will have to be rewritten. */
|
||||
/* These are grapheme break properties. */
|
||||
|
||||
enum {
|
||||
ucp_gbCR, /* 0 */
|
||||
|
@ -117,7 +115,12 @@ enum {
|
|||
ucp_gbLV, /* 9 Hangul syllable type LV */
|
||||
ucp_gbLVT, /* 10 Hangul syllable type LVT */
|
||||
ucp_gbRegionalIndicator, /* 11 */
|
||||
ucp_gbOther /* 12 */
|
||||
ucp_gbOther, /* 12 */
|
||||
ucp_gbE_Base, /* 13 */
|
||||
ucp_gbE_Modifier, /* 14 */
|
||||
ucp_gbE_Base_GAZ, /* 15 */
|
||||
ucp_gbZWJ, /* 16 */
|
||||
ucp_gbGlue_After_Zwj /* 17 */
|
||||
};
|
||||
|
||||
/* These are the script identifications. */
|
||||
|
@ -184,13 +187,13 @@ enum {
|
|||
ucp_Tifinagh,
|
||||
ucp_Ugaritic,
|
||||
ucp_Yi,
|
||||
/* New for Unicode 5.0: */
|
||||
/* New for Unicode 5.0 */
|
||||
ucp_Balinese,
|
||||
ucp_Cuneiform,
|
||||
ucp_Nko,
|
||||
ucp_Phags_Pa,
|
||||
ucp_Phoenician,
|
||||
/* New for Unicode 5.1: */
|
||||
/* New for Unicode 5.1 */
|
||||
ucp_Carian,
|
||||
ucp_Cham,
|
||||
ucp_Kayah_Li,
|
||||
|
@ -202,7 +205,7 @@ enum {
|
|||
ucp_Saurashtra,
|
||||
ucp_Sundanese,
|
||||
ucp_Vai,
|
||||
/* New for Unicode 5.2: */
|
||||
/* New for Unicode 5.2 */
|
||||
ucp_Avestan,
|
||||
ucp_Bamum,
|
||||
ucp_Egyptian_Hieroglyphs,
|
||||
|
@ -218,11 +221,11 @@ enum {
|
|||
ucp_Samaritan,
|
||||
ucp_Tai_Tham,
|
||||
ucp_Tai_Viet,
|
||||
/* New for Unicode 6.0.0: */
|
||||
/* New for Unicode 6.0.0 */
|
||||
ucp_Batak,
|
||||
ucp_Brahmi,
|
||||
ucp_Mandaic,
|
||||
/* New for Unicode 6.1.0: */
|
||||
/* New for Unicode 6.1.0 */
|
||||
ucp_Chakma,
|
||||
ucp_Meroitic_Cursive,
|
||||
ucp_Meroitic_Hieroglyphs,
|
||||
|
@ -230,7 +233,7 @@ enum {
|
|||
ucp_Sharada,
|
||||
ucp_Sora_Sompeng,
|
||||
ucp_Takri,
|
||||
/* New for Unicode 7.0.0: */
|
||||
/* New for Unicode 7.0.0 */
|
||||
ucp_Bassa_Vah,
|
||||
ucp_Caucasian_Albanian,
|
||||
ucp_Duployan,
|
||||
|
@ -254,13 +257,24 @@ enum {
|
|||
ucp_Siddham,
|
||||
ucp_Tirhuta,
|
||||
ucp_Warang_Citi,
|
||||
/* New for Unicode 8.0.0: */
|
||||
/* New for Unicode 8.0.0 */
|
||||
ucp_Ahom,
|
||||
ucp_Anatolian_Hieroglyphs,
|
||||
ucp_Hatran,
|
||||
ucp_Multani,
|
||||
ucp_Old_Hungarian,
|
||||
ucp_SignWriting
|
||||
ucp_SignWriting,
|
||||
/* New for Unicode 10.0.0 (no update since 8.0.0) */
|
||||
ucp_Adlam,
|
||||
ucp_Bhaiksuki,
|
||||
ucp_Marchen,
|
||||
ucp_Newa,
|
||||
ucp_Osage,
|
||||
ucp_Tangut,
|
||||
ucp_Masaram_Gondi,
|
||||
ucp_Nushu,
|
||||
ucp_Soyombo,
|
||||
ucp_Zanabazar_Square
|
||||
};
|
||||
|
||||
#endif /* PCRE2_UCP_H_IDEMPOTENT_GUARD */
|
||||
|
|
201
src/pcre2test.c
|
@ -473,6 +473,12 @@ so many of them that they are split into two fields. */
|
|||
#define CTL_UTF8_INPUT 0x40000000u
|
||||
#define CTL_ZERO_TERMINATE 0x80000000u
|
||||
|
||||
/* Combinations */
|
||||
|
||||
#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */
|
||||
#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
|
||||
#define CTL_ANYGLOB (CTL_ALTGLOBAL|CTL_GLOBAL)
|
||||
|
||||
/* Second control word */
|
||||
|
||||
#define CTL2_SUBSTITUTE_EXTENDED 0x00000001u
|
||||
|
@ -480,15 +486,10 @@ so many of them that they are split into two fields. */
|
|||
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000004u
|
||||
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
|
||||
#define CTL2_SUBJECT_LITERAL 0x00000010u
|
||||
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
|
||||
|
||||
#define CTL_NL_SET 0x40000000u /* Informational */
|
||||
#define CTL_BSR_SET 0x80000000u /* Informational */
|
||||
|
||||
/* Combinations */
|
||||
|
||||
#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */
|
||||
#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
|
||||
#define CTL_ANYGLOB (CTL_ALTGLOBAL|CTL_GLOBAL)
|
||||
#define CTL2_NL_SET 0x40000000u /* Informational */
|
||||
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
||||
|
||||
/* These are the matching controls that may be set either on a pattern or on a
|
||||
data line. They are copied from the pattern controls as initial settings for
|
||||
|
@ -601,6 +602,7 @@ static modstruct modlist[] = {
|
|||
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
|
||||
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
|
||||
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
|
||||
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
|
||||
{ "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
|
||||
{ "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
|
||||
{ "convert", MOD_PAT, MOD_CON, 0, PO(convert_type) },
|
||||
|
@ -723,7 +725,7 @@ static modstruct modlist[] = {
|
|||
CTL_JITVERIFY|CTL_MEMORY|CTL_FRAMESIZE|CTL_PUSH|CTL_PUSHCOPY| \
|
||||
CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)
|
||||
|
||||
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL_BSR_SET|CTL_NL_SET)
|
||||
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_NL_SET)
|
||||
|
||||
/* Controls that apply only at compile time with 'push'. */
|
||||
|
||||
|
@ -3688,8 +3690,8 @@ for (;;)
|
|||
#else
|
||||
*((uint16_t *)field) = PCRE2_BSR_UNICODE;
|
||||
#endif
|
||||
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_BSR_SET;
|
||||
else dctl->control2 &= ~CTL_BSR_SET;
|
||||
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_BSR_SET;
|
||||
else dctl->control2 &= ~CTL2_BSR_SET;
|
||||
}
|
||||
else
|
||||
{
|
||||
|
@ -3698,8 +3700,8 @@ for (;;)
|
|||
else if (len == 7 && strncmpic(pp, (const uint8_t *)"unicode", 7) == 0)
|
||||
*((uint16_t *)field) = PCRE2_BSR_UNICODE;
|
||||
else goto INVALID_VALUE;
|
||||
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_BSR_SET;
|
||||
else dctl->control2 |= CTL_BSR_SET;
|
||||
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_BSR_SET;
|
||||
else dctl->control2 |= CTL2_BSR_SET;
|
||||
}
|
||||
pp = ep;
|
||||
break;
|
||||
|
@ -3792,14 +3794,14 @@ for (;;)
|
|||
if (i == 0)
|
||||
{
|
||||
*((uint16_t *)field) = NEWLINE_DEFAULT;
|
||||
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_NL_SET;
|
||||
else dctl->control2 &= ~CTL_NL_SET;
|
||||
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_NL_SET;
|
||||
else dctl->control2 &= ~CTL2_NL_SET;
|
||||
}
|
||||
else
|
||||
{
|
||||
*((uint16_t *)field) = i;
|
||||
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_NL_SET;
|
||||
else dctl->control2 |= CTL_NL_SET;
|
||||
if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_NL_SET;
|
||||
else dctl->control2 |= CTL2_NL_SET;
|
||||
}
|
||||
pp = ep;
|
||||
break;
|
||||
|
@ -3971,7 +3973,7 @@ Returns: nothing
|
|||
static void
|
||||
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
||||
{
|
||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||
before,
|
||||
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
||||
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
||||
|
@ -3979,10 +3981,11 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
|
|||
((controls & CTL_ALLUSEDTEXT) != 0)? " allusedtext" : "",
|
||||
((controls & CTL_ALTGLOBAL) != 0)? " altglobal" : "",
|
||||
((controls & CTL_BINCODE) != 0)? " bincode" : "",
|
||||
((controls2 & CTL_BSR_SET) != 0)? " bsr" : "",
|
||||
((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
|
||||
((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
|
||||
((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
|
||||
((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
|
||||
((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",
|
||||
((controls & CTL_DFA) != 0)? " dfa" : "",
|
||||
((controls & CTL_EXPAND) != 0)? " expand" : "",
|
||||
((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "",
|
||||
|
@ -3996,7 +3999,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
|
|||
((controls & CTL_JITVERIFY) != 0)? " jitverify" : "",
|
||||
((controls & CTL_MARK) != 0)? " mark" : "",
|
||||
((controls & CTL_MEMORY) != 0)? " memory" : "",
|
||||
((controls2 & CTL_NL_SET) != 0)? " newline" : "",
|
||||
((controls2 & CTL2_NL_SET) != 0)? " newline" : "",
|
||||
((controls & CTL_NULLCONTEXT) != 0)? " null_context" : "",
|
||||
((controls & CTL_POSIX) != 0)? " posix" : "",
|
||||
((controls & CTL_POSIX_NOSUB) != 0)? " posix_nosub" : "",
|
||||
|
@ -4435,7 +4438,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
|
|||
|
||||
if (jchanged) fprintf(outfile, "Duplicate name status changes\n");
|
||||
|
||||
if ((pat_patctl.control2 & CTL_BSR_SET) != 0 ||
|
||||
if ((pat_patctl.control2 & CTL2_BSR_SET) != 0 ||
|
||||
(FLD(compiled_code, flags) & PCRE2_BSR_SET) != 0)
|
||||
fprintf(outfile, "\\R matches %s\n", (bsr_convention == PCRE2_BSR_UNICODE)?
|
||||
"any Unicode newline" : "CR, LF, or CRLF");
|
||||
|
@ -5268,7 +5271,7 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
|
|||
if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0) cflags |= REG_NOSUB;
|
||||
if ((pat_patctl.options & PCRE2_UCP) != 0) cflags |= REG_UCP;
|
||||
if ((pat_patctl.options & PCRE2_CASELESS) != 0) cflags |= REG_ICASE;
|
||||
if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC;
|
||||
if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC;
|
||||
if ((pat_patctl.options & PCRE2_MULTILINE) != 0) cflags |= REG_NEWLINE;
|
||||
if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
|
||||
if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;
|
||||
|
@ -5276,8 +5279,8 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
|
|||
if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0)
|
||||
{
|
||||
preg.re_endp = (char *)pbuffer8 + patlen;
|
||||
cflags |= REG_PEND;
|
||||
}
|
||||
cflags |= REG_PEND;
|
||||
}
|
||||
|
||||
rc = regcomp(&preg, (char *)pbuffer8, cflags);
|
||||
|
||||
|
@ -5530,7 +5533,7 @@ if (test_mode == PCRE32_MODE && pbuffer32 != NULL)
|
|||
appropriate default newline setting, local_newline_default will be non-zero. We
|
||||
use this if there is no explicit newline modifier. */
|
||||
|
||||
if ((pat_patctl.control2 & CTL_NL_SET) == 0 && local_newline_default != 0)
|
||||
if ((pat_patctl.control2 & CTL2_NL_SET) == 0 && local_newline_default != 0)
|
||||
{
|
||||
SETFLD(pat_context, newline_convention, local_newline_default);
|
||||
}
|
||||
|
@ -5540,11 +5543,11 @@ NULL context. */
|
|||
|
||||
use_pat_context = ((pat_patctl.control & CTL_NULLCONTEXT) != 0)?
|
||||
NULL : PTR(pat_context);
|
||||
|
||||
|
||||
/* If PCRE2_LITERAL is set, set use_forbid_utf zero because PCRE2_NEVER_UTF
|
||||
and PCRE2_NEVER_UCP are invalid with it. */
|
||||
|
||||
if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0;
|
||||
if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0;
|
||||
|
||||
/* Compile many times when timing. */
|
||||
|
||||
|
@ -5556,7 +5559,7 @@ if (timeit > 0)
|
|||
{
|
||||
clock_t start_time = clock();
|
||||
PCRE2_COMPILE(compiled_code, pbuffer, patlen,
|
||||
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset,
|
||||
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset,
|
||||
use_pat_context);
|
||||
time_taken += clock() - start_time;
|
||||
if (TEST(compiled_code, !=, NULL))
|
||||
|
@ -5665,7 +5668,7 @@ if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
|
|||
/* If an explicit newline modifier was given, set the information flag in the
|
||||
pattern so that it is preserved over push/pop. */
|
||||
|
||||
if ((pat_patctl.control2 & CTL_NL_SET) != 0)
|
||||
if ((pat_patctl.control2 & CTL2_NL_SET) != 0)
|
||||
{
|
||||
SETFLD(compiled_code, flags, FLD(compiled_code, flags) | PCRE2_NL_SET);
|
||||
}
|
||||
|
@ -5822,11 +5825,11 @@ return capcount;
|
|||
*************************************************/
|
||||
|
||||
/* Called from a PCRE2 library as a result of the (?C) item. We print out where
|
||||
we are in the match. Yield zero unless more callouts than the fail count, or
|
||||
the callout data is not zero. The only differences in the callout block for
|
||||
different code unit widths are that the pointers to the subject, the most
|
||||
recent MARK, and a callout argument string point to strings of the appropriate
|
||||
width. Casts can be used to deal with this.
|
||||
we are in the match (unless suppressed). Yield zero unless more callouts than
|
||||
the fail count, or the callout data is not zero. The only differences in the
|
||||
callout block for different code unit widths are that the pointers to the
|
||||
subject, the most recent MARK, and a callout argument string point to strings
|
||||
of the appropriate width. Casts can be used to deal with this.
|
||||
|
||||
Argument: a pointer to a callout block
|
||||
Return:
|
||||
|
@ -5839,6 +5842,7 @@ uint32_t i, pre_start, post_start, subject_length;
|
|||
PCRE2_SIZE current_position;
|
||||
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
|
||||
BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
|
||||
BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;
|
||||
|
||||
/* This FILE is used for echoing the subject. This is done only once in simple
|
||||
cases. */
|
||||
|
@ -5887,75 +5891,82 @@ if (callout_capture)
|
|||
}
|
||||
}
|
||||
|
||||
/* Re-print the subject in canonical form (with escapes for non-printing
|
||||
characters), the first time, or if giving full details. On subsequent calls in
|
||||
the same match, we use PCHARS() just to find the printed lengths of the
|
||||
substrings. */
|
||||
/* Unless suppressed, re-print the subject in canonical form (with escapes for
|
||||
non-printing characters), the first time, or if giving full details. On
|
||||
subsequent calls in the same match, we use PCHARS() just to find the printed
|
||||
lengths of the substrings. */
|
||||
|
||||
if (f != NULL) fprintf(f, "--->");
|
||||
|
||||
/* The subject before the match start. */
|
||||
|
||||
PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f);
|
||||
|
||||
/* If a lookbehind is involved, the current position may be earlier than the
|
||||
match start. If so, use the match start instead. */
|
||||
|
||||
current_position = (cb->current_position >= cb->start_match)?
|
||||
cb->current_position : cb->start_match;
|
||||
|
||||
/* The subject between the match start and the current position. */
|
||||
|
||||
PCHARS(post_start, cb->subject, cb->start_match,
|
||||
current_position - cb->start_match, utf, f);
|
||||
|
||||
/* Print from the current position to the end. */
|
||||
|
||||
PCHARSV(cb->subject, current_position, cb->subject_length - current_position,
|
||||
utf, f);
|
||||
|
||||
/* Calculate the total subject printed length (no print). */
|
||||
|
||||
PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL);
|
||||
|
||||
if (f != NULL) fprintf(f, "\n");
|
||||
|
||||
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
|
||||
callout whose number has not already been shown with captured strings, show the
|
||||
number here. A callout with a string argument has been displayed above. */
|
||||
|
||||
if (cb->callout_number == 255)
|
||||
if (callout_where)
|
||||
{
|
||||
fprintf(outfile, "%+3d ", (int)cb->pattern_position);
|
||||
if (cb->pattern_position > 99) fprintf(outfile, "\n ");
|
||||
}
|
||||
else
|
||||
{
|
||||
if (callout_capture || cb->callout_string != NULL) fprintf(outfile, " ");
|
||||
else fprintf(outfile, "%3d ", cb->callout_number);
|
||||
}
|
||||
|
||||
/* Now show position indicators */
|
||||
if (f != NULL) fprintf(f, "--->");
|
||||
|
||||
for (i = 0; i < pre_start; i++) fprintf(outfile, " ");
|
||||
fprintf(outfile, "^");
|
||||
/* The subject before the match start. */
|
||||
|
||||
if (post_start > 0)
|
||||
{
|
||||
for (i = 0; i < post_start - 1; i++) fprintf(outfile, " ");
|
||||
PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f);
|
||||
|
||||
/* If a lookbehind is involved, the current position may be earlier than the
|
||||
match start. If so, use the match start instead. */
|
||||
|
||||
current_position = (cb->current_position >= cb->start_match)?
|
||||
cb->current_position : cb->start_match;
|
||||
|
||||
/* The subject between the match start and the current position. */
|
||||
|
||||
PCHARS(post_start, cb->subject, cb->start_match,
|
||||
current_position - cb->start_match, utf, f);
|
||||
|
||||
/* Print from the current position to the end. */
|
||||
|
||||
PCHARSV(cb->subject, current_position, cb->subject_length - current_position,
|
||||
utf, f);
|
||||
|
||||
/* Calculate the total subject printed length (no print). */
|
||||
|
||||
PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL);
|
||||
|
||||
if (f != NULL) fprintf(f, "\n");
|
||||
|
||||
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
|
||||
callout whose number has not already been shown with captured strings, show the
|
||||
number here. A callout with a string argument has been displayed above. */
|
||||
|
||||
if (cb->callout_number == 255)
|
||||
{
|
||||
fprintf(outfile, "%+3d ", (int)cb->pattern_position);
|
||||
if (cb->pattern_position > 99) fprintf(outfile, "\n ");
|
||||
}
|
||||
else
|
||||
{
|
||||
if (callout_capture || cb->callout_string != NULL) fprintf(outfile, " ");
|
||||
else fprintf(outfile, "%3d ", cb->callout_number);
|
||||
}
|
||||
|
||||
/* Now show position indicators */
|
||||
|
||||
for (i = 0; i < pre_start; i++) fprintf(outfile, " ");
|
||||
fprintf(outfile, "^");
|
||||
|
||||
if (post_start > 0)
|
||||
{
|
||||
for (i = 0; i < post_start - 1; i++) fprintf(outfile, " ");
|
||||
fprintf(outfile, "^");
|
||||
}
|
||||
|
||||
for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
|
||||
fprintf(outfile, " ");
|
||||
|
||||
if (cb->next_item_length != 0)
|
||||
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
|
||||
pbuffer8 + cb->pattern_position);
|
||||
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
|
||||
for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
|
||||
fprintf(outfile, " ");
|
||||
|
||||
if (cb->next_item_length != 0)
|
||||
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
|
||||
pbuffer8 + cb->pattern_position);
|
||||
|
||||
fprintf(outfile, "\n");
|
||||
first_callout = FALSE;
|
||||
|
||||
/* Show any mark info */
|
||||
|
||||
if (cb->mark != last_callout_mark)
|
||||
{
|
||||
if (cb->mark == NULL)
|
||||
|
@ -5969,6 +5980,8 @@ if (cb->mark != last_callout_mark)
|
|||
last_callout_mark = cb->mark;
|
||||
}
|
||||
|
||||
/* Show callout data */
|
||||
|
||||
if (callout_data_ptr != NULL)
|
||||
{
|
||||
int callout_data = *((int32_t *)callout_data_ptr);
|
||||
|
@ -5979,6 +5992,8 @@ if (callout_data_ptr != NULL)
|
|||
}
|
||||
}
|
||||
|
||||
/* Keep count and give the appropriate return code */
|
||||
|
||||
callout_count++;
|
||||
|
||||
if (cb->callout_number == dat_datctl.cerror[0] &&
|
||||
|
|
|
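The hunk above makes the "where" part of the callout display conditional on a control bit, and it clamps the current position to the match start when a lookbehind has moved it earlier. The standalone sketch below illustrates only those two ideas. It is not pcre2test code: `DEMO_CALLOUT_NO_WHERE`, `demo_clamp_position` and `demo_indicator` are hypothetical names invented for this illustration, the widths are passed in directly rather than computed by PCHARS(), and the leading callout number or pattern offset that the real code prints before the carets is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CTL2_CALLOUT_NO_WHERE bit in control2. */
#define DEMO_CALLOUT_NO_WHERE 0x1u

/* If a lookbehind is involved, the current position may be earlier than the
match start; in that case the match start is used instead. */

static size_t demo_clamp_position(size_t current_position, size_t start_match)
{
return (current_position >= start_match)? current_position : start_match;
}

/* Build the caret line into buf: pre_start spaces, a caret at the match
start and, if post_start is non-zero, a second caret at the current position.
Nothing is written when the "no where" bit is set. The result is truncated if
buf is too small. Returns the length of the string left in buf. */

static int demo_indicator(char *buf, size_t size, unsigned control2,
  unsigned pre_start, unsigned post_start)
{
size_t n = 0;
unsigned i;
int callout_where = (control2 & DEMO_CALLOUT_NO_WHERE) == 0;

assert(buf != NULL && size > 0);
buf[0] = '\0';
if (!callout_where) return 0;

for (i = 0; i < pre_start && n + 1 < size; i++) buf[n++] = ' ';
if (n + 1 < size) buf[n++] = '^';

if (post_start > 0)
  {
  for (i = 0; i < post_start - 1 && n + 1 < size; i++) buf[n++] = ' ';
  if (n + 1 < size) buf[n++] = '^';
  }

buf[n] = '\0';
return (int)strlen(buf);
}
```

With no control bits set, a `pre_start` of 2 and a `post_start` of 3 give two spaces, a caret, two more spaces and a second caret. With the hypothetical `DEMO_CALLOUT_NO_WHERE` bit set the buffer is left empty, which mirrors how the new modifier leaves only the capture information in the test output.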
@ -6,14 +6,16 @@
|
|||
#newline_default lf any anycrlf
|
||||
|
||||
# PCRE2 and Perl disagree about the characteristics of certain Unicode
|
||||
# characters. For example, 061C is considered by Perl to be Arabic, though
|
||||
# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are
|
||||
# graphic and printable according to Perl, though they are actually "isolate"
|
||||
# control characters. That is why the following tests are here rather than in
|
||||
# test 4.
|
||||
# characters. For example, 061C was considered by Perl to be Arabic, though
|
||||
# it was not listed as such in the Unicode Scripts.txt file for Unicode 8.
|
||||
# However, it *is* in that file for Unicode 10, but when I came to re-check,
|
||||
# Perl had changed in the meantime, with 5.026 not recognizing it as Arabic.
|
||||
|
||||
# 2066-2069 are graphic and printable according to Perl, though they are
|
||||
# actually "isolate" control characters. That is why the following tests are
|
||||
# here rather than in test 4.
|
||||
|
||||
/^[\p{Arabic}]/utf
|
||||
\= Expect no match
|
||||
\x{061c}
|
||||
|
||||
/^[[:graph:]]+$/utf,ucp
|
||||
|
@ -2022,5 +2024,21 @@
|
|||
|
||||
/Aሴ+B/literal,utf,no_utf_check
|
||||
Aሴ+B
|
||||
|
||||
# These are here because I upgraded to Unicode 10.0.0 before Perl did, so it
|
||||
# doesn't recognize all these scripts. In time these three tests can be moved
|
||||
# to test 4.
|
||||
|
||||
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
|
||||
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
|
||||
(\p{Zanabazar_Square}+)/x,utf
|
||||
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}
|
||||
|
||||
/^\x{1E900}\x{104B0}/i,utf
|
||||
\x{1E900}\x{104B0}
|
||||
\x{1E922}\x{104D8}
|
||||
|
||||
/^(?:(\X)(?C))+$/utf
|
||||
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
|
||||
|
||||
# End of testinput5
|
||||
|
|
|
@ -6,16 +6,18 @@
|
|||
#newline_default lf any anycrlf
|
||||
|
||||
# PCRE2 and Perl disagree about the characteristics of certain Unicode
|
||||
# characters. For example, 061C is considered by Perl to be Arabic, though
|
||||
# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are
|
||||
# graphic and printable according to Perl, though they are actually "isolate"
|
||||
# control characters. That is why the following tests are here rather than in
|
||||
# test 4.
|
||||
# characters. For example, 061C was considered by Perl to be Arabic, though
|
||||
# it was not listed as such in the Unicode Scripts.txt file for Unicode 8.
|
||||
# However, it *is* in that file for Unicode 10, but when I came to re-check,
|
||||
# Perl had changed in the meantime, with 5.026 not recognizing it as Arabic.
|
||||
|
||||
# 2066-2069 are graphic and printable according to Perl, though they are
|
||||
# actually "isolate" control characters. That is why the following tests are
|
||||
# here rather than in test 4.
|
||||
|
||||
/^[\p{Arabic}]/utf
|
||||
\= Expect no match
|
||||
\x{061c}
|
||||
No match
|
||||
0: \x{61c}
|
||||
|
||||
/^[[:graph:]]+$/utf,ucp
|
||||
\= Expect no match
|
||||
|
@ -4585,5 +4587,84 @@ No match
|
|||
/Aሴ+B/literal,utf,no_utf_check
|
||||
Aሴ+B
|
||||
0: A\x{1234}+B
|
||||
|
||||
# These are here because I upgraded to Unicode 10.0.0 before Perl did, so it
|
||||
# doesn't recognize all these scripts. In time these three tests can be moved
|
||||
# to test 4.
|
||||
|
||||
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
|
||||
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
|
||||
(\p{Zanabazar_Square}+)/x,utf
|
||||
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}
|
||||
0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
|
||||
1: \x{1e900}\x{1e924}\x{1e953}
|
||||
2: \x{11c00}\x{11c2d}\x{11c3e}
|
||||
3: \x{11c70}\x{11c77}\x{11cab}
|
||||
4: \x{11400}\x{1142f}\x{11455}
|
||||
5: \x{104b0}\x{104d8}\x{104fb}
|
||||
6: \x{16fe0}\x{18800}\x{18af2}
|
||||
7: \x{11d00}\x{11d3a}\x{11d59}
|
||||
8: \x{16fe1}\x{1b170}\x{1b2fb}
|
||||
9: \x{11a50}\x{11a58}\x{11aa2}
|
||||
10: \x{11a00}\x{11a07}\x{11a47}
|
||||
|
||||
/^\x{1E900}\x{104B0}/i,utf
|
||||
\x{1E900}\x{104B0}
|
||||
0: \x{1e900}\x{104b0}
|
||||
\x{1E922}\x{104D8}
|
||||
0: \x{1e922}\x{104d8}
|
||||
|
||||
/^(?:(\X)(?C))+$/utf
|
||||
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
|
||||
Callout 0: last capture = 1
|
||||
1: \x{1e900}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{1e924}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{1e953}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11c00}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11c2d}\x{11c3e}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11c70}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11c77}\x{11cab}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11400}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{1142f}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11455}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{104b0}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{104d8}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{104fb}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{16fe0}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{18800}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{18af2}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11d00}\x{11d3a}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11d59}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{16fe1}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{1b170}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{1b2fb}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11a50}\x{11a58}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11aa2}
|
||||
Callout 0: last capture = 1
|
||||
1: \x{11a00}\x{11a07}\x{11a47}
|
||||
0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
|
||||
1: \x{11a00}\x{11a07}\x{11a47}
|
||||
|
||||
# End of testinput5
|
||||
|
|