Revised script handling (see ChangeLog)

Philip Hazel 2021-12-21 15:39:46 +00:00
parent 92d7cf1dd0
commit b29732063b
24 changed files with 1507 additions and 1064 deletions

@ -45,6 +45,19 @@ of applications treat NULL/0 in this way.
16. Very minor code speed up for maximizing character property matches.
17. A number of changes to script matching for \p and \P:
(a) Script extensions for a character are now coded as a bitmap instead of
a list of script numbers, which should be faster and does not need a
loop.
(b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms
sc and scx).
(c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being
the same as \p{scx:scriptname} because this change happened in Perl at
release 5.26.
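
The difference between the two script properties in item 17 can be sketched with a toy model. This is not PCRE2 code: the script numbers, the `toy_char` record, and both functions are hypothetical, and a single 32-bit word stands in for the real multi-word bitmap. The example character mirrors U+1CD0, whose Script is Inherited but whose Script Extensions include Bengali and Devanagari.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical script numbers, for illustration only; the real values are
   the generated ucp_xxx constants. */
enum { SC_INHERITED = 1, SC_BENGALI = 2, SC_DEVANAGARI = 3, SC_LATIN = 4 };

/* A toy character record: one Script value plus a small bitmap of Script
   Extensions (bit n set means script number n is listed). */
typedef struct {
  unsigned script;
  uint32_t extensions;
} toy_char;

/* \p{sc:xxx}: only the Script property is consulted. */
bool toy_match_sc(const toy_char *ch, unsigned wanted)
{
  return ch->script == wanted;
}

/* \p{scx:xxx}, and after this change plain \p{xxx}: either the Script
   property or any listed Script Extension is accepted. */
bool toy_match_scx(const toy_char *ch, unsigned wanted)
{
  return ch->script == wanted || (ch->extensions & (1u << wanted)) != 0;
}

/* A mark whose Script is Inherited but which is used with Bengali and
   Devanagari. */
const toy_char toy_mark =
  { SC_INHERITED, (1u << SC_BENGALI) | (1u << SC_DEVANAGARI) };
```

With this model, `toy_match_sc(&toy_mark, SC_BENGALI)` is false while `toy_match_scx(&toy_mark, SC_BENGALI)` is true, which is the behaviour change that a bare `\p{Bengali}` sees.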
Version 10.39 29-October-2021
-----------------------------

@ -32,6 +32,7 @@
# Added support for bidi class and bidi control, 06-December-2021
# This also involved lower casing strings and removing underscores, in
# accordance with Unicode's "loose matching" rules, which Perl observes.
# Changed default script type from PT_SC to PT_SCX, 18-December-2021
script_names = ['Unknown', 'Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \
'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \
@ -104,7 +105,7 @@ std_bidiclass_names = stdnames(bidiclass_names)
# names. We keep both the standardized name and the original, because the
# latter is used for the ucp_xx names.
utt_table = list(zip(std_script_names, script_names, ['PT_SC'] * len(script_names)))
utt_table = list(zip(std_script_names, script_names, ['PT_SCX'] * len(script_names)))
utt_table += list(zip(std_category_names, category_names, ['PT_PC'] * len(category_names)))
utt_table += list(zip(std_general_category_names, general_category_names, ['PT_GC'] * len(general_category_names)))
utt_table += list(zip(std_bidiclass_names, bidiclass_names, ['PT_BIDICL'] * len(bidiclass_names)))

@ -100,6 +100,8 @@
# PCRE2-10.39: Updated for Unicode 14.0.0
# 05-December-2021: Added code to scan DerivedBidiClass.txt for bidi class,
# and also PropList.txt for the Bidi_Control property
# 19-December-2021: Reworked script extensions lists to be bit maps instead
# of zero-terminated lists of script numbers.
# ----------------------------------------------------------------------------
#
#
@ -128,11 +130,12 @@
# in script runs all come from the same set. The first element in the vector
# contains the number of subsequent elements, which are in ascending order.
#
# The ucd_script_sets vector contains lists of script numbers that are the
# Script Extensions properties of certain characters. Each list is terminated
# by zero (ucp_Unknown). A character with more than one script listed for its
# Script Extension property has a negative value in its record. This is the
# negated offset to the start of the relevant list in the ucd_script_sets
# The ucd_script_sets vector contains bitmaps that represent lists of scripts
# for the Script Extensions properties of certain characters. Each bitmap
# consists of a fixed number of unsigned 32-bit numbers, enough to allocate
# a bit for every known script. A character with more than one script listed
# for its Script Extension property has a negative value in its record. This is
# the negated offset to the start of the relevant bitmap in the ucd_script_sets
# vector.
#
# The ucd_records table contains one instance of every unique record that is
@ -186,15 +189,15 @@
# 3 = ucp_gbExtend => Grapheme break property "Extend"
# 0 => Not part of a caseless set
# 0 => No other case
# -122 => Script Extension list offset = 122
# 19 = ucp_bidiNSM => Bidi class non-spacing mark
# -228 => Script Extension list offset = 228
# 13 = ucp_bidiNSM => Bidi class non-spacing mark
# 0 => Dummy value, unused at present
#
# At offset 101 in the ucd_script_sets vector we find the list 3, 15, 107, 29,
# and terminator 0. This means that this character is expected to be used with
# any of those scripts, which are Bengali, Devanagari, Grantha, and Kannada.
# At offset 228 in the ucd_script_sets vector we find a bitmap with bits 3, 15,
# 29, and 107 set. This means that this character is expected to be used with
# any of those scripts, which are Bengali, Devanagari, Kannada, and Grantha.
#
# Philip Hazel, last updated 05 December 2021.
# Philip Hazel, last updated 19 December 2021.
##############################################################################
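
The bitmap in the header comment's example (bits 3, 15, 29, and 107) can be worked through in a few lines of C. This is an illustrative sketch only: `DEMO_SCRIPT_COUNT` is an assumed value chosen for the example, and the function names are hypothetical. The generator derives the real word count from the length of its `script_names` list.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative script count only (an assumption for this sketch). */
#define DEMO_SCRIPT_COUNT 164
#define DEMO_MAP_WORDS (DEMO_SCRIPT_COUNT/32 + 1)

/* Set the bit for one script number in a bitmap of 32-bit words. */
void demo_set_script(uint32_t *map, unsigned script)
{
  map[script/32] |= 1u << (script%32);
}

/* Build the bitmap from the header comment's example. "map" must hold
   DEMO_MAP_WORDS words. */
void demo_build_example(uint32_t *map)
{
  static const unsigned scripts[] = { 3, 15, 29, 107 };
  for (int i = 0; i < DEMO_MAP_WORDS; i++) map[i] = 0;
  for (size_t i = 0; i < sizeof(scripts)/sizeof(scripts[0]); i++)
    demo_set_script(map, scripts[i]);
}
```

Scripts 3, 15, and 29 all land in word 0, giving 0x20008008, while script 107 is bit 11 of word 3, giving 0x00000800. The remaining words stay zero.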
@ -507,7 +510,7 @@ break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend',
# BIDI class property names in the DerivedBidiClass.txt file
bidiclass_names = ['AL', 'AN', 'B', 'BN', 'CS', 'EN', 'ES', 'ET', 'FSI', 'L',
bidiclass_names = ['AL', 'AN', 'B', 'BN', 'CS', 'EN', 'ES', 'ET', 'FSI', 'L',
'LRE', 'LRI', 'LRO', 'NSM', 'ON', 'PDF', 'PDI', 'R', 'RLE', 'RLI', 'RLO',
'S', 'WS' ]
@ -574,7 +577,7 @@ file.close()
# file, setting 'Unknown' as the default (this will never be a Script Extension
# value), then scan it and fill in the default from Scripts. Code added by PH
# in October 2018. Positive values are used for just a single script for a
# code point. Negative values are negated offsets in a list of lists of
# code point. Negative values are negated offsets in a list of bitsets of
# multiple scripts. Initialize this list with a single entry, as the zeroth
# element is never used.
@ -582,9 +585,22 @@ script_lists = [0]
script_abbrevs_default = script_abbrevs.index('Zzzz')
scriptx = read_table('Unicode.tables/ScriptExtensions.txt', get_script_extension, script_abbrevs_default)
# Scan all characters and set their default script extension to the main
# script. We also have to adjust negative scriptx values, following a change in
# the way these work. They are currently negated offsets into the script_lists
# list, but have to be changed into indices in the new ucd_script_sets vector,
# which has fixed-size entries. We can compute the new offset by counting the
# zeros that precede the current offset.
for i in range(0, MAX_UNICODE):
  if scriptx[i] == script_abbrevs_default:
    scriptx[i] = script[i]
  elif scriptx[i] < 0:
    count = 1
    for j in range(-scriptx[i], 0, -1):
      if script_lists[j] == 0:
        count += 1
    scriptx[i] = -count * (int(len(script_names)/32) + 1)
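
The offset conversion in the loop above can be restated in C as a sketch. The function name and parameters are hypothetical. It assumes the same layout as `script_lists`: element 0 is an unused zero and every list ends with a zero terminator. The new offset is the bitmap's ordinal number multiplied by the fixed number of words per bitmap.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an old offset (the index of the first entry of a zero-terminated
   list in old_lists) into an offset into a vector of fixed-size bitmaps,
   each words_per_map words long. Counting starts at 1 because the unused
   zeroth element of old_lists produces an empty bitmap of its own. */
size_t demo_new_offset(const uint8_t *old_lists, size_t old_offset,
  size_t words_per_map)
{
  size_t map_number = 1;
  for (size_t j = old_offset; j > 0; j--)
    if (old_lists[j] == 0) map_number++;
  return map_number * words_per_map;
}
```

For the lists `{0, 3, 15, 0, 7, 8, 9, 0}` and six words per bitmap, the list at old offset 1 moves to offset 6 and the list at old offset 4 moves to offset 12.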
# With the addition of the Script Extensions field, we needed some padding to
# get the Unicode records up to 12 bytes (multiple of 4). Originally this was a
@ -803,18 +819,30 @@ for d in digitsets:
count += 1
print("\n};\n")
print("/* This vector is a list of lists of scripts for the Script Extension")
print("property. Each sublist is zero-terminated. */\n")
print("const uint8_t PRIV(ucd_script_sets)[] = {")
print("/* This vector is a list of script bitsets for the Script Extension")
print("property. */\n")
print("const uint32_t PRIV(ucd_script_sets)[] = {")
bitword_count = len(script_names)/32 + 1
bitwords = [0] * int(bitword_count)
count = 0
print(" /* 0 */", end='')
for d in script_lists:
print(" %3d," % d, end='')
count += 1
if d == 0:
print("\n /* %3d */" % count, end='')
print("\n};\n")
s = " "
print(" ", end='')
for x in bitwords:
print("%s" % s, end='')
s = ", "
print("0x%08xu" % x, end='')
print(",\n", end='')
bitwords = [0] * int(bitword_count)
else:
x = int(d/32)
y = int(d%32)
bitwords[x] = bitwords[x] | (1 << y)
print("};\n")
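
The new output loop turns each zero-terminated list into one fixed-size bitmap. The same transformation is sketched below in C; the function and its parameters are hypothetical, and the caller is assumed to pass script numbers smaller than `32 * words_per_map`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert zero-terminated script lists into consecutive fixed-size bitmaps.
   "lists" holds n entries; each 0 completes the current bitmap, so a leading
   0 yields an all-zero first bitmap. "maps" must have room for max_maps
   bitmaps of words_per_map words. Returns the number of bitmaps completed. */
size_t demo_lists_to_maps(const uint8_t *lists, size_t n, uint32_t *maps,
  size_t words_per_map, size_t max_maps)
{
  size_t count = 0;
  memset(maps, 0, max_maps * words_per_map * sizeof(uint32_t));
  for (size_t i = 0; i < n && count < max_maps; i++)
  {
    uint32_t *cur = maps + count * words_per_map;
    if (lists[i] == 0) count++;   /* Terminator: this bitmap is complete */
    else cur[lists[i]/32] |= 1u << (lists[i]%32);
  }
  return count;
}
```

Because every bitmap has the same size, a reader of the table no longer needs to scan for a terminator; the cost is that short lists occupy as much space as long ones.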
# Output the main UCD tables.

@ -308,7 +308,7 @@ const ucp_type_table *u;
for (i = 0; i < PRIV(utt_size); i++)
{
u = PRIV(utt) + i;
if (u->type == PT_SC && u->value == script) break;
if (u->type == PT_SCX && u->value == script) break;
}
if (i < PRIV(utt_size))
@ -461,12 +461,30 @@ if (scriptx != script)
else
{
const char *sep = "";
/*
const uint8_t *p = PRIV(ucd_script_sets) - scriptx;
while (*p != 0)
{
printf("%s%s", sep, get_scriptname(*p++));
sep = ", ";
}
*/
const uint32_t *p = PRIV(ucd_script_sets) - scriptx;
for (int i = 0; i < ucp_Script_Count; i++)
{
int x = i/32;
int y = i%32;
if ((p[x] & (1u<<y)) != 0)
{
printf("%s%s", sep, get_scriptname(i));
sep = ", ";
}
}
}
printf("]");
}
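
Walking the bitmap from script 0 upward yields the scripts in ascending script-number order, whatever order the old list used. The sketch below shows this with a hypothetical helper that collects the numbers instead of printing names; it is not part of ucptest.

```c
#include <stddef.h>
#include <stdint.h>

/* Store in "out" every script number below script_count whose bit is set in
   "map", in ascending order, and return how many were found. "out" must have
   room for script_count entries in the worst case. */
size_t demo_list_scripts(const uint32_t *map, unsigned script_count,
  unsigned *out)
{
  size_t n = 0;
  for (unsigned i = 0; i < script_count; i++)
    if ((map[i/32] & (1u << (i%32))) != 0) out[n++] = i;
  return n;
}
```

A list stored as 3, 15, 107, 29 therefore comes back as 3, 15, 29, 107, which is consistent with script names appearing in a different order in the expected findprop output.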
@ -538,7 +556,7 @@ while (*s != 0)
for (i = 0; i < PRIV(utt_size); i++)
{
const ucp_type_table *u = PRIV(utt) + i;
if (u->type == PT_SC && strcmp(CS(value + offset),
if (u->type == PT_SCX && strcmp(CS(value + offset),
PRIV(utt_names) + u->name_offset) == 0)
{
c = u->value;
@ -686,11 +704,11 @@ for (c = 0; c <= 0x10ffff; c++)
if (scriptx_count > 0)
{
const uint8_t *char_scriptx = NULL;
const uint32_t *bits_scriptx = NULL;
unsigned int found = 0;
int scriptx = UCD_SCRIPTX(c);
if (scriptx < 0) char_scriptx = PRIV(ucd_script_sets) - scriptx;
if (scriptx < 0) bits_scriptx = PRIV(ucd_script_sets) - scriptx;
for (i = 0; i < scriptx_count; i++)
{
@ -704,15 +722,9 @@ for (c = 0; c <= 0x10ffff; c++)
else
{
const uint8_t *p;
for (p = char_scriptx; *p != 0; p++)
{
if (scriptx_list[i] == *p)
{
found++;
break;
}
}
int x = scriptx_list[i]/32;
int y = scriptx_list[i]%32;
if ((bits_scriptx[x] & (1u<<y)) != 0) found++;
}
}
/* Negative requirement */
@ -724,10 +736,9 @@ for (c = 0; c <= 0x10ffff; c++)
}
else
{
const uint8_t *p;
for (p = char_scriptx; *p != 0; p++)
if (-scriptx_list[i] == *p) break;
if (*p == 0) found++;
int x = scriptx_list[i]/32;
int y = scriptx_list[i]%32;
if ((bits_scriptx[x] & (1u<<y)) == 0) found++;
}
}
}
@ -881,7 +892,7 @@ else if (strcmp(CS name, "list") == 0)
if (strcmp(CS name, "script") == 0 || strcmp(CS name, "scripts") == 0)
{
for (i = 0; i < PRIV(utt_size); i++)
if (PRIV(utt)[i].type == PT_SC)
if (PRIV(utt)[i].type == PT_SCX)
printf("%s\n", PRIV(utt_names) + PRIV(utt)[i].name_offset);
}

@ -381,11 +381,11 @@ U+10F27 R Letter: Other letter, oldsogdian, Other
U+10F30 AL Letter: Other letter, sogdian, Other
findprop a836 a833 1cf4 20f0 1cd0
U+A836 L Symbol: Other symbol, common, Other, [devanagari, dogra, gujarati, gurmukhi, khojki, kaithi, mahajani, modi, khudawadi, takri, tirhuta]
U+A833 L Number: Other number, common, Other, [devanagari, dogra, gujarati, gurmukhi, khojki, kannada, kaithi, mahajani, modi, nandinagari, khudawadi, takri, tirhuta]
U+1CF4 NSM Mark: Non-spacing mark, inherited, Extend, [devanagari, grantha, kannada]
U+20F0 NSM Mark: Non-spacing mark, inherited, Extend, [devanagari, grantha, latin]
U+1CD0 NSM Mark: Non-spacing mark, inherited, Extend, [bengali, devanagari, grantha, kannada]
U+A836 L Symbol: Other symbol, common, Other, [devanagari, gujarati, gurmukhi, kaithi, takri, khojki, khudawadi, mahajani, modi, tirhuta, dogra]
U+A833 L Number: Other number, common, Other, [devanagari, gujarati, gurmukhi, kannada, kaithi, takri, khojki, khudawadi, mahajani, modi, tirhuta, dogra, nandinagari]
U+1CF4 NSM Mark: Non-spacing mark, inherited, Extend, [devanagari, kannada, grantha]
U+20F0 NSM Mark: Non-spacing mark, inherited, Extend, [devanagari, latin, grantha]
U+1CD0 NSM Mark: Non-spacing mark, inherited, Extend, [bengali, devanagari, kannada, grantha]
findprop 32ff
U+32FF L Symbol: Other symbol, common, Other, [han]

@ -49,8 +49,8 @@ U+2010..U+2015 ON Punctuation: Dash punctuation, common, Other
U+2E3A..U+2E3B ON Punctuation: Dash punctuation, common, Other
U+2E40 ON Punctuation: Dash punctuation, common, Other
U+2E5D ON Punctuation: Dash punctuation, common, Other
U+301C ON Punctuation: Dash punctuation, common, Other, [bopomofo, hangul, han, hiragana, katakana]
U+3030 ON Punctuation: Dash punctuation, common, Extended Pictographic, [bopomofo, hangul, han, hiragana, katakana]
U+301C ON Punctuation: Dash punctuation, common, Other, [bopomofo, han, hangul, hiragana, katakana]
U+3030 ON Punctuation: Dash punctuation, common, Extended Pictographic, [bopomofo, han, hangul, hiragana, katakana]
U+30A0 ON Punctuation: Dash punctuation, common, Other, [hiragana, katakana]
U+FE31..U+FE32 ON Punctuation: Dash punctuation, common, Other
U+FE58 ON Punctuation: Dash punctuation, common, Other
@ -168,7 +168,7 @@ U+002C CS Punctuation: Other punctuation, common, Other
U+002E..U+002F CS Punctuation: Other punctuation, common, Other
U+003A CS Punctuation: Other punctuation, common, Other
U+00A0 CS Separator: Space separator, common, Other
U+060C CS Punctuation: Other punctuation, common, Other, [arabic, nko, hanifirohingya, syriac, thaana, yezidi]
U+060C CS Punctuation: Other punctuation, common, Other, [arabic, syriac, thaana, nko, hanifirohingya, yezidi]
U+202F CS Separator: Space separator, common, Other, [latin, mongolian]
U+2044 CS Symbol: Mathematical symbol, common, Other
U+FE50 CS Punctuation: Other punctuation, common, Other

@ -123,20 +123,21 @@ opcode is used to select the column. The values are as follows:
*/
static const uint8_t propposstab[PT_TABSIZE][PT_TABSIZE] = {
/* ANY LAMP GC PC SC ALNUM SPACE PXSPACE WORD CLIST UCNC BIDICL BIDICO */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_ANY */
{ 0, 3, 0, 0, 0, 3, 1, 1, 0, 0, 0, 0, 1 }, /* PT_LAMP */
{ 0, 0, 2, 4, 0, 9, 10, 10, 11, 0, 0, 0, 0 }, /* PT_GC */
{ 0, 0, 5, 2, 0, 15, 16, 16, 17, 0, 0, 0, 0 }, /* PT_PC */
{ 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_SC */
{ 0, 3, 6, 12, 0, 3, 1, 1, 0, 0, 0, 0, 1 }, /* PT_ALNUM */
{ 0, 1, 7, 13, 0, 1, 3, 3, 1, 0, 0, 0, 1 }, /* PT_SPACE */
{ 0, 1, 7, 13, 0, 1, 3, 3, 1, 0, 0, 0, 1 }, /* PT_PXSPACE */
{ 0, 0, 8, 14, 0, 0, 1, 1, 3, 0, 0, 0, 1 }, /* PT_WORD */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_CLIST */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0 }, /* PT_UCNC */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_BIDICL */
{ 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } /* PT_BIDICO */
/* ANY LAMP GC PC SC SCX ALNUM SPACE PXSPACE WORD CLIST UCNC BIDICL BIDICO */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_ANY */
{ 0, 3, 0, 0, 0, 0, 3, 1, 1, 0, 0, 0, 0, 1 }, /* PT_LAMP */
{ 0, 0, 2, 4, 0, 0, 9, 10, 10, 11, 0, 0, 0, 0 }, /* PT_GC */
{ 0, 0, 5, 2, 0, 0, 15, 16, 16, 17, 0, 0, 0, 0 }, /* PT_PC */
{ 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_SC */
{ 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_SCX */
{ 0, 3, 6, 12, 0, 0, 3, 1, 1, 0, 0, 0, 0, 1 }, /* PT_ALNUM */
{ 0, 1, 7, 13, 0, 0, 1, 3, 3, 1, 0, 0, 0, 1 }, /* PT_SPACE */
{ 0, 1, 7, 13, 0, 0, 1, 3, 3, 1, 0, 0, 0, 1 }, /* PT_PXSPACE */
{ 0, 0, 8, 14, 0, 0, 0, 1, 1, 3, 0, 0, 0, 1 }, /* PT_WORD */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_CLIST */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0 }, /* PT_UCNC */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_BIDICL */
{ 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } /* PT_BIDICO */
};
/* This table is used to check whether auto-possessification is possible
@ -198,6 +199,8 @@ static BOOL
check_char_prop(uint32_t c, unsigned int ptype, unsigned int pdata,
BOOL negated)
{
BOOL ok;
int scriptx;
const uint32_t *p;
const ucd_record *prop = GET_UCD(c);
@ -217,6 +220,13 @@ switch(ptype)
case PT_SC:
return (pdata == prop->script) == negated;
case PT_SCX:
scriptx = prop->scriptx;
ok = pdata == prop->script || pdata == (unsigned int)scriptx;
if (!ok && scriptx < 0)
ok = MAPBIT(PRIV(ucd_script_sets) - scriptx, pdata) != 0;
return ok == negated;
/* These are specials */
case PT_ALNUM:
@ -253,14 +263,14 @@ switch(ptype)
if (c == *p++) return negated;
}
break; /* Control never reaches here */
/* Haven't yet thought these through. */
case PT_BIDICL:
return FALSE;
case PT_BIDICO:
return FALSE;
return FALSE;
}
return FALSE;

@ -2092,6 +2092,7 @@ PCRE2_SIZE i, bot, top;
PCRE2_SPTR ptr = *ptrptr;
PCRE2_UCHAR name[50];
PCRE2_UCHAR *vptr = NULL;
uint16_t ptscript = PT_NOTSCRIPT;
if (ptr >= cb->end_pattern) goto ERROR_RETURN;
c = *ptr++;
@ -2118,8 +2119,9 @@ if (c == CHAR_LEFT_CURLY_BRACKET)
if (c == CHAR_NUL) goto ERROR_RETURN;
if (c == CHAR_RIGHT_CURLY_BRACKET) break;
name[i] = tolower(c);
if (c == ':' || c == '=') vptr = name + i;
if ((c == ':' || c == '=') && vptr == NULL) vptr = name + i;
}
if (c != CHAR_RIGHT_CURLY_BRACKET) goto ERROR_RETURN;
name[i] = 0;
}
@ -2137,25 +2139,56 @@ else goto ERROR_RETURN;
*ptrptr = ptr;
/* If the property contains ':' or '=' we have class name and value separately
specified. The only case currently supported is Bidi_Class (synonym BC), for
which the property names are "bidi<name>". */
specified. The following are supported:
. Bidi_Class (synonym bc), for which the property names are "bidi<name>".
. Script (synonym sc) for which the property name is the script name
. Script_Extensions (synonym scx), ditto
As this is a small number, we currently just check the names directly. If this
grows, a sorted table and a switch will be neater.
For both the script properties, set a PT_xxx value so that (1) they can be
distinguished and (2) invalid script names that happen to be the name of
another property can be diagnosed. */
if (vptr != NULL)
{
*vptr = 0; /* Terminate property name */
if (PRIV(strcmp_c8)(name, "bidiclass") != 0 &&
PRIV(strcmp_c8)(name, "bc") != 0)
int offset = 0;
PCRE2_UCHAR sname[8];
*vptr = 0; /* Terminate property name */
if (PRIV(strcmp_c8)(name, STRING_bidiclass) == 0 ||
PRIV(strcmp_c8)(name, STRING_bc) == 0)
{
offset = 4;
sname[0] = CHAR_b;
sname[1] = CHAR_i; /* There is no strcpy_c8 function */
sname[2] = CHAR_d;
sname[3] = CHAR_i;
}
else if (PRIV(strcmp_c8)(name, STRING_script) == 0 ||
PRIV(strcmp_c8)(name, STRING_sc) == 0)
ptscript = PT_SC;
else if (PRIV(strcmp_c8)(name, STRING_scriptextensions) == 0 ||
PRIV(strcmp_c8)(name, STRING_scx) == 0)
ptscript = PT_SCX;
else
{
*errorcodeptr = ERR47;
return FALSE;
}
memmove(name + 4, vptr + 1, (name + i - vptr)*sizeof(PCRE2_UCHAR));
name[1] = 'i'; /* Can't use PRIV(strcpy)() because it adds 0 */
name[2] = 'd';
name[3] = 'i';
/* Adjust the string in name[] as needed */
memmove(name + offset, vptr + 1, (name + i - vptr)*sizeof(PCRE2_UCHAR));
if (offset != 0) memmove(name, sname, offset*sizeof(PCRE2_UCHAR));
}
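
The `name:value` handling above depends on splitting at the first ':' or '=' only, then deciding whether the name selects one of the two script properties. A simplified sketch using plain `char` strings follows. The function names and the `demo_kind` enum are hypothetical, and the input is assumed to be already lower-cased, as the parsing loop does with `tolower()`.

```c
#include <stddef.h>
#include <string.h>

typedef enum { DEMO_NOT_SCRIPT, DEMO_SC, DEMO_SCX } demo_kind;

/* Split "name:value" or "name=value" in place at the FIRST separator and
   return a pointer to the value, or NULL if there is no separator. */
char *demo_split_property(char *text)
{
  char *sep = text + strcspn(text, ":=");
  if (*sep == '\0') return NULL;
  *sep = '\0';
  return sep + 1;
}

/* Classify a property name using the spellings defined by the STRING_xxx
   macros in this commit. */
demo_kind demo_classify(const char *name)
{
  if (strcmp(name, "script") == 0 || strcmp(name, "sc") == 0)
    return DEMO_SC;
  if (strcmp(name, "scriptextensions") == 0 || strcmp(name, "scx") == 0)
    return DEMO_SCX;
  return DEMO_NOT_SCRIPT;
}
```

Splitting at the first separator means that any later ':' or '=' stays part of the value, so a malformed value is rejected by the property lookup rather than silently truncated.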
/* Search for a recognized property name using binary chop. */
/* Search for a recognized property using binary chop. */
bot = 0;
top = PRIV(utt_size);
@ -2165,16 +2198,27 @@ while (bot < top)
int r;
i = (bot + top) >> 1;
r = PRIV(strcmp_c8)(name, PRIV(utt_names) + PRIV(utt)[i].name_offset);
/* When a matching property is found, some extra checking is needed when the
\p{xx:yy} syntax is used and xx is either sc or scx. */
if (r == 0)
{
*ptypeptr = PRIV(utt)[i].type;
*pdataptr = PRIV(utt)[i].value;
if (vptr == NULL || ptscript == PT_NOTSCRIPT)
*ptypeptr = PRIV(utt)[i].type;
else
{
if (PRIV(utt)[i].type != PT_SCX) break; /* Non-script found */
*ptypeptr = ptscript;
}
return TRUE;
}
if (r > 0) bot = i + 1; else top = i;
}
*errorcodeptr = ERR47; /* Unrecognized name */
*errorcodeptr = ERR47; /* Unrecognized property */
return FALSE;
ERROR_RETURN: /* Malformed \P or \p */
@ -5858,7 +5902,7 @@ for (;; pptr++)
case ESC_D:
should_flip_negation = TRUE;
for (int i = 0; i < 32; i++)
for (int i = 0; i < 32; i++)
classbits[i] |= (uint8_t)(~cbits[i+cbit_digit]);
break;
@ -5868,7 +5912,7 @@ for (;; pptr++)
case ESC_W:
should_flip_negation = TRUE;
for (int i = 0; i < 32; i++)
for (int i = 0; i < 32; i++)
classbits[i] |= (uint8_t)(~cbits[i+cbit_word]);
break;
@ -5885,7 +5929,7 @@ for (;; pptr++)
case ESC_S:
should_flip_negation = TRUE;
for (int i = 0; i < 32; i++)
for (int i = 0; i < 32; i++)
classbits[i] |= (uint8_t)(~cbits[i+cbit_space]);
break;
@ -6276,7 +6320,7 @@ for (;; pptr++)
bravalue = OP_COND;
{
int count, index;
unsigned int i;
unsigned int i;
PCRE2_SPTR name;
named_group *ng = cb->named_groups;
uint32_t length = *(++pptr);

@ -1193,6 +1193,12 @@ for (;;)
OK = prop->script == code[2];
break;
case PT_SCX:
OK = prop->script == code[2] || prop->scriptx == (int)code[2];
if (!OK && prop->scriptx < 0)
OK = MAPBIT(PRIV(ucd_script_sets) - prop->scriptx, code[2]) != 0;
break;
/* These are specials for combination cases. */
case PT_ALNUM:
@ -1459,6 +1465,12 @@ for (;;)
OK = prop->script == code[3];
break;
case PT_SCX:
OK = prop->script == code[3] || prop->scriptx == (int)code[3];
if (!OK && prop->scriptx < 0)
OK = MAPBIT(PRIV(ucd_script_sets) - prop->scriptx, code[3]) != 0;
break;
/* These are specials for combination cases. */
case PT_ALNUM:
@ -1708,6 +1720,12 @@ for (;;)
OK = prop->script == code[3];
break;
case PT_SCX:
OK = prop->script == code[3] || prop->scriptx == (int)code[3];
if (!OK && prop->scriptx < 0)
OK = MAPBIT(PRIV(ucd_script_sets) - prop->scriptx, code[3]) != 0;
break;
/* These are specials for combination cases. */
case PT_ALNUM:
@ -1982,6 +2000,14 @@ for (;;)
OK = prop->script == code[1 + IMM2_SIZE + 2];
break;
case PT_SCX:
OK = prop->script == code[1 + IMM2_SIZE + 2] ||
prop->scriptx == (int)code[1 + IMM2_SIZE + 2];
if (!OK && prop->scriptx < 0)
OK = MAPBIT(PRIV(ucd_script_sets) - prop->scriptx,
code[1 + IMM2_SIZE + 2]) != 0;
break;
/* These are specials for combination cases. */
case PT_ALNUM:

@ -119,7 +119,7 @@ static const unsigned char compile_error_texts[] =
/* 45 */
"this version of PCRE2 does not have support for \\P, \\p, or \\X\0"
"malformed \\P or \\p sequence\0"
"unknown property name after \\P or \\p\0"
"unknown property after \\P or \\p\0"
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " code units)\0"
"too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")\0"
/* 50 */

@ -954,6 +954,13 @@ a positive value. */
#define STRING_LIMIT_RECURSION_EQ "LIMIT_RECURSION="
#define STRING_MARK "MARK"
#define STRING_bc "bc"
#define STRING_bidiclass "bidiclass"
#define STRING_sc "sc"
#define STRING_script "script"
#define STRING_scriptextensions "scriptextensions"
#define STRING_scx "scx"
#else /* SUPPORT_UNICODE */
/* UTF-8 support is enabled; always use UTF-8 (=ASCII) character codes. This
@ -1248,28 +1255,39 @@ only. */
#define STRING_LIMIT_RECURSION_EQ STR_L STR_I STR_M STR_I STR_T STR_UNDERSCORE STR_R STR_E STR_C STR_U STR_R STR_S STR_I STR_O STR_N STR_EQUALS_SIGN
#define STRING_MARK STR_M STR_A STR_R STR_K
#define STRING_bc STR_b STR_c
#define STRING_bidiclass STR_b STR_i STR_d STR_i STR_c STR_l STR_a STR_s STR_s
#define STRING_sc STR_s STR_c
#define STRING_script STR_s STR_c STR_r STR_i STR_p STR_t
#define STRING_scriptextensions STR_s STR_c STR_r STR_i STR_p STR_t STR_e STR_x STR_t STR_e STR_n STR_s STR_i STR_o STR_n STR_s
#define STRING_scx STR_s STR_c STR_x
#endif /* SUPPORT_UNICODE */
/* -------------------- End of character and string names -------------------*/
/* -------------------- Definitions for compiled patterns -------------------*/
/* Codes for different types of Unicode property */
/* Codes for different types of Unicode property. If these definitions are
changed, the autopossessifying table in pcre2_auto_possess.c must be updated to
match. */
#define PT_ANY 0 /* Any property - matches all chars */
#define PT_LAMP 1 /* L& - the union of Lu, Ll, Lt */
#define PT_GC 2 /* Specified general characteristic (e.g. L) */
#define PT_PC 3 /* Specified particular characteristic (e.g. Lu) */
#define PT_SC 4 /* Script (e.g. Han) */
#define PT_ALNUM 5 /* Alphanumeric - the union of L and N */
#define PT_SPACE 6 /* Perl space - general category Z plus 9,10,12,13 */
#define PT_PXSPACE 7 /* POSIX space - Z plus 9,10,11,12,13 */
#define PT_WORD 8 /* Word - L plus N plus underscore */
#define PT_CLIST 9 /* Pseudo-property: match character list */
#define PT_UCNC 10 /* Universal Character nameable character */
#define PT_BIDICL 11 /* Specified bidi class */
#define PT_BIDICO 12 /* Bidi control character */
#define PT_TABSIZE 13 /* Size of square table for autopossessify tests */
#define PT_SC 4 /* Script only (e.g. Han) */
#define PT_SCX 5 /* Script extensions (includes SC) */
#define PT_ALNUM 6 /* Alphanumeric - the union of L and N */
#define PT_SPACE 7 /* Perl space - general category Z plus 9,10,12,13 */
#define PT_PXSPACE 8 /* POSIX space - Z plus 9,10,11,12,13 */
#define PT_WORD 9 /* Word - L plus N plus underscore */
#define PT_CLIST 10 /* Pseudo-property: match character list */
#define PT_UCNC 11 /* Universal Character nameable character */
#define PT_BIDICL 12 /* Specified bidi class */
#define PT_BIDICO 13 /* Bidi control character */
#define PT_TABSIZE 14 /* Size of square table for autopossessify tests */
/* The following special properties are used only in XCLASS items, when POSIX
classes are specified and PCRE2_UCP is set - in other words, for Unicode
@ -1277,9 +1295,14 @@ handling of these classes. They are not available via the \p or \P escapes like
those in the above list, and so they do not take part in the autopossessifying
table. */
#define PT_PXGRAPH 13 /* [:graph:] - characters that mark the paper */
#define PT_PXPRINT 14 /* [:print:] - [:graph:] plus non-control spaces */
#define PT_PXPUNCT 15 /* [:punct:] - punctuation characters */
#define PT_PXGRAPH 14 /* [:graph:] - characters that mark the paper */
#define PT_PXPRINT 15 /* [:print:] - [:graph:] plus non-control spaces */
#define PT_PXPUNCT 16 /* [:punct:] - punctuation characters */
/* This value is used when parsing \p and \P escapes to indicate that neither
\p{script:...} nor \p{scx:...} has been encountered. */
#define PT_NOTSCRIPT 255
/* Flag bits and data types for the extended class (OP_XCLASS) for classes that
contain characters with values greater than 255. */
@ -1826,6 +1849,12 @@ typedef struct {
#define UCD_OTHERCASE(ch) ((uint32_t)((int)ch + (int)(GET_UCD(ch)->other_case)))
#define UCD_SCRIPTX(ch) GET_UCD(ch)->scriptx
/* The "scriptx" field, when negative, gives an offset into a vector of 32-bit
words that form a bitmap representing a list of scripts. This macro tests for a
script in the map by number. */
#define MAPBIT(map,script) ((map)[(script)/32]&(1u<<((script)%32)))
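
The PT_SCX test that this commit adds to the matchers follows one pattern: compare against the Script value, compare against a non-negative scriptx value, and only then consult the bitmap. A self-contained sketch of that pattern follows. `demo_record` is a hypothetical cut-down record (the real `ucd_record` has more fields), and `DEMO_MAPBIT` simply copies the shape of `MAPBIT`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the MAPBIT macro added in this commit. */
#define DEMO_MAPBIT(map,script) ((map)[(script)/32]&(1u<<((script)%32)))

/* Hypothetical cut-down character record. */
typedef struct {
  unsigned script;   /* Script property */
  int scriptx;       /* Single script, or negated offset of a bitmap */
} demo_record;

/* Return true if "wanted" is the character's script or one of its script
   extensions. script_sets is the vector of fixed-size bitmaps. */
bool demo_has_scx(const demo_record *rec, const uint32_t *script_sets,
  unsigned wanted)
{
  bool ok = rec->script == wanted || rec->scriptx == (int)wanted;
  if (!ok && rec->scriptx < 0)
    ok = DEMO_MAPBIT(script_sets - rec->scriptx, wanted) != 0;
  return ok;
}
```

Subtracting the negative `scriptx` from the base pointer moves forward to the start of the character's bitmap, after which the test is a single index, shift, and mask with no loop.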
/* The "bidi" field has the 0x80 bit set if the character has the Bidi_Control
property. The remaining bits hold the bidi class, but as there are only 23
classes, we can mask off 5 bits - leaving two free for the future. */
@ -1916,7 +1945,7 @@ extern const uint32_t PRIV(hspace_list)[];
extern const uint32_t PRIV(vspace_list)[];
extern const uint32_t PRIV(ucd_caseless_sets)[];
extern const uint32_t PRIV(ucd_digit_sets)[];
extern const uint8_t PRIV(ucd_script_sets)[];
extern const uint32_t PRIV(ucd_script_sets)[];
extern const ucd_record PRIV(ucd_records)[];
#if PCRE2_CODE_UNIT_WIDTH == 32
extern const ucd_record PRIV(dummy_ucd_record)[];

@ -160,7 +160,7 @@ enum { RM100=100, RM101 };
enum { RM200=200, RM201, RM202, RM203, RM204, RM205, RM206, RM207,
RM208, RM209, RM210, RM211, RM212, RM213, RM214, RM215,
RM216, RM217, RM218, RM219, RM220, RM221, RM222, RM223,
RM224 };
RM224, RM225 };
#endif
/* Define short names for general fields in the current backtrack frame, which
@ -2452,6 +2452,17 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
RRETURN(MATCH_NOMATCH);
break;
case PT_SCX:
{
int scriptx = prop->scriptx;
BOOL ok = Fecode[2] == prop->script ||
Fecode[2] == (unsigned int)scriptx;
if (!ok && scriptx < 0)
ok = MAPBIT((PRIV(ucd_script_sets) - scriptx), Fecode[2]) != 0;
if (ok == notmatch) RRETURN(MATCH_NOMATCH);
}
break;
/* These are specials */
case PT_ALNUM:
@ -2713,6 +2724,28 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
}
break;
case PT_SCX:
for (i = 1; i <= Lmin; i++)
{
BOOL ok;
int scriptx;
const ucd_record *prop;
if (Feptr >= mb->end_subject)
{
SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH);
}
GETCHARINCTEST(fc, Feptr);
prop = GET_UCD(fc);
scriptx = prop->scriptx;
ok = prop->script == Lpropvalue || scriptx == (int)Lpropvalue;
if (!ok && scriptx < 0)
ok = MAPBIT(PRIV(ucd_script_sets) - scriptx, Lpropvalue) != 0;
if (ok == notmatch)
RRETURN(MATCH_NOMATCH);
}
break;
case PT_ALNUM:
for (i = 1; i <= Lmin; i++)
{
@ -3385,8 +3418,8 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
if (Lmin == Lmax) continue;
/* If minimizing, we have to test the rest of the pattern before each
subsequent match. This means we cannot use a local "notmatch" variable as
in the other cases. As all 4 temporary 32-bit values in the frame are
subsequent match. This means we cannot use a local "notmatch" variable as
in the other cases. As all 4 temporary 32-bit values in the frame are
already in use, just test the type each time. */
if (reptype == REPTYPE_MIN)
@ -3484,6 +3517,31 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
}
/* Control never gets here */
case PT_SCX:
for (;;)
{
BOOL ok;
int scriptx;
const ucd_record *prop;
RMATCH(Fecode, RM225);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH);
if (Feptr >= mb->end_subject)
{
SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH);
}
GETCHARINCTEST(fc, Feptr);
prop = GET_UCD(fc);
scriptx = prop->scriptx;
ok = prop->script == Lpropvalue || scriptx == (int)Lpropvalue;
if (!ok && scriptx < 0)
ok = MAPBIT(PRIV(ucd_script_sets) - scriptx, Lpropvalue) != 0;
if (ok == (Lctype == OP_NOTPROP))
RRETURN(MATCH_NOMATCH);
}
/* Control never gets here */
case PT_ALNUM:
for (;;)
{
@ -3947,8 +4005,8 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
}
/* If maximizing, it is worth using inline code for speed, doing the type
test once at the start (i.e. keep it out of the loops). Once again,
"notmatch" can be an ordinary local variable because the loops do not call
test once at the start (i.e. keep it out of the loops). Once again,
"notmatch" can be an ordinary local variable because the loops do not call
RMATCH. */
else
@ -4041,6 +4099,29 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
}
break;
case PT_SCX:
for (i = Lmin; i < Lmax; i++)
{
BOOL ok;
const ucd_record *prop;
int scriptx;
int len = 1;
if (Feptr >= mb->end_subject)
{
SCHECK_PARTIAL();
break;
}
GETCHARLENTEST(fc, Feptr, len);
prop = GET_UCD(fc);
scriptx = prop->scriptx;
ok = prop->script == Lpropvalue || scriptx == (int)Lpropvalue;
if (!ok && scriptx < 0)
ok = MAPBIT(PRIV(ucd_script_sets) - scriptx, Lpropvalue) != 0;
if (ok == notmatch) break;
Feptr+= len;
}
break;
case PT_ALNUM:
for (i = Lmin; i < Lmax; i++)
{
@ -6172,7 +6253,7 @@ switch (Freturn_id)
LBL(200) LBL(201) LBL(202) LBL(203) LBL(204) LBL(205) LBL(206)
LBL(207) LBL(208) LBL(209) LBL(210) LBL(211) LBL(212) LBL(213)
LBL(214) LBL(215) LBL(216) LBL(217) LBL(218) LBL(219) LBL(220)
LBL(221) LBL(222) LBL(223) LBL(224)
LBL(221) LBL(222) LBL(223) LBL(224) LBL(225)
#endif
default:

@ -237,11 +237,15 @@ get_ucpname(unsigned int ptype, unsigned int pvalue)
{
#ifdef SUPPORT_UNICODE
int i;
if (ptype == PT_SC) ptype = PT_SCX; /* Table has scx values */
for (i = PRIV(utt_size) - 1; i >= 0; i--)
{
if (ptype == PRIV(utt)[i].type && pvalue == PRIV(utt)[i].value) break;
}
return (i >= 0)? PRIV(utt_names) + PRIV(utt)[i].name_offset : "??";
#else /* No UTF support */
(void)ptype;
(void)pvalue;
@ -273,8 +277,9 @@ print_prop(FILE *f, PCRE2_SPTR code, const char *before, const char *after)
{
if (code[1] != PT_CLIST)
{
const char *sc = (code[1] == PT_SC)? "script:" : "";
const char *s = get_ucpname(code[1], code[2]);
fprintf(f, "%s%s %c%s%s", before, OP_names[*code], toupper(s[0]), s+1, after);
fprintf(f, "%s%s %s%c%s%s", before, OP_names[*code], sc, toupper(s[0]), s+1, after);
}
else
{

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
New API code Copyright (c) 2016-2021 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -77,17 +77,17 @@ records (and is only likely to be a few hundred). */
#define SCRIPT_HANHIRAKATA (-99997)
#define SCRIPT_HANBOPOMOFO (-99996)
#define SCRIPT_HANHANGUL (-99995)
#define SCRIPT_LIST (-99994)
#define SCRIPT_MAP (-99994)
#define INTERSECTION_LIST_SIZE 50
#define MAPSIZE (ucp_Script_Count/32 + 1)
BOOL
PRIV(script_run)(PCRE2_SPTR ptr, PCRE2_SPTR endptr, BOOL utf)
{
#ifdef SUPPORT_UNICODE
int require_script = SCRIPT_UNSET;
uint8_t intersection_list[INTERSECTION_LIST_SIZE];
const uint8_t *require_list = NULL;
uint32_t intersection_map[MAPSIZE];
const uint32_t *require_map = NULL;
uint32_t require_digitset = 0;
uint32_t c;
@ -197,20 +197,13 @@ for (;;)
if (scriptx != ucp_Han && scriptx != ucp_Hangul) return FALSE;
break;
/* We have a list of scripts to check that is derived from one or
more previous characters. This is either one of the lists in
/* We have a bitmap of scripts to check that is derived from one or
more previous characters. This is either one of the maps in
ucd_script_sets[] (for one previous character) or the intersection of
several lists for multiple characters. */
several maps for multiple characters. */
case SCRIPT_LIST:
{
const uint8_t *list;
for (list = require_list; *list != 0; list++)
{
if (*list == scriptx) break;
}
if (*list == 0) return FALSE;
}
case SCRIPT_MAP:
if (MAPBIT(require_map, scriptx) == 0) return FALSE;
/* The rest of the string must be in this script, but we have to
allow for the Han complications. */
@ -249,19 +242,18 @@ for (;;)
} /* End of handling positive scriptx */
/* If scriptx is negative, this character is a mark-type character that
has a list of permitted scripts. */
has a list of permitted scripts, which are encoded in a bitmap. */
else
{
uint32_t chspecial;
const uint8_t *clist, *rlist;
const uint8_t *list = PRIV(ucd_script_sets) - scriptx;
const uint32_t *map = PRIV(ucd_script_sets) - scriptx;
switch(require_script)
{
case SCRIPT_UNSET:
require_list = PRIV(ucd_script_sets) - scriptx;
require_script = SCRIPT_LIST;
require_map = PRIV(ucd_script_sets) - scriptx;
require_script = SCRIPT_MAP;
break;
/* An inspection of the Unicode 11.0.0 files shows that there are the
@ -282,17 +274,11 @@ for (;;)
case SCRIPT_HANPENDING:
chspecial = 0;
for (; *list != 0; list++)
{
switch (*list)
{
case ucp_Bopomofo: chspecial |= FOUND_BOPOMOFO; break;
case ucp_Hiragana: chspecial |= FOUND_HIRAGANA; break;
case ucp_Katakana: chspecial |= FOUND_KATAKANA; break;
case ucp_Hangul: chspecial |= FOUND_HANGUL; break;
default: break;
}
}
if (MAPBIT(map, ucp_Bopomofo) != 0) chspecial |= FOUND_BOPOMOFO;
if (MAPBIT(map, ucp_Hiragana) != 0) chspecial |= FOUND_HIRAGANA;
if (MAPBIT(map, ucp_Katakana) != 0) chspecial |= FOUND_KATAKANA;
if (MAPBIT(map, ucp_Hangul) != 0) chspecial |= FOUND_HANGUL;
if (chspecial == 0) return FALSE;
@ -311,76 +297,44 @@ for (;;)
break;
case SCRIPT_HANHIRAKATA:
for (; *list != 0; list++)
{
if (*list == ucp_Hiragana || *list == ucp_Katakana) break;
}
if (*list == 0) return FALSE;
break;
if (MAPBIT(map, ucp_Hiragana) != 0) break;
if (MAPBIT(map, ucp_Katakana) != 0) break;
return FALSE;
case SCRIPT_HANBOPOMOFO:
for (; *list != 0; list++)
{
if (*list == ucp_Bopomofo) break;
}
if (*list == 0) return FALSE;
break;
if (MAPBIT(map, ucp_Bopomofo) != 0) break;
return FALSE;
case SCRIPT_HANHANGUL:
for (; *list != 0; list++)
{
if (*list == ucp_Hangul) break;
}
if (*list == 0) return FALSE;
break;
if (MAPBIT(map, ucp_Hangul) != 0) break;
return FALSE;
/* Previously encountered one or more characters that are allowed
with a list of scripts. Build the intersection of the required list
with this character's list in intersection_list[]. This code is
written so that it still works OK if the required list is already in
that vector. */
with this character's list in intersection_map[]. */
case SCRIPT_LIST:
{
int i = 0;
for (rlist = require_list; *rlist != 0; rlist++)
{
for (clist = list; *clist != 0; clist++)
{
if (*rlist == *clist)
{
intersection_list[i++] = *rlist;
break;
}
}
}
if (i == 0) return FALSE; /* No scripts in common */
case SCRIPT_MAP:
for (int i = 0; i < MAPSIZE; i++)
intersection_map[i] = require_map[i] & map[i];
/* If there's just one script in common, we could set it as the
unique required script. However, in the new bitmap arrangements,
finding the one script is expensive, so leave this out for now.
Otherwise, make the intersection map the required map. */
/* If there's just one script in common, we can set it as the
unique required script. Otherwise, terminate the intersection list
and make it the required list. */
/*
if (onescript >= 0) require_script = onescript;
else require_map = intersection_map;
*/
if (i == 1)
{
require_script = intersection_list[0];
}
else
{
intersection_list[i] = 0;
require_list = intersection_list;
}
}
require_map = intersection_map;
break;
/* The previously set required script is a single script, not
Han-related. Check that it is in this character's list. */
default:
for (; *list != 0; list++)
{
if (*list == require_script) break;
}
if (*list == 0) return FALSE;
if (MAPBIT(map, require_script) == 0) return FALSE;
break;
}
} /* End of handling negative scriptx */
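The hunks above replace list scans with bit tests and a word-wise AND. The sketch below shows that technique in isolation. `MAPBIT` is not defined in this diff, so `DEMO_MAPBIT` is an assumption about its shape (test bit n in an array of 32-bit words). `DEMO_SCRIPT_COUNT` is a hypothetical value chosen to give a six-word map, and the bit counting is an illustration only, not something the commit does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ucp_Script_Count and MAPSIZE. */
#define DEMO_SCRIPT_COUNT 161
#define DEMO_MAPSIZE (DEMO_SCRIPT_COUNT/32 + 1)

/* Assumed shape of MAPBIT: nonzero if bit n is set in the word array. */
#define DEMO_MAPBIT(map, n) ((map)[(n)/32] & (1u << ((n)%32)))

/* Set the bit for script number n. */
static void demo_mapset(uint32_t *map, unsigned int n)
{
  assert(n < DEMO_SCRIPT_COUNT);
  map[n/32] |= 1u << (n%32);
}

/* Store the word-wise AND of a and b in out, and return how many scripts
   the two maps have in common. */
static unsigned int demo_intersect(uint32_t *out, const uint32_t *a,
  const uint32_t *b)
{
  unsigned int count = 0;
  for (size_t i = 0; i < DEMO_MAPSIZE; i++)
    {
    uint32_t w = a[i] & b[i];
    out[i] = w;
    for (; w != 0; w &= w - 1) count++;   /* clear lowest set bit each pass */
    }
  return count;
}
```

With one map holding scripts 1, 11 and 144 and another holding 11, 50 and 144, `demo_intersect()` returns 2 and leaves only bits 11 and 144 set. Counting the bits says whether exactly one script remains, but recovering which one needs a further scan. That may be the cost the comment in the new code refers to.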


@ -710,19 +710,19 @@ const char PRIV(utt_names)[] =
STRING_zs0;
const ucp_type_table PRIV(utt)[] = {
{ 0, PT_SC, ucp_Adlam },
{ 6, PT_SC, ucp_Ahom },
{ 11, PT_SC, ucp_Anatolian_Hieroglyphs },
{ 0, PT_SCX, ucp_Adlam },
{ 6, PT_SCX, ucp_Ahom },
{ 11, PT_SCX, ucp_Anatolian_Hieroglyphs },
{ 32, PT_ANY, 0 },
{ 36, PT_SC, ucp_Arabic },
{ 43, PT_SC, ucp_Armenian },
{ 52, PT_SC, ucp_Avestan },
{ 60, PT_SC, ucp_Balinese },
{ 69, PT_SC, ucp_Bamum },
{ 75, PT_SC, ucp_Bassa_Vah },
{ 84, PT_SC, ucp_Batak },
{ 90, PT_SC, ucp_Bengali },
{ 98, PT_SC, ucp_Bhaiksuki },
{ 36, PT_SCX, ucp_Arabic },
{ 43, PT_SCX, ucp_Armenian },
{ 52, PT_SCX, ucp_Avestan },
{ 60, PT_SCX, ucp_Balinese },
{ 69, PT_SCX, ucp_Bamum },
{ 75, PT_SCX, ucp_Bassa_Vah },
{ 84, PT_SCX, ucp_Batak },
{ 90, PT_SCX, ucp_Bengali },
{ 98, PT_SCX, ucp_Bhaiksuki },
{ 108, PT_BIDICL, ucp_bidiAL },
{ 115, PT_BIDICL, ucp_bidiAN },
{ 122, PT_BIDICL, ucp_bidiB },
@ -748,197 +748,197 @@ const ucp_type_table PRIV(utt)[] = {
{ 272, PT_BIDICL, ucp_bidiRLO },
{ 280, PT_BIDICL, ucp_bidiS },
{ 286, PT_BIDICL, ucp_bidiWS },
{ 293, PT_SC, ucp_Bopomofo },
{ 302, PT_SC, ucp_Brahmi },
{ 309, PT_SC, ucp_Braille },
{ 317, PT_SC, ucp_Buginese },
{ 326, PT_SC, ucp_Buhid },
{ 293, PT_SCX, ucp_Bopomofo },
{ 302, PT_SCX, ucp_Brahmi },
{ 309, PT_SCX, ucp_Braille },
{ 317, PT_SCX, ucp_Buginese },
{ 326, PT_SCX, ucp_Buhid },
{ 332, PT_GC, ucp_C },
{ 334, PT_SC, ucp_Canadian_Aboriginal },
{ 353, PT_SC, ucp_Carian },
{ 360, PT_SC, ucp_Caucasian_Albanian },
{ 334, PT_SCX, ucp_Canadian_Aboriginal },
{ 353, PT_SCX, ucp_Carian },
{ 360, PT_SCX, ucp_Caucasian_Albanian },
{ 378, PT_PC, ucp_Cc },
{ 381, PT_PC, ucp_Cf },
{ 384, PT_SC, ucp_Chakma },
{ 391, PT_SC, ucp_Cham },
{ 396, PT_SC, ucp_Cherokee },
{ 405, PT_SC, ucp_Chorasmian },
{ 384, PT_SCX, ucp_Chakma },
{ 391, PT_SCX, ucp_Cham },
{ 396, PT_SCX, ucp_Cherokee },
{ 405, PT_SCX, ucp_Chorasmian },
{ 416, PT_PC, ucp_Cn },
{ 419, PT_PC, ucp_Co },
{ 422, PT_SC, ucp_Common },
{ 429, PT_SC, ucp_Coptic },
{ 422, PT_SCX, ucp_Common },
{ 429, PT_SCX, ucp_Coptic },
{ 436, PT_PC, ucp_Cs },
{ 439, PT_SC, ucp_Cuneiform },
{ 449, PT_SC, ucp_Cypriot },
{ 457, PT_SC, ucp_Cypro_Minoan },
{ 469, PT_SC, ucp_Cyrillic },
{ 478, PT_SC, ucp_Deseret },
{ 486, PT_SC, ucp_Devanagari },
{ 497, PT_SC, ucp_Dives_Akuru },
{ 508, PT_SC, ucp_Dogra },
{ 514, PT_SC, ucp_Duployan },
{ 523, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 543, PT_SC, ucp_Elbasan },
{ 551, PT_SC, ucp_Elymaic },
{ 559, PT_SC, ucp_Ethiopic },
{ 568, PT_SC, ucp_Georgian },
{ 577, PT_SC, ucp_Glagolitic },
{ 588, PT_SC, ucp_Gothic },
{ 595, PT_SC, ucp_Grantha },
{ 603, PT_SC, ucp_Greek },
{ 609, PT_SC, ucp_Gujarati },
{ 618, PT_SC, ucp_Gunjala_Gondi },
{ 631, PT_SC, ucp_Gurmukhi },
{ 640, PT_SC, ucp_Han },
{ 644, PT_SC, ucp_Hangul },
{ 651, PT_SC, ucp_Hanifi_Rohingya },
{ 666, PT_SC, ucp_Hanunoo },
{ 674, PT_SC, ucp_Hatran },
{ 681, PT_SC, ucp_Hebrew },
{ 688, PT_SC, ucp_Hiragana },
{ 697, PT_SC, ucp_Imperial_Aramaic },
{ 713, PT_SC, ucp_Inherited },
{ 723, PT_SC, ucp_Inscriptional_Pahlavi },
{ 744, PT_SC, ucp_Inscriptional_Parthian },
{ 766, PT_SC, ucp_Javanese },
{ 775, PT_SC, ucp_Kaithi },
{ 782, PT_SC, ucp_Kannada },
{ 790, PT_SC, ucp_Katakana },
{ 799, PT_SC, ucp_Kayah_Li },
{ 807, PT_SC, ucp_Kharoshthi },
{ 818, PT_SC, ucp_Khitan_Small_Script },
{ 836, PT_SC, ucp_Khmer },
{ 842, PT_SC, ucp_Khojki },
{ 849, PT_SC, ucp_Khudawadi },
{ 439, PT_SCX, ucp_Cuneiform },
{ 449, PT_SCX, ucp_Cypriot },
{ 457, PT_SCX, ucp_Cypro_Minoan },
{ 469, PT_SCX, ucp_Cyrillic },
{ 478, PT_SCX, ucp_Deseret },
{ 486, PT_SCX, ucp_Devanagari },
{ 497, PT_SCX, ucp_Dives_Akuru },
{ 508, PT_SCX, ucp_Dogra },
{ 514, PT_SCX, ucp_Duployan },
{ 523, PT_SCX, ucp_Egyptian_Hieroglyphs },
{ 543, PT_SCX, ucp_Elbasan },
{ 551, PT_SCX, ucp_Elymaic },
{ 559, PT_SCX, ucp_Ethiopic },
{ 568, PT_SCX, ucp_Georgian },
{ 577, PT_SCX, ucp_Glagolitic },
{ 588, PT_SCX, ucp_Gothic },
{ 595, PT_SCX, ucp_Grantha },
{ 603, PT_SCX, ucp_Greek },
{ 609, PT_SCX, ucp_Gujarati },
{ 618, PT_SCX, ucp_Gunjala_Gondi },
{ 631, PT_SCX, ucp_Gurmukhi },
{ 640, PT_SCX, ucp_Han },
{ 644, PT_SCX, ucp_Hangul },
{ 651, PT_SCX, ucp_Hanifi_Rohingya },
{ 666, PT_SCX, ucp_Hanunoo },
{ 674, PT_SCX, ucp_Hatran },
{ 681, PT_SCX, ucp_Hebrew },
{ 688, PT_SCX, ucp_Hiragana },
{ 697, PT_SCX, ucp_Imperial_Aramaic },
{ 713, PT_SCX, ucp_Inherited },
{ 723, PT_SCX, ucp_Inscriptional_Pahlavi },
{ 744, PT_SCX, ucp_Inscriptional_Parthian },
{ 766, PT_SCX, ucp_Javanese },
{ 775, PT_SCX, ucp_Kaithi },
{ 782, PT_SCX, ucp_Kannada },
{ 790, PT_SCX, ucp_Katakana },
{ 799, PT_SCX, ucp_Kayah_Li },
{ 807, PT_SCX, ucp_Kharoshthi },
{ 818, PT_SCX, ucp_Khitan_Small_Script },
{ 836, PT_SCX, ucp_Khmer },
{ 842, PT_SCX, ucp_Khojki },
{ 849, PT_SCX, ucp_Khudawadi },
{ 859, PT_GC, ucp_L },
{ 861, PT_LAMP, 0 },
{ 864, PT_SC, ucp_Lao },
{ 868, PT_SC, ucp_Latin },
{ 864, PT_SCX, ucp_Lao },
{ 868, PT_SCX, ucp_Latin },
{ 874, PT_LAMP, 0 },
{ 877, PT_SC, ucp_Lepcha },
{ 884, PT_SC, ucp_Limbu },
{ 890, PT_SC, ucp_Linear_A },
{ 898, PT_SC, ucp_Linear_B },
{ 906, PT_SC, ucp_Lisu },
{ 877, PT_SCX, ucp_Lepcha },
{ 884, PT_SCX, ucp_Limbu },
{ 890, PT_SCX, ucp_Linear_A },
{ 898, PT_SCX, ucp_Linear_B },
{ 906, PT_SCX, ucp_Lisu },
{ 911, PT_PC, ucp_Ll },
{ 914, PT_PC, ucp_Lm },
{ 917, PT_PC, ucp_Lo },
{ 920, PT_PC, ucp_Lt },
{ 923, PT_PC, ucp_Lu },
{ 926, PT_SC, ucp_Lycian },
{ 933, PT_SC, ucp_Lydian },
{ 926, PT_SCX, ucp_Lycian },
{ 933, PT_SCX, ucp_Lydian },
{ 940, PT_GC, ucp_M },
{ 942, PT_SC, ucp_Mahajani },
{ 951, PT_SC, ucp_Makasar },
{ 959, PT_SC, ucp_Malayalam },
{ 969, PT_SC, ucp_Mandaic },
{ 977, PT_SC, ucp_Manichaean },
{ 988, PT_SC, ucp_Marchen },
{ 996, PT_SC, ucp_Masaram_Gondi },
{ 942, PT_SCX, ucp_Mahajani },
{ 951, PT_SCX, ucp_Makasar },
{ 959, PT_SCX, ucp_Malayalam },
{ 969, PT_SCX, ucp_Mandaic },
{ 977, PT_SCX, ucp_Manichaean },
{ 988, PT_SCX, ucp_Marchen },
{ 996, PT_SCX, ucp_Masaram_Gondi },
{ 1009, PT_PC, ucp_Mc },
{ 1012, PT_PC, ucp_Me },
{ 1015, PT_SC, ucp_Medefaidrin },
{ 1027, PT_SC, ucp_Meetei_Mayek },
{ 1039, PT_SC, ucp_Mende_Kikakui },
{ 1052, PT_SC, ucp_Meroitic_Cursive },
{ 1068, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 1088, PT_SC, ucp_Miao },
{ 1015, PT_SCX, ucp_Medefaidrin },
{ 1027, PT_SCX, ucp_Meetei_Mayek },
{ 1039, PT_SCX, ucp_Mende_Kikakui },
{ 1052, PT_SCX, ucp_Meroitic_Cursive },
{ 1068, PT_SCX, ucp_Meroitic_Hieroglyphs },
{ 1088, PT_SCX, ucp_Miao },
{ 1093, PT_PC, ucp_Mn },
{ 1096, PT_SC, ucp_Modi },
{ 1101, PT_SC, ucp_Mongolian },
{ 1111, PT_SC, ucp_Mro },
{ 1115, PT_SC, ucp_Multani },
{ 1123, PT_SC, ucp_Myanmar },
{ 1096, PT_SCX, ucp_Modi },
{ 1101, PT_SCX, ucp_Mongolian },
{ 1111, PT_SCX, ucp_Mro },
{ 1115, PT_SCX, ucp_Multani },
{ 1123, PT_SCX, ucp_Myanmar },
{ 1131, PT_GC, ucp_N },
{ 1133, PT_SC, ucp_Nabataean },
{ 1143, PT_SC, ucp_Nandinagari },
{ 1133, PT_SCX, ucp_Nabataean },
{ 1143, PT_SCX, ucp_Nandinagari },
{ 1155, PT_PC, ucp_Nd },
{ 1158, PT_SC, ucp_Newa },
{ 1163, PT_SC, ucp_New_Tai_Lue },
{ 1173, PT_SC, ucp_Nko },
{ 1158, PT_SCX, ucp_Newa },
{ 1163, PT_SCX, ucp_New_Tai_Lue },
{ 1173, PT_SCX, ucp_Nko },
{ 1177, PT_PC, ucp_Nl },
{ 1180, PT_PC, ucp_No },
{ 1183, PT_SC, ucp_Nushu },
{ 1189, PT_SC, ucp_Nyiakeng_Puachue_Hmong },
{ 1210, PT_SC, ucp_Ogham },
{ 1216, PT_SC, ucp_Ol_Chiki },
{ 1224, PT_SC, ucp_Old_Hungarian },
{ 1237, PT_SC, ucp_Old_Italic },
{ 1247, PT_SC, ucp_Old_North_Arabian },
{ 1263, PT_SC, ucp_Old_Permic },
{ 1273, PT_SC, ucp_Old_Persian },
{ 1284, PT_SC, ucp_Old_Sogdian },
{ 1295, PT_SC, ucp_Old_South_Arabian },
{ 1311, PT_SC, ucp_Old_Turkic },
{ 1321, PT_SC, ucp_Old_Uyghur },
{ 1331, PT_SC, ucp_Oriya },
{ 1337, PT_SC, ucp_Osage },
{ 1343, PT_SC, ucp_Osmanya },
{ 1183, PT_SCX, ucp_Nushu },
{ 1189, PT_SCX, ucp_Nyiakeng_Puachue_Hmong },
{ 1210, PT_SCX, ucp_Ogham },
{ 1216, PT_SCX, ucp_Ol_Chiki },
{ 1224, PT_SCX, ucp_Old_Hungarian },
{ 1237, PT_SCX, ucp_Old_Italic },
{ 1247, PT_SCX, ucp_Old_North_Arabian },
{ 1263, PT_SCX, ucp_Old_Permic },
{ 1273, PT_SCX, ucp_Old_Persian },
{ 1284, PT_SCX, ucp_Old_Sogdian },
{ 1295, PT_SCX, ucp_Old_South_Arabian },
{ 1311, PT_SCX, ucp_Old_Turkic },
{ 1321, PT_SCX, ucp_Old_Uyghur },
{ 1331, PT_SCX, ucp_Oriya },
{ 1337, PT_SCX, ucp_Osage },
{ 1343, PT_SCX, ucp_Osmanya },
{ 1351, PT_GC, ucp_P },
{ 1353, PT_SC, ucp_Pahawh_Hmong },
{ 1365, PT_SC, ucp_Palmyrene },
{ 1375, PT_SC, ucp_Pau_Cin_Hau },
{ 1353, PT_SCX, ucp_Pahawh_Hmong },
{ 1365, PT_SCX, ucp_Palmyrene },
{ 1375, PT_SCX, ucp_Pau_Cin_Hau },
{ 1385, PT_PC, ucp_Pc },
{ 1388, PT_PC, ucp_Pd },
{ 1391, PT_PC, ucp_Pe },
{ 1394, PT_PC, ucp_Pf },
{ 1397, PT_SC, ucp_Phags_Pa },
{ 1405, PT_SC, ucp_Phoenician },
{ 1397, PT_SCX, ucp_Phags_Pa },
{ 1405, PT_SCX, ucp_Phoenician },
{ 1416, PT_PC, ucp_Pi },
{ 1419, PT_PC, ucp_Po },
{ 1422, PT_PC, ucp_Ps },
{ 1425, PT_SC, ucp_Psalter_Pahlavi },
{ 1440, PT_SC, ucp_Rejang },
{ 1447, PT_SC, ucp_Runic },
{ 1425, PT_SCX, ucp_Psalter_Pahlavi },
{ 1440, PT_SCX, ucp_Rejang },
{ 1447, PT_SCX, ucp_Runic },
{ 1453, PT_GC, ucp_S },
{ 1455, PT_SC, ucp_Samaritan },
{ 1465, PT_SC, ucp_Saurashtra },
{ 1455, PT_SCX, ucp_Samaritan },
{ 1465, PT_SCX, ucp_Saurashtra },
{ 1476, PT_PC, ucp_Sc },
{ 1479, PT_SC, ucp_Sharada },
{ 1487, PT_SC, ucp_Shavian },
{ 1495, PT_SC, ucp_Siddham },
{ 1503, PT_SC, ucp_SignWriting },
{ 1515, PT_SC, ucp_Sinhala },
{ 1479, PT_SCX, ucp_Sharada },
{ 1487, PT_SCX, ucp_Shavian },
{ 1495, PT_SCX, ucp_Siddham },
{ 1503, PT_SCX, ucp_SignWriting },
{ 1515, PT_SCX, ucp_Sinhala },
{ 1523, PT_PC, ucp_Sk },
{ 1526, PT_PC, ucp_Sm },
{ 1529, PT_PC, ucp_So },
{ 1532, PT_SC, ucp_Sogdian },
{ 1540, PT_SC, ucp_Sora_Sompeng },
{ 1552, PT_SC, ucp_Soyombo },
{ 1560, PT_SC, ucp_Sundanese },
{ 1570, PT_SC, ucp_Syloti_Nagri },
{ 1582, PT_SC, ucp_Syriac },
{ 1589, PT_SC, ucp_Tagalog },
{ 1597, PT_SC, ucp_Tagbanwa },
{ 1606, PT_SC, ucp_Tai_Le },
{ 1612, PT_SC, ucp_Tai_Tham },
{ 1620, PT_SC, ucp_Tai_Viet },
{ 1628, PT_SC, ucp_Takri },
{ 1634, PT_SC, ucp_Tamil },
{ 1640, PT_SC, ucp_Tangsa },
{ 1647, PT_SC, ucp_Tangut },
{ 1654, PT_SC, ucp_Telugu },
{ 1661, PT_SC, ucp_Thaana },
{ 1668, PT_SC, ucp_Thai },
{ 1673, PT_SC, ucp_Tibetan },
{ 1681, PT_SC, ucp_Tifinagh },
{ 1690, PT_SC, ucp_Tirhuta },
{ 1698, PT_SC, ucp_Toto },
{ 1703, PT_SC, ucp_Ugaritic },
{ 1712, PT_SC, ucp_Unknown },
{ 1720, PT_SC, ucp_Vai },
{ 1724, PT_SC, ucp_Vithkuqi },
{ 1733, PT_SC, ucp_Wancho },
{ 1740, PT_SC, ucp_Warang_Citi },
{ 1532, PT_SCX, ucp_Sogdian },
{ 1540, PT_SCX, ucp_Sora_Sompeng },
{ 1552, PT_SCX, ucp_Soyombo },
{ 1560, PT_SCX, ucp_Sundanese },
{ 1570, PT_SCX, ucp_Syloti_Nagri },
{ 1582, PT_SCX, ucp_Syriac },
{ 1589, PT_SCX, ucp_Tagalog },
{ 1597, PT_SCX, ucp_Tagbanwa },
{ 1606, PT_SCX, ucp_Tai_Le },
{ 1612, PT_SCX, ucp_Tai_Tham },
{ 1620, PT_SCX, ucp_Tai_Viet },
{ 1628, PT_SCX, ucp_Takri },
{ 1634, PT_SCX, ucp_Tamil },
{ 1640, PT_SCX, ucp_Tangsa },
{ 1647, PT_SCX, ucp_Tangut },
{ 1654, PT_SCX, ucp_Telugu },
{ 1661, PT_SCX, ucp_Thaana },
{ 1668, PT_SCX, ucp_Thai },
{ 1673, PT_SCX, ucp_Tibetan },
{ 1681, PT_SCX, ucp_Tifinagh },
{ 1690, PT_SCX, ucp_Tirhuta },
{ 1698, PT_SCX, ucp_Toto },
{ 1703, PT_SCX, ucp_Ugaritic },
{ 1712, PT_SCX, ucp_Unknown },
{ 1720, PT_SCX, ucp_Vai },
{ 1724, PT_SCX, ucp_Vithkuqi },
{ 1733, PT_SCX, ucp_Wancho },
{ 1740, PT_SCX, ucp_Warang_Citi },
{ 1751, PT_ALNUM, 0 },
{ 1755, PT_PXSPACE, 0 },
{ 1759, PT_SPACE, 0 },
{ 1763, PT_UCNC, 0 },
{ 1767, PT_WORD, 0 },
{ 1771, PT_SC, ucp_Yezidi },
{ 1778, PT_SC, ucp_Yi },
{ 1771, PT_SCX, ucp_Yezidi },
{ 1778, PT_SCX, ucp_Yi },
{ 1781, PT_GC, ucp_Z },
{ 1783, PT_SC, ucp_Zanabazar_Square },
{ 1783, PT_SCX, ucp_Zanabazar_Square },
{ 1799, PT_PC, ucp_Zl },
{ 1802, PT_PC, ucp_Zp },
{ 1805, PT_PC, ucp_Zs }


@ -130,66 +130,65 @@ const uint32_t PRIV(ucd_digit_sets)[] = {
0x1e959, 0x1fbf9,
};
/* This vector is a list of lists of scripts for the Script Extension
property. Each sublist is zero-terminated. */
/* This vector is a list of script bitsets for the Script Extension
property. */
const uint8_t PRIV(ucd_script_sets)[] = {
/* 0 */ 0,
/* 1 */ 1, 11, 0,
/* 4 */ 1, 144, 0,
/* 7 */ 1, 64, 0,
/* 10 */ 1, 50, 0,
/* 13 */ 1, 56, 0,
/* 16 */ 3, 15, 0,
/* 19 */ 4, 23, 0,
/* 22 */ 6, 84, 0,
/* 25 */ 12, 36, 0,
/* 28 */ 13, 18, 0,
/* 31 */ 13, 34, 0,
/* 34 */ 13, 118, 0,
/* 37 */ 13, 50, 0,
/* 40 */ 15, 107, 0,
/* 43 */ 15, 150, 0,
/* 46 */ 15, 100, 0,
/* 49 */ 15, 54, 0,
/* 52 */ 17, 34, 0,
/* 55 */ 107, 54, 0,
/* 58 */ 21, 108, 0,
/* 61 */ 22, 129, 0,
/* 64 */ 23, 34, 0,
/* 67 */ 27, 30, 0,
/* 70 */ 29, 150, 0,
/* 73 */ 34, 38, 0,
/* 76 */ 112, 158, 0,
/* 79 */ 38, 65, 0,
/* 82 */ 1, 50, 56, 0,
/* 86 */ 1, 56, 156, 0,
/* 90 */ 3, 96, 49, 0,
/* 94 */ 96, 39, 53, 0,
/* 98 */ 157, 12, 36, 0,
/* 102 */ 12, 110, 36, 0,
/* 106 */ 15, 107, 29, 0,
/* 110 */ 15, 107, 34, 0,
/* 114 */ 23, 27, 30, 0,
/* 118 */ 69, 34, 39, 0,
/* 122 */ 3, 15, 107, 29, 0,
/* 127 */ 7, 25, 52, 51, 0,
/* 132 */ 15, 142, 85, 111, 0,
/* 137 */ 4, 24, 23, 27, 30, 0,
/* 143 */ 1, 64, 144, 50, 56, 156, 0,
/* 150 */ 4, 24, 23, 27, 30, 61, 0,
/* 157 */ 15, 29, 37, 44, 54, 55, 0,
/* 164 */ 132, 1, 64, 144, 50, 56, 156, 0,
/* 172 */ 3, 15, 107, 29, 150, 44, 55, 124, 0,
/* 181 */ 132, 1, 95, 112, 158, 121, 144, 148, 50, 0,
/* 191 */ 15, 142, 21, 22, 108, 85, 111, 114, 109, 102, 124, 0,
/* 203 */ 3, 15, 107, 21, 22, 29, 34, 37, 44, 54, 55, 124, 0,
/* 216 */ 3, 15, 107, 21, 22, 29, 34, 37, 44, 100, 54, 55, 124, 0,
/* 230 */ 15, 142, 21, 22, 108, 29, 85, 111, 114, 150, 109, 102, 124, 0,
/* 244 */ 15, 142, 21, 22, 108, 29, 85, 111, 37, 114, 150, 109, 102, 124, 0,
/* 259 */ 3, 15, 142, 143, 138, 107, 21, 22, 29, 111, 37, 150, 44, 109, 48, 49, 102, 54, 55, 124, 0,
/* 280 */ 3, 15, 142, 143, 138, 107, 21, 22, 29, 35, 111, 37, 150, 44, 109, 48, 49, 102, 54, 55, 124, 0,
/* 302 */
const uint32_t PRIV(ucd_script_sets)[] = {
0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000802u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000002u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00010000u, 0x00000000u,
0x00000002u, 0x00000000u, 0x00000001u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000002u, 0x00040000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000002u, 0x01000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00008008u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00800010u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000040u, 0x00000000u, 0x00100000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00001000u, 0x00000010u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00042000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00002000u, 0x00000004u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00002000u, 0x00000000u, 0x00000000u, 0x00400000u, 0x00000000u, 0x00000000u,
0x00002000u, 0x00040000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00008000u, 0x00000000u, 0x00000000u, 0x00000800u, 0x00000000u, 0x00000000u,
0x00008000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00400000u, 0x00000000u,
0x00008000u, 0x00000000u, 0x00000000u, 0x00000010u, 0x00000000u, 0x00000000u,
0x00008000u, 0x00400000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00020000u, 0x00000004u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000000u, 0x00400000u, 0x00000000u, 0x00000800u, 0x00000000u, 0x00000000u,
0x00200000u, 0x00000000u, 0x00000000u, 0x00001000u, 0x00000000u, 0x00000000u,
0x00400000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000002u, 0x00000000u,
0x00800000u, 0x00000004u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x48000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x20000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00400000u, 0x00000000u,
0x00000000u, 0x00000044u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000000u, 0x00000000u, 0x00000000u, 0x00010000u, 0x40000000u, 0x00000000u,
0x00000000u, 0x00000040u, 0x00000002u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000002u, 0x01040000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000002u, 0x01000000u, 0x00000000u, 0x00000000u, 0x10000000u, 0x00000000u,
0x00000008u, 0x00020000u, 0x00000000u, 0x00000001u, 0x00000000u, 0x00000000u,
0x00000000u, 0x00200080u, 0x00000000u, 0x00000001u, 0x00000000u, 0x00000000u,
0x00001000u, 0x00000010u, 0x00000000u, 0x00000000u, 0x20000000u, 0x00000000u,
0x00001000u, 0x00000010u, 0x00000000u, 0x00004000u, 0x00000000u, 0x00000000u,
0x20008000u, 0x00000000u, 0x00000000u, 0x00000800u, 0x00000000u, 0x00000000u,
0x00008000u, 0x00000004u, 0x00000000u, 0x00000800u, 0x00000000u, 0x00000000u,
0x48800000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000000u, 0x00000084u, 0x00000020u, 0x00000000u, 0x00000000u, 0x00000000u,
0x20008008u, 0x00000000u, 0x00000000u, 0x00000800u, 0x00000000u, 0x00000000u,
0x02000080u, 0x00180000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00008000u, 0x00000000u, 0x00200000u, 0x00008000u, 0x00004000u, 0x00000000u,
0x49800010u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000002u, 0x01040000u, 0x00000001u, 0x00000000u, 0x10010000u, 0x00000000u,
0x49800010u, 0x20000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x20008000u, 0x00c01020u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
0x00000002u, 0x01040000u, 0x00000001u, 0x00000000u, 0x10010010u, 0x00000000u,
0x20008008u, 0x00801000u, 0x00000000u, 0x10000800u, 0x00400000u, 0x00000000u,
0x00000002u, 0x00040000u, 0x80000000u, 0x02010000u, 0x40110010u, 0x00000000u,
0x00608000u, 0x00000000u, 0x00200000u, 0x1004b040u, 0x00004000u, 0x00000000u,
0x20608008u, 0x00c01024u, 0x00000000u, 0x10000800u, 0x00000000u, 0x00000000u,
0x20608008u, 0x00c01024u, 0x00000000u, 0x10000810u, 0x00000000u, 0x00000000u,
0x20608000u, 0x00000000u, 0x00200000u, 0x1004b040u, 0x00404000u, 0x00000000u,
0x20608000u, 0x00000020u, 0x00200000u, 0x1004b040u, 0x00404000u, 0x00000000u,
0x20608008u, 0x00c31020u, 0x00000000u, 0x1000a840u, 0x0040c400u, 0x00000000u,
0x20608008u, 0x00c31028u, 0x00000000u, 0x1000a840u, 0x0040c400u, 0x00000000u,
};
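Each row of the new table is a six-word bitmap, and the record offsets seem to have moved from byte positions in the old list to word positions in the new array. For example, the old sublist at offset 4 (`1, 144, 0`) corresponds to the row at offset 12. The sketch below decodes a row back into script numbers to check that reading. The `demo_` names are hypothetical, and `demo_sets` copies only the first three rows of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_WORDS_PER_SET 6   /* six 32-bit words per row in the new table */

/* First three rows of the new table, at word offsets 0, 6 and 12. */
static const uint32_t demo_sets[] = {
  0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
  0x00000802u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00000000u,
  0x00000002u, 0x00000000u, 0x00000000u, 0x00000000u, 0x00010000u, 0x00000000u,
};

/* Write the script numbers whose bits are set in map into out[], in
   ascending order, and return how many were found. */
static size_t demo_decode(const uint32_t *map, unsigned int *out,
  size_t outsize)
{
  size_t n = 0;
  for (unsigned int w = 0; w < DEMO_WORDS_PER_SET; w++)
    {
    for (unsigned int b = 0; b < 32; b++)
      {
      if ((map[w] & (1u << b)) != 0)
        {
        assert(n < outsize);
        out[n++] = w*32 + b;
        }
      }
    }
  return n;
}
```

Decoding `demo_sets + 6` yields scripts 1 and 11, and `demo_sets + 12` yields 1 and 144. These match the old sublists at offsets 1 and 4.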
/* These are the main two-stage UCD tables. The fields in each record are:
@ -407,9 +406,9 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 13, 9, 12, 88, 1, 13, 9, 0, }, /* 204 */
{ 13, 5, 12, 88, -1, 13, 9, 0, }, /* 205 */
{ 13, 26, 12, 0, 0, 13, 9, 0, }, /* 206 */
{ 13, 12, 3, 0, 0, -34, 13, 0, }, /* 207 */
{ 13, 12, 3, 0, 0, -28, 13, 0, }, /* 208 */
{ 28, 12, 3, 0, 0, -31, 13, 0, }, /* 209 */
{ 13, 12, 3, 0, 0, -72, 13, 0, }, /* 207 */
{ 13, 12, 3, 0, 0, -60, 13, 0, }, /* 208 */
{ 28, 12, 3, 0, 0, -66, 13, 0, }, /* 209 */
{ 13, 11, 3, 0, 0, 13, 13, 0, }, /* 210 */
{ 13, 9, 12, 0, 15, 13, 9, 0, }, /* 211 */
{ 13, 5, 12, 0, -15, 13, 9, 0, }, /* 212 */
@ -432,19 +431,19 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 1, 25, 12, 0, 0, 1, 0, 0, }, /* 229 */
{ 1, 21, 12, 0, 0, 1, 7, 0, }, /* 230 */
{ 1, 23, 12, 0, 0, 1, 0, 0, }, /* 231 */
{ 10, 21, 12, 0, 0, -143, 4, 0, }, /* 232 */
{ 10, 21, 12, 0, 0, -252, 4, 0, }, /* 232 */
{ 1, 21, 12, 0, 0, 1, 0, 0, }, /* 233 */
{ 1, 26, 12, 0, 0, 1, 14, 0, }, /* 234 */
{ 1, 12, 3, 0, 0, 1, 13, 0, }, /* 235 */
{ 10, 21, 12, 0, 0, -143, 0, 0, }, /* 236 */
{ 1, 1, 2, 0, 0, -82, 128, 0, }, /* 237 */
{ 10, 21, 12, 0, 0, -164, 0, 0, }, /* 238 */
{ 10, 21, 12, 0, 0, -252, 0, 0, }, /* 236 */
{ 1, 1, 2, 0, 0, -168, 128, 0, }, /* 237 */
{ 10, 21, 12, 0, 0, -270, 0, 0, }, /* 238 */
{ 1, 7, 12, 0, 0, 1, 0, 0, }, /* 239 */
{ 10, 6, 12, 0, 0, -181, 0, 0, }, /* 240 */
{ 28, 12, 3, 0, 0, -10, 13, 0, }, /* 241 */
{ 1, 13, 12, 0, 0, -86, 1, 0, }, /* 242 */
{ 10, 6, 12, 0, 0, -282, 0, 0, }, /* 240 */
{ 28, 12, 3, 0, 0, -24, 13, 0, }, /* 241 */
{ 1, 13, 12, 0, 0, -174, 1, 0, }, /* 242 */
{ 1, 21, 12, 0, 0, 1, 1, 0, }, /* 243 */
{ 1, 21, 12, 0, 0, -4, 0, 0, }, /* 244 */
{ 1, 21, 12, 0, 0, -12, 0, 0, }, /* 244 */
{ 1, 6, 12, 0, 0, 1, 0, 0, }, /* 245 */
{ 1, 13, 12, 0, 0, 1, 5, 0, }, /* 246 */
{ 1, 26, 12, 0, 0, 1, 0, 0, }, /* 247 */
@ -473,18 +472,18 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 15, 12, 3, 0, 0, 15, 13, 0, }, /* 270 */
{ 15, 10, 5, 0, 0, 15, 9, 0, }, /* 271 */
{ 15, 7, 12, 0, 0, 15, 9, 0, }, /* 272 */
{ 28, 12, 3, 0, 0, -216, 13, 0, }, /* 273 */
{ 28, 12, 3, 0, 0, -203, 13, 0, }, /* 274 */
{ 10, 21, 12, 0, 0, -259, 9, 0, }, /* 275 */
{ 10, 21, 12, 0, 0, -280, 9, 0, }, /* 276 */
{ 15, 13, 12, 0, 0, -132, 9, 0, }, /* 277 */
{ 28, 12, 3, 0, 0, -300, 13, 0, }, /* 273 */
{ 28, 12, 3, 0, 0, -294, 13, 0, }, /* 274 */
{ 10, 21, 12, 0, 0, -318, 9, 0, }, /* 275 */
{ 10, 21, 12, 0, 0, -324, 9, 0, }, /* 276 */
{ 15, 13, 12, 0, 0, -240, 9, 0, }, /* 277 */
{ 15, 21, 12, 0, 0, 15, 9, 0, }, /* 278 */
{ 15, 6, 12, 0, 0, 15, 9, 0, }, /* 279 */
{ 3, 7, 12, 0, 0, 3, 9, 0, }, /* 280 */
{ 3, 12, 3, 0, 0, 3, 13, 0, }, /* 281 */
{ 3, 10, 5, 0, 0, 3, 9, 0, }, /* 282 */
{ 3, 10, 3, 0, 0, 3, 9, 0, }, /* 283 */
{ 3, 13, 12, 0, 0, -90, 9, 0, }, /* 284 */
{ 3, 13, 12, 0, 0, -180, 9, 0, }, /* 284 */
{ 3, 23, 12, 0, 0, 3, 7, 0, }, /* 285 */
{ 3, 15, 12, 0, 0, 3, 9, 0, }, /* 286 */
{ 3, 26, 12, 0, 0, 3, 9, 0, }, /* 287 */
@ -492,12 +491,12 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 22, 12, 3, 0, 0, 22, 13, 0, }, /* 289 */
{ 22, 10, 5, 0, 0, 22, 9, 0, }, /* 290 */
{ 22, 7, 12, 0, 0, 22, 9, 0, }, /* 291 */
{ 22, 13, 12, 0, 0, -61, 9, 0, }, /* 292 */
{ 22, 13, 12, 0, 0, -126, 9, 0, }, /* 292 */
{ 22, 21, 12, 0, 0, 22, 9, 0, }, /* 293 */
{ 21, 12, 3, 0, 0, 21, 13, 0, }, /* 294 */
{ 21, 10, 5, 0, 0, 21, 9, 0, }, /* 295 */
{ 21, 7, 12, 0, 0, 21, 9, 0, }, /* 296 */
{ 21, 13, 12, 0, 0, -58, 9, 0, }, /* 297 */
{ 21, 13, 12, 0, 0, -120, 9, 0, }, /* 297 */
{ 21, 21, 12, 0, 0, 21, 9, 0, }, /* 298 */
{ 21, 23, 12, 0, 0, 21, 7, 0, }, /* 299 */
{ 44, 12, 3, 0, 0, 44, 13, 0, }, /* 300 */
@ -511,9 +510,9 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 54, 7, 12, 0, 0, 54, 9, 0, }, /* 308 */
{ 54, 10, 3, 0, 0, 54, 9, 0, }, /* 309 */
{ 54, 10, 5, 0, 0, 54, 9, 0, }, /* 310 */
{ 54, 13, 12, 0, 0, -55, 9, 0, }, /* 311 */
{ 54, 15, 12, 0, 0, -55, 9, 0, }, /* 312 */
{ 54, 26, 12, 0, 0, -55, 14, 0, }, /* 313 */
{ 54, 13, 12, 0, 0, -114, 9, 0, }, /* 311 */
{ 54, 15, 12, 0, 0, -114, 9, 0, }, /* 312 */
{ 54, 26, 12, 0, 0, -114, 14, 0, }, /* 313 */
{ 54, 26, 12, 0, 0, 54, 14, 0, }, /* 314 */
{ 54, 23, 12, 0, 0, 54, 7, 0, }, /* 315 */
{ 55, 12, 3, 0, 0, 55, 13, 0, }, /* 316 */
@ -529,7 +528,7 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 29, 21, 12, 0, 0, 29, 9, 0, }, /* 326 */
{ 29, 12, 3, 0, 0, 29, 9, 0, }, /* 327 */
{ 29, 10, 3, 0, 0, 29, 9, 0, }, /* 328 */
{ 29, 13, 12, 0, 0, -70, 9, 0, }, /* 329 */
{ 29, 13, 12, 0, 0, -144, 9, 0, }, /* 329 */
{ 37, 12, 3, 0, 0, 37, 13, 0, }, /* 330 */
{ 37, 10, 5, 0, 0, 37, 9, 0, }, /* 331 */
{ 37, 7, 12, 0, 0, 37, 9, 0, }, /* 332 */
@ -569,13 +568,13 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 39, 10, 12, 0, 0, 39, 9, 0, }, /* 366 */
{ 39, 12, 3, 0, 0, 39, 13, 0, }, /* 367 */
{ 39, 10, 5, 0, 0, 39, 9, 0, }, /* 368 */
{ 39, 13, 12, 0, 0, -94, 9, 0, }, /* 369 */
{ 39, 13, 12, 0, 0, -186, 9, 0, }, /* 369 */
{ 39, 21, 12, 0, 0, 39, 9, 0, }, /* 370 */
{ 39, 13, 12, 0, 0, 39, 9, 0, }, /* 371 */
{ 39, 26, 12, 0, 0, 39, 9, 0, }, /* 372 */
{ 17, 9, 12, 0, 7264, 17, 9, 0, }, /* 373 */
{ 17, 5, 12, 0, 3008, 17, 9, 0, }, /* 374 */
{ 10, 21, 12, 0, 0, -52, 9, 0, }, /* 375 */
{ 10, 21, 12, 0, 0, -108, 9, 0, }, /* 375 */
{ 17, 6, 12, 0, 0, 17, 9, 0, }, /* 376 */
{ 24, 7, 6, 0, 0, 24, 9, 0, }, /* 377 */
{ 24, 7, 7, 0, 0, 24, 9, 0, }, /* 378 */
@ -605,7 +604,7 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 25, 7, 12, 0, 0, 25, 9, 0, }, /* 402 */
{ 25, 12, 3, 0, 0, 25, 13, 0, }, /* 403 */
{ 25, 10, 5, 0, 0, 25, 9, 0, }, /* 404 */
{ 10, 21, 12, 0, 0, -127, 9, 0, }, /* 405 */
{ 10, 21, 12, 0, 0, -234, 9, 0, }, /* 405 */
{ 7, 7, 12, 0, 0, 7, 9, 0, }, /* 406 */
{ 7, 12, 3, 0, 0, 7, 13, 0, }, /* 407 */
{ 52, 7, 12, 0, 0, 52, 9, 0, }, /* 408 */
@ -619,7 +618,7 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 32, 13, 12, 0, 0, 32, 9, 0, }, /* 416 */
{ 32, 15, 12, 0, 0, 32, 14, 0, }, /* 417 */
{ 38, 21, 12, 0, 0, 38, 14, 0, }, /* 418 */
{ 10, 21, 12, 0, 0, -79, 14, 0, }, /* 419 */
{ 10, 21, 12, 0, 0, -162, 14, 0, }, /* 419 */
{ 38, 17, 12, 0, 0, 38, 14, 0, }, /* 420 */
{ 38, 12, 3, 0, 0, 38, 13, 0, }, /* 421 */
{ 38, 1, 2, 0, 0, 38, 3, 0, }, /* 422 */
@ -685,28 +684,28 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 13, 5, 12, 108, 35267, 13, 9, 0, }, /* 482 */
{ 17, 9, 12, 0, -3008, 17, 9, 0, }, /* 483 */
{ 76, 21, 12, 0, 0, 76, 9, 0, }, /* 484 */
{ 28, 12, 3, 0, 0, -122, 13, 0, }, /* 485 */
{ 28, 12, 3, 0, 0, -228, 13, 0, }, /* 485 */
{ 28, 12, 3, 0, 0, 15, 13, 0, }, /* 486 */
{ 10, 21, 12, 0, 0, -40, 9, 0, }, /* 487 */
{ 28, 12, 3, 0, 0, -16, 13, 0, }, /* 488 */
{ 28, 12, 3, 0, 0, -46, 13, 0, }, /* 489 */
{ 28, 12, 3, 0, 0, -157, 13, 0, }, /* 490 */
{ 10, 10, 5, 0, 0, -16, 9, 0, }, /* 491 */
{ 10, 7, 12, 0, 0, -43, 9, 0, }, /* 492 */
{ 10, 7, 12, 0, 0, -16, 9, 0, }, /* 493 */
{ 10, 21, 12, 0, 0, -84, 9, 0, }, /* 487 */
{ 28, 12, 3, 0, 0, -36, 13, 0, }, /* 488 */
{ 28, 12, 3, 0, 0, -96, 13, 0, }, /* 489 */
{ 28, 12, 3, 0, 0, -264, 13, 0, }, /* 490 */
{ 10, 10, 5, 0, 0, -36, 9, 0, }, /* 491 */
{ 10, 7, 12, 0, 0, -90, 9, 0, }, /* 492 */
{ 10, 7, 12, 0, 0, -36, 9, 0, }, /* 493 */
{ 10, 7, 12, 0, 0, 15, 9, 0, }, /* 494 */
{ 10, 7, 12, 0, 0, -172, 9, 0, }, /* 495 */
{ 10, 7, 12, 0, 0, -40, 9, 0, }, /* 496 */
{ 28, 12, 3, 0, 0, -106, 13, 0, }, /* 497 */
{ 10, 7, 12, 0, 0, -276, 9, 0, }, /* 495 */
{ 10, 7, 12, 0, 0, -84, 9, 0, }, /* 496 */
{ 28, 12, 3, 0, 0, -204, 13, 0, }, /* 497 */
{ 10, 10, 5, 0, 0, 3, 9, 0, }, /* 498 */
{ 28, 12, 3, 0, 0, -40, 13, 0, }, /* 499 */
{ 28, 12, 3, 0, 0, -84, 13, 0, }, /* 499 */
{ 10, 7, 12, 0, 0, 150, 9, 0, }, /* 500 */
{ 13, 5, 12, 0, 0, 13, 9, 0, }, /* 501 */
{ 13, 6, 12, 0, 0, 13, 9, 0, }, /* 502 */
{ 34, 5, 12, 0, 35332, 34, 9, 0, }, /* 503 */
{ 34, 5, 12, 0, 3814, 34, 9, 0, }, /* 504 */
{ 34, 5, 12, 0, 35384, 34, 9, 0, }, /* 505 */
{ 28, 12, 3, 0, 0, -37, 13, 0, }, /* 506 */
{ 28, 12, 3, 0, 0, -78, 13, 0, }, /* 506 */
{ 28, 12, 3, 0, 0, 50, 13, 0, }, /* 507 */
{ 34, 9, 12, 92, 1, 34, 9, 0, }, /* 508 */
{ 34, 5, 12, 92, -1, 34, 9, 0, }, /* 509 */
@ -742,7 +741,7 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 10, 1, 2, 0, 0, 10, 143, 0, }, /* 539 */
{ 10, 1, 2, 0, 0, 10, 140, 0, }, /* 540 */
{ 10, 1, 2, 0, 0, 10, 148, 0, }, /* 541 */
{ 10, 29, 12, 0, 0, -73, 4, 0, }, /* 542 */
{ 10, 29, 12, 0, 0, -150, 4, 0, }, /* 542 */
{ 10, 21, 14, 0, 0, 10, 14, 0, }, /* 543 */
{ 10, 25, 12, 0, 0, 10, 4, 0, }, /* 544 */
{ 0, 2, 2, 0, 0, 0, 3, 0, }, /* 545 */
@ -751,7 +750,7 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 10, 1, 2, 0, 0, 10, 136, 0, }, /* 548 */
{ 10, 1, 2, 0, 0, 10, 144, 0, }, /* 549 */
{ 0, 2, 12, 0, 0, 0, 7, 0, }, /* 550 */
{ 28, 12, 3, 0, 0, -110, 13, 0, }, /* 551 */
{ 28, 12, 3, 0, 0, -210, 13, 0, }, /* 551 */
{ 10, 9, 12, 0, 0, 10, 9, 0, }, /* 552 */
{ 10, 5, 12, 0, 0, 10, 9, 0, }, /* 553 */
{ 20, 9, 12, 96, -7517, 20, 9, 0, }, /* 554 */
@ -793,31 +792,31 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 59, 21, 12, 0, 0, 59, 9, 0, }, /* 590 */
{ 59, 12, 3, 0, 0, 59, 13, 0, }, /* 591 */
{ 13, 12, 3, 0, 0, 13, 13, 0, }, /* 592 */
{ 10, 21, 12, 0, 0, -28, 14, 0, }, /* 593 */
{ 10, 21, 12, 0, 0, -60, 14, 0, }, /* 593 */
{ 23, 26, 12, 0, 0, 23, 14, 0, }, /* 594 */
{ 10, 21, 12, 0, 0, -150, 14, 0, }, /* 595 */
{ 10, 21, 12, 0, 0, -137, 14, 0, }, /* 596 */
{ 10, 21, 12, 0, 0, -258, 14, 0, }, /* 595 */
{ 10, 21, 12, 0, 0, -246, 14, 0, }, /* 596 */
{ 23, 6, 12, 0, 0, 23, 9, 0, }, /* 597 */
{ 10, 7, 12, 0, 0, 23, 9, 0, }, /* 598 */
{ 23, 14, 12, 0, 0, 23, 9, 0, }, /* 599 */
{ 10, 22, 12, 0, 0, -150, 14, 0, }, /* 600 */
{ 10, 18, 12, 0, 0, -150, 14, 0, }, /* 601 */
{ 10, 26, 12, 0, 0, -137, 14, 0, }, /* 602 */
{ 10, 17, 12, 0, 0, -137, 14, 0, }, /* 603 */
{ 10, 22, 12, 0, 0, -137, 14, 0, }, /* 604 */
{ 10, 18, 12, 0, 0, -137, 14, 0, }, /* 605 */
{ 28, 12, 3, 0, 0, -19, 13, 0, }, /* 606 */
{ 10, 22, 12, 0, 0, -258, 14, 0, }, /* 600 */
{ 10, 18, 12, 0, 0, -258, 14, 0, }, /* 601 */
{ 10, 26, 12, 0, 0, -246, 14, 0, }, /* 602 */
{ 10, 17, 12, 0, 0, -246, 14, 0, }, /* 603 */
{ 10, 22, 12, 0, 0, -246, 14, 0, }, /* 604 */
{ 10, 18, 12, 0, 0, -246, 14, 0, }, /* 605 */
{ 28, 12, 3, 0, 0, -42, 13, 0, }, /* 606 */
{ 24, 10, 3, 0, 0, 24, 9, 0, }, /* 607 */
{ 10, 17, 14, 0, 0, -137, 14, 0, }, /* 608 */
{ 10, 6, 12, 0, 0, -67, 9, 0, }, /* 609 */
{ 10, 7, 12, 0, 0, -114, 9, 0, }, /* 610 */
{ 10, 21, 14, 0, 0, -114, 14, 0, }, /* 611 */
{ 10, 17, 14, 0, 0, -246, 14, 0, }, /* 608 */
{ 10, 6, 12, 0, 0, -138, 9, 0, }, /* 609 */
{ 10, 7, 12, 0, 0, -216, 9, 0, }, /* 610 */
{ 10, 21, 14, 0, 0, -216, 14, 0, }, /* 611 */
{ 10, 26, 12, 0, 0, 23, 14, 0, }, /* 612 */
{ 27, 7, 12, 0, 0, 27, 9, 0, }, /* 613 */
{ 28, 12, 3, 0, 0, -67, 13, 0, }, /* 614 */
{ 10, 24, 12, 0, 0, -67, 14, 0, }, /* 615 */
{ 28, 12, 3, 0, 0, -138, 13, 0, }, /* 614 */
{ 10, 24, 12, 0, 0, -138, 14, 0, }, /* 615 */
{ 27, 6, 12, 0, 0, 27, 9, 0, }, /* 616 */
{ 10, 17, 12, 0, 0, -67, 14, 0, }, /* 617 */
{ 10, 17, 12, 0, 0, -138, 14, 0, }, /* 617 */
{ 30, 7, 12, 0, 0, 30, 9, 0, }, /* 618 */
{ 30, 6, 12, 0, 0, 30, 9, 0, }, /* 619 */
{ 4, 7, 12, 0, 0, 4, 9, 0, }, /* 620 */
@ -849,7 +848,7 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 79, 14, 12, 0, 0, 79, 9, 0, }, /* 646 */
{ 79, 12, 3, 0, 0, 79, 13, 0, }, /* 647 */
{ 79, 21, 12, 0, 0, 79, 9, 0, }, /* 648 */
{ 10, 24, 12, 0, 0, -64, 14, 0, }, /* 649 */
{ 10, 24, 12, 0, 0, -132, 14, 0, }, /* 649 */
{ 34, 9, 12, 0, -35332, 34, 9, 0, }, /* 650 */
{ 10, 24, 12, 0, 0, 10, 9, 0, }, /* 651 */
{ 34, 9, 12, 0, -42280, 34, 9, 0, }, /* 652 */
@ -869,11 +868,11 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 49, 12, 3, 0, 0, 49, 13, 0, }, /* 666 */
{ 49, 10, 5, 0, 0, 49, 9, 0, }, /* 667 */
{ 49, 26, 12, 0, 0, 49, 14, 0, }, /* 668 */
{ 10, 15, 12, 0, 0, -244, 9, 0, }, /* 669 */
{ 10, 15, 12, 0, 0, -230, 9, 0, }, /* 670 */
{ 10, 26, 12, 0, 0, -191, 9, 0, }, /* 671 */
{ 10, 23, 12, 0, 0, -191, 7, 0, }, /* 672 */
{ 10, 26, 12, 0, 0, -191, 7, 0, }, /* 673 */
{ 10, 15, 12, 0, 0, -312, 9, 0, }, /* 669 */
{ 10, 15, 12, 0, 0, -306, 9, 0, }, /* 670 */
{ 10, 26, 12, 0, 0, -288, 9, 0, }, /* 671 */
{ 10, 23, 12, 0, 0, -288, 7, 0, }, /* 672 */
{ 10, 26, 12, 0, 0, -288, 7, 0, }, /* 673 */
{ 65, 7, 12, 0, 0, 65, 9, 0, }, /* 674 */
{ 65, 21, 12, 0, 0, 65, 14, 0, }, /* 675 */
{ 75, 10, 5, 0, 0, 75, 9, 0, }, /* 676 */
@ -881,12 +880,12 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 75, 12, 3, 0, 0, 75, 13, 0, }, /* 678 */
{ 75, 21, 12, 0, 0, 75, 9, 0, }, /* 679 */
{ 75, 13, 12, 0, 0, 75, 9, 0, }, /* 680 */
{ 15, 12, 3, 0, 0, -16, 13, 0, }, /* 681 */
{ 15, 7, 12, 0, 0, -49, 9, 0, }, /* 682 */
{ 15, 12, 3, 0, 0, -36, 13, 0, }, /* 681 */
{ 15, 7, 12, 0, 0, -102, 9, 0, }, /* 682 */
{ 69, 13, 12, 0, 0, 69, 9, 0, }, /* 683 */
{ 69, 7, 12, 0, 0, 69, 9, 0, }, /* 684 */
{ 69, 12, 3, 0, 0, 69, 13, 0, }, /* 685 */
{ 10, 21, 12, 0, 0, -118, 9, 0, }, /* 686 */
{ 10, 21, 12, 0, 0, -222, 9, 0, }, /* 686 */
{ 69, 21, 12, 0, 0, 69, 9, 0, }, /* 687 */
{ 74, 7, 12, 0, 0, 74, 9, 0, }, /* 688 */
{ 74, 12, 3, 0, 0, 74, 13, 0, }, /* 689 */
@ -896,7 +895,7 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 84, 10, 5, 0, 0, 84, 9, 0, }, /* 693 */
{ 84, 7, 12, 0, 0, 84, 9, 0, }, /* 694 */
{ 84, 21, 12, 0, 0, 84, 9, 0, }, /* 695 */
{ 10, 6, 12, 0, 0, -22, 9, 0, }, /* 696 */
{ 10, 6, 12, 0, 0, -48, 9, 0, }, /* 696 */
{ 84, 13, 12, 0, 0, 84, 9, 0, }, /* 697 */
{ 39, 6, 12, 0, 0, 39, 9, 0, }, /* 698 */
{ 68, 7, 12, 0, 0, 68, 9, 0, }, /* 699 */
@ -921,27 +920,27 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 0, 4, 12, 0, 0, 0, 9, 0, }, /* 718 */
{ 0, 3, 12, 0, 0, 0, 9, 0, }, /* 719 */
{ 26, 25, 12, 0, 0, 26, 6, 0, }, /* 720 */
{ 10, 18, 12, 0, 0, -7, 14, 0, }, /* 721 */
{ 10, 22, 12, 0, 0, -7, 14, 0, }, /* 722 */
{ 10, 18, 12, 0, 0, -18, 14, 0, }, /* 721 */
{ 10, 22, 12, 0, 0, -18, 14, 0, }, /* 722 */
{ 0, 2, 12, 0, 0, 0, 3, 0, }, /* 723 */
{ 1, 7, 12, 0, 0, -13, 0, 0, }, /* 724 */
{ 1, 26, 12, 0, 0, -13, 14, 0, }, /* 725 */
{ 10, 6, 3, 0, 0, -67, 9, 0, }, /* 726 */
{ 1, 7, 12, 0, 0, -30, 0, 0, }, /* 724 */
{ 1, 26, 12, 0, 0, -30, 14, 0, }, /* 725 */
{ 10, 6, 3, 0, 0, -138, 9, 0, }, /* 726 */
{ 10, 1, 2, 0, 0, 10, 14, 0, }, /* 727 */
{ 36, 7, 12, 0, 0, 36, 9, 0, }, /* 728 */
{ 10, 21, 12, 0, 0, -98, 9, 0, }, /* 729 */
{ 10, 21, 12, 0, 0, -98, 14, 0, }, /* 730 */
{ 10, 21, 12, 0, 0, -25, 9, 0, }, /* 731 */
{ 10, 15, 12, 0, 0, -102, 9, 0, }, /* 732 */
{ 10, 26, 12, 0, 0, -25, 9, 0, }, /* 733 */
{ 10, 21, 12, 0, 0, -192, 9, 0, }, /* 729 */
{ 10, 21, 12, 0, 0, -192, 14, 0, }, /* 730 */
{ 10, 21, 12, 0, 0, -54, 9, 0, }, /* 731 */
{ 10, 15, 12, 0, 0, -198, 9, 0, }, /* 732 */
{ 10, 26, 12, 0, 0, -54, 9, 0, }, /* 733 */
{ 20, 14, 12, 0, 0, 20, 14, 0, }, /* 734 */
{ 20, 15, 12, 0, 0, 20, 14, 0, }, /* 735 */
{ 20, 26, 12, 0, 0, 20, 14, 0, }, /* 736 */
{ 20, 26, 12, 0, 0, 20, 9, 0, }, /* 737 */
{ 71, 7, 12, 0, 0, 71, 9, 0, }, /* 738 */
{ 67, 7, 12, 0, 0, 67, 9, 0, }, /* 739 */
{ 28, 12, 3, 0, 0, -1, 13, 0, }, /* 740 */
{ 10, 15, 12, 0, 0, -1, 5, 0, }, /* 741 */
{ 28, 12, 3, 0, 0, -6, 13, 0, }, /* 740 */
{ 10, 15, 12, 0, 0, -6, 5, 0, }, /* 741 */
{ 42, 7, 12, 0, 0, 42, 9, 0, }, /* 742 */
{ 42, 15, 12, 0, 0, 42, 9, 0, }, /* 743 */
{ 19, 7, 12, 0, 0, 19, 9, 0, }, /* 744 */
@ -999,7 +998,7 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 112, 12, 3, 0, 0, 112, 13, 0, }, /* 796 */
{ 112, 15, 12, 0, 0, 112, 17, 0, }, /* 797 */
{ 112, 21, 12, 0, 0, 112, 17, 0, }, /* 798 */
{ 112, 21, 12, 0, 0, -76, 17, 0, }, /* 799 */
{ 112, 21, 12, 0, 0, -156, 17, 0, }, /* 799 */
{ 78, 7, 12, 0, 0, 78, 17, 0, }, /* 800 */
{ 78, 21, 12, 0, 0, 78, 14, 0, }, /* 801 */
{ 83, 7, 12, 0, 0, 83, 17, 0, }, /* 802 */
@ -1071,11 +1070,11 @@ const ucd_record PRIV(ucd_records)[] = { /* 12576 bytes, record size 12 */
{ 109, 10, 5, 0, 0, 109, 9, 0, }, /* 868 */
{ 109, 13, 12, 0, 0, 109, 9, 0, }, /* 869 */
{ 107, 12, 3, 0, 0, 107, 13, 0, }, /* 870 */
{ 107, 12, 3, 0, 0, -55, 13, 0, }, /* 871 */
{ 107, 12, 3, 0, 0, -114, 13, 0, }, /* 871 */
{ 107, 10, 5, 0, 0, 107, 9, 0, }, /* 872 */
{ 107, 10, 5, 0, 0, -55, 9, 0, }, /* 873 */
{ 107, 10, 5, 0, 0, -114, 9, 0, }, /* 873 */
{ 107, 7, 12, 0, 0, 107, 9, 0, }, /* 874 */
{ 28, 12, 3, 0, 0, -55, 13, 0, }, /* 875 */
{ 28, 12, 3, 0, 0, -114, 13, 0, }, /* 875 */
{ 107, 10, 3, 0, 0, 107, 9, 0, }, /* 876 */
{ 135, 7, 12, 0, 0, 135, 9, 0, }, /* 877 */
{ 135, 10, 5, 0, 0, 135, 9, 0, }, /* 878 */

View File

@ -325,7 +325,10 @@ enum {
ucp_Old_Uyghur,
ucp_Tangsa,
ucp_Toto,
ucp_Vithkuqi
ucp_Vithkuqi,
/* This must be last */
ucp_Script_Count
};
#endif /* PCRE2_UCP_H_IDEMPOTENT_GUARD */
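
A trailing count enumerator such as ucp_Script_Count is the usual way to size a per-script bitmap at compile time. The following is only a minimal sketch of that idea, not code from this commit: the demo_ names, the four-entry script list, and the 32-bit word layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, shortened script list; this is not the real ucp_ enum. */
enum {
  demo_Unknown,
  demo_Han,
  demo_Hiragana,
  demo_Katakana,
  /* This must be last */
  demo_Script_Count
};

/* Number of 32-bit words needed to hold one bit per script (assumed layout). */
#define DEMO_SCRIPT_WORDS ((demo_Script_Count + 31) / 32)

typedef struct {
  uint32_t bits[DEMO_SCRIPT_WORDS];
} demo_script_set;

/* Mark a script as a member of the set. */
static void demo_set_add(demo_script_set *set, unsigned script)
{
  assert(script < (unsigned)demo_Script_Count);
  set->bits[script / 32] |= (uint32_t)1 << (script % 32);
}

/* Return 1 if the script is in the set, otherwise 0. */
static int demo_set_has(const demo_script_set *set, unsigned script)
{
  assert(script < (unsigned)demo_Script_Count);
  return (int)((set->bits[script / 32] >> (script % 32)) & 1u);
}
```

Because the count is the last enumerator, adding a script before it resizes DEMO_SCRIPT_WORDS automatically, which is presumably why the comment insists it must stay last.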

View File

@ -134,7 +134,9 @@ while ((t = *data++) != XCL_END)
else /* XCL_PROP & XCL_NOTPROP */
{
const ucd_record *prop = GET_UCD(c);
int scriptx;
BOOL isprop = t == XCL_PROP;
BOOL ok;
switch(*data)
{
@ -160,6 +162,14 @@ while ((t = *data++) != XCL_END)
if ((data[1] == prop->script) == isprop) return !negated;
break;
case PT_SCX:
scriptx = prop->scriptx;
ok = data[1] == prop->script || data[1] == (PCRE2_UCHAR)scriptx;
if (!ok && scriptx < 0)
ok = MAPBIT(PRIV(ucd_script_sets) - scriptx, data[1]);
if (ok == isprop) return !negated;
break;
case PT_ALNUM:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
PRIV(ucp_gentype)[prop->chartype] == ucp_N) == isprop)
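
The PT_SCX case above can be read as: accept if the wanted script equals the character's Script value, or equals a non-negative scriptx value, or, when scriptx is negative, is set in the bitmap found at that negated offset. The sketch below restates this with toy data. Everything prefixed demo_ is hypothetical, the bit macro is an assumed stand-in for MAPBIT, the script set given to U+3001 is abbreviated for illustration, and the explicit sign check before comparing scriptx is an addition in this sketch rather than something taken from the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for MAPBIT: test bit n in an array of 32-bit words. */
#define DEMO_MAPBIT(map, n) ((map)[(n) / 32] & ((uint32_t)1 << ((n) % 32)))

/* Hypothetical script numbers, not the real ucp_ values. */
enum { demo_Common, demo_Han, demo_Hiragana, demo_Katakana, demo_Inherited };

typedef struct {
  uint8_t script;   /* the character's single Script value */
  int16_t scriptx;  /* >= 0: a script number; < 0: negated offset into demo_script_sets */
} demo_record;

/* Word 0 is an unused placeholder so that offset 1 is the first real set. */
static const uint32_t demo_script_sets[] = {
  0,
  (1u << demo_Han) | (1u << demo_Hiragana) | (1u << demo_Katakana)
};

/* Toy records: a Katakana letter, and a Common character shared by several scripts. */
static const demo_record demo_katakana_a = { demo_Katakana, demo_Katakana };
static const demo_record demo_ideographic_comma = { demo_Common, -1 };

/* \p{sc:...} style test: only the Script value counts. */
static int demo_match_sc(const demo_record *prop, unsigned wanted)
{
  return wanted == prop->script;
}

/* \p{scx:...} style test: Script value, single extension, or bitmap lookup. */
static int demo_match_scx(const demo_record *prop, unsigned wanted)
{
  int scriptx = prop->scriptx;
  int ok;
  assert(wanted < 32);  /* the toy sets are one word wide */
  ok = wanted == prop->script || (scriptx >= 0 && wanted == (unsigned)scriptx);
  if (!ok && scriptx < 0)
    ok = DEMO_MAPBIT(demo_script_sets - scriptx, wanted) != 0;
  return ok;
}
```

With these toy records, demo_match_sc rejects the Common character for Katakana while demo_match_scx accepts it, which is the same distinction the \p{sc:katakana} and \p{scx:katakana} tests in testinput7 exercise with \x{3001}.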

598
testdata/testinput4 vendored

File diff suppressed because it is too large

24
testdata/testinput5 vendored

@ -1337,6 +1337,8 @@
# These scripts weren't yet in Perl when I added Unicode 6.0.0 to PCRE
#subject no_jit
/^[\p{Batak}]/utf
\x{1bc0}
\x{1bff}
@ -1356,6 +1358,8 @@
\x{85c}
\x{85d}
#subject -no_jit
/(\X*)(.)/s,utf
A\x{300}
@ -2035,6 +2039,8 @@
# doesn't recognize all these scripts. In time these three tests can be moved
# to test 4.
#subject no_jit
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
@ -2043,7 +2049,7 @@
/^\x{1E900}\x{104B0}/i,utf
\x{1E900}\x{104B0}
\x{1E922}\x{104D8}
/^(?:(\X)(?C))+$/utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
@ -2092,6 +2098,8 @@
\x{655}
\x{1D1AA}
#subject -no_jit
/\N{U+}/
/\N{U+}/utf
@ -2192,4 +2200,18 @@
/\p{bidi_control}+\p{L&}/B
/\p{han}/B
/\p{script:han}/B
/\p{sc:han}/B
/\p{script extensions:han}/B
/\p{scx:han}/B
# Test error - invalid script name
/\p{sc:L}/
# End of testinput5

28
testdata/testinput7 vendored

@ -2203,4 +2203,32 @@
# -----------------------------------------------------------------------------
/\p{katakana}/utf
\x{30a1}
\x{3001}
/\p{scx:katakana}/utf
\x{30a1}
\x{3001}
/\p{script extensions:katakana}/utf
\x{30a1}
\x{3001}
/\p{sc:katakana}/utf
\x{30a1}
\= Expect no match
\x{3001}
/\p{script:katakana}/utf
\x{30a1}
\= Expect no match
\x{3001}
/\p{sc:katakana}{3,}/utf
\x{30a1}\x{30fa}\x{32d0}\x{1b122}\x{ff66}\x{3001}ABC
/\p{sc:katakana}{3,}?/utf
\x{30a1}\x{30fa}\x{32d0}\x{1b122}\x{ff66}\x{3001}ABC
# End of testinput7

610
testdata/testoutput4 vendored

File diff suppressed because it is too large

55
testdata/testoutput5 vendored

@ -2842,6 +2842,8 @@ No match
# These scripts weren't yet in Perl when I added Unicode 6.0.0 to PCRE
#subject no_jit
/^[\p{Batak}]/utf
\x{1bc0}
0: \x{1bc0}
@ -2871,6 +2873,8 @@ No match
\x{85d}
No match
#subject -no_jit
/(\X*)(.)/s,utf
A\x{300}
0: A
@ -4599,6 +4603,8 @@ No match
# doesn't recognize all these scripts. In time these three tests can be moved
# to test 4.
#subject no_jit
/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
(\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
(\p{Zanabazar_Square}+)/x,utf
@ -4620,7 +4626,7 @@ No match
0: \x{1e900}\x{104b0}
\x{1E922}\x{104D8}
0: \x{1e922}\x{104d8}
/^(?:(\X)(?C))+$/utf
\x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where
Callout 0: last capture = 1
@ -4755,6 +4761,8 @@ No match
\x{1D1AA}
0: \x{1d1aa}
#subject -no_jit
/\N{U+}/
Failed: error 193 at offset 2: \N{U+dddd} is supported only in Unicode (UTF) mode
@ -4967,4 +4975,49 @@ Subject length lower bound = 3
End
------------------------------------------------------------------
/\p{han}/B
------------------------------------------------------------------
Bra
prop Han
Ket
End
------------------------------------------------------------------
/\p{script:han}/B
------------------------------------------------------------------
Bra
prop script:Han
Ket
End
------------------------------------------------------------------
/\p{sc:han}/B
------------------------------------------------------------------
Bra
prop script:Han
Ket
End
------------------------------------------------------------------
/\p{script extensions:han}/B
------------------------------------------------------------------
Bra
prop Han
Ket
End
------------------------------------------------------------------
/\p{scx:han}/B
------------------------------------------------------------------
Bra
prop Han
Ket
End
------------------------------------------------------------------
# Test error - invalid script name
/\p{sc:L}/
Failed: error 147 at offset 8: unknown property after \P or \p
# End of testinput5

42
testdata/testoutput7 vendored

@ -3714,4 +3714,46 @@ No match
# -----------------------------------------------------------------------------
/\p{katakana}/utf
\x{30a1}
0: \x{30a1}
\x{3001}
0: \x{3001}
/\p{scx:katakana}/utf
\x{30a1}
0: \x{30a1}
\x{3001}
0: \x{3001}
/\p{script extensions:katakana}/utf
\x{30a1}
0: \x{30a1}
\x{3001}
0: \x{3001}
/\p{sc:katakana}/utf
\x{30a1}
0: \x{30a1}
\= Expect no match
\x{3001}
No match
/\p{script:katakana}/utf
\x{30a1}
0: \x{30a1}
\= Expect no match
\x{3001}
No match
/\p{sc:katakana}{3,}/utf
\x{30a1}\x{30fa}\x{32d0}\x{1b122}\x{ff66}\x{3001}ABC
0: \x{30a1}\x{30fa}\x{32d0}\x{1b122}\x{ff66}
/\p{sc:katakana}{3,}?/utf
\x{30a1}\x{30fa}\x{32d0}\x{1b122}\x{ff66}\x{3001}ABC
0: \x{30a1}\x{30fa}\x{32d0}\x{1b122}\x{ff66}
1: \x{30a1}\x{30fa}\x{32d0}\x{1b122}
2: \x{30a1}\x{30fa}\x{32d0}
# End of testinput7