harfbuzz/src/gen-emoji-table.py

#!/usr/bin/env python3

"""usage: ./gen-emoji-table.py emoji-data.txt emoji-test.txt

Input file:
* https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt
* https://www.unicode.org/Public/emoji/latest/emoji-test.txt
"""

import sys
from collections import OrderedDict
import packTab

if len (sys.argv) != 3:
	sys.exit (__doc__)

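# emoji-data.txt contains non-ASCII characters, so read it as UTF-8
# regardless of the platform's default encoding.
with open(sys.argv[1], encoding='utf-8') as f:
	# The first 10 lines are echoed into the generated file's header comment.
	header = [f.readline () for _ in range(10)]
	lines = f.readlines()

# Maps each property name to a list of inclusive (start, end) ranges.
ranges = OrderedDict()
for line in lines: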
	line = line.strip()
	if not line or line[0] == '#':
		continue
	# A data line looks like "START[..END] ; Property # comment".
	rang, typ = [s.strip() for s in line.split('#')[0].split(';')[:2]]

	rang = [int(s, 16) for s in rang.split('..')]
	if len(rang) > 1:
		start, end = rang
	else:
		start = end = rang[0]

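	if typ not in ranges:
		ranges[typ] = []
	# Extend the previous range of this property when the new one is
	# directly adjacent to it; otherwise start a new range.
	if ranges[typ] and ranges[typ][-1][1] == start - 1:
		ranges[typ][-1] = (ranges[typ][-1][0], end)
	else:
		ranges[typ].append((start, end))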
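
# Illustrative sketch only, not part of the generator's logic: the adjacency
# merge performed in the loop above, restated as a standalone helper so it can
# be checked in isolation. The name `_merge_adjacent_sketch` is hypothetical
# and nothing in this script calls it. Like the loop above, it assumes the
# ranges arrive sorted by start and only merges ranges that touch exactly.
def _merge_adjacent_sketch(pairs):
	merged = []
	for start, end in pairs:
		if merged and merged[-1][1] == start - 1:
			# The new range begins right after the previous one ends.
			merged[-1] = (merged[-1][0], end)
		else:
			merged.append((start, end))
	return merged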


print ("/* == Start of generated table == */")
print ("/*")
print (" * The following tables are generated by running:")
print (" *")
print (" *   ./gen-emoji-table.py emoji-data.txt emoji-test.txt")
print (" *")
print (" * on file with this header:")
print (" *")
for l in header:
	print (" * %s" % (l.strip()))
print (" */")
print ()
print ("#ifndef HB_UNICODE_EMOJI_TABLE_HH")
print ("#define HB_UNICODE_EMOJI_TABLE_HH")
print ()
print ('#include "hb-unicode.hh"')
print ()

for typ, s in ranges.items():
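	# Only the Extended_Pictographic property is emitted.
	if typ != "Extended_Pictographic": continue

	# Expand the ranges into a code point -> 1 mapping. Range ends are
	# inclusive, hence end + 1.
	arr = dict()
	for start,end in s:
		for i in range(start, end + 1):
			arr[i] = 1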

	sol = packTab.pack_table(arr, 0, compression=9)
	code = packTab.Code('_hb_emoji')
	sol.genCode(code, 'is_'+typ)
	code.print_c(linkage='static inline')
	print()

print ()
print ("#endif /* HB_UNICODE_EMOJI_TABLE_HH */")
print ()
print ("/* == End of generated table == */")


# Generate test file.
sequences = []
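# emoji-test.txt contains emoji in its comments, so read it as UTF-8.
with open(sys.argv[2], encoding='utf-8') as f:
    for line in f:
        # Keep only the code point field: drop the trailing comment and
        # everything from the first ";" onwards.
        if "#" in line:
            line = line[:line.index("#")]
        if ";" in line:
            line = line[:line.index(";")]
        line = line.strip()
        line = line.split(" ")
        # Skip blank lines and single code point entries; only sequences
        # of two or more code points are tested.
        if len(line) < 2:
            continue
        sequences.append(line)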

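# The output path is relative to the current directory, so run this script
# from src/.
with open("../test/shape/data/in-house/tests/emoji-clusters.tests", "w") as f:
    for sequence in sequences:
        # One test per line: font;options;code points;expected glyphs.
        # With AdobeBlank2.ttf every code point maps to glyph 1, and the
        # whole sequence is expected to form a single cluster (0).
        f.write("../fonts/AdobeBlank2.ttf;--no-glyph-names --no-positions --font-funcs=ot")
        f.write(";" + ",".join(sequence))
        f.write(";[" + "|".join("1=0" for _ in sequence) + "]\n")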