harfbuzz


Author	SHA1	Message	Date
Behdad Esfahbod	a7960bdfb0	[config] Add HB_NO_LANGUAGE_LONG and enable in TINY profile. Disables three-letter language tags and more complex ones. Fixes https://github.com/harfbuzz/harfbuzz/issues/3664	2022-06-20 17:55:28 -06:00
David Corbett	e3e685e5ee	[ot-tags] Fix `min_subtag_len` calculations	2022-05-18 18:30:01 -06:00
Behdad Esfahbod	e24797aeac	[ot-tags] Follow-up to previous commit. Part of https://github.com/harfbuzz/harfbuzz/issues/3591	2022-05-18 11:10:10 -06:00
Behdad Esfahbod	f5d619be79	[ot-tags] Further gate the slow complex case, and add more tests. Part of https://github.com/harfbuzz/harfbuzz/issues/3591. Still, 'zh-trad' is the slowest case. BM_hb_ot_tags_from_script_and_language (Time / CPU / Iterations): COMMON zh_trad 136 ns / 136 ns / 5107838; COMMON ab_abcd 115 ns / 115 ns / 6103104; COMMON ab_abc 25.4 ns / 25.3 ns / 27674482; COMMON abcdef_XY 20.2 ns / 20.1 ns / 34795719; COMMON abcd_XY 19.4 ns / 19.3 ns / 36390401; COMMON cxy_CN 33.5 ns / 33.4 ns / 20998939; COMMON exy_CN 25.1 ns / 25.0 ns / 27705832; COMMON zh_CN 34.2 ns / 34.1 ns / 20564356; COMMON en_US 15.5 ns / 15.5 ns / 45032204; LATIN en_US 15.9 ns / 15.8 ns / 44412379; COMMON none 4.72 ns / 4.71 ns / 149101665; LATIN none 4.72 ns / 4.70 ns / 149254498	2022-05-18 11:04:52 -06:00
Behdad Esfahbod	3df8017e9b	[ot-tag] Optimize subtag_matches() more	2022-05-17 17:29:39 -06:00
Behdad Esfahbod	909f00ac6e	[ot-tags] Further speed up language bsearch(). Uses an integer tag, instead of a string, for the bsearch. Part of https://github.com/harfbuzz/harfbuzz/issues/3591. BM_hb_ot_tags_from_script_and_language (Time / CPU / Iterations). Before: COMMON abcd_XY 8.11 ns / 8.08 ns / 87067795; COMMON zh_CN 53.6 ns / 53.5 ns / 13042418; COMMON en_US 24.2 ns / 24.1 ns / 29052731; LATIN en_US 24.4 ns / 24.3 ns / 28736769; COMMON none 4.43 ns / 4.41 ns / 160370413; LATIN none 4.35 ns / 4.34 ns / 160578191. After: COMMON abcd_XY 7.97 ns / 7.95 ns / 85208363; COMMON zh_CN 41.7 ns / 41.6 ns / 16945817; COMMON en_US 16.1 ns / 16.0 ns / 43613523; LATIN en_US 16.5 ns / 16.4 ns / 42568107; COMMON none 4.30 ns / 4.29 ns / 164055469; LATIN none 4.29 ns / 4.27 ns / 163793591	2022-05-17 15:51:41 -06:00
Behdad Esfahbod	15be0deda0	[ot-tags] Optimize lang_matches(). Part of https://github.com/harfbuzz/harfbuzz/issues/3591. BM_hb_ot_tags_from_script_and_language (Time / CPU / Iterations). Before: COMMON abcd_XY 8.67 ns / 8.64 ns / 80324382; COMMON zh_CN 91.2 ns / 90.9 ns / 7674131; COMMON en_US 41.1 ns / 41.0 ns / 17174093; LATIN en_US 41.3 ns / 41.2 ns / 17000876; COMMON none 4.56 ns / 4.55 ns / 153914130; LATIN none 4.53 ns / 4.52 ns / 153830303. After: COMMON abcd_XY 8.24 ns / 8.21 ns / 84078465; COMMON zh_CN 77.5 ns / 77.2 ns / 9059230; COMMON en_US 38.8 ns / 38.7 ns / 17790692; LATIN en_US 37.6 ns / 37.5 ns / 18648293; COMMON none 4.50 ns / 4.49 ns / 155573267; LATIN none 4.49 ns / 4.47 ns / 156456653	2022-05-17 14:57:08 -06:00
Behdad Esfahbod	dd3c858f84	[ot-tags] Speed up hb_ot_tags_from_language(). Part of https://github.com/harfbuzz/harfbuzz/issues/3591: "After that, bulk of the time I suppose is spent in binary-searching the language table. I suggest we split the language table in 2-letter and 3-letter tags, to speed-up the vast majority of cases that are 2-letter." benchmark-ot, BM_hb_ot_tags_from_script_and_language (Time / CPU / Iterations). Before: COMMON zh_CN 112 ns / 111 ns / 6286271; COMMON en_US 60.6 ns / 60.4 ns / 11671176; LATIN en_US 61.3 ns / 61.1 ns / 11442645; COMMON none 4.75 ns / 4.74 ns / 146997235; LATIN none 4.65 ns / 4.64 ns / 150938747. After: COMMON zh_CN 89.5 ns / 89.2 ns / 7747649; COMMON en_US 38.5 ns / 38.4 ns / 18199432; LATIN en_US 39.0 ns / 38.9 ns / 18049238; COMMON none 4.53 ns / 4.52 ns / 154895110; LATIN none 4.54 ns / 4.52 ns / 154762105	2022-05-17 14:28:28 -06:00
Behdad Esfahbod	9baccb9860	[ot-tags] Speed up hb_ot_tags_from_complex_language(). Part of https://github.com/harfbuzz/harfbuzz/issues/3591 (point 2): All the subtag_matches outside the switch match long strings (>= 6 or so). As such, check the tag for such length before going into any of them. benchmark-ot, BM_hb_ot_tags_from_script_and_language (Time / CPU / Iterations). Before: COMMON zh_CN 172 ns / 171 ns / 4083155; COMMON en_US 120 ns / 119 ns / 5849947; LATIN en_US 113 ns / 112 ns / 5840326; COMMON none 4.66 ns / 4.64 ns / 151396224; LATIN none 4.66 ns / 4.64 ns / 149019593. After: COMMON zh_CN 112 ns / 112 ns / 6357763; COMMON en_US 60.5 ns / 60.3 ns / 11475091; LATIN en_US 54.9 ns / 54.8 ns / 12575690; COMMON none 4.61 ns / 4.59 ns / 152388450; LATIN none 4.66 ns / 4.64 ns / 151497600	2022-05-17 13:34:34 -06:00
David Corbett	ae9afd9772	Let BCP 47 tag "mo" fall back to OT tag 'ROM '	2022-01-30 14:32:59 -05:00
David Corbett	a184c5f851	Don’t always inherit from macrolanguages. If an OpenType tag maps to a BCP 47 macrolanguage, that is presumably to support the use of the macrolanguage as a vague stand-in for one of its individual languages. For example, "ar" and "zh" are often used for "arb" and "cmn". When the OpenType tag maps to a macrolanguage and some but not all of its individual languages, that indicates that the OpenType tag only corresponds to the listed individual languages (which may be referred to using the macrolanguage subtag) but not the missing individual languages. In particular, INUK (Nunavik Inuktitut) is mapped to "ike" (Eastern Canadian Inuktitut) and "iu" (Inuktitut) but not to "ikt" (Inuinnaqtun), so "ikt" should not inherit the INUK mapping from its macrolanguage "iu".	2022-01-30 13:28:23 -05:00
David Corbett	0b1bf89cc2	Replace “[family]” with “[collection]”. Not all language collections are language families.	2022-01-29 10:15:23 -05:00
David Corbett	0e31595e0d	Infer tag mappings for unregistered macrolanguages. Every macrolanguage not mentioned in the OT language system tag registry is mapped to every tag of its individual languages, if those have registered tags.	2022-01-29 10:15:23 -05:00
David Corbett	2404617a60	Update language system tag registry to OT 1.9	2021-12-09 07:18:57 -07:00
David Corbett	d18915f920	Reformat gen-tag-table.py	2021-03-28 10:21:46 -07:00
David Corbett	e19de65eae	Update hb-ot-tag-table.hh (#2890)	2021-03-08 10:12:47 -08:00
David Corbett	b2e7bb2a7c	Don’t map BCP 47 to coincidentally similar OT tag	2020-11-22 19:35:47 -08:00
David Corbett	e1df2c5277	Map ISO 639 code qul to language system tag 'QUH '	2020-11-22 11:52:23 -07:00
David Corbett	17da41bd06	Update language system tag registry to OT 1.8.4	2020-11-18 11:13:35 -08:00
David Corbett	27170e058d	Fix names for language tag in gen-tag-table.py. A BCP 47 language tag with both a script subtag and a region subtag would be printed as a human-readable name in hb-ot-tag-table.hh as if it only had its language subtag.	2020-11-16 10:59:07 -08:00
David Corbett	dec52006d9	Map BCP 47 tags to all macrolanguages. The general rule is that if a BCP 47 macrolanguage maps to an OpenType language system tag, all its individual languages map to it too. Previously, a tag like "prs" (Dari) would not map to the language system tag ('FAR ') of its macrolanguage ("fa") because "prs" already has its own language system tag ('DRI '). That exception has been removed: now "prs" maps to 'DRI ' and falls back to 'FAR '.	2020-10-11 11:38:40 -07:00
David Corbett	1d53268dfe	Fix two-way mapping of "man" and 'MNK '	2020-10-11 11:38:40 -07:00
David Corbett	ab38cf6746	Map hy-arevmda to 'HYE ' instead of HYE0	2020-10-11 11:38:40 -07:00
David Corbett	916c5a9007	Consistently emit BCP 47 subtag scope suffixes	2020-10-11 11:38:40 -07:00
David Corbett	ac3f859a30	Demote unregistered vendor-specific language tags	2020-09-09 17:50:59 -04:00
David Corbett	91fe20f0f5	Disambiguate OT tags when primary tag is not first	2020-09-08 09:20:00 -04:00
Ebrahim Byagowi	ad87155fd0	minor, use py3's open(encoding=)	2020-05-29 00:11:19 +04:30
Ebrahim Byagowi	7554f618ec	minor, use sys.exit print shorthand	2020-05-28 23:34:37 +04:30
Ebrahim Byagowi	08f1d95a50	minor, move scripts manuals to __doc__	2020-05-28 15:13:12 +04:30
David Corbett	7a961692e9	Update IANA Language Subtag Registry to 2020-05-12	2020-05-14 10:34:42 -04:00
David Corbett	fd748fac41	Update to Unicode 13.0.0	2020-04-29 17:17:03 -04:00
Ebrahim Byagowi	e17fd0d91c	[tools] More on py3 compatibility	2020-02-24 00:10:11 +03:30
Ebrahim Byagowi	8c652f72fc	Minor, switch to https links where possible	2020-02-19 16:32:44 +03:30
Ebrahim Byagowi	bbcbcafc25	[tool] Minor, move input files link	2020-02-19 16:21:47 +03:30
Ebrahim Byagowi	8d19907704	Remove python2 support from tests/utils scripts	2020-02-19 16:17:45 +03:30
Evgeniy Reizner	4dc87365d7	Add links to files used by python scripts. Closes #2150	2020-02-09 20:52:49 +03:30
David Corbett	6745a600bf	Comment out ot_languages where fallback suffices	2019-04-17 10:28:59 -04:00
David Corbett	1ce11b4437	Reduce LangTag from 3 language system tags to 1	2019-04-16 11:41:01 -04:00
David Corbett	bca7a16938	Update language system tag registry to OT 1.8.3	2018-10-11 13:54:28 -04:00
David Corbett	3f8877473f	Switch on the first char of a complex language tag. This results in a tenfold speed-up for the common case of tags that are not complex, in the sense of `hb_ot_tags_from_complex_language`.	2018-10-11 13:54:28 -04:00
David Corbett	a754d44195	Map Quechua languages to closest ones with tags. OpenType only officially maps four ISO 639 codes to Quechua languages, but prior versions of HarfBuzz also mapped qu to 'QUZ '. Because qu is a macrolanguage, the mapping now applies to all individual Quechua languages. OpenType calls 'QUZ ' "Quechua", but it really corresponds to Cusco Quechua, so the individual Quechua languages should not all necessarily be mapped to it.	2018-10-11 13:54:28 -04:00
David Corbett	7c7cb2a989	Match extlang subtags. If the second subtag of a BCP 47 tag is three letters long, it denotes an extended language. The tag converter ignores the language subtag and uses the extended language instead. There are some grandfathered exceptions, which are handled earlier.	2018-10-11 13:54:28 -04:00
David Corbett	2f1f961cc0	Autogenerate the BCP 47 to OpenType mappings. The new script, gen-tag-table.py, generates `ot_languages` automatically from the [OpenType language system tag registry][ot] and the [IANA Language Subtag Registry][bcp47] with some manual modifications. If an OpenType tag maps to a BCP 47 macrolanguage, all the macrolanguage's individual languages are mapped to the same OpenType tag, except for individual languages with their own OpenType mappings. Deprecated BCP 47 tags are canonicalized. [ot]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags [bcp47]: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry Some OpenType tags correspond to multiple ISO 639 codes. The mapping from ISO 639 codes lists OpenType tags in priority order, such that more specific or more likely tags appear first. Some OpenType tags have no corresponding ISO 639 code in the registry so their mappings use BCP 47 subtags besides the language. For example, any BCP 47 tag with a fonipa variant subtag is mapped to 'IPPH', and 'IPPH' is mapped back to und-fonipa. Other OpenType tags have no corresponding ISO 639 code because it is not clear what they are for. HarfBuzz just ignores these tags. One such ignored tag is 'ZHP ' (Chinese Phonetic). It probably means zh-Latn. However, it is used in Microsoft JhengHei and Microsoft YaHei with the script tag 'hani', implying that it is not a romanization scheme after all. It would be simple enough to add this mapping to gen-tag-table.py once a definitive mapping is determined. The manual modifications are mainly either obvious mappings that the OpenType registry omits or mappings for compatibility with previous versions of HarfBuzz. Some of the old mappings were discarded, though, for homophonous language names. For example, OpenType maps 'KUI ' to kxu; previous versions of HarfBuzz also mapped it to kvd, because kvd and kxu both happen to be called "Kui".
gen-tag-table.py also generates a function to convert multi-subtag tags like el-polyton and zh-HK to OpenType tags, replacing `ot_languages_zh` and the hard-coded list of special cases in `hb_ot_tags_from_language`. It also generates a function to convert OpenType tags to BCP 47, replacing the hard-coded list of special cases in `hb_ot_tag_to_language`.	2018-10-11 13:54:28 -04:00

43 Commits
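Commits 909f00ac6e and dd3c858f84 describe two lookup tricks: splitting the language table by subtag length (2-letter vs 3-letter), and binary-searching on an integer key instead of a string. The sketch below shows both ideas in Python. It is not HarfBuzz's C++ code. The packing scheme (big-endian, space-padded to 4 bytes, like an OpenType tag), the names `pack_tag` and `ot_tag_for_language`, and the tiny table are all assumptions made for illustration. In Python the integer compare will not show the C++ speed-up, so only the structure carries over.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


def pack_tag(subtag: str) -> int:
    """Pack an ASCII subtag of up to 4 bytes big-endian into a 32-bit int,
    space-padding short subtags (assumed scheme, OpenType-tag style)."""
    raw = subtag.encode("ascii")
    if len(raw) > 4:
        raise ValueError("subtag longer than 4 bytes")
    return int.from_bytes(raw.ljust(4, b" "), "big")


# Hypothetical toy excerpt of a BCP 47 -> OpenType table (not HarfBuzz's data).
_RAW_TABLE = {"en": "ENG ", "fa": "FAR ", "zu": "ZUL ", "prs": "DRI "}


def _build(length: int) -> Tuple[List[int], List[str]]:
    # One table per subtag length, sorted by packed integer key.
    rows = sorted((pack_tag(k), v) for k, v in _RAW_TABLE.items() if len(k) == length)
    return [k for k, _ in rows], [v for _, v in rows]


_TABLES: Dict[int, Tuple[List[int], List[str]]] = {2: _build(2), 3: _build(3)}


def ot_tag_for_language(lang: str) -> Optional[str]:
    primary = lang.split("-", 1)[0].lower()
    if not (primary.isascii() and primary.isalpha()):
        return None
    table = _TABLES.get(len(primary))
    if table is None:  # only 2- and 3-letter primary subtags are handled here
        return None
    keys, values = table
    key = pack_tag(primary)
    i = bisect_left(keys, key)  # integer comparisons instead of string compares
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None
```

Because every key in a given table has the same length, integer order matches string order, so `bisect_left` finds the same entry a string search would. Picking the table by length first also keeps the common 2-letter case in a smaller search space.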
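Commits 3f8877473f and 9baccb9860 speed up `hb_ot_tags_from_complex_language` with cheap gates. One gate switches on the first character of the tag. The other skips the generic `subtag_matches` checks when the tag is too short to match them. The Python sketch below illustrates both gates. It is not HarfBuzz's implementation: the helper names are hypothetical, and only three mappings are shown. The fonipa to 'IPPH' rule comes from commit 2f1f961cc0. The 'PGR ' (el-polyton) and 'ZHH ' (zh-HK) values are given for illustration and should be checked against hb-ot-tag-table.hh.

```python
from typing import Optional


def subtag_matches(lang: str, subtag: str) -> bool:
    """True if `subtag` (starting with '-') occurs in `lang` and is followed
    by the end of the string or another '-', i.e. it is a whole subtag."""
    start = 0
    while True:
        i = lang.find(subtag, start)
        if i == -1:
            return False
        end = i + len(subtag)
        if end == len(lang) or lang[end] == "-":
            return True
        start = i + 1


def has_prefix_subtags(lang: str, prefix: str) -> bool:
    # "el-polyton" matches "el-polyton" and "el-polyton-...", not "el-polytonx".
    return lang == prefix or lang.startswith(prefix + "-")


def complex_ot_tag(lang: str) -> Optional[str]:
    lang = lang.lower()
    # Length gate: a "-fonipa" subtag cannot fit in a tag shorter than 6
    # characters, so short tags such as "en" or "zh-cn" skip the search.
    if len(lang) >= 6 and subtag_matches(lang, "-fonipa"):
        return "IPPH"
    # First-character switch: most tags are rejected after one comparison.
    first = lang[:1]
    if first == "e":
        if has_prefix_subtags(lang, "el-polyton"):
            return "PGR "
    elif first == "z":
        if has_prefix_subtags(lang, "zh-hk"):
            return "ZHH "
    return None
```

A plain tag like "en" never reaches `subtag_matches` and fails the switch immediately. According to the benchmarks in those commits, that common case is the one the gates speed up.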
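Commit 7c7cb2a989 states the extlang rule: if the second subtag of a BCP 47 tag is three letters long, it is an extended language, and the converter uses it instead of the primary language subtag. A minimal Python sketch of that rule follows. The function name is hypothetical. As in the commit, it assumes grandfathered exceptions have already been handled by the caller.

```python
def effective_language(tag: str) -> str:
    """Return the language subtag to look up for `tag` (hypothetical helper).

    Assumes grandfathered/irregular tags were already handled by the caller.
    """
    subtags = tag.lower().split("-")
    if len(subtags) >= 2:
        second = subtags[1]
        # A 3-letter alphabetic second subtag is an extlang (e.g. "yue" in
        # "zh-yue-HK"); script subtags are 4 letters and region subtags are
        # 2 letters or 3 digits, so they are not confused with it.
        if len(second) == 3 and second.isascii() and second.isalpha():
            return second
    return subtags[0]
```

For example, `effective_language("zh-yue-HK")` yields "yue", while "zh-Hant" and "es-419" keep their primary subtags "zh" and "es".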
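Commits dec52006d9 and a184c5f851 together define when an individual language inherits its macrolanguage's OpenType tag. By default it does, as a fallback after its own tag: "prs" maps to 'DRI ' and then 'FAR '. The exception is an OpenType tag that names the macrolanguage plus only some of its individual languages. In that case, only the listed individual languages get the tag, so "ikt" does not inherit INUK. The Python sketch below encodes that rule. The dictionaries are a hypothetical, simplified excerpt, not the registry data that gen-tag-table.py actually reads.

```python
from typing import Dict, List, Set

# Hypothetical, simplified registry excerpts for illustration only.
OT_TO_BCP47: Dict[str, Set[str]] = {
    "INUK": {"iu", "ike"},  # per a184c5f851: "iu" and "ike", not "ikt"
    "FAR ": {"fa"},
    "DRI ": {"prs"},
}
MACRO_MEMBERS: Dict[str, Set[str]] = {
    "iu": {"ike", "ikt"},
    "fa": {"pes", "prs"},
}


def ot_tags_for(lang: str) -> List[str]:
    # Explicit mappings come first...
    tags = [t for t, langs in OT_TO_BCP47.items() if lang in langs]
    # ...then tags inherited from a macrolanguage are appended as fallbacks.
    for macro, members in MACRO_MEMBERS.items():
        if lang not in members:
            continue
        for t, langs in OT_TO_BCP47.items():
            if macro not in langs or t in tags:
                continue
            listed = langs & members
            # The tag names some but not all individual languages:
            # only those listed ones inherit it.
            if listed and listed != members and lang not in listed:
                continue
            tags.append(t)
    return tags
```

With this data, "prs" yields `["DRI ", "FAR "]`, "pes" yields `["FAR "]`, and "ikt" yields an empty list.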
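Commit 0e31595e0d infers mappings for macrolanguages that the OpenType registry does not mention. Each such macrolanguage gets every registered tag of its individual languages. A minimal Python sketch of that inference follows. The function name, the dictionary shapes, and the sorted member order are assumptions for illustration, and the language codes in the usage example are made up.

```python
from typing import Dict, List, Set


def infer_macrolanguage_tags(
    registered: Dict[str, List[str]],
    macro_members: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """For each macrolanguage with no registered mapping, collect the tags of
    its individual languages (sorted member order, duplicates dropped)."""
    inferred: Dict[str, List[str]] = {}
    for macro, members in macro_members.items():
        if macro in registered:
            continue  # registered macrolanguages keep their own mappings
        tags: List[str] = []
        for member in sorted(members):
            for tag in registered.get(member, []):
                if tag not in tags:
                    tags.append(tag)
        if tags:  # only if some individual language has a registered tag
            inferred[macro] = tags
    return inferred
```

For example, with made-up codes, `infer_macrolanguage_tags({"aaa": ["TAGA"], "aab": ["TAGB", "TAGA"], "zz": ["ZZZ "]}, {"xx": {"aaa", "aab", "aac"}, "zz": {"zza"}})` maps only "xx", to `["TAGA", "TAGB"]`. "zz" is skipped because it is already registered.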