Commit Graph

19 Commits

Author SHA1 Message Date
David Corbett ac3f859a30 Demote unregistered vendor-specific language tags 2020-09-09 17:50:59 -04:00
David Corbett 91fe20f0f5 Disambiguate OT tags when primary tag is not first 2020-09-08 09:20:00 -04:00
Ebrahim Byagowi ad87155fd0 minor, use py3's open(encoding=) 2020-05-29 00:11:19 +04:30
Ebrahim Byagowi 7554f618ec minor, use sys.exit print shorthand 2020-05-28 23:34:37 +04:30
Ebrahim Byagowi 08f1d95a50 minor, move scripts manuals to __doc__ 2020-05-28 15:13:12 +04:30
David Corbett 7a961692e9 Update IANA Language Subtag Registry to 2020-05-12 2020-05-14 10:34:42 -04:00
David Corbett fd748fac41 Update to Unicode 13.0.0 2020-04-29 17:17:03 -04:00
Ebrahim Byagowi e17fd0d91c [tools] More on py3 compatibility 2020-02-24 00:10:11 +03:30
Ebrahim Byagowi 8c652f72fc Minor, switch to https links where possible 2020-02-19 16:32:44 +03:30
Ebrahim Byagowi bbcbcafc25 [tool] Minor, move input files link 2020-02-19 16:21:47 +03:30
Ebrahim Byagowi 8d19907704 Remove python2 support from tests/utils scripts 2020-02-19 16:17:45 +03:30
Evgeniy Reizner 4dc87365d7 Add links to files used by python scripts.
Closes #2150
2020-02-09 20:52:49 +03:30
David Corbett 6745a600bf Comment out ot_languages where fallback suffices 2019-04-17 10:28:59 -04:00
David Corbett 1ce11b4437 Reduce LangTag from 3 language system tags to 1 2019-04-16 11:41:01 -04:00
David Corbett bca7a16938 Update language system tag registry to OT 1.8.3 2018-10-11 13:54:28 -04:00
David Corbett 3f8877473f Switch on the first char of a complex language tag
This results in a tenfold speed-up for the common case of tags that are
not complex, in the sense of `hb_ot_tags_from_complex_language`.
2018-10-11 13:54:28 -04:00
David Corbett a754d44195 Map Quechua languages to closest ones with tags
OpenType only officially maps four ISO 639 codes to Quechua languages,
but prior versions of HarfBuzz also mapped qu to 'QUZ '. Because qu is a
macrolanguage, the mapping now applies to all individual Quechua
languages. OpenType calls 'QUZ ' "Quechua", but it really corresponds to
Cusco Quechua, so the individual Quechua languages should not all
necessarily be mapped to it.
2018-10-11 13:54:28 -04:00
David Corbett 7c7cb2a989 Match extlang subtags
If the second subtag of a BCP 47 tag is three letters long, it denotes
an extended language. The tag converter ignores the language subtag and
uses the extended language instead.

There are some grandfathered exceptions, which are handled earlier.
2018-10-11 13:54:28 -04:00
David Corbett 2f1f961cc0 Autogenerate the BCP 47 to OpenType mappings
The new script, gen-tag-table.py, generates `ot_languages` automatically
from the [OpenType language system tag registry][ot] and the [IANA
Language Subtag Registry][bcp47] with some manual modifications. If an
OpenType tag maps to a BCP 47 macrolanguage, all the macrolanguage's
individual languages are mapped to the same OpenType tag, except for
individual languages with their own OpenType mappings. Deprecated
BCP 47 tags are canonicalized.

[ot]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
[bcp47]: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

Some OpenType tags correspond to multiple ISO 639 codes. The mapping
from ISO 639 codes lists OpenType tags in priority order, such that more
specific or more likely tags appear first.

Some OpenType tags have no corresponding ISO 639 code in the registry so
their mappings use BCP 47 subtags besides the language. For example, any
BCP 47 tag with a fonipa variant subtag is mapped to 'IPPH', and 'IPPH'
is mapped back to und-fonipa.

Other OpenType tags have no corresponding ISO 639 code because it is not
clear what they are for. HarfBuzz just ignores these tags.

One such ignored tag is 'ZHP ' (Chinese Phonetic). It probably means
zh-Latn. However, it is used in Microsoft JhengHei and Microsoft YaHei
with the script tag 'hani', implying that it is not a romanization
scheme after all. It would be simple enough to add this mapping to
gen-tag-table.py once a definitive mapping is determined.

The manual modifications are mainly either obvious mappings that the
OpenType registry omits or mappings for compatibility with previous
versions of HarfBuzz. Some of the old mappings were discarded, though,
for homophonous language names. For example, OpenType maps 'KUI ' to
kxu; previous versions of HarfBuzz also mapped it to kvd, because kvd
and kxu both happen to be called "Kui".

gen-tag-table.py also generates a function to convert multi-subtag tags
like el-polyton and zh-HK to OpenType tags, replacing `ot_languages_zh`
and the hard-coded list of special cases in `hb_ot_tags_from_language`.
It also generates a function to convert OpenType tags to BCP 47,
replacing the hard-coded list of special cases in
`hb_ot_tag_to_language`.
2018-10-11 13:54:28 -04:00