Commit Graph

18 Commits

Author SHA1 Message Date
David Corbett 28d091d045 Parse Indic3 tags 2018-10-11 17:44:13 -04:00
Behdad Esfahbod 57b05210b1 [test] Fix use of deprecated symbols 2018-10-11 15:03:21 -04:00
David Corbett 7f1fbfe2e3 Add hb_ot_tags_to_script_and_language 2018-10-11 13:54:28 -04:00
David Corbett 7c7cb2a989 Match extlang subtags
If the second subtag of a BCP 47 tag is three letters long, it denotes
an extended language. The tag converter ignores the language subtag and
uses the extended language instead.

There are some grandfathered exceptions, which are handled earlier.
2018-10-11 13:54:28 -04:00
David Corbett 2f1f961cc0 Autogenerate the BCP 47 to OpenType mappings
The new script, gen-tag-table.py, generates `ot_languages` automatically
from the [OpenType language system tag registry][ot] and the [IANA
Language Subtag Registry][bcp47] with some manual modifications. If an
OpenType tag maps to a BCP 47 macrolanguage, all the macrolanguage's
individual languages are mapped to the same OpenType tag, except for
individual languages with their own OpenType mappings. Deprecated
BCP 47 tags are canonicalized.

[ot]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
[bcp47]: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

Some OpenType tags correspond to multiple ISO 639 codes. The mapping
from ISO 639 codes lists OpenType tags in priority order, such that more
specific or more likely tags appear first.

Some OpenType tags have no corresponding ISO 639 code in the registry so
their mappings use BCP 47 subtags besides the language. For example, any
BCP 47 tag with a fonipa variant subtag is mapped to 'IPPH', and 'IPPH'
is mapped back to und-fonipa.

Other OpenType tags have no corresponding ISO 639 code because it is not
clear what they are for. HarfBuzz just ignores these tags.

One such ignored tag is 'ZHP ' (Chinese Phonetic). It probably means
zh-Latn. However, it is used in Microsoft JhengHei and Microsoft YaHei
with the script tag 'hani', implying that it is not a romanization
scheme after all. It would be simple enough to add this mapping to
gen-tag-table.py once a definitive mapping is determined.

The manual modifications are mainly either obvious mappings that the
OpenType registry omits or mappings for compatibility with previous
versions of HarfBuzz. Some of the old mappings were discarded, though,
for homophonous language names. For example, OpenType maps 'KUI ' to
kxu; previous versions of HarfBuzz also mapped it to kvd, because kvd
and kxu both happen to be called "Kui".

gen-tag-table.py also generates a function to convert multi-subtag tags
like el-polyton and zh-HK to OpenType tags, replacing `ot_languages_zh`
and the hard-coded list of special cases in `hb_ot_tags_from_language`.
It also generates a function to convert OpenType tags to BCP 47,
replacing the hard-coded list of special cases in
`hb_ot_tag_to_language`.
2018-10-11 13:54:28 -04:00
David Corbett 91067716f5 Refactor the selection of script and language tags
The old hb-ot-tag.cc functions, `hb_ot_tags_from_script` and
`hb_ot_tag_from_language`, are now wrappers around a new function:
`hb_ot_tags`. It converts a script and a language to arrays of script
tags and language tags. This will make it easier to add new script tags
to scripts, like 'dev3'. It also allows for language fallback chains;
nothing produces more than one language yet though.

Where the old functions return the default tags 'DFLT' and 'dflt',
`hb_ot_tags` returns an empty array. The caller is responsible for
using the default tag in that case.

The new function also adds a new private use subtag syntax for script
overrides: "x-hbscabcd" requests a script tag of 'abcd'.

The old hb-ot-layout.cc functions,`hb_ot_layout_table_choose_script` and
`hb_ot_layout_script_find_language` are now wrappers around the new
functions `hb_ot_layout_table_select_script` and
`hb_ot_layout_script_select_language`. They are essentially the same as
the old ones plus a tag count parameter.

Closes #495.
2018-10-11 13:54:28 -04:00
Ebrahim Byagowi f24b0b9728 Update the links and revive the dead ones 2018-04-12 13:44:32 +04:30
Behdad Esfahbod 6c1848b1e3 Misc warning fixes 2018-02-10 15:52:35 -06:00
Behdad Esfahbod f70100417c [test] Minor 2018-02-07 13:58:23 -05:00
Sascha Brawer 1337428e4f Update language tags to OpenType 1.8.1 (#403)
Resolves https://github.com/behdad/harfbuzz/issues/324
2017-01-18 04:51:02 -08:00
Sascha Brawer e7ecbba2cc Support Americanist Phonetic Notation
OpenType language system tag: `APPH`
https://www.microsoft.com/typography/otspec/languagetags.htm

IETF BCP47 variant tag: `fonnapa`
http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
2016-08-18 14:13:26 +02:00
Behdad Esfahbod 18c19dd34d Fix build 2016-08-09 13:03:14 -07:00
Sascha Brawer f2ad935e19 Handle language tags that indicate phonetic IPA transcription
The BCP-47 registry defines a variant subtag "fonipa" that can be used
in combination with arbitrary other language tags. For example,
"rm-CH-fonipa-sursilv" indicates the Sursilvan dialect of Romansh
as used in Switzerland, transcribed used the International Phonetic
Alphabet.

http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
2015-09-29 14:32:06 +01:00
Behdad Esfahbod 6334495ac1 Use zh-Hans / zh-Hant when converting OT language tag to hb_language_t 2014-07-10 19:22:07 -04:00
Behdad Esfahbod f381e320df Fix lang matching logic
Previous code was broken logically, but harmless.
2014-07-10 19:20:35 -04:00
Behdad Esfahbod ee5350d667 Accept BCP 47 zh-Hans / zh-Hant language tags 2014-07-10 19:18:56 -04:00
Behdad Esfahbod de796a6fb9 Add "new" Myanmar OT Script tag
Windows 8 added support for Myanmar shaping using the "mym2" script tag,
even though Windows never supported the old "mymr" tag.
2012-11-12 17:27:51 -08:00
Behdad Esfahbod 4d6dafd47f Rename test/ to test/api/ 2012-01-19 14:52:02 -05:00