Commit Graph

7136 Commits

Author SHA1 Message Date
David Corbett bca7a16938 Update language system tag registry to OT 1.8.3 2018-10-11 13:54:28 -04:00
David Corbett 7f1fbfe2e3 Add hb_ot_tags_to_script_and_language 2018-10-11 13:54:28 -04:00
David Corbett 3f8877473f Switch on the first char of a complex language tag
This results in a tenfold speed-up for the common case of tags that are
not complex, in the sense of `hb_ot_tags_from_complex_language`.
2018-10-11 13:54:28 -04:00
David Corbett a754d44195 Map Quechua languages to closest ones with tags
OpenType only officially maps four ISO 639 codes to Quechua languages,
but prior versions of HarfBuzz also mapped qu to 'QUZ '. Because qu is a
macrolanguage, the mapping now applies to all individual Quechua
languages. OpenType calls 'QUZ ' "Quechua", but it really corresponds to
Cusco Quechua, so the individual Quechua languages should not all
necessarily be mapped to it.
2018-10-11 13:54:28 -04:00
David Corbett 65d01f7755 Test deprecated tag fallback in a font
The font supports the deprecated tag 'DHV ' instead of 'DIV '. dv is
mapped to 'DIV ' and 'DHV ', in that order. The test specifies
`--language=dv`, demonstrating that if a font does not support the first
OpenType tag mapped to a BCP 47 tag, it will fall back to the next tag.
2018-10-11 13:54:28 -04:00
David Corbett 7c7cb2a989 Match extlang subtags
If the second subtag of a BCP 47 tag is three letters long, it denotes
an extended language. The tag converter ignores the language subtag and
uses the extended language instead.

There are some grandfathered exceptions, which are handled earlier.
2018-10-11 13:54:28 -04:00
David Corbett 2f1f961cc0 Autogenerate the BCP 47 to OpenType mappings
The new script, gen-tag-table.py, generates `ot_languages` automatically
from the [OpenType language system tag registry][ot] and the [IANA
Language Subtag Registry][bcp47] with some manual modifications. If an
OpenType tag maps to a BCP 47 macrolanguage, all the macrolanguage's
individual languages are mapped to the same OpenType tag, except for
individual languages with their own OpenType mappings. Deprecated
BCP 47 tags are canonicalized.

[ot]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
[bcp47]: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

Some OpenType tags correspond to multiple ISO 639 codes. The mapping
from ISO 639 codes lists OpenType tags in priority order, such that more
specific or more likely tags appear first.

Some OpenType tags have no corresponding ISO 639 code in the registry so
their mappings use BCP 47 subtags besides the language. For example, any
BCP 47 tag with a fonipa variant subtag is mapped to 'IPPH', and 'IPPH'
is mapped back to und-fonipa.

Other OpenType tags have no corresponding ISO 639 code because it is not
clear what they are for. HarfBuzz just ignores these tags.

One such ignored tag is 'ZHP ' (Chinese Phonetic). It probably means
zh-Latn. However, it is used in Microsoft JhengHei and Microsoft YaHei
with the script tag 'hani', implying that it is not a romanization
scheme after all. It would be simple enough to add this mapping to
gen-tag-table.py once a definitive mapping is determined.

The manual modifications are mainly either obvious mappings that the
OpenType registry omits or mappings for compatibility with previous
versions of HarfBuzz. Some of the old mappings were discarded, though,
for homophonous language names. For example, OpenType maps 'KUI ' to
kxu; previous versions of HarfBuzz also mapped it to kvd, because kvd
and kxu both happen to be called "Kui".

gen-tag-table.py also generates a function to convert multi-subtag tags
like el-polyton and zh-HK to OpenType tags, replacing `ot_languages_zh`
and the hard-coded list of special cases in `hb_ot_tags_from_language`.
It also generates a function to convert OpenType tags to BCP 47,
replacing the hard-coded list of special cases in
`hb_ot_tag_to_language`.
2018-10-11 13:54:28 -04:00
David Corbett 2c7d4db7af Deprecate obsolete functions
`hb_ot_tags` replaces `hb_ot_tags_from_script` and
`hb_ot_tag_from_language`.

`hb_ot_layout_table_select_script` replaces
`hb_ot_layout_table_choose_script`.

`hb_ot_layout_script_select_language` replaces
`hb_ot_layout_script_find_language`.
2018-10-11 13:54:28 -04:00
David Corbett 91067716f5 Refactor the selection of script and language tags
The old hb-ot-tag.cc functions, `hb_ot_tags_from_script` and
`hb_ot_tag_from_language`, are now wrappers around a new function:
`hb_ot_tags`. It converts a script and a language to arrays of script
tags and language tags. This will make it easier to add new script tags
to scripts, like 'dev3'. It also allows for language fallback chains;
nothing produces more than one language yet though.

Where the old functions return the default tags 'DFLT' and 'dflt',
`hb_ot_tags` returns an empty array. The caller is responsible for
using the default tag in that case.

The new function also adds a new private use subtag syntax for script
overrides: "x-hbscabcd" requests a script tag of 'abcd'.

The old hb-ot-layout.cc functions,`hb_ot_layout_table_choose_script` and
`hb_ot_layout_script_find_language` are now wrappers around the new
functions `hb_ot_layout_table_select_script` and
`hb_ot_layout_script_select_language`. They are essentially the same as
the old ones plus a tag count parameter.

Closes #495.
2018-10-11 13:54:28 -04:00
David Corbett a03f5f4dfb Replace "ISO 639" with "BCP 47"
`hb_language_from_string` accepts not only ISO 639 but also BCP 47. Not
all ISO 639 codes are valid BCP 47 tags but the function does not accept
overlong language subtags anyway.
2018-10-11 13:54:28 -04:00
Behdad Esfahbod 0b9d60e1a1 [aat] Apply kerx if GPOS kern was not applied
Ned tells me this is what Apple does.
2018-10-11 13:26:58 -04:00
Behdad Esfahbod b59a428af0 Minor 2018-10-11 13:24:17 -04:00
Behdad Esfahbod 100e95f48e [trak] Add tests 2018-10-11 11:30:45 -04:00
Behdad Esfahbod 04f72e8990 [trak] Implement extrapolation
This concludes trak, as well as AAT shaping support!
2018-10-11 11:25:07 -04:00
Behdad Esfahbod d6a12dba6d [trak] Fix, and hook up
Works beautifully!  Test coming.
2018-10-11 11:10:06 -04:00
Behdad Esfahbod 3d7dea6dfd [trak] Handle nSizes=0 and 1 2018-10-11 10:32:08 -04:00
Behdad Esfahbod 451f3de521 [trak] Fix counting 2018-10-11 10:30:32 -04:00
Behdad Esfahbod a5be380cae [trak] More 2018-10-11 10:29:02 -04:00
Behdad Esfahbod d06c4a867f [trak] Only adjust around first glyph
Assumes graphemes only have one base glyph.
2018-10-11 10:22:01 -04:00
Behdad Esfahbod 071a2cbcdd [trak] Clean up 2018-10-11 10:18:46 -04:00
Behdad Esfahbod fbbd926dba [kerx] Implement Format4 action_type=1 contour-point-based attachment
Untested.

This concludes kerx table support!
2018-10-11 01:22:29 -04:00
Behdad Esfahbod b6bc0d4ff6 [kerx] Implement Format4 action_type=2 coordinate-based attachment
Untested.
2018-10-11 01:17:57 -04:00
Behdad Esfahbod 1622ba5943 [kerx] Implement Format4 'ankr'-based mark attachment
Tested with Kannada MN:

$ HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0CCD,0C95,0CD6
[kn_ka.vattu=0+230|kn_ai_length_mark=1@326,0+607]
2018-10-11 01:17:33 -04:00
Behdad Esfahbod 7bb4da7d95 [aat] Wire up 'ankr' table to apply context 2018-10-11 00:52:07 -04:00
Behdad Esfahbod 28f0367aab [kerx] Flesh out Format4
Doesn't apply actions yet.
2018-10-11 00:46:12 -04:00
Behdad Esfahbod 947962a287 [ankr] Implement table access 2018-10-10 23:07:03 -04:00
Behdad Esfahbod 7281cb3eeb [ankr] Start fixing 2018-10-10 22:56:52 -04:00
Behdad Esfahbod 34caadc5c7 Ugh. Re-enable accidentally disabled GPOS 2018-10-10 22:17:07 -04:00
Behdad Esfahbod f7c45bc33e [kerx] Allow granularly disabling kerning 2018-10-10 22:15:13 -04:00
Behdad Esfahbod 2b72c4b63d [kerx] Comment 2018-10-10 21:53:14 -04:00
Behdad Esfahbod 9f450f07b0 [kerx] Make Format1 work
Tested using Kannada MN:

$ HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0C95,0CCd,C95,CCD
[kn_ka.virama=0+1299|kn_ka.vattu=0+115|_blank=0@-115,0+385]

$ HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0C95,0CCd,C95,CCD --features=-kern
[kn_ka.virama=0+1799|kn_ka.vattu=0+230|_blank=0+0]

I don't see the GPOS table in the font do the same.  ¯\_(ツ)_/¯
2018-10-10 21:46:58 -04:00
Behdad Esfahbod 504cb68fc9 Disable mark advance zeroing as well as mark fallback positioning if doing kerx 2018-10-10 21:29:46 -04:00
Behdad Esfahbod 8496753796 [kerx] Implement Format1
Untested.
2018-10-10 21:18:37 -04:00
Behdad Esfahbod c9165f5450 [kerx] More UnsizedArrayOf<> 2018-10-10 20:43:21 -04:00
Behdad Esfahbod ca54eba484 [kerx] Fix bound-checking error introduced a couple commits past 2018-10-10 20:41:16 -04:00
Behdad Esfahbod 339036dd97 [kerx] Start fleshing out Format1 2018-10-10 20:37:22 -04:00
Behdad Esfahbod ab1f30bd05 [kerx] Implement Format6
Untested.  The only Apple font shipping with this format is San Francisco fonts
that use this for their kerx variation tables, which we don't support.
2018-10-10 20:10:20 -04:00
Behdad Esfahbod c9a2ce9e05 [kerx] Move bounds-checking to subtable length itself 2018-10-10 20:00:44 -04:00
Behdad Esfahbod 22955b23cd [kerx] Start fleshing out Format6 2018-10-10 19:58:20 -04:00
Behdad Esfahbod f6aaad9b4f [kerx] When rejecting variable kerning, also check for tupleCount 2018-10-10 19:20:06 -04:00
Behdad Esfahbod 7ed5366d3c [kerx] No-op
Tested that Format0 works with Kannada MN font:

$ make -j5 lib -s && HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0C95,0CC2
[kn_ka=0+1000|kn_matra_uu=0@-30,0+1345]

$ make -j5 lib -s && HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0C95,0CC2 --features=-kern
[kn_ka=0+1030|kn_matra_uu=0+1375]

Note that GPOS does the same with 'dist' feature, and applies the whole difference to the
same glyph:

$ make -j5 lib -s && ./hb-shape Kannada\ MN.ttc -u 0C95,0CC2
[kn_ka=0+970|kn_matra_uu=0+1375]

$ make -j5 lib -s && ./hb-shape Kannada\ MN.ttc -u 0C95,0CC2 --features=-dist
[kn_ka=0+1030|kn_matra_uu=0+1375]
2018-10-10 19:12:27 -04:00
Behdad Esfahbod 7fa69e92ca Comment 2018-10-10 19:02:32 -04:00
Behdad Esfahbod 7e6e5bf614 Fix option string matching 2018-10-10 18:59:07 -04:00
Behdad Esfahbod 5d34164d98 [kern/kerx] Fix offset base
Disable kern Format2.

Fix kerx Format2.  Manually tested this with Tamil MN font and it works:

$ HB_OPTIONS=aat ./hb-shape Tamil\ MN.ttc -u 0B94,0B95
[tgv_au=0+3435|tgc_ka=1@-75,0+1517]

 HB_OPTIONS=aat ./hb-shape Tamil\ MN.ttc -u 0B94,0B95 --features=-kern
[tgv_au=0+3510|tgc_ka=1+1592]
2018-10-10 18:23:09 -04:00
Behdad Esfahbod 60f86d32d7 [kerx] Don't loop over kerning subtables if kerning disabled 2018-10-10 18:10:05 -04:00
Behdad Esfahbod 38a7a8a89e Allow HB_OPTIONS=aat to prefer AAT tables over OT
Fixes https://github.com/harfbuzz/harfbuzz/issues/322
2018-10-10 17:44:46 -04:00
Behdad Esfahbod 44f09afd5b [kerx] Skip variation subtables 2018-10-10 17:32:32 -04:00
Behdad Esfahbod 1e8fdd285f Remove HAVE_OT
We never tested compiling without it.  Just kill it.  We always build
our own shaper.
2018-10-10 16:32:35 -04:00
Behdad Esfahbod 7727e73756 [kerx] Actually hook up, and fix crash 2018-10-10 13:24:51 -04:00
Behdad Esfahbod b3390990f5 Add per-subtable set-digests
This speeds up Roboto shaping by ~10%.  I was hoping for more.
Still, good defense against lookups with many subtables.
2018-10-10 12:13:25 -04:00