OpenType only officially maps four ISO 639 codes to Quechua languages,
but prior versions of HarfBuzz also mapped qu to 'QUZ '. Because qu is a
macrolanguage, the mapping now applies to all individual Quechua
languages. OpenType calls 'QUZ ' "Quechua", but it really corresponds to
Cusco Quechua, so the individual Quechua languages should not all
necessarily be mapped to it.
If the second subtag of a BCP 47 tag is three letters long, it denotes
an extended language. The tag converter ignores the language subtag and
uses the extended language instead.
There are some grandfathered exceptions, which are handled earlier.
The new script, gen-tag-table.py, generates `ot_languages` automatically
from the [OpenType language system tag registry][ot] and the [IANA
Language Subtag Registry][bcp47] with some manual modifications. If an
OpenType tag maps to a BCP 47 macrolanguage, all the macrolanguage's
individual languages are mapped to the same OpenType tag, except for
individual languages with their own OpenType mappings. Deprecated
BCP 47 tags are canonicalized.
[ot]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
[bcp47]: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
Some OpenType tags correspond to multiple ISO 639 codes. The mapping
from ISO 639 codes lists OpenType tags in priority order, such that more
specific or more likely tags appear first.
Some OpenType tags have no corresponding ISO 639 code in the registry so
their mappings use BCP 47 subtags besides the language. For example, any
BCP 47 tag with a fonipa variant subtag is mapped to 'IPPH', and 'IPPH'
is mapped back to und-fonipa.
Other OpenType tags have no corresponding ISO 639 code because it is not
clear what they are for. HarfBuzz just ignores these tags.
One such ignored tag is 'ZHP ' (Chinese Phonetic). It probably means
zh-Latn. However, it is used in Microsoft JhengHei and Microsoft YaHei
with the script tag 'hani', implying that it is not a romanization
scheme after all. It would be simple enough to add this mapping to
gen-tag-table.py once a definitive mapping is determined.
The manual modifications are mainly either obvious mappings that the
OpenType registry omits or mappings for compatibility with previous
versions of HarfBuzz. Some of the old mappings were discarded, though,
for homophonous language names. For example, OpenType maps 'KUI ' to
kxu; previous versions of HarfBuzz also mapped it to kvd, because kvd
and kxu both happen to be called "Kui".
gen-tag-table.py also generates a function to convert multi-subtag tags
like el-polyton and zh-HK to OpenType tags, replacing `ot_languages_zh`
and the hard-coded list of special cases in `hb_ot_tags_from_language`.
It also generates a function to convert OpenType tags to BCP 47,
replacing the hard-coded list of special cases in
`hb_ot_tag_to_language`.
The old hb-ot-tag.cc functions, `hb_ot_tags_from_script` and
`hb_ot_tag_from_language`, are now wrappers around a new function:
`hb_ot_tags`. It converts a script and a language to arrays of script
tags and language tags. This will make it easier to add new script tags
to scripts, like 'dev3'. It also allows for language fallback chains;
nothing produces more than one language yet though.
Where the old functions return the default tags 'DFLT' and 'dflt',
`hb_ot_tags` returns an empty array. The caller is responsible for
using the default tag in that case.
The new function also adds a new private use subtag syntax for script
overrides: "x-hbscabcd" requests a script tag of 'abcd'.
The old hb-ot-layout.cc functions,`hb_ot_layout_table_choose_script` and
`hb_ot_layout_script_find_language` are now wrappers around the new
functions `hb_ot_layout_table_select_script` and
`hb_ot_layout_script_select_language`. They are essentially the same as
the old ones plus a tag count parameter.
Closes#495.
`hb_language_from_string` accepts not only ISO 639 but also BCP 47. Not
all ISO 639 codes are valid BCP 47 tags but the function does not accept
overlong language subtags anyway.
Tested using Kannada MN:
$ HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0C95,0CCd,C95,CCD
[kn_ka.virama=0+1299|kn_ka.vattu=0+115|_blank=0@-115,0+385]
$ HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0C95,0CCd,C95,CCD --features=-kern
[kn_ka.virama=0+1799|kn_ka.vattu=0+230|_blank=0+0]
I don't see the GPOS table in the font do the same. ¯\_(ツ)_/¯
Tested that Format0 works with Kannada MN font:
$ make -j5 lib -s && HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0C95,0CC2
[kn_ka=0+1000|kn_matra_uu=0@-30,0+1345]
$ make -j5 lib -s && HB_OPTIONS=aat ./hb-shape Kannada\ MN.ttc -u 0C95,0CC2 --features=-kern
[kn_ka=0+1030|kn_matra_uu=0+1375]
Note that GPOS does the same with 'dist' feature, and applies the whole difference to the
same glyph:
$ make -j5 lib -s && ./hb-shape Kannada\ MN.ttc -u 0C95,0CC2
[kn_ka=0+970|kn_matra_uu=0+1375]
$ make -j5 lib -s && ./hb-shape Kannada\ MN.ttc -u 0C95,0CC2 --features=-dist
[kn_ka=0+1030|kn_matra_uu=0+1375]
Makes our FT-backed hb_font_t safe to use from multiple threads. Still,
the underlying FT_Face should NOT be used from other threads by client
or other libraries.
Maybe I add a lock()/unlock() public API ala PangoFT2 and cairo-ft.
Maybe not.
Now that we have get_h_advances() and get_nominal_glyphs() implemented, the
overhead of doing a proper atomic load would be once per run, NOT once per
glyph. So, no need to pre-load the tables to avoid that overhead.
As such, hb_ot_font_set_funcs() has become really cheap. Can *finally* make
it be default font functions on all newly created fonts!
Some more measurable speedup. The recent commits' speedups are as follows:
Testing with Roboto, ****when disabling kern and liga****:
Before:
FT --features=-kern,-liga
user↦ 0m0.521s
OT --features=-liga,-kern
user↦ 0m0.568s
After:
FT --features=-liga,-kern
user↦ 0m0.428s
OT --features=-liga,-kern
user↦ 0m0.470s
So, 17% speedup.
Note that FT callbacks are faster than OT these days since we added an advance
cache to FT. I don't think the difference is enough to justify adding a cache
to OT.
When not disabling kern, the thing is three times slower, so the speedups
are three times less impressive... Still, 5% not bad for a codebase that I
otherwise thought is optimized out.
Note that, because of this and other optimiztions in our main shaper,
disabling kern and liga, the OT shaper is now *faster* than the fallback
shaper. So, that's my recommendation to clients that need the absolute
fastest...
Unused as of now. To be wired up to normalizer, which would remove
overhead and allow hb-ot-font initialization to become a no-op, so
we can enable it by default.
Instead of passing a CFLAG/CXXFLAG to define HB_EXTERN, define it
directly in src/hb.hh as __declspec(dllexport) extern when we are
building HarfBuzz as DLLs on Visual Studio. Define HB_INTERNAL
as nothing without defining HB_NO_VISIBILITY when building HarfBuzz as
DLLs to avoid linker errors on Visual Studio builds.
Also "install" harfbuzz-subset.dll into $(PREFIX)\bin as the
hb-subset utility will depend on that DLL at runtime, when HarfBuzz is
built as DLLs. Since it consists of private APIs that are subject to
change, we do not install its headers nor .lib file.
"The values in the right class table are stored pre-multiplied by the
number of bytes in a single kerning value, and the values in the left
class table are stored pre-multiplied by the number of bytes in one
row. This eliminates needing to multiply the row and column values
together to determine the location of the kerning value. The array can
be indexed by doing the right- and left-hand class mappings, adding the
class values to the address of the array, and fetching the kerning
value to which the new address points."
a01194aaf4 (commitcomment-30772041)
"The tag space of tags consisting of four uppercase letters (A-Z) with no punctuation,
spaces, or numbers, is reserved as a vendor space. Font vendors may use such tags to
identify private features."
Fixes https://github.com/harfbuzz/harfbuzz/issues/1019
New numbers:
BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707261 out of 707394 tests passed. 133 failed (0.0188014%)
GUJARATI: 366353 out of 366457 tests passed. 104 failed (0.0283799%)
GURMUKHI: 60729 out of 60747 tests passed. 18 failed (0.0296311%)
KANNADA: 951300 out of 951913 tests passed. 613 failed (0.0643966%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42327 out of 42329 tests passed. 2 failed (0.00472489%)
SINHALA: 271596 out of 271847 tests passed. 251 failed (0.0923313%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
Devanagari regressed because Uniscribe doesn't enforce the full set.
Tests added with the *-vowel-letters.txt files in tree and Noto fonts.
Based on experimenting with Uniscribe to extract grammar and categories.
Failures down from 44 to 35:
KHMER: 299089 out of 299124 tests passed. 35 failed (0.0117008%)
We still don't enforce the one-matra rule pre-decomposition, but enforce
an order and one-matra-per-position post-decomposition.
https://github.com/harfbuzz/harfbuzz/issues/667
These are only relevant to Sinhala, because they decompose in other
cases. The USE spec categorizes them all as VPst. No idea why we
weren't following that before.
Alternative woul have been to resurrect F_COMBINE that I removed in
70136a78cb
But this does it for now. I'm not sure why check-static-inits.sh didn't
catch this before. Clang -Weverything bot did:
CXX libharfbuzz_la-hb-ot-shape-complex-indic.lo
hb-ot-shape-complex-indic.cc:99:1: warning: declaration requires a global constructor [-Wglobal-constructors]
indic_features[] =
^
1 warning generated.
CXX libharfbuzz_la-hb-ot-shape-complex-khmer.lo
hb-ot-shape-complex-khmer.cc:36:1: warning: declaration requires a global constructor [-Wglobal-constructors]
khmer_features[] =
^
1 warning generated.