Previously we only synthesized GDEF glyph classes if the glyphClassDef
array in GDEF was null. This worked well enough, and is indeed what
OpenType requires: "If the font does not include a GlyphClassDef table,
the client must define and maintain this information when using the
GSUB and GPOS tables." That sentence does not quite make sense since
one needs Unicode properties as well, but is close enough.
However, looks like Arial Unicode as shipped on WinXP, does have GDEF
glyph class array, but defines no classes for Hebrew. This results
in Hebrew marks not getting their widths zeroed. So, with this change,
we synthesize glyph class for any glyph that is not specified in the
GDEF glyph class table. Since, from our point of view, a glyph not
being listed in that table is a font bug, any unwanted consequence of
this change is a font bug :).
Note that we still don't get the same rendering as Uniscribe, since
Uniscribe seems to do fallback positioning as well, even though the
font does have a GPOS table (which does NOT cover Hebrew!). We are
not going to try to match that though.
Test string for Arial Unicode:
U+05E9,U+05B8,U+05C1,U+05DC
Before: [gid1166=3+991|gid1142=0+737|gid5798=0+1434]
After: [gid1166=3+991|gid1142=0+0|gid5798=0+1434]
Uniscribe: [gid1166=3+991|gid1142=0@348,0+0|gid5798=0+1434]
Note that our new output matches what we were generating until July
2014, because the Hebrew shaper used to zero mark advances based on
Unicode, NOT GDEF. That's 9e834e29e0.
Reported by Greg Douglas.
This makes a lot of code safer. We only try modifying the object in one
place, after making sure it's safe to do so. So, do a const_cast<> in
that one place...
Normally if you want to, say, conditionally prevent a 'pref', you
would use blocking contextual matching. Some designers instead
form the 'pref' form, then undo it in context. To detect that
we now also remember glyphs that went through MultipleSubst.
In the only place that this is used, Uniscribe seems to only care
about the "last" transformation between Ligature and Multiple
substitions. Ie. if you ligate, expand, and ligate again, it
moves the pref, but if you ligate and expand it doesn't. That's
why we clear the MULTIPLIED bit when setting LIGATED.
Micro-test added. Test: U+0D2F,0D4D,0D30 with font from:
[1]
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=186#c29
Before we were just relying on the compiler inlining them and not
leaving a trace in our public API. Try to fix. Hopefully not
breaking anyone's build.
Previously we only supported recursive sublookups with
ascending indices. We were also not correctly handling
non-1-to-1 recursed lookups.
Fix all that!
Fixes the three tests in test/shaping/tests/context-matching.tests,
which were derived from NotoSansBengali and NotoSansDevanagari
among others.
When matching lookups, be smart about default-ignorable characters.
In particular:
Do nothing specific about ZWNJ, but for the other default-ignorables:
If the lookup in question uses the ignorable character in a sequence,
then match it as we used to do. However, if the sequence match will
fail because the default-ignorable blocked it, try skipping the
ignorable character and continue.
The most immediate thing it means is that if Lam-Alef forms a ligature,
then Lam-ZWJ-Alef will do to. Finally!
One exception: when matching for GPOS, or for backtrack/lookahead of
GSUB, we ignore ZWNJ too. That's the right thing to do.
It certainly is possible to build fonts that this feature will result
in undesirable glyphs, but it's hard to think of a real-world case
that that would happen.
This *does* break Indic shaping right now, since Indic Unicode has
specific rules for what ZWJ/ZWNJ mean, and skipping ZWJ is breaking
those rules. That will be fixed in upcoming commits.