When matching lookups, be smart about default-ignorable characters.
In particular:
Do nothing specific about ZWNJ, but for the other default-ignorables:
If the lookup in question uses the ignorable character in a sequence,
then match it as we used to do. However, if the sequence match will
fail because the default-ignorable blocked it, try skipping the
ignorable character and continue.
The most immediate thing it means is that if Lam-Alef forms a ligature,
then Lam-ZWJ-Alef will do to. Finally!
One exception: when matching for GPOS, or for backtrack/lookahead of
GSUB, we ignore ZWNJ too. That's the right thing to do.
It certainly is possible to build fonts that this feature will result
in undesirable glyphs, but it's hard to think of a real-world case
that that would happen.
This *does* break Indic shaping right now, since Indic Unicode has
specific rules for what ZWJ/ZWNJ mean, and skipping ZWJ is breaking
those rules. That will be fixed in upcoming commits.
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.
We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:
- For Indic, no explicit zeroing happens whatsoever, which
is the same as before,
- For Myanmar, zero advance width of glyphs marked as marks
*in GDEF*, and do that *before* applying GPOS. This seems
to be what the new Win8 Myanmar shaper does,
- For everything else, zero advance width of glyphs that are
from General_Category=Mn Unicode characters, and do so
before applying GPOS. This seems to be what Uniscribe does
for Latin at least.
With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'. See preivous
commit.
Before, when matching ligatures, we never skipping over base / liga
glyphs even if that was what the LookupFlags asked for.
Fixed now. We carefully reviewed all instances of this, and tested with
Amiri as well as some Indic scripts, and are confident that this should
NOT break anyone's fonts. It's also how Uniscribe does it, from what
we can tell.
If in a MarkPos table, a base has no anchor for a particular mark class,
return NULL such that the subsequent subtables get a chance at it.
Test case:
hb-shape ./EBGaramond12-Regular.otf ἂ --features="ss20","smcp"
Gives me a good 10% speedup for the Devanagari test case. Less so
for less lookup-intensive tests.
For the Devanagari test case, the false positive rate of the GSUB digest
is 4%.
If there is no GPOS, zero mark advances.
If there *is* GPOS and the shaper requests so, zero mark advances for
attached marks.
Fixes regression with Tibetan, where the font has GPOS, and marks a
glyph as mark where it shouldn't get zero advance.
This commit: a3313e5400 broke MarkMarkPos
when one of the marks itself is a ligature. That regressed 26 Tibetan
tests (up from zero!). Fix that. Tibetan back to zero.
And use it to speed up the hotspot by checking coverage directly in
the main loop, not 10 functions deep in.
Gives me a solid 20% boost with Indic test suite. Less so for less
lookup-intensive scenarios.
Remove the "fast_path" hack from before.
This was broken as a result of 7b84c536c1.
As Khaled reported, MarkMark positioning was broken with glyphs
resulting from a MultipleSubst. Fixed. Test with the ALLAH character
in Amiri.
I couldn't measure significant performance gains out of this; maybe
about 5% (with one million Malayalam strings). Still, not bad.
But reminds me that optimizing this codebase without profiling first
is simply not going to work. Oh well...