If we need to apply many lookups, we can speed that up by applying
them in batches. For each batch we keep the union of the coverage of
the participating lookups, so we can skip glyph ranges that do NOT
participate in any lookup in the batch. The batch partition is
chosen by a probability model on the glyphs plus a dynamic program
that optimizes the partition.
The net effect is a 30% speedup on Amiri. The downside is more memory
consumption, as each batch keeps an hb_set_t of its coverage.
I'm not yet convinced that the tradeoff is worth pursuing. I'm trying
to find ways to optimize this further, with less memory overhead.
This work also ignores the number of subtables per lookup. That may
prove to be very important for the performance numbers going forward.
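The partition step above can be sketched as a classic interval dynamic program. This is a hedged illustration, not the actual HarfBuzz code: the cost model (one union-coverage check per glyph per batch, plus one check per lookup in the batch on a hit) and the per-lookup hit probabilities `probs` are assumptions for the sake of the example.

```python
from math import prod

def optimal_batches(probs):
    """Partition an ordered list of lookups into contiguous batches.

    probs[i] is the (assumed) probability that a glyph is covered by
    lookup i.  Checking a batch's union coverage costs 1 per glyph;
    on a hit, every lookup in the batch is tried, costing len(batch).
    dp[j] holds the minimal expected cost of the first j lookups.
    """
    n = len(probs)
    dp = [0.0] + [float("inf")] * n
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            batch = probs[i:j]
            # Probability that a glyph hits the union coverage,
            # assuming the lookups' coverages are independent.
            p_union = 1.0 - prod(1.0 - p for p in batch)
            cost = dp[i] + 1.0 + p_union * len(batch)
            if cost < dp[j]:
                dp[j], cut[j] = cost, i
    # Walk the cut points back to recover the batch boundaries.
    batches, j = [], n
    while j > 0:
        batches.append((cut[j], j))
        j = cut[j]
    return dp[n], batches[::-1]
```

Under this model, merging everything into one batch is not always optimal: a high-probability lookup grouped with several rare ones drags the whole batch's union coverage up, so the DP splits it off into its own batch.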
This is not ideal as we don't like -L/usr/lib in our linker line.
But this is only relevant to environments that don't have pkg-config
files for ICU...
https://github.com/behdad/harfbuzz/pull/2
We were not initializing the digests properly; as a result they were
left zero-initialized, so digest1 never did any useful work.
Speeds up Amiri shaping significantly.
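The digests in question are bloom-filter-style summaries of the glyphs a lookup may touch, used to skip lookups cheaply. The sketch below is only an illustration of that idea, not HarfBuzz's actual hb_set_digest_t layout; the class name, the 32-bit mask, and the hashing scheme are all assumptions.

```python
class GlyphDigest:
    """Bloom-filter-style digest of glyph IDs (illustrative sketch).

    A lookup whose digest answers may_have() == False for a glyph can
    be skipped outright.  Like any bloom filter it can give false
    positives, but never false negatives once properly populated.
    """

    def __init__(self):
        # Proper initialization matters: a digest that is never
        # populated (or populated into the wrong state) filters
        # nothing useful, which is the kind of bug the commit fixes.
        self.mask = 0

    def add(self, glyph_id):
        # Hash the glyph ID into one of 32 buckets (assumed scheme).
        self.mask |= 1 << (glyph_id & 31)

    def may_have(self, glyph_id):
        return bool(self.mask & (1 << (glyph_id & 31)))
```

Glyphs 5 and 37 share a bucket here (37 & 31 == 5), so after `add(5)` the digest reports `may_have(37)` as True: a harmless false positive, traded for an O(1) check.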
See thread "an issue regarding discrepancy between Korean and Unicode
standards" on the mailing list for the rationale. In short: Uniscribe
doesn't, so fonts are designed to work without it.
Testing shows that this is closer to what Uniscribe does.
Reported by Khaled Hosny:
"""
commit 568000274c
...
This commit is causing a regression with Amiri, the string “هَٰذ” with
Uniscribe and HarfBuzz before this commit, gives:
[uni0630.fina=3+965|uni0670.medi=0+600|uni064E=0@-256,0+0|uni0647.init=0+926]
But now it gives:
[uni0630.fina=3+965|uni0670.medi=0+0|uni064E=0@-256,0+0|uni0647.init=0+926]
i.e. uni0670.medi is zeroed though it has a base glyph GDEF class.
"""
The test case is U+0647,U+064E,U+0670,U+0630 with Amiri.