If we need to apply many lookups, we can speed that up by applying
them in batches. For each batch we keep the union of the coverage of
the participating lookups. We can then skip glyph ranges that do NOT
participate in any lookup in the batch. The batch partition is
chosen by a probability model on the glyphs combined with a dynamic
program that optimizes the partition.
The net effect is a 30% speedup on Amiri. The downside is more memory
consumption, as each batch keeps an hb_set_t of its coverage.
I'm not yet convinced that the tradeoff is worth pursuing. I'm trying
to find ways to optimize this further, with less memory overhead.
This work also ignores the number of subtables per lookup. That may
prove to be very important for the performance numbers from here on.
When matching lookups, be smart about default-ignorable characters.
In particular:
Do nothing specific about ZWNJ, but for the other default-ignorables:
If the lookup in question uses the ignorable character in a sequence,
then match it as we used to do. However, if the sequence match will
fail because the default-ignorable blocked it, try skipping the
ignorable character and continue.
The most immediate effect is that if Lam-Alef forms a ligature,
then Lam-ZWJ-Alef will too. Finally!
One exception: when matching for GPOS, or for backtrack/lookahead of
GSUB, we ignore ZWNJ too. That's the right thing to do.
It certainly is possible to build fonts for which this feature will
result in undesirable glyphs, but it's hard to think of a real-world
case where that would happen.
This *does* break Indic shaping right now, since Indic Unicode has
specific rules for what ZWJ/ZWNJ mean, and skipping ZWJ is breaking
those rules. That will be fixed in upcoming commits.