Finally: Fixes https://github.com/harfbuzz/harfbuzz/issues/1356
Test case:
$ ./hb-shape GeezaPro.ttc -u U+0628,U+064A,U+064E,U+0651,U+0629
[u0629.final.tehMarbuta=4+713|u064e_u0651.shaddaFatha=1@0,-200+0|u064a.medial.yeh=1+656|u0628.initial.beh=0+656]
The mark positioning (kern table CrossStream kerning) only works if deleted
glyph (as result of ligation) is still in stream and pushed through the
state machine.
From the issue:
"In this font, the virama,ya first forms a ligature, then decomposes back to
virama,ya. This causes those two to be marked parts of a MultipleSubst
sequence. When attaching the matra, we look for the first of the MultipleSubst
sequence because that's where we attach to (because of eg #740). In this case,
the first glyph in the MultipleSubst sequence is a mark, so we skip it and
attach to the base char before it."
Font in question is Nirmala UI from Windows 10. Test sequence:
U+0926,U+094D,U+092F,U+0941
Fixes https://github.com/harfbuzz/harfbuzz/issues/1020
Not optimized to use sortedness yet. Also start putting in place infra
to faster reject bad data.
A version of Chandas.ttf found on some Chrome bots has 660kb of GPOS,
mostly junk. That is causing 48 million of set->add() calls in
collect_glyphs(), which is insane.
In the upcoming commits, I'll be speeding that up by optimizing
add_sorted_array(), while also reducing work by rejecting out-of-sort
arrays quickly and propagate the rejection.
Part of https://bugs.chromium.org/p/chromium/issues/detail?id=794896
Apparently a base glyph can also become an attached component of a
ligature if the ligature-forming lookup used IgnoreBase. This was
being confused with a non-first component of a MultipleSubst and
hence not matched for mark-attachment. Tweak test to fix.
Fixes https://github.com/behdad/harfbuzz/issues/543
That commit moved the advance adjustment for mark positioning to
be applied immediately, instead of doing late before. This breaks
if mark advances are zeroed late, like in Arabic. Also, easier to
hit it in RTL scripts since a single mark with non-zero advance is
enough to hit the bug, whereas in LTR, at least two marks are needed.
This reopens https://github.com/behdad/harfbuzz/issues/211
The cursive+mark interaction is broken again. To be fixed in a
different way.
Fixes https://github.com/behdad/harfbuzz/issues/211
What happens in that bug is that a mark is attached to base first,
then a second mark is cursive-chained to the first mark. This only
"works" because it's in the Indic shaper where mark advances are
not zeroed.
Before, we didn't allow cursive to run on marks at all. Fix that.
We also where updating mark major offsets at the end of GPOS, such
that changes in advance of base will not change the mark attachment
position. That was superior to the alternative (which is what Uniscribe
does BTW), but made it hard to apply cursive to the mark after it
was positioned. We could track major-direction offset changes and
apply that to cursive in the post process, but that's a much trickier
thing to do than the fix here, which is to immediately apply the
major-direction advance-width offsets... Ie.:
https://github.com/behdad/harfbuzz/issues/211#issuecomment-183194739
If this breaks any fonts, the font should be fixed to do mark attachment
after all the advances are set up first (kerning, etc).
Finally, this, still doesn't make us match Uniscribe, for I explained
in that bug. Looks like Uniscribe applies minor-direction cursive
adjustment immediate as well. We don't, and we like it our way, at
least for now. Eg. the sequence in the test case does this:
- The first subscript attaches with mark-to-base, moving in x only,
- The second subscript attaches with cursive attachment to first subscript
moving in x only,
- A final context rule moves the first subscript up by 104 units.
The way we do, the final shift-up, also shifts up the second subscript
mark because it's cursively-attached. Uniscribe doesn't. We get:
[ttaorya=0+1307|casubscriptorya=0@-242,104+-231|casubscriptnarroworya=0@20,104+507]
while Uniscribe gets:
[ttaorya=0+1307|casubscriptorya=0@-242,104+-211|casubscriptnarroworya=0+487]
note the different y-offset of the last glyph. In our view, after cursive,
things move together, period.
See thread "Issue with cursive attachment" started by Khaled.
Turned out fixing this wasn't as bad as I had assumed. I like the
new code better; we now have a theoretical model of cursive
connections that is easier to reason about.
This makes a lot of code safer. We only try modifying the object in one
place, after making sure it's safe to do so. So, do a const_cast<> in
that one place...
Currently:
- Initializing skippy is very expensive,
- Our lookup accelerator (using set-digests) can be very ineffecite,
As such, we end up many times initializing skippy but then failing
coverage check. Reordering fixes that.
When, later, we fix our accelerator to have truly small false-positive
rate (for example by using the frozen-sets), then we might want to
reorder these checks such that we wouldn't calculate coverage number
if skippy is going to fail.
This shows a 5% speedup with Roboto already.
Roboto has glyphs (like 'F') that have 200 kerning pairs.
Add a handcoded bsearch instead of previous linear search.
This doesn't show much speedup though, apparently we spend the
bulk of the time somewhere before here.