Previously we made CGJ unskippable. Now, if CGJ did NOT prevent
any reordering, allow skipping over it. To make this work we
had to make changes to the Arabic mark reordering algorithm
implementation to renumber moved MCM marks. See comments.
Fixes https://github.com/harfbuzz/harfbuzz/issues/554
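A minimal sketch of the CGJ check. It assumes each glyph carries its canonical combining class and that CGJ starts out unskippable. `Glyph` and `unhide_redundant_cgj` are hypothetical names for illustration, not HarfBuzz's code. The MCM renumbering mentioned above is not modeled.

```python
from dataclasses import dataclass

CGJ = 0x034F  # COMBINING GRAPHEME JOINER


@dataclass
class Glyph:
    codepoint: int
    ccc: int                 # canonical combining class, 0 for starters
    skippable: bool = True   # False models the earlier "CGJ is unskippable" state


def unhide_redundant_cgj(glyphs):
    """Make a CGJ skippable again if it did not prevent any reordering.

    A CGJ (ccc 0) only blocks canonical reordering when the mark after it
    has a lower, non-zero combining class than the mark before it. In every
    other case the marks were already in order and the CGJ changed nothing.
    """
    for i in range(1, len(glyphs) - 1):
        if glyphs[i].codepoint != CGJ:
            continue
        prev_ccc = glyphs[i - 1].ccc
        next_ccc = glyphs[i + 1].ccc
        if next_ccc == 0 or prev_ccc <= next_ccc:
            glyphs[i].skippable = True
    return glyphs
```

Here are two cases. With U+0301 (ccc 230), then CGJ, then U+0323 (ccc 220), the CGJ stays unskippable. With the two marks in the opposite order it becomes skippable.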
Apparently a base glyph can also become an attached component of a
ligature if the ligature-forming lookup used IgnoreBase. This was
being confused with a non-first component of a MultipleSubst and
hence not matched for mark-attachment. Tweak test to fix.
Fixes https://github.com/behdad/harfbuzz/issues/543
New Indic numbers are:
BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366355 out of 366457 tests passed. 102 failed (0.0278341%)
GURMUKHI: 60729 out of 60747 tests passed. 18 failed (0.0296311%)
KANNADA: 951201 out of 951913 tests passed. 712 failed (0.0747968%)
KHMER: 299071 out of 299124 tests passed. 53 failed (0.0177184%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
Before 71c0a1429d, GURMUKHI used to be at 15 failures,
because Uniscribe seems to allow this character standalone, but that looks
wrong.
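The percentages in these summaries are simply failed/total. A small hypothetical helper for checking the numbers by hand:

```python
def failure_rate(passed, total):
    """Return (failed, percent failed) for one line of a test summary."""
    failed = total - passed
    return failed, 100.0 * failed / total


failed, pct = failure_rate(60729, 60747)  # the GURMUKHI line
print(f"GURMUKHI: {failed} failed ({pct:.6g}%)")
```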
If two marks want to ligate and they belong to different components of the
same ligature glyph, and said ligature glyph is to be ignored according to
mark-filtering rules, then allow.
Example Burmese sequence:
U+1004,U+103A,U+1039,U+101B,U+103D,U+102D
Test font provided by Norbert Lindenberg.
Fixes https://github.com/behdad/harfbuzz/issues/545
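The matching rule can be sketched as follows. It assumes each glyph records the ligature it belongs to (`lig_id`) and which component of it (`lig_comp`). Both field names and `marks_may_ligate` are hypothetical. The caller is assumed to have already decided whether the ligature glyph is skipped under the lookup's mark-filtering flags.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    lig_id: int    # 0 if not attached to a ligature
    lig_comp: int  # 1-based ligature component index, 0 if none


def marks_may_ligate(first, this, ligature_is_skipped):
    """Simplified, hypothetical model of the rule described above.

    Marks on the same component of the same ligature may always ligate.
    Marks on different components were rejected before; now they are
    allowed when the ligature glyph itself would be skipped by the
    lookup's mark-filtering rules.
    """
    if first.lig_id and first.lig_comp:
        same_component = (this.lig_id == first.lig_id and
                          this.lig_comp == first.lig_comp)
        if not same_component and not ligature_is_skipped:
            return False
    return True
```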
I'd like to have a mode where CONTAINS_NOTDEF and CONTAINS_DOTTEDCIRCLE are not
returned. Abused a value of -1 for that. hb-shape now uses it. Fixes two
of the six tests failing with --verify in test/shaping/run-tests.sh.
Fixes https://github.com/behdad/harfbuzz/issues/294
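A sketch of the sentinel idea. It assumes the -1 is passed where a glyph ID is expected, so in an unsigned 32-bit type it reads as 0xFFFFFFFF. The flag values and `contains_flags` are hypothetical. Only the contains-flags part of a buffer comparison is modeled.

```python
# Hypothetical flag values, for this sketch only.
CONTAINS_NOTDEF = 0x1
CONTAINS_DOTTEDCIRCLE = 0x2
NO_GLYPH = -1 & 0xFFFFFFFF  # -1 stored in an unsigned 32-bit glyph ID


def contains_flags(glyphs, dottedcircle_glyph):
    """Report whether a shaped buffer contains .notdef or the dotted circle.

    Passing NO_GLYPH as dottedcircle_glyph selects the mode in which
    neither CONTAINS_NOTDEF nor CONTAINS_DOTTEDCIRCLE is ever reported.
    """
    flags = 0
    if dottedcircle_glyph == NO_GLYPH:
        return flags
    for gid in glyphs:
        if gid == 0:  # glyph 0 is .notdef
            flags |= CONTAINS_NOTDEF
        elif gid == dottedcircle_glyph:
            flags |= CONTAINS_DOTTEDCIRCLE
    return flags
```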
Also fixes a bunch of other Indic issues. Test results after:
BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366355 out of 366457 tests passed. 102 failed (0.0278341%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951201 out of 951913 tests passed. 712 failed (0.0747968%)
KHMER: 299071 out of 299124 tests passed. 53 failed (0.0177184%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
Before:
BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
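Comparing the two tables above shows where the change helped. This sketch copies the failure counts from both tables and lists the scripts whose counts changed:

```python
AFTER = {
    "BENGALI": 463, "DEVANAGARI": 87, "GUJARATI": 102, "GURMUKHI": 15,
    "KANNADA": 712, "KHMER": 53, "MALAYALAM": 198, "ORIYA": 9,
    "SINHALA": 185, "TAMIL": 0, "TELUGU": 18,
}
BEFORE = {
    "BENGALI": 463, "DEVANAGARI": 87, "GUJARATI": 108, "GURMUKHI": 15,
    "KANNADA": 723, "KHMER": 54, "MALAYALAM": 198, "ORIYA": 9,
    "SINHALA": 185, "TAMIL": 1, "TELUGU": 18,
}


def fixed_per_script(before, after):
    """Return {script: failures fixed} for scripts whose count changed."""
    return {s: before[s] - after[s] for s in before if before[s] != after[s]}


fixed = fixed_per_script(BEFORE, AFTER)
```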
* Shaping tests for Tibetan vowels
* Test-cases for the Dzongkha contractions with multiple vowel-signs added.
* going to be removed
* Extended contraction-test-cases to all test cases in contractions.txt that actually use multiple vowels (113 cases)
* Guard against underflow when adjusting length
With the fuzz-testcase in mozilla bug 1295299, we end up with a recursed lookup that removes 3 items, when `match_positions[idx]` is 0, which results in (unsigned) `end` wrapping to a huge value.
Making `end` a signed int is probably the simplest route to a fix.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1295299.
* Add testcase for #421.
* [indic] Add support for Grantha marks that may be used in Tamil to the Indic table.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1331339.
Testcase: U+0BA4,U+0BC6,U+1133c,U+0BAA,U+1133c,U+0BC6,U+1133c
* [indic] Add test for Grantha nukta that is allowed in Tamil by ScriptExtensions.txt
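The underflow from the Mozilla fuzz case can be reproduced with plain arithmetic. This sketch emulates a C `unsigned int` by masking to 32 bits. Two parts of it are assumptions, because the message does not show the exact expression: modeling the adjustment as `end + delta` with `delta = -3`, and the clamping `floor`.

```python
UINT32_MASK = 0xFFFFFFFF


def adjust_end_unsigned(end, delta):
    """C-style `unsigned end; end += delta;`, which wraps modulo 2**32."""
    return (end + delta) & UINT32_MASK


def adjust_end_signed(end, delta, floor):
    """Signed version: a negative result is visible and can be clamped.

    `floor` is a hypothetical lower bound, e.g. the matched position itself.
    """
    end += delta
    return max(end, floor)
```

With `match_positions[idx]` at 0 and three items removed, the unsigned version yields 4294967293. The signed version yields -3, which the caller can detect and clamp.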
The numbers for right-to-left scripts are also processed from right to
left, so the order of applying the “numr” and “dnom” features should be
reversed in that case.
Fixes https://github.com/behdad/harfbuzz/issues/395
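A sketch of the swap. It assumes a run of digits on each side of U+2044 FRACTION SLASH and a `backward` flag for right-to-left buffers. `fraction_features` is hypothetical and only labels which feature each character would get.

```python
FRACTION_SLASH = "\u2044"


def fraction_features(text, backward):
    """Label the digits around each fraction slash with 'numr' or 'dnom'.

    `text` is in buffer order. Following the message above, the two
    features swap roles when the buffer is processed right to left.
    """
    pre, post = ("dnom", "numr") if backward else ("numr", "dnom")
    feats = [None] * len(text)
    for i, ch in enumerate(text):
        if ch != FRACTION_SLASH:
            continue
        feats[i] = "frac"
        j = i - 1
        while j >= 0 and text[j].isdigit():  # digits before the slash
            feats[j] = pre
            j -= 1
        k = i + 1
        while k < len(text) and text[k].isdigit():  # digits after the slash
            feats[k] = post
            k += 1
    return feats
```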
The so-called Python 2 “narrow” builds support UCS-2 only; this is a
workaround to allow unichr to work with any Unicode character in such
builds. This fixes a Travis-CI failure, as Travis-CI has narrow Python 2 builds.
Copied from:
https://github.com/behdad/fonttools/blob/master/Lib/fontTools/misc/py23.py
Fixes https://github.com/behdad/harfbuzz/issues/243
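For reference, a narrow build stores a character above U+FFFF as a UTF-16 surrogate pair, which is why its built-in unichr() rejects such code points. The arithmetic is sketched below in Python 3, where chr() accepts every code point, so the result can be checked against it:

```python
def utf16_code_units(cp):
    """Return the UTF-16 code units that represent code point `cp`."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x10000:
        return [cp]  # BMP: a single code unit
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]  # high, low surrogate
```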
With javatext.ttf, the reordering medial Ra gets its advance width
zeroed in the Uniscribe implementation, and the font adds the advance
back. Our Indic shaper does not do that, but USE does. So, route
Javanese through USE. That's what Microsoft does anyway. Test:
U+A9A5,U+A9BA
This also seems to fix the following sequence, and variations thereof:
U+A99F,U+A9C0,U+A9A2,U+A9BF
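The test sequences in these messages use a comma-separated `U+XXXX` notation. A hypothetical helper can turn one into a Python string, for example to feed a shaping test:

```python
def parse_codepoints(spec):
    """Turn a sequence like "U+A99F,U+A9C0" into a Python str."""
    chars = []
    for item in spec.split(","):
        item = item.strip()
        if item[:2].upper() != "U+":
            raise ValueError(f"not a U+ code point: {item!r}")
        chars.append(chr(int(item[2:], 16)))  # hex digits, any case
    return "".join(chars)


text = parse_codepoints("U+A99F,U+A9C0,U+A9A2,U+A9BF")
```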
Apparently some clients have reference-table callbacks that copy the table.
As such, avoid loading the 'glyf' table, which is only needed if fallback
positioning happens.
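A minimal sketch of loading a table lazily. It assumes the client's reference-table callback is the expensive part. `LazyTable` and `shape` are hypothetical stand-ins, not HarfBuzz API.

```python
class LazyTable:
    """Fetch a font table only the first time something asks for it.

    `loader` stands in for a client's reference-table callback, which may
    copy the whole table and is therefore worth avoiding when possible.
    """

    def __init__(self, loader):
        self._loader = loader
        self._data = None
        self._loaded = False

    def get(self):
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
        return self._data


def shape(glyf, needs_fallback_positioning):
    """Hypothetical shaping step: touches 'glyf' only for fallback positioning."""
    if needs_fallback_positioning:
        return len(glyf.get())
    return 0
```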
That commit moved the advance adjustment for mark positioning to
be applied immediately, instead of late as before. This breaks
if mark advances are zeroed late, like in Arabic. It is also easier to
hit in RTL scripts, since a single mark with non-zero advance is
enough to hit the bug, whereas in LTR at least two marks are needed.
This reopens https://github.com/behdad/harfbuzz/issues/211
The cursive+mark interaction is broken again. To be fixed in a
different way.
This better emulates Unicode grapheme clusters.
Note that Uniscribe does NOT do this, but it should be harmless with most clients,
and it should improve fallback with clients that use the HarfBuzz cluster as the unit of fallback.
Fixes https://github.com/behdad/harfbuzz/issues/217
This is what Microsoft's implementation does. Marks that need an advance
must add it back using 'dist' or another GPOS feature. Update tests to
match.
Fixes https://github.com/behdad/harfbuzz/issues/211
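The zeroing step itself is simple. This sketch assumes positions are already classified as mark or non-mark, for example from GDEF. `Pos` and `zero_mark_advances` are hypothetical. The font's re-adding of the advance via GPOS is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Pos:
    x_advance: int
    is_mark: bool


def zero_mark_advances(positions):
    """Zero the advance of every mark glyph; non-marks are left alone."""
    for p in positions:
        if p.is_mark:
            p.x_advance = 0
    return positions
```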
What happens in that bug is that a mark is attached to base first,
then a second mark is cursive-chained to the first mark. This only
"works" because it's in the Indic shaper where mark advances are
not zeroed.
Before, we didn't allow cursive to run on marks at all. Fix that.
We were also updating mark major offsets at the end of GPOS, so
that changes in the advance of the base would not change the mark attachment
position. That was superior to the alternative (which is what Uniscribe
does, BTW), but made it hard to apply cursive to the mark after it
was positioned. We could track major-direction offset changes and
apply them to cursive in the post-process, but that's a much trickier
thing to do than the fix here, which is to immediately apply the
major-direction advance-width offsets. I.e.:
https://github.com/behdad/harfbuzz/issues/211#issuecomment-183194739
If this breaks any fonts, the font should be fixed to do mark attachment
after all the advances are set up first (kerning, etc).
Finally, this still doesn't make us match Uniscribe, for reasons I explained
in that bug. It looks like Uniscribe applies the minor-direction cursive
adjustment immediately as well. We don't, and we like it our way, at
least for now. E.g. the sequence in the test case does this:
- The first subscript attaches with mark-to-base, moving in x only,
- The second subscript attaches with cursive attachment to the first subscript,
moving in x only,
- A final context rule moves the first subscript up by 104 units.
The way we do it, the final shift-up also shifts up the second subscript
mark, because it's cursively attached. Uniscribe doesn't. We get:
[ttaorya=0+1307|casubscriptorya=0@-242,104+-231|casubscriptnarroworya=0@20,104+507]
while Uniscribe gets:
[ttaorya=0+1307|casubscriptorya=0@-242,104+-211|casubscriptnarroworya=0+487]
note the different y-offset of the last glyph. In our view, after cursive,
things move together, period.
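The "things move together" rule can be sketched as a post-process over cursive chains in horizontal text, where each cursively attached glyph adds its parent's y offset to its own. The names are hypothetical, chains are assumed to be acyclic, and x offsets are not modeled. If the first subscript is shifted up by 104 units, the second one ends up at 104 as well, which matches the y offsets in our output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlyphPos:
    name: str
    y_offset: int
    cursive_parent: Optional[int] = None  # index of the glyph this one attaches to


def propagate_cursive_y(glyphs):
    """Add each cursive parent's y offset to its child (assumes no cycles)."""
    resolved = [False] * len(glyphs)

    def resolve(i):
        if resolved[i]:
            return
        parent = glyphs[i].cursive_parent
        if parent is not None:
            resolve(parent)  # the parent's final offset must be known first
            glyphs[i].y_offset += glyphs[parent].y_offset
        resolved[i] = True

    for i in range(len(glyphs)):
        resolve(i)
    return glyphs


chain = propagate_cursive_y([
    GlyphPos("ttaorya", 0),
    GlyphPos("casubscriptorya", 104),                      # moved up by the context rule
    GlyphPos("casubscriptnarroworya", 0, cursive_parent=1),
])
```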