New approach to fix this:
69f9fbc420
Previous approach was reverted as it was too broad. See context:
https://github.com/behdad/harfbuzz/issues/347#issuecomment-267838368
With U+05E9,U+05B8,U+05C1,U+05DC and Arial Unicode, we now (correctly) disable
GDEF and GPOS, so we get results very close to Uniscribe, but slightly different
since our fallback position logic is not exactly the same:
Before: [gid1166=3+991|gid1142=0+737|gid5798=0+1434]
After: [gid1166=3+991|gid1142=0@402,-26+0|gid5798=0+1434]
Uniscribe: [gid1166=3+991|gid1142=0@348,0+0|gid5798=0+1434]
Fixes https://github.com/behdad/harfbuzz/issues/243
With javatext.ttf, the reodering medial Ra gets its advance width
zero'ed in Uniscribe implementation, and the font adds the advance
back. Our Indic shaper does not do that, but USE does. So, route
Javanese through USE. That's what Microsoft does anyway. Test:
U+A9A5,U+A9BA
This also seems to fix the following sequence, and variations thereof:
U+A99F,U+A9C0,U+A9A2,U+A9BF
These were never tested with Indic shaper, and indeed wouldn't work there
because they didn't have their viramas and other config defined. They are
all also supported by MS through USE, so route them there.
Looks like Unsicribe responds to the 'mymr' tag by zeroing marks
GDEF_LATE instead of generic-shaper UNICODE_LATE. Implement that.
Fixes
Bug 81775 - Incorrect Rendering with harfbuzz-ng myanmar unicode
https://bugs.freedesktop.org/show_bug.cgi?id=81775
Micro-test added based on Padauk.
Not exhaustively tested, but I think I got the intended logic
right.
The logic can perhaps be simplified. Maybe we should disabled
normalization with this shaper. Then again, for now focusing on
correctness.
This reverts commit d5bd0590ae.
The reasoning behind that logic was flawed and made under
a misunderstanding of the original problem, and caused
regressions as reported by Jonathan Kew in thread titled
"tibetan marks" in Oct 2013. Apparently I have had fixed
the original problem with this commit:
7e08f1258d
So, revert the faulty commit and everything seems to be in good
shape.
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.
We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:
- For Indic, no explicit zeroing happens whatsoever, which
is the same as before,
- For Myanmar, zero advance width of glyphs marked as marks
*in GDEF*, and do that *before* applying GPOS. This seems
to be what the new Win8 Myanmar shaper does,
- For everything else, zero advance width of glyphs that are
from General_Category=Mn Unicode characters, and do so
before applying GPOS. This seems to be what Uniscribe does
for Latin at least.
With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'. See preivous
commit.
Had to do some refactoring to make this happen...
Under uniscribe bug compatibility mode, we still plit them
Uniscrie-style, but Jonathan and I convinced ourselves that there is no
harm doing this the Unicode way. This change makes that happen, and
unbreaks free Sinhala fonts.
Windows 8 adds a Myanmar shaper using the 'mym2' tag. Route that
through the Indic shaper. It's still very broken, but at least this
does NOT break old-style Myanmar shaping using the generic shaper.
For Arabic and Indic shapers, if the font doesn't have a script system
for the script, use default shaper.
Make an exception for Arabic script since we have fallback logic for
that one.
The merger of normalizer and glyph-mapping broke shapers that
modified text stream. Unbreak them by adding a new preprocess_text
shaping stage that happens before normalizing/cmap and disallow
setup_mask modification of actual text.
If there is no GPOS, zero mark advances.
If there *is* GPOS and the shaper requests so, zero mark advances for
attached marks.
Fixes regression with Tibetan, where the font has GPOS, and marks a
glyph as mark where it shouldn't get zero advance.