Not for Arabic, but for Indic-like scripts. ZWJ/ZWNJ have special
meanings in those scripts, so let font lookups take full control.
This undoes the regression caused by automatic-joiners handling
introduced two commits ago.
We only disable automatic joiner handling for the "basic shaping
features" of Indic, Myanmar, and SEAsian shapers. The "presentation
forms" and other features are still applied with automatic-joiner
handling.
This change also changes the test suite failure statistics, such that
a few scripts show more "failures". The most affected is Kannada.
However, upon inspection, we believe that in most, if not all, of the
new failures, we are producing results superior to Uniscribe. Hard to
count those!
Here's an example of what is fixed by the recent joiner-handling
changes:
https://bugs.freedesktop.org/show_bug.cgi?id=58714
New numbers, for future reference:
BENGALI: 353892 out of 354188 tests passed. 296 failed (0.0835714%)
DEVANAGARI: 707336 out of 707394 tests passed. 58 failed (0.00819911%)
GUJARATI: 366262 out of 366457 tests passed. 195 failed (0.0532122%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 950680 out of 951913 tests passed. 1233 failed (0.129529%)
KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047983 out of 1048334 tests passed. 351 failed (0.0334817%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271539 out of 271847 tests passed. 308 failed (0.113299%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
When matching lookups, be smart about default-ignorable characters.
In particular:
Do nothing specific about ZWNJ, but for the other default-ignorables:
If the lookup in question uses the ignorable character in a sequence,
then match it as we used to do. However, if the sequence match will
fail because the default-ignorable blocked it, try skipping the
ignorable character and continue.
The most immediate thing it means is that if Lam-Alef forms a ligature,
then Lam-ZWJ-Alef will do to. Finally!
One exception: when matching for GPOS, or for backtrack/lookahead of
GSUB, we ignore ZWNJ too. That's the right thing to do.
It certainly is possible to build fonts that this feature will result
in undesirable glyphs, but it's hard to think of a real-world case
that that would happen.
This *does* break Indic shaping right now, since Indic Unicode has
specific rules for what ZWJ/ZWNJ mean, and skipping ZWJ is breaking
those rules. That will be fixed in upcoming commits.
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.
We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:
- For Indic, no explicit zeroing happens whatsoever, which
is the same as before,
- For Myanmar, zero advance width of glyphs marked as marks
*in GDEF*, and do that *before* applying GPOS. This seems
to be what the new Win8 Myanmar shaper does,
- For everything else, zero advance width of glyphs that are
from General_Category=Mn Unicode characters, and do so
before applying GPOS. This seems to be what Uniscribe does
for Latin at least.
With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'. See preivous
commit.
API additions:
hb_segment_properties_t
HB_SEGMENT_PROPERTIES_DEFAULT
hb_segment_properties_equal()
hb_segment_properties_hash()
hb_buffer_set_segment_properties()
hb_buffer_get_segment_properties()
hb_ot_layout_glyph_class_t
hb_shape_plan_t
hb_shape_plan_create()
hb_shape_plan_create_cached()
hb_shape_plan_get_empty()
hb_shape_plan_reference()
hb_shape_plan_destroy()
hb_shape_plan_set_user_data()
hb_shape_plan_get_user_data()
hb_shape_plan_execute()
hb_ot_shape_plan_collect_lookups()
API changes:
Rename hb_ot_layout_feature_get_lookup_indexes() to
hb_ot_layout_feature_get_lookups().
New header file:
hb-shape-plan.h
And a bunch of prototyped but not implemented stuff. Coming soon.
(Tests fail because of the prototypes right now.)
We need the font for glyph lookup during GSUB pauses in Indic shaper.
Could perhaps be avoided, but at this point, we don't mean to support
separate substitute()/position() entry points (anymore), so there is
no point in not providing the font to GSUB.
Gives me a good 10% speedup for the Devanagari test case. Less so
for less lookup-intensive tests.
For the Devanagari test case, the false positive rate of the GSUB digest
is 4%.
This commit: a3313e5400 broke MarkMarkPos
when one of the marks itself is a ligature. That regressed 26 Tibetan
tests (up from zero!). Fix that. Tibetan back to zero.
This was broken as a result of 7b84c536c1.
As Khaled reported, MarkMark positioning was broken with glyphs
resulting from a MultipleSubst. Fixed. Test with the ALLAH character
in Amiri.
I was reconsidering whether y should grow down, since all three/four
times I've used this API I was tricked and got that wrong in my use.
So I was very inclined to make y grow down instead of up. However,
considering that the font space has y up and it would be very confusing
for callbacks to work against that, I decided that what I really want
is for the user to be able to set y_scale to a negative number to imply
that user-space y grows down.
Changing x_scale/y_scale from unsigned int to int allows that, and I've
made pango to use that instead of negating glyph y_offset later. hb-ft
however still has y group up. I *guess* that's how FreeType works?
I'm not sure, FreeType docs don't make this clear...
I'm happy with the resolution :-).