This is a hack for now. Will be fixed when we do complex-shaper-driven
normalization properly.
The results with or without decomposition are the same, but Uniscribe
does not normalize, so this matches better.
In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra. If you put that at the
end of a Consonant,Matra syllable, you get a dotted-circle from
Uniscribe. Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
And people have been encoding that sequence... So, allow a forced
"ZWJ,Virama,ZWJ,Ra" sequence at the of syllables.
Fixes some 100 or more of Sinhala failures. Now at 622 only (0.23%).
POS_BASE can disappear if base ligated backward. Define base as last
with position not after base.
Fixes a few hundred of Sinhala failures with Iskoola Pota.
It's a visual Repha.
Still not positioning logical Repha as occurs in Malayalam.
Another 200 Khmer failures fixed. 547 to go. That's better than
Devanagari!
This reorders glyphs within the cluster to a nominal order. This should
have no visible effect on the output, but helps with testing, for
getting the same hb-shape output for visually-equal glyphs for each
cluster.
In such scripts (ie. Khmer), a ZWJ/ZWNJ shouldn't stop the search for
base. So, instead just choose the first consonant as base directly.
Test sequence:
U+1798,200c,U+17C9,U+17D2,U+179B,U+17C1,U+17C7
Mark stuff after a pre-base reordering Ro 'cfar'. Used in Khmer.
This allows distinguishing the following cases with MS Khmer fonts:
U+1784,U+17D2,U+179A,U+17D2,U+1782
U+1784,U+17D2,U+1782,U+17D2,U+179A
In Khmer, a final subjoined consonant or independent vowel can occur
after matras. This final subjoined thing should NOT be reordered to
before the matra even though it's subjoined.
Fixes another 1k of the Khmer failures. Not much left really.
Amend the syllable structure to allow a final subscripted consonant
(Coeng+C) and a final subscripted independent vowel (Coeng+V).
Fixes another 2k of Khmer failures.
Normally, we attach the Halant to the previous character and move it
with it. For after-base consonants however, the Halant "belongs" to the
consonant after, so attach it so.
This fixes Bengali sequences involving post-base consonant Ya, which
should ligate with the Halant to form Ya Phala, but previously a
reordered matras was blocking the ligation.
Seems like this is what Uniscribe is doing, and does not break any fonts
we tested (with Devanagari, Malayalam, Khmer, and Bengali), while fixing
some Ra Phala sequences for Bengali with Vrinda. Fixes another 2% of
Bengali failures (a couple more to go).
Uniscribe does not apply 'kern' in the Indic module. Some of the Khmer
fonts they ship have small adjustments in the 'kern' table. Disable
'kern' in the Indic module under Uniscribe bug compatibility mode.
Fixes some 10% of the Khmer failures. Remains under 3% (excluding
dotted-circle ones).
Mark them the same as the Register Shifters for now. Need to rename
that category to something more sensible after all is settled.
Fixes another percent of Khmer failures. Down to under 3%!