I typically don't like including generating files in tree. But like to
make an exception for this, since this forms the canonical list of
options one would need to go through when building with alternative
build systems.
Incidentally, this makes it not crash with icu-le-hb anymore...
I'm not smart / stupid enough to spend two more days debugging C++
linking issues, and this is ABI-stable at least.
That's really the logic desired. Except that MONGOLIAN VOWEL SEPARATOR
is not default_ignorable but it really should be. Reported to Unicode.
Based on suggestion from Konstantin Ritt.
To be used for a variety of purposes. We save up to five characters
in each direction. No public API changes, everything is taken care
of already. All clients need to do is to call hb_buffer_add_utf* with
the full text + segment info (or at least some context) instead of
just passing in the segment.
Various operations (hb_buffer_reset, hb_buffer_set_length,
hb_buffer_add*) automatically reset the relevant contexts.
I don't expect ragel to be creating too much noise in its generated
output, and including this in-tree helps users right now. We can
revisit this later if it proved to be too much trouble.
With FreeSerif, it seems that the 'ccmp' feature does ligature
substituttions. That was then causing syllable match failures. We now
find syllables before any features have been applied.
Test sequence: U+0D9A,U+0DCA,U+200D,U+0DBB,U+0DCF
With this in place, you can remove GDEF/GSUB/GPOS tables from Arabic
fonts and still get per-component marks positioned on
oh-yeah-fallback-formed LAM-ALEF ligatures with marks in between the LAM
and ALEF.
Now *that*'s pretty cool, if a bit anachronistic...
Uniscribe accepts a Halant,ZWJ before matras. Allow that.
BENGALI down from 295 to 291
DEVANAGARI down from 69 to 57
GUJARATI down from 19 to 17
KANNADA down from 871 to 867
MALAYALAM down from 340 to 337
TELUGU down from 20 to 16
Currently at:
BENGALI: 353897 out of 354188 tests passed. 291 failed (0.0821598%)
DEVANAGARI: 707337 out of 707394 tests passed. 57 failed (0.00805774%)
GUJARATI: 366440 out of 366457 tests passed. 17 failed (0.00463902%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951046 out of 951913 tests passed. 867 failed (0.0910798%)
KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047997 out of 1048334 tests passed. 337 failed (0.0321462%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970557 out of 970573 tests passed. 16 failed (0.00164851%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
Now that we insert dotted-circle, tests break more easily when our indic
machine breaks.
In particular, a few Devanagari tests were having sequences like
"C,H,ZWJ,N", and because of the ZWJ the Nukta does NOT get reordered to
before the Halant as the grammar used to expect... Fixup.
Another case is as simple as "C,ZWJ,SM".
Fixes 10 out of 79 failures:
DEVANAGARI: 707325 out of 707394 tests passed. 69 failed (0.00975411%)