Commit Graph

2374 Commits

Author SHA1 Message Date
Behdad Esfahbod bdd080431a [Indic] Reposition Oriya Candrabindu
Oriya failures down from 0.65% to 0.20%.
2012-07-20 16:03:09 -04:00
Behdad Esfahbod 5f0eaaad12 [Indic] Fix base search in final_reordering
Fixes most Malayalam failures.  Down from 1.6% to 0.38% now.  Fixes a
few more in other scripts too.
2012-07-20 15:47:24 -04:00
Behdad Esfahbod 81202bd860 [Indic] Don't attach SM/VD to other characters 2012-07-20 15:14:51 -04:00
Behdad Esfahbod efb4ad7356 Fix compiler warnings
If x is not constant, we cannot ASSERT_STATIC on it.
2012-07-20 14:27:38 -04:00
Behdad Esfahbod f31d97e44e [Indic] Form Telugu Reph out of Ra,Virama,ZWJ
Apparently this was approved in Feb 2012.  No font yet.
2012-07-20 14:13:35 -04:00
Behdad Esfahbod 2e193b240e [Indic] Don't split U+0AC9
Although IndicMatraCategory.txt classifies it as a Top_And_Right matra,
it does not have Unicode decomposition, and Uniscribe does not do
anything special about it either.

Gujarati failures down from 0.672% to 0.0130966%.
2012-07-20 14:02:35 -04:00
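The decomposition claim above can be checked with Python's `unicodedata` module. This is a minimal sketch for illustration only, not how the shaper consults Unicode data; the helper name `has_canonical_decomposition` is hypothetical. Bengali U+09CB is included for contrast, since that two-part matra does decompose canonically (to U+09C7 U+09BE).

```python
import unicodedata

def has_canonical_decomposition(ch: str) -> bool:
    """True if `ch` has a canonical (not compatibility) decomposition."""
    d = unicodedata.decomposition(ch)
    # Compatibility decompositions carry a tag such as "<compat>"; canonical ones do not.
    return bool(d) and not d.startswith("<")

for ch in ("\u0ac9", "\u09cb"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {has_canonical_decomposition(ch)}")
```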
Behdad Esfahbod 30c3d5e9fc [Indic] Simplify Uniscribe cluster emulation
Now that we break syllables on Halant,ZWNJ, this code can be simplified.
2012-07-20 13:56:32 -04:00
Behdad Esfahbod decf6ffca4 [Indic] Minor! 2012-07-20 13:51:31 -04:00
Behdad Esfahbod 9e4f94a72c [Indic] Break syllables at Halant,ZWNJ
That's really what Uniscribe does, and it explains a lot of peculiarities of
Halant,ZWNJ before the base.

Took Telugu failures down from 1% to 0.03%.  Improved Kannada and Malayalam
slightly.  Fixed half of the Bengali failures, and did NOT break anything!
2012-07-20 13:48:03 -04:00
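The rule in this commit, ending a syllable right after a Halant,ZWNJ pair, can be sketched as below. This is a minimal illustration assuming Devanagari code points (Virama U+094D); it models only the Halant,ZWNJ break and is not a full Indic syllable parser. `split_at_halant_zwnj` is a hypothetical name.

```python
HALANT = "\u094d"  # DEVANAGARI SIGN VIRAMA
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

def split_at_halant_zwnj(text: str) -> list[str]:
    """Split `text` so that every Halant,ZWNJ pair ends a syllable.

    Only this one break rule is modelled; other syllable boundaries are ignored.
    """
    syllables, current = [], []
    for i, ch in enumerate(text):
        current.append(ch)
        if ch == ZWNJ and i > 0 and text[i - 1] == HALANT:
            syllables.append("".join(current))
            current = []
    if current:
        syllables.append("".join(current))
    return syllables
```

For example, `split_at_halant_zwnj("\u0915\u094d\u200c\u0937")` yields two pieces, KA,Halant,ZWNJ and SSA, so the ZWNJ no longer sits in the middle of one syllable.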
Behdad Esfahbod 2c372b80f6 [Indic] Better check for applying 'init'
Specifically, don't apply 'init' if previous char is a joiner.

Fixes some more of Bengali.
2012-07-20 13:37:48 -04:00
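A sketch of the check described above. The joiner rule comes from this commit; treating a character as word-initial when the preceding character is not a letter or mark is an assumption made for illustration, and `should_apply_init` is a hypothetical name.

```python
import unicodedata

ZWJ, ZWNJ = "\u200d", "\u200c"

def should_apply_init(text: str, i: int) -> bool:
    """Decide whether the 'init' feature applies to text[i] (simplified)."""
    if i == 0:
        return True
    prev = text[i - 1]
    if prev in (ZWJ, ZWNJ):
        # The rule from this commit: a preceding joiner blocks 'init'.
        return False
    # Assumption: word-initial means the previous char is not a letter (L*) or mark (M*).
    return unicodedata.category(prev)[0] not in ("L", "M")
```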
Behdad Esfahbod 34a7440b7c [GPOS] Don't zero mark advances
Fixes more of Telugu, Kannada, and Oriya.

May break things (outside Indic...), but we cannot immediately think of any
font relying on this.
2012-07-20 12:40:39 -04:00
Behdad Esfahbod 8ed248de77 [Indic] Minor 2012-07-20 11:42:24 -04:00
Behdad Esfahbod d0e68dbd0b [Indic] Implement reph positioning step 5
Not tuned, just copied from step 2.  Fixes another 0.5% of Kannada
failures.  1% to go.
2012-07-20 11:25:41 -04:00
Behdad Esfahbod a9e45c32e4 [Indic] Don't let ZWNJ at the end of syllable affect base search
Fixes a few Devanagari failures, half of the remaining Kannada failures, and a
quarter of the Telugu ones; other scripts are slightly improved or unchanged.
2012-07-20 11:04:15 -04:00
Behdad Esfahbod 20b68e699f [Indic] Apply 'cjct' globally
Fixes 5 Devanagari failures, and no regressions.
2012-07-20 10:47:46 -04:00
Behdad Esfahbod 51e764de44 [Indic] Unbreak old scriptures
Brings down failures with Lohit-Telugu from 57% to 1.40%.
2012-07-20 10:30:24 -04:00
Behdad Esfahbod 900cf3d449 Minor 2012-07-20 10:18:23 -04:00
Behdad Esfahbod 87cd63266e [Indic] Recategorize some Kannada right matras
Kannada failures down from 3.5% to 2.93%.
2012-07-19 21:25:46 -04:00
Behdad Esfahbod 3604d64ced [Indic] Recategorize GURMUKHI ADDAK
It's not in IndicSyllabicCategory.txt.  Fixes most Gurmukhi failures.
Failures down from 7.7% to 0.222%!
2012-07-19 21:13:04 -04:00
Behdad Esfahbod 8932858123 Minor 2012-07-19 21:02:38 -04:00
Behdad Esfahbod 47ef931f13 [buffer] Make sure out_info = info during GPOS 2012-07-19 20:52:44 -04:00
Behdad Esfahbod ae63cf2062 Print line number during return when tracing 2012-07-19 20:45:41 -04:00
Behdad Esfahbod 5249f3aee1 [Indic] Unbreak Khmer
For Khmer, all consonants are subjoining.  No need to look in the font.
We were looking in the wrong order anyway.
2012-07-19 20:30:22 -04:00
Behdad Esfahbod e0475345d5 [Indic] Apply 'akhn' globally
Fixes 1.5% more failures for Telugu, 2% for Kannada.
Breaks one test in Devanagari.
2012-07-19 20:24:14 -04:00
Behdad Esfahbod c87bcddb10 [Indic] Add failing test for Kannada 2012-07-19 20:03:25 -04:00
Behdad Esfahbod fa247ebe52 [Indic] Better position U+0CD5
Fixes another 5% of Kannada failures.
2012-07-19 19:52:19 -04:00
Behdad Esfahbod f055442716 [Indic] Lookup consonant position in the font
Fixes most Oriya failures, and improves others a bit.
2012-07-19 16:20:21 -04:00
Behdad Esfahbod 74d1d88781 [GSUB] Fix would_apply() for LigatureSubst 2012-07-19 16:14:23 -04:00
Behdad Esfahbod 787f7d1e9b [TODO] Minor 2012-07-19 15:29:13 -04:00
Behdad Esfahbod be73a5f936 Add src/test-would-substitute tool 2012-07-19 15:12:18 -04:00
Behdad Esfahbod e72b360ac6 Refactor / finish would_apply() operation
Untested.
2012-07-19 14:44:46 -04:00
Behdad Esfahbod 8c973ebf0f [Indic] Implement per-script matra positioning
Following what the spec says.

Brings down Telugu failures from 40% to 3.75%, and Kannada failures from
44% to 10%.  Does NOT affect other scripts' test results.
2012-07-19 13:25:08 -04:00
Behdad Esfahbod 8bb32458f9 [Indic] More refactoring 2012-07-19 13:04:44 -04:00
Behdad Esfahbod 9ccc6382ba [Indic] Minor refactoring 2012-07-19 12:45:31 -04:00
Behdad Esfahbod f83aaa3133 [Indic] Minor 2012-07-19 12:23:23 -04:00
Behdad Esfahbod be8b9f5f71 [Indic] Start refactoring different matra positions per script 2012-07-19 12:11:12 -04:00
Behdad Esfahbod deeb540a74 [test] Ignore tests with DOTTED CIRCLE in the output 2012-07-19 11:30:48 -04:00
Behdad Esfahbod b01d9b3d90 [Indic] Disallow decomposition of a couple characters
This is a hack for now.  Will be fixed when we do complex-shaper-driven
normalization properly.

The results with or without decomposition are the same, but Uniscribe
does not normalize, so this matches better.
2012-07-19 11:25:49 -04:00
Behdad Esfahbod 422ecd2d3c [Indic] Accept a forced Rakar sequence at the end of syllable
In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra.  If you put that at the
end of a Consonant,Matra syllable, you get a dotted-circle from
Uniscribe.  Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
And people have been encoding that sequence...  So, allow a forced
"ZWJ,Virama,ZWJ,Ra" sequence at the end of syllables.

Fixes 100 or more Sinhala failures.  Only 622 remain (0.23%).
2012-07-18 23:25:58 -04:00
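A minimal sketch of recognising the newly accepted tail, assuming the syllable is already available as a string. `split_forced_rakar` is a hypothetical helper; a real shaper would express this in its syllable grammar rather than by string matching.

```python
ZWJ = "\u200d"
AL_LAKUNA = "\u0dca"  # SINHALA SIGN AL-LAKUNA (the Sinhala virama)
RA = "\u0dbb"         # SINHALA LETTER RAYANNA
FORCED_RAKAR = ZWJ + AL_LAKUNA + ZWJ + RA

def split_forced_rakar(syllable: str) -> tuple[str, bool]:
    """Strip a trailing ZWJ,Al-Lakuna,ZWJ,Ra sequence; report whether it was there."""
    if syllable.endswith(FORCED_RAKAR):
        return syllable[: -len(FORCED_RAKAR)], True
    return syllable, False
```

Splitting the tail off first lets the rest of the syllable (here a Consonant,Matra body) be validated as usual.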
Behdad Esfahbod 6fc1732003 [Indic] Allow joiners on both sides of Halant at the same time
The sequence <ZWJ,Al-Lakuna,ZWJ> is used in Sinhala to explicitly ask
for Rakar.  Fixes two thousand Sinhala tests.  Not many left.
2012-07-18 17:49:19 -04:00
Behdad Esfahbod 10cdc94eee [Indic] In final reordering, find base, even if it disappeared
POS_BASE can disappear if the base ligated backward.  Define the base as the
last glyph whose position is not after POS_BASE.

Fixes a few hundred Sinhala failures with Iskoola Pota.
2012-07-18 17:43:23 -04:00
Behdad Esfahbod 9c4d24a3a6 [Indic] Minor 2012-07-18 17:29:10 -04:00
Behdad Esfahbod 3285e107c9 [Indic] Implement Sinhala "Al Lakuna" Reph behavior
In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.
2012-07-18 17:22:14 -04:00
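A simplified sketch of the difference: in Sinhala a Reph needs the explicit Ra,Al-Lakuna,ZWJ sequence, while for Devanagari, used here only as an assumed point of comparison, a leading Ra,Virama followed by a non-joiner character is treated as a Reph candidate. Real shapers check more conditions (font support among them); `starts_with_reph` and the lookup tables are hypothetical.

```python
ZWJ, ZWNJ = "\u200d", "\u200c"
RA = {"sinh": "\u0dbb", "deva": "\u0930"}
VIRAMA = {"sinh": "\u0dca", "deva": "\u094d"}

def starts_with_reph(syllable: str, script: str) -> bool:
    ra_virama = RA[script] + VIRAMA[script]
    if script == "sinh":
        # Sinhala: Reph is formed only explicitly, by Ra,Al-Lakuna,ZWJ.
        return syllable.startswith(ra_virama + ZWJ)
    # Simplified default: Ra,Virama followed by some non-joiner character.
    return (syllable.startswith(ra_virama) and len(syllable) > 2
            and syllable[2] not in (ZWJ, ZWNJ))
```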
Behdad Esfahbod 91cade7555 [Indic/Unicode] Decompose Sinhala split matras the way Uniscribe likes
Makes no visual difference.

Fixes most Sinhala failures.  Down from 15% to 1.3%!
2012-07-18 16:50:41 -04:00
Behdad Esfahbod d8942dcbb4 Apply Tibetan (global) features.
Fixes all Tibetan failures.  All 180k of them!

Merges Hangul back into the default shaper.
2012-07-18 16:34:10 -04:00
Behdad Esfahbod 552d19b7a1 [Indic] Treat Register Shifters like Nukta
Really this time.

Fixes another 18 Khmer tests.
2012-07-18 16:02:33 -04:00
Behdad Esfahbod e8cd81f76d [Indic] Minor 2012-07-18 16:00:20 -04:00
Behdad Esfahbod 69f26bf39c [Indic] Fix Matra reordering when base is at end of syllable
For example: U+0915,U+200C,U+093F

Fixes last Tamil failure!
2012-07-18 15:47:51 -04:00
Behdad Esfahbod d16ccc4ae7 Leave one extra item at the end of buffer allocation
Just in case, for the times we do out-of-bounds access.

jk
2012-07-18 15:43:55 -04:00
Behdad Esfahbod 075d671f10 [Indic] Fix out-of-bounds array access 2012-07-18 15:41:53 -04:00