From 9fa052733eb93a3ce1205f63ff8f74cb295cbe99 Mon Sep 17 00:00:00 2001
From: Behdad Esfahbod
Date: Mon, 23 Jul 2012 18:19:17 -0400
Subject: [PATCH] [Indic] Limit syllables to at most five consonants

Seems to be about what Uniscribe does.  Not exactly.  But close enough.
More consonants will start a new cluster.

A few scripts went way down in failures.  In particular:

- Devanagari failures went down from 490 to 56.
- Telugu went down from 113 to 49.

Other scripts went down slightly or didn't change.  New numbers:

BENGALI: 353908 out of 354285 tests passed. 377 failed (0.106412%)
DEVANAGARI: 693572 out of 693628 tests passed. 56 failed (0.00807349%)
GUJARATI: 366485 out of 366506 tests passed. 21 failed (0.00572978%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950730 out of 951913 tests passed. 1183 failed (0.124276%)
KHMER: 298613 out of 299124 tests passed. 511 failed (0.170832%)
MALAYALAM: 1046881 out of 1048416 tests passed. 1535 failed (0.146411%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271333 out of 271847 tests passed. 514 failed (0.189077%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)

Some of the remaining Telugu and Devanagari issues seem to be Uniscribe
eating Anusvara when placed before a non-joiner.  Ouch!
---
 src/hb-ot-shape-complex-indic-machine.rl | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/hb-ot-shape-complex-indic-machine.rl b/src/hb-ot-shape-complex-indic-machine.rl
index 62091e2fb..01a22e852 100644
--- a/src/hb-ot-shape-complex-indic-machine.rl
+++ b/src/hb-ot-shape-complex-indic-machine.rl
@@ -72,9 +72,9 @@ final_halant_group = halant_group | h.ZWNJ;
 halant_or_matra_group = (final_halant_group | matra_group*);
 
 
-consonant_syllable = Repha? (cn.halant_group)* cn A? halant_or_matra_group? syllable_tail;
-vowel_syllable = reph? V.n? (halant_group.cn | ZWJ.cn)* halant_or_matra_group? syllable_tail;
-standalone_cluster = reph? place_holder.n? (halant_group.cn)* halant_or_matra_group? syllable_tail;
+consonant_syllable = Repha? (cn.halant_group){0,4} cn A? halant_or_matra_group? syllable_tail;
+vowel_syllable = reph? V.n? (halant_group.cn | ZWJ.cn){0,4} halant_or_matra_group? syllable_tail;
+standalone_cluster = reph? place_holder.n? (halant_group.cn){0,4} halant_or_matra_group? syllable_tail;
 other = any;
 
 main := |*