[Indic] Further adjust base algorithm for Sinhala

Apparently if there is C,V,ZWJ,C, the first C will be base, but if
it's C,ZWJ,V,C, the second one will be.

Note that Uniscribe implements this differently: in the case of
C,ZWJ,V,C it breaks the syllable, putting the first consonant in one
syllable and the rest in the next.

Sinhala failures down from 208 to 158 (0.0581209%).  No changes to
Khmer.
Behdad Esfahbod 2012-07-24 00:21:16 -04:00
parent 73d71cc527
commit 71fd5e80ad
2 changed files with 10 additions and 4 deletions


@@ -560,12 +560,15 @@ initial_reordering_consonant_syllable (const hb_ot_map_t *map, hb_buffer_t *buff
     base = limit;

     /* Find the last base consonant that is not blocked by ZWJ. If there is
-     * a ZWJ before a bse consonant, that would request a subjoined form. */
+     * a ZWJ right before a base consonant, that would request a subjoined form. */
     for (unsigned int i = limit; i < end; i++)
       if (is_consonant (info[i]) && info[i].indic_position() == POS_BASE_C)
-        base = i;
-      else if (info[i].indic_category() == OT_ZWJ)
-        break;
+      {
+        if (limit < i && info[i - 1].indic_category() == OT_ZWJ)
+          break;
+        else
+          base = i;
+      }

     /* Mark all subsequent consonants as below. */
     for (unsigned int i = base + 1; i < end; i++)


@@ -32,3 +32,6 @@
 ග්‍යෙ
 ර්‍ය්‍ය
 එ‍ඬේ
+න්ගේ
+න්‍ගේ
+න‍්ගේ
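
The three added test strings correspond to the sequences C,V,C, then C,V,ZWJ,C, then C,ZWJ,V,C (followed by a vowel sign). As a rough illustration of the patched base search, here is a minimal standalone sketch. It is not the HarfBuzz code: the `Category` enum and `find_base` function are hypothetical stand-ins for `hb_glyph_info_t` and the loop in `initial_reordering_consonant_syllable`, and the sketch assumes every consonant is a `POS_BASE_C` candidate, so that check is omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified categories (assumption): the real code reads
// these from hb_glyph_info_t via indic_category().
enum Category { C, V, ZWJ };  // consonant, virama, zero-width joiner

// Returns the index of the base consonant, mirroring the patched loop:
// the last consonant wins, but a consonant immediately preceded by ZWJ
// stops the search, since that ZWJ requests a subjoined form.
std::size_t find_base (const std::vector<Category> &syllable,
                       std::size_t limit = 0)
{
  std::size_t base = limit;
  for (std::size_t i = limit; i < syllable.size (); i++)
    if (syllable[i] == C)
    {
      if (limit < i && syllable[i - 1] == ZWJ)
        break;      // ZWJ right before this consonant: keep the earlier base
      base = i;     // otherwise this consonant becomes the new base
    }
  return base;
}
```

Under these assumptions, C,V,C and C,ZWJ,V,C both select the second consonant as base, while C,V,ZWJ,C keeps the first, which matches the behaviour described in the commit message.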