[Indic] Further adjust base algorithm for Sinhala

Apparently, if the sequence is C,V,ZWJ,C, the first C is the base, but if it is C,ZWJ,V,C, the second one is.

Note that Uniscribe implements this differently: it breaks the syllable in the C,ZWJ,V,C case, putting the first consonant in one syllable and the rest in the next.

Sinhala failures are down from 208 to 158 (0.0581209%). No changes to Khmer.
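
The rule described above can be sketched in isolation. This is a minimal illustration, not the shaper's code: the `Cat` enum and `find_base` are hypothetical stand-ins for the per-glyph categories and for the loop in `initial_reordering_consonant_syllable`. It assumes that V in the message means the virama, which is what the added test strings suggest, and it omits the `indic_position() == POS_BASE_C` check that the real loop also applies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified glyph categories (not the shaper's real enum).
enum class Cat { Consonant, Virama, ZWJ, Matra };

// Returns the index of the base consonant, scanning from `limit`.
// A consonant with a ZWJ immediately before it ends the search, and the
// base found so far is kept.
std::size_t find_base (const std::vector<Cat> &syllable, std::size_t limit = 0)
{
  std::size_t base = limit;
  for (std::size_t i = limit; i < syllable.size (); i++)
    if (syllable[i] == Cat::Consonant)
    {
      // ZWJ right before this consonant requests a subjoined form.
      if (limit < i && syllable[i - 1] == Cat::ZWJ)
        break;
      base = i;
    }
  return base;
}
```

Under these assumptions, `find_base` returns index 0 for C,V,ZWJ,C (the first consonant) and index 3 for C,ZWJ,V,C (the second consonant), matching the behavior described in the message.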
2012-07-24 00:21:16 -04:00 · 71fd5e80ad
parent 73d71cc527
commit 71fd5e80ad
2 changed files with 10 additions and 4 deletions
--- a/src/hb-ot-shape-complex-indic.cc
+++ b/src/hb-ot-shape-complex-indic.cc
@@ -560,12 +560,15 @@ initial_reordering_consonant_syllable (const hb_ot_map_t *map, hb_buffer_t *buff
 	base = limit;
      /* Find the last base consonant that is not blocked by ZWJ.  If there is
-       * a ZWJ before a bse consonant, that would request a subjoined form. */
+       * a ZWJ right before a base consonant, that would request a subjoined form. */
      for (unsigned int i = limit; i < end; i++)
        if (is_consonant (info[i]) && info[i].indic_position() == POS_BASE_C)
-	  base = i;
-	else if (info[i].indic_category() == OT_ZWJ)
-	  break;
+	{
+	  if (limit < i && info[i - 1].indic_category() == OT_ZWJ)
+	    break;
+          else
+	    base = i;
+	}
      /* Mark all subsequent consonants as below. */
      for (unsigned int i = base + 1; i < end; i++)
--- a/test/shaping/texts/in-tree/shaper-indic/indic/script-sinhala/misc/misc.txt
+++ b/test/shaping/texts/in-tree/shaper-indic/indic/script-sinhala/misc/misc.txt
@@ -32,3 +32,6 @@
 ග්‍යෙ
 ර්‍ය්‍ය
 එ‍ඬේ
+න්ගේ
+න්‍ගේ
+න‍්ගේ