Commit Graph

362 Commits

Author SHA1 Message Date
Behdad Esfahbod e9b2a4cfe5 [indic] Support U+1CED 2014-05-23 15:49:10 -04:00
Behdad Esfahbod d19f8e8570 [indic] Support U+A8F2..A8F7,1CE9..1CEC,1CEE..1CF1 2014-05-23 15:47:36 -04:00
Behdad Esfahbod 9f9bd9bf31 [indic] Rename avagraha cluster to symbol cluster
In anticipation of adding more characters to that class of clusters.
2014-05-23 15:35:38 -04:00
Behdad Esfahbod a498565ced [indic] Support U+1CF2,U+1CF3 2014-05-22 19:39:56 -04:00
Behdad Esfahbod ecb98babba [indic] Support U+1CE2..U+1CE8 2014-05-22 19:36:21 -04:00
Behdad Esfahbod 37bf2c9224 Minor 2014-05-22 19:35:17 -04:00
Behdad Esfahbod 131e17ff9a [indic] Support U+1CF5,1CF6 2014-05-22 19:33:10 -04:00
Behdad Esfahbod 72ead0cc72 [indic] Treat U+1CE1 as a tone-mark too
It's spacing, but otherwise the same as the other ones.
2014-05-22 19:12:10 -04:00
Behdad Esfahbod e848bfae7c [indic] Recategorize U+A8E0..A8F1 as OT_VD
Up to two of them come after all OT_A characters.
2014-05-22 18:50:34 -04:00
Behdad Esfahbod c11fc68339 [indic] Support more extended Devanagari tone marks
Also adjust U+0953,0954 handling.
2014-05-22 18:41:49 -04:00
Behdad Esfahbod 26c836e53d [indic] Handle "Cantillation marks for the Samaveda" 2014-05-21 18:35:48 -04:00
Behdad Esfahbod 29531128f2 [indic] Improve reph formation of Sinhala and Telugu
Sinhala and Telugu use "explicit" reph.  That is, the reph is formed by
a Ra,H,ZWJ sequence.  Previously, upon detecting this sequence, we were
checking checking whether the 'rphf' feature applies to the first two
glyphs of the sequence.  This is how the Microsoft fonts are designed.
However, testing with Noto shows that apparently Uniscribe also forms
the reph if the lookup ligates all three glyphs.  So, try both
sequences.

Doesn't affect test results for Sinhala or Telugu.

https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=232
2014-05-15 14:04:02 -06:00
Behdad Esfahbod b082ef373c Typo 2014-04-25 11:48:10 -07:00
Behdad Esfahbod 828e109c7a [indic] Fix-up zero-context matching
commit b5a0f69e47
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Thu Oct 17 18:04:23 2013 +0200

    [indic] Pass zero-context=false to would_substitute for newer scripts

    For scripts without an old/new spec distinction, use zero-context=false.
    This changes behavior in Sinhala / Khmer, but doesn't seem to regress.
    This will be useful and used in Javanese.

The *intention* was to change zero-context from true to false for scripts that
don't have old-vs-new specs.  However, checking the code, looks like we
essentially change zero-context to always be true; ie. we only changed things
for old-spec, and we broke them.  That's what causes this bug:

  https://bugs.freedesktop.org/show_bug.cgi?id=76705

The root of the bug is here:

  /* Use zero-context would_substitute() matching for new-spec of the main
   * Indic scripts, but not for old-spec or scripts with one spec only. */
  bool zero_context = indic_plan->config->has_old_spec || !indic_plan->is_old_spec;

Note that is_old_spec itself is:

  indic_plan->is_old_spec = indic_plan->config->has_old_spec && ((plan->map.chosen_script[0] & 0x000000FF) != '2');

It's easy to show that zero_context is now always true.  What we really meant was:

  bool zero_context = indic_plan->config->has_old_spec && !indic_plan->is_old_spec;

Ie, "&&" instead of "||".  We made this change supposedly to make Javanese
work.  But apparently we got it working regardless!  So I'm going to fix this
to only change the logic for old-spec and not touch other cases.
2014-04-18 16:53:34 -07:00
Behdad Esfahbod 0682ddd05c [indic] Support U+17DD KHMER SIGN ATTHACAN
As requested by Martin Hosken on the list.
2014-04-08 16:03:35 -07:00
Behdad Esfahbod 3d6ca0d32e [ot] Simplify normalization_preference again
No shaper has more than one behavior re this, so no need for a callback.
2013-12-31 16:35:37 +08:00
Behdad Esfahbod 71b4c999a5 Revert "Zero marks by GDEF for Tibetan"
This reverts commit d5bd0590ae.

The reasoning behind that logic was flawed and made under
a misunderstanding of the original problem, and caused
regressions as reported by Jonathan Kew in thread titled
"tibetan marks" in Oct 2013.  Apparently I have had fixed
the original problem with this commit:

  7e08f1258d

So, revert the faulty commit and everything seems to be in good
shape.
2013-10-28 00:43:27 +01:00
Behdad Esfahbod 46a863d91d [indic] Adjust pref reordering logic
For Javanese (pref_len == 1) only reorder if it didn't ligate.  That's
sensible, and what the spec says.  For other Indic (pref_len > 1)
only reorder if ligated.

Doesn't change any test numbers.
2013-10-27 23:28:12 +01:00
Behdad Esfahbod ddce2d8df6 [indic] Improve positioning of post-base bells and whistles
Bug 58714 - Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not provide
same results as Windows8
https://bugs.freedesktop.org/show_bug.cgi?id=58714

Test with U+0CB0,U+200D,U+0CCD,U+0C95,U+0CBF and tunga.ttf.

Improves some scripts.  Improves Bengali too, but numbers
are up because we produce better results than Uniscribe for some
sequences now.

New numbers:
BENGALI: 353724 out of 354188 tests passed. 464 failed (0.131004%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048140 out of 1048334 tests passed. 194 failed (0.0185056%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2013-10-18 18:17:29 +02:00
Behdad Esfahbod d5bd0590ae Zero marks by GDEF for Tibetan
See:
http://lists.freedesktop.org/archives/harfbuzz/2013-April/003101.html
2013-10-18 18:17:29 +02:00
Behdad Esfahbod c16012e901 [indic] Add Javanese support!
Seems to be working just fine!
2013-10-18 18:17:29 +02:00
Behdad Esfahbod 9a49351cc2 [indic] Swith pref logic to use _hb_glyph_info_substituted()
See comments from caveat!  Seems to work fine.

This is useful for Javanese which has an atomically encoded pre-base
reordering Ra which should only be reordered if it was substituted
by the pref feature.
2013-10-18 11:25:24 +02:00
Behdad Esfahbod f175aa33c5 [indic] Fix compiler warnings 2013-10-18 11:25:24 +02:00
Behdad Esfahbod a1f7b28561 [otlayout] Switch over from old is_a_ligature() to IS_LIGATED
Impact should be minimal and positive.
2013-10-18 11:25:24 +02:00
Behdad Esfahbod 3ddf892b53 [otlayout] Renaming 2013-10-18 11:21:15 +02:00
Behdad Esfahbod 8f9ec92dfc [indic] Adjust Javanese base algorithm 2013-10-17 19:52:47 +02:00
Behdad Esfahbod 74f4bbf056 [indic] Towards supporting atomicly-encoded prebase-reorderings 2013-10-17 19:07:53 +02:00
Behdad Esfahbod efed40b975 [indic] Minor refactoring of reph handling 2013-10-17 18:50:11 +02:00
Behdad Esfahbod 684fe59ff8 [indic] Minor refactoring of would_substitute() 2013-10-17 18:30:06 +02:00
Behdad Esfahbod b5a0f69e47 [indic] Pass zero-context=false to would_substitute for newer scripts
For scripts without an old/new spec distinction, use zero-context=false.
This changes behavior in Sinhala / Khmer, but doesn't seem to regress.
This will be useful and used in Javanese.
2013-10-17 18:04:23 +02:00
Behdad Esfahbod c4e71ff36d [indic] Clean up Khmer and Sinhala base finding algorithm 2013-10-17 17:04:47 +02:00
Behdad Esfahbod e10453e6fb [indic] Add BASE_POS_LAST_SINHALA
Previously we planted this into the mode used for Khmer.  There's not
really much in common between the two, so separate again.
2013-10-17 16:49:06 +02:00
Behdad Esfahbod 9ac6b01e0c [indic] Adjust Sinhala cluster merging under uniscribe
Similar to 190c8f2b60 but for
Sinhala.
2013-10-17 16:27:38 +02:00
Behdad Esfahbod 6b2abdcd20 [indic] Improve clusters in presence of reph 2013-10-17 13:15:43 +02:00
Behdad Esfahbod 42d0f55cbc [indic] Apply calt,clig in the same stage as presentation features
Whic means these twp are applied per-syllable now.  Apparently
in some Khmer fonts the clig interacts with presentation features.

Test case: U+1781,U+17D2,U+1789,U+17BB,U+17C6 with Mondulkiri-R.ttf
should produce one big ligature.
2013-10-17 13:06:22 +02:00
Behdad Esfahbod ae9a5834df [indic] Fix pref vs blwf interaction
If a glyph can be both blwf and pref, we were wrongly sorting it
in the post position instead of below position.
2013-10-17 12:24:55 +02:00
Behdad Esfahbod c7dacac02c [indic] Don't apply blwf before base under old-spec mode
Test case: U+09AC,U+09CD,U+09A6 with Lohit-Bengali 2.5.3.
2013-10-17 12:20:46 +02:00
Behdad Esfahbod 3756efaf4e [indic] Misc harmless fixes!
First, we were abusing OT_VD instead of OT_A.  Fix that
but moving OT_A in the grammar where it belongs (which
is different from what the spec says).

Also, only allow medial consonants after all other
consonants.  This doesn't affect any current character.

Finally, fix Halant attachment in presence of medial
consonants.  Again, this currently doesn't affect any
sequence.

I lied.  There's Gurmukhi U+0A75 which is Consonant_Medial.
Uniscribe allows one of those in each of these positions:
before matras, after matras and before syllable modifiers,
and after syllable modifiers!  We currently just allow
unlimited numbers of it, before matras.
2013-10-16 19:06:29 +02:00
Behdad Esfahbod 28d5daec94 [indic] More granular post-base cluster merging! 2013-10-16 12:32:12 +02:00
Behdad Esfahbod 9cb59d460e [indic] Fix cluster merging of left matras
The merge_clusters there was totally broken.
2013-10-16 11:34:07 +02:00
Behdad Esfahbod 190c8f2b60 [indic] Adjust cluster merging under uniscribe mode for Tamil
Apparently Uniscribe Tamil shaper doesn't ship chubby clusters
for Tamil.  Adjust to that.
2013-10-16 11:33:18 +02:00
Behdad Esfahbod f5299eff5c [indic] Simplify reph logic
*Shouldn't* break anything.
2013-10-15 18:21:32 +02:00
Behdad Esfahbod 65a929b1c0 [indic] If Malayalam dot-reph formed a ligature, don't move it
Rachana-0.6 implements dot-reph by ligation, so we shouldn't move it.
Uniscribe doesn't either.  Test case:

  U+0D4E,U+0D1A,U+0D4D,U+0D1A,U+0D4D
2013-10-15 18:21:32 +02:00
Behdad Esfahbod a01cbf6cbe [indic] Harmless reordering of Khmer features! 2013-10-15 18:21:32 +02:00
Behdad Esfahbod eb10233b26 [indic] Apply 'kern' for all scripts except for Khmer in Uniscribe mode
Seems to better match Uniscribe.

Note: NotoSansTelugu-Regular has kern feature, so this fixes most of the
positioning failures there, except for the kern pairs blocked by a
(non-)joiner, in which case we (correctly) kern, but Uniscribe doesn't.
2013-10-15 18:21:32 +02:00
Behdad Esfahbod 30145272a7 [indic] Don't apply presentation features across syllables
More like Uniscribe...  We still allow user-defined features to
work across syllables, but not pres,blws,abs,psts,etc.

This "regressed" Sinhala numbers by 11.  These are cases were
there's Consonant followed by Ra,Halant,ZWJ at the of text.
The Ra,Halant,ZWJ ends up forming reph, which is wrong...
But before we were also ligating that reph with the previous
consonant.  That's even more wrong.  That's also what Uniscribe
does.

Current numbers:

BENGALI: 353732 out of 354188 tests passed. 456 failed (0.128745%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048140 out of 1048334 tests passed. 194 failed (0.0185056%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271655 out of 271847 tests passed. 192 failed (0.070628%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2013-10-15 18:20:59 +02:00
Behdad Esfahbod 3c7b3641cf [indic] Handle Avagraha
It can come either at the end(ish!) of the syllable, or independently.
When independent, it accepts a few bits and pieces.
2013-10-15 13:14:31 +02:00
Behdad Esfahbod 8acbb6be27 [indic] Some scripts like blwf applied to pre-base characters
...while some don't!

Improved Bengali, Devanagari, Gurmukhi, Malayalam.

Updated numbers:

BENGALI: 353732 out of 354188 tests passed. 456 failed (0.128745%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048134 out of 1048334 tests passed. 200 failed (0.0190779%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2013-10-15 12:29:07 +02:00
Behdad Esfahbod bdd8873fd8 Revert "[Indic] don't apply 'calt' by default in Indic shaper"
This reverts commit 952121007c.

In light of discussion on the mailing list...
2013-08-07 17:58:25 -04:00
Jonathan Kew 952121007c [Indic] don't apply 'calt' by default in Indic shaper 2013-08-06 10:36:14 -04:00
Behdad Esfahbod 9245e98742 [Indic] Add Javanese config
We should add for other scripts too, send me the virama codepoint
and script name...
2013-06-26 20:57:58 -04:00
Behdad Esfahbod a8cf7b43fa [Indic] Futher adjust ZWJ handling in Indic-like shapers
After the Ngapi hackfest work, we were assuming that fonts
won't use presentation features to choose specific forms
(eg. conjuncts).  As such, we were using auto-joiner behavior
for such features.  It proved to be troublesome as many fonts
used presentation forms ('pres') for example to form conjuncts,
which need to be disabled when a ZWJ is inserted.

Two examples:

	U+0D2F,U+200D,U+0D4D,U+0D2F with kartika.ttf
	U+0995,U+09CD,U+200D,U+09B7 with vrinda.ttf

What we do now is to never do magic to ZWJ during GSUB's main input
match for Indic-style shapers.  Note that backtrack/lookahead are still
matched liberally, as is GPOS.  This seems to be an acceptable
compromise.

As to the bug that initially started this work, that one needs to
be fixed differently:

  Bug 58714 - Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not
  provide same results as Windows8
  https://bugs.freedesktop.org/show_bug.cgi?id=58714

New numbers:

BENGALI: 353689 out of 354188 tests passed. 499 failed (0.140886%)
DEVANAGARI: 707305 out of 707394 tests passed. 89 failed (0.0125814%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048102 out of 1048334 tests passed. 232 failed (0.0221304%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-03-19 06:22:06 -04:00
Behdad Esfahbod fb7c182bf9 [Indic] Minor 2013-03-06 00:53:24 -05:00
Behdad Esfahbod 8144936d07 [Indic] Work around fonts with broken new-spec tables
See comments, and this thread:

http://lists.freedesktop.org/archives/harfbuzz/2013-March/002990.html

Originally reported here:

https://code.google.com/p/chromium/issues/detail?id=96143

Doesn't change test suite numbers.
2013-03-05 20:08:59 -05:00
Behdad Esfahbod 41732f1fe3 [Indic] Help compiler put indic_features table in .rodata
The overridden "or" operator was preventing the flag expression from
being const, and putting the table in .data instead or .rodata.
2013-02-27 20:40:54 -05:00
Behdad Esfahbod 94789fd601 [Indic] Sort pre-base reordering consonants with post-forms
Before, we were marking them as below-form for initial reordering.
However, there is a rule that says "post consonants should follow
below consonsnts" for base determination purposes.  Malayalam has
port-form YA/VA, and RA is pre-base.  As such, for a sequence like
YA,Virama,YA,Virama,RA, the correct base is at index 0.  But
because the code was seeing RA as a below-base, it was stopping at
the second YA as base, instead of jumping it as a post-base.

By treating prebase-reordering consonants like post-forms, this
is fixed.

MALAYALAM went down from 351 to 265.  Other numbers didn't change:

BENGALI: 353686 out of 354188 tests passed. 502 failed (0.141733%)
DEVANAGARI: 707305 out of 707394 tests passed. 89 failed (0.0125814%)
GUJARATI: 366262 out of 366457 tests passed. 195 failed (0.0532122%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 950680 out of 951913 tests passed. 1233 failed (0.129529%)
KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048069 out of 1048334 tests passed. 265 failed (0.0252782%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271539 out of 271847 tests passed. 308 failed (0.113299%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-02-26 21:22:37 -05:00
Behdad Esfahbod cfc507c543 [Indic-like] Disable automatic joiner handling for basic shaping features
Not for Arabic, but for Indic-like scripts.  ZWJ/ZWNJ have special
meanings in those scripts, so let font lookups take full control.

This undoes the regression caused by automatic-joiners handling
introduced two commits ago.

We only disable automatic joiner handling for the "basic shaping
features" of Indic, Myanmar, and SEAsian shapers.  The "presentation
forms" and other features are still applied with automatic-joiner
handling.

This change also changes the test suite failure statistics, such that
a few scripts show more "failures".  The most affected is Kannada.
However, upon inspection, we believe that in most, if not all, of the
new failures, we are producing results superior to Uniscribe.  Hard to
count those!

Here's an example of what is fixed by the recent joiner-handling
changes:

  https://bugs.freedesktop.org/show_bug.cgi?id=58714

New numbers, for future reference:

BENGALI: 353892 out of 354188 tests passed. 296 failed (0.0835714%)
DEVANAGARI: 707336 out of 707394 tests passed. 58 failed (0.00819911%)
GUJARATI: 366262 out of 366457 tests passed. 195 failed (0.0532122%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 950680 out of 951913 tests passed. 1233 failed (0.129529%)
KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047983 out of 1048334 tests passed. 351 failed (0.0334817%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271539 out of 271847 tests passed. 308 failed (0.113299%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-02-14 13:10:54 -05:00
Behdad Esfahbod ec5448667b Add hb_ot_map_feature_flags_t
Code cleanup.  No (intended) functional change.
2013-02-14 12:53:57 -05:00
Behdad Esfahbod e7ffcfafb1 Clean-up add_bool_feature 2013-02-14 11:58:13 -05:00
Behdad Esfahbod 1f91c39677 Indent 2013-02-13 09:38:40 -05:00
Behdad Esfahbod a0cb9f33ee [Indic] Improve base finding in final_reordering
Fixes 5 Malayalam failures!

MALAYALAM: 1048016 out of 1048334 tests passed. 318 failed (0.0303338%)
2013-02-13 09:26:55 -05:00
Behdad Esfahbod f22b7e7778 [Indic] Track base position when reordering things
Ouch, how did things ever work without this?!  The added test that has a
dot-reph as well as a pre-base reordering Ra perfectly demonstrates the
bug (tested with Nirmala font from Win8 for example).  Testing suggests
that Win8 shaper has the *exact* same bug / behavior that we used to
have.  Odd.
2013-02-13 07:32:46 -05:00
Behdad Esfahbod 85c51ec2e1 [Indic] Fix Eyelash Ra with old Devanagari spec 2013-02-12 18:17:39 -05:00
Behdad Esfahbod 63e48bc33b [Indic] Apply 'blwf' before 'half'
This reverts 167b625d98.  It didn't
matter before, but that's going to change with next commit.
2013-02-12 18:02:07 -05:00
Behdad Esfahbod 70d6565711 [Indic] Apply 'vatu' before 'cjct'
This essentially reverts 1d6846db9e,
but that commit is from way back when.  We should be better
following the spec order now again.
2013-02-12 18:02:07 -05:00
Behdad Esfahbod bab02d339f Rename HB_OT_INDIC_OPTIONS env var to HB_OPTIONS
The Myanmar shaper now respects the uniscribe-bug-compatibility
option too.
2013-02-12 15:26:45 -05:00
Behdad Esfahbod 3a83d33ec0 Add South-East Asian shaper
Handles Tai Tham, Cham, and New Tai Lue for now.
2013-02-12 12:14:10 -05:00
Behdad Esfahbod 568000274c Adjust mark advance-width zeroing logic for Myanmar
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.

We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:

  - For Indic, no explicit zeroing happens whatsoever, which
    is the same as before,

  - For Myanmar, zero advance width of glyphs marked as marks
    *in GDEF*, and do that *before* applying GPOS.  This seems
    to be what the new Win8 Myanmar shaper does,

  - For everything else, zero advance width of glyphs that are
    from General_Category=Mn Unicode characters, and do so
    before applying GPOS.  This seems to be what Uniscribe does
    for Latin at least.

With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'.  See preivous
commit.
2013-02-12 09:44:57 -05:00
Behdad Esfahbod 98628cac9f Add Win8-style Myanmar shaper
Myanmar failures down from 51% to 0.00204648%!

MYANMAR: 1123860 out of 1123883 tests passed. 23 failed (0.00204648%)
2013-02-11 14:20:08 -05:00
Behdad Esfahbod 1df5644958 Minor 2013-02-11 14:18:09 -05:00
Behdad Esfahbod 9621e0ba29 [Indic] Fix bug introduced in 8b217f5ac5
Was breaking reph formation logic when the Ra is the only consonant.
Devanagari regression fixed.  Down to 57 failures again.  Ouch.
2013-02-11 12:59:36 -05:00
Behdad Esfahbod ecd454b3cd [Indic] In old-spec shaping, don't move viramas around if seq ends with one
For example: u0c9a u0ccd u0c9a u0ccd with Lohit.  See:

https://bugs.freedesktop.org/show_bug.cgi?id=59118
2013-01-08 18:09:46 -06:00
Behdad Esfahbod 596740db04 [Indic] Insert dottedcircle after a lone Malayalam dot-reph 2012-12-21 19:41:04 -05:00
Behdad Esfahbod 8b217f5ac5 [Indic] Reorder Malayalam dot-reph to after base
Test sequence is simple: U+0D4E,U+0D15.  The doth-reph should be
reordered to after the Ka.

https://bugzilla.redhat.com/show_bug.cgi?id=799565
2012-12-21 15:49:26 -05:00
Behdad Esfahbod 742c4ee97e Minor 2012-12-21 15:35:03 -05:00
Behdad Esfahbod b71b0bd9ee [Indic] Add link to Sinhala split matra section of the Sinhala spec 2012-12-05 19:20:31 -05:00
Behdad Esfahbod 0beb66e3a6 Fix warnings 2012-12-05 19:14:28 -05:00
Behdad Esfahbod 43b6531500 [Indic] Another try to unbreak Sinhala split matras
Just read the comments...
2012-11-16 13:14:26 -08:00
Behdad Esfahbod eba312c8d1 Plumbing to get shape plan and font into complex decompose function
So we can handle Sinhala split matras smartly...  Coming soon.
2012-11-16 12:58:38 -08:00
Behdad Esfahbod 362a990b22 Rename hb_ot_layout_would_substitute_lookup() and hb_ot_layout_substitute_closure_lookup()
To match upcoming API.
2012-11-15 14:57:31 -08:00
Behdad Esfahbod f41dc2d35b Fix undefined behavior in Indic dottedcircle
Chromium Issue 158998:	Conditional jump in harfbuzz-ng
http://code.google.com/p/chromium/issues/detail?id=158998
2012-11-15 10:36:43 -08:00
Behdad Esfahbod 851784f837 Improve shaper selection 2012-11-14 17:53:09 -08:00
Behdad Esfahbod d469fadce8 [Indic] Exchange abort() for assert() 2012-11-14 15:07:36 -08:00
Behdad Esfahbod 865745b5b8 Don't do fallback positioning for Indic and Thai shapers 2012-11-14 13:48:26 -08:00
Behdad Esfahbod dde5506fd9 [Indic] Don't move virama with left matra
This is important for the Sinhala U+0DDA split matra since it decomposes
to U+0DD9,U+0DCA where U+0DD9 is a left matra and U+0DCA is the virama.
We don't want to move the virama with the left matra.
TEST: U+0D9A,U+0DDA

Note that we were already doing this in the Uniscribe bug compatibility
mode.  We now do it all the time.
2012-11-14 11:37:04 -08:00
Behdad Esfahbod 0736915b8e [Indic] Decompose Sinhala split matras the way old HarfBuzz / Pango did
Had to do some refactoring to make this happen...

Under uniscribe bug compatibility mode, we still plit them
Uniscrie-style, but Jonathan and I convinced ourselves that there is no
harm doing this the Unicode way.  This change makes that happen, and
unbreaks free Sinhala fonts.
2012-11-13 12:35:35 -08:00
Behdad Esfahbod 8173f23f3f [Indic] Add config for Myanmar 2012-11-12 18:37:20 -08:00
Behdad Esfahbod c4be991743 Typo 2012-11-12 14:27:33 -08:00
Behdad Esfahbod 56be677781 [Indic] Port 'pref' logic to look into font tables
...instead of using a hardcoded list of Ra characters.
2012-11-12 14:09:40 -08:00
Behdad Esfahbod f2c0f59043 [Indic] Port reph handling logic to look into font features
...instead of using a hardcoded list of Ra characters.
2012-11-12 14:02:02 -08:00
Behdad Esfahbod 6b389ddc36 [Indic] Don't apply 'liga'
Uniscribe doesn't.  And some fonts abuse this feature to get Indic
shaping working in non-complex applications like Adobe's apps.

No change in numbers:

BENGALI: 353897 out of 354188 tests passed. 291 failed (0.0821598%)
DEVANAGARI: 707337 out of 707394 tests passed. 57 failed (0.00805774%)
GUJARATI: 366440 out of 366457 tests passed. 17 failed (0.00463902%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951046 out of 951913 tests passed. 867 failed (0.0910798%)
KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048011 out of 1048334 tests passed. 323 failed (0.0308108%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970557 out of 970573 tests passed. 16 failed (0.00164851%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-11-12 11:02:56 -08:00
Behdad Esfahbod 88d3c98e30 [Indic] Position pre-base reordering Ra after Chillus in Malayalam
The logic for pre-base reordering follows the left matra logic.
We had an exception for Malayalam/Tamil in the left matra repositioning
which was not reflected in pre-base reordering.

Malayalam failures down from 337 to 323.

BENGALI: 353996 out of 354285 tests passed. 289 failed (0.0815727%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048011 out of 1048334 tests passed. 323 failed (0.0308108%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271726 out of 271847 tests passed. 121 failed (0.0445103%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-10-29 16:46:44 -07:00
Behdad Esfahbod 166b5cf7ec [Indic] Find syllables before any features are applied
With FreeSerif, it seems that the 'ccmp' feature does ligature
substituttions.  That was then causing syllable match failures.  We now
find syllables before any features have been applied.

Test sequence: U+0D9A,U+0DCA,U+200D,U+0DBB,U+0DCF
2012-09-07 14:56:01 -04:00
Behdad Esfahbod 27bd55bd2c [Indic] Tamil does not have half-forms either
The Win7 Tamil font does not realy on this behavior, but the WinXP
version does.  Handle Tamil like Malayalam: Matras always move to
before base.

WinXP Tamil failures went down from 168964 (15.4752%) to 167
(0.0152953%) (two orders of magnitude reduction!).

Included in this is a minor fixup that actually fixed a few tests
with non-Tamil too.  Numbers at:

BENGALI: 353997 out of 354285 tests passed. 288 failed (0.0812905%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271747 out of 271847 tests passed. 100 failed (0.0367854%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-09-05 15:22:02 -04:00
Behdad Esfahbod b85800f9de [Indic] Implement dotted-circle insertion for broken clusters
No panic, we reeally insert dotted circle when it's absolutely broken.

Fixes most of the dotted-circle cases against Uniscribe. (for Devanagari
fixes 80% of them, for Khmer 70%; the rest look like Uniscribe being
really bogus...)

I had to make a decision.  Apparently Uniscribe adds one dotted circle
to each broken character.  I tried that, but that goes wrong easily with
split matras.  So I made it add only one dotted circle to an entire
broken syllable tail.  As in: "if there was a dotted circle here, this
would have formed a correct cluster."  That works better for split
stuff, and I like it more.
2012-08-31 19:18:20 -04:00
Behdad Esfahbod 327d14ef18 [Indic] Start adding dotted-circle instrastructure 2012-08-31 16:49:34 -04:00
Behdad Esfahbod 23b0e9d7dc [Indic] Fix switch
D'oh.  Was working by pure chance :)).
2012-08-26 14:30:38 -04:00
Behdad Esfahbod b5584ee4be [Indic] For old-spec, match non-zero context
Fixes consonant-position with old-spec Malayalam.  Uniscribe seem to be
doing this.  Fixes below-base La (eg. Pa,H,La) with AnjaliNewLipi.ttf.
Doesn't regress new-spec or other scripts.
2012-08-23 16:26:07 -04:00
Behdad Esfahbod d9b204d3d2 [GSUB] Allow non-zero-context matching in would_apply()
To be used in the next patch.
2012-08-23 16:22:28 -04:00
Behdad Esfahbod 6732d62e78 [Indic] Implement pre-base reordering Ra for old-spec Malayalam
Fixes Pa,H,Ra sequence with AnjaliNewLipi.ttf.
2012-08-23 15:32:12 -04:00