92 Commits

Author SHA1 Message Date
Behdad Esfahbod dcf4d95fea [khmer] Split off Khmer shaper from Indic
Towards fixing https://github.com/harfbuzz/harfbuzz/issues/667
The Khmer spec is different enough from other Indic ones to require
its own grammar.

No change in functionality.  Test numbers are:

BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366355 out of 366457 tests passed. 102 failed (0.0278341%)
GURMUKHI: 60729 out of 60747 tests passed. 18 failed (0.0296311%)
KANNADA: 951300 out of 951913 tests passed. 613 failed (0.0643966%)
KHMER: 299071 out of 299124 tests passed. 53 failed (0.0177184%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
2018-01-05 14:54:31 +00:00
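The percentages in the test numbers of commit dcf4d95fea are failed / total x 100, and they look like they were printed with six significant digits. A minimal Python sketch that reproduces one summary line under that assumption; `summary_line` is a hypothetical helper for illustration, not part of the HarfBuzz test tooling:

```python
def summary_line(script: str, total: int, failed: int) -> str:
    """Format one pass/fail line in the style of the commit message."""
    passed = total - failed
    # %g keeps six significant digits and drops trailing zeros,
    # which matches values such as 0.130722 and 0 in the log.
    percent = "%g" % (100.0 * failed / total) if total else "0"
    return (f"{script}: {passed} out of {total} tests passed. "
            f"{failed} failed ({percent}%)")


print(summary_line("BENGALI", 354188, 463))
print(summary_line("TAMIL", 1091754, 0))
```

Feeding in the totals and failure counts from the commit message gives a quick check that the logged percentages are consistent with the counts.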
Behdad Esfahbod 7036f1d22c [ot] Remove shaper name
In ten years we never used them...
2017-10-27 14:42:59 -06:00
Behdad Esfahbod dbdbfe3d7b Use nullptr instead of NULL 2017-10-15 12:11:08 +02:00
Behdad Esfahbod ab8d70ec70 [arabic] Implement Unicode Arabic Mark Ordering Algorithm UTR#53
Fixes https://github.com/behdad/harfbuzz/issues/509
2017-10-04 14:47:10 +02:00
Behdad Esfahbod 57c55ef834 [ot] Improve shaper selection heuristic 2017-10-02 18:21:27 +02:00
Behdad Esfahbod 1535f8c672 Add Unicode 10 scripts 2017-10-02 16:12:18 +02:00
Behdad Esfahbod e888f642db Route Adlam through Arabic shaper
Fixes joined Adlam rendering.

Fixes https://github.com/googlei18n/noto-fonts/issues/828
2017-01-26 14:50:14 -08:00
Behdad Esfahbod e2b878055b Disable OTL processing for Hebrew if GPOS doesn't have Hebrew subtable
New approach to fix this:
69f9fbc420

Previous approach was reverted as it was too broad.  See context:
https://github.com/behdad/harfbuzz/issues/347#issuecomment-267838368

With U+05E9,U+05B8,U+05C1,U+05DC and Arial Unicode, we now (correctly) disable
GDEF and GPOS, so we get results very close to Uniscribe, but slightly different
since our fallback position logic is not exactly the same:

Before:		[gid1166=3+991|gid1142=0+737|gid5798=0+1434]
After:		[gid1166=3+991|gid1142=0@402,-26+0|gid5798=0+1434]
Uniscribe:	[gid1166=3+991|gid1142=0@348,0+0|gid5798=0+1434]
2016-12-22 14:43:23 -06:00
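The Before/After/Uniscribe lines in commit e2b878055b use the glyph-list notation name=cluster, an optional @x_offset,y_offset, then +x_advance. A small Python sketch for reading such lines, assuming only the fields that appear in this commit message; `Glyph` and `parse_glyphs` are hypothetical names and not HarfBuzz API:

```python
import re
from typing import List, NamedTuple


class Glyph(NamedTuple):
    name: str
    cluster: int
    x_offset: int
    y_offset: int
    x_advance: int


# name=cluster, optional @x_offset,y_offset, then +x_advance
_GLYPH_RE = re.compile(
    r"(?P<name>[^=|\[\]]+)=(?P<cluster>\d+)"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
    r"\+(?P<adv>-?\d+)"
)


def parse_glyphs(line: str) -> List[Glyph]:
    """Parse one bracketed glyph list into Glyph records."""
    body = line.strip().strip("[]")
    glyphs = []
    for item in body.split("|"):
        m = _GLYPH_RE.fullmatch(item)
        if m is None:
            raise ValueError(f"unrecognized glyph record: {item!r}")
        # Missing offsets mean the glyph sits at its default position.
        glyphs.append(Glyph(m["name"], int(m["cluster"]),
                            int(m["dx"] or 0), int(m["dy"] or 0),
                            int(m["adv"])))
    return glyphs


before = parse_glyphs("[gid1166=3+991|gid1142=0+737|gid5798=0+1434]")
after = parse_glyphs("[gid1166=3+991|gid1142=0@402,-26+0|gid5798=0+1434]")
print(before[1])
print(after[1])
```

Comparing the second record in the two lists shows the change described in the commit: gid1142 loses its advance and is placed by an offset instead.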
Behdad Esfahbod 30e6e29f0f [indic/use] Move Javanese from Indic shaper to USE
Fixes https://github.com/behdad/harfbuzz/issues/243

With javatext.ttf, the reordering medial Ra gets its advance width
zeroed in the Uniscribe implementation, and the font adds the advance
back.  Our Indic shaper does not do that, but USE does.  So, route
Javanese through USE.  That's what Microsoft does anyway.  Test:

  U+A9A5,U+A9BA

This also seems to fix the following sequence, and variations thereof:

  U+A99F,U+A9C0,U+A9A2,U+A9BF
2016-05-06 15:52:27 +01:00
Behdad Esfahbod 691086f131 Add Unicode 9 beta scripts
These are frozen, so good time to add.
2016-05-06 12:09:53 +01:00
Behdad Esfahbod eaadcbbc53 Remove now-unused mark zeroing BY_UNICODE 2016-02-10 18:29:54 +07:00
Behdad Esfahbod fc06cff40f Remove HB_OT_SHAPE_ZERO_WIDTH_MARKS_DEFAULT
The DEFAULT naming wasn't helpful, so just remove it.
2015-12-17 17:47:35 +00:00
Behdad Esfahbod 136863371c Add new shaper method postprocess_glyphs()
Unused currently.  To be used for Syriac stretch implementation.

https://github.com/behdad/harfbuzz/issues/141
2015-11-05 13:24:15 -08:00
Behdad Esfahbod db1e9cdd41 Retire SEA shaper in favor of USE 2015-07-21 17:46:06 +01:00
Behdad Esfahbod 87dde9c647 [USE] Only use USE shaper if script system is not DFLT
Same logic as Indic and SEA.
2015-07-21 17:31:43 +01:00
Behdad Esfahbod 29832d797f Route misc untested scripts through USE shaper instead of Indic
These were never tested with Indic shaper, and indeed wouldn't work there
because they didn't have their viramas and other config defined.  They are
all also supported by MS through USE, so route them there.
2015-07-21 17:24:18 +01:00
Behdad Esfahbod 52a9577956 [USE] Hook up new scripts to USE shaper
Don't reroute scripts that we were routing to other shapers
before (just yet).
2015-07-21 10:02:04 +01:00
Jonathan Kew f724cc3516 Don't apply Arabic shaping to vertical text. 2015-04-24 12:19:02 -07:00
Roozbeh Pournader 5eb939ddfe Change New Tai Lue shaping engine from SEA to default
This is to reflect the UTC decision to change the encoding model of
New Tai Lue from logical to visual to be similar to Thai, Lao, and
Tai Viet: http://www.unicode.org/L2/L2014/14250.htm#141-C26

The visual encoding is already the current practice of encoding New
Tai Lue on the web anyway:
http://www.unicode.org/L2/L2014/14195-newtailue.txt

Fixes behdad/harfbuzz#66.
2015-01-18 14:39:18 -08:00
Behdad Esfahbod 6f2d9ba52a Add old-Myanmar shaper
Looks like Uniscribe responds to the 'mymr' tag by zeroing marks
GDEF_LATE instead of generic-shaper UNICODE_LATE.  Implement that.

Fixes
Bug 81775 - Incorrect Rendering with harfbuzz-ng myanmar unicode
https://bugs.freedesktop.org/show_bug.cgi?id=81775

Micro-test added based on Padauk.
2014-07-26 19:18:59 -04:00
Behdad Esfahbod 7cfee38276 [unicode7] Route Manichaean and Psalter Pahlavi through Arabic shaper
Still needs update to joining table to fully work.
2014-06-18 12:22:45 -04:00
Behdad Esfahbod f14bb7de63 [ot] Separate out hebrew and tibetan shapers from default
Now default shaper is truly no-op.
2013-12-31 16:49:15 +08:00
Behdad Esfahbod 6300cd7253 [ot] Define HB_OT_SHAPE_ZERO_WIDTH_MARKS_DEFAULT 2013-12-31 16:38:47 +08:00
Behdad Esfahbod 3d6ca0d32e [ot] Simplify normalization_preference again
No shaper has more than one behavior re this, so no need for a callback.
2013-12-31 16:35:37 +08:00
Behdad Esfahbod c98b7183f7 [ot] Add Hangul shaper
Not exhaustively tested, but I think I got the intended logic
right.

The logic can perhaps be simplified.  Maybe we should disable
normalization with this shaper.  Then again, for now focusing on
correctness.
2013-12-31 16:23:48 +08:00
Behdad Esfahbod 71b4c999a5 Revert "Zero marks by GDEF for Tibetan"
This reverts commit d5bd0590ae.

The reasoning behind that logic was flawed and made under
a misunderstanding of the original problem, and caused
regressions as reported by Jonathan Kew in thread titled
"tibetan marks" in Oct 2013.  Apparently I had fixed
the original problem with this commit:

  7e08f1258d

So, revert the faulty commit and everything seems to be in good
shape.
2013-10-28 00:43:27 +01:00
Behdad Esfahbod d5bd0590ae Zero marks by GDEF for Tibetan
See:
http://lists.freedesktop.org/archives/harfbuzz/2013-April/003101.html
2013-10-18 18:17:29 +02:00
Behdad Esfahbod 321df83fb4 Route Buginese through the SEA shaper
Both Indic and SEA seem to do it just fine, but SEA is much
simpler.
2013-10-17 18:16:14 +02:00
Behdad Esfahbod 54e6f6c588 Clean up list of Unicode scripts
Rename HB_SCRIPT_CANADIAN_ABORIGINAL to HB_SCRIPT_CANADIAN_SYLLABICS
and add a macro for the old name.
2013-08-09 14:36:18 -04:00
Behdad Esfahbod 127daf15e0 Arabic mark width-zeroing regression
Mozilla Bug 873902 - Display Arabic text with diacritics is bad
https://bugzilla.mozilla.org/show_bug.cgi?id=873902
2013-05-20 09:11:35 -04:00
Behdad Esfahbod 587e5753e0 Add note re Hangul shaping 2013-04-05 12:38:58 -04:00
Behdad Esfahbod 3a83d33ec0 Add South-East Asian shaper
Handles Tai Tham, Cham, and New Tai Lue for now.
2013-02-12 12:14:10 -05:00
Behdad Esfahbod 5676d5d527 [Indic] Make sure New Tai Lue works! 2013-02-12 10:31:14 -05:00
Behdad Esfahbod 568000274c Adjust mark advance-width zeroing logic for Myanmar
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.

We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:

  - For Indic, no explicit zeroing happens whatsoever, which
    is the same as before,

  - For Myanmar, zero advance width of glyphs marked as marks
    *in GDEF*, and do that *before* applying GPOS.  This seems
    to be what the new Win8 Myanmar shaper does,

  - For everything else, zero advance width of glyphs that are
    from General_Category=Mn Unicode characters, and do so
    before applying GPOS.  This seems to be what Uniscribe does
    for Latin at least.

With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'.  See previous
commit.
2013-02-12 09:44:57 -05:00
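The three mark-zeroing behaviors listed in commit 568000274c can be modeled roughly as follows. This is a Python sketch of the described policy only; the enum, the `Glyph` fields, and the shaper names are illustrative assumptions and do not mirror the HarfBuzz source:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class ZeroMarks(Enum):
    NONE = auto()        # Indic: no explicit zeroing
    BY_GDEF = auto()     # Myanmar: glyphs classed as marks in GDEF
    BY_UNICODE = auto()  # everything else: General_Category=Mn


@dataclass
class Glyph:
    advance: int
    gdef_is_mark: bool     # glyph class taken from the font's GDEF table
    general_category: str  # Unicode General_Category of the source character


def policy_for(shaper: str) -> ZeroMarks:
    """Map a (hypothetical) shaper name to its zeroing policy."""
    if shaper == "indic":
        return ZeroMarks.NONE
    if shaper == "myanmar":
        return ZeroMarks.BY_GDEF
    return ZeroMarks.BY_UNICODE


def zero_mark_advances(glyphs: List[Glyph], policy: ZeroMarks) -> None:
    """Meant to run before GPOS is applied; mutates advances in place."""
    for g in glyphs:
        if policy is ZeroMarks.BY_GDEF and g.gdef_is_mark:
            g.advance = 0
        elif policy is ZeroMarks.BY_UNICODE and g.general_category == "Mn":
            g.advance = 0


run = [Glyph(600, False, "Lo"), Glyph(250, True, "Mc"), Glyph(200, True, "Mn")]
zero_mark_advances(run, policy_for("myanmar"))
print([g.advance for g in run])
```

The middle glyph shows why the distinction matters: a spacing mark (Mc) that the font classes as a mark in GDEF is zeroed under the Myanmar policy but would keep its advance under the Unicode-based one.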
Behdad Esfahbod 98628cac9f Add Win8-style Myanmar shaper
Myanmar failures down from 51% to 0.00204648%!

MYANMAR: 1123860 out of 1123883 tests passed. 23 failed (0.00204648%)
2013-02-11 14:20:08 -05:00
Behdad Esfahbod 16c914c2a6 [Indic] One more try at unbreaking Khmer fonts
See comments and discussion on the list.
2012-11-21 01:04:15 -05:00
Behdad Esfahbod eba312c8d1 Plumbing to get shape plan and font into complex decompose function
So we can handle Sinhala split matras smartly...  Coming soon.
2012-11-16 12:58:38 -08:00
Behdad Esfahbod 851784f837 Improve shaper selection 2012-11-14 17:53:09 -08:00
Behdad Esfahbod 0f80a89de9 Don't route Kharoshthi through the Indic shaper
It's a simple right-to-left script.
2012-11-14 15:05:19 -08:00
Behdad Esfahbod 865745b5b8 Don't do fallback positioning for Indic and Thai shapers 2012-11-14 13:48:26 -08:00
Behdad Esfahbod 981748cb2e [Indic] If Khmer fonts have a 'liga' feature, use generic shaper
Seems to produce more coherent results than trying the Indic shaper on
them.  I'm looking at you, Kh-* fonts...
2012-11-14 13:38:16 -08:00
Behdad Esfahbod 0736915b8e [Indic] Decompose Sinhala split matras the way old HarfBuzz / Pango did
Had to do some refactoring to make this happen...

Under Uniscribe bug compatibility mode, we still split them
Uniscribe-style, but Jonathan and I convinced ourselves that there is no
harm doing this the Unicode way.  This change makes that happen, and
unbreaks free Sinhala fonts.
2012-11-13 12:35:35 -08:00
Behdad Esfahbod 9e92978c8a [Indic] Route "new" Myanmar tag through the Indic shaper
Windows 8 adds a Myanmar shaper using the 'mym2' tag.  Route that
through the Indic shaper.  It's still very broken, but at least this
does NOT break old-style Myanmar shaping using the generic shaper.
2012-11-12 18:36:10 -08:00
Behdad Esfahbod 5ab3855f81 Choose shaper based on chosen OT script tag
For Arabic and Indic shapers, if the font doesn't have a script system
for the script, use default shaper.

Make an exception for Arabic script since we have fallback logic for
that one.
2012-11-12 18:27:42 -08:00
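The selection rule described in commit 5ab3855f81 can be sketched as a small decision function. This is a Python illustration of the rule as worded in the message; the function name, the shaper names, and the boolean `font_has_script_system` parameter are assumptions, not the actual HarfBuzz interface:

```python
def choose_shaper(script: str, preferred: str,
                  font_has_script_system: bool) -> str:
    """Pick a shaper name given the script and what the font supports.

    script: ISO 15924 code of the text, e.g. "Arab" or "Deva".
    preferred: shaper the script would normally be routed to.
    font_has_script_system: whether the font's layout tables contain a
        script system for this script (assumed to be known by the caller).
    """
    # Only the Arabic and Indic shapers are affected by this rule.
    if preferred not in ("arabic", "indic"):
        return preferred
    if font_has_script_system:
        return preferred
    # Exception: Arabic script keeps its shaper because that shaper has
    # fallback logic that works without font support.
    if script == "Arab":
        return "arabic"
    return "default"


print(choose_shaper("Deva", "indic", False))
print(choose_shaper("Arab", "arabic", False))
```

Note that the exception is keyed on the script, not the shaper: other scripts routed through the Arabic shaper still fall back to the default shaper when the font has no script system for them.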
Behdad Esfahbod 9b37b4c580 Make planner available to complex shaper choosing logic 2012-11-12 18:23:38 -08:00
Behdad Esfahbod 43149afbc0 Route MEETEI_MAYEK through the Indic shaper
Since it has a couple of left-"matras".
2012-11-12 13:34:17 -08:00
Behdad Esfahbod 3ba7bc14ea Implement 'Phags-pa shaping
Through the Arabic shaper.  It's similar to Mongolian.
2012-11-01 20:05:04 -07:00
Behdad Esfahbod 9f9f04c222 [OT] Unbreak Thai shaping and fallback Arabic shaping
The merger of normalizer and glyph-mapping broke shapers that
modified text stream.  Unbreak them by adding a new preprocess_text
shaping stage that happens before normalizing/cmap and disallow
setup_mask modification of actual text.
2012-08-11 18:34:13 -04:00
Behdad Esfahbod cd0c6e148f Shuffle buffer variable allocations around
To make room for more allocations, coming.
2012-08-09 21:48:55 -04:00
Behdad Esfahbod a8c6da90f4 [OT] Add per-complex-shaper shape_plan data
Hookup some Indic data to it.  More to come.
2012-08-02 10:46:34 -04:00