105 Commits

Author SHA1 Message Date
Behdad Esfahbod aa7044de0c Generalize flags types 2015-11-04 16:25:57 -08:00
Behdad Esfahbod 7793aad946 Normalize various spaces to space if font doesn't support
This resurrects the space fallback feature, after I disabled
the compatibility decomposition.  Now I can release HarfBuzz
again without breaking Pango!

It also remembers which space character it was, such that later
on we can approximate the width of this particular space
character.  That part is not implemented yet.

We normalize all GC=Zs chars except for U+1680 OGHAM SPACE MARK,
which is better left alone.
2015-11-04 15:51:41 -08:00
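
The space fallback described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not HarfBuzz's code: `is_space_separator`, `space_fallback_t`, `normalize_space` and the `font_has_glyph` callback are hypothetical names, and the GC=Zs list is the Unicode 8.0 set.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper: true for General_Category=Zs code points (Unicode 8.0).
static inline bool
is_space_separator (uint32_t u)
{
  return u == 0x0020u || u == 0x00A0u || u == 0x1680u ||
         (u >= 0x2000u && u <= 0x200Au) ||
         u == 0x202Fu || u == 0x205Fu || u == 0x3000u;
}

// Hypothetical result type: the code point to shape with, plus the space
// that was replaced (0 if nothing was replaced), so that its width could be
// approximated later on.
struct space_fallback_t
{
  uint32_t codepoint;
  uint32_t original;
};

// Replace an unsupported space with U+0020, leaving U+1680 alone.
// font_has_glyph is an assumed callback standing in for a font query.
static inline space_fallback_t
normalize_space (uint32_t u, const std::function<bool (uint32_t)> &font_has_glyph)
{
  if (is_space_separator (u) && u != 0x0020u && u != 0x1680u &&
      !font_has_glyph (u) && font_has_glyph (0x0020u))
    return {0x0020u, u};
  return {u, 0};
}
```

With a font that only covers U+0020, U+2003 EM SPACE would come back as U+0020 with 0x2003 remembered, while U+1680 would pass through unchanged.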
Behdad Esfahbod 9ac4b9656d Add Unicode space category
Unused so far.
2015-11-04 14:19:25 -08:00
Behdad Esfahbod 8249ec3f86 Make top-byte of unicode_props available to be used differently per-GC 2015-11-04 13:26:17 -08:00
Behdad Esfahbod cc5d3a3388 Towards using top-byte of unicode-props for more things 2015-11-04 13:22:33 -08:00
Behdad Esfahbod 2f38dde5a1 Add _hb_glyph_info_is_unicode_mark()
Unused right now.
2015-11-04 13:17:33 -08:00
Behdad Esfahbod 90d75f93bb Tighten ccc-setting a bit and document it 2015-11-03 12:58:12 -08:00
Behdad Esfahbod ed2024ef93 [perf] Micro-optimize 2015-11-02 18:03:38 -08:00
Behdad Esfahbod 76a5310a83 Remove irrelevant comment
I tried moving the is_default_ignorable() function to an INTERNAL
function.  That made the binary size grow by 5k AND things got a
tad bit slower!
2015-11-02 17:52:45 -08:00
Behdad Esfahbod 9382c471ea Combine unicode_props0/1 into a uint16
Slightly faster.  In prep for more changes.
2015-11-02 17:36:51 -08:00
Behdad Esfahbod 7127718545 [perf] Only call combining_class() for marks
Saves some time.  Also preparing for reusing the ccc byte for other stuff.
2015-11-02 17:27:48 -08:00
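
A minimal sketch of what packing the two property bytes into one 16-bit value might look like, with the combining class stored in the top byte only for marks. The layout, the enum values and every function name here are assumptions made for illustration; the real bit layout is not given in the log.

```cpp
#include <cstdint>

// Hypothetical category values; HarfBuzz uses its own enumeration.
enum general_category_t : uint8_t
{
  GC_OTHER_LETTER    = 0,
  GC_NONSPACING_MARK = 1,
  GC_SPACE_SEPARATOR = 2
};

// Low byte: general category.  Top byte: meaning depends on the category.
static inline uint16_t
pack_unicode_props (general_category_t gc, uint8_t top_byte)
{
  return static_cast<uint16_t> (gc | (top_byte << 8));
}

static inline general_category_t
props_general_category (uint16_t props)
{
  return static_cast<general_category_t> (props & 0xFFu);
}

static inline bool
props_is_mark (uint16_t props)
{
  return props_general_category (props) == GC_NONSPACING_MARK;
}

// The top byte is read as a combining class only for marks; for any other
// category it is free to hold something else, so 0 is reported here.
static inline uint8_t
props_combining_class (uint16_t props)
{
  return props_is_mark (props) ? static_cast<uint8_t> (props >> 8) : 0;
}
```

Keeping both fields in one integer means a single load fetches them together, and non-mark categories can reuse the top byte without a separate field.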
Behdad Esfahbod df6cb84449 Merge branch 'use' 2015-07-26 19:40:55 +02:00
Behdad Esfahbod 0f98fe88f4 [ot] Search globally for 'vert' feature if not found in specified script/lang
Fixes https://github.com/behdad/harfbuzz/issues/63
2015-07-23 11:52:11 +01:00
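
The 'vert' fallback above could be pictured like this: look for the feature among those listed by the selected script/language first, and only for 'vert' fall back to the font's whole feature list. The data model (a vector of tags plus a vector of indices) and the name `find_feature_index` are simplifications assumed for the sketch, not the library's API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the feature with the given tag, or -1 if not found.
// all_feature_tags stands in for the font's global feature list;
// langsys_feature_indices lists the features of the chosen script/language.
static inline int
find_feature_index (const std::vector<std::string> &all_feature_tags,
                    const std::vector<unsigned> &langsys_feature_indices,
                    const std::string &tag)
{
  // Normal path: only features referenced by the script/language count.
  for (unsigned index : langsys_feature_indices)
    if (index < all_feature_tags.size () && all_feature_tags[index] == tag)
      return static_cast<int> (index);

  // Fallback path, applied to 'vert' only: search the global list.
  if (tag == "vert")
    for (size_t i = 0; i < all_feature_tags.size (); i++)
      if (all_feature_tags[i] == tag)
        return static_cast<int> (i);

  return -1;
}
```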
Behdad Esfahbod 4ba796b26e Refactor _hb_glyph_info_is_default_ignorable() 2015-07-22 17:41:31 +01:00
Behdad Esfahbod ac596511a8 Add foreach_syllable
Use it in USE.
2015-07-22 11:54:02 +01:00
Behdad Esfahbod 595936ec25 [USE] Hook up rphf and pref custom processing
Still no reordering.
2015-07-21 14:15:35 +01:00
Behdad Esfahbod 241eac9559 Hide internals of lookup accelerators 2015-02-25 15:43:25 -08:00
Behdad Esfahbod 395b35903e Avoid accessing layout tables at face destruction
"Fixes" https://bugs.freedesktop.org/show_bug.cgi?id=86300

Based on discussion with someone else who had a similar issue, most probably
the user is releasing FT_Face before destructing hb_face_t / hb_font_t.
While that's a client bug, and while we can (and should) use FreeType
refcounting to help avoid that, it happens that we were accessing
the table when we didn't really have to.  Avoid that.
2014-12-28 16:03:26 -08:00
Behdad Esfahbod 8f3eebf7ee Make sure gsubgpos buffer vars are available during fallback_position
Add buffer var allocation asserts to a few key places.
2014-08-02 19:07:49 -04:00
Behdad Esfahbod 7e8c389546 Minor warnings fixes
Some systems insist on -Wmissing-field-initializers.  We have too many,
by design.  Fix a few easy ones.
2014-07-25 11:23:17 -04:00
Behdad Esfahbod 7627100f42 Mark unsigned integer literals with the u suffix
Simplifies hb_in_range() calls as the type can be inferred.
The rest is obsessiveness, I admit.
2014-07-11 16:22:13 -04:00
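
Why the `u` suffix helps type inference can be seen with a simplified stand-in for a range-check template. `in_range_sketch` is a hypothetical name and a deliberately naive body; it is not the library's `hb_in_range()` implementation.

```cpp
// Simplified stand-in for a templated range check: all three arguments
// share a single deduced type T.
template <typename T>
static inline bool
in_range_sketch (T u, T lo, T hi)
{
  return lo <= u && u <= hi;
}

// With an unsigned code point, unsigned literals let T be deduced as
// unsigned int from all three arguments.  Plain literals such as 0x0600
// are int, so deduction would see conflicting types and the call would
// need an explicit in_range_sketch<unsigned int> (...).
static inline bool
is_arabic_block (unsigned int u)
{
  return in_range_sketch (u, 0x0600u, 0x06FFu);
}
```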
Behdad Esfahbod 04dc52fa15 [indic] Recover OT_H undergone ligation and multiplication
Sometimes font designers form half/pref/etc consonant forms
unconditionally and then undo that conditionally.  Try to
recover the OT_H classification in those cases.

No test number changes expected.
2014-06-09 14:20:06 -04:00
Behdad Esfahbod 832a6f99b3 [indic] Don't reorder reph/pref if ligature was expanded
Normally if you want to, say, conditionally prevent a 'pref', you
would use blocking contextual matching.  Some designers instead
form the 'pref' form, then undo it in context.  To detect that
we now also remember glyphs that went through MultipleSubst.

In the only place that this is used, Uniscribe seems to only care
about the "last" transformation between Ligature and Multiple
substitutions.  I.e., if you ligate, expand, and ligate again, it
moves the pref, but if you ligate and expand it doesn't.  That's
why we clear the MULTIPLIED bit when setting LIGATED.

Micro-test added.  Test: U+0D2F,0D4D,0D30 with font from:

[1]
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=186#c29
2014-06-05 20:36:01 -04:00
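
The "last transformation wins" bookkeeping from the commit above might be sketched as below. The bit values and the names `shaped_glyph_t`, `mark_ligated`, `mark_multiplied` and `ligated_and_multiplied` are hypothetical; only the rule that setting LIGATED clears MULTIPLIED comes from the commit message.

```cpp
#include <cstdint>

// Hypothetical bit assignments for the per-glyph flags.
enum : uint8_t
{
  PROPS_SUBSTITUTED = 0x10u,
  PROPS_LIGATED     = 0x20u,
  PROPS_MULTIPLIED  = 0x40u
};

struct shaped_glyph_t
{
  uint8_t props = 0;
};

// Forming a ligature sets LIGATED and clears MULTIPLIED, so a later
// ligation overrides an earlier expansion.
static inline void
mark_ligated (shaped_glyph_t &g)
{
  g.props = static_cast<uint8_t> ((g.props | PROPS_SUBSTITUTED | PROPS_LIGATED)
                                  & ~PROPS_MULTIPLIED);
}

// Expanding through MultipleSubst sets MULTIPLIED and keeps LIGATED.
static inline void
mark_multiplied (shaped_glyph_t &g)
{
  g.props = static_cast<uint8_t> (g.props | PROPS_SUBSTITUTED | PROPS_MULTIPLIED);
}

// True when a ligature was formed and then undone by an expansion,
// which is the case where reph/pref reordering is skipped.
static inline bool
ligated_and_multiplied (const shaped_glyph_t &g)
{
  return (g.props & PROPS_LIGATED) && (g.props & PROPS_MULTIPLIED);
}
```

Ligate then expand leaves both bits set; ligating once more clears MULTIPLIED again, matching the ligate, expand, ligate case in the message.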
Behdad Esfahbod 6faff8e413 Add static storage classifier to inline functions
Before we were just relying on the compiler inlining them and not
leaving a trace in our public API.  Try to fix.  Hopefully not
breaking anyone's build.
2014-04-28 14:30:44 -07:00
Behdad Esfahbod 2a8c49ade0 Remove unnecessary includes 2013-12-11 20:24:20 -05:00
Behdad Esfahbod 0f3fe37fcc Comment 2013-10-18 19:14:22 +02:00
Behdad Esfahbod 8570273414 [otlayout] Add _hb_glyph_info_substituted()
Currently unused.
2013-10-18 11:25:24 +02:00
Behdad Esfahbod a1f7b28561 [otlayout] Switch over from old is_a_ligature() to IS_LIGATED
Impact should be minimal and positive.
2013-10-18 11:25:24 +02:00
Behdad Esfahbod 09675a8115 [otlayout] Add HB_OT_LAYOUT_GLYPH_PROPS_LIGATED
Currently unused.
2013-10-18 11:25:24 +02:00
Behdad Esfahbod 05ad6b50ac [otlayout] Add HB_OT_LAYOUT_GLYPH_PROPS_SUBSTITUTED
Currently unused.
2013-10-18 11:21:15 +02:00
Behdad Esfahbod 101303dbf7 [otlayout] More shuffling around 2013-10-18 11:21:15 +02:00
Behdad Esfahbod 91689de260 [otlayout] Add _hb_glyph_info_set_glyph_props()
No functional change.
2013-10-18 11:21:15 +02:00
Behdad Esfahbod 3ddf892b53 [otlayout] Renaming 2013-10-18 11:21:15 +02:00
Behdad Esfahbod 2e96d2c6ee [otlayout] More shuffling 2013-10-18 11:21:15 +02:00
Behdad Esfahbod 469524692b [otlayout] Code shuffling 2013-10-18 11:21:15 +02:00
Behdad Esfahbod 11fb16cb84 Use unsigned enums for mask types 2013-10-18 11:21:11 +02:00
Behdad Esfahbod 03058c3d1e [otlayout] Remove two unused HB_OT_LAYOUT_GLYPH_PROPS_* values 2013-10-17 20:55:34 +02:00
Behdad Esfahbod 941b699204 [otlayout] Remove unused HB_OT_LAYOUT_GLYPH_PROPS_UNCLASSIFIED 2013-10-17 20:47:33 +02:00
Behdad Esfahbod c52ddab72e [arabic] Make ZWJ prevent ligatures instead of facilitating it
Unicode 6.2.0 Section 16.2 / Figure 16.3 says:

"For backward compatibility, between Arabic characters a ZWJ acts just
like the sequence <ZWJ, ZWNJ, ZWJ>, preventing a ligature from forming
instead of requesting the use of a ligature that would not normally be
used. As a result, there is no plain text mechanism for requesting the
use of a ligature in Arabic text."

As such, we flip internal zwj to zwnj flags for GSUB matching, which
means it will block ligation in all features, unless the font
explicitly matches U+200D glyph.  This doesn't affect joining behavior.
2013-10-16 13:42:38 +02:00
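
One way to picture the flag flip mentioned above, assuming each joiner glyph carries exactly one of two internal bits. The bit values and the name `flip_joiner_flags` are made up for this sketch.

```cpp
#include <cstdint>

// Hypothetical internal flag bits for the two joiner characters.
enum : uint8_t
{
  JOINER_ZWNJ = 0x1u,
  JOINER_ZWJ  = 0x2u
};

// Swap the ZWJ and ZWNJ bits on a glyph that carries one of them, so that
// during GSUB matching a ZWJ is treated the way a ZWNJ would be.
// Glyphs with neither bit set are returned unchanged.
static inline uint8_t
flip_joiner_flags (uint8_t flags)
{
  if (flags & (JOINER_ZWNJ | JOINER_ZWJ))
    flags ^= JOINER_ZWNJ | JOINER_ZWJ;
  return flags;
}
```

Because only the internal flags change, the character itself stays U+200D, which is consistent with the note that joining behavior is unaffected.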
Behdad Esfahbod 1a31f9f820 [otlayout] Minor 2013-10-16 13:42:18 +02:00
Behdad Esfahbod 7e08f1258d Don't zero advance of mark-non-mark ligatures
If there's a mark ligating forward with non-mark, they were
inheriting the GC of the mark and later get advance-zeroed.
Don't do that if there's any non-mark glyph in the ligature.

Sample test: U+1780,U+17D2,U+179F with Kh-Metal-Chrieng.ttf

Also:
Bug 58922 - Issue with mark advance zeroing in generic shaper
2013-05-27 14:50:00 -04:00
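
The rule in the commit above amounts to: a ligature counts as a mark for advance zeroing only if every component is a mark. A minimal sketch, with `ligature_is_mark` as a hypothetical name and the components reduced to a list of booleans:

```cpp
#include <algorithm>
#include <vector>

// component_is_mark[i] says whether the i-th glyph that went into the
// ligature was a mark.  The result says whether the ligature should be
// treated as a mark (and so be eligible for advance zeroing).
static inline bool
ligature_is_mark (const std::vector<bool> &component_is_mark)
{
  return !component_is_mark.empty () &&
         std::all_of (component_is_mark.begin (), component_is_mark.end (),
                      [] (bool is_mark) { return is_mark; });
}
```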
Behdad Esfahbod bac1dd6a0f [OTLayout] Refactor a bit more 2013-05-04 16:04:04 -04:00
Behdad Esfahbod 45fd9424c7 [OTLayout] Add hb_ot_layout_lookup_accelerator_t 2013-05-04 16:04:03 -04:00
Behdad Esfahbod a8cf7b43fa [Indic] Further adjust ZWJ handling in Indic-like shapers
After the Ngapi hackfest work, we were assuming that fonts
won't use presentation features to choose specific forms
(eg. conjuncts).  As such, we were using auto-joiner behavior
for such features.  It proved to be troublesome as many fonts
used presentation forms ('pres') for example to form conjuncts,
which need to be disabled when a ZWJ is inserted.

Two examples:

	U+0D2F,U+200D,U+0D4D,U+0D2F with kartika.ttf
	U+0995,U+09CD,U+200D,U+09B7 with vrinda.ttf

What we do now is to never do magic to ZWJ during GSUB's main input
match for Indic-style shapers.  Note that backtrack/lookahead are still
matched liberally, as is GPOS.  This seems to be an acceptable
compromise.

As to the bug that initially started this work, that one needs to
be fixed differently:

  Bug 58714 - Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not
  provide same results as Windows8
  https://bugs.freedesktop.org/show_bug.cgi?id=58714

New numbers:

BENGALI: 353689 out of 354188 tests passed. 499 failed (0.140886%)
DEVANAGARI: 707305 out of 707394 tests passed. 89 failed (0.0125814%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048102 out of 1048334 tests passed. 232 failed (0.0221304%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-03-19 06:22:06 -04:00
Behdad Esfahbod 57542d7f41 Minor 2013-02-21 15:55:49 -05:00
Behdad Esfahbod cfc507c543 [Indic-like] Disable automatic joiner handling for basic shaping features
Not for Arabic, but for Indic-like scripts.  ZWJ/ZWNJ have special
meanings in those scripts, so let font lookups take full control.

This undoes the regression caused by automatic-joiners handling
introduced two commits ago.

We only disable automatic joiner handling for the "basic shaping
features" of Indic, Myanmar, and SEAsian shapers.  The "presentation
forms" and other features are still applied with automatic-joiner
handling.

This change also changes the test suite failure statistics, such that
a few scripts show more "failures".  The most affected is Kannada.
However, upon inspection, we believe that in most, if not all, of the
new failures, we are producing results superior to Uniscribe.  Hard to
count those!

Here's an example of what is fixed by the recent joiner-handling
changes:

  https://bugs.freedesktop.org/show_bug.cgi?id=58714

New numbers, for future reference:

BENGALI: 353892 out of 354188 tests passed. 296 failed (0.0835714%)
DEVANAGARI: 707336 out of 707394 tests passed. 58 failed (0.00819911%)
GUJARATI: 366262 out of 366457 tests passed. 195 failed (0.0532122%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 950680 out of 951913 tests passed. 1233 failed (0.129529%)
KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047983 out of 1048334 tests passed. 351 failed (0.0334817%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271539 out of 271847 tests passed. 308 failed (0.113299%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2013-02-14 13:10:54 -05:00
Behdad Esfahbod 0b45479198 [OTLayout] Add fine-grained control over ZWJ matching
Not used yet.  Next commit...
2013-02-14 13:02:13 -05:00
Behdad Esfahbod 607feb7cff [OTLayout] Ignore default-ignorables when matching GSUB/GPOS
When matching lookups, be smart about default-ignorable characters.
In particular:

Do nothing specific about ZWNJ, but for the other default-ignorables:

If the lookup in question uses the ignorable character in a sequence,
then match it as we used to do.  However, if the sequence match will
fail because the default-ignorable blocked it, try skipping the
ignorable character and continue.

The most immediate consequence is that if Lam-Alef forms a ligature,
then Lam-ZWJ-Alef will do so too.  Finally!

One exception: when matching for GPOS, or for backtrack/lookahead of
GSUB, we ignore ZWNJ too.  That's the right thing to do.

It is certainly possible to build fonts for which this feature results
in undesirable glyphs, but it's hard to think of a real-world case
where that would happen.

This *does* break Indic shaping right now, since Indic Unicode has
specific rules for what ZWJ/ZWNJ mean, and skipping ZWJ is breaking
those rules.  That will be fixed in upcoming commits.
2013-02-14 12:57:50 -05:00
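
The matching rule from the commit above can be sketched on plain code points: match an ignorable if the pattern asks for it, otherwise skip it when it would block the match, and never skip ZWNJ in the main input. This is a simplified model under assumptions; `is_skippable_ignorable` covers only a tiny subset of default-ignorables, and `match_sequence` is a hypothetical function working on characters rather than glyphs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Tiny illustrative subset of default-ignorables.  ZWNJ (U+200C) is
// deliberately left out so that it keeps blocking matches.
static inline bool
is_skippable_ignorable (uint32_t u)
{
  return u == 0x200Du /* ZWJ */ || u == 0x00ADu /* SOFT HYPHEN */;
}

// Try to match pattern against text starting at start.  On success,
// *end is set to the index just past the last matched item.
static inline bool
match_sequence (const std::vector<uint32_t> &text, size_t start,
                const std::vector<uint32_t> &pattern, size_t *end)
{
  size_t i = start;
  for (uint32_t expected : pattern)
  {
    // Skip an ignorable only when it is not what the pattern expects here.
    while (i < text.size () && text[i] != expected &&
           is_skippable_ignorable (text[i]))
      i++;
    if (i >= text.size () || text[i] != expected)
      return false;
    i++;
  }
  *end = i;
  return true;
}
```

With this model a Lam, Alef pattern matches Lam, ZWJ, Alef, still fails on Lam, ZWNJ, Alef, and a pattern that names ZWJ explicitly matches as before.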
Behdad Esfahbod 568000274c Adjust mark advance-width zeroing logic for Myanmar
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.

We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:

  - For Indic, no explicit zeroing happens whatsoever, which
    is the same as before,

  - For Myanmar, zero advance width of glyphs marked as marks
    *in GDEF*, and do that *before* applying GPOS.  This seems
    to be what the new Win8 Myanmar shaper does,

  - For everything else, zero advance width of glyphs that are
    from General_Category=Mn Unicode characters, and do so
    before applying GPOS.  This seems to be what Uniscribe does
    for Latin at least.

With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'.  See previous
commit.
2013-02-12 09:44:57 -05:00
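
The three behaviors listed in the commit above could be modeled as a mode that is chosen per shaper and applied before GPOS. The enum, struct and function names are hypothetical, and the per-glyph booleans stand in for a GDEF class lookup and a Unicode General_Category lookup.

```cpp
#include <vector>

// Hypothetical modes mirroring the three behaviors in the message.
enum zero_marks_mode_t
{
  ZERO_MARKS_NONE,        // Indic: no explicit zeroing
  ZERO_MARKS_BY_GDEF,     // Myanmar: glyphs classified as marks in GDEF
  ZERO_MARKS_BY_UNICODE   // everything else: General_Category=Mn
};

struct positioned_glyph_t
{
  int  x_advance;
  bool gdef_mark;    // glyph is a mark according to GDEF
  bool unicode_mn;   // glyph comes from a General_Category=Mn character
};

// Intended to run before GPOS is applied, per the commit message.
static inline void
zero_mark_advances (std::vector<positioned_glyph_t> &glyphs,
                    zero_marks_mode_t mode)
{
  for (positioned_glyph_t &g : glyphs)
  {
    bool is_mark = (mode == ZERO_MARKS_BY_GDEF && g.gdef_mark) ||
                   (mode == ZERO_MARKS_BY_UNICODE && g.unicode_mn);
    if (is_mark)
      g.x_advance = 0;
  }
}
```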
Behdad Esfahbod 5a08ecf920 Implement hb_ot_layout_get_glyph_class() 2012-11-16 13:34:29 -08:00