Commit Graph

2900 Commits

Author SHA1 Message Date
Behdad Esfahbod 407fc12466 [OTLayout] Remove bogus caching of glyph property 2013-02-13 11:13:06 -05:00
Behdad Esfahbod 6b1e3502e2 Remember ZWNJ
To be used in upcoming changes.
2013-02-13 11:02:54 -05:00
Behdad Esfahbod 1f91c39677 Indent 2013-02-13 09:38:40 -05:00
Behdad Esfahbod a0cb9f33ee [Indic] Improve base finding in final_reordering
Fixes 5 Malayalam failures!

MALAYALAM: 1048016 out of 1048334 tests passed. 318 failed (0.0303338%)
2013-02-13 09:26:55 -05:00
Behdad Esfahbod f22b7e7778 [Indic] Track base position when reordering things
Ouch, how did things ever work without this?!  The added test that has a
dot-reph as well as a pre-base reordering Ra perfectly demonstrates the
bug (tested with Nirmala font from Win8 for example).  Testing suggests
that Win8 shaper has the *exact* same bug / behavior that we used to
have.  Odd.
2013-02-13 07:32:46 -05:00
Behdad Esfahbod bc11de144c [SEA] Don't zero any mark advances
Keep the logic simple, easier to explain to font developers.
2013-02-13 05:59:06 -05:00
Behdad Esfahbod 0291a65286 Further adjust mark advance zeroing
This is a followup to 568000274c.
Looks like in the Latin shaper, Uniscribe zeroes all Unicode NSM
advances *after* GPOS, not before.  Match that.

Can be tested using DejaVu Sans Mono, since that font has GPOS
rules to zero the mark advances on its own.
2013-02-13 05:57:24 -05:00
Behdad Esfahbod 85c51ec2e1 [Indic] Fix Eyelash Ra with old Devanagari spec 2013-02-12 18:17:39 -05:00
Behdad Esfahbod 63e48bc33b [Indic] Apply 'blwf' before 'half'
This reverts 167b625d98.  It didn't
matter before, but that's going to change with next commit.
2013-02-12 18:02:07 -05:00
Behdad Esfahbod 70d6565711 [Indic] Apply 'vatu' before 'cjct'
This essentially reverts 1d6846db9e,
but that commit is from way back when.  We should be better
following the spec order now again.
2013-02-12 18:02:07 -05:00
Behdad Esfahbod f9b660534c [Myanmar] Use master Indic table for syllable data 2013-02-12 16:13:56 -05:00
Behdad Esfahbod a6c1e040e5 Improve check for Windows platforms
Instead of checking for compiler, check for platform.
2013-02-12 15:31:58 -05:00
Behdad Esfahbod 9e1f80ab3e [SEA] Treat Consonant_Final like Consonant_Medial 2013-02-12 15:28:21 -05:00
Behdad Esfahbod bab02d339f Rename HB_OT_INDIC_OPTIONS env var to HB_OPTIONS
The Myanmar shaper now respects the uniscribe-bug-compatibility
option too.
2013-02-12 15:26:45 -05:00
Behdad Esfahbod 3a83d33ec0 Add South-East Asian shaper
Handles Tai Tham, Cham, and New Tai Lue for now.
2013-02-12 12:14:10 -05:00
Behdad Esfahbod 5676d5d527 [Indic] Make sure New Tai Lue works! 2013-02-12 10:31:14 -05:00
Behdad Esfahbod 568000274c Adjust mark advance-width zeroing logic for Myanmar
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.

We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:

  - For Indic, no explicit zeroing happens whatsoever, which
    is the same as before,

  - For Myanmar, zero advance width of glyphs marked as marks
    *in GDEF*, and do that *before* applying GPOS.  This seems
    to be what the new Win8 Myanmar shaper does,

  - For everything else, zero advance width of glyphs that are
    from General_Category=Mn Unicode characters, and do so
    before applying GPOS.  This seems to be what Uniscribe does
    for Latin at least.

With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'.  See previous
commit.
2013-02-12 09:44:57 -05:00
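
The three behaviors listed in commit 568000274c can be sketched as a small policy table. This is only an illustrative model, not HarfBuzz code: the `ZeroMarks` enum, the shaper names, and the glyph dictionaries are all hypothetical, and later commits in this log (0291a65286, bc11de144c) adjust the rules further.

```python
from enum import Enum


class ZeroMarks(Enum):
    NONE = "none"            # Indic: no explicit zeroing
    BY_GDEF = "gdef"         # Myanmar: glyphs classed as marks in GDEF
    BY_UNICODE = "unicode"   # everything else: General_Category=Mn


def zeroing_policy(shaper: str) -> ZeroMarks:
    """Pick the policy for a shaper name (names here are hypothetical)."""
    if shaper == "indic":
        return ZeroMarks.NONE
    if shaper == "myanmar":
        return ZeroMarks.BY_GDEF
    return ZeroMarks.BY_UNICODE


def zero_mark_advances(glyphs, policy):
    """Return a copy of glyphs with mark advances zeroed per policy.

    Each glyph is a dict with 'advance', 'gdef_mark' (bool) and
    'gc' (Unicode General_Category).  In this model the function is
    meant to run before any GPOS positioning is applied.
    """
    out = []
    for g in glyphs:
        g = dict(g)  # do not mutate the caller's data
        is_mark = (
            (policy is ZeroMarks.BY_GDEF and g["gdef_mark"])
            or (policy is ZeroMarks.BY_UNICODE and g["gc"] == "Mn")
        )
        if is_mark:
            g["advance"] = 0
        out.append(g)
    return out
```

Note that a glyph marked in GDEF whose character is a spacing mark (Mc) is zeroed only under the Myanmar policy in this sketch, which is the distinction the commit message draws.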
Behdad Esfahbod 99749ca8e0 [Myanmar] Add note re Uniscribe NOT applying 'mark' 2013-02-12 09:44:35 -05:00
Behdad Esfahbod b842780138 Minor 2013-02-11 17:02:17 -05:00
Behdad Esfahbod 419c933ed1 [Myanmar] Fix handling of Punctuation and Symbol types
Testing with "clusters" now on par with testing without them.  15
failures both.
2013-02-11 16:16:16 -05:00
Behdad Esfahbod 0572c1410a [Myanmar] Fixup handling of joiners and GB characters 2013-02-11 16:16:07 -05:00
Behdad Esfahbod 1c8654ead4 [Myanmar] Prevent reordering between Asat and Dot below
Implemented as a hack for now.  Myanmar failures down from 23 to 15.

MYANMAR: 1123868 out of 1123883 tests passed. 15 failed (0.00133466%)

The remaining 15 cases are all where the syllable is wrong according to
the OpenType spec.  We insert dottedcircle.  Uniscribe fails to do that,
but it also fails to reorder the prebase-reordering medial-Ra.  So it
gets it wrong.
2013-02-11 14:28:59 -05:00
Behdad Esfahbod 98628cac9f Add Win8-style Myanmar shaper
Myanmar failures down from 51% to 0.00204648%!

MYANMAR: 1123860 out of 1123883 tests passed. 23 failed (0.00204648%)
2013-02-11 14:20:08 -05:00
Behdad Esfahbod 1df5644958 Minor 2013-02-11 14:18:09 -05:00
Behdad Esfahbod 54f7b4d9ec [OTLayout] Respect lookup-flags skipping over non-mark glyphs
Before, when matching ligatures, we never skipped over base / liga
glyphs even if that was what the LookupFlags asked for.

Fixed now.  We carefully reviewed all instances of this, and tested with
Amiri as well as some Indic scripts, and are confident that this should
NOT break anyone's fonts.  It's also how Uniscribe does it, from what
we can tell.
2013-02-11 13:27:17 -05:00
Behdad Esfahbod 9082efc4aa [OTLayout] s/mark_skipping/skipping/
In anticipation of upcoming changes.
2013-02-11 13:14:56 -05:00
Behdad Esfahbod 9621e0ba29 [Indic] Fix bug introduced in 8b217f5ac5
Was breaking reph formation logic when the Ra is the only consonant.
Devanagari regression fixed.  Down to 57 failures again.  Ouch.
2013-02-11 12:59:36 -05:00
Behdad Esfahbod 6e74c64211 Improve normalization heuristic
Before, for most scripts, we were not trying to recompose two characters
if the second one had ccc=0.  That fails for Myanmar where U+1026
decomposes to U+1025,U+102E, both of which have ccc=0.  However, we do
want to try to recompose those.  We now check whether the second is a
mark, using general category instead.

At the same time, remove optimization that was conflicting with this.

[Let the Ngapi hackfest begin!]
2013-02-11 12:59:00 -05:00
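
The U+1026 case from commit 6e74c64211 can be checked with Python's standard `unicodedata` module. The two predicates below are hypothetical stand-ins for the old and new heuristic as the message describes them; they are not the HarfBuzz implementation.

```python
import unicodedata


def may_recompose_old(second: str) -> bool:
    """Old heuristic as described: skip recomposition when ccc == 0."""
    return unicodedata.combining(second) != 0


def may_recompose_new(second: str) -> bool:
    """New heuristic as described: try when the second char is a mark."""
    return unicodedata.category(second).startswith("M")


# U+1026 decomposes canonically to U+1025, U+102E.
first, second = "\u1025", "\u102e"

# U+102E has ccc=0, so the old check would never try to recompose,
# while the general-category check recognises it as a mark.
old_tries = may_recompose_old(second)
new_tries = may_recompose_new(second)
recomposed = unicodedata.normalize("NFC", first + second)
```

Here `old_tries` is false and `new_tries` is true, and NFC does recompose the pair to U+1026, which is why the ccc-based shortcut lost that composition.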
Behdad Esfahbod adff377815 Revert "[Indic] Import ragel-generated Indic machine in git"
This reverts commit fab7a71f11.

Conflicts:
	src/hb-ot-shape-complex-indic-machine.hh

Keeping that generated file in-tree causes problems with processes like
tinderbox that automatically fetch and build harfbuzz.  It's harder to
bootstrap harfbuzz now (as it was previously), but I'm willing to give this
another chance and see how it goes.
2013-02-06 23:43:27 -05:00
Behdad Esfahbod 9de5f98f36 Bug 60035 - intermittent make install failure on hb-version.h 2013-02-04 23:28:22 -05:00
Behdad Esfahbod 6c1e8b614c Bug 59637 - check-exported-symbols.sh & check-internal-symbols.sh fail on mips/mipsel 2013-02-04 23:24:16 -05:00
Behdad Esfahbod bafdf3d983 Merge check-internal-symbols.sh and check-exported-symbols.sh 2013-02-04 23:06:50 -05:00
Behdad Esfahbod e9171af55c Bug 60053 - hb-common.cc:181:6: warning: ‘void free_langs()’ defined but not used 2013-01-29 22:45:00 -05:00
Behdad Esfahbod eb45c0a2fb Minor 2013-01-16 22:07:50 -06:00
Behdad Esfahbod 52c8d1226f Minor 2013-01-14 13:51:46 -06:00
Behdad Esfahbod f88d3bd7e4 Fix build with Sun compiler 2013-01-14 00:33:58 -06:00
Behdad Esfahbod 08b29c0809 Revert "Minor"
This reverts commit 0a49235701.

Enables code on more compilers.
2013-01-14 00:32:12 -06:00
Behdad Esfahbod e78463211e Fix linking with non-gcc compilers 2013-01-14 00:27:21 -06:00
Behdad Esfahbod de649f07f1 Fix residuals from fontconfig changes 2013-01-14 00:26:43 -06:00
Behdad Esfahbod 2dcb333f52 Add atomic ops for Solaris
Based on fontconfig patch from Raimund Steger.
2013-01-10 01:18:10 -06:00
Behdad Esfahbod 69fd6e157c Fix crasher during multi-thread shaper data creation
Sample crash:

0  _hb_graphite2_shaper_face_data_destroy (data=0xffffffffffffffff)
    at ../../src/hb-graphite2.cc:129
1  0x00007ffff4271d7d in hb_graphite2_shaper_face_data_ensure (
    face=<optimized out>) at ../../src/hb-shaper-list.hh:35
2  hb_shape_plan_plan (shaper_list=<optimized out>, num_user_features=0,
    user_features=0x0, shape_plan=0xf7b490) at ../../src/hb-shaper-list.hh:35
3  hb_shape_plan_create (face=<optimized out>, props=<optimized out>,
    user_features=0x0, num_user_features=0, shaper_list=<optimized out>)
    at ../../src/hb-shape-plan.cc:108
4  0x00007ffff4272c93 in hb_shape_plan_create_cached (face=0x10cf2b0,
    props=0x11980d8, user_features=0x0, num_user_features=<optimized out>,
    shaper_list=0x0) at ../../src/hb-shape-plan.cc:283
2013-01-10 00:03:36 -06:00
Behdad Esfahbod ecd454b3cd [Indic] In old-spec shaping, don't move viramas around if seq ends with one
For example: u0c9a u0ccd u0c9a u0ccd with Lohit.  See:

https://bugs.freedesktop.org/show_bug.cgi?id=59118
2013-01-08 18:09:46 -06:00
Behdad Esfahbod e95e031b56 [GPOS] If an Anchor offset is NULL, return false
If in a MarkPos table, a base has no anchor for a particular mark class,
return NULL such that the subsequent subtables get a chance at it.

Test case:
hb-shape ./EBGaramond12-Regular.otf ἂ --features="ss20","smcp"
2013-01-08 16:17:06 -06:00
Behdad Esfahbod 1172dc7362 Rename hb_buffer_clear() to hb_buffer_clear_contents()
The previous name was clashing with harfbuzz.old.  There are systems
that need to link both...

Clash-free now again.
2013-01-07 16:46:37 -06:00
Behdad Esfahbod 7b912c1936 Remove a few unnecessary const's
Apparently helps with MSVC compilation.
2013-01-04 01:25:27 -06:00
Behdad Esfahbod f0c82410db [OTLayout] Always collect default language system in collect_lookups
Not sure if this is the most desired behavior.  It's the most easily
defined though.
2013-01-03 00:07:16 -06:00
Behdad Esfahbod 15e9e4e1dd [OTLayout] Fix feature iteration in collect_lookups
Previous logic was just wrong.
2013-01-03 00:04:40 -06:00
Behdad Esfahbod 733e8c0d7b [OTLayout] Whitespace 2013-01-03 00:00:23 -06:00
Behdad Esfahbod d37ae38047 [OTLayout] Handle required_feature_index in collect_lookups 2013-01-02 23:57:36 -06:00
Behdad Esfahbod 11fba79ee9 [OTLayout] Fix various introspection issues with ClassDef's
As reported by Jonathan Kew.
2013-01-02 23:36:37 -06:00
Behdad Esfahbod 7b1b720a8d Protect sets in-error from further modification
Fixes test-set.c
2013-01-02 23:02:59 -06:00
Behdad Esfahbod 8165f2765b [tests] Start adding tests for hb-set.h
Fails now.  Fixing.
2013-01-02 22:50:36 -06:00
Behdad Esfahbod 11d2956553 Minor 2013-01-02 17:41:27 -06:00
Behdad Esfahbod 596740db04 [Indic] Insert dottedcircle after a lone Malayalam dot-reph 2012-12-21 19:41:04 -05:00
Behdad Esfahbod 6f69fa283e Minor 2012-12-21 16:51:15 -05:00
Behdad Esfahbod f4abcbfc62 Minor 2012-12-21 16:48:51 -05:00
Behdad Esfahbod 8b217f5ac5 [Indic] Reorder Malayalam dot-reph to after base
Test sequence is simple: U+0D4E,U+0D15.  The dot-reph should be
reordered to after the Ka.

https://bugzilla.redhat.com/show_bug.cgi?id=799565
2012-12-21 15:49:26 -05:00
Behdad Esfahbod 742c4ee97e Minor 2012-12-21 15:35:03 -05:00
Behdad Esfahbod 044d385276 Bug 58498 - Tests fail with gold linker on ARM 2012-12-19 13:00:16 -05:00
Behdad Esfahbod b68b86daf1 Use C++ linker if ICU is disabled
Bug 54948 - Undefined symbols: "operator delete(void*)" "operator
new(unsigned long)" "___cxa_pure_virtual"
2012-12-18 20:39:40 -05:00
Behdad Esfahbod 1ffd23cb47 [OTLayout] Limit alternate-location FeatureParams to 'size' feature 2012-12-17 23:29:15 -05:00
Behdad Esfahbod efe252e600 [OTLayout] Fix 'size' featureParams implementation
Looks at alternate location now.
2012-12-17 23:25:57 -05:00
Behdad Esfahbod e77b442574 [OTLayout] Fix tracing 2012-12-17 18:42:59 -05:00
Behdad Esfahbod 9b54562d63 [OTLayout] Towards correct FeatureParams handling 2012-12-17 13:55:36 -05:00
Behdad Esfahbod 87e43b7f2b [OTLayout] Wire tag and list start all the way to Feature
To fix FeatureParam issues.  No actual fix yet, just plumbing.
2012-12-14 17:48:23 -05:00
Behdad Esfahbod 85bc44b90a [OTLayout] More 'size' feature sanity checking
We still don't look for the old incorrect place of the featureParams.
I'll wait till someone actually complains about it...
2012-12-12 11:38:49 -05:00
Behdad Esfahbod 0bae50a36f [OTLayout] Add FeatureParamsCharacterVariants struct
No API yet.
2012-12-11 16:29:24 -05:00
Behdad Esfahbod bd61bc13ea [OTLayout] Add UINT24 type 2012-12-11 16:01:07 -05:00
Behdad Esfahbod 9cf7f9d4f6 Make test-size-params write size in points 2012-12-11 14:31:13 -05:00
Behdad Esfahbod 372fe2b67b [OTLayout] Make hb_ot_layout_get_size_params() do some checks 2012-12-11 14:30:57 -05:00
Behdad Esfahbod 875a5cbc9c [OTLayout] Change hb_ot_layout_get_params() API
And add implementation for StylisticSet UINameID.  No API yet.
2012-12-11 14:17:01 -05:00
Behdad Esfahbod 0e9f0f3e5f Fix atomic ops on iOS
Patch from John Ralls.
2012-12-10 15:25:21 -05:00
Behdad Esfahbod 5f9569c139 Make older MSVC happy 2012-12-10 13:39:06 -05:00
Behdad Esfahbod 071d5b831e Work around missing OSAtomicCompareAndSwapPtrBarrier() on OS X 10.4
Not sure how to handle iOS.
2012-12-10 00:57:00 -05:00
Behdad Esfahbod e923e6487b [coretext] Fixed typo
Oops.  Thanks Khaled for catching this.
2012-12-09 19:39:40 -05:00
Behdad Esfahbod 9a8395824b [coretext] Add hb_coretext_face_get_cg_font()
Not sure if it's useful, but it was missing.
2012-12-09 18:47:36 -05:00
Behdad Esfahbod 8611235688 [coretext] Remove hack around GlyphID
We now namespace our types, so the hack is not needed anymore.
2012-12-09 18:47:09 -05:00
Behdad Esfahbod 8e58459aeb [graphite2] "Update to new API"
Part of patch from Martin Hosken.  I believe he knows what he's doing
:).
2012-12-09 18:45:47 -05:00
Behdad Esfahbod a5a4ab3846 [graphite2] Add hb_graphite2_face_get_gr_face and hb_graphite2_font_get_gr_font
Based on patch from Martin Hosken.  I believe it returns NULL if the
font doesn't have graphite tables, but have not tested.
2012-12-09 18:44:41 -05:00
Behdad Esfahbod 737ba15644 [graphite2] Preload all tables
Part of patch from Martin Hosken.
2012-12-09 18:43:03 -05:00
Behdad Esfahbod 0ae6dbf1b4 Minor 2012-12-09 18:37:38 -05:00
Behdad Esfahbod 3fe5c159d3 Remove excess return
Oops!
2012-12-09 18:20:19 -05:00
Behdad Esfahbod ba2d543004 Update OT language tags
Patch from Roozbeh Pournader.
2012-12-08 19:28:41 -05:00
Behdad Esfahbod aba38173c6 Minor 2012-12-05 19:54:48 -05:00
Behdad Esfahbod 61865745e3 Fix test with gold linker
Bug 57633 - Symbol tests should ignore __bss_start, _edata, _end
2012-12-05 19:42:10 -05:00
Behdad Esfahbod b71b0bd9ee [Indic] Add link to Sinhala split matra section of the Sinhala spec 2012-12-05 19:20:31 -05:00
Behdad Esfahbod 0beb66e3a6 Fix warnings 2012-12-05 19:14:28 -05:00
Behdad Esfahbod 130bb3f614 Rename VOID and void_t to have HarfBuzz prefix
Fixes build on Windows.  Ouch!
2012-12-05 16:49:47 -05:00
Behdad Esfahbod 4a350d0eb2 [OTLayout] Reuse context in collect_glyphs() recursion 2012-12-04 17:13:09 -05:00
Behdad Esfahbod 8303593ba1 Minor
Use pointers instead of references, in preparation for upcoming change.
2012-12-04 17:08:41 -05:00
Behdad Esfahbod 1bcfa06d11 [OTLayout] Don't recurse in collect_glyphs() for GPOS 2012-12-04 16:58:09 -05:00
Behdad Esfahbod b5e04c7dc6 [ucdn] Match upstream changes 2012-12-04 15:57:02 -05:00
Behdad Esfahbod 7babfe5a79 Move object mutex into the user-data array
We are not using it for anything else, it seems.
2012-12-04 00:35:54 +02:00
Behdad Esfahbod a190011477 Remove unused functions 2012-12-04 00:29:35 +02:00
Behdad Esfahbod 88b7564183 "Update" to Unicode 6.2.0 tables
Nothing changed...
2012-12-02 19:14:29 +02:00
Behdad Esfahbod 4ab99fb8c3 Minor 2012-11-30 15:02:04 +02:00
Behdad Esfahbod 6748b96d27 Minor 2012-11-30 12:02:21 +02:00
Behdad Esfahbod 0f3f529904 Add test-size-params
Eventually this will become part of a yet-to-be-written hb-ot cmdline
tool.
2012-11-30 09:06:59 +02:00
Behdad Esfahbod 8465a05a89 Fix hb_buffer_guess_segment_properties() for empty buffer
Was causing assertion failure in shape_plan().
2012-11-30 08:46:43 +02:00
Behdad Esfahbod e75943de80 [OTLayout] Fix collect_glyphs() recursion in ContextFormat3 2012-11-30 08:38:24 +02:00
Behdad Esfahbod 3038ae6adb [OTLayout] Minor 2012-11-30 08:24:13 +02:00
Behdad Esfahbod 0dff11f6bf [OTLayout] Look for any 'size' feature, not only in DFLT script
The old code doesn't work with all fonts, as Khaled has reported.
2012-11-30 08:14:20 +02:00
Behdad Esfahbod e9ad71dee8 [OTLayout] Rename hb_ot_layout_position_get_size() to hb_ot_layout_get_size_params() 2012-11-30 08:10:26 +02:00
Behdad Esfahbod f18ff5a84d [OTLayout] Return correct value from recursion
Commit 4c4e8f0e75 broke contextual lookups
by making the recurse() function always return false.

Reported by Khaled.  Test case: لا in Amiri.
2012-11-30 08:07:06 +02:00
Behdad Esfahbod f54cce3c6a [OTLayout] Implement 'size' feature 2012-11-26 14:02:31 +02:00
Behdad Esfahbod 2dc1141d7d [OTLayout] Remove operator() from ClassDef 2012-11-24 19:16:34 -05:00
Behdad Esfahbod b67881b171 [OTLayout] Remove operator() from Coverage 2012-11-24 19:13:55 -05:00
Behdad Esfahbod a88e716021 [OTLayout] Implement hb_ot_layout_collect_lookups()
Untested.
2012-11-24 02:31:02 -05:00
Behdad Esfahbod 1ea375da44 [OTLayout] Only collect output glyphs during recursion in collect_glyphs() 2012-11-24 02:05:52 -05:00
Behdad Esfahbod f1b12781d2 [OTLayout] Implement ChainContext collect_glyphs()
All of collect_glyphs() complete and untested now.
2012-11-24 02:02:01 -05:00
Behdad Esfahbod cdd756b9f4 [OTLayout] Implement GPOS collect_glyphs() 2012-11-24 01:38:41 -05:00
Behdad Esfahbod 4c4e8f0e75 [OTLayout] Reuse apply context for recursion 2012-11-24 01:13:20 -05:00
Behdad Esfahbod 53a69f49e5 [OTLayout] Remove unused members 2012-11-24 01:03:05 -05:00
Behdad Esfahbod d0a5233785 [OTLayout] Implement Context::collect_glyphs() 2012-11-23 18:54:59 -05:00
Behdad Esfahbod 26514d51b6 [OTLayout] More collect_glyphs() 2012-11-23 18:13:48 -05:00
Behdad Esfahbod c6fb843f2a [OTLayout] Templatize process_recurse_func 2012-11-23 18:04:08 -05:00
Behdad Esfahbod 9b34677f36 [OTLayout] Clean up closure() a bit 2012-11-23 17:55:40 -05:00
Behdad Esfahbod adf7758a27 Improve debug log format in presence of templates 2012-11-23 17:34:02 -05:00
Behdad Esfahbod 2c53bd3c3e [OTLayout] Start porting sanitize() to process() 2012-11-23 17:29:05 -05:00
Behdad Esfahbod f48ec0e834 [OTLayout] Add process() tracing 2012-11-23 17:23:41 -05:00
Behdad Esfahbod ed2e135944 [OTLayout] More Extension templatizing 2012-11-23 17:10:40 -05:00
Behdad Esfahbod 7dddd4e72b [OTLayout] More templatizing Extension 2012-11-23 17:04:55 -05:00
Behdad Esfahbod 653eeb2645 Make Extension a template 2012-11-23 16:57:36 -05:00
Behdad Esfahbod 08f1eede1b Minor 2012-11-23 16:51:43 -05:00
Behdad Esfahbod 2c9d6485a1 More tracing fixup 2012-11-23 16:49:19 -05:00
Behdad Esfahbod a1733db1c6 [OTLayout] Start adding process() tracing 2012-11-23 16:40:04 -05:00
Behdad Esfahbod 73c18ae1b9 Cleanup 2012-11-23 15:34:11 -05:00
Behdad Esfahbod be218c688c Pass this object to trace macros 2012-11-23 15:32:14 -05:00
Behdad Esfahbod 902cc8aca0 [OTLayout] Start unbreaking tracing 2012-11-23 15:23:30 -05:00
Behdad Esfahbod dabe698fcb Minor 2012-11-23 14:21:35 -05:00
Behdad Esfahbod c779d82b2f Fix warnings 2012-11-23 14:09:21 -05:00
Behdad Esfahbod 81822528ef Minor 2012-11-23 13:27:16 -05:00
Behdad Esfahbod 1d67ef980f Move code around 2012-11-22 16:47:53 -05:00
Behdad Esfahbod ec35a72a44 [OTLayout] Port apply() operator to process() template 2012-11-22 16:33:46 -05:00
Behdad Esfahbod 2005fa5340 [OTLayout] Port would_apply() and get_coverage() to process() templates 2012-11-22 16:33:46 -05:00
Behdad Esfahbod 44fc237b53 [OTLayout] Port closure() to process() template 2012-11-22 16:33:46 -05:00
Behdad Esfahbod 5be86b1bb4 [ucdn] Make data tables const! 2012-11-22 16:33:46 -05:00
Behdad Esfahbod 7c5b7fe686 Fix hb_shape_plan_get_shaper() 2012-11-22 16:33:46 -05:00
Behdad Esfahbod ac064a2db2 Rename hb_set_population() to hb_set_get_population() 2012-11-21 01:14:19 -05:00
Behdad Esfahbod 16c914c2a6 [Indic] One more try at unbreaking Khmer fonts
See comments and discussion on the list.
2012-11-21 01:04:15 -05:00
Behdad Esfahbod e8cfdd7fa8 Start implementing collect_glyphs() operation
Not functional yet.
2012-11-16 19:07:06 -08:00
Behdad Esfahbod 7d52e6601f Whitespace 2012-11-16 18:49:54 -08:00
Behdad Esfahbod 51bb498b7b Minor 2012-11-16 14:08:05 -08:00
Behdad Esfahbod 89ca8eeb83 Implement hb_ot_layout_get_glyphs_in_class() 2012-11-16 13:53:40 -08:00
Behdad Esfahbod 5a08ecf920 Implement hb_ot_layout_get_glyph_class() 2012-11-16 13:34:29 -08:00
Behdad Esfahbod f9edd5d56b Implement hb_shape_plan_get_shaper()
Untested.
2012-11-16 13:23:37 -08:00
Behdad Esfahbod 43b6531500 [Indic] Another try to unbreak Sinhala split matras
Just read the comments...
2012-11-16 13:14:26 -08:00
Behdad Esfahbod 977f1740ac Unbreak tests 2012-11-16 13:10:07 -08:00
Behdad Esfahbod eba312c8d1 Plumbing to get shape plan and font into complex decompose function
So we can handle Sinhala split matras smartly...  Coming soon.
2012-11-16 12:58:38 -08:00
Behdad Esfahbod 3f82f8ff07 Rename hb_buffer_guess_properties() to hb_buffer_guess_segment_properties() 2012-11-15 18:48:10 -08:00
Behdad Esfahbod f30641038b Bunch of independent changes (ouch)
API additions:

	hb_segment_properties_t
	HB_SEGMENT_PROPERTIES_DEFAULT
	hb_segment_properties_equal()
	hb_segment_properties_hash()

	hb_buffer_set_segment_properties()
	hb_buffer_get_segment_properties()

	hb_ot_layout_glyph_class_t

	hb_shape_plan_t
	hb_shape_plan_create()
	hb_shape_plan_create_cached()
	hb_shape_plan_get_empty()
	hb_shape_plan_reference()
	hb_shape_plan_destroy()
	hb_shape_plan_set_user_data()
	hb_shape_plan_get_user_data()
	hb_shape_plan_execute()

	hb_ot_shape_plan_collect_lookups()

API changes:

	Rename hb_ot_layout_feature_get_lookup_indexes() to
	hb_ot_layout_feature_get_lookups().

New header file:

	hb-shape-plan.h

And a bunch of prototyped but not implemented stuff.  Coming soon.
(Tests fail because of the prototypes right now.)
2012-11-15 18:48:10 -08:00
Behdad Esfahbod e05a999495 Add hb_face_[sg]et_glyph_count() 2012-11-15 16:23:21 -08:00
Behdad Esfahbod aec89de564 Add / modify set API a bit 2012-11-15 16:15:42 -08:00
Behdad Esfahbod c54599ad26 Minor 2012-11-15 16:14:39 -08:00
Behdad Esfahbod d1aa143ca4 [Thai] Remove U+0E2C from "AC" consonants
WinXP doesn't include it.
2012-11-15 15:38:08 -08:00
Behdad Esfahbod 362a990b22 Rename hb_ot_layout_would_substitute_lookup() and hb_ot_layout_substitute_closure_lookup()
To match upcoming API.
2012-11-15 14:57:31 -08:00
Behdad Esfahbod 3cec819d39 Make the OT shaper default, even if CoreText or Uniscribe is enabled 2012-11-15 13:15:39 -08:00
Behdad Esfahbod 072ae7a982 Add hb_buffer_serialize_list_formats() 2012-11-15 13:14:12 -08:00
Behdad Esfahbod f9edf16725 Add buffer serialization / deserialization API
Two output formats for now: TEXT, and JSON.  For example:

  hb-shape --output-format=json

Deserialization API is added, but not implemented yet.
2012-11-15 13:10:07 -08:00
Behdad Esfahbod fd0de881f4 Avoid C++ undefined behavior
https://bugzilla.mozilla.org/show_bug.cgi?id=810823
2012-11-15 10:48:50 -08:00
Behdad Esfahbod f41dc2d35b Fix undefined behavior in Indic dottedcircle
Chromium Issue 158998:	Conditional jump in harfbuzz-ng
http://code.google.com/p/chromium/issues/detail?id=158998
2012-11-15 10:36:43 -08:00
Behdad Esfahbod 1eb3e94fe9 [Thai] Implement PUA-based fallback shaping
As explained here:

  http://linux.thai.net/~thep/th-otf/shaping.html

Our output now matches Uniscribe for old fonts (eg. XP Tahoma) with no
Thai GSUB table.
2012-11-14 17:53:09 -08:00
Behdad Esfahbod 851784f837 Improve shaper selection 2012-11-14 17:53:09 -08:00
Behdad Esfahbod 43f04a7456 Move Thai shaper into a separate file 2012-11-14 15:51:54 -08:00
Behdad Esfahbod ba82325b7a Add note re 'Phags-pa letter U+A872, which is Joining_Type=L 2012-11-14 15:36:53 -08:00
Behdad Esfahbod d469fadce8 [Indic] Exchange abort() for assert() 2012-11-14 15:07:36 -08:00
Behdad Esfahbod 0f80a89de9 Don't route Kharoshthi through the Indic shaper
It's a simple, right-to-left, script.
2012-11-14 15:05:19 -08:00
Behdad Esfahbod e67072bb17 [Indic] Handle overstruck matra position 2012-11-14 15:00:53 -08:00
Behdad Esfahbod 7e99e4f074 Reposition Lao marks
Lao marks are center-aligned, unlike Thai ones.
2012-11-14 14:09:46 -08:00
Behdad Esfahbod 865745b5b8 Don't do fallback positioning for Indic and Thai shapers 2012-11-14 13:48:26 -08:00
Behdad Esfahbod 981748cb2e [Indic] If Khmer fonts have a 'liga' feature, use generic shaper
Seems to produce more coherent results than trying the Indic shaper on
them.  I'm looking at you, Kh-* fonts...
2012-11-14 13:38:16 -08:00
Behdad Esfahbod dde5506fd9 [Indic] Don't move virama with left matra
This is important for the Sinhala U+0DDA split matra since it decomposes
to U+0DD9,U+0DCA where U+0DD9 is a left matra and U+0DCA is the virama.
We don't want to move the virama with the left matra.
TEST: U+0D9A,U+0DDA

Note that we were already doing this in the Uniscribe bug compatibility
mode.  We now do it all the time.
2012-11-14 11:37:04 -08:00
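
The decomposition that commit dde5506fd9 relies on can be inspected with Python's `unicodedata`. This is a small sketch for checking the Unicode data only; `canonical_parts` is a hypothetical helper, and nothing here models the reordering itself.

```python
import unicodedata


def canonical_parts(ch: str) -> list:
    """Return the canonical decomposition of ch as a list of characters.

    Returns an empty list when ch has no decomposition or only a
    compatibility one (those start with a tag such as <compat>).
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0].startswith("<"):
        return []
    return [chr(int(f, 16)) for f in fields]


# U+0DDA is the split matra from the commit message.
parts = canonical_parts("\u0dda")

# The second part, U+0DCA, carries the virama combining class, which
# is what distinguishes it from the left matra U+0DD9 before it.
classes = [unicodedata.combining(p) for p in parts]
```

With the test sequence U+0D9A,U+0DDA, the decomposed form is U+0D9A,U+0DD9,U+0DCA; per the message, only U+0DD9 should move to the left of the consonant.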
Behdad Esfahbod 92f9bfed42 Minor 2012-11-13 16:50:45 -08:00
Behdad Esfahbod 66ac2ff32e API change: Remove "mask" from hb_buffer_add()
I don't expect anybody using hb_buffer_add(), so this shouldn't break
anyone's code.
2012-11-13 16:26:32 -08:00
Behdad Esfahbod e13f8d280b Fix UTF-8 backward iteration
Ouch!
2012-11-13 15:12:06 -08:00
Behdad Esfahbod 5669a6cf41 [Arabic] Fix post-context handling
Ouch!
2012-11-13 15:11:51 -08:00
Behdad Esfahbod 0c7df22228 Add buffer flags
New API:

	hb_buffer_flags_t

	HB_BUFFER_FLAGS_DEFAULT
	HB_BUFFER_FLAG_BOT
	HB_BUFFER_FLAG_EOT
	HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES

	hb_buffer_set_flags()
	hb_buffer_get_flags()

We use the BOT flag to decide whether to insert dottedcircle if the
first char in the buffer is a combining mark.

The PRESERVE_DEFAULT_IGNORABLES flag prevents removal of characters like
ZWNJ/ZWJ/...
2012-11-13 14:42:35 -08:00
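
The two flag behaviors described in commit 0c7df22228 can be modelled in a few lines. This is a toy model in Python, not the C API: the `BufferFlag` class, its numeric values, and both helper functions are assumptions made for illustration, and the default-ignorable set is cut down to just ZWNJ and ZWJ.

```python
import unicodedata
from enum import IntFlag


class BufferFlag(IntFlag):
    # Values are illustrative; only the names mirror the commit message.
    DEFAULT = 0
    BOT = 1                          # buffer starts at beginning of text
    EOT = 2                          # buffer ends at end of text
    PRESERVE_DEFAULT_IGNORABLES = 4


# Tiny stand-in for the default-ignorable set: ZWNJ and ZWJ only.
DEFAULT_IGNORABLES = {"\u200c", "\u200d"}


def needs_dotted_circle(text: str, flags: BufferFlag) -> bool:
    """True if a leading combining mark should get a dotted circle.

    Modelled on the message: the decision is made only when the
    buffer is flagged as starting at the beginning of the text.
    """
    if not text or not (flags & BufferFlag.BOT):
        return False
    return unicodedata.category(text[0]).startswith("M")


def visible_chars(text: str, flags: BufferFlag) -> str:
    """Drop ZWNJ/ZWJ unless PRESERVE_DEFAULT_IGNORABLES is set."""
    if flags & BufferFlag.PRESERVE_DEFAULT_IGNORABLES:
        return text
    return "".join(c for c in text if c not in DEFAULT_IGNORABLES)
```

Without the BOT flag, a run that happens to start with a combining mark is left alone in this model, since the mark's base may sit in the preceding run.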
Behdad Esfahbod 1c7e55511a Minor fix
Ouch
2012-11-13 14:42:22 -08:00
Behdad Esfahbod 82ecaff736 Add hb_buffer_clear()
Which is like _reset(), but does NOT clear unicode-funcs.
2012-11-13 14:10:00 -08:00
Behdad Esfahbod 0736915b8e [Indic] Decompose Sinhala split matras the way old HarfBuzz / Pango did
Had to do some refactoring to make this happen...

Under uniscribe bug compatibility mode, we still split them
Uniscribe-style, but Jonathan and I convinced ourselves that there is no
harm doing this the Unicode way.  This change makes that happen, and
unbreaks free Sinhala fonts.
2012-11-13 12:35:35 -08:00
Behdad Esfahbod 6fd5335622 [Indic] Update auto-generated Indic machine to reflect previous commit 2012-11-12 18:42:18 -08:00
Behdad Esfahbod 9cac1338c4 [Indic] Allow Consonant_Medial's after Consonant's
Mostly affects Myanmar, but also Tai Tham, Javanese, and Cham.  The
latter three are untested (no fonts!).
2012-11-12 18:41:22 -08:00
Behdad Esfahbod d187099cba [Indic] Categorize Myanmar "tone marks" as nuktas 2012-11-12 18:38:06 -08:00
Behdad Esfahbod 8173f23f3f [Indic] Add config for Myanmar 2012-11-12 18:37:20 -08:00
Behdad Esfahbod 9e92978c8a [Indic] Route "new" Myanmar tag through the Indic shaper
Windows 8 adds a Myanmar shaper using the 'mym2' tag.  Route that
through the Indic shaper.  It's still very broken, but at least this
does NOT break old-style Myanmar shaping using the generic shaper.
2012-11-12 18:36:10 -08:00
Behdad Esfahbod 5ab3855f81 Choose shaper based on chosen OT script tag
For Arabic and Indic shapers, if the font doesn't have a script system
for the script, use default shaper.

Make an exception for Arabic script since we have fallback logic for
that one.
2012-11-12 18:27:42 -08:00
Behdad Esfahbod 9b37b4c580 Make planner available to complex shaper choosing logic 2012-11-12 18:23:38 -08:00
Behdad Esfahbod 6fddf2d739 Refactoring ot-map building to make chosen script available earlier 2012-11-12 18:03:07 -08:00
Behdad Esfahbod de796a6fb9 Add "new" Myanmar OT Script tag
Windows 8 added support for Myanmar shaping using the "mym2" script tag,
even though Windows never supported the old "mymr" tag.
2012-11-12 17:27:51 -08:00
Behdad Esfahbod e9334ce97b Break build when ragel is needed and missing 2012-11-12 14:57:02 -08:00
Behdad Esfahbod dba186711e [Indic] Make more room in the table
To be used in upcoming commits.
2012-11-12 14:48:33 -08:00
Behdad Esfahbod c4be991743 Typo 2012-11-12 14:27:33 -08:00
Behdad Esfahbod 56be677781 [Indic] Port 'pref' logic to look into font tables
...instead of using a hardcoded list of Ra characters.
2012-11-12 14:09:40 -08:00
Behdad Esfahbod f2c0f59043 [Indic] Port reph handling logic to look into font features
...instead of using a hardcoded list of Ra characters.
2012-11-12 14:02:02 -08:00
Behdad Esfahbod 43149afbc0 Route MEETEI_MAYEK through the Indic shaper
Since it has a couple of left-"matras".
2012-11-12 13:34:17 -08:00
Behdad Esfahbod d0905c3400 Minor 2012-11-12 13:03:52 -08:00
Behdad Esfahbod 365f27ab5b Work around older compilers
As reported on the list:

I am seeing a similar problem building harfbuzz 0.9.5 with Apple gcc
4.0.1 on OS X 10.5 Leopard:

hb-ot-layout-common-private.hh:406: error: 'struct
OT::CoverageFormat1::Iter' is private
hb-ot-layout-common-private.hh:646: error: within this context
hb-ot-layout-common-private.hh:500: error: 'struct
OT::CoverageFormat2::Iter' is private
hb-ot-layout-common-private.hh:647: error: within this context
make[4]: *** [libharfbuzz_la-hb-ot-layout.lo] Error 1

Also reported as happening with MSVC 2005.
2012-11-12 11:16:57 -08:00
Behdad Esfahbod 6b389ddc36 [Indic] Don't apply 'liga'
Uniscribe doesn't.  And some fonts abuse this feature to get Indic
shaping working in non-complex applications like Adobe's apps.

No change in numbers:

BENGALI: 353897 out of 354188 tests passed. 291 failed (0.0821598%)
DEVANAGARI: 707337 out of 707394 tests passed. 57 failed (0.00805774%)
GUJARATI: 366440 out of 366457 tests passed. 17 failed (0.00463902%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951046 out of 951913 tests passed. 867 failed (0.0910798%)
KHMER: 299074 out of 299124 tests passed. 50 failed (0.0167155%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048011 out of 1048334 tests passed. 323 failed (0.0308108%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970557 out of 970573 tests passed. 16 failed (0.00164851%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-11-12 11:02:56 -08:00
Behdad Esfahbod d05ac7dc3f Fix hb-ft glyph name for broken fonts that return empty glyph names 2012-11-12 10:26:50 -08:00
Behdad Esfahbod 4899801155 U+A872 PHAGS-PA SUPERFIXED LETTER RA is "Right"-Joining 2012-11-08 15:08:26 -08:00
Behdad Esfahbod 22a685836a Adjust Mongolian shaping
For U+1880..U+1886 Uniscribe thinks they are non-joining.
For U+1887 Uniscribe thinks it's joining, but looks wrong to me.
For now, match Uniscribe.
2012-11-05 15:20:10 -08:00
Behdad Esfahbod c26a52fbe6 Minor 2012-11-04 16:48:45 -08:00
Behdad Esfahbod f60d3ed35d Minor 2012-11-04 16:44:47 -08:00
Behdad Esfahbod 10a33296e6 Minor 2012-11-02 13:38:55 -07:00
Behdad Esfahbod 3ba7bc14ea Implement 'Phags-pa shaping
Through the Arabic shaper.  It's similar to Mongolian.
2012-11-01 20:05:04 -07:00
Behdad Esfahbod da70111ab2 Don't clear buffer pre-context if no new context is being provided
Patch from Jonathan Kew.

Part of fixing:

Mozilla Bug 801410 - avoid inserting dotted-circle for run-initial
Unicode combining characters in "simple" scripts such as Latin

https://bugzilla.mozilla.org/show_bug.cgi?id=801410
2012-10-31 13:45:30 -07:00
Behdad Esfahbod 0bc7a38463 [OT] Fix ReverseChainingSubst
We should make it clear that we don't want output buffer in this case,
otherwise buffer->backtrack_len() would be wrong.
2012-10-29 22:02:45 -07:00
Behdad Esfahbod 2616689d15 More tracing fixups 2012-10-29 21:51:56 -07:00
Behdad Esfahbod 937f8d3871 [Arabic] Enable dlig and mset for Arabic
That's what the spec says, and what Uniscribe does.
2012-10-29 21:49:33 -07:00
Behdad Esfahbod bc513add79 Add missing TRACE_RETURN 2012-10-29 19:03:55 -07:00
Behdad Esfahbod 88d3c98e30 [Indic] Position pre-base reordering Ra after Chillus in Malayalam
The logic for pre-base reordering follows the left matra logic.
We had an exception for Malayalam/Tamil in the left matra repositioning
which was not reflected in pre-base reordering.

Malayalam failures down from 337 to 323.

BENGALI: 353996 out of 354285 tests passed. 289 failed (0.0815727%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048011 out of 1048334 tests passed. 323 failed (0.0308108%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271726 out of 271847 tests passed. 121 failed (0.0445103%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-10-29 16:46:44 -07:00
Behdad Esfahbod 21bf796954 Add missed file 2012-10-29 14:21:09 -07:00
Behdad Esfahbod 02ed52169a Improve license information 2012-10-28 21:26:19 -07:00
Behdad Esfahbod 4c1d924461 Minor 2012-10-28 20:27:25 -07:00
Behdad Esfahbod 38b015e57f Fix hb_buffer_set_length(buffer, 0)
Was causing invalid realloc()s.
2012-10-28 20:11:47 -07:00
Behdad Esfahbod b7115b63be Add XXX 2012-10-28 20:11:42 -07:00
Behdad Esfahbod 71ee1f2450 Port to ICU LayoutEngine C API
Incidentally, this makes it not crash with icu-le-hb anymore...
I'm not smart / stupid enough to spend two more days debugging C++
linking issues, and this is ABI-stable at least.
2012-10-28 19:18:11 -07:00
Behdad Esfahbod 0144f05e57 Remove unused members 2012-10-26 13:48:06 -07:00
Behdad Esfahbod cf3afd8979 Rename and revamp is_zero_width() to be is_default_ignorable()
That's really the logic desired.  Except that MONGOLIAN VOWEL SEPARATOR
is not default_ignorable but it really should be.  Reported to Unicode.

Based on suggestion from Konstantin Ritt.
2012-10-25 16:32:54 -07:00
Behdad Esfahbod fecdfa95da Fixup hb_ot_shape_closure()
Broke it when merged cmap mapping and normalizer.  Ouch!
2012-10-07 17:19:58 -04:00
Behdad Esfahbod 2d1dcb3ce3 Mark debug message functions static 2012-10-07 17:13:46 -04:00
Behdad Esfahbod 9947bd6daf Update UCDN to upstream commit 3f159c87824230b59af56e40e2db32caf6afa51a
- Unicode 6.2.0 goodness,
- Unassigned codepoints now have correct properties.  Passes test suite.
2012-10-02 20:44:43 -04:00
Behdad Esfahbod 32dbfcf763 Fix visibility of UCDN symbols 2012-10-02 17:42:13 -04:00
Behdad Esfahbod 3f33f0d1f2 Import UCDN into source tree
https://github.com/grigorig/ucdn
2012-10-02 16:23:29 -04:00
Behdad Esfahbod 0e292eb2a2 Remove Glib thread-safety support
Now that we have pthread detection in configure, we don't need Glib
anymore.  Glib will only be a Unicode data provider.
2012-10-02 15:09:38 -04:00
Behdad Esfahbod 66efe89648 Check for pthreads 2012-10-02 14:55:32 -04:00
Behdad Esfahbod f2eb3fa9dc [OT] Only insert dottedcircle if at the beginning of paragraph
If the first char in the run is a combining mark, but there is text
before the run, don't insert dottedcircle.

Part of addressing:
https://bugzilla.redhat.com/show_bug.cgi?id=858736
2012-09-25 21:35:35 -04:00
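The rule in the commit above can be sketched as a small standalone function. This is only an illustration, not the shaper's code: `is_mark_stub` and `maybe_insert_dotted_circle` are hypothetical names, and the mark test covers just the Combining Diacritical Marks block instead of a real general-category lookup.

```cpp
#include <cassert>
#include <string>

// Stand-in for a real Unicode general-category query (assumption):
// treats only U+0300..U+036F as combining marks.
static bool is_mark_stub (char32_t c)
{
  return c >= 0x0300 && c <= 0x036F;
}

// Prepend U+25CC DOTTED CIRCLE only when the run starts with a mark
// and there is no text before the run (i.e. beginning of paragraph).
static std::u32string
maybe_insert_dotted_circle (const std::u32string &run, bool has_text_before)
{
  if (run.empty () || has_text_before || !is_mark_stub (run[0]))
    return run;
  return std::u32string (1, char32_t (0x25CC)) + run;
}
```

With a run that begins with U+0301, the sketch prepends the dotted circle only when `has_text_before` is false; a run that begins with a base letter is returned unchanged.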
Behdad Esfahbod bdc2fc8294 [Arabic] Respect Arabic joining from neighboring context
Now we respect Arabic joining across runs.
2012-09-25 21:32:35 -04:00
Behdad Esfahbod 05207a79e0 [buffer] Save pre/post textual context
To be used for a variety of purposes.  We save up to five characters
in each direction.  No public API changes, everything is taken care
of already.  All clients need to do is to call hb_buffer_add_utf* with
the full text + segment info (or at least some context) instead of
just passing in the segment.

Various operations (hb_buffer_reset, hb_buffer_set_length,
hb_buffer_add*) automatically reset the relevant contexts.
2012-09-25 21:32:21 -04:00
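A minimal sketch of the idea in the commit above: given the full text plus a segment (offset and length), keep up to five characters on each side as context. The `segment_context` type and `save_context` function are hypothetical, and the order in which the real buffer stores its context is not stated in the message, so plain text order is assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical container for the saved context around a segment.
struct segment_context
{
  std::u32string pre;   // up to max_context characters before the segment
  std::u32string post;  // up to max_context characters after the segment
};

static segment_context
save_context (const std::u32string &text,
              std::size_t offset, std::size_t length,
              std::size_t max_context = 5)
{
  // Clamp the segment to the text so the arithmetic below cannot underflow.
  offset = std::min (offset, text.size ());
  length = std::min (length, text.size () - offset);

  segment_context ctx;
  std::size_t pre_len = std::min (offset, max_context);
  ctx.pre = text.substr (offset - pre_len, pre_len);

  std::size_t end = offset + length;
  std::size_t post_len = std::min (text.size () - end, max_context);
  ctx.post = text.substr (end, post_len);
  return ctx;
}
```

Passing only the segment (offset 0, length equal to the text size) yields empty context on both sides, which is why the message asks clients to hand over the full text, or at least some surrounding context.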
Behdad Esfahbod 89ac39dbbe Add hb_utf_prev() 2012-09-25 13:59:24 -04:00
Behdad Esfahbod 70ea4ac688 Slightly optimize UTF-8 parsing 2012-09-25 12:30:16 -04:00
Behdad Esfahbod 4445e5e2ec [buffer] Cleanup / optimize UTF-16 parsing a bit 2012-09-25 12:26:12 -04:00
Behdad Esfahbod 1f66c3c1a0 Add hb_utf_strlen()
Speeds up UTF-8 parsing by calling strlen().
2012-09-25 11:42:16 -04:00
Behdad Esfahbod 7f19ae7b9f [buffer] Templatize UTF handling
Also move UTF routines into a separate file, to be reused from shapers
that need it.
2012-09-25 11:23:55 -04:00
Behdad Esfahbod 0e0a4da9b7 [buffer] Towards template'izing different UTF adders 2012-09-25 11:09:04 -04:00
Behdad Esfahbod 7d37280600 Minor 2012-09-25 11:04:41 -04:00
Behdad Esfahbod 54d5da4ee9 Remove unused indic.cc 2012-09-25 10:51:42 -04:00
Behdad Esfahbod fab7a71f11 [Indic] Import ragel-generated Indic machine in git
I don't expect ragel to be creating too much noise in its generated
output, and including this in-tree helps users right now.  We can
revisit this later if it proved to be too much trouble.
2012-09-24 21:51:13 -04:00
Behdad Esfahbod 20a840c7cd Use a C++ linker on Windows
On Windows we don't care whether or not we link to libstdc++.
Seems to fix build with mingw32 on msys, as reported by Werner.
2012-09-24 20:23:00 -04:00
Behdad Esfahbod eb7669a380 Better autofoo 2012-09-18 19:42:06 -04:00
Behdad Esfahbod d00f7d8375 Fix dependencies 2012-09-17 20:59:09 -04:00
Behdad Esfahbod 811eefe225 Return NULL, not false
Oh well...
2012-09-10 09:56:27 -04:00
Behdad Esfahbod 166b5cf7ec [Indic] Find syllables before any features are applied
With FreeSerif, it seems that the 'ccmp' feature does ligature
substitutions.  That was then causing syllable match failures.  We now
find syllables before any features have been applied.

Test sequence: U+0D9A,U+0DCA,U+200D,U+0DBB,U+0DCF
2012-09-07 14:56:01 -04:00
Behdad Esfahbod 96fdc04e5c Add hb_buffer_[sg]et_content_type
And hb_buffer_content_type_t and enum values.
2012-09-06 22:30:53 -04:00
Behdad Esfahbod e30ebd2794 Add hb_feature_to/from_string() 2012-09-06 22:09:06 -04:00
Behdad Esfahbod f67917161b [OT] Do per-ligature-component fallback mark positioning
With this in place, you can remove GDEF/GSUB/GPOS tables from Arabic
fonts and still get per-component marks positioned on
oh-yeah-fallback-formed LAM-ALEF ligatures with marks in between the LAM
and ALEF.

Now *that*'s pretty cool, if a bit anachronistic...
2012-09-06 17:22:31 -04:00
Behdad Esfahbod 525c685578 [OT] Make fallback mark positioning more robust
...with clusters spanning multiple base characters.
2012-09-06 16:02:07 -04:00
Behdad Esfahbod 5d502443f5 [old] Clear offset array 2012-09-06 15:29:29 -04:00
Behdad Esfahbod 9433c218b4 [OT] Simplify fallback positioning condition 2012-09-06 14:27:15 -04:00
Behdad Esfahbod 028a1706f8 Refactor common macro 2012-09-06 14:25:48 -04:00
Behdad Esfahbod 07cfbe21b5 [OT] Streamline Arabic fallback shaping table 2012-09-06 01:16:39 -04:00
Behdad Esfahbod 82f6b6f388 Minor 2012-09-06 01:12:50 -04:00
Behdad Esfahbod fabd3113a9 [OT] Port Arabic fallback shaping to synthetic GSUB
All of init/medi/fina/isol and rlig implemented.

Let there be dragons... ⻯
2012-09-06 00:51:44 -04:00
Behdad Esfahbod f0b8ed1b6d [Indic] Allow "H,ZWJ,M"
Uniscribe accepts a Halant,ZWJ before matras.  Allow that.

BENGALI down from 295 to 291
DEVANAGARI down from 69 to 57
GUJARATI down from 19 to 17
KANNADA down from 871 to 867
MALAYALAM down from 340 to 337
TELUGU down from 20 to 16

Currently at:

BENGALI: 353897 out of 354188 tests passed. 291 failed (0.0821598%)
DEVANAGARI: 707337 out of 707394 tests passed. 57 failed (0.00805774%)
GUJARATI: 366440 out of 366457 tests passed. 17 failed (0.00463902%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951046 out of 951913 tests passed. 867 failed (0.0910798%)
KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047997 out of 1048334 tests passed. 337 failed (0.0321462%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970557 out of 970573 tests passed. 16 failed (0.00164851%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-09-05 17:41:08 -04:00
Behdad Esfahbod 4ed717ef61 [Indic] Relax grammar
Now that we insert dotted-circle, tests break more easily when our indic
machine breaks.

In particular, a few Devanagari tests were having sequences like
"C,H,ZWJ,N", and because of the ZWJ the Nukta does NOT get reordered to
before the Halant as the grammar used to expect...  Fixup.

Another case is as simple as "C,ZWJ,SM".

Fixes 10 out of 79 failures:

DEVANAGARI: 707325 out of 707394 tests passed. 69 failed (0.00975411%)
2012-09-05 17:21:17 -04:00
Behdad Esfahbod aa7141efe4 [Indic] Fix Khmer syllable-final coeng-consonant
Brings down Khmer failures from 162 to 47.

KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)

Also rebaselined some of the test files that had only-inherited lines.
Removing those, the stats are:

BENGALI: 353893 out of 354188 tests passed. 295 failed (0.0832891%)
DEVANAGARI: 707315 out of 707394 tests passed. 79 failed (0.0111678%)
GUJARATI: 366438 out of 366457 tests passed. 19 failed (0.00518478%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951042 out of 951913 tests passed. 871 failed (0.0915%)
KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047994 out of 1048334 tests passed. 340 failed (0.0324324%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970553 out of 970573 tests passed. 20 failed (0.00206064%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

Still some regressions, but some of the more egregious cases are
addressed.
2012-09-05 17:14:52 -04:00
Behdad Esfahbod 27bd55bd2c [Indic] Tamil does not have half-forms either
The Win7 Tamil font does not rely on this behavior, but the WinXP
version does.  Handle Tamil like Malayalam: Matras always move to
before base.

WinXP Tamil failures went down from 168964 (15.4752%) to 167
(0.0152953%) (two orders of magnitude reduction!).

Included in this is a minor fixup that actually fixed a few tests
with non-Tamil too.  Numbers at:

BENGALI: 353997 out of 354285 tests passed. 288 failed (0.0812905%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271747 out of 271847 tests passed. 100 failed (0.0367854%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-09-05 15:22:02 -04:00
Behdad Esfahbod 87b75d0a4a [OT] Allow adding features with fallback implementation 2012-09-04 23:06:38 -04:00
Behdad Esfahbod 1d3947a6bd Minor 2012-09-04 22:42:17 -04:00
Behdad Esfahbod b3b89b6658 [OT] Add SubstLookup serialize API 2012-09-04 21:28:33 -04:00
Behdad Esfahbod 715e03bc21 Minor 2012-09-04 20:10:17 -04:00
Behdad Esfahbod 652d1e0d64 [OT] Start adding Lookup-level serialize API 2012-09-04 20:00:44 -04:00
Behdad Esfahbod a930c68e9c [OT] More serialize. Implements all basic GSUB subtables 2012-09-04 19:16:09 -04:00
Behdad Esfahbod 1b38b4e817 Minor 2012-09-04 18:17:21 -04:00
Behdad Esfahbod 2bd9fe3598 Refactor 2012-09-04 15:15:19 -04:00
Behdad Esfahbod a5ddd9e31c [OT] Really fix possible NULL dereference this time 2012-09-04 14:55:00 -04:00
Behdad Esfahbod 2941683358 [OT] Implement serialize() for AlternateSubst 2012-09-03 23:31:14 -04:00
Behdad Esfahbod 1f07e3382a [OT] Implement serialize() for MultiSubst 2012-09-03 23:28:34 -04:00
Behdad Esfahbod 4912030dfb Minor 2012-09-03 21:00:48 -04:00
Behdad Esfahbod f8fa2b5cf6 Fix possible NULL dereference
As reported by Kenichi Ishibashi.
2012-09-03 20:19:46 -04:00
Behdad Esfahbod 4b312fb288 [OT] Remove serialize alignment
Will reintroduce in a different way when we actually need it.
2012-09-01 21:56:06 -04:00
Behdad Esfahbod c61be03d6d [OT] A bit more serialize 2012-09-01 21:49:44 -04:00
Behdad Esfahbod abcc5ac1fd [OT] Improve serialize syntax
For some definition of improvement...
2012-09-01 21:30:17 -04:00
Behdad Esfahbod bc5be24014 [OT] Restart work on serialize() 2012-09-01 21:25:20 -04:00
Behdad Esfahbod 6912e476dd [OT] Insert dotted-circle for run-initial marks
Unfortunately if the font has GPOS and 'mark' feature does
not position mark on dotted-circle, our inserted dotted-circle
will not get the mark repositioned to itself.  Uniscribe cheats
here.

If there is no GPOS however, the fallback positioning kicks in
and sorts this out.

I'm not willing to address the first case.
2012-09-01 20:38:45 -04:00
Behdad Esfahbod 1d581ec384 [OT] Fallback-position ccc=0 Thai / Lao marks
Not perfect, but so is fallback positioning in 2012...
2012-09-01 20:06:26 -04:00
Behdad Esfahbod 3992b5ec4c Move code around 2012-09-01 19:20:41 -04:00
Behdad Esfahbod b85800f9de [Indic] Implement dotted-circle insertion for broken clusters
No panic, we really insert dotted circle when it's absolutely broken.

Fixes most of the dotted-circle cases against Uniscribe. (for Devanagari
fixes 80% of them, for Khmer 70%; the rest look like Uniscribe being
really bogus...)

I had to make a decision.  Apparently Uniscribe adds one dotted circle
to each broken character.  I tried that, but that goes wrong easily with
split matras.  So I made it add only one dotted circle to an entire
broken syllable tail.  As in: "if there was a dotted circle here, this
would have formed a correct cluster."  That works better for split
stuff, and I like it more.
2012-08-31 19:18:20 -04:00
Behdad Esfahbod 327d14ef18 [Indic] Start adding dotted-circle infrastructure 2012-08-31 16:49:34 -04:00
Behdad Esfahbod 1be368e96f Minor 2012-08-31 16:29:17 -04:00
Behdad Esfahbod 784f29d061 Minor 2012-08-31 14:06:26 -04:00
Behdad Esfahbod 5a7f18767a [OT] Better fallback-position Thai / Lao ccc!=0 marks 2012-08-30 22:53:29 -04:00
Behdad Esfahbod 9f2348de58 [OT] Add serialize() for Coverage 2012-08-29 21:08:59 -04:00
Behdad Esfahbod e901b954c6 [OT] Start adding serialize() API 2012-08-29 20:26:08 -04:00
Behdad Esfahbod 965c280de0 Add HB_BUFFER_ASSERT_VAR
To be used in places we access buffer vars...
2012-08-29 14:02:37 -04:00
Behdad Esfahbod 0ccf9b6473 Move code around 2012-08-29 14:02:37 -04:00
Behdad Esfahbod 2fcbbdb41a Port Arabic fallback ligating to share code with GSUB
This will eventually allow us to skip marks, as well as (fallback)
attach marks to ligature components of fallback-shaped Arabic.
That would be pretty cool.  I kludged GDEF props in, so mark-skipping
works, but the produced ligature id/components will be cleared later
by substitute_start() et al.

Perhaps using a synthetic table for Arabic fallback shaping was a better
idea.  The current approach has way too many layering violations...
2012-08-29 14:01:22 -04:00
Behdad Esfahbod 5e399a8a45 Minor 2012-08-29 10:40:49 -04:00
Behdad Esfahbod a177d027d1 [GSUB] Move ligation logic over 2012-08-28 23:18:22 -04:00
Behdad Esfahbod 191fa885d9 [GSUB] Merge Ligature and context input matching
Looks better now...
2012-08-28 22:58:55 -04:00
Behdad Esfahbod 93814ca7dc Start converging Ligature and match_input 2012-08-28 22:39:10 -04:00
Behdad Esfahbod 2eef71737e [hb-icu-le] Add visibility 2012-08-28 19:16:38 -04:00
Behdad Esfahbod d59e28e492 Minor 2012-08-28 19:08:36 -04:00
Behdad Esfahbod af169d2813 Minor 2012-08-28 19:08:22 -04:00
Behdad Esfahbod 52ff2681d8 Use VisualStudio-style atomic intrinsics on mingw32 2012-08-28 18:03:35 -04:00
Behdad Esfahbod 7c8e844d92 Use namespace for OpenType tables
Avoids USHORT, SHORT, ULONG, LONG clashes with Windows API.
2012-08-28 17:57:49 -04:00
Behdad Esfahbod dc5df5af6b Revert "Minor"
This reverts commit 3e0a03978b.

I now remember why that line is there :).
2012-08-28 16:31:23 -04:00
Behdad Esfahbod 3e0a03978b Minor 2012-08-27 17:10:02 -04:00
Behdad Esfahbod 667218a5b1 Minor 2012-08-27 17:00:44 -04:00
Behdad Esfahbod 30dd62251f Only fallback-position glyphs if we have the ccc
Previously, ccc=0 Thai / Lao marks were being
mispositioned.  Don't touch them.
2012-08-27 16:54:34 -04:00
Behdad Esfahbod e1ba62811a Center unknown marks horizontally 2012-08-27 16:28:05 -04:00
Behdad Esfahbod 23b0e9d7dc [Indic] Fix switch
D'oh.  Was working by pure chance :)).
2012-08-26 14:30:38 -04:00
Behdad Esfahbod 56e878ab87 [graphite2] Cleanup scratch buffer allocation 2012-08-24 00:41:51 -04:00
Behdad Esfahbod 2f7586c622 [icu-le] Implement icu layout engine shaper 2012-08-24 00:00:33 -04:00
Behdad Esfahbod ba7f6c3797 [icu-le] Hook up to hb_face_t 2012-08-24 00:00:33 -04:00
Behdad Esfahbod e96bb36995 [icu-le] Actually use the FontTableCache 2012-08-24 00:00:33 -04:00
Behdad Esfahbod 7d242364ea [icu-le] Start adding an icu-layout-engine backend
Import PortableFontInstance and add shaper stub.
2012-08-24 00:00:29 -04:00
Behdad Esfahbod b5584ee4be [Indic] For old-spec, match non-zero context
Fixes consonant-position with old-spec Malayalam.  Uniscribe seems to be
doing this.  Fixes below-base La (eg. Pa,H,La) with AnjaliNewLipi.ttf.
Doesn't regress new-spec or other scripts.
2012-08-23 16:26:07 -04:00
Behdad Esfahbod d9b204d3d2 [GSUB] Allow non-zero-context matching in would_apply()
To be used in the next patch.
2012-08-23 16:22:28 -04:00
Behdad Esfahbod 1f2bb172fe Revert "[Indic/GSUB] Ignore context when matching would_apply()"
This reverts commit 24dd4e5674.

Oops.  My bad.  The change _regressed_ Malayalam test suite, not
improved it.  I'll redo it, differentiating between old-spec and
new-spec cases.
2012-08-23 16:10:37 -04:00
Behdad Esfahbod 24dd4e5674 [Indic/GSUB] Ignore context when matching would_apply()
The MS Indic specs say "...all classifications are determined ... using
context-free substitutions."  However, testing shows that MS's Malayalam
shapers (both old and new), "match" even if there is no zero-context rule.
We follow.

Fixes below-base La (eg. Pa,H,La) with AnjaliNewLipi.ttf (old spec).
Moreover, test suite Malayalam failures are down to 312 from 875!  No
change in other scripts.

Current numbers:

BENGALI: 353996 out of 354285 tests passed. 289 failed (0.0815727%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047541 out of 1048416 tests passed. 875 failed (0.0834592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271726 out of 271847 tests passed. 121 failed (0.0445103%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-08-23 15:47:10 -04:00
Behdad Esfahbod 6732d62e78 [Indic] Implement pre-base reordering Ra for old-spec Malayalam
Fixes Pa,H,Ra sequence with AnjaliNewLipi.ttf.
2012-08-23 15:32:12 -04:00
Behdad Esfahbod 80cd92326f [Indic] Only apply basic features per-syllable
Free up syllables and let features work across syllables for the
presentation forms features and GPOS.

Fixed:
- 1 GURMUKHI test (remains 40)
- 12 KHMER tests (remains 18)
- 11 SINHALA tests (remains 121)

Regresses:
- 5 MALAYALAM tests (up to 312)

Current numbers:

BENGALI: 353996 out of 354285 tests passed. 289 failed (0.0815727%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271726 out of 271847 tests passed. 121 failed (0.0445103%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-08-23 12:06:14 -04:00
Behdad Esfahbod df5d5c68f3 Whitespace 2012-08-23 09:33:30 -04:00
Behdad Esfahbod 2f1747ed7d Add comment 2012-08-16 11:46:46 -04:00
Behdad Esfahbod bd08d5d126 [OT] Fix Arabic shaper OOB access
https://bugzilla.mozilla.org/show_bug.cgi?id=782908
2012-08-16 11:35:50 -04:00
Behdad Esfahbod daf0731865 [ICU] Fix includes
As reported by Steven Loomis, including uversion.h works everywhere.
2012-08-16 07:32:59 -04:00
Behdad Esfahbod a67ba9c0fe Whitespace 2012-08-15 18:52:17 -04:00
Behdad Esfahbod 45c1383cc7 Minor 2012-08-14 09:33:18 -04:00
Behdad Esfahbod 4ac4c6f2e1 Fix ICU build with older ICUs 2012-08-13 10:52:52 -04:00
Behdad Esfahbod d5045a5f40 [ICU] Use new normalizer2 compose/decompose API
It's considerably faster than the fallback implementation we had
previously!
2012-08-11 21:27:15 -04:00
Behdad Esfahbod 9f9f04c222 [OT] Unbreak Thai shaping and fallback Arabic shaping
The merger of normalizer and glyph-mapping broke shapers that
modified text stream.  Unbreak them by adding a new preprocess_text
shaping stage that happens before normalizing/cmap and disallow
setup_mask modification of actual text.
2012-08-11 18:34:13 -04:00
Behdad Esfahbod e9f28a38f5 [OT] Add shape_plan to Arabic shaper 2012-08-11 18:20:54 -04:00
Behdad Esfahbod daf13afb08 [OT] Implement fallback mark positioning for "double" combining marks 2012-08-10 16:38:44 -04:00
Behdad Esfahbod d345313104 [OT] Fix fallback mark positioning with left-to-right text
Ouch!
2012-08-10 16:34:04 -04:00
Behdad Esfahbod f4cb476298 [OT] Slightly adjust normalizer
The change is very subtle.  If we have a single-char cluster that
decomposes to three or more characters, then try recomposition, in
case the farther mark may compose with the base.
2012-08-10 03:51:44 -04:00
Behdad Esfahbod 07d6828063 Minor 2012-08-10 03:28:50 -04:00
Behdad Esfahbod b00321ea78 [OT] Avoid calling get_glyph() twice
Essentially move the glyph mapping to normalization process.
The effect on Devanagari is small (but observable).  Should be more
observable in simple text, like ASCII.
2012-08-09 22:33:32 -04:00
Behdad Esfahbod 12c0875eaf [OT] Remove redundant check 2012-08-09 22:02:54 -04:00
Behdad Esfahbod 5c60b70c89 [OT] More code shuffling around
Preparing for merging map_glyphs() and normalize().
2012-08-09 21:58:07 -04:00
Behdad Esfahbod cd0c6e148f Shuffle buffer variable allocations around
To make room for more allocations, coming.
2012-08-09 21:48:55 -04:00
Behdad Esfahbod 8d1eef3f32 Minor 2012-08-09 21:35:47 -04:00
Behdad Esfahbod 56c9e7c004 Fill out combining class resetting for fallback shaping Thai/Lao/Tibetan 2012-08-09 21:14:23 -04:00
Behdad Esfahbod a321e1d51e Revert "Reject lookups with no subTable"
This reverts commit 30ec9002d8.

See previous commit.
2012-08-09 18:30:34 -04:00
Behdad Esfahbod 2eaf482b37 Revert "[GSUB/GPOS] Reject Context/ChainContext lookups with zero input"
This reverts commit 0981068b75.

I was confused.  Even if we access coverage[0] unconditionally, we don't
need bound checks since the array machinery already handles that.
2012-08-09 18:30:05 -04:00
Behdad Esfahbod a02d86484b Add check-exported-symbols.sh
And misc linking fixes.
2012-08-08 18:04:29 -04:00
Behdad Esfahbod 4c8ac4f47e Misc minor fixes 2012-08-08 17:44:19 -04:00
Behdad Esfahbod 560d68af81 Use an export-file for Windows builds
Apparently even that doesn't make check-internal-symbols.sh happy with
mingw32.  Going to disable that for DLLs again, but hopefully the
export-file is doing *something*.
2012-08-08 17:16:01 -04:00
Behdad Esfahbod f8751cf8e0 [hb-old] speed-up build 2012-08-08 17:15:44 -04:00
Behdad Esfahbod 5f4c52867c Minor 2012-08-08 16:53:37 -04:00
Behdad Esfahbod 7e7d245b33 Make default_language threadsafe 2012-08-08 15:23:48 -04:00
Behdad Esfahbod 06b192c458 Minor 2012-08-08 15:23:45 -04:00
Behdad Esfahbod 37191ede75 Minor 2012-08-08 14:59:09 -04:00
Behdad Esfahbod 6d9a329a8a Adjust a couple source checks 2012-08-08 14:48:41 -04:00
Behdad Esfahbod 9c929abdcf Minor renaming 2012-08-08 14:33:37 -04:00
Behdad Esfahbod 801298b590 Fix cast
https://bugs.freedesktop.org/show_bug.cgi?id=53233
2012-08-08 14:26:36 -04:00
Behdad Esfahbod 21756934a1 [OT] Implement fallback positioning
Implemented for Arabic, Hebrew, and generic marks.
Activated if no GPOS table present.
2012-08-08 01:20:45 -04:00
Behdad Esfahbod fb56e76283 [hb-old] Fix warnings 2012-08-07 23:44:47 -04:00
Behdad Esfahbod affaf8a0e5 [OT] Start adding fallback positioning
Used when there is no GPOS.
2012-08-07 22:43:07 -04:00
Behdad Esfahbod 7e4920fd15 Minor 2012-08-07 22:32:23 -04:00
Behdad Esfahbod 472f229a63 [GSUB] Generalize would_apply()
Fixes logic also, where before we were always matching if glyphs_len==1
and a ligature started with the glyph.
2012-08-07 22:25:24 -04:00
Behdad Esfahbod 6f3a300138 Add hb_font_glyph_from/to_string 2012-08-07 22:13:25 -04:00
Behdad Esfahbod eb56f6ae96 Minor 2012-08-07 21:44:25 -04:00
Behdad Esfahbod f4e48adcdd [OT] Apply 'rclt' feature in horizontal mode
'rclt' is "Required Contextual Forms" being proposed by Microsoft.
It's like 'calt', but supposedly always on.  We apply 'calt' anyway,
and now apply this too.
2012-08-07 21:12:49 -04:00
Behdad Esfahbod b1914b8bd0 Fix warnings 2012-08-07 16:57:48 -04:00
Behdad Esfahbod 0f8881d6bb More refactoring 2012-08-07 16:57:02 -04:00
Behdad Esfahbod 428dfcab66 Minor refactoring 2012-08-07 16:51:48 -04:00
Behdad Esfahbod 61f41849af Add Hebrew presentation forms shaping
Lifted from https://bugzilla.mozilla.org/show_bug.cgi?id=728866
2012-08-07 16:45:27 -04:00
Behdad Esfahbod 32d71dc133 [Graphite] Minor 2012-08-07 14:21:12 -04:00
Behdad Esfahbod 030ac5022e Remove enum trailing comma
...again.
2012-08-07 13:01:12 -04:00
Behdad Esfahbod 368b4e7649 Minor 2012-08-06 23:06:04 -04:00
Behdad Esfahbod ade7459ea7 [util] Fix leaks 2012-08-06 19:49:42 -07:00
Behdad Esfahbod 2fef993460 [Graphite] Fix graphite2 backend with RTL text
Patch from Martin Hosken.
2012-08-06 19:35:04 -07:00
Behdad Esfahbod e4992e13e1 [Graphite] Port graphite2 backend to new shaper infrastructure 2012-08-06 19:29:53 -07:00
Behdad Esfahbod 66591ececf Remove unnecessary lifecycle bits
We already set refcount to INVALID when destroying.
This block was not necessary.
2012-08-06 17:07:19 -07:00
Behdad Esfahbod 167b625d98 [Indic] Minor, move 'blwf' after 'half'
We don't apply them together anyway.  Should not make any difference
right now.
2012-08-05 21:16:26 -07:00
Behdad Esfahbod 048e3b596f Speed up hb_set_digest_lowest_bits_t calcs 2012-08-04 20:46:45 -07:00
Behdad Esfahbod 3d1b66a35e Speed up hb_set_digest_common_bits_t calcs 2012-08-04 17:42:28 -07:00
Behdad Esfahbod 25326c2359 Rewrite ARRAY_LENGTH as a template function
So that it wouldn't apply to pointers accidentally.
2012-08-04 16:43:18 -07:00
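The technique named in the commit above is commonly written as follows. This is a generic sketch, not the code from the commit: the name `array_length` is hypothetical. Because the parameter is a reference to an array of `N` elements, a pointer argument fails to compile instead of silently giving a wrong count, as the usual `sizeof (a) / sizeof (a[0])` macro would.

```cpp
#include <cassert>
#include <cstddef>

// Deduce the element count N from the array type itself.
// A plain pointer does not bind to `const T (&)[N]`, so passing
// one is a compile-time error rather than a silent bug.
template <typename T, std::size_t N>
constexpr std::size_t array_length (const T (&)[N])
{
  return N;
}

static const int sample_values[] = {1, 2, 3, 4};
static_assert (array_length (sample_values) == 4,
               "length is deduced from the array type");

// const int *p = sample_values;
// array_length (p);  // would not compile: no matching function
```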
Behdad Esfahbod 8ba8042821 [Indic] Fix consonant position font lookup logic
Oops.  I broke this badly and the test suite did not notice.  That
worries me.  Have to investigate.
2012-08-03 18:54:54 -07:00
Behdad Esfahbod abd0c05f1f Minor 2012-08-03 18:45:05 -07:00
Behdad Esfahbod 46ee108ef8 Fix leak 2012-08-03 18:21:13 -07:00
Behdad Esfahbod 71baea0062 [OT] Use general-category, not GDEF class, to decide to zero mark advances
At this point, the GDEF glyph synthesis looks pointless.  Not that I
have many fonts without GDEF lying around.

As for mark advance zeroing when GPOS not available, that also is being
replaced by proper fallback mark positioning soon.
2012-08-03 17:40:07 -07:00
Behdad Esfahbod 3a7e137a68 Don't use gint 2012-08-03 17:23:40 -07:00
Behdad Esfahbod 11b0e20ba4 [Indic] Add per-script configuration tables
This concludes the Indic shape_plan work.  May do for Arabic also...
2012-08-02 14:21:40 -04:00
Behdad Esfahbod 85fc6c483f [Indic] Move more stuff to the shape_plan
Almost done.  Need to add per-script static tables.
2012-08-02 12:21:44 -04:00
Behdad Esfahbod 914ffaa40f [Indic] Move more repeated work into shape_plan 2012-08-02 11:05:32 -04:00
Behdad Esfahbod a8c6da90f4 [OT] Add per-complex-shaper shape_plan data
Hookup some Indic data to it.  More to come.
2012-08-02 10:46:34 -04:00
Behdad Esfahbod 8bb5deba96 [OT] Pipe shape_plan down to pause_callbacks 2012-08-02 10:07:58 -04:00
Behdad Esfahbod 3e38c0f288 More massaging 2012-08-02 09:44:18 -04:00
Behdad Esfahbod 16c6a27b4b [OT] Port complex_shaper to planner/plan 2012-08-02 09:38:28 -04:00
Behdad Esfahbod 5393e3a62b [OT] Minor refactoring 2012-08-02 09:24:35 -04:00
Behdad Esfahbod 24eacf17c8 [Indic] Move consonant-position-setting into initial_reordering() 2012-08-02 08:42:51 -04:00
Behdad Esfahbod afbcc24be0 [GSUB] Wire the font, not just the face, down to substitute()
We need the font for glyph lookup during GSUB pauses in Indic shaper.
Could perhaps be avoided, but at this point, we don't mean to support
separate substitute()/position() entry points (anymore), so there is
no point in not providing the font to GSUB.
2012-08-02 08:36:40 -04:00
Behdad Esfahbod b0e6a26a10 [OT] Hide some API
It was impossible to meaningfully use them from the outside these days.
2012-08-02 08:11:14 -04:00
Behdad Esfahbod 305246744e Minor 2012-08-02 08:08:04 -04:00
Behdad Esfahbod 8ef3d53255 [Indic] More refactoring of consonant position peeking in the font
To be moved to initial_reordering next...
2012-08-02 07:59:19 -04:00
Behdad Esfahbod 3eb6f81fd3 [Indic] Refactor
Move all the logic that needs to eventually move into the indic table
into hb-ot-shape-complex-indic-private.hh.
2012-08-02 07:38:39 -04:00
Behdad Esfahbod 3614ba242f [Indic] Rename 2012-08-02 07:23:42 -04:00
Behdad Esfahbod 610e5e8f71 [Indic] Streamline feature would_apply()
Comes with some 10% speedup for Devanagari even!
2012-08-02 05:41:18 -04:00
Behdad Esfahbod 1d002048d5 [Indic] Minor 2012-08-02 05:02:53 -04:00
Behdad Esfahbod 6f76113755 [GSUB/GPOS] Check array size before accessing digests 2012-08-02 04:00:31 -04:00
Behdad Esfahbod 22148b8c4a Use Coverage digests in would_apply 2012-08-02 03:51:51 -04:00
Behdad Esfahbod 6c459c8fef Minor 2012-08-02 03:45:53 -04:00
Behdad Esfahbod e2b8d75fa6 Use wider set digests on 64-bit archs 2012-08-01 22:17:48 -04:00
Behdad Esfahbod 0120ce9679 [GSUB/GPOS] Remove unused get_coverage() methods 2012-08-01 21:56:35 -04:00
Behdad Esfahbod 1336ecdf8e [GSUB/GPOS] Use Coverage digests as gatekeeper
Gives me a good 10% speedup for the Devanagari test case.  Less so
for less lookup-intensive tests.

For the Devanagari test case, the false positive rate of the GSUB digest
is 4%.
2012-08-01 21:46:36 -04:00
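A rough illustration of the digest-as-gatekeeper idea from the commit above: a digest is a tiny, lossy summary of a glyph set that can answer "definitely not covered" cheaply, at the cost of occasional false positives (the 4% figure quoted above). The sketch below is a simplified assumption, not HarfBuzz's actual hb_set_digest_t; the names BitsDigest, CombinedDigest, and may_have are hypothetical.

```cpp
#include <cstdint>

// Hypothetical, simplified digest: one bit per value of (glyph >> Shift) mod 64.
// A clear bit means the glyph is definitely absent; a set bit means "maybe present".
template <unsigned Shift>
struct BitsDigest {
  std::uint64_t mask = 0;

  void add(std::uint32_t glyph) { mask |= bit(glyph); }
  bool may_have(std::uint32_t glyph) const { return (mask & bit(glyph)) != 0; }

 private:
  static std::uint64_t bit(std::uint32_t glyph) {
    return std::uint64_t{1} << ((glyph >> Shift) & 63u);
  }
};

// Combines two digests that look at different bits of the glyph id.
// A glyph passes the gate only if both digests say "maybe".
struct CombinedDigest {
  BitsDigest<0> low;   // keyed on the lowest 6 bits
  BitsDigest<6> high;  // keyed on the next 6 bits

  void add(std::uint32_t glyph) {
    low.add(glyph);
    high.add(glyph);
  }
  bool may_have(std::uint32_t glyph) const {
    return low.may_have(glyph) && high.may_have(glyph);
  }
};
```

A lookup loop would consult may_have() first and only run the full Coverage check when it returns true, so a false positive costs time but never changes the result.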
Behdad Esfahbod a878c58a8f [GSUB/GPOS] Add add_coverage() 2012-08-01 21:46:19 -04:00
Behdad Esfahbod 60a3035ac5 Add hb_set_digest_t
Implement two set digests, and one that combines the two.
2012-08-01 21:46:19 -04:00
Behdad Esfahbod c8accf1dd2 [OT] Templatize Coverage::add_coverage() 2012-08-01 21:05:57 -04:00
Behdad Esfahbod 8fbfda920e Inline font getters 2012-08-01 19:03:46 -04:00
Behdad Esfahbod 6adf417bc1 Use a lookup table for modified_combining_class 2012-08-01 18:07:42 -04:00
Behdad Esfahbod 208f70f055 Inline Unicode callbacks internally 2012-08-01 17:13:10 -04:00
Behdad Esfahbod 7470315a3e Move unicode accessors around 2012-08-01 17:01:59 -04:00
Behdad Esfahbod 21fdcee001 Add hb_unicode_combining_class_t 2012-08-01 16:28:50 -04:00
Behdad Esfahbod 84186a6400 Add commentary on the compatibility decomposition in the normalizer 2012-08-01 13:32:39 -04:00
Behdad Esfahbod 0834d95201 [hb-old] Adjust mark positioning parameters
Fallback mark positioning works now...  With hb-ft and hb-view /
hb-shape at least.
2012-08-01 00:21:09 -04:00
Behdad Esfahbod 4ca743dfb8 [old] Implement fontMetrics 2012-08-01 00:03:41 -04:00
Behdad Esfahbod 1e7d860613 [GPOS] Adjust mark advance-width zeroing logic
If there is no GPOS, zero mark advances.

If there *is* GPOS and the shaper requests so, zero mark advances for
attached marks.

Fixes regression with Tibetan, where the font has GPOS, and marks a
glyph as mark where it shouldn't get zero advance.
2012-07-31 23:41:06 -04:00
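The zeroing rule stated in the commit above can be sketched as a single predicate. This is a minimal sketch under stated assumptions: GlyphPos, its fields, and zero_mark_advances are hypothetical names for illustration, not the library's own structures.

```cpp
#include <vector>

// Hypothetical per-glyph record used only for this sketch.
struct GlyphPos {
  int x_advance;
  bool is_mark;      // glyph is classified as a mark
  bool is_attached;  // GPOS attached this mark to a base or another mark
};

// No GPOS table: zero every mark advance.
// GPOS present: zero only attached marks, and only if the shaper asks for it.
inline void zero_mark_advances(std::vector<GlyphPos>& glyphs,
                               bool font_has_gpos,
                               bool shaper_wants_zeroing) {
  for (GlyphPos& g : glyphs) {
    if (!g.is_mark) continue;
    const bool zero =
        !font_has_gpos || (shaper_wants_zeroing && g.is_attached);
    if (zero) g.x_advance = 0;
  }
}
```

Under this rule, a glyph that the font marks as a mark but never attaches (the Tibetan case mentioned above) keeps its advance whenever the font has GPOS.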
Behdad Esfahbod a8842e4a44 Remove some TODO items 2012-07-31 23:17:23 -04:00
Behdad Esfahbod 2bc3b9a616 [OT] Zero mark advances if the shaper desires so
Enabled for all shapers except for Indic.
2012-07-31 23:17:22 -04:00
Behdad Esfahbod 5fecd8b035 [OT] Synthesize glyph classes 2012-07-31 23:17:22 -04:00
Behdad Esfahbod 03b09214c0 [GSUB] Minor 2012-07-31 22:43:58 -04:00
Behdad Esfahbod f0fc1df8fc [hb-old] Implement getGlyphMetrics()
Still working on it.
2012-07-31 22:43:32 -04:00
Behdad Esfahbod 378d279bbf Implement Unicode compatibility decompositions
Based on patch from Philip Withnall.
https://bugs.freedesktop.org/show_bug.cgi?id=41095
2012-07-31 21:36:16 -04:00
Behdad Esfahbod 321ec29cc2 Remove unused function 2012-07-31 21:10:16 -04:00
Behdad Esfahbod 69cc492dc1 [buffer] Minor 2012-07-31 14:51:36 -04:00
Behdad Esfahbod 693918ef85 [OT] Streamline complex shaper enumeration
Add a shaper class struct.
2012-07-30 21:08:51 -04:00
Behdad Esfahbod c2e42c3db6 Minor 2012-07-30 19:54:50 -04:00
Behdad Esfahbod 03f67bc012 More refactoring glyph class access 2012-07-30 19:47:53 -04:00
Behdad Esfahbod 300c7307eb [OT] Don't crash if no GDEF available 2012-07-30 19:37:44 -04:00
Behdad Esfahbod 3dcbdc2125 Minor 2012-07-30 19:32:42 -04:00
Behdad Esfahbod 05bd1b6342 [GSUB/GPOS] Move glyph props matching around 2012-07-30 19:30:01 -04:00
Behdad Esfahbod 2fca1426ca [GSUB] Don't erase glyph classes if GDEF does not have glyph classes 2012-07-30 18:46:41 -04:00
Behdad Esfahbod fd42257f8c Minor 2012-07-30 18:44:10 -04:00
Behdad Esfahbod 7fbbf86efe [GSUB] Minor 2012-07-30 18:36:42 -04:00
Behdad Esfahbod 713914d320 [Uniscribe] Clean up a bit 2012-07-30 17:54:38 -04:00
Behdad Esfahbod 301168dae7 [CoreText] Port to shape_plan infrastructure 2012-07-30 17:48:04 -04:00
Behdad Esfahbod 6cdfd14bb1 Fix build on Mac 2012-07-30 17:22:17 -04:00
Behdad Esfahbod 7e34601ded Unbreak Hangul jamo composition
When we removed the separate Hangul shaper, the specific normalization
preference of Hangul was lost.  Fix that.  Also, the Thai shaper was
copied from Hangul, so had the fully-composed normalization behavior,
which was unnecessary.  So, fix that too.
2012-07-30 14:53:41 -04:00
Behdad Esfahbod 7afb14407e [Indic] Recategorize Telugu length marks
Fixes 8 more Telugu tests.  Failures at 15 (0.00154548%).
2012-07-30 13:54:46 -04:00
Behdad Esfahbod f2377155e3 [hb-old] Fix misc leaks
Backport (forward-port?!) from upstream:

commit 3ab7b37bdebf0f8773493a1fee910b151c4de30f
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Mon Jul 30 10:50:22 2012 -0400

    Fix misc leaks

    https://bugs.freedesktop.org/show_bug.cgi?id=31992
    https://bugs.freedesktop.org/show_bug.cgi?id=31993
    https://bugs.freedesktop.org/show_bug.cgi?id=31994
    https://bugs.freedesktop.org/show_bug.cgi?id=31995
2012-07-30 10:50:57 -04:00
Behdad Esfahbod 3f4764bb56 Don't lock user_data set during destruction if empty 2012-07-30 10:06:42 -04:00
Behdad Esfahbod 4ba647eecf Fix leak 2012-07-30 09:53:06 -04:00
Behdad Esfahbod f860366456 [OT] Gain back some lost speed 2012-07-30 03:16:38 -04:00
Behdad Esfahbod 11f4c87d01 [OT] Remove hb_ot_layout_ensure()
I didn't like it from the beginning.
2012-07-30 02:36:46 -04:00
Behdad Esfahbod 578e42182b Minor 2012-07-30 02:35:07 -04:00
Behdad Esfahbod a973b5ce86 [GSUB] Further adjustments to mark-attachment vs ligation interaction
The d1d69ec52e change broke Kannada badly,
since it was ligating consonants, pushing matra out, and then ligating
with the matra.  Adjust for that.  See comments.
2012-07-30 01:47:46 -04:00
Behdad Esfahbod 0aef425e25 [GSUB] Minor 2012-07-30 00:55:15 -04:00
Behdad Esfahbod d1d69ec52e [GSUB] Don't ligate glyphs attached to different components of ligatures
This concludes the mark-attachment vs ligating interaction fixes (for now).
2012-07-30 00:51:47 -04:00
Behdad Esfahbod 4751dec8be Minor 2012-07-30 00:42:07 -04:00
Behdad Esfahbod f24bcfbed1 Minor 2012-07-30 00:39:00 -04:00
Behdad Esfahbod fe20c0f84f [GSUB] Fix mark component stuff when ligatures form ligatures!
See comments.

Fixes https://bugzilla.gnome.org/show_bug.cgi?id=437633
2012-07-30 00:00:59 -04:00
Behdad Esfahbod 2ec3ba46a3 [GSUB/GPOS] Minor
Start squeezing more out of lig_id/lig_comp.
2012-07-29 22:16:15 -04:00
Behdad Esfahbod ef6e9cec33 Fixup bb0e4ba3e9 2012-07-29 21:35:22 -04:00
Behdad Esfahbod cb3d340631 [GSUB] Don't set new lig_id on mark ligatures
If two marks form a ligature, retain their previous lig_id, such that
the mark ligature can attach to ligature components...

Fixes https://bugzilla.gnome.org/show_bug.cgi?id=676343

In fact, I noticed that we should not let ligatures form between glyphs
coming from different components of a previous ligature.  For example,
if the sequence is: LAM,SHADDA,LAM,FATHA,HEH, the LAM,LAM,HEH form a
ligature, putting SHADDA and FATHA next to each other.  However, it would
be wrong to ligate them.  Uniscribe has this bug also.
2012-07-29 20:37:38 -04:00
Behdad Esfahbod a15b70a81a [hb-old] Fix cluster formation in RTL
Unlike Uniscribe, hb-old returns glyphs in logical order, so the logic
does not need to be duplicated for RTL.
2012-07-29 20:09:22 -04:00
Behdad Esfahbod 8a7e70ef65 [Minor] 2012-07-29 19:56:54 -04:00
Behdad Esfahbod bb0e4ba3e9 Minor 2012-07-29 17:34:14 -04:00
Behdad Esfahbod a00ad60bc0 [Uniscribe] Remove hb_uniscribe_font_ensure()
Wasn't a huge fan of putting the burden on the user.  Just remove it and
do what we've got to do transparently.
2012-07-28 21:16:08 -04:00
Behdad Esfahbod 5d874d566f [GPOS] Fix mark-to-mark positioning when one of the marks is a ligature
This commit: a3313e5400 broke MarkMarkPos
when one of the marks itself is a ligature.  That regressed 26 Tibetan
tests (up from zero!).  Fix that.  Tibetan back to zero.
2012-07-28 21:05:25 -04:00
Behdad Esfahbod 338fe662b5 [GSUB] Minor 2012-07-28 18:53:01 -04:00
Behdad Esfahbod e6f7479fe3 [GSUB] Simplify would-apply 2012-07-28 18:34:58 -04:00
Behdad Esfahbod dadede012e Minor 2012-07-28 18:13:09 -04:00
Behdad Esfahbod 0b99429ead [GSUB/GPOS] Add get_coverage() and use it to speed up main loop
And use it to speed up the hotspot by checking coverage directly in
the main loop, not 10 functions deep in.

Gives me a solid 20% boost with Indic test suite.  Less so for less
lookup-intensive scenarios.

Remove the "fast_path" hack from before.
2012-07-28 17:46:35 -04:00
Behdad Esfahbod 30ec9002d8 Reject lookups with no subTable 2012-07-28 17:25:20 -04:00
Behdad Esfahbod 0981068b75 [GSUB/GPOS] Reject Context/ChainContext lookups with zero input 2012-07-28 17:01:59 -04:00
Behdad Esfahbod 2f87cebe10 Implement shape_plan caching
Should give us some performance boost.
2012-07-27 04:20:39 -04:00
Behdad Esfahbod e9eb9503e9 Add default_shaper_list to shape_plan 2012-07-27 03:16:22 -04:00
Behdad Esfahbod 3b7c4e2706 Don't fail choosing shaper on planning failure
Shapers have a chance to reject a font in face shaper_data creation.
No need to allow failing during planning.
2012-07-27 03:12:23 -04:00
Behdad Esfahbod cfe9882610 Add hb_ot_layout_ensure() and hb_uniscribe_font_ensure() 2012-07-27 03:06:30 -04:00
Behdad Esfahbod c5b668fb92 Choose one shaper per plan 2012-07-27 02:49:39 -04:00
Behdad Esfahbod e82061e8db Move ot shaper completely to shape_plan 2012-07-27 02:29:32 -04:00
Behdad Esfahbod ea278d3895 Partially switch ot shaper to shape_plan 2012-07-27 02:12:28 -04:00
Behdad Esfahbod b6b7ba1313 Switch old and uniscribe backends to shape_plan 2012-07-27 01:37:18 -04:00
Behdad Esfahbod c32c096a42 Switch to shape_plan
Not optimized yet.  Eats babies.  And no shaper uses the shape_plan.
2012-07-27 01:13:53 -04:00
Behdad Esfahbod 5b95c148cc Start implementing shape_plan 2012-07-27 01:02:24 -04:00
Behdad Esfahbod bd26b4d21f Minor 2012-07-26 22:18:24 -04:00
Behdad Esfahbod 027857d041 Start adding a unified shaper access infrastructure
Add global shape_plan.  Unused so far.
2012-07-26 21:14:02 -04:00
Behdad Esfahbod fa2dfcd560 Fix visibility warnings with MinGW32 2012-07-26 16:06:16 -04:00
Jonathan Kew ac2085d4b3 [CoreText] Ensure cluster indices in output buffer are non-decreasing.
Does not provide Uniscribe-compatible results, but should at least avoid
breaking hb-view due to out-of-order cluster values.

For RTL runs, ensure cluster values are non-increasing (instead of
non-decreasing).
2012-07-26 15:58:45 -04:00
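The commit above does not say how the ordering is enforced. One plausible way, shown here purely as an assumption, is a running maximum for LTR runs and a running minimum for RTL runs; make_clusters_monotone is a hypothetical helper, not the backend's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Forces cluster values to be non-decreasing (LTR) or non-increasing (RTL)
// by clamping each value against its predecessor.
inline void make_clusters_monotone(std::vector<unsigned>& clusters, bool rtl) {
  for (std::size_t i = 1; i < clusters.size(); ++i) {
    clusters[i] = rtl ? std::min(clusters[i], clusters[i - 1])
                      : std::max(clusters[i], clusters[i - 1]);
  }
}
```

Clamping this way effectively merges an out-of-order glyph into its neighbour's cluster, which keeps consumers such as hb-view working but, as the commit notes, does not reproduce Uniscribe's clusters.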
Behdad Esfahbod 441d3bb7de Minor 2012-07-26 12:01:12 -04:00
Behdad Esfahbod 2e7f223054 [hb-old] Fix Arabic cursive positioning
Backporting from upstream:

commit b847f24ce855d24f6822bcd9c0006905e81b94d8
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Wed Jul 25 19:29:16 2012 -0400

    [arabic] Fix Arabic cursive positioning

    This was clearly broken in testing.  Who knows...  Fixes for me.
    Test with a Nastaleeq font, or with Arabic Typesetting.

    Backporting from Chromium.
2012-07-25 19:30:15 -04:00
Behdad Esfahbod 9550a8c4e8 [hb-old] Fixup not-enough-space handling 2012-07-25 19:22:57 -04:00
Behdad Esfahbod 91e721ea86 [hb-old] Fix clusters
Unlike its "documentation", hb-old's log_clusters are, well, indeed
logical, not visual.  Fixup.  Adapted / copied from hb-uniscribe.
2012-07-25 19:20:34 -04:00
Behdad Esfahbod a3313e5400 [GPOS] Fix MarkMarkPos applied to results of MultipleSubst
This was broken as a result of 7b84c536c1.
As Khaled reported, MarkMark positioning was broken with glyphs
resulting from a MultipleSubst.  Fixed.  Test with the ALLAH character
in Amiri.
2012-07-25 18:37:51 -04:00
Behdad Esfahbod 35bdab3cf1 Minor 2012-07-25 11:59:52 -04:00
Behdad Esfahbod 8fe4c7405b [hb-old] Add HarfBuzz.old shaper
Choose using shaper name "old".
2012-07-25 11:11:22 -04:00
Behdad Esfahbod 5e1987005e [hb-old] Define Unicode funcs in terms of new HarfBuzz 2012-07-25 11:11:22 -04:00
Behdad Esfahbod 4a31166b28 [hb-old] Shovel out the line-breaking / word-segmentation stuff 2012-07-25 11:11:22 -04:00
Behdad Esfahbod 0bcbe88cf3 [hb-old] Add visibility attributes 2012-07-25 11:11:22 -04:00
Behdad Esfahbod 6a9d43c317 [hb-old] Remove unused header file 2012-07-25 11:11:22 -04:00
Behdad Esfahbod fb47209c5b [hb-old] Rename hb_buffer_* to HB_Buffer_* 2012-07-25 11:11:22 -04:00
Behdad Esfahbod 1512a73575 [hb-old] Start adding HarfBuzz-old as a new backend 2012-07-25 11:11:16 -04:00
Behdad Esfahbod 478fd0529b Minor 2012-07-24 17:09:01 -04:00
Behdad Esfahbod 8979a7f6f2 [Mongolian] Remove Mongolian Vowel Separator at the end of shaping
Results match Uniscribe now.
2012-07-24 17:03:55 -04:00
Jonathan Kew aa6d849838 [CoreText] Add basic Core Text backend for comparison with our native shaping
Does not attempt to handle clusters in a Uniscribe- or HarfBuzz-compatible way;
just returns the original string indexes that CT maintains. These may even be
out-of-order in the case of reordrant glyphs.
2012-07-24 15:52:32 -04:00
Behdad Esfahbod ec8d249469 Make data members of various OpenType structs protected instead of private
Should fix warnings generated when building with -Wunused-private-field.
Based on patch from Jonathan Kew.
2012-07-24 15:40:37 -04:00
Behdad Esfahbod 97aa0b738a Minor const correctness shuffling 2012-07-24 15:02:34 -04:00
Behdad Esfahbod 6411e74caf [Indic] Reposition Gurmukhi top matras to after post
The font is forming a post-base consonant in some samples, and Uniscribe
positions top matra on the post-base.  Do the same.

Gurmukhi failures down from 59 to 41 (0.0674242%).
2012-07-24 13:48:49 -04:00
Behdad Esfahbod 65c43accdc [Indic] Better position left-matra in Malayalam
Just put it before base, which is what's expected.

Malayalam failures down from 1559 to 1197 (0.114172%).

BENGALI: 353988 out of 354285 tests passed. 297 failed (0.0838308%)
DEVANAGARI: 693571 out of 693628 tests passed. 57 failed (0.00821766%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950956 out of 951913 tests passed. 957 failed (0.100534%)
KHMER: 299094 out of 299124 tests passed. 30 failed (0.0100293%)
MALAYALAM: 1047219 out of 1048416 tests passed. 1197 failed (0.114172%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271699 out of 271847 tests passed. 148 failed (0.0544424%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)
2012-07-24 03:36:47 -04:00
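The percentages in these status tables are simply failed tests divided by total tests, times 100. A small sketch of that arithmetic (failure_percent is a hypothetical helper, not part of the test suite):

```cpp
// Returns the failure rate as a percentage; an empty suite counts as 0%.
inline double failure_percent(unsigned long failed, unsigned long total) {
  if (total == 0) return 0.0;
  return 100.0 * static_cast<double>(failed) / static_cast<double>(total);
}
```

For the Malayalam row above, 1197 failures out of 1048416 tests gives roughly 0.114172%.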
Behdad Esfahbod 88f413b56f [Indic] Implement Reph+Ya-Phalaa interaction
The sequence Ra,H,Ya in Bengali is ambiguous, and Unicode specifies that to
get Ya-Phalaa, one places ZWJ before Halant.  I.e. a ZWJ,H sequence
requests subjoining, while a H,ZWJ sequence requests a Half form.  Implement that.

Bengali failures go down from 377 to 297 (0.0838308%).
Gujarati is down by 4 to 17 (0.0046384%).
Kannada is down by 226 to 957 (0.100534%).

Current status:

BENGALI: 353988 out of 354285 tests passed. 297 failed (0.0838308%)
DEVANAGARI: 693571 out of 693628 tests passed. 57 failed (0.00821766%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950956 out of 951913 tests passed. 957 failed (0.100534%)
KHMER: 299094 out of 299124 tests passed. 30 failed (0.0100293%)
MALAYALAM: 1046857 out of 1048416 tests passed. 1559 failed (0.148701%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271699 out of 271847 tests passed. 148 failed (0.0544424%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)
2012-07-24 03:04:36 -04:00
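The joiner rule from the commit above (ZWJ before Halant requests subjoining, ZWJ after Halant requests a half form) can be sketched as a small classifier. The Cat and HalantRequest enums and classify_halant are hypothetical names for illustration; the shaper's own categories and control flow differ.

```cpp
#include <cstddef>
#include <vector>

// Simplified character categories, assumed for this sketch only.
enum class Cat { Consonant, Halant, ZWJ, Other };

enum class HalantRequest { None, Subjoined, HalfForm };

// Inspects the neighbours of the Halant at index i.
// ZWJ,H  -> the following consonant is requested in subjoined form.
// H,ZWJ  -> the preceding consonant is requested in half form.
inline HalantRequest classify_halant(const std::vector<Cat>& syllable,
                                     std::size_t i) {
  if (i >= syllable.size() || syllable[i] != Cat::Halant)
    return HalantRequest::None;
  if (i > 0 && syllable[i - 1] == Cat::ZWJ)
    return HalantRequest::Subjoined;
  if (i + 1 < syllable.size() && syllable[i + 1] == Cat::ZWJ)
    return HalantRequest::HalfForm;
  return HalantRequest::None;
}
```

With Ra,ZWJ,H,Ya the Halant at index 2 yields Subjoined (Ya-Phalaa); with Ra,H,ZWJ,Ya the Halant at index 1 yields HalfForm.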
Behdad Esfahbod dff0ece11d [Indic] Limit matras to 4 per syllable
Also limit joiners.

This limits our syllable length to a constant, and is
closer to what Uniscribe does anyway.

Two Devanagari tests regressed, but who cares about tests with 20
joiners in a row?!  Devanagari at 57 (0.00821766%) now.
2012-07-24 02:37:42 -04:00
Behdad Esfahbod 330b329c89 [Indic] Unmark U+17D1 KHMER SIGN VIRIAM to NOT be a Virama
Fixes another 1 Khmer failure.  Down to 30 (0.0100293%) now.
2012-07-24 02:25:26 -04:00
Behdad Esfahbod 6824a7194e [Indic] Recategorize Khmer various signs as top matras
Khmer failures down from 39 to 31 (0.0103636%).
2012-07-24 02:22:18 -04:00
Behdad Esfahbod d90b8e841e [Indic] Reposition Khmer prebase-reordering Ra around split matras
In Khmer coeng model, a V,Ra can go *after* matras.  If it goes after a
split matra, it should be reordered to *before* the left part of such matra.

Khmer failures down from 136 to 39 (0.0130381%).
2012-07-24 02:11:18 -04:00
Behdad Esfahbod 0afb84c125 [Indic] Fix minor bug in pre-base Ra positioning 2012-07-24 01:44:47 -04:00
Behdad Esfahbod 7573799126 [Indic] Position Khmer U+17CE
Fixes another 6 Khmer failures.  Now at 136 (0.0454661%).
2012-07-24 01:32:07 -04:00
Behdad Esfahbod 8d00e8d0e7 [Indic] Don't reposition Khmer Bindu
Khmer Bindu doesn't like to move to syllable end.  Leave it where it
was.

Brings down Khmer failures from 510 to 142 (0.047572%).
2012-07-24 01:15:34 -04:00
Behdad Esfahbod 2278eefcdb [Indic] In Sinhala, form forced Reph even if no other consonant found
Fixes another 10 Sinhala failures.  Down to 148 (0.0544424%).
2012-07-24 00:31:10 -04:00
Behdad Esfahbod 71fd5e80ad [Indic] Further adjust base algorithm for Sinhala
Apparently if there is C,V,ZWJ,C, the first C will be base, but if
it's C,ZWJ,V,C, the second one will be.

Note that Uniscribe implements this differently, by breaking syllable in
the case of C,ZWJ,V,C and putting the first consonant in one syllable
and the rest in the next syllable.

Sinhala failures down from 208 to 158 (0.0581209%).  No changes to
Khmer.
2012-07-24 00:21:16 -04:00
Behdad Esfahbod 73d71cc527 [Indic] End Vowel-based syllable at ZWJ
One Devanagari test regressed, plus 10 Malayalam (at 1545 now).

Fixed 120 Sinhala failures.  Now at 208 (0.0765136%).
2012-07-24 00:09:12 -04:00
Behdad Esfahbod 34c215036f [Indic] Improve Sinhala base algorithm and reph positioning
Sinhala does not have half forms.  And most (all?) consonants can be
base, except when preceded by ZWJ, which would request a subjoined form.
Hence switch the base algorithm to categorize with Khmer, start search
at start, and stop at a ZWJ.

Also, mark all pos=base consonants after base to be subjoined.  Mark
base itself to have pos=base.

Finally, adjust Sinhala's reph position to after-main.

Brings down Sinhala failures from 455 to 328 (0.120656%).
2012-07-23 23:51:29 -04:00
Behdad Esfahbod 2ec934c6c2 [Indic] Change "unknown" position to end of syllable 2012-07-23 23:49:04 -04:00
Behdad Esfahbod b70021f7c8 When removing zero-width marks, don't remove ligatures
If a mark ligated, it probably should NOT be removed.
2012-07-23 20:18:17 -04:00
Behdad Esfahbod 49c5ec5144 Minor refactoring 2012-07-23 20:14:13 -04:00
Behdad Esfahbod c3e6fdc379 [Indic] Improve check on ligatures
Only skip actual ligatures, not marks in-between ligature components.
2012-07-23 20:11:42 -04:00
Behdad Esfahbod 771a8f5028 [Indic] exclude ligatures when matching on Indic category
If, say, a H,ZWJ,C ligature was formed, we don't want the code to detect
that as a Halant.  So, ignore ligatures when matching category in
final_reordering.

Sinhala failures down from 514 to 455 (0.167374%).
2012-07-23 20:09:30 -04:00
Behdad Esfahbod d1af9e82e5 [GSUB/GPOS] Const correctness 2012-07-23 19:55:35 -04:00
Behdad Esfahbod baacd090df [Indic] Minor refactoring 2012-07-23 19:51:48 -04:00
Behdad Esfahbod c7c4de2fb9 [Indic] Remove syllable length check before sorting
We now limit syllable lengths in the machine.  No need to match here.
2012-07-23 18:25:02 -04:00
Behdad Esfahbod 9fa052733e [Indic] Limit syllables to at most five consonants
Seems to be about what Uniscribe does.  Not exactly.  But close enough.
More consonants will start a new cluster.

A few scripts went way down in failures.  In particular:

  - Devanagari failures went down from 490 to 56.
  - Telugu went down from 113 to 49.

Other scripts went down slightly or didn't change.  New numbers:

BENGALI: 353908 out of 354285 tests passed. 377 failed (0.106412%)
DEVANAGARI: 693572 out of 693628 tests passed. 56 failed (0.00807349%)
GUJARATI: 366485 out of 366506 tests passed. 21 failed (0.00572978%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950730 out of 951913 tests passed. 1183 failed (0.124276%)
KHMER: 298613 out of 299124 tests passed. 511 failed (0.170832%)
MALAYALAM: 1046881 out of 1048416 tests passed. 1535 failed (0.146411%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271333 out of 271847 tests passed. 514 failed (0.189077%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)

Some of the remaining Telugu and Devanagari issues seem to be Uniscribe
eating Anusvara when placed before a non-joiner.  Ouch!
2012-07-23 18:19:17 -04:00
Behdad Esfahbod 093cd58326 [Thai] Fix SARA AM handling
Oops, thinko.
2012-07-23 14:04:42 -04:00
Behdad Esfahbod 42848453bf [Thai] Reorder U+0E3A THAI VOWEL SIGN PHINTHU
Uniscribe reorders U+0E3A to be after U+0E38 and U+0E39.  We do that by
modifying the ccc for U+0E3A.

Fixes the two remaining Thai failures (see previous commit).
2012-07-23 13:52:07 -04:00
Behdad Esfahbod 4a7f4f3e56 [Thai] Adjust SARA AM reordering to match Uniscribe
Adjust the list of marks before SARA AM that get the reordering
treatment.  Also adjust cluster formation to match Uniscribe.

With Wikipedia test data, now I see:

  - For Thai, with the Angsana New font from Win7, I see 54 failures out
    of over 4M tests  (0.00129107%).  Of the 54, two are legitimate
    reordering issues (fix coming soon), and the other 52 are simply
    Uniscribe using a zero-width space char instead of an unknown
    character for missing glyphs.  No idea why.  The missing-glyph
    sequences include one that is a Thai character followed by an Arabic
    Sokun.  Someone confused it with Nikhahit I assume!

  - For Lao, with the Dokchampa font from Win7, 33 tests fail out of
    54k (0.0615167%).  All seem to be insignificant mark positioning
    with two marks on a base.  Have to investigate.
2012-07-23 13:15:33 -04:00
Behdad Esfahbod 2cc933aff9 [Indic] Fix cluster formation with left-matras and conjunct forms
Test case was: <U+0D15,U+0D4D,U+0D15,U+0D4A>.
2012-07-23 08:23:44 -04:00
Behdad Esfahbod e6b01a878c [Indic] Further streamline cluster formation
This should address all possible cluster misformations that I had in
mind.
2012-07-23 00:11:26 -04:00
Behdad Esfahbod 7b2a7dadd6 [Indic] Merge clusters before sorting
This should fix any instabilities in cluster formation that we were
speculating may happen with surrounding syllables.  Or most of it
perhaps.
2012-07-22 23:58:55 -04:00
Behdad Esfahbod abb3239ef9 [Indic] Update clusters for left-matra even if matra didn't move
Fixes crashes reported with left matra under
non-uniscribe-bug-compatibility mode.
2012-07-22 23:55:19 -04:00
Behdad Esfahbod 92a1ad7bef [Indic] Stop searching for base if a post form is found before below form
Improves Bengali and Gurmukhi.  Malayalam regressed a bit.  We will deal
with that later.
2012-07-20 18:55:15 -04:00
Behdad Esfahbod 4c450c703f [Indic] Recompose Bengali Ya,Nukta
This is a bunch of hacks for now.

Improves Bengali a bit.
2012-07-20 18:13:04 -04:00
Behdad Esfahbod e9c0f152a3 [Uniscribe] Fix script fallback
Gurmukhi failures halved now.  Others changed slightly.
2012-07-20 17:37:48 -04:00
Behdad Esfahbod 5791f32915 [Indic] Allow a ZWNJ after SM's
Malayalam failures go way down.  Other scripts benefitted slightly too.
Sinhala had one or two test regressions, but...
2012-07-20 16:26:55 -04:00
Behdad Esfahbod 34ae336f3f [Indic] Improve Reph AfterMain positioning
Fixes 20 out of 48 failing Oriya tests.  Failure rate down to 0.066% now.
2012-07-20 16:17:28 -04:00
Behdad Esfahbod bdd080431a [Indic] Reposition Oriya Candrabindu
Oriya failures down from 0.65% to 0.20%.
2012-07-20 16:03:09 -04:00
Behdad Esfahbod 5f0eaaad12 [Indic] Fix base search in final_reordering
Fixes most Malayalam failures.  Down from 1.6% to 0.38% now.  Fixes a
few more in other scripts too.
2012-07-20 15:47:24 -04:00
Behdad Esfahbod 81202bd860 [Indic] Don't attach SM/VD to other characters 2012-07-20 15:14:51 -04:00
Behdad Esfahbod efb4ad7356 Fix compiler warnings
If x is not constant, we cannot ASSERT_STATIC on it.
2012-07-20 14:27:38 -04:00
Behdad Esfahbod f31d97e44e [Indic] Form Telugu Reph out of Ra,Virama,ZWJ
Apparently this was approved in Feb 2012.  No font yet.
2012-07-20 14:13:35 -04:00
Behdad Esfahbod 2e193b240e [Indic] Don't split U+0AC9
Although IndicMatraCategory.txt classifies it as Top_And_Right matra,
it does not have Unicode decomposition, and Uniscribe does not do
anything special about it either.

Gujarati failures down from 0.672% to 0.0130966%.
2012-07-20 14:02:35 -04:00
Behdad Esfahbod 30c3d5e9fc [Indic] Simplify Uniscribe cluster emulation
Now that we break syllables on Halant,ZWNJ, this code can be simplified.
2012-07-20 13:56:32 -04:00
Behdad Esfahbod decf6ffca4 [Indic] Minor! 2012-07-20 13:51:31 -04:00
Behdad Esfahbod 9e4f94a72c [Indic] Break syllables at Halant,ZWNJ
That's really what Uniscribe does, and explains a lot of peculiarities of
Halant,ZWNJ before the base.

Sent Telugu from 1% failures to 0.03%.  Improved Kannada and Malayalam
slightly.  Fixed half of Bengali, and did NOT break anything!
2012-07-20 13:48:03 -04:00
Behdad Esfahbod 2c372b80f6 [Indic] Better check for applying 'init'
Specifically, don't apply 'init' if previous char is a joiner.

Fixes some more of Bengali.
2012-07-20 13:37:48 -04:00
Behdad Esfahbod 34a7440b7c [GPOS] Don't zero mark advances
Fixes more of Telugu, Kannada, and Oriya.

May break things (outside Indic...), but we cannot think of any font relying
on this immediately.
2012-07-20 12:40:39 -04:00
Behdad Esfahbod 8ed248de77 [Indic] Minor 2012-07-20 11:42:24 -04:00
Behdad Esfahbod d0e68dbd0b [Indic] Implement reph positioning step 5
Not tuned, just copied from step 2.  Fixes another 0.5% of Kannada
failures.  1% to go.
2012-07-20 11:25:41 -04:00
Behdad Esfahbod a9e45c32e4 [Indic] Don't let ZWNJ at the end of syllable affect base search
Fixes a few Devanagari, half of remaining Kannada failures, quarter for
Telugu, and others slightly improved or unchanged.
2012-07-20 11:04:15 -04:00
Behdad Esfahbod 20b68e699f [Indic] Apply 'cjct' globally
Fixes 5 Devanagari failures, and no regressions.
2012-07-20 10:47:46 -04:00
Behdad Esfahbod 51e764de44 [Indic] Unbreak old scriptures
Brings down failures with Lohit-Telugu from 57% to 1.40%.
2012-07-20 10:30:24 -04:00
Behdad Esfahbod 900cf3d449 Minor 2012-07-20 10:18:23 -04:00
Behdad Esfahbod 87cd63266e [Indic] Recategorize some Kannada right matras
Kannada failures down from 3.5% to 2.93%.
2012-07-19 21:25:46 -04:00
Behdad Esfahbod 3604d64ced [Indic] Recategorize GURMUKHI ADDAK
It's not in IndicSyllabicCategory.txt.  Fixes most of Gurmukhi failures.
Failures down from 7.7% to 0.222%!
2012-07-19 21:13:04 -04:00
Behdad Esfahbod 8932858123 Minor 2012-07-19 21:02:38 -04:00
Behdad Esfahbod 47ef931f13 [buffer] Make sure out_info = info during GPOS 2012-07-19 20:52:44 -04:00
Behdad Esfahbod ae63cf2062 Print line number during return when tracing 2012-07-19 20:45:41 -04:00
Behdad Esfahbod 5249f3aee1 [Indic] Unbreak Khmer
For Khmer, all consonants are subjoining.  No need to look in the font.
We were looking in the wrong order anyway.
2012-07-19 20:30:22 -04:00
Behdad Esfahbod e0475345d5 [Indic] Apply 'akhn' globally
Fixes 1.5% more failures for Telugu, 2% for Kannada.
Breaks one test in Devanagari.
2012-07-19 20:24:14 -04:00
Behdad Esfahbod fa247ebe52 [Indic] Better position U+0CD5
Fixes another 5% of Kannada failures.
2012-07-19 19:52:19 -04:00
Behdad Esfahbod f055442716 [Indic] Lookup consonant position in the font
Fixes most failures of Oriya, and improves others a bit.
2012-07-19 16:20:21 -04:00