Commit Graph

2567 Commits

Author SHA1 Message Date
Behdad Esfahbod 4c1d924461 Minor 2012-10-28 20:27:25 -07:00
Behdad Esfahbod 38b015e57f Fix hb_buffer_set_length(buffer, 0)
Was causing invalid realloc()s.
2012-10-28 20:11:47 -07:00
Behdad Esfahbod b7115b63be Add XXX 2012-10-28 20:11:42 -07:00
Behdad Esfahbod 71ee1f2450 Port to ICU LayoutEngine C API
Incidentally, this makes it not crash with icu-le-hb anymore...
I'm not smart / stupid enough to spend two more days debugging C++
linking issues, and this is ABI-stable at least.
2012-10-28 19:18:11 -07:00
Behdad Esfahbod 0144f05e57 Remove unused members 2012-10-26 13:48:06 -07:00
Behdad Esfahbod cf3afd8979 Rename and revamp is_zero_width() to be is_default_ignorable()
That's really the logic desired.  Except that MONGOLIAN VOWEL SEPARATOR
is not default_ignorable but it really should be.  Reported to Unicode.

Based on suggestion from Konstantin Ritt.
2012-10-25 16:32:54 -07:00
Behdad Esfahbod a724139e64 Update TODO 2012-10-24 14:02:15 -07:00
Behdad Esfahbod 13c0584729 0.9.5 2012-10-14 18:37:09 -05:00
Behdad Esfahbod fecdfa95da Fixup hb_ot_shape_closure()
Broke it when merged cmap mapping and normalizer.  Ouch!
2012-10-07 17:19:58 -04:00
Behdad Esfahbod 2d1dcb3ce3 Mark debug message functions static 2012-10-07 17:13:46 -04:00
Behdad Esfahbod 9947bd6daf Update UCDN to upstream commit 3f159c87824230b59af56e40e2db32caf6afa51a
- Unicode 6.2.0 goodness,
- Unassigned codepoints now have correct properties.  Passes test suite.
2012-10-02 20:44:43 -04:00
Behdad Esfahbod 32dbfcf763 Fix visibility of UCDN symbols 2012-10-02 17:42:13 -04:00
Behdad Esfahbod 3f33f0d1f2 Import UCDN into source tree
https://github.com/grigorig/ucdn
2012-10-02 16:23:29 -04:00
Behdad Esfahbod 0e292eb2a2 Remove Glib thread-safety support
Now that we have pthread detection in configure, we don't need Glib
anymore.  Glib will only be a Unicode data provider.
2012-10-02 15:09:38 -04:00
Behdad Esfahbod 66efe89648 Check for pthreads 2012-10-02 14:55:32 -04:00
Behdad Esfahbod 10a8162ddd Add ax_pthread.m4 2012-10-02 14:46:34 -04:00
Behdad Esfahbod 8ac34bc6ff Add pkg.m4 to git repo 2012-10-02 14:46:04 -04:00
Behdad Esfahbod c7afac0aa6 Add AC_CONFIG_MACRODIR 2012-10-02 14:44:47 -04:00
Behdad Esfahbod f2eb3fa9dc [OT] Only insert dottedcircle if at the beginning of paragraph
If the first char in the run is a combining mark, but there is text
before the run, don't insert dottedcircle.

Part of addressing:
https://bugzilla.redhat.com/show_bug.cgi?id=858736
2012-09-25 21:35:35 -04:00
Behdad Esfahbod bdc2fc8294 [Arabic] Respect Arabic joining from neighboring context
Now we respect Arabic joining across runs.
2012-09-25 21:32:35 -04:00
Behdad Esfahbod 05207a79e0 [buffer] Save pre/post textual context
To be used for a variety of purposes.  We save up to five characters
in each direction.  No public API changes, everything is taken care
of already.  All clients need to do is to call hb_buffer_add_utf* with
the full text + segment info (or at least some context) instead of
just passing in the segment.

Various operations (hb_buffer_reset, hb_buffer_set_length,
hb_buffer_add*) automatically reset the relevant contexts.
2012-09-25 21:32:21 -04:00
Behdad Esfahbod 89ac39dbbe Add hb_utf_prev() 2012-09-25 13:59:24 -04:00
Behdad Esfahbod 70ea4ac688 Slightly optimize UTF-8 parsing 2012-09-25 12:30:16 -04:00
Behdad Esfahbod 4445e5e2ec [buffer] Cleanup / optimize UTF-16 parsing a bit 2012-09-25 12:26:12 -04:00
Behdad Esfahbod 1f66c3c1a0 Add hb_utf_strlen()
Speeds up UTF-8 parsing by calling strlen().
2012-09-25 11:42:16 -04:00
Behdad Esfahbod 7f19ae7b9f [buffer] Templatize UTF handling
Also move UTF routines into a separate file, to be reused from shapers
that need it.
2012-09-25 11:23:55 -04:00
Behdad Esfahbod 0e0a4da9b7 [buffer] Towards template'izing different UTF adders 2012-09-25 11:09:04 -04:00
Behdad Esfahbod 7d37280600 Minor 2012-09-25 11:04:41 -04:00
Behdad Esfahbod 54d5da4ee9 Remove unused indic.cc 2012-09-25 10:51:42 -04:00
Behdad Esfahbod fab7a71f11 [Indic] Import ragel-generated Indic machine in git
I don't expect ragel to be creating too much noise in its generated
output, and including this in-tree helps users right now.  We can
revisit this later if it proved to be too much trouble.
2012-09-24 21:51:13 -04:00
Behdad Esfahbod 20a840c7cd Use a C++ linker on Windows
On Windows we don't care whether or not we link to libstdc++.
Seems to fix build with mingw32 on msys, as reported by Werner.
2012-09-24 20:23:00 -04:00
Behdad Esfahbod eb7669a380 Better autofoo 2012-09-18 19:42:06 -04:00
Behdad Esfahbod d00f7d8375 Fix dependencies 2012-09-17 20:59:09 -04:00
Behdad Esfahbod 811eefe225 Return NULL, not false
Oh well...
2012-09-10 09:56:27 -04:00
Behdad Esfahbod 166b5cf7ec [Indic] Find syllables before any features are applied
With FreeSerif, it seems that the 'ccmp' feature does ligature
substituttions.  That was then causing syllable match failures.  We now
find syllables before any features have been applied.

Test sequence: U+0D9A,U+0DCA,U+200D,U+0DBB,U+0DCF
2012-09-07 14:56:01 -04:00
Behdad Esfahbod 96fdc04e5c Add hb_buffer_[sg]et_content_type
And hb_buffer_content_type_t and enum values.
2012-09-06 22:30:53 -04:00
Behdad Esfahbod e30ebd2794 Add hb_feature_to/from_string() 2012-09-06 22:09:06 -04:00
Behdad Esfahbod f67917161b [OT] Do per-ligature-component fallback mark positioning
With this in place, you can remove GDEF/GSUB/GPOS tables from Arabic
fonts and still get per-component marks positioned on
oh-yeah-fallback-formed LAM-ALEF ligatures with marks in between the LAM
and ALEF.

Now *that*'s pretty cool, if a bit anachronistic...
2012-09-06 17:22:31 -04:00
Behdad Esfahbod 525c685578 [OT] Make fallback mark positioning more robust
...with clusters spanning multiple base characters.
2012-09-06 16:02:07 -04:00
Behdad Esfahbod 5d502443f5 [old] Clear offset array 2012-09-06 15:29:29 -04:00
Behdad Esfahbod 9433c218b4 [OT] Simplify fallback positioning condition 2012-09-06 14:27:15 -04:00
Behdad Esfahbod 028a1706f8 Refactor common macro 2012-09-06 14:25:48 -04:00
Behdad Esfahbod 07cfbe21b5 [OT] Streamline Arabic fallback shaping table 2012-09-06 01:16:39 -04:00
Behdad Esfahbod 82f6b6f388 Minor 2012-09-06 01:12:50 -04:00
Behdad Esfahbod fabd3113a9 [OT] Port Arabic fallback shaping to synthetic GSUB
All of init/medi/fina/isol and rlig implemented.

Let there be dragons... ⻯
2012-09-06 00:51:44 -04:00
Behdad Esfahbod f0b8ed1b6d [Indic] Allow "H,ZWJ,M"
Uniscribe accepts a Halant,ZWJ before matras.  Allow that.

BENGALI down from 295 to 291
DEVANAGARI down from 69 to 57
GUJARATI down from 19 to 17
KANNADA down from 871 to 867
MALAYALAM down from 340 to 337
TELUGU down from 20 to 16

Currently at:

BENGALI: 353897 out of 354188 tests passed. 291 failed (0.0821598%)
DEVANAGARI: 707337 out of 707394 tests passed. 57 failed (0.00805774%)
GUJARATI: 366440 out of 366457 tests passed. 17 failed (0.00463902%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951046 out of 951913 tests passed. 867 failed (0.0910798%)
KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047997 out of 1048334 tests passed. 337 failed (0.0321462%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970557 out of 970573 tests passed. 16 failed (0.00164851%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-09-05 17:41:08 -04:00
Behdad Esfahbod 4ed717ef61 [Indic] Relax grammar
Now that we insert dotted-circle, tests break more easily when our indic
machine breaks.

In particular, a few Devanagari tests were having sequences like
"C,H,ZWJ,N", and because of the ZWJ the Nukta does NOT get reordered to
before the Halant as the grammar used to expect...  Fixup.

Another case is as simple as "C,ZWJ,SM".

Fixes 10 out of 79 failures:

DEVANAGARI: 707325 out of 707394 tests passed. 69 failed (0.00975411%)
2012-09-05 17:21:17 -04:00
Behdad Esfahbod aa7141efe4 [Indic] Fix Khmer syllable-final coeng-consonant
Brings down Khmer failures from 162 to 47.

KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)

Also rebaselined some of the test files that had only-inherited lines.
Removing those, the stats are:

BENGALI: 353893 out of 354188 tests passed. 295 failed (0.0832891%)
DEVANAGARI: 707315 out of 707394 tests passed. 79 failed (0.0111678%)
GUJARATI: 366438 out of 366457 tests passed. 19 failed (0.00518478%)
GURMUKHI: 60704 out of 60747 tests passed. 43 failed (0.0707854%)
KANNADA: 951042 out of 951913 tests passed. 871 failed (0.0915%)
KHMER: 299077 out of 299124 tests passed. 47 failed (0.0157125%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1047994 out of 1048334 tests passed. 340 failed (0.0324324%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
TELUGU: 970553 out of 970573 tests passed. 20 failed (0.00206064%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

Still some regressions, but some of the more egregious cases are
addressed.
2012-09-05 17:14:52 -04:00
Behdad Esfahbod efb8d3eb71 Fixup test failure reporting
After we implemented dotted-circle, we were still ignoring any tests
that had dottedcircle in it for any of the shapers.  That meant that if
we wrongly outputted dottedcircle, the test was being ignored.  Ouch!

Fixing that shows regressions across the board.  Most are Uniscribe
bugs: NOT inserting dotted-circle when it should.  Some are arou
machine bugs.  This is in fact a nice way to catch Indic-machine
deficiencies and when I fix the regressions, our clusters should be
much closer to Uniscribe.  For now, we regressed from:

BENGALI: 353997 out of 354285 tests passed. 288 failed (0.0812905%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271747 out of 271847 tests passed. 100 failed (0.0367854%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

To:

BENGALI: 353990 out of 354285 tests passed. 295 failed (0.0832663%)
DEVANAGARI: 707315 out of 707394 tests passed. 79 failed (0.0111678%)
GUJARATI: 366447 out of 366506 tests passed. 59 failed (0.016098%)
GURMUKHI: 60707 out of 60809 tests passed. 102 failed (0.167738%)
KANNADA: 951042 out of 951913 tests passed. 871 failed (0.0915%)
KHMER: 298962 out of 299124 tests passed. 162 failed (0.0541581%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048074 out of 1048416 tests passed. 342 failed (0.0326206%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091835 out of 1091837 tests passed. 2 failed (0.000183178%)
TELUGU: 970553 out of 970573 tests passed. 20 failed (0.00206064%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

Investigating.
2012-09-05 15:57:38 -04:00
Behdad Esfahbod 27bd55bd2c [Indic] Tamil does not have half-forms either
The Win7 Tamil font does not realy on this behavior, but the WinXP
version does.  Handle Tamil like Malayalam: Matras always move to
before base.

WinXP Tamil failures went down from 168964 (15.4752%) to 167
(0.0152953%) (two orders of magnitude reduction!).

Included in this is a minor fixup that actually fixed a few tests
with non-Tamil too.  Numbers at:

BENGALI: 353997 out of 354285 tests passed. 288 failed (0.0812905%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271747 out of 271847 tests passed. 100 failed (0.0367854%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
2012-09-05 15:22:02 -04:00