Commit Graph

304 Commits

Author SHA1 Message Date
Behdad Esfahbod 64bb2ae857 Didn't mean to push this out
Ouch!
2013-02-12 16:29:25 -05:00
Behdad Esfahbod f9b660534c [Myanmar] Use master Indic table for syllable data 2013-02-12 16:13:56 -05:00
Behdad Esfahbod f60793e854 [tests] Add Cham sample 2013-02-12 15:45:59 -05:00
Behdad Esfahbod 3a83d33ec0 Add South-East Asian shaper
Handles Tai Tham, Cham, and New Tai Lue for now.
2013-02-12 12:14:10 -05:00
Behdad Esfahbod fb96021206 Minor test reshufflings 2013-02-12 10:33:58 -05:00
Behdad Esfahbod 5676d5d527 [Indic] Make sure New Tai Lue works! 2013-02-12 10:31:14 -05:00
Behdad Esfahbod bed687f886 Shuffle test data around 2013-02-11 14:24:03 -05:00
Behdad Esfahbod 5898fa94d1 Don't use $(ENV)
As reported by Peter Breitenlohner:

I think this is a very bad idea because ENV is used to specify a startup
file to be read by some/all shells.
2013-02-06 15:29:07 -05:00
Behdad Esfahbod ecd454b3cd [Indic] In old-spec shaping, don't move viramas around if seq ends with one
For example: u0c9a u0ccd u0c9a u0ccd with Lohit.  See:

https://bugs.freedesktop.org/show_bug.cgi?id=59118
2013-01-08 18:09:46 -06:00
Behdad Esfahbod 1172dc7362 Rename hb_buffer_clear() to hb_buffer_clear_contents()
The previous name was clashing with harfbuzz.old.  There are systems
that need to link both...

Clash-free now again.
2013-01-07 16:46:37 -06:00
Behdad Esfahbod e81aff9ef7 [tests] Finish test-set.c
All passing now.
2013-01-02 23:22:54 -06:00
Behdad Esfahbod 8165f2765b [tests] Start adding tests for hb-set.h
Fails now.  Fixing.
2013-01-02 22:50:36 -06:00
Behdad Esfahbod b9d28f696c [tests] Add set object to test-object.c 2013-01-02 22:49:58 -06:00
Behdad Esfahbod 8b217f5ac5 [Indic] Reorder Malayalam dot-reph to after base
Test sequence is simple: U+0D4E,U+0D15.  The doth-reph should be
reordered to after the Ka.

https://bugzilla.redhat.com/show_bug.cgi?id=799565
2012-12-21 15:49:26 -05:00
Behdad Esfahbod 624933f676 Add Persian test cases from Mehran Mehr 2012-11-30 11:46:35 +02:00
Behdad Esfahbod 43b6531500 [Indic] Another try to unbreak Sinhala split matras
Just read the comments...
2012-11-16 13:14:26 -08:00
Behdad Esfahbod f3584d3a3a Add test cases for Thai PUA shaping 2012-11-14 17:53:06 -08:00
Behdad Esfahbod 6b19fa4862 Adjust diff rule for the new hb-shape output format 2012-11-14 11:38:50 -08:00
Behdad Esfahbod 82c4d9880a Add Sinhala test case for split matra U+0DDA 2012-11-14 10:56:02 -08:00
Behdad Esfahbod d04b128531 Fix test 2012-11-14 10:53:10 -08:00
Behdad Esfahbod 0c7df22228 Add buffer flags
New API:

	hb_buffer_flags_t

	HB_BUFFER_FLAGS_DEFAULT
	HB_BUFFER_FLAG_BOT
	HB_BUFFER_FLAG_EOT
	HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES

	hb_buffer_set_flags()
	hb_buffer_get_flags()

We use the BOT flag to decide whether to insert dottedcircle if the
first char in the buffer is a combining mark.

The PRESERVE_DEFAULT_IGNORABLES flag prevents removal of characters like
ZWNJ/ZWJ/...
2012-11-13 14:42:35 -08:00
Behdad Esfahbod c8d4f8b0fe Minor 2012-11-13 14:10:19 -08:00
Behdad Esfahbod 82ecaff736 Add hb_buffer_clear()
Which is like _reset(), but does NOT clear unicode-funcs.
2012-11-13 14:10:00 -08:00
Behdad Esfahbod de796a6fb9 Add "new" Myanmar OT Script tag
Windows 8 added support for Myanmar shaping using the "mym2" script tag,
even though Windows never supported the old "mymr" tag.
2012-11-12 17:27:51 -08:00
Behdad Esfahbod 27f52dc3f6 Add Myanmar tests from UTN#11 2012-11-12 16:54:03 -08:00
Behdad Esfahbod e6b86c8519 Add test for non-joining Mongolian letters
For U+1880..U+1886 Uniscribe thinks they are non-joining.
For U+1887 Uniscribe thinks it's joining, but looks wrong to me.
2012-11-05 15:18:49 -08:00
Behdad Esfahbod f5e55754f9 Add Tifinagh test data 2012-11-02 13:53:18 -07:00
Behdad Esfahbod c21498afd8 Add Mongolian and 'Phags-pa joining test cases 2012-11-02 10:21:26 -07:00
Behdad Esfahbod 431bef2e16 Minor build fix 2012-11-01 16:26:01 -07:00
Behdad Esfahbod 911ed09698 Ignore gid0 in test results 2012-10-29 19:42:19 -07:00
Behdad Esfahbod 10b88d89ef Add Ethiopic test case
This sequence: U+120B,U+135F,U+120B with the Nyala font from Win7
exposes a GPOS bug in Uniscribe, in that the positioned mark is wrongly
moved as a result a following kern.

This is the one "failure" in the Ethiopic test suite :-).

ETHIOPIC: 118900 out of 118901 tests passed. 1 failed (0.000841036%)
2012-10-29 18:26:00 -07:00
Behdad Esfahbod 166b5cf7ec [Indic] Find syllables before any features are applied
With FreeSerif, it seems that the 'ccmp' feature does ligature
substituttions.  That was then causing syllable match failures.  We now
find syllables before any features have been applied.

Test sequence: U+0D9A,U+0DCA,U+200D,U+0DBB,U+0DCF
2012-09-07 14:56:01 -04:00
Behdad Esfahbod efb8d3eb71 Fixup test failure reporting
After we implemented dotted-circle, we were still ignoring any tests
that had dottedcircle in it for any of the shapers.  That meant that if
we wrongly outputted dottedcircle, the test was being ignored.  Ouch!

Fixing that shows regressions across the board.  Most are Uniscribe
bugs: NOT inserting dotted-circle when it should.  Some are arou
machine bugs.  This is in fact a nice way to catch Indic-machine
deficiencies and when I fix the regressions, our clusters should be
much closer to Uniscribe.  For now, we regressed from:

BENGALI: 353997 out of 354285 tests passed. 288 failed (0.0812905%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271747 out of 271847 tests passed. 100 failed (0.0367854%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

To:

BENGALI: 353990 out of 354285 tests passed. 295 failed (0.0832663%)
DEVANAGARI: 707315 out of 707394 tests passed. 79 failed (0.0111678%)
GUJARATI: 366447 out of 366506 tests passed. 59 failed (0.016098%)
GURMUKHI: 60707 out of 60809 tests passed. 102 failed (0.167738%)
KANNADA: 951042 out of 951913 tests passed. 871 failed (0.0915%)
KHMER: 298962 out of 299124 tests passed. 162 failed (0.0541581%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048074 out of 1048416 tests passed. 342 failed (0.0326206%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091835 out of 1091837 tests passed. 2 failed (0.000183178%)
TELUGU: 970553 out of 970573 tests passed. 20 failed (0.00206064%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)

Investigating.
2012-09-05 15:57:38 -04:00
Behdad Esfahbod a4e75e4128 Minor 2012-08-27 15:54:15 -04:00
Behdad Esfahbod 206ab60573 [test] Move around 2012-08-10 09:06:30 -04:00
Behdad Esfahbod 7a484c601e [test] Add Urdu ligature sequences from CRULP 2012-08-10 09:05:29 -04:00
Behdad Esfahbod 378d279bbf Implement Unicode compatibility decompositions
Based on patch from Philip Withnall.
https://bugs.freedesktop.org/show_bug.cgi?id=41095
2012-07-31 21:36:16 -04:00
Behdad Esfahbod 70b3dc3272 Add Hebrew test 2012-07-30 12:40:18 -04:00
Behdad Esfahbod a973b5ce86 [GSUB] Further adjustments to mark-attachment vs ligation interaction
The d1d69ec52e change broke Kannada badly,
since it was ligating consonants, pushing matra out, and then ligating
with the matra.  Adjust for that.  See comments.
2012-07-30 01:47:46 -04:00
Behdad Esfahbod 97a201becf Add Arabic tests for mark ligature component attachments 2012-07-29 20:37:29 -04:00
Behdad Esfahbod 5d874d566f [GPOS] Fix mark-to-mark positioning when one of the marks is a ligature
This commit: a3313e5400 broke MarkMarkPos
when one of the marks itself is a ligature.  That regressed 26 Tibetan
tests (up from zero!).  Fix that.  Tibetan back to zero.
2012-07-28 21:05:25 -04:00
Behdad Esfahbod 6411e74caf [Indic] Reposition Gurmukhi top matras to after post
The font is forming a post-base consonant in some samples, and Uniscribe
positions top matra on the post-base.  Do the same.

Gurmukhi failures down from 59 to 41 (0.0674242%).
2012-07-24 13:48:49 -04:00
Behdad Esfahbod c3f769ba09 [Indic] Ignore Uniscribe output containing two zero-width space glyphs
Uniscribe is buggy and sometimes /eats/ a mark next to a non-joiner.
Most of Malayalam failures where actually hitting this bug.

Ignore test output with two zero-width space glyphs.  This is a hack
until we build up the test suite infrastructure better.

Bengali went down by 9, Devanagari by 2, Kannada by 130, Malayalm down
from 1197 to 307, Sinhala down by 16, Telugu down by 26.  New stats:

BENGALI: 353996 out of 354285 tests passed. 289 failed (0.0815727%)
DEVANAGARI: 693573 out of 693628 tests passed. 55 failed (0.00792932%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299094 out of 299124 tests passed. 30 failed (0.0100293%)
MALAYALAM: 1048109 out of 1048416 tests passed. 307 failed (0.0292823%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271715 out of 271847 tests passed. 132 failed (0.0485567%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970550 out of 970573 tests passed. 23 failed (0.00236973%)
2012-07-24 13:26:32 -04:00
Behdad Esfahbod 65c43accdc [Indic] Better position left-matra in Malayalam
Just put it before base, which is what's expected.

Malayalam failures down from 1559 to 1197 (0.114172%).

BENGALI: 353988 out of 354285 tests passed. 297 failed (0.0838308%)
DEVANAGARI: 693571 out of 693628 tests passed. 57 failed (0.00821766%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950956 out of 951913 tests passed. 957 failed (0.100534%)
KHMER: 299094 out of 299124 tests passed. 30 failed (0.0100293%)
MALAYALAM: 1047219 out of 1048416 tests passed. 1197 failed (0.114172%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271699 out of 271847 tests passed. 148 failed (0.0544424%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)
2012-07-24 03:36:47 -04:00
Behdad Esfahbod 88f413b56f [Indic] Implement Reph+Ya-Phalaa interaction
The sequence Ra,H,Ya in Bengali is ambigious and Unicode encoded that to
get Ya-Phalaa, one would place ZWJ before Halant.  Ie. a ZWJ,H sequence
requests subjoining, while a H,ZWJ requests Half form.  Implement that.

Bengali failures go down from 377 to 297 (0.0838308%).
Gujarati is down by 4 to 17 (0.0046384%).
Kannada is down by 226 to 957 (0.100534%).

Current status:

BENGALI: 353988 out of 354285 tests passed. 297 failed (0.0838308%)
DEVANAGARI: 693571 out of 693628 tests passed. 57 failed (0.00821766%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950956 out of 951913 tests passed. 957 failed (0.100534%)
KHMER: 299094 out of 299124 tests passed. 30 failed (0.0100293%)
MALAYALAM: 1046857 out of 1048416 tests passed. 1559 failed (0.148701%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271699 out of 271847 tests passed. 148 failed (0.0544424%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)
2012-07-24 03:04:36 -04:00
Behdad Esfahbod 330b329c89 [Indic] Unmark U+17D1 KHMER SIGN VIRIAM to NOT be a Virama
Fixes another 1 Khmer failure.  Down to 30 (0.0100293%) now.
2012-07-24 02:25:26 -04:00
Behdad Esfahbod d90b8e841e [Indic] Reposition Khmer prebase-reordering Ra around split matras
In Khmer coeng model, a V,Ra can go *after* matras.  If it goes after a
split matra, it should be reordered to *before* the left part of such matra.

Khmer failures down from 136 to 39 (0.0130381%).
2012-07-24 02:11:18 -04:00
Behdad Esfahbod 7573799126 [Indic] Position Khmer U+17CE
Fixes another 6 Khmer failures.  Now at 136 (0.0454661%).
2012-07-24 01:32:07 -04:00
Behdad Esfahbod 2278eefcdb [Indic] In Sinhala, form forced Reph even if no other consonant found
Fixes another 10 Sinhala failures.  Down to 148 (0.0544424%).
2012-07-24 00:31:10 -04:00
Behdad Esfahbod 71fd5e80ad [Indic] Further adjust base algorithm for Sinhala
Apparently if there is C,V,ZWJ,C, the first C will be base, but if
it's C,ZWJ,V,C, the second one will be.

Note that Uniscribe implements this differently, by breaking syllable in
the case of C,ZWJ,V,C and putting the first consonant in one syllable
and the rest in the next syllable.

Sinhala failures down from 208 to 158 (0.0581209%).  No changes to
Khmer.
2012-07-24 00:21:16 -04:00