2262 Commits

Author SHA1 Message Date
Behdad Esfahbod d0e68dbd0b [Indic] Implement reph positioning step 5
Not tuned, just copied from step 2.  Fixes another 0.5% of Kannada
failures.  1% to go.
2012-07-20 11:25:41 -04:00
Behdad Esfahbod a9e45c32e4 [Indic] Don't let ZWNJ at the end of syllable affect base search
Fixes a few Devanagari, half of remaining Kannada failures, quarter for
Telugu, and others slightly improved or unchanged.
2012-07-20 11:04:15 -04:00
Behdad Esfahbod 20b68e699f [Indic] Apply 'cjct' globally
Fixes 5 Devanagari failures, and no regressions.
2012-07-20 10:47:46 -04:00
Behdad Esfahbod 51e764de44 [Indic] Unbreak old scriptures
Brings down failures with Lohit-Telugu from 57% to 1.40%.
2012-07-20 10:30:24 -04:00
Behdad Esfahbod 900cf3d449 Minor 2012-07-20 10:18:23 -04:00
Behdad Esfahbod 87cd63266e [Indic] Recategorize some Kannada right matras
Kannada failures down from 3.5% to 2.93%.
2012-07-19 21:25:46 -04:00
Behdad Esfahbod 3604d64ced [Indic] Recategorize GURMUKHI ADDAK
It's not in IndicSyllabicCategory.txt.  Fixes most of Gurmukhi failures.
Failures down from 7.7% to 0.222%!
2012-07-19 21:13:04 -04:00
Behdad Esfahbod 8932858123 Minor 2012-07-19 21:02:38 -04:00
Behdad Esfahbod 47ef931f13 [buffer] Make sure out_info = info during GPOS 2012-07-19 20:52:44 -04:00
Behdad Esfahbod ae63cf2062 Print line number during return when tracing 2012-07-19 20:45:41 -04:00
Behdad Esfahbod 5249f3aee1 [Indic] Unbreak Khmer
For Khmer, all consonants are subjoining.  No need to look in the font.
We were looking in the wrong order anyway.
2012-07-19 20:30:22 -04:00
Behdad Esfahbod e0475345d5 [Indic] Apply 'akhn' globally
Fixes 1.5% more failures for Telugu, 2% for Kannada.
Breaks one test in Devanagari.
2012-07-19 20:24:14 -04:00
Behdad Esfahbod c87bcddb10 [Indic] Add failing test for Kannada 2012-07-19 20:03:25 -04:00
Behdad Esfahbod fa247ebe52 [Indic] Better position U+0CD5
Fixes another 5% of Kannada failures.
2012-07-19 19:52:19 -04:00
Behdad Esfahbod f055442716 [Indic] Lookup consonant position in the font
Fixes most failures of Oriya, and improves others a bit.
2012-07-19 16:20:21 -04:00
Behdad Esfahbod 74d1d88781 [GSUB] Fix would_apply() for LigatureSubst 2012-07-19 16:14:23 -04:00
Behdad Esfahbod 787f7d1e9b [TODO] Minor 2012-07-19 15:29:13 -04:00
Behdad Esfahbod be73a5f936 Add src/test-would-substitute tool 2012-07-19 15:12:18 -04:00
Behdad Esfahbod e72b360ac6 Refactor / finish would_apply() operation
Untested.
2012-07-19 14:44:46 -04:00
Behdad Esfahbod 8c973ebf0f [Indic] Implement per-script matra positioning
Following what the spec says.

Brings down Telugu failures from 40% to 3.75%, and Kannada failures from
44% to 10%.  Does NOT affect other scripts' test results.
2012-07-19 13:25:08 -04:00
Behdad Esfahbod 8bb32458f9 [Indic] More refactoring 2012-07-19 13:04:44 -04:00
Behdad Esfahbod 9ccc6382ba [Indic] Minor refactoring 2012-07-19 12:45:31 -04:00
Behdad Esfahbod f83aaa3133 [Indic] Minor 2012-07-19 12:23:23 -04:00
Behdad Esfahbod be8b9f5f71 [Indic] Start refactoring different matra positions per script 2012-07-19 12:11:12 -04:00
Behdad Esfahbod deeb540a74 [test] Ignore tests with DOTTED CIRCLE in the output 2012-07-19 11:30:48 -04:00
Behdad Esfahbod b01d9b3d90 [Indic] Disallow decomposition of a couple characters
This is a hack for now.  Will be fixed when we do complex-shaper-driven
normalization properly.

The results with or without decomposition are the same, but Uniscribe
does not normalize, so this matches better.
2012-07-19 11:25:49 -04:00
Behdad Esfahbod 422ecd2d3c [Indic] Accept a forced Rakar sequence at the end of syllable
In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra.  If you put that at the
end of a Consonant,Matra syllable, you get a dotted-circle from
Uniscribe.  Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
And people have been encoding that sequence...  So, allow a forced
"ZWJ,Virama,ZWJ,Ra" sequence at the end of syllables.

Fixes some 100 or more of Sinhala failures.  Now at 622 only (0.23%).
2012-07-18 23:25:58 -04:00
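
As an illustration of the sequence described in the commit above, here is a minimal sketch that checks whether a Sinhala syllable ends with the forced Rakar sequence. The function name and the list-of-code-points representation are assumptions made for this example; the actual change lives in the shaper's syllable grammar, not in code like this.

```python
# Code points involved in the forced Rakar sequence (Sinhala).
ZWJ = 0x200D        # ZERO WIDTH JOINER
AL_LAKUNA = 0x0DCA  # SINHALA SIGN AL-LAKUNA (the Sinhala virama)
RA = 0x0DBB         # SINHALA LETTER RAYANNA

FORCED_RAKAR = [ZWJ, AL_LAKUNA, ZWJ, RA]


def ends_with_forced_rakar(codepoints):
    """Hypothetical helper: True if the syllable ends in ZWJ,Al-Lakuna,ZWJ,Ra."""
    return codepoints[-len(FORCED_RAKAR):] == FORCED_RAKAR


# Consonant + matra followed by the forced sequence.
syllable = [0x0D9A, 0x0DD2, ZWJ, AL_LAKUNA, ZWJ, RA]
print(ends_with_forced_rakar(syllable))
```

Without the leading ZWJ the helper returns False, which corresponds to the plain Al-Lakuna,ZWJ,Ra form that the commit message says yields a dotted circle in Uniscribe.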
Behdad Esfahbod 6fc1732003 [Indic] Allow joiners on both sides of Halant at the same time
The sequence <ZWJ,Al-Lakuna,ZWJ> is used in Sinhala to explicitly ask
for Rakar.  Fixes two-thousand Sinhala tests.  Not many left.
2012-07-18 17:49:19 -04:00
Behdad Esfahbod 10cdc94eee [Indic] In final reordering, find base, even if it disappeared
POS_BASE can disappear if base ligated backward.  Define base as the
last glyph whose position is not after base.

Fixes a few hundred of Sinhala failures with Iskoola Pota.
2012-07-18 17:43:23 -04:00
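
The base definition in the commit above can be sketched as follows. This is a rough illustration, not the shaper's code: the three position constants and their values are made up for the example, and the input is assumed to be a list of per-glyph positions already sorted in syllable order.

```python
# Illustrative position categories; the values are assumptions for this sketch
# and only their relative order matters.
POS_PRE, POS_BASE, POS_POST = 0, 1, 2


def find_base(positions):
    """Return the index of the last glyph whose position is not after base.

    This still yields a sensible answer when no glyph is marked POS_BASE,
    for example because the base ligated backward into a pre-base glyph.
    """
    base = 0
    for i, pos in enumerate(positions):
        if pos <= POS_BASE:
            base = i
    return base


print(find_base([POS_PRE, POS_BASE, POS_POST]))           # base glyph present
print(find_base([POS_PRE, POS_PRE, POS_POST, POS_POST]))  # base glyph disappeared
```

Searching for "position not after base" rather than "position equal to base" is what makes the lookup robust when the POS_BASE glyph no longer exists.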
Behdad Esfahbod 9c4d24a3a6 [Indic] Minor 2012-07-18 17:29:10 -04:00
Behdad Esfahbod 3285e107c9 [Indic] Implement Sinhala "Al Lakuna" Reph behavior
In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.
2012-07-18 17:22:14 -04:00
Behdad Esfahbod 91cade7555 [Indic/Unicode] Decompose Sinhala split matras the way Uniscribe likes
Makes no visual difference.

Fixes most of the failures.  Down from 15% to 1.3%!
2012-07-18 16:50:41 -04:00
Behdad Esfahbod d8942dcbb4 Apply Tibetan (global) features.
Fixes all Tibetan failures.  All 180k of them!

Merges back Hangul into the default shaper.
2012-07-18 16:34:10 -04:00
Behdad Esfahbod 552d19b7a1 [Indic] Treat Register Shifters like Nukta
Really this time.

Fixes another 18 Khmer tests.
2012-07-18 16:02:33 -04:00
Behdad Esfahbod e8cd81f76d [Indic] Minor 2012-07-18 16:00:20 -04:00
Behdad Esfahbod 69f26bf39c [Indic] Fix Matra reordering when base is at end of syllable
For example: U+915,U+200c,U+93f

Fixes last Tamil failure!
2012-07-18 15:47:51 -04:00
Behdad Esfahbod d16ccc4ae7 Leave one extra item at the end of buffer allocation
Just in case, for the times we do out-of-bounds access.

jk
2012-07-18 15:43:55 -04:00
Behdad Esfahbod 075d671f10 [Indic] Fix out-of-bounds array access 2012-07-18 15:41:53 -04:00
Behdad Esfahbod dcb527242b [Indic] Allow joiners before matras
Fixes 1 more Devanagari test!
2012-07-18 15:32:26 -04:00
Behdad Esfahbod 391cc03317 [Indic] Allow halant group in Vowel and placeholder syllables
Fixes 2 out of 560 Devanagari failures.  AND:
Fixes 1 out of 2 Tamil failures.
2012-07-18 15:12:49 -04:00
Behdad Esfahbod ca4e3d3eab [Indic] Streamline halant/joiner in grammar 2012-07-18 15:05:40 -04:00
Behdad Esfahbod 418d00dffd [Indic] Minor 2012-07-18 14:57:28 -04:00
Behdad Esfahbod 4c3691d2a3 [Indic] Hopefully minor!
Refactoring Indic machine.  No semantic change.
2012-07-18 14:23:55 -04:00
Behdad Esfahbod e092c556fb [Indic] Minor 2012-07-18 14:09:25 -04:00
Behdad Esfahbod 14dbdd9e39 [Indic] Unbreak Tamil
Tamil has only about 150 failures now!
2012-07-18 13:13:03 -04:00
Behdad Esfahbod db8981f1e0 [Indic] Position Khmer Robat
It's a visual Repha.

Still not positioning logical Repha as occurs in Malayalam.

Another 200 Khmer failures fixed.  547 to go.  That's better than
Devanagari!
2012-07-17 23:42:04 -04:00
Behdad Esfahbod 25bc489498 [Indic] Better categorize Register Shifters and Khmer Various signs
Down another 500 or so Khmer failures!
2012-07-17 17:53:03 -04:00
Behdad Esfahbod 39b17837b4 Add hb_buffer_normalize_glyphs() and hb-shape --normalize-glyphs
This reorders glyphs within the cluster to a nominal order.  This should
have no visible effect on the output, but helps with testing, for
getting the same hb-shape output for visually-equal glyphs for each
cluster.
2012-07-17 17:09:29 -04:00
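
To show why a nominal per-cluster order helps with testing, here is a simplified sketch of the idea. It models glyphs as (cluster, glyph_id) pairs and sorts each cluster by glyph ID; both the data model and the sort key are assumptions for this example. A real implementation also has to adjust glyph positions so the rendering stays unchanged, which this sketch ignores.

```python
from itertools import groupby


def normalize_glyphs(glyphs):
    """Sort glyphs within each run of equal cluster values.

    glyphs: list of (cluster, glyph_id) tuples in buffer order.
    The order of the clusters themselves is left untouched.
    """
    out = []
    for _, run in groupby(glyphs, key=lambda g: g[0]):
        out.extend(sorted(run, key=lambda g: g[1]))
    return out


# Two shapers emitting the same marks in a different order within a cluster
# compare equal once normalized.
a = [(0, 7), (0, 3), (1, 9)]
b = [(0, 3), (0, 7), (1, 9)]
print(normalize_glyphs(a) == normalize_glyphs(b))
```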
Behdad Esfahbod 25e302da9a [Indic] Minor 2012-07-17 14:25:14 -04:00
Behdad Esfahbod 5d32690a34 [Indic] For scripts without Half forms, always choose first consonant as base
In such scripts (ie. Khmer), a ZWJ/ZWNJ shouldn't stop the search for
base.  So, instead just choose the first consonant as base directly.

Test sequence:
U+1798,200c,U+17C9,U+17D2,U+179B,U+17C1,U+17C7
2012-07-17 14:23:28 -04:00
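
A minimal sketch of the "first consonant is the base" rule from the commit above, applied to its test sequence. The helper names are hypothetical, and the consonant check is simplified to the Khmer consonant range U+1780..U+17A2 rather than a real category lookup.

```python
def is_khmer_consonant(cp):
    """Simplified check: Khmer consonants occupy U+1780..U+17A2."""
    return 0x1780 <= cp <= 0x17A2


def first_consonant_base(codepoints):
    """Return the index of the first consonant, or None if there is none.

    Joiners (ZWJ/ZWNJ) are simply skipped, so they cannot stop the search.
    """
    for i, cp in enumerate(codepoints):
        if is_khmer_consonant(cp):
            return i
    return None


# Test sequence from the commit message.
seq = [0x1798, 0x200C, 0x17C9, 0x17D2, 0x179B, 0x17C1, 0x17C7]
print(first_consonant_base(seq))
```

For this sequence the base is U+1798 at index 0, even though a ZWNJ follows it directly.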