Behdad Esfahbod
d0e68dbd0b
[Indic] Implement reph positioning step 5
...
Not tuned, just copied from step 2. Fixes another 0.5% of Kannada
failures. 1% to go.
2012-07-20 11:25:41 -04:00
Behdad Esfahbod
a9e45c32e4
[Indic] Don't let ZWNJ at the end of syllable affect base search
...
Fixes a few Devanagari, half of remaining Kannada failures, quarter for
Telugu, and others slightly improved or unchanged.
2012-07-20 11:04:15 -04:00
Behdad Esfahbod
20b68e699f
[Indic] Apply 'cjct' globally
...
Fixes 5 Devanagari failures, and no regressions.
2012-07-20 10:47:46 -04:00
Behdad Esfahbod
51e764de44
[Indic] Unbreak old scriptures
...
Brings down failures with Lohit-Telugu from 57% to 1.40%.
2012-07-20 10:30:24 -04:00
Behdad Esfahbod
900cf3d449
Minor
2012-07-20 10:18:23 -04:00
Behdad Esfahbod
87cd63266e
[Indic] Recategorize some Kannada right matras
...
Kannada failures down from 3.5% to 2.93%.
2012-07-19 21:25:46 -04:00
Behdad Esfahbod
3604d64ced
[Indic] Recategorize GURMUKHI ADDAK
...
It's not in IndicSyllabicCategory.txt. Fixes most of Gurmukhi failures.
Failures down from 7.7% to 0.222%!
2012-07-19 21:13:04 -04:00
Behdad Esfahbod
8932858123
Minor
2012-07-19 21:02:38 -04:00
Behdad Esfahbod
47ef931f13
[buffer] Make sure out_info = info during GPOS
2012-07-19 20:52:44 -04:00
Behdad Esfahbod
ae63cf2062
Print line number during return when tracing
2012-07-19 20:45:41 -04:00
Behdad Esfahbod
5249f3aee1
[Indic] Unbreak Khmer
...
For Khmer, all consonants are subjoining. No need to look in the font.
We were looking in the wrong order anyway.
2012-07-19 20:30:22 -04:00
Behdad Esfahbod
e0475345d5
[Indic] Apply 'akhn' globally
...
Fixes 1.5% more failures for Telugu, 2% for Kannada.
Breaks one test in Devanagari.
2012-07-19 20:24:14 -04:00
Behdad Esfahbod
c87bcddb10
[Indic] Add failing test for Kannada
2012-07-19 20:03:25 -04:00
Behdad Esfahbod
fa247ebe52
[Indic] Better position U+0CD5
...
Fixes another 5% of Kannada failures.
2012-07-19 19:52:19 -04:00
Behdad Esfahbod
f055442716
[Indic] Lookup consonant position in the font
...
Fixes most failures of Oriya, and improves others a bit.
2012-07-19 16:20:21 -04:00
Behdad Esfahbod
74d1d88781
[GSUB] Fix would_apply() for LigatureSubst
2012-07-19 16:14:23 -04:00
Behdad Esfahbod
787f7d1e9b
[TODO] Minor
2012-07-19 15:29:13 -04:00
Behdad Esfahbod
be73a5f936
Add src/test-would-substitute tool
2012-07-19 15:12:18 -04:00
Behdad Esfahbod
e72b360ac6
Refactor / finish would_apply() operation
...
Untested.
2012-07-19 14:44:46 -04:00
Behdad Esfahbod
8c973ebf0f
[Indic] Implement per-script matra positioning
...
Following what the spec says.
Brings down Telugu failures from 40% to 3.75%, and Kannada failures from
44% to 10%. Does NOT affect other scripts' test results.
2012-07-19 13:25:08 -04:00
Behdad Esfahbod
8bb32458f9
[Indic] More refactoring
2012-07-19 13:04:44 -04:00
Behdad Esfahbod
9ccc6382ba
[Indic] Minor refactoring
2012-07-19 12:45:31 -04:00
Behdad Esfahbod
f83aaa3133
[Indic] Minor
2012-07-19 12:23:23 -04:00
Behdad Esfahbod
be8b9f5f71
[Indic] Start refactoring different matra positions per script
2012-07-19 12:11:12 -04:00
Behdad Esfahbod
deeb540a74
[test] Ignore tests with DOTTED CIRCLE in the output
2012-07-19 11:30:48 -04:00
Behdad Esfahbod
b01d9b3d90
[Indic] Disallow decomposition of a couple characters
...
This is a hack for now. Will be fixed when we do complex-shaper-driven
normalization properly.
The results with or without decomposition are the same, but Uniscribe
does not normalize, so this matches better.
2012-07-19 11:25:49 -04:00
Behdad Esfahbod
422ecd2d3c
[Indic] Accept a forced Rakar sequence at the end of syllable
...
In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra. If you put that at the
end of a Consonant,Matra syllable, you get a dotted-circle from
Uniscribe. Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
And people have been encoding that sequence... So, allow a forced
"ZWJ,Virama,ZWJ,Ra" sequence at the end of syllables.
Fixes some 100 or more of Sinhala failures. Now at 622 only (0.23%).
2012-07-18 23:25:58 -04:00
Behdad Esfahbod
6fc1732003
[Indic] Allow joiners on both sides of Halant at the same time
...
The sequence <ZWJ,Al-Lakuna,ZWJ> is used in Sinhala to explicitly ask
for Rakar. Fixes two-thousand Sinhala tests. Not many left.
2012-07-18 17:49:19 -04:00
Behdad Esfahbod
10cdc94eee
[Indic] In final reordering, find base, even if it disappeared
...
POS_BASE can disappear if the base ligated backward. Define base as the
last glyph with position not after base.
Fixes a few hundred of Sinhala failures with Iskoola Pota.
2012-07-18 17:43:23 -04:00
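
To illustrate the base search described in 10cdc94eee, here is a minimal Python sketch. It is not the HarfBuzz code: the position constants and the `find_base` function are hypothetical names, and the only assumption taken from the commit message is that the base is the last glyph whose position is not after base.

```python
# Hypothetical position categories, in syllable order (assumption, not HarfBuzz's enum).
POS_PRE_C, POS_BASE, POS_POST_C, POS_FINAL = range(4)

def find_base(positions):
    """Return the index of the last glyph whose position is not after base.

    Works even when no glyph carries POS_BASE any more, for example
    because the base ligated backward. Returns None if nothing qualifies.
    """
    for i in range(len(positions) - 1, -1, -1):
        if positions[i] <= POS_BASE:
            return i
    return None

# POS_BASE still present: it is found directly.
assert find_base([POS_PRE_C, POS_BASE, POS_POST_C]) == 1
# POS_BASE gone: fall back to the last pre-base glyph.
assert find_base([POS_PRE_C, POS_PRE_C, POS_POST_C]) == 1
```

Scanning from the end keeps the search to a single pass and needs no special case for a missing POS_BASE.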
Behdad Esfahbod
9c4d24a3a6
[Indic] Minor
2012-07-18 17:29:10 -04:00
Behdad Esfahbod
3285e107c9
[Indic] Implement Sinhala "Al Lakuna" Reph behavior
...
In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.
2012-07-18 17:22:14 -04:00
Behdad Esfahbod
91cade7555
[Indic/Unicode] Decompose Sinhala split matras the way Uniscribe likes
...
Makes no visual difference.
Fixes most of the failures. Down from 15% to 1.3%!
2012-07-18 16:50:41 -04:00
Behdad Esfahbod
d8942dcbb4
Apply Tibetan (global) features.
...
Fixes all Tibetan failures. All 180k of them!
Merges back Hangul into the default shaper.
2012-07-18 16:34:10 -04:00
Behdad Esfahbod
552d19b7a1
[Indic] Treat Register Shifters like Nukta
...
Really this time.
Fixes another 18 Khmer tests.
2012-07-18 16:02:33 -04:00
Behdad Esfahbod
e8cd81f76d
[Indic] Minor
2012-07-18 16:00:20 -04:00
Behdad Esfahbod
69f26bf39c
[Indic] Fix Matra reordering when base is at end of syllable
...
For example: U+915,U+200c,U+93f
Fixes last Tamil failure!
2012-07-18 15:47:51 -04:00
Behdad Esfahbod
d16ccc4ae7
Leave one extra item at the end of buffer allocation
...
Just in case, for the times we do out-of-bounds access.
jk
2012-07-18 15:43:55 -04:00
Behdad Esfahbod
075d671f10
[Indic] Fix out-of-bounds array access
2012-07-18 15:41:53 -04:00
Behdad Esfahbod
dcb527242b
[Indic] Allow joiners before matras
...
Fixes 1 more Devanagari test!
2012-07-18 15:32:26 -04:00
Behdad Esfahbod
391cc03317
[Indic] Allow halant group in Vowel and placeholder syllables
...
Fixes 2 out of 560 Devanagari failures. AND:
Fixes 1 out of 2 Tamil failures.
2012-07-18 15:12:49 -04:00
Behdad Esfahbod
ca4e3d3eab
[Indic] Streamline halant/joiner in grammar
2012-07-18 15:05:40 -04:00
Behdad Esfahbod
418d00dffd
[Indic] Minor
2012-07-18 14:57:28 -04:00
Behdad Esfahbod
4c3691d2a3
[Indic] Hopefully minor!
...
Refactoring Indic machine. No semantic change.
2012-07-18 14:23:55 -04:00
Behdad Esfahbod
e092c556fb
[Indic] Minor
2012-07-18 14:09:25 -04:00
Behdad Esfahbod
14dbdd9e39
[Indic] Unbreak Tamil
...
Tamil has only about 150 failures now!
2012-07-18 13:13:03 -04:00
Behdad Esfahbod
db8981f1e0
[Indic] Position Khmer Robat
...
It's a visual Repha.
Still not positioning logical Repha as occurs in Malayalam.
Another 200 Khmer failures fixed. 547 to go. That's better than
Devanagari!
2012-07-17 23:42:04 -04:00
Behdad Esfahbod
25bc489498
[Indic] Better categorize Register Shifters and Khmer Various signs
...
Down another 500 or so Khmer failures!
2012-07-17 17:53:03 -04:00
Behdad Esfahbod
39b17837b4
Add hb_buffer_normalize_glyphs() and hb-shape --normalize-glyphs
...
This reorders glyphs within the cluster to a nominal order. This should
have no visible effect on the output, but helps with testing, for
getting the same hb-shape output for visually-equal glyphs for each
cluster.
2012-07-17 17:09:29 -04:00
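
The idea behind 39b17837b4 can be sketched in Python as follows. This is only an illustration, not the implementation of hb_buffer_normalize_glyphs(): the `Glyph` tuple and `normalize_clusters` are hypothetical names, sorting by glyph id is an assumed choice of "nominal order", and glyph positions are ignored entirely.

```python
from itertools import groupby
from typing import NamedTuple

class Glyph(NamedTuple):
    gid: int      # glyph id (hypothetical field)
    cluster: int  # cluster this glyph belongs to

def normalize_clusters(glyphs):
    """Sort each run of same-cluster glyphs by glyph id; keep cluster order."""
    out = []
    for _, run in groupby(glyphs, key=lambda g: g.cluster):
        out.extend(sorted(run, key=lambda g: g.gid))
    return out

# Two outputs that differ only in glyph order inside cluster 0.
a = [Glyph(30, 0), Glyph(10, 0), Glyph(7, 1)]
b = [Glyph(10, 0), Glyph(30, 0), Glyph(7, 1)]
assert normalize_clusters(a) == normalize_clusters(b)
```

After normalization the two outputs compare equal, which is what makes this kind of pass useful when diffing shaper results in tests.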
Behdad Esfahbod
25e302da9a
[Indic] Minor
2012-07-17 14:25:14 -04:00
Behdad Esfahbod
5d32690a34
[Indic] For scripts without Half forms, always choose first consonant as base
...
In such scripts (ie. Khmer), a ZWJ/ZWNJ shouldn't stop the search for
base. So, instead just choose the first consonant as base directly.
Test sequence:
U+1798,200c,U+17C9,U+17D2,U+179B,U+17C1,U+17C7
2012-07-17 14:23:28 -04:00
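
As a rough illustration of 5d32690a34, the Python sketch below picks the first consonant as base for the test sequence given in the commit message. The function names are hypothetical, and the consonant check is an assumption limited to the Khmer consonant range U+1780..U+17A2; the real shaper uses its own character categories.

```python
def is_khmer_consonant(cp):
    # Assumption: Khmer consonants occupy U+1780..U+17A2.
    return 0x1780 <= cp <= 0x17A2

def first_consonant_base(codepoints):
    """Return the index of the first consonant, ignoring any ZWJ/ZWNJ after it."""
    for i, cp in enumerate(codepoints):
        if is_khmer_consonant(cp):
            return i
    return None

# Test sequence from the commit message.
seq = [0x1798, 0x200C, 0x17C9, 0x17D2, 0x179B, 0x17C1, 0x17C7]
assert first_consonant_base(seq) == 0
```

Because the search never looks past the first consonant, the ZWNJ at index 1 cannot affect which glyph becomes the base.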