Behdad Esfahbod
2278eefcdb
[Indic] In Sinhala, form forced Reph even if no other consonant found
...
Fixes another 10 Sinhala failures. Down to 148 (0.0544424%).
2012-07-24 00:31:10 -04:00
Behdad Esfahbod
71fd5e80ad
[Indic] Further adjust base algorithm for Sinhala
...
Apparently if there is C,V,ZWJ,C, the first C will be base, but if
it's C,ZWJ,V,C, the second one will be.
Note that Uniscribe implements this differently, by breaking syllable in
the case of C,ZWJ,V,C and putting the first consonant in one syllable
and the rest in the next syllable.
Sinhala failures down from 208 to 158 (0.0581209%). No changes to
Khmer.
2012-07-24 00:21:16 -04:00
Behdad Esfahbod
34c215036f
[Indic] Improve Sinhala base algorithm and reph positioning
...
Sinhala does not have half forms. And most (all?) consonants can be
base, except when preceded by ZWJ, which would request a subjoined form.
Hence switch the base algorithm to categorize with Khmer, start search
at start, and stop at a ZWJ.
Also, mark all pos=base consonants after base to be subjoined. Mark
base itself to have pos=base.
Finally, adjust Sinhala's reph position to after-main.
Brings down Sinhala failures from 455 to 328 (0.120656%).
2012-07-23 23:51:29 -04:00
Behdad Esfahbod
49c5ec5144
Minor refactoring
2012-07-23 20:14:13 -04:00
Behdad Esfahbod
c3e6fdc379
[Indic] Improve check on ligatures
...
Only skip actual ligatures, not marks in-between ligature components.
2012-07-23 20:11:42 -04:00
Behdad Esfahbod
771a8f5028
[Indic] exclude ligatures when matching on Indic category
...
If, say, a H,ZWJ,C ligature was formed, we don't want the code to detect
that as a Halant. So, ignore ligatures when matching category in
final_reordering.
Sinhala failures down from 514 to 455 (0.167374%).
2012-07-23 20:09:30 -04:00
Behdad Esfahbod
baacd090df
[Indic] Minor refactoring
2012-07-23 19:51:48 -04:00
Behdad Esfahbod
c7c4de2fb9
[Indic] Remove syllable length check before sorting
...
We now limit syllable lengths in the machine. No need to match here.
2012-07-23 18:25:02 -04:00
Behdad Esfahbod
2cc933aff9
[Indic] Fix cluster formation with left-matras and conjunct forms
...
Test case was: <U+0D15,U+0D4D,U+0D15,U+0D4A>.
2012-07-23 08:23:44 -04:00
Behdad Esfahbod
e6b01a878c
[Indic] Further streamline cluster formation
...
This should address all possible cluster misformations that I had in
mind.
2012-07-23 00:11:26 -04:00
Behdad Esfahbod
7b2a7dadd6
[Indic] Merge clusters before sorting
...
This should fix any instabilities in cluster formation that we were
speculating may happen with surrounding syllables. Or most of it
perhaps.
2012-07-22 23:58:55 -04:00
Behdad Esfahbod
abb3239ef9
[Indic] Update clusters for left-matra even if matra didn't move
...
Fixes crashes reported with left matra under
non-uniscribe-bug-compatibility mode.
2012-07-22 23:55:19 -04:00
Behdad Esfahbod
92a1ad7bef
[Indic] Stop searching for base if a post form is found before below form
...
Improves Bengali and Gurmukhi. Malayalam regressed a bit. We will deal
with that later.
2012-07-20 18:55:15 -04:00
Behdad Esfahbod
4c450c703f
[Indic] Recompose Bengali Ya,Nukta
...
This is a bunch of hacks for now.
Improves Bengali a bit.
2012-07-20 18:13:04 -04:00
Behdad Esfahbod
34ae336f3f
[Indic] Improve Reph AfterMain positioning
...
Fixes 20 out of 48 failing Oriya tests. Failure rate down to 0.066% now.
2012-07-20 16:17:28 -04:00
Behdad Esfahbod
bdd080431a
[Indic] Reposition Oriya Candrabindu
...
Oriya failures down from 0.65% to 0.20%.
2012-07-20 16:03:09 -04:00
Behdad Esfahbod
5f0eaaad12
[Indic] Fix base search in final_reordering
...
Fixes most Malayalam failures. Down from 1.6% to 0.38% now. Fixes a
few more in other scripts too.
2012-07-20 15:47:24 -04:00
Behdad Esfahbod
81202bd860
[Indic] Don't attach SM/VD to other characters
2012-07-20 15:14:51 -04:00
Behdad Esfahbod
f31d97e44e
[Indic] Form Telugu Reph out of Ra,Virama,ZWJ
...
Apparently this was approved in Feb 2012. No font yet.
2012-07-20 14:13:35 -04:00
Behdad Esfahbod
30c3d5e9fc
[Indic] Simplify Uniscribe cluster emulation
...
Now that we break syllables on Halant,ZWNJ, this code can be simplified.
2012-07-20 13:56:32 -04:00
Behdad Esfahbod
decf6ffca4
[Indic] Minor!
2012-07-20 13:51:31 -04:00
Behdad Esfahbod
9e4f94a72c
[Indic] Break syllables at Halant,ZWNJ
...
That's really what Uniscribe does, and explains a lot of peculiarities of
Halant,ZWNJ before the base.
Sent Telugu from 1% failures to 0.03%. Improved Kannada and Malayalam
slightly. Fixed half of Bengali, and did NOT break anything!
2012-07-20 13:48:03 -04:00
Behdad Esfahbod
2c372b80f6
[Indic] Better check for applying 'init'
...
Specifically, don't apply 'init' if previous char is a joiner.
Fixes some more of Bengali.
2012-07-20 13:37:48 -04:00
Behdad Esfahbod
8ed248de77
[Indic] Minor
2012-07-20 11:42:24 -04:00
Behdad Esfahbod
d0e68dbd0b
[Indic] Implement reph positioning step 5
...
Not tuned, just copied from step 2. Fixes another 0.5% of Kannada
failures. 1% to go.
2012-07-20 11:25:41 -04:00
Behdad Esfahbod
a9e45c32e4
[Indic] Don't let ZWNJ at the end of syllable affect base search
...
Fixes a few Devanagari, half of remaining Kannada failures, quarter for
Telugu, and others slightly improved or unchanged.
2012-07-20 11:04:15 -04:00
Behdad Esfahbod
20b68e699f
[Indic] Apply 'cjct' globally
...
Fixes 5 Devanagari failures, and no regressions.
2012-07-20 10:47:46 -04:00
Behdad Esfahbod
51e764de44
[Indic] Unbreak old scriptures
...
Brings down failures with Lohit-Telugu from 57% to 1.40%.
2012-07-20 10:30:24 -04:00
Behdad Esfahbod
900cf3d449
Minor
2012-07-20 10:18:23 -04:00
Behdad Esfahbod
87cd63266e
[Indic] Recategorize some Kannada right matras
...
Kannada failures down from 3.5% to 2.93%.
2012-07-19 21:25:46 -04:00
Behdad Esfahbod
3604d64ced
[Indic] Recategorize GURMUKHI ADDAK
...
It's not in IndicSyllabicCategory.txt. Fixes most of Gurmukhi failures.
Failures down from 7.7% to 0.222%!
2012-07-19 21:13:04 -04:00
Behdad Esfahbod
5249f3aee1
[Indic] Unbreak Khmer
...
For Khmer, all consonants are subjoining. No need to look in the font.
We were looking in the wrong order anyway.
2012-07-19 20:30:22 -04:00
Behdad Esfahbod
e0475345d5
[Indic] Apply 'akhn' globally
...
Fixes 1.5% more failures for Telugu, 2% for Kannada.
Breaks one test in Devanagari.
2012-07-19 20:24:14 -04:00
Behdad Esfahbod
fa247ebe52
[Indic] Better position U+0CD5
...
Fixes another 5% of Kannada failures.
2012-07-19 19:52:19 -04:00
Behdad Esfahbod
f055442716
[Indic] Lookup consonant position in the font
...
Fixes most failures of Oriya, and improves others a bit.
2012-07-19 16:20:21 -04:00
Behdad Esfahbod
8c973ebf0f
[Indic] Implement per-script matra positioning
...
Following what the spec says.
Brings down Telugu failures from 40% to 3.75%, and Kannada failures from
44% to 10%. Does NOT affect other scripts' test results.
2012-07-19 13:25:08 -04:00
Behdad Esfahbod
8bb32458f9
[Indic] More refactoring
2012-07-19 13:04:44 -04:00
Behdad Esfahbod
9ccc6382ba
[Indic] Minor refactoring
2012-07-19 12:45:31 -04:00
Behdad Esfahbod
be8b9f5f71
[Indic] Start refactoring different matra positions per script
2012-07-19 12:11:12 -04:00
Behdad Esfahbod
10cdc94eee
[Indic] In final reordering, find base, even if it disappeared
...
POS_BASE can disappear if base ligated backward. Define base as last
with position not after base.
Fixes a few hundred of Sinhala failures with Iskoola Pota.
2012-07-18 17:43:23 -04:00
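A minimal sketch of the "last with position not after base" rule described in this commit, written in Python for brevity rather than in the shaper's own language. The position constants and the `find_base` name are hypothetical; only the relative order of the position values matters here, and the real shaper's data structures are not shown in this log.

```python
# Hypothetical position values; only their relative order matters.
POS_PRE_BASE = 1
POS_BASE = 2
POS_AFTER_BASE = 3


def find_base(positions):
    """Return the index of the last glyph whose position is not after base.

    This still finds a base when no glyph carries POS_BASE any more,
    for example because the base ligated backward into an earlier glyph.
    Returns None if every glyph is positioned after base.
    """
    base = None
    for i, pos in enumerate(positions):
        if pos <= POS_BASE:  # "not after base"
            base = i
    return base
```

With this definition, a syllable whose POS_BASE glyph has disappeared still yields a usable base index: the last pre-base glyph is chosen instead.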
Behdad Esfahbod
9c4d24a3a6
[Indic] Minor
2012-07-18 17:29:10 -04:00
Behdad Esfahbod
3285e107c9
[Indic] Implement Sinhala "Al Lakuna" Reph behavior
...
In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.
2012-07-18 17:22:14 -04:00
Behdad Esfahbod
552d19b7a1
[Indic] Treat Register Shifters like Nukta
...
Really this time.
Fixes another 18 Khmer tests.
2012-07-18 16:02:33 -04:00
Behdad Esfahbod
e8cd81f76d
[Indic] Minor
2012-07-18 16:00:20 -04:00
Behdad Esfahbod
69f26bf39c
[Indic] Fix Matra reordering when base is at end of syllable
...
For example: U+915,U+200c,U+93f
Fixes last Tamil failure!
2012-07-18 15:47:51 -04:00
Behdad Esfahbod
075d671f10
[Indic] Fix out-of-bounds array access
2012-07-18 15:41:53 -04:00
Behdad Esfahbod
14dbdd9e39
[Indic] Unbreak Tamil
...
Tamil has only about 150 failures now!
2012-07-18 13:13:03 -04:00
Behdad Esfahbod
db8981f1e0
[Indic] Position Khmer Robat
...
It's a visual Repha.
Still not positioning logical Repha as occurs in Malayalam.
Another 200 Khmer failures fixed. 547 to go. That's better than
Devanagari!
2012-07-17 23:42:04 -04:00
Behdad Esfahbod
25bc489498
[Indic] Better categorize Register Shifters and Khmer Various signs
...
Down another 500 or so Khmer failures!
2012-07-17 17:53:03 -04:00
Behdad Esfahbod
25e302da9a
[Indic] Minor
2012-07-17 14:25:14 -04:00
Behdad Esfahbod
5d32690a34
[Indic] For scripts without Half forms, always choose first consonant as base
...
In such scripts (i.e. Khmer), a ZWJ/ZWNJ shouldn't stop the search for
base. So, instead just choose the first consonant as base directly.
Test sequence:
U+1798,U+200C,U+17C9,U+17D2,U+179B,U+17C1,U+17C7
2012-07-17 14:23:28 -04:00
Behdad Esfahbod
0201e0a464
[Indic] Apply 'cfar' for Khmer
...
Mark stuff after a pre-base reordering Ro 'cfar'. Used in Khmer.
This allows distinguishing the following cases with MS Khmer fonts:
U+1784,U+17D2,U+179A,U+17D2,U+1782
U+1784,U+17D2,U+1782,U+17D2,U+179A
2012-07-17 13:56:24 -04:00
Behdad Esfahbod
55f70ebfb9
[Indic] Position final subjoined consonants (and vowels) after matras
...
In Khmer, a final subjoined consonant or independent vowel can occur
after matras. This final subjoined thing should NOT be reordered to
before the matra even though it's subjoined.
Fixes another 1k of the Khmer failures. Not much left really.
2012-07-17 12:50:13 -04:00
Behdad Esfahbod
c50ed71e9a
[Indic] Recategorize Khmer coeng sign as a separate category OT_Coeng
...
Amend the syllable structure to allow a final subscripted consonant
(Coeng+C) and a final subscripted independent vowel (Coeng+V).
Fixes another 2k of Khmer failures.
2012-07-17 11:54:28 -04:00
Behdad Esfahbod
deb521dee4
[Indic] Add a separate Coeng class
...
No characters recategorized yet. No semantic change.
2012-07-17 11:37:32 -04:00
Behdad Esfahbod
74ccc6a132
[Indic] Move Halant with after-base consonants
...
Normally, we attach the Halant to the previous character and move it
with it. For after-base consonants however, the Halant "belongs" to the
consonant after, so attach it so.
This fixes Bengali sequences involving post-base consonant Ya, which
should ligate with the Halant to form Ya Phala, but previously a
reordered matra was blocking the ligation.
2012-07-17 11:16:19 -04:00
Behdad Esfahbod
d5c4edcdd6
[Indic] Apply presentation-forms features all at once
...
Seems like this is what Uniscribe is doing, and does not break any fonts
we tested (with Devanagari, Malayalam, Khmer, and Bengali), while fixing
some Ra Phala sequences for Bengali with Vrinda. Fixes another 2% of
Bengali failures (a couple more to go).
2012-07-17 10:40:59 -04:00
Behdad Esfahbod
af92b4cc90
[Indic] Disable 'kern' in Uniscribe bug compatibility mode
...
Uniscribe does not apply 'kern' in the Indic module. Some of the Khmer
fonts they ship have small adjustments in the 'kern' table. Disable
'kern' in the Indic module under Uniscribe bug compatibility mode.
Fixes some 10% of the Khmer failures. Remains under 3% (excluding
dotted-circle ones).
2012-07-16 20:31:24 -04:00
Behdad Esfahbod
d96838ef95
Allow complex shapers overriding common features
...
In a new callback... Currently unused by all complex shapers.
2012-07-16 20:26:57 -04:00
Behdad Esfahbod
df50b84740
[Indic] Categorize other Khmer marks
...
Mark them the same as the Register Shifters for now. Need to rename
that category to something more sensible after all is settled.
Fixes another percent of Khmer failures. Down to under 3%!
2012-07-16 20:14:50 -04:00
Behdad Esfahbod
8e7b5882fb
[Indic] Recognize pre-base reordering Ra anywhere in the syllable
...
We were doing that only immediately after base.
Fixes another percent in the Khmer failures. About three more to go...
2012-07-16 17:04:46 -04:00
Behdad Esfahbod
7d09c98a1f
[Indic] Recognize Register Shifter marks
...
Fixes another 6% of the Khmer failures.
2012-07-16 16:45:22 -04:00
Behdad Esfahbod
78818124b1
[Indic] Reorder pre-base reordering Ra
...
Brings down Malayalam failures from 14% down to 3%.
2012-07-16 15:49:08 -04:00
Behdad Esfahbod
1a1dbe9a27
[Indic] Rename
2012-07-16 15:41:33 -04:00
Behdad Esfahbod
46e645ec4b
[Indic] Start implementing pre-base reordering
2012-07-16 15:30:05 -04:00
Behdad Esfahbod
921ce5b17d
[Indic] Rename
...
No semantic change.
2012-07-16 15:26:56 -04:00
Behdad Esfahbod
b504e060f0
[Indic] Implement After-Main Reph positioning
...
Almost...
2012-07-16 15:21:12 -04:00
Behdad Esfahbod
17d7de91d7
[Indic] Apply 'pref' to pre-base reordering Ra
...
No reordering yet.
2012-07-16 15:20:15 -04:00
Behdad Esfahbod
362d3db8d3
[Indic] Minor
...
Should not be any semantic change. In preparation for implementing
pre-base reordering Ra.
2012-07-16 15:15:28 -04:00
Behdad Esfahbod
70fe77bb9a
Minor
2012-07-16 14:52:18 -04:00
Behdad Esfahbod
2f903215c5
Minor
2012-07-16 13:54:43 -04:00
Behdad Esfahbod
a3e04bee2c
[Indic] Reorder virama only for old Indic spec
2012-07-16 13:47:19 -04:00
Behdad Esfahbod
0de771b72d
[Indic] Categorize Khmer consonants
2012-07-16 13:39:36 -04:00
Behdad Esfahbod
29f106d7fb
[Indic] Apply Above Forms
2012-07-16 12:05:35 -04:00
Behdad Esfahbod
a2b471df82
Remove static initializers from indic
2012-06-05 15:17:44 -04:00
Behdad Esfahbod
9fc7a11469
Remove comma at the end of enum
...
As reported by Jonathan Kew on the list.
2012-06-04 08:28:19 -04:00
Behdad Esfahbod
27aba594c9
Minor
2012-05-24 15:00:01 -04:00
Behdad Esfahbod
1d6846db9e
[Indic] Apply vatu feature after cjct
...
Testing with old Deva spec this reduces failures.
Test sequence: U+0915,U+094D,U+0930.
2012-05-13 18:09:29 +02:00
Behdad Esfahbod
617f4ac46f
Refactor
2012-05-13 16:48:03 +02:00
Behdad Esfahbod
5e4e21fce4
Revert "[Indic] Refactoring"
...
This reverts commit 0831061efb.
2012-05-13 16:46:08 +02:00
Behdad Esfahbod
3f18236a03
Fix more warnings
2012-05-13 16:20:10 +02:00
Behdad Esfahbod
9f377ed321
Fix more unused-var warnings
2012-05-13 16:13:44 +02:00
Behdad Esfahbod
eace47b173
Minor
2012-05-13 15:54:43 +02:00
Behdad Esfahbod
737dded2e0
Fix compiler warnings
2012-05-12 15:40:11 +02:00
Behdad Esfahbod
7f852b644b
Fix compiler warnings
2012-05-11 23:10:31 +02:00
Behdad Esfahbod
6a091df9b4
[Indic] Disambiguate sub vs post vs above matras
...
Bengali is at *just* above 5% now.
2012-05-11 21:42:27 +02:00
Behdad Esfahbod
9d0d319a4a
[Indic] Position Bengali Reph before matras
2012-05-11 21:36:32 +02:00
Behdad Esfahbod
f893672511
[Indic] Start categorizing Reph per script
2012-05-11 21:10:03 +02:00
Behdad Esfahbod
a913b024d8
[Indic] Apply 'init' feature for Bengali
...
Error down from 20% to 7%.
2012-05-11 20:59:26 +02:00
Behdad Esfahbod
eed903b164
[Indic] Refactor for the arrival of 'init' feature
...
Yep, on Bengali now!
2012-05-11 20:50:53 +02:00
Behdad Esfahbod
18c06e189b
[Indic] Add Uniscribe bug feature for dotted circle
...
For dotted-circle independent clusters, Uniscribe does no Reph shaping
for the exact sequence Ra+Halant+25CC. Which also is the only possible
sequence with 25CC at the end.
2012-05-11 20:02:14 +02:00
Behdad Esfahbod
0831061efb
[Indic] Refactoring
2012-05-11 19:07:58 +02:00
Behdad Esfahbod
7ea58db311
Minor
2012-05-11 18:58:57 +02:00
Behdad Esfahbod
3399a06e70
[Indic] Fix U+0952 and similar classification to match Uniscribe
...
See comments.
2012-05-11 17:54:26 +02:00
Behdad Esfahbod
11aa3ef18d
[Indic] Treat U+0951..U+0954 all similar to U+0952
2012-05-11 17:30:48 +02:00
Behdad Esfahbod
892eb78782
[Indic] Implement Uniscribe Reph+Matra+Halant bug feature
2012-05-11 16:54:40 +02:00
Behdad Esfahbod
67ea29af49
[Indic] Add example of different Uniscribe behavior
2012-05-11 16:51:23 +02:00
Behdad Esfahbod
ebe29733d4
[Indic] Add runtime Uniscribe bug compatibility mode!
...
Enable by setting envvar:
HB_OT_INDIC_OPTIONS=uniscribe-bug-compatible
Plus, LeftMatra+Halant "feature".
2012-05-11 16:43:12 +02:00
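A small sketch of how a program might check this runtime switch, in Python for brevity. The variable name and value are taken from the commit message above; the `uniscribe_bug_compatible` helper and the exact-match comparison are assumptions, since the log does not say how the shaper itself parses the value.

```python
import os


def uniscribe_bug_compatible(environ=None):
    """Return True if the compatibility mode is requested via the environment.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    Exact string matching is an assumption made for this sketch.
    """
    env = os.environ if environ is None else environ
    return env.get("HB_OT_INDIC_OPTIONS", "") == "uniscribe-bug-compatible"
```

Taking the environment as a parameter keeps the check free of global state, so both the enabled and disabled cases can be exercised without touching the real process environment.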
Behdad Esfahbod
616e692e29
[Indic] Add #define UNISCRIBE_BUG_COMPATIBLE 1
2012-05-11 16:25:02 +02:00
Behdad Esfahbod
6782bdae3b
[Indic] Fix Left Matra + Halant reordering
...
As can be seen in: U+092B,U+093F,U+094D
2012-05-11 16:23:43 +02:00
Behdad Esfahbod
3c2ea9481b
Minor
2012-05-11 16:23:38 +02:00
Behdad Esfahbod
668c6046c1
[Indic] Apply Reph mask to all POS_REPH glyphs
...
Needed for upcoming changes to GSUB/GPOS mask matching.
2012-05-11 15:34:13 +02:00
Behdad Esfahbod
cee7187447
[Indic] Move syllable tracking from Indic to generic layer
...
This is to incorporate it into GSUB/GPOS processing.
2012-05-11 11:41:39 +02:00
Behdad Esfahbod
3bf27a9f0e
[Indic] Disable conjuncts when a ZWJ happens
...
Not that the code makes any difference since the presence of ZWJ itself
causes the ligature to fail to match anyway.
2012-05-11 11:17:23 +02:00
Behdad Esfahbod
c6d904d67d
[Indic] Fix bitops typo!
...
Another 1000 down!
2012-05-11 11:07:40 +02:00
Behdad Esfahbod
02b2922fbf
[Indic] Towards better Reph positioning
...
Fixed for Deva cases with two full-form consonants. Failures **way** down.
Not much left to go :-).
2012-05-10 21:44:50 +02:00
Behdad Esfahbod
2b70df5cc0
[Indic] Add note re Uniscribe clusters
2012-05-10 18:38:22 +02:00
Behdad Esfahbod
21d2803133
[Indic] Do clustering like Uniscribe does
...
Hindi Wikipedia failures down to 6639 (0.938381%)!
2012-05-10 18:34:34 +02:00
Behdad Esfahbod
8df5636968
[Indic] Reorder Reph to before the Halant after Matras
...
Uniscribe doesn't do it, but we want to, as it gives the Reph the
opportunity to interact with the Matras. Test with mangal for example.
Sequence: <0930,094d,0915,094b,094d>
In test suite already.
2012-05-10 15:41:04 +02:00
Behdad Esfahbod
daf3234bdc
[Indic] Don't clear the mask for Reph
...
This was removing the mandatory global 1 bit in the mask and hence
disabling GPOS for Reph!
2012-05-10 15:28:27 +02:00
Behdad Esfahbod
7708ee23cb
[Indic] Improve Left Matra repositioning
...
Move its dependents too.
2012-05-10 14:48:25 +02:00
Behdad Esfahbod
dbb105883c
[Indic] Do Reph repositioning in final reordering like the spec says
...
This introduced a failure, which we tracked down to a test case like this:
U+092E,U+094B,U+094D,U+0930
The final character is a Ra that should be put in a syllable of its
own. And we do. But it will interact with the Halant before it. So
now we finally are convinced that we have to limit features to syllable
boundaries. That's coming after lunch!
2012-05-10 13:45:52 +02:00
Behdad Esfahbod
4705a70269
Minor
2012-05-10 13:09:08 +02:00
Behdad Esfahbod
4ac9e98d9d
[Indic] Reorder left matras to be closer to base
2012-05-10 12:53:53 +02:00
Behdad Esfahbod
1a1fa8c655
[Indic] Treat the standalone cluster case reusing the consonant logic
2012-05-10 12:21:30 +02:00
Behdad Esfahbod
190eb31a16
[Indic] Minor
2012-05-10 12:21:30 +02:00
Behdad Esfahbod
c5306b6861
[Indic] Handle Vowel syllables
...
Reusing the consonant logic!
2012-05-10 12:21:30 +02:00
Behdad Esfahbod
6d8e0cb74c
[Indic] Simplify Reph logic
2012-05-10 11:41:51 +02:00
Behdad Esfahbod
3d25079f8d
[Indic] Don't form Reph if Ra is the only consonant in the syllable
2012-05-10 11:37:42 +02:00
Behdad Esfahbod
b99d63ae11
[Indic] Increase max syllable length
...
20 was way too low, one could hit a syllable with 7ish consonants with it.
2012-05-10 11:32:52 +02:00
Behdad Esfahbod
a391ff50b9
[Indic] Adjust base after sorting
2012-05-10 11:31:20 +02:00
Behdad Esfahbod
d3637edb24
[Indic] Don't return for long syllables. Just not sort.
2012-05-10 10:51:38 +02:00
Behdad Esfahbod
ef24cc8c8e
[Indic] Towards multi-cluster syllables and final reordering
2012-05-09 18:10:20 +02:00
Behdad Esfahbod
92332e5116
Minor
2012-05-09 17:40:00 +02:00
Behdad Esfahbod
dbccf87eef
[Indic] Make room for more reordering positions
2012-05-09 17:24:39 +02:00
Behdad Esfahbod
d4480ace7f
[Indic] Improve matra vs consonant ordering
...
Another 1.5% down.
2012-05-09 15:59:47 +02:00
Behdad Esfahbod
33c92e7695
[Indic] Categorize Anudatta
2012-05-09 15:41:51 +02:00
Behdad Esfahbod
19d984edaa
[Indic] Make sure Reph jumps over all matras to the right
...
Another 12 thousand failures gone! (78 to go)
2012-05-09 15:21:13 +02:00
Behdad Esfahbod
9034641333
[Indic] Keep Vedic signs at the right too
2012-05-09 15:04:58 +02:00
Behdad Esfahbod
d1deaa2f5b
Replace zerowidth invisible chars with a zero-advance space glyph
...
Like Uniscribe does.
2012-05-09 15:04:13 +02:00
Behdad Esfahbod
49e5da1591
[indic] Keep the syllable modifier marks to the right
...
Shaping failures on Hindi Wikipedia go down from 25% to 14%!
2012-05-09 13:23:27 +02:00
Behdad Esfahbod
76b3409de6
[indic] Better Reph matching
2012-05-09 11:52:32 +02:00
Behdad Esfahbod
df6d45c693
Minor
2012-05-09 11:38:31 +02:00
Behdad Esfahbod
412b91889d
[indic] Apply Indic features in order
2012-05-09 11:07:18 +02:00
Behdad Esfahbod
1ac075b227
[indic] Apply rakaar forms
...
Fixes 10% of the failures against all of Hindi Wikipedia!
2012-05-09 11:06:47 +02:00
Behdad Esfahbod
3ed4634ec3
Add Indic inspection tool
2012-04-19 22:35:01 -04:00
Behdad Esfahbod
a06411ecf9
Minor matra renumbering
...
Should have no visible effect.
2012-04-19 22:28:25 -04:00
Behdad Esfahbod
c65662b71e
Fix left-matra positioning in Indic
...
Fixes 200 failures out of previous 4290 cases in the OO.o Indic
dictionary (of ~16000 entries).
2012-04-12 09:31:55 -04:00
Behdad Esfahbod
acd88e659f
In Arabic fallback shaping, check that the font has glyph for new char
2012-04-10 18:02:20 -04:00
Behdad Esfahbod
11138ccff7
Add normalize mode
...
In preparation for Hangul shaper.
2012-04-05 17:25:19 -04:00
Behdad Esfahbod
e8eedf2687
Avoid enum trailing commas
...
Based on patch from Jonathan Kew.
2012-01-16 16:39:40 -05:00
Behdad Esfahbod
0a965eee88
Minor
2011-09-19 16:53:47 -04:00
Behdad Esfahbod
c605bbbb6d
Remove C++ guards from source files
...
They were causing issues for people with MSVC.
2011-08-04 20:00:53 -04:00
Behdad Esfahbod
a91c58bf98
[Indic] Disable CJCT-disabling logic
...
Read comment.
2011-08-01 16:30:11 -04:00
Behdad Esfahbod
5e72071062
[Indic] Stop looking for base upon seeing joiners
...
Not sure where this is documented, but I remember this being the desired
behavior.
test-shape-complex failures are down from 48 to 46. Meh.
2011-07-31 17:52:44 -04:00
Behdad Esfahbod
281683995a
Cosmetic
2011-07-31 16:00:35 -04:00
Behdad Esfahbod
6b37bc8084
[Indic] Fix ZWJ/ZWNJ application
...
Not quite working just yet. False alarm re 10 failures. It was
crashing. Ouch! Back to 48 failures.
2011-07-31 15:57:00 -04:00
Behdad Esfahbod
e7be057024
[Indic] Add Final Reordering rules into comments
...
Not applied yet.
2011-07-31 15:22:46 -04:00
Behdad Esfahbod
cfd4382ec1
[Indic] Handle Reph when determining base consonant
2011-07-31 15:08:40 -04:00
Behdad Esfahbod
97158392a5
[Indic] Ra is a consonant too
2011-07-31 15:01:28 -04:00
Behdad Esfahbod
0d8f8a177c
[Indic] Fix reph inhibition logic
2011-07-31 14:57:59 -04:00
Behdad Esfahbod
9da0487cd4
[Indic] Support ZWJ/ZWNJ
...
Brings test-shape-complex failures down from 52 to 10!
I hereby declare harfbuzz-ng supporting Indic!
2011-07-31 13:46:44 -04:00
Behdad Esfahbod
9ee27a928a
[Indic] Suppress reph formation upon joiners
2011-07-31 11:10:14 -04:00
Behdad Esfahbod
8354e004e5
Un-Ra U+09F1. According to the test suite this is correct.
...
But I'm not sure... Down from 54 failures to 52.
2011-07-31 02:24:51 -04:00
Behdad Esfahbod
ba7e85c104
Cosmetic
2011-07-30 21:11:53 -04:00
Behdad Esfahbod
f5bc2725cb
[Indic] For old-style Indic tables, move Halant around
...
In old-style Indic OT standards, the post-base Halants are moved after
their base. Emulate that by moving first post-base Halant to
post-last-consonant.
Brings test-shape-complex failures down from 88 to 54. Getting there!
2011-07-30 21:08:10 -04:00
Behdad Esfahbod
fd06bf5611
[Indic] Handle initial Ra+Halant in scripts that support Reph
...
Brings test-shape-complex failures down from 104 to 92. Way to go!
2011-07-30 20:14:44 -04:00
Behdad Esfahbod
ee58f3bc75
Minor
2011-07-30 19:15:53 -04:00
Behdad Esfahbod
352372ae5e
[Indic] Categorize Ra in scripts that have Reph
...
Is the categorization correct? I don't know.
2011-07-30 19:04:02 -04:00
Behdad Esfahbod
45d6f29f15
[Indic] Reorder matras
...
Number of failing shape-complex tests goes from 125 down to 94.
Next: Add Ra handling and it's fair to say we kinda support Indic :).
2011-07-30 14:44:30 -04:00
Behdad Esfahbod
743807a3ce
[Indic] Apply Indic features
...
Find the base consonant and apply basic Indic features accordingly.
Nothing complete, but does something for now. Specifically:
no Ra handling right now, and no ZWJ/ZWNJ.
Number of failing shape-complex tests goes from 174 down to 125.
Next: reorder matras.
2011-07-29 16:46:09 -04:00
Behdad Esfahbod
9f9bcceca6
Register buffer vars in Indic shaper
2011-07-28 17:07:50 -04:00
Behdad Esfahbod
b65c06025d
Formalize buffer var allocations
2011-07-28 16:49:29 -04:00
Behdad Esfahbod
02cdf743c2
Add prefer_decomposed() complex-shaper callback
...
This allows the Indic shaper to request decomposed characters. This will
handle split matra for free. Other shapers prefer precomposed
characters.
2011-07-21 12:23:12 -04:00
Behdad Esfahbod
a54a5505a3
Minor
2011-07-20 16:42:10 -04:00
Behdad Esfahbod
f6fd3780e1
Let shapers decide when to apply ccmp and locl
...
Instead of always applying those two features before the complex shaper,
let the complex shaper decide whether they should be applied first.
Also add stub for Indic's final_reordering().
2011-07-08 00:22:40 -04:00
Behdad Esfahbod
76f76812ac
Shuffle code around, remove shape_plan from complex shapers
2011-07-07 22:25:25 -04:00
Behdad Esfahbod
d69d5ceaa0
[Indic] Well, at least finding syllables works now :)
...
Still not much there.
2011-07-04 12:56:38 -04:00
Behdad Esfahbod
4ec30aec30
[Indic] Optimize Indic table storage
2011-06-28 14:13:38 -04:00
Behdad Esfahbod
8fdba506f0
[Indic] Define indic_position_t
2011-06-24 20:45:55 -04:00
Behdad Esfahbod
65988a145b
[Indic] Add a table of consonant positions
...
Copied from HarfBuzz.old Indic data. These are below and post
consonants. This is temporary. Read the comment in the patch.
2011-06-24 19:05:52 -04:00
Behdad Esfahbod
c7fe56a1d5
[Indic] Some of the basic features are global; Mark them so
2011-06-24 19:05:34 -04:00
Behdad Esfahbod
867361c3ad
[indic] Add syllable recognition state machine
...
Using an incredible tool called Ragel.
2011-06-17 18:35:46 -04:00
Behdad Esfahbod
422e08dbb8
Better categorize Indic character classes
...
Matches OT types now.
2011-06-15 17:22:48 -04:00
Behdad Esfahbod
b9452bfc16
Fix compiler warnings with -pedantic
2011-06-14 14:47:07 -04:00
Behdad Esfahbod
20503ccd57
More Indic data shuffling
2011-06-07 17:02:48 -04:00
Behdad Esfahbod
b9ddbd5593
[Indic] Start an Indic shaper
...
Nothing functional in there yet.
So far, we're parsing the IndicSyllabicCategory.txt and IndicMatraCategory.txt
files from the Unicode Character Database and storing them in an array to be used
by the shaper. Also hooked up the shaper, but it does not do anything
right now.
2011-06-02 17:43:12 -04:00