Sinhala and Telugu use "explicit" reph. That is, the reph is formed by
a Ra,H,ZWJ sequence. Previously, upon detecting this sequence, we were
checking checking whether the 'rphf' feature applies to the first two
glyphs of the sequence. This is how the Microsoft fonts are designed.
However, testing with Noto shows that apparently Uniscribe also forms
the reph if the lookup ligates all three glyphs. So, try both
sequences.
Doesn't affect test results for Sinhala or Telugu.
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=232
The grammar in the OT spec, and the existing Windows implementation
seem to be confused around where to allow Asat around the medial
consonants.
The previous grammar for medial group was allowing an Asat after
the medial group only if there was a medial Wa or Ha, but not if
there was only a medial Ya. This doesn't make sense to me and
sounds reversed, as both medial Wa and Ha are below marks while
Asat is an above mark. An Asat can come before the medial group
already (in fact, multiple ones can. Why?!). The medial Ya
however is a spacing mark and according to Roozbeh it's valid
to want an Asat on the medial Ya instead of the base, so it looks
to me like we want to allow an Asat after the medial group if
there *was* a Ya but not if there wasn't any. Not wanting to
produce dotted-circle where Windows is not, this commit changes
the grammar to allow one Asat after the medial group no matter
what comes in the group.
Test: U+1002,103A,103B vs U+1002,103B,103A
Before we were just relying on the compiler inlining them and not
leaving a trace in our public API. Try to fix. Hopefully not
breaking anyone's build.
commit b5a0f69e47
Author: Behdad Esfahbod <behdad@behdad.org>
Date: Thu Oct 17 18:04:23 2013 +0200
[indic] Pass zero-context=false to would_substitute for newer scripts
For scripts without an old/new spec distinction, use zero-context=false.
This changes behavior in Sinhala / Khmer, but doesn't seem to regress.
This will be useful and used in Javanese.
The *intention* was to change zero-context from true to false for scripts that
don't have old-vs-new specs. However, checking the code, looks like we
essentially change zero-context to always be true; ie. we only changed things
for old-spec, and we broke them. That's what causes this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=76705
The root of the bug is here:
/* Use zero-context would_substitute() matching for new-spec of the main
* Indic scripts, but not for old-spec or scripts with one spec only. */
bool zero_context = indic_plan->config->has_old_spec || !indic_plan->is_old_spec;
Note that is_old_spec itself is:
indic_plan->is_old_spec = indic_plan->config->has_old_spec && ((plan->map.chosen_script[0] & 0x000000FF) != '2');
It's easy to show that zero_context is now always true. What we really meant was:
bool zero_context = indic_plan->config->has_old_spec && !indic_plan->is_old_spec;
Ie, "&&" instead of "||". We made this change supposedly to make Javanese
work. But apparently we got it working regardless! So I'm going to fix this
to only change the logic for old-spec and not touch other cases.
This is a higher-priority shaper than default shaper ("ot"), but
only picks up fonts that have AAT "morx"/"mort" table.
Note that for this to work the font face's get_table() implementation
should know how to return the full font blob.
Based on patch from Konstantin Ritt.
Not exhaustively tested, but I think I got the intended logic
right.
The logic can perhaps be simplified. Maybe we should disabled
normalization with this shaper. Then again, for now focusing on
correctness.
When seeing U+2044 FRACTION SLASH in the text, find decimal
digits (Unicode General Category Decimal_Number) around it,
and mark the pre-slash digits with 'numr' feature, the post-slash
digits with 'dnom' feature, and the whole sequence with 'frac'
feature.
This beautifully renders fractions with major Windows fonts,
and any other font that implements those features (numr/dnom is
enough for most fonts.)
Not the fastest way to do this, but good enough for a start.
CoreText does automatic font fallback (AKA "cascading") for characters
not supported by the requested font, and provides no way to turn it off,
so detect if the returned run uses a font other than the requested one
and fill in the buffer with .notdef glyphs instead of random indices
glyph from a different font.
The spec and Uniscribe don't allow these, but UTN#11
specifically says the sequence U+104B,U+1038 is valid.
As such, allow all "P V" sequences. There's about
eight sequences that match that structure, but Roozbeh
thinks it's fine to allow all of them.
Test case: U+104B, U+1038
https://bugs.freedesktop.org/show_bug.cgi?id=71947
The spec and Uniscribe treat it as consonant in the grammar, but
it's not in IndicSyllableCategory.txt, so fix up.
Test sequence: U+1004,U+103A,U+1039,U+104E
https://bugs.freedesktop.org/show_bug.cgi?id=71948
This is broken sequence according to OpenType spec, Uniscribe,
and current HarfBuzz implementation. But Roozbeh says this
is a valid sequence, so allow it. There are multiple
"(DB As?)?" constructs in the grammar, but Roozbeh thinks only
this one needs changing.
Test case: 1014,1063,103A
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=71949
Based on research into latest SIL and Windows fonts, pulling in
the latest OpenType language tag proposal from Microsoft, and updating
to latest language tags and names from ISO 639.
This reverts commit d5bd0590ae.
The reasoning behind that logic was flawed and made under
a misunderstanding of the original problem, and caused
regressions as reported by Jonathan Kew in thread titled
"tibetan marks" in Oct 2013. Apparently I have had fixed
the original problem with this commit:
7e08f1258d
So, revert the faulty commit and everything seems to be in good
shape.
For Javanese (pref_len == 1) only reorder if it didn't ligate. That's
sensible, and what the spec says. For other Indic (pref_len > 1)
only reorder if ligated.
Doesn't change any test numbers.
Bug 58714 - Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not provide
same results as Windows8
https://bugs.freedesktop.org/show_bug.cgi?id=58714
Test with U+0CB0,U+200D,U+0CCD,U+0C95,U+0CBF and tunga.ttf.
Improves some scripts. Improves Bengali too, but numbers
are up because we produce better results than Uniscribe for some
sequences now.
New numbers:
BENGALI: 353724 out of 354188 tests passed. 464 failed (0.131004%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048140 out of 1048334 tests passed. 194 failed (0.0185056%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)