https://bugs.freedesktop.org/show_bug.cgi?id=91228
Commit cdcdfe61b9 changed two `#pragma message` to `#pragma error`
in hb-unicode.cc; however, MSVC uses #error,
just like the #else branch. `#pragma error` is an unknown pragma so
MSVC does not fail the build because of it, which I believe was the
intention of that commit.
If it's meant to be an #error, then the #ifdef for _MSC_VER can be
removed entirely.
Separate the loops for the two cases of replacing with space
and deleting. For deleting, use the out-buffer machinery.
Needed for upcoming cluster merge fix.
Applying unary minus operator to unsigned int causes the following
warning on MSVS:
warning C4146: unary minus operator applied to unsigned type, result still unsigned
Based on patch from Koji Ishii.
Fixes https://github.com/behdad/harfbuzz/pull/110
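One common way to sidestep C4146, sketched here with a hypothetical helper rather than the actual line this commit touches, is to subtract from zero instead of applying unary minus. Both forms compute the same modular negation on an unsigned value:

```cpp
#include <cassert>

// Hypothetical helper, not taken from the HarfBuzz sources: isolates the
// lowest set bit of an unsigned value.  Writing `x & -x` triggers C4146 on
// MSVC; `0u - x` yields the same result without unary minus.
inline unsigned int lowest_set_bit (unsigned int x)
{
  return x & (0u - x);
}

inline void lowest_set_bit_example ()
{
  assert (lowest_set_bit (0x28u) == 0x08u); // 101000b -> 001000b
  assert (lowest_set_bit (0u) == 0u);       // no bits set
}
```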
Previously, when creating an object from inert inputs (eg:
"hb_font_create(hb_face_get_empty())") we returned the inert
empty object. This is not helpful as there are legitimate
use cases to do that.
We now never return the inert object unless allocation failed.
Tests are revised to reflect.
s/atomic_int/atomic_int_impl/ and s/atomic_ptr/atomic_ptr_impl/
to bring it on par with hb_mutex_impl_t, then re-introduce
hb_atomic_int_t as a wrapper around hb_atomic_int_impl_t.
In hb_reference_count_t, make it clear the non-atomic get and set
are intentional due to nature of the cases they are used in
(comparison to -1 and the debug output/tracing).
In hb-coretext, when we were using scratch buffer for book-keeping,
a reverse_range() caused by the notdef-insertion loop could mess up
our log_clusters. Ouch!
Before, the IntType::cmp functions were providing this and were truncating
the hb_codepoint_t to 16bits before comparison. I have no idea how
this was never discovered, and I'm too lazy to try to reproduce this
with Pango (which uses non-16bit codepoint numbers for missing glyphs).
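To illustrate the kind of bug described above, here is a minimal sketch. The names are hypothetical and this is not the actual IntType code; it only assumes a 16-bit stored value compared against a 32-bit codepoint:

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t codepoint_t; // stand-in for hb_codepoint_t

// Buggy shape: the 32-bit argument is narrowed to 16 bits before comparing,
// so 0x10041 looks identical to 0x0041.
inline int cmp_truncating (uint16_t stored, codepoint_t glyph)
{
  uint16_t g = (uint16_t) glyph;
  return g < stored ? -1 : g == stored ? 0 : +1;
}

// Safer shape: widen the stored value and compare at full width.
inline int cmp_full_width (uint16_t stored, codepoint_t glyph)
{
  codepoint_t s = stored;
  return glyph < s ? -1 : glyph == s ? 0 : +1;
}

inline void cmp_example ()
{
  assert (cmp_truncating (0x0041, 0x10041u) == 0);  // false match
  assert (cmp_full_width (0x0041, 0x10041u) == +1); // correctly greater
}
```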
This makes a lot of code safer. We only try modifying the object in one
place, after making sure it's safe to do so. So, do a const_cast<> in
that one place...
Currently:
- Initializing skippy is very expensive,
- Our lookup accelerator (using set-digests) can be very ineffective.
As such, we end up many times initializing skippy but then failing
coverage check. Reordering fixes that.
When, later, we fix our accelerator to have truly small false-positive
rate (for example by using the frozen-sets), then we might want to
reorder these checks such that we wouldn't calculate coverage number
if skippy is going to fail.
This shows a 5% speedup with Roboto already.
I experimented with replacing use of hb_set_digest_t with this new
hb_frozen_set_t, hoping to get a huge speedup for busy lookups
(like kern lookup in Roboto), but I only got 6% speedup in Roboto
and 4% in NotoNastaliqUrduDraft :(.
This code is C++ only. There isn't a single C++ compiler that fails to
understand the "inline" keyword, since it's required by C++98. Any
compiler older than C++98 is likely to choke on the template usage
further down, so this isn't necessary.
Moreover, the C++ standard says you cannot define macros for keywords.
[lib.macro.names] says "Nor shall such a translation unit define macros
for names lexically identical to keywords." -- technically, it's a
promise that the Standard Library headers won't do it, but the wording means
that the entire translation unit won't do it, which implies no source
can do it.
MSVC complains about it:
fatal error C1189: #error : The C++ Standard Library forbids macroizing
keywords. Enable warning C4005 to find the forbidden macro.
Author: Thiago Macieira <thiago.macieira@intel.com>
This is by no means meant to promote non-Unicode encodings. This is an entry
point that takes Unicode codepoints that happen to all be the first
256 characters and hence fit in 8bit strings. This is useful eg in Chrome
where strings that can fit in 8bit are implemented that way, and this
avoids copying into UTF-8 or UTF-16.
Perhaps we should rename this to hb_buffer_add_codepoints8(). I'm also
curious if anyone would be really interested in hb_buffer_add_codepoints16().
Please discuss!
Roboto has glyphs (like 'F') that have 200 kerning pairs.
Add a handcoded bsearch instead of previous linear search.
This doesn't show much speedup though, apparently we spend the
bulk of the time somewhere before here.
For discussion see:
http://lists.freedesktop.org/archives/harfbuzz/2012-April/001905.html
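A handcoded binary search over sorted pair records could look roughly like this. The record layout and names are assumptions for illustration, not the actual table structures:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical pair record; records are assumed sorted by (left, right).
struct KernPair { uint16_t left, right; int16_t value; };

// Returns the adjustment for (left, right), or 0 if the pair is absent.
inline int kern_lookup (const KernPair *pairs, size_t count,
                        uint16_t left, uint16_t right)
{
  uint32_t key = (uint32_t (left) << 16) | right;
  size_t lo = 0, hi = count; // search the half-open range [lo, hi)
  while (lo < hi)
  {
    size_t mid = lo + (hi - lo) / 2;
    uint32_t k = (uint32_t (pairs[mid].left) << 16) | pairs[mid].right;
    if (k < key)      lo = mid + 1;
    else if (k > key) hi = mid;
    else              return pairs[mid].value;
  }
  return 0;
}

inline void kern_lookup_example ()
{
  const KernPair pairs[] = { {1, 2, -10}, {1, 5, 20}, {3, 1, -5} };
  assert (kern_lookup (pairs, 3, 1, 5) == 20);
  assert (kern_lookup (pairs, 3, 2, 2) == 0);
}
```

With 200 pairs this needs at most 8 probes instead of up to 200 comparisons, which is why it is worth doing even if the overall speedup is small.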
Over time we have added NO_HINTING all over the place in hb-ft. Finish it off.
Not setting ppem on hb-font disables get_contour_point() calls which is good anyway.
See comments in the commit.
When I originally wrote hb-ft, FreeType objects did not support reference
counting. As such, hb_ft_face_create() and hb_ft_font_create() had a
"destroy" callback and client was responsible for making sure FT_Face is
kept around as long as the hb-font/face are alive.
However, since this was not clearly documented, some clients didn't
do that correctly. In particular, some clients assumed that it's safe
to destroy FT_Face and then hb_face_t. This, indeed, used to work, until
45fd9424c7, which made face destroy access
font tables.
Now, I fixed that issue in 395b35903e since
the access was not needed, but the problem remains that not all clients
handle this correctly. See:
https://bugs.freedesktop.org/show_bug.cgi?id=86300
Fortunately, FT_Reference_Face() was added to FreeType in 2010, and so we
can use it now. Originally I wanted to change hb_ft_face_create() and
hb_ft_font_create() to reference the face if destroy==NULL was passed in.
That would improve pretty much all clients, with few undesired effects.
Except that FreeType itself, when compiled with HarfBuzz support, calls
hb_ft_font_create() with destroy==NULL and saves the resulting hb-font on
the ft-face (why does it not free it immediately?). Making hb-face
reference ft-face causes a cyclic reference there. At least, that's my
current understanding.
At any rate, a cleaner approach, even if it means all clients will need a
change, is to introduce brand new API. Which this commit does.
Some comments added to hb-ft.h, hoping to make future clients make better
choices.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=75299
"Fixes" https://bugs.freedesktop.org/show_bug.cgi?id=86300
Based on discussion with someone else who had a similar issue, most probably
the user is releasing FT_Face before destructing hb_face_t / hb_font_t.
While that's a client bug, and while we can (and should) use FreeType
refcounting to help avoid that, it happens that we were accessing
the table when we didn't really have to. Avoid that.
Fail if blob start plus length overflows; or if blob length
is greater than 2GB. It takes a while for fonts to get to that
size. In the mean time, it protects against bugs like this:
http://www.icu-project.org/trac/ticket/11450
Also avoids some weird issues with 32bit vs 64bit systems
as we accept length as unsigned int. As such, a length of
-1 will cause overflow on 32bit machines, but is happily
accepted on a 64bit machine. Avoid that.
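The two checks can be sketched as below. The function name and the exact limit constant are assumptions for illustration, not the sanitizer's actual code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: reject a blob whose length exceeds roughly 2GB, or
// whose start address plus length would wrap around the address space.
inline bool blob_range_ok (uintptr_t start, unsigned int length)
{
  if (length > 0x7FFFFFFFu)        return false; // larger than ~2GB
  if (start > UINTPTR_MAX - length) return false; // start + length overflows
  return true;
}

inline void blob_range_example ()
{
  assert (blob_range_ok (0x1000, 16));
  // A length of -1 is rejected regardless of pointer width.
  assert (!blob_range_ok (0x1000, (unsigned int) -1));
}
```

Testing `start > UINTPTR_MAX - length` rather than computing `start + length` keeps the check itself free of overflow.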
In Oriya, a ZWJ/ZWNJ might be added before candrabindu to encourage
or stop ligation of the candrabindu. This is clearly specified in
the Unicode section on Oriya. Allow it there. Note that Uniscribe
doesn't allow this.
Micro tests added using Noto Sans Oriya draft.
No changes in numbers. Currently at:
BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048147 out of 1048334 tests passed. 187 failed (0.0178378%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
Otherwise, we might process a lookup thousands of times, with no
benefit. This pathological case was hit by Noto Nastaliq Urdu Draft
in Firefox's code to determine whether space glyph is involved in
any GSUB/GPOS rules. A test page is at http://behdad.org/urdu
See:
https://bugzilla.mozilla.org/show_bug.cgi?id=1090869
Currently doesn't work though, we detect font fallback. Apparently
matching on ct_font is not safe for this. Looks like commit
25f4fb9b56 wasn't enough after all.
After 763e5466c0, one doesn't
need to set flags for different pieces of text. The flags now
are something the client sets up once, depending on how it
actually uses the buffer. As such, don't clear it in
clear_contents().
Tests updated.
We can't really resize buffer and continue in this shaper as we are
using the scratch buffer for string_ref and log_cluster. Restructure
shaper to retry from (almost) scratch.
Apparently those functions documented as sometimes returning NULL
actually exercise that right in OS X 10.10 Yosemite. The scratch
was too small for that. I *think* I fixed it, but haven't tested
as I don't have Yosemite.
Apparently they are not (advertised as?) safe on BSD systems.
We ignore the case of static libraries.
Whitelisted on glibc, Android, and MSVC / mingw.
https://bugs.freedesktop.org/show_bug.cgi?id=82246
The reason we turned it on is because Kazuraki uses it. But that's
not reason enough. Until the OpenType spec gets its act together re
adding design-direction to lookups, this is a better user experience.
Previously, we expected users to provide BOT/EOT flags when the
text *segment* was at paragraph boundaries. This meant that for
clients that provide full paragraph to HarfBuzz (eg. Pango), they
had code like this:
hb_buffer_set_flags (hb_buffer,
(item_offset == 0 ? HB_BUFFER_FLAG_BOT : 0) |
(item_offset + item_length == paragraph_length ?
HB_BUFFER_FLAG_EOT : 0));
hb_buffer_add_utf8 (hb_buffer,
paragraph_text, paragraph_length,
item_offset, item_length);
After this change such clients can simply say:
hb_buffer_set_flags (hb_buffer,
HB_BUFFER_FLAG_BOT | HB_BUFFER_FLAG_EOT);
hb_buffer_add_utf8 (hb_buffer,
paragraph_text, paragraph_length,
item_offset, item_length);
Ie, HarfBuzz itself checks whether the segment is at the beginning/end
of the paragraph. Clients that only pass item-at-a-time to HarfBuzz
continue not setting any flags whatsoever.
Another way to put it is: if there's pre-context text in the buffer,
HarfBuzz ignores the BOT flag. If there's post-context, it ignores
EOT flag.
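As a rough sketch of that rule (the flag constants and function below are hypothetical stand-ins, not the buffer's real internals):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for HB_BUFFER_FLAG_BOT / HB_BUFFER_FLAG_EOT.
static const unsigned int FLAG_BOT = 1u << 0;
static const unsigned int FLAG_EOT = 1u << 1;

// Drop BOT when there is pre-context and EOT when there is post-context.
inline unsigned int effective_flags (unsigned int flags,
                                     size_t pre_context_len,
                                     size_t post_context_len)
{
  if (pre_context_len)  flags &= ~FLAG_BOT;
  if (post_context_len) flags &= ~FLAG_EOT;
  return flags;
}

inline void effective_flags_example ()
{
  // First item of a paragraph: no pre-context, some post-context.
  assert (effective_flags (FLAG_BOT | FLAG_EOT, 0, 5) == FLAG_BOT);
}
```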
The table can now compile independently too. If we cannot make it work
on MSVC, we can always generate the data and distribute it.
The code now compiles cleanly with:
gcc -c -xc -std=c99 -Werror -pedantic hb-ot-shape-complex-arabic-win1256.hh
g++ -c -xc -std=c++1x -Werror -pedantic hb-ot-shape-complex-arabic-win1256.hh
See:
a97f537cec (commitcomment-7218736)
Bug 1045139 - The Arabic text with "MS Sans Serif" font is rendered bad
https://bugzilla.mozilla.org/show_bug.cgi?id=1045139
This is only enabled on Windows platforms, and requires support from
Uniscribe to work. But for clients that do hook up to Uniscribe, this
fixes shaping of Windows-1256-encoded bitmap fonts like "MS Sans Serif".
The code and table together have just less than a 1kb footprint when
enabled.
UNTESTED. I might even have broken regular Arabic fallback shaping.
Seems to be what Uniscribe does.
At this point I think it's worth checking our default...
Fixes Bug 76767 - Zeroing of advance of 2nd component of multiple
substitution with SBL Hebrew
https://bugs.freedesktop.org/show_bug.cgi?id=76767
Micro-test added.
Looks like Uniscribe responds to the 'mymr' tag by zeroing marks
GDEF_LATE instead of generic-shaper UNICODE_LATE. Implement that.
Fixes
Bug 81775 - Incorrect Rendering with harfbuzz-ng myanmar unicode
https://bugs.freedesktop.org/show_bug.cgi?id=81775
Micro-test added based on Padauk.
Follows the order of the Arabic/Syriac specs. Also don't stop
between rlig and calt in non-Arabic scripts.
Micro-tests for Arabic and Mongolian added for the latter.
We now handle U+FFFD replacement in hb_buffer_add_utf*(). Any other
manipulation can happen in user callbacks. No need for this.
efe74214bb (commitcomment-7039404)
This reverts commit efe74214bb.
Conflicts:
src/hb-ot-shape-normalize.cc
With this change, we now by default replace broken UTF-8/16/32 bits
with U+FFFD. This can be changed by calling new API on the buffer.
Previously the replacement value used to be (hb_codepoint_t)-1.
Note that hb_buffer_clear_contents() does NOT reset the replacement
character.
See discussion here:
6f13b6d62d
New API:
hb_buffer_set_replacement_codepoint()
hb_buffer_get_replacement_codepoint()
Originally we fixed those in 79d1007a50.
However, fonts like MongolianWhite don't have GDEF, but have IgnoreMarks
in their LigatureSubstitute init/etc features. We were synthesizing a
GDEF class of mark for Mongolian Variation Selectors and as such the
ligature lookups were not matching. Uniscribe doesn't do that.
I tried with more sophisticated fixes, like, if there is no GDEF and
a lookup-flag mismatch happens, instead of rejecting a match, try
skipping that glyph. That surely produces some interesting behavior,
but since we don't want to support fonts missing GDEF more than we have
to, I went for this simpler fix which is to always mark
default-ignorables as base when synthesizing GDEF.
Micro-test added.
Fixes rest of https://bugs.freedesktop.org/show_bug.cgi?id=65258
Only if the font doesn't support it. Ie, this gives the user a way to
use non-Unicode codepoints as private values and return a meaningful
glyph for them. But if it's invalid and font callback doesn't
like it, and if font has U+FFFD, show that instead.
Font functions that do not want this automatic replacement to
happen should return true from get_glyph() if unicode > 0x10FFFF.
Replaces https://github.com/behdad/harfbuzz/pull/27
There may be more. There are members that are by definition
redundant or reserved and not needed, NOT what we *currently*
don't use.
I'm sure there's more...
Add hb_ot_layout_language_get_required_feature_index() again, which
is used in Pango. This was removed in
da13293798 in favor of
hb_ot_layout_language_get_required_feature().
API changes:
- Added hb_ot_layout_language_get_required_feature_index back.
HB_VERSION_CHECK's comparison was originally written wrongly
by mistake. When API tests were written, they were also written
wrongly to pass given the wrong implementation... Sigh.
Given the purpose of this API, there's no point in fixing it
without renaming it. As such, rename.
API changes:
HB_VERSION_CHECK -> HB_VERSION_ATLEAST
hb_version_check -> hb_version_atleast
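For illustration, an "at least" comparison can be written as below. This is a sketch with a hypothetical function that takes the current version explicitly, and it assumes minor and micro stay below 100; it is not presented as the macro's actual definition:

```cpp
#include <cassert>

// True if the current version (cur_*) is at least major.minor.micro.
// Assumption: minor and micro are each below 100.
inline bool version_atleast (unsigned int cur_major,
                             unsigned int cur_minor,
                             unsigned int cur_micro,
                             unsigned int major,
                             unsigned int minor,
                             unsigned int micro)
{
  return major * 10000 + minor * 100 + micro <=
         cur_major * 10000 + cur_minor * 100 + cur_micro;
}

inline void version_atleast_example ()
{
  assert (version_atleast (0, 9, 30, 0, 9, 28));  // newer than required
  assert (!version_atleast (0, 9, 30, 1, 0, 0));  // older than required
}
```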
If pre-base reordering Ra is NOT formed (or formed and then
broken up), we should consider that Ra as base. This is
observable when there's a left matra or dotreph that positions
before base.
Now, it might be that we shouldn't do this if the Ra happened
to form a below form. We can't quite deduce that right now...
Micro test added. Also at:
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=186#c29
Sometimes font designers form half/pref/etc consonant forms
unconditionally and then undo that conditionally. Try to
recover the OT_H classification in those cases.
No test number changes expected.
Normally if you want to, say, conditionally prevent a 'pref', you
would use blocking contextual matching. Some designers instead
form the 'pref' form, then undo it in context. To detect that
we now also remember glyphs that went through MultipleSubst.
In the only place that this is used, Uniscribe seems to only care
about the "last" transformation between Ligature and Multiple
substitutions. Ie. if you ligate, expand, and ligate again, it
moves the pref, but if you ligate and expand it doesn't. That's
why we clear the MULTIPLIED bit when setting LIGATED.
Micro-test added. Test: U+0D2F,0D4D,0D30 with font from:
[1]
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=186#c29
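The "last transformation wins" bookkeeping can be sketched like this. The bit values and helper names are hypothetical, chosen only to show why setting LIGATED clears MULTIPLIED:

```cpp
#include <cassert>

// Hypothetical glyph-property bits.
static const unsigned int PROP_LIGATED    = 1u << 0;
static const unsigned int PROP_MULTIPLIED = 1u << 1;

// Ligating sets LIGATED and forgets any earlier MultipleSubst.
inline unsigned int mark_ligated (unsigned int props)
{ return (props | PROP_LIGATED) & ~PROP_MULTIPLIED; }

// Expanding only adds MULTIPLIED.
inline unsigned int mark_multiplied (unsigned int props)
{ return props | PROP_MULTIPLIED; }

// True if the glyph was ligated and not expanded afterwards.
inline bool ligated_and_didnt_multiply (unsigned int props)
{ return (props & PROP_LIGATED) && !(props & PROP_MULTIPLIED); }

inline void ligate_multiply_example ()
{
  unsigned int p = mark_multiplied (mark_ligated (0u));
  assert (!ligated_and_didnt_multiply (p)); // ligate, expand
  p = mark_ligated (p);
  assert (ligated_and_didnt_multiply (p));  // ligate, expand, ligate
}
```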
Roboto was hitting this. FreeType also has pretty much the
same code for this, in ttcmap.c:tt_cmap4_validate():
/* in certain fonts, the `length' field is invalid and goes */
/* out of bound. We try to correct this here... */
if ( table + length > valid->limit )
{
if ( valid->level >= FT_VALIDATE_TIGHT )
FT_INVALID_TOO_SHORT;
length = (FT_UInt)( valid->limit - table );
}
Sinhala and Telugu use "explicit" reph. That is, the reph is formed by
a Ra,H,ZWJ sequence. Previously, upon detecting this sequence, we were
checking whether the 'rphf' feature applies to the first two
glyphs of the sequence. This is how the Microsoft fonts are designed.
However, testing with Noto shows that apparently Uniscribe also forms
the reph if the lookup ligates all three glyphs. So, try both
sequences.
Doesn't affect test results for Sinhala or Telugu.
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=232
The grammar in the OT spec, and the existing Windows implementation
seem to be confused around where to allow Asat around the medial
consonants.
The previous grammar for medial group was allowing an Asat after
the medial group only if there was a medial Wa or Ha, but not if
there was only a medial Ya. This doesn't make sense to me and
sounds reversed, as both medial Wa and Ha are below marks while
Asat is an above mark. An Asat can come before the medial group
already (in fact, multiple ones can. Why?!). The medial Ya
however is a spacing mark and according to Roozbeh it's valid
to want an Asat on the medial Ya instead of the base, so it looks
to me like we want to allow an Asat after the medial group if
there *was* a Ya but not if there wasn't any. Not wanting to
produce dotted-circle where Windows is not, this commit changes
the grammar to allow one Asat after the medial group no matter
what comes in the group.
Test: U+1002,103A,103B vs U+1002,103B,103A
Before we were just relying on the compiler inlining them and not
leaving a trace in our public API. Try to fix. Hopefully not
breaking anyone's build.
commit b5a0f69e47
Author: Behdad Esfahbod <behdad@behdad.org>
Date: Thu Oct 17 18:04:23 2013 +0200
[indic] Pass zero-context=false to would_substitute for newer scripts
For scripts without an old/new spec distinction, use zero-context=false.
This changes behavior in Sinhala / Khmer, but doesn't seem to regress.
This will be useful and used in Javanese.
The *intention* was to change zero-context from true to false for scripts that
don't have old-vs-new specs. However, checking the code, looks like we
essentially change zero-context to always be true; ie. we only changed things
for old-spec, and we broke them. That's what causes this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=76705
The root of the bug is here:
/* Use zero-context would_substitute() matching for new-spec of the main
* Indic scripts, but not for old-spec or scripts with one spec only. */
bool zero_context = indic_plan->config->has_old_spec || !indic_plan->is_old_spec;
Note that is_old_spec itself is:
indic_plan->is_old_spec = indic_plan->config->has_old_spec && ((plan->map.chosen_script[0] & 0x000000FF) != '2');
It's easy to show that zero_context is now always true. What we really meant was:
bool zero_context = indic_plan->config->has_old_spec && !indic_plan->is_old_spec;
Ie, "&&" instead of "||". We made this change supposedly to make Javanese
work. But apparently we got it working regardless! So I'm going to fix this
to only change the logic for old-spec and not touch other cases.
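The "always true" claim is easy to check by enumerating the inputs. In this sketch the helper names are made up, and tag_is_not_v2 stands for the ((chosen_script[0] & 0xFF) != '2') test:

```cpp
#include <cassert>

// Mirrors: is_old_spec = has_old_spec && (script tag is not a '2' tag).
inline bool is_old_spec_of (bool has_old_spec, bool tag_is_not_v2)
{ return has_old_spec && tag_is_not_v2; }

// The expression as committed, with "||".
inline bool zero_context_buggy (bool has_old_spec, bool tag_is_not_v2)
{ return has_old_spec || !is_old_spec_of (has_old_spec, tag_is_not_v2); }

// The intended expression, with "&&".
inline bool zero_context_fixed (bool has_old_spec, bool tag_is_not_v2)
{ return has_old_spec && !is_old_spec_of (has_old_spec, tag_is_not_v2); }

inline void zero_context_example ()
{
  for (int h = 0; h < 2; h++)
    for (int t = 0; t < 2; t++)
    {
      // "||" yields true for every combination.
      assert (zero_context_buggy (h != 0, t != 0));
      // "&&" is true only for new-spec of a script that has an old spec.
      assert (zero_context_fixed (h != 0, t != 0) == (h == 1 && t == 0));
    }
}
```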
This is a higher-priority shaper than default shaper ("ot"), but
only picks up fonts that have AAT "morx"/"mort" table.
Note that for this to work the font face's get_table() implementation
should know how to return the full font blob.
Based on patch from Konstantin Ritt.
Not exhaustively tested, but I think I got the intended logic
right.
The logic can perhaps be simplified. Maybe we should disable
normalization with this shaper. Then again, for now focusing on
correctness.
When seeing U+2044 FRACTION SLASH in the text, find decimal
digits (Unicode General Category Decimal_Number) around it,
and mark the pre-slash digits with 'numr' feature, the post-slash
digits with 'dnom' feature, and the whole sequence with 'frac'
feature.
This beautifully renders fractions with major Windows fonts,
and any other font that implements those features (numr/dnom is
enough for most fonts.)
Not the fastest way to do this, but good enough for a start.
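A simplified sketch of the scan follows. The mask constants and function names are hypothetical, and the digit test only covers ASCII 0-9 rather than the full Decimal_Number category:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-glyph feature masks.
static const unsigned int MASK_NUMR = 1u << 0;
static const unsigned int MASK_DNOM = 1u << 1;
static const unsigned int MASK_FRAC = 1u << 2;

// Simplification: ASCII digits only, not all of Decimal_Number.
inline bool is_decimal_digit (uint32_t u)
{ return u >= 0x30u && u <= 0x39u; }

// Returns one mask per codepoint of the input text.
inline std::vector<unsigned int>
mark_fractions (const std::vector<uint32_t> &text)
{
  std::vector<unsigned int> masks (text.size (), 0u);
  for (size_t i = 0; i < text.size (); i++)
  {
    if (text[i] != 0x2044u) continue; // U+2044 FRACTION SLASH
    size_t start = i, end = i + 1;
    while (start > 0 && is_decimal_digit (text[start - 1])) start--;
    while (end < text.size () && is_decimal_digit (text[end])) end++;
    for (size_t j = start; j < i; j++) masks[j] |= MASK_NUMR | MASK_FRAC;
    masks[i] |= MASK_FRAC;
    for (size_t j = i + 1; j < end; j++) masks[j] |= MASK_DNOM | MASK_FRAC;
    i = end - 1; // resume scanning after the denominator
  }
  return masks;
}
```

For the input "12", U+2044, "3x" this marks "12" with numr and frac, the slash with frac, "3" with dnom and frac, and leaves "x" untouched.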
CoreText does automatic font fallback (AKA "cascading") for characters
not supported by the requested font, and provides no way to turn it off,
so detect if the returned run uses a font other than the requested one
and fill in the buffer with .notdef glyphs instead of random glyph
indices from a different font.
The spec and Uniscribe don't allow these, but UTN#11
specifically says the sequence U+104B,U+1038 is valid.
As such, allow all "P V" sequences. There are about
eight sequences that match that structure, but Roozbeh
thinks it's fine to allow all of them.
Test case: U+104B, U+1038
https://bugs.freedesktop.org/show_bug.cgi?id=71947
The spec and Uniscribe treat it as consonant in the grammar, but
it's not in IndicSyllableCategory.txt, so fix up.
Test sequence: U+1004,U+103A,U+1039,U+104E
https://bugs.freedesktop.org/show_bug.cgi?id=71948
This is a broken sequence according to OpenType spec, Uniscribe,
and current HarfBuzz implementation. But Roozbeh says this
is a valid sequence, so allow it. There are multiple
"(DB As?)?" constructs in the grammar, but Roozbeh thinks only
this one needs changing.
Test case: 1014,1063,103A
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=71949
Based on research into latest SIL and Windows fonts, pulling in
the latest OpenType language tag proposal from Microsoft, and updating
to latest language tags and names from ISO 639.
This reverts commit d5bd0590ae.
The reasoning behind that logic was flawed and made under
a misunderstanding of the original problem, and caused
regressions as reported by Jonathan Kew in thread titled
"tibetan marks" in Oct 2013. Apparently I had fixed
the original problem with this commit:
7e08f1258d
So, revert the faulty commit and everything seems to be in good
shape.
For Javanese (pref_len == 1) only reorder if it didn't ligate. That's
sensible, and what the spec says. For other Indic (pref_len > 1)
only reorder if ligated.
Doesn't change any test numbers.
Bug 58714 - Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not provide
same results as Windows8
https://bugs.freedesktop.org/show_bug.cgi?id=58714
Test with U+0CB0,U+200D,U+0CCD,U+0C95,U+0CBF and tunga.ttf.
Improves some scripts. Improves Bengali too, but numbers
are up because we produce better results than Uniscribe for some
sequences now.
New numbers:
BENGALI: 353724 out of 354188 tests passed. 464 failed (0.131004%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048140 out of 1048334 tests passed. 194 failed (0.0185056%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
Lohit-Punjabi has a upem of 769! We were losing one unit in our
code, and FreeType is losing another one... Test with U+0A06.
Has an advance of 854 in the font. We were producing 852.
Now we do 853, which is what FreeType is telling us.
See comments from caveat! Seems to work fine.
This is useful for Javanese which has an atomically encoded pre-base
reordering Ra which should only be reordered if it was substituted
by the pref feature.
For scripts without an old/new spec distinction, use zero-context=false.
This changes behavior in Sinhala / Khmer, but doesn't seem to regress.
This will be useful and used in Javanese.