Previously, we expected users to provide BOT/EOT flags when the
text *segment* was at paragraph boundaries. This meant that for
clients that provide full paragraph to HarfBuzz (eg. Pango), they
had code like this:
hb_buffer_set_flags (hb_buffer,
(item_offset == 0 ? HB_BUFFER_FLAG_BOT : 0) |
(item_offset + item_length == paragraph_length ?
HB_BUFFER_FLAG_EOT : 0));
hb_buffer_add_utf8 (hb_buffer,
paragraph_text, paragraph_length,
item_offset, item_length);
After this change such clients can simply say:
hb_buffer_set_flags (hb_buffer,
HB_BUFFER_FLAG_BOT | HB_BUFFER_FLAG_EOT);
hb_buffer_add_utf8 (hb_buffer,
paragraph_text, paragraph_length,
item_offset, item_length);
Ie, HarfBuzz itself checks whether the segment is at the beginning/end
of the paragraph. Clients that only pass item-at-a-time to HarfBuzz
continue not setting any flags whatsoever.
Another way to put it is: if there's pre-context text in the buffer,
HarfBuzz ignores the BOT flag. If there's post-context, it ignores
EOT flag.
The table can now compile independently too. If we cannot make it work
on MSVC, we can always generate the data and distribute it.
The code now compiles cleanly with:
gcc -c -xc -std=c99 -Werror -pedantic hb-ot-shape-complex-arabic-win1256.hh
g++ -c -xc -std=c++1x -Werror -pedantic hb-ot-shape-complex-arabic-win1256.hh
See:
a97f537cec (commitcomment-7218736)
Bug 1045139 - The Arabic text with "MS Sans Serif" font is rendered bad
https://bugzilla.mozilla.org/show_bug.cgi?id=1045139
This is only enabled on Windows platforms, and requires support from
Uniscribe to work. But for clients that do hook up to Uniscribe, this
fixes shaping of Windows-1256-encoded bitmap fonts like "MS Sans Serif".
The code and table together have just less than a 1kb footprint when
enabled.
UNTESTED. I might even have broken regular Arabic fallback shaping.
Seems to be what Uniscribe does.
At this point I think it's work checking our default...
Fixes Bug 76767 - Zeroing of advance of 2nd component of multiple
substitution with SBL Hebrew
https://bugs.freedesktop.org/show_bug.cgi?id=76767
Micro-test added.
Looks like Unsicribe responds to the 'mymr' tag by zeroing marks
GDEF_LATE instead of generic-shaper UNICODE_LATE. Implement that.
Fixes
Bug 81775 - Incorrect Rendering with harfbuzz-ng myanmar unicode
https://bugs.freedesktop.org/show_bug.cgi?id=81775
Micro-test added based on Padauk.
Follows the order of the Arabic/Syriac specs. Also don't stop
between rlig and calt in non-Arabic scripts.
Micro-tests for Arabic and Mongolian added for the latter.
We now handle U+FFFD replacement in hb_buffer_add_utf*(). Any other
manipulation can happen in user callbacks. No need for this.
efe74214bb (commitcomment-7039404)
This reverts commit efe74214bb.
Conflicts:
src/hb-ot-shape-normalize.cc
With this change, we now by default replace broken UTF-8/16/32 bits
with U+FFFD. This can be changed by calling new API on the buffer.
Previously the replacement value used to be (hb_codepoint_t)-1.
Note that hb_buffer_clear_contents() does NOT reset the replacement
character.
See discussion here:
6f13b6d62d
New API:
hb_buffer_set_replacement_codepoint()
hb_buffer_get_replacement_codepoint()
Originally we fixed those in 79d1007a50.
However, fonts like MongolianWhite don't have GDEF, but have IgnoreMarks
in their LigatureSubstitute init/etc features. We were synthesizing a
GDEF class of mark for Mongolian Variation Selectors and as such the
ligature lookups where not matching. Uniscribe doesn't do that.
I tried with more sophisticated fixes, like, if there is no GDEF and
a lookup-flag mismatch happens, instead of rejecting a match, try
skipping that glyph. That surely produces some interesting behavior,
but since we don't want to support fonts missing GDEF more than we have
to, I went for this simpler fix which is to always mark
default-ignorables as base when synthesizing GDEF.
Micro-test added.
Fixes rest of https://bugs.freedesktop.org/show_bug.cgi?id=65258
Only if the font doesn't support it. Ie, this gives the user to
use non-Unicode codepoints as private values and return a meaningful
glyph for them. But if it's invalid and font callback doesn't
like it, and if font has U+FFFD, show that instead.
Font functions that do not want this automatic replacement to
happen should return true from get_glyph() if unicode > 0x10FFFF.
Replaces https://github.com/behdad/harfbuzz/pull/27
There may be more. There are members that are by definition
redundant or reserved and not needed, NOT what we *currently*
don't use.
I'm sure there's more...
Add hb_ot_layout_language_get_required_feature_index() again, which
is used in Pango. This was removed in
da13293798 in favor of
hb_ot_layout_language_get_required_feature().
API changes:
- Added hb_ot_layout_language_get_required_feature_index back.
HB_VERSION_CHECK's comparison was originally written wrongly
by mistake. When API tests were written, they were also written
wrongly to pass given the wrong implementation... Sigh.
Given the purpose of this API, there's no point in fixing it
without renaming it. As such, rename.
API changes:
HB_VERSION_CHECK -> HB_VERSION_ATLEAST
hb_version_check -> hb_version_atleast
If pre-base reordering Ra is NOT formed (or formed and then
broken up), we should consider that Ra as base. This is
observable when there's a left matra or dotreph that positions
before base.
Now, it might be that we shouldn't do this if the Ra happend
to form a below form. We can't quite deduce that right now...
Micro test added. Also at:
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=186#c29
Sometimes font designers form half/pref/etc consonant forms
unconditionally and then undo that conditionally. Try to
recover the OT_H classification in those cases.
No test number changes expected.
Normally if you want to, say, conditionally prevent a 'pref', you
would use blocking contextual matching. Some designers instead
form the 'pref' form, then undo it in context. To detect that
we now also remember glyphs that went through MultipleSubst.
In the only place that this is used, Uniscribe seems to only care
about the "last" transformation between Ligature and Multiple
substitions. Ie. if you ligate, expand, and ligate again, it
moves the pref, but if you ligate and expand it doesn't. That's
why we clear the MULTIPLIED bit when setting LIGATED.
Micro-test added. Test: U+0D2F,0D4D,0D30 with font from:
[1]
https://code.google.com/a/google.com/p/noto-alpha/issues/detail?id=186#c29
Roboto was hitting this. FreeType also has pretty much the
same code for this, in ttcmap.c:tt_cmap4_validate():
/* in certain fonts, the `length' field is invalid and goes */
/* out of bound. We try to correct this here... */
if ( table + length > valid->limit )
{
if ( valid->level >= FT_VALIDATE_TIGHT )
FT_INVALID_TOO_SHORT;
length = (FT_UInt)( valid->limit - table );
}