Numbers in right-to-left scripts are also processed from right to
left, so the order in which the “numr” and “dnom” features are applied
should be reversed in that case.
Fixes https://github.com/behdad/harfbuzz/issues/395
New approach to fix this:
69f9fbc420
Previous approach was reverted as it was too broad. See context:
https://github.com/behdad/harfbuzz/issues/347#issuecomment-267838368
With U+05E9,U+05B8,U+05C1,U+05DC and Arial Unicode, we now (correctly) disable
GDEF and GPOS, so we get results very close to Uniscribe, but slightly different
since our fallback position logic is not exactly the same:
Before: [gid1166=3+991|gid1142=0+737|gid5798=0+1434]
After: [gid1166=3+991|gid1142=0@402,-26+0|gid5798=0+1434]
Uniscribe: [gid1166=3+991|gid1142=0@348,0+0|gid5798=0+1434]
Previously we only synthesized GDEF glyph classes if the glyphClassDef
array in GDEF was null. This worked well enough, and is indeed what
OpenType requires: "If the font does not include a GlyphClassDef table,
the client must define and maintain this information when using the
GSUB and GPOS tables." That sentence does not quite make sense since
one needs Unicode properties as well, but is close enough.
However, it looks like Arial Unicode, as shipped on WinXP, does have a GDEF
glyph class array, but defines no classes for Hebrew. This results
in Hebrew marks not getting their widths zeroed. So, with this change,
we synthesize glyph class for any glyph that is not specified in the
GDEF glyph class table. Since, from our point of view, a glyph not
being listed in that table is a font bug, any unwanted consequence of
this change is a font bug :).
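The per-glyph fallback described above can be sketched roughly as follows. This is only an illustration, not the actual HarfBuzz code: the enum, the glyph_t struct and effective_class() are hypothetical names, and the is_unicode_mark flag is assumed to have been derived from Unicode properties elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical glyph classes; 0 stands for "not listed in the GDEF table". */
typedef enum {
  CLASS_UNCLASSIFIED = 0,
  CLASS_BASE         = 1,
  CLASS_LIGATURE     = 2,
  CLASS_MARK         = 3
} glyph_class_t;

typedef struct {
  glyph_class_t gdef_class;      /* class read from the font, or 0 if absent */
  bool          is_unicode_mark; /* assumption: computed from Unicode properties */
} glyph_t;

/* Use the font's class when it has one; otherwise synthesize a class. */
static glyph_class_t effective_class (const glyph_t *g)
{
  assert (g != NULL);
  if (g->gdef_class != CLASS_UNCLASSIFIED)
    return g->gdef_class;
  return g->is_unicode_mark ? CLASS_MARK : CLASS_BASE;
}
```

With this shape, a Hebrew mark that the font leaves unclassified still ends up as a mark, which is what lets its advance be zeroed.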
Note that we still don't get the same rendering as Uniscribe, since
Uniscribe seems to do fallback positioning as well, even though the
font does have a GPOS table (which does NOT cover Hebrew!). We are
not going to try to match that though.
Test string for Arial Unicode:
U+05E9,U+05B8,U+05C1,U+05DC
Before: [gid1166=3+991|gid1142=0+737|gid5798=0+1434]
After: [gid1166=3+991|gid1142=0+0|gid5798=0+1434]
Uniscribe: [gid1166=3+991|gid1142=0@348,0+0|gid5798=0+1434]
Note that our new output matches what we were generating until July
2014, because the Hebrew shaper used to zero mark advances based on
Unicode, NOT GDEF. That's 9e834e29e0.
Reported by Greg Douglas.
New API:
- hb_font_get_nominal_glyph_func_t
- hb_font_get_variation_glyph_func_t
- hb_font_funcs_set_nominal_glyph_func()
- hb_font_funcs_set_variation_glyph_func()
- hb_font_get_nominal_glyph()
- hb_font_get_variation_glyph()
Deprecated API:
- hb_font_get_glyph_func_t
- hb_font_funcs_set_glyph_func()
Clients that implement their own font-funcs are encouraged to replace
their get_glyph() implementation with a get_nominal_glyph() and
get_variation_glyph() pair. The variation version can assume that
variation_selector argument is not zero.
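A rough sketch of what such a pair might look like is below. To keep it self-contained it does not include the HarfBuzz headers: codepoint_t stands in for hb_codepoint_t, the font, font_data and user_data arguments of the real callbacks are omitted, and the toy mapping and glyph ids are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t codepoint_t; /* stand-in for hb_codepoint_t */

/* Hypothetical toy font: one nominal mapping, one variation sequence. */
static bool my_get_nominal_glyph (codepoint_t unicode, codepoint_t *glyph)
{
  assert (glyph != NULL);
  if (unicode == 0x8FBBu)
  {
    *glyph = 10; /* made-up glyph id */
    return true;
  }
  return false; /* character not covered by this font */
}

static bool my_get_variation_glyph (codepoint_t unicode,
                                    codepoint_t variation_selector,
                                    codepoint_t *glyph)
{
  assert (glyph != NULL);
  /* Per the note above, callers never pass a zero selector here. */
  assert (variation_selector != 0);
  if (unicode == 0x8FBBu && variation_selector == 0xE0100u)
  {
    *glyph = 11; /* made-up glyph id for the variation sequence */
    return true;
  }
  return false; /* no glyph for this specific sequence */
}
```

The point of the split is that the nominal lookup no longer has to branch on a selector argument, and the variation lookup can skip the "selector is zero" case entirely.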
This better emulates Unicode grapheme clusters.
Note that Uniscribe does NOT do this, but it should be harmless with most clients,
and should improve fallback with clients that use the HarfBuzz cluster as the unit of fallback.
Fixes https://github.com/behdad/harfbuzz/issues/217
This was broken earlier, though it's really hard to notice.
Unlike the glyph_h_origin(), an unset glyph_v_origin() does NOT
mean that the vertical origin is at 0,0.
Related to https://github.com/behdad/harfbuzz/issues/187
Separate the loops for the two cases of replacing with space
and deleting. For deleting, use the out-buffer machinery.
Needed for upcoming cluster merge fix.
The reason we turned it on is because Kazuraki uses it. But that's
not reason enough. Until the OpenType spec gets its act together re
adding design-direction to lookups, this is better user experience.
Previously, we expected users to provide BOT/EOT flags when the
text *segment* was at paragraph boundaries. This meant that for
clients that provide full paragraph to HarfBuzz (eg. Pango), they
had code like this:
  hb_buffer_set_flags (hb_buffer,
                       (item_offset == 0 ? HB_BUFFER_FLAG_BOT : 0) |
                       (item_offset + item_length == paragraph_length ?
                        HB_BUFFER_FLAG_EOT : 0));
  hb_buffer_add_utf8 (hb_buffer,
                      paragraph_text, paragraph_length,
                      item_offset, item_length);
After this change such clients can simply say:
  hb_buffer_set_flags (hb_buffer,
                       HB_BUFFER_FLAG_BOT | HB_BUFFER_FLAG_EOT);
  hb_buffer_add_utf8 (hb_buffer,
                      paragraph_text, paragraph_length,
                      item_offset, item_length);
Ie, HarfBuzz itself checks whether the segment is at the beginning/end
of the paragraph. Clients that only pass item-at-a-time to HarfBuzz
continue not setting any flags whatsoever.
Another way to put it is: if there's pre-context text in the buffer,
HarfBuzz ignores the BOT flag. If there's post-context, it ignores
EOT flag.
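The rule in the last paragraph can be modelled in a few lines. This is a sketch of the idea only, not the library's internals: effective_flags() and the FLAG_* constants are hypothetical, and the presence of pre-/post-context is approximated by comparing the item range against the full text length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only. */
enum {
  FLAG_BOT = 1,
  FLAG_EOT = 2
};

/* Drop BOT when there is text before the item, EOT when there is text after it. */
static unsigned effective_flags (unsigned flags,
                                 size_t   text_length,
                                 size_t   item_offset,
                                 size_t   item_length)
{
  assert (item_offset <= text_length);
  assert (item_length <= text_length - item_offset);

  if (item_offset > 0)                         /* pre-context exists */
    flags &= ~(unsigned) FLAG_BOT;
  if (item_offset + item_length < text_length) /* post-context exists */
    flags &= ~(unsigned) FLAG_EOT;
  return flags;
}
```

A client passing the full paragraph can then set both flags unconditionally; only the first and last items of the paragraph keep them.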
Originally we fixed those in 79d1007a50.
However, fonts like MongolianWhite don't have GDEF, but have IgnoreMarks
in their LigatureSubstitute init/etc features. We were synthesizing a
GDEF class of mark for Mongolian Variation Selectors and as such the
ligature lookups were not matching. Uniscribe doesn't do that.
I tried with more sophisticated fixes, like, if there is no GDEF and
a lookup-flag mismatch happens, instead of rejecting a match, try
skipping that glyph. That surely produces some interesting behavior,
but since we don't want to support fonts missing GDEF more than we have
to, I went for this simpler fix which is to always mark
default-ignorables as base when synthesizing GDEF.
Micro-test added.
Fixes rest of https://bugs.freedesktop.org/show_bug.cgi?id=65258
When seeing U+2044 FRACTION SLASH in the text, find decimal
digits (Unicode General Category Decimal_Number) around it,
and mark the pre-slash digits with 'numr' feature, the post-slash
digits with 'dnom' feature, and the whole sequence with 'frac'
feature.
This beautifully renders fractions with major Windows fonts,
and any other font that implements those features (numr/dnom is
enough for most fonts.)
Not the fastest way to do this, but good enough for a start.
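The marking pass described above might look roughly like this. It is a simplified sketch, not the shaper's actual code: mark_fractions() and the FEAT_* bits are hypothetical names, and only ASCII digits are recognized, whereas the real check is for Unicode General Category Decimal_Number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-character feature bits. */
enum { FEAT_NONE = 0, FEAT_NUMR = 1, FEAT_DNOM = 2, FEAT_FRAC = 4 };

/* Simplification: ASCII digits only, instead of all Decimal_Number characters. */
static int is_digit (uint32_t u)
{
  return u >= '0' && u <= '9';
}

static void mark_fractions (const uint32_t *text, size_t len, unsigned *features)
{
  assert (text != NULL && features != NULL);

  for (size_t i = 0; i < len; i++)
    features[i] = FEAT_NONE;

  for (size_t i = 0; i < len; i++)
  {
    if (text[i] != 0x2044u) /* U+2044 FRACTION SLASH */
      continue;

    /* Extend over the digit runs on both sides of the slash. */
    size_t start = i, end = i + 1;
    while (start > 0 && is_digit (text[start - 1]))
      start--;
    while (end < len && is_digit (text[end]))
      end++;

    for (size_t j = start; j < i; j++)
      features[j] = FEAT_NUMR | FEAT_FRAC; /* pre-slash digits */
    features[i] = FEAT_FRAC;               /* the slash itself */
    for (size_t j = i + 1; j < end; j++)
      features[j] = FEAT_DNOM | FEAT_FRAC; /* post-slash digits */

    i = end - 1; /* continue scanning after the denominator */
  }
}
```

For the text "12⁄34x" this marks "12" with numr and frac, the slash with frac, "34" with dnom and frac, and leaves "x" untouched.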
This reverts commit d5bd0590ae.
The reasoning behind that logic was flawed and made under
a misunderstanding of the original problem, and caused
regressions as reported by Jonathan Kew in thread titled
"tibetan marks" in Oct 2013. Apparently I had already fixed
the original problem with this commit:
7e08f1258d
So, revert the faulty commit and everything seems to be in good
shape.
Before, if one called hb_shape() without setting script, language, and
direction on the buffer, hb_shape() was calling
hb_buffer_guess_segment_properties() on the user's behalf to guess
these.
This is very dangerous, since any serious user of HarfBuzz must set
these properly (direction is especially important). So now, we don't
guess properties by default. People not setting direction will get
an abort() now. If the old behavior is desired (fragile, good for
simple testing only), users can call
hb_buffer_guess_segment_properties() on the buffer just before calling
hb_shape().
This is a followup to 568000274c.
Looks like in the Latin shaper, Uniscribe zeroes all Unicode NSM
advances *after* GPOS, not before. Match that.
Can be tested using DejaVu Sans Mono, since that font has GPOS
rules to zero the mark advances on its own.
Before, we were zeroing advance width of attached marks for
non-Indic scripts, and not doing it for Indic.
We have now three different behaviors, which seem to better
reflect what Uniscribe is doing:
- For Indic, no explicit zeroing happens whatsoever, which
is the same as before,
- For Myanmar, zero advance width of glyphs marked as marks
*in GDEF*, and do that *before* applying GPOS. This seems
to be what the new Win8 Myanmar shaper does,
- For everything else, zero advance width of glyphs that are
from General_Category=Mn Unicode characters, and do so
before applying GPOS. This seems to be what Uniscribe does
for Latin at least.
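The three behaviors listed above can be summarized in a small sketch. The types and the function name are hypothetical, and the two boolean flags are assumed to have been filled in from GDEF and from Unicode data respectively; the function is meant to run before GPOS is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SHAPER_INDIC, SHAPER_MYANMAR, SHAPER_DEFAULT } shaper_kind_t;

typedef struct {
  int  x_advance;
  bool gdef_mark;  /* assumption: glyph is classified as a mark in GDEF */
  bool unicode_mn; /* assumption: character has General_Category=Mn */
} glyph_pos_t;

/* Intended to run before GPOS, per the description above. */
static void zero_marks_before_gpos (shaper_kind_t shaper,
                                    glyph_pos_t  *glyphs,
                                    size_t        count)
{
  assert (glyphs != NULL || count == 0);
  for (size_t i = 0; i < count; i++)
  {
    bool zero = false;
    switch (shaper)
    {
      case SHAPER_INDIC:   zero = false;                break; /* no explicit zeroing */
      case SHAPER_MYANMAR: zero = glyphs[i].gdef_mark;  break; /* GDEF marks */
      case SHAPER_DEFAULT: zero = glyphs[i].unicode_mn; break; /* Unicode Mn */
    }
    if (zero)
      glyphs[i].x_advance = 0;
  }
}
```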
With these changes, positioning of all tests matches for Myanmar,
except for the glitch in Uniscribe not applying 'mark'. See previous
commit.
API additions:
hb_segment_properties_t
HB_SEGMENT_PROPERTIES_DEFAULT
hb_segment_properties_equal()
hb_segment_properties_hash()
hb_buffer_set_segment_properties()
hb_buffer_get_segment_properties()
hb_ot_layout_glyph_class_t
hb_shape_plan_t
hb_shape_plan_create()
hb_shape_plan_create_cached()
hb_shape_plan_get_empty()
hb_shape_plan_reference()
hb_shape_plan_destroy()
hb_shape_plan_set_user_data()
hb_shape_plan_get_user_data()
hb_shape_plan_execute()
hb_ot_shape_plan_collect_lookups()
API changes:
Rename hb_ot_layout_feature_get_lookup_indexes() to
hb_ot_layout_feature_get_lookups().
New header file:
hb-shape-plan.h
And a bunch of prototyped but not implemented stuff. Coming soon.
(Tests fail because of the prototypes right now.)
New API:
hb_buffer_flags_t
HB_BUFFER_FLAGS_DEFAULT
HB_BUFFER_FLAG_BOT
HB_BUFFER_FLAG_EOT
HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES
hb_buffer_set_flags()
hb_buffer_get_flags()
We use the BOT flag to decide whether to insert dottedcircle if the
first char in the buffer is a combining mark.
The PRESERVE_DEFAULT_IGNORABLES flag prevents removal of characters like
ZWNJ/ZWJ/...
Had to do some refactoring to make this happen...
Under Uniscribe bug compatibility mode, we still split them
Uniscribe-style, but Jonathan and I convinced ourselves that there is no
harm doing this the Unicode way. This change makes that happen, and
unbreaks free Sinhala fonts.
That's really the logic desired. Except that MONGOLIAN VOWEL SEPARATOR
is not default_ignorable but it really should be. Reported to Unicode.
Based on suggestion from Konstantin Ritt.
Unfortunately if the font has GPOS and 'mark' feature does
not position mark on dotted-circle, our inserted dotted-circle
will not get the mark repositioned to itself. Uniscribe cheats
here.
If there is no GPOS however, the fallback positioning kicks in
and sorts this out.
I'm not willing to address the first case.
This will eventually allow us to skip marks, as well as (fallback)
attach marks to ligature components of fallback-shaped Arabic.
That would be pretty cool. I kludged GDEF props in, so mark-skipping
works, but the produced ligature id/components will be cleared later
by substitute_start() et al.
Perhaps using a synthetic table for Arabic fallback shaping was a better
idea. The current approach has way too many layering violations...
The merger of normalizer and glyph-mapping broke shapers that
modified text stream. Unbreak them by adding a new preprocess_text
shaping stage that happens before normalizing/cmap and disallow
setup_mask modification of actual text.
Essentially move the glyph mapping to normalization process.
The effect on Devanagari is small (but observable). Should be more
observable in simple text, like ASCII.
'rclt' is "Required Contextual Forms" being proposed by Microsoft.
It's like 'calt', but supposedly always on. We apply 'calt' anyway,
and now apply this too.
At this point, the GDEF glyph synthesis looks pointless. Not that I
have many fonts without GDEF lying around.
As for mark advance zeroing when GPOS not available, that also is being
replaced by proper fallback mark positioning soon.
We need the font for glyph lookup during GSUB pauses in Indic shaper.
Could perhaps be avoided, but at this point, we don't mean to support
separate substitute()/position() entry points (anymore), so there is
no point in not providing the font to GSUB.
If there is no GPOS, zero mark advances.
If there *is* GPOS and the shaper requests so, zero mark advances for
attached marks.
Fixes regression with Tibetan, where the font has GPOS, and marks a
glyph as mark where it shouldn't get zero advance.
When we removed the separate Hangul shaper, the specific normalization
preference of Hangul was lost. Fix that. Also, the Thai shaper was
copied from Hangul, so had the fully-composed normalization behavior,
which was unnecessary. So, fix that too.
Also remove shaper_options argument to hb_shape_full(). That was
unused and for "future". Let it go.
More shaper API coming in preparation for plan/planned API.
hb_shape() now accepts a shaper_options and a shaper_list argument.
Both can be set to NULL to emulate previous API. And in most situations
they are expected to be set to NULL.
hb_shape() also returns a boolean for now. If shaper_list is NULL, the
return value can be ignored.
shaper_options is ignored for now, but otherwise it should be a
NULL-terminated list of strings.
shaper_list is a NULL-terminated list of strings. Currently recognized
strings are "ot" for native OpenType Layout implementation, "uniscribe"
for the Uniscribe backend, and "fallback" for the non-complex backend
(that will be implemented shortly). The fallback backend never fails.
The env var HB_SHAPER_LIST is also parsed and honored. It's a
colon-separated list of shaper names. The fallback shaper is invoked if
none of the env-listed shapers succeed.
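How such a colon-separated list could be walked is sketched below. This is an illustration under stated assumptions rather than the real implementation: shape_with_list() and try_shaper() are hypothetical, try_shaper() simply pretends that only the "ot" backend is available, and in real use the list would come from getenv ("HB_SHAPER_LIST").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: pretend only the "ot" backend is built in and succeeds. */
static bool try_shaper (const char *name, size_t len)
{
  assert (name != NULL);
  return len == 2 && strncmp (name, "ot", 2) == 0;
}

/* Returns the name of the shaper that handled the request. */
static const char *shape_with_list (const char *list)
{
  static const char *const known[] = { "ot", "uniscribe" };
  const char *p = list;

  while (p && *p)
  {
    const char *colon = strchr (p, ':');
    size_t len = colon ? (size_t) (colon - p) : strlen (p);

    /* Unknown names are skipped; known ones are tried in order. */
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
      if (strlen (known[i]) == len &&
          strncmp (p, known[i], len) == 0 &&
          try_shaper (p, len))
        return known[i];

    p = colon ? colon + 1 : NULL;
  }
  return "fallback"; /* the fallback backend never fails */
}
```

For example, a list of "uniscribe:ot" would try Uniscribe first and end up with "ot" here, while an empty or unrecognized list ends up with "fallback".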
New API hb_buffer_guess_properties() added.
I've messed up a lot of stuff recently: different parts of the
shaping process are stepping on each other's toes because
manually tracking what's in which buffer var is hard. I'm
going to add some internal API to track those such that mistakes
are discovered as soon as they are introduced.
Instead of always applying those two features before the complex shaper,
let the complex shaper decide whether they should be applied first.
Also add stub for Indic's final_reordering().
Add compose() and decompose() unicode funcs. These implement
pair-wise canonical composition/decomposition.
The glib/icu implementations are lacking for now. We are adding
API for this to glib, but I cannot find any useful API in ICU.
May end up implementing these in-house.
Changed all unicode_funcs callback names to remove the "_get" part.
Eg, hb_unicode_get_script_func_t is now hb_unicode_script_func_t,
and hb_unicode_get_script() is hb_unicode_script() now.
Wow, it took me a few days to find the right fix!
We now set the advance for attached marks to zero, but we
do this in the _finish() state of gpos, so it shouldn't
regress with fonts like DejaVuSansMono that explicitly
decrease the mark advance width to set it to zero.
We need to know whether the glyph exists, so we can fallback to
composing / decomposing. Assuming that glyph==0 means "doesn't exist"
wouldn't work for applications like Pango that want to use different
"doesn't exist" glyph codes for different characters. An explicit
return value fixes that.
Unicode data providers can now be subclassed, including support for
chain-up. The interface should now be nicely bindable, as well.
Also fix glib unicode funcs that were broken after hb_script_t
changes. Nicely caught by the test-unicode.c added in this commit.
That better matches OpenType spec. Note that we enable it for all
Arabic-shaper scripts. Ie. we enable it by default for Syriac too,
but the SyriacOT spec does not require it. I think this is a more
useful compromise than special-casing for Arabic script alone.
- Rename HB_SCRIPT_INVALID_CODE to HB_SCRIPT_INVALID
- Add HB_DIRECTION_INVALID
- Make hb_script_get_horizontal_direction() public
- Make hb_shape() guess script from buffer text (first non-common
non-inherit script) if buffer script is set to HB_SCRIPT_INVALID (this
is NOT the default.)
- Make hb_shape() guess direction from buffer script if buffer direction
is set to HB_DIRECTION_INVALID (this is NOT the default.)
- Make hb-view.c set INVALID script and direction on the buffer.
The above changes are meant to make hb-view fairly useful for uni-script
uni-direction text. The guessing behavior however is NOT the default of
hb_shape() and must be asked for explicitly. This is intended, because
the guess is not a suitable substitute to full-fledged bidi and script
segmentation. It's just a testing tool.
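The guessing described above amounts to something like the following. This is a sketch only: the enums and function names are hypothetical stand-ins, the per-character scripts are assumed to have been looked up already, and treating every script other than Hebrew and Arabic as left-to-right is a simplification made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced script and direction enums. */
typedef enum {
  SCRIPT_INVALID, SCRIPT_COMMON, SCRIPT_INHERITED,
  SCRIPT_LATIN, SCRIPT_HEBREW, SCRIPT_ARABIC
} script_t;

typedef enum { DIR_INVALID, DIR_LTR, DIR_RTL } direction_t;

/* First script that is neither common nor inherited, if any. */
static script_t guess_script (const script_t *scripts, size_t len)
{
  assert (scripts != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    if (scripts[i] != SCRIPT_COMMON &&
        scripts[i] != SCRIPT_INHERITED &&
        scripts[i] != SCRIPT_INVALID)
      return scripts[i];
  return SCRIPT_INVALID;
}

/* Simplification: only two right-to-left scripts are known here. */
static direction_t guess_direction (script_t script)
{
  switch (script)
  {
    case SCRIPT_HEBREW:
    case SCRIPT_ARABIC:
      return DIR_RTL;
    default:
      return DIR_LTR;
  }
}
```

Note how leading punctuation or combining marks (common / inherited) do not influence the guess; only the first "real" script does. That is also why this is no substitute for proper segmentation of mixed-script text.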
We should ensure-direction before doing any complex work. The only
exception is mirroring that needs to see the original / final direction,
not the native. Handle that.
Previously boolean features turned on the entire feature mask. This is
wrong if feature is Alternate and user has provided values bigger than one.
Though, I don't think other engines support such corner cases.
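To illustrate the difference, assume a feature occupies a small bit-field inside the per-glyph mask. The struct and function below are hypothetical and only model the arithmetic: the value is shifted into the field rather than the whole field being turned on.

```c
#include <assert.h>

/* Hypothetical description of where a feature's value lives in the glyph mask. */
typedef struct {
  unsigned shift; /* position of the lowest bit of the field */
  unsigned mask;  /* all bits of the field, already shifted */
} feature_map_t;

/* Store `value` in the feature's bit-field, leaving other bits alone. */
static unsigned apply_feature_value (unsigned      glyph_mask,
                                     feature_map_t feature,
                                     unsigned      value)
{
  assert (feature.shift < sizeof (unsigned) * 8);
  glyph_mask &= ~feature.mask;                            /* clear the field */
  glyph_mask |= (value << feature.shift) & feature.mask;  /* set the value */
  return glyph_mask;
}
```

With a two-bit field at mask 0x30, a value of 1 yields 0x10 and a value of 2 yields 0x20; setting the entire mask would instead have produced 0x30, i.e. the value 3, in both cases.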