Fixes https://github.com/behdad/harfbuzz/issues/243
With javatext.ttf, the reodering medial Ra gets its advance width
zero'ed in Uniscribe implementation, and the font adds the advance
back. Our Indic shaper does not do that, but USE does. So, route
Javanese through USE. That's what Microsoft does anyway. Test:
U+A9A5,U+A9BA
This also seems to fix the following sequence, and variations thereof:
U+A99F,U+A9C0,U+A9A2,U+A9BF
Apparently some clients have reference-table callbacks that copy the table.
As such, avoid loading 'glyf' table which is only needed if fallback positioning
happens.
Fixes https://github.com/behdad/harfbuzz/issues/237
Note that Uniscribe enforces a certain syllable order. We don't.
But with this change, I get all of the tibetan contractions pass
with Microsoft Himalaya font.
Previously we only synthesized GDEF glyph classes if the glyphClassDef
array in GDEF was null. This worked well enough, and is indeed what
OpenType requires: "If the font does not include a GlyphClassDef table,
the client must define and maintain this information when using the
GSUB and GPOS tables." That sentence does not quite make sense since
one needs Unicode properties as well, but is close enough.
However, looks like Arial Unicode as shipped on WinXP, does have GDEF
glyph class array, but defines no classes for Hebrew. This results
in Hebrew marks not getting their widths zeroed. So, with this change,
we synthesize glyph class for any glyph that is not specified in the
GDEF glyph class table. Since, from our point of view, a glyph not
being listed in that table is a font bug, any unwanted consequence of
this change is a font bug :).
Note that we still don't get the same rendering as Uniscribe, since
Uniscribe seems to do fallback positioning as well, even though the
font does have a GPOS table (which does NOT cover Hebrew!). We are
not going to try to match that though.
Test string for Arial Unicode:
U+05E9,U+05B8,U+05C1,U+05DC
Before: [gid1166=3+991|gid1142=0+737|gid5798=0+1434]
After: [gid1166=3+991|gid1142=0+0|gid5798=0+1434]
Uniscribe: [gid1166=3+991|gid1142=0@348,0+0|gid5798=0+1434]
Note that our new output matches what we were generating until July
2014, because the Hebrew shaper used to zero mark advances based on
Unicode, NOT GDEF. That's 9e834e29e0.
Reported by Greg Douglas.
API changes:
- If NDEBUG is defined, define HB_NDEBUG
- Disable costlier sanity checks if HB_NDEBUG is defined.
In 1.2.3 introduced some code to disable costly sanity checks if
NDEBUG is defined. NDEBUG, however, disables all assert()s as
well. With HB_NDEBUG, one can disable costlier checks but keep
assert()s.
I'll probably add a way to define HB_NDEBUG automatically in
release tarballs. But for now, production systems that do NOT
define NDEBUG, are encouraged to define HB_NDEBUG for our build.
New API:
- hb_font_get_nominal_glyph_func_t
- hb_font_get_variation_glyph_func_t
- hb_font_funcs_set_nominal_glyph_func()
- hb_font_funcs_set_variation_glyph_func()
- hb_font_get_nominal_glyph()
- hb_font_get_variation_glyph()
Deprecated API:
- hb_font_get_glyph_func_t
- hb_font_funcs_set_glyph_func()
Clients that implement their own font-funcs are encouraged to replace
their get_glyph() implementation with a get_nominal_glyph() and
get_variation_glyph() pair. The variation version can assume that
variation_selector argument is not zero.
That commit moved the advance adjustment for mark positioning to
be applied immediately, instead of doing late before. This breaks
if mark advances are zeroed late, like in Arabic. Also, easier to
hit it in RTL scripts since a single mark with non-zero advance is
enough to hit the bug, whereas in LTR, at least two marks are needed.
This reopens https://github.com/behdad/harfbuzz/issues/211
The cursive+mark interaction is broken again. To be fixed in a
different way.
This better emulates Unicode grapheme clusters.
Note that Uniscribe does NOT do this, but should be harmless with most clients,
and improve fallback with clients that use HarfBuzz cluster as unit of fallback.
Fixes https://github.com/behdad/harfbuzz/issues/217
Fixes https://github.com/behdad/harfbuzz/issues/223
Right now we cannot test this because it has to be tested using hb-fuzzer.
We should move all fuzzing tests from test/shaping/tests/fuzzed.tests to
test/fuzzing/ and have its own test runner. At that point, should add
test from this issue as well.
This is what Microsoft's implementation does. Marks that need advance
need to add it back using 'dist' or other feature in GPOS. Update tests to
match.
Fixes https://github.com/behdad/harfbuzz/issues/211
What happens in that bug is that a mark is attached to base first,
then a second mark is cursive-chained to the first mark. This only
"works" because it's in the Indic shaper where mark advances are
not zeroed.
Before, we didn't allow cursive to run on marks at all. Fix that.
We also where updating mark major offsets at the end of GPOS, such
that changes in advance of base will not change the mark attachment
position. That was superior to the alternative (which is what Uniscribe
does BTW), but made it hard to apply cursive to the mark after it
was positioned. We could track major-direction offset changes and
apply that to cursive in the post process, but that's a much trickier
thing to do than the fix here, which is to immediately apply the
major-direction advance-width offsets... Ie.:
https://github.com/behdad/harfbuzz/issues/211#issuecomment-183194739
If this breaks any fonts, the font should be fixed to do mark attachment
after all the advances are set up first (kerning, etc).
Finally, this, still doesn't make us match Uniscribe, for I explained
in that bug. Looks like Uniscribe applies minor-direction cursive
adjustment immediate as well. We don't, and we like it our way, at
least for now. Eg. the sequence in the test case does this:
- The first subscript attaches with mark-to-base, moving in x only,
- The second subscript attaches with cursive attachment to first subscript
moving in x only,
- A final context rule moves the first subscript up by 104 units.
The way we do, the final shift-up, also shifts up the second subscript
mark because it's cursively-attached. Uniscribe doesn't. We get:
[ttaorya=0+1307|casubscriptorya=0@-242,104+-231|casubscriptnarroworya=0@20,104+507]
while Uniscribe gets:
[ttaorya=0+1307|casubscriptorya=0@-242,104+-211|casubscriptnarroworya=0+487]
note the different y-offset of the last glyph. In our view, after cursive,
things move together, period.
This moves all the source listings in src/Makefile.am,
src/hb-ucdn/Makefile.am and util/Makefile.am into separate Makefile
snippets, so that they may be shared between different Makefile-based
build systems, such as NMake for Visual Studio.
This speeds up shaping the Amiri font by over 15%.
This was primarily needed for my work on OpenType GX, since
we will be collecting only sublookups that are "active" for
current font instance; but it's a nice boost in general as
well.
We might, in the future, collect subtables in the lookup_accel.
That would also allow us to do a per-subtbale set-digest, which
should speed things up some more, specially for ContextChainFormat3
lookups... Amiri, for example, contains one lookup with 53
subtables!
The font Garamond Premier Pro Caption (and possibly many other
Adobe fonts), have many FeatureParamsSize tables with the old
wrong offset. We handle fixing those up, but they were still
contributing to edit_count, and when I reduced HB_SANITIZE_MAX_EDIT
from 100 to 8 in 14c2de3218, these
fonts were now getting GPOS dropped and hence kerning disabled.
Fix, by not counting edits made towareds offset fix-up. I'll
also increase edit count again, in the next commit.
The coretext_aat shaper delegates to the regular coretext_..._ensure() functions, so coretext_aat_..._ensure() functions defined by these macros are unused. The compiler warns about them, which in turn can confuse people to think that the coretext_aat_..._ensure() functions weren't called by accident.
Currently just announces lookup applications. Message-API *will* change.
hb-shape / hb-view are updated to print-out messages to stder if --debug
is specified.
We regeressed Malayalam in 508cc3d3cf
This brings down the failures to 198 (from 750).
BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
MYANMAR: 1123865 out of 1123883 tests passed. 18 failed (0.00160159%)
Test stats remain unchanged, except for Malayalam, which we investigate:
BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1047584 out of 1048334 tests passed. 750 failed (0.0715421%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
Myanmar, compared to Windows 10 mmrtext.ttf:
MYANMAR: 1123865 out of 1123883 tests passed. 18 failed (0.00160159%)
On Windows 10 we are seeing that other error message...
Test sequence: U+0995,U+-9CD,U+09B0
With Nirmala shipped on Windows 10, this failed to form the below form.
Works now.
Reported by Sairus.
Say, if we are ligating "A B_C m D", then previously 'm' was being
attached to 'B' in the combined A_B_C_D ligature. Now we attach it
to 'C'. No test for this though :(.
We use three bits for lig_id these days, so we finally got a report of
two separate ligatures with the same lig_id happening adjacent to each
other, and then the component-handling code was breaking things.
Protect against that by ignoring same-lig-id but lig-comp=0 glyphs after
a new ligature.
Fixes https://github.com/behdad/harfbuzz/issues/198
Before, we were just checking the use_category(). This detects as
halant a ligature that had the halant as first glyph (as seen in
NotoSansBalinese.) Change that to use the is_ligated() glyph prop
bit. The font is forming this ligature in ccmp, which is before
the rphf / pref tests. So we need to make sure the "ligated" bit
survives those tests. Since those only check the "substituted" bit,
we now only clear that bit for them and "ligated" survives.
Fixes https://github.com/behdad/harfbuzz/issues/180
- use '-w' instead of '\<...\>' for check-header-guards
grep manpage says these are the same
- put '-q' first in the grep options
- move VAR into hb-private.hh
- hb-font-private.hh - use [VAR] instead of [] for variable array
Back in the old days, we used to apply 'calt' and 'cswh' in Arabic shaper,
with a pause in between. Then we disabled the 'cswh' because Microsoft
disabled it, but forgot to remove the unnecessary pause. Do that now.
This has the benefit that it fixes shaping with monbaiti from Windows 10.
In that version of that font, the lookups from 'calt' are duplicated in
'rclt', and Mongolian was changed to go through Universal Shaping Engine.
We still use the Arabic shaper for Mongolian. With a pause after 'calt',
we were applying the duplicate lookups from 'calt' and 'rclt' twice. It
happened to be the case that these lookups were NOT idempotent. So we
were getting wrong shaping. See thread "Windows 10 monbaiti.ttf upgrade
(5.01 -> 5.51) caused loss of diacritical marks when shaped with harfbuz"
on the mailing list. This fixes that.
This aims to make Syriac Abbr Mark sizing more accurate when repeating segments are used, by adding an extra repeat and tightening up the spacing slightly rather than leaving a shortfall corresponding to a partial repeat-width.
This was brorken earlier, though, it's really hard to notice it.
Unlike the glyph_h_origin(), an unset glyph_v_origin() does NOT
mean that the vertical origin is at 0,0.
Related to https://github.com/behdad/harfbuzz/issues/187
Fixes https://github.com/behdad/harfbuzz/issues/187
Funcs implementations that have a non-zero horizontal origin must
implement the glyph_h_origin() callback, nothing new here.
Other implementations (all I know of!) can simply not set
glyph_h_origin() now. I did that for hb-ot and hb-ft in
44f8275080, though that broke the
fallback shaper because the default was returning false...
This prepares the headers for exporting symbols using visibility
attributes or __declspec(dllexport), so that we do not need to maintain
symbols listing files, as this is what was and is done in GLib and GTK+.
This is just to make it harder to be extremely slow. There definitely
are ways still, just harder. Oh well... how do we tame this problem
without solving halting problem?!
Fixes https://github.com/behdad/harfbuzz/issues/174
Use the DEFINE_ENUM_FLAG_OPERATORS macro in winnt.h on Visual Studio,
which defines the bitwise operators for the enumerations that we want to
mark as hb_mark_as_flags_t, which will take care of the situation on newer
Visual Studio (>= 2012), where the build breaks with C2057 errors as the
underlying types of the enumerations is not clear to the compiler when we
do a bitwise op within the declaration of the enumerations themselves.
Also disable the C4200 (nonstandard extension used : zero-sized array in
struct/union) and C4800 ('type' : forcing value to bool 'true' or 'false'
(performance warning)) warnings as the C4200 is the intended scenario and
C4800 is harmless but is so far an unavoidable side effect of using
DEFINE_ENUM_FLAG_OPERATORS.
This reverts commit f92bd86cc8.
We don't want to be like cairo, where as soon as there's an error,
nothing works anymore. So, lets process lookups as long as there's
no new memory needed. That's also a model that hides fewer bugs.
The feature is enabled for any character in the Arabic shaper.
We should experiment with using it for Arabic subtending marks.
Though, that has a directionality problem as well, since those
are used with digits...
Fixes https://github.com/behdad/harfbuzz/issues/141
This resurrects the space fallback feature, after I disabled
the compatibility decomposition. Now I can release HarfBuzz
again without breaking Pango!
It also remembers which space character it was, such that later
on we can approximate the width of this particular space
character. That part is not implemented yet.
We normalize all GC=Zs chars except for U+1680 OGHA SPACE MARK,
which is better left alone.
Also route fuzzing-related tests through hb-ot-font, to reduce dependency
on FreeType behavior for badly-broken fonts. Fixes failing test with
FreeType master.
The default FreeType load flags where changed from FT_LOAD_NO_HINTING
to FT_LOAD_DEFAULT in 2a9627c564.
This is crashing HarfBuzz-enabled FreeType as I suppose it causes
infinite recursion between HB and FT autohinter...
Revert the behavior change.
Fixes https://github.com/behdad/harfbuzz/issues/143
This changes the default load_flags of fonts created using
hb_ft_font_create() from NO_HINTING to DEFAULT. Hope that doesn't
break too much client code.
Code calling hb_ft_font_set_funcs() is unaffected.
The BCP-47 registry defines a variant subtag "fonipa" that can be used
in combination with arbitrary other language tags. For example,
"rm-CH-fonipa-sursilv" indicates the Sursilvan dialect of Romansh
as used in Switzerland, transcribed used the International Phonetic
Alphabet.
http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
A while back we marked every API as version:1.0. We should fix them all
to reflect real version they were introduced. This is a start.
Patch from Nikolay Sivov.
Previously we were setting refcount of freed objects to the inert value, which
was harmful because it caused further destroy()s of the freed object to NOT
call free() and hence hide the bug. Indeed, after eb0bf3ae66 test-object
was double-free'ing objects and this was never caught on Linux. It only was
caught as crashing on Mac.
Now we poison refcount upon freeing and check that it's valid whenever reading
it. Makes test-object fail now.
See thread "Issue with cursive attachment" started by Khaled.
Turned out fixing this wasn't as bad as I had assumed. I like the
new code better; we now have a theoretical model of cursive
connections that is easier to reason about.
These were never tested with Indic shaper, and indeed wouldn't work there
because they didn't have their viramas and other config defined. They are
all also supported by MS through USE, so route them there.
https://bugs.freedesktop.org/show_bug.cgi?id=91228
Commit cdcdfe61b9 changed two `#pragma
message` to `#pragma error` in hb-unicode.cc, however MSVC uses #error,
just like the #else branch. `#pragma error` is an unknown pragma so
MSVC does not fail the build because of it, which I believe was the
intention of that commit.
If it's meant to be an #error, then the #ifdef for _MSC_VER can be
removed entirely.
Separate the loops for the two cases of replacing with space
and deleting. For deleting, use the out-buffer machinery.
Needed for upcoming cluster merge fix.
Applying unary minus operator to unsigned int causes the following
warning on MSVS:
warning C4146: unary minus operator applied to unsigned type, result still unsigned
Based on patch from Koji Ishi.
Fixes https://github.com/behdad/harfbuzz/pull/110
Previously, when creating an object from inert inputs (eg:
"hb_font_create(hb_face_get_empty())") we returned the inert
empty object. This is not helpful as there are legitimate
usecases to do that.
We now never return the inert object unless allocation failed.
Tests are revised to reflect.
s/atomic_int/atomic_int_impl/ and s/atomic_ptr/atomic_ptr_impl/
to bring it in par with hb_mutex_impl_t, then re-introduce
hb_atomic_int_t as a wrapper around hb_atomic_int_impl_t.
In hb_reference_count_t, make it clear the non-atomic get and set
are intentional due to nature of the cases they are used in
(comparison to -1 and the debug output/tracing).
In hb-coretext, when we were using scratch buffer for book-keeping,
a reverse_range() caused by the notdef-insertion loop could mess up
our log_clusters. Ouch!
Before, the IntType::cmp functions providing this and was truncating
the hb_codepoint_t to 16bits before comparison. I have no idea how
this was never discovered, and I'm too lazy to try to reproduce this
with Pango (which uses non-16bit codepoint numbers for missing glyphs).
This makes a lot of code safer. We only try modifying the object in one
place, after making sure it's safe to do so. So, do a const_cast<> in
that one place...
Currently:
- Initializing skippy is very expensive,
- Our lookup accelerator (using set-digests) can be very ineffecite,
As such, we end up many times initializing skippy but then failing
coverage check. Reordering fixes that.
When, later, we fix our accelerator to have truly small false-positive
rate (for example by using the frozen-sets), then we might want to
reorder these checks such that we wouldn't calculate coverage number
if skippy is going to fail.
This shows a 5% speedup with Roboto already.
I experimented with replacing use of hb_set_digest_t with this new
hb_frozen_set_t, hoping to get a huge speedup for busy lookups
(like kern lookup in Roboto), but I only got 6% speendup in Roboto
and 4% in NotoNastaliqUrduDraft :(.
This code is C++ only. There isn't a single C++ compiler that fails to
understand the "inline" keyword, since it's required by C++98. Any
compiler older than C++98 is likely to choke on the template usage
further down, so this isn't necessary.
Moreover, the C++ standard says you cannot define macros.
[lib.macro.names] says "Nor shall such a translation unit define macros
for names lexically identical to keywords." -- technically, it's a
promise that the Standard Library headers won't do it, the wording means
that the entire translation unit won't do it, which implies no source
can do it.
MSVC complains about it:
fatal error C1189: #error : The C++ Standard Library forbids macroizing
keywords. Enable warning C4005 to find the forbidden macro.
Author: Thiago Macieira <thiago.macieira@intel.com>
This is by no ways to promote non-Unicode encodings. This is an entry
point that takes Unicode codepoints that happen to all be the first
256 characters and hence fit in 8bit strings. This is useful eg in Chrome
where strings that can fit in 8bit are implemented that way, and this
avoids copying into UTF-8 or UTF-16.
Perhaps we should rename this to hb_buffer_add_codepoints8(). I'm also
curious if anyone would be really interested in hb_buffer_add_codepoints16().
Please discuss!
Roboto has glyphs (like 'F') that have 200 kerning pairs.
Add a handcoded bsearch instead of previous linear search.
This doesn't show much speedup though, apparently we spend the
bulk of the time somewhere before here.
For discussion see:
http://lists.freedesktop.org/archives/harfbuzz/2012-April/001905.html
Over time we have had added NO_HINTING all over the place in hb-ft. Finish it off.
Not setting ppem on hb-font disables get_contour_point() calls which is good anyway.
See comments in the commit.
When I originally wrote hb-ft, FreeType objects did not support reference
counting. As such, hb_ft_face_create() and hb_ft_font_create() had a
"destroy" callback and client was responsible for making sure FT_Face is
kept around as long as the hb-font/face are alive.
However, since this was not clearly documented, some clienets didn't
correctly did that. In particular, some clients assumed that it's safe
to destroy FT_Face and then hb_face_t. This, indeed, used to work, until
45fd9424c7, which make face destroy access
font tables.
Now, I fixed that issue in 395b35903e since
the access was not needed, but the problem remains that not all clients
handle this correctly. See:
https://bugs.freedesktop.org/show_bug.cgi?id=86300
Fortunately, FT_Reference_Face() was added to FreeType in 2010, and so we
can use it now. Originally I wanted to change hb_ft_face_create() and
hb_ft_font_create() to reference the face if destroy==NULL was passed in.
That would improve pretty much all clients, with little undesired effects.
Except that FreeType itself, when compiled with HarfBuzz support, calls
hb_ft_font_create() with destroy==NULL and saves the resulting hb-font on
the ft-face (why does it not free it immediately?). Making hb-face
reference ft-face causes a cycling reference there. At least, that's my
current understanding.
At any rate, a cleaner approach, even if it means all clients will need a
change, is to introduce brand new API. Which this commit does.
Some comments added to hb-ft.h, hoping to make future clients make better
choices.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=75299
"Fixes" https://bugs.freedesktop.org/show_bug.cgi?id=86300
Based on discussion someone else who had a similar issue, most probably
the user is releasing FT_Face before destructing hb_face_t / hb_font_t.
While that's a client bug, and while we can (and should) use FreeType
refcounting to help avoid that, it happens that we were accessing
the table when we didn't really have to. Avoid that.
Fail if blob start plus length overflows; or if blob length
is greater than 2GB. It takes a while for fonts to get to that
size. In the mean time, it protects against bugs like this:
http://www.icu-project.org/trac/ticket/11450
Also avoids some weird issues with 32bit vs 64bit systems
as we accept length as unsigned int. As such, a length of
-1 will cause overflow on 32bit machines, but happily
accepted on a 64bit machine. Avoid that.
In Oriya, a ZWJ/ZWNJ might be added before candrabindu to encourage
or stop ligation of the candrabindu. This is clearly specified in
the Unicode section on Oriya. Allow it there. Note that Uniscribe
doesn't allow this.
Micro tests added using Noto Sans Oriya draft.
No changes in numbers. Currently at:
BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951190 out of 951913 tests passed. 723 failed (0.0759523%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048147 out of 1048334 tests passed. 187 failed (0.0178378%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
Otherwise, we might process a lookup thousands of times, with no
benefit. This pathological case was hit by Noto Nastaliq Urdu Draft
in Firefox's code to determine whether space glyph is involved in
any GSUB/GPOS rules. A test page is at http://behdad.org/urdu
See:
https://bugzilla.mozilla.org/show_bug.cgi?id=1090869
Currently doesn't work though, we detect font fallback. Apparently
matching on ct_font is not safe for this. Looks like commit
25f4fb9b56 wasn't enough after all.
After 763e5466c0, one doesn't
need to set flags for different pieces of text. The flags now
are something the client sets up once, depending on how it
actually uses the buffer. As such, don't clear it in
clear_contents().
Tests updated.
We can't really resize buffer and continue in this shaper as we are
using the scratch buffer for string_ref and log_cluster. Restructure
shaper to retry from (almost) scratch.