Add the ability to create a subset plan which an be used to gather info on things like glyph mappings in the final subset. The plan can then be passed on to perform the subsetting operation.
cur_intersected_glyphs gets modified during recursion leading to incorrect filtering of sub tables in some cases. So don't use cur_intersected_glyphs. Instead just add an additional entry onto the parent_active_glyphs () stack.
Additionaly expands NotoNastaliqUrdu tests to include coverage of the issue from #3397.
Fixes https://oss-fuzz.com/testcase-detail/5549945449480192
In prune_langsys: move LangSys visited check up before any work is done for a LangSys. In this particular case the compare() method is responsible for the majority of the time spent and wasn't being guarded with a visisted check.
- When pos_glyphs is empty, use current full glyphs set as input for
subsequent recursive closure process
- Also increase max_lookup_visit_count to 35000 cause a real font file hit
previous limit 20000 and some lookups are dropped unexpectedly
Ligature subtables use virtual links to enforce an ordering constraint between the subtables and the coverage table. Unfortunately this has the sideeffect of prevent the subtables from being shared by another Ligature with a different coverage table since object equality compares all links real and virtual. This change makes virtual links stored separately from real links and updates the equality check to only check real links. If an object is de-duped any virtual links it has are merged into the object that replaces it.
The current CMAP4 implementation uses whatever the current codepoint ranges are and then encodes them as indivudal glyph ids or as a delta if possible. However, it's often possible to save bytes by splitting up existing ranges and encoding parts of them using deltas where the cost of splitting the range is less than encoding each glyph individual.
We implicitly require it for building ragel subproject. This new version
requirement should satisfied in both Fedora 33 and Debian bullseye, and
not be too cutting edge for us.
Though the spec said FeatureRecords are sorted alphabetically by feature
tag, there're font files with unsorted FeatureList. And harfbuzz is not
able to subset these files correctly because we use binary search in
finding featureRecords when collecting lookups. Also
find_duplicate_features needs to be updated to handle this.
In Windows 7 on Chrome if the coverage table comes before any of the LigatureSet or Ligature subtables the font won't load. This changes the packing order to always place the Coverage table last. Virtual links are used to ensure the repacker maintains the desired ordering.
Coincidentally fontTools also does the same thing (a3f988fbf6/Lib/fontTools/ttLib/tables/otTables.py (L1137)) to reduce overflows during packing.
ArrayOf.serialize_append allocates space for the new item, but ArrayOf.pop() does not recover the allocated space. So in the case where the revert path was entered the extra space added by serialize_append gets left in the serialization buffer. This moves the snapshot to before ArrayOf.serialize_append is called so that revert cleans up the buffer extend.
Test that both NFC and NFD input produces identical results for fonts
that used composed fonts internally (Amiri here) and fonts that
decompose internally (Noto Nastaliq Urdu here) and that for the former
composed forms are used.
See https://github.com/harfbuzz/harfbuzz/issues/3179
No idea why test names are underscorified but it it just makes calling
meson test testname harder than it should being not able to copy file
name directly.
Instead use inverted sets to handle requesting all features. Modifies feature collection in subset plan to intersect the set of requested features against the features in the font. This prevents iterating a fully filled feature tag set.
Fix for fuzzer timeout: https://oss-fuzz.com/testcase-detail/5001604901240832.
- Operation limit is per glyph, so 100,000 should still be far more than needed.
- Switches from for(...) to while(...) loop for iteration. for(...) calls it.end() which in this case triggers a complete iteration.
- Cache CompositeGlyph size in the iterator to avoid needing to recalculate it.
Fixes#2361. Stores tables in the builder in a hashmap so you end up with at most one copy of each table. Table serialization order is now based on tag sort order instead of order of insertion into the builder.
If the post subset cmap12 table is equivalent to another cmap subtable don't include the 12 table in the final subset. Matches change https://github.com/fonttools/fonttools/pull/2146 from fontTools.
Most of time the files are identical, so instead of comparing the TTX
dump we can check sha256 hashes of the files first and if they match, we
don’t have to check the TTX dumps at all, making the subset tests orders
of magnitude faster.
time meson test --suite=subset down from:
real 0m19.418s
user 0m38.171s
sys 0m3.587s
to:
real 0m3.102s
user 0m8.622s
sys 0m1.701s
The expected files have been replaced by hb-subset output so they are
bit-identical where FontTools output might not.
The generate-expected-outputs.py now compares the hb-subset output with
fontttols subset and errors of they don’t match.
time meson test --suite=subset down from:
real 0m22.822s
user 0m44.561s
sys 0m9.255s
to:
real 0m19.418s
user 0m38.171s
sys 0m3.587s
Does not seem to help much, but it is something.
Part of https://github.com/harfbuzz/harfbuzz/issues/3089
Speed-up subset tests by saving TTX dump of expected output instead of
generating it each time the tests are run.
Cuts down meson test --suite=subset on my system from:
real 0m38.977s
user 1m12.024s
sys 0m10.547s
to:
real 0m22.291s
user 0m44.548s
sys 0m9.221s
Part of https://github.com/harfbuzz/harfbuzz/issues/3089
Fixes https://github.com/harfbuzz/harfbuzz/issues/3017
Uses AdobeBlank2.ttf from:
https://github.com/adobe-fonts/adobe-blank-2
instead of a dummy empty font so that everything maps to GID 1 and
control code points are kept instead of being dropped because there is
not space glyph (otherwise we’d need to identify control code points
somehow when generating the expectations).
Currently COLRv1 spec is being changed so the subsetting implementation is out of sync. Disable subsetting by failing sanitization for COLRv1 tables and disable all colrv1 tests.
We are not interested in testing FreeType cmap support.
Fixes most format 4 tests. The remaining test seems to be peculiar, and
I can’t find any cmap implementation that produces the expected output.
In batch mode (which is used for testing) we are probably not interested
in splitting text into lines as we could have split the string into
different tests. This fixes a bunch of AOTS tests that use newlines as
input.
When running in batch mode, the quotes are not stripped by the shell and
end up in the feature string. This breaks one of the AOTS tests.
Alternatively, we can remove the quotes from the test files, not sure
which is less hacky, though!