Commit Graph

232 Commits

Author SHA1 Message Date
Garret Rieger 8f9f0c494b [subset] Enforce cmap12 group ordering constraints in collect_mapping.
Fixes fuzzer issue: https://oss-fuzz.com/testcase-detail/6365271012540416
2022-05-10 12:15:09 -06:00
Behdad Esfahbod f10ddb8dd8 [cmap] Use -1 as Unicode sentinel, not U+FFFF in Format12 serialize 2022-05-05 11:21:24 -06:00
Behdad Esfahbod 8a19968c8b [cmap] Use iterator bool operator 2022-05-05 11:17:23 -06:00
Behdad Esfahbod 052812b6ba
Merge pull request #3561 from googlefonts/cmap_opt
[subset] Further cmap subsetting speed optimizations
2022-05-04 15:38:30 -06:00
Garret Rieger f0c04114bc [subset] Embed unicode to gid list vector in subset plan. 2022-05-03 22:02:59 +00:00
Behdad Esfahbod 3fff2e9182 [perf/benchmark-font] Cosmetic 2022-05-02 16:42:10 -06:00
Behdad Esfahbod 307d2d8bb6 [cmap] Sprinkle some 'unlikely's 2022-05-02 16:30:22 -06:00
Garret Rieger 088133d939 [subset] cache cp to new gid list in subset plan.
This avoids having to recompute the ordered list multiple times during cmap generation.
2022-05-02 21:29:16 +00:00
Garret Rieger 6922a2561f [subset] Change serialize_rangeoffset_glyid back to using iterator. 2022-04-29 23:30:32 +00:00
Garret Rieger c66fd50c26 [subset] in cmap4 serialization save cp to gid iter to memory.
Iterator accesses are slow and it's iterated multiple times.
2022-04-29 23:18:53 +00:00
Garret Rieger 17b98563dc [subset] In cmap4 serialization reduce unnessecary calls into the iterator.
Gives ~20% speedup for large subsets.
2022-04-29 22:49:02 +00:00
Garret Rieger 5e241094bf [subset] In unicodes cache cleanup if set insert fails. 2022-04-29 22:45:16 +00:00
Garret Rieger a424a92ce5 [subset] s/void */intptr_t. 2022-04-29 22:14:03 +00:00
Garret Rieger aad67f5629 [subset] cache results of collect_unicodes. 2022-04-29 22:05:34 +00:00
Garret Rieger b4236b7de6 [subset] Optimize Cmap4 collect_unicodes.
Use set add_range() instead of individual add() calls.
2022-04-29 19:22:00 +00:00
Behdad Esfahbod f41945e313 [cmap] In collect_unicodes() of format 12/13, limit to max Unicode
Fixes fuzzer timeout:
https://oss-fuzz.com/testcase-detail/5062368881672192
2022-03-21 18:24:30 -06:00
Behdad Esfahbod ac1bb3e39e [machinery] Move accelerators to constructor/destructor 2022-01-20 12:10:05 -07:00
Behdad Esfahbod e062376ef1 [machinery] Make accelerator lazy-loader call Xinit/Xfini
Instead of init/fini. To isolate those functions. To be turned into
constructor/destructors, ideally one per commit (after some SFINAE
foo.)
2022-01-19 17:09:34 -07:00
Behdad Esfahbod 8a69e00639 [meta] Use std::addressof() instead of hb_addressof() 2022-01-13 16:17:34 -07:00
Garret Rieger 1d9ef3a75a [subset] Actually fix end_cp unitialized warning. 2021-12-01 11:53:10 -07:00
Garret Rieger d8635dfe5a [subset] Fix warning about uninitialized use of end_cp. 2021-12-01 11:17:46 -07:00
Garret Rieger 95329081c2 [subset] further optimize cmap4 packing. 2021-11-28 07:47:49 -07:00
Garret Rieger d9660fd58a [subset] Make cmap4 packing more optimal.
The current CMAP4 implementation uses whatever the current codepoint ranges are and then encodes them as indivudal glyph ids or as a delta if possible. However, it's often possible to save bytes by splitting up existing ranges and encoding parts of them using deltas where the cost of splitting the range is less than encoding each glyph individual.
2021-11-26 13:21:50 -07:00
Behdad Esfahbod c852b86841 Rename HBGlyphID to HBGlyphID16 2021-09-19 16:30:12 -04:00
Garret Rieger 2bd911b8b4 [subset] handle cmap4 overflows.
If a cmap4 subtable overflows during serialization drop it and the corresponding EncodingRecord. Don't drop the corresponding cmap12 table if it would have otherwise been removed.
2021-09-02 14:43:17 -06:00
Garret Rieger b9a176e268
[subset] speedup cmap4 subsetting for large codepoint counts. (#3178)
glyphIdArray generation implementation was O(n^2). Refactored to use a hashmap to reduce complexity. After the change subset time for a 22k codepoint subset went from 7s to 0.7s.
2021-08-29 10:33:12 -06:00
Garret Rieger 2c024dc3cb [subset] prune redundant cmap12 subtables.
If the post subset cmap12 table is equivalent to another cmap subtable don't include the 12 table in the final subset. Matches change https://github.com/fonttools/fonttools/pull/2146 from fontTools.
2021-08-04 17:36:24 -06:00
Behdad Esfahbod f0a1892ff9 [serialize] Remove unnecessary pointer indirection 2021-07-28 17:36:22 -06:00
Garret Rieger 9aa0ecef3f [subset] de-duplicate the logic that finds unicodes corresponding to requested glyphs.
Move the logic into subset planning and then re-use the results in cmap and OS2 subsetting. Removes depedency on cmap from os2.
2021-07-14 17:31:47 -07:00
Behdad Esfahbod 092094f705 Use as_array() and range loops in a few places 2021-04-01 16:02:54 -06:00
Behdad Esfahbod 4dba749d83 Add SortedArray{16,32}Of<> 2021-03-31 16:09:39 -06:00
Behdad Esfahbod ad28f973f3 Rename offset types to be explicit about their size
Add Offset16To<>, Offset24To<>, and Offset32To<> for most use-cases.
2021-03-31 13:00:07 -06:00
Garret Rieger b14475d2ae [subset] further changes to serializer error handling.
- Rename enum type and enum members.
- in_errors() now returns true for any error having been set. hb-subset now looks for offset overflow only errors to divert to repacker.
- Added INT_OVERFLOW and ARRAY_OVERFLOW enum values.
2021-03-18 10:51:26 -07:00
Garret Rieger 73ed59f7a6 [subset] store errors in the serializer as a flag set.
Make check_assign/check_equal specify the type of error to set.
2021-03-17 15:58:34 -07:00
Behdad Esfahbod 6d94194497 Use auto in range-for-loop more 2021-02-19 17:10:06 -07:00
Garret Rieger 18ab8029d5 [ENOMEM] check vector status in cmap subsetting. 2020-08-02 00:30:17 +04:30
Ebrahim Byagowi 5a7cc7fd8b minor spacing tweak 2020-07-29 08:33:38 +04:30
Ebrahim Byagowi d0e2addd43 minor 2020-07-18 22:16:02 +04:30
Qunxin Liu 8e5bc535d1 [subset] call collect_mapping only when --gids option is used.
collect_mapping is time consuming as it iterates all codepoints in all
cmap subtables, only trigger it when necessary
2020-07-16 11:25:53 -07:00
Qunxin Liu 10d6605bbe [subset] don't use << operator in collect_mapping 2020-05-15 11:04:59 -07:00
Qunxin Liu b2a965df5e [subset] Add support for "--gids" option
cmap subsetting now retains entries associated with any glyph ids explicitly requested
2020-05-11 15:28:58 -07:00
Qunxin Liu e53c44e326 [subset] temporarily revert previous cmap commit
Required in https://github.com/harfbuzz/harfbuzz/issues/2356
2020-04-25 12:21:22 +04:30
Ebrahim Byagowi 08428a15c3 minor, spacing 2020-04-24 23:45:17 +04:30
Ebrahim Byagowi 2dda6dd744 minor, tweak spacing
turn 8 spaces to tab, add space before Null/Crap
2020-04-20 16:18:29 +04:30
Ebrahim Byagowi a224f4179f
Turn more of simple dagger chains to foreach
Less noise, as was agreed before and applied 385741d also
2020-03-13 08:33:34 +03:30
Ebrahim Byagowi 07acd1a042 [subset] Rename src_base args to base to match sanitize methods
So it will become easier to follow that serialize methods signatures should
match with their sanitize methods counterparts.
2020-03-08 23:39:26 +03:30
ariza 188a0a47c2 removed default base; replaced w/ bias if required 2020-03-08 22:59:43 +03:30
Michiharu Ariza 5ab50eebd7
collect_unicodes() with clamp, calling add_range()
Use add_range instead an inner loop, clamp its input number by
number of glyphs a face has.

Even the face cmap12 and 13 have 32-bit hb_codepoint_t, which is here
used to make timeout, face's maxp has 16-bit gid limitation at least for now,
using that makes sure we both fix and the timeout and don't need to change
much things here also in order to support 32-bit gids also someday.

Fixes #2204
2020-02-29 13:02:29 +03:30
Ebrahim Byagowi e90213868b Revert "collect_unicodes() to check gid < num_glyphs with cmap 12"
Didn't fix the case actually, making bots to fail.

This reverts commit 15b43a4104.
2020-02-28 21:24:51 +03:30
Michiharu Ariza 15b43a4104
collect_unicodes() to check gid < num_glyphs with cmap 12
fixes #2204
2020-02-28 20:15:39 +03:30