Commit Graph

222 Commits

Author SHA1 Message Date
Garret Rieger 17b98563dc [subset] In cmap4 serialization reduce unnessecary calls into the iterator.
Gives ~20% speedup for large subsets.
2022-04-29 22:49:02 +00:00
Garret Rieger 5e241094bf [subset] In unicodes cache cleanup if set insert fails. 2022-04-29 22:45:16 +00:00
Garret Rieger a424a92ce5 [subset] s/void */intptr_t. 2022-04-29 22:14:03 +00:00
Garret Rieger aad67f5629 [subset] cache results of collect_unicodes. 2022-04-29 22:05:34 +00:00
Garret Rieger b4236b7de6 [subset] Optimize Cmap4 collect_unicodes.
Use set add_range() instead of individual add() calls.
2022-04-29 19:22:00 +00:00
Behdad Esfahbod f41945e313 [cmap] In collect_unicodes() of format 12/13, limit to max Unicode
Fixes fuzzer timeout:
https://oss-fuzz.com/testcase-detail/5062368881672192
2022-03-21 18:24:30 -06:00
Behdad Esfahbod ac1bb3e39e [machinery] Move accelerators to constructor/destructor 2022-01-20 12:10:05 -07:00
Behdad Esfahbod e062376ef1 [machinery] Make accelerator lazy-loader call Xinit/Xfini
Instead of init/fini. To isolate those functions. To be turned into
constructor/destructors, ideally one per commit (after some SFINAE
foo.)
2022-01-19 17:09:34 -07:00
Behdad Esfahbod 8a69e00639 [meta] Use std::addressof() instead of hb_addressof() 2022-01-13 16:17:34 -07:00
Garret Rieger 1d9ef3a75a [subset] Actually fix end_cp unitialized warning. 2021-12-01 11:53:10 -07:00
Garret Rieger d8635dfe5a [subset] Fix warning about uninitialized use of end_cp. 2021-12-01 11:17:46 -07:00
Garret Rieger 95329081c2 [subset] further optimize cmap4 packing. 2021-11-28 07:47:49 -07:00
Garret Rieger d9660fd58a [subset] Make cmap4 packing more optimal.
The current CMAP4 implementation uses whatever the current codepoint ranges are and then encodes them as indivudal glyph ids or as a delta if possible. However, it's often possible to save bytes by splitting up existing ranges and encoding parts of them using deltas where the cost of splitting the range is less than encoding each glyph individual.
2021-11-26 13:21:50 -07:00
Behdad Esfahbod c852b86841 Rename HBGlyphID to HBGlyphID16 2021-09-19 16:30:12 -04:00
Garret Rieger 2bd911b8b4 [subset] handle cmap4 overflows.
If a cmap4 subtable overflows during serialization drop it and the corresponding EncodingRecord. Don't drop the corresponding cmap12 table if it would have otherwise been removed.
2021-09-02 14:43:17 -06:00
Garret Rieger b9a176e268
[subset] speedup cmap4 subsetting for large codepoint counts. (#3178)
glyphIdArray generation implementation was O(n^2). Refactored to use a hashmap to reduce complexity. After the change subset time for a 22k codepoint subset went from 7s to 0.7s.
2021-08-29 10:33:12 -06:00
Garret Rieger 2c024dc3cb [subset] prune redundant cmap12 subtables.
If the post subset cmap12 table is equivalent to another cmap subtable don't include the 12 table in the final subset. Matches change https://github.com/fonttools/fonttools/pull/2146 from fontTools.
2021-08-04 17:36:24 -06:00
Behdad Esfahbod f0a1892ff9 [serialize] Remove unnecessary pointer indirection 2021-07-28 17:36:22 -06:00
Garret Rieger 9aa0ecef3f [subset] de-duplicate the logic that finds unicodes corresponding to requested glyphs.
Move the logic into subset planning and then re-use the results in cmap and OS2 subsetting. Removes depedency on cmap from os2.
2021-07-14 17:31:47 -07:00
Behdad Esfahbod 092094f705 Use as_array() and range loops in a few places 2021-04-01 16:02:54 -06:00
Behdad Esfahbod 4dba749d83 Add SortedArray{16,32}Of<> 2021-03-31 16:09:39 -06:00
Behdad Esfahbod ad28f973f3 Rename offset types to be explicit about their size
Add Offset16To<>, Offset24To<>, and Offset32To<> for most use-cases.
2021-03-31 13:00:07 -06:00
Garret Rieger b14475d2ae [subset] further changes to serializer error handling.
- Rename enum type and enum members.
- in_errors() now returns true for any error having been set. hb-subset now looks for offset overflow only errors to divert to repacker.
- Added INT_OVERFLOW and ARRAY_OVERFLOW enum values.
2021-03-18 10:51:26 -07:00
Garret Rieger 73ed59f7a6 [subset] store errors in the serializer as a flag set.
Make check_assign/check_equal specify the type of error to set.
2021-03-17 15:58:34 -07:00
Behdad Esfahbod 6d94194497 Use auto in range-for-loop more 2021-02-19 17:10:06 -07:00
Garret Rieger 18ab8029d5 [ENOMEM] check vector status in cmap subsetting. 2020-08-02 00:30:17 +04:30
Ebrahim Byagowi 5a7cc7fd8b minor spacing tweak 2020-07-29 08:33:38 +04:30
Ebrahim Byagowi d0e2addd43 minor 2020-07-18 22:16:02 +04:30
Qunxin Liu 8e5bc535d1 [subset] call collect_mapping only when --gids option is used.
collect_mapping is time consuming as it iterates all codepoints in all
cmap subtables, only trigger it when necessary
2020-07-16 11:25:53 -07:00
Qunxin Liu 10d6605bbe [subset] don't use << operator in collect_mapping 2020-05-15 11:04:59 -07:00
Qunxin Liu b2a965df5e [subset] Add support for "--gids" option
cmap subsetting now retains entries associated with any glyph ids explicitly requested
2020-05-11 15:28:58 -07:00
Qunxin Liu e53c44e326 [subset] temporarily revert previous cmap commit
Required in https://github.com/harfbuzz/harfbuzz/issues/2356
2020-04-25 12:21:22 +04:30
Ebrahim Byagowi 08428a15c3 minor, spacing 2020-04-24 23:45:17 +04:30
Ebrahim Byagowi 2dda6dd744 minor, tweak spacing
turn 8 spaces to tab, add space before Null/Crap
2020-04-20 16:18:29 +04:30
Ebrahim Byagowi a224f4179f
Turn more of simple dagger chains to foreach
Less noise, as was agreed before and applied 385741d also
2020-03-13 08:33:34 +03:30
Ebrahim Byagowi 07acd1a042 [subset] Rename src_base args to base to match sanitize methods
So it will become easier to follow that serialize methods signatures should
match with their sanitize methods counterparts.
2020-03-08 23:39:26 +03:30
ariza 188a0a47c2 removed default base; replaced w/ bias if required 2020-03-08 22:59:43 +03:30
Michiharu Ariza 5ab50eebd7
collect_unicodes() with clamp, calling add_range()
Use add_range instead an inner loop, clamp its input number by
number of glyphs a face has.

Even the face cmap12 and 13 have 32-bit hb_codepoint_t, which is here
used to make timeout, face's maxp has 16-bit gid limitation at least for now,
using that makes sure we both fix and the timeout and don't need to change
much things here also in order to support 32-bit gids also someday.

Fixes #2204
2020-02-29 13:02:29 +03:30
Ebrahim Byagowi e90213868b Revert "collect_unicodes() to check gid < num_glyphs with cmap 12"
Didn't fix the case actually, making bots to fail.

This reverts commit 15b43a4104.
2020-02-28 21:24:51 +03:30
Michiharu Ariza 15b43a4104
collect_unicodes() to check gid < num_glyphs with cmap 12
fixes #2204
2020-02-28 20:15:39 +03:30
Garret Rieger 50129b03a1 Add a reverse () call to hb_array_t. 2020-02-26 11:09:54 -08:00
Garret Rieger 38c6598c1c Switch to C style comments. 2020-02-26 11:09:54 -08:00
Garret Rieger 52b6e0baa0 When serializing cmap14 order the offsets from smallest to largest.
Current versions of OTS fail fonts with cmap 14's who's last offset does not point to the a block at the end of the table.
2020-02-26 11:09:54 -08:00
ckitagawa 03f778cf3c [cmap] remove dead code 2020-02-05 18:00:39 +03:30
Ebrahim Byagowi a7f694d4b0 Merge branch 'subset_cblc' into master 2020-02-05 16:31:21 +03:30
ckitagawa e128f80278 parent 777ba47b50
author ckitagawa <ckitagawa@chromium.org> 1579631743 -0500
committer ckitagawa <ckitagawa@chromium.org> 1580506176 -0500

[subset] Add CBLC support
2020-01-31 16:37:30 -05:00
Qunxin Liu b6a8f5e63c [subset] CMAP table subsetting fix
Not all codepoints smaller than 0xFFFF go to cmap4 table.
Only subset codepoints existing in each table.
This will also make harfbuzz consistent with fontTools' behavior
2020-01-31 10:49:44 -08:00
Qunxin Liu c370da45ff [subset] Cmap table: remove encodingRecord entry for empty cmap4 subtable 2020-01-23 17:23:55 -08:00
Qunxin Liu 1db2c1d0da fix for cmap4 and OS_2 subsetting: maximum character code allowed is 0xFFFF 2020-01-09 10:00:32 -08:00
Behdad Esfahbod 6a60ca117c [algs] Fold last other bsearch() in
Now truly have only one bsearch implementation.
2019-12-10 12:32:59 -06:00