Commit Graph

210 Commits

Author SHA1 Message Date
Garret Rieger d9660fd58a [subset] Make cmap4 packing more optimal.
The current CMAP4 implementation uses whatever the current codepoint ranges are and then encodes them as indivudal glyph ids or as a delta if possible. However, it's often possible to save bytes by splitting up existing ranges and encoding parts of them using deltas where the cost of splitting the range is less than encoding each glyph individual.
2021-11-26 13:21:50 -07:00
Behdad Esfahbod c852b86841 Rename HBGlyphID to HBGlyphID16 2021-09-19 16:30:12 -04:00
Garret Rieger 2bd911b8b4 [subset] handle cmap4 overflows.
If a cmap4 subtable overflows during serialization drop it and the corresponding EncodingRecord. Don't drop the corresponding cmap12 table if it would have otherwise been removed.
2021-09-02 14:43:17 -06:00
Garret Rieger b9a176e268
[subset] speedup cmap4 subsetting for large codepoint counts. (#3178)
glyphIdArray generation implementation was O(n^2). Refactored to use a hashmap to reduce complexity. After the change subset time for a 22k codepoint subset went from 7s to 0.7s.
2021-08-29 10:33:12 -06:00
Garret Rieger 2c024dc3cb [subset] prune redundant cmap12 subtables.
If the post subset cmap12 table is equivalent to another cmap subtable don't include the 12 table in the final subset. Matches change https://github.com/fonttools/fonttools/pull/2146 from fontTools.
2021-08-04 17:36:24 -06:00
Behdad Esfahbod f0a1892ff9 [serialize] Remove unnecessary pointer indirection 2021-07-28 17:36:22 -06:00
Garret Rieger 9aa0ecef3f [subset] de-duplicate the logic that finds unicodes corresponding to requested glyphs.
Move the logic into subset planning and then re-use the results in cmap and OS2 subsetting. Removes depedency on cmap from os2.
2021-07-14 17:31:47 -07:00
Behdad Esfahbod 092094f705 Use as_array() and range loops in a few places 2021-04-01 16:02:54 -06:00
Behdad Esfahbod 4dba749d83 Add SortedArray{16,32}Of<> 2021-03-31 16:09:39 -06:00
Behdad Esfahbod ad28f973f3 Rename offset types to be explicit about their size
Add Offset16To<>, Offset24To<>, and Offset32To<> for most use-cases.
2021-03-31 13:00:07 -06:00
Garret Rieger b14475d2ae [subset] further changes to serializer error handling.
- Rename enum type and enum members.
- in_errors() now returns true for any error having been set. hb-subset now looks for offset overflow only errors to divert to repacker.
- Added INT_OVERFLOW and ARRAY_OVERFLOW enum values.
2021-03-18 10:51:26 -07:00
Garret Rieger 73ed59f7a6 [subset] store errors in the serializer as a flag set.
Make check_assign/check_equal specify the type of error to set.
2021-03-17 15:58:34 -07:00
Behdad Esfahbod 6d94194497 Use auto in range-for-loop more 2021-02-19 17:10:06 -07:00
Garret Rieger 18ab8029d5 [ENOMEM] check vector status in cmap subsetting. 2020-08-02 00:30:17 +04:30
Ebrahim Byagowi 5a7cc7fd8b minor spacing tweak 2020-07-29 08:33:38 +04:30
Ebrahim Byagowi d0e2addd43 minor 2020-07-18 22:16:02 +04:30
Qunxin Liu 8e5bc535d1 [subset] call collect_mapping only when --gids option is used.
collect_mapping is time consuming as it iterates all codepoints in all
cmap subtables, only trigger it when necessary
2020-07-16 11:25:53 -07:00
Qunxin Liu 10d6605bbe [subset] don't use << operator in collect_mapping 2020-05-15 11:04:59 -07:00
Qunxin Liu b2a965df5e [subset] Add support for "--gids" option
cmap subsetting now retains entries associated with any glyph ids explicitly requested
2020-05-11 15:28:58 -07:00
Qunxin Liu e53c44e326 [subset] temporarily revert previous cmap commit
Required in https://github.com/harfbuzz/harfbuzz/issues/2356
2020-04-25 12:21:22 +04:30
Ebrahim Byagowi 08428a15c3 minor, spacing 2020-04-24 23:45:17 +04:30
Ebrahim Byagowi 2dda6dd744 minor, tweak spacing
turn 8 spaces to tab, add space before Null/Crap
2020-04-20 16:18:29 +04:30
Ebrahim Byagowi a224f4179f
Turn more of simple dagger chains to foreach
Less noise, as was agreed before and applied 385741d also
2020-03-13 08:33:34 +03:30
Ebrahim Byagowi 07acd1a042 [subset] Rename src_base args to base to match sanitize methods
So it will become easier to follow that serialize methods signatures should
match with their sanitize methods counterparts.
2020-03-08 23:39:26 +03:30
ariza 188a0a47c2 removed default base; replaced w/ bias if required 2020-03-08 22:59:43 +03:30
Michiharu Ariza 5ab50eebd7
collect_unicodes() with clamp, calling add_range()
Use add_range instead an inner loop, clamp its input number by
number of glyphs a face has.

Even the face cmap12 and 13 have 32-bit hb_codepoint_t, which is here
used to make timeout, face's maxp has 16-bit gid limitation at least for now,
using that makes sure we both fix and the timeout and don't need to change
much things here also in order to support 32-bit gids also someday.

Fixes #2204
2020-02-29 13:02:29 +03:30
Ebrahim Byagowi e90213868b Revert "collect_unicodes() to check gid < num_glyphs with cmap 12"
Didn't fix the case actually, making bots to fail.

This reverts commit 15b43a4104.
2020-02-28 21:24:51 +03:30
Michiharu Ariza 15b43a4104
collect_unicodes() to check gid < num_glyphs with cmap 12
fixes #2204
2020-02-28 20:15:39 +03:30
Garret Rieger 50129b03a1 Add a reverse () call to hb_array_t. 2020-02-26 11:09:54 -08:00
Garret Rieger 38c6598c1c Switch to C style comments. 2020-02-26 11:09:54 -08:00
Garret Rieger 52b6e0baa0 When serializing cmap14 order the offsets from smallest to largest.
Current versions of OTS fail fonts with cmap 14's who's last offset does not point to the a block at the end of the table.
2020-02-26 11:09:54 -08:00
ckitagawa 03f778cf3c [cmap] remove dead code 2020-02-05 18:00:39 +03:30
Ebrahim Byagowi a7f694d4b0 Merge branch 'subset_cblc' into master 2020-02-05 16:31:21 +03:30
ckitagawa e128f80278 parent 777ba47b50
author ckitagawa <ckitagawa@chromium.org> 1579631743 -0500
committer ckitagawa <ckitagawa@chromium.org> 1580506176 -0500

[subset] Add CBLC support
2020-01-31 16:37:30 -05:00
Qunxin Liu b6a8f5e63c [subset] CMAP table subsetting fix
Not all codepoints smaller than 0xFFFF go to cmap4 table.
Only subset codepoints existing in each table.
This will also make harfbuzz consistent with fontTools' behavior
2020-01-31 10:49:44 -08:00
Qunxin Liu c370da45ff [subset] Cmap table: remove encodingRecord entry for empty cmap4 subtable 2020-01-23 17:23:55 -08:00
Qunxin Liu 1db2c1d0da fix for cmap4 and OS_2 subsetting: maximum character code allowed is 0xFFFF 2020-01-09 10:00:32 -08:00
Behdad Esfahbod 6a60ca117c [algs] Fold last other bsearch() in
Now truly have only one bsearch implementation.
2019-12-10 12:32:59 -06:00
Ebrahim Byagowi 486754a888 [serialize] Extract iterable copy, copy_all 2019-10-31 13:31:11 -07:00
Khaled Hosny dd288840d6 [cmap] Check GID before adding ranges in format 4 & 12
Fixes https://github.com/harfbuzz/harfbuzz/issues/2031
2019-10-29 02:09:13 +02:00
Behdad Esfahbod 03028a5fe5 Revert "Don't include codepoint 0 in the results of collect_unicodes."
This reverts commit 14ad96ffbf.

This was wrong.  My bad!

https://github.com/harfbuzz/harfbuzz/issues/2031
2019-10-28 13:46:56 -07:00
Garret Rieger 14ad96ffbf Don't include codepoint 0 in the results of collect_unicodes.
It is always assumed to be the notdef glyph.
2019-10-28 12:56:04 -07:00
Ebrahim Byagowi 0558413f27 Minor, tweak spaces 2019-10-01 13:50:11 +03:30
Ebrahim Byagowi 035ec3d1b4
[cmap] remove has_format14, minor format
fixes #1986
2019-09-23 20:51:43 +03:30
Ebrahim Byagowi 385741d565 [cmap] Turn hb_apply into foreach where possible 2019-09-21 15:33:02 +04:30
Ebrahim Byagowi 1023c2cc6d [cmap] minor 2019-09-21 15:33:02 +04:30
Ebrahim Byagowi ead46eefe3 minor, use internal API instead public hb_set_has 2019-09-21 15:33:02 +04:30
Ebrahim Byagowi d8af4e7701 [cmap] minor, turn 8 spaces to tab 2019-09-21 15:33:02 +04:30
Qunxin Liu 4315666283 [subset] updates according to review comments 2019-09-20 07:55:11 +09:00
Qunxin Liu 2583afa0eb [subset] subsetting cmap14 2019-09-20 07:55:11 +09:00