[repacker] update the repacker doc to reflect the current state.

This commit is contained in:
Garret Rieger 2022-09-23 20:06:57 +00:00 committed by Behdad Esfahbod
parent 8cd7d1c3fe
commit d5829b3ce2
1 changed files with 63 additions and 34 deletions

View File

@ -54,6 +54,12 @@ There's four key pieces to the harfbuzz approach:
to calculate the final position of each subtable and then check if any offsets to it will
overflow.
* Content Aware Preprocessing: if the overflow resolver is aware of the format of the underlying
tables (eg. GSUB, GPOS) then in some cases preprocessing can be done to increase the chance of
successfully packing the graph. For example for GSUB and GPOS we can preprocess the graph and
promote lookups to extension lookups (upgrades a 16 bit offset to 32 bits) or split large lookup
subtables into two or more pieces.
* Offset resolution strategies: given a particular occurrence of an overflow these strategies
modify the graph to attempt to resolve the overflow.
@ -64,6 +70,7 @@ def repack(graph):
graph.topological_sort()
if (graph.will_overflow())
preprocess(graph)
assign_spaces(graph)
graph.topological_sort()
@ -185,6 +192,37 @@ The assign_spaces() step in the high level algorithm is responsible for identify
subgraphs and assigning unique spaces to each one. More information on the space assignment can be
found in the next section.
# Graph Preprocessing
For certain table types we can preprocess and modify the graph structure to reduce the occurences
of overflows. Currently the repacker implements preprocessing only for GPOS and GSUB tables.
## GSUB/GPOS Table Splitting
The GSUB/GPOS preprocessor scans each lookup subtable and determines if the subtable's children are
so large that no overflow resolution is possible (for example a single subtable that exceeds 65kb
cannot be pointed over). When such cases are detected table splitting is invoked:
* The subtable is first analyzed to determine the smallest number of split points that will allow
for successful offset overflow resolution.
* Then the subtable in the graph representation is modified to actually perform the split at the
previously computed split points. At a high level splits are done by inserting new subtables
which contain a subset of the data of the original subtable and then shrinking the original subtable.
Table splitting must be aware of the underlying format of each subtable type and thus needs custom
code for each subtable type. Currently subtable splitting is only supported for GPOS subtable types.
## GSUB/GPOS Extension Lookup Promotion
In GSUB/GPOS tables lookups can be regular lookups which use 16 bit offsets to the children subtables
or extension lookups which use 32 bit offsets to the children subtables. If the sub graph of all
regular lookups is too large then it can be difficult to find an overflow free configuration. This
can be remedied by promoting one or more regular lookups to extension lookups.
During preprocessing the graph is scanned to determine the size of the subgraph of regular lookups.
If the graph is found to be too big then the analysis finds a set of lookups to promote to reduce
the subgraph size. Lastly the graph is modified to convert those lookups to extension lookups.
# Offset Resolution Strategies
@ -237,29 +275,20 @@ The harfbuzz repacker has tests defined using generic graphs: https://github.com
# Future Improvements
The above resolution strategies are not sufficient to resolve all overflows. For example consider
the case where a single subtable is 65k and the graph structure requires an offset to point over it.
Currently for GPOS tables the repacker implementation is sufficient to handle both subsetting and the
general case of font compilation repacking. However for GSUB the repacker is only sufficient for
subsetting related overflows. To enable general case repacking of GSUB, support for splitting of
GSUB subtables will need to be added. Other table types such as COLRv1 shouldn't require table
splitting due to the wide use of 24 bit offsets throughout the table.
The current harfbuzz implementation is suitable for the vast majority of subsetting related overflows.
Subsetting related overflows are typically easy to solve since all subsets are derived from a font
that was originally overflow free. A more general purpose version of the algorithm suitable for font
creation purposes will likely need some additional offset resolution strategies:
Beyond subtable splitting there are a couple of "nice to have" improvements, but these are not required
to support the general case:
* Extension demotion: currently extension promotion is supported but in some cases if the non-extension
subgraph is underfilled then packed size can be reduced by demoting extension lookups back to regular
lookups.
* Currently only children nodes are moved to resolve offsets. However, in many cases moving a parent
node closer to it's children will have less impact on the size of other offsets. Thus the algorithm
should use a heuristic (based on parent and child subtable sizes) to decide if the children's
priority should be increased or the parent's priority decreased.
* Many subtables can be split into two smaller subtables without impacting the overall functionality.
This should be done when an overflow is the result of a very large table which can't be moved
to avoid offsets pointing over it.
* Lookup subtables in GSUB/GPOS can be upgraded to extension lookups which uses a 32 bit offset.
Overflows from a Lookup subtable to it's child should be resolved by converting to an extension
lookup.
Once additional resolution strategies are added to the algorithm it's likely that we'll need to
switch to using a [backtracking algorithm](https://en.wikipedia.org/wiki/Backtracking) to explore
the various combinations of resolution strategies until a non-overflowing combination is found. This
will require the ability to restore the graph to an earlier state. It's likely that using a stack
of undoable resolution commands could be used to accomplish this.