As written at:
https://lists.freedesktop.org/archives/fontconfig/2017-June/005929.html
I think FcCharSetFreezeOrig() and FcCharSetFindFrozen() should use the %
operator instead of & when computing the bucket index for
freezer->orig_hash_table, otherwise at most 8 buckets among the 67
available (FC_CHAR_SET_HASH_SIZE) are used.
Another way would be to change FC_CHAR_SET_HASH_SIZE to be of the form
2**n -1 (i.e., a power of two minus one). In such a case, the & and %
operators would be equivalent.
Before this commit, FcCharSetHash() repeatedly used the address of the
'numbers' array of an FcCharSet to compute the FcCharSet hash, instead
of the value of each array element. This is not good for even spreading
of the FcCharSet objects among the various buckets of the hash table
(and should thus reduce performance). This bug appears to have been
mistakenly introduced in commit
cd2ec1a940 (June 2005).
Previous format was unusable. New format is ranges of hex values.
To choose space character and Latin capital letters for example:
$ fc-pattern ':charset=20 41-5a'
Pattern has 1 elts (size 16)
charset:
0000: 00000000 00000001 07fffffe 00000000 00000000 00000000 00000000 00000000
(s)
FcTypeVoid is likely to happen when 'lang' and 'charset'
is deleted by 'delete' or 'delete_all' mode in edit.
Without this change, any modification on them are simply
ignored.
This is useful to make a lot of changes, particularly
when one wants to add a few and delete a lot say.
Last night in between my dreams I also noticed that we support Unicode
values up to 0x01000000 and not 0x00100000 which I thought before.
This covers the entire Unicode range.
To only work on writable charsets. Also, return a bool indicating whether
the merge changed the charset.
Also changes the implementation of FcCharSetMerge and FcCharSetIsSubset
The current behaviour of FcSortWalk() is to create a new FcCharSet on
each iteration that is the union of the previous iteration with the next
FcCharSet in the font set. This causes the existing FcCharSet to be
reproduced in its entirety and then allocates fresh leaves for the new
FcCharSet. In essence the number of allocations is quadratic wrt the
number of fonts required.
By introducing a new method for merging a new FcCharSet with an existing
one we can change the behaviour to be effectively linear with the number
of fonts - allocating no more leaves than necessary to cover all the
fonts in the set.
For example, profiling 'gedit UTF-8-demo.txt'
Allocator nAllocs nBytes
Before:
FcCharSetFindLeafCreate 62886 2012352
FcCharSetPutLeaf 9361 11441108
After:
FcCharSetFindLeafCreate 1940 62080
FcCharSetPutLeaf 281 190336
The savings are even more significant for applications like firefox-3.0b5
which need to switch between large number of fonts.
Before:
FcCharSetFindLeafCreate 4461192 142758144
FcCharSetPutLeaf 1124536 451574172
After:
FcCharSetFindLeafCreate 80359 2571488
FcCharSetPutLeaf 18940 9720522
Out of interest, the next most frequent allocations are
FcPatternObjectAddWithBinding 526029 10520580
tt_face_load_eblc 42103 2529892
Charset hashing actually use the value of the leaf pointers, which is
clearly wrong, especially now that charsets are not shared across multiple
font directories.
Using a simple shell script that processes the public headers, two header
files are constructed that map public symbols to hidden internal aliases
avoiding the assocated PLT entry for referring to a public symbol.
A few mistakes in the FcPrivate/FcPublic annotations were also discovered
through this process
Caches contain patterns and character sets which are reference counted and
visible to applications. Reference count the underlying cache object so that
it stays around until all reference objects are no longer in use.
This is less efficient than just leaving all caches around forever, but does
avoid eternal size increases in case applications ever bother to actually
look for changes in the font configuration.
Borrowing header stuff written for cairo, fontconfig now exposes in the
shared library only the symbols which are included in the public header
files. All private symbols are hidden using suitable compiler directives.
A few new public functions were required for the fontconfig utility programs
(fc-cat and fc-cache) so those were added, bumping the .so minor version number
in the process.
Charset freezer api now uses allocated object. Also required minor fixes to
charset freezer code to remove assumption that all input charsets are
persistant.
Instead of passing directory information around in separate variables,
collect it all in an FcCache structure. Numerous internal and tool
interfaces changed as a result of this.
Charsets are now pre-frozen before being serialized. This causes them to
share across multiple fonts in the same cache.
Validate cache contents and skip broken caches, looking down cache path for
valid ones.
Every time a directory is scanned, it will be written to a cache file if
possible, so fc-cache doesn't need to re-write the cache file. This makes
detecting when the cache was generated a bit tricky, so we guess that if the
cache wasn't valid before running and is valid afterwards, the cache file
was written.
Also, allow empty charsets to be serialized with null leaves/numbers.
Eliminate a leak in FcEdit by switching to FcObject sooner.
Call FcFini from fc-match to make valgrind happy.
FcCharSetSerialize was computing the offset to the unserialized leaf,
which left it pointing at random data when the cache was reloaded.
fc-cat has been updated to work with the new cache structure.
Various debug messages extended to help diagnose serialization errors.
Replace all of the bank/id pairs with simple offsets, recode several
data structures to always use offsets inside the library to avoid
conditional paths. Exposed data structures use pointers to hold offsets,
setting the low bit to distinguish between offset and pointer.
Use offset-based data structures for lang charset encodings; eliminates
separate data structure format for that file.
Much testing will be needed; offsets are likely not detected everywhere in
the library yet.
Fix missing FcCacheBankToIndex in FcCharSetInsertLeaf. Declare extern for
static arrays as arrays, not pointers. (Part of the fix for 'fonts
don't have en' issue after Euro patch.)
(I forgot to commit the ChangeLog last time.)
reviewed by: plam
added by the new ALIGN macro. Fix alignment problems on ia64 and s390
by bumping up block_ptr appropriately. (Earlier version by Andreas
Schwab).
Use sysconf to determine proper PAGESIZE value; this appears to be
POSIX-compliant. (reported by Andreas Schwab)
reviewed by: plam
and distribute bytes for each directory from a single malloc for that
directory. Store pointers as differences between the data pointed to
and the pointer's address (s_off = s - v). Don't serialize data
structures that never actually get serialized. Separate strings used
for keys from strings used for values (in FcPatternElt and FcValue,
respectively). Bump FC_CACHE_VERSION to 2.
cache. Add *Read and *Write procedures which mmap in and write out the
fontconfig data structures to disk. Currently, create cache in /tmp,
with different sections for each architecture (as returned by uname's
.machine field. Run the fc-cache binary to create a new cache file;
fontconfig then uses this cache file on subsequent runs, saving lots of
memory. Also fixes a few bugs and leaks.