Basically we trust the font cmap table now...
New numbers:
behdad:src 0$ time fc-scan ~/fonts/ > after-noloca
real 0m55.788s
user 0m15.836s
sys 0m17.008s
behdad:src 0$
behdad:src 0$ time fc-scan ~/fonts/ > after-noloca
real 0m24.794s
user 0m12.164s
sys 0m12.420s
Before this change it was:
behdad:src 130$ time fc-scan ~/fonts/ > after
real 0m24.825s
user 0m12.408s
sys 0m11.356s
Not any faster! I suppose most time is being spent in loading cmap and advances now.
I'll see about loading hmtx ourselves.
With I/O numbers. Before:
behdad:src 0$ \time fc-scan ~/fonts/ > after
11.66user 12.17system 0:24.03elapsed 99%CPU (0avgtext+0avgdata 487684maxresident)k
2320inputs+50480outputs (21major+11468549minor)pagefaults 0swaps
after:
behdad:src 130$ \time fc-scan ~/fonts/ > after-noloca
11.94user 11.99system 0:24.11elapsed 99%CPU (0avgtext+0avgdata 487704maxresident)k
16inputs+50688outputs (0major+11464386minor)pagefaults 0swaps
We are definitely doing a lot less I/O. Surprisingly less in fact. I don't get it.
Part of https://bugs.freedesktop.org/show_bug.cgi?id=64766#c47
This is the approach introduced in
https://bugs.freedesktop.org/show_bug.cgi?id=64766#c30
Testing it with 11GB worth of stuff, before/after:
behdad:src 130$ time fc-scan ~/fonts/ > before
real 2m18.428s
user 1m17.008s
sys 0m20.576s
behdad:src 0$ time fc-scan ~/fonts/ > after
real 1m12.130s
user 0m18.180s
sys 0m19.952s
Running the after case a second time is significantly faster:
behdad:src 130$ time fc-scan ~/fonts/ > after
real 0m24.825s
user 0m12.408s
sys 0m11.356s
Next I'm going to try to not even read loca...
As written at:
https://lists.freedesktop.org/archives/fontconfig/2017-June/005929.html
I think FcCharSetFreezeOrig() and FcCharSetFindFrozen() should use the %
operator instead of & when computing the bucket index for
freezer->orig_hash_table, otherwise at most 8 buckets among the 67
available (FC_CHAR_SET_HASH_SIZE) are used.
Another way would be to change FC_CHAR_SET_HASH_SIZE to be of the form
2**n -1 (i.e., a power of two minus one). In such a case, the & and %
operators would be equivalent.
In fcLangCountrySets, it may happen that two charsets for the same
language but different territories are found in different FcChar32
"buckets" (different "columns" on the same line). This is currently the
case for the following pairs:
mn-cn and mn-mn
pap-an and pap-aw
The FcLangSetCompare() code so far used to return FcLangDifferentLang
instead of FcLangDifferentTerritory when comparing:
an FcLangSet containing only mn-cn with one containing only mn-mn
or
an FcLangSet containing only pap-an with one containing only pap-aw
This commit fixes this problem.
FcLangSetIndex() indicates "not found" with a non-negative return value.
Return value 0 doesn't imply "not found", it rather means "language
found at index 0 in fcLangCharSets".
This commit fixes a bug that can be reproduced like this:
- remove all languages starting with 'a' in fc-lang/Makefile.am (in
ORTH's definition);
- rebuild fontconfig with this change (-> new fc-lang/fclang.h);
- create an FcLangSet 'ls1' that contains at least the first language
from fcLangCharSets (i.e., the first *remaining* in lexicographic
order); let's assume it is "ba" for the sake of this description;
- create an FcLangSet 'ls2' that only contains the language "aa" (any
language starting with 'a' should work as well);
- check the return value of FcLangSetContains(ls1, ls2);
The expected return value is FcFalse, however it is FcTrue if you use
the code before this commit.
What happens is that FcLangSetIndex() returns 0, because this is the
index of the first slot after the not-found language "aa" in
fcLangCharSets (since we removed all languages starting with 'a').
However, this index happens to be non-negative, therefore
FcLangSetContainsLang() mistakenly infers that the language "aa" was
found in fcLangCharSets, and thus calls FcLangSetBitGet(ls1, 0), which
returns FcTrue since we've put the first remaining language "ba" in the
'ls1' language set.
The "return -low;" statement previously in FcLangSetIndex() was
inconsistent with the final return statement. "return -(low+1);" fixes
this inconsistency as well as the incorrect behavior described above.
Before this commit, FcCharSetHash() repeatedly used the address of the
'numbers' array of an FcCharSet to compute the FcCharSet hash, instead
of the value of each array element. This is not good for even spreading
of the FcCharSet objects among the various buckets of the hash table
(and should thus reduce performance). This bug appears to have been
mistakenly introduced in commit
cd2ec1a940 (June 2005).
On Windows, opened or locked files cannot be removed.
Since fontconfig locked an old cache file while updating the file,
fontconfig failed to replace the file with updated file on Windows.
This patch makes fontconfig does not lock the old cache file
while updating it on Windows.
To support the one of changes in gperf 3.1:
* The 'len' parameter of the hash function and of the lookup function is now
of type 'size_t' instead of 'unsigned int'. This makes it safe to call these
functions with strings of length > 4 GB, on 64-bit machines.
Validation fails when the FcValueList contains more than font->num.
this logic was wrong because font->num contains a number of the elements
in FcPatternElt but FcValue in FcValueList.
This corrects 7a4a5bd7.
Patch from Tobias Stoeckmann
The cache files are insufficiently validated. Even though the magic
number at the beginning of the file as well as time stamps are checked,
it is not verified if contained offsets are in legal ranges or are
even pointers.
The lack of validation allows an attacker to trigger arbitrary free()
calls, which in turn allows double free attacks and therefore arbitrary
code execution. Due to the conversion from offsets into pointers through
macros, this even allows to circumvent ASLR protections.
This attack vector allows privilege escalation when used with setuid
binaries like fbterm. A user can create ~/.fonts or any other
system-defined user-private font directory, run fc-cache and adjust
cache files in ~/.cache/fontconfig. The execution of setuid binaries will
scan these files and therefore are prone to attacks.
If it's not about code execution, an endless loop can be created by
letting linked lists become circular linked lists.
This patch verifies that:
- The file is not larger than the maximum addressable space, which
basically only affects 32 bit systems. This allows out of boundary
access into unallocated memory.
- Offsets are always positive or zero
- Offsets do not point outside file boundaries
- No pointers are allowed in cache files, every "pointer or offset"
field must be an offset or NULL
- Iterating linked lists must not take longer than the amount of elements
specified. A violation of this rule can break a possible endless loop.
If one or more of these points are violated, the cache is recreated.
This is current behaviour.
Even though this patch fixes many issues, the use of mmap() shall be
forbidden in setuid binaries. It is impossible to guarantee with these
checks that a malicious user does not change cache files after
verification. This should be handled in a different patch.
Signed-off-by: Tobias Stoeckmann <tobias@stoeckmann.org>
Take a look at the nano second in the mtime to figure out
if the cache needs to be updated if available.
and do the mutex lock between scanning and writing a cache
to avoid the conflict.
Also we don't need to scan directories again after writing
caches. so getting rid of the related code as well.
https://bugs.freedesktop.org/show_bug.cgi?id=69845
and for reference:
https://bugzilla.redhat.com/show_bug.cgi?id=1236034
In 32ac7c75e8 the behavior of
FcConfigAppFontAddFile/Dir() were changed to return false
if not fonts were found. While this is welldefined and useful
for AddFile(), it's quite problematic for AddDir(). For example,
if the directory is empty, is that a failure or success? Worse,
the false value from AddDir() was being propagated all the way
to FcInit() returning false now. This only happened upon memory
allocation failure before, and some clients assert that FcInit()
is successful.
With this change, AddDir() is reverted back to what it was.
AddFont() change (which was actually in fcdir.c) from the original
commit is left in.
just setting FC_MATCH=3 shows a lot of information and hard to keep on track for informamtion
which is really necessary to see. to use this more effectively, added FC_DBG_MATCH_FILTER to
see for what one really want to see. it takes a comma-separated-list of object names.
If you want to see family name only, try like this:
FC_DBG_MATCH_FILTER=family FC_DEBUG=4096 fc-match
debugging output will be filtered out and see family only in the result.
Continue to increase the object id even after FcFini()
and detect the overflow. that would be rather easier than
reset the object id with the complicated mutex and atomic
functions.
This situation would be quite unlikely to happen though
Adds FC_SYMBOL.
This affects fonts having a cmap with platform 3 encoding 0.
We now map their glyphs from the PUA area to the Latin1 area.
See thread "Webdings and other MS symbol fonts don't display"
on the mailing list.
Test before/after with:
$ pango-view --markup --text='<span fallback="false">×</span>' --font=Wingdings
Paths starting with '/' don't make sense on W32 as-is,
prepend the installation root directory to them.
This allows the cache to be contained within a particular
fontconfig installation (as long as the default
--with-cache-dir= is overriden at configure time).
Prior to the change of 32ac7c75e8
FcConfigAppFontAddFile() always returned FcTrue no matter what
fonts was added. after that, it always returned FcFalse because
adding a font doesn't add any subdirs with FcFileScanConfig().
so changing that to simply ignore it.
Also fixing it to return FcFalse if non-fonts was added, i.e.
FcFreeTypeQuery() fails.
https://bugs.freedesktop.org/show_bug.cgi?id=89617
After the change of d6a5cc665a
we have a lot of code points in FcBlanks. doing the linear search
on the array isn't comfortable anymore.
So re-implementing FcBlanksIsMember() to use the binary search.
Figuring out how much improved after this change depends on
how many fonts proceed with fc-cache say though, it's about 20 times
faster here on testing. which sounds good enough for
improvement.
Assuming that d_name is the last member of struct dirent.
In POSIX, the maximum length of d_name is defined as NAME_MAX
or FILENAME_MAX though, that assumption may be wrong on some
platforms where defines d_name as the flexible array member
and allocate the minimum memory to store d_name.
Patch from Raimund Steger
A while back we removed Apple Roman encoding support. This broke
symbol fonts (Wingdings, etc) because those fonts come with two
cmaps:
1) platform=1,encoding=0, aka Apple Roman, which maps identity,
2) platform=3,encoding=0, aka MS Symbol font
Now, the reason the Apple Roman removal "broke" these fonts is
obvious, and for the better: these fonts were mapping ASCII and
other Latin chars to symbols.
The reason the fonts didn't work anymore, however, is that we were
mishandling the MS symbol-font cmaps. In their modern incarnation
they are like regular non-symbol-font cmap that map PUA codepoints
to symbols. We want to expose those as such. Hence, this change
just removes the special-handling for that.
Now, the reason this confusion happened, if I was to guess, is either
that FreeType docs are wrong saying that FT_ENCODING_MS_SYMBOL is
the "Microsoft Symbol encoding, used to encode mathematical symbols":
http://www.kostis.net/charsets/symbol.htm
or maybe it started that way, but turned into also mapping MS symbol-
font cmaps, which is a completely different thing. At any rate, I
don't know if there are any fonts that use this thing these days, but
the code here didn't seem to produce charset for any font. By now I'm
convinced that this change is the Right Thing to do. The MS Symbol
thing was called AdobeSymbol in our code by the way.
This fixes the much-reported bug that windings, etc are not usable
with recent fontconfig:
https://bugs.freedesktop.org/show_bug.cgi?id=58641
Now I see PUA mappings reported for Wingdings.
This also fixes:
Bug 48947 - Drop the non-Unicode cmap support gradually
https://bugs.freedesktop.org/show_bug.cgi?id=48947
since the AdobeSymbol was the last non-Unicode cmap we were
trying to parse (very incorrectly).
Lots of code around this change can be simplified. I'll push those
out (including removing the table itself) in subsequent changes.
All color fonts are designed to be scaled, even if they only have
bitmap strikes. Client is responsible to scale the bitmaps. This
is in constrast to non-color strikes...
Clients can still use FC_OUTLINE to distinguish bitmap vs outline
fonts. Previously FC_OUTLINE and FC_SCALABLE always had the same
value. Now FC_SCALABLE is set to (FC_OUTLINE || FC_COLOR).
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=87122
This reverts commit a5a384c5ff.
I don't remember what I had in mind for "We will use this property later.", but
the change was wrong. If a font pattern doesn't have any value for element,
it must be interpretted as "it matches any value perfectly. And "perfectly"
must have a score of 0 for that to happen.
This was actually affecting bitmap fonts (in a bad way), as the change made
an outline font to always be preferred over a (otherwise equal) bitmap font,
even for the exact size of the bitmap font. That probably was never noticed
by anyone, but with the font range support this has become clear (and worked
around by Akira). To clean that up, I'm reverting this so I can land the
rest of patches for bug 80873.
https://bugs.freedesktop.org/show_bug.cgi?id=80873#c10
Previously, if the patten didn't request, eg, style, then the style
and stylelang were fully copied from the font, even though the pattern
had a stylelang. Eg:
$ fc-match 'Apple Color Emoji:stylelang=en'
Apple Color Emoji.ttf: "Apple Color Emoji" "標準體"
This change both fixes that and makes the code much more readable. Now:
$ fc-match 'Apple Color Emoji:stylelang=en'
Apple Color Emoji.ttf: "Apple Color Emoji" "Regular"
iconv support was turned off by default in f30a5d76.
Some fonts, like Apple Color Emoji, only have their English
name in a MacRoman entry. As such, decode MacRoman ourselves.
Previous format was unusable. New format is ranges of hex values.
To choose space character and Latin capital letters for example:
$ fc-pattern ':charset=20 41-5a'
Pattern has 1 elts (size 16)
charset:
0000: 00000000 00000001 07fffffe 00000000 00000000 00000000 00000000 00000000
(s)
It was added without proper measurement and a fuzzy possible
use-case (font servers) in mind, but reality check shows that
this significantly slows down caching. As such, deprecate it
and do NOT compute hash during caching.
Makes caching two to three times faster (ignoring the 2 second
delay in fc-cache).
This is more robust but introduces a small change in behavior:
For .pcf.gz fonts, the new code calculates the hash of the uncompressed
font data whereas the original code was calculating the hash of the
compressed data.
No big deal IMO.
FcTypeVoid is likely to happen when 'lang' and 'charset'
is deleted by 'delete' or 'delete_all' mode in edit.
Without this change, any modification on them are simply
ignored.
This is useful to make a lot of changes, particularly
when one wants to add a few and delete a lot say.
This feature requires the FreeType 2.5.1 or later at the build time.
Besides <range> element allows <double> elements with this changes.
This may breaks the cache but not bumping in this change sets at this moment.
please be aware if you want to try it and run fc-cache before/after to
avoid the weird thing against it.
Let me show it with an example.
Currently:
$ fc-match symbol
symbol.ttf: "Symbol" "Regular"
$ fc-match symbol --sort | head -n 1
Symbol.pfb: "Symbol" "Regular"
$ fc-match symbol --sort --all | head -n 1
symbol.ttf: "Symbol" "Regular"
I want to make sure the above three commands all return the same font.
Ie. I want to make sure FcFontMatch() always returns the first font
from FcFontSort(). As such, never trim first font.
Reported by parfait 1.3:
Error: Null pointer dereference (CWE 476)
Read from null pointer t
at line 423 of src/fcname.c in function 'FcNameParse'.
Function _FcObjectLookupOtherTypeByName may return constant 'NULL'
at line 63, called at line 122 of src/fcobjs.c in function
'FcObjectLookupOtherTypeByName'.
Function FcObjectLookupOtherTypeByName may return constant 'NULL'
at line 122, called at line 67 of src/fcname.c in function
'FcNameGetObjectType'.
Function FcNameGetObjectType may return constant 'NULL' at line 67,
called at line 422 in function 'FcNameParse'.
Null pointer introduced at line 63 of src/fcobjs.c in function
'_FcObjectLookupOtherTypeByName'.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reported by parfait 1.3:
Memory leak of pointer sset allocated with FcStrSetCreate()
at line 933 of src/fcstr.c in function 'FcStrBuildFilename'.
sset allocated at line 927 with FcStrSetCreate().
sset leaks when sset != NULL at line 932.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
config.h is read from fcint.h now so having a line of the sort of #include "config.h"
is duplicate.
Bug 69833 - Incorrect SIZEOF_VOID_P and ALIGNOF_DOUBLE definitions causes nasty warnings on MacOSX when building fat libraries
This change reverts 9acc14c34a
because it doesn't work as expected when building
with -fshort-enums which is default for older arms ABIs
Thanks for pointing this out, Thomas Klausner, Valery Ushakov, and Martin Husemann
After this change, the following works as expected:
$ FC_DEBUG=4 fc-match ":family=foo bar, sans-serif"
...
FcConfigSubstitute Pattern has 3 elts (size 16)
family: "foo bar"(s) "sans-serif"(s)
...
Workaround to not failing even when the hash is unable to generate from fonts.
This change also contains to ignore the case if the hash isn't in either both
patterns.
Regex is expensive to compare filenames. we already have the glob matching
and it works enough in this case.
Prior to this change, renaming FcConfigGlobMatch() to FcStrGlobMatch() and moving to fcstr.c
Add back FcHashGetSHA256DigestFromFile() and fall back to it
when font isn't SFNT-based font because FT_Load_Sfnt_Table
fails with FT_Err_Invalid_Face_Handle.
As of automake-13.1 the INCLUDES directive is no longer supported.
An automake run will return with an error.
This changeset simply follows automake's advice to replace INCLUDES
by AM_CPPFLAGS.
Add an ability to set the system root to generate the caches.
In order to do this, new APIs, FcConfigGetSysRoot() and
FcConfigSetSysRoot() is available.
Maps fonts produced by the Culmus project <http://culmus.sourceforge.net>
to the XLFD foundry name culmus.
For TrueType fonts, maps the vendor code CLM from the TrueType vendor id field.
For Type1 fonts, which use heuristics to guess mappings to XLFD foundries from
words in the copyright notice, add the names of the main contributors to
the Culmus product to recognize the fonts under their copyright.
Patch from Maxim Iorsh
Add two edit mode, "delete" and "delete_all".
what values are being deleted depends on <test> as documented.
if the target object is same to what is tested, matching value there
will be deleted. otherwise all of values in the object will be deleted.
so this would means both edit mode will not take any expressions.
e.g.
Given that the testing is always true here, the following rules:
<match>
<test name="foo" compare="eq">
<string>bar</string>
</test>
<edit name="foo" mode="delete"/>
</match>
will removes "bar" string from "foo" object. and:
<match>
<test name="foo" compare="eq">
<string>foo</string>
</test>
<edit name="bar" mode="delete"/>
</match>
will removes all of values in "bar" object.
This changes allows to have multiple mathcing rules in one <match> block
in the same order.
After this changes, the following thing will works as two matching rules:
<match>
<!-- rule 1 -->
<test name="family" compare="eq">
<string>foo</string>
</test>
<edit name="foo" mode="append">
<string>foo</string>
</edit>
<!-- rule 2 -->
<test name="foo" compare="eq">
<string>foo</string>
</test>
<edit name="foo" mode="append">
<string>bar</string>
</edit>
</match>
In FcStrListCreate() we were increasing reference count of set,
however, if set had a const reference (which is the case for list
of languages), and with multiple threads, the const ref (-1) was
getting up to 1 and then a decrease was destroying the set. Ouch.
Here's the valgrind error, which took me quite a few hours of
running to catch:
==4464== Invalid read of size 4
==4464== at 0x4E58FF3: FcStrListNext (fcstr.c:1256)
==4464== by 0x4E3F11D: FcConfigSubstituteWithPat (fccfg.c:1508)
==4464== by 0x4E3F8F4: FcConfigSubstitute (fccfg.c:1729)
==4464== by 0x4009FA: test_match (simple-pthread-test.c:53)
==4464== by 0x400A6E: run_test_in_thread (simple-pthread-test.c:68)
==4464== by 0x507EE99: start_thread (pthread_create.c:308)
==4464== Address 0x6bc0b44 is 4 bytes inside a block of size 24 free'd
==4464== at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4464== by 0x4E58F84: FcStrSetDestroy (fcstr.c:1236)
==4464== by 0x4E3F0C6: FcConfigSubstituteWithPat (fccfg.c:1507)
==4464== by 0x4E3F8F4: FcConfigSubstitute (fccfg.c:1729)
==4464== by 0x4009FA: test_match (simple-pthread-test.c:53)
==4464== by 0x400A6E: run_test_in_thread (simple-pthread-test.c:68)
==4464== by 0x507EE99: start_thread (pthread_create.c:308)
Thread test is running happily now. Will add the test in a moment.