We've already calculated the lengths of these strings, so re-use those
values to avoid having to rescan the strings multiple times.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
I broke FcFontSort() language handling at the end of 2008 with this
commit: c7641f28
G-d knows how many of the lang-matching bugs in bugzilla will be
fixed by this changed...
I'm really sorry, everyone!
- Do not throw away FC_FILE in FcNameUnparse
- Update the builtin "fclist" format to remove FC_FILE properly instead
- Switch fc-list to use FcPatternFormat()
Note that I had previously broken fc-list and it was not showing the
file name anymore. No one noticed that it seems! Now fixed.
Do not remove blacklisted fonts during cache generation. We already
apply the blacklist when reading the caches. The idea always has been
that the config should not affect caches built, although that design
was tarnished with the introduction of target="scan" configurations.
The syntax to add any characters to the charset table looks like:
<match target="scan">
<test name="family">
<string>Buggy Sans</string>
</test>
<edit name="charset" mode="assign">
<plus>
<name>charset</name>
<charset>
<int>0x3220</int> <!-- PARENTHESIZED IDEOGRAPH ONE -->
</charset>
</plus>
</edit>
</match>
To remove any characters from the charset table:
<match target="scan">
<test name="family">
<string>Buggy Sans</string>
</test>
<edit name="charset" mode="assign">
<minus>
<name>charset</name>
<charset>
<int>0x06CC</int> <!-- ARABIC LETTER FARSI YEH -->
<int>0x06D2</int> <!-- ARABIC LETTER YEH BARREE -->
<int>0x06D3</int> <!-- ARABIC LETTER YEH BARREE WITH HAMZA ABOVE -->
</charset>
</minus>
</edit>
</match>
You could also use the range element for convenience:
...
<charset>
<int>0x06CC</int> <!-- ARABIC LETTER FARSI YEH -->
<range>
<int>0x06D2</int> <!-- ARABIC LETTER YEH BARREE -->
<int>0x06D3</int> <!-- ARABIC LETTER YEH BARREE WITH HAMZA ABOVE -->
</range>
</charset>
...
The OT spec says:
"When building a Unicode font for Windows, the platform ID should be 3 and the
encoding ID should be 1. When building a symbol font for Windows, the platform
ID should be 3 and the encoding ID should be 0."
We were ignoring the SYMBOL_CS entry before. It's UTF-16/UCS-2 like the
UNICODE_CS.
Also, always use UTF-16BE instead of UCS-2BE. The conversion was doing
UTF-16BE anyway.
Last night in between my dreams I also noticed that we support Unicode
values up to 0x01000000 and not 0x00100000 which I thought before.
This covers the entire Unicode range.
Protect cache against future expansions of FcLangSet (adding new
orth files). Previously, doing so could change the size of
that struct. Indeed, that happened between 2.6.0 and 2.7.3, causing
crashes. Unfortunately, sizeof(FcLangSet) was not checked in fcarch.c.
This changes FcLangSet code to be able to cope with struct size changes.
And change cache format, hence bumping from 2 to 3.
Before a NULL config was passed down adn essentially FcFileScan was
equivalent to FcFreeTypeQuery. Now fc-scan tool correctly applies
the configuration to the scanned patterns.
The East Asian double-byte codepages have characters with backslash as
the second byte, so we must use _mbsrchr() instead of strrchr() when
looking at pathnames in the system codepage.
Must not call FcStrFree() on a value returned by
FcStrBufDoneStatic(). In the Windows code don't bother with dynamic
allocation, just use a local buffer.
Fontconfig assigns an index number to each language it knows about.
The index is used to index a bit in FcLangSet language map. The bit
map is stored in the cache.
Previously fc-lang simply sorted the list of languages and assigned
them an index starting from zero. Net effect is that whenever new
orth files were added, all the FcLangSet info in the cache files would
become invalid. This was causing weird bugs like this one:
https://bugzilla.redhat.com/show_bug.cgi?id=490888
With this commit we fix the index assigned to each language. The index
will be based on the order the orth files are passed to fc-lang. As a
result all orth files are explicitly listed in Makefile.am now, and
new additions should be made to the end of the list. The list is made
to reflect the sorted list of orthographies from 2.6.0 released followed
by new additions since.
This fixes the stability problem. Needless to say, recreating caches
is necessary before any new orthography is recognized in existing fonts,
but at least the existing caches are still valid and don't cause bugs
like the above.
The format '%{[]family,familylang{expr}}' expands expr once for the first
value of family and familylang, then for the second, etc, until both lists
are exhausted.
The '%{=unparse}' format expands to the FcNameUnparse() result on the
pattern. Need to add '%{=verbose}' for FcPatternPrint() output but
need to change that function to output to a string first.
Also added the '%{=fclist}' and '%{=fcmatch}' which format like the
default format of fc-list and fc-match respectively.
The format '%{family|delete( )}' expands to family values with space removed.
The format '%{family|translate( ,-)}' expands to family values with space
replaced by dash. Multiple chars are supported, like tr(1).
The format '%{family|escape(\\ )}' expands to family values with space
escaped using backslash.
The format '%{family|downcase}' for example prints the lowercase of
the family element. Three converters are defined right now:
'downcase', 'basename', and 'dirname'.
The conditional '%{?elt1,elt2,!elt3{expr1}{expr2}}' will evaluate
expr1 if elt1 and elt2 exist in pattern and elt3 doesn't exist, and
expr2 otherwise. The '{expr2}' part is optional.
The filtering, '%{+elt1,elt2,elt3{subexpr}}' will evaluate subexpr
with a pattern only having the listed elements from the surrounding
pattern.
The deletion, '%{-elt1,elt2,elt3{subexpr}}' will evaluate subexpr
with a the surrounding pattern sans the listed elements.
Diego Santa Cruz pointed out that we are using that API wrongly.
The forth argument is a pointer to a pointer. Turns out we don't
need that arugment and it accepts NULL, so just pass that.
To only work on writable charsets. Also, return a bool indicating whether
the merge changed the charset.
Also changes the implementation of FcCharSetMerge and FcCharSetIsSubset
Previously an index j was added to element score to prefer matches earlier
in the value list to the later ones. This index started from 0, meaning
that the score zero could be generated for the first element. By starting
j from one, scores for when the element exists in both pattern and font
can never be zero. The score zero is reserved for when the element is
NOT available in both font and pattern. We will use this property later.
This shouldn't change matching much. The only difference I can think of
is that if a font family exists both as a bitmap font and a scalable
version, and when requesting it at the size of the bitmap version,
previously the font returned was nondeterministic. Now the scalable
version will always be preferred.
Previously the matcher multiplied comparison results by 100 and added
index value to it. With long lists of families (lots of aliases),
reaching 100 is not that hard. That could result in a non-match early
in the list to be preferred over a match late in the list. Changing
the multiplier from 100 to 1000 should fix that.
To keep things relatively in order, the lang multiplier is changed
from 1000 to 10000.
Previously fc-match "xxx,nazli" matched Nazli, but "xxx, nazli" didn't.
This was because of a bug in FcCompareFamily's short-circuit check
that forgot to ignore spaces.
I can't understand why the special case is needed. Indeed, removing it
does not make any difference in the "fc-match --verbose" output, and
that's the only time fc-match uses FcPatternPrint.
Two changes:
- after mkdir(), we immediately chmod(), such that we are not affected
by stupid umask's.
- if a directory we want to use is not writable but exists, we try a
chmod on it. This is to recover from stupid umask's having affected
us with older versions.
The current behaviour of FcSortWalk() is to create a new FcCharSet on
each iteration that is the union of the previous iteration with the next
FcCharSet in the font set. This causes the existing FcCharSet to be
reproduced in its entirety and then allocates fresh leaves for the new
FcCharSet. In essence the number of allocations is quadratic wrt the
number of fonts required.
By introducing a new method for merging a new FcCharSet with an existing
one we can change the behaviour to be effectively linear with the number
of fonts - allocating no more leaves than necessary to cover all the
fonts in the set.
For example, profiling 'gedit UTF-8-demo.txt'
Allocator nAllocs nBytes
Before:
FcCharSetFindLeafCreate 62886 2012352
FcCharSetPutLeaf 9361 11441108
After:
FcCharSetFindLeafCreate 1940 62080
FcCharSetPutLeaf 281 190336
The savings are even more significant for applications like firefox-3.0b5
which need to switch between large number of fonts.
Before:
FcCharSetFindLeafCreate 4461192 142758144
FcCharSetPutLeaf 1124536 451574172
After:
FcCharSetFindLeafCreate 80359 2571488
FcCharSetPutLeaf 18940 9720522
Out of interest, the next most frequent allocations are
FcPatternObjectAddWithBinding 526029 10520580
tt_face_load_eblc 42103 2529892
Note that this also fixes a bug with FcFontList() where previously
it was NOT checking whether the config is up-to-date. May want to
keep the old behavior and document that ScanInterval is essentially
unused internally (FcFontSetList uses it, but we can remove that
too).
A private FcObjectGetSet() is implemented that provides an
FcObjectSet of all registered elements. FcFontSetList() is
then modified to use the object set from FcObjectGetSet() if
provided object-set is NULL.
Alternatively FcObjectGetSet() can be made public. In that
case fc-list can use that as a base if --verbose is included,
and also add any elements provided by the user (though that has
no effect, as all elements from the cache are already registered).
Currently fc-list ignores user-provided elements if --verbose
is specified.
The fact that we now drop final slashes from all filenames without
checking that the file name represents a directory may surprise some,
but it doesn't bother me really.
At OLPC, we came across a bug where the Browse activity (based on xulrunner)
took 100% CPU after an upgrade/. It turns out the Mozilla uses
FcConfigUptoDate() to check if new fonts have been added to the system, and
this function was always returning FcFalse since we have the mtimes of some
font directories set in the future. The attached patch makes
FcConfigUptoDate() print a warning and return FcTrue if mtime of directories
are in the future.
It seems indices in _FcMatchers array are slightly mixed up, MATCH_DECORATIVE
should be 10, not 11.
And MATCH_RASTERIZER_INDEX should be 13, not 12, right?
Libtool-2.2 introduces new restrictions. So now it does not allow LT_*
variables as it includes marcros:
m4_pattern_forbid([^_?LT_[A-Z_]+$])
Rename the LT_ variables to LIBT_ to work around this restriction.
Building 2.5.91 on Solaris with the native make(1) yields
...
Making all in src
make: Fatal error in reader: Makefile, line 313: Unexpected end of line seen
Current working directory /tmp/fontconfig-2.5.91/src
*** Error code 1
This is due to the following line (src/Makefile.am:143):
CLEANFILES := $(ALIAS_FILES)
Changing that to a standard assignment ("=") fixes the problem.
I believe the ":=" is a typo. ALIAS_FILES is just a statically assigned
variable; it's not like evaluating it more than once would be a problem.
If the /usr/bin/head program is missing or unusable, or if an unusable head
program is listed first in the PATH, fontconfig fails to build
using "sed -n 1p" instead of "head -1" would be a suitable workaround.
Since fontconfig didn't have special handling for paths in static Windows
libraries, I've created a patch which should fix this.
Basically it does this:
fccfg.c:
If fontconfig_path was uninitialised it tries to get the directory the exe is
in and uses a fonts/ dir inside that.
fcxml.c:
In case the fonts.conf lists a <dir>CUSTOMFONTDIR</dir>, it searches for a
fonts/ directory where the exe is located.
David Turner has modified FreeType to be able to render sub-pixel decimated
glyphs using different methods of filtering. Fontconfig needs new
configurables to support selecting these new filtering options. A patch
follows that would correspond to one available for Cairo in bug 10301.
Bitmap-only TrueType fonts without a glyf table will not load a glyph when
FT_LOAD_NO_SCALE is set. Work around this by identifying TrueType fonts that have no
glyphs and select a single strike to measure the glyph map with.
For Version 2.5.0, (same for previous version 2.4.2), in source file fccfg.c,
on line 700,
Original:
ret = FcStrCmpIgnoreCase (left.u.s, right.u.s) == 0;
Should change to:
ret = FcStrStrIgnoreCase (left.u.s, right.u.s) == 0;
I think this is just a mistake when copy-n-paste similar codes in the same
function. Apparently, return for "Not_contain" should be just the inverse of
"Contain", not the same as "Equal".
Fix a couple of longstanding problems with fontconfig on Windows that
manifest themselves especially in GIMP. The root cause to the problems is in
Microsoft's incredibly stupid stat() implementation. Basically, stat()
returns wrong timestamp fields for files on NTFS filesystems on machines
that use automatic DST switching.
See for instance http://bugzilla.gnome.org/show_bug.cgi?id=154968 and
http://www.codeproject.com/datetime/dstbugs.asp
As fccache.c now looks at more fields in the stat struct I fill in them all.
I noticed that fstat() is used only on a fd just after opening it, so on
Win32 I just call my stat() replacement before opening instead...
Implementing a good replacement for fstat() would be harder because the code
in fccache.c wants to compare inode numbers. There are no (readily
accessible) inode numbers on Win32, so I fake it with the hash of the full
file name, in the case as it is on disk. And fstat() doesn't know the full
file name, so it would be rather hard to come up with a inode number to
identify the file.
The patch also adds similar handling for the cache directory as for the fonts
directory: If a cachedir element in fonts.conf contains the magic string
"WINDOWSTEMPDIR_FONTCONFIG_CACHE" it is replaced at runtime with a path under
the machine's (or user's) temp folder as returned by GetTempPath(). I don't
want to hardcode any pathnames in a fonts.conf intended to be distributed to
end-users, most of which who wouldn't know how to edit it anyway. And
requiring an installer to edit it gets complicated.
This reverts commit b607922909.
Conflicts:
src/Makefile.am
Xft still uses the macros that are in fcprivate.h. Document those macros and
include fcprivate.h in the published header files.
These two names are typos of the correct names. Instead of simply changing
them, the correct thing to do is leave them in the library, add the correct
functions and mark them as deprecated so any source packages will be updated.
This requires bumping the minor version of the library (for adding APIs)
instead of bumping the major version of the library (for removing APIs).
fcprivate.h was supposed to extend the fontconfig API for the various
fontconfig utilities. Instead, just have those utilities use the internal
fcint.h header file (which they already do), removing fcprivate.h from the
installation and hence from the defacto public API.
If the provided style value doesn't match any available font, fall back to
using the weight and slant values by ensuring that those are in the pattern.
Glyph names (now used only for dingbats) were using many relocations,
causing startup latency plus per-process memory usage. Replace pointers with
table indices, shrinking table size and elimninating relocations.
The cache was inserted into the hash table before the timestamps in the
cache were verified; if that verification failed, an extra pointer to the
now freed cache would be left in the hash table. FcFini would fail an
assertion as a result.
If ~/.fonts.conf contains:
<edit mode="assign_replace" name="spacing">
<int>mono</int>
</edit>
fontconfig crashes:
mfabian@magellan:~$ fc-match sans
Fontconfig error: "~/.fonts.conf", line 46: "mono": not a valid
integer
セグメンテーション違反です (core dumped)
mfabian@magellan:~$
Of course the above is nonsense, “mono” is no valid integer indeed.
But I think nevertheless fontconfig should not crash in that case.
The problem was caused by partially truncated expression trees caused by
parse errors -- typechecking these walked the tree without verifying the
integrity of the structure. Of course, the whole tree will be discarded
shortly after being loaded as it contained an error.
Some mingw versions have broken X_OK checking; instead of trying to work
around this in a system-depedent manner, simply don't bother checking for
X_OK along with W_OK as such cases are expected to be mistakes, and not
sensible access control.
The old policy of eliding fullname entries which matched FC_FAMILY or
FC_FAMILY + FC_STYLE meant that applications could not know what the
font foundry set as the fullname of the font. Hiding information is not
helpful.
Instead of relying on mtime ordering between a directory and its associated
cache file, write the directory mtime into the cache file itself. This makes
cache file checks more reliable across file systems.
This change is made in a way that old programs can use new cache files, but
new programs will need new cache files.
In FcDirCacheUnlink(), the line
cache_hashed = FcStrPlus (cache_dir, cache_base);
allocates memory in cache_hashed that is never free()'d before the function
exits.
Reported by Ben Combee.
Recent versions of FreeType do not correctly deal with glyph name buffers
that are too small; work around this by declaring a buffer that can hold any
PS name (127 bytes).
I noticed that Qt always uses a different font than fc-match advertises.
Debugging the issue, I found that a call that looks pretty innocent is
changing all weak bindings to strong bindings and as such changes the
semantic of the match: FcPatternDuplicate.
Missing NULL font check before attempting to edit scanned pattern.
Also, <match target="scan"> rules are now checked to ensure all
edited variables are in the predefined set; otherwise, the resulting
cache files will not be stable.
src/fccache.c uses a trick to try and use a function name that is also a
macro name. It does this using the varargs args() macro. Replace that
with separate macros for each number of formals.
grep -l -w '^foo' doesn't work on Solaris. Replace with
grep -l '^foo\>' instead which does. Also, grep -l will
report the filename more than once (!), so add | head -1
to pick just the first one.
The union inside the FcValue structure contains pad bytes. Instead of
copying the whole structure to the cache block, copy only the initialized
fields to avoid writing whichever bytes serve as padding within the
structure.
When updating from older fontconfig versions, if the config file
is not replaced, it will not contain <cachedir> elements. Lacking these,
fontconfig has no place to store cached font information and cannot operate
reasonably.
Add code to check and see if the loaded configuration has no cache
directories, and if so, warn the user and add both the default system cache
directory and the normal per-user cache directory.
Instead of accepting whatever order names appear in the font file,
use an explicit ordering for both platform and nameid.
Platforms are high precedence than nameids.
The platform order is:
microsoft, apple unicode, macintosh, (other)
The family nameid order is:
preferred family, font family
The fullname nameid order is:
mac full name, full name
The style nameid order is
preferred subfamily, font subfamily
This will change the names visible to users in various application UIs, but
should not change how existing font names are matched as all names remain
present in the resulting database. The hope is that family names will, in
general, be less ambiguous. Testing here shows that commercial fonts
have longer names now while DejaVu has a shorter family name, and moves more
of the font description to the style name.
Our build system barfs on autogen.sh, which ignores --noconfigure. Configure
needs a host of options to make the cross compile work in our case.
Fix typo in fccache.c
FcStrCanonFileName checks whether s[0] == '/', and recurses if not.
This only works on POSIX. On dos, this crashes with a stack overflow.
The patch attached splits this functionality in two functions
(FcStrCanonAbsoluteFilename) and uses GetFullPathName on windows to get an
absolute path. It also fixes a number of other issues. With this patch,
LilyPond actually produces output on Windows.
Instead of attempting to track exported symbols manually in
fontconfig.def.in, build it directly from the public fontconfig header files
to ensure it exports the public API.
With the cache restructuring of 2.4.0, the ability to add
application-specific font files and directories was accidentally lost.
Reimplement this using by sharing the logic used to load configured font
directories.
All caches used in the application must be in the cache reference list so
internal references can be tracked correctly. Failing to have newly created
caches in the list would cause the cache to be deallocated while references
were still present.
Locale environment variables (LC_ALL, LC_CTYPE, LANG) must contain language,
and may contain territory and encoding. Don't accidentally require territory
as that will cause fontconfig to fall back to 'en'.
makealias was using a gnu-extension to sed addressing, replace that with a
simple (and more robuse) grep command. Also, found a bug in the public
header file that was leaving one symbol out of the process.
The existing loop for discovering which characters map to glyphs is ugly and
inefficient. The replacement is functionally identical, but far cleaner and
faster.
Charset hashing actually use the value of the leaf pointers, which is
clearly wrong, especially now that charsets are not shared across multiple
font directories.
Using a simple shell script that processes the public headers, two header
files are constructed that map public symbols to hidden internal aliases
avoiding the assocated PLT entry for referring to a public symbol.
A few mistakes in the FcPrivate/FcPublic annotations were also discovered
through this process
Eliminate need to reference cache object once per cached font, instead
just count the number of fonts used from the cache and bump the reference
count once by that amount. I think this makes this refernece technique
efficient enough for use.
Caches contain patterns and character sets which are reference counted and
visible to applications. Reference count the underlying cache object so that
it stays around until all reference objects are no longer in use.
This is less efficient than just leaving all caches around forever, but does
avoid eternal size increases in case applications ever bother to actually
look for changes in the font configuration.
Without reference counting on cache objects, there's no way to know when
an application is finished using objects pulled from the cache. Until some
kinf of cache reference counting can be done, leave all cache objects mapped
for the life of the library (until FcFini is called). To mitigate the cost
of this, ensure that each instance of a cache file is mapped only once.
Borrowing header stuff written for cairo, fontconfig now exposes in the
shared library only the symbols which are included in the public header
files. All private symbols are hidden using suitable compiler directives.
A few new public functions were required for the fontconfig utility programs
(fc-cat and fc-cache) so those were added, bumping the .so minor version number
in the process.
The Delicious family includes one named Delicious Heavy, a bold variant
which is unfortunately marked as having normal weight. Because the family
name is 'Delicious', fontconfig accidentally selects this font instead of
the normal weight variant. The fix here rewrites the scanned data by running
the scanned pattern through a new substitution sequence tagged with
<match target=scan>; a sample for the Delicious family is included to
demonstrate how it works (and fix Delicious at the same time).
Also added was a new match predicate -- the 'decorative' predicate which is
automatically detected in fonts by searching style names for key decorative
phrases like SmallCaps, Shadow, Embosed and Antiqua. Suggestions for
additional decorative key words are welcome. This should have little effect
on font matching except when two fonts share the same characteristics except
for this value.
Use the version number inside the cache file to mark backward compatible
changes while continuing to reserve the filename number for incompatible
changes.
Instead of making filename canonicalization occur in multiple places, it
occurs only in FcStrAddFilename now, as all filenames pass through that
function at one point.
Many Japanese fonts incorrectly include names tagged as Roman encoding and
English language which are actually Japanese names in the SJIS encoding.
Guess that names with a large number of high bits set are SJIS encoded
Japanese names rather than English names.
A pattern specifying 'Chinese' (:lang=zh) without a territory should be
satisfied by any font supporting any Chinese lang. The code was requiring
that the lang tags match exactly, causing this sort to fail.
Within a fontset, the patterns are stored as pointers in an array.
When stored as offsets, the offsets are relative to the fontset object
itself, not the base of the array of pointers.
Charset freezer api now uses allocated object. Also required minor fixes to
charset freezer code to remove assumption that all input charsets are
persistant.
Instead of passing directory information around in separate variables,
collect it all in an FcCache structure. Numerous internal and tool
interfaces changed as a result of this.
Charsets are now pre-frozen before being serialized. This causes them to
share across multiple fonts in the same cache.
Automatically list all font directories when no arguments are given to
fc-cat. Also add -r option to recurse from specified cache directories.
fc-cat also now prints the cache filename in verbose mode, along with the
related directory name.
Validate cache contents and skip broken caches, looking down cache path for
valid ones.
Every time a directory is scanned, it will be written to a cache file if
possible, so fc-cache doesn't need to re-write the cache file. This makes
detecting when the cache was generated a bit tricky, so we guess that if the
cache wasn't valid before running and is valid afterwards, the cache file
was written.
Also, allow empty charsets to be serialized with null leaves/numbers.
Eliminate a leak in FcEdit by switching to FcObject sooner.
Call FcFini from fc-match to make valgrind happy.
Eliminate ancient list of object name databases and load names into single
hash table that includes type information. Typecheck all pattern values to
avoid mis-typed pattern elements.
FcCharSetSerialize was computing the offset to the unserialized leaf,
which left it pointing at random data when the cache was reloaded.
fc-cat has been updated to work with the new cache structure.
Various debug messages extended to help diagnose serialization errors.
Pagesize no longer matters in architecture decisions, the entire cache file
is mmaped into the library. However, lots of intptr_t values are in use now,
so that value is important.
fc-lang now requires fcserialize.c, which has been added to the repository.