Docs: update Usermanual-What Is HarfBuzz.
This commit is contained in:
parent
0af3d176a6
commit
d9fd927210
|
@ -1,112 +1,173 @@
|
|||
<chapter id="what-is-harfbuzz">
|
||||
<title>What is HarfBuzz?</title>
|
||||
<para>
|
||||
HarfBuzz is a <emphasis>text shaping engine</emphasis>. It solves
|
||||
the problem of selecting and positioning glyphs from a font given a
|
||||
Unicode string.
|
||||
HarfBuzz is a <emphasis>text shaping engine</emphasis>. If you
|
||||
give HarfBuzz a font and a string containing a sequence of Unicode
|
||||
codepoints, HarfBuzz selects and positions the corresponding
|
||||
glyphs from the font, applying all of the necessary layout rules
|
||||
and font features. HarfBuzz then returns the string to you in the
|
||||
form that is correctly arranged for the language and writing
|
||||
system.
|
||||
</para>
|
||||
<section id="why-do-i-need-it">
|
||||
<title>Why do I need it?</title>
|
||||
<para>
|
||||
Text shaping is an integral part of preparing text for display. It
|
||||
is a fairly low level operation; HarfBuzz is used directly by
|
||||
graphic rendering libraries such as Pango, and the layout engines
|
||||
in Firefox, LibreOffice and Chromium. Unless you are
|
||||
<emphasis>writing</emphasis> one of these layout engines yourself,
|
||||
you will probably not need to use HarfBuzz - normally higher level
|
||||
libraries will turn text into glyphs for you.
|
||||
HarfBuzz can properly shape all of the world's major writing
|
||||
systems. It runs on virtually all operating systems and software
|
||||
platforms, and it supports all of the standard font formats in use
|
||||
today.
|
||||
</para>
|
||||
<section id="why-do-i-need-a-shaping-engine">
|
||||
<title>Why do I need a shaping engine?</title>
|
||||
<para>
|
||||
Text shaping is an integral part of preparing text for
|
||||
display. Before a Unicode sequence can be rendered, the
|
||||
codepoints in the sequence must be mapped to the glyphs
|
||||
provided in the font, and the glyphs must be positioned
|
||||
correctly relative to each other. For many of the scripts
|
||||
supported in Unicode, these steps involve script-specific layout
|
||||
rules.
|
||||
</para>
|
||||
<para>
|
||||
Text shaping is a fairly low-level operation. HarfBuzz is
|
||||
used directly by graphic rendering libraries such as Pango, as
|
||||
well as by the layout engines in Firefox, LibreOffice, and
|
||||
Chromium. Unless you are <emphasis>writing</emphasis> one of
|
||||
these layout engines yourself, you will probably not need to use
|
||||
HarfBuzz: normally, lower-level libraries will turn text into
|
||||
glyphs for you.
|
||||
</para>
|
||||
<para>
|
||||
However, if you <emphasis>are</emphasis> writing a layout engine
|
||||
or graphics library yourself, you will need to perform text
|
||||
shaping, and this is where HarfBuzz can help you. Here are some
|
||||
reasons why you need it:
|
||||
shaping, and this is where HarfBuzz can help you.
|
||||
</para>
|
||||
<para>
|
||||
Here are some specific scenarios where a text-shaping engine
|
||||
like HarfBuzz helps you:
|
||||
</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
OpenType fonts contain a set of glyphs, indexed by glyph ID.
|
||||
The glyph ID within the font does not necessarily relate to a
|
||||
Unicode codepoint. For instance, some fonts have the letter
|
||||
"a" as glyph ID 1. To pull the right glyph out of
|
||||
the font in order to display it, you need to consult a table
|
||||
within the font (the "cmap" table) which maps
|
||||
Unicode codepoints to glyph IDs. Text shaping turns codepoints
|
||||
into glyph IDs.
|
||||
OpenType fonts contain a set of glyphs (that is, shapes
|
||||
to represent the letters, numbers, punctuation marks, and
|
||||
all other symbols), which are indexed by a <literal>glyph ID</literal>.
|
||||
</para>
|
||||
<para>
|
||||
The glyph ID within the font does not necessarily correlate
|
||||
to a predictable Unicode codepoint. For instance, some fonts
|
||||
have the letter "a" as glyph ID 1, but many others do
|
||||
not. To pull the right glyph out of the font in order to
|
||||
display "a", you need to consult the table inside
|
||||
the font (the <literal>cmap</literal> table) that maps Unicode
|
||||
codepoints to glyph IDs. In other words, <emphasis>text shaping turns
|
||||
codepoints into glyph IDs</emphasis>.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Many OpenType fonts contain ligatures: combinations of
|
||||
characters which are rendered together. For instance, it's
|
||||
common for the <literal>fi</literal> combination to appear in
|
||||
print as the single ligature "fi". Whether you should
|
||||
render text as <literal>fi</literal> or "fi" does not
|
||||
depend on the input text, but on the capabilities of the font
|
||||
and the level of ligature application you wish to perform.
|
||||
Text shaping involves querying the font's ligature tables and
|
||||
determining what substitutions should be made.
|
||||
characters that are rendered as a single unit. For instance,
|
||||
it is common for the <literal>fi</literal> letter
|
||||
combination to appear in print as the single ligature glyph
|
||||
"fi".
|
||||
</para>
|
||||
<para>
|
||||
Whether you should render an "f, i" sequence
|
||||
as <literal>fi</literal> or as "fi" does not
|
||||
depend on the input text. Rather, it depends on the whether
|
||||
or not the font includes an "fi" glyph and on the
|
||||
level of ligature application you wish to perform. The font
|
||||
and the amount of ligature application used are under your
|
||||
control. In other words, <emphasis>text shaping involves
|
||||
querying the font's ligature tables and determining what
|
||||
substitutions should be made</emphasis>.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
While ligatures like "fi" are typographic
|
||||
refinements, some languages <emphasis>require</emphasis> such
|
||||
While ligatures like "fi" are optional typographic
|
||||
refinements, some languages <emphasis>require</emphasis> certain
|
||||
substitutions to be made in order to display text correctly.
|
||||
In Tamil, when the letter "TTA" (ட) letter is
|
||||
followed by "U" (உ), the combination should appear
|
||||
as the single glyph "டு". The sequence of Unicode
|
||||
characters "டஉ" needs to be rendered as a single
|
||||
glyph from the font - text shaping chooses the correct glyph
|
||||
from the sequence of characters provided.
|
||||
</para>
|
||||
<para>
|
||||
For example, in Tamil, when the letter "TTA" (ட)
|
||||
letter is followed by "U" (உ), the pair
|
||||
must be replaced by the single glyph "டு". The
|
||||
sequence of Unicode characters "டஉ" needs to be
|
||||
substituted with a single "டு" glyph from the
|
||||
font.
|
||||
</para>
|
||||
<para>
|
||||
But "டு" does not have a Unicode codepoint. To
|
||||
find this glyph, you need to consult the table inside
|
||||
the font (the <literal>GSUB</literal> table) that contains
|
||||
substitution information. In other words, <emphasis>text shaping
|
||||
chooses the correct glyph for a sequence of characters
|
||||
provided</emphasis>.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Similarly, each Arabic character has four different variants:
|
||||
within a font, there will be glyphs for the initial, medial,
|
||||
final, and isolated forms of each letter. Unicode only encodes
|
||||
one codepoint per character, and so a Unicode string will not
|
||||
tell you which glyph to use. Text shaping chooses the correct
|
||||
form of the letter and returns the correct glyph from the font
|
||||
that you need to render.
|
||||
Similarly, each Arabic character has four different variants
|
||||
corresponding to the different positions in might appear in
|
||||
within a sequence. Inside a font, there will be separate
|
||||
glyphs for the initial, medial, final, and isolated forms of
|
||||
each letter, each at a different glyph ID.
|
||||
</para>
|
||||
<para>
|
||||
Unicode only assigns one codepoint per character, so a
|
||||
Unicode string will not tell you which glyph variant to use
|
||||
for each character. To decide, you need to analyze the whole
|
||||
string and determine the appropriate glyph for each character
|
||||
based on its position. In other words, <emphasis>text
|
||||
shaping chooses the correct form of the letter by its
|
||||
position and returns the correct glyph from the font</emphasis>.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Other languages have marks and accents which need to be
|
||||
rendered in certain positions around a base character. For
|
||||
instance, the Moldovan language has the Cyrillic letter
|
||||
"zhe" (ж) with a breve accent, like so: ӂ. Some
|
||||
fonts will contain this character as an individual glyph,
|
||||
whereas other fonts will not contain a zhe-with-breve glyph
|
||||
but expect the rendering engine to form the character by
|
||||
overlaying the two glyphs ж and ˘. Where you should draw the
|
||||
combining breve depends on the height of the preceding glyph.
|
||||
Again, for Arabic, the correct positioning of vowel marks
|
||||
depends on the height of the character on which you are
|
||||
placing the mark. Text shaping tells you whether you have a
|
||||
Other languages involve marks and accents that need to be
|
||||
rendered in specific positions relative a base character. For
|
||||
instance, the Moldovan language includes the Cyrillic letter
|
||||
"zhe" (ж) with a breve accent, like so: "ӂ".
|
||||
</para>
|
||||
<para>
|
||||
Some fonts will provide this character as a single
|
||||
zhe-with-breve glyph, but other fonts will not and, instead,
|
||||
will expect the rendering engine to form the character by
|
||||
superimposing the separate "ж" and "˘"
|
||||
glyphs.
|
||||
</para>
|
||||
<para>
|
||||
But exactly where you should draw the breve depends on the
|
||||
height and width of the preceding zhe glyph. To find the
|
||||
right position, you need to consult the table inside
|
||||
the font (the <literal>GPOS</literal> table) that contains
|
||||
positioning information.
|
||||
In other words, <emphasis>text shaping tells you whether you have a
|
||||
precomposed glyph within your font or if you need to compose a
|
||||
glyph yourself out of combining marks, and if so, where to
|
||||
position those marks.
|
||||
glyph yourself out of combining marks—and, if so, where to
|
||||
position those marks.</emphasis>
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>
|
||||
If this is something that you need to do, then you need a text
|
||||
shaping engine: you could use Uniscribe if you are using Windows;
|
||||
you could use CoreText on OS X; or you could use HarfBuzz. In the
|
||||
rest of this manual, we are going to assume that you are the
|
||||
implementor of a text layout engine.
|
||||
If tasks like these are something that you need to do, then you need a text
|
||||
shaping engine. You could use Uniscribe if you are writing
|
||||
Windows software; you could use CoreText on macOS; or you could
|
||||
use HarfBuzz.
|
||||
</para>
|
||||
<para>
|
||||
In the rest of this manual, we are going to assume that you are the
|
||||
implementor of a text-layout engine.
|
||||
</para>
|
||||
</section>
|
||||
<section id="why-is-it-called-harfbuzz">
|
||||
<title>Why is it called HarfBuzz?</title>
|
||||
<para>
|
||||
HarfBuzz began its life as text shaping code within the FreeType
|
||||
project, (and you will see references to the FreeType authors
|
||||
within the source code copyright declarations) but was then
|
||||
abstracted out to its own project. This project is maintained by
|
||||
HarfBuzz began its life as text-shaping code within the FreeType
|
||||
project (and you will see references to the FreeType authors
|
||||
within the source code copyright declarations), but was then
|
||||
extracted out to its own project. This project is maintained by
|
||||
Behdad Esfahbod, and named HarfBuzz. Originally, it was a shaping
|
||||
engine for OpenType fonts - "HarfBuzz" is the Persian
|
||||
for "open type".
|
||||
|
|
Loading…
Reference in New Issue