Docs: update Usermanual-What Is HarfBuzz.

Nathan Willis 2018-09-28 16:07:37 -05:00 committed by Khaled Hosny
parent 0af3d176a6
commit d9fd927210
1 changed files with 130 additions and 69 deletions


@@ -1,112 +1,173 @@
<chapter id="what-is-harfbuzz"> <chapter id="what-is-harfbuzz">
<title>What is HarfBuzz?</title> <title>What is HarfBuzz?</title>
<para> <para>
HarfBuzz is a <emphasis>text shaping engine</emphasis>. It solves HarfBuzz is a <emphasis>text shaping engine</emphasis>. If you
the problem of selecting and positioning glyphs from a font given a give HarfBuzz a font and a string containing a sequence of Unicode
Unicode string. codepoints, HarfBuzz selects and positions the corresponding
glyphs from the font, applying all of the necessary layout rules
and font features. HarfBuzz then returns the string to you in the
form that is correctly arranged for the language and writing
system.
</para> </para>
<section id="why-do-i-need-it">
<title>Why do I need it?</title>
<para> <para>
Text shaping is an integral part of preparing text for display. It HarfBuzz can properly shape all of the world's major writing
is a fairly low level operation; HarfBuzz is used directly by systems. It runs on virtually all operating systems and software
graphic rendering libraries such as Pango, and the layout engines platforms, and it supports all of the standard font formats in use
in Firefox, LibreOffice and Chromium. Unless you are today.
<emphasis>writing</emphasis> one of these layout engines yourself, </para>
you will probably not need to use HarfBuzz - normally higher level <section id="why-do-i-need-a-shaping-engine">
libraries will turn text into glyphs for you. <title>Why do I need a shaping engine?</title>
<para>
Text shaping is an integral part of preparing text for
display. Before a Unicode sequence can be rendered, the
codepoints in the sequence must be mapped to the glyphs
provided in the font, and the glyphs must be positioned
correctly relative to each other. For many of the scripts
supported in Unicode, these steps involve script-specific layout
rules.
</para>
<para>
Text shaping is a fairly low-level operation. HarfBuzz is
used directly by graphic rendering libraries such as Pango, as
well as by the layout engines in Firefox, LibreOffice, and
Chromium. Unless you are <emphasis>writing</emphasis> one of
these layout engines yourself, you will probably not need to use
HarfBuzz: normally, higher-level libraries will turn text into
glyphs for you.
</para> </para>
<para> <para>
However, if you <emphasis>are</emphasis> writing a layout engine However, if you <emphasis>are</emphasis> writing a layout engine
or graphics library yourself, you will need to perform text or graphics library yourself, you will need to perform text
shaping, and this is where HarfBuzz can help you. Here are some shaping, and this is where HarfBuzz can help you.
reasons why you need it: </para>
<para>
Here are some specific scenarios where a text-shaping engine
like HarfBuzz helps you:
</para> </para>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para> <para>
OpenType fonts contain a set of glyphs, indexed by glyph ID. OpenType fonts contain a set of glyphs (that is, shapes
The glyph ID within the font does not necessarily relate to a to represent the letters, numbers, punctuation marks, and
Unicode codepoint. For instance, some fonts have the letter all other symbols), which are indexed by a <literal>glyph ID</literal>.
&quot;a&quot; as glyph ID 1. To pull the right glyph out of </para>
the font in order to display it, you need to consult a table <para>
within the font (the &quot;cmap&quot; table) which maps The glyph ID within the font does not necessarily correlate
Unicode codepoints to glyph IDs. Text shaping turns codepoints to a predictable Unicode codepoint. For instance, some fonts
into glyph IDs. have the letter &quot;a&quot; as glyph ID 1, but many others do
not. To pull the right glyph out of the font in order to
display &quot;a&quot;, you need to consult the table inside
the font (the <literal>cmap</literal> table) that maps Unicode
codepoints to glyph IDs. In other words, <emphasis>text shaping turns
codepoints into glyph IDs</emphasis>.
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
Many OpenType fonts contain ligatures: combinations of Many OpenType fonts contain ligatures: combinations of
characters which are rendered together. For instance, it's characters that are rendered as a single unit. For instance,
common for the <literal>fi</literal> combination to appear in it is common for the <literal>fi</literal> letter
print as the single ligature &quot;ﬁ&quot;. Whether you should combination to appear in print as the single ligature glyph
render text as <literal>fi</literal> or &quot;ﬁ&quot; does not &quot;ﬁ&quot;.
depend on the input text, but on the capabilities of the font </para>
and the level of ligature application you wish to perform. <para>
Text shaping involves querying the font's ligature tables and Whether you should render an &quot;f, i&quot; sequence
determining what substitutions should be made. as <literal>fi</literal> or as &quot;ﬁ&quot; does not
depend on the input text. Rather, it depends on whether
or not the font includes an &quot;ﬁ&quot; glyph and on the
level of ligature application you wish to perform. The font
and the amount of ligature application used are under your
control. In other words, <emphasis>text shaping involves
querying the font's ligature tables and determining what
substitutions should be made</emphasis>.
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
While ligatures like &quot;ﬁ&quot; are typographic While ligatures like &quot;ﬁ&quot; are optional typographic
refinements, some languages <emphasis>require</emphasis> such refinements, some languages <emphasis>require</emphasis> certain
substitutions to be made in order to display text correctly. substitutions to be made in order to display text correctly.
In Tamil, when the letter &quot;TTA&quot; (ட) letter is </para>
followed by &quot;U&quot; (உ), the combination should appear <para>
as the single glyph &quot;டு&quot;. The sequence of Unicode For example, in Tamil, when the letter &quot;TTA&quot; (ட)
characters &quot;டஉ&quot; needs to be rendered as a single is followed by the vowel sign &quot;U&quot; (ு), the pair
glyph from the font - text shaping chooses the correct glyph must be replaced by the single glyph &quot;டு&quot;. The
from the sequence of characters provided. sequence of Unicode characters &quot;ட&quot; and &quot;ு&quot; needs to be
substituted with a single &quot;டு&quot; glyph from the
font.
</para>
<para>
But &quot;டு&quot; does not have a Unicode codepoint. To
find this glyph, you need to consult the table inside
the font (the <literal>GSUB</literal> table) that contains
substitution information. In other words, <emphasis>text shaping
chooses the correct glyph for a sequence of characters
provided</emphasis>.
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
Similarly, each Arabic character has four different variants: Similarly, each Arabic character has four different variants
within a font, there will be glyphs for the initial, medial, corresponding to the different positions it might appear in
final, and isolated forms of each letter. Unicode only encodes within a sequence. Inside a font, there will be separate
one codepoint per character, and so a Unicode string will not glyphs for the initial, medial, final, and isolated forms of
tell you which glyph to use. Text shaping chooses the correct each letter, each at a different glyph ID.
form of the letter and returns the correct glyph from the font </para>
that you need to render. <para>
Unicode only assigns one codepoint per character, so a
Unicode string will not tell you which glyph variant to use
for each character. To decide, you need to analyze the whole
string and determine the appropriate glyph for each character
based on its position. In other words, <emphasis>text
shaping chooses the correct form of the letter by its
position and returns the correct glyph from the font</emphasis>.
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
Other languages have marks and accents which need to be Other languages involve marks and accents that need to be
rendered in certain positions around a base character. For rendered in specific positions relative to a base character. For
instance, the Moldovan language has the Cyrillic letter instance, the Moldovan language includes the Cyrillic letter
&quot;zhe&quot; (ж) with a breve accent, like so: ӂ. Some &quot;zhe&quot; (ж) with a breve accent, like so: &quot;ӂ&quot;.
fonts will contain this character as an individual glyph, </para>
whereas other fonts will not contain a zhe-with-breve glyph <para>
but expect the rendering engine to form the character by Some fonts will provide this character as a single
overlaying the two glyphs ж and ˘. Where you should draw the zhe-with-breve glyph, but other fonts will not and, instead,
combining breve depends on the height of the preceding glyph. will expect the rendering engine to form the character by
Again, for Arabic, the correct positioning of vowel marks superimposing the separate &quot;ж&quot; and &quot;˘&quot;
depends on the height of the character on which you are glyphs.
placing the mark. Text shaping tells you whether you have a </para>
<para>
But exactly where you should draw the breve depends on the
height and width of the preceding zhe glyph. To find the
right position, you need to consult the table inside
the font (the <literal>GPOS</literal> table) that contains
positioning information.
In other words, <emphasis>text shaping tells you whether you have a
precomposed glyph within your font or if you need to compose a precomposed glyph within your font or if you need to compose a
glyph yourself out of combining marks, and if so, where to glyph yourself out of combining marks&mdash;and, if so, where to
position those marks. position those marks.</emphasis>
</para> </para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
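The first two scenarios in the list above (mapping codepoints to glyph IDs, then applying ligature substitutions) can be sketched as a toy model. This is only an illustration and is not the HarfBuzz API: the names `CMAP`, `LIGATURES`, `map_codepoints`, `apply_ligatures`, and `shape` are hypothetical, and every glyph ID is invented. The one real convention it relies on is that glyph ID 0 is the "missing glyph" in OpenType fonts.

```python
# Toy model of the first shaping steps; NOT the HarfBuzz API.
# All glyph IDs and table contents below are invented for illustration.

# Hypothetical cmap: Unicode codepoint -> glyph ID.
CMAP = {ord("f"): 71, ord("i"): 74, ord("n"): 79, ord("d"): 69}

# Hypothetical GSUB-style ligature table: glyph ID pair -> ligature glyph ID.
LIGATURES = {(71, 74): 301}  # f + i -> single "fi" ligature glyph

NOTDEF = 0  # glyph ID 0 is the "missing glyph" in OpenType fonts


def map_codepoints(text):
    """Step 1: turn codepoints into glyph IDs by consulting the cmap."""
    return [CMAP.get(ord(ch), NOTDEF) for ch in text]


def apply_ligatures(glyphs, enabled=True):
    """Step 2: replace glyph pairs that have a ligature, if enabled."""
    if not enabled:
        return list(glyphs)
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2  # both input glyphs are consumed by the ligature
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape(text, ligatures=True):
    """Run both toy steps in order."""
    return apply_ligatures(map_codepoints(text), enabled=ligatures)


print(shape("find"))                   # ligature applied
print(shape("find", ligatures=False))  # ligature application switched off
```

Note that the same input string yields a different glyph sequence depending on the font tables and on whether ligature application is enabled, which is the point the list makes: the result is not determined by the text alone.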
<para> <para>
If this is something that you need to do, then you need a text If tasks like these are something that you need to do, then you need a text
shaping engine: you could use Uniscribe if you are using Windows; shaping engine. You could use Uniscribe if you are writing
you could use CoreText on OS X; or you could use HarfBuzz. In the Windows software; you could use CoreText on macOS; or you could
rest of this manual, we are going to assume that you are the use HarfBuzz.
implementor of a text layout engine. </para>
<para>
In the rest of this manual, we are going to assume that you are the
implementor of a text-layout engine.
</para> </para>
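As a rough illustration of the mark-and-accent scenario described earlier in this section, the sketch below decides between a precomposed glyph and a base-plus-mark pair, and computes where the mark goes by lining up two attachment points. It is a simplified toy and not the HarfBuzz API or the actual GPOS data layout: the tables `BASE_ANCHORS`, `MARK_ANCHORS`, `ADVANCES`, and `PRECOMPOSED`, the function `place_mark`, the glyph names, and all coordinates are assumptions made up for the example.

```python
# Toy mark positioning; NOT the HarfBuzz API. All names and numbers are invented.

# Hypothetical attachment points in font units (x, y), loosely modeled on the
# kind of positioning data a font could carry.
BASE_ANCHORS = {"zhe": (450, 520)}    # where a top mark attaches on the base glyph
MARK_ANCHORS = {"breve": (200, 480)}  # the matching point inside the mark glyph
ADVANCES = {"zhe": 900, "breve": 0}   # horizontal advance of each glyph

# Hypothetical precomposed glyphs that some fonts provide.
PRECOMPOSED = {("zhe", "breve"): "zhebreve"}


def place_mark(base, mark, font_has_precomposed):
    """Return a list of (glyph name, x offset, y offset) for base + mark."""
    if font_has_precomposed and (base, mark) in PRECOMPOSED:
        # The font supplies a single glyph, so nothing needs composing.
        return [(PRECOMPOSED[(base, mark)], 0, 0)]
    base_x, base_y = BASE_ANCHORS[base]
    mark_x, mark_y = MARK_ANCHORS[mark]
    # Assumption: each glyph is drawn at the pen position reached after the
    # advances of all earlier glyphs, so the mark is shifted back by the base
    # advance to line its anchor up with the base anchor.
    x_offset = base_x - mark_x - ADVANCES[base]
    y_offset = base_y - mark_y
    return [(base, 0, 0), (mark, x_offset, y_offset)]


print(place_mark("zhe", "breve", font_has_precomposed=True))
print(place_mark("zhe", "breve", font_has_precomposed=False))
```

Because the offsets come from the anchors of the specific base glyph, a taller or wider base would move the mark accordingly, which is why the position cannot be hard-coded by the rendering engine.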
</section> </section>
<section id="why-is-it-called-harfbuzz"> <section id="why-is-it-called-harfbuzz">
<title>Why is it called HarfBuzz?</title> <title>Why is it called HarfBuzz?</title>
<para> <para>
HarfBuzz began its life as text shaping code within the FreeType HarfBuzz began its life as text-shaping code within the FreeType
project, (and you will see references to the FreeType authors project (and you will see references to the FreeType authors
within the source code copyright declarations) but was then within the source code copyright declarations), but was then
abstracted out to its own project. This project is maintained by extracted out to its own project. This project is maintained by
Behdad Esfahbod, and named HarfBuzz. Originally, it was a shaping Behdad Esfahbod, and named HarfBuzz. Originally, it was a shaping
engine for OpenType fonts - &quot;HarfBuzz&quot; is the Persian engine for OpenType fonts - &quot;HarfBuzz&quot; is the Persian
for &quot;open type&quot;. for &quot;open type&quot;.