First two chapters. More to follow.
This commit is contained in:
parent
fdd1770e00
commit
f0807654da
|
@ -0,0 +1,115 @@
|
|||
<sect1 id="what-is-harfbuzz">
|
||||
<title>What is Harfbuzz?</title>
|
||||
<para>
|
||||
Harfbuzz is a <emphasis>text shaping engine</emphasis>. It solves
|
||||
the problem of selecting and positioning glyphs from a font given a
|
||||
Unicode string.
|
||||
</para>
|
||||
<sect2 id="why-do-i-need-it">
|
||||
<title>Why do I need it?</title>
|
||||
<para>
|
||||
Text shaping is an integral part of preparing text for display. It
|
||||
is a fairly low level operation; Harfbuzz is used directly by
|
||||
graphic rendering libraries such as Pango, and the layout engines
|
||||
in Firefox, LibreOffice and Chromium. Unless you are
|
||||
<emphasis>writing</emphasis> one of these layout engines yourself,
|
||||
you will probably not need to use Harfbuzz - normally higher level
|
||||
libraries will turn text into glyphs for you.
|
||||
</para>
|
||||
<para>
|
||||
However, if you <emphasis>are</emphasis> writing a layout engine
|
||||
or graphics library yourself, you will need to perform text
|
||||
shaping, and this is where Harfbuzz can help you. Here are some
|
||||
reasons why you need it:
|
||||
</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
OpenType fonts contain a set of glyphs, indexed by glyph ID.
|
||||
The glyph ID within the font does not necessarily relate to a
|
||||
Unicode codepoint. For instance, some fonts have the letter
|
||||
"a" as glyph ID 1. To pull the right glyph out of
|
||||
the font in order to display it, you need to consult a table
|
||||
within the font (the "cmap" table) which maps
|
||||
Unicode codepoints to glyph IDs. Text shaping turns codepoints
|
||||
into glyph IDs.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Many OpenType fonts contain ligatures: combinations of
|
||||
characters which are rendered together. For instance, it's
|
||||
common for the <literal>fi</literal> combination to appear in
|
||||
print as the single ligature "fi". Whether you should
|
||||
render text as <literal>fi</literal> or "fi" does not
|
||||
depend on the input text, but on the capabilities of the font
|
||||
and the level of ligature application you wish to perform.
|
||||
Text shaping involves querying the font's ligature tables and
|
||||
determining what substitutions should be made.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
While ligatures like "fi" are typographic
|
||||
refinements, some languages <emphasis>require</emphasis> such
|
||||
substitutions to be made in order to display text correctly.
|
||||
In Tamil, when the letter "TTA" (ட) letter is
|
||||
followed by "U" (உ), the combination should appear
|
||||
as the single glyph "டு". The sequence of Unicode
|
||||
characters "டஉ" needs to be rendered as a single
|
||||
glyph from the font - text shaping chooses the correct glyph
|
||||
from the sequence of characters provided.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Similarly, each Arabic character has four different variants:
|
||||
within a font, there will be glyphs for the initial, medial,
|
||||
final, and isolated forms of each letter. Unicode only encodes
|
||||
one codepoint per character, and so a Unicode string will not
|
||||
tell you which glyph to use. Text shaping chooses the correct
|
||||
form of the letter and returns the correct glyph from the font
|
||||
that you need to render.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Other languages have marks and accents which need to be
|
||||
rendered in certain positions around a base character. For
|
||||
instance, the Moldovan language has the Cyrillic letter
|
||||
"zhe" (ж) with a breve accent, like so: ӂ. Some
|
||||
fonts will contain this character as an individual glyph,
|
||||
whereas other fonts will not contain a zhe-with-breve glyph
|
||||
but expect the rendering engine to form the character by
|
||||
overlaying the two glyphs ж and ˘. Where you should draw the
|
||||
combining breve depends on the height of the preceding glyph.
|
||||
Again, for Arabic, the correct positioning of vowel marks
|
||||
depends on the height of the character on which you are
|
||||
placing the mark. Text shaping tells you whether you have a
|
||||
precomposed glyph within your font or if you need to compose a
|
||||
glyph yourself out of combining marks, and if so, where to
|
||||
position those marks.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>
|
||||
If this is something that you need to do, then you need a text
|
||||
shaping engine: you could use Uniscribe if you are using Windows;
|
||||
you could use CoreText on OS X; or you could use Harfbuzz. In the
|
||||
rest of this manual, we are going to assume that you are the
|
||||
implementor of a text layout engine.
|
||||
</para>
|
||||
</sect2>
|
||||
<sect2 id="why-is-it-called-harfbuzz">
|
||||
<title>Why is it called Harfbuzz?</title>
|
||||
<para>
|
||||
Harfbuzz began its life as text shaping code within the FreeType
|
||||
project, (and you will see references to the FreeType authors
|
||||
within the source code copyright declarations) but was then
|
||||
abstracted out to its own project. This project is maintained by
|
||||
Behdad Esfahbod, and named Harfbuzz. Originally, it was a shaping
|
||||
engine for OpenType fonts - "Harfbuzz" is the Persian
|
||||
for "open type".
|
||||
</para>
|
||||
</sect2>
|
||||
</sect1>
|
|
@ -0,0 +1,182 @@
|
|||
<sect1 id="hello-harfbuzz">
|
||||
<title>Hello, Harfbuzz</title>
|
||||
<para>
|
||||
Here's the simplest Harfbuzz that can possibly work. We will improve
|
||||
it later.
|
||||
</para>
|
||||
<orderedlist numeration="arabic">
|
||||
<listitem>
|
||||
<para>
|
||||
Create a buffer and put your text in it.
|
||||
</para>
|
||||
</listitem>
|
||||
</orderedlist>
|
||||
<programlisting language="C">
|
||||
#include <hb.h>
|
||||
hb_buffer_t *buf;
|
||||
buf = hb_buffer_create();
|
||||
hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
|
||||
</programlisting>
|
||||
<orderedlist numeration="arabic">
|
||||
<listitem override="2">
|
||||
<para>
|
||||
Guess the script, language and direction of the buffer.
|
||||
</para>
|
||||
</listitem>
|
||||
</orderedlist>
|
||||
<programlisting language="C">
|
||||
hb_buffer_guess_segment_properties(buf);
|
||||
</programlisting>
|
||||
<orderedlist numeration="arabic">
|
||||
<listitem override="3">
|
||||
<para>
|
||||
Create a face and a font, using FreeType for now.
|
||||
</para>
|
||||
</listitem>
|
||||
</orderedlist>
|
||||
<programlisting language="C">
|
||||
#include <hb-ft.h>
|
||||
FT_New_Face(ft_library, font_path, index, &face)
|
||||
hb_font_t *font = hb_ft_font_create(face);
|
||||
</programlisting>
|
||||
<orderedlist numeration="arabic">
|
||||
<listitem override="4">
|
||||
<para>
|
||||
Shape!
|
||||
</para>
|
||||
</listitem>
|
||||
</orderedlist>
|
||||
<programlisting>
|
||||
hb_shape(font, buf, NULL, 0);
|
||||
</programlisting>
|
||||
<orderedlist numeration="arabic">
|
||||
<listitem override="5">
|
||||
<para>
|
||||
Get the glyph and position information.
|
||||
</para>
|
||||
</listitem>
|
||||
</orderedlist>
|
||||
<programlisting language="C">
|
||||
hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &glyph_count);
|
||||
hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &glyph_count);
|
||||
</programlisting>
|
||||
<orderedlist numeration="arabic">
|
||||
<listitem override="6">
|
||||
<para>
|
||||
Iterate over each glyph.
|
||||
</para>
|
||||
</listitem>
|
||||
</orderedlist>
|
||||
<programlisting language="C">
|
||||
for (i = 0; i < glyph_count; ++i) {
|
||||
glyphid = glyph_info[i].codepoint;
|
||||
x_offset = glyph_pos[i].x_offset / 64.0;
|
||||
y_offset = glyph_pos[i].y_offset / 64.0;
|
||||
x_advance = glyph_pos[i].x_advance / 64.0;
|
||||
y_advance = glyph_pos[i].y_advance / 64.0;
|
||||
draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset);
|
||||
cursor_x += x_advance;
|
||||
cursor_y += y_advance;
|
||||
}
|
||||
</programlisting>
|
||||
<orderedlist numeration="arabic">
|
||||
<listitem override="7">
|
||||
<para>
|
||||
Tidy up.
|
||||
</para>
|
||||
</listitem>
|
||||
</orderedlist>
|
||||
<programlisting language="C">
|
||||
hb_buffer_destroy(buf);
|
||||
hb_font_destroy(hb_ft_font);
|
||||
</programlisting>
|
||||
<sect2 id="what-harfbuzz-doesnt-do">
|
||||
<title>What Harfbuzz doesn't do</title>
|
||||
<para>
|
||||
The code above will take a UTF8 string, shape it, and give you the
|
||||
information required to lay it out correctly on a single
|
||||
horizontal (or vertical) line using the font provided. That is the
|
||||
extent of Harfbuzz's responsibility.
|
||||
</para>
|
||||
<para>
|
||||
If you are implementing a text layout engine you may have other
|
||||
responsibilities, that Harfbuzz will not help you with:
|
||||
</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
Harfbuzz won't help you with bidirectionality. If you want to
|
||||
lay out text with mixed Hebrew and English, you will need to
|
||||
ensure that the buffer provided to Harfbuzz has those
|
||||
characters in the correct layout order. This will be different
|
||||
from the logical order in which the Unicode text is stored. In
|
||||
other words, the user will hit the keys in the following
|
||||
sequence:
|
||||
</para>
|
||||
<programlisting>
|
||||
A B C [space] ג ב א [space] D E F
|
||||
</programlisting>
|
||||
<para>
|
||||
but will expect to see in the output:
|
||||
</para>
|
||||
<programlisting>
|
||||
ABC אבג DEF
|
||||
</programlisting>
|
||||
<para>
|
||||
This reordering is called <emphasis>bidi processing</emphasis>
|
||||
("bidi" is short for bidirectional), and there's an
|
||||
algorithm as an annex to the Unicode Standard which tells you how
|
||||
to reorder a string from logical order into presentation order.
|
||||
Before sending your string to Harfbuzz, you may need to apply the
|
||||
bidi algorithm to it. Libraries such as ICU and fribidi can do
|
||||
this for you.
|
||||
</para>
|
||||
<listitem>
|
||||
<para>
|
||||
Harfbuzz won't help you with text that contains different font
|
||||
properties. For instance, if you have the string "a
|
||||
<emphasis>huge</emphasis> breakfast", and you expect
|
||||
"huge" to be italic, you will need to send three
|
||||
strings to Harfbuzz: <literal>a</literal>, in your Roman font;
|
||||
<literal>huge</literal> using your italic font; and
|
||||
<literal>breakfast</literal> using your Roman font again.
|
||||
Similarly if you change font, font size, script, language or
|
||||
direction within your string, you will need to shape each run
|
||||
independently and then output them independently. Harfbuzz
|
||||
expects to shape a run of characters sharing the same
|
||||
properties.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Harfbuzz won't help you with line breaking, hyphenation or
|
||||
justification. As mentioned above, it lays out the string
|
||||
along a <emphasis>single line</emphasis> of, notionally,
|
||||
infinite length. If you want to find out where the potential
|
||||
word, sentence and line break points are in your text, you
|
||||
could use the ICU library's break iterator functions.
|
||||
</para>
|
||||
<para>
|
||||
Harfbuzz can tell you how wide a shaped piece of text is, which is
|
||||
useful input to a justification algorithm, but it knows nothing
|
||||
about paragraphs, lines or line lengths. Nor will it adjust the
|
||||
space between words to fit them proportionally into a line. If you
|
||||
want to layout text in paragraphs, you will probably want to send
|
||||
each word of your text to Harfbuzz to determine its shaped width
|
||||
after glyph substitutions, then work out how many words will fit
|
||||
on a line, and then finally output each word of the line separated
|
||||
by a space of the correct size to fully justify the paragraph.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>
|
||||
As a layout engine implementor, Harfbuzz will help you with the
|
||||
interface between your text and your font, and that's something
|
||||
that you'll need - what you then do with the glyphs that your font
|
||||
returns is up to you. The example we saw above enough to get us
|
||||
started using Harfbuzz. Now we are going to use the remainder of
|
||||
Harfbuzz's API to refine that example and improve our text shaping
|
||||
capabilities.
|
||||
</para>
|
||||
</sect2>
|
||||
</sect1>
|
Loading…
Reference in New Issue