Merge pull request #1691 from n8willis/usermanual-shaping

Usermanual: Add new chapters.
This commit is contained in:
n8willis 2019-05-25 16:05:07 +01:00 committed by GitHub
commit e7ed85de95
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 1322 additions and 38 deletions

View File

@ -79,9 +79,9 @@ content_files= \
usermanual-object-model.xml \ usermanual-object-model.xml \
usermanual-buffers-language-script-and-direction.xml \ usermanual-buffers-language-script-and-direction.xml \
usermanual-fonts-and-faces.xml \ usermanual-fonts-and-faces.xml \
usermanual-clusters.xml \
usermanual-opentype-features.xml \ usermanual-opentype-features.xml \
usermanual-glyph-information.xml \ usermanual-clusters.xml \
usermanual-utilities.xml \
version.xml version.xml
# SGML files where gtk-doc abbrevations (#GtkWidget) are expanded # SGML files where gtk-doc abbrevations (#GtkWidget) are expanded

View File

@ -36,9 +36,9 @@
<xi:include href="usermanual-object-model.xml"/> <xi:include href="usermanual-object-model.xml"/>
<xi:include href="usermanual-buffers-language-script-and-direction.xml"/> <xi:include href="usermanual-buffers-language-script-and-direction.xml"/>
<xi:include href="usermanual-fonts-and-faces.xml"/> <xi:include href="usermanual-fonts-and-faces.xml"/>
<xi:include href="usermanual-clusters.xml"/>
<xi:include href="usermanual-opentype-features.xml"/> <xi:include href="usermanual-opentype-features.xml"/>
<xi:include href="usermanual-glyph-information.xml"/> <xi:include href="usermanual-clusters.xml"/>
<xi:include href="usermanual-utilities.xml"/>
</part> </part>
<part> <part>

View File

@ -7,29 +7,37 @@
<chapter id="buffers-language-script-and-direction"> <chapter id="buffers-language-script-and-direction">
<title>Buffers, language, script and direction</title> <title>Buffers, language, script and direction</title>
<para> <para>
The input to HarfBuzz is a series of Unicode characters, stored in a The input to the HarfBuzz shaper is a series of Unicode characters, stored in a
buffer. In this chapter, we'll look at how to set up a buffer with buffer. In this chapter, we'll look at how to set up a buffer with
the text that we want and then customize the properties of the the text that we want and how to customize the properties of the
buffer. buffer. We'll also look at a piece of lower-level machinery that
you will need to understand before proceeding: the functions that
HarfBuzz uses to retrieve Unicode information.
</para>
<para>
After shaping is complete, HarfBuzz puts its output back
into the buffer. But getting that output requires setting up a
face and a font first, so we will look at that in the next chapter
instead of here.
</para> </para>
<section id="creating-and-destroying-buffers"> <section id="creating-and-destroying-buffers">
<title>Creating and destroying buffers</title> <title>Creating and destroying buffers</title>
<para> <para>
As we saw in our <emphasis>Getting Started</emphasis> example, a As we saw in our <emphasis>Getting Started</emphasis> example, a
buffer is created and buffer is created and
initialized with <literal>hb_buffer_create()</literal>. This initialized with <function>hb_buffer_create()</function>. This
produces a new, empty buffer object, instantiated with some produces a new, empty buffer object, instantiated with some
default values and ready to accept your Unicode strings. default values and ready to accept your Unicode strings.
</para> </para>
<para> <para>
HarfBuzz manages the memory of objects (such as buffers) that it HarfBuzz manages the memory of objects (such as buffers) that it
creates, so you don't have to. When you have finished working on creates, so you don't have to. When you have finished working on
a buffer, you can call <literal>hb_buffer_destroy()</literal>: a buffer, you can call <function>hb_buffer_destroy()</function>:
</para> </para>
<programlisting language="C"> <programlisting language="C">
hb_buffer_t *buffer = hb_buffer_create(); hb_buffer_t *buf = hb_buffer_create();
... ...
hb_buffer_destroy(buffer); hb_buffer_destroy(buf);
</programlisting> </programlisting>
<para> <para>
This will destroy the object and free its associated memory - This will destroy the object and free its associated memory -
@ -39,46 +47,364 @@
else destroying it, you should increase its reference count: else destroying it, you should increase its reference count:
</para> </para>
<programlisting language="C"> <programlisting language="C">
void somefunc(hb_buffer_t *buffer) { void somefunc(hb_buffer_t *buf) {
buffer = hb_buffer_reference(buffer); buf = hb_buffer_reference(buf);
... ...
</programlisting> </programlisting>
<para> <para>
And then decrease it once you're done with it: And then decrease it once you're done with it:
</para> </para>
<programlisting language="C"> <programlisting language="C">
hb_buffer_destroy(buffer); hb_buffer_destroy(buf);
} }
</programlisting> </programlisting>
<para>
While we are on the subject of reference-counting buffers, it is
worth noting that an individual buffer can only meaningfully be
used by one thread at a time.
</para>
<para> <para>
To throw away all the data in your buffer and start from scratch, To throw away all the data in your buffer and start from scratch,
call <literal>hb_buffer_reset(buffer)</literal>. If you want to call <function>hb_buffer_reset(buf)</function>. If you want to
throw away the string in the buffer but keep the options, you can throw away the string in the buffer but keep the options, you can
instead call <literal>hb_buffer_clear_contents(buffer)</literal>. instead call <function>hb_buffer_clear_contents(buf)</function>.
</para> </para>
</section> </section>
<section id="adding-text-to-the-buffer"> <section id="adding-text-to-the-buffer">
<title>Adding text to the buffer</title> <title>Adding text to the buffer</title>
<para> <para>
Now we have a brand new HarfBuzz buffer. Let's start filling it Now we have a brand new HarfBuzz buffer. Let's start filling it
with text! From HarfBuzz's perspective, a buffer is just a stream with text! From HarfBuzz's perspective, a buffer is just a stream
of Unicode code points, but your input string is probably in one of of Unicode code points, but your input string is probably in one of
the standard Unicode character encodings (UTF-8, UTF-16, UTF-32) the standard Unicode character encodings (UTF-8, UTF-16, or
UTF-32). HarfBuzz provides convenience functions that accept
each of these encodings:
<function>hb_buffer_add_utf8()</function>,
<function>hb_buffer_add_utf16()</function>, and
<function>hb_buffer_add_utf32()</function>. Other than the
character encoding they accept, they function identically.
</para> </para>
<para>
You can add UTF-8 text to a buffer by passing in the text array,
the array's length, an offset into the array for the first
character to add, and the length of the segment to add:
</para>
<programlisting language="C">
hb_buffer_add_utf8 (hb_buffer_t *buf,
const char *text,
int text_length,
unsigned int item_offset,
int item_length)
</programlisting>
<para>
So, in practice, you can say:
</para>
<programlisting language="C">
hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
</programlisting>
<para>
This will append your new characters to
<parameter>buf</parameter>, not replace its existing
contents. Also, note that you can use <literal>-1</literal> in
place of the first instance of <function>strlen(text)</function>
if your text array is NULL-terminated. Similarly, you can also use
<literal>-1</literal> as the final argument want to add its full
contents.
</para>
<para>
Whatever start <parameter>item_offset</parameter> and
<parameter>item_length</parameter> you provide, HarfBuzz will also
attempt to grab the five characters <emphasis>before</emphasis>
the offset point and the five characters
<emphasis>after</emphasis> the designated end. These are the
before and after "context" segments, which are used internally
for HarfBuzz to make shaping decisions. They will not be part of
the final output, but they ensure that HarfBuzz's
script-specific shaping operations are correct. If there are
fewer than five characters available for the before or after
contexts, HarfBuzz will just grab what is there.
</para>
<para>
For longer text runs, such as full paragraphs, it might be
tempting to only add smaller sub-segments to a buffer and
shape them in piecemeal fashion. Generally, this is not a good
idea, however, because a lot of shaping decisions are
dependent on this context information. For example, in Arabic
and other connected scripts, HarfBuzz needs to know the code
points before and after each character in order to correctly
determine which glyph to return.
</para>
<para>
The safest approach is to add all of the text available, then
use <parameter>item_offset</parameter> and
<parameter>item_length</parameter> to indicate which characters you
want shaped, so that HarfBuzz has access to any context.
</para>
<para>
You can also add Unicode code points directly with
<function>hb_buffer_add_codepoints()</function>. The arguments
to this function are the same as those for the UTF
encodings. But it is particularly important to note that
HarfBuzz does not do validity checking on the text that is added
to a buffer. Invalid code points will be replaced, but it is up
to you to do any deep-sanity checking necessary.
</para>
</section> </section>
<section id="setting-buffer-properties"> <section id="setting-buffer-properties">
<title>Setting buffer properties</title> <title>Setting buffer properties</title>
<para> <para>
Buffers containing input characters still need several
properties set before HarfBuzz can shape their text correctly.
</para> </para>
</section>
<section id="what-about-the-other-scripts">
<title>What about the other scripts?</title>
<para> <para>
Initially, all buffers are set to the
<literal>HB_BUFFER_CONTENT_TYPE_INVALID</literal> content
type. After adding text, the buffer should be set to
<literal>HB_BUFFER_CONTENT_TYPE_UNICODE</literal> instead, which
indicates that it contains un-shaped input
characters. After shaping, the buffer will have the
<literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal> content type.
</para>
<para>
<function>hb_buffer_add_utf8()</function> and the
other UTF functions set the content type of their buffer
automatically. But if you are reusing a buffer you may want to
check its state with
<function>hb_buffer_get_content_type(buffer)</function>. If
necessary you can set the content type with
</para>
<programlisting language="C">
hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
</programlisting>
<para>
to prepare for shaping.
</para>
<para>
Buffers also need to carry information about the script,
language, and text direction of their contents. You can set
these properties individually:
</para>
<programlisting language="C">
hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
hb_buffer_set_language(buf, hb_language_from_string("en", -1));
</programlisting>
<para>
However, since these properties are often the repeated for
multiple text runs, you can also save them in a
<literal>hb_segment_properties_t</literal> for reuse:
</para>
<programlisting language="C">
hb_segment_properties_t *savedprops;
hb_buffer_get_segment_properties (buf, savedprops);
...
hb_buffer_set_segment_properties (buf2, savedprops);
</programlisting>
<para>
HarfBuzz also provides getter functions to retrieve a buffer's
direction, script, and language properties individually.
</para>
<para>
HarfBuzz recognizes four text directions in
<type>hb_direction_t</type>: left-to-right
(<literal>HB_DIRECTION_LTR</literal>), right-to-left (<literal>HB_DIRECTION_RTL</literal>),
top-to-bottom (<literal>HB_DIRECTION_TTB</literal>), and
bottom-to-top (<literal>HB_DIRECTION_BTT</literal>). For the
script property, HarfBuzz uses identifiers based on the
<ulink
url="https://unicode.org/iso15924/">ISO 15924
standard</ulink>. For languages, HarfBuzz uses tags based on the
<ulink url="https://tools.ietf.org/html/bcp47">IETF BCP 47</ulink> standard.
</para>
<para>
Helper functions are provided to convert character strings into
the necessary script and language tag types.
</para>
<para>
Two additional buffer properties to be aware of are the
"invisible glyph" and the replacement code point. The
replacement code point is inserted into buffer output in place of
any invalid code points encountered in the input. By default, it
is the Unicode <literal>REPLACEMENT CHARACTER</literal> code
point, <literal>U+FFFD</literal> "&#xFFFD;". You can change this with
</para>
<programlisting language="C">
hb_buffer_set_replacement_codepoint(buf, replacement);
</programlisting>
<para>
passing in the replacement Unicode code point as the
<parameter>replacement</parameter> parameter.
</para>
<para>
The invisible glyph is used to replace all output glyphs that
are invisible. By default, the standard space character
<literal>U+0020</literal> is used; you can replace this (for
example, when using a font that provides script-specific
spaces) with
</para>
<programlisting language="C">
hb_buffer_set_invisible_glyph(buf, replacement_glyph);
</programlisting>
<para>
Do note that in the <parameter>replacement_glyph</parameter>
parameter, you must provide the glyph ID of the replacement you
wish to use, not the Unicode code point.
</para>
<para>
HarfBuzz supports a few additional flags you might want to set
on your buffer under certain circumstances. The
<literal>HB_BUFFER_FLAG_BOT</literal> and
<literal>HB_BUFFER_FLAG_EOT</literal> flags tell HarfBuzz
that the buffer represents the beginning or end (respectively)
of a text element (such as a paragraph or other block). Knowing
this allows HarfBuzz to apply certain contextual font features
when shaping, such as initial or final variants in connected
scripts.
</para>
<para>
<literal>HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES</literal>
tells HarfBuzz not to hide glyphs with the
<literal>Default_Ignorable</literal> property in Unicode. This
property designates control characters and other non-printing
code points, such as joiners and variation selectors. Normally
HarfBuzz replaces them in the output buffer with zero-width
space glyphs (using the "invisible glyph" property discussed
above); setting this flag causes them to be printed, which can
be helpful for troubleshooting.
</para>
<para>
Conversely, setting the
<literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> flag
tells HarfBuzz to remove <literal>Default_Ignorable</literal>
glyphs from the output buffer entirely. Finally, setting the
<literal>HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE</literal>
flag tells HarfBuzz not to insert the dotted-circle glyph
(<literal>U+25CC</literal>, "&#x25CC;"), which is normally
inserted into buffer output when broken character sequences are
encountered (such as combining marks that are not attached to a
base character).
</para> </para>
</section> </section>
<section id="customizing-unicode-functions"> <section id="customizing-unicode-functions">
<title>Customizing Unicode functions</title> <title>Customizing Unicode functions</title>
<para> <para>
HarfBuzz requires some simple functions for accessing
information from the Unicode Character Database (such as the
<literal>General_Category</literal> (gc) and
<literal>Script</literal> (sc) properties) that is useful
for shaping, as well as some useful operations like composing and
decomposing code points.
</para>
<para>
HarfBuzz includes its own internal, lightweight set of Unicode
functions. At build time, it is also possible to compile support
for some other options, such as the Unicode functions provided
by GLib or the International Components for Unicode (ICU)
library. Generally, this option is only of interest for client
programs that have specific integration requirements or that do
a significant amount of customization.
</para>
<para>
If your program has access to other Unicode functions, however,
such as through a system library or application framework, you
might prefer to use those instead of the built-in
options. HarfBuzz supports this by implementing its Unicode
functions as a set of virtual methods that you can replace —
without otherwise affecting HarfBuzz's functionality.
</para>
<para>
The Unicode functions are specified in a structure called
<literal>unicode_funcs</literal> which is attached to each
buffer. But even though <literal>unicode_funcs</literal> is
associated with a <type>hb_buffer_t</type>, the functions
themselves are called by other HarfBuzz APIs that access
buffers, so it would be unwise for you to hook different
functions into different buffers.
</para>
<para>
In addition, you can mark your <literal>unicode_funcs</literal>
as immutable by calling
<function>hb_unicode_funcs_make_immutable (ufuncs)</function>.
This is especially useful if your code is a
library or framework that will have its own client programs. By
marking your Unicode function choices as immutable, you prevent
your own client programs from changing the
<literal>unicode_funcs</literal> configuration and introducing
inconsistencies and errors downstream.
</para>
<para>
You can retrieve the Unicode-functions configuration for
your buffer by calling <function>hb_buffer_get_unicode_funcs()</function>:
</para>
<programlisting language="C">
hb_unicode_funcs_t *ufunctions;
ufunctions = hb_buffer_get_unicode_funcs(buf);
</programlisting>
<para>
The current version of <literal>unicode_funcs</literal> uses six functions:
</para>
<itemizedlist>
<listitem>
<para>
<function>hb_unicode_combining_class_func_t</function>:
returns the Canonical Combining Class of a code point.
</para>
</listitem>
<listitem>
<para>
<function>hb_unicode_general_category_func_t</function>:
returns the General Category (gc) of a code point.
</para>
</listitem>
<listitem>
<para>
<function>hb_unicode_mirroring_func_t</function>: returns
the Mirroring Glyph code point (for bi-directional
replacement) of a code point.
</para>
</listitem>
<listitem>
<para>
<function>hb_unicode_script_func_t</function>: returns the
Script (sc) property of a code point.
</para>
</listitem>
<listitem>
<para>
<function>hb_unicode_compose_func_t</function>: returns the
canonical composition of a sequence of two code points.
</para>
</listitem>
<listitem>
<para>
<function>hb_unicode_decompose_func_t</function>: returns
the canonical decomposition of a code point.
</para>
</listitem>
</itemizedlist>
<para>
Note, however, that future HarfBuzz releases may alter this set.
</para>
<para>
Each Unicode function has a corresponding setter, with which you
can assign a callback to your replacement function. For example,
to replace
<function>hb_unicode_general_category_func_t</function>, you can call
</para>
<programlisting language="C">
hb_unicode_funcs_set_general_category_func (*ufuncs, func, *user_data, destroy)
</programlisting>
<para>
Virtualizing this set of Unicode functions is primarily intended
to improve portability. There is no need for every client
program to make the effort to replace the default options, so if
you are unsure, do not feel any pressure to customize
<literal>unicode_funcs</literal>.
</para> </para>
</section> </section>
</chapter> </chapter>

View File

@ -5,20 +5,449 @@
<!ENTITY version SYSTEM "version.xml"> <!ENTITY version SYSTEM "version.xml">
]> ]>
<chapter id="fonts-and-faces"> <chapter id="fonts-and-faces">
<title>Fonts and faces</title> <title>Fonts, faces, and output</title>
<section id="using-freetype"> <para>
In the previous chapter, we saw how to set up a buffer and fill
it with text as Unicode code points. In order to shape this
buffer text with HarfBuzz, you will need also need a font
object.
</para>
<para>
HarfBuzz provides abstractions to help you cache and reuse the
heavier parts of working with binary fonts, so we will look at
how to do that. We will also look at how to work with the
FreeType font-rendering library and at how you can customize
HarfBuzz to work with other libraries.
</para>
<para>
Finally, we will look at how to work with OpenType variable
fonts, the latest update to the OpenType font format, and at
some other recent additions to OpenType.
</para>
<section id="fonts-and-faces-objects">
<title>Font and face objects</title>
<para>
The outcome of shaping a run of text depends on the contents of
a specific font file (such as the substitutions and positioning
moves in the 'GSUB' and 'GPOS' tables), so HarfBuzz makes
accessing those internals fast.
</para>
<para>
An <type>hb_face_t</type> represents a <emphasis>face</emphasis>
in HarfBuzz. This data type is a wrapper around an
<type>hb_blob_t</type> blob that holds the contents of a binary
fotn file. Since HarfBuzz supports TrueType Collections and
OpenType Collections (each of which can include multiple
typefaces), a HarfBuzz face also requires an index number
specifying which typeface in the file you want to use. Most of
the font files you will encounter in the wild include just a
single face, however, so most of the time you would pass in
<literal>0</literal> as the index when you create a face:
</para>
<programlisting language="C">
hb_blob_t* blob = hb_blob_create_from_file(file);
...
hb_face_t* face = hb_face_create(blob, 0);
</programlisting>
<para>
On its own, a face object is not quite ready to use for
shaping. The typeface must be set to a specific point size in
order for some details (such as hinting) to work. In addition,
if the font file in question is an OpenType Variable Font, then
you may need to specify one or variation-axis settings (or a
named instance) in order to get the output you need.
</para>
<para>
In HarfBuzz, you do this by creating a <emphasis>font</emphasis>
object from your face.
</para>
<para>
Font objects also have the advantage of being considerably
lighter-weight than face objects (remember that a face contains
the contents of a binary font file mapped into memory). As a
result, you can cache and reuse a font object, but you could
also create a new one for each additional size you needed.
Creating new fonts incurs some additional overhead, of course,
but whether or not it is excessive is your call in the end. In
contrast, face objects are substantially larger, and you really
should cache them and reuse them whenever possible.
</para>
<para>
You can create a font object from a face object:
</para>
<programlisting language="C">
hb_font_t* hb_font = hb_font_create(hb_face);
</programlisting>
<para>
After creating a font, there are a few properties you should
set. Many fonts enable and disable hints based on the size it
is used at, so setting this is important for font
objects. <function>hb_font_set_ppem(font, x_ppem,
y_ppem)</function> sets the pixels-per-EM value of the font. You
can also set the point size of the font with
<function>hb_font_set_ppem(font, ptem)</function>. HarfBuzz uses the
industry standard 72 points per inch.
</para>
<para>
HarfBuzz lets you specify the degree subpixel precision you want
through a scaling factor. You can set horizontal and
vertical scaling factors on the
font by calling <function>hb_font_set_scale(font, x_scale,
y_scale)</function>.
</para>
<para>
There may be times when you are handed a font object and need to
access the face object that it comes from. For that, you can call
</para>
<programlisting language="C">
hb_face = hb_font_get_face(hb_font);
</programlisting>
<para>
You can also create a font object from an existing font object
using the <function>hb_font_create_sub_font()</function>
function. This creates a child font object that is initiated
with the same attributes as its parent; it can be used to
quickly set up a new font for the purpose of overriding a specific
font-functions method.
</para>
<para>
All face objects and font objects are lifecycle-managed by
HarfBuzz. After creating a face, you increase its reference
count with <function>hb_face_reference(face)</function> and
decrease it with
<function>hb_face_destroy(face)</function>. Likewise, you
increase the reference count on a font with
<function>hb_font_reference(font)</function> and decrease it
with <function>hb_font_destroy(font)</function>.
</para>
<para>
You can also attach user data to face objects and font objects.
</para>
</section>
<section id="fonts-and-faces-custom-functions">
<title>Customizing font functions</title>
<para>
During shaping, HarfBuzz frequently needs to query font objects
to get at the contents and parameters of the glyphs in a font
file. It includes a built-in set of functions that is tailored
to working with OpenType fonts. However, as was the case with
Unicode functions in the buffers chapter, HarfBuzz also wants to
make it easy for you to assign a substitute set of font
functions if you are developing a program to work with a library
or platform that provides its own font functions.
</para>
<para>
Therefore, the HarfBuzz API defines a set of virtual
methods for accessing font-object properties, and you can
replace the defaults with your own selections without
interfering with the shaping process. Each font object in
HarfBuzz includes a structure called
<literal>font_funcs</literal> that serves as a vtable for the
font object. The virtual methods in
<literal>font_funcs</literal> are:
</para>
<itemizedlist>
<listitem>
<para>
<function>hb_font_get_font_h_extents_func_t</function>: returns
the extents of the font for horizontal text.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_font_v_extents_func_t</function>: returns
the extents of the font for vertical text.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_nominal_glyph_func_t</function>: returns
the font's nominal glyph for a given code point.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_variation_glyph_func_t</function>: returns
the font's glyph for a given code point when it is followed by a
given Variation Selector.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_nominal_glyphs_func_t</function>: returns
the font's nominal glyphs for a series of code points.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_advance_func_t</function>: returns
the advance for a glyph.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_h_advance_func_t</function>: returns
the advance for a glyph for horizontal text.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_v_advance_func_t</function>:returns
the advance for a glyph for vertical text.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_advances_func_t</function>: returns
the advances for a series of glyphs.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_h_advances_func_t</function>: returns
the advances for a series of glyphs for horizontal text .
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_v_advances_func_t</function>: returns
the advances for a series of glyphs for vertical text.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_origin_func_t</function>: returns
the origin coordinates of a glyph.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_h_origin_func_t</function>: returns
the origin coordinates of a glyph for horizontal text.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_v_origin_func_t</function>: returns
the origin coordinates of a glyph for vertical text.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_extents_func_t</function>: returns
the extents for a glyph.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_contour_point_func_t</function>:
returns the coordinates of a specific contour point from a glyph.
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_name_func_t</function>: returns the
name of a glyph (from its glyph index).
</para>
</listitem>
<listitem>
<para>
<function>hb_font_get_glyph_from_name_func_t</function>: returns
the glyph index that corresponds to a given glyph name.
</para>
</listitem>
</itemizedlist>
<para>
You can fetch the font-functions configuration for a font object
by calling <function>hb_font_get_font_funcs()</function>:
</para>
<programlisting language="C">
hb_font_funcs_t *ffunctions;
ffunctions = hb_font_get_font_funcs (font);
</programlisting>
<para>
The individual methods can each be replaced with their own setter
function, such as
<function>hb_font_funcs_set_nominal_glyph_func(*ffunctions,
func, *user_data, destroy)</function>.
</para>
<para>
Font-functions structures can be reused for multiple font
objects, and can be reference counted with
<function>hb_font_funcs_reference()</function> and
<function>hb_font_funcs_destroy()</function>. Just like other
objects in HarfBuzz, you can set user-data for each
font-functions structure and assign a destroy callback for
it.
</para>
<para>
You can also mark a font-functions structure as immutable,
with <function>hb_font_funcs_make_immutable()</function>. This
is especially useful if your code is a library or framework that
will have its own client programs. By marking your
font-functions structures as immutable, you prevent your client
programs from changing the configuration and introducing
inconsistencies and errors downstream.
</para>
</section>
<section id="fonts-and-faces-native-opentype">
<title>Font objects and HarfBuzz's native OpenType implementation</title>
<para>
By default, whenever HarfBuzz creates a font object, it will
configure the font to use a built-in set of font functions that
supports contemporary OpenType font internals. If you want to
work with OpenType or TrueType fonts, you should be able to use
these functions without difficulty.
</para>
<para>
Many of the methods in the font-functions structure deal with
the fundamental properties of glyphs that are required for
shaping text: extents (the maximums and minimums on each axis),
origins (the <literal>(0,0)</literal> coordinate point which
glyphs are drawn in reference to), and advances (the amount that
the cursor needs to be moved after drawing each glyph, including
any empty space for the glyph's side bearings).
</para>
<para>
As you can see in the list of functions, there are separate "horizontal"
and "vertical" variants depending on whether the text is set in
the horizontal or vertical direction. For some scripts, fonts
that are designed to support text set horizontally or vertically (for
example, in Japanese) may include metrics for both text
directions. When fonts don't include this information, HarfBuzz
does its best to transform what the font provides.
</para>
<para>
In addition to the direction-specific functions, HarfBuzz
provides some higher-level functions for fetching information
like extents and advances for a glyph. If you call
</para>
<programlisting language="C">
hb_font_get_glyph_advance_for_direction(font, direction, extents);
</programlisting>
<para>
then you can provide any <type>hb_direction_t</type> as the
<parameter>direction</parameter> parameter, and HarfBuzz will
use the correct function variant for the text direction. There
are similar higher-level versions of the functions for fetching
extents, origin coordinates, and contour-point
coordinates. There are also addition and subtraction functions
for moving points with respect to the origin.
</para>
<para>
There are also methods for fetching the glyph ID that
corresponds to a Unicode code point (possibly when followed by a
variation-selector code point), fetching the glyph name from the
font, and fetching the glyph ID that corresponds to a glyph name
you already have.
</para>
<para>
HarfBuzz also provides functions for converting between glyph
names and string
variables. <function>hb_font_glyph_to_string(font, glyph, s,
size)</function> retrieves the name for the glyph ID
<parameter>glyph</parameter> from the font object. It generates a
generic name of the form <literal>gidDDD</literal> (where DDD is
the glyph index) if there is no name for the glyph in the
font. The <function>hb_font_glyph_from_string(font, s, len,
glyph)</function> takes an input string <parameter>s</parameter>
and looks for a glyph with that name in the font, returning its
glyph ID in the <parameter>glyph</parameter>
output parameter. It automatically parses
<literal>gidDDD</literal> and <literal>uniUUUU</literal> strings.
</para>
</section>
<!-- Commenting out FreeType integration section-holder for now. May move
to the full-blown Integration Chapter. -->
<!-- <section id="fonts-and-faces-freetype">
<title>Using FreeType</title> <title>Using FreeType</title>
<para> <para>
</para> </para>
</section>
<section id="using-harfbuzzs-native-opentype-implementation">
<title>Using HarfBuzz's native OpenType implementation</title>
<para> <para>
</para> </para>
</section> </section> -->
<section id="using-your-own-font-functions">
<title>Using your own font functions</title> <section id="fonts-and-faces-variable">
<title>Working with OpenType Variable Fonts</title>
<para> <para>
If you are working with OpenType Variable Fonts, there are a few
additional functions you should use to specify the
variation-axis settings of your font object. Without doing so,
your variable font's font object can still be used, but only at
the default setting for every axis (which, of course, is
sometimes what you want, but does not cover general usage).
</para>
<para>
HarfBuzz manages variation settings in the
<type>hb_variation_t</type> data type, which holds a <property>tag</property> for the
variation-axis identifier tag and a <property>value</property> for its
setting. You can retrieve the list of variation axes in a font
binary from the face object (not from a font object, notably) by
calling <function>hb_ot_var_get_axis_count(face)</function> to
find the number of axes, then using
<function>hb_ot_var_get_axis_infos()</function> to collect the
axis structures:
</para>
<programlisting language="C">
axes = hb_ot_var_get_axis_count(face);
...
hb_ot_var_get_axis_infos(face, 0, axes, axes_array);
</programlisting>
<para>
For each axis returned in the array, you can can access the
identifier in its <property>tag</property>. HarfBuzz also has
tag definitions predefined for the five standard axes specified
in OpenType (<literal>ital</literal> for italic,
<literal>opsz</literal> for optical size,
<literal>slnt</literal> for slant, <literal>wdth</literal> for
width, and <literal>wght</literal> for weight). Each axis also
has a <property>min_value</property>, a
<property>default_value</property>, and a <property>max_value</property>.
</para>
<para>
To set your font object's variation settings, you call the
<function>hb_font_set_variations()</function> function with an
array of <type>hb_variation_t</type> variation settings. Let's
say our font has weight and width axes. We need to specify each
of the axes by tag and assign a value on the axis:
</para>
<programlisting language="C">
unsigned int variation_count = 2;
hb_variation_t variation_data[variation_count];
variation_data[0].tag = HB_OT_TAG_VAR_AXIS_WIDTH;
variation_data[1].tag = HB_OT_TAG_VAR_AXIS_WEIGHT;
variation_data[0].value = 80;
variation_data[1].value = 750;
...
hb_font_set_variations(font, variation_data, variation_count);
</programlisting>
<para>
That should give us a slightly condensed font ("normal" on the
<literal>wdth</literal> axis is 100) at a noticeably bolder
weight ("regular" is 400 on the <literal>wght</literal> axis).
</para>
<para>
In practice, though, you should always check that the value you
want to set on the axis is within the
[<property>min_value</property>,<property>max_value</property>]
range actually implemented in the font's variation axis. After
all, a font might only provide lighter-than-regular weights, and
setting a heavier value on the <literal>wght</literal> axis will
not change that.
</para>
<para>
Once your variation settings are specified on your font object,
however, shaping with a variable font is just like shaping a
static font.
</para> </para>
</section> </section>
</chapter> </chapter>

View File

@ -6,14 +6,299 @@
]> ]>
<chapter id="shaping-and-shape-plans"> <chapter id="shaping-and-shape-plans">
<title>Shaping and shape plans</title> <title>Shaping and shape plans</title>
<section id="opentype-features"> <para>
Once you have your face and font objects configured as desired and
your input buffer is filled with the characters you need to shape,
all you need to do is call <function>hb_shape()</function>.
</para>
<para>
HarfBuzz will return the shaped version of the text in the same
buffer that you provided, but it will be in output mode. At that
point, you can iterate through the glyphs in the buffer, drawing
each one at the specified position or handing them off to the
appropriate graphics library.
</para>
<para>
For the most part, HarfBuzz's shaping step is straightforward from
the outside. But that doesn't mean there will never be cases where
you want to look under the hood and see what is happening on the
inside. HarfBuzz provides facilities for doing that, too.
</para>
<section id="shaping-buffer-output">
<title>Shaping and buffer output</title>
<para>
The <function>hb_shape()</function> function call takes four arguments: the font
object to use, the buffer of characters to shape, an array of
user-specified features to apply, and the length of that feature
array. The feature array can be NULL, so for the sake of
simplicity we will start with that case.
</para>
<para>
Internally, HarfBuzz looks at the tables of the font file to
determine where glyph classes, substitutions, and positioning
are defined, using that information to decide which
<emphasis>shaper</emphasis> to use (<literal>ot</literal> for
OpenType fonts, <literal>aat</literal> for Apple Advanced
Typography fonts, and so on). It also looks at the direction,
script, and language properties of the segment to figure out
which script-specific shaping model is needed (at least, in
shapers that support multiple options).
</para>
<para>
If a font has a GDEF table, then that is used for
glyph classes; if not, HarfBuzz will fall back to Unicode
categorization by code point. If a font has an AAT "morx" table,
then it is used for substitutions; if not, but there is a GSUB
table, then the GSUB table is used. If the font has an AAT
"kerx" table, then it is used for positioning; if not, but
there is a GPOS table, then the GPOS table is used. If neither
table is found, but there is a "kern" table, then HarfBuzz will
use the "kern" table. If there is no "kerx", no GPOS, and no
"kern", HarfBuzz will fall back to positioning marks itself.
</para>
<para>
With a well-behaved OpenType font, you expect GDEF, GSUB, and
GPOS tables to all be applied. HarfBuzz implements the
script-specific shaping models in internal functions, rather
than in the public API.
</para>
<para>
The algorithms
used for complex scripts can be quite involved; HarfBuzz tries
to be compatible with the OpenType Layout specification
and, wherever there is any ambiguity, HarfBuzz attempts to replicate the
output of Microsoft's Uniscribe engine. See the <ulink
url="https://docs.microsoft.com/en-us/typography/script-development/standard">Microsoft
Typography pages</ulink> for more detail.
</para>
<para>
In general, though, all that you need to know is that
<function>hb_shape()</function> returns the results of shaping
in the same buffer that you provided. The buffer's content type
will now be set to
<literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal>, indicating
that it contains shaped output, rather than input text. You can
now extract the glyph information and positioning arrays:
</para>
<programlisting language="C">
hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
</programlisting>
<para>
The glyph information array holds a <type>hb_glyph_info_t</type>
for each output glyph, which has two fields:
<parameter>codepoint</parameter> and
<parameter>cluster</parameter>. Whereas, in the input buffer,
the <parameter>codepoint</parameter> field contained the Unicode
code point, it now contains the glyph ID of the corresponding
glyph in the font. The <parameter>cluster</parameter> field is
an integer that you can use to help identify when shaping has
reordered, split, or combined code points; we will say more
about that in the next chapter.
</para>
<para>
The glyph positions array holds a corresponding
<type>hb_glyph_position_t</type> for each output glyph,
containing four fields: <parameter>x_advance</parameter>,
<parameter>y_advance</parameter>,
<parameter>x_offset</parameter>, and
<parameter>y_offset</parameter>. The advances tell you how far
you need to move the drawing point after drawing this glyph,
depending on whether you are setting horizontal text (in which
case you will have x advances) or vertical text (for which you
will have y advances). The x and y offsets tell you where to
move to start drawing the glyph; usually you will have both and
x and a y offset, regardless of the text direction.
</para>
<para>
Most of the time, you will rely on a font-rendering library or
other graphics library to do the actual drawing of glyphs, so
you will need to iterate through the glyphs in the buffer and
pass the corresponding values off.
</para>
</section>
<section id="shaping-opentype-features">
<title>OpenType features</title> <title>OpenType features</title>
<para> <para>
OpenType features enable fonts to include smart behavior,
implemented as "lookup" rules stored in the GSUB and GPOS
tables. The OpenType specification defines a long list of
standard features that fonts can use for these behaviors; each
feature has a four-character reserved name and a well-defined
semantic meaning.
</para>
<para>
Some OpenType features are defined for the purpose of supporting
complex-script shaping, and are automatically activated, but
only when a buffer's script property is set to a script that the
feature supports.
</para>
<para>
Other features are more generic and can apply to several (or
any) script, and shaping engines are expected to implement
them. By default, HarfBuzz activates several of these features
on every text run. They include <literal>ccmp</literal>,
<literal>locl</literal>, <literal>mark</literal>,
<literal>mkmk</literal>, and <literal>rlig</literal>.
</para>
<para>
In addition, if the text direction is horizontal, HarfBuzz
also applies the <literal>calt</literal>,
<literal>clig</literal>, <literal>curs</literal>,
<literal>kern</literal>, <literal>liga</literal>,
<literal>rclt</literal>, and <literal>frac</literal> features.
</para>
<para>
If the text direction is vertical, HarfBuzz applies
the <literal>vert</literal> feature by default.
</para>
<para>
Still other features are designed to be purely optional and left
up to the application or the end user to enable or disable as desired.
</para>
<para>
You can adjust the set of features that HarfBuzz applies to a
buffer by supplying an array of <type>hb_feature_t</type>
features as the third argument to
<function>hb_shape()</function>. For a simple case, let's just
enable the <literal>dlig</literal> feature, which turns on any
"discretionary" ligatures in the font:
</para>
<programlisting language="C">
hb_feature_t userfeatures[1];
userfeatures[0].tag = HB_TAG('d','l','i','g');
userfeatures[0].value = 1;
userfeatures[0].start = HB_FEATURE_GLOBAL_START;
userfeatures[0].end = HB_FEATURE_GLOBAL_END;
</programlisting>
<para>
<literal>HB_FEATURE_GLOBAL_END</literal> and
<literal>HB_FEATURE_GLOBAL_END</literal> are macros we can use
to indicate that the features will be applied to the entire
buffer. We could also have used a literal <literal>0</literal>
for the start and a <literal>-1</literal> to indicate the end of
the buffer (or have selected other start and end positions, if needed).
</para>
<para>
When we pass the <varname>userfeatures</varname> array to
<function>hb_shape()</function>, any discretionary ligature
substitutions from our font that match the text in our buffer
will get performed:
</para>
<programlisting language="C">
hb_shape(font, buf, userfeatures, num_features);
</programlisting>
<para>
Just like we enabled the <literal>dlig</literal> feature by
setting its <parameter>value</parameter> to
<literal>1</literal>, you would disable a feature by setting its
<parameter>value</parameter> to <literal>0</literal>. Some
features can take other <parameter>value</parameter> settings;
be sure you read the full specification of each feature tag to
understand what it does and how to control it.
</para> </para>
</section> </section>
<section id="plans-and-caching">
<section id="shaping-shaper-selection">
<title>Shaper selection</title>
<para>
The basic version of <function>hb_shape()</function> determines
its shaping strategy based on examining the capabilities of the
font file. OpenType font tables cause HarfBuzz to try the
<literal>ot</literal> shaper, while AAT font tables cause HarfBuzz to try the
<literal>aat</literal> shaper.
</para>
<para>
In the real world, however, a font might include some unusual
mix of tables, or one of the tables might simply be broken for
the script you need to shape. So, sometimes, you might not
want to rely on HarfBuzz's process for deciding what to do, and
just tell <function>hb_shape()</function> what you want it to try.
</para>
<para>
<function>hb_shape_full()</function> is an alternate shaping
function that lets you supply a list of shapers for HarfBuzz to
try, in order, when shaping your buffer. For example, if you
have determined that HarfBuzz's attempts to work around broken
tables gives you better results than the AAT shaper itself does,
you might move the AAT shaper to the end of your list of
preferences and call <function>hb_shape_full()</function>
</para>
<programlisting language="C">
char *shaperprefs[3] = {"ot", "default", "aat"};
...
hb_shape_full(font, buf, userfeatures, num_features, shaperprefs);
</programlisting>
<para>
to get results you are happier with.
</para>
<para>
You may also want to call
<function>hb_shape_list_shapers()</function> to get a list of
the shapers that were built at compile time in your copy of HarfBuzz.
</para>
</section>
<section id="shaping-plans-and-caching">
<title>Plans and caching</title> <title>Plans and caching</title>
<para> <para>
Internally, HarfBuzz uses a structure called a shape plan to
track its decisions about how to shape the contents of a
buffer. The <function>hb_shape()</function> function builds up the shape plan by
examining segment properties and by inspecting the contents of
the font.
</para>
<para>
This process can involve some decision-making and
trade-offs — for example, HarfBuzz inspects the GSUB and GPOS
lookups for the script and language tags set on the segment
properties, but it falls back on the lookups under the
<literal>DFLT</literal> tag (and sometimes other common tags)
if there are actually no lookups for the tag requested.
</para>
<para>
HarfBuzz also includes some work-arounds for
handling well-known older font conventions that do not follow
OpenType or Unicode specifications, for buggy system fonts, and for
peculiarities of Microsoft Uniscribe. All of that means that a
shape plan, while not something that you should edit directly in
client code, still might be an object that you want to
inspect. Furthermore, if resources are tight, you might want to
cache the shape plan that HarfBuzz builds for your buffer and
font, so that you do not have to rebuild it for every shaping call.
</para>
<para>
You can create a cacheable shape plan with
<function>hb_shape_plan_create_cached(face, props,
user_features, num_user_features, shaper_list)</function>, where
<parameter>face</parameter> is a face object (not a font object,
notably), <parameter>props</parameter> is an
<type>hb_segment_properties_t</type>,
<parameter>user_features</parameter> is an array of
<type>hb_feature_t</type>s (with length
<parameter>num_user_features</parameter>), and
<parameter>shaper_list</parameter> is a list of shapers to try.
</para>
<para>
Shape plans are objects in HarfBuzz, so there are
reference-counting functions and user-data attachment functions
you can
use. <function>hb_shape_plan_reference(shape_plan)</function>
increases the reference count on a shape plan, while
<function>hb_shape_plan_destroy(shape_plan)</function> decreases
the reference count, destroying the shape plan when the last
reference is dropped.
</para>
<para>
You can attach user data to a shaper (with a key) using the
<function>hb_shape_plan_set_user_data(shape_plan,key,data,destroy,replace)</function>
function, optionally supplying a <function>destroy</function>
callback to use. You can then fetch the user data attached to a
shape plan with
<function>hb_shape_plan_get_user_data(shape_plan, key)</function>.
</para> </para>
</section> </section>
</chapter> </chapter>

View File

@ -0,0 +1,244 @@
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
]>
<chapter id="utilities">
<title>Utilities</title>
<para>
HarfBuzz includes several auxiliary components in addition to the
main APIs. These include a set of command-line tools, a set of
lower-level APIs for common data types that may be of interest to
client programs, and an embedded library for working with
Unicode Character Database (UCD) data.
</para>
<section id="utilities-command-line-tools">
<title>Command-line tools</title>
<para>
HarfBuzz include three command-line tools:
<program>hb-shape</program>, <program>hb-view</program>, and
<program>hb-subset</program>. They can be used to examine
HarfBuzz's functionality, debug font binaries, or explore the
various shaping models and features from a terminal.
</para>
<section id="utilities-command-line-hbshape">
<title>hb-shape</title>
<para>
<emphasis><program>hb-shape</program></emphasis> allows you to run HarfBuzz's
<function>hb_shape()</function> function on an input string and
to examine the outcome, in human-readable form, as terminal
output. <program>hb-shape</program> does
<emphasis>not</emphasis> render the results of the shaping call
into rendered text (you can use <program>hb-view</program>, below, for
that). Instead, it prints out the final glyph indices and
positions, taking all shaping operations into account, as if the
input string were a HarfBuzz input buffer.
</para>
<para>
You can specify the font to be used for shaping and, with
command-line options, you can add various aspects of the
internal state to the output that is sent to the terminal. The
general format is
</para>
<programlisting>
<command>hb-shape</command> <optional>[OPTIONS]</optional>
<parameter>path/to/font/file.ttf</parameter>
<parameter>yourinputtext</parameter>
</programlisting>
<para>
The default output format is plain text (although JSON output
can be selected instead by specifying the option
<optional>--output-format=json</optional>). The default output
syntax reports each glyph name (or glyph index if there is no
name) followed by its cluster value, its horizontal and vertical
position displacement, and its horizontal and vertical advances.
</para>
<para>
Output options exist to skip any of these elements in the
output, and to include additional data, such as Unicode
code-point values, glyph extents, glyph flags, or interim
shaping results.
</para>
<para>
Output can also be redirected to a file, or input read from a
file. Additional options enable you to enable or disable
specific font features, to set variation-font axis values, to
alter the language, script, direction, and clustering settings
used, to enable sanity checks, or to change which shaping engine is used.
</para>
<para>
For a complete explanation of the options available, run
</para>
<programlisting>
<command>hb-shape</command> <parameter>--help</parameter>
</programlisting>
</section>
<section id="utilities-command-line-hbview">
<title>hb-view</title>
<para>
<emphasis><program>hb-view</program></emphasis> allows you to
see the shaped output of an input string in rendered
form. Like <program>hb-shape</program>,
<program>hb-view</program> takes a font file and a text string
as its arguments:
</para>
<programlisting>
<command>hb-view</command> <optional>[OPTIONS]</optional>
<parameter>path/to/font/file.ttf</parameter>
<parameter>yourinputtext</parameter>
</programlisting>
<para>
By default, <program>hb-view</program> renders the shaped
text in ASCII block-character images as terminal output. By
appending the
<command>--output-file=<optional>filename</optional></command>
switch, you can write the output to a PNG, SVG, or PDF file
(among other formats).
</para>
<para>
As with <program>hb-shape</program>, a lengthy set of options
is available, with which you can enable or disable
specific font features, set variation-font axis values,
alter the language, script, direction, and clustering settings
used, enable sanity checks, or change which shaping engine is
used.
</para>
<para>
You can also set the foreground and background colors used for
the output, independently control the width of all four
margins, alter the line spacing, and annotate the output image
with
</para>
<para>
In general, <program>hb-view</program> is a quick way to
verify that the output of HarfBuzz's shaping operation looks
correct for a given text-and-font combination, but you may
want to use <program>hb-shape</program> to figure out exactly
why something does not appear as expected.
</para>
</section>
<section id="utilities-command-line-hbsubset">
<title>hb-subset</title>
<para>
<emphasis><program>hb-subset</program></emphasis> allows you
to generate a subset of a given font, with a limited set of
supported characters, features, and variation settings.
</para>
<para>
By default, you provide an input font and an input text string
as the arguments to <program>hb-subset</program>, and it will
generate a font that covers the input text exactly like the
input font does, but includes no other characters or features.
</para>
<programlisting>
<command>hb-subset</command> <optional>[OPTIONS]</optional>
<parameter>path/to/font/file.ttf</parameter>
<parameter>yourinputtext</parameter>
</programlisting>
<para>
For example, to create a subset of Noto Serif that just includes the
numerals and the lowercase Latin alphabet, you could run
</para>
<programlisting>
<command>hb-subset</command> <optional>[OPTIONS]</optional>
<parameter>NotoSerif-Regular.ttf</parameter>
<parameter>0123456789abcdefghijklmnopqrstuvwxyz</parameter>
</programlisting>
<para>
There are options available to remove hinting from the
subsetted font and to specify a list of variation-axis settings.
</para>
</section>
</section>
<section id="utilities-common-types-apis">
<title>Common data types and APIs</title>
<para>
HarfBuzz includes several APIs for working with general-purpose
data that you may find convenient to leverage in your own
software. They include set operations and integer-to-integer
mapping operations.
</para>
<para>
HarfBuzz uses set operations for internal bookkeeping, such as
when it collects all of the glyph IDs covered by a particular
font feature. You can also use the set API to build sets, add
and remove elements, test whether or not sets contain particular
elements, or compute the unions, intersections, or differences
between sets.
</para>
<para>
All set elements are integers (specifically,
<type>hb_codepoint_t</type> 32-bit unsigned ints), and there are
functions for fetching the minimum and maximum element from a
set. The set API also includes some functions that might not
be part of a generic set facility, such as the ability to add a
contiguous range of integer elements to a set in bulk, and the
ability to fetch the next-smallest or next-largest element.
</para>
<para>
The HarfBuzz set API includes some conveniences as well. All
sets are lifecycle-managed, just like other HarfBuzz
objects. You increase the reference count on a set with
<function>hb_set_reference()</function> and decrease it with
<function>hb_set_destroy()</function>. You can also attach
user data to a set, just like you can to blobs, buffers, faces,
fonts, and other objects, and set destroy callbacks.
</para>
<para>
HarfBuzz also provides an API for keeping track of
integer-to-integer mappings. As with the set API, each integer is
stored as an unsigned 32-bit <type>hb_codepoint_t</type>
element. Maps, like other objects, are reference counted with
reference and destroy functions, and you can attach user data to
them. The mapping operations include adding and deleting
integer-to-integer key:value pairs to the map, testing for the
presence of a key, fetching the population of the map, and so on.
</para>
<para>
There are several other internal HarfBuzz facilities that are
exposed publicly and which you may want to take advantage of
while processing text. HarfBuzz uses a common
<type>hb_tag_t</type> for a variety of OpenType tag identifiers (for
scripts, languages, font features, table names, variation-axis
names, and more), and provides functions for converting strings
to tags and vice-versa.
</para>
<para>
Finally, HarfBuzz also includes data type for Booleans, bit
masks, and other simple types.
</para>
</section>
<section id="utilities-ucdn">
<title>UCDN</title>
<para>
HarfBuzz includes a copy of the <ulink
url="https://github.com/grigorig/ucdn">UCDN</ulink> (Unicode
Database and Normalization) library, which provides functions
for accessing basic Unicode character properties, performing
canonical composition, and performing both canonical and
compatibility decomposition.
</para>
<para>
Currently, UCDN supports direct queries for several more character
properties than HarfBuzz's built-in set of Unicode functions
does, such as the BiDirectional Class, East Asian Width, Paired
Bracket and Resolved Linebreak properties. If you need to access
more properties than HarfBuzz's internal implementation
provides, using the built-in UCDN functions may be a useful solution.
</para>
<para>
The built-in UCDN functions are compiled by default when
building HarfBuzz from source, but this can be disabled with a
compile-time switch.
</para>
</section>
</chapter>