diff --git a/docs/Makefile.am b/docs/Makefile.am index d8da37073..878a4723c 100644 --- a/docs/Makefile.am +++ b/docs/Makefile.am @@ -79,9 +79,9 @@ content_files= \ usermanual-object-model.xml \ usermanual-buffers-language-script-and-direction.xml \ usermanual-fonts-and-faces.xml \ - usermanual-clusters.xml \ usermanual-opentype-features.xml \ - usermanual-glyph-information.xml \ + usermanual-clusters.xml \ + usermanual-utilities.xml \ version.xml # SGML files where gtk-doc abbrevations (#GtkWidget) are expanded diff --git a/docs/harfbuzz-docs.xml b/docs/harfbuzz-docs.xml index a0400daeb..2e4b35cbb 100644 --- a/docs/harfbuzz-docs.xml +++ b/docs/harfbuzz-docs.xml @@ -36,9 +36,9 @@ - - + + diff --git a/docs/usermanual-buffers-language-script-and-direction.xml b/docs/usermanual-buffers-language-script-and-direction.xml index 1c6b5dab1..2865426f2 100644 --- a/docs/usermanual-buffers-language-script-and-direction.xml +++ b/docs/usermanual-buffers-language-script-and-direction.xml @@ -7,30 +7,38 @@ Buffers, language, script and direction - The input to HarfBuzz is a series of Unicode characters, stored in a + The input to the HarfBuzz shaper is a series of Unicode characters, stored in a buffer. In this chapter, we'll look at how to set up a buffer with - the text that we want and then customize the properties of the - buffer. + the text that we want and how to customize the properties of the + buffer. We'll also look at a piece of lower-level machinery that + you will need to understand before proceeding: the functions that + HarfBuzz uses to retrieve Unicode information. + + + After shaping is complete, HarfBuzz puts its output back + into the buffer. But getting that output requires setting up a + face and a font first, so we will look at that in the next chapter + instead of here.
Creating and destroying buffers As we saw in our Getting Started example, a buffer is created and - initialized with hb_buffer_create(). This + initialized with hb_buffer_create(). This produces a new, empty buffer object, instantiated with some default values and ready to accept your Unicode strings. HarfBuzz manages the memory of objects (such as buffers) that it creates, so you don't have to. When you have finished working on - a buffer, you can call hb_buffer_destroy(): + a buffer, you can call hb_buffer_destroy(): - hb_buffer_t *buffer = hb_buffer_create(); - ... - hb_buffer_destroy(buffer); - + hb_buffer_t *buf = hb_buffer_create(); + ... + hb_buffer_destroy(buf); + This will destroy the object and free its associated memory - unless some other part of the program holds a reference to this @@ -39,46 +47,364 @@ else destroying it, you should increase its reference count: -void somefunc(hb_buffer_t *buffer) { - buffer = hb_buffer_reference(buffer); - ... - + void somefunc(hb_buffer_t *buf) { + buf = hb_buffer_reference(buf); + ... + And then decrease it once you're done with it: - hb_buffer_destroy(buffer); -} - + hb_buffer_destroy(buf); + } + + + While we are on the subject of reference-counting buffers, it is + worth noting that an individual buffer can only meaningfully be + used by one thread at a time. + To throw away all the data in your buffer and start from scratch, - call hb_buffer_reset(buffer). If you want to + call hb_buffer_reset(buf). If you want to throw away the string in the buffer but keep the options, you can - instead call hb_buffer_clear_contents(buffer). + instead call hb_buffer_clear_contents(buf).
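HarfBuzz does not expose its internal counter, so the create/reference/destroy rules can be hard to picture. The following sketch models them with a hypothetical `toy_object_t`; none of these names are HarfBuzz API. Each "create" or "reference" call adds one owner, and each "destroy" call removes one. Memory is freed only when the last owner lets go. Unlike HarfBuzz's real counters, this toy counter is not atomic, so it should not be shared across threads.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted HarfBuzz object (not hb_buffer_t). */
typedef struct {
    int ref_count; /* number of owners currently holding the object */
} toy_object_t;

static int toy_live_objects = 0; /* objects allocated but not yet freed */

/* Like hb_buffer_create(): the caller receives the first reference. */
toy_object_t *toy_create(void) {
    toy_object_t *obj = malloc(sizeof *obj);
    if (!obj) return NULL;
    obj->ref_count = 1;
    toy_live_objects++;
    return obj;
}

/* Like hb_buffer_reference(): add an owner and hand back the same pointer. */
toy_object_t *toy_reference(toy_object_t *obj) {
    if (obj) obj->ref_count++;
    return obj;
}

/* Like hb_buffer_destroy(): drop one owner; free only when none remain. */
void toy_destroy(toy_object_t *obj) {
    if (!obj) return;
    if (--obj->ref_count == 0) {
        free(obj);
        toy_live_objects--;
    }
}
```

In this model, `somefunc()` from the example above corresponds to one `toy_reference()` at entry and one `toy_destroy()` at exit. The caller's own reference stays valid throughout.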
+
Adding text to the buffer Now we have a brand new HarfBuzz buffer. Let's start filling it with text! From HarfBuzz's perspective, a buffer is just a stream - of Unicode codepoints, but your input string is probably in one of - the standard Unicode character encodings (UTF-8, UTF-16, UTF-32) + of Unicode code points, but your input string is probably in one of + the standard Unicode character encodings (UTF-8, UTF-16, or + UTF-32). HarfBuzz provides convenience functions that accept + each of these encodings: + hb_buffer_add_utf8(), + hb_buffer_add_utf16(), and + hb_buffer_add_utf32(). Other than the + character encoding they accept, they function identically. + + You can add UTF-8 text to a buffer by passing in the text array, + the array's length, an offset into the array for the first + character to add, and the length of the segment to add: + + + hb_buffer_add_utf8 (hb_buffer_t *buf, + const char *text, + int text_length, + unsigned int item_offset, + int item_length) + + + So, in practice, you can say: + + + hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text)); + + + This will append your new characters to + buf, not replace its existing + contents. Also, note that you can use -1 in + place of the first instance of strlen(text) + if your text array is NULL-terminated. Similarly, you can also use + -1 as the final argument want to add its full + contents. + + + Whatever start item_offset and + item_length you provide, HarfBuzz will also + attempt to grab the five characters before + the offset point and the five characters + after the designated end. These are the + before and after "context" segments, which are used internally + for HarfBuzz to make shaping decisions. They will not be part of + the final output, but they ensure that HarfBuzz's + script-specific shaping operations are correct. If there are + fewer than five characters available for the before or after + contexts, HarfBuzz will just grab what is there. 
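The sketch below computes which characters end up in the "before" context, the shaped item, and the "after" context. It works over a plain array of code points and uses the five-character limit described above. HarfBuzz does this internally and does not expose it. The struct and the `compute_context()` helper are hypothetical names for illustration only. Out-of-range requests are clamped here; the real functions' handling of `-1` lengths is not modeled.

```c
#include <stddef.h>

#define CONTEXT_CHARS 5 /* context size taken from the description above */

/* Half-open index ranges into the full text (hypothetical layout). */
typedef struct {
    size_t pre_start;  /* first character of the "before" context */
    size_t item_start; /* first character that will be shaped */
    size_t item_end;   /* one past the last character that will be shaped */
    size_t post_end;   /* one past the last character of the "after" context */
} context_window_t;

context_window_t compute_context(size_t text_length, size_t item_offset,
                                 size_t item_length) {
    context_window_t w;
    /* Keep the requested item inside the text. */
    if (item_offset > text_length) item_offset = text_length;
    if (item_length > text_length - item_offset)
        item_length = text_length - item_offset;
    w.item_start = item_offset;
    w.item_end = item_offset + item_length;
    /* Up to CONTEXT_CHARS before the item, or whatever is available. */
    w.pre_start = item_offset > CONTEXT_CHARS ? item_offset - CONTEXT_CHARS : 0;
    /* Likewise after the item. */
    w.post_end = (text_length - w.item_end > CONTEXT_CHARS)
                     ? w.item_end + CONTEXT_CHARS
                     : text_length;
    return w;
}
```

For a 20-character text with `item_offset` 8 and `item_length` 4, characters 3 to 7 form the before-context, 8 to 11 are shaped, and 12 to 16 form the after-context.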
+ + + For longer text runs, such as full paragraphs, it might be + tempting to only add smaller sub-segments to a buffer and + shape them in piecemeal fashion. Generally, this is not a good + idea, however, because a lot of shaping decisions are + dependent on this context information. For example, in Arabic + and other connected scripts, HarfBuzz needs to know the code + points before and after each character in order to correctly + determine which glyph to return. + + + The safest approach is to add all of the text available, then + use item_offset and + item_length to indicate which characters you + want shaped, so that HarfBuzz has access to any context. + + + You can also add Unicode code points directly with + hb_buffer_add_codepoints(). The arguments + to this function are the same as those for the UTF + encodings. But it is particularly important to note that + HarfBuzz does not do validity checking on the text that is added + to a buffer. Invalid code points will be replaced, but it is up + to you to do any deep-sanity checking necessary. + +
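HarfBuzz leaves deeper validation to you, so a program that builds code point arrays itself might run a pre-pass before calling `hb_buffer_add_codepoints()`. This is a minimal sketch under that assumption. It only rejects values that are not Unicode scalar values (surrogates, and anything above U+10FFFF) and substitutes a caller-chosen replacement such as U+FFFD. `sanitize_codepoints()` is a hypothetical helper, not part of HarfBuzz. Stricter policies, such as rejecting noncharacters, would need more checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if cp is a Unicode scalar value: in range and not a surrogate. */
int is_scalar_value(uint32_t cp) {
    return cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu);
}

/* Replace every non-scalar value in place and report how many were replaced. */
size_t sanitize_codepoints(uint32_t *cps, size_t count, uint32_t replacement) {
    size_t replaced = 0;
    for (size_t i = 0; i < count; i++) {
        if (!is_scalar_value(cps[i])) {
            cps[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}
```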
+
Setting buffer properties + Buffers containing input characters still need several + properties set before HarfBuzz can shape their text correctly. -
-
- What about the other scripts? + Initially, all buffers are set to the + HB_BUFFER_CONTENT_TYPE_INVALID content + type. After adding text, the buffer should be set to + HB_BUFFER_CONTENT_TYPE_UNICODE instead, which + indicates that it contains unshaped input + characters. After shaping, the buffer will have the + HB_BUFFER_CONTENT_TYPE_GLYPHS content type. + + + hb_buffer_add_utf8() and the + other UTF functions set the content type of their buffer + automatically. But if you are reusing a buffer, you may want to + check its state with + hb_buffer_get_content_type(buf). If + necessary, you can set the content type with + + + hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE); + + + to prepare for shaping. + + + Buffers also need to carry information about the script, + language, and text direction of their contents. You can set + these properties individually: + + + hb_buffer_set_direction(buf, HB_DIRECTION_LTR); + hb_buffer_set_script(buf, HB_SCRIPT_LATIN); + hb_buffer_set_language(buf, hb_language_from_string("en", -1)); + + + However, since these properties are often repeated across + multiple text runs, you can also save them in an + hb_segment_properties_t for reuse: + + + hb_segment_properties_t savedprops; + hb_buffer_get_segment_properties (buf, &savedprops); + ... + hb_buffer_set_segment_properties (buf2, &savedprops); + + + HarfBuzz also provides getter functions to retrieve a buffer's + direction, script, and language properties individually. + + + HarfBuzz recognizes four text directions in + hb_direction_t: left-to-right + (HB_DIRECTION_LTR), right-to-left (HB_DIRECTION_RTL), + top-to-bottom (HB_DIRECTION_TTB), and + bottom-to-top (HB_DIRECTION_BTT). For the + script property, HarfBuzz uses identifiers based on the + ISO 15924 + standard. For languages, HarfBuzz uses tags based on the + IETF BCP 47 standard. + + + Helper functions, such as hb_script_from_string() and + hb_language_from_string(), are provided to convert character strings into + the necessary script and language tag types.
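The four directions fall along two axes, horizontal and vertical, and each axis has a "forward" and a "backward" direction. HarfBuzz offers macros for these tests, such as `HB_DIRECTION_IS_HORIZONTAL()`, `HB_DIRECTION_IS_VERTICAL()`, `HB_DIRECTION_IS_BACKWARD()`, and `HB_DIRECTION_REVERSE()`. The sketch below spells out the same classification with a stand-in enum. `toy_direction_t` and its values are hypothetical and do not match the numeric values of `hb_direction_t`.

```c
/* Stand-in enum; the real hb_direction_t uses different numeric values. */
typedef enum { DIR_INVALID, DIR_LTR, DIR_RTL, DIR_TTB, DIR_BTT } toy_direction_t;

int dir_is_horizontal(toy_direction_t d) { return d == DIR_LTR || d == DIR_RTL; }
int dir_is_vertical(toy_direction_t d) { return d == DIR_TTB || d == DIR_BTT; }

/* "Backward" directions run against the default order of their axis. */
int dir_is_backward(toy_direction_t d) { return d == DIR_RTL || d == DIR_BTT; }

/* Flip a direction along its own axis; invalid stays invalid. */
toy_direction_t dir_reverse(toy_direction_t d) {
    switch (d) {
    case DIR_LTR: return DIR_RTL;
    case DIR_RTL: return DIR_LTR;
    case DIR_TTB: return DIR_BTT;
    case DIR_BTT: return DIR_TTB;
    default:      return DIR_INVALID;
    }
}
```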
+ + + Two additional buffer properties to be aware of are the + "invisible glyph" and the replacement code point. The + replacement code point is inserted into buffer output in place of + any invalid code points encountered in the input. By default, it + is the Unicode REPLACEMENT CHARACTER code + point, U+FFFD "�". You can change this with + + + hb_buffer_set_replacement_codepoint(buf, replacement); + + + passing in the replacement Unicode code point as the + replacement parameter. + + + The invisible glyph is used to replace all output glyphs that + are invisible. By default, the standard space character + U+0020 is used; you can replace this (for + example, when using a font that provides script-specific + spaces) with + + + hb_buffer_set_invisible_glyph(buf, replacement_glyph); + + + Do note that in the replacement_glyph + parameter, you must provide the glyph ID of the replacement you + wish to use, not the Unicode code point. + + + HarfBuzz supports a few additional flags you might want to set + on your buffer under certain circumstances. The + HB_BUFFER_FLAG_BOT and + HB_BUFFER_FLAG_EOT flags tell HarfBuzz + that the buffer represents the beginning or end (respectively) + of a text element (such as a paragraph or other block). Knowing + this allows HarfBuzz to apply certain contextual font features + when shaping, such as initial or final variants in connected + scripts. + + + HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES + tells HarfBuzz not to hide glyphs with the + Default_Ignorable property in Unicode. This + property designates control characters and other non-printing + code points, such as joiners and variation selectors. Normally + HarfBuzz replaces them in the output buffer with zero-width + space glyphs (using the "invisible glyph" property discussed + above); setting this flag causes them to be printed, which can + be helpful for troubleshooting. 
+ + + Conversely, setting the + HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES flag + tells HarfBuzz to remove Default_Ignorable + glyphs from the output buffer entirely. Finally, setting the + HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE + flag tells HarfBuzz not to insert the dotted-circle glyph + (U+25CC, "◌"), which is normally + inserted into buffer output when broken character sequences are + encountered (such as combining marks that are not attached to a + base character).
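Sometimes a paragraph has to be split into several buffers, for example where the script or font changes. Only the first piece then starts the text element, and only the last piece ends it. The helper below is a hypothetical sketch of choosing the flags per piece. The `TOY_FLAG_*` values are stand-ins; real code would pass the `HB_BUFFER_FLAG_*` constants to `hb_buffer_set_flags()`.

```c
/* Stand-in bit values; use the real HB_BUFFER_FLAG_* constants in practice. */
enum {
    TOY_FLAG_BOT = 1u << 0, /* buffer begins the text element */
    TOY_FLAG_EOT = 1u << 1  /* buffer ends the text element */
};

/* Flags for piece `index` out of `count` pieces that make up one paragraph. */
unsigned segment_flags(unsigned index, unsigned count) {
    unsigned flags = 0;
    if (index == 0) flags |= TOY_FLAG_BOT;
    if (index + 1 == count) flags |= TOY_FLAG_EOT;
    return flags;
}
```

A paragraph shaped as a single buffer gets both flags. The middle pieces of a longer paragraph get neither.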
+
Customizing Unicode functions + HarfBuzz requires some simple functions for accessing + information from the Unicode Character Database (such as the + General_Category (gc) and + Script (sc) properties) that is useful + for shaping, as well as some useful operations like composing and + decomposing code points. + + + HarfBuzz includes its own internal, lightweight set of Unicode + functions. At build time, it is also possible to compile support + for some other options, such as the Unicode functions provided + by GLib or the International Components for Unicode (ICU) + library. Generally, this option is only of interest for client + programs that have specific integration requirements or that do + a significant amount of customization. + + + If your program has access to other Unicode functions, however, + such as through a system library or application framework, you + might prefer to use those instead of the built-in + options. HarfBuzz supports this by implementing its Unicode + functions as a set of virtual methods that you can replace — + without otherwise affecting HarfBuzz's functionality. + + + The Unicode functions are specified in a structure called + unicode_funcs which is attached to each + buffer. But even though unicode_funcs is + associated with a hb_buffer_t, the functions + themselves are called by other HarfBuzz APIs that access + buffers, so it would be unwise for you to hook different + functions into different buffers. + + + In addition, you can mark your unicode_funcs + as immutable by calling + hb_unicode_funcs_make_immutable (ufuncs). + This is especially useful if your code is a + library or framework that will have its own client programs. By + marking your Unicode function choices as immutable, you prevent + your own client programs from changing the + unicode_funcs configuration and introducing + inconsistencies and errors downstream. 
+ + You can retrieve the Unicode-functions configuration for + your buffer by calling hb_buffer_get_unicode_funcs(): + + + hb_unicode_funcs_t *ufunctions; + ufunctions = hb_buffer_get_unicode_funcs(buf); + + + The current version of unicode_funcs uses six functions: + + + + + hb_unicode_combining_class_func_t: + returns the Canonical Combining Class of a code point. + + + + + hb_unicode_general_category_func_t: + returns the General Category (gc) of a code point. + + + + + hb_unicode_mirroring_func_t: returns + the Mirroring Glyph code point (for bidirectional + replacement) of a code point. + + + + + hb_unicode_script_func_t: returns the + Script (sc) property of a code point. + + + + + hb_unicode_compose_func_t: returns the + canonical composition of a sequence of two code points. + + + + + hb_unicode_decompose_func_t: returns + the canonical decomposition of a code point. + + + + + Note, however, that future HarfBuzz releases may alter this set. + + + Each Unicode function has a corresponding setter, with which you + can install your replacement function as a callback. For example, + to replace + hb_unicode_general_category_func_t, you can call + + + hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy); + + + Virtualizing this set of Unicode functions is primarily intended + to improve portability. There is no need for every client + program to make the effort to replace the default options, so if + you are unsure, do not feel any pressure to customize + unicode_funcs.
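The effect of replacing a callback, and of making the table immutable, can be modeled as a small vtable of function pointers. This is a hypothetical sketch, not HarfBuzz's implementation. Every `toy_*` name is invented for illustration, and the built-in category rule is deliberately trivial. The real calls are `hb_unicode_funcs_create()`, `hb_unicode_funcs_set_general_category_func()`, `hb_unicode_funcs_make_immutable()`, and `hb_buffer_set_unicode_funcs()`.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_codepoint_t;
/* Hypothetical callback type, loosely modeled on hb_unicode_general_category_func_t. */
typedef int (*toy_category_func_t)(toy_codepoint_t cp, void *user_data);

typedef struct {
    toy_category_func_t general_category;
    void *user_data;
    int immutable; /* once set, setters refuse to change anything */
} toy_unicode_funcs_t;

/* Toy built-in rule: 1 for 'A'..'Z', 0 for everything else. */
static int builtin_category(toy_codepoint_t cp, void *user_data) {
    (void) user_data;
    return (cp >= 'A' && cp <= 'Z') ? 1 : 0;
}

/* Example replacement: returns whatever category user_data points to. */
static int replacement_category(toy_codepoint_t cp, void *user_data) {
    (void) cp;
    return *(const int *) user_data;
}

void toy_funcs_init(toy_unicode_funcs_t *f) {
    f->general_category = builtin_category;
    f->user_data = NULL;
    f->immutable = 0;
}

/* Returns 1 if installed, 0 if the table is immutable. In this sketch a
 * NULL func restores the built-in callback. */
int toy_funcs_set_general_category(toy_unicode_funcs_t *f,
                                   toy_category_func_t func, void *user_data) {
    if (f->immutable) return 0;
    f->general_category = func ? func : builtin_category;
    f->user_data = user_data;
    return 1;
}

void toy_funcs_make_immutable(toy_unicode_funcs_t *f) { f->immutable = 1; }

/* Callers always dispatch through the table, never a fixed function. */
int toy_general_category(const toy_unicode_funcs_t *f, toy_codepoint_t cp) {
    return f->general_category(cp, f->user_data);
}
```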
+
diff --git a/docs/usermanual-fonts-and-faces.xml b/docs/usermanual-fonts-and-faces.xml index 553600439..c1787bddf 100644 --- a/docs/usermanual-fonts-and-faces.xml +++ b/docs/usermanual-fonts-and-faces.xml @@ -5,20 +5,449 @@ ]> - Fonts and faces -
+ Fonts, faces, and output + + In the previous chapter, we saw how to set up a buffer and fill + it with text as Unicode code points. In order to shape this + buffer text with HarfBuzz, you will also need a font + object. + + + HarfBuzz provides abstractions to help you cache and reuse the + heavier parts of working with binary fonts, so we will look at + how to do that. We will also look at how to work with the + FreeType font-rendering library and at how you can customize + HarfBuzz to work with other libraries. + + + Finally, we will look at how to work with OpenType variable + fonts, the latest update to the OpenType font format, and at + some other recent additions to OpenType. + + +
+ Font and face objects + + The outcome of shaping a run of text depends on the contents of + a specific font file (such as the substitutions and positioning + moves in the 'GSUB' and 'GPOS' tables), so HarfBuzz makes + accessing those internals fast. + + + An hb_face_t represents a face + in HarfBuzz. This data type is a wrapper around an + hb_blob_t blob that holds the contents of a binary + font file. Since HarfBuzz supports TrueType Collections and + OpenType Collections (each of which can include multiple + typefaces), a HarfBuzz face also requires an index number + specifying which typeface in the file you want to use. Most of + the font files you will encounter in the wild include just a + single face, however, so most of the time you would pass in + 0 as the index when you create a face: + + + hb_blob_t* blob = hb_blob_create_from_file(file); + ... + hb_face_t* face = hb_face_create(blob, 0); + + + On its own, a face object is not quite ready to use for + shaping. The typeface must be set to a specific point size in + order for some details (such as hinting) to work. In addition, + if the font file in question is an OpenType Variable Font, then + you may need to specify one or more variation-axis settings (or a + named instance) in order to get the output you need. + + + In HarfBuzz, you do this by creating a font + object from your face. + + + Font objects also have the advantage of being considerably + lighter-weight than face objects (remember that a face contains + the contents of a binary font file mapped into memory). As a + result, you can cache and reuse a font object, but you could + also create a new one for each additional size you needed. + Creating new fonts incurs some additional overhead, of course, + but whether or not it is excessive is your call in the end. In + contrast, face objects are substantially larger, and you really + should cache them and reuse them whenever possible.
+ + You can create a font object from a face object: + + + hb_font_t* hb_font = hb_font_create(hb_face); + + + After creating a font, there are a few properties you should + set. Many fonts enable and disable hints based on the size at + which they are used, so setting the size is important for font + objects. hb_font_set_ppem(font, x_ppem, + y_ppem) sets the pixels-per-EM value of the font. You + can also set the point size of the font with + hb_font_set_ptem(font, ptem). HarfBuzz uses the + industry standard 72 points per inch. + + + HarfBuzz lets you specify the degree of subpixel precision you want + through a scaling factor. You can set horizontal and + vertical scaling factors on the + font by calling hb_font_set_scale(font, x_scale, + y_scale). + + + There may be times when you are handed a font object and need to + access the face object that it comes from. For that, you can call + + + hb_face = hb_font_get_face(hb_font); + + + You can also create a font object from an existing font object + using the hb_font_create_sub_font() + function. This creates a child font object that is initialized + with the same attributes as its parent; it can be used to + quickly set up a new font for the purpose of overriding a specific + font-functions method. + + + All face objects and font objects are lifecycle-managed by + HarfBuzz. After creating a face, you increase its reference + count with hb_face_reference(face) and + decrease it with + hb_face_destroy(face). Likewise, you + increase the reference count on a font with + hb_font_reference(font) and decrease it + with hb_font_destroy(font). + + + You can also attach user data to face objects and font objects. + +
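The arithmetic behind these size settings is straightforward. A point size converts to pixels per em through the output resolution, using 72 points per inch. A value in font design units maps into the font's scaled units in proportion to the face's units-per-em. The helpers below are a hypothetical sketch of that arithmetic only and are not HarfBuzz functions. The example scale of `16 * 64` (a 16-pixel em with 1/64-pixel precision) is an assumed convention borrowed from FreeType's 26.6 fixed point, not something HarfBuzz requires. The integer conversion truncates, whereas a real implementation may round.

```c
/* Pixels per em for a point size at the given resolution (72 points per inch). */
double ptem_to_ppem(double ptem, double dpi) {
    return ptem * dpi / 72.0;
}

/* Map a value in font design units into scaled units, assuming one em
 * (upem design units) corresponds to `scale` scaled units. Truncates. */
long font_units_to_scaled(long value, long upem, long scale) {
    return value * scale / upem;
}
```

For example, 12-point text at 96 dpi needs 16 pixels per em. With a scale of `16 * 64`, a 500-unit advance in a 1000-units-per-em face becomes 512 scaled units, which is 8 pixels.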
+ +
+ Customizing font functions + + During shaping, HarfBuzz frequently needs to query font objects + to get at the contents and parameters of the glyphs in a font + file. It includes a built-in set of functions that is tailored + to working with OpenType fonts. However, as was the case with + Unicode functions in the buffers chapter, HarfBuzz also wants to + make it easy for you to assign a substitute set of font + functions if you are developing a program to work with a library + or platform that provides its own font functions. + + + Therefore, the HarfBuzz API defines a set of virtual + methods for accessing font-object properties, and you can + replace the defaults with your own selections without + interfering with the shaping process. Each font object in + HarfBuzz includes a structure called + font_funcs that serves as a vtable for the + font object. The virtual methods in + font_funcs are: + + + + + hb_font_get_font_h_extents_func_t: returns + the extents of the font for horizontal text. + + + + + hb_font_get_font_v_extents_func_t: returns + the extents of the font for vertical text. + + + + + hb_font_get_nominal_glyph_func_t: returns + the font's nominal glyph for a given code point. + + + + + hb_font_get_variation_glyph_func_t: returns + the font's glyph for a given code point when it is followed by a + given Variation Selector. + + + + + hb_font_get_nominal_glyphs_func_t: returns + the font's nominal glyphs for a series of code points. + + + + + hb_font_get_glyph_advance_func_t: returns + the advance for a glyph. + + + + + hb_font_get_glyph_h_advance_func_t: returns + the advance for a glyph for horizontal text. + + + + + hb_font_get_glyph_v_advance_func_t: returns + the advance for a glyph for vertical text. + + + + + hb_font_get_glyph_advances_func_t: returns + the advances for a series of glyphs. + + + + + hb_font_get_glyph_h_advances_func_t: returns + the advances for a series of glyphs for horizontal text.
+ + + + + hb_font_get_glyph_v_advances_func_t: returns + the advances for a series of glyphs for vertical text. + + + + + hb_font_get_glyph_origin_func_t: returns + the origin coordinates of a glyph. + + + + + hb_font_get_glyph_h_origin_func_t: returns + the origin coordinates of a glyph for horizontal text. + + + + + hb_font_get_glyph_v_origin_func_t: returns + the origin coordinates of a glyph for vertical text. + + + + + hb_font_get_glyph_extents_func_t: returns + the extents for a glyph. + + + + + hb_font_get_glyph_contour_point_func_t: + returns the coordinates of a specific contour point from a glyph. + + + + + hb_font_get_glyph_name_func_t: returns the + name of a glyph (from its glyph index). + + + + + hb_font_get_glyph_from_name_func_t: returns + the glyph index that corresponds to a given glyph name. + + + + + You can fetch the font-functions configuration for a font object + by calling hb_font_get_font_funcs(): + + + hb_font_funcs_t *ffunctions; + ffunctions = hb_font_get_font_funcs (font); + + + The individual methods can each be replaced with their own setter + function, such as + hb_font_funcs_set_nominal_glyph_func(ffunctions, + func, user_data, destroy). + + + Font-functions structures can be reused for multiple font + objects, and can be reference counted with + hb_font_funcs_reference() and + hb_font_funcs_destroy(). Just like other + objects in HarfBuzz, you can set user-data for each + font-functions structure and assign a destroy callback for + it. + + + You can also mark a font-functions structure as immutable, + with hb_font_funcs_make_immutable(). This + is especially useful if your code is a library or framework that + will have its own client programs. By marking your + font-functions structures as immutable, you prevent your client + programs from changing the configuration and introducing + inconsistencies and errors downstream. + +
+ +
+ Font objects and HarfBuzz's native OpenType implementation + + By default, whenever HarfBuzz creates a font object, it will + configure the font to use a built-in set of font functions that + supports contemporary OpenType font internals. If you want to + work with OpenType or TrueType fonts, you should be able to use + these functions without difficulty. + + + Many of the methods in the font-functions structure deal with + the fundamental properties of glyphs that are required for + shaping text: extents (the maximums and minimums on each axis), + origins (the (0,0) coordinate point relative to which + glyphs are drawn), and advances (the amount that + the cursor needs to be moved after drawing each glyph, including + any empty space for the glyph's side bearings). + + + As you can see in the list of functions, there are separate "horizontal" + and "vertical" variants depending on whether the text is set in + the horizontal or vertical direction. For some scripts, fonts + that are designed to support text set horizontally or vertically (for + example, in Japanese) may include metrics for both text + directions. When fonts don't include this information, HarfBuzz + does its best to transform what the font provides. + + + In addition to the direction-specific functions, HarfBuzz + provides some higher-level functions for fetching information + like extents and advances for a glyph. If you call + + + hb_font_get_glyph_advance_for_direction(font, glyph, direction, &x, &y); + + + then you can provide any hb_direction_t as the + direction parameter, and HarfBuzz will + use the correct function variant for the text direction, storing + the resulting advance in x and y. There + are similar higher-level versions of the functions for fetching + extents, origin coordinates, and contour-point + coordinates. There are also addition and subtraction functions + for moving points with respect to the origin.
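Conceptually, a direction-aware call dispatches to the horizontal or vertical callback and fills in only the matching axis. The sketch below models that dispatch with a hypothetical `toy_font_funcs_t` and toy metrics. It is not how HarfBuzz implements `hb_font_get_glyph_advance_for_direction()`, and it ignores HarfBuzz's sign conventions for vertical advances.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t toy_position_t;
typedef enum { TOY_LTR, TOY_RTL, TOY_TTB, TOY_BTT } toy_dir_t;

/* Hypothetical per-axis callbacks standing in for the h_advance/v_advance methods. */
typedef struct {
    toy_position_t (*h_advance)(uint32_t glyph, void *user_data);
    toy_position_t (*v_advance)(uint32_t glyph, void *user_data);
    void *user_data;
} toy_font_funcs_t;

/* Horizontal text advances along x only; vertical text along y only. */
void toy_advance_for_direction(const toy_font_funcs_t *f, uint32_t glyph,
                               toy_dir_t dir, toy_position_t *x, toy_position_t *y) {
    if (dir == TOY_LTR || dir == TOY_RTL) {
        *x = f->h_advance(glyph, f->user_data);
        *y = 0;
    } else {
        *x = 0;
        *y = f->v_advance(glyph, f->user_data);
    }
}

/* Toy metrics: every glyph is 600 units wide and 1000 units tall. */
static toy_position_t toy_h_advance(uint32_t g, void *u) { (void) g; (void) u; return 600; }
static toy_position_t toy_v_advance(uint32_t g, void *u) { (void) g; (void) u; return 1000; }
```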
+ + There are also methods for fetching the glyph ID that + corresponds to a Unicode code point (possibly when followed by a + variation-selector code point), fetching the glyph name from the + font, and fetching the glyph ID that corresponds to a glyph name + you already have. + + + HarfBuzz also provides functions for converting between glyph + IDs and glyph-name strings. + hb_font_glyph_to_string(font, glyph, s, + size) retrieves the name for the glyph ID + glyph from the font object. It generates a + generic name of the form gidDDD (where DDD is + the glyph index) if there is no name for the glyph in the + font. hb_font_glyph_from_string(font, s, len, + glyph) takes an input string s + and looks for a glyph with that name in the font, returning its + glyph ID in the glyph + output parameter. It automatically parses + gidDDD and uniUUUU strings. + +
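The two generic name forms are easy to recognize. `gidDDD` carries a decimal glyph index directly. `uniUUUU` carries a hexadecimal code point, which would still have to be mapped to a glyph through the font. The sketch below formats and classifies those strings with the standard library only. It is a simplified, hypothetical stand-in, not HarfBuzz's parser, and it accepts exactly four hex digits after `uni`.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Generic name for an unnamed glyph: "gid" followed by the decimal index. */
void generic_glyph_name(unsigned glyph, char *out, size_t size) {
    snprintf(out, size, "gid%u", glyph);
}

/* Nonzero if s is non-empty and every character satisfies pred. */
static int all_chars(const char *s, int (*pred)(int)) {
    if (*s == '\0') return 0;
    for (; *s; s++)
        if (!pred((unsigned char) *s)) return 0;
    return 1;
}

/* Returns 1 for "gidDDD" (glyph index in *value), 2 for "uniUUUU"
 * (code point in *value, still to be mapped through the font), 0 otherwise. */
int parse_generic_glyph_name(const char *s, unsigned long *value) {
    if (strncmp(s, "gid", 3) == 0 && all_chars(s + 3, isdigit)) {
        *value = strtoul(s + 3, NULL, 10);
        return 1;
    }
    if (strncmp(s, "uni", 3) == 0 && strlen(s + 3) == 4 && all_chars(s + 3, isxdigit)) {
        *value = strtoul(s + 3, NULL, 16);
        return 2;
    }
    return 0;
}
```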
+ + + + + + +
+ Working with OpenType Variable Fonts + If you are working with OpenType Variable Fonts, there are a few + additional functions you should use to specify the + variation-axis settings of your font object. Without doing so, + your variable font's font object can still be used, but only at + the default setting for every axis (which, of course, is + sometimes what you want, but does not cover general usage). + + + HarfBuzz manages variation settings in the + hb_variation_t data type, which holds a tag identifying the + variation axis and a value for its + setting. You can retrieve the list of variation axes in a font + binary from the face object (not from a font object, notably) by + calling hb_ot_var_get_axis_count(face) to + find the number of axes, then using + hb_ot_var_get_axis_infos() to collect the + axis structures: + + + unsigned int axes = hb_ot_var_get_axis_count(face); + ... + hb_ot_var_get_axis_infos(face, 0, &axes, axes_array); + + + For each axis returned in the array, you can access the + identifier in its tag. HarfBuzz also has + tag definitions predefined for the five standard axes specified + in OpenType (ital for italic, + opsz for optical size, + slnt for slant, wdth for + width, and wght for weight). Each axis also + has a min_value, a + default_value, and a max_value. + + + To set your font object's variation settings, you call the + hb_font_set_variations() function with an + array of hb_variation_t variation settings. Let's + say our font has weight and width axes. We need to specify each + of the axes by tag and assign a value on the axis: + + + unsigned int variation_count = 2; + hb_variation_t variation_data[variation_count]; + variation_data[0].tag = HB_OT_TAG_VAR_AXIS_WIDTH; + variation_data[1].tag = HB_OT_TAG_VAR_AXIS_WEIGHT; + variation_data[0].value = 80; + variation_data[1].value = 750; + ...
+ hb_font_set_variations(font, variation_data, variation_count); + + + That should give us a slightly condensed font ("normal" on the + wdth axis is 100) at a noticeably bolder + weight ("regular" is 400 on the wght axis). + + + In practice, though, you should always check that the value you + want to set on the axis is within the + [min_value,max_value] + range actually implemented in the font's variation axis. After + all, a font might only provide lighter-than-regular weights, and + setting a heavier value on the wght axis will + not change that. + + + Once your variation settings are specified on your font object, + however, shaping with a variable font is just like shaping a + static font.
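The range check recommended above amounts to clamping each requested value into `[min_value, max_value]` before building the `hb_variation_t` array. The sketch below shows that clamp. `toy_axis_range_t` is a hypothetical stand-in that copies only the three range fields named in the text from the axis-info structure.

```c
/* Stand-in holding the range fields of one axis-info record. */
typedef struct {
    float min_value;
    float default_value;
    float max_value;
} toy_axis_range_t;

/* Clamp a requested axis setting into the range the font implements. */
float clamp_to_axis(const toy_axis_range_t *axis, float requested) {
    if (requested < axis->min_value) return axis->min_value;
    if (requested > axis->max_value) return axis->max_value;
    return requested;
}
```

For a hypothetical font whose `wght` axis only spans 100 to 400, a request for 750 comes back as 400. That is the "lighter-than-regular" case described above.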
- + + diff --git a/docs/usermanual-opentype-features.xml b/docs/usermanual-opentype-features.xml index 51ff55a77..3eedab3d4 100644 --- a/docs/usermanual-opentype-features.xml +++ b/docs/usermanual-opentype-features.xml @@ -6,14 +6,299 @@ ]> Shaping and shape plans -
+ + Once you have your face and font objects configured as desired and + your input buffer is filled with the characters you need to shape, + all you need to do is call hb_shape(). + + + HarfBuzz will return the shaped version of the text in the same + buffer that you provided, but it will be in output mode. At that + point, you can iterate through the glyphs in the buffer, drawing + each one at the specified position or handing them off to the + appropriate graphics library. + + + For the most part, HarfBuzz's shaping step is straightforward from + the outside. But that doesn't mean there will never be cases where + you want to look under the hood and see what is happening on the + inside. HarfBuzz provides facilities for doing that, too. + + +
+ Shaping and buffer output + + The hb_shape() function call takes four arguments: the font + object to use, the buffer of characters to shape, an array of + user-specified features to apply, and the length of that feature + array. The feature array can be NULL, so for the sake of + simplicity we will start with that case. + + + Internally, HarfBuzz looks at the tables of the font file to + determine where glyph classes, substitutions, and positioning + are defined, using that information to decide which + shaper to use (ot for + OpenType fonts, aat for Apple Advanced + Typography fonts, and so on). It also looks at the direction, + script, and language properties of the segment to figure out + which script-specific shaping model is needed (at least, in + shapers that support multiple options). + + + If a font has a GDEF table, then that is used for + glyph classes; if not, HarfBuzz will fall back to Unicode + categorization by code point. If a font has an AAT "morx" table, + then it is used for substitutions; if not, but there is a GSUB + table, then the GSUB table is used. If the font has an AAT + "kerx" table, then it is used for positioning; if not, but + there is a GPOS table, then the GPOS table is used. If neither + table is found, but there is a "kern" table, then HarfBuzz will + use the "kern" table. If there is no "kerx", no GPOS, and no + "kern", HarfBuzz will fall back to positioning marks itself. + + + With a well-behaved OpenType font, you expect GDEF, GSUB, and + GPOS tables to all be applied. HarfBuzz implements the + script-specific shaping models in internal functions, rather + than in the public API. + + + The algorithms + used for complex scripts can be quite involved; HarfBuzz tries + to be compatible with the OpenType Layout specification + and, wherever there is any ambiguity, HarfBuzz attempts to replicate the + output of Microsoft's Uniscribe engine. See the Microsoft + Typography pages for more detail. 
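The fallback order described above for substitution and positioning tables can be summarized as two small decision functions. This is a simplified, hypothetical sketch of that order only. HarfBuzz's real selection logic involves more conditions, and the `HAS_*` bit encoding is invented here.

```c
/* Hypothetical bit set describing which tables a font contains. */
enum {
    HAS_GDEF = 1 << 0, HAS_GSUB = 1 << 1, HAS_GPOS = 1 << 2,
    HAS_MORX = 1 << 3, HAS_KERX = 1 << 4, HAS_KERN = 1 << 5
};

typedef enum { SUB_NONE, SUB_MORX, SUB_GSUB } sub_source_t;
typedef enum { POS_FALLBACK, POS_KERX, POS_GPOS, POS_KERN } pos_source_t;

/* Substitutions: "morx" first, then GSUB. */
sub_source_t pick_substitution(unsigned tables) {
    if (tables & HAS_MORX) return SUB_MORX;
    if (tables & HAS_GSUB) return SUB_GSUB;
    return SUB_NONE;
}

/* Positioning: "kerx", then GPOS, then "kern", else fallback mark positioning. */
pos_source_t pick_positioning(unsigned tables) {
    if (tables & HAS_KERX) return POS_KERX;
    if (tables & HAS_GPOS) return POS_GPOS;
    if (tables & HAS_KERN) return POS_KERN;
    return POS_FALLBACK;
}
```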
+ + In general, though, all that you need to know is that + hb_shape() returns the results of shaping + in the same buffer that you provided. The buffer's content type + will now be set to + HB_BUFFER_CONTENT_TYPE_GLYPHS, indicating + that it contains shaped output, rather than input text. You can + now extract the glyph information and positioning arrays: + + + unsigned int glyph_count; + hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &glyph_count); + hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &glyph_count); + + + The glyph information array holds an hb_glyph_info_t + for each output glyph, which has two fields of interest: + codepoint and + cluster. Whereas, in the input buffer, + the codepoint field contained the Unicode + code point, it now contains the glyph ID of the corresponding + glyph in the font. The cluster field is + an integer that you can use to help identify when shaping has + reordered, split, or combined code points; we will say more + about that in the next chapter. + + + The glyph positions array holds a corresponding + hb_glyph_position_t for each output glyph, + containing four fields: x_advance, + y_advance, + x_offset, and + y_offset. The advances tell you how far + you need to move the drawing point after drawing this glyph, + depending on whether you are setting horizontal text (in which + case you will have x advances) or vertical text (for which you + will have y advances). The x and y offsets tell you how far to + shift the glyph from the current drawing point before drawing it; + either offset may be nonzero, regardless of the text direction. + + + Most of the time, you will rely on a font-rendering library or + other graphics library to do the actual drawing of glyphs, so + you will need to iterate through the glyphs in the buffer and + pass the corresponding values off.
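The usual iteration keeps a pen position. Each glyph is drawn at the pen plus its offsets, and then the pen moves by the glyph's advances. The sketch below applies that loop to a stand-in struct that mirrors the four fields of `hb_glyph_position_t`. The struct and `layout_glyphs()` are hypothetical names, and the actual drawing is left to your graphics library.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the four fields of hb_glyph_position_t. */
typedef struct {
    int32_t x_advance, y_advance;
    int32_t x_offset, y_offset;
} toy_glyph_position_t;

typedef struct { int32_t x, y; } toy_point_t;

/* Fill draw_at[i] with where glyph i should be drawn; return the final pen. */
toy_point_t layout_glyphs(const toy_glyph_position_t *pos, size_t count,
                          toy_point_t *draw_at) {
    toy_point_t pen = {0, 0};
    for (size_t i = 0; i < count; i++) {
        /* Offsets shift this glyph only; they do not move the pen. */
        draw_at[i].x = pen.x + pos[i].x_offset;
        draw_at[i].y = pen.y + pos[i].y_offset;
        /* Advances move the pen for the next glyph. */
        pen.x += pos[i].x_advance;
        pen.y += pos[i].y_advance;
    }
    return pen;
}
```

In the test data, the second glyph behaves like a combining mark. It has a zero advance and nonzero offsets, so it is drawn over the first glyph without moving the pen.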
+ +
OpenType features
 OpenType features enable fonts to include smart behavior,
 implemented as "lookup" rules stored in the GSUB and GPOS
 tables. The OpenType specification defines a long list of
 standard features that fonts can use for these behaviors; each
 feature has a four-character reserved name and a well-defined
 semantic meaning.
 
 
 Some OpenType features are defined for the purpose of supporting
 complex-script shaping, and are automatically activated, but
 only when a buffer's script property is set to a script that the
 feature supports.
 
 
 Other features are more generic and can apply to several (or
 any) script, and shaping engines are expected to implement
 them. By default, HarfBuzz activates several of these features
 on every text run. They include ccmp,
 locl, mark,
 mkmk, and rlig.
 
 
 In addition, if the text direction is horizontal, HarfBuzz
 also applies the calt,
 clig, curs,
 kern, liga,
 rclt, and frac features.
 
 
 If the text direction is vertical, HarfBuzz applies
 the vert feature by default.
 
 
 Still other features are designed to be purely optional and left
 up to the application or the end user to enable or disable as desired.
 
 
 You can adjust the set of features that HarfBuzz applies to a
 buffer by supplying an array of hb_feature_t
 features as the third argument to
 hb_shape(). For a simple case, let's just
 enable the dlig feature, which turns on any
 "discretionary" ligatures in the font:
 
 
 hb_feature_t userfeatures[1];
 userfeatures[0].tag = HB_TAG('d','l','i','g');
 userfeatures[0].value = 1;
 userfeatures[0].start = HB_FEATURE_GLOBAL_START;
 userfeatures[0].end = HB_FEATURE_GLOBAL_END;
 
 
 HB_FEATURE_GLOBAL_START and
 HB_FEATURE_GLOBAL_END are macros we can use
 to indicate that the features will be applied to the entire
 buffer. 
We could also have used a literal 0
 for the start and a -1 to indicate the end of
 the buffer (or have selected other start and end positions, if needed).
 
 
 When we pass the userfeatures array to
 hb_shape(), any discretionary ligature
 substitutions from our font that match the text in our buffer
 will get performed:
 
 
 unsigned int num_features = 1; /* number of entries in userfeatures */
 hb_shape(font, buf, userfeatures, num_features);
 
 
 Just like we enabled the dlig feature by
 setting its value to
 1, you would disable a feature by setting its
 value to 0. Some
 features can take other value settings;
 be sure you read the full specification of each feature tag to
 understand what it does and how to control it.
-
+ +
+ Shaper selection
 
 The basic version of hb_shape() determines
 its shaping strategy based on examining the capabilities of the
 font file. OpenType font tables cause HarfBuzz to try the
 ot shaper, while AAT font tables cause HarfBuzz to try the
 aat shaper.
 
 
 In the real world, however, a font might include some unusual
 mix of tables, or one of the tables might simply be broken for
 the script you need to shape. So, sometimes, you might not
 want to rely on HarfBuzz's process for deciding what to do, and
 just tell hb_shape() what you want it to try.
 
 
 hb_shape_full() is an alternate shaping
 function that lets you supply a list of shapers for HarfBuzz to
 try, in order, when shaping your buffer. For example, if you
 have determined that HarfBuzz's attempts to work around broken
 tables give you better results than the AAT shaper itself does,
 you might move the AAT shaper to the end of your list of
 preferences and call hb_shape_full():
 
 
 /* The shaper list must be terminated by NULL. */
 const char *shaperprefs[] = {"ot", "fallback", "aat", NULL};
 ...
 hb_shape_full(font, buf, userfeatures, num_features, shaperprefs);
 
 
 to get results you are happier with.
 
 
 You may also want to call
 hb_shape_list_shapers() to get a list of
 the shapers that were built at compile time in your copy of HarfBuzz.
 
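Because hb_shape_full() tries the shapers in order, the order of the list is what expresses your preference. The following C sketch illustrates this with a hypothetical helper, `pick_shaper()`. It returns the first preferred shaper that also appears in a list of available shapers, such as the NULL-terminated array that `hb_shape_list_shapers()` returns. This is a simplification: HarfBuzz may also skip a shaper that cannot handle a particular face.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the first entry of the NULL-terminated `preferred` list that
 * also appears in the NULL-terminated `available` list, or NULL if
 * none does. Both lists must end with a NULL pointer, just like the
 * shaper list passed to hb_shape_full(). */
const char *pick_shaper(const char *const *preferred,
                        const char *const *available) {
    assert(preferred != NULL && available != NULL);
    for (size_t i = 0; preferred[i] != NULL; i++)
        for (size_t j = 0; available[j] != NULL; j++)
            if (strcmp(preferred[i], available[j]) == 0)
                return preferred[i];
    return NULL;
}
```

The NULL terminator is what tells the loop where the list ends. Omitting it from a real shaper list would make HarfBuzz read past the end of the array.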
+ +
Plans and caching
 Internally, HarfBuzz uses a structure called a shape plan to
 track its decisions about how to shape the contents of a
 buffer. The hb_shape() function builds up the shape plan by
 examining segment properties and by inspecting the contents of
 the font.
 
 
 This process can involve some decision-making and
 trade-offs. For example, HarfBuzz inspects the GSUB and GPOS
 lookups for the script and language tags set on the segment
 properties, but it falls back on the lookups under the
 DFLT tag (and sometimes other common tags)
 if there are no lookups for the tag requested.
 
 
 HarfBuzz also includes some work-arounds for
 handling well-known older font conventions that do not follow
 OpenType or Unicode specifications, for buggy system fonts, and for
 peculiarities of Microsoft Uniscribe. All of that means that a
 shape plan, while not something that you should edit directly in
 client code, still might be an object that you want to
 inspect. Furthermore, if resources are tight, you might want to
 cache the shape plan that HarfBuzz builds for your buffer and
 font, so that you do not have to rebuild it for every shaping call.
 
 
 You can create a cacheable shape plan with
 hb_shape_plan_create_cached(face, props,
 user_features, num_user_features, shaper_list), where
 face is a face object (not a font object,
 notably), props is an
 hb_segment_properties_t,
 user_features is an array of
 hb_feature_ts (with length
 num_user_features), and
 shaper_list is a NULL-terminated list of shapers to try.
 
 
 Shape plans are objects in HarfBuzz, so there are
 reference-counting functions and user-data attachment functions
 you can
 use. hb_shape_plan_reference(shape_plan)
 increases the reference count on a shape plan, while
 hb_shape_plan_destroy(shape_plan) decreases
 the reference count, destroying the shape plan when the last
 reference is dropped.
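Caching pays off because the expensive planning step runs only once for each distinct set of inputs. The sketch below illustrates this in plain C, using a hypothetical `plan_cache` keyed on a simplified set of segment properties. It does not reflect how HarfBuzz stores its own cache. A real key would also need the face, the user features, and the shaper list, as the arguments to hb_shape_plan_create_cached() suggest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified cache key standing in for the segment
 * properties a shape plan depends on. */
typedef struct {
    uint32_t script;      /* a script tag, e.g. 'Latn' packed into 32 bits */
    int direction;        /* illustrative: 0 = LTR, 1 = RTL */
    char language[16];
} plan_key;

typedef struct { plan_key key; int plan_id; } cache_entry;

#define CACHE_SIZE 8

typedef struct {
    cache_entry entries[CACHE_SIZE];
    int count;
    int plans_built;      /* how many times the expensive step ran */
} plan_cache;

static int keys_equal(const plan_key *a, const plan_key *b) {
    return a->script == b->script && a->direction == b->direction &&
           strcmp(a->language, b->language) == 0;
}

/* Return a cached plan id for `key`, "building" a new one on a miss.
 * Building is simulated by handing out the next id. */
int get_plan(plan_cache *c, const plan_key *key) {
    assert(c != NULL && key != NULL);
    for (int i = 0; i < c->count; i++)
        if (keys_equal(&c->entries[i].key, key))
            return c->entries[i].plan_id;       /* cache hit */
    int id = ++c->plans_built;                  /* the expensive step */
    if (c->count < CACHE_SIZE) {                /* remember it if room */
        c->entries[c->count].key = *key;
        c->entries[c->count].plan_id = id;
        c->count++;
    }
    return id;
}
```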
+ 
 
 You can attach user data to a shape plan (with a key) using the
 hb_shape_plan_set_user_data(shape_plan, key, data, destroy, replace)
 function, optionally supplying a destroy
 callback to use. You can then fetch the user data attached to a
 shape plan with
 hb_shape_plan_get_user_data(shape_plan, key).
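The reference, destroy, and user-data pattern is the same across HarfBuzz objects. The following C sketch is a loose model of it, using a hypothetical `object` type with a single user-data slot. The real API stores data under a key, as described above. The sketch is meant to show the lifecycle, not HarfBuzz's implementation.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*destroy_func)(void *data);

/* Hypothetical reference-counted object with one user-data slot. */
typedef struct {
    int ref_count;
    void *user_data;
    destroy_func user_destroy;
} object;

object *object_create(void) {
    object *o = calloc(1, sizeof *o);
    if (o) o->ref_count = 1;            /* the creator holds one reference */
    return o;
}

object *object_reference(object *o) {
    assert(o != NULL);
    o->ref_count++;
    return o;
}

/* Attach data; if data is already set and `replace` is 0, refuse. */
int object_set_user_data(object *o, void *data, destroy_func destroy, int replace) {
    assert(o != NULL);
    if (o->user_data && !replace) return 0;
    if (o->user_data && o->user_destroy) o->user_destroy(o->user_data);
    o->user_data = data;
    o->user_destroy = destroy;
    return 1;
}

void *object_get_user_data(const object *o) { return o ? o->user_data : NULL; }

/* Drop one reference; when the last one goes, run the user-data
 * destroy callback and free the object. */
void object_destroy(object *o) {
    if (!o || --o->ref_count > 0) return;
    if (o->user_data && o->user_destroy) o->user_destroy(o->user_data);
    free(o);
}
```

In this sketch, the destroy callback runs only when the last reference is dropped. This is why user data attached to a shared object remains available until every holder has released it.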
+ diff --git a/docs/usermanual-utilities.xml b/docs/usermanual-utilities.xml new file mode 100644 index 000000000..1c5370c11 --- /dev/null +++ b/docs/usermanual-utilities.xml @@ -0,0 +1,244 @@ + + + +]> + + Utilities + + HarfBuzz includes several auxiliary components in addition to the + main APIs. These include a set of command-line tools, a set of + lower-level APIs for common data types that may be of interest to + client programs, and an embedded library for working with + Unicode Character Database (UCD) data. + + +
+ Command-line tools
 
 HarfBuzz includes three command-line tools:
 hb-shape, hb-view, and
 hb-subset. They can be used to examine
 HarfBuzz's functionality, debug font binaries, or explore the
 various shaping models and features from a terminal.
 
+ hb-shape
 
 hb-shape allows you to run HarfBuzz's
 hb_shape() function on an input string and
 to examine the outcome, in human-readable form, as terminal
 output. hb-shape does
 not render the shaped text as an image
 (you can use hb-view, below, for
 that). Instead, it prints out the final glyph indices and
 positions, taking all shaping operations into account, as if the
 input string were a HarfBuzz input buffer.
 
 
 You can specify the font to be used for shaping and, with
 command-line options, you can add various aspects of the
 internal state to the output that is sent to the terminal. The
 general format is
 
 
 hb-shape [OPTIONS]
 path/to/font/file.ttf
 yourinputtext
 
 
 The default output format is plain text (although JSON output
 can be selected instead by specifying the option
 --output-format=json). The default output
 syntax reports each glyph name (or glyph index if there is no
 name) followed by its cluster value, its horizontal and vertical
 position displacement, and its horizontal and vertical advances.
 
 
 Output options exist to skip any of these elements in the
 output, and to include additional data, such as Unicode
 code-point values, glyph extents, glyph flags, or interim
 shaping results.
 
 
 Output can also be redirected to a file, or input read from a
 file. Additional options let you enable or disable
 specific font features, set variation-font axis values,
 alter the language, script, direction, and clustering settings
 used, enable sanity checks, or change which shaping engine is used.
 
 
 For a complete explanation of the options available, run
 
 
 hb-shape --help
 
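To help you read the default output, the following C sketch formats one glyph's values in a layout that approximates hb-shape's plain-text syntax as we understand it. The format is name=cluster, then @x_offset,y_offset only when an offset is nonzero, then +x_advance, with ,y_advance appended only when it is nonzero. The `shaped_glyph` struct and `format_glyph()` are hypothetical. The real tool also wraps the glyph list in brackets and separates glyphs with `|`, and its exact output may differ, so compare against hb-shape itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record combining the per-glyph values hb-shape reports. */
typedef struct {
    const char *name;          /* glyph name, or a glyph-index string */
    uint32_t cluster;
    int32_t x_offset, y_offset, x_advance, y_advance;
} shaped_glyph;

/* Append formatted text at buf[*used]; returns 0 on error or truncation. */
static int append(char *buf, size_t size, size_t *used, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= size - *used) return 0;
    *used += (size_t) n;
    return 1;
}

/* Write one glyph as name=cluster[@xoff,yoff]+xadv[,yadv].
 * Returns the string length, or -1 if `buf` is too small. */
int format_glyph(char *buf, size_t size, const shaped_glyph *g) {
    assert(buf != NULL && size > 0 && g != NULL);
    size_t used = 0;
    if (!append(buf, size, &used, "%s=%u", g->name, (unsigned) g->cluster))
        return -1;
    if ((g->x_offset || g->y_offset) &&
        !append(buf, size, &used, "@%d,%d", (int) g->x_offset, (int) g->y_offset))
        return -1;
    if (!append(buf, size, &used, "+%d", (int) g->x_advance))
        return -1;
    if (g->y_advance && !append(buf, size, &used, ",%d", (int) g->y_advance))
        return -1;
    return (int) used;
}
```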
+ +
+ hb-view
 
 hb-view allows you to
 see the shaped output of an input string in rendered
 form. Like hb-shape,
 hb-view takes a font file and a text string
 as its arguments:
 
 
 hb-view [OPTIONS]
 path/to/font/file.ttf
 yourinputtext
 
 
 By default, hb-view renders the shaped
 text in ASCII block-character images as terminal output. By
 appending the
 --output-file=filename
 switch, you can write the output to a PNG, SVG, or PDF file
 (among other formats).
 
 
 As with hb-shape, a lengthy set of options
 is available, with which you can enable or disable
 specific font features, set variation-font axis values,
 alter the language, script, direction, and clustering settings
 used, enable sanity checks, or change which shaping engine is
 used.
 
 
 You can also set the foreground and background colors used for
 the output, independently control the width of all four
 margins, alter the line spacing, and annotate the output image.
 
 
 In general, hb-view is a quick way to
 verify that the output of HarfBuzz's shaping operation looks
 correct for a given text-and-font combination, but you may
 want to use hb-shape to figure out exactly
 why something does not appear as expected.
 
+ +
+ hb-subset + + hb-subset allows you + to generate a subset of a given font, with a limited set of + supported characters, features, and variation settings. + + + By default, you provide an input font and an input text string + as the arguments to hb-subset, and it will + generate a font that covers the input text exactly like the + input font does, but includes no other characters or features. + + + hb-subset [OPTIONS] + path/to/font/file.ttf + yourinputtext + + + For example, to create a subset of Noto Serif that just includes the + numerals and the lowercase Latin alphabet, you could run + + + hb-subset [OPTIONS] + NotoSerif-Regular.ttf + 0123456789abcdefghijklmnopqrstuvwxyz + + + There are options available to remove hinting from the + subsetted font and to specify a list of variation-axis settings. + +
+ +
+ +
+ Common data types and APIs + + HarfBuzz includes several APIs for working with general-purpose + data that you may find convenient to leverage in your own + software. They include set operations and integer-to-integer + mapping operations. + + + HarfBuzz uses set operations for internal bookkeeping, such as + when it collects all of the glyph IDs covered by a particular + font feature. You can also use the set API to build sets, add + and remove elements, test whether or not sets contain particular + elements, or compute the unions, intersections, or differences + between sets. + + + All set elements are integers (specifically, + hb_codepoint_t 32-bit unsigned ints), and there are + functions for fetching the minimum and maximum element from a + set. The set API also includes some functions that might not + be part of a generic set facility, such as the ability to add a + contiguous range of integer elements to a set in bulk, and the + ability to fetch the next-smallest or next-largest element. + + + The HarfBuzz set API includes some conveniences as well. All + sets are lifecycle-managed, just like other HarfBuzz + objects. You increase the reference count on a set with + hb_set_reference() and decrease it with + hb_set_destroy(). You can also attach + user data to a set, just like you can to blobs, buffers, faces, + fonts, and other objects, and set destroy callbacks. + + + HarfBuzz also provides an API for keeping track of + integer-to-integer mappings. As with the set API, each integer is + stored as an unsigned 32-bit hb_codepoint_t + element. Maps, like other objects, are reference counted with + reference and destroy functions, and you can attach user data to + them. The mapping operations include adding and deleting + integer-to-integer key:value pairs to the map, testing for the + presence of a key, fetching the population of the map, and so on. 
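To illustrate the set semantics described above, here is a deliberately simplified C bitset. It shows bulk range insertion, membership tests, and iteration that starts from a sentinel value. It covers only values below 0x10000, and `codepoint_set`, `SET_INVALID`, and the `set_*` functions are hypothetical. HarfBuzz's own hb_set_t covers the full 32-bit range, and its hb_set_next() iteration starts from HB_SET_VALUE_INVALID in a similar way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SET_LIMIT   0x10000u     /* this sketch only stores values below this */
#define SET_INVALID UINT32_MAX   /* plays the role of HB_SET_VALUE_INVALID */

typedef struct { uint64_t bits[SET_LIMIT / 64]; } codepoint_set;

void set_clear(codepoint_set *s) { memset(s, 0, sizeof *s); }

void set_add(codepoint_set *s, uint32_t v) {
    assert(v < SET_LIMIT);
    s->bits[v / 64] |= (uint64_t) 1 << (v % 64);
}

/* Add every value from first to last, inclusive. */
void set_add_range(codepoint_set *s, uint32_t first, uint32_t last) {
    for (uint32_t v = first; v <= last; v++) set_add(s, v);
}

bool set_has(const codepoint_set *s, uint32_t v) {
    return v < SET_LIMIT && ((s->bits[v / 64] >> (v % 64)) & 1);
}

/* Replace *v with the next element greater than *v; start with
 * SET_INVALID to get the smallest element. Returns false when the
 * set is exhausted (and resets *v to SET_INVALID). */
bool set_next(const codepoint_set *s, uint32_t *v) {
    uint32_t start = (*v == SET_INVALID) ? 0 : *v + 1;
    for (uint32_t c = start; c < SET_LIMIT; c++)
        if (set_has(s, c)) { *v = c; return true; }
    *v = SET_INVALID;
    return false;
}
```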
+ 
 
 There are several other internal HarfBuzz facilities that are
 exposed publicly and which you may want to take advantage of
 while processing text. HarfBuzz uses a common
 hb_tag_t for a variety of OpenType tag identifiers (for
 scripts, languages, font features, table names, variation-axis
 names, and more), and provides functions for converting strings
 to tags and vice-versa.
 
 
 Finally, HarfBuzz also includes data types for Booleans, bit
 masks, and other simple types.
 
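Converting a string to a tag packs four characters into a 32-bit value, with the first character in the most significant byte. The following C sketch shows this packing using the hypothetical functions `tag_from_string()` and `tag_to_string()`. Like hb_tag_from_string(), it pads strings shorter than four characters with spaces. One difference from the sketch: HarfBuzz's hb_tag_to_string() writes exactly four characters without a terminating NUL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t tag_t;   /* same width as hb_tag_t */

/* Pack up to four characters into a tag, first character in the high
 * byte (the HB_TAG() layout); shorter strings are padded with spaces. */
tag_t tag_from_string(const char *s) {
    assert(s != NULL);
    unsigned char c[4] = { ' ', ' ', ' ', ' ' };
    for (int i = 0; i < 4 && s[i] != '\0'; i++) c[i] = (unsigned char) s[i];
    return ((tag_t) c[0] << 24) | ((tag_t) c[1] << 16) |
           ((tag_t) c[2] << 8) | (tag_t) c[3];
}

/* Unpack a tag into a 5-byte buffer and NUL-terminate it. */
void tag_to_string(tag_t tag, char out[5]) {
    for (int i = 0; i < 4; i++) out[i] = (char) (tag >> (24 - 8 * i));
    out[4] = '\0';
}
```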
+ +
+ UCDN
 
 HarfBuzz includes a copy of the UCDN (Unicode
 Database and Normalization) library, which provides functions
 for accessing basic Unicode character properties, performing
 canonical composition, and performing both canonical and
 compatibility decomposition.
 
 
 Currently, UCDN supports direct queries for several more character
 properties than HarfBuzz's built-in set of Unicode functions
 does, such as the Bidirectional Class, East Asian Width, Paired
 Bracket and Resolved Linebreak properties. If you need to access
 more properties than HarfBuzz's internal implementation
 provides, using the built-in UCDN functions may be a useful solution.
 
 
 The built-in UCDN functions are compiled by default when
 building HarfBuzz from source, but this can be disabled with a
 compile-time switch.
 
+ +