From f0807654da160bd7ceb9aff5b8338ec0b643171c Mon Sep 17 00:00:00 2001 From: Simon Cozens Date: Tue, 25 Aug 2015 19:57:15 +0100 Subject: [PATCH] First two chapters. More to follow. --- docs/usermanual-ch01.xml | 115 +++++++++++++++++++++++++ docs/usermanual-ch02.xml | 182 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 297 insertions(+) create mode 100644 docs/usermanual-ch01.xml create mode 100644 docs/usermanual-ch02.xml diff --git a/docs/usermanual-ch01.xml b/docs/usermanual-ch01.xml new file mode 100644 index 000000000..1ee0cbeec --- /dev/null +++ b/docs/usermanual-ch01.xml @@ -0,0 +1,115 @@ + + What is Harfbuzz? + + Harfbuzz is a text shaping engine. It solves + the problem of selecting and positioning glyphs from a font given a + Unicode string. + + + Why do I need it? + + Text shaping is an integral part of preparing text for display. It + is a fairly low level operation; Harfbuzz is used directly by + graphic rendering libraries such as Pango, and the layout engines + in Firefox, LibreOffice and Chromium. Unless you are + writing one of these layout engines yourself, + you will probably not need to use Harfbuzz - normally higher level + libraries will turn text into glyphs for you. + + + However, if you are writing a layout engine + or graphics library yourself, you will need to perform text + shaping, and this is where Harfbuzz can help you. Here are some + reasons why you need it: + + + + + OpenType fonts contain a set of glyphs, indexed by glyph ID. + The glyph ID within the font does not necessarily relate to a + Unicode codepoint. For instance, some fonts have the letter + "a" as glyph ID 1. To pull the right glyph out of + the font in order to display it, you need to consult a table + within the font (the "cmap" table) which maps + Unicode codepoints to glyph IDs. Text shaping turns codepoints + into glyph IDs. + + + + + Many OpenType fonts contain ligatures: combinations of + characters which are rendered together. For instance, it's + common for the fi combination to appear in + print as the single ligature "fi". Whether you should + render text as fi or "fi" does not + depend on the input text, but on the capabilities of the font + and the level of ligature application you wish to perform. + Text shaping involves querying the font's ligature tables and + determining what substitutions should be made. + + + + + While ligatures like "fi" are typographic + refinements, some languages require such + substitutions to be made in order to display text correctly. + In Tamil, when the letter "TTA" (ட) letter is + followed by "U" (உ), the combination should appear + as the single glyph "டு". The sequence of Unicode + characters "டஉ" needs to be rendered as a single + glyph from the font - text shaping chooses the correct glyph + from the sequence of characters provided. + + + + + Similarly, each Arabic character has four different variants: + within a font, there will be glyphs for the initial, medial, + final, and isolated forms of each letter. Unicode only encodes + one codepoint per character, and so a Unicode string will not + tell you which glyph to use. Text shaping chooses the correct + form of the letter and returns the correct glyph from the font + that you need to render. + + + + + Other languages have marks and accents which need to be + rendered in certain positions around a base character. For + instance, the Moldovan language has the Cyrillic letter + "zhe" (ж) with a breve accent, like so: ӂ. Some + fonts will contain this character as an individual glyph, + whereas other fonts will not contain a zhe-with-breve glyph + but expect the rendering engine to form the character by + overlaying the two glyphs ж and ˘. Where you should draw the + combining breve depends on the height of the preceding glyph. + Again, for Arabic, the correct positioning of vowel marks + depends on the height of the character on which you are + placing the mark. Text shaping tells you whether you have a + precomposed glyph within your font or if you need to compose a + glyph yourself out of combining marks, and if so, where to + position those marks. + + + + + If this is something that you need to do, then you need a text + shaping engine: you could use Uniscribe if you are using Windows; + you could use CoreText on OS X; or you could use Harfbuzz. In the + rest of this manual, we are going to assume that you are the + implementor of a text layout engine. + + + + Why is it called Harfbuzz? + + Harfbuzz began its life as text shaping code within the FreeType + project, (and you will see references to the FreeType authors + within the source code copyright declarations) but was then + abstracted out to its own project. This project is maintained by + Behdad Esfahbod, and named Harfbuzz. Originally, it was a shaping + engine for OpenType fonts - "Harfbuzz" is the Persian + for "open type". + + + \ No newline at end of file diff --git a/docs/usermanual-ch02.xml b/docs/usermanual-ch02.xml new file mode 100644 index 000000000..f0a161dd9 --- /dev/null +++ b/docs/usermanual-ch02.xml @@ -0,0 +1,182 @@ + + Hello, Harfbuzz + + Here's the simplest Harfbuzz that can possibly work. We will improve + it later. + + + + + Create a buffer and put your text in it. + + + + + #include <hb.h> + hb_buffer_t *buf; + buf = hb_buffer_create(); + hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text)); + + + + + Guess the script, language and direction of the buffer. + + + + + hb_buffer_guess_segment_properties(buf); + + + + + Create a face and a font, using FreeType for now. + + + + + #include <hb-ft.h> + FT_New_Face(ft_library, font_path, index, &face) + hb_font_t *font = hb_ft_font_create(face); + + + + + Shape! + + + + + hb_shape(font, buf, NULL, 0); + + + + + Get the glyph and position information. + + + + + hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &glyph_count); + hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &glyph_count); + + + + + Iterate over each glyph. + + + + + for (i = 0; i < glyph_count; ++i) { + glyphid = glyph_info[i].codepoint; + x_offset = glyph_pos[i].x_offset / 64.0; + y_offset = glyph_pos[i].y_offset / 64.0; + x_advance = glyph_pos[i].x_advance / 64.0; + y_advance = glyph_pos[i].y_advance / 64.0; + draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset); + cursor_x += x_advance; + cursor_y += y_advance; + } + + + + + Tidy up. + + + + + hb_buffer_destroy(buf); + hb_font_destroy(hb_ft_font); + + + What Harfbuzz doesn't do + + The code above will take a UTF8 string, shape it, and give you the + information required to lay it out correctly on a single + horizontal (or vertical) line using the font provided. That is the + extent of Harfbuzz's responsibility. + + + If you are implementing a text layout engine you may have other + responsibilities, that Harfbuzz will not help you with: + + + + + Harfbuzz won't help you with bidirectionality. If you want to + lay out text with mixed Hebrew and English, you will need to + ensure that the buffer provided to Harfbuzz has those + characters in the correct layout order. This will be different + from the logical order in which the Unicode text is stored. In + other words, the user will hit the keys in the following + sequence: + + +A B C [space] ג ב א [space] D E F + + + but will expect to see in the output: + + +ABC אבג DEF + + + This reordering is called bidi processing + ("bidi" is short for bidirectional), and there's an + algorithm as an annex to the Unicode Standard which tells you how + to reorder a string from logical order into presentation order. + Before sending your string to Harfbuzz, you may need to apply the + bidi algorithm to it. Libraries such as ICU and fribidi can do + this for you. + + + + Harfbuzz won't help you with text that contains different font + properties. For instance, if you have the string "a + huge breakfast", and you expect + "huge" to be italic, you will need to send three + strings to Harfbuzz: a, in your Roman font; + huge using your italic font; and + breakfast using your Roman font again. + Similarly if you change font, font size, script, language or + direction within your string, you will need to shape each run + independently and then output them independently. Harfbuzz + expects to shape a run of characters sharing the same + properties. + + + + + Harfbuzz won't help you with line breaking, hyphenation or + justification. As mentioned above, it lays out the string + along a single line of, notionally, + infinite length. If you want to find out where the potential + word, sentence and line break points are in your text, you + could use the ICU library's break iterator functions. + + + Harfbuzz can tell you how wide a shaped piece of text is, which is + useful input to a justification algorithm, but it knows nothing + about paragraphs, lines or line lengths. Nor will it adjust the + space between words to fit them proportionally into a line. If you + want to layout text in paragraphs, you will probably want to send + each word of your text to Harfbuzz to determine its shaped width + after glyph substitutions, then work out how many words will fit + on a line, and then finally output each word of the line separated + by a space of the correct size to fully justify the paragraph. + + + + + As a layout engine implementor, Harfbuzz will help you with the + interface between your text and your font, and that's something + that you'll need - what you then do with the glyphs that your font + returns is up to you. The example we saw above enough to get us + started using Harfbuzz. Now we are going to use the remainder of + Harfbuzz's API to refine that example and improve our text shaping + capabilities. + + + \ No newline at end of file