diff --git a/docs/usermanual-what-is-harfbuzz.xml b/docs/usermanual-what-is-harfbuzz.xml index 38f40cf11..8ec7b4030 100644 --- a/docs/usermanual-what-is-harfbuzz.xml +++ b/docs/usermanual-what-is-harfbuzz.xml @@ -1,115 +1,176 @@ What is HarfBuzz? - HarfBuzz is a text shaping engine. It solves - the problem of selecting and positioning glyphs from a font given a - Unicode string. + HarfBuzz is a text shaping engine. If you + give HarfBuzz a font and a string containing a sequence of Unicode + codepoints, HarfBuzz selects and positions the corresponding + glyphs from the font, applying all of the necessary layout rules + and font features. HarfBuzz then returns the string to you in the + form that is correctly arranged for the language and writing + system. -
- Why do I need it? + + HarfBuzz can properly shape all of the world's major writing + systems. It runs on virtually all operating systems and software + platforms, and it supports all of the standard font formats in use + today. + +
+ Why do I need a shaping engine? - Text shaping is an integral part of preparing text for display. It - is a fairly low level operation; HarfBuzz is used directly by - graphic rendering libraries such as Pango, and the layout engines - in Firefox, LibreOffice and Chromium. Unless you are - writing one of these layout engines yourself, - you will probably not need to use HarfBuzz - normally higher level - libraries will turn text into glyphs for you. + Text shaping is an integral part of preparing text for + display. Before a Unicode sequence can be rendered, the + codepoints in the sequence must be mapped to the glyphs + provided in the font, and the glyphs must be positioned + correctly relative to each other. For many of the scripts + supported in Unicode, these steps involve script-specific layout + rules. + + + Text shaping is a fairly low-level operation. HarfBuzz is + used directly by graphic rendering libraries such as Pango, as + well as by the layout engines in Firefox, LibreOffice, and + Chromium. Unless you are writing one of + these layout engines yourself, you will probably not need to use + HarfBuzz: normally, higher-level libraries will turn text into + glyphs for you. However, if you are writing a layout engine or graphics library yourself, you will need to perform text - shaping, and this is where HarfBuzz can help you. Here are some - reasons why you need it: + shaping, and this is where HarfBuzz can help you. + + + Here are some specific scenarios where a text-shaping engine + like HarfBuzz helps you: - OpenType fonts contain a set of glyphs, indexed by glyph ID. - The glyph ID within the font does not necessarily relate to a - Unicode codepoint. For instance, some fonts have the letter - "a" as glyph ID 1. To pull the right glyph out of - the font in order to display it, you need to consult a table - within the font (the "cmap" table) which maps - Unicode codepoints to glyph IDs. Text shaping turns codepoints - into glyph IDs. 
+ OpenType fonts contain a set of glyphs (that is, shapes + to represent the letters, numbers, punctuation marks, and + all other symbols), which are indexed by a glyph ID. + + + The glyph ID within the font does not necessarily correlate + to a predictable Unicode codepoint. For instance, some fonts + have the letter "a" as glyph ID 1, but many others do + not. To pull the right glyph out of the font in order to + display "a", you need to consult the table inside + the font (the cmap table) that maps Unicode + codepoints to glyph IDs. In other words, text shaping turns + codepoints into glyph IDs. Many OpenType fonts contain ligatures: combinations of - characters which are rendered together. For instance, it's - common for the fi combination to appear in - print as the single ligature "ﬁ". Whether you should - render text as fi or "ﬁ" does not - depend on the input text, but on the capabilities of the font - and the level of ligature application you wish to perform. - Text shaping involves querying the font's ligature tables and - determining what substitutions should be made. + characters that are rendered as a single unit. For instance, + it is common for the fi letter + combination to appear in print as the single ligature glyph + "ﬁ". + + + Whether you should render an "f, i" sequence + as fi or as "ﬁ" does not + depend on the input text. Rather, it depends on whether + or not the font includes an "ﬁ" glyph and on the + level of ligature application you wish to perform. The font + and the amount of ligature application used are under your + control. In other words, text shaping involves + querying the font's ligature tables and determining what + substitutions should be made. - While ligatures like "ﬁ" are typographic - refinements, some languages require such + While ligatures like "ﬁ" are optional typographic + refinements, some languages require certain substitutions to be made in order to display text correctly. 
- In Tamil, when the letter "TTA" (ட) letter is - followed by "U" (உ), the combination should appear - as the single glyph "டு". The sequence of Unicode - characters "டஉ" needs to be rendered as a single - glyph from the font - text shaping chooses the correct glyph - from the sequence of characters provided. + + + For example, in Tamil, when the letter "TTA" (ட, U+0B9F) + is followed by the vowel sign "U" (ு, U+0BC1), the pair + must be replaced by the single glyph "டு". The + two-codepoint sequence U+0B9F U+0BC1 needs to be + substituted with a single "டு" glyph from the + font. + + + But "டு" does not have a Unicode codepoint of its own. To + find this glyph, you need to consult the table inside + the font (the GSUB table) that contains + substitution information. In other words, text shaping + chooses the correct glyph for a sequence of characters + provided. - Similarly, each Arabic character has four different variants: - within a font, there will be glyphs for the initial, medial, - final, and isolated forms of each letter. Unicode only encodes - one codepoint per character, and so a Unicode string will not - tell you which glyph to use. Text shaping chooses the correct - form of the letter and returns the correct glyph from the font - that you need to render. + Similarly, each Arabic character has four different variants + corresponding to the different positions it might occupy + within a sequence. Inside a font, there will be separate + glyphs for the initial, medial, final, and isolated forms of + each letter, each at a different glyph ID. + + + Unicode only assigns one codepoint per character, so a + Unicode string will not tell you which glyph variant to use + for each character. To decide, you need to analyze the whole + string and determine the appropriate glyph for each character + based on its position. In other words, text + shaping chooses the correct form of the letter by its + position and returns the correct glyph from the font. 
- Other languages have marks and accents which need to be - rendered in certain positions around a base character. For - instance, the Moldovan language has the Cyrillic letter - "zhe" (ж) with a breve accent, like so: ӂ. Some - fonts will contain this character as an individual glyph, - whereas other fonts will not contain a zhe-with-breve glyph - but expect the rendering engine to form the character by - overlaying the two glyphs ж and ˘. Where you should draw the - combining breve depends on the height of the preceding glyph. - Again, for Arabic, the correct positioning of vowel marks - depends on the height of the character on which you are - placing the mark. Text shaping tells you whether you have a + Other languages involve marks and accents that need to be + rendered in specific positions relative to a base character. For + instance, the Moldovan language includes the Cyrillic letter + "zhe" (ж) with a breve accent, like so: "ӂ". + + + Some fonts will provide this character as a single + zhe-with-breve glyph, but other fonts will not and, instead, + will expect the rendering engine to form the character by + superimposing the separate "ж" and "˘" + glyphs. + + + But exactly where you should draw the breve depends on the + height and width of the preceding zhe glyph. To find the + right position, you need to consult the table inside + the font (the GPOS table) that contains + positioning information. + In other words, text shaping tells you whether you have a precomposed glyph within your font or if you need to compose a - glyph yourself out of combining marks, and if so, where to - position those marks. + glyph yourself out of combining marks—and, if so, where to + position those marks. - If this is something that you need to do, then you need a text - shaping engine: you could use Uniscribe if you are using Windows; - you could use CoreText on OS X; or you could use HarfBuzz. 
In the - rest of this manual, we are going to assume that you are the - implementor of a text layout engine. + If tasks like these are something that you need to do, then you need a text + shaping engine. You could use Uniscribe if you are writing + Windows software; you could use CoreText on macOS; or you could + use HarfBuzz. + + + In the rest of this manual, we are going to assume that you are the + implementor of a text-layout engine.
Why is it called HarfBuzz? - HarfBuzz began its life as text shaping code within the FreeType - project, (and you will see references to the FreeType authors - within the source code copyright declarations) but was then - abstracted out to its own project. This project is maintained by + HarfBuzz began its life as text-shaping code within the FreeType + project (and you will see references to the FreeType authors + within the source code copyright declarations), but was then + extracted into its own project. This project is maintained by Behdad Esfahbod, and named HarfBuzz. Originally, it was a shaping engine for OpenType fonts - "HarfBuzz" is the Persian for "open type".
- \ No newline at end of file +