<chapter id="what-is-harfbuzz">
  <title>What is HarfBuzz?</title>
  <para>
    HarfBuzz is a <emphasis>text shaping engine</emphasis>. If you
    give HarfBuzz a font and a string containing a sequence of Unicode
    codepoints, HarfBuzz selects and positions the corresponding
    glyphs from the font, applying all of the necessary layout rules
    and font features. HarfBuzz then returns the string to you in a
    form that is correctly arranged for the language and writing
    system.
  </para>
  <para>
    HarfBuzz can properly shape all of the world's major writing
    systems. It runs on virtually all operating systems and software
    platforms, and it supports all of the standard font formats in use
    today.
  </para>
  <section id="why-do-i-need-a-shaping-engine">
    <title>Why do I need a shaping engine?</title>
    <para>
      Text shaping is an integral part of preparing text for
      display. Before a Unicode sequence can be rendered, the
      codepoints in the sequence must be mapped to the glyphs
      provided in the font, and the glyphs must be positioned
      correctly relative to each other. For many of the scripts
      supported in Unicode, these steps involve script-specific layout
      rules.
    </para>
    <para>
      Text shaping is a fairly low-level operation. HarfBuzz is
      used directly by graphic rendering libraries such as Pango, as
      well as by the layout engines in Firefox, LibreOffice, and
      Chromium. Unless you are <emphasis>writing</emphasis> one of
      these layout engines yourself, you will probably not need to use
      HarfBuzz: normally, lower-level libraries will turn text into
      glyphs for you.
    </para>
    <para>
      However, if you <emphasis>are</emphasis> writing a layout engine
      or graphics library yourself, you will need to perform text
      shaping, and this is where HarfBuzz can help you.
    </para>
    <para>
      Here are some specific scenarios where a text-shaping engine
      like HarfBuzz helps you:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          OpenType fonts contain a set of glyphs (that is, shapes
	  to represent the letters, numbers, punctuation marks, and
	  all other symbols), which are indexed by a <literal>glyph ID</literal>.
	</para>
	<para>
          The glyph ID within the font does not necessarily correlate
	  to a predictable Unicode codepoint. For instance, some fonts
	  have the letter &quot;a&quot; as glyph ID 1, but many others do
	  not. To pull the right glyph out of the font in order to
	  display &quot;a&quot;, you need to consult the table inside
	  the font (the <literal>cmap</literal> table) that maps Unicode
	  codepoints to glyph IDs. In other words, <emphasis>text shaping turns
	  codepoints into glyph IDs</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Many OpenType fonts contain ligatures: combinations of
          characters that are rendered as a single unit. For instance,
	  it is common for the <literal>fi</literal> letter
	  combination to appear in print as the single ligature glyph
	  &quot;ﬁ&quot;.
	</para>
	<para>
	  Whether you should render an &quot;f, i&quot; sequence
	  as <literal>fi</literal> or as &quot;ﬁ&quot; does not
          depend on the input text. Rather, it depends on whether
	  or not the font includes an &quot;ﬁ&quot; glyph and on the
	  level of ligature application you wish to perform. The font
	  and the amount of ligature application used are under your
	  control. In other words, <emphasis>text shaping involves
	  querying the font's ligature tables and determining what
	  substitutions should be made</emphasis>. 
        </para>
      </listitem>
      <listitem>
        <para>
          While ligatures like &quot;ﬁ&quot; are optional typographic
          refinements, some languages <emphasis>require</emphasis> certain
          substitutions to be made in order to display text correctly.
        </para>
	<para>
	  For example, in Tamil, when the letter &quot;TTA&quot; (ட)
	  is followed by the vowel sign &quot;U&quot; (◌ு), the pair
	  must be replaced by the single glyph &quot;டு&quot;. The
	  two-codepoint sequence U+0B9F, U+0BC1 needs to be
	  substituted with a single &quot;டு&quot; glyph from the
	  font.
	</para>
	<para>
	  But &quot;டு&quot; does not have a Unicode codepoint. To
	  find this glyph, you need to consult the table inside 
	  the font (the <literal>GSUB</literal> table) that contains
	  substitution information. In other words, <emphasis>text shaping 
	  chooses the correct glyph for a sequence of characters
	  provided</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Similarly, an Arabic letter can have up to four different variants
	  corresponding to the different positions in which it might appear
	  within a sequence. Inside a font, there will be separate
	  glyphs for the initial, medial, final, and isolated forms of
	  such a letter, each at a different glyph ID.
	</para>
	<para>
	  Unicode only assigns one codepoint per character, so a
	  Unicode string will not tell you which glyph variant to use
	  for each character. To decide, you need to analyze the whole
	  string and determine the appropriate glyph for each character
	  based on its position. In other words, <emphasis>text
	  shaping chooses the correct form of the letter by its
	  position and returns the correct glyph from the font</emphasis>.
        </para>
      </listitem>
      <listitem>
        <para>
          Other languages involve marks and accents that need to be
          rendered in specific positions relative to a base character. For
          instance, the Moldovan language includes the Cyrillic letter
          &quot;zhe&quot; (ж) with a breve accent, like so: &quot;ӂ&quot;.
	</para>
	<para>
	  Some fonts will provide this character as a single
	  zhe-with-breve glyph, but other fonts will not and, instead,
	  will expect the rendering engine to form the character by 
          superimposing the separate &quot;ж&quot; and &quot;˘&quot;
	  glyphs.
	</para>
	<para>
	  But exactly where you should draw the breve depends on the
	  height and width of the preceding zhe glyph. To find the
	  right position, you need to consult the table inside
	  the font (the <literal>GPOS</literal> table) that contains
	  positioning information.
          In other words, <emphasis>text shaping tells you whether you have a
          precomposed glyph within your font or if you need to compose a
          glyph yourself out of combining marks&mdash;and, if so, where to
          position those marks.</emphasis>
        </para>
      </listitem>
    </itemizedlist>
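    <para>
      The scenarios above can be illustrated with a deliberately tiny
      model. The following Python sketch is not HarfBuzz code and does
      not read a real font: the tables <literal>CMAP</literal>,
      <literal>LIGATURES</literal>, <literal>ADVANCES</literal>, and
      <literal>MARK_ANCHORS</literal>, every glyph ID and offset in
      them, and all function names are assumptions invented for
      illustration. It only mirrors the roles that the
      <literal>cmap</literal>, <literal>GSUB</literal>, and
      <literal>GPOS</literal> tables play. Contextual forms such as
      Arabic joining are left out.
    </para>
    <programlisting language="python"><![CDATA[
```python
"""Toy model of three shaping steps. Not HarfBuzz, and not a real font."""

# Hypothetical font data: all glyph IDs, advances, and offsets are made up.
CMAP = {ord("f"): 10, ord("i"): 11, ord("a"): 12,
        0x0436: 20,   # CYRILLIC SMALL LETTER ZHE
        0x0306: 21}   # COMBINING BREVE
LIGATURES = {(10, 11): 30}             # GSUB-like: glyphs (f, i) -> one ligature glyph
ADVANCES = {10: 300, 11: 250, 12: 500, 20: 700, 21: 0, 30: 520}
MARK_ANCHORS = {(20, 21): (350, 150)}  # GPOS-like: (base, mark) -> mark offset from base origin
NOTDEF = 0                             # glyph used when the font lacks a codepoint


def map_codepoints(text):
    """Step 1: turn codepoints into glyph IDs through the cmap-like table."""
    return [CMAP.get(ord(ch), NOTDEF) for ch in text]


def apply_ligatures(glyphs, enabled=True):
    """Step 2: replace adjacent glyph pairs with a ligature glyph, if enabled."""
    if not enabled:
        return list(glyphs)
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def position(glyphs):
    """Step 3: return (glyph_id, x, y); marks attach to the preceding base glyph."""
    placed = []
    pen_x = 0
    base = None  # (glyph_id, x) of the most recent non-mark glyph
    for g in glyphs:
        if base is not None and (base[0], g) in MARK_ANCHORS:
            dx, dy = MARK_ANCHORS[(base[0], g)]
            placed.append((g, base[1] + dx, dy))
        else:
            placed.append((g, pen_x, 0))
            base = (g, pen_x)
        pen_x += ADVANCES.get(g, 0)
    return placed


def shape(text, ligatures=True):
    """Run the three steps in order and return the positioned glyphs."""
    return position(apply_ligatures(map_codepoints(text), ligatures))
```
]]></programlisting>
    <para>
      Under these made-up tables, <literal>shape("fi")</literal>
      yields a single ligature glyph, while
      <literal>shape("fi", ligatures=False)</literal> yields two
      separate glyphs. The same input text produces different output
      depending on the font data and on the level of ligature
      application requested.
    </para>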
    <para>
      If tasks like these are something that you need to do, then you need a text
      shaping engine. You could use Uniscribe if you are writing
      Windows software; you could use CoreText on macOS; or you could
      use HarfBuzz.
    </para>
    <para>
      In the rest of this manual, we are going to assume that you are the
      implementor of a text-layout engine.
    </para>
  </section>
  <section id="why-is-it-called-harfbuzz">
    <title>Why is it called HarfBuzz?</title>
    <para>
      HarfBuzz began its life as text-shaping code within the FreeType
      project (and you will see references to the FreeType authors
      within the source code copyright declarations), but was then
      extracted into its own project. This project is maintained by
      Behdad Esfahbod and is named HarfBuzz. Originally, it was a shaping
      engine for OpenType fonts: &quot;HarfBuzz&quot; is Persian
      for &quot;open type&quot;.
    </para>
  </section>
</chapter>