Merge pull request #129 from simoncozens/docs

First two chapters. More to follow.
2015-08-31 09:53:16 +01:00 · 2015-08-31 09:53:16 +01:00 · c424b41705
parent 31594b98af 5470e744dd
commit c424b41705
6 changed files with 413 additions and 0 deletions
--- a/docs/usermanual-ch01.xml
+++ b/docs/usermanual-ch01.xml
@@ -0,0 +1,115 @@
 <sect1 id="what-is-harfbuzz">
  <title>What is Harfbuzz?</title>
  <para>
    Harfbuzz is a <emphasis>text shaping engine</emphasis>. It solves
    the problem of selecting and positioning glyphs from a font given a
    Unicode string.
  </para>
  <sect2 id="why-do-i-need-it">
    <title>Why do I need it?</title>
    <para>
      Text shaping is an integral part of preparing text for display. It
      is a fairly low level operation; Harfbuzz is used directly by
      graphic rendering libraries such as Pango, and the layout engines
      in Firefox, LibreOffice and Chromium. Unless you are
      <emphasis>writing</emphasis> one of these layout engines yourself,
      you will probably not need to use Harfbuzz - normally higher level
      libraries will turn text into glyphs for you.
    </para>
    <para>
      However, if you <emphasis>are</emphasis> writing a layout engine
      or graphics library yourself, you will need to perform text
      shaping, and this is where Harfbuzz can help you. Here are some
      reasons why you need it:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          OpenType fonts contain a set of glyphs, indexed by glyph ID.
          The glyph ID within the font does not necessarily relate to a
          Unicode codepoint. For instance, some fonts have the letter
          &quot;a&quot; as glyph ID 1. To pull the right glyph out of
          the font in order to display it, you need to consult a table
          within the font (the &quot;cmap&quot; table) which maps
          Unicode codepoints to glyph IDs. Text shaping turns codepoints
          into glyph IDs.
        </para>
      </listitem>
      <listitem>
        <para>
          Many OpenType fonts contain ligatures: combinations of
          characters which are rendered together. For instance, it's
          common for the <literal>fi</literal> combination to appear in
          print as the single ligature &quot;ﬁ&quot;. Whether you should
          render text as <literal>fi</literal> or &quot;ﬁ&quot; does not
          depend on the input text, but on the capabilities of the font
          and the level of ligature application you wish to perform.
          Text shaping involves querying the font's ligature tables and
          determining what substitutions should be made.
        </para>
      </listitem>
      <listitem>
        <para>
          While ligatures like &quot;ﬁ&quot; are typographic
          refinements, some languages <emphasis>require</emphasis> such
          substitutions to be made in order to display text correctly.
          In Tamil, when the letter &quot;TTA&quot; (ட) is followed by
          the vowel sign &quot;U&quot; (ு), the combination should appear
          as the single glyph &quot;டு&quot;. The sequence of Unicode
          characters &quot;ட&quot; and &quot;ு&quot; needs to be rendered
          as a single glyph from the font - text shaping chooses the
          correct glyph from the sequence of characters provided.
        </para>
      </listitem>
      <listitem>
        <para>
          Similarly, each Arabic letter can have up to four different
          variants: within a font, there may be glyphs for the initial,
          medial, final, and isolated forms of a letter. Unicode only encodes
          one codepoint per character, and so a Unicode string will not
          tell you which glyph to use. Text shaping chooses the correct
          form of the letter and returns the correct glyph from the font
          that you need to render.
        </para>
      </listitem>
      <listitem>
        <para>
          Other languages have marks and accents which need to be
          rendered in certain positions around a base character. For
          instance, the Moldovan language has the Cyrillic letter
          &quot;zhe&quot; (ж) with a breve accent, like so: ӂ. Some
          fonts will contain this character as an individual glyph,
          whereas other fonts will not contain a zhe-with-breve glyph
          but expect the rendering engine to form the character by
          overlaying the two glyphs ж and ˘. Where you should draw the
          combining breve depends on the height of the preceding glyph.
          Again, for Arabic, the correct positioning of vowel marks
          depends on the height of the character on which you are
          placing the mark. Text shaping tells you whether you have a
          precomposed glyph within your font or if you need to compose a
          glyph yourself out of combining marks, and if so, where to
          position those marks.
        </para>
      </listitem>
    </itemizedlist>
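    <para>
      To make the first two points concrete, here is a toy sketch of a
      cmap-style lookup from codepoints to glyph IDs, followed by a
      ligature substitution. This is not how Harfbuzz is implemented:
      the tables, the glyph IDs and the names
      <literal>toy_cmap</literal>, <literal>toy_liga</literal>,
      <literal>lookup_glyph</literal> and <literal>toy_shape</literal>
      are all invented for illustration, and a real font keeps this
      data in its binary OpenType tables.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Toy "cmap": maps Unicode codepoints to glyph IDs (IDs are invented). */
typedef struct { unsigned int codepoint, glyph_id; } cmap_entry;

static const cmap_entry toy_cmap[] = {
  { 'a', 1 }, { 'f', 2 }, { 'i', 3 }, { 'n', 4 },
};

/* Toy ligature table: a pair of glyphs is replaced by a single glyph. */
typedef struct { unsigned int first, second, ligature; } liga_entry;

static const liga_entry toy_liga[] = {
  { 2, 3, 5 },  /* f + i -> fi ligature (glyph ID 5, invented) */
};

/* Returns the glyph ID for a codepoint, or 0 if the font has no glyph. */
static unsigned int lookup_glyph(unsigned int cp) {
  for (size_t i = 0; i < sizeof toy_cmap / sizeof toy_cmap[0]; i++)
    if (toy_cmap[i].codepoint == cp)
      return toy_cmap[i].glyph_id;
  return 0;
}

/* Maps ASCII text to glyph IDs, then applies ligatures in place.
   Returns the number of glyphs written to glyphs[]. */
static size_t toy_shape(const char *text, unsigned int *glyphs, size_t max) {
  size_t n = 0;
  for (; *text && n < max; text++)
    glyphs[n++] = lookup_glyph((unsigned char)*text);

  size_t out = 0;
  for (size_t i = 0; i < n; i++) {
    unsigned int g = glyphs[i];
    for (size_t k = 0; i + 1 < n && k < sizeof toy_liga / sizeof toy_liga[0]; k++) {
      if (toy_liga[k].first == g && toy_liga[k].second == glyphs[i + 1]) {
        g = toy_liga[k].ligature;
        i++;  /* the second component has been consumed */
        break;
      }
    }
    glyphs[out++] = g;
  }
  return out;
}
```
]]></programlisting>
    <para>
      With these invented tables, passing the string
      <literal>fin</literal> to <literal>toy_shape</literal> produces
      two glyph IDs, 5 and 4: the ligature followed by the glyph for
      <literal>n</literal>.
    </para>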
    <para>
      If this is something that you need to do, then you need a text
      shaping engine: you could use Uniscribe if you are using Windows;
      you could use CoreText on OS X; or you could use Harfbuzz. In the
      rest of this manual, we are going to assume that you are the
      implementor of a text layout engine.
    </para>
  </sect2>
  <sect2 id="why-is-it-called-harfbuzz">
    <title>Why is it called Harfbuzz?</title>
    <para>
      Harfbuzz began its life as text shaping code within the FreeType
      project (and you will see references to the FreeType authors
      within the source code copyright declarations), but was then
      abstracted out into its own project. This project is maintained by
      Behdad Esfahbod, and named Harfbuzz. Originally, it was a shaping
      engine for OpenType fonts - &quot;Harfbuzz&quot; is the Persian
      for &quot;open type&quot;.
    </para>
  </sect2>
 </sect1>
--- a/docs/usermanual-ch02.xml
+++ b/docs/usermanual-ch02.xml
@@ -0,0 +1,182 @@
 <sect1 id="hello-harfbuzz">
  <title>Hello, Harfbuzz</title>
  <para>
    Here's the simplest Harfbuzz program that can possibly work. We will improve
    it later.
  </para>
  <orderedlist numeration="arabic">
    <listitem>
      <para>
        Create a buffer and put your text in it.
      </para>
    </listitem>
  </orderedlist>
  <programlisting language="C">
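  #include &lt;string.h&gt;
  #include &lt;hb.h&gt;

  /* text is assumed to be a NUL-terminated UTF-8 string. */
  hb_buffer_t *buf = hb_buffer_create();
  /* Add the whole string: start at offset 0, length strlen(text). */
  hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));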
 </programlisting>
  <orderedlist numeration="arabic">
    <listitem override="2">
      <para>
        Guess the script, language and direction of the buffer.
      </para>
    </listitem>
  </orderedlist>
  <programlisting language="C">
  hb_buffer_guess_segment_properties(buf);
 </programlisting>
  <orderedlist numeration="arabic">
    <listitem override="3">
      <para>
        Create a face and a font, using FreeType for now.
      </para>
    </listitem>
  </orderedlist>
  <programlisting language="C">
  #include &lt;hb-ft.h&gt;
  /* ft_library, font_path and index are assumed to be set up already. */
  FT_Face face;
  FT_New_Face(ft_library, font_path, index, &amp;face);
  hb_font_t *font = hb_ft_font_create(face, NULL);
 </programlisting>
  <orderedlist numeration="arabic">
    <listitem override="4">
      <para>
        Shape!
      </para>
    </listitem>
  </orderedlist>
  <programlisting language="C">
  hb_shape(font, buf, NULL, 0);
 </programlisting>
  <orderedlist numeration="arabic">
    <listitem override="5">
      <para>
        Get the glyph and position information.
      </para>
    </listitem>
  </orderedlist>
  <programlisting language="C">
  unsigned int glyph_count;
  hb_glyph_info_t *glyph_info    = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
  hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
 </programlisting>
  <orderedlist numeration="arabic">
    <listitem override="6">
      <para>
        Iterate over each glyph.
      </para>
    </listitem>
  </orderedlist>
  <programlisting language="C">
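  /* cursor_x, cursor_y and draw_glyph() belong to your own renderer. */
  for (unsigned int i = 0; i &lt; glyph_count; ++i) {
    /* After shaping, codepoint holds a glyph ID, not a Unicode codepoint. */
    hb_codepoint_t glyphid = glyph_info[i].codepoint;
    double x_offset = glyph_pos[i].x_offset / 64.0;
    double y_offset = glyph_pos[i].y_offset / 64.0;
    double x_advance = glyph_pos[i].x_advance / 64.0;
    double y_advance = glyph_pos[i].y_advance / 64.0;
    draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset);
    cursor_x += x_advance;
    cursor_y += y_advance;
  }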
 </programlisting>
  <orderedlist numeration="arabic">
    <listitem override="7">
      <para>
        Tidy up.
      </para>
    </listitem>
  </orderedlist>
  <programlisting language="C">
  hb_buffer_destroy(buf);
  hb_font_destroy(font);
 </programlisting>
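  <para>
    The division by 64 in step 6 suggests that the positions are held
    in 26.6 fixed point (units of 1/64), the format FreeType uses; we
    treat that as an assumption here. The following sketch isolates
    the pen-advancing arithmetic of that loop so that it can be run
    without a font. <literal>toy_position</literal>,
    <literal>toy_point</literal>, <literal>from_26_6</literal> and
    <literal>place_glyphs</literal> are hypothetical names standing in
    for the Harfbuzz position structure and your own drawing code.
  </para>
  <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the four position fields used in step 6.
   Values are assumed to be 26.6 fixed point (units of 1/64). */
typedef struct { int x_advance, y_advance, x_offset, y_offset; } toy_position;

typedef struct { double x, y; } toy_point;

/* Converts a 26.6 fixed-point value to a floating-point value. */
static double from_26_6(int v) {
  return v / 64.0;
}

/* Computes where each glyph should be drawn, starting from pen.
   The offset moves only the current glyph; the advance moves the pen.
   Returns the pen position after the last glyph. */
static toy_point place_glyphs(const toy_position *pos, size_t count,
                              toy_point pen, toy_point *origins) {
  for (size_t i = 0; i < count; i++) {
    origins[i].x = pen.x + from_26_6(pos[i].x_offset);
    origins[i].y = pen.y + from_26_6(pos[i].y_offset);
    pen.x += from_26_6(pos[i].x_advance);
    pen.y += from_26_6(pos[i].y_advance);
  }
  return pen;
}
```
]]></programlisting>
  <para>
    Note that a glyph with a zero advance and a non-zero offset (a
    combining mark, for example) is drawn relative to the pen without
    moving it.
  </para>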
  <sect2 id="what-harfbuzz-doesnt-do">
    <title>What Harfbuzz doesn't do</title>
    <para>
      The code above will take a UTF-8 string, shape it, and give you the
      information required to lay it out correctly on a single
      horizontal (or vertical) line using the font provided. That is the
      extent of Harfbuzz's responsibility.
    </para>
    <para>
      If you are implementing a text layout engine you may have other
      responsibilities that Harfbuzz will not help you with:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          Harfbuzz won't help you with bidirectionality. If you want to
          lay out text with mixed Hebrew and English, you will need to
          ensure that the buffer provided to Harfbuzz has those
          characters in the correct layout order. This will be different
          from the logical order in which the Unicode text is stored. In
          other words, the user will hit the keys in the following
          sequence:
        </para>
        <programlisting>
 A B C [space] ג ב א [space] D E F
        </programlisting>
        <para>
          but will expect to see in the output:
        </para>
        <programlisting>
 ABC אבג DEF
        </programlisting>
        <para>
          This reordering is called <emphasis>bidi processing</emphasis>
          (&quot;bidi&quot; is short for bidirectional), and there's an
          algorithm as an annex to the Unicode Standard which tells you how
          to reorder a string from logical order into presentation order.
          Before sending your string to Harfbuzz, you may need to apply the
          bidi algorithm to it. Libraries such as ICU and fribidi can do
          this for you.
        </para>
      </listitem>
      <listitem>
        <para>
          Harfbuzz won't help you with text that contains different font
          properties. For instance, if you have the string &quot;a
          <emphasis>huge</emphasis> breakfast&quot;, and you expect
          &quot;huge&quot; to be italic, you will need to send three
          strings to Harfbuzz: <literal>a</literal>, in your Roman font;
          <literal>huge</literal> using your italic font; and
          <literal>breakfast</literal> using your Roman font again.
          Similarly if you change font, font size, script, language or
          direction within your string, you will need to shape each run
          independently and then output them independently. Harfbuzz
          expects to shape a run of characters sharing the same
          properties.
        </para>
      </listitem>
      <listitem>
        <para>
          Harfbuzz won't help you with line breaking, hyphenation or
          justification. As mentioned above, it lays out the string
          along a <emphasis>single line</emphasis> of, notionally,
          infinite length. If you want to find out where the potential
          word, sentence and line break points are in your text, you
          could use the ICU library's break iterator functions.
        </para>
        <para>
          Harfbuzz can tell you how wide a shaped piece of text is, which is
          useful input to a justification algorithm, but it knows nothing
          about paragraphs, lines or line lengths. Nor will it adjust the
          space between words to fit them proportionally into a line. If you
          want to lay out text in paragraphs, you will probably want to send
          each word of your text to Harfbuzz to determine its shaped width
          after glyph substitutions, then work out how many words will fit
          on a line, and then finally output each word of the line separated
          by a space of the correct size to fully justify the paragraph.
        </para>
      </listitem>
    </itemizedlist>
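    <para>
      The paragraph-filling approach described in the last point can be
      sketched as follows. We assume that the width of each word has
      already been measured (for instance by shaping the word and
      summing its advances); the function names
      <literal>words_that_fit</literal> and
      <literal>justified_space</literal> are hypothetical. This is a
      deliberately simple greedy fit, not a complete justification
      algorithm.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Returns how many words, starting at index first, fit on a line of
   line_width when separated by a space of the given natural width.
   At least one word is always taken, even if it is too wide, so that
   the caller makes progress. */
static size_t words_that_fit(const double *widths, size_t count, size_t first,
                             double space, double line_width) {
  double used = 0.0;
  size_t n = 0;
  while (first + n < count) {
    double extra = widths[first + n] + (n > 0 ? space : 0.0);
    if (n > 0 && used + extra > line_width)
      break;
    used += extra;
    n++;
  }
  return n;
}

/* Returns the space width that stretches n words, starting at index
   first, to fill line_width exactly. With fewer than two words there
   is nothing to stretch, so the natural space is returned. */
static double justified_space(const double *widths, size_t first, size_t n,
                              double line_width, double space) {
  if (n < 2)
    return space;
  double total = 0.0;
  for (size_t i = 0; i < n; i++)
    total += widths[first + i];
  return (line_width - total) / (double)(n - 1);
}
```
]]></programlisting>
    <para>
      A caller would loop over the paragraph, asking
      <literal>words_that_fit</literal> for the next line and then
      <literal>justified_space</literal> for its spacing; the last line
      of a paragraph is normally left at the natural space width.
    </para>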
    <para>
      If you are a layout engine implementor, Harfbuzz will help you with
      the interface between your text and your font, and that's something
      that you'll need - what you then do with the glyphs that your font
      returns is up to you. The example we saw above is enough to get us
      started using Harfbuzz. Now we are going to use the remainder of
      Harfbuzz's API to refine that example and improve our text shaping
      capabilities.
    </para>
  </sect2>
 </sect1>
--- a/docs/usermanual-ch03.xml
+++ b/docs/usermanual-ch03.xml
@@ -0,0 +1,77 @@
 <sect1 id="buffers-language-script-and-direction">
  <title>Buffers, language, script and direction</title>
  <para>
    The input to Harfbuzz is a series of Unicode characters, stored in a
    buffer. In this chapter, we'll look at how to set up a buffer with
    the text that we want and then customize the properties of the
    buffer.
  </para>
  <sect2 id="creating-and-destroying-buffers">
    <title>Creating and destroying buffers</title>
    <para>
      As we saw in our initial example, a buffer is created and
      initialized with <literal>hb_buffer_create()</literal>. This
      produces a new, empty buffer object, instantiated with some
      default values and ready to accept your Unicode strings.
    </para>
    <para>
      Harfbuzz manages the memory of objects that it creates (such as
      buffers), so you don't have to. When you have finished working on
      a buffer, you can call <literal>hb_buffer_destroy()</literal>:
    </para>
    <programlisting language="C">
  hb_buffer_t *buffer = hb_buffer_create();
  ...
  hb_buffer_destroy(buffer);
 </programlisting>
    <para>
      This will destroy the object and free its associated memory -
      unless some other part of the program holds a reference to this
      buffer. If you acquire a Harfbuzz buffer from another subsystem
      and want to ensure that it is not freed when someone else
      destroys it, you should increase its reference count:
    </para>
    <programlisting language="C">
 void somefunc(hb_buffer_t *buffer) {
  buffer = hb_buffer_reference(buffer);
  ...
 </programlisting>
    <para>
      And then decrease it once you're done with it:
    </para>
    <programlisting language="C">
  hb_buffer_destroy(buffer);
 }
 </programlisting>
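    <para>
      If reference counting is new to you, the following toy object
      shows the behaviour described above: creating the object gives it
      one reference, each additional reference must be matched by a
      destroy, and the memory is only freed when the last reference
      goes away. This is a sketch of the general technique, not
      Harfbuzz's actual implementation; <literal>toy_object</literal>,
      <literal>toy_live_objects</literal> and the three functions are
      hypothetical names.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

/* Counts objects that have been created but not yet freed, so that
   the effect of each call can be observed. */
static int toy_live_objects = 0;

typedef struct { unsigned int ref_count; } toy_object;

/* Creates an object holding a single reference, owned by the caller. */
static toy_object *toy_create(void) {
  toy_object *obj = malloc(sizeof *obj);
  if (!obj)
    return NULL;
  obj->ref_count = 1;
  toy_live_objects++;
  return obj;
}

/* Takes an additional reference and returns the same object. */
static toy_object *toy_reference(toy_object *obj) {
  if (obj)
    obj->ref_count++;
  return obj;
}

/* Drops one reference; frees the object when none remain. */
static void toy_destroy(toy_object *obj) {
  if (!obj)
    return;
  if (--obj->ref_count == 0) {
    toy_live_objects--;
    free(obj);
  }
}
```
]]></programlisting>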
    <para>
      To throw away all the data in your buffer and start from scratch,
      call <literal>hb_buffer_reset(buffer)</literal>. If you want to
      throw away the string in the buffer but keep the options, you can
      instead call <literal>hb_buffer_clear_contents(buffer)</literal>.
    </para>
  </sect2>
  <sect2 id="adding-text-to-the-buffer">
    <title>Adding text to the buffer</title>
    <para>
      Now we have a brand new Harfbuzz buffer. Let's start filling it
      with text! From Harfbuzz's perspective, a buffer is just a stream
      of Unicode codepoints, but your input string is probably in one of
      the standard Unicode character encodings (UTF-8, UTF-16 or UTF-32).
    </para>
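    <para>
      To illustrate the difference between an encoded string and a
      stream of codepoints, here is a minimal UTF-8 decoder. It is a
      simplified sketch and an assumption about the kind of conversion
      involved, not Harfbuzz's own code: it replaces malformed input
      with U+FFFD but does not reject overlong forms or surrogates, and
      <literal>utf8_to_codepoints</literal> is a hypothetical name.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes len bytes of UTF-8 into at most max codepoints.
   Returns the number of codepoints written to out[].
   Malformed sequences are replaced with U+FFFD. */
static size_t utf8_to_codepoints(const char *text, size_t len,
                                 uint32_t *out, size_t max) {
  const unsigned char *s = (const unsigned char *)text;
  size_t i = 0, n = 0;
  while (i < len && n < max) {
    unsigned char b = s[i++];
    uint32_t cp;
    size_t extra;  /* number of continuation bytes expected */
    if (b < 0x80)                { cp = b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
    else                         { cp = 0xFFFD;   extra = 0; }
    while (extra > 0) {
      /* A missing or invalid continuation byte ends the sequence;
         the offending byte is left to be decoded on its own. */
      if (i >= len || (s[i] & 0xC0) != 0x80) { cp = 0xFFFD; break; }
      cp = (cp << 6) | (uint32_t)(s[i++] & 0x3F);
      extra--;
    }
    out[n++] = cp;
  }
  return n;
}
```
]]></programlisting>
    <para>
      For example, the three bytes E0 AE 9F decode to the single
      codepoint U+0B9F, the Tamil letter TTA.
    </para>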
  </sect2>
  <sect2 id="setting-buffer-properties">
    <title>Setting buffer properties</title>
    <para>
    </para>
  </sect2>
  <sect2 id="what-about-the-other-scripts">
    <title>What about the other scripts?</title>
    <para>
    </para>
  </sect2>
  <sect2 id="customizing-unicode-functions">
    <title>Customizing Unicode functions</title>
    <para>
    </para>
  </sect2>
 </sect1>
--- a/docs/usermanual-ch04.xml
+++ b/docs/usermanual-ch04.xml
@@ -0,0 +1,18 @@
 <sect1 id="fonts-and-faces">
  <title>Fonts and faces</title>
  <sect2 id="using-freetype">
    <title>Using FreeType</title>
    <para>
    </para>
  </sect2>
  <sect2 id="using-harfbuzzs-native-opentype-implementation">
    <title>Using Harfbuzz's native OpenType implementation</title>
    <para>
    </para>
  </sect2>
  <sect2 id="using-your-own-font-functions">
    <title>Using your own font functions</title>
    <para>
    </para>
  </sect2>
 </sect1>
--- a/docs/usermanual-ch05.xml
+++ b/docs/usermanual-ch05.xml
@@ -0,0 +1,13 @@
 <sect1 id="shaping-and-shape-plans">
  <title>Shaping and shape plans</title>
  <sect2 id="opentype-features">
    <title>OpenType features</title>
    <para>
    </para>
  </sect2>
  <sect2 id="plans-and-caching">
    <title>Plans and caching</title>
    <para>
    </para>
  </sect2>
 </sect1>
--- a/docs/usermanual-ch06.xml
+++ b/docs/usermanual-ch06.xml
@@ -0,0 +1,8 @@
 <sect1 id="glyph-information">
  <title>Glyph information</title>
  <sect2 id="names-and-numbers">
    <title>Names and numbers</title>
    <para>
    </para>
  </sect2>
 </sect1>