Usermanual: small updates.

2018-11-28 13:48:38 -06:00 · 2018-11-28 13:48:38 -06:00 · ed13caddf2
parent 26c5b54fb0
commit ed13caddf2
5 changed files with 315 additions and 78 deletions
--- a/docs/usermanual-buffers-language-script-and-direction.xml
+++ b/docs/usermanual-buffers-language-script-and-direction.xml
@ -15,14 +15,15 @@
  <section id="creating-and-destroying-buffers">
    <title>Creating and destroying buffers</title>
    <para>
-      As we saw in our initial example, a buffer is created and
+      As we saw in our <emphasis>Getting Started</emphasis> example, a
      buffer is created and 
      initialized with <literal>hb_buffer_create()</literal>. This
      produces a new, empty buffer object, instantiated with some
      default values and ready to accept your Unicode strings.
    </para>
    <para>
-      HarfBuzz manages the memory of objects that it creates (such as
+      HarfBuzz manages the memory of objects (such as buffers) that it
-      buffers), so you don't have to. When you have finished working on
+      creates, so you don't have to. When you have finished working on 
      a buffer, you can call <literal>hb_buffer_destroy()</literal>:
    </para>
    <programlisting language="C">
--- a/docs/usermanual-clusters.xml
+++ b/docs/usermanual-clusters.xml
@ -6,25 +6,41 @@
 ]>
 <chapter id="clusters">
  <title>Clusters</title>
-  <section id="clusters">
+  <section id="clusters-and-shaping">
-    <title>Clusters</title>
+    <title>Clusters and shaping</title>
    <para>
      In text shaping, a <emphasis>cluster</emphasis> is a sequence of
      characters that needs to be treated as a single, indivisible
-      unit.
+      unit. A single letter or symbol can be a cluster of its
      own. Other clusters correspond to longer subsequences of the
      input code points &mdash; such as a ligature or conjunct form
      &mdash; and require the shaper to ensure that the cluster is not
      broken during the shaping process.
    </para>
    <para>
      A cluster is distinct from a <emphasis>grapheme</emphasis>,
-      which is the smallest unit of a writing system or script,
+      which is the smallest unit of meaning in a writing system or
-      because clusters are only relevant for script shaping and the
+      script.
-      layout of glyphs.
    </para>
    <para>
-      For example, a grapheme may be a letter, a number, a logogram,
+      The definitions of the two terms are similar. However, clusters
-      or a symbol. When two letters form a ligature, however, they
+      are only relevant for script shaping and glyph layout. In
-      combine into a single glyph. They are therefore part of the same
+      contrast, graphemes are a property of the underlying script, and
-      cluster and are treated as a unit &mdash; even though the two
+      are of interest when client programs implement orthographic 
-      original, underlying letters are separate graphemes.
+      or linguistic functionality.
    </para>
    <para>
      For example, two individual letters are often two separate
      graphemes. When two letters form a ligature, however, they
      combine into a single glyph. They are then part of the same
      cluster and are treated as a unit by the shaping engine &mdash;
      even though the two original, underlying letters remain separate
      graphemes.
    </para>
    <para>
      HarfBuzz is concerned with clusters, <emphasis>not</emphasis>
      with graphemes &mdash; although client programs using HarfBuzz
      may still care about graphemes for other reasons from time to time.
    </para>
    <para>
      During the shaping process, there are several shaping operations
@ -32,14 +48,15 @@
      points form a ligature or a conjunct form and are replaced by a
      single glyph) or split one character into several (for example,
      when decomposing a code point through the
-      <literal>ccmp</literal> feature).
+      <literal>ccmp</literal> feature). Operations like these alter
      clusters; HarfBuzz tracks the changes to ensure that no clusters
      get lost or broken during shaping. 
    </para>
    <para>
-      HarfBuzz tracks clusters independently from how these
+      HarfBuzz records cluster information independently from how
-      shaping operations affect the individual glyphs that comprise the
+      shaping operations affect the individual glyphs returned in an
-      output HarfBuzz returns in a buffer. Consequently,
+      output buffer. Consequently, a client program using HarfBuzz can
-      a client program using HarfBuzz can utilize the cluster
+      utilize the cluster information to implement features such as:
-      information to implement features such as:
    </para>
    <itemizedlist>
      <listitem>
@ -77,11 +94,14 @@
 	<para>
 	  Performing line-breaking, justification, and other
 	  line-level or paragraph-level operations that must be done
-	  after shaping is complete, but which require character-level
+	  after shaping is complete, but which require examining
-	  properties.
+	  character-level properties.
 	</para>
      </listitem>
    </itemizedlist>
  </section>
  <section id="working-with-harfbuzz-clusters">
    <title>Working with HarfBuzz clusters</title>
    <para>
      When you add text to a HarfBuzz buffer, each code point must be
      assigned a <emphasis>cluster value</emphasis>.
@ -94,7 +114,65 @@
      value does not matter.
    </para>
    <para>
-      Client programs can choose how HarfBuzz handles clusters during
+      Some of the shaping operations performed by HarfBuzz &mdash;
      such as reordering, composition, decomposition, and substitution
      &mdash; may alter the cluster values of some characters. The
      final cluster values in the buffer at the end of the shaping
      process will indicate to client programs which subsequences of
      glyphs represent a cluster and, therefore, must not be
      separated.
    </para>
    <para>
      In addition, client programs can query the final cluster values
      to discern other potentially important information about the
      glyphs in the output buffer (such as whether or not a ligature
      was formed).
    </para>
    <para>
      For example, if the initial sequence of cluster values was:
    </para>
    <programlisting>
      0,1,2,3,4
    </programlisting>
    <para>
      and the final sequence of cluster values is:
    </para>
    <programlisting>
      0,0,3,3
    </programlisting>
    <para>
      then there are two clusters in the output buffer: the first
      cluster includes the first two glyphs, and the second cluster
      includes the third and fourth glyphs. It is also evident that a
      ligature or conjunct has been formed, because there are fewer
      glyphs in the output buffer (four) than there were code points
      in the input buffer (five).
    </para>
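A minimal sketch of how a client might read a final cluster sequence such as 0,0,3,3. It works on a plain array of cluster values rather than on a real HarfBuzz buffer, and the helper names `count_clusters` and `glyphs_were_combined` are hypothetical, not part of the HarfBuzz API.

```c
#include <assert.h>
#include <stddef.h>

/* Count the clusters in a sequence of final cluster values.
 * Adjacent glyphs that share a cluster value belong to one cluster.
 * (Hypothetical helper, not a HarfBuzz function.) */
static size_t count_clusters(const unsigned int *clusters, size_t n)
{
    assert(clusters != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || clusters[i] != clusters[i - 1])
            count++;
    }
    return count;
}

/* Fewer output glyphs than input code points means that some
 * code points were combined (for example, into a ligature). */
static int glyphs_were_combined(size_t n_code_points, size_t n_glyphs)
{
    return n_glyphs < n_code_points;
}
```

For the sequence in the text, `count_clusters` applied to the four values 0,0,3,3 reports two clusters, and `glyphs_were_combined(5, 4)` is true.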
    <para>
      Although client programs using HarfBuzz are free to assign
      initial cluster values in any manner they choose to, HarfBuzz
      does offer some useful guarantees if the cluster values are
      assigned in a monotonic (either non-decreasing or non-increasing)
      order.
    </para>
    <para>
      For left-to-right scripts (LTR) and top-to-bottom scripts (TTB),
      HarfBuzz will preserve the monotonic property: client programs
      are guaranteed that monotonically increasing initial cluster
      values will be returned as monotonically increasing final
      cluster values.
    </para>
    <para>
      For right-to-left scripts (RTL) and bottom-to-top scripts (BTT),
      the directionality of the buffer itself is reversed for final
      output as a matter of design. Therefore, HarfBuzz inverts the
      monotonic property: client programs are guaranteed that
      monotonically increasing initial cluster values will be
      returned as monotonically <emphasis>decreasing</emphasis> final
      cluster values.
    </para>
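The monotonic guarantee can be checked on the client side with a few lines of C. This is only a sketch over plain arrays of cluster values; the function names are hypothetical and nothing here calls into HarfBuzz.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the values never decrease from left to right. */
static int is_non_decreasing(const unsigned int *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (v[i] < v[i - 1])
            return 0;
    return 1;
}

/* Returns 1 if the values never increase from left to right. */
static int is_non_increasing(const unsigned int *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (v[i] > v[i - 1])
            return 0;
    return 1;
}

/* Monotonic in the sense used above: one direction or the other. */
static int is_monotonic(const unsigned int *v, size_t n)
{
    return is_non_decreasing(v, n) || is_non_increasing(v, n);
}
```

With increasing initial values, a client would expect `is_non_decreasing` to hold for LTR and TTB output and `is_non_increasing` to hold for RTL and BTT output.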
    <para>
      Client programs can adjust how HarfBuzz handles clusters during
      shaping by setting the
      <literal>cluster_level</literal> of the
      buffer. HarfBuzz offers three <emphasis>levels</emphasis> of
@ -179,7 +257,7 @@
      assign initial cluster values in a buffer by reusing the indices
      of the code points in the input text. This gives a sequence of
      cluster values that is monotonically increasing (for example,
-      0,1,2,3,4,5). 
+      0,1,2,3,4). 
    </para>
    <para>
      It is not <emphasis>required</emphasis> that the cluster values
@ -233,16 +311,44 @@
 	</para>
      </listitem>
    </itemizedlist>
  </section>
  <section id="a-clustering-example-for-levels-0-and-1">
    <title>A clustering example for levels 0 and 1</title>
    <para>
-      The guarantees and benefits of level 0 and level 1 can be seen
+      The basic shaping operations affect clusters in a predictable
-      with some examples. First, let us examine what happens with cluster
+      manner when using level 0 or level 1: 
-      values when shaping involves cluster merging with ligatures and
-      decomposition.
    </para>
    <itemizedlist>
      <listitem>
 	<para>
 	  When two or more clusters <emphasis>merge</emphasis>, the
 	  resulting merged cluster takes as its cluster value the
 	  <emphasis>minimum</emphasis> of the incoming cluster values.
 	</para>
      </listitem>
      <listitem>
 	<para>
 	  When a cluster <emphasis>decomposes</emphasis>, all of the
 	  resulting child clusters inherit as their cluster value the
 	  cluster value of the parent cluster.
 	</para>
      </listitem>
      <listitem>
 	<para>
 	  When a character is <emphasis>reordered</emphasis>, the
 	  reordered character and all clusters that the character
 	  moves past as part of the reordering are merged into one cluster.
 	</para>
      </listitem>
    </itemizedlist>
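The merge and decompose rules above can be modelled on a plain array of cluster values. The sketch below is illustrative only: `merge_clusters` and `decompose` are hypothetical helpers written for this example, not HarfBuzz functions, and the model assumes that glyphs sharing a cluster value are adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge every cluster touched by glyph indices [start, end).
 * All affected glyphs receive the minimum cluster value. */
static void merge_clusters(unsigned int *clusters, size_t n,
                           size_t start, size_t end)
{
    assert(start < end && end <= n);

    /* Widen the range so that it covers whole clusters. */
    while (start > 0 && clusters[start - 1] == clusters[start])
        start--;
    while (end < n && clusters[end] == clusters[end - 1])
        end++;

    unsigned int min = clusters[start];
    for (size_t i = start + 1; i < end; i++)
        if (clusters[i] < min)
            min = clusters[i];
    for (size_t i = start; i < end; i++)
        clusters[i] = min;
}

/* Split the glyph at index pos into `parts` glyphs. Each part
 * inherits the parent's cluster value. Returns the new length. */
static size_t decompose(unsigned int *clusters, size_t n, size_t capacity,
                        size_t pos, size_t parts)
{
    assert(pos < n && parts >= 1 && n + parts - 1 <= capacity);
    unsigned int parent = clusters[pos];

    /* Shift the tail right to make room for the extra parts. */
    memmove(&clusters[pos + parts], &clusters[pos + 1],
            (n - pos - 1) * sizeof *clusters);
    for (size_t i = 0; i < parts; i++)
        clusters[pos + i] = parent;
    return n + parts - 1;
}
```

Because the range is widened to whole clusters before the minimum is applied, a merge that touches one glyph of a cluster pulls in the rest of that cluster as well.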
    <para>
      The functionality, guarantees, and benefits of level 0 and level
      1 behavior can be seen with some examples. First, let us examine
      what happens with cluster values when shaping involves cluster
      merging with ligatures and decomposition.
    </para>
    <para>
      Let's say we start with the following character sequence (top row) and
      initial cluster values (bottom row):
@ -279,8 +385,8 @@
    <para>
      Next, let us say that the <literal>BC</literal> ligature glyph
      decomposes into three components, and <literal>D</literal> also
-      decomposes into two components. These components each inherit the
+      decomposes into two components. Whenever a cluster decomposes,
-      cluster value of their parent: 
+      its components each inherit the cluster value of their parent: 
    </para>
    <programlisting>
      A,BC0,BC1,BC2,D0,D1,E
@ -295,6 +401,12 @@
      A,BC0,BC1,BC2D0,D1,E
      0,1  ,1  ,1    ,1 ,4
    </programlisting>
    <para>
      Note that the entirety of cluster 3 merges into cluster 1, not
      just the <literal>D0</literal> glyph. This reflects the fact
      that the cluster <emphasis>must</emphasis> be treated as an
      indivisible unit.
    </para>
    <para>
      At this point, cluster 1 means: the character sequence
      <literal>BCD</literal> is represented by glyphs
@ -319,18 +431,24 @@
      0,1,2,3,4
    </programlisting>
    <para>
-      If <literal>D</literal> is reordered to before <literal>B</literal>,
+      If <literal>D</literal> is reordered to the position immediately
-      then HarfBuzz merges the <literal>B</literal>,
+      before <literal>B</literal>, then HarfBuzz merges the
-      <literal>C</literal>, and <literal>D</literal> clusters, and we
+      <literal>B</literal>, <literal>C</literal>, and
-      get:
+      <literal>D</literal> clusters &mdash; all the clusters between
      the final position of the reordered glyph and its original
      position. This means that we get:
    </para>
    <programlisting>
      A,D,B,C,E
      0,1,1,1,4
    </programlisting>
    <para>
-      This is clearly not ideal, but it is the only sensible way to
+      as the final cluster sequence.
-      maintain a monotonic sequence of cluster values and retain the
+    </para>
    <para>
      Merging this many clusters is not ideal, but it is the only
      sensible way for HarfBuzz to maintain the guarantee that the
      sequence of cluster values remains monotonic and to retain the
      true relationship between glyphs and characters.
    </para>
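The reordering case can be sketched in the same spirit. The `glyph_t` type and `reorder_before` function below are hypothetical stand-ins for illustration; they model the rule described above on a small array and make no claim about how HarfBuzz implements it internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy glyph record: a one-letter name and a cluster value. */
typedef struct {
    char name;
    unsigned int cluster;
} glyph_t;

/* Move g[from] to index `to` (to < from), then merge its cluster with
 * every cluster it moved past, using the minimum cluster value. */
static void reorder_before(glyph_t *g, size_t n, size_t from, size_t to)
{
    assert(to < from && from < n);

    /* Find the span of whole clusters affected by the move. */
    size_t start = to, end = from + 1;
    while (start > 0 && g[start - 1].cluster == g[start].cluster)
        start--;
    while (end < n && g[end].cluster == g[end - 1].cluster)
        end++;

    unsigned int min = g[start].cluster;
    for (size_t i = start + 1; i < end; i++)
        if (g[i].cluster < min)
            min = g[i].cluster;

    /* Perform the move, shifting the skipped glyphs one slot right. */
    glyph_t moved = g[from];
    memmove(&g[to + 1], &g[to], (from - to) * sizeof *g);
    g[to] = moved;

    for (size_t i = start; i < end; i++)
        g[i].cluster = min;
}
```

Applied to A,B,C,D,E with cluster values 0,1,2,3,4, moving D (index 3) to index 1 gives the glyph order A,D,B,C,E with cluster values 0,1,1,1,4, which stays monotonic.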
  </section>
@ -340,8 +458,9 @@
      The preceding examples demonstrate the main effects of using
      cluster levels 0 and 1. The only difference between the two
      levels is this: in level 0, at the very beginning of the shaping
-      process, HarfBuzz also merges clusters between any base character
+      process, HarfBuzz merges the cluster of each base character
-      and all Unicode marks (combining or not) that follow it.
+      with the clusters of all Unicode marks (combining or not) and
      modifiers that follow it.
    </para>
    <para>
      For example, let us start with the following character sequence
@ -361,6 +480,10 @@
      A,acute,B
      0,0    ,2
    </programlisting>
    <para>
      This merger is performed before any other script-shaping
      steps.
    </para>
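A rough model of the level 0 initial merge, under two stated assumptions: the caller supplies an `is_mark` flag per code point (HarfBuzz itself decides this from Unicode properties), and the initial cluster values are increasing, so the base's value is the minimum. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Give each mark the cluster value of the glyph before it, so that a
 * base and all marks that follow it end up in one cluster.
 * Assumes increasing initial cluster values. */
static void merge_marks_with_bases(unsigned int *clusters,
                                   const int *is_mark, size_t n)
{
    assert((clusters != NULL && is_mark != NULL) || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (is_mark[i])
            clusters[i] = clusters[i - 1];
    }
}
```

For A,acute,B with initial values 0,1,2 and only the acute flagged as a mark, the result is 0,0,2, matching the sequence shown above. Skipping this step corresponds to the level 1 behavior.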
    <para>
      This initial cluster merging is the default behavior of the
      Windows shaping engine, and the old HarfBuzz codebase copied
@ -368,9 +491,10 @@
      remained the default behavior in the new HarfBuzz codebase.
    </para>
    <para>
-      But this initial cluster-merging behavior makes it impossible to
+      But this initial cluster-merging behavior makes it impossible for
      client programs to implement some features (such as to
      color diacritic marks differently from their base
-      characters. That is why, in level 1, HarfBuzz does not perform
+      characters). That is why, in level 1, HarfBuzz does not perform
      the initial merging step.
    </para>
    <para>
@ -378,29 +502,34 @@
      perform cursor positioning, level 0 is more convenient. But
      relying on cluster boundaries for cursor positioning is wrong: cursor
      positions should be determined based on Unicode grapheme
-      boundaries, not on shaping-cluster boundaries. As such, level 1
+      boundaries, not on shaping-cluster boundaries. As such, using
-      clusters are preferred. 
+      level 1 clustering behavior is recommended. 
    </para>
    <para>
-      One last note about levels 0 and 1. HarfBuzz currently does not allow a
+      One final facet of levels 0 and 1 is worth noting. HarfBuzz
-      <literal>MultipleSubst</literal> lookup to replace a glyph with zero
+      currently does not allow any
-      glyphs (in other words, to delete a glyph). But, in some other situations,
+      <emphasis>multiple-substitution</emphasis> GSUB lookups to 
-      glyphs can be deleted. In those cases, if the glyph being deleted is
+      replace a glyph with zero glyphs (in other words, to delete a
-      the last glyph of its cluster, HarfBuzz makes sure to merge the cluster
+      glyph).
-      with a neighboring cluster.
+    </para>
    <para>
      But, in some other situations, glyphs can be deleted. In
      those cases, if the glyph being deleted is the last glyph of its
      cluster, HarfBuzz makes sure to merge the deleted glyph's
      cluster with a neighboring cluster.
    </para>
    <para>
      This is done primarily to make sure that the starting cluster of the
      text always has the cluster index pointing to the start of the text
-      for the run; more than one client currently relies on this
+      for the run; more than one client program currently relies on this
      guarantee.
    </para>
    <para>
-      Incidentally, Apple's CoreText does something else to maintain the
+      Incidentally, Apple's CoreText does something different to
-      same promise: it inserts a glyph with id 65535 at the beginning of
+      maintain the same promise: it inserts a glyph with id 65535 at
-      the glyph string if the glyph corresponding to the first character
+      the beginning of the glyph string if the glyph corresponding to
-      in the run was deleted. HarfBuzz might do something similar in the
+      the first character in the run was deleted. HarfBuzz might do
-      future.
+      something similar in the future.
    </para>
  </section>
  <section id="level-2">
@ -415,16 +544,39 @@
      performs no merging of clusters whatsoever.
    </para>
    <para>
-      When glyphs form a ligature (or when some other feature
+      This means that there is no initial base-and-mark merging step
-      substitutes multiple glyphs with one glyph), the cluster value
+      (as is done in level 0), and it means that reordering moves and
-      of the first glyph is retained as the cluster value for the
+      ligature substitutions do not trigger a cluster merge.
-      ligature. However, no subsequent clusters &mdash; including
-      marks and modifiers &mdash; are affected.
    </para>
    <para>
-      Level 2 cluster behavior is less complex than level 0 or level
+      Only one shaping operation directly affects clusters when using
-      1, but there are a few cases in which processing cluster values
+      level 2:
-      produced at level 2 may be tricky. 
+    </para>
    <itemizedlist>
      <listitem>
 	<para>
 	  When a cluster <emphasis>decomposes</emphasis>, all of the
 	  resulting child clusters inherit as their cluster value the
 	  cluster value of the parent cluster.
 	</para>
      </listitem>
    </itemizedlist>
    <para>
      When glyphs do form a ligature (or when some other feature
      substitutes multiple glyphs with one glyph) the cluster value
      of the first glyph is retained as the cluster value for the
      resulting ligature.
    </para>
    <para>
      This occurrence sounds similar to a cluster merge, but it is
      different. In particular, no subsequent characters &mdash;
      including marks and modifiers &mdash; are affected. They retain
      their previous cluster values. 
    </para>
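To make the contrast with a cluster merge concrete, here is a small sketch of the level 2 ligature rule on a plain array of cluster values. `ligate_level2` is a hypothetical name used for illustration, not a HarfBuzz function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace the glyphs at indices [start, end) with one ligature glyph.
 * The ligature keeps the cluster value of the first component; every
 * other glyph keeps the value it already had. Returns the new length. */
static size_t ligate_level2(unsigned int *clusters, size_t n,
                            size_t start, size_t end)
{
    assert(start < end && end <= n);

    /* clusters[start] already holds the first component's value,
     * so only the tail needs to move left over the other components. */
    memmove(&clusters[start + 1], &clusters[end],
            (n - end) * sizeof *clusters);
    return n - (end - start) + 1;
}
```

No widening or minimum step takes place here, so a mark that follows the ligature keeps its own cluster value instead of being pulled into the ligature's cluster.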
    <para>
      Level 2 cluster behavior is ultimately less complex than level 0
      or level 1, but there are several cases for which processing
      cluster values produced at level 2 may be tricky. 
    </para>
    <section id="ligatures-with-combining-marks-in-level-2">
      <title>Ligatures with combining marks in level 2</title>
@ -532,10 +684,11 @@
      <para>
 	There may be other problems encountered with ligatures under
 	level 2, such as if the direction of the text is forced to the
-	opposite of its natural direction (for example, left-to-right
+	opposite of its natural direction (for example, Arabic text
-	Arabic). But, generally speaking, these other scenarios are
+	that is forced into left-to-right directionality). But,
-	minor corner cases that are too obscure for most client
+	generally speaking, these other scenarios are minor corner
-	programs to need to worry about.
+	cases that are too obscure for most client programs to need to
 	worry about.
      </para>
    </section>
  </section>
--- a/docs/usermanual-getting-started.xml
+++ b/docs/usermanual-getting-started.xml
@ -76,12 +76,41 @@
  <section>
    <title>Terminology</title>
      <variablelist>
 	<?dbfo list-presentation="blocks"?> 
 	<varlistentry>
 	  <term>script</term>
 	  <listitem>
 	    <para>
 	      In text shaping, a <emphasis>script</emphasis> is a
 	      writing system: a set of symbols, rules, and conventions
 	      that is used to represent a language or multiple
 	      languages.
 	    </para>
 	    <para>
 	      In general computing lingo, the word "script" can also
 	      be used to mean an executable program (usually one
 	      written in a human-readable programming language). For
 	      the sake of clarity, HarfBuzz documents will always use
 	      more specific terminology when referring to this
 	      meaning, such as "Python script" or "shell script." In
 	      all other instances, "script" refers to a writing system.
 	    </para>
 	    <para>
 	      For developers using HarfBuzz, it is important to note
 	      the distinction between a script and a language. Most
 	      scripts are used to write a variety of different
 	      languages, and many languages may be written in more
 	      than one script.
 	    </para>
 	  </listitem>
 	</varlistentry>
 	<varlistentry>
 	  <term>shaper</term>
 	  <listitem>
 	    <para>
 	      In HarfBuzz, a <emphasis>shaper</emphasis> is a
-	      handler for a specific script shaping model. HarfBuzz
+	      handler for a specific script-shaping model. HarfBuzz
 	      implements separate shapers for Indic, Arabic, Thai and
 	      Lao, Khmer, Myanmar, Tibetan, Hangul, Hebrew, the
 	      Universal Shaping Engine (USE), and a default shaper for
@ -95,12 +124,12 @@
 	  <listitem>
 	    <para>
 	      In text shaping, a <emphasis>cluster</emphasis> is a
-	      sequence of codepoints that must be handled as an
+	      sequence of codepoints that must be treated as an
-	      indivisible unit. Clusters can include codepoint
+	      indivisible unit. Clusters can include code-point
 	      sequences that form a ligature or base-and-mark
 	      sequences. Tracking and preserving clusters is important
 	      when shaping operations might separate or reorder
-	      codepoints.
+	      code points.
 	    </para>
 	    <para>
 	      HarfBuzz provides three cluster
@ -111,7 +140,59 @@
 	  </listitem>
 	</varlistentry>
-
+	<varlistentry>
 	  <term>grapheme</term>
 	  <listitem>
 	    <para>
 	      In linguistics, a <emphasis>grapheme</emphasis> is one
 	      of the indivisible units that make up a writing system or
 	      script. Often, graphemes are individual symbols (letters,
 	      numbers, punctuation marks, logograms, etc.) but,
 	      depending on the writing system, a particular grapheme
 	      might correspond to a sequence of several Unicode code
 	      points.
 	    </para>
 	    <para>
 	      In practice, HarfBuzz and other text-shaping engines
 	      are not generally concerned with graphemes. However, it
 	      is important for developers using HarfBuzz to recognize
 	      that there is a difference between graphemes and shaping
 	      clusters (see above). The two concepts may overlap
 	      frequently, but there is no guarantee that they will be
 	      identical.
 	    </para>
 	  </listitem>
 	</varlistentry>
 	<varlistentry>
 	  <term>syllable</term>
 	  <listitem>
 	    <para>
 	      In linguistics, a <emphasis>syllable</emphasis> is
 	      a sequence of sounds that makes up a building block of a
 	      particular language. Every language has its own set of
 	      rules describing what constitutes a valid syllable.
 	    </para>
 	    <para>
 	      For text-shaping purposes, the various definitions of
 	      "syllable" are important because script-specific shaping
 	      operations may be applied at the syllable level. For
 	      example, a reordering rule might specify that a vowel
 	      mark be reordered to the beginning of the syllable.
 	    </para>
 	    <para>
 	      Syllables will consist of one or more Unicode code
 	      points. The definition of a syllable for a particular
 	      writing system might correspond to how HarfBuzz
 	      identifies clusters (see above) for the same writing
 	      system. However, it is important for developers using
 	      HarfBuzz to recognize that there is a difference between
 	      syllables and shaping clusters. The two concepts may
 	      overlap frequently, but there is no guarantee that they
 	      will be identical.
 	    </para>
 	  </listitem>
 	</varlistentry>
      </variablelist>
  </section>
--- a/docs/usermanual-install-harfbuzz.xml
+++ b/docs/usermanual-install-harfbuzz.xml
@ -126,7 +126,7 @@
      </para>
      <para>
 	If you need to build HarfBuzz from source, first put the
-	<program>ragel</program> binary on your
+	<package>ragel</package> binary on your
 	<literal>PATH</literal>, then follow the appveyor CI cmake
 	<ulink
 	    url="https://github.com/harfbuzz/harfbuzz/blob/master/appveyor.yml">build
@ -229,6 +229,7 @@
      </para>
      <variablelist>
 	<?dbfo list-presentation="blocks"?> 
 	<varlistentry>
 	  <term>--with-libstdc++</term>
 	  <listitem>
--- a/docs/usermanual-shaping-concepts.xml
+++ b/docs/usermanual-shaping-concepts.xml
@ -182,22 +182,23 @@
      Southeast Asian scripts are also assigned
      <emphasis>Unicode Indic Syllabic Category</emphasis> (UISC) and
      <emphasis>Unicode Indic Positional Category</emphasis> (UIPC)
-      property that provides more detailed information needed for
+      properties that provide more detailed information needed for
      shaping.
    </para>
    <para>
      The UISC property sub-categorizes Letters and Marks according to
      common script-shaping behaviors. For example, UISC distinguishes
      between consonant letters, vowel letters, and vowel marks. The
-      UIPC property sub-categorizes Mark codepoints by the visual
+      UIPC property sub-categorizes Mark codepoints by the relative visual
      position that they occupy (above, below, right, left, or in
      multiple positions).
    </para>
    <para>
      Some complex scripts require that the text run be split into
-      syllables, and what constitutes a valid syllable in these
+      syllables. What constitutes a valid syllable in these
-      scripts is specified in regular expressions of the Letter and
+      scripts is specified in regular expressions, formed from the
-      Mark codepoints that take the UISC and UIPC properties into account.
+      Letter and Mark codepoints, that take the UISC and UIPC
      properties into account.
    </para>
  </section>