From 5fdf7b724eb3cb5ac60cd7f90d3250877ad7ca06 Mon Sep 17 00:00:00 2001
From: Nathan Willis
Date: Thu, 15 Nov 2018 17:40:21 -0600
Subject: [PATCH] Usermanual: clusters chapter; add brief grapheme definition and clarify monotonous cluster handling.

---
 docs/usermanual-clusters.xml | 56 +++++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 17 deletions(-)

diff --git a/docs/usermanual-clusters.xml b/docs/usermanual-clusters.xml
index c59818fc4..f48e89c20 100644
--- a/docs/usermanual-clusters.xml
+++ b/docs/usermanual-clusters.xml
@@ -14,15 +14,29 @@
 unit.
- During the shaping process, some shaping operations may
- merge adjacent characters (for example, when two code points form
- a ligature and are replaced by a single glyph) or split one
- character into several (for example, when performing the Unicode
- canonical decomposition of a code point).
+ A cluster is distinct from a grapheme,
+ which is the smallest unit of a writing system or script,
+ because clusters are only relevant for script shaping and the
+ layout of glyphs.
+
+
+ For example, a grapheme may be a letter, a number, a logogram,
+ or a symbol. When two letters form a ligature, however, they
+ combine into a single glyph. They are therefore part of the same
+ cluster and are treated as a unit — even though the two
+ original, underlying letters are separate graphemes.
+
+
+ During the shaping process, there are several shaping operations
+ that may merge adjacent characters (for example, when two code
+ points form a ligature or a conjunct form and are replaced by a
+ single glyph) or split one character into several (for example,
+ when decomposing a code point through the
+ ccmp feature).
 HarfBuzz tracks clusters independently from how these
- shaping operations alter the individual glyphs that comprise the
+ shaping operations affect the individual glyphs that comprise the
 output HarfBuzz returns in a buffer.
 Consequently, a client program using HarfBuzz can utilize the cluster information to implement features such as:
@@ -69,15 +83,15 @@
- When you add text to a HarfBuzz buffer, each code point is assigned
- a cluster value.
+ When you add text to a HarfBuzz buffer, each code point must be
+ assigned a cluster value.
 This cluster value is an arbitrary number; HarfBuzz uses it only to distinguish between clusters. Many client programs will use the index of each code point in the input text stream as the
- cluster value, for the sake of convenience; the actual value does
- not matter.
+ cluster value. This is for the sake of convenience; the actual
+ value does not matter.
 Client programs can choose how HarfBuzz handles clusters during
@@ -100,7 +114,7 @@
 as well as the Zero Width Joiner and Zero Width Non-Joiner code points, are assigned the cluster value of the closest preceding code
- point from diferent category.
+ point from different category.
 In essence, whenever a base character is followed by a mark
@@ -160,23 +174,31 @@
+
+ As mentioned earlier, client programs using HarfBuzz often
+ assign initial cluster values in a buffer by reusing the indices
+ of the code points in the input text. This gives a sequence of
+ cluster values that is monotonically increasing (for example,
+ 0,1,2,3,4,5).
+
 It is not required that the cluster values in a buffer be monotonically increasing. However, if the initial cluster values in a buffer are monotonic and the buffer is
- configured to use clustering level 0 or 1, then HarfBuzz
+ configured to use cluster level 0 or 1, then HarfBuzz
 guarantees that the final cluster values in the shaped buffer will also be monotonic. No such guarantee is made for cluster level 2.
- In levels 0 and 1, HarfBuzz implements the following conceptual model for
- cluster values:
+ In levels 0 and 1, HarfBuzz implements the following conceptual
+ model for cluster values:
- The sequence of cluster values will always remain monotonic.
+ If the sequence of input cluster values is monotonic, the
+ sequence of cluster values will remain monotonic.
@@ -231,7 +253,7 @@
 During shaping, HarfBuzz maps these characters to glyphs from
- the font. For simplicity, let's assume that each character maps
+ the font. For simplicity, let us assume that each character maps
 to the corresponding, identical-looking glyph:
@@ -297,7 +319,7 @@
 0,1,2,3,4
- If D is reordered before B,
+ If D is reordered to before B,
 then HarfBuzz merges the B, C, and D clusters, and we get: