diff --git a/docs/usermanual-clusters.xml b/docs/usermanual-clusters.xml
index c59818fc4..f48e89c20 100644
--- a/docs/usermanual-clusters.xml
+++ b/docs/usermanual-clusters.xml
@@ -14,15 +14,29 @@
unit.
- During the shaping process, some shaping operations may
- merge adjacent characters (for example, when two code points form
- a ligature and are replaced by a single glyph) or split one
- character into several (for example, when performing the Unicode
- canonical decomposition of a code point).
+ A cluster is distinct from a grapheme,
+ which is the smallest unit of a writing system or script:
+ clusters are relevant only to script shaping and the
+ layout of glyphs.
+
+
+ For example, a grapheme may be a letter, a number, a logogram,
+ or a symbol. When two letters form a ligature, however, they
+ combine into a single glyph. They are therefore part of the same
+ cluster and are treated as a unit — even though the two
+ original, underlying letters are separate graphemes.
+
+
+ During the shaping process, several shaping operations may
+ merge adjacent characters (for example, when two code points
+ form a ligature or a conjunct form and are replaced by a
+ single glyph) or split one character into several (for example,
+ when decomposing a code point through the
+ ccmp feature).
HarfBuzz tracks clusters independently from how these
- shaping operations alter the individual glyphs that comprise the
+ shaping operations affect the individual glyphs that comprise the
output HarfBuzz returns in a buffer. Consequently,
a client program using HarfBuzz can utilize the cluster
information to implement features such as:
@@ -69,15 +83,15 @@
- When you add text to a HarfBuzz buffer, each code point is assigned
- a cluster value.
+ When you add text to a HarfBuzz buffer, each code point must be
+ assigned a cluster value.
This cluster value is an arbitrary number; HarfBuzz uses it only
to distinguish between clusters. Many client programs will use
the index of each code point in the input text stream as the
- cluster value, for the sake of convenience; the actual value does
- not matter.
+ cluster value. This is for the sake of convenience; the actual
+ value does not matter.
Client programs can choose how HarfBuzz handles clusters during
@@ -100,7 +114,7 @@
as well as the Zero Width Joiner and
Zero Width Non-Joiner code points, are
assigned the cluster value of the closest preceding code
- point from diferent category.
+ point from a different category.
In essence, whenever a base character is followed by a mark
@@ -160,23 +174,31 @@
+
+ As mentioned earlier, client programs using HarfBuzz often
+ assign initial cluster values in a buffer by reusing the indices
+ of the code points in the input text. This gives a sequence of
+ cluster values that is monotonically increasing (for example,
+ 0,1,2,3,4,5).
+
It is not required that the cluster values
in a buffer be monotonically increasing. However, if the initial
cluster values in a buffer are monotonic and the buffer is
- configured to use clustering level 0 or 1, then HarfBuzz
+ configured to use cluster level 0 or 1, then HarfBuzz
guarantees that the final cluster values in the shaped buffer
will also be monotonic. No such guarantee is made for cluster
level 2.
- In levels 0 and 1, HarfBuzz implements the following conceptual model for
- cluster values:
+ In levels 0 and 1, HarfBuzz implements the following conceptual
+ model for cluster values:
- The sequence of cluster values will always remain monotonic.
+ If the sequence of input cluster values is monotonic, the
+ sequence of cluster values will remain monotonic.
@@ -231,7 +253,7 @@
During shaping, HarfBuzz maps these characters to glyphs from
- the font. For simplicity, let's assume that each character maps
+ the font. For simplicity, let us assume that each character maps
to the corresponding, identical-looking glyph:
@@ -297,7 +319,7 @@
0,1,2,3,4
- If D is reordered before B,
+ If D is reordered to precede B,
then HarfBuzz merges the B,
C, and D clusters, and we
get: