Usermanual: clusters chapter; add brief grapheme definition and clarify monotonous cluster handling.
parent
939220e57d
commit
5fdf7b724e
|
@ -14,15 +14,29 @@
|
||||||
unit.
|
unit.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
During the shaping process, some shaping operations may
|
A cluster is distinct from a <emphasis>grapheme</emphasis>,
|
||||||
merge adjacent characters (for example, when two code points form
|
which is the smallest unit of a writing system or script,
|
||||||
a ligature and are replaced by a single glyph) or split one
|
because clusters are only relevant for script shaping and the
|
||||||
character into several (for example, when performing the Unicode
|
layout of glyphs.
|
||||||
canonical decomposition of a code point).
|
</para>
|
||||||
|
<para>
|
||||||
|
For example, a grapheme may be a letter, a number, a logogram,
|
||||||
|
or a symbol. When two letters form a ligature, however, they
|
||||||
|
combine into a single glyph. They are therefore part of the same
|
||||||
|
cluster and are treated as a unit — even though the two
|
||||||
|
original, underlying letters are separate graphemes.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
During the shaping process, there are several shaping operations
|
||||||
|
that may merge adjacent characters (for example, when two code
|
||||||
|
points form a ligature or a conjunct form and are replaced by a
|
||||||
|
single glyph) or split one character into several (for example,
|
||||||
|
when decomposing a code point through the
|
||||||
|
<literal>ccmp</literal> feature).
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
HarfBuzz tracks clusters independently from how these
|
HarfBuzz tracks clusters independently from how these
|
||||||
shaping operations alter the individual glyphs that comprise the
|
shaping operations affect the individual glyphs that comprise the
|
||||||
output HarfBuzz returns in a buffer. Consequently,
|
output HarfBuzz returns in a buffer. Consequently,
|
||||||
a client program using HarfBuzz can utilize the cluster
|
a client program using HarfBuzz can utilize the cluster
|
||||||
information to implement features such as:
|
information to implement features such as:
|
||||||
|
@ -69,15 +83,15 @@
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
<para>
|
<para>
|
||||||
When you add text to a HarfBuzz buffer, each code point is assigned
|
When you add text to a HarfBuzz buffer, each code point must be
|
||||||
a <emphasis>cluster value</emphasis>.
|
assigned a <emphasis>cluster value</emphasis>.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
This cluster value is an arbitrary number; HarfBuzz uses it only
|
This cluster value is an arbitrary number; HarfBuzz uses it only
|
||||||
to distinguish between clusters. Many client programs will use
|
to distinguish between clusters. Many client programs will use
|
||||||
the index of each code point in the input text stream as the
|
the index of each code point in the input text stream as the
|
||||||
cluster value, for the sake of convenience; the actual value does
|
cluster value. This is for the sake of convenience; the actual
|
||||||
not matter.
|
value does not matter.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
Client programs can choose how HarfBuzz handles clusters during
|
Client programs can choose how HarfBuzz handles clusters during
|
||||||
|
@ -100,7 +114,7 @@
|
||||||
as well as the <emphasis>Zero Width Joiner</emphasis> and
|
as well as the <emphasis>Zero Width Joiner</emphasis> and
|
||||||
<emphasis>Zero Width Non-Joiner</emphasis> code points, are
|
<emphasis>Zero Width Non-Joiner</emphasis> code points, are
|
||||||
assigned the cluster value of the closest preceding code
|
assigned the cluster value of the closest preceding code
|
||||||
point from <emphasis>diferent</emphasis> category.
|
point from <emphasis>different</emphasis> category.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
In essence, whenever a base character is followed by a mark
|
In essence, whenever a base character is followed by a mark
|
||||||
|
@ -160,23 +174,31 @@
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
<para>
|
||||||
|
As mentioned earlier, client programs using HarfBuzz often
|
||||||
|
assign initial cluster values in a buffer by reusing the indices
|
||||||
|
of the code points in the input text. This gives a sequence of
|
||||||
|
cluster values that is monotonically increasing (for example,
|
||||||
|
0,1,2,3,4,5).
|
||||||
|
</para>
|
||||||
<para>
|
<para>
|
||||||
It is not <emphasis>required</emphasis> that the cluster values
|
It is not <emphasis>required</emphasis> that the cluster values
|
||||||
in a buffer be monotonically increasing. However, if the initial
|
in a buffer be monotonically increasing. However, if the initial
|
||||||
cluster values in a buffer are monotonic and the buffer is
|
cluster values in a buffer are monotonic and the buffer is
|
||||||
configured to use clustering level 0 or 1, then HarfBuzz
|
configured to use cluster level 0 or 1, then HarfBuzz
|
||||||
guarantees that the final cluster values in the shaped buffer
|
guarantees that the final cluster values in the shaped buffer
|
||||||
will also be monotonic. No such guarantee is made for cluster
|
will also be monotonic. No such guarantee is made for cluster
|
||||||
level 2.
|
level 2.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
In levels 0 and 1, HarfBuzz implements the following conceptual model for
|
In levels 0 and 1, HarfBuzz implements the following conceptual
|
||||||
cluster values:
|
model for cluster values:
|
||||||
</para>
|
</para>
|
||||||
<itemizedlist spacing="compact">
|
<itemizedlist spacing="compact">
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
The sequence of cluster values will always remain monotonic.
|
If the sequence of input cluster values is monotonic, the
|
||||||
|
sequence of cluster values will remain monotonic.
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
|
@ -231,7 +253,7 @@
|
||||||
</programlisting>
|
</programlisting>
|
||||||
<para>
|
<para>
|
||||||
During shaping, HarfBuzz maps these characters to glyphs from
|
During shaping, HarfBuzz maps these characters to glyphs from
|
||||||
the font. For simplicity, let's assume that each character maps
|
the font. For simplicity, let us assume that each character maps
|
||||||
to the corresponding, identical-looking glyph:
|
to the corresponding, identical-looking glyph:
|
||||||
</para>
|
</para>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
|
@ -297,7 +319,7 @@
|
||||||
0,1,2,3,4
|
0,1,2,3,4
|
||||||
</programlisting>
|
</programlisting>
|
||||||
<para>
|
<para>
|
||||||
If <literal>D</literal> is reordered before <literal>B</literal>,
|
If <literal>D</literal> is reordered to before <literal>B</literal>,
|
||||||
then HarfBuzz merges the <literal>B</literal>,
|
then HarfBuzz merges the <literal>B</literal>,
|
||||||
<literal>C</literal>, and <literal>D</literal> clusters, and we
|
<literal>C</literal>, and <literal>D</literal> clusters, and we
|
||||||
get:
|
get:
|
||||||
|
|
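Note on the last hunk: the reorder-then-merge step it describes can be sketched in a few lines of plain Python. This is a conceptual model only, not HarfBuzz code and not its API. The helper names (`is_monotonic`, `reorder_before`, `merge_clusters`) are hypothetical, and the rule that a merged cluster takes the minimum of the values that went into it is an assumption that is not shown in the hunks above.

```python
# Conceptual sketch of cluster merging after a reordering (levels 0 and 1).
# All helper names are hypothetical; none of this is HarfBuzz API.

def is_monotonic(clusters):
    """True if the values never decrease, or never increase."""
    pairs = list(zip(clusters, clusters[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def reorder_before(items, src, dst):
    """Return a copy of items with items[src] moved to index dst (dst < src)."""
    out = list(items)
    out.insert(dst, out.pop(src))
    return out

def merge_clusters(clusters, start, end):
    """Merge the clusters in [start, end) into one.

    Assumption: the merged cluster takes the minimum of the merged values.
    """
    merged = min(clusters[start:end])
    return clusters[:start] + [merged] * (end - start) + clusters[end:]

glyphs = ["A", "B", "C", "D", "E"]
clusters = [0, 1, 2, 3, 4]          # initial values: indices of the code points

# Move D (index 3) to just before B (index 1); cluster values travel with glyphs.
glyphs = reorder_before(glyphs, 3, 1)
clusters = reorder_before(clusters, 3, 1)
broken = not is_monotonic(clusters)  # the reordering alone breaks monotonicity

# Merging the B, C, and D clusters (now at indices 1..3) restores it.
clusters = merge_clusters(clusters, 1, 4)
```

Under the stated assumption, merging the three reordered clusters is what keeps the sequence monotonic, which matches the guarantee the manual text gives for cluster levels 0 and 1.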