58 lines
2.0 KiB
Plaintext
58 lines
2.0 KiB
Plaintext
Compound word hyphenation
|
|
|
|
Hyphen library supports better compound word hyphenation and special
|
|
rules of compound word hyphenation of German languages and other
|
|
languages with arbitrary number of compound words. The new options,
|
|
COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN help to set the right
|
|
style for the hyphenation of compound words.
|
|
|
|
Algorithm
|
|
|
|
The algorithm is an extension of the original pattern based hyphenation
|
|
algorithm. It uses two hyphenation pattern sets, defined in the same
|
|
pattern file and separated by the NEXTLEVEL keyword. First pattern
|
|
set is for hyphenation only at compound word boundaries, the second one
|
|
is for hyphenation within words or word parts.
|
|
|
|
Recursive compound level hyphenation
|
|
|
|
The algorithm is recursive: every word parts of a successful
|
|
first (compound) level hyphenation will be rehyphenated
|
|
by the same (first) pattern set.
|
|
|
|
Finally, when first level hyphenation is not possible, Hyphen uses
|
|
the second level hyphenation for the word or the word parts.
|
|
|
|
Word endings and word parts
|
|
|
|
Patterns for word endings (patterns with ellipses) match the
|
|
word parts, too.
|
|
|
|
Options
|
|
|
|
COMPOUNDLEFTHYPHENMIN: min. hyph. dist. from the left compound word boundary
|
|
COMPOUNDRIGHTHYPHENMIN: min. hyph. dist. from the right comp. word boundary
|
|
NEXTLEVEL: sign second level hyphenation patterns
|
|
|
|
Default hyphenmin values
|
|
|
|
Default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN are 0,
|
|
and 0 under the hyphenation, too. ("0" values of
|
|
LEFTHYPHENMIN and RIGHTHYPHENMIN mean the default "2" under the hyphenation.)
|
|
|
|
Examples
|
|
|
|
See tests/compound* test files.
|
|
|
|
Preparation of hyphenation patterns
|
|
|
|
It hasn't been special pattern generator tool for compound hyphenation
|
|
patterns, yet. It is possible to use PATGEN to generate both of
|
|
pattern sets, concatenate it manually and set the requested HYPHENMIN values.
|
|
(But don't forget the preprocessing steps by substrings.pl before
|
|
concatenation.) One of the disadvantage of this method, that PATGEN
|
|
doesn't know recursive compound hyphenation of Hyphen.
|
|
|
|
László Németh
|
|
<nemeth (at) openoffice.org>
|