2010-03-04 13:13:53 +01:00
|
|
|
Hyphen - hyphenation library to use converted TeX hyphenation patterns
|
|
|
|
|
|
|
|
(C) 1998 Raph Levien
|
|
|
|
(C) 2001 ALTLinux, Moscow
|
2012-06-29 09:10:58 +02:00
|
|
|
(C) 2006, 2007, 2008, 2010, 2011 László Németh
|
2010-03-04 13:13:53 +01:00
|
|
|
|
|
|
|
This was part of libHnj library by Raph Levien.
|
|
|
|
|
|
|
|
Peter Novodvorsky from ALTLinux cut hyphenation part from libHnj
|
|
|
|
to use it in OpenOffice.org.
|
|
|
|
|
|
|
|
Compound word and non-standard hyphenation support by László Németh.
|
|
|
|
|
|
|
|
License is the original LibHnj license:
|
|
|
|
LibHnj is dual licensed under LGPL and MPL (see also README.libhnj).
|
|
|
|
|
|
|
|
Because LGPL allows GPL relicensing, COPYING contains now
|
|
|
|
LGPL/GPL/MPL tri-license for explicit Mozilla source compatibility.
|
|
|
|
|
|
|
|
Original Libhnj source with OOo's patches are managed by Rene Engelhard
|
|
|
|
and Chris Halls at Debian:
|
|
|
|
|
|
|
|
http://packages.debian.org/stable/libdevel/libhnj-dev
|
|
|
|
and http://packages.debian.org/unstable/source/libhnj
|
|
|
|
|
|
|
|
|
|
|
|
OTHER FILES
|
|
|
|
|
|
|
|
This distribution is the source of the en_US hyphenation patterns
|
|
|
|
"hyph_en_US.dic", too. See README_hyph_en_US.txt.
|
|
|
|
|
|
|
|
Source files of hyph_en_US.dic in the distribution:
|
|
|
|
|
|
|
|
hyphen.tex (en_US hyphenation patterns from plain TeX)
|
|
|
|
|
|
|
|
Source: http://tug.ctan.org/text-archive/macros/plain/base/hyphen.tex
|
|
|
|
|
|
|
|
tbhyphext.tex: hyphenation exception log from TugBoat archive
|
|
|
|
|
|
|
|
Source of the hyphenation exception list:
|
|
|
|
http://www.ctan.org/tex-archive/info/digests/tugboat/tb0hyf.tex
|
|
|
|
|
|
|
|
Generated with the hyphenex script
|
|
|
|
(http://www.ctan.org/tex-archive/info/digests/tugboat/hyphenex.sh)
|
|
|
|
|
|
|
|
sh hyphenex.sh <tb0hyf.tex >tbhyphext.tex
|
|
|
|
|
|
|
|
|
|
|
|
INSTALLATION
|
|
|
|
|
|
|
|
./configure
|
|
|
|
make
|
|
|
|
make install
|
|
|
|
|
|
|
|
UNIT TESTS (WITH VALGRIND DEBUGGER)
|
|
|
|
|
|
|
|
make check
|
|
|
|
VALGRIND=memcheck make check
|
|
|
|
|
|
|
|
USAGE
|
|
|
|
|
|
|
|
./example hyph_en_US.dic mywords.txt
|
|
|
|
|
|
|
|
or (under Linux)
|
|
|
|
|
|
|
|
echo example | ./example hyph_en_US.dic /dev/stdin
|
|
|
|
|
|
|
|
NOTE: In the case of Unicode encoded input, convert your words
|
|
|
|
to lowercase before hyphenation (under UTF-8 console environment):
|
|
|
|
|
|
|
|
cat mywords.txt | awk '{print tolower($0)}' >mywordslow.txt
|
|
|
|
|
|
|
|
DEVELOPMENT
|
|
|
|
|
|
|
|
See README.hyphen for hyphenation algorithm, README.nonstandard
|
|
|
|
and doc/tb87nemeth.pdf for non-standard hyphenation,
|
|
|
|
README.compound for compound word hyphenation, and tests/*.
|
|
|
|
|
|
|
|
Description of the dictionary format:
|
|
|
|
|
|
|
|
First line contains the character encoding (ISO8859-x, UTF-8).
|
|
|
|
|
|
|
|
Possible options in the following lines:
|
|
|
|
|
|
|
|
LEFTHYPHENMIN num minimal hyphenation distance from the left word end
|
|
|
|
RIGHTHYPHENMIN num minimal hyphation distance from the right word end
|
|
|
|
COMPOUNDLEFTHYPHENMIN num min. hyph. dist. from the left compound word boundary
|
|
|
|
COMPOUNDRIGHTHYPHENMIN num min. hyph. dist. from the right comp. word boundary
|
|
|
|
|
|
|
|
hyphenation patterns see README.* files
|
|
|
|
|
|
|
|
NEXTWORD separate the two compound sets (see README.compound)
|
|
|
|
|
|
|
|
Default values:
|
|
|
|
Without explicite declarations, hyphenmin fields of dict struct
|
|
|
|
are zeroes, but in this case the lefthyphenmin and righthyphenmin
|
|
|
|
will be the default 2 under the hyphenation (for backward compatibility).
|
|
|
|
|
|
|
|
Comments
|
|
|
|
|
|
|
|
Use percent sign at the beginning of the lines to add comments to your
|
|
|
|
hpyhenation patterns (after the character encoding in the first line):
|
|
|
|
|
|
|
|
% comment
|
|
|
|
|
|
|
|
*****************************************************************************
|
|
|
|
* Warning! Correct working of Libhnj *needs* prepared hyphenation patterns. *
|
|
|
|
|
|
|
|
For example, generating hyph_en_US.dic from "hyphen.us" TeX patterns:
|
|
|
|
|
|
|
|
perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1
|
|
|
|
|
|
|
|
or with default LEFTHYPHENMIN and RIGHTHYPHENMIN values:
|
|
|
|
|
|
|
|
perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1 2 3
|
|
|
|
perl substrings.pl hyphen.gb hyph_en_GB.dic ISO8859-1 3 3
|
|
|
|
****************************************************************************
|
|
|
|
|
|
|
|
OTHERS
|
|
|
|
|
|
|
|
Java hyphenation: Peter B. West (Folio project) implements a hyphenator with
|
|
|
|
non standard hyphenation facilities based on extended Libhnj. The HyFo module
|
|
|
|
is released in binary form as jar files and in source form as zip files.
|
|
|
|
See http://sourceforge.net/project/showfiles.php?group_id=119136
|
|
|
|
|
|
|
|
László Németh
|
2012-06-29 09:10:58 +02:00
|
|
|
<nemeth (at) numbertext (dot) org>
|