Re: [Groff] Spanish hyphenation
From: Paco Andres Verdu
Subject: Re: [Groff] Spanish hyphenation
Date: Sun, 10 Sep 2000 21:16:13 +0200 (CEST)
Hi,
On Thu, 7 Sep 2000 address@hidden wrote:
> Does anyone out there understand TeX hyphenation files well enough
> to say what a Spanish hyphenation file for groff -- say "hyphen.es"
> (corresponding to groff's US hyphenation file ../groff/tmac/hyphen.us,
> identical to the patterns file ushyph1.tex in TeX) should be like?
>
> I have been contemplating TeX's sphyph.tex without being able to make
> too much sense of it.
The shyphen.sh shell script, which is used to generate the sphyph.tex file
usually shipped with TeX distributions, is attached to this message. You can get
the original versions of both files at
ftp://ftp.cdrom.com/pub/tex/ctan/languages/spanish/hyphen/
You may find it easier to modify the script that generates the
hyphenation file rather than the hyphenation file itself.
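If you do take the route of modifying the script, most of its output comes from two pairs of nested loops, for rules SR2 and SR3. The function below is a simplified, standalone sketch of that logic rather than the script itself: the name `sr_patterns` and the tiny letter lists are hypothetical, it prints one pattern per line (TeX accepts any whitespace between patterns), and it uses `printf` and POSIX `${var%?}`/`${var#?}` expansions where the script uses `echo -n` and `sed`.

```shell
#!/bin/sh
# Simplified sketch of rules SR2 and SR3 from shyphen.sh. The function
# name and the reduced letter lists are illustrative, not from the script.

# sr_patterns CONSONANTS VOWELS LEGAL
#   CONSONANTS, VOWELS: space-separated letters
#   LEGAL: space-separated two-letter groups that must not be split
sr_patterns() {
    consonants=$1
    vowels=$2
    legal=$3

    # SR2: an odd digit before a consonant+vowel pair allows a break
    # there, so the consonant goes with the following vowel.
    for c in $consonants; do
        for v in $vowels; do
            printf '%s\n' "1$c$v"
        done
    done

    # SR3: for a group such as "bl", allow a break before the group and
    # forbid one inside it; the even 2 outranks the 1 that SR2 put
    # before "l" in "1la".
    for g in $legal; do
        first=${g%?}    # "bl" -> "b"
        second=${g#?}   # "bl" -> "l"
        for v in $vowels; do
            printf '%s\n' "1${first}2${second}${v}"
        done
    done
}

# Prints 1ba 1be 1la 1le 1b2la 1b2le, one per line.
sr_patterns "b l" "a e" "bl"
```

Adding a group such as `tl` to the legal list (the script deliberately leaves it out) would make the SR3 loop emit `1t2la`, `1t2le`, and so on, keeping that pair together.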
I've used groff for Spanish documents, but without hyphenation, and
I've never used hyphenation in either groff or TeX, so I won't be able
to help with the development. But if it is helpful, I can test your
hyphenation file with some of my Spanish-language groff files and tell you
whether the result is right, or what the flaws are.
Paco
--
Regards
-----------------------------------------------------------------------------
Paco Andrés Verdú address@hidden
Alicante (Spain)
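A caution for anyone running the attached script today: it writes TeX control sequences such as `\catcode`, `\begingroup` and `\patterns` with `echo`. POSIX leaves the behavior of `echo` implementation-defined when an argument contains a backslash, and some common /bin/sh implementations (dash, for instance) interpret escapes, so `\c` suppresses all further output and `\b` becomes a backspace; the `\catcode` lines can then come out empty. Invoking the script as `bash shyphen.sh ...` usually avoids this, since bash's builtin `echo` does not interpret escapes unless its `xpg_echo` option is set. Switching to `printf '%s\n'` avoids it everywhere. A minimal sketch of that form, with a hypothetical helper named `emit` that is not part of the script:

```shell
#!/bin/sh
# emit: print each argument on its own line, byte for byte. printf's %s
# never interprets backslashes in its argument, so TeX control sequences
# such as \catcode and \begingroup come through unchanged.
emit() {
    printf '%s\n' "$@"
}

# The same double-quoted string the script passes to echo for the
# "common" options; the only backslash escape the shell processes in it
# is \` (a literal backquote).
emit "\catcode\`'=12 \lccode\`'=\`'"

# Multi-line strings work the same way: the backslash-newline after the
# opening quote is removed by the shell before printf sees the text.
emit "\
\begingroup
\patterns{"

# Replacement for the script's "echo -n ..." calls, which print no newline.
for v in a e o; do
    printf '%s ' "1b$v"
done
printf '\n'
```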
#!/bin/sh
# file: shyphen.sh Version: 1.2
# Got at: 91/09/25 13:23:04
# Delta made: 91/09/25 13:23:03
version=1.2
# This script generates TeX hyphenation patterns for Spanish
# This script is Copyright (c) GMV, 1991
# The copyright notice below applies to this script as well,
# read it before using this software.
#
# Usage: script [TeX] [ftc] [isolatin1] [ugly] [hiatus]
#
# TeX diacritics are done as in plain TeX and LaTeX but
# without the escape character: 'a 'e 'i 'o 'u "u ~n
# ftc diacritics for the above are specified using the
# ftc conventions: 'a 'e 'i 'o 'u :u 'n
# isolatin1 means using the respective character codes in
# IS 8859/1 (ISO Latin Alphabet 1)
# ugly will prevent legal but undesirable breaks.
# hiatus Allow break between strong vowels. Don't do it.
#
# Default is no support for diacritics. You can use combinations
# of the above and the number of patterns will grow fast.
#
# Recommended options:
#
# isolatin1 ugly if you have TeX 3.0 with DC/EC fonts or ML-TeX
# TeX ugly if you don't have the above
# ftc ugly if you are used to ftc and don't have the above
#
# h is not here.
consonants="b c d f g j k l m n p q r s t v w x y z"
# Open vowels: a e o plus all accented letters
vop="a e o"
# Closed vowels: i u plus diaeresis-u
vcl="i u"
# Groups that cannot be broken. Deleted tl.
legal="ch ll rr bl br cl cr dr fl fr gl gr kl kr pl pr tr"
isolatin1=0
ftc=0
TeX=0
ugly=0
hiatus=0
common=0
options="basic"
for i
do
if [ $i = "ftc" ]
then
common=1
ftc=1
options="$options ftc"
elif [ $i = "TeX" ]
then
common=1
TeX=1
options="$options TeX"
elif [ $i = "isolatin1" ]
then
isolatin1=1
options="$options isolatin1"
elif [ $i = "ugly" ]
then
ugly=1
options="$options ugly"
elif [ $i = "hiatus" ]
then
hiatus=1
options="$options hiatus"
else
echo -n Usage: `basename $0`
echo " [TeX] [ftc] [isolatin1] [ugly] [hiatus]"
exit 1
fi
done
if [ $common -ne 0 ]
then
vop="$vop 'a 'e 'i 'o 'u"
fi
if [ $ftc -ne 0 ]
then
vcl="$vcl :u"
consonants="$consonants 'n"
fi
if [ $TeX -ne 0 ]
then
vcl="$vcl \"u"
consonants="$consonants ~n"
fi
if [ $isolatin1 -ne 0 ]
then
vop="$vop ^^e1 ^^e9 ^^ed ^^f3 ^^fa"
vcl="$vcl ^^fc"
consonants="$consonants ^^f1"
fi
vowels="$vop $vcl"
echo "\
% Hyphenation patterns for Spanish.
% Compiled by Julio Sanchez (address@hidden) on September 1991.
%
% These patterns have been derived from \"On Word Division in Spanish\",
% Jos'e A. Ma~nas, Communications of the ACM, and implemented in his
% package ftc. You can get ftc and a draft of the abovementioned
% paper from goya.dit.upm.es in src/text.proc/ftc.Z. FTP access may
% be available. Otherwise, send "help" to address@hidden for
% details on use of the mail server.
%
% Rules mentioned below are those described in that paper. After
% several unsatisfactory attempts to pretend I knew better, these
% patterns closely follow that paper. Pattern 'tl' is not considered.
% It is conflictive and ftc does not use it either.
%
% These patterns have been generated by shyphen.sh version $version,
% shyphen.sh is a sh script that allows a number of choices.
% Full benefit from some of these options can only be
% obtained if appropriate fonts are available.
%
% Follows a copyright notice. This is not in the public domain,
% but the copyright is essentially a hold-harmless clause. That
% is, use it at will, but don't sue me if you don't like it.
%
% COPYRIGHT NOTICE
%
% These patterns and the generating sh script are Copyright (c) GMV 1991
% These patterns were developed for internal GMV use and are made
% public in the hope that they will benefit others. Also, spreading
% these patterns throughout the Spanish-language TeX community is
% expected to provide back-benefits to GMV in that it can help keeping
% GMV in the mainstream of spanish users. However, this is given
% for free and WITHOUT ANY WARRANTY. Under no circumstances can Julio
% Sanchez, GMV, Jos'e A. Ma~nas or any agents or representatives thereof
% be held responsible for any errors in this software nor for any damages
% derived from its use, even in case any of the above has been notified
% of the possibility of such damages. If any such situation arises, you
% responsible for repair. Use of this software is an explicit
% acceptance of these conditions.
%
% You can use this software for any purpose. You cannot delete this
% copyright notice. If you change this software, you must include
% comments explaining who, when and why. You are kindly requested to
% send any changes to address@hidden If you change the generating
% script, you must include code in it such that any output is clearly
% labeled as generated by a modified script.
%
% Despite the lack of warranty, we would like to hear about any
% problem you find. Please report problems to address@hidden
%
% END OF COPYRIGHT NOTICE
%
% Options included in this set: $options
% Open vowels: $vop
% Closed vowels: $vcl
% Consonants: $consonants
%
% Some of the patterns below represent combinations that never
% happen in Spanish. Would they happen, they would be hyphenated
% according to the rules."
echo
echo "\
% This keeps {cat|lc}code changes, if any, local. Nice to users of
% multilingual versions. These are the minimum changes needed to process
% the patterns. These and other changes will have to be re-enacted when
% Spanish be established as the current language. See the babel docs if
% you don't understand this.
\begingroup"
if [ $common -ne 0 ]
then
echo "\catcode\`'=12 \lccode\`'=\`'"
fi
if [ $ftc -ne 0 ]
then
echo "\catcode\`:=12 \lccode\`:=\`:"
fi
if [ $TeX -ne 0 ]
then
echo "\catcode\`\"=12 \lccode\`\"=\`\""
echo "\catcode\`~=12 \lccode\`~=\`~"
fi
if [ $isolatin1 -ne 0 ]
then
echo "\
\catcode\`\^^e1=11 \lccode\`\^^e1=\`\^^e1 % 'a
\catcode\`\^^e9=11 \lccode\`\^^e9=\`\^^e9 % 'e
\catcode\`\^^ed=11 \lccode\`\^^ed=\`\^^ed % 'i
\catcode\`\^^f1=11 \lccode\`\^^f1=\`\^^f1 % ~n
\catcode\`\^^f3=11 \lccode\`\^^f3=\`\^^f3 % 'o
\catcode\`\^^fa=11 \lccode\`\^^fa=\`\^^fa % 'u
\catcode\`\^^fc=11 \lccode\`\^^fc=\`\^^fc % \"u"
fi
echo "\
\patterns{
% Rule SR1
% Vowels are kept together by the defaults"
if [ $hiatus -ne 0 ]
then
echo "\
% We break here diphthongs and the like"
for i in $vop
do
for j in $vop
do
echo -n ${i}1${j}" "
done
echo
for j in $vop
do
echo -n ${i}1h${j}" "
done
echo
done
fi
echo "\
% Rule SR2
% Attach vowel groups to left consonant"
for i in $consonants
do
for j in $vowels
do
echo -n 1${i}${j}" "
done
echo
done
echo "\
% Rule SR3
% Build legal consonant groups, leave other consonants bound to
% the previous group. This overrides part of the SR2 pattern
% group."
for i in $legal
do
set `echo $i | sed -e 's/^./& /'`
for j in $vowels
do
echo -n 1${1}2${2}${j}" "
done
echo
done
echo "\
% Rule SR4 is implicitly implemented by the default values
% Rule HE1 is implemented by TeX parameters \lefthyphenmin and
% \righthyphenmin. Help yourself. The correct values for
% Spanish are 2 and 2. If you set them below these values,
% incorrect breaks will happen.
% Rule HE2
% Break between a consonant and an h"
for i in `echo $consonants | sed -e 's/c//'`
do
echo -n ${i}1h" "
done
echo
echo "\
% We now avoid some problematic breaks.
su2b2r su2b2l"
if [ $ugly -ne 0 ]
then
echo "\
% These are included here to avoid ugly, though legal, breaks
% They were taken from the sphyphen.tex (silaba.tex) produced
% by Aurion Tecnologia and other sources.
2caca. 2cacas.
2caga. 2cagas.
2cago. 2cerdo
2cola. 2colas.
2culo. 2culos.
2cular.
2loco. 2locos. 2loca. 2locas.
2moco. 2mocos.
2mula. 2mulas.
2pedo. 2pedos. 2peda. 2pedas.
2pito. 2pitos.
2puto. 2putos. 2puta. 2putas.
.caca2"
fi
echo "}"
echo "\endgroup"
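TeX turns these patterns into break points with Frank Liang's method: every pattern whose letters match somewhere in the word (with `.` standing for a word boundary) contributes its digits, the largest digit at each position between two letters wins, and an odd winner allows a hyphen there. The following sh/awk function is a minimal sketch of that rule, written only to show how the SR2 and SR3 patterns interact; the name `hyphenate` is hypothetical, and it leaves out TeX's packed pattern trie and exception lists. It uses 2 and 2 as the minimum letter counts before and after a break, the values the script's Rule HE1 comment gives for Spanish.

```shell
#!/bin/sh
# Minimal sketch of Liang-style pattern application: every pattern that
# matches inside ".word." contributes its digits, the largest digit at
# each inter-letter position wins, and an odd result marks an allowed
# break. The function name is hypothetical.

# hyphenate WORD PATTERN...
hyphenate() {
    word=$1
    shift
    printf '%s\n' "$@" | awk -v word="$word" -v lmin=2 -v rmin=2 '
    {
        # Split a pattern such as "1b2la" into its letters ("bla") and
        # d[k], the digit in front of letter k+1 (d[m] follows the last).
        letters = ""; m = 0
        for (k = 0; k <= length($0); k++) d[k] = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[0-9]/) d[m] = c + 0
            else { letters = letters c; m++ }
        }
        # Wherever the letters occur in ".word.", keep the larger digit.
        w = "." word "."
        for (s = 1; s + m - 1 <= length(w); s++)
            if (substr(w, s, m) == letters)
                for (k = 0; k <= m; k++)
                    if (d[k] > v[s + k] + 0) v[s + k] = d[k]
    }
    END {
        # v[i + 1] holds the value just before letter i of the word;
        # lmin and rmin play the role of \lefthyphenmin/\righthyphenmin.
        n = length(word); out = ""
        for (i = 1; i <= n; i++) {
            if (i - 1 >= lmin && n - i + 1 >= rmin && v[i + 1] % 2 == 1)
                out = out "-"
            out = out substr(word, i, 1)
        }
        print out
    }'
}

hyphenate hablar 1ha 1la 1b2la   # SR2 plus SR3: prints ha-blar
hyphenate hablar 1ha 1la         # SR2 only: prints hab-lar
```

With only the SR2-style patterns the break lands inside the legal group `bl` (hab-lar); adding the SR3 pattern `1b2la` raises that position to an even 2 and moves the break to ha-blar.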