groff

Re: hyphenating non-english characters


From: G. Branden Robinson
Subject: Re: hyphenating non-english characters
Date: Fri, 9 Aug 2024 04:03:27 -0500

Hi Gáspár,

At 2024-08-01T10:23:31+0200, Gáspár Gergő wrote:
> Thank you for your thorough answer!

My pleasure!

> > The good news is that if you convert the hyphenation pattern file to
> > ISO Latin-2 (ISO 8859-2), you should be able to use it.
> 
> Unfortunately, I got stuck at this point, as the hyphenation file that
> I used is encoded in latin1. Sorry for not including it, this is the
> place I got the file from:
> https://ctan.org/tex-archive/language/hungarian/hyphenation

Werner Lemberg has been a major contributor to groff (and was formerly
its maintainer), so I urge you to consider his advice.

WL> These patterns are very outdated.  The current version of the
WL> patterns (which is a file larger than a whopping 500kByte) is part
WL> of the 'tex-hyphen' bundle and can be found, among other places, in
WL> the TeXLive git repository at
WL> 
WL>   https://git.texlive.info/texlive/tree/Master/texmf-dist/tex/generic/hyph-utf8/patterns/tex/hyph-hu.tex
WL> 
WL> > Another weird thing is, while it does contain the á,é,í,ó,ö and ü
WL> > characters, it omits the ő and ű characters entirely.
WL> 
WL> The abovementioned file is encoded in UTF-8, which you have to
WL> convert to Latin2 by doing, for example,
WL> 
WL> ```
WL> iconv -f utf8 -t latin2 < hyph-hu.tex > hyph-hu.latin2.tex
WL> ```
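
The following minimal Python sketch (an illustration, not part of groff or of the original exchange) shows why Latin-1 cannot hold complete Hungarian patterns, which plausibly explains the missing ő and ű in the old file: ISO 8859-1 has no code points for those two letters, while ISO 8859-2 places them at 0xF5 and 0xFB. The helper names `unencodable` and `convert_utf8_to_latin2` are hypothetical; the latter does in memory roughly what the `iconv` command above does for a file.

```python
# Sketch: which Hungarian accented vowels fit in Latin-1 vs. Latin-2.
# Helper names are hypothetical, chosen for this illustration.

HUNGARIAN_ACCENTED = "áéíóöőúüű"


def unencodable(text: str, encoding: str) -> list[str]:
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def convert_utf8_to_latin2(src: bytes) -> bytes:
    """In-memory analogue of `iconv -f utf8 -t latin2`.

    Raises UnicodeEncodeError if some character has no Latin-2 mapping,
    rather than silently dropping it.
    """
    return src.decode("utf-8").encode("iso8859_2")


if __name__ == "__main__":
    print(unencodable(HUNGARIAN_ACCENTED, "latin-1"))    # ['ő', 'ű']
    print(unencodable(HUNGARIAN_ACCENTED, "iso8859_2"))  # []
    print(convert_utf8_to_latin2("ő ű".encode("utf-8")))  # b'\xf5 \xfb'
```

Every other accented vowel in the list exists in both encodings, so only ő and ű distinguish them.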

> I am using groff 1.23.0.  This encoding business is out of my depth,
> so sorry if I'm missing something obvious.

If you have any experience with cloning Git repositories and compiling
software packages from source, I would encourage you to try groff's Git
HEAD, which has seen subtle but significant improvements to hyphenation
support in just the past couple of weeks.

As the groff home page[1] says:

"To participate in groff development, clone its Git repository. Read the
INSTALL.extra and INSTALL.REPO files within for build requirements and
instructions."

Importantly, you should no longer need to mess with defining hyphenation
codes at all; the existing "latin2.tmac" file now does that.  To my
knowledge, Hungarian also doesn't need to alter any case mappings,
unlike Turkish (which uses yet another encoding file anyway).
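
To make the case-mapping remark concrete, here is a Python illustration of the idea behind hyphenation codes: each uppercase letter is treated as some lowercase letter when hyphenation patterns are matched. This is an assumption-laden model, not groff's `.hcode` machinery, and the names `default_hcodes`, `turkish_hcodes`, and `TURKISH_OVERRIDES` are hypothetical. For the Hungarian capitals the default Unicode lowercase mapping suffices; Turkish has to override it for dotted and dotless i.

```python
# Illustration only: models "hyphenation codes" as a map from a
# character to the lowercase letter it counts as when matching
# hyphenation patterns.  This is not how groff implements .hcode.

HUNGARIAN_UPPER = "ÁÉÍÓÖŐÚÜŰ"


def default_hcodes(letters: str) -> dict[str, str]:
    """Map each letter to Python's default (Unicode) lowercase."""
    return {ch: ch.lower() for ch in letters}


# Turkish distinguishes dotted and dotless i: 'I' must fold to 'ı'
# (U+0131) and 'İ' (U+0130) to 'i'.  Python's default mapping gets
# both wrong ('I' -> 'i', and 'İ' -> 'i' plus U+0307 COMBINING DOT ABOVE).
TURKISH_OVERRIDES = {"I": "\u0131", "\u0130": "i"}


def turkish_hcodes(letters: str) -> dict[str, str]:
    """Default mapping, with the Turkish i/ı overrides applied."""
    codes = default_hcodes(letters)
    for upper, lower in TURKISH_OVERRIDES.items():
        if upper in codes:
            codes[upper] = lower
    return codes


if __name__ == "__main__":
    hu = default_hcodes(HUNGARIAN_UPPER)
    print(hu["Ő"], hu["Ű"])  # ő ű
    print(repr("I".lower()), repr(turkish_hcodes("I\u0130")["I"]))  # 'i' 'ı'
```

Keeping the overrides in a separate table mirrors the point above: Hungarian needs none at all.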

If you can get the "hyph-hu.tex" file working after converting it to ISO
Latin-2, then with a little translation help from you (or others fluent
in Hungarian), we could add "hyphen.hu" and "hu.tmac" files to groff.
I'd be pleased to get those into place for groff 1.24, which I hope to
release later this year.

Regards,
Branden

[1] https://www.gnu.org/software/groff/
