Re: Do Latin-2-based hyphenation files work with Unicode?
From: onf
Subject: Re: Do Latin-2-based hyphenation files work with Unicode?
Date: Tue, 12 Nov 2024 16:43:56 +0100
On Tue Nov 5, 2024 at 8:15 PM CET, onf wrote:
> as the title says. If I use UTF-8 via preconv and request
> .hy 2
> .hpf hyphen.cs
> will that work, given that the file is using the Latin-2 encoding
> for characters with diacritics? If not, what changes need to be done?
I did a little bit of testing. The hyphenation patterns work correctly
with UTF-8 input, but ONLY if latin2.tmac is loaded, like so:
.mso latin2.tmac
and hyphenation codes must also be specified, like so for Latin-2 Czech:
.hcode <e1> <e1> <c1> <e1>
.hcode <e8> <e8> <c8> <e8>
.hcode <ef> <ef> <cf> <ef>
.hcode <e9> <e9> <c9> <e9>
.hcode <ec> <ec> <cc> <ec>
.hcode <ed> <ed> <cd> <ed>
.hcode <f2> <f2> <d2> <f2>
.hcode <f3> <f3> <d3> <f3>
.hcode <f8> <f8> <d8> <f8>
.hcode <b9> <b9> <a9> <b9>
.hcode <bb> <bb> <ab> <bb>
.hcode <fa> <fa> <da> <fa>
.hcode <f9> <f9> <d9> <f9>
.hcode <fd> <fd> <dd> <fd>
.hcode <be> <be> <ae> <be>
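To make the byte values in the .hcode lines above easier to read, here is a small Python sketch that decodes each pair as ISO 8859-2 (Latin-2). It is only an illustration: the helper names are made up for this example and it says nothing about how groff processes the bytes. It just shows that each line maps an uppercase letter to the hyphenation code of its lowercase counterpart.

```python
# Lowercase/uppercase byte pairs copied from the .hcode lines above.
PAIRS = [
    (0xE1, 0xC1), (0xE8, 0xC8), (0xEF, 0xCF), (0xE9, 0xC9), (0xEC, 0xCC),
    (0xED, 0xCD), (0xF2, 0xD2), (0xF3, 0xD3), (0xF8, 0xD8), (0xB9, 0xA9),
    (0xBB, 0xAB), (0xFA, 0xDA), (0xF9, 0xD9), (0xFD, 0xDD), (0xBE, 0xAE),
]


def latin2_char(byte):
    """Decode a single ISO 8859-2 (Latin-2) byte to its Unicode character."""
    return bytes([byte]).decode("iso8859_2")


def describe_pairs(pairs=PAIRS):
    """Return a (lowercase, uppercase) character tuple for each byte pair."""
    return [(latin2_char(lo), latin2_char(up)) for lo, up in pairs]


if __name__ == "__main__":
    # Print one row per .hcode line: byte value and decoded letter.
    for (lo, up), (lc, uc) in zip(PAIRS, describe_pairs()):
        print(f"<{lo:02x}> {lc}   <{up:02x}> {uc}")
```

Running it lists the fifteen Czech letters with diacritics next to their Latin-2 byte values, which is a quick way to check the table for typos.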
Without loading the latin2.tmac file, it doesn't hyphenate correctly.
Given that latin2.tmac specifies a bunch of translations which convert
Latin-2 bytes into their respective characters, e.g.:
.trin \[char248]\[r ah]
my guess is that these translations enable the Latin-2 bytes in hyphen.cs
to be converted to their character counterparts, which the UTF-8 codes
are converted to as well, so that in the end both input methods result
in the same glyph. [Pardon my inadequate terminology.] Latin-1 characters
continue working even when Latin-2 is loaded, as long as they are
specified as the respective UTF-8 codes.
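The idea that both input methods end up at the same character can be illustrated outside groff. The Python sketch below is not a model of groff's internals; it only checks, at the level of character encodings, that Latin-2 byte 248 (the char248 from the .trin example) and the two-byte UTF-8 sequence for the same letter decode to one Unicode character. The function name is invented for this example.

```python
import unicodedata


def same_character(latin2_byte, utf8_bytes):
    """True if a Latin-2 byte and a UTF-8 byte sequence decode to the same character."""
    return bytes([latin2_byte]).decode("iso8859_2") == utf8_bytes.decode("utf-8")


if __name__ == "__main__":
    ch = bytes([248]).decode("iso8859_2")   # byte 248 is 0xF8
    print(ch, unicodedata.name(ch))
    print(same_character(248, b"\xc5\x99"))  # UTF-8 encoding of the same letter
```

So a pattern file stored in Latin-2 and a document supplied as UTF-8 can both refer to the same letter, provided something maps the Latin-2 byte to that character, which is what the translations in latin2.tmac appear to do.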
My conclusion is that, given the intricacies of all this, loading the
appropriate localization file is THE way to set up hyphenation correctly.
I feel like splitting the hyphenation part of localization files off
(into hycs.tmac etc.) would be beneficial in that one could load the
hyphenation settings for a given language without all the localization
strings. Groff's documentation of hyphenation could then be updated
with a simple mention of
.mso hycs.tmac
before specifying the technical details (.hy, .hla, .hpf, ...) which
ordinary users won't need to deal with.
~ onf