groff

Re: Do Latin-2-based hyphenation files work with Unicode?


From: onf
Subject: Re: Do Latin-2-based hyphenation files work with Unicode?
Date: Tue, 12 Nov 2024 16:43:56 +0100

On Tue Nov 5, 2024 at 8:15 PM CET, onf wrote:
> as the title says. If I use UTF-8 via preconv and request
>   .hy 2
>   .hpf hyphen.cs
> will that work, given that the file is using the Latin-2 encoding
> for characters with diacritics? If not, what changes need to be done?

I did a bit of testing. The hyphenation patterns work correctly with
UTF-8 input, but ONLY if latin2.tmac is loaded, like so:
  .mso latin2.tmac
and the hyphenation codes must also be specified, like so for Latin-2
Czech:
  .hcode <e1> <e1>  <c1> <e1>
  .hcode <e8> <e8>  <c8> <e8>
  .hcode <ef> <ef>  <cf> <ef>
  .hcode <e9> <e9>  <c9> <e9>
  .hcode <ec> <ec>  <cc> <ec>
  .hcode <ed> <ed>  <cd> <ed>
  .hcode <f2> <f2>  <d2> <f2>
  .hcode <f3> <f3>  <d3> <f3>
  .hcode <f8> <f8>  <d8> <f8>
  .hcode <b9> <b9>  <a9> <b9>
  .hcode <bb> <bb>  <ab> <bb>
  .hcode <fa> <fa>  <da> <fa>
  .hcode <f9> <f9>  <d9> <f9>
  .hcode <fd> <fd>  <dd> <fd>
  .hcode <be> <be>  <ae> <be>

Without loading latin2.tmac, the text is not hyphenated correctly.
latin2.tmac specifies a series of translations that convert Latin-2
bytes into the corresponding special characters, e.g.:
  .trin \[char248]\[r ah]
My guess is that these translations let the Latin-2 bytes in hyphen.cs
be converted to the same characters that the UTF-8 input is converted
to, so that in the end both input methods result in the same glyph.
[Pardon my inadequate terminology.] Latin-1 characters continue to work
even with Latin-2 loaded, as long as they are entered as their
respective UTF-8 codes.
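
For anyone who wants to reproduce this, a minimal test file might look
like the sketch below. The file name test.roff, the narrow line length,
and the sample sentence are just illustrative choices, and putting the
.hcode lines before .hpf is my assumption about the safest order rather
than something I verified:
  .\" test.roff: UTF-8 input, Latin-2 hyphenation patterns
  .mso latin2.tmac
  .\" (the fifteen .hcode lines listed above go here)
  .hy 2
  .hpf hyphen.cs
  .\" narrow line length so that words actually need to be broken
  .ll 12n
  Příliš žluťoučký kůň úpěl ďábelské ódy.
It can be run through preconv with
  groff -k -Tutf8 test.roff
and then run again with the .mso line commented out to compare the
break points.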

My conclusion is that, given the intricacies of all this, loading the
appropriate localization file is THE way to set up hyphenation
correctly. I think splitting the hyphenation part of the localization
files off (into hycs.tmac etc.) would be beneficial, because one could
then load the hyphenation settings for a given language without all the
localization strings. Groff's documentation of hyphenation could then
be updated with a simple mention of
  .mso hycs.tmac
before it goes into the technical details (.hy, .hla, .hpf, ...), which
ordinary users wouldn't need to deal with.

~ onf


