Re: uppercase german umlaut


From: Dave Kemper
Subject: Re: uppercase german umlaut
Date: Tue, 2 Jan 2024 11:04:25 -0600

[moving this back to the thread where it belongs]

On 1/2/24, hohe72@posteo.de <hohe72@posteo.de> wrote:
> If gpic gets Ä (0xc3 0x84) it complains about 0x84.
> If gpic gets ä (0xc3 0xa4) it does not complain about 0xa4.

True, but irrelevant, because *in neither case will the character be
interpreted the way you intend*.

gpic will consider 0xc3 0x84 one valid Latin-1 character (0xc3, LATIN
CAPITAL LETTER A WITH TILDE) followed by one invalid input character
(0x84, which falls in the 0x80-0x9f range that groff rejects).

gpic will consider 0xc3 0xa4 two valid Latin-1 characters (LATIN
CAPITAL LETTER A WITH TILDE and CURRENCY SIGN).
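
To see this for yourself outside of groff, here is a minimal Python
sketch that decodes each byte of the UTF-8 sequence as Latin-1, the way
a Latin-1 reader would.  The function name latin1_view is made up for
this illustration; it is not part of groff.

```python
import unicodedata


def latin1_view(data: bytes) -> list[str]:
    """Name each byte of `data` as a Latin-1 reader would see it."""
    names = []
    for byte in data:
        ch = bytes([byte]).decode("latin-1")
        # C1 control codes (0x80-0x9f) have no Unicode character name,
        # so fall back to a placeholder for them.
        names.append(unicodedata.name(ch, f"<no name: 0x{byte:02x}>"))
    return names


print(latin1_view("Ä".encode("utf-8")))
# ['LATIN CAPITAL LETTER A WITH TILDE', '<no name: 0x84>']
print(latin1_view("ä".encode("utf-8")))
# ['LATIN CAPITAL LETTER A WITH TILDE', 'CURRENCY SIGN']
```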

What you're trying to send to gpic in your two examples is LATIN
CAPITAL LETTER A WITH DIAERESIS and LATIN SMALL LETTER A WITH
DIAERESIS.  But if those are sent as UTF-8 to gpic, it will not
interpret them as you want.  To get what you want, you need to convert
your input to Latin-1, or run it through preconv before gpic.
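
As one possible way to do the Latin-1 conversion, here is a small Python
sketch.  It assumes the input really is UTF-8 and contains only
characters that exist in Latin-1; anything else raises an error rather
than being silently mangled.  The function name utf8_to_latin1 is
hypothetical.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Latin-1.

    Raises UnicodeDecodeError if `data` is not valid UTF-8, and
    UnicodeEncodeError if it contains a character outside Latin-1.
    """
    return data.decode("utf-8").encode("latin-1")


# Ä is 0xc3 0x84 in UTF-8 and the single byte 0xc4 in Latin-1.
print(utf8_to_latin1(b"\xc3\x84"))  # b'\xc4'
```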

> ECMA-48 says for 0x84:

Also irrelevant to groff, as it doesn't use ECMA-48.  Groff tools
(including gpic) take input in Latin-1, period.  (Pure ASCII, being a
subset of Latin-1, is also valid.)  Any bytes that aren't Latin-1
characters are illegal input to all groff tools.  The only exception
is preconv, which recognizes various encodings and converts them to
pure ASCII, with all non-ASCII characters being converted to groff
escape sequences.
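
To give a rough idea of what that conversion looks like, the Python
sketch below replaces every non-ASCII character with a groff escape of
the form \[uXXXX].  This only approximates the idea; it is not how
preconv is implemented, it does no encoding detection, and the function
name to_groff_escapes is invented for the example.  Use preconv itself
for real input.

```python
def to_groff_escapes(text: str) -> str:
    """Return pure-ASCII text, with non-ASCII characters written as
    groff \\[uXXXX] escapes (at least four uppercase hex digits)."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        else:
            out.append(f"\\[u{ord(ch):04X}]")
    return "".join(out)


print(to_groff_escapes("Ä and ä"))  # \[u00C4] and \[u00E4]
```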

> If you want to know why I ignore preconv, read the last mail.)

I don't recall a previous message giving a reason for this, but if you
don't use preconv (or convert input to Latin-1 by some means), you're
not going to get what you want.


