Re: uppercase german umlaut
From: Dave Kemper
Subject: Re: uppercase german umlaut
Date: Tue, 2 Jan 2024 11:04:25 -0600
[moving this back to the thread where it belongs]
On 1/2/24, hohe72@posteo.de <hohe72@posteo.de> wrote:
> If gpic gets Ä (0xc3 0x84) it complains about 0x84.
> If gpic gets ä (0xc3 0xa4) it does not complain about 0xa4.
True, but irrelevant, because *in neither case will the character be
interpreted the way you intend*.
gpic will consider 0xc3 0x84 to be one valid Latin-1 character (0xc3,
LATIN CAPITAL LETTER A WITH TILDE) followed by one invalid character
(0x84).
gpic will consider 0xc3 0xa4 to be two valid Latin-1 characters (LATIN
CAPITAL LETTER A WITH TILDE and CURRENCY SIGN).
What you're trying to send to gpic in your two examples is LATIN
CAPITAL LETTER A WITH DIAERESIS and LATIN SMALL LETTER A WITH
DIAERESIS. But if those are sent as UTF-8 to gpic, it will not
interpret them as you want. To get what you want, you need to convert
your input to Latin-1, or run it through preconv before gpic.
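To make the byte-level view concrete, here is a minimal Python sketch of how a Latin-1 reader sees the UTF-8 encodings of the two characters. It only illustrates the encoding mismatch and is not how gpic is implemented; the function name as_latin1 and the "<no name: control code>" label are made up for this example.

```python
import unicodedata

def as_latin1(text):
    """Encode text as UTF-8, then name each byte as a Latin-1 reader sees it."""
    result = []
    for byte in text.encode("utf-8"):
        ch = bytes([byte]).decode("latin-1")
        # C1 control codes such as 0x84 have no Unicode name; the fallback
        # label is a placeholder for this sketch, not a gpic diagnostic.
        name = unicodedata.name(ch, "<no name: control code>")
        result.append((byte, name))
    return result

for sample in ("Ä", "ä"):
    for byte, name in as_latin1(sample):
        print(f"{sample!r}: 0x{byte:02x} -> {name}")
```

Neither input yields a letter with a diaeresis. The only difference is that 0x84 falls in the control-code range while 0xa4 happens to be a printable Latin-1 character.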
> ECMA-48 says for 0x84:
Also irrelevant to groff, as it doesn't use ECMA-48. Groff tools
(including gpic) take input in Latin-1, period. (Pure ASCII, being a
subset of Latin-1, is also valid.) Any bytes that aren't Latin-1
characters are illegal input to all groff tools. The only exception
is preconv, which recognizes various encodings and converts them to
pure ASCII, with all non-ASCII characters being converted to groff
escape sequences.
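As a rough illustration of that conversion, the following Python sketch rewrites every non-ASCII character as a groff Unicode escape of the form \[uXXXX]. It is only an approximation under stated assumptions: the function name to_groff_escapes is hypothetical, the input is assumed to be already decoded text, and the real preconv also handles encoding detection and other details not modeled here.

```python
def to_groff_escapes(text):
    """Return text with each non-ASCII character replaced by \\[uXXXX]."""
    out = []
    for ch in text:
        cp = ord(ch)
        # ASCII passes through unchanged; everything else becomes an escape
        # with at least four uppercase hex digits (assumed output format).
        out.append(ch if cp < 0x80 else f"\\[u{cp:04X}]")
    return "".join(out)

print(to_groff_escapes("Ä and ä"))
```

The result contains only ASCII bytes, which is why it is safe to feed to gpic and the other groff tools.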
> If you want to know why I ignore preconv, read the last mail.)
I don't recall a previous message giving a reason for this, but if you
don't use preconv (or convert input to Latin-1 by some means), you're
not going to get what you want.