emacs-devel

Re: address@hidden: Coding problem with Euro sign]


From: Kevin Rodgers
Subject: Re: address@hidden: Coding problem with Euro sign]
Date: Wed, 14 Dec 2005 11:56:43 -0700
User-agent: Mozilla Thunderbird 0.9 (X11/20041105)

Richard M. Stallman wrote:
> Would someone please DTRT and ack?
>
> ------- Start of forwarded message -------
> From: Ralf Angeli <address@hidden>
> To: address@hidden
> Date: Tue, 13 Dec 2005 13:12:02 +0100
...
> Subject: Coding problem with Euro sign
> Sender: address@hidden
...
>
> - --=-=-=
>
> Attached you can find a file with two 8-bit characters I extracted
> from a file produced by Visual Studio under Windows.  The characters
> should be u umlaut and the Euro sign.  Emacs does not seem to be able
> to find the right coding system for it and displays it with
> raw-text-dos.  I could not get the file displayed correctly by loading
> it with iso-latin-1, iso-latin-9, or cp1251.  And I am not sure if
> this is a problem of Emacs or if Visual Studio simply produced
> garbage.
>
>
> - --=-=-=
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: attachment; filename=test.txt
> Content-Transfer-Encoding: quoted-printable
>
> =FC u umlaut
> =C2=80 euro
>
> - --=-=-=

I think the OP is confused: u umlaut is 0xFC in ISO 8859-1 (Latin 1) and
ISO 8859-15 (Latin 9), and U+00FC in Unicode.  The euro is 0xA4 in
ISO 8859-15 and U+20AC in Unicode; it is not defined in ISO 8859-1.

But in UTF-8, which the quoted-printable attachment claims to be, they
are 0xC3 0xBC and 0xE2 0x82 0xAC respectively.
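
For anyone who wants to check those byte values, here is a minimal
Python sketch (not part of the original report).  The helper name
hex_bytes is made up for illustration; the codec names are the ones
Python's standard library uses.

```python
# Print the byte sequences for u umlaut and the euro sign in the
# encodings discussed above.  hex_bytes is a hypothetical helper.
U_UMLAUT = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS
EURO = "\u20ac"      # EURO SIGN


def hex_bytes(text, encoding):
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


for name in ("iso8859_15", "cp1252", "utf-8"):
    print(name, hex_bytes(U_UMLAUT, name), "|", hex_bytes(EURO, name))

# ISO 8859-1 has no euro sign, so encoding it there raises an error.
try:
    EURO.encode("iso8859_1")
except UnicodeEncodeError:
    print("iso8859_1 cannot encode the euro sign")
```

None of these produce the C2 80 pair seen in the attachment for the
euro.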

The attachment above uses a single-byte encoding for u umlaut.  But the
encoding used for the euro is either an unknown 2-byte encoding or the
wrong single-byte encoding (0xC2 is A circumflex in ISO 8859-15) followed
by 0x80 (not a graphic character in ISO 8859-*).  That could explain why
Emacs does not recognize it as iso-latin-1 or iso-latin-9.

As far as Microsoft Windows code pages go, 1251 is Cyrillic so the OP
must have meant 1252.  And in that character set, the euro is indeed
0x80 (and 0xC2 is still A circumflex).

So the attachment should have been labelled windows-1252 instead of
utf-8, and its contents would be more accurately written as:

=FC u umlaut
=C2 A circumflex
=80 euro

And the OP should try visiting the file with the cp1252 coding system.
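
As a quick sanity check of that reading, outside Emacs, the three bytes
can be decoded as windows-1252 in Python.  This is only a sketch: the
byte string is typed in from the attachment as quoted above, not read
from the OP's actual file.

```python
import unicodedata

# The three 8-bit bytes from the attachment, as quoted above.
raw = bytes([0xFC, 0xC2, 0x80])

# Decode them as windows-1252 (Python calls the codec "cp1252").
decoded = raw.decode("cp1252")

# Show each byte next to the character it decodes to.
names = [unicodedata.name(ch) for ch in decoded]
for byte, name in zip(raw, names):
    print(f"0x{byte:02X} {name}")
```

This lists u with diaeresis, A with circumflex, and the euro sign, in
that order.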

--
Kevin Rodgers




