Re: [Nmh-workers] UTF=8 in message bodies


From: Ken Hornstein
Subject: Re: [Nmh-workers] UTF=8 in message bodies
Date: Tue, 04 Dec 2012 11:55:00 -0500

>http://www.sandelman.ca/tmp/mcrcapture/6585.2012-12-04/capture1.png

Wow, a PERFECT example of mojibake!

So, for those of you who care ... I don't normally send out messages with
UTF-8 characters in them, but I did in this case on purpose ... just to see
if people would prove me wrong.  And of course they did!

So, let's take a look at what happened here.  My paragraphs were
prefixed with a bullet character (U+2022).  The UTF-8 for that is E2 80
A2.  What Mike's screen showed was U+00E2 (small a with circumflex),
\200 (in other words, hex 0x80), and the cent character (U+00A2).  I
don't know the mechanics of MH-E, but one of the tools decoded the
quoted-printable properly (actually, the base64, thanks to the mailing
list), but assumed the text was in ISO-8859-1 instead of UTF-8.  There
is no equivalent of the bullet character in ISO-8859-1 so that was never
going to display properly for Mike, but I also included the "Latin Small
Letter Sharp S" (U+00DF) which does have an ISO-8859-1 equivalent and that
should have been fine.  Obviously what happened was the UTF-8 was again
interpreted as ISO-8859-1.
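
For anyone who wants to reproduce the effect, here is a minimal Python sketch of the misdecoding described above. The helper name misread_as_latin1 is made up for illustration; it only mimics what a tool does when it treats UTF-8 bytes as ISO-8859-1, and says nothing about how MH-E or Xemacs actually work internally.

```python
# Mimic a tool that receives UTF-8 bytes but decodes them as ISO-8859-1.
# misread_as_latin1 is a hypothetical helper, not part of nmh or MH-E.
def misread_as_latin1(text: str) -> str:
    return text.encode("utf-8").decode("iso-8859-1")

bullet = "\u2022"    # BULLET
sharp_s = "\u00df"   # LATIN SMALL LETTER SHARP S

# The bullet is three bytes in UTF-8: E2 80 A2.
bullet_bytes = bullet.encode("utf-8")

# Each byte becomes its own ISO-8859-1 character:
# U+00E2 (a with circumflex), U+0080 (a control character), U+00A2 (cent).
bullet_garbled = misread_as_latin1(bullet)

# Sharp S exists in ISO-8859-1 as the single byte DF, but its UTF-8 form is
# the two bytes C3 9F, so the same misreading garbles it as well.
sharp_s_latin1 = sharp_s.encode("iso-8859-1")
sharp_s_garbled = misread_as_latin1(sharp_s)

print([hex(b) for b in bullet_bytes])
print([hex(ord(c)) for c in bullet_garbled])
print([hex(ord(c)) for c in sharp_s_garbled])
```

The point of the sharp S case is that having an ISO-8859-1 equivalent does not help if the bytes on the wire are UTF-8 and nothing converts them first.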

>ken's message on my screen with Xemacs/mh-e/GNUs.
>UTF-8 support is poor in this combination. I'm told that if I gave up my
>*X*emacs fetish, and got with the GNUemacs, it would be better.

I'm wondering what happens if you run "mhshow" on that message outside of
MH-E.  Also ... are you running in a UTF-8 locale, or ISO-8859-1?  If the
former, I expect the issue there is that some of the tools are assuming
that you can handle UTF-8 just fine, but clearly Xemacs cannot.
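
One rough way to answer the locale question is to ask Python which character set the current locale implies. This is only a sketch: is_utf8 and locale_charset are hypothetical helper names, and the result reflects the environment Python was started in, which may differ from the one Xemacs or the nmh tools see.

```python
import codecs
import locale

def is_utf8(charset_name: str) -> bool:
    """Return True if charset_name is some spelling of UTF-8."""
    try:
        # codecs.lookup normalizes aliases such as "UTF-8" and "utf8".
        return codecs.lookup(charset_name).name == "utf-8"
    except LookupError:
        return False

def locale_charset() -> str:
    """Return the charset implied by the LANG / LC_* environment settings."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale setting; fall through with the default locale.
        pass
    return locale.getpreferredencoding(False)

charset = locale_charset()
print(charset, "(UTF-8)" if is_utf8(charset) else "(not UTF-8)")
```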

--Ken


