Re: [Nmh-workers] repl*comps and and non-ascii characters
From: Eric Gillespie
Subject: Re: [Nmh-workers] repl*comps and and non-ascii characters
Date: Fri, 25 Jul 2008 21:32:38 -0700
address@hidden writes:
> On Thu, 24 Jul 2008 23:34:57 PDT, Eric Gillespie said:
>
> > Um, you're looking at the quoted-unreadable format I transmitted
> > the files in. You want to save these with mhstore, and then
> > you'll see.
>
> You missed the point. The problem is that even *after* you handle
> the Q-P encoding, the line looks like this:
>
> % hexdump -C /tmp/work9
> 00000000 54 6f 3a 20 54 c3 b6 6d 20 c3 98 72 6c 65 79 20 |To: T..m ..rley |
> 00000010 3c 74 65 73 74 64 65 63 6f 64 65 40 65 78 61 6d |<testdecode@exam|
> 00000020 70 6c 65 2e 63 6f 6d 3e 0a |ple.com>.|
> 00000029
>
> Broken out byte by byte we have x'54' T, x'c3' (iso8859-1 cap-A-tilde or the
> first half of a UTF-8, something else entirely for koi-8), x'b6' (iso8859-1
I'm aware; it's UTF-8 text. Says so in the MIME header.
I suppose you could try to interpret utf8 text as iso8859 or
koi8, but of course it doesn't work. What is your point?
> para-sign or second half of an UTF-8 O-umlaut). A few bytes later, we have
> x'20 c3 98'. A blank, and then two *more* bytes that are encoding-dependent,
> but with no way to tell what encoding was used. After undoing the Q-P,
> you now have the same *bytecodes* - but by the same token, the mhstore
> has *LOST* the charset="UTF-8" that the text/plain had attached to it.
Lost what how HUH? After you mhstore the test case, you have a
plain old file on disk. Do your other utf8-encoded files have
any encoding metadata attached to them? Mine don't...
The test sets LC_CTYPE so that repl will decode to utf8, and so
the text will match.
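
For illustration only (this is not part of nmh or of the patch), a minimal Python sketch of the point both sides agree on: the bytes from the hexdump above carry no charset of their own, and what they spell depends entirely on which charset the reader assumes. The variable names are made up for the example.

```python
# Display-name bytes from the hexdump: 54 c3 b6 6d 20 c3 98 72 6c 65 79
raw = bytes.fromhex("54 c3 b6 6d 20 c3 98 72 6c 65 79")

# The same byte sequence read under three different charset assumptions.
as_utf8 = raw.decode("utf-8")         # two-byte sequences become one character each
as_latin1 = raw.decode("iso-8859-1")  # every byte becomes its own character
as_koi8r = raw.decode("koi8-r")       # every byte becomes its own character

for label, text in [("utf-8", as_utf8),
                    ("iso-8859-1", as_latin1),
                    ("koi8-r", as_koi8r)]:
    # repr() keeps control characters (such as latin1 0x98) visible.
    print(f"{label:>10}: {text!r}")
```

Only the UTF-8 reading yields the intended name; the other two decode without error but produce different text, which is why the charset has to come from somewhere outside the bytes (a MIME header, an RFC 2047 tag, or the locale).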
> The line doesn't contain any rfc2047 encoding tags, or any other way to
> determine what non-ascii characters are in use. They're not in the mail as I
> received it, they're not in the file produced after I mhstore it, they're
> simply *NOT THERE*.
Um, yes? That's the whole point: the patch causes repl to decode
the 2047-encoded text. The test script decodes to utf8; if your
locale is koi8, repl will decode to that.
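
A rough Python sketch of the behavior described here, under stated assumptions: it is not the patch (nmh is written in C), `decode_for_locale` is a hypothetical helper, and the encoded header value is an illustrative one built from the name and address in the hexdump. It decodes the RFC 2047 encoded-word to text, then re-encodes that text for whatever charset the locale calls for.

```python
from email.header import decode_header

def decode_for_locale(value, locale_charset):
    """Decode RFC 2047 encoded-words, then re-encode for locale_charset.

    Hypothetical helper for illustration; characters the target charset
    cannot represent are replaced with '?'.
    """
    text = ""
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Unencoded chunks come back with charset None; treat as ASCII.
            chunk = chunk.decode(charset or "ascii")
        text += chunk
    return text, text.encode(locale_charset, errors="replace")

# Illustrative header value (assumed, not taken from the test case files).
header = "=?utf-8?Q?T=C3=B6m_=C3=98rley?= <testdecode@example.com>"

text, utf8_bytes = decode_for_locale(header, "utf-8")
_, latin1_bytes = decode_for_locale(header, "iso-8859-1")
_, koi8_bytes = decode_for_locale(header, "koi8-r")

print(text)
print(utf8_bytes)
print(latin1_bytes)
print(koi8_bytes)
```

The decoded text is the same in every case; only the output bytes differ by locale. KOI8-R has no code points for these two letters, so this sketch substitutes `?` there; how repl itself handles unrepresentable characters is not something this example claims to show.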
> If you *do* have an rfc2047 tag in that line that I'm managing to not see,
> please point it out to me. Not all the world is UTF-8, and it is *NOT*,
> repeat *NOT* acceptable to just proclaim that it is.
I have no idea what you're trying to say. Nothing proclaims that
all the world is UTF-8. The test case, however, is.
You could write a test case with latin1 or koi8 if you wanted to,
but why?
> Wrong:
> To: T=C3=B6m =C3=98rley <address@hidden>
>
> Also Wrong, but produces the same bytes:
> To: =iso8859-1?Q?T=C3=B6m =C3=98rley? <address@hidden>
>
> Right, and produces the same bytes:
> To: =?utf-8?Q?T=C3=B6m=20=C3=98rley? <address@hidden>
>
> Ponder until you understand why all 3 produce the same decoded bytes,
> but only one is actually correct.
I understand that perfectly well; do you? What you've written
here is a nonsensical mish-mash of quoted-printable and rfc2047.
Just try to put your lines into a message file and show(1) it;
you'll see.
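
As a quick cross-check that does not need an nmh install, here is a small Python sketch that feeds the three quoted lines to the standard library's RFC 2047 parser. Assumptions: Python's `email.header` stands in for nmh's decoder (the two may differ in detail), the address is the one from the hexdump, `has_encoded_word` is a helper made up for this example, and the fourth string is a well-formed encoded-word added for comparison.

```python
from email.header import decode_header

# The three header values from the quoted message, verbatim apart from
# the address, which is filled in from the hexdump.
candidates = [
    "T=C3=B6m =C3=98rley <testdecode@example.com>",
    "=iso8859-1?Q?T=C3=B6m =C3=98rley? <testdecode@example.com>",
    "=?utf-8?Q?T=C3=B6m=20=C3=98rley? <testdecode@example.com>",
]

# For comparison: an encoded-word has the shape
# "=?" charset "?" encoding "?" encoded-text "?=".
well_formed = "=?utf-8?Q?T=C3=B6m=20=C3=98rley?= <testdecode@example.com>"

def has_encoded_word(value):
    """True if decode_header found at least one chunk tagged with a charset."""
    return any(charset is not None for _, charset in decode_header(value))

for value in candidates + [well_formed]:
    print(has_encoded_word(value), value)
```

With this parser none of the three quoted lines is recognized as containing an encoded-word (the first has no `=?...?=` wrapper at all, the second lacks the `=?` opener, the third lacks the `?=` terminator), so their `=C3=B6` sequences are left as literal text; only the comparison string is decoded.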
--
Eric Gillespie <*> address@hidden