emacs-devel

Re: Rmail changes for Emacs 22


From: Kai Großjohann
Subject: Re: Rmail changes for Emacs 22
Date: Mon, 21 Oct 2002 18:37:23 +0200
User-agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.3.50 (i686-pc-linux-gnu)

Dave Love <address@hidden> writes:

> Eli Zaretskii <address@hidden> writes:
>
>> Personally, I think emacs-mule is not a good idea in this case, since 
>> mbox is not Emacs-private format, so some other software should be able 
>> to read it.
>
> I don't see how that follows, but any file that has to represent the
> full range of Emacs characters has to be stored in the internal
> encoding.  I don't know what the rationale is for any of this, or why
> rmail uses emacs-mule now.

Well, mbox files usually contain data that arrived via email, so it
would be safe to just keep the data as it arrived, unmodified.  Most
messages won't contain characters that only Emacs knows about, so
there is a pretty good chance that an mbox file contains only
charsets that other programs also grok.

But what do other programs do?  Convert all incoming messages to
Unicode?  If they read from /var/mail, that might be difficult to
do.  Or do other programs just grok multiple charsets (encodings?) in
the same file?

It would, however, be slightly difficult to keep messages encoded in
ASCII and UTF-16 in the same file.  Hm.  But if one keeps
Content-Length headers, say, then one would know where each message
ends, and hence that one is looking at the From_ line of the next
one.  Therefore, one could tell whether those five characters
("From ") are encoded in something that looks like ASCII or in
something that looks like UTF-16.  That might be sufficient to find
the Content-Type header and be really sure what the charset/encoding
is.
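
To make the sniffing idea concrete, here is a minimal sketch in Python.
It assumes the reader already knows the byte offset at which a message
starts (for example from the previous message's Content-Length).  The
function name sniff_from_encoding is made up for illustration, and the
sketch only tells ASCII-compatible encodings apart from UTF-16, with or
without a byte order mark.

```python
import codecs

FROM = "From "  # the five characters that open an mbox From_ line


def sniff_from_encoding(data: bytes, offset: int = 0):
    """Guess how the 'From ' marker at `offset` is encoded.

    Returns "ascii", "utf-16-le", "utf-16-be", or None when the bytes
    at `offset` do not look like a From_ line in any of these forms.
    """
    chunk = data[offset:]

    # A UTF-16 message might start with a byte order mark.
    for bom, name in ((codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if chunk.startswith(bom + FROM.encode(name)):
            return name

    # Otherwise compare against each plain encoding of "From ".
    for name in ("ascii", "utf-16-le", "utf-16-be"):
        if chunk.startswith(FROM.encode(name)):
            return name
    return None
```

Once the encoding of the From_ line is known, the same codec could be
used to read the header block and look for Content-Type; whether real
mbox readers would tolerate such a file is a separate question.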

>> A good alternative would be to encode each message as what 
>> the charset= header says (and add/fix such a header if there is none, or 
>> if the one that's there lies).
>
> I doubt you should do anything to them, especially as you have no
> assurance any headers are correct.

Maybe it would be useful to offer the user a command so that they can
say "this message is encoded in Big5" and the like.  Then RMAIL could
store this information in a header (in the Content-Type header?) and
subsequent views of the message would automatically use the "right"
charset/encoding.
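
As a rough illustration of recording the user's choice, the sketch below
uses Python's standard email package rather than anything in Rmail.  The
helper name record_charset is hypothetical; it writes the chosen charset
into the Content-Type header, adding the header if the message has none.

```python
from email import message_from_bytes


def record_charset(raw_message: bytes, charset: str) -> bytes:
    """Return the message with charset=<charset> set in Content-Type."""
    msg = message_from_bytes(raw_message)
    # set_param replaces an existing charset parameter; if there is no
    # Content-Type header yet, one is created as text/plain.
    msg.set_param("charset", charset)
    return msg.as_bytes()
```

A later view of the message can then read the value back with
get_content_charset() and decode the body accordingly.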

Presumably, the user just tries a number of possible charsets and
looks at the message each time to see whether the guess was right.
And if they are like me and can't distinguish GB2312-encoded Chinese
text from Big5-encoded text, then choosing the wrong charset won't be
much of a loss, as they won't be able to read it anyhow :-)
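
The trial-and-error loop could look something like the following Python
sketch.  The function name preview_decodings and the default candidate
list are assumptions for illustration only; undecodable bytes are
replaced rather than raising, so every candidate yields something the
user can look at.

```python
def preview_decodings(body: bytes, candidates=("big5", "gb2312", "utf-8")):
    """Decode `body` under each candidate charset for visual comparison."""
    previews = {}
    for name in candidates:
        # errors="replace" substitutes U+FFFD for bytes that do not fit,
        # so a wrong guess shows up as garbage instead of an exception.
        previews[name] = body.decode(name, errors="replace")
    return previews
```

Note that a wrong charset can still produce plausible-looking
characters, which is exactly why the user has to judge the result by
eye.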

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)





