emacs-devel


From: Katsumi Yamaoka
Subject: Re: utf-7 encoding in imap.el is applied to already encoded byte sequences
Date: Thu, 13 Dec 2007 11:15:42 +0900
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.60 (gnu/linux)

>>>>> Stefan Monnier wrote:

> It seems that the utf-encode call in imap.el is often (always?) applied
> to unibyte data (i.e. streams of bytes, a.k.a already encoded text).

> The reason this is so is that when reading newsrc.eld, Gnus calls
> mm-string-as-unibyte (lisp/gnus/gnus-start.el:2420).

Gnus saves a newsgroup name in the ~/.newsrc.eld file as a string
like this:

    (prin1-to-string (encode-coding-string "NAME" 'CODING-SYSTEM))

If this is an nntp group, the name is actually encoded by the news
server.  For instance, news.newsfan.net uses gb2312 (or possibly
gbk).  Gnus reads the name from the network and uses it as-is
internally.  The newsgroup name is decoded only when it is displayed
to the user, according to `gnus-group-name-charset-method-alist' or
`gnus-group-name-charset-group-alist'.  Gnus does the same for groups
on the other back ends, too.  Note, however, that Gnus trunk supports
non-ASCII newsgroup names[1] only for the nntp, nnml (including
nnagent), and nnrss back ends.  In those cases, Gnus itself encodes
the newsgroup names.
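To make the "keep the encoded name internally, decode only for display" flow concrete, here is a small Python analogy; it is not Gnus code.  The dictionary `GROUP_NAME_CHARSETS` and the function `display_group_name` are hypothetical stand-ins for `gnus-group-name-charset-method-alist' and the display-time decoding, and the server name news.example.org is made up.

```python
# Hypothetical stand-in for gnus-group-name-charset-method-alist:
# maps a server name to the charset used for its group names.
GROUP_NAME_CHARSETS = {"news.newsfan.net": "gb2312"}

def display_group_name(raw_name: bytes, server: str) -> str:
    """Decode an internally stored (still encoded) group name, for display only."""
    charset = GROUP_NAME_CHARSETS.get(server)
    if charset is None:
        # No charset configured: show the bytes without guessing an encoding.
        return raw_name.decode("ascii", errors="backslashreplace")
    return raw_name.decode(charset)

# Internally the name stays as bytes, exactly as the server sent it.
internal = "中文".encode("gb2312")
print(display_group_name(internal, "news.newsfan.net"))          # 中文
print(display_group_name(b"gnu.emacs.help", "news.example.org"))  # gnu.emacs.help
```

Comparisons against the active data would use `internal` (the bytes), never the decoded display string.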

When reading such an encoded newsgroup name from the ~/.newsrc.eld
file, Gnus uses `read' and `eval' (gnus-start.el:2391).  At one
time, both Emacs trunk and Emacs Unicode-2 read it as a multibyte
string, so it didn't match the one in the active data.  That is
why I added `mm-string-as-unibyte'.  Though it seems to be
unnecessary nowadays, it behaves as a no-op for a unibyte string,
doesn't it?
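A rough Python analogy of the mismatch, and of why such a conversion is harmless on data that is already unibyte.  Python has no multibyte/unibyte distinction, so this assumes `bytes` plays the role of a unibyte string and a `str` whose code points are all below 256 plays the role of a multibyte string holding raw bytes.  `as_unibyte` is a hypothetical helper, not `mm-string-as-unibyte' itself.

```python
def as_unibyte(s):
    # Hypothetical analogue of string-as-unibyte, not the Gnus code:
    # bytes ("unibyte") are returned unchanged; a str whose code points
    # are all below 256 ("multibyte string of raw bytes") is mapped
    # back to one byte per character.
    if isinstance(s, bytes):
        return s
    return s.encode("latin-1")

active_name = "中文".encode("gb2312")      # encoded name in the active data
read_name = active_name.decode("latin-1")  # same bytes, read back as text

print(read_name == active_name)                # False: text never equals bytes
print(as_unibyte(read_name) == active_name)    # True once converted back
print(as_unibyte(active_name) is active_name)  # True: a no-op on bytes
```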

> It's also because Gnus pre-encodes the names when they're read
> from the keyboard in gnus-read-move-group-name (lisp/gnus/gnus-sum.el:11785).

That is because a non-ASCII group name should be an encoded unibyte
string for internal use.  But there should be no non-ASCII names
among the nnimap groups, and pre-encoding doesn't affect those
newsgroup names (am I wrong?).

> I see 3 problems here:
> 1 - The use of mm-string-as-unibyte (I consider any use of
>     string-as-unibyte to be wrong, unless it is accompanied by a comment
>     that explains why it is right).
> 2 - Inconsistent encoding: gnus-sum.el apparently uses utf-8 (at least
>     that's what (gnus-group-name-charset to-method to-newsgroup) returned
>     in my tests, tho maybe it's because of my locale), whereas
>     gnus-start.el uses emacs-mule (implicitly, via mm-string-as-unibyte).
> 3 - imap.el tries to re-encode in utf7 folder names that have already
>     been encoded (with emacs-mule or utf-8).

I'm sorry, I'm ignorant of IMAP and don't know what the utf-7
encoding is for.  But re-encoding ASCII newsgroup names makes no
difference, does it?  Although I'm not capable of improving
nnimap.el, it might have to decode encoded newsgroup names
according to `gnus-group-name-charset-method-alist' or
`gnus-group-name-charset-group-alist' before re-encoding them in
utf-7.
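For reference, IMAP4rev1 (RFC 3501, section 5.1.3) specifies a modified UTF-7 for mailbox names: printable US-ASCII characters stand for themselves, except that "&" is written as "&-"; any other run of characters is written as the base64 of its UTF-16 form (with "," in place of "/" and no "=" padding) between "&" and "-".  So ASCII names come out unchanged unless they contain "&".  The Python sketch below illustrates that rule and the decode-then-encode order suggested above; it is not the imap.el code.  `encode_group_for_imap` and its `charset` argument are hypothetical, assuming the charset has already been looked up (for example from the Gnus charset alists).

```python
import base64

def imap_utf7_encode(name: str) -> str:
    """Encode a mailbox name in IMAP modified UTF-7 (RFC 3501, section 5.1.3)."""
    out = []
    pending = []  # current run of characters that must be base64-encoded

    def flush():
        if pending:
            utf16 = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(utf16).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:  # printable US-ASCII represents itself
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)

def encode_group_for_imap(raw_name: bytes, charset: str) -> str:
    # Hypothetical helper: decode the stored bytes first, then encode the
    # resulting characters (never the raw bytes) in modified UTF-7.
    return imap_utf7_encode(raw_name.decode(charset))

raw = "日本語".encode("utf-8")
print(imap_utf7_encode("INBOX.gnu"))          # INBOX.gnu
print(encode_group_for_imap(raw, "utf-8"))    # &ZeVnLIqe-
# Treating each already-encoded byte as a character gives a different,
# double-encoded mailbox name:
print(imap_utf7_encode(raw.decode("latin-1")))
```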

[1] (info "(gnus)Non-ASCII Group Names")
