emacs-pretest-bug

From: Boris Samorodov
Subject: Re: gnus: incorrect conversion of Subject and From field from utf-8 to koi8-r
Date: Wed, 19 Oct 2005 01:12:10 +0400
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (berkeley-unix)

On Thu, 13 Oct 2005 20:26:54 +0200 Reiner Steib wrote:

> On Thu, Oct 13 2005, Boris B. Samorodov wrote:

> [ On emacs-pretest.  Cc-ing Ding ]

> > Symptoms:
> >
> > I have a message with the following Subject:
> > -----
> > Subject: 
> > =?UTF-8?B?W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ?= 
> > =?UTF-8?B?nyDRgtC10YHRgg==?=
> > -----
> >
> > On the command line I can run...
> >
> > $ echo "W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQnyDRgtC10YHRgg==" | base64 -d | iconv -f utf-8
> >
> > ...and get this output:
> >
> > [ipt.ru #163] АвтоОтвет: МСК: СП тест
> >
> > But Gnus (from cvs as emacs) shows the following line...
> >
> > Subject: [ipt.ru #163] АвтоОтвет: МСК: СП тест

The line actually shown was: "[ipt.ru #163] АвтоОтвет: МСК: С\XYZ\ABC тест"

> > ...which is wrong.

> I don't see any difference.  Maybe I'm misunderstanding what you mean.

That was an accident in how my bug report was formatted. I saw \XYZ\ABC
(hexadecimal escapes) in my subject string and cut and pasted it. After
the message was converted to UTF-8, they showed up as proper letters.

Nevertheless, the problem is now solved in Gnus CVS HEAD. I have
included my confirmation at the end of this message.
The problem was in decoding a UTF-8 string that had been split across
encoded-words at a non-character boundary.
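
To make the non-character-boundary problem concrete, here is a minimal
Python sketch. It is not the actual rfc2047.el change, and the function
names are made up for illustration. It takes the two encoded-text parts
from the Subject quoted above and decodes them first one by one, then
after joining their bytes.

```python
import base64

# The two "B" encoded-text parts from the Subject header quoted above.
part1 = "W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ"
part2 = "nyDRgtC10YHRgg=="

raw1 = base64.b64decode(part1)
raw2 = base64.b64decode(part2)


def decode_separately(chunks):
    """Decode each chunk to text on its own, then join the text.

    A multi-byte UTF-8 character split across two chunks cannot be
    recovered this way; each half becomes a replacement character.
    """
    return "".join(c.decode("utf-8", errors="replace") for c in chunks)


def decode_joined(chunks):
    """Join the raw bytes first, then decode once."""
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    print(repr(raw1[-1:]), repr(raw2[:1]))  # the two halves of one character
    print(decode_separately([raw1, raw2]))
    print(decode_joined([raw1, raw2]))
```

The first part ends with the byte 0xD0 and the second starts with 0x9F;
together they form one Cyrillic letter. Only the joined variant
reproduces the Subject that base64 and iconv give on the command line.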

Thank you for your cooperation, and sorry for misformatting the initial
report.

> > The bug appeared to be in the incorrect concatenation of the
> > =?UTF-8?<foo> =?UTF-8?<bar> parts of the Subject.

> Whitespace between adjacent encoded words has to be ignored according
> to RFC 2047:

> ,----[ rfc2047.txt ]
> |    (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)     (ab)
> | 
> |            White space between adjacent 'encoded-word's is not
> |            displayed.
> | 
> |    (=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=)    (ab)
> | 
> |         Even multiple SPACEs between 'encoded-word's are ignored
> |         for the purpose of display.
> | 
> |    (=?ISO-8859-1?Q?a?=                         (ab)
> |        =?ISO-8859-1?Q?b?=)
> | 
> |            Any amount of linear-space-white between 'encoded-word's,
> |            even if it includes a CRLF followed by one or more SPACEs,
> |            is ignored for the purposes of display.
> `----
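
The rule quoted from RFC 2047 can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions, not the
Gnus code: the regular expression, `q_decode` and `decode_header_value`
are hypothetical names, and error handling is left out. Whitespace
between adjacent encoded-words is dropped, and the bytes of adjacent
encoded-words in the same charset are joined before decoding.

```python
import base64
import re

# Simplified pattern for an encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def q_decode(text):
    """Decode "Q" encoded-text: "_" means space, "=XX" is a hex byte."""
    text = text.replace("_", " ")
    text = re.sub(r"=([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)
    return text.encode("latin-1")


def decode_header_value(value):
    out = []                # decoded text pieces, in order
    pending = b""           # bytes collected from adjacent encoded-words
    pending_charset = None  # charset of the collected bytes
    pos = 0
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        charset = m.group(1).lower()
        encoding = m.group(2).upper()
        text = m.group(3)
        raw = base64.b64decode(text) if encoding == "B" else q_decode(text)
        # Adjacent means: only whitespace since the previous encoded-word.
        adjacent = pending_charset is not None and gap.strip() == ""
        if adjacent and charset == pending_charset:
            pending += raw  # join bytes, decode later
        else:
            if pending_charset is not None:
                out.append(pending.decode(pending_charset))
            if not adjacent:
                out.append(gap)  # ordinary text is kept as it is
            pending, pending_charset = raw, charset
        pos = m.end()
    if pending_charset is not None:
        out.append(pending.decode(pending_charset))
    out.append(value[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(decode_header_value("=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?="))
    print(decode_header_value("=?ISO-8859-1?Q?a?=\r\n    =?ISO-8859-1?Q?b?="))
```

Joining at the byte level rather than the text level is the design
choice that matters for multi-byte charsets such as UTF-8.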

=====
From: Boris Samorodov <address@hidden>
Subject: Re: gnus: incorrect conversion of Subject and From field from utf-8 to koi8-r
To: Katsumi Yamaoka <address@hidden>
Cc: Kenichi Handa <address@hidden>,  address@hidden,  address@hidden
Date: Tue, 18 Oct 2005 22:20:37 +0400
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (berkeley-unix)

On Sat, 15 Oct 2005 19:06:52 +0900 Katsumi Yamaoka wrote:

> >>>>> In <address@hidden> Handa-san wrote:

> >> 5. Use of encoded-words in message headers

> >> [...]

> >>    The 'encoded-text' in an 'encoded-word' must be self-contained;
> >>    'encoded-text' MUST NOT be continued from one 'encoded-word' to
> >>    another.  This implies that the 'encoded-text' portion of a "B"
> >>    'encoded-word' will be a multiple of 4 characters long; for a "Q"
> >>    'encoded-word', any "=" character that appears in the 'encoded-text'
> >>    portion will be followed by two hexadecimal characters.

> >> The encoded-words that Boris B. Samorodov presented come under
> >> exactly this case.  Even so, should Gnus support such encodings?

> >>>  Subject: 
> >>> =?UTF-8?B?W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ?= 
> >>> =?UTF-8?B?nyDRgtC10YHRgg==?=

> > This example doesn't violate the above restriction.  Each
> > 'encoded-word' is surely "multiple of 4 characters long".

> > Please note that the above restriction is for
> > 'encoded-text', not for the underlying coded character set.
> > So, I think the above document doesn't prohibit dividing
> > a UTF-8 byte sequence at a non-character boundary.
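
The distinction drawn here can be checked mechanically. The following
Python sketch (helper names are hypothetical) tests each encoded-text
from the Subject against the "multiple of 4 characters" implication of
the RFC text quoted above, and separately tests whether its decoded
bytes are complete UTF-8 on their own.

```python
import base64

# The two "B" encoded-text parts from the Subject header in this thread.
parts = [
    "W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ",
    "nyDRgtC10YHRgg==",
]


def satisfies_b_length_rule(text):
    """A self-contained "B" encoded-text is a multiple of 4 characters."""
    return len(text) % 4 == 0


def is_complete_utf8(raw):
    """True if the bytes decode as UTF-8 without any neighbouring bytes."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for text in parts:
        raw = base64.b64decode(text)
        print(len(text), satisfies_b_length_rule(text), is_complete_utf8(raw))
```

Both parts pass the length rule, so each 'encoded-text' is
self-contained as base64, yet neither is complete UTF-8 by itself.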

> I agree.  Thank you for clarifying it.  I've committed your
> patch to cvs.gnus.org with small modifications.  It will be
> propagated to Emacs soon.


This is to confirm that the latest revision 7.43 from HEAD
for gnus/lisp/rfc2047.el from gnus cvs is fine with Subject and From
fields.

Thanks to all who helped investigate and fix this!
=====


> Bye, Reiner.
> -- 
>        ,,,
>       (o o)
> ---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

WBR
-- 
bsam



