From: |
Boris Samorodov |
Subject: |
Re: gnus: incorrect conversion of Subject and From field from utf-8 to koi8-r |
Date: |
Wed, 19 Oct 2005 01:12:10 +0400 |
User-agent: |
Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (berkeley-unix) |
On Thu, 13 Oct 2005 20:26:54 +0200 Reiner Steib wrote:
> On Thu, Oct 13 2005, Boris B. Samorodov wrote:
> [ On emacs-pretest. Cc-ing Ding ]
> > Symptoms:
> >
> > I have a message with the following Subject:
> > -----
> > Subject:
> > =?UTF-8?B?W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ?=
> > =?UTF-8?B?nyDRgtC10YHRgg==?=
> > -----
> >
> > In command-line mode I can do...
> >
> > $ echo
> > "W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQnyDRgtC10YHRgg=="
> > | base64 -d | iconv -f utf-8
> >
> > ...and receive the answer:
> >
> > [ipt.ru #163] АвтоОтвет: МСК: СП тест
> >
> > But Gnus (from CVS, as is Emacs) shows the following line...
> >
> > Subject: [ipt.ru #163] АвтоОтвет: МСК: СП тест
This line really was: "[ipt.ru #163] АвтоОтвет: МСК: С\XYZ\ABC тест"
> > ...which is wrong.
> I don't see any difference. Maybe I'm misunderstanding what you mean.
It was really an accident in the formatting of my bug report. I saw
\XYZ\ABC in my subject string (hexadecimal escapes) and did a
cut-and-paste. After the letter was converted to UTF-8, the escapes
turned into valid letters.
Nevertheless, the problem is now solved in Gnus CVS HEAD. I have
included my confirmation of this at the end of this letter.
The problem was in decoding a UTF-8 string that had been split between
two encoded-words at a non-character boundary.
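
For illustration, here is a minimal Python sketch of that failure mode.
It is not the Gnus code (the actual fix is in rfc2047.el); the function
names are made up for this example, and only the two Base64 payloads
come from the Subject header quoted above. Converting each encoded-word
to text on its own fails, while joining the raw bytes first and decoding
once gives the expected Subject.

```python
import base64

# The two 'encoded-text' payloads from the Subject header in this thread.
PARTS = [
    "W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ",
    "nyDRgtC10YHRgg==",
]


def decode_per_word(parts):
    """Naive approach: turn every encoded-word into text separately."""
    return "".join(base64.b64decode(p).decode("utf-8") for p in parts)


def decode_joined(parts):
    """Join the raw bytes of adjacent same-charset words, then decode once."""
    raw = b"".join(base64.b64decode(p) for p in parts)
    return raw.decode("utf-8")


try:
    decode_per_word(PARTS)
    per_word_failed = False
except UnicodeDecodeError:
    # The first word ends with a lone UTF-8 lead byte.
    per_word_failed = True

subject = decode_joined(PARTS)
print(per_word_failed)
print(subject)
```

The joined variant only makes sense when adjacent encoded-words declare
the same charset, as they do here.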
Thank you for your cooperation, and sorry for the misformatted initial
report.
> > The bug appeared to be at illegal concatenation of
> > =?UTF-8?<foo> =?UTF-8?<bar> parts of the Subject.
> Whitespace between adjacent encoded words has to be ignored according
> to RFC 2047:
> ,----[ rfc2047.txt ]
> | (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=) (ab)
> |
> | White space between adjacent 'encoded-word's is not
> | displayed.
> |
> | (=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=) (ab)
> |
> | Even multiple SPACEs between 'encoded-word's are ignored
> | for the purpose of display.
> |
> | (=?ISO-8859-1?Q?a?= (ab)
> | =?ISO-8859-1?Q?b?=)
> |
> | Any amount of linear-space-white between 'encoded-word's,
> | even if it includes a CRLF followed by one or more SPACEs,
> | is ignored for the purposes of display.
> `----
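
The three RFC 2047 examples quoted above can be tried out with Python's
standard email.header module. This is only a sketch of the display rule,
not of how Gnus implements it; the helper name `display` is an
assumption made for this example.

```python
from email.header import decode_header, make_header


def display(raw_header):
    """Decode RFC 2047 encoded-words and return the text for display."""
    return str(make_header(decode_header(raw_header)))


examples = [
    "=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=",         # one SPACE
    "=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=",        # two SPACEs
    "=?ISO-8859-1?Q?a?=\r\n    =?ISO-8859-1?Q?b?=",  # folded with CRLF
]

for raw in examples:
    print(repr(display(raw)))
```

In each case the white space between the two encoded-words is dropped.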
=====
From: Boris Samorodov <address@hidden>
Subject: Re: gnus: incorrect conversion of Subject and From field from utf-8 to
koi8-r
To: Katsumi Yamaoka <address@hidden>
Cc: Kenichi Handa <address@hidden>, address@hidden, address@hidden
Date: Tue, 18 Oct 2005 22:20:37 +0400
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (berkeley-unix)
On Sat, 15 Oct 2005 19:06:52 +0900 Katsumi Yamaoka wrote:
> >>>>> In <address@hidden> Handa-san wrote:
> >> 5. Use of encoded-words in message headers
> >> [...]
> >> The 'encoded-text' in an 'encoded-word' must be self-contained;
> >> 'encoded-text' MUST NOT be continued from one 'encoded-word' to
> >> another. This implies that the 'encoded-text' portion of a "B"
> >> 'encoded-word' will be a multiple of 4 characters long; for a "Q"
> >> 'encoded-word', any "=" character that appears in the 'encoded-text'
> >> portion will be followed by two hexadecimal characters.
> >> The encoded-words that Boris B. Samorodov presented fall
> >> under this case. Even so, should Gnus support such encodings?
> >>> Subject:
> >>> =?UTF-8?B?W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ?=
> >>> =?UTF-8?B?nyDRgtC10YHRgg==?=
> > This example doesn't violate the above restriction. Each
> > 'encoded-word' is surely "multiple of 4 characters long".
> > Please note that the above restriction is for
> > 'encoded-text', not for the underlying coded character set.
> > So, I think the above document doesn't prohibit dividing
> > a UTF-8 byte sequence at a non-character boundary.
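
The distinction above can be checked directly on the two payloads from
this thread. The following Python sketch is illustrative only; the
helper name `trailing_partial_bytes` is an assumption for this example.
Both 'encoded-text' portions are a multiple of 4 characters long, yet
the bytes of the first one stop in the middle of a UTF-8 character.

```python
import base64
import codecs

# The two 'encoded-text' payloads from the Subject header in this thread.
PARTS = [
    "W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ",
    "nyDRgtC10YHRgg==",
]


def trailing_partial_bytes(encoded_text):
    """Return the bytes of an unfinished UTF-8 character at the end."""
    raw = base64.b64decode(encoded_text)
    # errors="replace" keeps a stray leading continuation byte (as in the
    # second word) from aborting the scan; final=False buffers a trailing
    # incomplete sequence instead of reporting it as an error.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    decoder.decode(raw, final=False)
    pending, _ = decoder.getstate()
    return pending


for part in PARTS:
    print(len(part) % 4 == 0, trailing_partial_bytes(part))
```

So the Base64 layer is self-contained in each word, while the character
set layer is not.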
> I agree. Thank you for clarifying it. I've committed your
> patch to cvs.gnus.org with small modifications. It will be
> propagated to Emacs soon.
This is to confirm that the latest revision (7.43, from HEAD) of
gnus/lisp/rfc2047.el from Gnus CVS handles the Subject and From
fields correctly.
Thank you all who helped to investigate and unbreak the case!
=====
> Bye, Reiner.
> --
> ,,,
> (o o)
> ---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/
WBR
--
bsam