bug-gnu-emacs

Re: bad rfc2047 encoding


From: Simon Josefsson
Subject: Re: bad rfc2047 encoding
Date: Tue, 20 Aug 2002 19:22:47 +0200
User-agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.3.50 (i686-pc-linux-gnu)

Dave Love <d.love@dl.ac.uk> writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> This was fixed in Oort some time ago
>
> Does that mean that Gnus 5.9 isn't being maintained?

That wasn't what I meant.  I don't know the answer.

>> (rev 6.5 of rfc2047.el in Gnus
>> CVS), patch modified to work with 21.3:
>
> It doesn't solve the problem as far as I can tell.  I'd have thought
> that obeying the RFC means parsing the header, since it concerns
> comment fields.

Is that necessary?  Encoded words are allowed inside comments; they
simply must not contain the character ), and the patch ensures that.
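
That constraint can be checked mechanically.  A minimal sketch,
assuming a hypothetical helper named
`my-encoded-word-safe-in-comment-p' (it is not part of rfc2047.el and
only looks at the textual form of a single encoded word):

```elisp
;; Hypothetical helper, not part of rfc2047.el.
(defun my-encoded-word-safe-in-comment-p (word)
  "Return non-nil if encoded WORD contains no literal parentheses.
WORD must have the form =?charset?encoding?text?=."
  (and (string-match "\\`=\\?[^?]+\\?[bBqQ]\\?\\([^?]*\\)\\?=\\'" word)
       ;; The encoded text itself must not contain ( or ).
       (not (string-match "[()]" (match-string 1 word)))))

;; Non-nil: the parentheses of the comment stay outside the word.
(my-encoded-word-safe-in-comment-p "=?iso-8859-1?q?Gro=DFjohann?=")

;; nil: a parenthesis ended up inside the encoded text.
(my-encoded-word-safe-in-comment-p "=?iso-8859-1?q?(Gro=DFjohann)?=")
```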

Your example

(with-temp-buffer
  (insert "To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
")
  (rfc2047-encode-message-header)
  (buffer-string))

evaluates to

"To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann?=)
"

with the patch, which seems valid to me.  Compare an example in the RFC:

   From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
         (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)
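
One way to sanity-check the encoded form is to decode it again.  A
small sketch, assuming a current Emacs in which rfc2047.el provides
`rfc2047-decode-string'; the variable name `my-decoded-name' is made up
for this example:

```elisp
(require 'rfc2047)

;; Decode the encoded word that the patched encoder produced.
;; In iso-8859-1, the Q-encoded byte =DF is the character ß.
(defvar my-decoded-name
  (rfc2047-decode-string "=?iso-8859-1?q?Gro=DFjohann?="))

;; `my-decoded-name' is expected to hold the original name, so this
;; comparison should return t.
(equal my-decoded-name "Gro\u00dfjohann")
```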

> I've restored bug-gnu-Emacs to the Cc since this is something I think
> is important for a release.

I agree.  (I'm reading the gnus bugs list from quimby.gnus.org, which
removes To/Cc so when I reply it only goes to the author and
bugs@gnus.org.)

Suggested patch (against Emacs 21.3 RC) included again below.

2000-11-19 12:00:00  ShengHuo ZHU  <zsh@cs.rochester.edu>

        * rfc2047.el (rfc2047-q-encoding-alist): Match Resent-.
        (rfc2047-header-encoding-alist): Addresses are different from text.
        (rfc2047-encode-message-header): Ditto.
        (rfc2047-dissect-region): Extra parameter.
        (rfc2047-encode-region): Ditto.
        (rfc2047-encode-string): Ditto.

Index: rfc2047.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/gnus/rfc2047.el,v
retrieving revision 1.10
diff -u -p -u -w -r1.10 rfc2047.el
--- rfc2047.el  15 Jul 2001 17:42:53 -0000      1.10
+++ rfc2047.el  16 Aug 2002 19:23:17 -0000
@@ -41,6 +41,8 @@
 (defvar rfc2047-header-encoding-alist
   '(("Newsgroups" . nil)
     ("Message-ID" . nil)
+    ("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\)" .
+     "-A-Za-z0-9!*+/=_")
     (t . mime))
   "*Header/encoding method alist.
 The list is traversed sequentially.  The keys can either be
@@ -52,7 +54,8 @@ The values can be:
 2) `mime', in which case the header will be encoded according to RFC2047;
 3) a charset, in which case it will be encoded as that charset;
 4) `default', in which case the field will be encoded as the rest
-   of the article.")
+   of the article.
+5) a string, like `mime', except for using it as word-chars.")
 
 (defvar rfc2047-charset-encoding-alist
   '((us-ascii . nil)
@@ -87,7 +90,8 @@ Valid encodings are nil, `Q' and `B'.")
   "Alist of RFC2047 encodings to encoding functions.")
 
 (defvar rfc2047-q-encoding-alist
-  '(("\\(From\\|Cc\\|To\\|Bcc\||Reply-To\\):" . "-A-Za-z0-9!*+/")
+  '(("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\):" 
+     . "-A-Za-z0-9!*+/" )
     ;; = (\075), _ (\137), ? (\077) are used in the encoded word.
     ;; Avoid using 8bit characters.
     ;; Equivalent to "^\000-\007\011\013\015-\037\200-\377=_?"
@@ -142,6 +146,8 @@ Should be called narrowed to the head of
                (setq alist nil
                      method (cdr elem))))
            (cond
+            ((stringp method)
+             (rfc2047-encode-region (point-min) (point-max) method))
             ((eq method 'mime)
              (rfc2047-encode-region (point-min) (point-max)))
             ((eq method 'default)
@@ -179,11 +185,12 @@ The buffer may be narrowed."
        (setq found t)))
     found))
 
-(defun rfc2047-dissect-region (b e)
+(defun rfc2047-dissect-region (b e &optional word-chars)
   "Dissect the region between B and E into words."
-  (let ((word-chars "-A-Za-z0-9!*+/")
-       ;; Not using ietf-drums-specials-token makes life simple.
-       mail-parse-mule-charset
+  (unless word-chars
+    ;; Anything except most CTLs, WSP
+    (setq word-chars "\010\012\014\041-\177"))
+  (let (mail-parse-mule-charset
        words point current
        result word)
     (save-restriction
@@ -233,9 +240,9 @@ The buffer may be narrowed."
        (setq word (pop words))))
     result))
 
-(defun rfc2047-encode-region (b e)
-  "Encode all encodable words in region B to E."
-  (let ((words (rfc2047-dissect-region b e)) word)
+(defun rfc2047-encode-region (b e &optional word-chars)
+  "Encode all encodable words in REGION."
+  (let ((words (rfc2047-dissect-region b e word-chars)) word)
     (save-restriction
       (narrow-to-region b e)
       (delete-region (point-min) (point-max))
@@ -255,11 +262,11 @@ The buffer may be narrowed."
                          (cdr word))))
       (rfc2047-fold-region (point-min) (point-max)))))
 
-(defun rfc2047-encode-string (string)
+(defun rfc2047-encode-string (string &optional word-chars)
   "Encode words in STRING."
   (with-temp-buffer
     (insert string)
-    (rfc2047-encode-region (point-min) (point-max))
+    (rfc2047-encode-region (point-min) (point-max) word-chars)
     (buffer-string)))
 
 (defun rfc2047-encode (b e charset)
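
To illustrate why the word-chars string matters for comments: for
address headers the patch passes "-A-Za-z0-9!*+/=_", which does not
include ( or ), while the default "\010\012\014\041-\177" does.  The
following is a simplified sketch, not the real
`rfc2047-dissect-region'; `my-split-by-word-chars' is a hypothetical
function that only shows how a `skip-chars-forward' character set
separates word characters from everything else, and it ignores the
charset handling the real function does.

```elisp
;; Simplified sketch, not the real `rfc2047-dissect-region'.
(defun my-split-by-word-chars (string word-chars)
  "Split STRING into alternating runs of word and non-word characters.
WORD-CHARS uses the character-set syntax of `skip-chars-forward'."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (runs start)
      (while (not (eobp))
        (setq start (point))
        ;; Try a run of word characters first, else a run of others.
        (when (zerop (skip-chars-forward word-chars))
          (skip-chars-forward (concat "^" word-chars)))
        (push (buffer-substring start (point)) runs))
      (nreverse runs))))

(my-split-by-word-chars "(Kai Grossjohann)" "-A-Za-z0-9!*+/=_")
;; => ("(" "Kai" " " "Grossjohann" ")")
```

With the address word-chars, the parentheses form their own runs, so
they are never part of a word that gets encoded.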






