emacs-devel

[resend] pre-23 Rmail bug makes unrmail unavoidably? corrupt messages


From: Mark Lillibridge
Subject: [resend] pre-23 Rmail bug makes unrmail unavoidably? corrupt messages
Date: Fri, 01 Apr 2011 19:41:58 -0700

    I finally had a chance to get back to my project of converting all
my BABYL email to mbox.  I'm still trying to get 8-bit characters to
convert properly.


    I've found a new bug in pre-23 Rmail that throws away information
necessary for unrmail to work properly.  :-( 

    As I have described on this list before, Rmail decodes new messages
without MIME charset specifications for their bodies (this includes most
multipart MIME) using undecided-unix.  It is supposed to save the
decoding it used for the message using a new X-Coding-System header
line.  The bug is that it saves the charset it tried to decode with, not
the charset that actually got used:

rmail.el (version 22.3.1):1899:
;; Decode the region specified by FROM and TO by CODING.
;; If CODING is nil or an invalid coding system, decode by `undecided'.
(defun rmail-decode-region (from to coding)
  (if (or (not coding) (not (coding-system-p coding)))
      (setq coding 'undecided))
  ;; Use -dos decoding, to remove ^M characters left from base64 or
  ;; rogue qp-encoded text.
  (decode-coding-region from to
                        (coding-system-change-eol-conversion coding 1))
  ;; Don't reveal the fact we used -dos decoding, as users generally
  ;; will not expect the RMAIL buffer to use DOS EOL format.
  (setq buffer-file-coding-system
        (setq last-coding-system-used
              (coding-system-change-eol-conversion coding 0))))

The problem is with the last line, where we use coding instead of
last-coding-system-used.  If we were still patching v22, the fix would
be simply to replace coding with last-coding-system-used there.
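
A minimal sketch of that one-line change, assuming (as the Emacs
documentation states) that decode-coding-region leaves the coding system
it actually used in last-coding-system-used.  The name
my/rmail-decode-region is hypothetical, chosen so that the sketch does
not shadow the real function; it is not part of any Emacs release.

```elisp
;; Hypothetical variant of `rmail-decode-region' that records the
;; coding system actually used rather than the one requested.
(defun my/rmail-decode-region (from to coding)
  "Decode the region FROM..TO by CODING and record what was really used.
If CODING is nil or an invalid coding system, decode by `undecided'."
  (if (or (not coding) (not (coding-system-p coding)))
      (setq coding 'undecided))
  ;; As in the original: use -dos decoding to remove ^M characters.
  ;; `decode-coding-region' sets `last-coding-system-used' to the
  ;; precise coding system it ended up using (e.g. after detection).
  (decode-coding-region from to
                        (coding-system-change-eol-conversion coding 1))
  ;; The only change: derive the recorded value from
  ;; `last-coding-system-used' instead of from CODING.
  (setq buffer-file-coding-system
        (setq last-coding-system-used
              (coding-system-change-eol-conversion
               last-coding-system-used 0))))
```

With CODING given as undecided, the recorded value is then whatever
detection picked, with Unix line endings, rather than undecided-unix.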


    Thus, if I receive a multipart MIME message with, say, an 8bit
Latin-1 attachment (the MIT mailer converts some 7bit to 8bit), Rmail
will decode it correctly using iso-latin-1 but record only that it used
undecided-unix.  This is a problem for unrmail because the resulting
code points can be encoded in UTF-8, in Latin-1, and in other encodings
besides.  unrmail encodes the message using undecided-unix, which in my
case picks UTF-8 (this depends on the encoding priorities, I think; I
believe I'm using the defaults).  That is wrong: the resulting message
shows garbled characters when interpreted as Latin-1, as specified by
the message's MIME charset.
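
To make the ambiguity concrete, here is a small sketch using only the
standard string coding functions.  The function name my/reencode-demo
and the sample text are made up for illustration; nothing here is taken
from unrmail itself.

```elisp
;; Hypothetical demo: one decoded string, two possible byte encodings.
(defun my/reencode-demo ()
  "Return (LATIN-1-LENGTH UTF-8-LENGTH MISREAD-TEXT) for a sample string."
  (let* ((decoded (string ?c ?a ?f #xe9))  ; "caf" + e-acute, as characters
         (as-latin-1 (encode-coding-string decoded 'iso-latin-1))
         (as-utf-8 (encode-coding-string decoded 'utf-8)))
    (list (length as-latin-1)   ; e-acute is the single byte #xE9
          (length as-utf-8)     ; e-acute is the two bytes #xC3 #xA9
          ;; What a reader sees if the UTF-8 bytes are interpreted as
          ;; Latin-1, as the original MIME charset would demand:
          (decode-coding-string as-utf-8 'iso-latin-1))))
```

Both encodings are valid for the decoded text, so nothing in the text
itself says which bytes the original message contained.  The third
element comes back as "caf" followed by the two characters U+00C3 and
U+00A9, which is the kind of garbling described above.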


    In general, unrmail has to make a guess in these cases and it will
unavoidably guess wrong some of the time.  I recommend at a minimum
printing a warning message in these cases -- I may make a patch for this
later -- and possibly writing a better heuristic (look for charset=
parameters in the body?) to minimize wrong guesses.
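
A rough sketch of what such a heuristic plus warning might look like.
Everything here is an assumption for illustration: the names
my/body-charsets and my/guess-body-coding are hypothetical, the regexp
is deliberately simple, and treating a MIME charset name as an Emacs
coding system name works for common cases such as iso-8859-1 and utf-8
but not for every charset.

```elisp
;; Hypothetical helper: collect charset= declarations found in TEXT.
(defun my/body-charsets (text)
  "Return the distinct, lowercased charset names declared in TEXT."
  (let ((case-fold-search t)
        (start 0)
        (found nil))
    (while (string-match "charset=\"?\\([-A-Za-z0-9_.:]+\\)\"?" text start)
      (push (downcase (match-string 1 text)) found)
      (setq start (match-end 0)))
    (delete-dups (nreverse found))))

;; Hypothetical heuristic: trust the body only if it names exactly one
;; charset that is also a known coding system; otherwise warn.
(defun my/guess-body-coding (text)
  "Return a coding system for TEXT, or nil (with a warning) if unsure."
  (let* ((names (my/body-charsets text))
         (coding (and (= (length names) 1)
                      (intern-soft (car names)))))
    (if (and coding (coding-system-p coding))
        coding
      (message "Warning: original charset unknown; encoding is a guess")
      nil)))
```

A body that declares several different charsets still returns nil here,
so those messages would still need a guess, but at least the guess would
no longer be silent.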

- Mark


