bug-gnu-emacs

bug#12296: 24.1.50; Slow decoding in Rmail


From: Kenichi Handa
Subject: bug#12296: 24.1.50; Slow decoding in Rmail
Date: Wed, 29 Aug 2012 13:35:17 +0900

In article <E1T6SMo-0007BW-E5@fencepost.gnu.org>, Richard Stallman 
<rms@gnu.org> writes:

> Mime-decoding in Rmail the message included below
> takes 10 seconds on my machine (which is rather slow).
> I am pretty sure it is due to the character code,
> because in general messages in Russian are slow
> and others are not.  I include this so you get an example.

I think the slowness comes from
quoted-printable-decode-region (in lisp/gnus/qp.el).  It is
not well tuned for speed, probably because the
quoted-printable encoding is not intended for
mostly non-ASCII text like this.  RFC 2045 says:

------------------------------------------------------------
6.7. Quoted-Printable Content-Transfer-Encoding

   The Quoted-Printable encoding is intended to represent data that
   largely consists of octets that correspond to printable characters in
   the US-ASCII character set. 
------------------------------------------------------------
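
As a rough illustration of why this matters for Russian text, here is a
sketch that estimates the quoted-printable length of a string once it is
encoded as UTF-8.  The helper name my-qp-encoded-length is made up for
this example, and the estimate assumes UTF-8 and ignores soft line breaks
and the escaping of control characters:

```elisp
(defun my-qp-encoded-length (text)
  "Hypothetical helper: rough quoted-printable length of TEXT as UTF-8.
Assumption: every non-ASCII byte and every literal `=' costs three
characters (=XX); soft line breaks and control characters are ignored."
  (let ((len 0))
    ;; `encode-coding-string' returns a unibyte string, so each element
    ;; of the resulting list is one byte in the range 0..255.
    (dolist (byte (string-to-list (encode-coding-string text 'utf-8)) len)
      (setq len (+ len (if (or (>= byte 128) (= byte ?=)) 3 1))))))

(my-qp-encoded-length "Hello")                                ; => 5
;; Six Cyrillic letters, written with \u escapes to stay ASCII-only.
(my-qp-encoded-length "\u041f\u0440\u0438\u0432\u0435\u0442") ; => 36
```

Each Cyrillic letter takes two UTF-8 bytes, and each byte becomes a
three-character =XX escape, so the decoder sees six characters of input
per letter.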

Anyway, here's a slightly tuned version.  Could you please
try it?
------------------------------------------------------------
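(defun qp-decode-hex (n1 n2)
  ;; The caller matches with `case-fold-search' bound to t, so
  ;; lowercase hex digits can reach us; normalize them first.
  (setq n1 (upcase n1) n2 (upcase n2))
  (+ (* (if (<= n1 ?9) (- n1 ?0) (+ (- n1 ?A) 10)) 16)
     (if (<= n2 ?9) (- n2 ?0) (+ (- n2 ?A) 10))))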

(defun quoted-printable-decode-region (from to &optional coding-system)
  "Decode quoted-printable in the region between FROM and TO, per RFC 2045.
If CODING-SYSTEM is non-nil, decode bytes into characters with that
coding-system.

Interactively, you can supply the CODING-SYSTEM argument
with \\[universal-coding-system-argument].

The CODING-SYSTEM argument is a historical hangover and is deprecated.
QP encodes raw bytes and should be decoded into raw bytes.  Decoding
them into characters should be done separately."
  (interactive
   ;; Let the user determine the coding system with "C-x RET c".
   (list (region-beginning) (region-end) coding-system-for-read))
  (unless (mm-coding-system-p coding-system) ; e.g. `ascii' from Gnus
    (setq coding-system nil))
  (save-excursion
    (save-restriction
      ;; RFC 2045:  ``An "=" followed by two hexadecimal digits, one
      ;; or both of which are lowercase letters in "abcdef", is
      ;; formally illegal. A robust implementation might choose to
      ;; recognize them as the corresponding uppercase letters.''
      (let ((case-fold-search t))
        (narrow-to-region from to)
        ;; Do this in case we're called from Gnus, say, in a buffer
        ;; which already contains non-ASCII characters which would
        ;; then get doubly-decoded below.
        (if coding-system
            (mm-encode-coding-region (point-min) (point-max) coding-system))
        (goto-char (point-min))
        (while (and (skip-chars-forward "^=")
                    (not (eobp)))
          (cond ((eq (char-after (1+ (point))) ?\n)
                 (delete-char 2))
                ((looking-at "\\(=[0-9A-F][0-9A-F]\\)+")
                 ;; Decode the whole run of =XX escapes at once, with a
                 ;; single deletion and insertion.
                 (let* ((n (/ (- (match-end 0) (point)) 3))
                        (str (make-string n 0))
                        (i 0))
                   (while (< i n)
                     (aset str i (qp-decode-hex (char-after (1+ (point)))
                                                (char-after (+ 2 (point)))))
                     (setq i (1+ i))
                     (forward-char 3))
                   (delete-region (match-beginning 0) (match-end 0))
                   (insert str)))
                (t
                 (message "Malformed quoted-printable text")
                 (forward-char)))))
      (if coding-system
          (mm-decode-coding-region (point-min) (point-max) coding-system)))))
------------------------------------------------------------
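
The docstring recommends decoding to raw bytes first and to characters
separately.  A minimal sketch of that two-step use follows, assuming qp.el
can be loaded with (require 'qp) and provides
quoted-printable-decode-string; the helper name my-qp-decode-to-text is
made up for this example:

```elisp
(require 'qp)

(defun my-qp-decode-to-text (qp-text charset)
  "Hypothetical helper: decode QP-TEXT to raw bytes, then to characters.
CHARSET is a coding system such as `utf-8' or `koi8-r'."
  ;; Step 1: quoted-printable -> raw bytes (a unibyte string).
  (let ((bytes (quoted-printable-decode-string qp-text)))
    ;; Step 2: raw bytes -> characters, as a separate operation.
    (decode-coding-string bytes charset)))

(my-qp-decode-to-text "=D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82" 'utf-8)
```

The call above should return the six-letter Russian word "Привет".  Note
that no CODING-SYSTEM argument is passed to the quoted-printable step.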

---
Kenichi Handa
handa@gnu.org




