bug#12296: 24.1.50; Slow decoding in Rmail
From: Kenichi Handa
Subject: bug#12296: 24.1.50; Slow decoding in Rmail
Date: Wed, 29 Aug 2012 13:35:17 +0900
In article <E1T6SMo-0007BW-E5@fencepost.gnu.org>, Richard Stallman
<rms@gnu.org> writes:
> Mime-decoding in Rmail the message included below
> takes 10 seconds on my machine (which is rather slow).
> I am pretty sure it is due to the character code,
> because in general messages in Russian are slow
> and others are not. I include this so you get an example.
I think the slowness comes from
quoted-printable-decode-region (in lisp/gnus/qp.el). It is
not well tuned for speed, but I think that's because
quoted-printable encoding is not intended for mostly
non-ASCII text like this: every non-ASCII byte has to be
written as a three-character "=XX" escape, so a Russian
message consists almost entirely of escapes. RFC 2045 says:
------------------------------------------------------------
6.7. Quoted-Printable Content-Transfer-Encoding
The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the US-ASCII character set.
------------------------------------------------------------
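To make that concrete, here is a small sketch (not code from qp.el) that escapes every byte of a short Russian word the way a quoted-printable encoder must escape non-ASCII bytes. It assumes the text is sent as UTF-8; my-qp-escape-all and my-qp-expansion are hypothetical helpers written only for this illustration.

```elisp
;; Sketch only: `my-qp-escape-all' and `my-qp-expansion' are
;; hypothetical helpers, not part of qp.el.  UTF-8 is assumed.
(defun my-qp-escape-all (bytes)
  "Return unibyte string BYTES with every byte written as =XX."
  ;; Mapping over a unibyte string yields byte values 0..255.
  (mapconcat (lambda (b) (format "=%02X" b)) bytes ""))

(defun my-qp-expansion (text)
  "Return (CHARS BYTES QP-CHARS) for TEXT encoded as UTF-8 and escaped."
  (let* ((bytes (encode-coding-string text 'utf-8))
         (qp (my-qp-escape-all bytes)))
    (list (length text) (length bytes) (length qp))))

;; "Привет" (Russian for "hello"), spelled with \u escapes so this
;; file loads correctly whatever coding system it is read with.
(my-qp-expansion (string ?\u041F ?\u0440 ?\u0438 ?\u0432 ?\u0435 ?\u0442))
;; => (6 12 36)
```

Each Cyrillic letter takes two bytes in UTF-8, and each byte becomes three characters, so the decoder sees six characters of input per letter, all of them escapes.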
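Anyway, here is a slightly tuned version. It decodes a whole
run of consecutive "=XX" escapes into one string and replaces
the run with a single delete-region and insert. Could you
please try it?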
------------------------------------------------------------
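(defun qp-decode-hex (n1 n2)
  "Return the byte value of the hex digit characters N1 and N2.
Lowercase digits are accepted too, since the caller matches them
with `case-fold-search' bound to t."
  (setq n1 (upcase n1) n2 (upcase n2))
  (+ (* (if (<= n1 ?9) (- n1 ?0) (+ (- n1 ?A) 10)) 16)
     (if (<= n2 ?9) (- n2 ?0) (+ (- n2 ?A) 10))))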
(defun quoted-printable-decode-region (from to &optional coding-system)
  "Decode quoted-printable in the region between FROM and TO, per RFC 2045.
If CODING-SYSTEM is non-nil, decode bytes into characters with that
coding-system.
Interactively, you can supply the CODING-SYSTEM argument
with \\[universal-coding-system-argument].
The CODING-SYSTEM argument is a historical hangover and is deprecated.
QP encodes raw bytes and should be decoded into raw bytes.  Decoding
them into characters should be done separately."
  (interactive
   ;; Let the user determine the coding system with "C-x RET c".
   (list (region-beginning) (region-end) coding-system-for-read))
  (unless (mm-coding-system-p coding-system) ; e.g. `ascii' from Gnus
    (setq coding-system nil))
  (save-excursion
    (save-restriction
      ;; RFC 2045:  ``An "=" followed by two hexadecimal digits, one
      ;; or both of which are lowercase letters in "abcdef", is
      ;; formally illegal. A robust implementation might choose to
      ;; recognize them as the corresponding uppercase letters.''
      (let ((case-fold-search t))
        (narrow-to-region from to)
        ;; Do this in case we're called from Gnus, say, in a buffer
        ;; which already contains non-ASCII characters which would
        ;; then get doubly-decoded below.
        (if coding-system
            (mm-encode-coding-region (point-min) (point-max) coding-system))
        (goto-char (point-min))
        (while (and (skip-chars-forward "^=")
                    (not (eobp)))
          (cond ((eq (char-after (1+ (point))) ?\n)
                 ;; Soft line break "=\n": just remove it.
                 (delete-char 2))
                ((looking-at "\\(=[0-9A-F][0-9A-F]\\)+")
                 ;; Decode the whole run of =XX escapes at once.
                 (let* ((n (/ (- (match-end 0) (point)) 3))
                        (str (make-string n 0))
                        (i 0))
                   (while (< i n)
                     (aset str i (qp-decode-hex (char-after (1+ (point)))
                                                (char-after (+ 2 (point)))))
                     (setq i (1+ i))
                     (forward-char 3))
                   (delete-region (match-beginning 0) (match-end 0))
                   (insert str)))
                (t
                 (message "Malformed quoted-printable text")
                 (forward-char)))))
      (if coding-system
          (mm-decode-coding-region (point-min) (point-max) coding-system)))))
------------------------------------------------------------
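For anyone who wants to check the run-at-a-time idea outside a buffer, here is a minimal string-based sketch of the same technique. my-qp-decode-run is a hypothetical helper, not part of qp.el; it assumes its argument consists only of =XX escapes (as the looking-at regexp above guarantees for a match) and, like the patch, accepts lowercase hex digits.

```elisp
;; Sketch only: `my-qp-decode-run' is hypothetical, not from qp.el.
;; RUN must consist solely of =XX escapes, upper- or lowercase.
(defun my-qp-decode-run (run)
  "Decode RUN, a string of =XX escapes, into a unibyte string."
  (let* ((n (/ (length run) 3))
         ;; An ASCII initial char gives a unibyte string, so aset
         ;; with values 128..255 stores raw bytes.
         (bytes (make-string n 0))
         (i 0))
    (while (< i n)
      ;; string-to-number with base 16 accepts a-f as well as A-F.
      (aset bytes i (string-to-number
                     (substring run (+ (* i 3) 1) (+ (* i 3) 3)) 16))
      (setq i (1+ i)))
    bytes))

;; Decoding the bytes into characters is a separate step, as the
;; docstring above recommends.
(decode-coding-string (my-qp-decode-run "=D0=9F=d1=80=D0=B8") 'utf-8)
;; => "При"
```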
---
Kenichi Handa
handa@gnu.org