bug-gnu-emacs


From: Kenichi Handa
Subject: bug#1654: 23.0.60; auto encoding detection (detect-coding-region) not working
Date: Fri, 27 Mar 2009 15:55:16 +0900

In article <ukwsabwo8x.fsf@nusnet-97-126.dynip.nus.edu.sg>, poppyer 
<poppyer@gmail.com> writes:

> But for Big5, the list returned by
> "(detect-coding-region (region-beginning) (region-end))"
> does not include big5. I do understand that GBK and Big5 sequences might
> not be easy to distinguish, but in this case both encodings are
> compatible with the input literal text, so both should be in the returned list.
> Am I right?

You are right.  But the current Emacs can't put both GBK and
Big5 in the list of coding systems it tries during detection,
because they belong to the same coding-system category
(i.e. charset-based).  I know this restriction is not good,
and improving it is on my todo list, but I still haven't had
time to work on it.

> BTW, is there any hook that I can use after the coding detection? I might
> want to write a small Lisp function to distinguish Big5 and GBK (by
> character statistics, for example).

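We don't have such a hook, but when reading a file I think you
can use after-insert-file-functions.  When that hook is called,
the buffer already contains the text decoded by
buffer-file-coding-system, and point is at the beginning of the
newly inserted text.  Each hook function receives the number of
characters inserted and must return the (possibly changed)
character count.  You can re-decode the newly inserted text like
this: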

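(defun check-gbk-big5 (nchars)
  ;; NCHARS is the number of characters just inserted, and point is
  ;; at their start.  Return the (possibly changed) character count.
  (if (and enable-multibyte-characters
           (not coding-system-for-read)  ; the user did not force a coding
           (coding-system-equal
            'chinese-gbk (coding-system-base buffer-file-coding-system)))
      (let* ((pos (point))
             (end (+ pos nchars))
             (modified (buffer-modified-p)))
        (when (search-forward "\x5201" end t)  ;; (*1)
          (save-restriction
            (goto-char pos)
            (narrow-to-region pos end)
            ;; Turn the GBK-decoded text back into raw bytes, then
            ;; decode those bytes as Big5 instead.
            (encode-coding-region pos end buffer-file-coding-system)
            (decode-coding-region pos (point-max) 'big5)
            (set-buffer-file-coding-system last-coding-system-used)
            (set-buffer-modified-p modified)
            ;; The re-decoded text may have a different length.  Return
            ;; a character count, not a buffer position.
            (setq nchars (- (point-max) (point-min)))))))
  nchars)

(add-hook 'after-insert-file-functions 'check-gbk-big5)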

You can replace the (*1) part with your own check function.
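For the character-statistics idea, one possible check for the (*1) slot is sketched here.  It rests on one fact and one assumption.  The fact: a Big5 character whose second byte is in the range 0x40-0x7E can never land inside GB2312 when the bytes are read as GBK, because GB2312 in EUC-CN form uses only bytes 0xA1-0xFE.  The assumption: Big5 text mis-decoded as GBK therefore shows a clearly larger share of non-GB2312 characters than genuine GBK text does.  The name my-gbk-looks-like-big5-p and the default threshold of 0.3 are hypothetical and not part of Emacs; tune them against your own files.

```elisp
(defun my-gbk-looks-like-big5-p (beg end &optional threshold)
  "Guess whether the text between BEG and END is Big5 mis-decoded as GBK.
Hypothetical heuristic: return non-nil when the share of non-ASCII
characters lying outside the GB2312 charset exceeds THRESHOLD
\(default 0.3, an arbitrary assumption)."
  (let ((threshold (or threshold 0.3))
        (end (min end (point-max)))  ; never read past the buffer end
        (total 0)                    ; non-ASCII characters seen
        (outside 0)                  ; of those, characters not in GB2312
        (pos beg))
    (while (< pos end)
      (let ((ch (char-after pos)))
        (when (> ch 127)
          (setq total (1+ total))
          ;; encode-char returns nil when CH has no code point in the charset.
          (unless (encode-char ch 'chinese-gb2312)
            (setq outside (1+ outside)))))
      (setq pos (1+ pos)))
    (and (> total 0)
         (> (/ (float outside) total) threshold))))
```

To use it, replace the (*1) test with (my-gbk-looks-like-big5-p pos end).  Unlike search-forward it does not move point, which is harmless here because check-gbk-big5 calls (goto-char pos) before re-decoding.  Text that really is GBK but uses many characters outside GB2312 (for example traditional characters) would defeat this heuristic.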

---
Kenichi Handa
handa@m17n.org
