Re: [Unicode-2] `read' always returns multibyte symbol
From: Kenichi Handa
Subject: Re: [Unicode-2] `read' always returns multibyte symbol
Date: Tue, 13 Nov 2007 21:55:44 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
In article <address@hidden>, Katsumi Yamaoka <address@hidden> writes:
> The following Lisp snippet emulates what Gnus does when reading
> active data for the local.テスト newsgroup. The buffer contains
> data which have been retrieved from the nntp server. Note that
> the newsgroup name contains non-ASCII characters, which has been
> encoded by utf-8 in the server.
> --8<---------------cut here---------------start------------->8---
> (let ((string (encode-coding-string "local.テスト" 'utf-8)))
>   (with-temp-buffer
>     (set-buffer-multibyte t)
>     (insert (string-to-multibyte string))
>     (goto-char (point-min))
>     (multibyte-string-p (symbol-name (read (current-buffer))))))
> --8<---------------cut here---------------end--------------->8---
> While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.
That is because `read' decides whether the name is unibyte or
multibyte by checking whether the name is a valid multibyte
sequence. In the trunk, a utf-8 byte sequence is not a valid
multibyte sequence, but in emacs-unicode-2 it is.
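To make the unibyte/multibyte distinction concrete, here is a small sketch. It only uses standard functions (`encode-coding-string', `string-to-multibyte', `multibyte-string-p'); the variable name `my-encoded' is made up for illustration.

```elisp
;; Hypothetical variable: the raw utf-8 bytes of the group name.
;; `encode-coding-string' returns a unibyte string.
(defvar my-encoded (encode-coding-string "local.テスト" 'utf-8))

;; Unibyte: one element per byte, so non-ASCII characters take
;; several elements each.
(multibyte-string-p my-encoded)

;; Same bytes, but carried in a multibyte string.  This is what the
;; quoted snippet inserts into the multibyte buffer.
(multibyte-string-p (string-to-multibyte my-encoded))
```

The first form yields a unibyte string and the second a multibyte one, although both hold the same sequence of bytes.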
> If it is not intentional, I hope `read' behaves just like it does
> in Emacs trunk.
The relevant code for `read' is very complicated and I want
to avoid touching it if there's another way.
In addition, I think it is right that the above code returns
t; i.e. any symbol created by reading a multibyte buffer
should have a multibyte string as its name. The bug to fix is
that the following code, which reads from a unibyte buffer,
also returns t in emacs-unicode-2.
< --8<---------------cut here---------------start------------->8---
< (let ((string (encode-coding-string "local.テスト" 'utf-8)))
<   (with-temp-buffer
<     (set-buffer-multibyte nil)
<     (insert string)
<     (goto-char (point-min))
<     (multibyte-string-p (symbol-name (read (current-buffer))))))
< --8<---------------cut here---------------end--------------->8---
> Otherwise, is there a way to make `read' return a unibyte
> symbol (without slowing down)?
The replacement for the above code is as simple as this:
(multibyte-string-p
 (symbol-name (intern (encode-coding-string "local.テスト" 'utf-8))))
But, hmmm, it seems that we can't use such code in Gnus...
> In the inside of Gnus, non-ASCII group names are all treated as
> unibyte strings, that are the ones that the server has encoded
> with certain coding systems. Because of the present behavior of
> `read' in Emacs Unicode-2, Gnus doesn't work with such newsgroups
> perfectly. You can find the actual code in gnus-start.el as
> follows:
> --8<---------------cut here---------------start------------->8---
> ;; Read an active file and place the results in `gnus-active-hashtb'.
> (defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
> real-active)
> [...]
> ;; group gets set to a symbol interned in the hash table
> ;; (what a hack!!) - jwz
> (setq group (let ((obarray hashtb)) (read cur)))
> --8<---------------cut here---------------end--------------->8---
How about this?
(setq group
      (let ((obarray hashtb) pos)
        (skip-syntax-forward "^w_")
        (setq pos (point))
        (skip-syntax-forward "w_")
        (intern (buffer-substring pos (point)))))
I think the overhead is just a few more function calls. The
actual task (searching for a run of symbol constituents,
making a string from them, and interning it) is almost the same.
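A self-contained sketch of this idea, for trying it outside Gnus, might look like the following. The helper name `my-intern-group-name', the sample active line, and the throwaway obarray are all made up for illustration. The sketch also swaps the syntax-class scan for `skip-chars-forward', assuming the group name is delimited by whitespace: `.' has punctuation syntax in the standard syntax table, so a syntax-based scan in a plain temp buffer would stop at the dot.

```elisp
(defun my-intern-group-name (table)
  "Intern the whitespace-delimited token after point into obarray TABLE.
Hypothetical helper for illustration; not part of Gnus."
  (let (pos)
    (skip-chars-forward " \t\n")        ; skip leading whitespace
    (setq pos (point))
    (skip-chars-forward "^ \t\n")       ; scan to the end of the token
    ;; In a unibyte buffer `buffer-substring' returns a unibyte string,
    ;; and `intern' uses that string as the symbol name.
    (intern (buffer-substring pos (point)) table)))

;; Hypothetical usage: a unibyte buffer holding one made-up active line.
(defvar my-group-symbol
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert (encode-coding-string "local.テスト" 'utf-8) " 10 1 y\n")
    (goto-char (point-min))
    ;; A vector of zeros is the traditional way to make an obarray.
    (my-intern-group-name (make-vector 7 0))))

;; Check whether the interned name stayed unibyte.
(multibyte-string-p (symbol-name my-group-symbol))
```

Passing the obarray to `intern' explicitly avoids rebinding `obarray'; the effect should be the same as the `let' form above.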
---
Kenichi Handa
address@hidden