Re: making "玄奘" say "Xuanzang" in chinese
From: Miles Bader
Subject: Re: making "玄奘" say "Xuanzang" in chinese
Date: Fri, 25 Mar 2005 02:16:51 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
Joe Corneli <jcorneli <at> math.utexas.edu> writes:
> (defun w3m-ucs-to-char (codepoint)
>   (or (decode-char 'ucs codepoint) ?~))
>
> But keeping the function around wasn't helping either. Except, when I
> tried it again, it worked, so I must have gotten something wrong.
>
> This code seems a little more readable than the code you
> supplied... but they seem to have the same effect.
Hmmm, I missed that; yeah, `decode-char' does look much nicer ... :-)
> Can you suggest something that will work on this content from the
> gnu.org homepage? Neither the w3m code nor your code seems to produce
> human readable output on this stuff (maybe I'm missing some fonts or
> something?). I get a bunch of control-at characters... (oh yeah,
> after modifying the "[0-9]" to be ".....".
>
> [ Az <at> rbaycanca | Bahasa Indonesia | Bosanski | Catal`
> | 简体中文 |
> 繁體中文 | Cesky | Dansk |
Presumably the "x" following &# means "hex", so when you see it you should pass
16 as the BASE argument to `string-to-number'.
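For instance, taking the code point of "简" (U+7B80) purely as a sample value,
the same digits-to-number step looks like this in both notations:

```elisp
;; Sample value only: U+7B80 is the character "简".
(string-to-number "7b80" 16)   ; hex digits, as found after "&#x"
(string-to-number "31616" 10)  ; the same code point written in decimal
```

Both forms should evaluate to the same integer.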
The following tweak to your original code seems to generate reasonable output:
  (while (re-search-forward "&#\\(x\\)?\\([0-9a-f]+\\);" nil t)
    (let ((ucs (string-to-number (match-string 2)
                                 (if (match-beginning 1) 16 10))))
      (delete-region (match-beginning 0) (match-end 0))
      (insert-char (decode-char 'ucs ucs) 1)))
[The trick to select decimal or hex works because `match-beginning' returns nil
for optional parenthesized expressions which didn't match.]
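If you want to try it outside of w3m, here is a rough sketch of the same loop
wrapped in a function. The name `my-decode-numeric-entities' is made up for
illustration, and the jump to the start of the buffer plus the check for an
undecodable code point are my own assumptions, not part of the snippet above:

```elisp
(defun my-decode-numeric-entities ()
  "Replace &#NNN; and &#xHHH; references in the current buffer.
Hypothetical helper name; a sketch, not part of emacs-w3m."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "&#\\(x\\)?\\([0-9a-f]+\\);" nil t)
      (let* ((hexp (match-beginning 1))  ; nil when the optional "x" is absent
             (ucs (string-to-number (match-string 2) (if hexp 16 10)))
             (char (decode-char 'ucs ucs)))
        ;; Assumption: leave the reference alone if it cannot be decoded.
        (when char
          (replace-match (string char) t t))))))

;; Try it on one hex and one decimal reference in a scratch buffer.
(with-temp-buffer
  (insert "&#x7b80;&#20307;")
  (my-decode-numeric-entities)
  (buffer-string))
```

Whether the result actually displays as readable text still depends on having
a suitable font installed.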
-Miles