help-gnu-emacs

Re: Turning HTML character references into something readable?


From: Benjamin Riefenstahl
Subject: Re: Turning HTML character references into something readable?
Date: 27 Apr 2003 17:54:34 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Hi Karl,


Karl Eichwalder <keichwa@gmx.net> writes:
> Is there already a function to turn these HTML character references
> into proper characters?  "&#1071;" must come out as the Cyrillic
> "Я" etc.

Actually that literal seems to be in some JIS encoding on my side,
while &#1071; indicates Unicode.

If you want an ELisp function, M-x apropos RET char RET turns up
functions that can be used to convert a Unicode codepoint to a string
that Emacs shows fine here as Cyrillic, e.g.:

  (char-to-string (decode-char 'ucs 1071))

If you want to get this into an interactive command, you'd need some
more coding.  Or maybe PSGML or some other SGML/HTML/XML mode may have
that functionality already.
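
For what it's worth, here is a minimal sketch of what such an
interactive command might look like.  The name
my-decode-html-char-refs is made up for illustration, and the sketch
only handles decimal references like "&#1071;", not hexadecimal
("&#x42F;") or named ("&amp;") ones:

```elisp
;; Hypothetical command, not part of Emacs.  Handles decimal
;; references only; hex and named entities are left alone.
(defun my-decode-html-char-refs (beg end)
  "Replace decimal HTML character references between BEG and END.
For example, \"&#1071;\" is replaced by the character with Unicode
code point 1071."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (re-search-forward "&#\\([0-9]+\\);" nil t)
        (let ((char (decode-char 'ucs
                                 (string-to-number (match-string 1)))))
          ;; decode-char returns nil when Emacs cannot map the code
          ;; point; in that case the reference is left untouched.
          (when char
            (replace-match (char-to-string char) t t)))))))
```

Mark a region and call it with M-x my-decode-html-char-refs RET.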

> On the command line recode can do the trick:
> 
>    echo "&#1071;" | recode html..utf-8

You can use shell-command-on-region (M-|) to use "recode html..utf-8"
directly.  Note that M-| is not recognized by some German keyboard
drivers.  But then for everyday use you may want to encapsulate that
into your own command where the "recode" command is hardcoded anyway.
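
A rough sketch of such a wrapper, assuming "recode" is installed and
on your PATH and that UTF-8 is the encoding you want to exchange with
it.  The command name my-recode-html-region is made up for
illustration:

```elisp
;; Hypothetical wrapper, not part of Emacs.  Assumes the external
;; "recode" program is available on PATH.
(defun my-recode-html-region (beg end)
  "Filter the text between BEG and END through \"recode html..utf-8\"."
  (interactive "r")
  ;; Assumption: talk to recode in UTF-8 in both directions.
  (let ((coding-system-for-read 'utf-8)
        (coding-system-for-write 'utf-8))
    ;; A non-nil fifth argument (REPLACE) puts the command's output
    ;; in place of the region.
    (shell-command-on-region beg end "recode html..utf-8" nil t)))
```

You can then bind it to a key that your keyboard driver does pass
through, which sidesteps the M-| problem.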


so long, benny

