help-gnu-emacs
Re: those funny non-ASCII characters


From: Xah Lee
Subject: Re: those funny non-ASCII characters
Date: Thu, 24 May 2012 17:56:59 -0700 (PDT)
User-agent: G2/1.0

On May 24, 4:49 pm, "Buchs, Kevin" <buchs.ke...@mayo.edu> wrote:
> I often paste content from web pages into an emacs org-mode buffer and I
> get the odd quote characters or dashes that are not ASCII. I created a
> lisp function to remove the unicode ones that are just 8 bits. Lately I
> am seeing that there are characters that are not being caught. They show
> up in emacs as the expected character. When I kill/yank them into lisp
> code, they are not being found. When I save the buffer, I am asked for
> coding and chose raw text. When the file is opened again, these
> characters are showing up as some sort of special symbol (dashed circle
> with flag off the top) followed by doubles/triples of \2xx. For example,
> the dash character I just stored was this sequence: circle-flag \200
> \231. Using Gnu/Linux od to dump them I get hex strings such as: 340 245
> 206 340 244 206 210 200 and for the dash mentioned above 342 200 231.
>
> I am very naive in regard to coding, so please excuse my ignorance. I
> would guess these are 16-bit (Unicode16) characters. Can someone
> enlighten me as to how I can determine what these characters are (after
> pasted into a buffer) and how I can code a function to replace them with
> ASCII equivalents? The only thing I could think of was hexl mode, but
> that didn't turn out well. Thanks.


It's better to embrace Unicode than to fight it.

What encoding you get when you paste is rather complex. I guess it
depends on the source you copy from, as each web page can use a
different charset and encoding, and I am not sure whether your OS does
some translation in the clipboard.
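
As a quick check, the three bytes from the od dump quoted above
(octal 342 200 231) can be decoded in Emacs itself. This is only a
sketch, and it assumes the bytes are UTF-8, which is what the byte
pattern suggests. They would then be a single Unicode character, not a
16-bit encoding.

```emacs-lisp
;; Assumption: the bytes from the od dump are UTF-8.
;; "\342\200\231" is a unibyte string holding those three bytes.
(let* ((bytes "\342\200\231")
       (ch (string-to-char (decode-coding-string bytes 'utf-8))))
  ;; Show the code point and its Unicode name.
  (format "U+%04X %s" ch (get-char-code-property ch 'name)))
;; → "U+2019 RIGHT SINGLE QUOTATION MARK"
```

So that particular character looks like a curly apostrophe rather than
a dash. For a character already in a buffer, put the cursor on it and
type C-u C-x = (describe-char) to see the same information.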

Maybe these will help:

〈Emacs File/Character Encoding/Decoding FAQ〉
http://xahlee.org/emacs/emacs_encoding_decoding_faq.html

〈Xah's Unicode Tutorial〉
http://xahlee.org/Periodic_dosage_dir/unicode.html

To replace non-ASCII characters, you can search for them with the regex

[[:nonascii:]]+

〈Char Classes - GNU Emacs Lisp Reference Manual〉
http://xahlee.org/emacs_manual/elisp/Char-Classes.html

〈Emacs Lisp: Convert Unicode String to ASCII (Zap Gremlins)〉
http://xahlee.org/emacs/emacs_zap_gremlins.html
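
A minimal sketch of how the regex above might be used from Lisp. This
is not the code from the linked page. The names my-ascii-replacements
and my-asciify-region are made up for illustration, and the table
covers only a few common quote and dash characters.

```emacs-lisp
;; Hypothetical names; an illustrative, incomplete replacement table.
(defvar my-ascii-replacements
  '(("[\u2018\u2019]" . "'")     ; curly single quotes
    ("[\u201C\u201D]" . "\"")    ; curly double quotes
    ("[\u2013\u2014]" . "-")     ; en dash, em dash
    ("\u2026" . "..."))          ; ellipsis
  "Alist of (REGEXP . REPLACEMENT) pairs.")

(defun my-asciify-region (begin end)
  "Replace some typographic characters between BEGIN and END with ASCII.
Return the number of non-ASCII characters still left in the region."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region begin end)
      ;; Apply each replacement over the whole region.
      (dolist (pair my-ascii-replacements)
        (goto-char (point-min))
        (while (re-search-forward (car pair) nil t)
          (replace-match (cdr pair) t t)))
      ;; Count what the table did not cover.
      (goto-char (point-min))
      (let ((left 0))
        (while (re-search-forward "[[:nonascii:]]" nil t)
          (setq left (1+ left)))
        (when (called-interactively-p 'interactive)
          (message "%d non-ASCII characters left" left))
        left))))
```

Select the pasted text and call M-x my-asciify-region. Anything the
table misses is left alone and only counted, so you can inspect it
with C-u C-x = and decide whether to add it to the table.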

 Xah

