help-gnu-emacs

Re: those funny non-ASCII characters


From: Eli Zaretskii
Subject: Re: those funny non-ASCII characters
Date: Fri, 25 May 2012 17:04:00 +0300

> Date: Fri, 25 May 2012 08:40:25 -0500
> From: "Buchs, Kevin" <buchs.kevin@mayo.edu>
> 
> Thanks, Xah and Eli, for contributing to my further understanding. I
> went to a specific website where I got the content I copied and pasted
> and I can see from the HTML that it has a charset=UTF-8, so I understand
> that is Unicode 8-bit. Using the C-u C-x =, I see that the particular
> character I pasted has a code point of 0x2013 (U+2013). I didn't see,
> however, what the UTF-8 encoding of that code point was. Should I be
> able to read that somewhere on the buffer of information I get with C-u
> C-x = ?

Yes, this part of "C-u C-x ="'s display:

            file code: #xE2 #x80 #x93 (encoded by coding system utf-8-dos)

shows you how it would be encoded in UTF-8.  If you see something like
"not encodable by ...", then you need to set the buffer's encoding
using "C-x RET f".  Under "file code", Emacs shows how the character
would be encoded if the buffer is saved to a disk file or sent to
another program or as an email message.
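
As a rough illustration of where those three bytes come from, here is a
small Python sketch (not Emacs Lisp, and not how Emacs itself does it)
that applies the standard UTF-8 bit layout by hand.  The function name
utf8_encode is made up for this example, and it does not reject
surrogate code points the way a strict encoder would.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustrative only)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# U+2013 falls in the 3-byte range, which gives the bytes E2 80 93.
print(" ".join(f"#x{b:02X}" for b in utf8_encode(0x2013)))
```

The result agrees with the "file code" line above and with Python's
built-in encoder, "\u2013".encode("utf-8").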

> I was poking around the www.unicode.org website, trying to
> understand how this U+2013 code point is encoded into UTF-8, but I
> haven't determined that yet.

See above: the "file code" line shows it, provided the buffer's coding
system can encode the character.

> So, help me piece together what happens as I paste the UTF-8 text into a
> buffer. First, the paste buffer must define that it is in UTF-8.

On Windows, Emacs always uses UTF-16 to pass text via the clipboard,
because doing so lets Emacs copy and paste any character from any
character set on Earth.
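
To make the difference concrete, here is a short Python sketch (again
only an illustration, not what Emacs runs) that shows the same
character, U+2013, as UTF-16 and as UTF-8 bytes.  It assumes
little-endian UTF-16, which is what the Windows Unicode clipboard
format uses; the helper name hex_bytes is made up for this example.

```python
def hex_bytes(data: bytes) -> str:
    """Format a byte string as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


en_dash = "\u2013"

# The same character, serialized two different ways.
as_utf16 = en_dash.encode("utf-16-le")  # one 16-bit unit, low byte first
as_utf8 = en_dash.encode("utf-8")       # three bytes

print("UTF-16LE:", hex_bytes(as_utf16))
print("UTF-8:   ", hex_bytes(as_utf8))

# Decoding either byte sequence yields the same character again.
assert as_utf16.decode("utf-16-le") == as_utf8.decode("utf-8") == en_dash
```

So the bytes that arrive via the clipboard need not look anything like
the bytes that end up in a UTF-8 file; what they have in common is the
character they stand for.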

> Emacs reads this information and inserts it into the byte string
> that defines the buffer. Now, how does emacs record that it was a
> UTF-8 encoded character?

It doesn't.  What it records is the encoding to be used for the
current buffer if it is saved to disk or sent to some program.  That
encoding is a property of the buffer, not of the characters.
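
A toy model in Python may help picture this.  It is only a sketch under
my own assumptions: the class ToyBuffer and its fields are made up for
illustration and say nothing about how Emacs is implemented.  The point
is just that the text holds characters, while the encoding is a single
setting on the buffer that matters only when bytes are produced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyBuffer:
    """Hypothetical stand-in for a buffer: characters plus one encoding setting."""
    text: str                   # characters, with no encoding attached to them
    file_coding: str = "utf-8"  # buffer-wide setting, like what "C-x RET f" changes

    def file_code(self) -> Optional[bytes]:
        """Bytes that would be written on save, or None if not encodable."""
        try:
            return self.text.encode(self.file_coding)
        except UnicodeEncodeError:
            return None


buf = ToyBuffer("\u2013")
print(buf.file_code())        # the UTF-8 bytes for U+2013

buf.file_coding = "cp1252"    # same character, different bytes on disk
print(buf.file_code())

buf.file_coding = "latin-1"   # Latin-1 has no en dash: "not encodable"
print(buf.file_code())
```

Changing file_coding never touches the text itself, which mirrors the
idea that the character was not "recorded as UTF-8" in the first place.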

> Does it translate it into a different internal encoding?

Yes, it does.

> Is this encoding used
> as a superset of all possible encoding systems that emacs supports?

Yes.  See the section "Text Representations" in the ELisp manual that
comes with Emacs; you will find the details there.


