emacs-devel

Re: Bug with UTF-8 string and dbus


From: David Kastrup
Subject: Re: Bug with UTF-8 string and dbus
Date: Wed, 09 Jun 2010 22:42:55 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

Andreas Schwab <address@hidden> writes:

> Stefan Monnier <address@hidden> writes:
>
>> AFAIK, Emacs's internal encoding is valid utf-8.  It uses private
>> characters for some things, but I don't think that makes it invalid.
>
> The eight-bit characters are encoded outside of the Unicode range, and a
> good utf-8 decoder must treat them as invalid.

Yes, that's the whole point.  Indeed, Emacs's own utf-8 decoder treats
them as invalid too: when Emacs considers the data to be utf-8 rather
than emacs-internal encoding, it decodes the respective codes into its
"raw byte" representation.  That representation is again not legal
utf-8, but a rather obvious "extension" of the utf-8 encoding scheme.
Legal utf-8 stops quite artificially at 2^20+2^16 code points or
something similar (I don't remember accurately), a limit that is a
consequence of the range encodable in utf-16 with surrogate codes.
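
To make the limit concrete, here is a minimal Python sketch (not Emacs
code; the helper names encode_4byte, strict_decodes and utf16_max are
made up for illustration).  It packs a code point into the 4-byte utf-8
bit pattern without any range check, derives the utf-16 surrogate
ceiling, and asks Python's strict utf-8 decoder whether it accepts the
result.  It only illustrates the cutoff; it makes no claim about how
Emacs lays out its raw bytes internally.

```python
def encode_4byte(cp: int) -> bytes:
    """Pack cp into the 4-byte utf-8 bit pattern, with no Unicode range check."""
    if not 0x10000 <= cp <= 0x1FFFFF:
        raise ValueError("the 4-byte pattern holds 0x10000..0x1FFFFF")
    return bytes([
        0xF0 | (cp >> 18),            # lead byte: 11110xxx
        0x80 | ((cp >> 12) & 0x3F),   # continuation: 10xxxxxx
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def utf16_max() -> int:
    """Highest code point reachable with one utf-16 surrogate pair."""
    high, low = 0xDBFF, 0xDFFF
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def strict_decodes(data: bytes) -> bool:
    """True if Python's strict utf-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Number of code points allowed in legal utf-8: 2^20 + 2^16 = 0x110000.
LIMIT = 2**20 + 2**16

if __name__ == "__main__":
    print(hex(utf16_max()), hex(LIMIT - 1))
    print(strict_decodes(encode_4byte(LIMIT - 1)))  # last legal code point
    print(strict_decodes(encode_4byte(LIMIT)))      # one past the limit
```

The bit pattern itself has room up to 0x1FFFFF, so a decoder that
rejects everything from 0x110000 upward does so because of the utf-16
ceiling, not because the byte scheme runs out.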

-- 
David Kastrup



