emacs-devel

Re: utf8 and emacs text/string multibyte representation


From: Camm Maguire
Subject: Re: utf8 and emacs text/string multibyte representation
Date: Fri, 31 Oct 2014 14:05:20 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Thanks so much!

I've been discussing this elsewhere, and it's come to my attention that
not only do some Unicode code points not fit into a single 16-bit UTF-16
unit, but some Unicode characters don't fit into a single code point
either :-).  Presumably this is why Emacs expanded to 22 bits?  In any
case, it makes clear what one correspondent said: Unicode must be
processed sequentially, so there is no real reason to struggle to get
random O(1) access to Unicode characters.
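
To make the two mismatches concrete, here is a small Python sketch (the
helper name utf16_units is mine, purely illustrative).  It shows a code
point that needs two UTF-16 units, and a user-perceived character that
needs two code points:

```python
import unicodedata

def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the 16-bit range
e_acute = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# One code point, but two UTF-16 units (a surrogate pair).
print(len(clef), utf16_units(clef))

# Two code points that display as a single accented letter.
print(len(e_acute), unicodedata.combining(e_acute[1]) != 0)
```

Finding where one perceived character ends means looking at the
following code points, which is why the scan is inherently sequential.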

If this is indeed the case, all these encodings have the same problems,
though varying in degree, and UTF-8 is clearly the smallest and most
ASCII-compatible.  The question then arises as to whether Lisp
characters, which by definition do offer random access in strings, need
be the same as, or close to, Unicode characters.
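
As a rough check of the size and ASCII-compatibility point, the Python
sketch below encodes a plain ASCII string three ways.  The sample text
is arbitrary, and the size advantage shown holds for ASCII-heavy text;
it is not a claim about every script:

```python
text = "hello"

# Encoded size in bytes for each encoding (little-endian, no BOM).
sizes = {enc: len(text.encode(enc))
         for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(sizes)

# For pure ASCII input, the UTF-8 bytes are identical to the ASCII bytes.
print(text.encode("utf-8") == text.encode("ascii"))
```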

Did you consider leaving aref, char-code and code-char alone and writing
Unicode functions on top of these (so that unicode-length != length), as
opposed to making aref itself do this translation under the hood?  The
latter violates the expectation of O(1) access, which other kinds of
arrays certainly offer, though it is questionable whether real users
actually expect it for strings.  With the layered approach, one would
know that aref is random-access and unicode-??? is sequential only.

Take care,

Eli Zaretskii <address@hidden> writes:

>> From: Camm Maguire <address@hidden>
>> Cc: address@hidden,  address@hidden
>> Date: Thu, 30 Oct 2014 12:27:58 -0400
>> 
>> > I'm not sure what you mean by a "boxed character".  A character in
>> > Emacs is just an int.
>> >
>> 
>> Then how do you distinguish integers from characters at the lisp level?
>
> We don't -- except that a valid character's value must fit the Unicode
> range.
>
> There's no character data type in Emacs.  (XEmacs does have it.)

-- 
Camm Maguire                                        address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah


