emacs-devel

Re: Unibyte characters, strings, and buffers


From: David Kastrup
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 09:40:03 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>> From: David Kastrup <address@hidden>
>> Cc: address@hidden
>> Date: Sat, 29 Mar 2014 08:23:33 +0100
>> 
>> Eli Zaretskii <address@hidden> writes:
>> 
>> >> From: David Kastrup <address@hidden>
>> >> Cc: address@hidden
>> >> Date: Fri, 28 Mar 2014 20:25:17 +0100
>> >> 
>> >> >> > Then what do you call a buffer whose "text" is encoded?
>> >> >> 
>> >> >> I can't speak for Stephen, of course, but my impression was he would
>> >> >> call it "a bad idea".
>> >> >
>> >> > Then what other approach should be used when Lisp code needs to encode
>> >> > or decode text manually?
>> >> 
>> >> Redecode right to a "binary" coding system would be my guess.
>> >
>> > Sorry, I don't follow.  Can you tell more what that means?
>> 
>> It means a buffer where each _character_ has the same value that the
>> no-longer-available unibyte buffer would have in its bytes/characters.
>
> This doesn't seem to be a complete description of what is suggested.
> E.g., just by looking at the values of characters, it is impossible to
> distinguish between Latin characters below 256 and raw bytes.  In a
> unibyte buffer, we know how to make that distinction,

Uh, what?  The point of a unibyte buffer is that it does not make the
distinction.

> but if there are no unibyte buffers, something else is needed for
> doing that.

>> You can do that regardless of whether the conceptual array of 0..255
>> characters is internally represented using unibyte or multibyte encodings.
>
> What do you mean by "multibyte encodings" in this context?  Are you
> suggesting to store the bytes 128..255 as Latin-1 characters,
> i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
> characters?

That would make the most sense, yes.
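To make the Latin-1 option concrete, here is a minimal sketch in Python (not Emacs Lisp, and not how Emacs implements anything). Python's `latin-1` codec maps each byte to the code point with the same value, which is the "character value equals byte value" property described above. Python's UTF-8 encoder stands in for Emacs's internal multibyte representation, assuming, as the question above puts it, that characters 128..255 are stored as 2-byte UTF-8 sequences. The helper names `bytes_to_chars` and `chars_to_bytes` are hypothetical.

```python
# Sketch: treat bytes 0..255 as the characters U+0000..U+00FF (Latin-1),
# so each character code equals the byte value it stands for.


def bytes_to_chars(data: bytes) -> str:
    # Latin-1 maps every byte to the code point with the same value,
    # so decoding never fails and never loses information.
    return data.decode("latin-1")


def chars_to_bytes(text: str) -> bytes:
    # Inverse mapping; raises UnicodeEncodeError for code points above 255.
    return text.encode("latin-1")


raw = bytes([0x41, 0x7F, 0x80, 0xE1, 0xFF])
text = bytes_to_chars(raw)

# Each character code is the byte value itself.
codes = [ord(c) for c in text]

# The round trip back to bytes is exact.
round_trip = chars_to_bytes(text)

# In a UTF-8-based multibyte representation, characters 0..127 take one
# byte each and characters 128..255 take two.
utf8_lengths = [len(c.encode("utf-8")) for c in text]

print(codes)              # [65, 127, 128, 225, 255]
print(round_trip == raw)  # True
print(utf8_lengths)       # [1, 1, 2, 2, 2]
```

Because character codes and byte values coincide, byte-level arithmetic works directly on the characters; the cost is that a Latin-1 "á" and the byte 0xE1 are indistinguishable, which is exactly the distinction a unibyte buffer does not make either.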

> Or are you suggesting something else?

You could also use the "raw byte" character encodings we use to avoid
losing information when reading improperly formed UTF-8 files into a
multibyte buffer, but that seems less practical when working with the
character codes.
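For comparison, a Python sketch of the raw-byte alternative. It assumes Emacs's convention of giving a byte 0x80..0xFF the character code 0x3FFF00 plus the byte value, so the raw byte 0xE1 is a different character from Latin-1 U+00E1. The helpers `byte_to_raw_char` and `raw_char_to_byte` are hypothetical names for illustration. Python's real `surrogateescape` error handler is shown alongside as an analogue of the same idea using a different code range.

```python
# Sketch: "raw byte" characters, assuming Emacs's convention that byte
# 0x80..0xFF maps to character code 0x3FFF00 + byte.
# Function names are hypothetical, for illustration only.

RAW_BYTE_OFFSET = 0x3FFF00


def byte_to_raw_char(b: int) -> int:
    """Character code used for byte b in a multibyte buffer."""
    if not 0 <= b <= 0xFF:
        raise ValueError(f"not a byte: {b}")
    # ASCII bytes stay ordinary characters; 0x80..0xFF get raw-byte codes.
    return b if b < 0x80 else RAW_BYTE_OFFSET + b


def raw_char_to_byte(c: int) -> int:
    """Inverse of byte_to_raw_char."""
    if 0 <= c < 0x80:
        return c
    if RAW_BYTE_OFFSET + 0x80 <= c <= RAW_BYTE_OFFSET + 0xFF:
        return c - RAW_BYTE_OFFSET
    raise ValueError(f"not a raw-byte character: {c:#x}")


data = b"A\xe1\xff"
codes = [byte_to_raw_char(b) for b in data]

# The raw byte 0xE1 is a different character from Latin-1 U+00E1 ("á")...
latin1_a_acute = ord("\u00e1")

# ...but byte arithmetic now needs an explicit conversion step.
byte_values = [raw_char_to_byte(c) for c in codes]

# Python's "surrogateescape" handler uses the same idea with a different
# range: an undecodable byte b becomes the code point U+DC00 + b.
escaped = data.decode("utf-8", errors="surrogateescape")
escaped_codes = [ord(ch) for ch in escaped]
restored = escaped.encode("utf-8", errors="surrogateescape")

print([hex(c) for c in codes])          # ['0x41', '0x3fffe1', '0x3fffff']
print(latin1_a_acute)                   # 225
print(byte_values == list(data))        # True
print([hex(c) for c in escaped_codes])  # ['0x41', '0xdce1', '0xdcff']
print(restored == data)                 # True
```

The information is preserved either way, but every comparison or computation on a character code has to subtract the offset first, which is the practical drawback mentioned above.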

-- 
David Kastrup
