Re: Unibyte characters, strings, and buffers

emacs-devel

From:	David Kastrup
Subject:	Re: Unibyte characters, strings, and buffers
Date:	Sat, 29 Mar 2014 09:40:03 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>> From: David Kastrup <address@hidden>
>> Cc: address@hidden
>> Date: Sat, 29 Mar 2014 08:23:33 +0100
>> 
>> Eli Zaretskii <address@hidden> writes:
>> 
>> >> From: David Kastrup <address@hidden>
>> >> Cc: address@hidden
>> >> Date: Fri, 28 Mar 2014 20:25:17 +0100
>> >> 
>> >> >> > Then what do you call a buffer whose "text" is encoded?
>> >> >> 
>> >> >> I can't speak for Stephen, of course, but my impression was he would
>> >> >> call it "a bad idea".
>> >> >
>> >> > Then what other ideas to use when Lisp code needs to encode or decode
>> >> > text manually?
>> >> 
>> >> Redecode right to a "binary" coding system would be my guess.
>> >
>> > Sorry, I don't follow.  Can you tell more what that means?
>> 
>> It means a buffer where each _character_ has the same value that the
>> no-longer-available unibyte buffer would have in its bytes/characters.
>
> This doesn't seem to be a complete description of what is suggested.
> E.g., just by looking at the values of characters, it is impossible to
> distinguish between Latin characters below 256 and raw bytes.  In a
> unibyte buffer, we know how to make that distinction,

Uh, what?  The point of a unibyte buffer is that it does not make the
distinction.

> but if there are no unibyte buffers, something else is needed for
> doing that.

>> You can do that regardless of whether the conceptual array of 0..255
>> characters is internally stored in unibyte or multibyte encoding.
>
> What do you mean by "multibyte encodings" in this context?  Are you
> suggesting to store the bytes 128..255 as Latin-1 characters,
> i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
> characters?

That would make the most sense, yes.
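A minimal sketch, in Python rather than Emacs Lisp, of the representation described here. Every byte 0..255 becomes the character with the same code (Latin-1), so character values equal the former unibyte values. If that text is then stored multibyte, and assuming the multibyte form is UTF-8 (as Emacs's internal representation is for Unicode characters), bytes 128..255 each take a 2-byte sequence. The helper names are hypothetical. Python's `latin-1` codec stands in for whatever Emacs would actually do.

```python
def bytes_to_chars(data: bytes) -> str:
    # Latin-1 maps byte N to the character with code N for every N in 0..255,
    # so each character "has the same value" as the byte it replaces.
    return data.decode("latin-1")


def chars_to_bytes(text: str) -> bytes:
    # Inverse mapping; raises UnicodeEncodeError for any code point above 255.
    return text.encode("latin-1")


raw = bytes(range(256))
text = bytes_to_chars(raw)

# Character codes equal the original byte values, and the round trip is lossless.
assert [ord(c) for c in text] == list(raw)
assert chars_to_bytes(text) == raw

# Stored as UTF-8, 0..127 take one byte each and 128..255 take two each.
utf8 = text.encode("utf-8")
print(len(raw), len(utf8))         # 256 384
print(text[0xE9].encode("utf-8"))  # b'\xc3\xa9'
```

The cost of this choice is only storage (two bytes for each high byte). Code that inspects character values sees exactly 0..255.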

> Or are you suggesting something else?

You could also use the "raw byte" character encodings that we use to
avoid losing information when reading malformed UTF-8 files into a
multibyte buffer, but that seems less practical when working with the
character codes.
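A hedged Python model of this alternative. It assumes the offset that Emacs's `BYTE8_TO_CHAR` macro in `src/character.h` is believed to use: a raw byte B in 0x80..0xFF becomes character code B + 0x3FFF00, and ASCII bytes stay as themselves. The function names are hypothetical. The sketch shows why this is awkward for code that works with character codes. A raw byte 0xE9 no longer compares equal to 0xE9, so every range test or arithmetic step on "byte values" needs a conversion first. It also shows that, unlike the Latin-1 scheme, this keeps raw bytes distinguishable from Latin characters below 256.

```python
# Assumption: mirrors the offset of Emacs's BYTE8_TO_CHAR macro, which places
# raw bytes 0x80..0xFF at character codes 0x3FFF80..0x3FFFFF.
RAW_BYTE_OFFSET = 0x3FFF00


def byte_to_char_code(b: int) -> int:
    """Map a byte to a character code, turning 0x80..0xFF into raw-byte characters."""
    if not 0 <= b <= 0xFF:
        raise ValueError(f"not a byte: {b!r}")
    # ASCII bytes stay ordinary ASCII characters.
    return b if b < 0x80 else b + RAW_BYTE_OFFSET


def char_code_to_byte(c: int) -> int:
    """Inverse of byte_to_char_code; rejects ordinary non-ASCII characters."""
    if 0 <= c < 0x80:
        return c
    if 0x3FFF80 <= c <= 0x3FFFFF:
        return c - RAW_BYTE_OFFSET
    raise ValueError(f"not an ASCII or raw-byte character: {c:#x}")


raw_e9 = byte_to_char_code(0xE9)
print(hex(raw_e9))                      # 0x3fffe9
# The raw byte is distinct from the Latin-1 character U+00E9 with the same value...
print(raw_e9 == 0xE9)                   # False
# ...so code that reasons about byte values must convert back first.
print(char_code_to_byte(raw_e9) == 0xE9)  # True
```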

-- 
David Kastrup
