
Re: Unibyte characters, strings, and buffers


From: Stephen J. Turnbull
Subject: Re: Unibyte characters, strings, and buffers
Date: Sun, 30 Mar 2014 21:13:01 +0900

David Kastrup writes:

 > I don't think it gets much more transparent than "unibyte flag only
 > marks the valid Unicode-in-Emacs character range".  I'm for the
 > range 0..255,

It's easy to be more transparent in that case: no unibyte flag.
However, that delays detection of out-of-range characters until the
encoding step, rather than catching them at the insert step.

 > Andreas for something like 0..127 U 4194176..4194303 which
 > I find cumbersome for little return.

Agreed.  If bytes are going to be non-characters, having a half-ASCII
type is just going to cause surprises when US English apps get
internationalized.
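
To make the two candidate ranges concrete, here is a minimal Python
sketch.  The function names are hypothetical, and the offset 0x3FFF00
is an assumption about how raw bytes 0x80..0xFF land on
4194176..4194303; only the numeric ranges come from the quoted text.

```python
# Sketch only: names are hypothetical, ranges are from the quoted message.
RAW_BYTE_OFFSET = 0x3FFF00  # assumed: raw byte b >= 0x80 is stored as b + offset


def in_range_0_255(code: int) -> bool:
    """Range 0..255: every byte value is its own code."""
    return 0 <= code <= 255


def in_range_split(code: int) -> bool:
    """Range 0..127 U 4194176..4194303: ASCII plus a raw-byte block."""
    return 0 <= code <= 127 or 4194176 <= code <= 4194303


def byte_to_split_code(byte: int) -> int:
    """Map a byte to its code under the split range."""
    if not 0 <= byte <= 255:
        raise ValueError(f"not a byte: {byte}")
    return byte if byte < 0x80 else byte + RAW_BYTE_OFFSET
```

Under the split range, a byte such as 0xE9 is valid only after being
remapped into the raw-byte block, whereas ASCII bytes pass through
unchanged.  That asymmetry is the "half-ASCII" behaviour that code
written against US English data would not exercise.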

 > > Maybe it wouldn't work; maybe it would be inefficient.  But one
 > > thing it wouldn't do is present a charset other than Unicode to
 > > Lisp.
 > 
 > Neither does the above.  Abolishing unibyte just means that
 > buffers/strings have only one possible character range.

That's not really true.  Encoding and decoding will still constrain
ranges.  As pointed out above, abolishing unibyte delays detection on
the one hand; on the other, it avoids spurious errors when the user
really does want to add characters outside of the prespecified range
for some reason.
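
The tradeoff can be sketched in Python.  This is an illustration under
stated assumptions, not Emacs or XEmacs code: the classes `EagerText`
and `DeferredText` are hypothetical, and Python's built-in codecs stand
in for coding systems.

```python
# Sketch only: both classes are hypothetical stand-ins for a buffer.


class EagerText:
    """Range is fixed up front; violations are caught at insert time."""

    def __init__(self, max_code: int) -> None:
        self.max_code = max_code
        self.parts: list[str] = []

    def insert(self, text: str) -> None:
        for ch in text:
            if ord(ch) > self.max_code:
                raise ValueError(f"out of range at insert: U+{ord(ch):04X}")
        self.parts.append(text)


class DeferredText:
    """No range flag; violations surface only when encoding."""

    def __init__(self) -> None:
        self.parts: list[str] = []

    def insert(self, text: str) -> None:
        self.parts.append(text)  # anything is accepted here

    def encode(self, codec: str) -> bytes:
        # Raises UnicodeEncodeError if the codec cannot represent a character.
        return "".join(self.parts).encode(codec)
```

With `EagerText(255)`, inserting a character above 255 fails
immediately, even if the text would later have been written out with a
codec that can represent it.  With `DeferredText`, the same insert
succeeds and an error appears only if the codec chosen at encoding time
cannot represent the character.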

 > That does not really give any "transparency" per se from the Lisp
 > level.

I disagree, based primarily on the experience of XEmacs: we can do
everything (with characters and bytes) that Emacs does[1], and as far
as I can recall the lack of unibyte has not itself introduced new bugs.
(Other bugs, yes, but bugs due to adapting code that used unibyte to
XEmacs, where there is no unibyte, no.)

 > The interesting level is the C level.  You need a byte stream
 > representation in C at some point anyway, and not being able to
 > call this representation either "string" or "buffer" may be neat in
 > some manners but will end up cumbersome in others.

I don't see why you need that, actually.  Of course you need C level
streams for I/O, but I don't see why they need to persist past decoding
into a buffer or string.


Footnotes: 
[1]  OK, we don't have a representation of "undecodable bytes".  But
that's not conceptually hard, just tedious enough that nobody's done
it yet.



