Re: Unibyte characters, strings, and buffers

emacs-devel

From:	Eli Zaretskii
Subject:	Re: Unibyte characters, strings, and buffers
Date:	Sat, 29 Mar 2014 14:07:48 +0300

> From: David Kastrup <address@hidden>
> Cc: Eli Zaretskii <address@hidden>,  address@hidden,  address@hidden
> Date: Sat, 29 Mar 2014 11:42:43 +0100
> 
> The current point of contention is about changing the way of
> codepoint-based character operations depending on the unibyte state of
> the current buffer.

This discussion was started to find out how to get rid of this
dependency in the few places where Emacs still has it.

> I am not necessarily of the same opinion as Stephen regarding whether or
> not abolishing unibyte buffers is a worthwhile goal.  But I am pretty
> sure that "unibyte" should not be bleeding over into character and
> string operations.

Indeed, and Emacs tries very hard to contain that distinction, so that
it doesn't leak out of the internals.  Mostly, it succeeds, but
sometimes it doesn't.

> A unibyte buffer or unibyte string might error out when trying to insert
> characters out of the range 0..255.

We currently don't do that.  Try (insert "xyz") in a unibyte buffer,
where "xyz" is some non-ASCII string, and watch the fun.
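A minimal sketch of that experiment, written for illustration and not taken from the thread. `my/unibyte-insert-demo` is a hypothetical helper, and "\u4e2d" (the character 中) is an arbitrary stand-in for "xyz". The code only checks that the text read back from the unibyte buffer differs from what was inserted. It makes no claim about which bytes end up in the buffer, and it assumes that `insert` still signals no error here, as described above.

```elisp
;; Hypothetical helper for illustration only: insert TEXT into a
;; multibyte and a unibyte temporary buffer and read each one back.
(defun my/unibyte-insert-demo (text)
  "Return (MULTIBYTE-RESULT . UNIBYTE-RESULT) after inserting TEXT."
  (cons (with-temp-buffer
          (insert text)                 ; temp buffers are multibyte by default
          (buffer-string))
        (with-temp-buffer
          (set-buffer-multibyte nil)    ; make this (empty) buffer unibyte
          (insert text)                 ; assumed not to signal an error
          (buffer-string))))

(let ((result (my/unibyte-insert-demo "\u4e2d")))
  ;; The multibyte buffer gives back the text unchanged...
  (message "multibyte: %S" (car result))
  ;; ...the unibyte buffer gives back something else.
  (message "unibyte:   %S" (cdr result)))
```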

> If we want different semantics for case-fold-search in binary buffers,
> then the solution is setting a buffer-local setting of case-fold-search
> when opening a buffer intended to be manipulated in a binary way.
> 
> But the unibyte setting of the buffer should not affect normal character
> and string operation semantics.  It is a buffer implementation detail
> that should not really have a visible effect apart from making some
> buffer operations impossible.

But if case-fold-search is set to nil in unibyte buffers, and (as we
know) the buffer-local value of case-fold-search does affect functions
that compare text, either because they consult case-fold-search
directly or because they consult the buffer-local case table, then the
unibyte setting does affect the semantics, albeit indirectly.
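A minimal sketch of that indirect effect, written for illustration and not taken from the thread. The same `search-forward` call gives a different answer depending only on the buffer-local value of `case-fold-search`. `my/find-abc` is a hypothetical helper, and `setq-local` assumes Emacs 24.3 or later.

```elisp
;; Hypothetical helper for illustration only: search for "abc" in a
;; buffer containing "ABC", with case-fold-search set buffer-locally.
(defun my/find-abc (fold)
  "Search for \"abc\" in a buffer holding \"ABC\" with `case-fold-search' = FOLD.
Return the position after the match, or nil if there is none."
  (with-temp-buffer
    (insert "ABC")
    (setq-local case-fold-search fold)  ; buffer-local value
    (goto-char (point-min))
    (search-forward "abc" nil t)))      ; NOERROR = t: nil on failure

(my/find-abc t)    ; => 4   (case-insensitive: "abc" matches "ABC")
(my/find-abc nil)  ; => nil (case-sensitive: no match)
```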

> If something chooses a unibyte buffer representation for some reason, it
> is the responsibility of the same something to switch character
> operations and case-fold-search etc to something making sense in the
> context of its operation.  That may well be through some buffer-local
> setting of case-fold-search etc, but it is not tied to the internal
> representation of the buffer contents.

Not that I disagree with you, but why does it matter whether some code
makes a buffer unibyte or sets its case-fold-search to achieve that
goal?  In both cases, that something tells Emacs to ignore case
conversion; it just uses two different ways of saying so.  If we are
not going to abolish unibyte buffers, why is the difference important?
