emacs-devel
Re: Inadequate documentation of silly characters on screen.


From: David Kastrup
Subject: Re: Inadequate documentation of silly characters on screen.
Date: Sat, 21 Nov 2009 15:36:49 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux)

"Stephen J. Turnbull" <address@hidden> writes:

> David Kastrup writes:
>
>  > > However, I think a well-behaved platform should by default error
>  > > (something derived from invalid-state, in XEmacs's error
>  > > hierarchy) in such a case; normally this means corruption in the
>  > > file.
>  > 
>  > We take care that it does not mean corruption.
>
> I meant pre-existing corruption [...]

That interpretation is not the business of the editor.  It may decide to
give a warning, but refusing to work at all does not increase its
usefulness.

>  > And more often it means that you might have been loading with the
>  > wrong encoding (people do that all the time).  If you edit some
>  > innocent ASCII part
>
> You can't do that if the file is not in a buffer because the encoding
> error aborted the conversion.

Not being able to do what I want is not a particularly enticing feature.

> Aborting the conversion is what the Unicode Consortium requires, too,
> IIRC:

An editor is not the same as a validator.  It's not its business to
decide what files I should be allowed to work with.

> errors in UTF-8 (or any other UTF for that matter) are considered
> *fatal* by the standard.  Exactly what that means is up to the
> application to decide.  One plausible approach would be to do what you
> do now, but make the buffer read-only.

Making the buffer read-only is a reasonable thing to do if it can't
possibly be written back unchanged.  For example, suppose I load a file
in latin-1 and insert a few non-latin-1 characters.  In this case Emacs
should not just silently write the file in utf-8, because that changes
the encoding of some preexisting characters.  The situation is different
if I load a pure ASCII file: in that case, choosing utf-8 is feasible
when it is compatible with the environment.
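
The latin-1 case above can be sketched in a few lines.  This is only an
illustration in Python, not what Emacs does internally; the helper name
`unencodable_chars` is hypothetical.

```python
def unencodable_chars(text: str, encoding: str) -> list[str]:
    """Return the distinct characters of `text` that `encoding` cannot represent."""
    bad: list[str] = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# A file that was loaded as latin-1 ...
original_bytes = "café".encode("latin-1")
text = original_bytes.decode("latin-1")

# ... and then edited to contain a character latin-1 does not have.
edited = text + " \u20ac"  # U+20AC EURO SIGN

# The file can no longer be written back in its original encoding.
problem_chars = unencodable_chars(edited, "latin-1")

# Silently switching to utf-8 would rewrite the pre-existing "é" as well.
utf8_bytes = edited.encode("utf-8")
preexisting_changed = utf8_bytes[: len(original_bytes)] != original_bytes

# A pure ASCII file has the same bytes in both encodings.
ascii_same = "plain text".encode("latin-1") == "plain text".encode("utf-8")
```

A front end could use a check of this kind to warn, or to make the
buffer read-only, instead of changing the encoding behind the user's
back.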

>  > Sometimes there is no "right encoding".
>
> So what?  The point is that there certainly are *wrong* encodings,
> namely ones that will result in corruption if you try to save the file
> in that encoding.

But we have a fair number of encodings (those without escape characters,
IIRC) which don't imply corruption when saving.  And that is a good
feature for an editor.  For example, when working with version control
systems, you want minimal diffs.  Encoding systems with escape
characters are not good for that.  I would strongly advise against Emacs
picking any escape-character based encoding (or otherwise
non-byte-stream-preserving) automatically.
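
To illustrate why escape-based encodings do not preserve byte streams,
here is a small sketch using Python's `iso2022_jp` codec as a stand-in.
It says nothing about Emacs's own coding systems; it only assumes that
the codec accepts a redundant escape sequence, which is how I understand
its decoder to behave.

```python
# Two byte streams that an ISO-2022-JP decoder reads as the same text.
plain = b"ab"
redundant = b"a\x1b(Bb"  # ESC ( B selects ASCII, which is already active

text = redundant.decode("iso2022_jp")
resaved = text.encode("iso2022_jp")

# The redundant escape is gone after a load/save cycle, so the file
# differs from the original even though nobody edited it.
stream_preserved = resaved == redundant

# By contrast, latin-1 maps every byte to exactly one character, so a
# load/save cycle returns the original bytes whatever they were.
all_bytes = bytes(range(256))
latin1_resaved = all_bytes.decode("latin-1").encode("latin-1")
```

With a version control system, the first case shows up as a spurious
diff; the second never does.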

Less breakage is always a good thing.

> But when faced with ambiguity, it is best to refuse to guess.

You don't need to guess if you just preserve the byte sequence.  That
makes it somebody else's problem.  The GNU utilities have always made it
a point to work with arbitrary input without insisting on it being
"sensible".  Historically, most Unix utilities just crashed when you fed
them arbitrary garbage.  They have taken a lesson from GNU nowadays.

And I consider it a good lesson.

>  > We currently _have_ [a scheme for encoding invalid sequences of
>  > code units] in place.  We just use different Unicode-invalid code
>  > points [from Python].
>
> Conceded.  I realized that later; the important difference is that
> Python only uses that scheme when explicitly requested.

All in all, it is nobody else's business what encoding Emacs uses for
internal purposes.  Making Emacs preserve byte streams means that the
user has to worry less, not more, about what Emacs might be able to work
with.  The Emacs 23 internal encoding does a better job than Emacs 22
did of staying out of users' hair over encoding issues, because it
corresponds better to external encodings.  But ideally, the user
should not have to worry about the difference.
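
As a rough sketch of what byte-stream preservation looks like from the
outside, here is the Python scheme mentioned above, its
`surrogateescape` error handler, which has to be requested explicitly.
Emacs uses different code points internally; the sample bytes are made
up for illustration.

```python
# Mostly valid UTF-8 with two bytes that can never occur in UTF-8.
raw = b"ok \xc3\xa9 bad \xff\xfe end"

# The default (strict) decoder refuses the whole input.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With the handler requested, each undecodable byte 0xNN is kept as the
# lone surrogate U+DCNN, so nothing is lost.
text = raw.decode("utf-8", errors="surrogateescape")

# Edit an innocent ASCII part and write the result back out.
edited = text.replace("ok", "OK")
saved = edited.encode("utf-8", errors="surrogateescape")

# Only the edited part differs; the invalid bytes come back untouched.
expected = raw.replace(b"ok", b"OK")
```

The user can fix the ASCII part of a damaged or misdetected file
without the editor having to decide what the rest "really" is.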

-- 
David Kastrup




