emacs-devel


From: Eli Zaretskii
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 19:47:11 +0300

> From: Mark H Weaver <address@hidden>
> Cc: address@hidden,  address@hidden,  address@hidden,  address@hidden,  
> address@hidden,  address@hidden,  address@hidden
> Date: Mon, 06 Oct 2014 12:27:35 -0400
> 
> > The obvious solution is to encode the raw bytes internally in a UTF-8
> > compatible way.  Which is what Emacs does in its buffers and strings,
> > as I'm sure you know.  Can't Guile do something similar?
> 
> I'm afraid you've misunderstood, or perhaps I've failed to explain it
> clearly.

I think I did understand your perfectly clear explanation.

> It doesn't matter how these raw bytes are encoded internally.  No matter
> what mechanism we use to accomplish it, propagating invalid byte
> sequences by default is bad security policy.

How can we be responsible for byte streams that originated outside?
That's the responsibility of the source.  And if there is a consumer,
then it is their responsibility not to trip upon such bytes.

But how can you refuse to copy such bytes when you are just a pipe
that is expected not to change anything it wasn't told to change?

Btw, Emacs doesn't expose the internal representation of these bytes
easily to Lisp programs.  That is, whenever any program tries to
access the character at that position, it gets the original raw byte
that was there before the string was read from outside.  A Lisp
program needs some very tricky and deliberate techniques to access the
internal representation of such bytes.  (It isn't "overlong", btw; we
just represent the 128 bytes 0x80..0xFF as codepoints in the 0x3fffXX
range, and encode them in UTF-8 with 5 bytes.)

> The Unicode standard requires that all UTF-8 codecs refuse to accept,
> produce, or propagate invalid byte sequences, including the troublesome
> overlong encodings.
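For reference, a strictly conforming decoder rejects exactly the kinds of sequences mentioned here. The sketch below uses Python's built-in UTF-8 codec, which is strict by default, as a stand-in for such a codec; the helper name is_valid_utf8 is hypothetical and not a Guile or Emacs API.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8.

    Hypothetical helper: it relies on Python's codec, which rejects
    overlong forms, encoded surrogates and values above U+10FFFF.
    """
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("é".encode("utf-8")))      # True
print(is_valid_utf8(b"\xc0\x80"))              # False: overlong NUL
print(is_valid_utf8(b"\xed\xa0\x80"))          # False: encoded surrogate U+D800
print(is_valid_utf8(b"\xf8\x88\x80\x80\x80"))  # False: obsolete 5-byte form
```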

What Emacs does is interpret each byte of such invalid byte sequences
as a separate raw byte, and represent each one of them internally as
described above.  Emacs cannot "refuse to propagate" the original
sequence, because users of an editor expect it not to alter any part
of the input that wasn't explicitly modified by the user or commands
she invoked.
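A minimal sketch of the policy described above, written in Python rather than Emacs Lisp or C: valid UTF-8 decodes normally, while every byte that is not part of a valid sequence becomes its own codepoint in 0x3FFF80..0x3FFFFF. It piggybacks on Python's surrogateescape error handler (PEP 383), which escapes undecodable bytes one by one into U+DC80..U+DCFF. The function names and the offset 0x3FFF00 (raw byte XX -> codepoint 0x3FFFXX, read off the "0x3fffXX" description) are assumptions for illustration, not Emacs internals.

```python
from typing import List

# Assumed mapping, following "0x3fffXX": raw byte XX <-> codepoint 0x3FFFXX.
RAW_BYTE_BASE = 0x3FFF00


def decode_preserving(data: bytes) -> List[int]:
    """Decode UTF-8 into codepoints, turning each byte of any invalid
    sequence into a separate raw-byte codepoint (0x3FFF80..0x3FFFFF)."""
    # surrogateescape maps each undecodable byte b to U+DC00 + b.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            cp = RAW_BYTE_BASE + (cp - 0xDC00)
        out.append(cp)
    return out


def encode_preserving(codepoints: List[int]) -> bytes:
    """Inverse of decode_preserving: raw-byte codepoints are written back
    as the original single bytes, everything else as ordinary UTF-8."""
    chars = []
    for cp in codepoints:
        if RAW_BYTE_BASE + 0x80 <= cp <= RAW_BYTE_BASE + 0xFF:
            # Convert back to the surrogate that surrogateescape understands.
            cp = 0xDC00 + (cp - RAW_BYTE_BASE)
        chars.append(chr(cp))
    return "".join(chars).encode("utf-8", errors="surrogateescape")


data = b"caf\xc3\xa9 \xc0\x80 \xff"
cps = decode_preserving(data)
print([hex(cp) for cp in cps[-4:]])    # ['0x3fffc0', '0x3fff80', '0x20', '0x3fffff']
print(encode_preserving(cps) == data)  # True
```

Because the escaped codepoints lie above U+10FFFF, they can never be mistaken for real characters, and writing the text back out reproduces the input byte for byte.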

> I'm not one for blindly following standards, but in my opinion this
> is the default policy we should adopt.

So just passing a string unaltered through a Guile program would
change that string?  That sounds like an unpleasant surprise for the
users, at least for Emacs users.  Emacs went through that around v20.x,
and we still carry the scars.  It would be unwise, IMO, for Guile to
repeat those same mistakes.
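To make the "unpleasant surprise" concrete: suppose a pipe sanitizes its input by substituting U+FFFD for invalid bytes. This is one common alternative to refusing them outright, assumed here purely for illustration and not attributed to any proposed Guile policy. Then a plain decode/encode round trip no longer returns the original bytes. A Python sketch; the function name is hypothetical.

```python
def roundtrip_with_replacement(data: bytes) -> bytes:
    """Hypothetical sanitizing pipe: decode, then re-encode the text
    without modifying it.  Invalid input is replaced by U+FFFD
    (EF BF BD) on the way in, so it cannot come back out."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


print(roundtrip_with_replacement(b"caf\xc3\xa9"))  # b'caf\xc3\xa9' (valid input survives)
print(roundtrip_with_replacement(b"abc\xff"))      # b'abc\xef\xbf\xbd' (the 0xFF byte is gone)
```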


