emacs-devel


Re: Emacs Lisp's future


From: Mark H Weaver
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 02:21:41 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Richard Stallman <address@hidden> writes:

>     * I'm concerned that there are security implications to supporting the
>       "raw byte" code points.  I can expand on this more if you'd like.
>
> I'd like to know how it is that "raw bytes" have security implications.

To give an example, consider a procedure that needs to pass a string
from an untrusted source to an SQL query.  To do this safely, it needs
to quote the string.  I haven't researched how to properly quote SQL
string literals, but in general, quoting is typically done by
recognizing some set of special characters that must be escaped, and
allowing all other characters through unmodified.
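Here is a minimal Python sketch of the character-level quoting described above. It assumes the standard SQL rule that a single quote inside a string literal is escaped by doubling it. Some databases, such as MySQL, also treat backslash specially, and this sketch ignores that. The function name quote_sql_literal is hypothetical.

```python
def quote_sql_literal(text: str) -> str:
    """Return text as a single-quoted SQL string literal (hypothetical helper).

    Assumes the standard SQL rule: the only special character inside a
    literal is the single quote, escaped by doubling it.  Every other
    character is passed through unmodified.
    """
    return "'" + text.replace("'", "''") + "'"


untrusted = "x'; DROP TABLE users; --"
query = "SELECT * FROM users WHERE name = " + quote_sql_literal(untrusted)
print(query)
```

With ordinary text this works: the embedded quote is doubled, so the rest of the input stays inside the literal.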

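However, "raw byte" code points can be used to bypass such a quoting
mechanism.  The quoting code sees only raw bytes, not a special
character, so it lets them through unmodified.  If the database then
interprets those bytes as a quote, the attacker has sent it an unescaped
closing quote followed by arbitrary SQL commands.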
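The bypass can be sketched in Python. Its "surrogateescape" error handler is a close analogue of Emacs's raw-byte characters: when decoding, each byte that is not valid UTF-8 becomes a lone surrogate in the range U+DC80 through U+DCFF, and when encoding, that surrogate turns back into the same byte. The payload and the quote_sql_literal helper are hypothetical. The sketch also assumes the receiving database decodes the overlong sequence 0xC0 0xA7 as a single quote (U+0027). Without that assumption, the bytes are merely passed through.

```python
def quote_sql_literal(text: str) -> str:
    # Character-level quoting (hypothetical helper): double every single quote.
    return "'" + text.replace("'", "''") + "'"


# Untrusted input arriving as bytes.  0xC0 0xA7 is an overlong encoding of
# the single quote (U+0027); it is not valid UTF-8.
payload = b"x\xc0\xa7; DROP TABLE users; --"

# "surrogateescape" maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN, playing the role of a "raw byte" character.
text = payload.decode("utf-8", errors="surrogateescape")
assert "'" not in text  # the quoting step sees no quote to escape

quoted = quote_sql_literal(text)

# Re-encoding restores the original bytes, so the overlong quote reaches
# the database exactly as the attacker sent it.
wire = quoted.encode("utf-8", errors="surrogateescape")
print(wire)
```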

A related problem has to do with the fact that a naively implemented
UTF-8 decoder accepts code points represented with more bytes than are
actually needed, essentially by padding the code point with leading
zeroes and then encoding it with UTF-8 as if the high bits were
non-zero.  For example, the ASCII quote (") can be represented as the
single byte 0x22, the two-byte sequence 0xC0 0xA2, the three-byte
sequence 0xE0 0x80 0xA2, etc.
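To make this concrete, here is a deliberately naive UTF-8 decoder in Python. The name naive_utf8_decode is hypothetical, not a library function. It assembles the payload bits of each sequence but never checks that the sequence was the shortest possible. As a result, the 1-, 2-, 3- and 4-byte forms of the ASCII quote all decode to the same character.

```python
def naive_utf8_decode(data: bytes) -> str:
    """Decode UTF-8 WITHOUT rejecting overlong encodings (deliberately unsafe).

    The lead byte's high bits give the sequence length; each continuation
    byte contributes 6 more bits.  Nothing checks that the resulting code
    point really needed that many bytes, so padded forms are accepted.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # 0xxxxxxx: 1 byte
            cp, extra = lead, 0
        elif (lead & 0xE0) == 0xC0:      # 110xxxxx: 2 bytes
            cp, extra = lead & 0x1F, 1
        elif (lead & 0xF0) == 0xE0:      # 1110xxxx: 3 bytes
            cp, extra = lead & 0x0F, 2
        elif (lead & 0xF8) == 0xF0:      # 11110xxx: 4 bytes
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"bad lead byte 0x{lead:02X} at offset {i}")
        if i + 1 + extra > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:    # continuation bytes are 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        out.append(chr(cp))              # no overlong check here
        i += 1 + extra
    return "".join(out)


# Every padded form of the quote decodes to '"'.
for seq in (b"\x22", b"\xc0\xa2", b"\xe0\x80\xa2", b"\xf0\x80\x80\xa2"):
    assert naive_utf8_decode(seq) == '"'

# A byte-level check for 0x22 misses the overlong form entirely.
assert b'"' not in b"\xc0\xa2"
```

By contrast, Python's built-in codec raises UnicodeDecodeError on b"\xc0\xa2".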

UTF-8 decoders are supposed to detect and reject these "overlong"
encodings, but it is likely that many programs fail to do this.  Such
programs are usually vulnerable to these overlong encodings when trying
to detect special characters (e.g. for quoting/escaping) or when
validating inputs.

To cope with this, the Unicode standards require that UTF-8 codecs
reject overlong encodings and other invalid byte sequences.  This is in
direct conflict with the idea of "raw byte" code points, whose purpose
is to be tolerant of arbitrary byte sequences and to propagate them
unchanged.
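Below is a Python sketch of the extra checks a conforming decoder performs, again under a hypothetical function name. After assembling a code point, it rejects three things: overlong forms (values below the minimum for their sequence length), encoded surrogates, and values above U+10FFFF. An invalid sequence raises ValueError instead of being passed through, and that refusal is exactly what a raw-byte representation is designed to avoid.

```python
# Smallest code point that legitimately needs (1 + extra) bytes.
MIN_CODE_POINT = {0: 0x00, 1: 0x80, 2: 0x800, 3: 0x10000}


def strict_utf8_decode(data: bytes) -> str:
    """Decode UTF-8, rejecting overlong forms and other ill-formed input."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            cp, extra = lead, 0
        elif (lead & 0xE0) == 0xC0:
            cp, extra = lead & 0x1F, 1
        elif (lead & 0xF0) == 0xE0:
            cp, extra = lead & 0x0F, 2
        elif (lead & 0xF8) == 0xF0:
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + 1 + extra > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        # The checks a naive decoder omits:
        if cp < MIN_CODE_POINT[extra]:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"encoded surrogate at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"code point beyond U+10FFFF at offset {i}")
        out.append(chr(cp))
        i += 1 + extra
    return "".join(out)


# Overlong quote (2 and 3 bytes), encoded surrogate, and U+110000.
for bad in (b"\xc0\xa2", b"\xe0\x80\xa2", b"\xed\xa0\x80", b"\xf4\x90\x80\x80"):
    for decode in (strict_utf8_decode, lambda b: b.decode("utf-8")):
        try:
            decode(bad)
        except ValueError:       # UnicodeDecodeError is a ValueError subclass
            pass                 # rejected, as the standard requires
        else:
            raise AssertionError(f"{bad!r} was accepted")
```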

FWIW, I agree that the Emacs behavior is desirable when editing a file
that may contain coding errors, but in most other cases (e.g. when
communicating with processes or network sockets) I think that it's more
appropriate to refuse to accept, produce, or propagate invalid UTF-8
such as overlong encodings.

      Mark


