emacs-devel

Re: Emacs Lisp's future


From: David Kastrup
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 17:33:21 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>> From: Mark H Weaver <address@hidden>
>> Cc: address@hidden, address@hidden, address@hidden,
>> address@hidden, address@hidden, address@hidden, address@hidden
>> Date: Mon, 06 Oct 2014 02:21:41 -0400
>> 
>> A related problem has to do with the fact that naively implemented UTF-8
>> allows code points to be represented with more bytes than are actually
>> needed, essentially by padding the code point with leading zeroes and
>> then encoding with UTF-8 as if the high bits were non-zero.  For
>> example, the ASCII quote (") can be represented as the single byte 0x22,
>> the two byte sequence 0xC0 0xA2, etc.
>> 
>> UTF-8 decoders are supposed to detect and reject these "overlong"
>> encodings, but it is likely that many programs fail to do this.  Such
>> programs are usually vulnerable to these overlong encodings when trying
>> to detect special characters (e.g. for quoting/escaping) or when
>> validating inputs.
>> 
>> To cope with this, the Unicode standards require that UTF-8 codecs
>> reject overlong encodings and other invalid byte sequences.  This is in
>> direct conflict with the idea of "raw byte" code points, whose purpose
>> is to be tolerant of arbitrary byte sequences and to propagate them
>> unchanged.
>
> The obvious solution is to encode the raw bytes internally in a UTF-8
> compatible way.  Which is what Emacs does in its buffers and strings,
> as I'm sure you know.  Can't Guile do something similar?
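
The overlong form mentioned above can be checked with a short sketch.
Python is used here purely for illustration; `naive_decode_2byte` is a
hypothetical, deliberately sloppy decoder, and Python's `surrogateescape`
error handler is only an analogy for a raw-byte representation, not a
description of what Emacs or Guile actually do internally.

```python
def naive_decode_2byte(data: bytes) -> str:
    """Hypothetical sloppy decoder: accepts any 110xxxxx 10yyyyyy pair
    and never checks whether a shorter encoding exists."""
    lead, cont = data
    if lead & 0xE0 != 0xC0 or cont & 0xC0 != 0x80:
        raise ValueError("not a two-byte UTF-8 pattern")
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))


# Overlong two-byte encoding of the ASCII quote (0x22), as in the text.
OVERLONG_QUOTE = bytes([0xC0, 0xA2])

# The sloppy decoder happily yields a real quote character.
naive = naive_decode_2byte(OVERLONG_QUOTE)

# A conforming decoder rejects the sequence.
try:
    OVERLONG_QUOTE.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

# A raw-byte-preserving decode keeps the bytes as opaque placeholders,
# so no quote character appears in the text, but encoding the text
# again reproduces the original bytes unchanged.
text = OVERLONG_QUOTE.decode("utf-8", errors="surrogateescape")
roundtrip = text.encode("utf-8", errors="surrogateescape")
```

Under these assumptions the text never contains a quote character while
it is being processed, yet the overlong bytes come out again on output.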

If Emacs processes an overlong UTF-8 byte sequence representing '"'
transparently, it will re-encode it into the original bytes on output,
and depending on the next processing stage, those bytes might still trip
up software downstream.  Of course, the same would have happened without
Emacs (or GUILE) in the middle.  The solution obviously is to use a
coding scheme for recoding that does _not_ reproduce unencodable bytes.
Now if the intermediate processing added escape characters for the
unencodable bytes, you can arrive at something like this (using % for
an unencodable byte):

[Input] Robert%");DROP TABLE Students;--
[quotified] "Robert\%\");DROP TABLE Students;--"
[cleanencoded] "Robert\\");DROP TABLE Students;--"
[Pasted into SQL command] Uh oh.
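
The sequence above can be reproduced with a small sketch.  Everything
here is an assumption made for illustration: `quotify` and
`clean_encode` are hypothetical stand-ins for the two processing
stages, the byte 0xFF plays the role of %, and Python's
`surrogateescape` handler stands in for a raw-byte representation.

```python
def is_raw(ch: str) -> bool:
    """True for the placeholder code points that surrogateescape uses
    for undecodable bytes (U+DC80..U+DCFF)."""
    return 0xDC80 <= ord(ch) <= 0xDCFF


def quotify(s: str) -> str:
    """Hypothetical quoting stage: backslash-escape quotes, backslashes
    and raw bytes, then wrap the result in double quotes."""
    out = []
    for ch in s:
        if ch in ('"', "\\") or is_raw(ch):
            out.append("\\")
        out.append(ch)
    return '"' + "".join(out) + '"'


def clean_encode(s: str) -> str:
    """Hypothetical recoding stage: drop raw bytes instead of
    reproducing them."""
    return "".join(ch for ch in s if not is_raw(ch))


raw = b'Robert\xff");DROP TABLE Students;--'
decoded = raw.decode("utf-8", errors="surrogateescape")

# Quote first, clean afterwards: the escape that belonged to the
# dropped byte is left behind and now escapes the next backslash,
# so the quote that follows is no longer escaped.
unsafe = clean_encode(quotify(decoded))

# Clean first, quote afterwards: the quote stays escaped.
safe = quotify(clean_encode(decoded))

print(unsafe)
print(safe)
```

In this sketch the order of the two stages decides the outcome:
dropping bytes after escapes have been added changes what the escapes
apply to.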

-- 
David Kastrup


