Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000

From:	Eric Blake
Subject:	Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
Date:	Mon, 13 Aug 2018 09:57:18 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 08/13/2018 02:00 AM, Markus Armbruster wrote:

Eric Blake <address@hidden> writes:

On 08/10/2018 10:48 AM, Eric Blake wrote:

On 08/08/2018 07:03 AM, Markus Armbruster wrote:

This is consistent with qobject_to_json().  See commit e2ec3f97680.


Side note: that commit mentions that on output, ASCII DEL (0x7f) is
always escaped. RFC 7159 does not require it to be escaped on input,


Weird, isn't it?

but I wonder if any of your earlier testsuite improvements should
specifically cover \x7f vs. \u007f on input being canonicalized to
\u007f on round trip output.

From utf8_string():


         /* 2.2.1  1 byte U+007F */
         {
             "\x7F",
             "\x7F",
             "\\u007F",
         },

We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
that JSON (expecting the C string again).  Sufficient?

Indeed, looks like we have coverage of DEL thanks to the bounds testingof various interesting UTF-8 inflection points.

+++ b/qobject/json-lexer.c
@@ -93,7 +93,7 @@
    *   interpolation = %((l|ll|I64)[du]|[ipsf])
    *
    * Note:
- * - Input must be encoded in UTF-8.
+ * - Input must be encoded in modified UTF-8.


Worth documenting this in the QMP doc as an explicit extension?


qmp-spec.txt:

     The sever expects its input to be encoded in UTF-8, and sends its
     output encoded in ASCII.

The obvious update would be to stick in "modified".

                                                                 In
general, our QMP interfaces that take binary input do so via base64
encoding, rather than via a modified UTF-8 string -


Agreed.

However, whether QMP has a use for funny characters or not, the JSON
parser has to handle them *somehow*.  "Handle" in the broadest possible
sense, including "reject".  Not including misbehavior like "crash" and
"silently ignore some input following ASCII NUL".

                                                     So it's really
only a question of whether our input engine can pass "\x00"
vs. "\\u0000" when we NEED an input NUL, and except for the testsuite,
our QAPI schema never really needs an input NUL.


The question is how the JSON parser is to handle "\u0000" escapes in
JSON strings and NUL bytes anywhere.

The answer for NUL bytes is obvious: reject them just like any other
byte <= 0x1F, as required by the RFC.

Yes, rejecting \x00 on input is fine, which leaves only the escapeinside strings:


My answer for \u0000 is to treat it as much like any other codepoint <=
0x1F as possible.  Treating it exactly like them isn't possible, because
NUL bytes in C strings aren't possible.  However, \xC0\x80 sequences
are.

Yes, for internal processing to use modified UTF-8 so that we can accept"\\u0000" on input is reasonable. And even nicer if we turn a QStringcontaining \xc0\x80 back into "\\u0000" on output, so that our use ofmodified UTF-8 never even escapes to the QMP client (and then we don'tneed a documentation change).


Reasonably simple and consistent, don't you think?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures, (continued)
- [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser, Markus Armbruster, 2018/08/08
  - Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser, Eric Blake, 2018/08/10
    - Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser, Markus Armbruster, 2018/08/13
    - Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser, Eric Blake, 2018/08/13
- [Qemu-devel] [PATCH 30/56] json: remove useless return value from lexer/parser, Markus Armbruster, 2018/08/08
- [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8"), Markus Armbruster, 2018/08/08
  - Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8"), Eric Blake, 2018/08/10
    - Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8"), Eric Blake, 2018/08/10
    - Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8"), Markus Armbruster, 2018/08/13
    - Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8"), Eric Blake <=
    - Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8"), Markus Armbruster, 2018/08/14
    - Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8"), Markus Armbruster, 2018/08/17
- [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs, Markus Armbruster, 2018/08/08
  - Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs, Eric Blake, 2018/08/10
    - Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs, Markus Armbruster, 2018/08/13
  - Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs, Paolo Bonzini, 2018/08/12
    - Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs, Markus Armbruster, 2018/08/13
- [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input, Markus Armbruster, 2018/08/08
  - Re: [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input, Eric Blake, 2018/08/16
    - Re: [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input, Markus Armbruster, 2018/08/16

Prev by Date: Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
Next by Date: Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser
Previous by thread: Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
Next by thread: Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
Index(es):
- Date
- Thread