Re: [Qemu-devel] [PATCH] check-qjson: More thorough testing of UTF-8 in strings
From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] check-qjson: More thorough testing of UTF-8 in strings
Date: Mon, 04 Feb 2013 18:44:31 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 04/02/2013 18:19, Markus Armbruster wrote:
> + /* 2 Boundary condition test cases */
> + /* 2.1 First possible sequence of a certain length */
> + /* 2.1.5 5 bytes U+200000 */
> + {
> + "\"\xF8\x88\x80\x80\x80\"",
> + NULL, /* bug: rejected */
> + "\"\\u8200\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> + "\xF8\x88\x80\x80\x80",
> + },
> + /* 2.1.6 6 bytes U+4000000 */
> + {
> + "\"\xFC\x84\x80\x80\x80\x80\"",
> + NULL, /* bug: rejected */
> + "\"\\uC100\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> + "\xFC\x84\x80\x80\x80\x80",
> + },
> + },
> + /* 2.2.4 4 bytes U+1FFFFF */
> + {
> + "\"\xF7\xBF\xBF\xBF\"",
> + NULL, /* bug: rejected */
> + "\"\\u7FFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> + "\xF7\xBF\xBF\xBF",
> + },
> + /* 2.2.5 5 bytes U+3FFFFFF */
> + {
> + "\"\xFB\xBF\xBF\xBF\xBF\"",
> + NULL, /* bug: rejected */
> + "\"\\uBFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> + "\xFB\xBF\xBF\xBF\xBF",
> + },
> + /* 2.2.6 6 bytes U+7FFFFFFF */
> + {
> + "\"\xFD\xBF\xBF\xBF\xBF\xBF\"",
> + NULL, /* bug: rejected */
> + "\"\\uDFFF\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> + "\xFD\xBF\xBF\xBF\xBF\xBF",
> + },
> + {
> + /* \U+1FFFFF */
> + "\"\xF8\x87\xBF\xBF\xBF\"",
> + NULL, /* bug: rejected */
> + "\"\\u81FF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> + "\xF8\x87\xBF\xBF\xBF",
> + },
> + {
> + /* \U+3FFFFFF */
> + "\"\xFC\x83\xBF\xBF\xBF\xBF\"",
> + NULL, /* bug: rejected */
> + "\"\\uC0FF\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> + "\xFC\x83\xBF\xBF\xBF\xBF",
> + },
> + {
> + /* \U+0000 */
> + "\"\xF8\x80\x80\x80\x80\"",
> + NULL, /* bug: rejected */
> + "\"\\u8000\\uFFFF\\uFFFF\"", /* bug: want "\"\\u0000\"" */
> + "\xF8\x80\x80\x80\x80",
> + },
> + {
> + /* \U+0000 */
> + "\"\xFC\x80\x80\x80\x80\x80\"",
> + NULL, /* bug: rejected */
> + "\"\\uC000\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\u0000\"" */
> + "\xFC\x80\x80\x80\x80\x80",
> + },
Rejecting these is not a bug IMO. Unicode is only defined up to
U+10FFFF. Code points above that have no valid UTF-8 encoding at all,
and in particular 5- and 6-byte sequences are never valid UTF-8 (the
original UTF-8 definition allowed them, but RFC 3629 removed them).
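To make the rule above concrete, here is a minimal sketch of a strict
decoder. The name utf8_decode_strict is hypothetical and this is not
the QEMU JSON parser's code. It assumes the RFC 3629 rules: at most 4
bytes per sequence, no code points above U+10FFFF. The overlong and
surrogate checks also come from RFC 3629 rather than from the
paragraph above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence from s[0..len-1].
 * On success, return the code point and store the sequence length
 * in *used.  Return -1 if the bytes are not valid UTF-8.
 * Hypothetical helper for illustration only.
 */
static int32_t utf8_decode_strict(const unsigned char *s, size_t len,
                                  size_t *used)
{
    int32_t cp, min;
    size_t n, i;

    if (len == 0) {
        return -1;
    }
    if (s[0] < 0x80) {                      /* 1 byte: U+0000..U+007F */
        *used = 1;
        return s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 2-byte lead */
        n = 2; cp = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 3-byte lead */
        n = 3; cp = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 4-byte lead */
        n = 4; cp = s[0] & 0x07; min = 0x10000;
    } else {
        /* Stray continuation byte, 5/6-byte lead (0xF8..0xFD), 0xFE, 0xFF */
        return -1;
    }
    if (len < n) {
        return -1;                          /* truncated sequence */
    }
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;                      /* not a continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min) {
        return -1;                          /* overlong encoding */
    }
    if (cp > 0x10FFFF) {
        return -1;                          /* beyond the Unicode range */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return -1;                          /* surrogates are not encodable */
    }
    *used = n;
    return cp;
}
```

With this sketch, every 5- and 6-byte input quoted above is rejected at
the lead byte, and "\xF7\xBF\xBF\xBF" (U+1FFFFF) is rejected by the
range check.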
But there are indeed other bugs...
> + /* 2.1.4 4 bytes U+10000 */
> + {
> + "\"\xF0\x90\x80\x80\"",
> + "\xF0\x90\x80\x80",
> + "\"\\u0400\\uFFFF\"", /* bug: want "\"\\uD800\\uDC00\"" */
> + },
> + /* U+10FFFF */
> + "\"\xF4\x8F\xBF\xBF\"",
> + "\xF4\x8F\xBF\xBF",
> + "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFF\"" */
> + },
> + {
> + /* U+110000 */
> + "\"\xF4\x90\x80\x80\"",
> + "\xF4\x90\x80\x80",
> + "\"\\u4400\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> + },
...and also some good catches here! In particular U+110000 should be
rejected.
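For reference, the "want" values in the quoted comments are UTF-16
surrogate pairs. Here is a minimal sketch of how a code point could be
turned into JSON \u escapes. The name json_escape_codepoint is
hypothetical, and this is not the serializer's actual code, only an
illustration of the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write the JSON \u escape(s) for code point cp into buf.
 * Return the number of characters written (excluding the NUL),
 * or -1 if cp is not a valid Unicode scalar value or buf is too small.
 * Hypothetical helper for illustration only.
 */
static int json_escape_codepoint(uint32_t cp, char *buf, size_t size)
{
    int n;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;                  /* e.g. U+110000 is rejected here */
    }
    if (cp < 0x10000) {
        /* Basic Multilingual Plane: a single \uXXXX escape */
        n = snprintf(buf, size, "\\u%04X", (unsigned)cp);
    } else {
        /* Supplementary planes: split into a surrogate pair */
        uint32_t v = cp - 0x10000;              /* 20-bit value */
        unsigned hi = 0xD800 + (v >> 10);       /* high (lead) surrogate */
        unsigned lo = 0xDC00 + (v & 0x3FF);     /* low (trail) surrogate */
        n = snprintf(buf, size, "\\u%04X\\u%04X", hi, lo);
    }
    if (n < 0 || (size_t)n >= size) {
        return -1;                  /* output truncated or format error */
    }
    return n;
}
```

Under these assumptions U+10000 yields \uD800\uDC00 and U+10FFFF
yields \uDBFF\uDFFF, matching the values the test comments ask for.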
Paolo