Re: [Qemu-devel] [PATCH] check-qjson: More thorough testing of UTF-8 in strings

qemu-devel

From:	Paolo Bonzini
Subject:	Re: [Qemu-devel] [PATCH] check-qjson: More thorough testing of UTF-8 in strings
Date:	Mon, 04 Feb 2013 18:44:31 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 04/02/2013 18:19, Markus Armbruster wrote:
> +        /* 2  Boundary condition test cases */
> +        /* 2.1  First possible sequence of a certain length */
> +        /* 2.1.5  5 bytes U+200000 */
> +        {
> +            "\"\xF8\x88\x80\x80\x80\"",
> +            NULL,                        /* bug: rejected */
> +            "\"\\u8200\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> +            "\xF8\x88\x80\x80\x80",
> +        },
> +        /* 2.1.6  6 bytes U+4000000 */
> +        {
> +            "\"\xFC\x84\x80\x80\x80\x80\"",
> +            NULL,                               /* bug: rejected */
> +            "\"\\uC100\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> +            "\xFC\x84\x80\x80\x80\x80",
> +        },
> [...]
> +        /* 2.2.4  4 bytes U+1FFFFF */
> +        {
> +            "\"\xF7\xBF\xBF\xBF\"",
> +            NULL,                 /* bug: rejected */
> +            "\"\\u7FFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> +            "\xF7\xBF\xBF\xBF",
> +        },
> +        /* 2.2.5  5 bytes U+3FFFFFF */
> +        {
> +            "\"\xFB\xBF\xBF\xBF\xBF\"",
> +            NULL,                        /* bug: rejected */
> +            "\"\\uBFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> +            "\xFB\xBF\xBF\xBF\xBF",
> +        },
> +        /* 2.2.6  6 bytes U+7FFFFFFF */
> +        {
> +            "\"\xFD\xBF\xBF\xBF\xBF\xBF\"",
> +            NULL,                               /* bug: rejected */
> +            "\"\\uDFFF\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> +            "\xFD\xBF\xBF\xBF\xBF\xBF",
> +        },
> +        {
> +            /* \U+1FFFFF */
> +            "\"\xF8\x87\xBF\xBF\xBF\"",
> +            NULL,                        /* bug: rejected */
> +            "\"\\u81FF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> +            "\xF8\x87\xBF\xBF\xBF",
> +        },
> +        {
> +            /* \U+3FFFFFF */
> +            "\"\xFC\x83\xBF\xBF\xBF\xBF\"",
> +            NULL,                               /* bug: rejected */
> +            "\"\\uC0FF\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> +            "\xFC\x83\xBF\xBF\xBF\xBF",
> +        },
> +        {
> +            /* \U+0000 */
> +            "\"\xF8\x80\x80\x80\x80\"",
> +            NULL,                        /* bug: rejected */
> +            "\"\\u8000\\uFFFF\\uFFFF\"", /* bug: want "\"\\u0000\"" */
> +            "\xF8\x80\x80\x80\x80",
> +        },
> +        {
> +            /* \U+0000 */
> +            "\"\xFC\x80\x80\x80\x80\x80\"",
> +            NULL,                               /* bug: rejected */
> +            "\"\\uC000\\uFFFF\\uFFFF\\uFFFF\"", /* bug: want "\"\\u0000\"" */
> +            "\xFC\x80\x80\x80\x80\x80",
> +        },

Rejecting these is not a bug IMO.  Unicode is only defined up to
U+10FFFF.  Code points above that have no valid UTF-8 encoding at all,
and in particular 5- and 6-byte sequences are never valid UTF-8 (they
used to be, before RFC 3629 restricted UTF-8 to at most four bytes).
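
To make the rule concrete, here is a minimal sketch of a strict decoder
that follows the RFC 3629 limits described above.  It is not the QEMU
JSON parser; the name utf8_decode_strict() and its return convention are
assumptions made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code: decode one UTF-8 sequence from
 * s[0..len-1] under the RFC 3629 rules.  On success, return the code
 * point and store the number of bytes consumed in *used.  Return -1
 * if the sequence is invalid.
 */
static int32_t utf8_decode_strict(const unsigned char *s, size_t len,
                                  size_t *used)
{
    /* Smallest code point that legitimately needs n bytes. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t cp;
    size_t n, i;

    if (len == 0) {
        return -1;
    }
    if (s[0] < 0x80) {
        n = 1;
        cp = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2;
        cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3;
        cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4;
        cp = s[0] & 0x07;
    } else {
        /*
         * 0x80..0xBF is a stray continuation byte; 0xF8..0xFF would
         * start a 5- or 6-byte sequence (or is never used at all).
         */
        return -1;
    }
    if (len < n) {
        return -1;              /* truncated sequence */
    }
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;          /* not a continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_cp[n]) {
        return -1;              /* overlong encoding */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return -1;              /* UTF-16 surrogate, not a character */
    }
    if (cp > 0x10FFFF) {
        return -1;              /* beyond the Unicode range */
    }
    *used = n;
    return (int32_t)cp;
}
```

With a decoder like this, the 5- and 6-byte inputs quoted above fail at
the lead byte, and "\xF7\xBF\xBF\xBF" (U+1FFFFF) fails the range check.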

But there are indeed other bugs...

> +        /* 2.1.4  4 bytes U+10000 */
> +        {
> +            "\"\xF0\x90\x80\x80\"",
> +            "\xF0\x90\x80\x80",
> +            "\"\\u0400\\uFFFF\"", /* bug: want "\"\\uD800\\uDC00\"" */
> +        },
> [...]
> +        {
> +            /* U+10FFFF */
> +            "\"\xF4\x8F\xBF\xBF\"",
> +            "\xF4\x8F\xBF\xBF",
> +            "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFF\"" */
> +        },
> +        {
> +            /* U+110000 */
> +            "\"\xF4\x90\x80\x80\"",
> +            "\xF4\x90\x80\x80",
> +            "\"\\u4400\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
> +        },

...and also some good catches here!  In particular U+110000 should be
rejected.
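
For reference, the outputs the test comments ask for ("\uD800\uDC00" for
U+10000, "\uDBFF\uDFFF" for U+10FFFF) are plain UTF-16 surrogate pairs.
A minimal sketch of that arithmetic follows; json_escape_codepoint() is
a hypothetical name for this illustration, not a function in QEMU.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not QEMU code: write the JSON \uXXXX escape(s)
 * for code point cp into buf.  Returns the snprintf() result (number
 * of characters the escape needs), or -1 if cp is not a valid Unicode
 * scalar value (above U+10FFFF or in the surrogate range).
 */
static int json_escape_codepoint(uint32_t cp, char *buf, size_t size)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;
    }
    if (cp < 0x10000) {
        /* Basic Multilingual Plane: a single escape is enough. */
        return snprintf(buf, size, "\\u%04X", (unsigned)cp);
    }
    /* Supplementary planes: split the 20-bit offset into two halves. */
    cp -= 0x10000;
    return snprintf(buf, size, "\\u%04X\\u%04X",
                    (unsigned)(0xD800 + (cp >> 10)),
                    (unsigned)(0xDC00 + (cp & 0x3FF)));
}
```

U+110000 is past the last code point that a surrogate pair can express,
which is one more reason it has to be rejected rather than escaped.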

Paolo
