From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings
Date: Fri, 22 Mar 2013 15:52:09 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130307 Thunderbird/17.0.3

On 03/22/13 15:37, Markus Armbruster wrote:
> Laszlo Ersek <address@hidden> writes:
>
>> On 03/14/13 18:49, Markus Armbruster wrote:
>>> These are all broken, too.
>>
>> What are "these"? And how are they broken? And how does the patch fix them?
>
> "These" refers to the subject: noncharacters other than U+FFFE, U+FFFF.
>
> I agree that I should better explain how they're broken, and what the
> patch does to fix them. Will fix on respin.
>
>>>
>>> A few test cases use noncharacters U+FFFF and U+10FFFF. Risks testing
>>> noncharacters some more instead of what they're supposed to test. Use
>>> U+FFFD and U+10FFFD instead.
>>>
>>> Signed-off-by: Markus Armbruster <address@hidden>
>>> ---
>>> tests/check-qjson.c | 85 +++++++++++++++++++++++++++++++++++++++++++++--------
>>> 1 file changed, 72 insertions(+), 13 deletions(-)
>>
>> I'm confused about the commit message. There are three paragraphs in it
>> (the title, the first paragraph, and the 2nd paragraph). This patch
>> modifies different tests:
>>
>>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>>> index 852124a..efec1b2 100644
>>> --- a/tests/check-qjson.c
>>> +++ b/tests/check-qjson.c
>>> @@ -158,7 +158,7 @@ static void utf8_string(void)
>>> * consider using overlong encoding \xC0\x80 for U+0000 ("modified
>>> * UTF-8").
>>> *
>>> - * Test cases are scraped from Markus Kuhn's UTF-8 decoder
>>> + * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
>>> * capability and stress test at
>>> * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>>> */
>>> @@ -256,11 +256,11 @@ static void utf8_string(void)
>>> "\xDF\xBF",
>>> "\"\\u07FF\"",
>>> },
>>> - /* 2.2.3 3 bytes U+FFFF */
>>> + /* 2.2.3 3 bytes U+FFFD */
>>> {
>>> - "\"\xEF\xBF\xBF\"",
>>> - "\xEF\xBF\xBF",
>>> - "\"\\uFFFF\"",
>>> + "\"\xEF\xBF\xBD\"",
>>> + "\xEF\xBF\xBD",
>>> + "\"\\uFFFD\"",
>>> },
>>
>> This is under "2.2 Last possible sequence of a certain length". I guess
>
> Which is in turn under "2 Boundary condition test cases".
>
>> this is where you say "last possible sequence of a certain length,
>> encoding a character (= non-noncharacter)". OK, p#2.
>
> Yes.
>
> The test's purpose is to check that the upper bound of 3-byte sequences
> is decoded correctly.
>
> The upper bound is U+FFFF. Since that's a noncharacter, the parser
> should reject it (or maybe replace), the formatter should replace it.
> Trouble is it could be misdecoded and then rejected / replaced.
>
> Besides, U+FFFF already gets tested along with the other noncharacters
> under "5.3 Other illegal code positions".
>
> Next in line is U+FFFE, also a noncharacter, also under 5.3.
>
> Next in line is U+FFFD, which I picked.
>
> But that gets tested under "2.3 Other boundary conditions"! I guess I
> either drop it there, or make this one U+FFFC.
>
> I think testing U+FFFC here makes sense, because U+FFFD could be
> misdecoded, then replaced by U+FFFD.
>
> What do you think?

I think that we're extending Markus Kuhn's test suite, basically taking
random shots at where one specific parser's/formatter's weak spots might
be :)

That said, with intelligent fuzzing out of scope / capacity, U+FFFC
could be a good pick.

I also think I'm quite a useless person to ask for thoughts in this
area :)

Thanks,
Laszlo
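
For reference, the selection reasoning discussed above (start at the upper
bound of a sequence length, skip noncharacters, skip U+FFFD because it doubles
as the replacement character) can be sketched in C. This is only an
illustration under stated assumptions, not code from tests/check-qjson.c: the
helper names is_noncharacter(), pick_boundary_test_char() and utf8_encode()
are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true for the 66 Unicode noncharacters, i.e.
 * U+FDD0..U+FDEF plus the last two code points of each of the 17 planes. */
bool is_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF) {
        return false;           /* not a Unicode code point at all */
    }
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

/* Hypothetical helper: walk down from the upper bound of a sequence length
 * until reaching a code point that is neither a noncharacter nor U+FFFD.
 * U+FFFD is skipped because a misdecoded input that gets replaced would
 * also yield U+FFFD and so hide the bug. */
uint32_t pick_boundary_test_char(uint32_t upper)
{
    uint32_t cp = upper;

    while (cp > 0 && (is_noncharacter(cp) || cp == 0xFFFD)) {
        cp--;
    }
    return cp;
}

/* Hypothetical helper: encode cp as UTF-8 into buf (room for 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and values
 * beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    assert(buf != NULL);

    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;           /* surrogates are not encodable */
        }
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Under these assumptions, pick_boundary_test_char(0xFFFF) steps past U+FFFF,
U+FFFE and U+FFFD and lands on U+FFFC, which utf8_encode() turns into the
bytes EF BF BC. Starting from 0x10FFFF it stops at U+10FFFD, matching the
4-byte choice in the commit message.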