Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is bugg

From:	Markus Armbruster
Subject:	Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite
Date:	Fri, 22 Mar 2013 15:51:46 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)

Laszlo Ersek <address@hidden> writes:

> comments below
>
> On 03/14/13 18:49, Markus Armbruster wrote:
>
>> diff --git a/qobject/qjson.c b/qobject/qjson.c
>> index 83a6b4f..19085a1 100644
>> --- a/qobject/qjson.c
>> +++ b/qobject/qjson.c
>> @@ -136,68 +136,56 @@ static void to_json(const QObject *obj, QString *str, 
>> int pretty, int indent)
>>      case QTYPE_QSTRING: {
>>          QString *val = qobject_to_qstring(obj);
>>          const char *ptr;
>> +        int cp;
>> +        char buf[16];
>> +        char *end;
>>  
>>          ptr = qstring_get_str(val);
>>          qstring_append(str, "\"");
>> -        while (*ptr) {
>> -            if ((ptr[0] & 0xE0) == 0xE0 &&
>> -                (ptr[1] & 0x80) && (ptr[2] & 0x80)) {
>> -                uint16_t wchar;
>> -                char escape[7];
>> -
>> -                wchar  = (ptr[0] & 0x0F) << 12;
>> -                wchar |= (ptr[1] & 0x3F) << 6;
>> -                wchar |= (ptr[2] & 0x3F);
>> -                ptr += 2;
>> -
>> -                snprintf(escape, sizeof(escape), "\\u%04X", wchar);
>> -                qstring_append(str, escape);
>> -            } else if ((ptr[0] & 0xE0) == 0xC0 && (ptr[1] & 0x80)) {
>> -                uint16_t wchar;
>> -                char escape[7];
>> -
>> -                wchar  = (ptr[0] & 0x1F) << 6;
>> -                wchar |= (ptr[1] & 0x3F);
>> -                ptr++;
>> -
>> -                snprintf(escape, sizeof(escape), "\\u%04X", wchar);
>> -                qstring_append(str, escape);
>> -            } else switch (ptr[0]) {
>> -                case '\"':
>> -                    qstring_append(str, "\\\"");
>> -                    break;
>> -                case '\\':
>> -                    qstring_append(str, "\\\\");
>> -                    break;
>> -                case '\b':
>> -                    qstring_append(str, "\\b");
>> -                    break;
>> -                case '\f':
>> -                    qstring_append(str, "\\f");
>> -                    break;
>> -                case '\n':
>> -                    qstring_append(str, "\\n");
>> -                    break;
>> -                case '\r':
>> -                    qstring_append(str, "\\r");
>> -                    break;
>> -                case '\t':
>> -                    qstring_append(str, "\\t");
>> -                    break;
>> -                default: {
>> -                    if (ptr[0] <= 0x1F) {
>> -                        char escape[7];
>> -                        snprintf(escape, sizeof(escape), "\\u%04X", ptr[0]);
>> -                        qstring_append(str, escape);
>> -                    } else {
>> -                        char buf[2] = { ptr[0], 0 };
>> -                        qstring_append(str, buf);
>> -                    }
>> -                    break;
>> +
>> +        for (; *ptr; ptr = end) {
>> +            cp = mod_utf8_codepoint(ptr, 6, &end);
>
> This provides more background: you never call mod_utf8_codepoint() with
> '\0' at offset 0. So handling that in mod_utf8_codepoint() may not be
> that important.

Yes, this caller doesn't care.  Doesn't mean we shouldn't try to come up
with a sane function contract.

Note the use of literal 6.  It means "unlimited".  Perfectly safe
because the string is nul-terminated.

> If a '\0' is found at offset >= 1, it will correctly trigger the /*
> continuation byte missing */ branch in mod_utf8_codepoint(). The retval
> is -1, and *end is left pointing to the NUL byte. (This is consistent
> with mod_utf8_codepoint()'s docs.)
>
> The -1 (incomplete sequence) produces the replacement character below,
> and the next time around *ptr is '\0', so we finish the loop. Seems OK.
>
> (
> An alternative interface for mod_utf8_codepoint() might be something like:
>
>     size_t alternative(const char *ptr, int *cp, size_t n);
>
> Resembling read() somewhat:
> - the return value would be the number of bytes consumed (it can't be
> negative (= fatal error), because we guarantee progress). 0 is EOF and
> only possible when "n" is 0.
> - "ptr" is the source,
> - "cp" is the output code point, -1 if invalid,
> - "n" is the bytes available in the source / requested to process at most.
>
> Encountering a \0 in the byte stream would be an error (*cp = -1), but
> would not terminate parsing per se.
>
> Then the loop would look like:
>
>     processed = 0;
>     while (processed < full) {
>       int cp;
>
>       rd = alternative(ptr + processed, &cp, full - processed);
>       g_assert(rd > 0);
>
>       /* look at cp */
>
>       processed += rd;
>   }
>
> But of course I'm not suggesting to rewrite the function!
> )

I'll keep this in mind when deciding how I want to handle '\0'.

>> +            switch (cp) {
>> +            case '\"':
>> +                qstring_append(str, "\\\"");
>> +                break;
>> +            case '\\':
>> +                qstring_append(str, "\\\\");
>> +                break;
>> +            case '\b':
>> +                qstring_append(str, "\\b");
>> +                break;
>> +            case '\f':
>> +                qstring_append(str, "\\f");
>> +                break;
>> +            case '\n':
>> +                qstring_append(str, "\\n");
>> +                break;
>> +            case '\r':
>> +                qstring_append(str, "\\r");
>> +                break;
>> +            case '\t':
>> +                qstring_append(str, "\\t");
>> +                break;
>
> The C standard also names \a (alert) and \v (vertical tab); I'm not sure
> about their JSON notation. (The (cp < 0x20) condition catches them below
> of course.)

JSON RFC 4627 defines only the seven above plus '\/'.  Escaping '/' that
way makes no sense for us, so the old code doesn't, and mine doesn't
either.

>> +            default:
>> +                if (cp < 0) {
>> +                    cp = 0xFFFD; /* replacement character */
>>                  }
>> +                if (cp > 0xFFFF) {
>> +                    /* beyond BMP; need a surrogate pair */
>> +                    snprintf(buf, sizeof(buf), "\\u%04X\\u%04X",
>> +                             0xD800 + ((cp - 0x10000) >> 10),
>> +                             0xDC00 + ((cp - 0x10000) & 0x3FF));
>
> Seems like we write 13 bytes into buf, OK. Also cp is never greater than
> 0x10FFFF, hence the difference is at most 0xFFFFF. The RHS surrogate
> half can go up to 0xDFFF, the LHS up to 0xD800+0x3FF == 0xDBFF. Good.

Exactly.

>> +                } else if (cp < 0x20 || cp >= 0x7F) {
>> +                    snprintf(buf, sizeof(buf), "\\u%04X", cp);
>> +                } else {
>> +                    buf[0] = cp;
>> +                    buf[1] = 0;
>>                  }
>> -            ptr++;
>> -        }
>> +                qstring_append(str, buf);
>> +            }
>> +        };
>> +
>>          qstring_append(str, "\"");
>>          break;
>>      }
>
> Seems OK.
>
>
>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>> index efec1b2..595ddc0 100644
>> --- a/tests/check-qjson.c
>> +++ b/tests/check-qjson.c
>
> I'll trust you on that one :)

Waah, you don't want another case of bleeding eyes?!?

> Reviewed-by: Laszlo Ersek <address@hidden>

Thanks!

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [PATCH 1/4] unicode: New mod_utf8_codepoint(), (continued)
- [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings, Markus Armbruster, 2013/03/14
  - Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings, Laszlo Ersek, 2013/03/21
    - Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings, Markus Armbruster, 2013/03/22
    - Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings, Laszlo Ersek, 2013/03/22
- [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite, Markus Armbruster, 2013/03/14
  - Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite, Laszlo Ersek, 2013/03/21
  - Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite, Laszlo Ersek, 2013/03/22
    - Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite, Markus Armbruster <=
- Re: [Qemu-devel] [PATCH 0/4] Fix JSON string formatter, Blue Swirl, 2013/03/17
  - Re: [Qemu-devel] [PATCH 0/4] Fix JSON string formatter, Markus Armbruster, 2013/03/18
- Re: [Qemu-devel] [PATCH 0/4] Fix JSON string formatter, Blue Swirl, 2013/03/23

Prev by Date: [Qemu-devel] [PATCH v5 7/7] Use qemu_put_buffer_async for guest memory pages
Next by Date: Re: [Qemu-devel] [PATCH v3] Use proper term in TCG README
Previous by thread: Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite
Next by thread: Re: [Qemu-devel] [PATCH 0/4] Fix JSON string formatter
Index(es):
- Date
- Thread