[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is bugg
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite |
Date: |
Fri, 22 Mar 2013 15:51:46 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux) |
Laszlo Ersek <address@hidden> writes:
> comments below
>
> On 03/14/13 18:49, Markus Armbruster wrote:
>
>> diff --git a/qobject/qjson.c b/qobject/qjson.c
>> index 83a6b4f..19085a1 100644
>> --- a/qobject/qjson.c
>> +++ b/qobject/qjson.c
>> @@ -136,68 +136,56 @@ static void to_json(const QObject *obj, QString *str,
>> int pretty, int indent)
>> case QTYPE_QSTRING: {
>> QString *val = qobject_to_qstring(obj);
>> const char *ptr;
>> + int cp;
>> + char buf[16];
>> + char *end;
>>
>> ptr = qstring_get_str(val);
>> qstring_append(str, "\"");
>> - while (*ptr) {
>> - if ((ptr[0] & 0xE0) == 0xE0 &&
>> - (ptr[1] & 0x80) && (ptr[2] & 0x80)) {
>> - uint16_t wchar;
>> - char escape[7];
>> -
>> - wchar = (ptr[0] & 0x0F) << 12;
>> - wchar |= (ptr[1] & 0x3F) << 6;
>> - wchar |= (ptr[2] & 0x3F);
>> - ptr += 2;
>> -
>> - snprintf(escape, sizeof(escape), "\\u%04X", wchar);
>> - qstring_append(str, escape);
>> - } else if ((ptr[0] & 0xE0) == 0xC0 && (ptr[1] & 0x80)) {
>> - uint16_t wchar;
>> - char escape[7];
>> -
>> - wchar = (ptr[0] & 0x1F) << 6;
>> - wchar |= (ptr[1] & 0x3F);
>> - ptr++;
>> -
>> - snprintf(escape, sizeof(escape), "\\u%04X", wchar);
>> - qstring_append(str, escape);
>> - } else switch (ptr[0]) {
>> - case '\"':
>> - qstring_append(str, "\\\"");
>> - break;
>> - case '\\':
>> - qstring_append(str, "\\\\");
>> - break;
>> - case '\b':
>> - qstring_append(str, "\\b");
>> - break;
>> - case '\f':
>> - qstring_append(str, "\\f");
>> - break;
>> - case '\n':
>> - qstring_append(str, "\\n");
>> - break;
>> - case '\r':
>> - qstring_append(str, "\\r");
>> - break;
>> - case '\t':
>> - qstring_append(str, "\\t");
>> - break;
>> - default: {
>> - if (ptr[0] <= 0x1F) {
>> - char escape[7];
>> - snprintf(escape, sizeof(escape), "\\u%04X", ptr[0]);
>> - qstring_append(str, escape);
>> - } else {
>> - char buf[2] = { ptr[0], 0 };
>> - qstring_append(str, buf);
>> - }
>> - break;
>> +
>> + for (; *ptr; ptr = end) {
>> + cp = mod_utf8_codepoint(ptr, 6, &end);
>
> This provides more background: you never call mod_utf8_codepoint() with
> '\0' at offset 0. So handling that in mod_utf8_codepoint() may not be
> that important.
Yes, this caller doesn't care. Doesn't mean we shouldn't try to come up
with a sane function contract.
Note the use of literal 6. It means "unlimited". Perfectly safe
because the string is nul-terminated.
> If a '\0' is found at offset >= 1, it will correctly trigger the /*
> continuation byte missing */ branch in mod_utf8_codepoint(). The retval
> is -1, and *end is left pointing to the NUL byte. (This is consistent
> with mod_utf8_codepoint()'s docs.)
>
> The -1 (incomplete sequence) produces the replacement character below,
> and the next time around *ptr is '\0', so we finish the loop. Seems OK.
>
> (
> An alternative interface for mod_utf8_codepoint() might be something like:
>
> size_t alternative(const char *ptr, int *cp, size_t n);
>
> Resembling read() somewhat:
> - the return value would be the number of bytes consumed (it can't be
> negative (= fatal error), because we guarantee progress). 0 is EOF and
> only possible when "n" is 0.
> - "ptr" is the source,
> - "cp" is the output code point, -1 if invalid,
> - "n" is the bytes available in the source / requested to process at most.
>
> Encountering a \0 in the byte stream would be an error (*cp = -1), but
> would not terminate parsing per se.
>
> Then the loop would look like:
>
> processed = 0;
> while (processed < full) {
> int cp;
>
> rd = alternative(ptr + processed, &cp, full - processed);
> g_assert(rd > 0);
>
> /* look at cp */
>
> processed += rd;
> }
>
> But of course I'm not suggesting to rewrite the function!
> )
I'll keep this in mind when deciding how I want to handle '\0'.
>> + switch (cp) {
>> + case '\"':
>> + qstring_append(str, "\\\"");
>> + break;
>> + case '\\':
>> + qstring_append(str, "\\\\");
>> + break;
>> + case '\b':
>> + qstring_append(str, "\\b");
>> + break;
>> + case '\f':
>> + qstring_append(str, "\\f");
>> + break;
>> + case '\n':
>> + qstring_append(str, "\\n");
>> + break;
>> + case '\r':
>> + qstring_append(str, "\\r");
>> + break;
>> + case '\t':
>> + qstring_append(str, "\\t");
>> + break;
>
> The C standard also names \a (alert) and \v (vertical tab); I'm not sure
> about their JSON notation. (The (cp < 0x20) condition catches them below
> of course.)
JSON RFC 4627 defines only the seven above plus '\/'. Escaping '/' that
way makes no sense for us, so the old code doesn't, and mine doesn't
either.
>> + default:
>> + if (cp < 0) {
>> + cp = 0xFFFD; /* replacement character */
>> }
>> + if (cp > 0xFFFF) {
>> + /* beyond BMP; need a surrogate pair */
>> + snprintf(buf, sizeof(buf), "\\u%04X\\u%04X",
>> + 0xD800 + ((cp - 0x10000) >> 10),
>> + 0xDC00 + ((cp - 0x10000) & 0x3FF));
>
> Seems like we write 13 bytes into buf, OK. Also cp is never greater than
> 0x10FFFF, hence the difference is at most 0xFFFFF. The RHS surrogate
> half can go up to 0xDFFF, the LHS up to 0xD800+0x3FF == 0xDBFF. Good.
Exactly.
>> + } else if (cp < 0x20 || cp >= 0x7F) {
>> + snprintf(buf, sizeof(buf), "\\u%04X", cp);
>> + } else {
>> + buf[0] = cp;
>> + buf[1] = 0;
>> }
>> - ptr++;
>> - }
>> + qstring_append(str, buf);
>> + }
>> + };
>> +
>> qstring_append(str, "\"");
>> break;
>> }
>
> Seems OK.
>
>
>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>> index efec1b2..595ddc0 100644
>> --- a/tests/check-qjson.c
>> +++ b/tests/check-qjson.c
>
> I'll trust you on that one :)
Waah, you don't want another case of bleeding eyes?!?
> Reviewed-by: Laszlo Ersek <address@hidden>
Thanks!
- Re: [Qemu-devel] [PATCH 1/4] unicode: New mod_utf8_codepoint(), (continued)
[Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings, Markus Armbruster, 2013/03/14
[Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite, Markus Armbruster, 2013/03/14
Re: [Qemu-devel] [PATCH 0/4] Fix JSON string formatter, Blue Swirl, 2013/03/17
Re: [Qemu-devel] [PATCH 0/4] Fix JSON string formatter, Blue Swirl, 2013/03/23