From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
Date: Fri, 10 Aug 2018 12:18:10 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
The JSON parser treats each half of a surrogate pair as an unpaired
surrogate.  Fix it to recognize surrogate pairs.

Signed-off-by: Markus Armbruster <address@hidden>
---
 qobject/json-parser.c | 16 +++++++++++++++-
 tests/check-qjson.c   |  3 +--
 2 files changed, 16 insertions(+), 3 deletions(-)

@@ -168,6 +170,18 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
                     cp |= hex2decimal(*ptr);
                 }
 
+                if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
+                    && ptr[1] == '\\' && ptr[2] == 'u') {
+                    ptr += 2;
+                    leading_surrogate = cp;
+                    goto hex;
+                }
+                if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
+                    cp &= 0x3FF;
+                    cp |= (leading_surrogate & 0x3FF) << 10;
+                    cp += 0x010000;
+                }
+
                 if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
                     parse_error(ctxt, token,
                                 "\\u%.4s is not a valid Unicode character",

Consider "\\udbff\\udfff": a valid surrogate pair (both halves are in range), but one that decodes to U+10FFFF, which is a noncharacter. Since is_valid_codepoint() (part of mod_utf8_encode()) rejects it because (codepoint & 0xfffe) == 0xfffe, we end up printing this error message, but it shows only the second half of the surrogate pair. Is that okay?
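
To make the arithmetic concrete, here is a minimal standalone sketch of the decoding step. The helper names combine_surrogates() and is_noncharacter() are hypothetical, made up for this illustration; they are not functions in the patch or in QEMU. The first mirrors the three arithmetic lines from the hunk, and the second only reproduces the (codepoint & 0xfffe) == 0xfffe test mentioned above, not the full is_valid_codepoint().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: combine a leading (high) and trailing (low)
 * surrogate using the same arithmetic as the hunk under review.
 */
static unsigned combine_surrogates(unsigned lead, unsigned trail)
{
    unsigned cp;

    assert(lead >= 0xD800 && lead <= 0xDBFF);
    assert(trail >= 0xDC00 && trail <= 0xDFFF);

    cp = trail & 0x3FF;            /* low 10 bits come from the trailing half */
    cp |= (lead & 0x3FF) << 10;    /* high 10 bits come from the leading half */
    cp += 0x010000;                /* pairs encode U+10000 and above */
    return cp;
}

/*
 * Hypothetical helper: only the noncharacter test quoted in this mail,
 * i.e. the last two code points of every plane.
 */
static bool is_noncharacter(unsigned cp)
{
    return (cp & 0xfffe) == 0xfffe;
}
```

With these, combine_surrogates(0xDBFF, 0xDFFF) gives 0x10FFFF, which is_noncharacter() flags, while an ordinary pair such as 0xD83D, 0xDE00 gives 0x1F600 and passes.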

Otherwise,
Reviewed-by: Eric Blake <address@hidden>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org