|
From: | Eric Blake |
Subject: | Re: [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences |
Date: | Fri, 10 Aug 2018 10:21:23 -0500 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 |
On 08/10/2018 09:40 AM, Markus Armbruster wrote:
+ cp = mod_utf8_codepoint(ptr, 6, &end);Why are you hard-coding 6 here, rather than computing min(6, strchr(ptr,0)-ptr)? If the user passes an invalid sequence at the end of the string, can we end up making mod_utf8_codepoint() read beyond the end of our string? Would it be better to just always pass the remaining string length (mod_utf8_codepoint() only cares about stopping short of 6 bytes, but never reads beyond there even if you pass a larger number)?mod_utf8_codepoint() never reads beyond '\0'. The second parameter exists only so you can further limit reads. I like to provide that capability, because it sometimes saves a silly substring copy.
Okay. Perhaps the comments on mod_utf8_codepoint() could make that more clear that the contract is not violated (I didn't spot it without a close re-read of the code, prompted by your reply). But that's possibly a separate patch.
+ if (codepoint > 0 && codepoint <= 0x7F) { + buf[0] = codepoint & 0x7F;Dead use of binary &. But acceptable for symmetry with the other code branches.Exactly as dead as ...
+ buf[0] = 0xF0 | ((codepoint >> 18) & 0x07);... even this one. The last one only because is_valid_codepoint() rejects codepoints > 0x10FFFFu, which is admittedly a non-local argument. I'm debating whether to keep or drop the redundant masking. Got a preference?
No strong preference. A compiler with good range propagation during optimization should be able to eliminate the dead mask from the emitted assembly.
-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
[Prev in Thread] | Current Thread | [Next in Thread] |