Re: [Qemu-devel] [PATCH v2 41/60] json: Nicer recovery from invalid lead
From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v2 41/60] json: Nicer recovery from invalid leading zero
Date: Mon, 20 Aug 2018 13:39:49 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Eric Blake <address@hidden> writes:
> On 08/17/2018 10:05 AM, Markus Armbruster wrote:
>> For input 0123, the lexer produces the tokens
>>
>> JSON_ERROR 01
>> JSON_INTEGER 23
>>
>> Reporting an error is correct; 0123 is invalid according to RFC 7159.
>> But the error recovery isn't nice.
>>
>> Make the finite state machine eat digits before going into the error
>> state. The lexer now produces
>>
>> JSON_ERROR 0123
>>
>> Signed-off-by: Markus Armbruster <address@hidden>
>> Reviewed-by: Eric Blake <address@hidden>
>
> Did you also want to reject invalid attempts at hex numbers, by adding
> [xXa-fA-F] to the set of characters eaten by IN_BAD_ZERO?

I put one foot on a slippery slope with this patch...

In review of v1, we discussed whether to try matching non-integer
numbers with a redundant leading zero.  Doing that tightly in the lexer
requires duplicating six states.  A simpler alternative is to have the
lexer eat "digit salad" after a redundant leading zero: 0[0-9.eE+-]+.

Your suggestion for hexadecimal numbers is digit salad with different
digits: [0-9a-fA-FxX].  Another option is their union: [0-9a-fA-FxX.+-].
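
To make the trade-off between the character sets concrete, here is a
minimal sketch outside the lexer's state table.  The helper
bad_zero_span() and the SALAD_* macros are hypothetical names, not QEMU
code; the sketch only shows how far the error token would extend for a
given choice of "salad" characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical character sets, taken from the regexps discussed above. */
#define SALAD_DECIMAL "0123456789.eE+-"
#define SALAD_HEX     "0123456789abcdefABCDEFxX"

/*
 * Hypothetical helper, not QEMU code: length of the error token that
 * "digit salad" recovery would produce for input s.  Returns 0 when s
 * does not start with '0' followed by a digit, i.e. when there is no
 * redundant leading zero to recover from.
 */
static size_t bad_zero_span(const char *s, const char *salad)
{
    if (s[0] != '0' || s[1] < '0' || s[1] > '9') {
        return 0;
    }
    /* strspn() counts the leading characters that belong to the set. */
    return 1 + strspn(s + 1, salad);
}
```

With SALAD_DECIMAL, the input "01.5e+3," yields one error token covering
"01.5e+3"; for "01F}", SALAD_DECIMAL stops after "01" while SALAD_HEX
covers "01F".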

Even more radical would be eating anything but whitespace and structural
characters: [^][}{:, \t\n\r].  That idea pushed to the limit results in
a two-stage lexer.  The first stage finds token strings, where a token
string is either a structural character or a sequence of non-structural,
non-whitespace characters.  The second stage rejects invalid token
strings.
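
A rough sketch of such a two-stage split follows.  All names are
hypothetical and none of this is QEMU code.  It is deliberately
incomplete: JSON strings may contain whitespace and structural
characters, so a real first stage would need to treat them specially,
and the second stage here validates numbers only, against the RFC 7159
grammar.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool is_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

static bool is_structural(char c)
{
    /* strchr() would also match the terminating NUL, so exclude it. */
    return c != '\0' && strchr("[]{}:,", c) != NULL;
}

/*
 * Stage 1: skip whitespace from *pos, then return the length of the
 * token string starting at the updated *pos (0 at end of input).
 */
static size_t next_token_string(const char *s, size_t *pos)
{
    size_t len = 0;

    while (is_space(s[*pos])) {
        (*pos)++;
    }
    if (is_structural(s[*pos])) {
        return 1;
    }
    while (s[*pos + len] && !is_space(s[*pos + len])
           && !is_structural(s[*pos + len])) {
        len++;
    }
    return len;
}

static size_t skip_digits(const char *t, size_t i, size_t len)
{
    while (i < len && isdigit((unsigned char)t[i])) {
        i++;
    }
    return i;
}

/* Stage 2, numbers only: does t[0..len) match the RFC 7159 grammar? */
static bool is_valid_number(const char *t, size_t len)
{
    size_t i = 0, j;

    if (i < len && t[i] == '-') {
        i++;
    }
    if (i < len && t[i] == '0') {
        i++;                        /* a leading zero must stand alone */
    } else {
        j = skip_digits(t, i, len);
        if (j == i) {
            return false;           /* no integer digits at all */
        }
        i = j;
    }
    if (i < len && t[i] == '.') {
        j = skip_digits(t, i + 1, len);
        if (j == i + 1) {
            return false;           /* '.' needs at least one digit */
        }
        i = j;
    }
    if (i < len && (t[i] == 'e' || t[i] == 'E')) {
        i++;
        if (i < len && (t[i] == '+' || t[i] == '-')) {
            i++;
        }
        j = skip_digits(t, i, len);
        if (j == i) {
            return false;           /* exponent needs digits */
        }
        i = j;
    }
    return i == len;                /* anything left over is invalid */
}
```

For the input "[0123, 7]", stage 1 yields the token strings "[", "0123",
",", "7" and "]", and stage 2 rejects "0123" as a whole rather than
splitting it.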

Hmm, we could try to recover from lexical errors more smartly in
general: instead of ending the JSON error token after the first
offending character, end it before the first whitespace or structural
character following the offending character.

I can try that, but I'd prefer to try it in a follow-up patch.
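
The general recovery rule described above can be sketched in a few
lines.  error_token_end() is a hypothetical helper, not part of the
lexer; it assumes the caller already knows the index of the offending
character and that this character is itself neither whitespace nor
structural.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU code: given the index of the first
 * offending character, return the index just past the end of the
 * error token.  The token ends before the first whitespace or
 * structural character, or at the end of the input.
 */
static size_t error_token_end(const char *input, size_t offending)
{
    /* strcspn() counts leading characters NOT in the given set. */
    return offending + strcspn(input + offending, "[]{}:, \t\n\r");
}
```

For "tru3e, 1" with the offending '3' at index 3, the error token would
end at index 5, so it covers "tru3e" instead of stopping at "tru3".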
>> + [IN_BAD_ZERO] = {
>> + ['0' ... '9'] = IN_BAD_ZERO,
>> + },
>> +
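
For readers without the lexer source at hand, here is a small
standalone sketch of the behavior this hunk adds.  It is not the QEMU
lexer: the state names, step() and lex_one() are hypothetical, it only
knows unsigned decimal integers, and it uses a switch instead of the
patch's state table (whose ['0' ... '9'] range designator is a GNU C
extension).  S_BAD_ZERO plays the role of IN_BAD_ZERO.

```c
#include <stddef.h>

/* Hypothetical states; S_DONE_* mean "token finished before this char". */
enum state { S_START, S_ZERO, S_NONZERO, S_BAD_ZERO, S_DONE_INT, S_DONE_ERR };

static enum state step(enum state st, char c)
{
    int digit = c >= '0' && c <= '9';

    switch (st) {
    case S_START:
        return c == '0' ? S_ZERO : digit ? S_NONZERO : S_DONE_ERR;
    case S_ZERO:
        /* A digit after a leading zero is invalid: start eating digits. */
        return digit ? S_BAD_ZERO : S_DONE_INT;
    case S_NONZERO:
        return digit ? S_NONZERO : S_DONE_INT;
    case S_BAD_ZERO:
        /* Keep eating digits so the whole number becomes one error token. */
        return digit ? S_BAD_ZERO : S_DONE_ERR;
    default:
        return st;
    }
}

/*
 * Lex one token from s.  Returns the number of characters consumed and
 * sets *is_error to 1 for an error token, 0 for an integer token.
 */
static size_t lex_one(const char *s, int *is_error)
{
    enum state st = S_START;
    size_t n = 0;

    for (;;) {
        enum state next = step(st, s[n]);

        if (next == S_DONE_INT || next == S_DONE_ERR) {
            /* An error on the very first character consumes it. */
            if (st == S_START && s[n] != '\0') {
                n++;
            }
            *is_error = next == S_DONE_ERR;
            return n;
        }
        st = next;
        n++;
    }
}
```

Calling lex_one("0123", &err) consumes all four characters and sets err,
which corresponds to the single JSON_ERROR token for 0123 described in
the commit message.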