[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH RFC 3/6] qapi: rewrite string-input-visitor
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH RFC 3/6] qapi: rewrite string-input-visitor |
Date: |
Thu, 15 Nov 2018 10:48:09 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) |
David Hildenbrand <address@hidden> writes:
> On 14.11.18 18:38, Markus Armbruster wrote:
>> David Hildenbrand <address@hidden> writes:
>>
>>> The input visitor has some problems right now, especially
>>> - unsigned type "Range" is used to process signed ranges, resulting in
>>> inconsistent behavior and ugly/magical code
>>> - uint64_t are parsed like int64_t, so big uint64_t values are not
>>> supported and error messages are misleading
>>> - lists/ranges of int64_t are accepted although no list is parsed and
>>> we should rather report an error
>>> - lists/ranges are preparsed using int64_t, making it hard to
>>> implement uint64_t values or uint64_t lists
>>> - types that don't support lists don't bail out
>>
>> Known weirdness: empty list is invalid (test-string-input-visitor.c
>> demonstates). Your patch doesn't change that (or else it would update
>> the test). Should it be changed?
>>
>
> I don't change the test, so the old behavior still works.
> (empty string -> error)
Understand. Design question: should it remain an error? Feel free to
declare the question out of scope for this patch.
>>> So let's rewrite it by getting rid of usage of the type "Range" and
>>> properly supporting lists of int64_t and uint64_t (including ranges of
>>> both types), fixing the above mentioned issues.
>>>
>>> Lists of other types are not supported and will properly report an
>>> error. Virtual walks are now supported.
>>>
>>> Tests have to be fixed up:
>>> - Two BUGs were hardcoded that are fixed now
>>> - The string-input-visitor now actually returns a parsed list and not
>>> an ordered set.
>>>
>>> While at it, do some smaller style changes (e.g. use g_assert).
>>
>> Please don't replace assert() by g_assert() in files that consistently
>> use assert() now.
>
> Alright, will use assert() and don't replace. (I thought g_assert() was
> the new fancy thing to do it in QEMU).
>
>> In my opinion, g_assert() is one of the cases where GLib pointlessly
>> reinvents wheels that have been carrying load since forever.
>>
>>> Signed-off-by: David Hildenbrand <address@hidden>
>>> ---
>>> include/qapi/string-input-visitor.h | 3 +-
>>> qapi/string-input-visitor.c | 436 +++++++++++++++++-----------
>>> tests/test-string-input-visitor.c | 18 +-
>>> 3 files changed, 264 insertions(+), 193 deletions(-)
>>>
>>> diff --git a/include/qapi/string-input-visitor.h
>>> b/include/qapi/string-input-visitor.h
>>> index 33551340e3..2c40f7f5a6 100644
>>> --- a/include/qapi/string-input-visitor.h
>>> +++ b/include/qapi/string-input-visitor.h
>>> @@ -19,8 +19,7 @@ typedef struct StringInputVisitor StringInputVisitor;
>>>
>>> /*
>>> * The string input visitor does not implement support for visiting
>>> - * QAPI structs, alternates, null, or arbitrary QTypes. It also
>>> - * requires a non-null list argument to visit_start_list().
>>> + * QAPI structs, alternates, null, or arbitrary QTypes.
>>
>> Preexisting: neglects to spell out the list limitations, i.e. can do
>> only flat lists of integers. Care do fix that, too?
>
> Added "Only flat lists of integers (int64/uint64) are supported."
Hmm, do lists of narrower integer types also work? I guess they do: the
narrower visit_type_*int*() call v->type_*int64() via
visit_type_*intN().
Lists of type size are expressly excluded, in parse_type_size() below.
That's okay, we can lift the restriction when it gets in the way.
>>> */
>>> Visitor *string_input_visitor_new(const char *str);
>>>
>>> diff --git a/qapi/string-input-visitor.c b/qapi/string-input-visitor.c
>>> index dee708d384..16da47409e 100644
>>> --- a/qapi/string-input-visitor.c
>>> +++ b/qapi/string-input-visitor.c
>>> @@ -4,10 +4,10 @@
>>> * Copyright Red Hat, Inc. 2012-2016
>>> *
>>> * Author: Paolo Bonzini <address@hidden>
>>> + * David Hildenbrand <address@hidden>
>>> *
>>> * This work is licensed under the terms of the GNU LGPL, version 2.1 or
>>> later.
>>> * See the COPYING.LIB file in the top-level directory.
>>> - *
>>> */
>>>
>>> #include "qemu/osdep.h"
>>> @@ -18,21 +18,39 @@
>>> #include "qapi/qmp/qerror.h"
>>> #include "qapi/qmp/qnull.h"
>>> #include "qemu/option.h"
>>> -#include "qemu/queue.h"
>>> -#include "qemu/range.h"
>>> #include "qemu/cutils.h"
>>>
>>> +typedef enum ListMode {
>>> + /* no list parsing active / no list expected */
>>> + LM_NONE,
>>> + /* we have an unparsed string remaining */
>>> + LM_UNPARSED,
>>> + /* we have an unfinished int64 range */
>>> + LM_INT64_RANGE,
>>> + /* we have an unfinished uint64 range */
>>> + LM_UINT64_RANGE,
>>> + /* we have parsed the string completely and no range is remaining */
>>> + LM_END,
>>> +} ListMode;
>>> +
>>> +typedef union RangeLimit {
>>> + int64_t i64;
>>> + uint64_t u64;
>>> +} RangeLimit;
>>>
>>> struct StringInputVisitor
>>> {
>>> Visitor visitor;
>>>
>>> - GList *ranges;
>>> - GList *cur_range;
>>> - int64_t cur;
>>> + /* Porperties related to list processing */
>>
>> "Properties"
>>
>> Suggest
>>
>> /* List parsing state */
>
> Yes, will use that.
>
>>
>>> + ListMode lm;
>>> + RangeLimit rangeNext;
>>> + RangeLimit rangeEnd;
>>
>> RangeLimit is a good name for rangeEnd, but not so hot for rangeNext.
>> Naming is hard.
>
> Initially I had two unnamed unions to not have to give a name ;)
>
> RangeElement ? Will use that for now.
Works for me.
>>
>>> + const char *unparsed_string;
>>> + void *list;
>>
>> You drop /* Only needed for sanity checking the caller */.
>> Double-checking: is that not true anymore?
>
> I consider such comments bad as they can easily become outdated (and
> IMHO don't help anybody). But I can leave it here, don't really care.
I'm okay with dropping it.
> [...]
>
>>> + error_setg(errp, "Already processing a list.");
>>
>> Drop the period, please. error_setg()'s contract stipulates:
>>
>> * The resulting message should be a single phrase, with no newline or
>> * trailing punctuation.
>>
>> More of the same below.
>
> All dots dropped :)
>
> [...]
>>> static GenericList *next_list(Visitor *v, GenericList *tail, size_t size)
>>> {
>>> StringInputVisitor *siv = to_siv(v);
>>> - Range *r;
>>> -
>>> - if (!siv->ranges || !siv->cur_range) {
>>> - return NULL;
>>> - }
>>>
>>> - r = siv->cur_range->data;
>>> - if (!r) {
>>> + switch (siv->lm) {
>>> + case LM_NONE:
>>> + case LM_END:
>>> + /* we have reached the end of the list already or have no list */
>>> return NULL;
>>
>> Hmm. Is there a way to reach case LM_NONE other than a programming
>> error? If no, we should abort then. To do so, fold LM_NONE into the
>> default case below.
>
> Alright, all programming errors will be encoded as asserts.
Yes, please. In my experience, distinguishing between programming
errors and other errors really helps with understanding the code.
>>
>>> - }
>>> -
>>> - if (!range_contains(r, siv->cur)) {
>>> - siv->cur_range = g_list_next(siv->cur_range);
>>> - if (!siv->cur_range) {
>>> - return NULL;
>>> - }
>>> - r = siv->cur_range->data;
>>> - if (!r) {
>>> - return NULL;
>>> - }
>>> - siv->cur = range_lob(r);
>>> + case LM_INT64_RANGE:
>>> + case LM_UINT64_RANGE:
>>> + case LM_UNPARSED:
>>> + /* we have an unparsed string or something left in a range */
>>> + break;
>>> + default:
>>> + g_assert_not_reached();
>>
>> abort() suffices, and is more consistent with the rest of qapi/.
>
> Right, qapi is special ;)
Sort of. I like keeping things consistent and simple. Some parts of
QEMU consistently use plain assert(), some parts consistently use
g_assert(), and some mix them up with abandon. I'm fighting a rearguard
action against the latter.
> Will use s/g_assert/assert/
> s/g_assert_not_reached/abort/
Thanks!
>>
>>> }
>>>
>>> tail->next = g_malloc0(size);
>>> @@ -179,88 +109,216 @@ static GenericList *next_list(Visitor *v,
>>> GenericList *tail, size_t size)
>>> static void check_list(Visitor *v, Error **errp)
>>> {
>>> const StringInputVisitor *siv = to_siv(v);
>>> - Range *r;
>>> - GList *cur_range;
>>>
>>> - if (!siv->ranges || !siv->cur_range) {
>>> + switch (siv->lm) {
>>> + case LM_NONE:
>>> + error_setg(errp, "Not processing a list.");
>>
>> Same question as for next_list().
>
> abort() it is :) -> default case
>
>>
>>> + case LM_INT64_RANGE:
>>> + case LM_UINT64_RANGE:
>>> + case LM_UNPARSED:
>>> + error_setg(errp, "There are elements remaining in the list.");
>>
>> Hmm. qobject_input_check_list() reports "Only %u list elements expected
>> in %s" here. Looks unintuitive, until you remember how it's normally
>> used: to parse user input. The error gets reported when the parsing
>> didn't consume the whole list. Can only happen with a virtual walk.
>> And when it happens, the issue to report is "you provided a longer list
>> than I can accept". qobject_input_check_list() does exactly that. Your
>> message is less clear.
>
> We don't keep track of element count (and I don't like adding it just to
> print a error message).
The QObject input visitor has more of a need for identifying the
offending part of the input precisely, because it supports nesting.
> What about "Less list elements expected"? That at least matches the
> other error.
Good enough. I'd say "fewer", though.
>>
>>> return;
>>> - }
>>> -
>>> - r = siv->cur_range->data;
>>> - if (!r) {
>>> + case LM_END:
>>> return;
>>> + default:
>>> + g_assert_not_reached();
>>> }
>>> -
>>> - if (!range_contains(r, siv->cur)) {
>>> - cur_range = g_list_next(siv->cur_range);
>>> - if (!cur_range) {
>>> - return;
>>> - }
>>> - r = cur_range->data;
>>> - if (!r) {
>>> - return;
>>> - }
>>> - }
>>> -
>>> - error_setg(errp, "Range contains too many values");
>>> }
>>>
>>> static void end_list(Visitor *v, void **obj)
>>> {
>>> StringInputVisitor *siv = to_siv(v);
>>>
>>> - assert(siv->list == obj);
>>
>> I suspect state LM_NONE is also a programming error here. If that's
>> correct, let's assert(siv->lm != LM_NONE).
>
> Indeed, added.
>
> [...]
>>> + if (siv->rangeNext.i64 > siv->rangeEnd.i64 || *obj == INT64_MAX) {
>>> + /* end of range, check if there is more to parse */
>>> + if (siv->unparsed_string[0]) {
>>> + siv->lm = LM_UNPARSED;
>>> + } else {
>>> + siv->lm = LM_END;
>>> + }
>>> }
>>
>> Are you sure this is kosher?
>>
>> if siv->rangeNext.i64 and siv->rangeEnd are both initially INT64_MAX,
>> siv->rangeNext++ wraps around to INT64_MIN. Except when the compiler
>> exploits undefined behavior.
>
> *obj == INT64_MAX checks exactly that
>
> (it is the old value of siv->rangeNext.i64 before a possible wraparound).
By the time you check *obj == INT64_MAX, you already executed
siv->rangeNext++ with undefined behavior.
But wait... your code is fine because we compile with -fwrapv.
> BTW I added a test case for exactly that as I had the BUG you described
> in the code ;)
>
> [...]
>>> + return;
>>> + case LM_UNPARSED:
>>> + if (try_parse_int64_list_entry(siv, obj)) {
>>> + error_setg(errp, QERR_INVALID_PARAMETER_VALUE, name ? name :
>>> "null",
>>> + "list of int64 values or ranges");
>>> + }
>>
>> I figure I'd make try_parse_int64_list_entry() just parse, and on
>> success fall through to case LM_INT64_RANGE. But your solution works,
>> too.
>
> Then we would have to represent even single values as ranges, which is
> something I'd like to avoid.
Your artistic license applies.
>>> + return;
>>> + case LM_END:
>>> + error_setg(errp, "No more elements in the list.");
>>
>> This is the buddy of check_list() case LM_UNPARSED etc. There, input
>> provides more list members than we expect. Here, input provides less.
>> Again, the error is less clear than the one the QObject input visitor
>> reports: "Parameter '%s' is missing".
>>
>> Same for the unsigned copy below.
>
> Will also use "Less list elements expected".
Good enough. I'd say "fewer", though.
> "Parameter '%s' is missing" does not really apply to list elements
> without a name. We can discuss when I resend.
The QObject input visitor uses the list index as name, see
full_name_nth(). I'm *not* suggesting you implement that for the string
input visitor.
>>
>>> + return;
>>> + default:
>>> + error_setg(errp, "Lists don't support mixed types.");
>>
>> Is this a programming error?
>>
>> Same for the unsigned copy below.
>
> abort() it is.
>
> [...]
>>> +
>>> + if (siv->rangeNext.u64 > siv->rangeEnd.u64 || *obj == UINT64_MAX) {
>>> + /* end of range, check if there is more to parse */
>>> + if (siv->unparsed_string[0]) {
>>> + siv->lm = LM_UNPARSED;
>>> + } else {
>>> + siv->lm = LM_END;
>>> + }
>>> + }
>>
>> Unsigned wraps around fine. But when you fix the signed code, you
>> should probably keep this unsigned code as similar as practical.
>
> Again, *obj == UINT64_MAX checks this.
>
>>
>>> + return;
>>> + case LM_UNPARSED:
>>> + if (try_parse_uint64_list_entry(siv, obj)) {
>>> + error_setg(errp, QERR_INVALID_PARAMETER_VALUE, name ? name :
>>> "null",
>>> + "list of uint64 values or ranges");
>>> + }
>>> + return;
>>> + case LM_END:
>>> + error_setg(errp, "No more elements in the list.");
>>> + return;
>>> + default:
>>> + error_setg(errp, "Lists don't support mixed types.");
>>> + return;
>>> }
>>> }
>>>
>>> @@ -271,6 +329,11 @@ static void parse_type_size(Visitor *v, const char
>>> *name, uint64_t *obj,
>>> Error *err = NULL;
>>> uint64_t val;
>>>
>>> + if (siv->lm != LM_NONE) {
>>> + error_setg(errp, "Lists not supported for type \"size\"");
>>> + return;
>>> + }
>>> +
>>
>> Programming error? More of the same below.
>
> assert() it is.
>
> [...]
>>> visit_type_uint64List(v, NULL, &res, &error_abort);
>>> tail = res;
>>> for (i = 0; i < n; i++) {
>>> @@ -117,10 +107,10 @@ static void check_ulist(Visitor *v, uint64_t
>>> *expected, size_t n)
>>> static void test_visitor_in_intList(TestInputVisitorData *data,
>>> const void *unused)
>>> {
>>> - /* Note: the visitor *sorts* ranges *unsigned* */
>>> - int64_t expect1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 20 };
>>> + int64_t expect1[] = { 1, 2, 0, 2, 3, 4, 20, 5, 6, 7, 8, 9, 1, 2, 3, 4,
>>> 5,
>>
>> Please wrap this line a bit earlier.
>
> Will do.
>
>>
>>> + 6, 7, 8 };
>>> int64_t expect2[] = { 32767, -32768, -32767 };
>>> - int64_t expect3[] = { INT64_MAX, INT64_MIN };
>>> + int64_t expect3[] = { INT64_MIN, INT64_MAX };
>>
>> Did you mean to change this line?
>>
>
> Yes, we don't sort the list anymore.
D'oh!
>>> uint64_t expect4[] = { UINT64_MAX };
>>> Error *err = NULL;
>>> int64List *res = NULL;
>>> @@ -189,7 +179,7 @@ static void
>>> test_visitor_in_intList(TestInputVisitorData *data,
>>> visit_type_int64(v, NULL, &tail->value, &err);
>>> g_assert_cmpint(tail->value, ==, 0);
>>> visit_type_int64(v, NULL, &val, &err);
>>> - g_assert_cmpint(val, ==, 1); /* BUG */
>>> + error_free_or_abort(&err);
>>
>> Which of the problems listed in commit message is this?
>
> I guess the not mentioned "visiting beyond the end of a list is not
> handled properly" bug (most probably due to old range handling). Will
> add it to the description,
Thanks!
>>
>>> visit_check_list(v, &error_abort);
>>> visit_end_list(v, (void **)&res);
>>
>> Please don't let my review comments discourage you! This is so much
>> better than the code we have now, and feels pretty close to ready.
>> Thanks a lot!
>>
>
> I appreciate people reviewing my code :) So don't worry, I'll do
> requested changes and resend, then we'll see what we are still missing.
>
> BTW: I'll add support for single-element ranges (1-1) as the previous
> code seemed to accepted that.
Good idea.