qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 21/28] qapi: Require valid names


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v5 21/28] qapi: Require valid names
Date: Sun, 29 Mar 2015 12:17:40 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Eric Blake <address@hidden> writes:

> On 03/27/2015 02:48 AM, Markus Armbruster wrote:
>> Eric Blake <address@hidden> writes:
>> 
>>> Previous commits demonstrated that the generator overlooked various
>>> bad naming situations:
>>> - types, commands, and events need a valid name
>>> - union and alternate branches cannot be marked optional
>>>
>>> The set of valid names includes [a-zA-Z0-9._-] (where '.' is
>>> useful only in downstream extensions).
>>>
>
>>> +valid_characters = set(string.ascii_letters + string.digits + '.'
>>> + '-' + '_')
>> 
>> strings.ascii_letters depends on the locale...
>
> https://docs.python.org/2/library/string.html
>
> string.ascii_letters
>
>     The concatenation of the ascii_lowercase and ascii_uppercase
> constants described below. This value is not locale-dependent.
>
> You are thinking of string.letters, which IS locale-dependent.  I
> intentionally used ascii_letters.

You're right, I misread the docs.

>>> +def check_name(expr_info, source, name, allow_optional = False):
>>> +    membername = name
>>> +
>>> +    if not isinstance(name, str):
>>> +        raise QAPIExprError(expr_info,
>>> +                            "%s requires a string name" % source)
>>> +    if name == '**':
>>> +        return
>> 
>> Doesn't this permit '**' anywhere, not just as pseudo-type in command
>> arguments and results?
>
> Yes, on the grounds that check_type then filters it appropriately.  But
> worthy of a comment (probably both in the commit message AND in the code
> base).  Grounds for a v6 respin.
>
>> 
>>> +    if name.startswith('*'):
>>> +        membername = name[1:]
>>> +        if not allow_optional:
>>> +            raise QAPIExprError(expr_info,
>>> +                                "%s does not allow optional name '%s'"
>>> +                                % (source, name))
>>> +    if not set(membername) <= valid_characters:
>> 
>> ... so this check would break if we called locale.setlocale() in this
>> program.  While I don't think we need to worry about it, I think you
>> could just as well use something like
>> 
>>     valid_name = re.compile(r"^[A-Za-z0-9-._]+$")
>> 
>>     if not valid_name.match(membername):
>
> regex is slightly slower than string matching _if the regex is
> precompiled_, and MUCH slower than string matching if the regex is
> compiled every time.  In turn, string matching is slower than
> open-coding things, but has the benefit of being more compact and
> maintainable (open-coded loops are the worst on that front). Here's
> where I got my inspiration:
>
> https://stackoverflow.com/questions/1323364/in-python-how-to-check-if-a-string-only-contains-certain-characters
>
> But I may just go with the regex after all (I don't know how efficient
> python is about reusing a regex when a function is called multiple
> times, rather than recompiling the regex every time.  Personal side
> note: back in 2009 or so, I was able to make 'm4' significantly faster
> in the context of 'autoconf' when I taught it to cache the compilation
> of the 8 most-recently-encountered regex, rather than recompiling every
> time; and then made 'autoconf' even faster when I taught it to do
> actions that didn't require regex use from 'm4' in the first place.)

Neat.

> The nice thing, though, is that I factored things so that the
> implementation of this one function can change without having to hunt
> down all call-sites, if I keep the contract the same.

I don't care for matching performance here, I do care for readability.
Please pick the solution you find easiest to understand.

>>>          discriminator_type = base_fields.get(discriminator)
>>>          if not discriminator_type:
>>>              raise QAPIExprError(expr_info,
>> 
>> What happens when I try 'discriminator': '**'?
>
> No clue.  Good thing for me to add somewhere in my series.  However, I

I had a second look.  I think the generator accepting '**' in exactly
the right places relies on:

(1) check_name() accepts only proper names, not '**'.

(2) All names get checked with check_name().

(3) Except check_type() accepts special type name '**' when allow_star.

(4) allow_star is set for exactly the right places.

With check_name()'s superfluous / incorrect '**' check gone, (1)
obviously holds.  (2) isn't obvious, thus food for code review.  (3) is
trivial.  (4) is trivial except for "exactly the right places".
qapi-code-gen.txt:

    In rare cases, QAPI cannot express a type-safe representation of a
    corresponding Client JSON Protocol command.  In these cases, if the
    command expression includes the key 'gen' with boolean value false,
    then the 'data' or 'returns' member that intends to bypass generated
    type-safety and do its own manual validation should use '**' rather
    than a valid type name.

Unfortunately, "the 'data' or 'returns' member that intends to bypass
[...] should use '**' rather than a valid type name" can be read in (at
least) two ways:

1. It applies to the 'data', 'returns' members of the command object.

2. It applies to members of 'data', 'returns' members of the command
object.

The schema uses both readings: qom-get has 'returns': '**', and qom-set,
netdev_add, object_add have 'data' members of the form KEY: '**'.

Note that neither code nor tests have 'data': '**' or KEY: '**' in
'returns'.

qapi.py appears to implement a third way: '**' may appear as type name
anywhere within 'data' or 'returns'.  May well make sense, and may well
work, but we have no test coverage for it.

We can extend tests to cover what the generator accepts (separate series
recommended), or restrict the generator to what we actually use and
test.  I'm leaning towards the latter.

Further note that '**' can only be used right in a command declaration.
Factoring out the right hand side of 'data' or 'returns' into a separate
struct type declaration isn't possible when it contains '**'.

> did manage to have this series at least think about a base type with
> '*switch':'Enum', then use 'discriminator':'*switch', which got past the
> generator (who knows what the C code would have done if have_switch was
> false?), so I plugged that hole; but in the process of testing it, I
> noticed that '*switch':'Enum' paired with 'discriminator':'switch'
> complained that 'switch' was not a member of the base class (which is a
> lie; it is present in the base class, but as an optional member).  Proof
> that the generator is a bunch of hacks strung together :)

Indeed!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]