John Snow <jsnow@redhat.com> writes:
> Differentiate between "actively in the process of checking" and
> "checking has completed". This allows us to clean up the types of some
> internal fields such as QAPISchemaObjectType's members field which
> currently uses "None" as a canary for determining if check has
> completed.
Certain members become valid only after .check(). Two ways to code
that:
1. Assign to such members only in .check(). If you try to use them
before .check(), AttributeError. Nice. Drawback: pylint is unhappy,
W0201 attribute-defined-outside-init.
Can be overcome by declaring the field in __init__, which satisfies both the linter and my developer usability sense (Classes should be easy to have their properties enumerated by looking in one well known place.)
2. Assign None in .__init__(), and the real value in .check(). If you
try to use them before .check(), you get None, which hopefully leads to
an error. Meh, but pylint is happy.
I picked 2. because pylint's warning made me go "when in Rome..."
Yep, this is perfectly cromulent dynamically typed Python. It's not the Roman's fault I'm packing us up to go to another empire.
With type hints, we can declare in .__init__(), and assign in .check().
Gives me the AttributeError I like, and pylint remains happy. What do
you think?
Sounds good to me in general, I already changed this for 2/3 of my other uses of @property.
(I'm only reluctant because I don't really like that it's a "lie", but in this case, without potentially significant rewrites, it's a reasonable type band-aid. Once we're type checked, we can refactor more confidently if we so desire, to make certain patterns less prominent or landmine-y.)
Splitting .checked feels like a separate change, though. I can't quite
see why we need it. Help me out: what problem does it solve?
Mechanically, I wanted to eliminate the Optional type from the members field, but you have conditionals in the code that use the presence or absence of this field as a query to determine if we had run check or not yet.
So I did the stupidest possible thing and added a "checked" variable to explicitly represent it.
> This simplifies the typing from a cumbersome Optional[List[T]] to merely
> a List[T], which is more pythonic: it is safe to iterate over an empty
> list with "for x in []" whereas with an Optional[List[T]] you have to
> rely on the more cumbersome "if L: for x in L: ..."
Yes, but when L is None, it's *invalid*, and for i in L *should* fail
when L is invalid.
Sure, but it's so invalid that it causes static typing errors.
You can't write "for x in None" in a way that makes mypy happy, None is not iterable.
If you want to preserve the behavior of "iterating an empty collection is an Assertion", you need a custom iterator that throws an exception when the collection is empty. I can do that, if you'd like, but I think it's actually fine to just allow the collection to be empty and to just catch the error in check() with either an assertion (oops, something went wrong and the list is empty, this should never happen) or a QAPISemError (oops, you didn't specify any members, which is illegal.)
I'd prefer to catch this in check and just allow the iterator to permit empty iterators at the callsite, knowing it'll never happen.
I think the actual problem is something else. By adding the type hint
Optional[List[T]], the valid uses of L become type errors. We really
want L to be a List[T]. Doesn't mean we have to (or want to) make uses
of invalid L "work".
I didn't think I did allow for invalid uses to work - if the list should semantically never be empty, I think it's fine to enforce that in schema.py during construction of the schema object and to assume all uses of "for x in L: ..." are inherently valid.
> RFC: are we guaranteed to have members here? can we just use "if
> members" without adding the new field?
I'm afraid I don't understand these questions.
I think you answered this one for me; I was asking if it was ever valid in any circumstance to have an empty members list, but I think you've laid out in your response that it isn't.
And I think with that knowledge I can simplify this patch, but don't quite recall. (On my mobile again, please excuse my apparent laziness.)
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
> scripts/qapi/schema.py | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> index 164d86c4064..200bc0730d6 100644
> --- a/scripts/qapi/schema.py
> +++ b/scripts/qapi/schema.py
> @@ -18,7 +18,7 @@
> from collections import OrderedDict
> import os
> import re
> -from typing import List, Optional
> +from typing import List, Optional, cast
>
> from .common import (
> POINTER_SUFFIX,
> @@ -447,22 +447,24 @@ def __init__(self, name, info, doc, ifcond, features,
> self.base = None
> self.local_members = local_members
> self.variants = variants
> - self.members = None
> + self.members: List[QAPISchemaObjectTypeMember] = []
Can we do
self.members: List[QAPISchemaObjectTypeMember]
?
Possibly, but also possibly I can just initialize it to an empty list.
> + self._checking = False
>
> def check(self, schema):
> # This calls another type T's .check() exactly when the C
> # struct emitted by gen_object() contains that T's C struct
> # (pointers don't count).
> - if self.members is not None:
> - # A previous .check() completed: nothing to do
> - return
> - if self._checked:
> + if self._checking:
> # Recursed: C struct contains itself
> raise QAPISemError(self.info,
> "object %s contains itself" % self.name)
> + if self._checked:
> + # A previous .check() completed: nothing to do
> + return
>
> + self._checking = True
> super().check(schema)
> - assert self._checked and self.members is None
> + assert self._checked and not self.members
If we assign only in .check(), we can't read self.members to find out
whether it's valid. We could perhaps mess with .__dict__() instead.
Not sure it's worthwhile. Dumb down the assertion?
If I initialize to an empty list (and don't mess with the checked member) maybe
assert self._checked and not self.members
would be perfectly acceptable.
>
> seen = OrderedDict()
> if self._base_name:
> @@ -479,13 +481,17 @@ def check(self, schema):
> for m in self.local_members:
> m.check(schema)
> m.check_clash(self.info, seen)
> - members = seen.values()
> +
> + # check_clash is abstract, but local_members is asserted to be
> + # List[QAPISchemaObjectTypeMember]. Cast to the narrower type.
> + members = cast(List[QAPISchemaObjectTypeMember], list(seen.values()))
>
> if self.variants:
> self.variants.check(schema, seen)
> self.variants.check_clash(self.info, seen)
>
> - self.members = members # mark completed
> + self.members = members
> + self._checking = False # mark completed
>
> # Check that the members of this type do not cause duplicate JSON members,
> # and update seen to track the members seen so far. Report any errors