qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 0/9] Require error handling for dynamically created object


From: Markus Armbruster
Subject: Re: [PATCH v3 0/9] Require error handling for dynamically created objects
Date: Fri, 06 Dec 2024 09:25:54 +0100
User-agent: Gnus/5.13 (Gnus v5.13)

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Dec 04, 2024 at 12:07:58PM +0100, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > NB, this series is targetting 10.0, NOT for 9.2 freeze.
>> >
>> > With code like
>> >
>> >     Object *obj = object_new(TYPE_BLAH)
>> >
>> > the caller can be pretty confident that they will successfully create
>> > an object instance of TYPE_BLAH. They know exactly what type has been
>> > requested, so it passing an abstract type for example, it is a clear
>> > programmer error that they'll get an assertion failure.
>> >
>> > Conversely with code like
>> >
>> >    void somefunc(const char *typename) {
>> >       Object * obj = object_new(typename)
>> >       ...
>> >    }
>> >
>> > all bets are off, because the call of object_new() knows nothing
>> > about what 'typename' resolves to.
>> 
>> We know nothing *locally*.
>> 
>> Commonly, a non-local argument can demonstrate safety.  Only when the
>> type name comes from the user, we truly know nothing.
>
> ...except for the failures introduced by modules not being installed,
> then all bets are off for all types unless you happen to recall
> which have been modularized so far.

True.  I was pretending modules don't exist until later in the message.

>> >                                    It could easily be an abstract
>> > type.
>> 
>> It could also be no type at all.
>> 
>> >       As a result, many code paths have added a manual check ahead
>> > of time
>> >
>> >    if (object_class_is_abstract(typename)) {
>> >       error_setg(errp, ....)
>> >    }
>> 
>> Actually, object_class_is_abstract() takes an ObjectClass, not a type
>> name string.
>> 
>> The actual guards we use are variations of
>> 
>>     klass = object_class_by_name(typename);
>>     if (!klass) {
>>         error_setg(errp, "invalid object type: %s", typename);
>>         return NULL;
>>     }
>> 
>>     if (object_class_is_abstract(klass)) {
>>         error_setg(errp, "object type '%s' is abstract", typename);
>>         return NULL;
>>     }
>> 
>> which covers "no type at all", too.
>> 
>> Sometimes, we use module_object_class_by_name() instead, which I believe
>> additionally loads the module providing the type, if any.  Which of the
>> two should be used where is a mystery to me, and I suspect we're getting
>> it wrong in places.  But this is turning into a digression.  To
>> hopefully maintain focus, I'm pretending modules don't exist until later
>> in this message.
>
> Yeah, I'm not a fan of having the separate module_object_class_by_name,
> because it requires us to remember whether something has been modularized
> or not.
>
>> Sometimes, we throw in an object_class_dynamic_cast(klass, T) to check
>> @typename resolves to a subtype of some T.
>> 
>> > ...except for where we forget to do this, such as qdev_new().
>> 
>> We did not forget it there!  It's by design a thin wrapper around
>> object_new(), with preconditions just like object_new().
>
> Yes, I think what I meant to write here, was "...except for where
> we forgot todo this in *callers* of qdev_new that take user input"

Correct.

>> > Overall 'object_new' is a bad design because it is inherantly
>> > unsafe to call with unvalidated typenames.
>> 
>> To be fair, object_new() was not designed for use with user-provided
>> type names.  When it chokes on type names not provided by the user, it's
>> clearly a programming error, and assert() is a perfectly fine way to
>> catch programming errors.  Same for qdev_new().
>> 
>> However, we do in fact use these functions with user-provided type
>> names, if rarely.  When we do, we need to validate the type name before
>> we pass it to them.
>> 
>> Trouble is the validation code is a bit involved, and reimplementing it
>> everywhere it's needed is asking for bugs.
>>
>> Creating and using more interfaces that are more convenient for this
>> purpose would avoid that.
>
> Yep, I don't have confidence in an API that will assert if the caller
> forgot to validate the pre-conditions that can be triggered by user
> input (or potentially other unexpected scenarios like something being
> switched over to a module).

Modules broke object_new(), but I'd rather not call object_new()'s
design bad for not accomodating a feature tacked on half-baked almost a
decade later.  But let's discuss modules further down.

Asserting preconditions isn't the problem; this is how preconditions
*should* be checked.  The problem is error-prone preconditions.

Using string type names is in theory error-prone: the compiler cannot
check the type name is valid.  It could be invalid because of a typo, or
because it names a type that's not linked into this binary.

The compiler could check with an enumeration, but then the header
defining needed to be included basically everywhere QOM is used, and
changed all the time.

So QOM went with strings.  I can't remember "invalid type name" bugs
surviving even basic testing in more than a decade of QOM use.

Except for *user-supplied* type names.  These need to be validated, we
failed to factor out common validation code, and ended up with bugs in
some of the copies.

>> > This problem is made worse by the proposal to introduce the idea
>> > of 'singleton' classes[1].
>> >
>> > Thus, this series suggests a way to improve safety at build
>> > time. The core idea is to allow 'object_new' to continue to be
>> > used *if-and-only-if* given a static, const string, because that
>> > scenario indicates the caller is aware of what type they are
>> > creating at build time.
>> >
>> > A new 'object_new_dynamic' method is proposed for cases where
>> > the typename is dynamically chosen at runtime. This method has
>> > an "Error **errp" parameter, which can report when an abstract
>> > type is created, leaving the assert()s only for scenarios which
>> > are unambiguous programmer errors.
>> >
>> > With a little macro magic, we guarantee a compile error is
>> > generated if 'object_new' is called with a dynamic type, forcing
>> > all potentially unsafe code over to object_new_dynamic.
>> 
>> Three cases:
>> 
>> 1. Type name is literal string.  No change.  This is the most common
>>    case.
>> 
>> 2. It's not.
>> 
>> 2a. Type name is user-provided.  This is rare.  We replace
>> 
>>         if (... guard ...) {
>>             ... return failure ...
>>         }
>>         obj = object_new(...);
>> 
>>     by
>> 
>>         obj = object_new_dynamic(..., errp);
>>         if (!obj) {
>>             ... return failure ...
>>         }
>> 
>>     This is an improvement.
>> 
>> 2b. It's not.  We should replace
>> 
>>         obj = object_new(...);
>> 
>>     by
>> 
>>         obj = object_new_dynamic(..., &error_abort);
>> 
>>     Exact same behavior, just wordier, to placate the compiler.
>>     Tolerable as long as it's relatively rare.
>> 
>>     But I'm not sure it's worthwhile.  All it really does is helping
>>     some towards not getting case 2a wrong.  But 2a is rare.
>
> Yes, 2a is fairly rare, but this is amplified by the consequences
> of getting it wrong, which are an assert killing your running VM.
> My goal was to make it much harder to screw up and trigger an
> assert, even if that makes some valid uses more verbose.

Has this been a problem in practice?  We have thirteen years of
experience...

>> > This is more tractable than adding 'Error **errp' to 'object_new'
>> > as only a handful of places use a dynamic type name.
>> 
>> True!
>> 
>> Alright, enter modules.
>> 
>> Modules break a fundamental design assumption: object_new() on a
>> compiled-in type name is safe, i.e. the failure modes are all
>> programming errors.
>> 
>> Modules add new failure modes that are *not* programming errors:
>> 
>> * The module providing the type was not deployed correctly.
>> 
>> * It was, but the host system lacks the resources to load it.
>
> Hmm, yes, I hadn't considered the 2nd problem. That's more
> unpleasant, as libvirt may well have queried QEMU earlier to
> detect the missing module, and assume all is safe if it is
> present.

The first problem could perhaps be hand-waved away: must deploy all
modules or else.  But that partly defeats the purpose of modules, namely
keeping unwanted dependencies off the host.

The second problem cannot: assertion failure on otherwise survivable
resource shortage is unequivocally wrong.

For more on module trouble, see "Problem 3: Loadable modules" in my memo
"Dynamic & heterogeneous machines, initial configuration: problems"[*]

>> Before modules, object_new(T) was safe unless T was user-provided.
>> Which implies it's safe when T is a literal string.
>> 
>> Since modules, object_new(T) is safe unless T is user-provided or the
>> type named by it is compiled as module.  This does *not* imply it's safe
>> when T is a literal string.
>
> Agreed. 
>
>> When looking at a use of object_new(), whether the argument names a type
>> that could be compiled as module cannot be known locally.  Therefore, we
>> cannot know locally whether we need to handle failure, either with a
>> suitable guard or by switching to a new function like
>> object_new_dynamic().  This is bad.
>>
>> Breaking fundamental design assumptions tends to have ugly and expensive
>> consequences.  Consequences like having to rework every single call of
>> object_new() & friends.
>> 
>> Can we reduce the damage?  Maybe.  What if we create a
>> module_object_new() that takes an Error **, and make object_new() crash
>> & burn when the @typename argument resolves to a type provided by a
>> module?
>
> I'm doubtful about a design where maintainers have to choose the
> right API, based on mental knowledge of what is a module or not.
> The place where object_new is called is typically distinct from
> the module impl, so the knowledge is separated. This opens the
> door to forgetting to change code from object_new to module_object_new.
> This is what motivated my attempt to try to force compile time errors
> scenarios which had a high chance of being user specified types.

We'd have to rely on the test suite here, as we've done for "type name
is valid".

> I don't have a good answer for how to extend compile time validation
> to cover non-user specified types that might be modules, without
> changnig 'object_new' itself to add "Error **errp" and convert as
> many callers as possible to propagate errors. That's a huge pile
> of tedious work and in many cases would deteriorate  to &error_abort
> since some key common use scenarios lack a "Error *errp" to propagate
> into.

I can offer two ideas.

I'll start with devices for reasons that will become apparent in a
minute.

The first idea is straighforward in conception: since the problem is
modules breaking existing design assumptions, unbreak them.

Device creation cannot fail, only realize can.  Could we delay the
problematic failure modes introduced by modules from creation to
realize?

When creating the real thing fails, create a dummy instead.  Of course,
the dummy needs to be sufficiently functional to provide for the things
we do with devices before realize, such as introspection.

Note that we already link information on modules into the binary, so
that the binary knows which modules provide a certain object.  To enable
sufficiently functional dummies, we'd have to link more.

The difficulty is "the things we do with devices before realize": do we
even know?

The other difficulty is that objects don't have realize.  User-creatable
objects have complete, which is kind of similar.  See also "Problem 5:
QOM lacks a clear life cycle" in my memo "Dynamic & heterogeneous
machines, initial configuration: problems"[*].

The second idea is a variation of your idea to provide two interfaces
for object creation, where using the wrong one won't compile: a common
one that cannot fail, i.e. object_new(), and an uncommon one that can.
Let's call that one object_try_new() for now.

Your proposed "string literal" as a useful approximation of "cannot
fail".  Modules defeat that.

What if we switch from strings to something more expressive?

Step one: replace string type names by symbols

Change

    #define TYPE_FOO "foo"

    Object *object_new(const char *typename);

to something like

    extern const TypeInfoSafe foo_info;
    #define TYPE_FOO &foo_info

    Object *object_new(const TypeInfoSafe *type_info);

Step two: different symbols for safe and unsafe types

    extern const TypeInfoUnsafe bar_info;
    #define TYPE_BAR &bar_info
    
    Object *object_try_new(const TypeInfoUnsafe *type_info);

Now you cannot pass bar_info to object_new().

For a module-enabled TYPE_BAR, we already have something like

    module_obj(TYPE_BAR)

Make macro module_obj() require its argument to be TypeInfoUnsafe.

Voilà, the compiler enforces use of object_try_new() for objects
provided by loadable modules.

There will be some fallout around computed type names such as
ACCEL_OPS_NAME().  Fairly rare, I think.

More fallout around passing TYPE_ macros to functions that accept both
safe and unsafe types.  How common is that?

Worth exploring?

>> Maybe module_object_new() and object_new_dynamic() could be fused into a
>> single function with a better name.
>> 
>> > With this series, my objections to Peter Xu's singleton series[1]
>> > would be largely nullified.
>> >
>> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html
>> 
>
> With regards,
> Daniel


[*] Message-ID: <87o7d1i7ky.fsf@pond.sub.org>
https://lore.kernel.org/qemu-devel/87o7d1i7ky.fsf@pond.sub.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]