[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v3 0/9] Require error handling for dynamically created object
From: |
Daniel P . Berrangé |
Subject: |
Re: [PATCH v3 0/9] Require error handling for dynamically created objects |
Date: |
Thu, 5 Dec 2024 16:04:39 +0000 |
User-agent: |
Mutt/2.2.13 (2024-03-09) |
On Wed, Dec 04, 2024 at 12:07:58PM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > NB, this series is targetting 10.0, NOT for 9.2 freeze.
> >
> > With code like
> >
> > Object *obj = object_new(TYPE_BLAH)
> >
> > the caller can be pretty confident that they will successfully create
> > an object instance of TYPE_BLAH. They know exactly what type has been
> > requested, so it passing an abstract type for example, it is a clear
> > programmer error that they'll get an assertion failure.
> >
> > Conversely with code like
> >
> > void somefunc(const char *typename) {
> > Object * obj = object_new(typename)
> > ...
> > }
> >
> > all bets are off, because the call of object_new() knows nothing
> > about what 'typename' resolves to.
>
> We know nothing *locally*.
>
> Commonly, a non-local argument can demonstrate safety. Only when the
> type name comes from the user, we truly know nothing.
...except for the failures introduced by modules not being installed,
then all bets are off for all types unless you happen to recall
which have been modularized so far.
> > It could easily be an abstract
> > type.
>
> It could also be no type at all.
>
> > As a result, many code paths have added a manual check ahead
> > of time
> >
> > if (object_class_is_abstract(typename)) {
> > error_setg(errp, ....)
> > }
>
> Actually, object_class_is_abstract() takes an ObjectClass, not a type
> name string.
>
> The actual guards we use are variations of
>
> klass = object_class_by_name(typename);
> if (!klass) {
> error_setg(errp, "invalid object type: %s", typename);
> return NULL;
> }
>
> if (object_class_is_abstract(klass)) {
> error_setg(errp, "object type '%s' is abstract", typename);
> return NULL;
> }
>
> which covers "no type at all", too.
>
> Sometimes, we use module_object_class_by_name() instead, which I believe
> additionally loads the module providing the type, if any. Which of the
> two should be used where is a mystery to me, and I suspect we're getting
> it wrong in places. But this is turning into a digression. To
> hopefully maintain focus, I'm pretending modules don't exist until later
> in this message.
Yeah, I'm not a fan of having the separate module_object_class_by_name,
because it requires us to remember whether something has been modularized
or not.
> Sometimes, we throw in an object_class_dynamic_cast(klass, T) to check
> @typename resolves to a subtype of some T.
>
> > ...except for where we forget to do this, such as qdev_new().
>
> We did not forget it there! It's by design a thin wrapper around
> object_new(), with preconditions just like object_new().
Yes, I think what I meant to write here, was "...except for where
we forgot todo this in *callers* of qdev_new that take user input"
> > Overall 'object_new' is a bad design because it is inherantly
> > unsafe to call with unvalidated typenames.
>
> To be fair, object_new() was not designed for use with user-provided
> type names. When it chokes on type names not provided by the user, it's
> clearly a programming error, and assert() is a perfectly fine way to
> catch programming errors. Same for qdev_new().
>
> However, we do in fact use these functions with user-provided type
> names, if rarely. When we do, we need to validate the type name before
> we pass it to them.
>
> Trouble is the validation code is a bit involved, and reimplementing it
> everywhere it's needed is asking for bugs.
> Creating and using more interfaces that are more convenient for this
> purpose would avoid that.
Yep, I don't have confidence in an API that will assert if the caller
forgot to validate the pre-conditions that can be triggered by user
input (or potentially other unexpected scenarios like something being
switched over to a module).
> > This problem is made worse by the proposal to introduce the idea
> > of 'singleton' classes[1].
> >
> > Thus, this series suggests a way to improve safety at build
> > time. The core idea is to allow 'object_new' to continue to be
> > used *if-and-only-if* given a static, const string, because that
> > scenario indicates the caller is aware of what type they are
> > creating at build time.
> >
> > A new 'object_new_dynamic' method is proposed for cases where
> > the typename is dynamically chosen at runtime. This method has
> > an "Error **errp" parameter, which can report when an abstract
> > type is created, leaving the assert()s only for scenarios which
> > are unambiguous programmer errors.
> >
> > With a little macro magic, we guarantee a compile error is
> > generated if 'object_new' is called with a dynamic type, forcing
> > all potentially unsafe code over to object_new_dynamic.
>
> Three cases:
>
> 1. Type name is literal string. No change. This is the most common
> case.
>
> 2. It's not.
>
> 2a. Type name is user-provided. This is rare. We replace
>
> if (... guard ...) {
> ... return failure ...
> }
> obj = object_new(...);
>
> by
>
> obj = object_new_dynamic(..., errp);
> if (!obj) {
> ... return failure ...
> }
>
> This is an improvement.
>
> 2b. It's not. We should replace
>
> obj = object_new(...);
>
> by
>
> obj = object_new_dynamic(..., &error_abort);
>
> Exact same behavior, just wordier, to placate the compiler.
> Tolerable as long as it's relatively rare.
>
> But I'm not sure it's worthwhile. All it really does is helping
> some towards not getting case 2a wrong. But 2a is rare.
Yes, 2a is fairly rare, but this is amplified by the consequences
of getting it wrong, which are an assert killing your running VM.
My goal was to make it much harder to screw up and trigger an
assert, even if that makes some valid uses more verbose.
> > This is more tractable than adding 'Error **errp' to 'object_new'
> > as only a handful of places use a dynamic type name.
>
> True!
>
> Alright, enter modules.
>
> Modules break a fundamental design assumption: object_new() on a
> compiled-in type name is safe, i.e. the failure modes are all
> programming errors.
>
> Modules add new failure modes that are *not* programming errors:
>
> * The module providing the type was not deployed correctly.
>
> * It was, but the host system lacks the resources to load it.
Hmm, yes, I hadn't considered the 2nd problem. That's more
unpleasant, as libvirt may well have queried QEMU earlier to
detect the missing module, and assume all is safe if it is
present.
>
> Before modules, object_new(T) was safe unless T was user-provided.
> Which implies it's safe when T is a literal string.
>
> Since modules, object_new(T) is safe unless T is user-provided or the
> type named by it is compiled as module. This does *not* imply it's safe
> when T is a literal string.
Agreed.
> When looking at a use of object_new(), whether the argument names a type
> that could be compiled as module cannot be known locally. Therefore, we
> cannot know locally whether we need to handle failure, either with a
> suitable guard or by switching to a new function like
> object_new_dynamic(). This is bad.
>
> Breaking fundamental design assumptions tends to have ugly and expensive
> consequences. Consequences like having to rework every single call of
> object_new() & friends.
>
> Can we reduce the damage? Maybe. What if we create a
> module_object_new() that takes an Error **, and make object_new() crash
> & burn when the @typename argument resolves to a type provided by a
> module?
I'm doubtful about a design where maintainers have to choose the
right API, based on mental knowledge of what is a module or not.
The place where object_new is called is typically distinct from
the module impl, so the knowledge is separated. This opens the
door to forgetting to change code from object_new to module_object_new.
This is what motivated my attempt to try to force compile time errors
scenarios which had a high chance of being user specified types.
I don't have a good answer for how to extend compile time validation
to cover non-user specified types that might be modules, without
changnig 'object_new' itself to add "Error **errp" and convert as
many callers as possible to propagate errors. That's a huge pile
of tedious work and in many cases would deteriorate to &error_abort
since some key common use scenarios lack a "Error *errp" to propagate
into.
> Maybe module_object_new() and object_new_dynamic() could be fused into a
> single function with a better name.
>
> > With this series, my objections to Peter Xu's singleton series[1]
> > would be largely nullified.
> >
> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-10/msg05524.html
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|