qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hw/virtio: reset virtio balloon stats on machine reset


From: Daniel P . Berrangé
Subject: Re: [PATCH] hw/virtio: reset virtio balloon stats on machine reset
Date: Thu, 19 Dec 2024 14:08:27 +0000
User-agent: Mutt/2.2.13 (2024-03-09)

On Thu, Dec 19, 2024 at 02:51:21PM +0100, David Hildenbrand wrote:
> On 18.12.24 18:29, Daniel P. Berrangé wrote:
> > When a machine is first booted, all virtio balloon stats are initialized
> > to their default value -1 (18446744073709551615 when represented as
> > unsigned).
> > 
> > They remain that way while the firmware is loading, and early phase of
> > guest OS boot, until the virtio-balloon driver is activated. Thereafter
> > the reported stats reflect the guest OS activity.
> > 
> > When a machine reset is performed, however, the virtio-balloon stats are
> > left unchanged by QEMU, despite the guest OS no longer updating them,
> > nor indeed even still existing.
> > 
> > IOW, the mgmt app keeps getting stale stats until the guest OS starts
> > once more and loads the virtio-balloon driver (if ever). At that point
> > the app will see a discontinuity in the reported values as they sudden
> > jump from the stale value to the new value. This jump is indigituishable
> > from a valid data update.
> > 
> > While there is an "last-updated" field to report on the freshness of
> > the stats, that does not unambiguously tell the mgmt app whether the
> > stats are still conceptually relevant to the current running workload.
> > 
> > It is more conceptually useful to reset the stats to their default
> > values on machine reset, given that the previous guest workload the
> > stats reflect no longer exists. The mgmt app can now clearly identify
> > that there are is no stats information available from the current
> > executing workload.
> > 
> > The 'last-updated' time is also reset back to 0.
> > 
> > IOW, on every machine reset, the virtio stats are in the same clean
> > state they were when the macine first powered on.
> > 
> > A functional test is added to validate this behaviour with a real
> > world guest OS.
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> > 
> > One side-thought I have, is whether it makes sense to add a
> > 'reset-count' field in the virtio stats, alongside the
> > 'last-updated' field. While apps can infer a reset from seeing
> > the stats all go back to their defaults, an explicit flag is
> > simpler...
> > 
> >   MAINTAINERS                             |   1 +
> >   hw/virtio/virtio-balloon.c              |  30 ++++-
> >   include/hw/virtio/virtio-balloon.h      |   4 +
> >   tests/functional/test_virtio_balloon.py | 161 ++++++++++++++++++++++++
> >   4 files changed, 195 insertions(+), 1 deletion(-)
> >   create mode 100755 tests/functional/test_virtio_balloon.py
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 822f34344b..1380d53d03 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2234,6 +2234,7 @@ F: include/hw/virtio/virtio-balloon.h
> >   F: system/balloon.c
> >   F: include/sysemu/balloon.h
> >   F: tests/qtest/virtio-balloon-test.c
> > +F: tests/functional/test_virtio_balloon.py
> >   virtio-9p
> >   M: Greg Kurz <groug@kaod.org>
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index ab2ee30475..fe0854e198 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -31,7 +31,7 @@
> >   #include "trace.h"
> >   #include "qemu/error-report.h"
> >   #include "migration/misc.h"
> > -
> > +#include "sysemu/reset.h"
> >   #include "hw/virtio/virtio-bus.h"
> >   #include "hw/virtio/virtio-access.h"
> > @@ -910,6 +910,8 @@ static void virtio_balloon_device_realize(DeviceState 
> > *dev, Error **errp)
> >       }
> >       reset_stats(s);
> > +    s->stats_last_update = 0;
> > +    qemu_register_resettable(OBJECT(dev));
> >   }
> >   static void virtio_balloon_device_unrealize(DeviceState *dev)
> > @@ -917,6 +919,7 @@ static void virtio_balloon_device_unrealize(DeviceState 
> > *dev)
> >       VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >       VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> > +    qemu_unregister_resettable(OBJECT(dev));
> >       if (s->free_page_bh) {
> >           qemu_bh_delete(s->free_page_bh);
> >           object_unref(OBJECT(s->iothread));
> > @@ -987,6 +990,27 @@ static void virtio_balloon_set_status(VirtIODevice 
> > *vdev, uint8_t status)
> >       }
> >   }
> 
> Using qemu_register_resettable() can have unfortunate side effects that this
> code is triggered when the device is reset, not necessarily when the
> complete machine.
> 
> For virtio-mem at least that's an issue, and here is how I'll fix it:
> 
> 20241218105303.1966303-2-david@redhat.com/">https://lore.kernel.org/qemu-devel/20241218105303.1966303-2-david@redhat.com/

Urgh, that's a rather horrible situation. While your patch works around
it quite effectively, it is pretty heavy weight, and of course relies on
maintainers knowing this scenario exists - they won't learn this easily
from the Resettable API design, nor its docs :-(

Shouldn't we put to extend the Resettable design to make this scenario
more explicity distinguishable in the Resettable callback implementations.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]