[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose |
Date: |
Mon, 18 Jan 2021 16:59:34 +0000 |
User-agent: |
Mutt/1.14.6 (2020-07-11) |
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
> > The virtio device/driver (e.g., vhost-scsi and indeed any device including
> > e1000e) may hang due to the lost of IRQ or the lost of doorbell register
> > kick, e.g.,
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> >
> > The virtio-net was in trouble in above link because the 'kick' was not
> > taking effect (missed).
> >
> > This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
> > narrow down if the issue is due to lost of irq/kick. So far the new
> > interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> > e1000e or vhost-scsi) may implement (e.g., via eventfd, MSI-X or legacy
> > IRQ).
> >
> > The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
> > vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
> > on purpose by admin at QEMU/host side for a specific device.
>
> I'm really not convinced that we want to give admins the direct ability to
> poke at internals of devices in a running QEMU. It feels like there is way
> too much potential for the admin to make a situation far worse by doing
> the wrong thing here,
We already do have commands to write to an iport, and to inject MCEs for
example; is this that much different?
> and people dealing with support tickets will have
> no idea that the admin has been poking internals of the device and broken
> it by doing something wrong.
You could add a one time log entry to say that this mischeivous command
had been used.
> You pointed to bug that hit where this could conceivably be useful, but
> that's a one time issue and should not a common occurrance that justifies
> making an official public API to poke at devices forever more IMHO.
I think where it might be practically useful is if you were debugging a
hung customers VM and need to find a way to get it to move again.
THat's something I'm not familiar with on the virtio side;
mst - is this useful from a virtio side?
Dave
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK