qemu-devel

Re: [Qemu-devel] virtio device error reporting best practice?


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] virtio device error reporting best practice?
Date: Thu, 20 Mar 2014 08:51:58 +0200

On Wed, Mar 19, 2014 at 11:04:19AM +1030, Rusty Russell wrote:
> Dave Airlie <address@hidden> writes:
> > So I'm looking at how best to do virtio gpu device error reporting,
> > and how to deal with illegal stuff,
> >
> > I've two levels of errors I want to support,
> >
> > a) unrecoverable or bad guest kernel programming errors,
> 
> The QEMU standard approach is to exit at this point.  No, really.

It's easy on the hypervisor, but often not very friendly to driver writers,
who might not be QEMU experts.
QEMU is moving away from exiting on errors, and it would be nice
to have a robust way to report driver bugs.
How about setting VIRTIO_CONFIG_S_DEVICE_FAILED?
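
To make the suggestion concrete, here is a rough sketch of a device that
flags a driver bug in its status byte and stops processing requests until
reset, instead of exiting. Everything in it is an assumption for
illustration: the toy_* names and the struct are hypothetical and are not
QEMU's device model, and the bit value is a placeholder, since only the
name VIRTIO_CONFIG_S_DEVICE_FAILED is proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder value: only the name is proposed, no bit is assigned. */
#define VIRTIO_CONFIG_S_DEVICE_FAILED 0x40

/* Hypothetical, heavily simplified device state (not QEMU's VirtIODevice). */
struct toy_virtio_dev {
    uint8_t status;  /* status byte the guest driver can read */
    bool broken;     /* when set, requests are refused until reset */
};

/* Record a driver bug: log it, expose it to the guest, stop serving. */
static void toy_dev_report_driver_bug(struct toy_virtio_dev *dev,
                                      const char *why)
{
    assert(dev != NULL);
    fprintf(stderr, "virtio: driver bug: %s\n", why);
    dev->status |= VIRTIO_CONFIG_S_DEVICE_FAILED;
    dev->broken = true;
}

/* Validate one request; returns true if the device accepted it. */
static bool toy_dev_handle_request(struct toy_virtio_dev *dev,
                                   uint32_t len, uint32_t max_len)
{
    if (dev->broken) {
        return false;  /* stay quiet until the driver resets us */
    }
    if (len > max_len) {
        toy_dev_report_driver_bug(dev, "oversized request");
        return false;
    }
    return true;
}

/* A device reset clears the failure and resumes normal operation. */
static void toy_dev_reset(struct toy_virtio_dev *dev)
{
    dev->status = 0;
    dev->broken = false;
}
```

The hypervisor keeps running, and a driver writer can see the failure by
reading the status byte rather than by watching the VM disappear.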

Another idea, which the Windows drivers implemented, is reporting
a failure-reason hint. They wrote it out to the ISR; specifically,
they notified the host about watchdog timer expiration for the net device
this way.

> > b) per 3D context errors from the renderer backend,
> >
> > (b) I can easily report in an event queue and the guest kernel can in
> > theory blow away the offenders, this is how GL works with some
> > extensions,
> 
> That's probably sanest.

If it's possible to identify the offenders, I agree that
a VQ is better than config space for that.
We need to make sure the queue is big enough to avoid
underrun, though. Is that always possible?
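
If the queue cannot always be made big enough, one possible fallback is
for the device to remember that it dropped an event and say so in the
next event it delivers, so the guest knows to rescan its contexts. The
sketch below is only an illustration of that idea: the toy_* names, the
flag value and the fixed limit of 8 buffers are all made up here and come
from no spec or existing driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EVQ_MAX_BUFS   8            /* hypothetical queue size */
#define TOY_EVENT_F_MISSED 0x80000000u  /* hypothetical "events lost" flag */

/* Hypothetical event layout: which 3D context misbehaved. */
struct toy_event {
    uint32_t flags;
    uint32_t ctx_id;
};

/* Device-side view of the event queue: empty buffers posted by the guest. */
struct toy_event_queue {
    struct toy_event *bufs[TOY_EVQ_MAX_BUFS];
    size_t avail;  /* number of empty buffers currently posted */
    bool missed;   /* an event was dropped since the last delivery */
};

/* Guest side: post an empty buffer; fails if the queue is full. */
static bool toy_evq_post(struct toy_event_queue *q, struct toy_event *buf)
{
    assert(q != NULL && buf != NULL);
    if (q->avail == TOY_EVQ_MAX_BUFS) {
        return false;
    }
    q->bufs[q->avail++] = buf;
    return true;
}

/*
 * Device side: report an error for ctx_id. On underrun (no buffer posted)
 * the event is dropped and remembered; the next delivered event carries
 * TOY_EVENT_F_MISSED. Returns the filled buffer, or NULL on underrun.
 */
static struct toy_event *toy_evq_report(struct toy_event_queue *q,
                                        uint32_t ctx_id)
{
    assert(q != NULL);
    if (q->avail == 0) {
        q->missed = true;
        return NULL;
    }
    struct toy_event *ev = q->bufs[--q->avail];
    ev->ctx_id = ctx_id;
    ev->flags = q->missed ? TOY_EVENT_F_MISSED : 0;
    q->missed = false;
    return ev;
}
```

This does not answer whether a sufficient queue size always exists; it
only bounds the damage when the guest is slow to repost buffers.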

> > GPU control queue, the response should always be no error, but in some
> > cases it will be because the guest hit some host resource error, or
> > asked for something insane, (guest kernel drivers would be broken in
> > most of these cases).
> >
> > Alternately I can use the separate event queue to send async errors
> > when the guest does something bad,
> >
> > I'm also considering adding some sort of flag in config space saying
> > the device needs a reset before it will continue doing anything,
> 
> I generally dislike error codes which Never Happen; it's like making
> every void function return int just in case: the caller has no idea what
> to do if it fails.
> 
> The litmus test: does *your* guest handle failures other than by giving
> up on the device?  If so, sure, you need to have a sane error-reporting
> strategy.

Right, but driver development is also a valid need.

> > The main reason I'm considering this stuff is for security reasons if
> > the guest asks for something really illegal or crazy what should the
> > expected behaviour of the host be? (at least secure I know that).
> 
> If the guest userspace can do it, don't exit.  If the kernel only, and
> it should have known better, abort is OK.

I second that, at least for now.
Maybe we will add more capabilities in virtio 1.0, or
after that.

> Sure that doesn't help much!
> Rusty.


