qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification


From: Alex Williamson
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Wed, 25 May 2016 08:22:15 -0600

On Wed, 25 May 2016 11:45:11 +0300
"Michael S. Tsirkin" <address@hidden> wrote:

> On Tue, May 24, 2016 at 08:54:06PM -0600, Alex Williamson wrote:
> > On Tue, 24 May 2016 13:49:12 +0300
> > "Michael S. Tsirkin" <address@hidden> wrote:
> >   
> > > On Tue, Apr 26, 2016 at 08:48:15AM -0600, Alex Williamson wrote:  
> > > > I think that means that if we want to switch from a
> > > > simple halt-on-error to a mechanism for the guest to handle recovery,
> > > > we need to disable access to the device between being notified that the
> > > > error occurred and being notified to resume.    
> > > 
> > > But this isn't what happens on bare metal.
> > > Errors are reported asynchronously and host might access the device
> > > meanwhile.  These accesses might or might not trigger more errors, but
> > > fundamentally this should not matter too much as device is going to be
> > > reset.  
> > 
> > Bare metal also doesn't have a hypervisor underneath performing a PCI
> > bus reset,  
> 
> This is where I get lost. I assumed we do reset when guest
> requests it. Isn't that the case? Why not?

Unless we can somehow opt-out vfio-pci devices from AER handling on the
host, then the host is going to try to recover the device as part of
the core AER handling, which is not driver directed.  The host driver
can register various callbacks, which is where we get the
error_detected notification, but until we get the resume notification,
I think we assume the device is being operated on by the core.

At the same time we're using the error_detected to signal the AER event
to the guest, which begins a similar recovery process.  We don't
particularly want to rely on how the host has recovered the device and
the guest driver needs to be aware that the device state may need to be
recovered, so we let the recovery proceed in both the host and guest,
but it seems we need some synchronization point and guest accesses to
the device while we know the host is still in recovery may do more harm
than good.
 
> > there's only one OS trying to control the device at a time,
> > so we have some clear differences from bare metal that I don't know we
> > can avoid.  The thought here was that we need to notify the guest at the
> > earliest point we can, but let the host recovery run to completion
> > before allowing the user to interact with the device.  Perhaps there is
> > no need to block region access to the device (ie. config space & BAR
> > resources), but I think we do need to somehow synchronize the bus resets
> > or else we get situations like that observed previously where the bus is
> > still in reset while userspace trys to proceed with using it.
> >  
> 
> Why do we have to trigger reset upon an error?
> Why not wait for guest to request reset?

Is there a way to opt-out of host AER handling?  Do we want to create a
special case for vfio-pci to do this?  Does doing so allow the user to
exploit the host in anyway, such as the user failing to recover the
device, continuing to signal error events on the host, and perhaps
leaving the device in a lest trustworthy state when returned to the
host (not that we should ever be trusting the state of the device at
that point).

> > The next question then would be whether that's QEMU's job or something
> > that should be done in the host kernel.  It's been proposed to add yet
> > another eventfd for the kernel vfio-pci to signal QEMU when a resume
> > notification has occured, but perhaps the better approach would be for
> > the hot reset ioctl (and base reset ioctl) to handle this situation more
> > transparently.  We could immediately return -EAGAIN and allow QEMU to
> > delay itself for any reset ioctl received after the AER error detected
> > event, but before the resume event.  We could also allow some sort of
> > timeout, that the ioctl might enter an interruptible sleep, woken on
> > the resume notification or timeout.  That sounds a bit better to me as
> > the specification of what's allowed between the error detected
> > notification and the resume notification is otherwise pretty poorly
> > defined.  
> 
> So if guest started reset, it might take a while for
> device to come out of that state, and access during this
> time might trigger errors. But that's already possible
> for guest to trigger, right?  How is this different?

You can look back through the history of this series, there was an
arbitrary delay added after reset because the device was still in
reset (presumably host and guest reset racing).  Yes we do not
currently block access to the device during a reset, the issue is
mostly that we don't expect device resets to be occurring in the host
except when directed by the guest.  In this case we expect a host
directed reset is occurring and the guest directed reset seems to be a
synchronization point.

> >  Do you think we can run completely asynchronous, letting the
> > host and guest bus resets race?  Thanks,
> > 
> > Alex  
> 
> I have a feeling we need to put some code out,
> disabled by default, and see how it behaves in the field.
> For example ability to trigger UR errors seems benign but
> I think we are trying to prevent them now because of
> something we saw in the field.

I don't really follow here, but I'm not in love with the idea of "let's
see how it behaves in the field" because I'm going to be stuck with the
support of that code whether it behaves appropriately or not.  Thanks,

Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]