Re: [Qemu-devel] [PATCH] vfio-pci: Make host MSI-X enable track guest

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] vfio-pci: Make host MSI-X enable track guest

From:	Alex Williamson
Subject:	Re: [Qemu-devel] [PATCH] vfio-pci: Make host MSI-X enable track guest
Date:	Sun, 06 Jan 2013 09:26:02 -0700
On Sun, 2013-01-06 at 10:50 +0200, Michael S. Tsirkin wrote:
> On Fri, Dec 21, 2012 at 08:38:07AM -0700, Alex Williamson wrote:
> > On Fri, 2012-12-21 at 14:21 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2012 at 03:12:46PM -0700, Alex Williamson wrote:
> > > > On Thu, 2012-12-20 at 18:36 +0200, Michael S. Tsirkin wrote:
> > > > > On Thu, Dec 20, 2012 at 09:06:41AM -0700, Alex Williamson wrote:
> > > > > > Guests typically enable MSI-X with all of the vectors in the MSI-X
> > > > > > vector table masked.  Only when the vector is enabled does the 
> > > > > > vector
> > > > > > get unmasked, resulting in a vector_use callback.  These two points,
> > > > > > enable and unmask, correspond to pci_enable_msix() and request_irq()
> > > > > > for Linux guests.  Some drivers rely on VF/PF or PF/fw communication
> > > > > > channels that expect the physical state of the device to match the
> > > > > > guest visible state of the device.  They don't appreciate lazily
> > > > > > enabling MSI-X on the physical device.
> > > > > > 
> > > > > > To solve this, enable MSI-X with a single vector when the MSI-X
> > > > > > capability is enabled and immediate disable the vector.  This leaves
> > > > > > the physical device in exactly the same state between host and 
> > > > > > guest.
> > > > > > Furthermore, the brief gap where we enable vector 0, it fires into
> > > > > > userspace, not KVM, so the guest doesn't get spurious interrupts.
> > > > > > Ideally we could call VFIO_DEVICE_SET_IRQS with the right parameters
> > > > > > to enable MSI-X with zero vectors, but this will currently return an
> > > > > > error as the Linux MSI-X interfaces do not allow it.
> > > > > > 
> > > > > > Cc: address@hidden
> > > > > > Signed-off-by: Alex Williamson <address@hidden>
> > > > > 
> > > > > Do you need an interface for this?  Can you do low-level pci config
> > > > > access instead?  I imagine you would then enable MSIX and mask all
> > > > > vectors at the same time.
> > > > > 
> > > > > No?
> > > > 
> > > > I really don't like the idea of enabling MSI-X directly through config
> > > > space.  We're just asking for ownership conflicts doing that.  In fact,
> > > > vfio prevents MSI/X from being enabled through config space since those
> > > > are controlled through ioctl.
> > > 
> > > For vfio the natural thing to do would be to add interfaces
> > > to do this in a controlled manner.
> > 
> > The ioctl is the controlled manner.  The ioctl will actually accept
> > enabling zero vectors, but you'll get an -ERANGE generated from
> > pci_enable_msix, so I think the interface there is fine.
> 
> See comment below at the patch.
> 
> > > >  It also prevents access to the MSI-X
> > > > vector table since userspace has no business reading or modifying it.
> > > > Thanks,
> > > > 
> > > > Alex
> > > 
> > > I'm not sure what the point of this is. If a device can do DMA writes
> > > there is no way to distinguish between them and MSI on the bus.
> > > So this seems to buy us no additional security.
> > 
> > IOMMUs supporting interrupt remapping can.
> > Do you propose we have one
> > interface when interrupt remapping is present and another when it's not?
> > I can't think of anything useful that userspace can do with direct
> > access to MSIX setup, including this MSI-X enable stub.
> 
> I am simply pointing out that interrupts are not virtualized
> well at the moment. The guest has access to the device
> and that knows the real contents of the MSIX table.
> Using device as a side channel, guest and device can detect
> the difference between real and observed MSIX table content.
> It might not matter on most devices but it would be more
> robust if we made guest and device states match as closely
> as possible.

Actually, the end result of the operation below puts the physical device
in exactly the same state as the guest view of the device.  I agree that
an improvement to the kernel API would allow us to avoid the brief
transition through a single vector enabled and avoid teardown and
re-setup any time a new vector is enabled.

> > Do you have any actual objections to the patch below?  I agree that
> > kernel interfaces can be improved and I'd prefer the ability to enable
> > MSI-X with zero vectors and incrementally add and remove vectors, but
> > creating side channels so userspace can poke around here is not an
> > acceptable alternative imho.  We need a solution for users hitting this
> > now and I'm pretty happy with how this solution works.
> 
> It's ugly but for an existing vfio I don't see a better solution, no.
> 
> >  I only wish we could make legacy assignment behave as cleanly, but
> >  I'm not willing to poke MSI-X enable bits myself to make it happen.
> 
> With existing kernels I don't think there's a better option.
> 
> >  Thanks,
> > 
> > Alex
> > > > > > ---
> > > > > >  hw/vfio_pci.c |   31 +++++++++++++++++++++++++++----
> > > > > >  1 file changed, 27 insertions(+), 4 deletions(-)
> > > > > > 
> > > > > > VFIO makes this a bit cleaner, so I think this is both the stable 
> > > > > > and
> > > > > > final fix here.
> > > > > > 
> > > > > > diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
> > > > > > index 7c27834..5178ccc 100644
> > > > > > --- a/hw/vfio_pci.c
> > > > > > +++ b/hw/vfio_pci.c
> > > > > > @@ -561,8 +561,9 @@ static int vfio_enable_vectors(VFIODevice 
> > > > > > *vdev, bool msix)
> > > > > >      return ret;
> > > > > >  }
> > > > > >  
> > > > > > -static int vfio_msix_vector_use(PCIDevice *pdev,
> > > > > > -                                unsigned int nr, MSIMessage msg)
> > > > > > +static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int 
> > > > > > nr,
> > > > > > +                                   MSIMessage msg, bool try_kvm,
> > > > > > +                                   IOHandler *handler)
> 
> 
> Instead of a flag, I would split this function with kvm handling separate 
> from vfio
> handling, then skip the kvm one for the dummy case.
> 
> > > > > >  {
> > > > > >      VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
> > > > > >      VFIOMSIVector *vector;
> > > > > > @@ -586,7 +587,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
> > > > > >       * Attempt to enable route through KVM irqchip,
> > > > > >       * default to userspace handling if unavailable.
> > > > > >       */
> > > > > > -    vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
> > > > > > +    vector->virq = try_kvm ? kvm_irqchip_add_msi_route(kvm_state, 
> > > > > > msg) : -1;
> > > > > >      if (vector->virq < 0 ||
> > > > > >          kvm_irqchip_add_irqfd_notifier(kvm_state, 
> > > > > > &vector->interrupt,
> > > > > >                                         vector->virq) < 0) {
> > > > > > @@ -595,7 +596,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
> > > > > >              vector->virq = -1;
> > > > > >          }
> > > > > >          
> > > > > > qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
> > > > > > -                            vfio_msi_interrupt, NULL, vector);
> > > > > > +                            handler, NULL, vector);
> > > > > >      }
> > > > > >  
> > > > > >      /*
> > > > > > @@ -638,6 +639,12 @@ static int vfio_msix_vector_use(PCIDevice 
> > > > > > *pdev,
> > > > > >      return 0;
> > > > > >  }
> > > > > >  
> > > > > > +static int vfio_msix_vector_use(PCIDevice *pdev,
> > > > > > +                                unsigned int nr, MSIMessage msg)
> > > > > > +{
> > > > > > +    return vfio_msix_vector_do_use(pdev, nr, msg, true, 
> > > > > > vfio_msi_interrupt);
> > > > > > +}
> > > > > > +
> > > > > >  static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int 
> > > > > > nr)
> > > > > >  {
> > > > > >      VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
> > > > > > @@ -696,6 +703,22 @@ static void vfio_enable_msix(VFIODevice *vdev)
> > > > > >  
> > > > > >      vdev->interrupt = VFIO_INT_MSIX;
> > > > > >  
> > > > > > +    /*
> > > > > > +     * Some communication channels between VF & PF or PF & fw rely 
> > > > > > on the
> > > > > > +     * physical state of the device and expect that enabling MSI-X 
> > > > > > from the
> > > > > > +     * guest enables the same on the host.  When our guest is 
> > > > > > Linux, the
> > > > > > +     * guest driver call to pci_enable_msix() sets the enabling 
> > > > > > bit in the
> > > > > > +     * MSI-X capability, but leaves the vector table masked.  We 
> > > > > > therefore
> > > > > > +     * can't rely on a vector_use callback (from request_irq() in 
> > > > > > the guest)
> > > > > > +     * to switch the physical device into MSI-X mode because that 
> > > > > > may come a
> > > > > > +     * long time after pci_enable_msix().  This code enables 
> > > > > > vector 0 with
> > > > > > +     * triggering to userspace, then immediately release the 
> > > > > > vector, leaving
> > > > > > +     * the physical device with no vectors enabled, but MSI-X 
> > > > > > enabled, just
> > > > > > +     * like the guest view.
> > > > > > +     */
> > > > > > +    vfio_msix_vector_do_use(&vdev->pdev, 0, (MSIMessage) { 0, 0 }, 
> > > > > > false, NULL);
> > > > > > +    vfio_msix_vector_release(&vdev->pdev, 0);
> > > > > > +
> 
> So use enables msix, but release does not disable, this is why
> I didn't think it's a clean interface.

This is a qemu interface issue.  There's no callback for enable or
disable, only use and release, which translates closer to unmask and
mask.  This is why vfio has to watch config space on it's own for the
enable and disable.  The code above is part of the enable hook.  The
disable hook includes code to disable MSIX.

>  the real issue is that
> existing kernels do not let you enable msix with 0 vectors,
> maybe better to have an API that abstracts that away, then when
> kernel learns to do this properly you can probe and do this
> reasonably.

We can already probe this reasonably, vfio should return -ERANGE if we
try to enable MSIX with no vectors on current kernels.  Thanks,

Alex

> > > > > >      if (msix_set_vector_notifiers(&vdev->pdev, 
> > > > > > vfio_msix_vector_use,
> > > > > >                                    vfio_msix_vector_release)) {
> > > > > >          error_report("vfio: msix_set_vector_notifiers failed\n");
> > > > 
> > > > 
> > 
> > 
> >
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [Qemu-devel] [PATCH] vfio-pci: Make host MSI-X enable track guest, Michael S. Tsirkin, 2013/01/06
- Re: [Qemu-devel] [PATCH] vfio-pci: Make host MSI-X enable track guest, Alex Williamson <=
Prev by Date: Re: [Qemu-devel] [PATCH] pci-assign: Enable MSIX on device to match guest
Next by Date: Re: [Qemu-devel] [FYI] Testing new patch management scripts...
Previous by thread: Re: [Qemu-devel] [PATCH] vfio-pci: Make host MSI-X enable track guest
Next by thread: [Qemu-devel] [RFC PATCH 0/3] Time resync support by qemu-ga
Index(es):
- Date
- Thread