[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration
From: |
Alex Williamson |
Subject: |
Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration |
Date: |
Tue, 16 Oct 2012 08:48:04 -0600 |
On Tue, 2012-10-16 at 16:14 +0200, Michael S. Tsirkin wrote:
> On Tue, Oct 16, 2012 at 07:51:43AM -0600, Alex Williamson wrote:
> > On Tue, 2012-10-16 at 08:39 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Oct 15, 2012 at 02:28:15PM -0600, Alex Williamson wrote:
> > > > This makes use of the new level irqfd support enabling bypass of
> > > > qemu userspace both on INTx injection and unmask. This significantly
> > > > boosts the performance of devices making use of legacy interrupts.
> > > >
> > > > Signed-off-by: Alex Williamson <address@hidden>
> > > > ---
> > > >
> > > > My INTx routing workaround below will probably raise some eyebrows,
> > > > but I don't feel it's worth subjecting users to core dumps if they
> > > > want to try vfio-pci on new platforms. INTx routing is part of some
> > > > larger plan, but until that plan materializes we have to try to avoid
> > > > the API unless we think there's a good chance it might be there.
> > > > I'll accept the maintenance of updating a whitelist in the interim.
> > > > Thanks,
> > > >
> > > > Alex
> > > >
> > > > hw/vfio_pci.c | 224
> > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 224 insertions(+)
> > > >
> > > > diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
> > > > index 639371e..777a5f8 100644
> > > > --- a/hw/vfio_pci.c
> > > > +++ b/hw/vfio_pci.c
> > > > @@ -154,6 +154,53 @@ static uint32_t vfio_pci_read_config(PCIDevice
> > > > *pdev, uint32_t addr, int len);
> > > > static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled);
> > > >
> > > > /*
> > > > + * PCI code refuses to make it possible to probe whether the chipset
> > > > + * supports pci_device_route_intx_to_irq() and booby traps the call
> > > > + * to assert if doesn't. For us, this is just an optimization, so
> > > > + * only enable it when we know it's present. Unfortunately PCIBus is
> > > > + * private, so we can't just look at the function pointer.
> > > > + */
> > > > +static bool vfio_pci_bus_has_intx_route(PCIDevice *pdev)
> > > > +{
> > > > +#ifdef CONFIG_KVM
> > > > + BusState *bus = qdev_get_parent_bus(&pdev->qdev);
> > > > + DeviceState *dev;
> > > > +
> > > > + if (!kvm_irqchip_in_kernel() ||
> > > > + !kvm_check_extension(kvm_state, KVM_CAP_IRQFD_RESAMPLE)) {
> > > > + return false;
> > > > + }
> > >
> > >
> > > Shouldn't we update linux-headers/ to get KVM_CAP_IRQFD_RESAMPLE?
> > > Also for KVM_IRQFD_FLAG_RESAMPLE.
> >
> > I posted the patch for that separately yesterday. I'll only request a
> > pull once that's in.
>
> OK missed that. In the future, might be a good idea to note dependencies
> in the patchset or repost them as part of patchset with appropriate
> tagging.
>
> > > > +
> > > > + for (; bus->parent; bus = qdev_get_parent_bus(dev)) {
> > > > +
> > > > + dev = bus->parent;
> > > > +
> > > > + if (!strncmp("i440FX-pcihost",
> > > > object_get_typename(OBJECT(dev)), 14)) {
> > > > + return true;
> > > > + }
> > > > + }
> > > > +
> > > > + error_report("vfio-pci: VM chipset does not support INTx routing, "
> > > > + "using slow INTx mode\n");
> > >
> > > When does this code trigger? It seems irqchip implies piix ATM -
> > > is this just dead code?
> >
> > Unused, but not unnecessary. Another chipset is under development,
> > which means very quickly irqchip will not imply piix.
>
> So this is for purposes of an out of tree stuff, let's
> keep these hacks out of tree too. My guess is
> q35 can just be added with pci_device_route_intx straight away.
>
> > Likewise irqfd
> > support is being added to other architectures, so I don't know how long
> > the kvm specific tests will hold up. Testing for a specific chipset
> > could of course be avoided if we were willing to support:
> >
> > bool pci_device_intx_route_supported(PCIDevice *pdev)
> >
> > or the NOROUTE option I posted previously.
>
> This is just moving the pain to users who will
> get bad performance and will have to debug it. Injecting
> through userspace is slow, new architectures should
> simply add irqfd straight away instead of adding
> work arounds in userspace.
>
> This is exactly why we have the assert in pci core.
Let's compare user experience:
As coded here:
- Error message, only runs slower if the driver actually uses INTx.
Result: file bug, continue using vfio-pci, maybe do something useful,
maybe find new bugs to file.
Blindly calling PCI code w/ assert:
- Assert. Result: file bug, full stop.
Maybe I do too much kernel programming, but I don't take asserts lightly
and prefer they be saved for "something is really broken and I can't
safely continue". This is not such a case.
> > > > +#endif
> > > > + return false;
> > > > +}
> > > > +
> > > > +static PCIINTxRoute vfio_pci_device_route_intx_to_irq(PCIDevice *pdev,
> > > > int pin)
> > > > +{
> > > > + if (!vfio_pci_bus_has_intx_route(pdev)) {
> > > > + return (PCIINTxRoute) { .mode = PCI_INTX_DISABLED, .irq = -1 };
> > > > + }
> > > > +
> > > > + return pci_device_route_intx_to_irq(pdev, pin);
> > > > +}
> > > > +
> > > > +static bool vfio_pci_intx_route_changed(PCIINTxRoute *old,
> > > > PCIINTxRoute *new)
> > > > +{
> > > > + return old->mode != new->mode || old->irq != new->irq;
> > > > +}
> > > > +
> > >
> > > Didn't you add an API for this? It's on pci branch but I can drop
> > > it if not needed.
> >
> > I did and I'll switch to it when available, but I have no idea when that
> > will be, so I've hedged my bets by re-implementing it here. 2 week+
> > turnover for a patch makes it difficult to coordinate dependent changes
> > on short qemu release cycles.
>
> It's available on pci branch, please base on that instead of master.
> Yes I merge at about 2 week intervals but the point is using
> subtrees. You do not need to block waiting for stuff to land
> in master.
If it wasn't such a trivial function to re-implement I'd do as you
suggest, but note that I'm still blocked doing a pull request to master
until you get your changes merged. There are only about 8 weeks of open
qemu 1.3 development, so a 2 week interval doesn't allow many cycles.
> > > > +/*
> > > > * Common VFIO interrupt disable
> > > > */
> > > > static void vfio_disable_irqindex(VFIODevice *vdev, int index)
> > > > @@ -185,6 +232,21 @@ static void vfio_unmask_intx(VFIODevice *vdev)
> > > > ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> > > > }
> > > >
> > > > +#ifdef CONFIG_KVM
> > > > +static void vfio_mask_intx(VFIODevice *vdev)
> > > > +{
> > > > + struct vfio_irq_set irq_set = {
> > > > + .argsz = sizeof(irq_set),
> > > > + .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
> > > > + .index = VFIO_PCI_INTX_IRQ_INDEX,
> > > > + .start = 0,
> > > > + .count = 1,
> > > > + };
> > > > +
> > > > + ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> > > > +}
> > > > +#endif
> > > > +
> > > > /*
> > > > * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > > * also be a huge overhead. We try to get the best of both worlds by
> > > > @@ -248,6 +310,161 @@ static void vfio_eoi(VFIODevice *vdev)
> > > > vfio_unmask_intx(vdev);
> > > > }
> > > >
> > > > +static void vfio_enable_intx_kvm(VFIODevice *vdev)
> > > > +{
> > > > +#ifdef CONFIG_KVM
> > > > + struct kvm_irqfd irqfd = {
> > > > + .fd = event_notifier_get_fd(&vdev->intx.interrupt),
> > > > + .gsi = vdev->intx.route.irq,
> > > > + .flags = KVM_IRQFD_FLAG_RESAMPLE,
> > >
> > >
> > > Should not kvm ioctl handling be localized in kvm-all.c?
> > > E.g. extend kvm_irqchip_add_irqfd_notifier in
> > > some way? Same question for KVM_CAP_IRQFD_RESAMPLE use above ...
> >
> > Possibly, I took a survey of existing code and found examples of both.
>
> These are bugs really - this prevents adding a device to libhw
> which slows build for everyone. So pls do not copy such examples.
>
> > Therefore I decided to host it myself first and push it out to common
> > helpers later, which will also help me get rid of the #ifdefs here.
> > Thanks,
> >
> > Alex
>
> I think it is better to put it in the final place straight away.
Timing doesn't readily allow for that. The qemu 1.3 soft freeze is only
about 2 weeks away and I'm on vacation for most of that time. Adding it
locally has precedence, provides acceleration to users now, and adds an
in-tree user when I start splitting it out later. Given my history, I'm
just as likely to get push back for adding an API without a user as I am
for adding a user prior to the API. Thanks,
Alex
> > > > + };
> > > > + struct vfio_irq_set *irq_set;
> > > > + int ret, argsz;
> > > > + int32_t *pfd;
> > > > +
> > > > + if (!kvm_irqchip_in_kernel() ||
> > > > + vdev->intx.route.mode != PCI_INTX_ENABLED ||
> > > > + !kvm_check_extension(kvm_state, KVM_CAP_IRQFD_RESAMPLE)) {
> > > > + return;
> > > > + }
> > > > +
> > > > + /* Get to a known interrupt state */
> > > > + qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
> > > > + vfio_mask_intx(vdev);
> > > > + vdev->intx.pending = false;
> > > > + qemu_set_irq(vdev->pdev.irq[vdev->intx.pin], 0);
> > > > +
> > > > + /* Get an eventfd for resample/unmask */
> > > > + if (event_notifier_init(&vdev->intx.unmask, 0)) {
> > > > + error_report("vfio: Error: event_notifier_init failed eoi\n");
> > > > + goto fail;
> > > > + }
> > > > +
> > > > + /* KVM triggers it, VFIO listens for it */
> > > > + irqfd.resamplefd = event_notifier_get_fd(&vdev->intx.unmask);
> > > > +
> > > > + if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
> > > > + error_report("vfio: Error: Failed to setup resample irqfd:
> > > > %m\n");
> > > > + goto fail_irqfd;
> > > > + }
> > > > +
> > > > + argsz = sizeof(*irq_set) + sizeof(*pfd);
> > > > +
> > > > + irq_set = g_malloc0(argsz);
> > > > + irq_set->argsz = argsz;
> > > > + irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > > > VFIO_IRQ_SET_ACTION_UNMASK;
> > > > + irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
> > > > + irq_set->start = 0;
> > > > + irq_set->count = 1;
> > > > + pfd = (int32_t *)&irq_set->data;
> > > > +
> > > > + *pfd = irqfd.resamplefd;
> > > > +
> > > > + ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > > > + g_free(irq_set);
> > > > + if (ret) {
> > > > + error_report("vfio: Error: Failed to setup INTx unmask fd:
> > > > %m\n");
> > > > + goto fail_vfio;
> > > > + }
> > > > +
> > > > + /* Let'em rip */
> > > > + vfio_unmask_intx(vdev);
> > > > +
> > > > + vdev->intx.kvm_accel = true;
> > > > +
> > > > + DPRINTF("%s(%04x:%02x:%02x.%x) KVM INTx accel enabled\n",
> > > > + __func__, vdev->host.domain, vdev->host.bus,
> > > > + vdev->host.slot, vdev->host.function);
> > > > +
> > > > + return;
> > > > +
> > > > +fail_vfio:
> > > > + irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
> > > > + kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
> > > > +fail_irqfd:
> > > > + event_notifier_cleanup(&vdev->intx.unmask);
> > > > +fail:
> > > > + qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
> > > > + vfio_unmask_intx(vdev);
> > > > +#endif
> > > > +}
> > > > +
> > > > +static void vfio_disable_intx_kvm(VFIODevice *vdev)
> > > > +{
> > > > +#ifdef CONFIG_KVM
> > > > + struct kvm_irqfd irqfd = {
> > > > + .fd = event_notifier_get_fd(&vdev->intx.interrupt),
> > > > + .gsi = vdev->intx.route.irq,
> > > > + .flags = KVM_IRQFD_FLAG_DEASSIGN,
> > > > + };
> > > > +
> > > > + if (!vdev->intx.kvm_accel) {
> > > > + return;
> > > > + }
> > > > +
> > > > + /*
> > > > + * Get to a known state, hardware masked, QEMU ready to accept new
> > > > + * interrupts, QEMU IRQ de-asserted.
> > > > + */
> > > > + vfio_mask_intx(vdev);
> > > > + vdev->intx.pending = false;
> > > > + qemu_set_irq(vdev->pdev.irq[vdev->intx.pin], 0);
> > > > +
> > > > + /* Tell KVM to stop listening for an INTx irqfd */
> > > > + if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
> > > > + error_report("vfio: Error: Failed to disable INTx irqfd:
> > > > %m\n");
> > > > + }
> > > > +
> > > > + /* We only need to close the eventfd for VFIO to cleanup the
> > > > kernel side */
> > > > + event_notifier_cleanup(&vdev->intx.unmask);
> > > > +
> > > > + /* QEMU starts listening for interrupt events. */
> > > > + qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
> > > > +
> > > > + vdev->intx.kvm_accel = false;
> > > > +
> > > > + /* If we've missed an event, let it re-fire through QEMU */
> > > > + vfio_unmask_intx(vdev);
> > > > +
> > > > + DPRINTF("%s(%04x:%02x:%02x.%x) KVM INTx accel disabled\n",
> > > > + __func__, vdev->host.domain, vdev->host.bus,
> > > > + vdev->host.slot, vdev->host.function);
> > > > +#endif
> > > > +}
> > > > +
> > > > +static void vfio_update_irq(PCIDevice *pdev)
> > > > +{
> > > > + VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
> > > > + PCIINTxRoute route;
> > > > +
> > > > + if (vdev->interrupt != VFIO_INT_INTx) {
> > > > + return;
> > > > + }
> > > > +
> > > > + route = vfio_pci_device_route_intx_to_irq(&vdev->pdev,
> > > > vdev->intx.pin);
> > > > +
> > > > + if (!vfio_pci_intx_route_changed(&vdev->intx.route, &route)) {
> > > > + return; /* Nothing changed */
> > > > + }
> > > > +
> > > > + DPRINTF("%s(%04x:%02x:%02x.%x) IRQ moved %d -> %d\n", __func__,
> > > > + vdev->host.domain, vdev->host.bus, vdev->host.slot,
> > > > + vdev->host.function, vdev->intx.route.irq, route.irq);
> > > > +
> > > > + vfio_disable_intx_kvm(vdev);
> > > > +
> > > > + vdev->intx.route = route;
> > > > +
> > > > + if (route.mode != PCI_INTX_ENABLED) {
> > > > + return;
> > > > + }
> > > > +
> > > > + vfio_enable_intx_kvm(vdev);
> > > > +
> > > > + /* Re-enable the interrupt in cased we missed an EOI */
> > > > + vfio_eoi(vdev);
> > > > +}
> > > > +
> > > > static int vfio_enable_intx(VFIODevice *vdev)
> > > > {
> > > > uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN,
> > > > 1);
> > > > @@ -262,6 +479,9 @@ static int vfio_enable_intx(VFIODevice *vdev)
> > > > vfio_disable_interrupts(vdev);
> > > >
> > > > vdev->intx.pin = pin - 1; /* Pin A (1) -> irq[0] */
> > > > + vdev->intx.route = vfio_pci_device_route_intx_to_irq(&vdev->pdev,
> > > > +
> > > > vdev->intx.pin);
> > > > +
> > > > ret = event_notifier_init(&vdev->intx.interrupt, 0);
> > > > if (ret) {
> > > > error_report("vfio: Error: event_notifier_init failed\n");
> > > > @@ -290,6 +510,8 @@ static int vfio_enable_intx(VFIODevice *vdev)
> > > > return -errno;
> > > > }
> > > >
> > > > + vfio_enable_intx_kvm(vdev);
> > > > +
> > > > vdev->interrupt = VFIO_INT_INTx;
> > > >
> > > > DPRINTF("%s(%04x:%02x:%02x.%x)\n", __func__, vdev->host.domain,
> > > > @@ -303,6 +525,7 @@ static void vfio_disable_intx(VFIODevice *vdev)
> > > > int fd;
> > > >
> > > > qemu_del_timer(vdev->intx.mmap_timer);
> > > > + vfio_disable_intx_kvm(vdev);
> > > > vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
> > > > vdev->intx.pending = false;
> > > > qemu_set_irq(vdev->pdev.irq[vdev->intx.pin], 0);
> > > > @@ -1870,6 +2093,7 @@ static int vfio_initfn(PCIDevice *pdev)
> > > > if (vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1)) {
> > > > vdev->intx.mmap_timer = qemu_new_timer_ms(vm_clock,
> > > >
> > > > vfio_intx_mmap_enable, vdev);
> > > > + pci_device_set_intx_routing_notifier(&vdev->pdev,
> > > > vfio_update_irq);
> > > > ret = vfio_enable_intx(vdev);
> > > > if (ret) {
> > > > goto out_teardown;
> >
> >
- [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Alex Williamson, 2012/10/15
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Michael S. Tsirkin, 2012/10/16
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Alex Williamson, 2012/10/16
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Michael S. Tsirkin, 2012/10/16
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration,
Alex Williamson <=
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Michael S. Tsirkin, 2012/10/16
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Alex Williamson, 2012/10/16
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Michael S. Tsirkin, 2012/10/16
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Alex Williamson, 2012/10/16
- Re: [Qemu-devel] [PATCH] vfio-pci: Add KVM INTx acceleration, Michael S. Tsirkin, 2012/10/16