qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v16 00/14] vfio-pci: pass the aer error to guest


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v16 00/14] vfio-pci: pass the aer error to guest
Date: Thu, 4 Feb 2016 13:21:57 +0200

On Thu, Feb 04, 2016 at 10:04:01AM +0800, Chen Fan wrote:
> 
> On 02/03/2016 09:57 PM, Michael S. Tsirkin wrote:
> >On Wed, Feb 03, 2016 at 04:54:01PM +0800, Chen Fan wrote:
> >>On 01/17/2016 02:34 AM, Michael S. Tsirkin wrote:
> >>>On Tue, Jan 12, 2016 at 10:43:01AM +0800, Cao jin wrote:
> >>>>From: Chen Fan <address@hidden>
> >>>>
> >>>>For now, for vfio pci passthough devices when qemu receives
> >>>>an error from host aer report, currentlly just terminate the guest,
> >>>>but usually user want to know what error occurred but stopping the
> >>>>guest, so this patches add aer capability support for vfio device,
> >>>>and pass the error to guest, and have guest driver to recover
> >>>>from the error.
> >>>I would like to see a version of this patchset that doesn't
> >>>depend on pci core changes.
> >>>I think that if you make this simplifying assumption:
> >>>
> >>>- all devices on same bus in guest are on same bus in host
> >>>
> >>>then you can handle both reset and hotplug simply in function 0
> >>>since it will belong to vfio.
> >>>
> >>>So we can have a version without pci core changes that simply assumes
> >>>this, and things will just work.
> >>>
> >>>
> >>>Now, if we wanted to enforce this limitation, I think the
> >>>cleanest way would be to add a callback in struct PCIDevice:
> >>>
> >>>   bool is_valid_function(PCIDevice *newfunction)
> >>>
> >>>and call it as each function is added.
> >>>This way aer function can validate that each function
> >>>added shares the same bus.
> >>>And this way issues will be detected directly and not when
> >>>function 0 is added.
> >>>
> >>>I would prefer this validation code to be a patch on top so we can merge
> >>>the functionality directly and avoid blocking it while we figure out the
> >>>best api to validate things.
> >>>
> >>>I don't see why making guest topology match host would
> >>>ever be a problem, but if it's required to support
> >>>configurations where these differ, I'd like to see
> >>>an attempt to address that be split out, after aer
> >>>is supported.
> >>Hi Michael,
> >>
> >>Just think about this more,  I think we also should check the vfio
> >>devices whether on the same bus at the time of function 0 is added.
> >>because we don't know the affected devices by a bus reset have
> >>already all been assigned to VM.
> >This is something vfio in kernel should check.
> >You can't rely on qemu being well behaved, so don't
> >even try to catch cases which would break host in userspace.
> >
> >qemu should only worry about not breaking guest.
> >
> >
> >>for example, the multi-function's hotplug.
> >>devices on same bus in host are added to VM one by one. when we
> >>test one device, we haven't yet added the other devices.
> >>so I think
> >>the patch should like below. then we could add a vfio_is_valid_function in
> >>vfio
> >>to test each device whether the affected devices on the same bus.
> >>
> >>Thanks,
> >>Chen
> >>
> >>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>index d940f79..7163b56 100644
> >>--- a/hw/pci/pci.c
> >>+++ b/hw/pci/pci.c
> >>@@ -1836,6 +1836,38 @@ PCIDevice *pci_find_device(PCIBus *bus, int bus_num,
> >>uint8_t devfn)
> >>      return bus->devices[devfn];
> >>  }
> >>
> >>+static int pci_bus_check_devices(PCIBus *bus)
> >>+{
> >>+    PCIDeviceClass *pc;
> >>+    int i, ret = 0;
> >>+
> >>+    for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
> >>+        if (!bus->devices[i]) {
> >>+            continue;
> >>+        }
> >>+
> >>+        pc = PCI_DEVICE_GET_CLASS(bus->devices[i]);
> >>+        if (!pc->is_valid_func) {
> >>+            continue;
> >>+        }
> >>+
> >>+        ret = pc->is_valid_func(bus->devices[i], bus);
> >>+        if (!ret) {
> >>+            return -1;
> >>+        }
> >>+    }
> >>+    return 0;
> >>+}
> >>+
> >>+static bool pci_is_valid_function(PCIDevice *pdev, PCIBus *bus)
> >>+{
> >>+    if (pdev->bus == bus) {
> >>+        return true;
> >>+    }
> >>+
> >>+    return false;
> >>+}
> >>+
> >I don't really understand what is this one doing.
> >Why do we need a default function?
> if the vfio driver in kernel can handle the bus reset for any one
> device in qemu without the affected devices assigned. I think
> we don't need this default one.
> BTW, IIRC at present the devices on the same bus in host can
> be assigned to different VM, so if we want to support this kind of
> bus reset for an independent device when enable aer, aren't we
> limiting the case that others devices on the same bus must be
> assigned to current VM?
> 
> Thanks,
> Chen

I don't believe this works at the moment, and
I'd expect kernel to prevent this,
so we should not rely on userspace code for this.
Alex, could you comment please?


> >>  static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >>  {
> >>      PCIDevice *pci_dev = (PCIDevice *)qdev;
> >>@@ -1878,6 +1910,14 @@ static void pci_qdev_realize(DeviceState *qdev, Error
> >>**errp)
> >>          pci_qdev_unrealize(DEVICE(pci_dev), NULL);
> >>          return;
> >>      }
> >>+
> >>+    if (DEVICE(pci_dev)->hotplugged &&
> >>+        pci_get_function_0(pci_dev) == pci_dev &&
> >>+        pci_bus_check_devices(bus)) {
> >>+        error_setg(errp, "failed to hotplug function 0");
> >>+        pci_qdev_unrealize(DEVICE(pci_dev), NULL);
> >>+        return;
> >>+    }
> >>  }
> >>
> >>  static void pci_default_realize(PCIDevice *dev, Error **errp)
> >>@@ -2390,6 +2430,7 @@ static void pci_device_class_init(ObjectClass *klass,
> >>void *data)
> >>      k->bus_type = TYPE_PCI_BUS;
> >>      k->props = pci_props;
> >>      pc->realize = pci_default_realize;
> >>+    pc->is_valid_func = pci_is_valid_function;
> >>  }
> >>
> >>  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> >>diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>index dedf277..a89580f 100644
> >>--- a/include/hw/pci/pci.h
> >>+++ b/include/hw/pci/pci.h
> >>@@ -191,6 +191,7 @@ typedef struct PCIDeviceClass {
> >>
> >>      void (*realize)(PCIDevice *dev, Error **errp);
> >>      int (*init)(PCIDevice *dev);/* TODO convert to realize() and remove */
> >>+    bool (*is_valid_func)(PCIDevice *dev, PCIBus *bus);
> >>      PCIUnregisterFunc *exit;
> >>      PCIConfigReadFunc *config_read;
> >>      PCIConfigWriteFunc *config_write;
> >>
> >>
> >>
> >>>
> >>>
> >>>>v15-v16:
> >>>>    10/14, 11/14 are new to introduce a reset sequence id to specify the
> >>>>    vfio devices has been reset for that reset. other patches aren't 
> >>>> modified.
> >>>>
> >>>>v14-v15:
> >>>>    1. add device hot reset callback
> >>>>    2. add bus_in_reset for vfio device to avoid multi do host bus reset
> >>>>
> >>>>v13-v14:
> >>>>    1. for multifunction device, requiring all functions enable AER.(9/13)
> >>>>    2. due to all affected functions receive error signal, ignore no
> >>>>       error occurred function. (12/13)
> >>>>
> >>>>v12-v13:
> >>>>    1. since support multifuncion hotplug, here add callback to enable 
> >>>> aer.
> >>>>    2. add pci device pre+post reset for aer host reset.
> >>>>
> >>>>Chen Fan (14):
> >>>>   vfio: extract vfio_get_hot_reset_info as a single function
> >>>>   vfio: squeeze out vfio_pci_do_hot_reset for support bus reset
> >>>>   pcie: modify the capability size assert
> >>>>   vfio: make the 4 bytes aligned for capability size
> >>>>   vfio: add pcie extanded capability support
> >>>>   aer: impove pcie_aer_init to support vfio device
> >>>>   vfio: add aer support for vfio device
> >>>>   vfio: add check host bus reset is support or not
> >>>>   add check reset mechanism when hotplug vfio device
> >>>>   pci: introduce pci bus pre reset
> >>>>   vfio: introduce last reset sequence id
> >>>>   pcie_aer: expose pcie_aer_msg() interface
> >>>>   vfio-pci: pass the aer error to guest
> >>>>   vfio: add 'aer' property to expose aercap
> >>>>
> >>>>  hw/pci-bridge/ioh3420.c            |   2 +-
> >>>>  hw/pci-bridge/xio3130_downstream.c |   2 +-
> >>>>  hw/pci-bridge/xio3130_upstream.c   |   2 +-
> >>>>  hw/pci/pci.c                       |  42 +++
> >>>>  hw/pci/pci_bridge.c                |   3 +
> >>>>  hw/pci/pcie.c                      |   2 +-
> >>>>  hw/pci/pcie_aer.c                  |   6 +-
> >>>>  hw/vfio/pci.c                      | 616 
> >>>> +++++++++++++++++++++++++++++++++----
> >>>>  hw/vfio/pci.h                      |   9 +
> >>>>  include/hw/pci/pci.h               |   1 +
> >>>>  include/hw/pci/pci_bus.h           |   8 +
> >>>>  include/hw/pci/pcie_aer.h          |   3 +-
> >>>>  12 files changed, 624 insertions(+), 72 deletions(-)
> >>>>
> >>>>-- 
> >>>>1.9.3
> >>>>
> >>>>
> >>>.
> >>>
> >>
> >
> >.
> >
> 
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]