qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices
Date: Mon, 14 Sep 2015 14:04:07 +1000
User-agent: Mutt/1.5.23 (2014-03-12)

On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
> > On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
> > > So far there were 2 limitations enforced on an emulated PHB
> > > regarding VFIO:
> > > 1) only one IOMMU group per IOMMU container was allowed and
> > > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
> > > 2) only one IOMMU container per PHB was allowed as a group
> > > can only be attached to one container.
> > > 
> > > However these are not really necessary so we are removing them here.
> > > 
> > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
> > > not to receive it. Instead of getting a container from a group ID,
> > > this calls ioctl() for every container attached to the PHB address space.
> > > This allows multiple containers on the same PHB which means multiple
> > > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
> > > to be passed to every container separately which will affect
> > > the performance. For 32bit devices it is still recommended to put
> > > every group to a separate PHB.
> > > 
> > > Since the existing VFIO code is already trying to share a container for
> > > multiple groups, just removing a group id from
> > > the vfio_container_do_ioctl() allows the existing code to share
> > > a container if it is supported by the host kernel.
> > > 
> > > This replaces the check for a group id to be set correctly with
> > > the check that it is not set.
> > > 
> > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
> > > members is accessed here.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <address@hidden>
> > > ---
> > >  hw/ppc/spapr_pci.c      | 10 +++++-----
> > >  hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
> > >  hw/vfio/common.c        | 22 ++++++----------------
> > >  include/hw/vfio/vfio.h  |  3 +--
> > >  4 files changed, 18 insertions(+), 34 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index 4e289cb..1b7559d 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -810,11 +810,6 @@ static int 
> > > spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
> > >      pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices, 
> > > sphb);
> > >  
> > >      if (sphb->vfio_num) {
> > > -        if (sphb->iommugroupid == -1) {
> > > -            error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
> > > -            return -1;
> > > -        }
> > > -
> > >          ret = spapr_phb_vfio_dma_capabilities_update(sphb);
> > >          if (ret) {
> > >              error_report("Unable to get DMA32 info from VFIO");
> > > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev, 
> > > Error **errp)
> > >      PCIBus *bus;
> > >      uint64_t msi_window_size = 4096;
> > >  
> > > +    if ((sphb->iommugroupid != -1) &&
> > > +        object_dynamic_cast(OBJECT(sphb), 
> > > TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
> > > +        error_report("Warning: iommugroupid is deprecated and will be 
> > > ignored");
> > > +    }
> > > +
> > >      if (sphb->index != (uint32_t)-1) {
> > >          hwaddr windows_base;
> > >  
> > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> > > index f94d8a4..48137d5 100644
> > > --- a/hw/ppc/spapr_pci_vfio.c
> > > +++ b/hw/ppc/spapr_pci_vfio.c
> > > @@ -35,7 +35,7 @@ int 
> > > spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> > >      struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
> > >      int ret;
> > >  
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as,
> > >                                 VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > >      if (ret) {
> > >          return ret;
> > > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> > >          .op    = VFIO_EEH_PE_ENABLE
> > >      };
> > >  
> > > -    vfio_container_ioctl(&sphb->iommu_as,
> > > -                         sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> > > +    vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >  }
> > >  
> > >  int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > >          return RTAS_OUT_PARAM_ERROR;
> > >      }
> > >  
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > -                               VFIO_EEH_PE_OP, &op);
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >      if (ret < 0) {
> > >          return RTAS_OUT_HW_ERROR;
> > >      }
> > > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb, 
> > > int *state)
> > >      int ret;
> > >  
> > >      op.op = VFIO_EEH_PE_GET_STATE;
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > -                               VFIO_EEH_PE_OP, &op);
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >      if (ret < 0) {
> > >          return RTAS_OUT_PARAM_ERROR;
> > >      }
> > > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int 
> > > option)
> > >          return RTAS_OUT_PARAM_ERROR;
> > >      }
> > >  
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > -                               VFIO_EEH_PE_OP, &op);
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >      if (ret < 0) {
> > >          return RTAS_OUT_HW_ERROR;
> > >      }
> > > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> > >      int ret;
> > >  
> > >      op.op = VFIO_EEH_PE_CONFIGURE;
> > > -    ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > -                               VFIO_EEH_PE_OP, &op);
> > > +    ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > >      if (ret < 0) {
> > >          return RTAS_OUT_PARAM_ERROR;
> > >      }
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index 85ee9b0..a00644e 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
> > >      close(vbasedev->fd);
> > >  }
> > >  
> > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> > > -                                   int req, void *param)
> > > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void 
> > > *param)
> > >  {
> > > -    VFIOGroup *group;
> > >      VFIOContainer *container;
> > >      int ret = -1;
> > > +    VFIOAddressSpace *space = vfio_get_address_space(as);
> > >  
> > > -    group = vfio_get_group(groupid, as);
> > > -    if (!group) {
> > > -        error_report("vfio: group %d not registered", groupid);
> > > -        return ret;
> > > -    }
> > > -
> > > -    container = group->container;
> > > -    if (group->container) {
> > > +    QLIST_FOREACH(container, &space->containers, next) {
> > >          ret = ioctl(container->fd, req, param);
> > >          if (ret < 0) {
> > >              error_report("vfio: failed to ioctl %d to container: ret=%d, 
> > > %s",
> > >                           _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
> > > +            return -errno;
> > >          }
> > >      }
> > 
> > This highlights how terrible this ioctl passthrough interface is; what's
> > a caller supposed to do on error?  Here we don't have visibility into
> > the ioctl being called in order to do any unwind on error.  The caller
> > doesn't have the context to unwind only the failed containers.  Is
> > returning errno here really sufficient for anyone to figure out how to
> > proceed or should we just cut our loses and abort()?  What's the least
> > horrible way we can realistically handle a failure here?  Thanks,
> 
> It seemed like I asked this before and it turns out that I did:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
> 
> "Gavin says yes" is not really a supportable solution when I stumbled on
> it again and David doesn't know why it's safe either (nor does it answer
> my question of how does this work).  At a minimum, the reasoning why
> this is safe for the ioctls we currently allow here needs to be spelled
> out in a comment.  We could easily make the mistake of adding another
> ioctl to the passthrough for which those assumptions are not valid.  We
> may even want to abort if it's not one of the ioctls we've evaluated so
> we don't overlook this problem later.  Thanks,

Yeah, you're right.  This is a mess.

What we need to do here is to make vfio_container_ioctl() take a
VFIOContainer object as the name suggests.  The the callers will need
to either use it on a specific container, or iterate across all the
containers in the VFIOAddressSpace as appropriate for the specific
operation.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: pgph5IdtPXdbL.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]