qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/3] Mediated device Core driver


From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH 1/3] Mediated device Core driver
Date: Fri, 24 Jun 2016 13:40:00 -0600

On Fri, 24 Jun 2016 23:24:58 +0530
Kirti Wankhede <address@hidden> wrote:

> Alex,
> 
> Thanks for taking closer look. I'll incorporate all the nits you suggested.
> 
> On 6/22/2016 3:00 AM, Alex Williamson wrote:
> > On Mon, 20 Jun 2016 22:01:46 +0530
> > Kirti Wankhede <address@hidden> wrote:
> >  
> ...
> >> +
> >> +config MDEV
> >> +    tristate "Mediated device driver framework"
> >> +    depends on VFIO
> >> +    default n
> >> +    help
> >> +        MDEV provides a framework to virtualize device without SR-IOV cap
> >> +        See Documentation/mdev.txt for more details.  
> > 
> > Documentation pointer still doesn't exist.  Perhaps this file would be
> > a more appropriate place than the commit log for some of the
> > information above.
> >   
> 
> Sure, I'll add these details to documentation.
> 
> > Every time I review this I'm struggling to figure out why this isn't
> > VFIO_MDEV since it's really tied to vfio and difficult to evaluate it
> > as some sort of standalone mediated device interface.  I don't know
> > the answer, but it always strikes me as a discontinuity.
> >   
> 
> Ok. I'll change to VFIO_MDEV
> 
> >> +
> >> +static struct mdev_device *find_mdev_device(struct parent_device *parent,
> >> +                                      uuid_le uuid, int instance)
> >> +{
> >> +  struct mdev_device *mdev = NULL, *p;
> >> +
> >> +  list_for_each_entry(p, &parent->mdev_list, next) {
> >> +          if ((uuid_le_cmp(p->uuid, uuid) == 0) &&
> >> +              (p->instance == instance)) {
> >> +                  mdev = p;  
> > 
> > Locking here is still broken, the callers are create and destroy, which
> > can still race each other and themselves.
> >  
> 
> Fixed it.
> 
> >> +
> >> +static int mdev_device_create_ops(struct mdev_device *mdev, char 
> >> *mdev_params)
> >> +{
> >> +  struct parent_device *parent = mdev->parent;
> >> +  int ret;
> >> +
> >> +  mutex_lock(&parent->ops_lock);
> >> +  if (parent->ops->create) {  
> > 
> > How would a parent_device without ops->create or ops->destroy useful?
> > Perhaps mdev_register_driver() should enforce required ops.  mdev.h
> > should at least document which ops are optional if they really are
> > optional.  
> 
> Makes sense, adding check in mdev_register_driver() to mandate create
> and destroy in ops. I'll also update the comments in mdev.h for
> mandatory and optional ops.
> 
> >   
> >> +          ret = parent->ops->create(mdev->dev.parent, mdev->uuid,
> >> +                                  mdev->instance, mdev_params);
> >> +          if (ret)
> >> +                  goto create_ops_err;
> >> +  }
> >> +
> >> +  ret = mdev_add_attribute_group(&mdev->dev,
> >> +                                  parent->ops->mdev_attr_groups);  
> > 
> > An error here seems to put us in a bad place, the device is created but
> > the attributes are broken, is it the caller's responsibility to
> > destroy?  Seems like we need a cleanup if this fails.
> >   
> 
> Right, adding cleanup here.
> 
> >> +create_ops_err:
> >> +  mutex_unlock(&parent->ops_lock);  
> > 
> > It seems like ops_lock isn't used so much as a lock as a serialization
> > mechanism.  Why?  Where is this serialization per parent device
> > documented?
> >  
> 
> parent->ops_lock is to serialize parent device callbacks to vendor
> driver, i.e supported_config(), create() and destroy().
> mdev->ops_lock is to serialize mediated device related callbacks to
> vendor driver, i.e. start(), stop(), read(), write(), set_irqs(),
> get_region_info(), validate_map_request().
> Its not documented, I'll add comments to mdev.h about these locks.

Should it be the mediated driver core's responsibility to do this?  If
a given mediated driver wants to serialize on their own, they can do
that, but I don't see why we would impose that on every mediated driver.

> >> +  return ret;
> >> +}
> >> +
> >> +static int mdev_device_destroy_ops(struct mdev_device *mdev, bool force)
> >> +{
> >> +  struct parent_device *parent = mdev->parent;
> >> +  int ret = 0;
> >> +
> >> +  /*
> >> +   * If vendor driver doesn't return success that means vendor
> >> +   * driver doesn't support hot-unplug
> >> +   */
> >> +  mutex_lock(&parent->ops_lock);
> >> +  if (parent->ops->destroy) {
> >> +          ret = parent->ops->destroy(parent->dev, mdev->uuid,
> >> +                                     mdev->instance);
> >> +          if (ret && !force) {  
> > 
> > It seems this is not so much a 'force' but an ignore errors, we never
> > actually force the mdev driver to destroy the device... which makes me
> > wonder if there are leaks there.
> >   
> 
> Consider a case where VM is running or in teardown path and parent
> device in unbound from vendor driver, then vendor driver would call
> mdev_unregister_device() from its remove() call. Even if
> parent->ops->destroy() returns error that could also mean that
> hot-unplug is not supported but we have to destroy mdev device. remove()
> call doesn't honor error returned. In that case its a force removal.
> 
> >> +
> >> +/*
> >> + * mdev_unregister_device : Unregister a parent device
> >> + * @dev: device structure representing parent device.
> >> + *
> >> + * Remove device from list of registered parent devices. Give a chance to 
> >> free
> >> + * existing mediated devices for given device.
> >> + */
> >> +
> >> +void mdev_unregister_device(struct device *dev)
> >> +{
> >> +  struct parent_device *parent;
> >> +  struct mdev_device *mdev, *n;
> >> +  int ret;
> >> +
> >> +  mutex_lock(&parent_devices.list_lock);
> >> +  parent = find_parent_device(dev);
> >> +
> >> +  if (!parent) {
> >> +          mutex_unlock(&parent_devices.list_lock);
> >> +          return;
> >> +  }
> >> +  dev_info(dev, "MDEV: Unregistering\n");
> >> +
> >> +  /*
> >> +   * Remove parent from the list and remove create and destroy sysfs
> >> +   * files so that no new mediated device could be created for this parent
> >> +   */
> >> +  list_del(&parent->next);
> >> +  mdev_remove_sysfs_files(dev);
> >> +  mutex_unlock(&parent_devices.list_lock);
> >> +
> >> +  mutex_lock(&parent->ops_lock);
> >> +  mdev_remove_attribute_group(dev,
> >> +                              parent->ops->dev_attr_groups);
> >> +  mutex_unlock(&parent->ops_lock);
> >> +
> >> +  mutex_lock(&parent->mdev_list_lock);
> >> +  list_for_each_entry_safe(mdev, n, &parent->mdev_list, next) {
> >> +          mdev_device_destroy_ops(mdev, true);
> >> +          list_del(&mdev->next);
> >> +          mdev_put_device(mdev);
> >> +  }
> >> +  mutex_unlock(&parent->mdev_list_lock);
> >> +
> >> +  do {
> >> +          ret = wait_event_interruptible_timeout(parent->release_done,
> >> +                          list_empty(&parent->mdev_list), HZ * 10);  
> > 
> > But we do a list_del for each mdev in mdev_list above, how could the
> > list not be empty here?  I think you're trying to wait for all the mdev
> > devices to be released, but I don't think this does that.  Isn't the
> > list empty regardless?
> >  
> 
> Right, I do want to wait for all the mdev devices to be released. Moving
> list_del(&mdev->next) from the above for loop to mdev_release_device()
> so that mdev will be removed from list on last mdev_put_device().
> 
> 
> >> +          if (ret == -ERESTARTSYS) {
> >> +                  dev_warn(dev, "Mediated devices are in use, task"
> >> +                                " \"%s\" (%d) "
> >> +                                "blocked until all are released",
> >> +                                current->comm, task_pid_nr(current));
> >> +          }
> >> +  } while (ret <= 0);
> >> +
> >> +  mdev_put_parent(parent);
> >> +}
> >> +EXPORT_SYMBOL(mdev_unregister_device);
> >> +
> >> +  
> 
> 
> >> +int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t 
> >> instance,
> >> +                 char *mdev_params)
> >> +{
> >> +  int ret;
> >> +  struct mdev_device *mdev;
> >> +  struct parent_device *parent;
> >> +
> >> +  parent = mdev_get_parent_by_dev(dev);
> >> +  if (!parent)
> >> +          return -EINVAL;
> >> +
> >> +  /* Check for duplicate */
> >> +  mdev = find_mdev_device(parent, uuid, instance);  
> > 
> > But this doesn't actually prevent duplicates because we we're not
> > holding any lock the guarantee that another racing process doesn't
> > create the same {uuid,instance} between where we check and the below
> > list_add.
> >   
> 
> Oops I missed this race condition. Moving
> mutex_lock(&parent->mdev_list_lock);
> before find_mdev_device() in mdev_device_create() and
> mdev_device_destroy().
> 
> 
> >> +
> >> +int  mdev_device_create(struct device *dev, uuid_le uuid, uint32_t 
> >> instance,
> >> +                  char *mdev_params);
> >> +int  mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t 
> >> instance);
> >> +void mdev_device_supported_config(struct device *dev, char *str);
> >> +int  mdev_device_start(uuid_le uuid);
> >> +int  mdev_device_shutdown(uuid_le uuid);  
> > 
> > nit, stop is start as startup is to shutdown.  IOW, should this be
> > mdev_device_stop()?
> >  
> 
> Ok. Renaming mdev_device_shutdown() to mdev_device_stop().
> 
> 
> >> +
> >> +struct pci_region_info {
> >> +  uint64_t start;
> >> +  uint64_t size;
> >> +  uint32_t flags;         /* VFIO region info flags */
> >> +};
> >> +
> >> +enum mdev_emul_space {
> >> +  EMUL_CONFIG_SPACE,      /* PCI configuration space */
> >> +  EMUL_IO,                /* I/O register space */
> >> +  EMUL_MMIO               /* Memory-mapped I/O space */
> >> +};  
> > 
> > 
> > I'm still confused why this is needed, perhaps a description here would
> > be useful so I can stop asking.  Clearly config space is PCI only, so
> > it's strange to have it in the common code.  Everyone not on x86 will
> > say I/O space is also strange.  I can't keep it in my head why the
> > read/write offsets aren't sufficient for the driver to figure out what
> > type it is.
> > 
> >  
> 
> Now that VFIO_PCI_OFFSET_* macros are moved to vfio.h which vendor
> driver can also use, above enum could be removed from read/write. But
> again these macros are useful when parent device is PCI device. How
> would non-pci parent device differentiate IO ports and MMIO?

Moving VFIO_PCI_OFFSET_* to vfio.h already worries me, the vfio api
does not impose fixed offsets, it's simply an implementation detail of
vfio-pci.  We should be free to change that whenever we want and not
break userspace.  By moving it to vfio.h and potentially having
external mediated drivers depend on those offset macros, they now become
part of the kABI.  So more and more, I'd prefer that reads/writes/mmaps
get passed directly to the mediated driver, let them define which
offset is which, the core is just a passthrough.  For non-PCI devices,
like platform devices, the indexes are implementation specific, the
user really needs to know how to work with the specific device and how
it defines device mmio to region indexes.
 
> >> +  int     (*get_region_info)(struct mdev_device *vdev, int region_index,
> >> +                           struct pci_region_info *region_info);  
> > 
> > This can't be //pci_//region_info.  How do you intend to support things
> > like sparse mmap capabilities in the user REGION_INFO ioctl when such
> > things are not part of the mediated device API?  Seems like the driver
> > should just return a buffer.
> >  
> 
> If not pci_region_info, can use vfio_region_info here, even to fetch
> sparce mmap capabilities from vendor driver?

Sure, you can use vfio_region_info, then it's just a pointer to a
buffer allocated by the callee and the mediated core is just a
passthrough, which is probably how it should be.  Thanks,

Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]