Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver


From: Neo Jia
Subject: Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
Date: Mon, 15 Aug 2016 16:23:57 -0700
User-agent: Mutt/1.6.2 (2016-07-01)

On Mon, Aug 15, 2016 at 04:52:39PM -0600, Alex Williamson wrote:
> On Mon, 15 Aug 2016 15:09:30 -0700
> Neo Jia <address@hidden> wrote:
> 
> > On Mon, Aug 15, 2016 at 09:59:26AM -0600, Alex Williamson wrote:
> > > On Mon, 15 Aug 2016 09:38:52 +0000
> > > "Tian, Kevin" <address@hidden> wrote:
> > >   
> > > > > From: Kirti Wankhede [mailto:address@hidden]
> > > > > Sent: Saturday, August 13, 2016 8:37 AM
> > > > > 
> > > > > 
> > > > > 
> > > > > On 8/13/2016 2:46 AM, Alex Williamson wrote:    
> > > > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > > > Kirti Wankhede <address@hidden> wrote:
> > > > > >    
> > > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:    
> > > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > > > >>> Kirti Wankhede <address@hidden> wrote:
> > > > > >>>
> > > > > >>> This is used later by mdev_device_start() and mdev_device_stop()
> > > > > >>> to get the parent_device so it can call the start and stop ops
> > > > > >>> callbacks respectively.  That seems to imply that all instances
> > > > > >>> for a given uuid come from the same parent_device.  Where is that
> > > > > >>> enforced?  I'm still having a hard time buying into the
> > > > > >>> uuid+instance plan when it seems like each mdev_device should
> > > > > >>> have an actual unique uuid.  Userspace tools can figure out which
> > > > > >>> uuids to start for a given user; I don't see much value in
> > > > > >>> collecting them into instances within a uuid.
> > > > > >>>    
> > > > > >>
> > > > > >> Initially we started the discussion with the VM_UUID+instance
> > > > > >> suggestion, where instance was introduced to support multiple
> > > > > >> devices in a VM.
> > > > > >
> > > > > > The instance number was never required in order to support
> > > > > > multiple devices in a VM.  IIRC this UUID+instance scheme was to
> > > > > > appease NVIDIA management tools, which wanted to re-use the VM UUID
> > > > > > by creating vGPU devices with that same UUID and therefore
> > > > > > associate udev events to a given VM.  Only then does an instance
> > > > > > number become necessary, since the UUID needs to be static for the
> > > > > > vGPUs within a VM.  This has always felt like a very dodgy solution
> > > > > > when we should probably just be querying libvirt to give us a
> > > > > > device to VM association.
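> > > > > >
> > > > > > (For instance, a hypothetical flow: the management tool runs
> > > > > > "virsh domuuid $DOMAIN" to obtain the VM UUID and maps the mdev
> > > > > > device node back to that VM, rather than encoding the VM UUID
> > > > > > into the device name itself.)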
> > > > 
> > > > Agree with Alex here.  We'd better not assume that the UUID will be a
> > > > VM_UUID for mdev in the basic design; that binds it too tightly to the
> > > > NVIDIA management stack.
> > > > 
> > > > I'm OK with giving enough flexibility to various upper-level
> > > > management stacks, e.g. instead of the $UUID+INDEX style, would
> > > > $UUID+STRING provide a better option, where either UUID or STRING
> > > > could be optional?  The upper management stack can choose its own
> > > > policy to identify a mdev (illustrated below):
> > > > 
> > > > a) $UUID only, so each mdev is allocated a unique UUID
> > > > b) STRING only, which could be an index (0, 1, 2, ...) or any
> > > >    combination (vgpu0, vgpu1, etc.)
> > > > c) $UUID+STRING, where UUID could be a VM UUID and STRING could be
> > > >    a numeric index
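> > > > 
> > > > For illustration only (hypothetical identifiers, reusing the
> > > > mdev_create syntax from earlier in the thread), the three policies
> > > > might look like:
> > > > 
> > > > a) # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > > b) # echo "vgpu0:params" > /sys/devices/../mdev_create
> > > > c) # echo "$VM_UUID:0:params" > /sys/devices/../mdev_create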
> > > >   
> > > > > >    
> > > > > >> 'mdev_create' creates a device and 'mdev_start' commits the
> > > > > >> resources of all instances of similar devices assigned to a VM.
> > > > > >>
> > > > > >> For example, to create 2 devices:
> > > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > > > >>
> > > > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > > > >>
> > > > > >> Commit resources for the above devices with a single 'mdev_start':
> > > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > > > >>
> > > > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > > > >> 'instance', so 'mdev_create' would look like:
> > > > > >>
> > > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > > > >>
> > > > > >> where $UUID1 and $UUID2 would be the mdev devices' unique UUIDs
> > > > > >> and 'params' would be vendor-specific parameters.
> > > > > >>
> > > > > >> Device nodes would be created as "$UUID1" and "$UUID2".
> > > > > >>
> > > > > >> Then 'mdev_start' would be:
> > > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > > > >>
> > > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > > > >>
> > > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop    
> > > > > >
> > > > > > I'm not sure a comma-separated list makes sense here; for both
> > > > > > simplicity in the kernel and more fine-grained error reporting, we
> > > > > > probably want to start/stop them individually.  Actually, why is it
> > > > > > that we can't use the mediated device being opened and released to
> > > > > > automatically signal to the backend vendor driver to commit and
> > > > > > release resources?  I don't fully understand why userspace needs
> > > > > > this interface.
> > > > 
> > > > There is a meaningful use of the start/stop interface, as required for
> > > > live migration support.  Such an interface allows the vendor driver to
> > > > quiesce mdev activity on the source device before the mdev hardware
> > > > state is snapshotted, and then resume mdev activity on the destination
> > > > device after its state is recovered.  Intel has implemented
> > > > experimental live migration support in KVMGT (soon to be released),
> > > > based on the above two interfaces (plus another two to get/set mdev
> > > > state).
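> > > > 
> > > > A rough save/restore sequence might look like the following (a sketch
> > > > only; the state get/set attributes are illustrative and not part of
> > > > this posting):
> > > > 
> > > > # echo "$UUID" > /sys/class/mdev/mdev_stop       (quiesce on source)
> > > > # cat /sys/class/mdev/$UUID/state > mdev.img     (hypothetical "get")
> > > > # cat mdev.img > /sys/class/mdev/$UUID/state     (hypothetical "set")
> > > > # echo "$UUID" > /sys/class/mdev/mdev_start      (resume on dest)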
> > > 
> > > Ok, that's actually an interesting use case for start/stop.
> > >   
> > > > > 
> > > > > For the NVIDIA vGPU solution we need to know all devices assigned to
> > > > > a VM in one shot, to commit the resources of all vGPUs assigned to a
> > > > > VM along with some common resources.
> > > > 
> > > > Kirti, can you elaborate on the background of the above one-shot
> > > > commit requirement?  It's hard to understand such a requirement.
> > > 
> > > Agree, I know NVIDIA isn't planning to support hotplug initially, but
> > > this seems like we're precluding hotplug from the design.  I don't
> > > understand what's driving this one-shot requirement.  
> > 
> > Hi Alex,
> > 
> > The requirement here is based on how our internal vGPU device model is
> > designed; with this we are able to pre-allocate the resources required
> > for multiple virtual devices within the same domain.
> > 
> > And I don't think this syntax will stop us from supporting hotplug at
> > all.
> > 
> > For example, you can always create a virtual mdev and then do:
> > 
> > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > 
> > then use the QEMU monitor to add the device for hotplug.
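> > 
> > For instance (assuming the vfio-pci "sysfsdev" option; the exact QEMU
> > syntax for mdev devices may differ in the final series):
> > 
> > (qemu) device_add vfio-pci,sysfsdev=/sys/bus/mdev/devices/mdev_UUID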
> 
> Hi Neo,
> 
> I'm still not understanding, then, what advantage you get from the
> "one-shot" approach if we can always add more mdevs by starting them
> later.  Are the hotplugged mdevs somehow less capable than the initial
> set of mdevs added in a single shot?  If the initial set is allocated
> from the "same domain", does that give them some sort of hardware
> locality/resource benefit?  Thanks,

Hi Alex,

At least we will not be able to guarantee certain special hardware resources
for the hotplugged devices.

Also, from our point of view, we have a dedicated internal SW entity to
manage all virtual devices for each "domain/virtual machine", and such a SW
entity will be created at virtual device start time.

This is why we need to do this in one shot to support the multiple virtual
devices per VM case.

Thanks,
Neo

> 
> Alex


