qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver


From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
Date: Tue, 16 Aug 2016 15:03:59 -0600

On Mon, 15 Aug 2016 23:13:20 -0700
Neo Jia <address@hidden> wrote:

> On Tue, Aug 16, 2016 at 05:58:54AM +0000, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:address@hidden
> > > Sent: Tuesday, August 16, 2016 1:44 PM
> > > 
> > > On Tue, Aug 16, 2016 at 04:52:30AM +0000, Tian, Kevin wrote:  
> > > > > From: Neo Jia [mailto:address@hidden
> > > > > Sent: Tuesday, August 16, 2016 12:17 PM
> > > > >
> > > > > On Tue, Aug 16, 2016 at 03:50:44AM +0000, Tian, Kevin wrote:  
> > > > > > > From: Neo Jia [mailto:address@hidden
> > > > > > > Sent: Tuesday, August 16, 2016 11:46 AM
> > > > > > >
> > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +0000, Tian, Kevin wrote:  
> > > > > > > > > From: Neo Jia [mailto:address@hidden
> > > > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM  
> > > > > > >  
> > > > > > > > > > >
> > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices 
> > > > > > > > > > > assigned to a VM  
> > > in  
> > > > > > > > > > > one shot to commit resources of all vGPUs assigned to a 
> > > > > > > > > > > VM along with
> > > > > > > > > > > some common resources.  
> > > > > > > > > >
> > > > > > > > > > Kirti, can you elaborate the background about above 
> > > > > > > > > > one-shot commit
> > > > > > > > > > requirement? It's hard to understand such a requirement.
> > > > > > > > > >
> > > > > > > > > > As I relied in another mail, I really hope start/stop 
> > > > > > > > > > become a per-mdev
> > > > > > > > > > attribute instead of global one, e.g.:
> > > > > > > > > >
> > > > > > > > > > echo "0/1" >  
> > > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start  
> > > > > > > > > >
> > > > > > > > > > In many scenario the user space client may only want to 
> > > > > > > > > > talk to mdev
> > > > > > > > > > instance directly, w/o need to contact its parent device. 
> > > > > > > > > > Still take
> > > > > > > > > > live migration for example, I don't think Qemu wants to 
> > > > > > > > > > know parent
> > > > > > > > > > device of assigned mdev instances.  
> > > > > > > > >
> > > > > > > > > Hi Kevin,
> > > > > > > > >
> > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require 
> > > > > > > > > anybody to  
> > > know  
> > > > > > > > > parent device. you can just do
> > > > > > > > >
> > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > > > > > > > >
> > > > > > > > > or
> > > > > > > > >
> > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > > > > > > > >
> > > > > > > > > without knowing the parent device.
> > > > > > > > >  
> > > > > > > >
> > > > > > > > You can look at some existing sysfs example, e.g.:
> > > > > > > >
> > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> > > > > > > >
> > > > > > > > You may also argue why not using a global style:
> > > > > > > >
> > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> > > > > > > >
> > > > > > > > There are many similar examples...  
> > > > > > >
> > > > > > > Hi Kevin,
> > > > > > >
> > > > > > > My response above is to your question about using the global 
> > > > > > > sysfs entry as you
> > > > > > > don't want to have the global path because
> > > > > > >
> > > > > > > "I don't think Qemu wants to know parent device of assigned mdev 
> > > > > > > instances.".
> > > > > > >
> > > > > > > So I just want to confirm with you that (in case you miss):
> > > > > > >
> > > > > > >     /sys/class/mdev/mdev_start | mdev_stop
> > > > > > >
> > > > > > > doesn't require the knowledge of parent device.
> > > > > > >  
> > > > > >
> > > > > > Qemu is just one example, where your explanation of parent device
> > > > > > makes sense but still it's not good for Qemu to populate 
> > > > > > /sys/class/mdev
> > > > > > directly. Qemu is passed with the actual sysfs path of assigned mdev
> > > > > > instance, so any mdev attributes touched by Qemu should be put under
> > > > > > that node (e.g. start/stop for live migration usage as I explained 
> > > > > > earlier).  
> > > > >
> > > > > Exactly, qemu is passed with the actual sysfs path.
> > > > >
> > > > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | 
> > > > > mdev_stop at all.
> > > > >
> > > > > QEMU will take the sysfs path as input:
> > > > >
> > > > >  -device
> > > > >  
> > > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,i
> > >   
> > > > > d=vgpu0  
> > > >
> > > > no need of passing "id=vgpu0" here. If necessary you can put id as an 
> > > > attribute
> > > > under sysfs mdev node:
> > > >
> > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id  
> > > 
> > > I think we have moved away from the device index based on Alex's comment, 
> > > so the
> > > device path will be:
> > > 
> > >  /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818  
> > 
> > pass /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818
> > as parameter, and then Qemu can access 'id' under that path. You
> > don't need to pass a separate 'id' field. That's my point.

As noted in previous reply, id is a QEMU device parameter, it's not a
property of the mdev device.  I'd also caution against adding arbitrary
sysfs files with the expectation that QEMU will be able to manipulate
them.  I believe one of the benefits of vfio vs legacy KVM device
assignment is that vfio is self contained through the vfio API.  Each
time we expect QEMU to interact with the system via sysfs, that's a new
file that libvirt needs to add access to and a new attack vector where
we need to worry about security.  I still think it's the right idea
from a sysfs perspective to move to per device files, or an "online"
file to replace both, but let's be a little more strategic how we
expect QEMU to interact with the device.

> > > > > As you are saying in live migration, QEMU needs to access "start" and 
> > > > > "stop".  Could  
> > > you  
> > > > > please share more details, such as how QEMU access the "start" and 
> > > > > "stop" sysfs,
> > > > > when and what triggers that?
> > > > >  
> > > >
> > > > A conceptual flow as below:
> > > >
> > > > 1. Quiescent mdev activity on the parent device (e.g. stop scheduling, 
> > > > wait for
> > > > in-flight DMA completed, etc.)
> > > >
> > > > echo "0" > 
> > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start
> > > >
> > > > 2. Save mdev state:
> > > >
> > > > cat /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state 
> > > > > xxx
> > > >
> > > > 3. xxx will be part of the final VM image and copied to a new machine
> > > >
> > > > 4. Allocate/prepare mdev on the new machine for this VM
> > > >
> > > > 5. Restore mdev state:
> > > >
> > > > cat xxx > 
> > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state
> > > > (might be a different path name)
> > > >
> > > > 6. start mdev on the new parent device:
> > > >
> > > > echo "1" > 
> > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start  

For example, another solution to this would be a device specific vfio
region dedicated to starting and stopping the device an manipulating
state.  A simplistic API might be that the first dword of this region
is reserved for control of the device, the ability to pause and resume
the device, and reads and writes to the remainder of the region collect
and restore device state, respectively.  We'd need to spend more time
thinking about such an API, but something like this could be added
between the kernel and QEMU as a feature that doesn't change the
existing API.  Thanks,

Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]