From: Alex Williamson
Subject: Re: [Qemu-devel] [RFC 5/5] vifo: introduce new VFIO ioctl VFIO_DEVICE_PCI_GET_DIRTY_BITMAP
Date: Mon, 10 Jul 2017 13:47:21 -0600

On Fri, 7 Jul 2017 06:40:58 +0000
"Tian, Kevin" <address@hidden> wrote:

> > From: Alex Williamson [mailto:address@hidden
> > Sent: Saturday, July 1, 2017 1:00 AM
> > 
> > On Fri, 30 Jun 2017 05:14:40 +0000
> > "Tian, Kevin" <address@hidden> wrote:
> >   
> > > > From: Alex Williamson [mailto:address@hidden
> > > > Sent: Friday, June 30, 2017 4:57 AM
> > > >
> > > > On Thu, 29 Jun 2017 00:10:59 +0000
> > > > "Tian, Kevin" <address@hidden> wrote:
> > > >  
> > > > > > From: Alex Williamson [mailto:address@hidden
> > > > > > Sent: Thursday, June 29, 2017 12:00 AM
> > > > > > Thanks Kevin.  So it's not really a dirty bitmap, it's just a
> > > > > > bitmap of pages that the device has access to and may have dirtied.
> > > > > > Don't we have this more generally in the vfio type1 IOMMU backend?
> > > > > > For a mediated device, we know all the pages that the vendor driver
> > > > > > has asked to be pinned.  Should we perhaps make this interface on
> > > > > > the vfio container rather than the device?  Any mediated device can
> > > > > > provide this level of detail without specific vendor support.  If we
> > > > > > had DMA page faulting, this would be the natural place to put it as
> > > > > > well, so maybe we should design the interface there to support
> > > > > > everything similarly.  Thanks,
> > > > > >  
> > > > >
> > > > > That's a nice idea. Just two comments:
> > > > >
> > > > > 1) If some mediated device has its own way to construct a true dirty
> > > > > bitmap (not through DMA page faulting), the interface is better
> > > > > designed to allow that flexibility. Maybe an optional callback: if
> > > > > not registered, use the common type1 IOMMU logic; otherwise prefer
> > > > > the vendor-specific callback.  
> > > >
> > > > I'm not sure what that looks like, but I agree with the idea.  Could
> > > > the pages that type1 knows about ever be anything other than a
> > > > superset of the dirty pages?  Perhaps a device ioctl to flush unused
> > > > mappings would be sufficient.  
> > >
> > > Sorry, I didn't quite get your idea here. My understanding is that
> > > type1 is OK as an alternative in case the mediated device has no way
> > > to track dirtied pages (as for Intel GPU), so we can use type1's pinned
> > > pages as an indirect way to indicate dirtied pages. But if the mediated
> > > device has its own way (e.g. a device-private MMU) to track dirty
> > > pages, then we should allow that device to provide the dirty bitmap
> > > instead of using type1.  
> > 
> > My thought was that our current mdev iommu interface allows the vendor
> > driver to pin specific pages.  In order for the mdev device to dirty a
> > page, we need for it to be pinned.  Therefore at worst, the set of
> > pages pinned in type1 is the superset of all pages that can potentially
> > be dirtied by the device.  In the worst case, this devolves to all
> > pages mapped through the iommu in the case of direct assigned devices.
> > My assertion is therefore that a device specific dirty page bitmap can
> > only be a subset of the type1 pinned pages.  Therefore if the mdev  
> 
> this assertion is correct.
> 
> > vendor driver can flush any stale pinnings, then the type1 view of
> > pinned pages should match the device's view of the current working set.
> > Then we wouldn't need a device specific dirty bitmap, we'd only need a
> > mechanism to trigger a flush of stale mappings on the device.  
> 
> What's your definition of "stale pinnings"? Take Intel vGPU for example:
> we pin pages when they are mapped into the GPU page table and then unpin
> them when they are unmapped from it. Based on this policy, the type1
> view of pinned pages should just match the device view.  There is no
> 'stale' page example I can think of, since all currently-pinned pages
> must be pinned for some read/write purpose. 

Then it seems that the mdev support in the IOMMU already tracks the
minimum page set for Intel vGPUs, so it can be a no-op in that case.  Are
we missing some ability to identify pages through vendor-specific
hooks, or were those hooks never really needed?
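
For reference, the pinning interface being discussed is the pair
vfio_pin_pages()/vfio_unpin_pages() that vfio exports to mdev vendor
drivers (4.10-era include/linux/vfio.h). Below is a minimal sketch of
how a vendor driver's shadow-page-table path would use it; the
sketch_shadow_map()/sketch_shadow_unmap() wrappers are hypothetical
illustrations, not actual GVT-g code:

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/vfio.h>

/*
 * Pin the backing page when a guest pfn is mapped into the device's
 * shadow page table.  type1 records the pin, so the set of pinned
 * pages is always a superset of anything the device can dirty.
 */
static int sketch_shadow_map(struct device *mdev_dev, unsigned long gfn,
                             int prot, unsigned long *host_pfn)
{
        /* vfio_pin_pages() returns the number of pages pinned or -errno */
        int ret = vfio_pin_pages(mdev_dev, &gfn, 1, prot, host_pfn);

        return ret == 1 ? 0 : (ret < 0 ? ret : -EFAULT);
}

/* Unpin on shadow unmap, shrinking type1's view of the working set. */
static void sketch_shadow_unmap(struct device *mdev_dev, unsigned long gfn)
{
        vfio_unpin_pages(mdev_dev, &gfn, 1);
}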
 
> But I do realize one limitation of this option now. Given there are
> both a physical device and an mdev assigned to the same guest, we
> have to pin all pages for the physically assigned device. In that case
> we may need a way to differentiate which device a given pinned page
> is for. I'm not sure whether that's an easy thing or requires lots of
> changes. Interface-wise we don't need an additional one, since there is
> still an unpin request from the mdev path; just the internal structure
> needs to be extended to maintain that ownership info.

We can't migrate with a regular assigned device, and this model fails in
the right way, assuming that all memory is dirty, which is correct.
Currently we'd need to detach any directly assigned devices, leaving
only the mdev device.  If we had PRI support, then a way to flush
stale mappings and fault in new ones would be a way to reduce the
working set during migration.  I don't really see that per-device
tracking helps us; we need to consider a page dirty regardless of
which device dirtied it.
 
> > 
> > Otherwise I'm not sure how we cleanly create an interface where the
> > dirty bitmap can either come from the device or the container... but
> > I'd welcome suggestions.  Thanks,
> >   
> 
> My original thought is simple:
> 
> - by default type1 can give the dirty bitmap
> - when an mdev is created, the mdev driver has a chance to claim whether
> it would like to track its own dirty bitmap
> - then when QEMU requests the dirty bitmap of an mdev, vfio checks the
> flag to decide whether to return type1's bitmap or call into the mdev
> driver

What would the device do differently?  The container is the logical
component tracking memory and therefore seems the logical component to
track dirty memory.  Potentially there's even some efficiency in
handling the dirty memory for multiple devices via a container
interface.  It also means that we don't have per-vendor-driver
implementations (and bugs) for dirty bitmaps.  Thanks,

Alex
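
To make the container-level alternative concrete, here is a rough
userspace sketch of what a dirty-bitmap query against the container fd
could look like. The VFIO_IOMMU_GET_DIRTY_BITMAP name, ioctl number and
struct layout below are purely hypothetical, loosely mirroring the
device-level ioctl proposed in this RFC; no such container interface
existed at the time of this thread:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/vfio.h>

/* Hypothetical container-level request, one bit per page in the range. */
struct vfio_iommu_get_dirty_bitmap {
        __u32 argsz;
        __u32 flags;
        __u64 start_addr;        /* IOVA of the first page to query */
        __u64 page_nr;           /* number of pages in the range */
        __u8  dirty_bitmap[];    /* filled in by the kernel */
};
/* Hypothetical ioctl number, chosen only for illustration. */
#define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 100)

int query_dirty(int container_fd, __u64 start_iova, __u64 pages)
{
        size_t bitmap_bytes = (pages + 7) / 8;
        struct vfio_iommu_get_dirty_bitmap *req;

        req = calloc(1, sizeof(*req) + bitmap_bytes);
        if (!req)
                return -1;

        req->argsz = sizeof(*req) + bitmap_bytes;
        req->start_addr = start_iova;
        req->page_nr = pages;

        /*
         * type1 would report its currently pinned pages here (a superset
         * of anything an mdev could have dirtied), covering every device
         * in the container with one call.
         */
        if (ioctl(container_fd, VFIO_IOMMU_GET_DIRTY_BITMAP, req)) {
                perror("VFIO_IOMMU_GET_DIRTY_BITMAP");
                free(req);
                return -1;
        }

        /* ... merge req->dirty_bitmap into the migration dirty log ... */
        free(req);
        return 0;
}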


