From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH RFC v3 13/14] intel_iommu: allow dynamic switch of IOMMU region
Date: Tue, 17 Jan 2017 08:46:04 -0700

On Tue, 17 Jan 2017 22:00:00 +0800
Peter Xu <address@hidden> wrote:

> On Mon, Jan 16, 2017 at 12:53:57PM -0700, Alex Williamson wrote:
> > On Fri, 13 Jan 2017 11:06:39 +0800
> > Peter Xu <address@hidden> wrote:
> >   
> > > This is preparation work to finally enable dynamic switching of
> > > VT-d protection ON/OFF. The old VT-d code uses a static IOMMU
> > > address space, and that won't satisfy vfio-pci device listeners.
> > > 
> > > Let me explain.
> > > 
> > > vfio-pci devices depend on the memory region listener and the IOMMU
> > > replay mechanism to keep the device mappings coherent with the
> > > guest even across domain switches. And there are two kinds of
> > > domain switches:
> > > 
> > >   (1) switch from domain A -> B
> > >   (2) switch from domain A -> no domain (e.g., turn DMAR off)
> > > 
> > > Case (1) is handled by the context entry invalidation path of the
> > > VT-d replay logic. What the replay function should do here is
> > > replay the existing page mappings in domain B.  
> > 
> > There are really 2 steps here, right?  Invalidate A, replay B.  I
> > think the code handles this, but I want to make sure.  We don't want
> > to end up with a superset of both A & B.  
> 
> First of all, this discussion should be beyond this patch's scope,
> since this patch currently only handles the case where the guest
> disables DMAR entirely.
> 
> Then, my understanding of the above question: when we do an A -> B
> domain switch, the guest will not send specific context entry
> invalidations for A, but it will for sure send one when the context
> entry is ready for B. In that sense, IMO we don't have a clear "two
> steps", only one, which is the latter "replay B". We do the correct
> unmaps based on the PSIs (page-selective invalidations) of A when the
> guest unmaps the pages in A.
> 
> So, for the use case of nested device assignment (which is the goal of
> this series for now):
> 
> - L1 guest puts devices D1,D2,... of the L2 guest into domain A
> - L1 guest maps the L2 memory into the L1 address space (L2GPA -> L1GPA)
> - ... (L2 guest runs, until it stops running)
> - L1 guest unmaps all the pages in domain A
> - L1 guest moves devices D1,D2,... of the L2 guest out of domain A
> 
> This series should work for the above, since before any device leaves
> its domain, the domain will be clean, with all of its pages already
> unmapped.
> 
> However, if we have the following scenario (and I don't know whether
> this is achievable):
> 
> - guest iommu domain A has device D1, D2
> - guest iommu domain B has device D3
> - move device D2 from domain A into B
> 
> Here, when D2 moves from A to B, IIUC our current Linux IOMMU driver
> code will not send any PSIs (page-selective invalidations) for D2 or
> domain A, because domain A still has a device in it; the guest should
> only send a context entry invalidation for device D2, telling us that
> D2 has switched to domain B. In that case, I am not sure whether the
> current series works properly, and IMHO we may need domain knowledge
> in the VT-d emulation code (which we don't have yet) in the future to
> further support this kind of domain switch.

This is a serious issue that needs to be resolved.  The context entry
invalidation when D2 is switched from A->B must unmap everything from
domain A before the replay of domain B.  Your example is easily
achieved: for instance, what if domain A is the SI (static identity)
domain for the L1 guest, and domain B is the device assignment domain
for the L2 guest, currently containing device D3?  The user hot adds
device D2 into the L2 guest, moving it from the L1 SI domain to the
device assignment domain.  vfio will not override existing mappings on
replay; it will fail with an error, giving the L2 guest a device with
access to the static identity mappings of the L1 host.  This isn't
acceptable.
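
To make that ordering concrete, here is a minimal sketch of what the
context entry invalidation path would need to do; the function and
field names are illustrative, not the actual intel_iommu.c code, and
the memory API signatures are only approximate:

    /* Illustrative sketch only -- names are made up.  On a context
     * entry invalidation for a device, first send one UNMAP notify
     * covering the device's whole address space (dropping everything
     * mapped for the old domain A), and only then replay the new
     * domain B.  Replaying into a non-empty container makes vfio's
     * VFIO_IOMMU_MAP_DMA fail instead of overriding old mappings. */
    static void vtd_context_invalidate_device(VTDAddressSpace *vtd_as,
                                              IOMMUNotifier *n)
    {
        IOMMUTLBEntry entry = {
            .target_as       = &address_space_memory,
            .iova            = 0,
            .translated_addr = 0,
            .addr_mask       = ~(hwaddr)0,  /* whole address space */
            .perm            = IOMMU_NONE,  /* IOMMU_NONE == unmap */
        };

        /* Step 1: invalidate A */
        memory_region_notify_iommu(&vtd_as->iommu, entry);

        /* Step 2: replay B -- walk the new domain's page tables and
         * notify a MAP for each present entry */
        memory_region_iommu_replay(&vtd_as->iommu, n, false);
    }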
 
> > On the invalidation, a future optimization when disabling an entire
> > memory region might also be to invalidate the entire range at once
> > rather than each individual mapping within the range, which I think is
> > what happens now, right?  
> 
> Right. IIUC this can be an enhancement to the current page walk logic
> - we can coalesce contiguous IOTLB entries with the same properties
> and notify only once for the coalesced range.
> 
> Noted in my todo list.

A context entry invalidation, as in the example above, might make use
of this to skip any sort of page walk logic and simply invalidate the
entire address space.
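
The coalescing itself is simple enough to sketch.  Something along
these lines (illustrative names only; a real version would also have
to respect the power-of-two addr_mask that IOMMUTLBEntry
notifications carry):

    /* Illustrative only: accumulate a run of leaf entries that are
     * contiguous in both IOVA and target address with identical
     * permissions, and fire a single notification when the run
     * breaks.  "flush" would wrap memory_region_notify_iommu() in
     * the real page walk code. */
    typedef struct VTDCoalescedRun {
        hwaddr iova;            /* guest IOVA where the run starts */
        hwaddr translated;      /* target addr where the run starts */
        hwaddr size;            /* bytes accumulated so far (0=none) */
        IOMMUAccessFlags perm;  /* shared permission of the run */
    } VTDCoalescedRun;

    static void vtd_run_add_page(VTDCoalescedRun *run, hwaddr iova,
                                 hwaddr xlat, hwaddr size,
                                 IOMMUAccessFlags perm,
                                 void (*flush)(VTDCoalescedRun *))
    {
        /* Extend the pending run if this page continues it exactly */
        if (run->size && perm == run->perm &&
            iova == run->iova + run->size &&
            xlat == run->translated + run->size) {
            run->size += size;
            return;
        }
        /* Otherwise notify the finished run and start a new one */
        if (run->size) {
            flush(run);
        }
        run->iova = iova;
        run->translated = xlat;
        run->size = size;
        run->perm = perm;
    }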

> >   
> > > However, for case (2), we don't want to replay any domain mappings -
> > > we just need the default GPA->HPA mappings (the address_space_memory
> > > mapping). And this patch handles case (2), building up that mapping
> > > automatically by leveraging the vfio-pci memory listeners.  
> > 
> > Have you thought about using this address space switching to emulate
> > ecap.PT?  I.e., advertise hardware-based passthrough so that the
> > guest doesn't need to waste page table entries on a direct-mapped,
> > static identity domain.  
> 
> Kind of. Currently we still don't have iommu=pt for the emulated code.
> We can achieve that by leveraging this patch.

Well, we have iommu=pt, but the L1 guest will implement this as a fully
populated SI domain rather than as a bit in the context entry that asks
for hardware direct translation.  Given the mapping overhead through
vfio, the L1 guest will always want to use iommu=pt, as dynamic mapping
performance is going to be horrid.

Alex


