From: Liu, Yi L
Subject: Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices
Date: Thu, 19 Jan 2017 06:44:06 +0000

> -----Original Message-----
> From: Qemu-devel [mailto:address@hidden]
> On Behalf Of Tian, Kevin
> Sent: Wednesday, January 18, 2017 5:39 PM
> To: Peter Xu <address@hidden>; Jason Wang <address@hidden>
> Cc: Lan, Tianyu <address@hidden>; Raj, Ashok <address@hidden>;
> address@hidden; address@hidden; address@hidden; qemu-
> address@hidden; address@hidden
> Subject: Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio
> devices
> 
> > From: Peter Xu [mailto:address@hidden]
> > Sent: Wednesday, January 18, 2017 4:46 PM
> >
> > On Wed, Jan 18, 2017 at 04:36:05PM +0800, Jason Wang wrote:
> > >
> > >
> > > On 2017-01-18 16:11, Peter Xu wrote:
> > > >On Wed, Jan 18, 2017 at 11:10:53AM +0800, Jason Wang wrote:
> > > >>
> > > >>On 2017-01-17 22:45, Peter Xu wrote:
> > > >>>On Mon, Jan 16, 2017 at 05:54:55PM +0800, Jason Wang wrote:
> > > >>>>On 2017-01-16 17:18, Peter Xu wrote:
> > > >>>>>>>  static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > >>>>>>>                                        hwaddr addr, uint8_t am)
> > > >>>>>>>  {
> > > >>>>>>> @@ -1222,6 +1251,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > >>>>>>>      info.addr = addr;
> > > >>>>>>>      info.mask = ~((1 << am) - 1);
> > > >>>>>>>      g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> > > >>>>>>> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > > >>>>>>Is the case of a GLOBAL or DSI flush missed, or do we not care about it at all?
> > > >>>>>IMHO we don't. For device assignment, since we have CM=1 here,
> > > >>>>>we should have explicit page invalidations even if the guest
> > > >>>>>sends global/domain invalidations.
> > > >>>>>
> > > >>>>>Thanks,
> > > >>>>>
> > > >>>>>-- peterx
> > > >>>>Is this required by the spec?
> > > >>>I think not. IMO the spec is very coarse-grained in describing
> > > >>>cache mode...
> > > >>>
> > > >>>>Btw, it looks to me that both DSI and GLOBAL are indeed explicit
> > > >>>>flushes.
> > > >>>Actually when cache mode is on, it is unclear to me how we
> > > >>>should treat domain/global invalidations, at least from the spec
> > > >>>(as mentioned earlier). My understanding is that they are not
> > > >>>"explicit flushes", which IMHO should only mean page selective
> > > >>>IOTLB invalidations.
> > > >>Probably not, at least from a performance point of view. DSI and
> > > >>global should be more efficient in some cases.
> > > >I agree with you that DSI/GLOBAL flushes are more efficient in some
> > > >ways. But IMHO that does not mean these invalidations are "explicit
> > > >invalidations", and I doubt whether cache mode has to cooperate with them.
> > >
> > > Well, the spec does not forbid DSI/GLOBAL with CM, and the driver
> > > code has used them for almost ten years. I can hardly believe it's wrong.
> >
> > I think we have misunderstanding here. :)
> >
> > I never thought we should not send DSI/GLOBAL invalidations with cache
> > mode. I just think we should not do anything special even if we have
> > cache mode on when we receive these signals.
> >
> > In the spec, "explicit invalidation" is mentioned in the cache mode
> > chapter:
> >
> >     The Caching Mode (CM) field in Capability Register indicates if
> >     the hardware implementation caches not-present or erroneous
> >     translation-structure entries. When the CM field is reported as
> >     Set, any software updates to any remapping structures (including
> >     updates to not-present entries or present entries whose
> >     programming resulted in translation faults) requires explicit
> >     invalidation of the caches.
> >
> > And I thought we were discussing "what is explicit invalidation"
> > as mentioned above.
> 
> Check 6.5.3.1 Implicit Invalidation on Page Requests
> 
>       In addition to the explicit invalidation through invalidation commands
>       (see Section 6.5.1 and Section 6.5.2) or through deferred invalidation
>       messages (see Section 6.5.4), identified above, Page Requests from
>       endpoint devices invalidate entries in the IOTLBs and paging-structure
>       caches.
> 
> My impression is that the above indirectly defines the invalidation
> commands (PSI/DSI/GLOBAL) as explicit invalidation, because they are
> explicitly issued by the driver. Then section 6.5.3.1 further describes
> implicit invalidations caused by other VT-d operations.
> 
> I will check with the VT-d spec owner to clarify.
> 
> >
> > >
> > > >
> > > >But here I should add one more thing besides PSI - context entry
> > > >invalidation should be one of "the explicit invalidations" as well,
> > > >which we need to handle just like PSI when cache mode is on.
> > > >
> > > >>>>Just had a quick go through the driver code and found something
> > > >>>>interesting in intel_iommu_flush_iotlb_psi():
> > > >>>>
> > > >>>>...
> > > >>>>     /*
> > > >>>>      * Fallback to domain selective flush if no PSI support or the size is
> > > >>>>      * too big.
> > > >>>>      * PSI requires page size to be 2 ^ x, and the base address is naturally
> > > >>>>      * aligned to the size
> > > >>>>      */
> > > >>>>     if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
> > > >>>>         iommu->flush.flush_iotlb(iommu, did, 0, 0,
> > > >>>>                         DMA_TLB_DSI_FLUSH);
> > > >>>>     else
> > > >>>>         iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
> > > >>>>                         DMA_TLB_PSI_FLUSH);
> > > >>>>...
> > > >>>I think this is interesting... and I doubt its correctness with
> > > >>>cache mode enabled.
> > > >>>
> > > >>>If so (sending domain invalidation instead of a big range of page
> > > >>>invalidations), how should we capture which pages are unmapped in
> > > >>>the emulated IOMMU?
> > > >>We don't need to track individual pages here, since all pages for
> > > >>a specific domain were unmapped I believe?
> > > >IMHO this might not be the correct behavior.
> > > >
> > > >If we receive one domain specific invalidation, I agree that we
> > > >should invalidate the IOTLB cache for all the devices inside the domain.
> > > >However, when cache mode is on, we should be depending on the PSIs
> > > >to unmap each page (unless we want to unmap the whole address
> > > >space, in this case it's very possible that the guest is just
> > > >unmapping a range, not the entire space). If we convert several
> > > >PSIs into one big DSI, IMHO we will leave those pages
> > > >mapped/unmapped while we should unmap/map them.
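
(As a toy illustration of the point above: plain C, not QEMU code, every
name below is invented. A PSI names the exact page, so a shadow mapping can
follow it one update at a time, while a DSI carries no page information, so
the only safe reaction for the shadow is a full re-walk of the guest page
table, i.e. an unmap-everything-and-replay.)

    #include <stdbool.h>
    #include <stdio.h>

    #define NPAGES 8

    /* Toy state: the guest's page table vs. what the host has shadowed
     * (e.g. through VFIO).  Illustrative only, not QEMU structures. */
    static bool guest_mapped[NPAGES];
    static bool shadow_mapped[NPAGES];

    /* PSI: the guest names the exact page; the shadow follows precisely. */
    static void handle_psi(unsigned page)
    {
        shadow_mapped[page] = guest_mapped[page];
    }

    /* DSI: no page info at all.  The only safe reaction is to re-walk
     * the whole guest table (unmap everything, then replay). */
    static void handle_dsi(void)
    {
        for (unsigned i = 0; i < NPAGES; i++) {
            shadow_mapped[i] = guest_mapped[i];
        }
    }

    int main(void)
    {
        guest_mapped[3] = shadow_mapped[3] = true;

        guest_mapped[3] = false;   /* guest unmaps page 3, sends a PSI: */
        handle_psi(3);             /* shadow drops exactly page 3       */

        guest_mapped[5] = true;    /* guest maps page 5, sends a DSI:   */
        handle_dsi();              /* shadow must re-sync everything    */

        printf("page 3 shadowed: %d, page 5 shadowed: %d\n",
               shadow_mapped[3], shadow_mapped[5]);
        return 0;
    }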
> > >
> > > Confused, do you have an example of this? (I fail to understand why
> > > DSI can't work; at least the implementation can convert a DSI into
> > > several PSIs internally.)
> >
> > That's how I understand it. It might be wrong. Btw, could you
> > elaborate a bit on how we can convert a DSI into several PSIs?
> >
> > Thanks,
> 
> If my understanding above is correct, there is nothing wrong with the
> above IOMMU driver code - actually it makes sense on bare metal when CM
> is disabled.
> 
> But yes, DSI/GLOBAL is far less efficient than PSI when CM is enabled.
> We rely on cache invalidations to indirectly capture remapping structure
> changes. PSI provides accurate info, while DSI/GLOBAL doesn't. To emulate
> correct behavior of DSI/GLOBAL, we have to pretend that the whole address
> space (iova=0, size=agaw) needs to be unmapped (for GLOBAL it further
> means multiple address spaces).
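
(A rough, compilable sketch of that "pretend the whole space is unmapped"
emulation; the struct and function names here are invented, and the real
series would go through QEMU's IOMMUTLBEntry/notifier machinery instead.)

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Invented stand-in for an IOMMU TLB notification; perm 0 = unmap. */
    typedef struct {
        uint64_t iova;
        uint64_t addr_mask;   /* size - 1; the range is a power of two */
        int      perm;
    } TlbEvent;

    static void notify_unmap(const TlbEvent *ev)
    {
        printf("unmap iova=0x%" PRIx64 " size=0x%" PRIx64 "\n",
               ev->iova, ev->addr_mask + 1);
    }

    /* On DSI (or GLOBAL, once per address space): synthesize one unmap
     * covering the whole guest address width, then replay the guest page
     * table to rebuild whatever is still mapped. */
    static void emulate_domain_invalidate(unsigned agaw_bits)
    {
        TlbEvent ev = {
            .iova = 0,
            .addr_mask = (1ULL << agaw_bits) - 1,
            .perm = 0,
        };
        notify_unmap(&ev);
        /* ... then a page-table replay would re-create current mappings. */
    }

    int main(void)
    {
        emulate_domain_invalidate(39);  /* e.g. 39-bit guest address width */
        return 0;
    }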
> 
> Though not efficient, that doesn't mean it's wrong, since the guest driver
> follows the spec. We can ask for a Linux IOMMU driver change (CC Ashok) to
> not use the above optimization when cache mode is enabled, but we still
> need to emulate correct DSI/GLOBAL behavior to follow the spec, because:
> 
> - even when the driver fix is in place, old versions still have this logic;
> 
> - there is still a scenario where the guest IOMMU driver does want to
> invalidate the whole address space, e.g. when changing a context entry.
> Asking the guest driver to use PSI for that purpose is another bad thing.

Hi Kevin/Peter/Jason,

I agree we should think about DSI/GLOBAL. Hereby, I guess there may be a
chance to ignore the DSI/GLOBAL flush if the following assumption is correct.

It seems that a DSI/GLOBAL flush would always come after a context entry
invalidation.

With this assumption: I remember Peter added a memory_replay in context
invalidation. This memory_replay walks the guest second-level page table
and maps it, so the second-level page table in the host should have the
latest mapping info. The guest IOMMU driver would issue a DSI/GLOBAL flush
after changing the context entry. Since the mapping info has already been
updated in the host, there is no need to handle this DSI/GLOBAL flush.
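
As a compilable sketch of that dispatch under the stated assumption (all
names invented for illustration; this is not the actual patch):

    #include <stdio.h>

    /* Invented invalidation types, for illustration only. */
    enum inv_type { INV_CONTEXT, INV_PSI, INV_DSI, INV_GLOBAL };

    static void replay_guest_page_table(void)
    {
        /* Stands in for the memory_replay above: walk the guest
         * second-level page table and re-map it into the host. */
        puts("replay: re-sync host shadow from guest page table");
    }

    static void notify_page(unsigned long iova)
    {
        printf("psi: precise map/unmap for iova 0x%lx\n", iova);
    }

    static void handle_invalidation(enum inv_type t, unsigned long iova)
    {
        switch (t) {
        case INV_CONTEXT:
            replay_guest_page_table();
            break;
        case INV_PSI:
            notify_page(iova);
            break;
        case INV_DSI:
        case INV_GLOBAL:
            /* No shadow work needed, iff every DSI/GLOBAL really does
             * follow a context-entry invalidation whose replay already
             * re-synced the host.  A "bare" DSI/GLOBAL would silently
             * leave the shadow stale; that is exactly the assumption
             * above that needs judging. */
            break;
        }
    }

    int main(void)
    {
        handle_invalidation(INV_CONTEXT, 0);
        handle_invalidation(INV_DSI, 0);   /* intentionally a no-op */
        return 0;
    }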

So gentlemen, please help judge whether the assumption is correct. If it
is, then Peter's patch may just work without any special processing of
DSI/GLOBAL flushes.
 
Regards,
Yi L 
