From: Tian, Kevin
Subject: Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM
Date: Thu, 4 Apr 2019 07:58:04 +0000

> From: Peter Xu [mailto:address@hidden
> Sent: Thursday, April 4, 2019 3:00 PM
> 
> On Wed, Apr 03, 2019 at 10:10:35PM +0000, Elijah Shakkour wrote:
> 
> [...]
> 
> > > > > > > > You can also try to enable VT-d device log by appending:
> > > > > > > >
> > > > > > > >   -trace enable="vtd_*"
> > > > > > > >
> > > > > > > > In case it dumps anything useful for you.
> > > > >
> > > > > Here is the relevant dump (dev 01:00.01 is my VF):
> > > > > "
> > > > > vtd_inv_desc_cc_device context invalidate device 01:00.01
> > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > vtd_switch_address_space Device 01:00.1 switching address space (iommu enabled=1)
> > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > vtd_err Detected invalid context entry when trying to sync shadow page table
> > > >
> > > > These lines mean that the guest sent a device invalidation to your VF
> > > > but the IOMMU found that the device context entry is missing.
> > > >
> > > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low 0x2d007003 gen 0 -> gen 2
> > > > > vtd_err_dmar_slpte_resv_error iova 0xf08e7000 level 2 slpte 0x2a54008f7
> > > >
> > > > This line should not exist in latest QEMU.  Are you sure you're using
> > > > the latest QEMU?
> > >
> > > I moved now to QEMU 4.0 RC2.
> > > This is what I get now:
> > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low 0x2f007003 gen 0 -> gen 1
> > > qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve non-zero iova=0xf0d29000, level=0x2slpte=0x29f6008f7)
> > > vtd_fault_disabled Fault processing disabled for context entry
> > > qemu-system-x86_64: vtd_iommu_translate: detected translation failure (dev=01:00:01, iova=0xf0d29000)
> > > Unassigned mem read 00000000f0d29000
> > >
> > > I'm not familiar with vIOMMU registers, but I noticed that I must report
> > > snoop control support to Hyper-V (i.e. bit 7 in the extended capability
> > > register of the vIOMMU) in order to satisfy IOMMU support for SRIOV.
> > > vIOMMU.ecap before    0xf00f5e
> > > vIOMMU.ecap after       0xf00fde
> > > But I see that vIOMMU doesn't really support snoop control.
> > > Could this be the problem that fails IOVA range check in this function
> > > vtd_iova_range_check()?
> >
> > Sorry, I meant the SLPTE reserved non-zero check failure in
> > vtd_slpte_nonzero_rsvd(),
> > and NOT the IOVA range check failure (since the range check didn't fail).
> 
> Probably.  Currently VT-d emulation does not support snooping control,
> and if you modify only that ecap bit you will probably hit this
> problem, because the guest kernel will then set the SNP bit in the
> IOMMU page table entries.  That violates the reserved-bit check in the
> emulation code, and you see these errors.
> 
> Now talking about implementing Snoop Control for Intel IOMMU for
> real (which corresponds to VT-d ecap bit 7) - I'd confess I'm not 100%
> clear on what "snooping" means and what we need to do as an
> emulator. I'm quoting from the spec:
> 
>   "Snoop behavior for a memory access (to a translation structure
>   entry or access to the mapped page) specifies if the access is
>   coherent (snoops the processor caches) or not."
> 
> If it is only a capability showing whether the hardware is
> capable of snooping processor caches, then I don't think we need to do
> much here as an emulator of VT-d, simply because when we access the
> data we're still on the processor's side (because we're emulating
> the IOMMU behavior only), so the cache should always be coherent from
> the POV of guest vCPUs, just like how the processors provide cache
> coherence between two cores (so IMHO here the VT-d emulation code can
> be run on one core/thread, and the vcpu which runs the guest iommu
> driver can be run on another core/thread).  If so, maybe we can simply
> declare support of that, but we at least also need to remove the SNP
> bit from the vtd_paging_entry_rsvd_field[] array to reflect that we
> understand that bit.
> 
> CCing Alex and Kevin to see whether I'm misunderstanding or in case of
> any further input on the snooping support.
> 
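
The numbers quoted above look consistent with that explanation. A minimal
Python sketch of the bit arithmetic follows. The ecap and slpte values are
copied from the thread; the SNP position (bit 11 of a second-level PTE) is my
reading of the VT-d spec and should be treated as an assumption, and
changed_bits() is a hypothetical helper, not a QEMU function.

```python
# Values copied from the thread.
ECAP_BEFORE = 0xf00f5e
ECAP_AFTER = 0xf00fde
SLPTE = 0x29f6008f7

# ecap bit 7 is Snoop Control (as stated in the thread).
ECAP_SC = 1 << 7
# ASSUMPTION: SNP is bit 11 of a second-level page table entry.
SLPTE_SNP = 1 << 11


def changed_bits(a, b):
    """Return the bit positions (0..63) at which a and b differ."""
    return [i for i in range(64) if ((a ^ b) >> i) & 1]


# Only the Snoop Control bit differs between the two ecap values.
print(changed_bits(ECAP_BEFORE, ECAP_AFTER))
# The failing SLPTE has the (assumed) SNP bit set.
print(bool(SLPTE & SLPTE_SNP))
```

If the assumption about bit 11 holds, the guest started setting SNP once ecap
advertised snoop control, and the emulation rejected it as a reserved bit.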

For software DMA, yes, snoop is guaranteed since it's just CPU access.

However, for a VFIO device, i.e. hardware DMA, snoop should be reported
based on the physical IOMMU capability. It's fine to report no snoop control
on the vIOMMU (the current state) even when it's physically supported. The
only consequence is that the L1 VMM must honor guest cache attributes instead
of forcing WB in the L1 EPT when doing nested passthrough. However, it's
incorrect to report snoop control on the vIOMMU when it's not physically
supported; otherwise the L1 VMM may force WB in the L1 EPT and set the snoop
field in the vIOMMU 2nd-level PTE on the assumption that hardware snoop is
guaranteed (when it isn't). Then it becomes a correctness issue.
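
As a rough illustration of the rule above, here is a small Python sketch.
viommu_may_report_snoop() is a hypothetical helper, not an existing QEMU or
VFIO API; it only encodes the reasoning that the vIOMMU may advertise snoop
control when every physical IOMMU behind an assigned device supports it.

```python
def viommu_may_report_snoop(piommu_snoop_caps):
    """Decide whether a vIOMMU may advertise snoop control.

    piommu_snoop_caps holds one bool per physical IOMMU that sits behind
    an assigned VFIO device. An empty sequence means software DMA only,
    where snoop is guaranteed because it is just CPU access.
    """
    caps = list(piommu_snoop_caps)
    # Advertising is only safe when no pIOMMU lacks the capability.
    return all(caps)


print(viommu_may_report_snoop([True, True]))
print(viommu_may_report_snoop([True, False]))
```

Returning True only means reporting is allowed; reporting no snoop control
remains safe in every case.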

Things get a bit tricky with two VFIO devices that sit under two pIOMMUs,
one supporting snoop and the other not. That leaves us two options:

1). Create one vIOMMU without snoop control. This is the current state
and is safe.
2). Create two vIOMMUs, one supporting snoop and the other not, then
attach each VFIO device to the vIOMMU with the matching snoop
capability. This matches the hardware topology but adds extra config
burden and footprint.
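
The two options can be sketched roughly as below. This is Python for
illustration only: plan_viommus() and the device names are hypothetical, and
nothing here corresponds to an existing QEMU option.

```python
def plan_viommus(device_snoop, split=False):
    """Assign devices to vIOMMUs.

    device_snoop maps a device name to whether its pIOMMU supports
    snoop control.
    """
    if not split:
        # Option 1: a single vIOMMU that never advertises snoop control.
        return [{"snoop": False, "devices": sorted(device_snoop)}]
    # Option 2: one vIOMMU per snoop capability that actually occurs.
    plan = []
    for snoop in (True, False):
        devs = sorted(d for d, s in device_snoop.items() if s == snoop)
        if devs:
            plan.append({"snoop": snoop, "devices": devs})
    return plan


devices = {"vf0": True, "vf1": False}  # hypothetical device names
print(plan_viommus(devices))
print(plan_viommus(devices, split=True))
```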

Thanks,
Kevin
