Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to


From: Elijah Shakkour
Subject: Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM
Date: Sun, 7 Apr 2019 13:47:11 +0000


> -----Original Message-----
> From: Tian, Kevin <address@hidden>
> Sent: Thursday, April 4, 2019 10:58 AM
> To: Peter Xu <address@hidden>; Elijah Shakkour
> <address@hidden>
> Cc: Knut Omang <address@hidden>; Michael S. Tsirkin
> <address@hidden>; Alex Williamson <address@hidden>;
> Marcel Apfelbaum <address@hidden>; Stefan Hajnoczi
> <address@hidden>; address@hidden
> Subject: RE: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
> 
> > From: Peter Xu [mailto:address@hidden
> > Sent: Thursday, April 4, 2019 3:00 PM
> >
> > On Wed, Apr 03, 2019 at 10:10:35PM +0000, Elijah Shakkour wrote:
> >
> > [...]
> >
> > > > > > > > > You can also try to enable VT-d device log by appending:
> > > > > > > > >
> > > > > > > > >   -trace enable="vtd_*"
> > > > > > > > >
> > > > > > > > > In case it dumps anything useful for you.
> > > > > >
> > > > > > Here is the relevant dump (dev 01:00.01 is my VF):
> > > > > > "
> > > > > > vtd_inv_desc_cc_device context invalidate device 01:00.01
> > > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > > > vtd_switch_address_space Device 01:00.1 switching address space (iommu enabled=1)
> > > > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > > > vtd_err Detected invalid context entry when trying to sync shadow page table
> > > > >
> > > > > > These lines mean that the guest sent a device invalidation to
> > > > > > your VF but the IOMMU found that the device context entry is
> > > > > > missing.
> > > > >
> > > > > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low 0x2d007003 gen 0 -> gen 2
> > > > > > > vtd_err_dmar_slpte_resv_error iova 0xf08e7000 level 2 slpte 0x2a54008f7
> > > > >
> > > > > > This line should not exist in the latest QEMU.  Are you sure
> > > > > > you're using the latest QEMU?
> > > >
> > > > > I have now moved to QEMU 4.0 RC2.
> > > > > This is what I get now:
> > > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low 0x2f007003 gen 0 -> gen 1
> > > > > qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve non-zero iova=0xf0d29000, level=0x2slpte=0x29f6008f7)
> > > > > vtd_fault_disabled Fault processing disabled for context entry
> > > > > qemu-system-x86_64: vtd_iommu_translate: detected translation failure (dev=01:00:01, iova=0xf0d29000)
> > > > > Unassigned mem read 00000000f0d29000
> > > >
> > > > > I'm not familiar with the vIOMMU registers, but I noticed that I
> > > > > must report snoop control support to Hyper-V (i.e. bit 7 in the
> > > > > extended capability register of the vIOMMU) in order to satisfy
> > > > > IOMMU support for SR-IOV.
> > > > > vIOMMU.ecap before    0xf00f5e
> > > > > vIOMMU.ecap after     0xf00fde
> > > > > But I see that the vIOMMU doesn't really support snoop control.
> > > > > Could this be the problem that fails the IOVA range check in
> > > > > vtd_iova_range_check()?
> > >
> > > > Sorry, I meant the SLPTE reserved non-zero check failure in
> > > > vtd_slpte_nonzero_rsvd(),
> > > > and NOT an IOVA range check failure (since the range check didn't fail).
> >
> > > Probably.  Currently VT-d emulation does not support snoop control,
> > > and if you only modify that ecap bit you will probably hit this
> > > problem: the guest kernel will then set the SNP bit in the IOMMU
> > > page table entries, which violates the reserved-bit check in the
> > > emulation code, and you see these errors.
> >
> > > Now talking about implementing Snoop Control for the Intel IOMMU for
> > > real (which corresponds to VT-d ecap bit 7) - I confess I'm not 100%
> > > clear on what "snooping" means and what we need to do as an
> > > emulator. Quoting from the spec:
> >
> >   "Snoop behavior for a memory access (to a translation structure
> >   entry or access to the mapped page) specifies if the access is
> >   coherent (snoops the processor caches) or not."
> >
> > > If it is only a capability showing whether the hardware is capable
> > > of snooping processor caches, then I don't think we need to do much
> > > here as an emulator of VT-d, simply because when we access the data
> > > we are still on the processor's side (we're emulating the IOMMU
> > > behavior only), so the cache should always be coherent from the POV
> > > of guest vCPUs, just like how the processors provide cache coherence
> > > between two cores (so IMHO the VT-d emulation code can run on one
> > > core/thread, and the vCPU which runs the guest IOMMU driver can run
> > > on another core/thread).  If so, maybe we can simply declare support
> > > for it, but we at least also need to remove the SNP bit from the
> > > vtd_paging_entry_rsvd_field[] array to reflect that we understand
> > > that bit.
> >
> > CCing Alex and Kevin to see whether I'm misunderstanding or in case of
> > any further input on the snooping support.
> >
> 
> For software DMA, yes, snoop is guaranteed since it's just CPU access.
> 
> However, for a VFIO device, i.e. hardware DMA, snoop should be reported
> based on the physical IOMMU capability. It's fine to report no snoop
> control on the vIOMMU (the current state) even when it's physically
> supported. The only consequence is that the L1 VMM must favor guest cache
> attributes instead of forcing WB in the L1 EPT when doing nested
> passthrough. However, it's incorrect to report snoop control on the
> vIOMMU when it's not physically supported; otherwise the L1 VMM may force
> WB in the L1 EPT and enable the snoop field in the vIOMMU 2nd-level PTE
> on the assumption that hardware snoop is guaranteed (when it isn't). Then
> it becomes a correctness issue.
> 

If my device is fully emulated, can I ignore the SNP bit in the SLPTE? What is 
the cost of ignoring it in such a case? What could go wrong?
(I tried to ignore it and it seems that translations work for me now).
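
For reference, the values quoted earlier in this thread can be decoded with a
few lines of Python. This is only a sketch: the bit positions (ecap bit 7 for
Snoop Control, bit 11 of a second-level leaf PTE for SNP) are taken from the
VT-d spec, while the constant and function names are made up for illustration
and are not QEMU identifiers.

```python
# Bit positions as described in the Intel VT-d specification.
# Names below are hypothetical, chosen for this sketch only.
ECAP_SC = 1 << 7      # Extended Capability Register: Snoop Control
SLPTE_SNP = 1 << 11   # second-level leaf PTE: Snoop Behavior


def ecap_with_snoop_control(ecap: int) -> int:
    """Return ecap with the Snoop Control capability bit set."""
    return ecap | ECAP_SC


def slpte_has_snp(slpte: int) -> bool:
    """Return True if the SNP bit is set in a second-level PTE."""
    return bool(slpte & SLPTE_SNP)


def slpte_without_snp(slpte: int) -> int:
    """Return the PTE value with the SNP bit cleared (i.e. ignored)."""
    return slpte & ~SLPTE_SNP


print(hex(ecap_with_snoop_control(0xf00f5e)))  # 0xf00fde
print(slpte_has_snp(0x29f6008f7))              # True
print(hex(slpte_without_snp(0x29f6008f7)))     # 0x29f6000f7
```

So the "before" and "after" ecap values differ only in bit 7, and both SLPTE
values from the traces (0x2a54008f7 and 0x29f6008f7) carry bit 11, which is
consistent with the reserved-bit failure discussed above.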

> Things get a bit tricky with two VFIO devices that sit under two
> pIOMMUs, one supporting snoop and the other not. That leaves us two
> options:
> 
> 1). Create one vIOMMU without snoop control. This is the current state
> and is safe.
> 2). Create two vIOMMUs, one supporting snoop and the other not, then
> attach each VFIO device to the vIOMMU with the matching snoop capability.
> This matches the hardware topology but adds extra config burden and
> footprint.
> 
> Thanks,
> Kevin
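
The rule Kevin describes could be sketched roughly as follows. This is an
illustration of the policy only, not existing QEMU code: the function names
and the device names "vf0"/"vf1" are hypothetical, and the per-device snoop
flags are assumed to be known already.

```python
from collections import defaultdict


def may_report_snoop_control(piommu_supports_snoop):
    """Single-vIOMMU case: reporting snoop control is only correct if every
    physical IOMMU behind an assigned (VFIO) device supports it.

    An empty list means no hardware DMA at all; the result is then True,
    which assumes the "software DMA is always snooped" argument holds.
    """
    return all(piommu_supports_snoop)


def group_devices_by_snoop(devices):
    """Two-vIOMMU case: bucket devices by the snoop capability of their
    physical IOMMU, one bucket per vIOMMU."""
    groups = defaultdict(list)
    for name, supports_snoop in devices:
        groups[supports_snoop].append(name)
    return dict(groups)


# Hypothetical devices: (name, physical IOMMU supports snoop control)
devices = [("vf0", True), ("vf1", False)]
print(may_report_snoop_control([sc for _, sc in devices]))  # False
print(group_devices_by_snoop(devices))  # {True: ['vf0'], False: ['vf1']}
```

Not reporting snoop control stays safe in every case; the first function only
says when reporting it would not be incorrect.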
