Re: [Qemu-devel] [RESEND PATCH 2/6] memory: introduce AddressSpaceOps an


From: David Gibson
Subject: Re: [Qemu-devel] [RESEND PATCH 2/6] memory: introduce AddressSpaceOps and IOMMUObject
Date: Mon, 18 Dec 2017 17:30:55 +1100
User-agent: Mutt/1.9.1 (2017-09-22)

On Tue, Nov 14, 2017 at 09:53:07AM +0100, Auger Eric wrote:
> Hi Yi L,
> 
> On 13/11/2017 10:58, Liu, Yi L wrote:
> > On Mon, Nov 13, 2017 at 04:56:01PM +1100, David Gibson wrote:
> >> On Fri, Nov 03, 2017 at 08:01:52PM +0800, Liu, Yi L wrote:
> >>> From: Peter Xu <address@hidden>
> >>>
> >>> AddressSpaceOps is similar to MemoryRegionOps, it's just for address
> >>> spaces to store arch-specific hooks.
> >>>
> >>> The first hook I would like to introduce is iommu_get(). Return an
> >>> IOMMUObject behind the AddressSpace.
> >>>
> >>> For systems that have IOMMUs, we will create a special address
> >>> space per device which is different from the system default address
> >>> space (please refer to pci_device_iommu_address_space()).
> >>> Normally when that happens, there will be one specific IOMMU (or
> >>> say, translation unit) standing right behind that new address space.
> >>>
> >>> This iommu_get() fetches the object behind the address space. Here,
> >>> that object is defined as IOMMUObject, which includes only a
> >>> notifier_list so far and may be extended in the future. Along with
> >>> IOMMUObject, a new iommu notifier mechanism is introduced. It would
> >>> be used for virt-svm. IOMMUObject can also have an IOMMUObjectOps,
> >>> which is similar to MemoryRegionOps. The difference is that
> >>> IOMMUObjectOps does not rely on MemoryRegion.
> >>>
> >>> Signed-off-by: Peter Xu <address@hidden>
> >>> Signed-off-by: Liu, Yi L <address@hidden>
> >>
> >> Hi, sorry I didn't reply to the earlier postings of this after our
> >> discussion in China.  I've been sick several times and very busy.
> > 
> > Hi David,
> > 
> > Fully understood. I'll try my best to address your questions. Also,
> > feel free to raise further questions; the more we discuss, the
> > better our work will be.
> > 
> >> I still don't feel like there's an adequate explanation of exactly
> >> what an IOMMUObject represents.   Obviously it can represent more than
> > 
> > IOMMUObject is meant to represent the IOMMU itself, e.g. the
> > IOMMU-specific operations. One of the key purposes of IOMMUObject is
> > to introduce a notifier framework that lets the IOMMU emulator
> > perform IOMMU operations other than MAP/UNMAP. As IOMMUs grow more
> > and more features, MAP/UNMAP is not the only operation the IOMMU
> > emulator needs to deal with, e.g. shared virtual memory. As far as I
> > know, AMD and ARM also have it; correct me if I'm wrong. As my cover
> > letter mentioned, the MR-based notifier framework doesn't work for
> > the newly added IOMMU operations, such as binding the guest PASID
> > table pointer to the host and propagating the guest's IOTLB flush to
> > the host.
> > 
> >> a single translation window - since that's represented by the
> >> IOMMUMR.  But what exactly do all the MRs - or whatever else - that
> >> are represented by the IOMMUObject have in common, from a functional
> >> point of view.
> > 
> > Let me take virt-SVM as an example. As far as I know, the virt-SVM
> > implementations of different vendors are similar. The key design
> > is nested translation (a.k.a. two-stage translation): the guest
> > maintains the gVA->gPA mapping and the hypervisor builds the gPA->hPA
> > mapping, similar to the EPT-based virt-MMU solution.
> > 
> > In QEMU, the gPA->hPA mapping is done through the MAP/UNMAP notifier,
> > and that can stay as it is. But only the guest knows the gVA->gPA
> > mapping, so the hypervisor needs to trap the specific guest IOMMU
> > operation and pass the gVA->gPA mapping knowledge to the host through
> > a newly added notifier. In VT-d, this is called binding the guest
> > PASID table to the host.
> 
> What I don't get is that the PASID table is per extended context entry.
> I understand the latter is indexed by PCI device function. And today MRs
> are created per PCIe device, if I am not wrong. So why can't we have one
> new MR notifier dedicated to PASID table passing? My understanding is
> that the MR, having a 1-1 correspondence with a PCIe device and thus a
> context, could be of the right granularity.

Not really.  The MR(s) and AS are created per group of devices that
will always see the same mappings.  On Intel that's the IOMMU domain.
On PAPR that's a partitionable endpoint (PE) - except that we choose to
only have one PE per guest host bridge (but multiple host bridges are
standard for POWER).

There's a qemu hook to get the right AS for a device, which takes the
devfn as a parameter.  Depending on the host bridge implementation,
though, it won't necessarily return a different AS for every device.
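
To illustrate the point about the hook, here is a minimal standalone
sketch.  It is a toy model, not QEMU code: every name in it
(ToyAddressSpace, ToyIommuHook, toy_per_device_hook, toy_shared_hook)
is hypothetical and only mirrors the idea of "a hook that takes a devfn
and returns an AS".  One bridge model hands out an AS per devfn, the
other returns the same AS for every device behind it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space; not QEMU's AddressSpace. */
typedef struct ToyAddressSpace {
    const char *name;
} ToyAddressSpace;

/* Hypothetical hook type: (bridge state, devfn) -> address space. */
typedef ToyAddressSpace *(*ToyIommuHook)(void *opaque, int devfn);

#define TOY_DEVFN_MAX 256

/* Bridge model A: a separate address space for each devfn. */
typedef struct ToyPerDeviceBridge {
    ToyAddressSpace as[TOY_DEVFN_MAX];
} ToyPerDeviceBridge;

static ToyAddressSpace *toy_per_device_hook(void *opaque, int devfn)
{
    ToyPerDeviceBridge *bridge = opaque;

    assert(devfn >= 0 && devfn < TOY_DEVFN_MAX);
    return &bridge->as[devfn];
}

/* Bridge model B: one address space shared by all devices behind it. */
typedef struct ToySharedBridge {
    ToyAddressSpace as;
} ToySharedBridge;

static ToyAddressSpace *toy_shared_hook(void *opaque, int devfn)
{
    ToySharedBridge *bridge = opaque;

    assert(devfn >= 0 && devfn < TOY_DEVFN_MAX);
    /* devfn is deliberately ignored: every device sees the same AS. */
    return &bridge->as;
}
```

With the shared model, two different devfns get the same pointer back,
so anything keyed on the AS (or on MRs inside it) cannot tell those
devices apart.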

> Then I understand the only flags
> we currently have are NONE, MAP and UNMAP, but couldn't we add a new one
> for PASID table passing? So it is not crystal clear to me why MR
> notifiers are not suited to PASID table passing.

Right, it isn't clear to me either.  Things get more complicated if
both the 1st level (per PASID) and 2nd level (per PCI RID) translations
are visible to the guest.  Having the 1st level owned by the guest and
the 2nd level owned by the host is the typical mode of operation, but
if we want to model bare metal machines we do need to handle the case
where both are visible to the guest.  Similarly, implementing virt-SVM
can't go and break our modelling of "traditional" non-PASID aware
IOMMUs.  Those are not usually present in x86 guests, although they can
be, and they are *always* present for PAPR guests.
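
As a rough illustration of the two levels, here is a standalone toy
model.  It is not taken from QEMU or from any IOMMU specification: the
flat page tables, ToyStage and toy_nested_translate() are all
hypothetical.  The 1st level stands for the per-PASID gVA->gPA table,
the 2nd level for the per-RID gPA->hPA table, and passing NULL for the
1st level models a "traditional" non-PASID aware IOMMU that only has
the 2nd level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGES      16
#define TOY_INVALID    UINT64_MAX

/* Hypothetical one-level table: input page number -> output page number. */
typedef struct ToyStage {
    uint64_t pfn[TOY_PAGES];
} ToyStage;

static void toy_stage_init(ToyStage *s)
{
    for (size_t i = 0; i < TOY_PAGES; i++) {
        s->pfn[i] = TOY_INVALID;   /* nothing mapped yet */
    }
}

/* Translate one address through a single stage; false means a fault. */
static bool toy_stage_translate(const ToyStage *s, uint64_t in, uint64_t *out)
{
    uint64_t page = in >> TOY_PAGE_SHIFT;
    uint64_t offset = in & ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1);

    if (page >= TOY_PAGES || s->pfn[page] == TOY_INVALID) {
        return false;
    }
    *out = (s->pfn[page] << TOY_PAGE_SHIFT) | offset;
    return true;
}

/*
 * Nested translation: the 1st level (per PASID) runs first if present,
 * then the 2nd level (per RID).  With stage1 == NULL the input address
 * is fed to the 2nd level directly, as for a non-PASID aware IOMMU.
 */
static bool toy_nested_translate(const ToyStage *stage1,
                                 const ToyStage *stage2,
                                 uint64_t in, uint64_t *hpa)
{
    uint64_t gpa = in;

    if (stage1 && !toy_stage_translate(stage1, in, &gpa)) {
        return false;
    }
    return toy_stage_translate(stage2, gpa, hpa);
}
```

The model says nothing about who owns which table; that ownership
question is exactly what differs between the typical virt-SVM setup and
the bare metal case.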

> > Also, only the guest knows about gVA IOTLB flushes, so the hypervisor
> > needs to propagate them to the host. MAP/UNMAP is not suitable here,
> > since a gVA IOTLB flush doesn't require modifying the host IOMMU
> > translation table.
> I don't really get this argument. IOMMUNotifier is just a notifier that
> is attached to an IOMMU MR and calls an IOMMUNotify function, right?
> Then the role of the function is currently tied to the existing
> flags, MAP and UNMAP. This is not linked to an action on the
> physical IOMMU, right?

Maybe, maybe not.  In the case of emulated devices, it need not touch
the host IOMMU.  However, for the case of VFIO devices, we need to
mirror mappings in the guest IOMMU to the host IOMMU.
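
A small standalone sketch of that difference, under loud assumptions:
this is a toy model, not QEMU's IOMMUNotifier API, and every name here
(ToyNotifier, ToyEvent, ToyHostIommu, toy_vfio_like_notify,
toy_emulated_notify) is hypothetical.  Both consumers hang off the same
notifier list; only the VFIO-like one turns an event into a change in a
(simulated) host table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_MAP = 1, TOY_UNMAP = 2 };   /* hypothetical event flags */

typedef struct ToyEvent {
    unsigned type;     /* TOY_MAP or TOY_UNMAP */
    uint64_t iova;     /* address as seen by the device */
    uint64_t addr;     /* translated address (unused for UNMAP) */
} ToyEvent;

typedef struct ToyNotifier ToyNotifier;
struct ToyNotifier {
    unsigned flags;    /* which event types this consumer wants */
    void (*notify)(ToyNotifier *n, const ToyEvent *ev);
    void *opaque;
};

#define TOY_HOST_SLOTS 8

/* Simulated host IOMMU table that a VFIO-like consumer keeps in sync. */
typedef struct ToyHostIommu {
    int used[TOY_HOST_SLOTS];
    uint64_t iova[TOY_HOST_SLOTS];
    uint64_t addr[TOY_HOST_SLOTS];
} ToyHostIommu;

/* Return the slot holding iova, or -1 if it is not mapped. */
static int toy_host_find(const ToyHostIommu *h, uint64_t iova)
{
    for (int i = 0; i < TOY_HOST_SLOTS; i++) {
        if (h->used[i] && h->iova[i] == iova) {
            return i;
        }
    }
    return -1;
}

/* VFIO-like consumer: mirror guest MAP/UNMAP into the host table. */
static void toy_vfio_like_notify(ToyNotifier *n, const ToyEvent *ev)
{
    ToyHostIommu *h = n->opaque;
    int slot = toy_host_find(h, ev->iova);

    if (ev->type == TOY_UNMAP) {
        if (slot >= 0) {
            h->used[slot] = 0;
        }
        return;
    }
    /* MAP: reuse the existing slot, else take the first free one. */
    for (int i = 0; slot < 0 && i < TOY_HOST_SLOTS; i++) {
        if (!h->used[i]) {
            slot = i;
        }
    }
    assert(slot >= 0);
    h->used[slot] = 1;
    h->iova[slot] = ev->iova;
    h->addr[slot] = ev->addr;
}

/* Emulated-device consumer: only local bookkeeping, no host table. */
static void toy_emulated_notify(ToyNotifier *n, const ToyEvent *ev)
{
    int *invalidations = n->opaque;

    (void)ev;
    (*invalidations)++;    /* e.g. drop a cached translation */
}

/* Deliver an event to every consumer whose flags select it. */
static void toy_notify_all(ToyNotifier *list, size_t len, const ToyEvent *ev)
{
    for (size_t i = 0; i < len; i++) {
        if (list[i].flags & ev->type) {
            list[i].notify(&list[i], ev);
        }
    }
}
```

So whether a notification results in an action on the physical IOMMU
depends on who registered for it, not on the notifier mechanism itself.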

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson


