From: David Gibson
Subject: Re: [Qemu-devel] [RESEND PATCH 2/6] memory: introduce AddressSpaceOps and IOMMUObject
Date: Mon, 18 Dec 2017 16:41:32 +1100
User-agent: Mutt/1.9.1 (2017-09-22)

Sorry I've taken so long to reply; I've been super busy with other
things.

On Tue, Nov 14, 2017 at 11:31:00AM +0800, Peter Xu wrote:
> On Tue, Nov 14, 2017 at 11:59:34AM +1100, David Gibson wrote:
> > On Mon, Nov 13, 2017 at 04:28:45PM +0800, Peter Xu wrote:
> > > On Mon, Nov 13, 2017 at 04:56:01PM +1100, David Gibson wrote:
> > > > On Fri, Nov 03, 2017 at 08:01:52PM +0800, Liu, Yi L wrote:
> > > > > From: Peter Xu <address@hidden>
> > > > > 
> > > > > AddressSpaceOps is similar to MemoryRegionOps; it's just for
> > > > > address spaces to store arch-specific hooks.
> > > > > 
> > > > > The first hook I would like to introduce is iommu_get(), which
> > > > > returns the IOMMUObject behind the AddressSpace.
> > > > > 
> > > > > For systems that have IOMMUs, we will create a special address
> > > > > space per device, different from the system default address
> > > > > space (please refer to pci_device_iommu_address_space()).
> > > > > Normally when that happens, there will be one specific IOMMU
> > > > > (or say, translation unit) standing right behind that new
> > > > > address space.
> > > > > 
> > > > > This iommu_get() fetches that unit behind the address space.
> > > > > Here, the unit is defined as an IOMMUObject, which includes a
> > > > > notifier_list so far and may be extended in the future.  Along
> > > > > with IOMMUObject, a new IOMMU notifier mechanism is introduced;
> > > > > it would be used for virt-svm.  An IOMMUObject can further have
> > > > > an IOMMUObjectOps, which is similar to MemoryRegionOps.  The
> > > > > difference is that IOMMUObjectOps does not rely on MemoryRegion.
> > > > > 
> > > > > Signed-off-by: Peter Xu <address@hidden>
> > > > > Signed-off-by: Liu, Yi L <address@hidden>
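
(For anyone reading this without the patch handy, my reading of the
description above amounts to roughly the sketch below.  This is a
paraphrase, not the patch itself; the exact field and type names are
approximate.)

    /* Paraphrase of the proposed hooks -- illustrative only */
    typedef struct AddressSpace AddressSpace;    /* QEMU's existing type */
    typedef struct IOMMUNotifier IOMMUNotifier;  /* the new notifier type */

    typedef struct IOMMUObject {
        /* only a notifier list so far; may be extended later */
        IOMMUNotifier *notifier_list;
    } IOMMUObject;

    typedef struct AddressSpaceOps {
        /* first hook: return the translation unit behind an AS, if any */
        IOMMUObject *(*iommu_get)(AddressSpace *as);
    } AddressSpaceOps;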
> > > > 
> > > > Hi, sorry I didn't reply to the earlier postings of this after our
> > > > discussion in China.  I've been sick several times and very busy.
> > > > 
> > > > I still don't feel like there's an adequate explanation of exactly
> > > > what an IOMMUObject represents.   Obviously it can represent more than
> > > > a single translation window - since that's represented by the
> > > > IOMMUMR.  But what exactly do all the MRs - or whatever else - that
> > > > are represented by the IOMMUObject have in common, from a functional
> > > > point of view?
> > > > 
> > > > Even understanding the SVM stuff better than I did, I don't really see
> > > > why an AddressSpace is an obvious unit to have an IOMMUObject
> > > > associated with it.
> > > 
> > > Here's what I thought about it: IOMMUObject was planned to be the
> > > abstraction of the hardware translation unit, which sits at a higher
> > > level than the translated address spaces.  Say, each PCI device can
> > > have its own translated address space.  However, multiple PCI devices
> > > can share the same translation unit, which handles the translation
> > > requests from the different devices.  That's the case on Intel
> > > platforms.  We introduced this IOMMUObject because sometimes we want
> > > to do something with that translation unit rather than with a
> > > specific device, for which we need a general IOMMU device handle.
> > 
> > Ok, but what does "hardware translation unit" mean in practice?  The
> > guest neither knows nor cares which bits of IOMMU translation happen
> > to be included in the same bundle of silicon.  It only cares what the
> > behaviour is.  What behavioural characteristics does a single
> > IOMMUObject have?
> 
> In VT-d (and I believe the same is true for ARM SMMUs), IMHO the
> special thing is that the translation windows (and device address
> spaces in QEMU) only cover second level translations, not first
> level, while virt-svm needs to play with first level translations.
> Until now, AFAIU we don't really have a good interface for first
> level translations at all (i.e. the process address space).

Ok, that explains why you need some kind of different model than the
existing one, but that wasn't really disputed.  It still doesn't say
why separating the unclearly defined IOMMUObject from the IOMMU MR is
a good way of modelling this.

If the issue is 1st vs. 2nd level translation, it really seems you
should be explicitly modelling the different 1st level vs. 2nd level
address spaces, rather than just splitting functions between the MR
and AS level with no clear rationale behind what goes where.
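
To make that concrete - and this is purely a sketch of the kind of
split I mean, with invented names, not a worked-out proposal:

    /* Illustrative only: model the two levels as explicit address
     * spaces, rather than hanging extra ops off the existing
     * per-device PCI AS.  Names are invented. */
    typedef struct AddressSpace AddressSpace;

    #define SVM_MAX_PASID 16   /* arbitrary, just for the sketch */

    typedef struct SVMDeviceView {
        AddressSpace *l2_as;                 /* 2nd level: GPA -> HPA, static */
        AddressSpace *l1_as[SVM_MAX_PASID];  /* 1st level: GVA -> GPA, per process */
    } SVMDeviceView;

Then the first level configuration naturally attaches to the l1
spaces, rather than to the one per-device PCI address space.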

> > > IIRC one issue left over from last time's discussion was that there
> > > could be more complicated IOMMU models.  E.g., one device's DMA
> > > request can be translated by two or more IOMMUs in a nested fashion,
> > > and the current proposal cannot really handle that complicated
> > > hierarchy.  I'm just thinking whether we can start from a simple
> > > model (say, we don't allow nested IOMMUs, and actually we don't even
> > > allow multiple IOMMUs so far), and then evolve from that point in
> > > the future.
> > > 
> > > Also, I thought there was something you mentioned about this approach
> > > not being correct for Power systems, but I can't really remember the
> > > details...  Anyway, I think this is not the only approach to solving
> > > the problem, and I believe any better idea would be greatly welcome
> > > as well. :)
> > 
> > So, some of my initial comments were based on a misunderstanding of
> > what was proposed here - since discussing this with Yi at LinuxCon
> > Beijing, I have a better idea of what's going on.
> > 
> > On POWER - or rather the "pseries" platform, which is paravirtualized -
> > we can have multiple vIOMMU windows (usually 2) for a single virtual
> > PCI host bridge.  Because of the paravirtualization, the mapping to
> > hardware is fuzzy, but for passthrough devices they will both be
> > implemented by the IOMMU built into the physical host bridge.  That
> > isn't important to the guest, though - all operations happen at the
> > window level.
> 
> Now I know that Power may not have anything like a "translation
> unit", since everything is defined as "translation windows" in the
> guest.  However, the problem still exists for some other platforms.
> Say, for Intel we have emulated VT-d; for ARM, we have vSMMU.  AFAIU
> these platforms do have their translation units, and even ARM would
> need such an interface (or any better interface, which is always
> welcome) for virt-svm to work.  Otherwise I don't know of a way to
> configure the first level translation tables.
> 
> Meanwhile, IMO this abstraction should not really affect pseries - it
> should only be useful to those platforms that want to use it.  For
> pseries, we can just ignore the new interface if we don't really even
> have such a translation unit.

But it's _still_ not clear to me what a "translation unit" means.
What is common within a single translation unit that is not common
across different translation units?

> 
> > 
> > The other thing that bothers me here is the way it's attached to an
> > AddressSpace.  IIUC how SVM works, the whole point is that the device
> > no longer writes into a specific PCI address space.  Instead, it
> > writes directly into a process address space.  So it seems to me more
> > that SVM should operate at the PCI level, and disassociate the device
> > from the normal PCI address space entirely, rather than hooking up
> > something via that address space.
> 
> IMO the PCI address space is still used.  For virt-svm, the host IOMMU
> will be working in nested translation mode, so we will have two
> mappings working in parallel:
> 
>   1. DPDK process (in guest) address space mapping (GVA -> GPA)
>   2. guest direct memory mappings (GPA -> HPA)
> 
> And here AFAIU the 2nd mapping works exactly as it does for general
> PCI devices; the only difference is that the 2nd level mapping is
> always static, just as when IOMMU passthrough is enabled for that
> device.

Ok.  IIUC the 2nd level mapping isn't visible to the guest, is that
right?

But this doesn't really affect my point - from the guest's point of
view, the "usual" address space for a PCI device is the GPA space
(i.e. no guest visible translation), but SVM devices instead use an
SVM address space selected by the process id, which is not the same
as the GPA space.
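
To spell out the picture I have in mind (illustrative only, nothing
here is real code; the helper names are invented):

    #include <stdint.h>

    typedef uint64_t gva_t, gpa_t, hpa_t;

    /* Hypothetical helpers, named only for illustration */
    gpa_t stage1_translate(uint32_t pasid, gva_t va);  /* GVA -> GPA, per process */
    hpa_t stage2_translate(gpa_t gpa);                 /* GPA -> HPA, static */

    /* An SVM DMA goes through both stages, selected by process id */
    hpa_t svm_dma_translate(uint32_t pasid, gva_t va)
    {
        return stage2_translate(stage1_translate(pasid, va));
    }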

> So, IMHO virt-SVM is not really in parallel with the PCI subsystem.

Well, it's not clear to me if it's in parallel to, or on top of.  But
the crucial point is that an SVM device does not access the "main" PCI
address space (whether or not that has a traditional IOMMU).  Instead
it has access to a bunch of different address spaces, one for each
process ID.
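
In other words, something shaped like this (the second prototype is an
invented name, just to show the shape; only the first exists today):

    #include <stdint.h>

    typedef struct AddressSpace AddressSpace;
    typedef struct PCIDevice PCIDevice;

    /* Existing: one DMA address space per device */
    AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);

    /* What SVM really wants (invented): one per (device, process id) */
    AddressSpace *pci_device_svm_address_space(PCIDevice *dev, uint32_t pasid);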

> For
> SVM in the guest, things may be different, since it should only be
> using first level translations.  However, to implement virt-SVM, IMHO
> we not only need the existing PCI address space translation logic,
> but also an extra way to configure the first level mappings, as
> discussed.

Right.  I'm not disputing that a way to configure those first level
mappings is necessary.  The proposed model just doesn't seem like a
good fit.  It still only represents a single "PCI address space", then
attaches the process table operations to that, when in fact those
operations affect the process table and therefore *different* address
spaces from the one the methods are attached to.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
