Re: [Qemu-devel] [RFC PATCH qemu v3 4/4] vfio: spapr: Add SPAPR IOMMU v2


From: David Gibson
Subject: Re: [Qemu-devel] [RFC PATCH qemu v3 4/4] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)
Date: Fri, 17 Jul 2015 15:20:55 +1000
User-agent: Mutt/1.5.23 (2014-03-12)

On Thu, Jul 16, 2015 at 08:44:59AM -0600, Alex Williamson wrote:
> On Thu, 2015-07-16 at 15:11 +1000, David Gibson wrote:
> > On Tue, Jul 14, 2015 at 10:21:54PM +1000, Alexey Kardashevskiy wrote:
> > > This makes use of the new "memory registering" feature. The idea is
> > > to provide the userspace ability to notify the host kernel about pages
> > > which are going to be used for DMA. Having this information, the host
> > > kernel can pin them all once per user process, do locked pages
> > > accounting (once) and not spend time doing that in real time, with
> > > possible failures which cannot be handled nicely in some cases.
> > > 
> > > This adds a guest RAM memory listener which notifies a VFIO container
> > > about memory which needs to be pinned/unpinned. VFIO MMIO regions
> > > (i.e. "skip dump" regions) are skipped.
> > > 
> > > The feature is only enabled for SPAPR IOMMU v2. Host kernel changes
> > > are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this
> > > does not call it when v2 is detected and enabled.
> > > 
> > > This does not change the guest visible interface.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <address@hidden>
> > 
> > I've looked at this in more depth now, and attempting to unify the
> > pre-reg and mapping listeners like this can't work - they need to be
> > listening on different address spaces:  mapping actions need to be
> > listening on the PCI address space, whereas the pre-reg needs to be
> > listening on address_space_memory.  For x86 - for now - those end up
> > being the same thing, but on Power they're not.
> > 
> > We do need to be clear about what differences are due to the presence
> > of a guest IOMMU versus which are due to arch or underlying IOMMU
> > type.  For now Power has a guest IOMMU and x86 doesn't, but that could
> > well change in future: we could well implement the guest side IOMMU
> > for x86 in future (or x86 could invent a paravirt IOMMU interface).
> > On the other side, BenH's experimental powernv machine type could
> > introduce Power machines without a guest side IOMMU (or at least an
> > optional guest side IOMMU).
> > 
> > The quick and dirty approach here is:
> >    1. Leave the main listener as is
> >    2. Add a new pre-reg notifier to the spapr iommu specific code,
> >       which listens on address_space_memory, *not* the PCI space
> 
> It is dirty and that's exactly what I've been advising Alexey against
> because we have entirely too much dirty spapr specific code that doesn't
> need to be spapr specific.  I don't see why the separate address space
> matters; that's done at the point of registering the listener and so far
> doesn't play much of a role in the actual listener behavior, just which
> regions it sees.

Well, there's two parts to this - the different address spaces means
they need to be different listener instances.  They also need
different callbacks - or at least parameterized callback behaviour
because they do different things (one maps, the other preregs).

So I really don't see any sense in which these can be accomplished by
the same listener.  *Maybe* they could share some region walking code,
but I'm not sure there's going to be anything of significant size here.

> > The more generally correct approach, which allows for more complex
> > IOMMU arrangements and the possibility of new IOMMU types with pre-reg
> > is:
> >    1. Have the core implement both a mapping listener and a pre-reg
> >       listener (optionally enabled by a per-iommu-type flag).
> >       Basically the first one sees what *is* mapped, the second sees
> >       what *could* be mapped.
> 
> This just sounds like different address spaces, address_space_memory vs
> address_space_physical_memory

Um.. what?  I'm not even sure what you mean by
address_space_physical_memory (I see no such thing in the source).

The idea was that the (top level) pre-reg listener would spawn more
listeners for any AS which could get (partially) mapped into the PCI
address space.

But.. I looked a bit closer and realised this scheme doesn't actually
work.  IOMMU memory regions don't actually have a fixed target AS
property (by which I mean the address space the IOMMU translates
*into* rather than translates from - address_space_memory in most
cases).  Instead any individual IOMMU mapping can point to a different
AS supplied in the IOMMUTLB structure.

> >    2. As now, the mapping listener listens on PCI address space, if
> >       RAM blocks are added, immediately map them into the host IOMMU,
> >       if guest IOMMU blocks appear register a notifier which will
> >       mirror guest IOMMU mappings to the host IOMMU (this is what we
> >       do now).
> 
> Right, this is done now, nothing new required.

Yes, I was just spelling that out for comparison with the other part.

> >    3. The pre-reg listener also listens on the PCI address space.  RAM
> >       blocks added are pre-registered immediately.  But, if guest
> >       IOMMU blocks are added, instead of registering a guest-iommu
> >       notifier, we register another listener on the *target* AS of the
> >       guest IOMMU, same callbacks as this one.  In practice that
> >       target AS will almost always resolve to address_space_memory,
> >       but this can at least in theory handle crazy guest setups with
> >       multiple layers of IOMMU.
> > 
> >    4. Have to ensure that the pre-reg callbacks always happen before
> >       the mapping calls.  For a system with an IOMMU backend which
> >       requires pre-registration, but doesn't have a guest IOMMU, we
> >       need to pre-reg, then host-iommu-map RAM blocks that appear in
> >       PCI address space.

> Both of these just seem like details of the iommu-type (pre-reg)
> specific parts of the listener, I'm not understanding what's
> fundamentally different about it that we can't do it now, within a
> single listener, nor do I see how it's all that different from x86.

So, we have two different address spaces to listen on (PCI and
address_space_memory), so we need two different listener instances
registered, clearly.

Those two different listeners need to do different things.  1) maps
memory blocks into VFIO and sets up a notifier to mirror guest IOMMU
mappings into VFIO.  2) preregisters memory blocks, and it would be a
bug if it ever saw a guest IOMMU.

So, what's left to share?

> x86
> mappings are more dynamic, but that's largely in legacy space for x86
> due to some bizarre compatibility things.  x86 also tries to map mmio
> regions for p2p, but these are just minor details to mappings that are
> largely the same.  Thanks,
> 
> Alex
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
