Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

From:	Alex Williamson
Subject:	Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
Date:	Tue, 02 Apr 2013 15:32:04 -0600

On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
> On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
> > On Tue, 2013-04-02 at 17:32 +0000, Yoder Stuart-B08248 wrote:
> > > 2.   MSI window mappings
> > >
> > >    The more problematic question is how to deal with MSIs.  We need to
> > >    create mappings for up to 3 MSI banks that a device may need to target
> > >    to generate interrupts.  The Linux MSI driver can allocate MSIs from
> > >    the 3 banks any way it wants, and currently user space has no way of
> > >    knowing which bank may be used for a given device.
> > >
> > >    There are 3 options we have discussed and would like your direction:
> > >
> > >    A.  Implicit mappings -- with this approach user space would not
> > >        explicitly map MSIs.  User space would be required to set the
> > >        geometry so that there are 3 unused windows (the last 3 windows)
> > >        for MSIs, and it would be up to the kernel to create the mappings.
> > >        This approach requires some specific semantics (leaving 3 windows)
> > >        and it potentially gets a little weird-- when should the kernel
> > >        actually create the MSI mappings?  When should they be unmapped?
> > >        Some convention would need to be established.
> > 
> > VFIO would have control of SET/GET_ATTR, right?  So we could reduce the
> > number exposed to userspace on GET and transparently add MSI entries on
> > SET.
> 
> What do you mean by "reduce the number exposed"?  Userspace decides how
> many entries there are, but it must be a power of two between 1 and 256.

I didn't understand the API.
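
For illustration only, the constraint Scott states above (the number of
entries must be a power of two between 1 and 256) could be checked along
these lines.  The function name is made up for this sketch and is not part
of any VFIO or PAMU header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from any kernel header): returns true when
 * `count` is a power of two in the range 1..256 inclusive, which is the
 * subwindow-count rule quoted in the thread.
 */
static bool subwindow_count_is_valid(uint32_t count)
{
    if (count < 1 || count > 256)
        return false;
    /* A power of two has exactly one bit set. */
    return (count & (count - 1)) == 0;
}
```

Under this rule 1, 8 and 256 would be accepted, while 0, 3 and 512 would
be rejected.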

> > On x86 the interrupt remapper handles this transparently when MSI
> > is enabled and userspace never gets direct access to the device MSI
> > address/data registers.
> 
> x86 has a totally different mechanism here, as far as I understand --  
> even before you get into restrictions on mappings.

So what control will userspace have over programming the actual MSI
vectors on PAMU?

> > What kind of restrictions do you have around
> > adding and removing windows while the aperture is enabled?
> 
> Subwindows can be modified while the aperture is enabled, but the  
> aperture size and number of subwindows cannot be changed.
> 
> > >    B.  Explicit mapping using DMA map flags.  The idea is that a new
> > >        flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
> > >        a mapping is to be created for the supplied iova.  No vaddr
> > >        is given though.  So in the above example there would be
> > >        a dma map at 0x10000000 for 24KB (and no vaddr).   It's
> > >        up to the kernel to determine which bank gets mapped where.
> > >        So, this option puts user space in control of which windows
> > >        are used for MSIs and when MSIs are mapped/unmapped.   There
> > >        would need to be some semantics as to how this is used-- it
> > >        only makes sense
> > 
> > This could also be done as another "type2" ioctl extension.
> 
> Again, what is "type2", specifically?  If someone else is adding their  
> own IOMMU that is kind of, sort of like PAMU, how would they know if  
> it's close enough?  What assumptions can a user make when they see that  
> they're dealing with "type2"?

Naming always has been and always will be a problem.  I assume this is named
type2 rather than PAMU because it's trying to expose a generic windowed
IOMMU fitting the IOMMU API.  Like type1, it doesn't really make sense
to name it "IOMMU API" because that's a kernel internal interface and
we're designing a userspace interface that just happens to use that.
Tagging it to a piece of hardware makes it less reusable.  Type1 is
arbitrary.  It might as well be named "brown" and this one can be
"blue".
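
As a rough sketch of what option B quoted above might look like from
userspace: the request carries an iova and a size but no vaddr, and a flag
marks it as an MSI mapping.  Everything here is an assumption for
illustration.  The struct is a local stand-in whose fields are modelled on
the existing type1 DMA map request, the flag's bit value is invented (the
proposal only names VFIO_DMA_MAP_FLAG_MSI), and no ioctl is issued.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-in for a DMA map request; NOT a kernel header definition. */
struct example_dma_map {
    uint32_t argsz;  /* size of this structure */
    uint32_t flags;  /* request flags */
    uint64_t vaddr;  /* process virtual address; unused for MSI mappings */
    uint64_t iova;   /* I/O virtual address chosen by userspace */
    uint64_t size;   /* length of the mapping in bytes */
};

/* Hypothetical bit value; the thread proposes the flag name only. */
#define EXAMPLE_DMA_MAP_FLAG_MSI (1u << 2)

/* Build an option-B style request: iova and size given, vaddr left at 0. */
static struct example_dma_map make_msi_map_request(uint64_t iova, uint64_t size)
{
    struct example_dma_map req;

    memset(&req, 0, sizeof(req));
    req.argsz = sizeof(req);
    req.flags = EXAMPLE_DMA_MAP_FLAG_MSI;
    req.iova = iova;
    req.size = size;
    return req;
}
```

The example from the proposal would then be
make_msi_map_request(0x10000000, 24 * 1024), with the kernel left to decide
which bank lands where inside that range.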

> > What's the value to userspace in determining which windows are used  
> > by which banks?
> 
> That depends on who programs the MSI config space address.  What is  
> important is userspace controlling which iovas will be dedicated to  
> this, in case it wants to put something else there.

So userspace is programming the MSI vectors, targeting a user programmed
iova?  But an iova selects a window and I thought there were some number
of MSI banks and we don't really know which ones we'll need...  still
confused.
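
To make the "an iova selects a window" point concrete, here is a minimal
sketch.  It assumes the aperture is split into equally sized subwindows,
which is my reading of the discussion rather than something stated in this
thread, and it uses the option A convention of reserving the last 3
subwindows for MSI banks.  All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical description of an aperture split into equal subwindows. */
struct example_aperture {
    uint64_t base;           /* first iova covered by the aperture */
    uint64_t size;           /* aperture length in bytes */
    uint32_t nr_subwindows;  /* number of subwindows */
};

/* Return the subwindow index an iova falls into, or -1 if it is outside. */
static int iova_to_subwindow(const struct example_aperture *ap, uint64_t iova)
{
    uint64_t sub_size, idx;

    if (ap->nr_subwindows == 0 || iova < ap->base ||
        iova - ap->base >= ap->size)
        return -1;

    sub_size = ap->size / ap->nr_subwindows;
    if (sub_size == 0)
        return -1;

    idx = (iova - ap->base) / sub_size;
    /* Guard against a size that is not a multiple of the subwindow count. */
    return idx < ap->nr_subwindows ? (int)idx : -1;
}

/* Option A convention: the last 3 subwindows are left for MSI banks. */
static int subwindow_is_msi_reserved(const struct example_aperture *ap, int idx)
{
    int n = (int)ap->nr_subwindows;

    return n >= 3 && idx >= n - 3 && idx < n;
}
```

With a 256MB aperture at base 0 and 8 subwindows, each subwindow covers
32MB, so iova 0x2000000 falls in subwindow 1, and subwindows 5 through 7
would be the ones set aside for MSI under option A.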

> > It sounds like the case that there are X banks and if userspace wants to
> > use MSI it needs to leave X windows available for that.  Is this just
> > buying userspace a few more windows to allow them the choice between MSI
> > or RAM?
> 
> Well, there could be that.  But also, userspace will generally have a  
> much better idea of the type of mappings it's creating, so it's easier  
> to keep everything explicit at the kernel/user interface than require  
> more complicated code in the kernel to figure things out automatically  
> (not just for MSIs but in general).
> 
> If the kernel automatically creates the MSI mappings, when does it  
> assume that userspace is done creating its own?  What if userspace  
> doesn't need any DMA other than the MSIs?  What if userspace wants to  
> continue dynamically modifying its other mappings?

Yep, valid arguments.

> > >    C.  Explicit mapping using normal DMA map.  The last idea is that
> > >        we would introduce a new ioctl to give user-space an fd to
> > >        the MSI bank, which could be mmapped.  The flow would be
> > >        something like this:
> > >           -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
> > >           -user space mmaps the fd, getting a vaddr
> > >           -user space does a normal DMA map for desired iova
> > >        This approach makes everything explicit, but adds a new ioctl
> > >        applicable most likely only to the PAMU (type2 iommu).
> > 
> > And the DMA_MAP of that mmap then allows userspace to select the window
> > used?  This one seems like a lot of overhead, adding a new ioctl, new
> > fd, mmap, special mapping path, etc.
> 
> There's going to be special stuff no matter what.  This would keep it  
> separated from the IOMMU map code.
> 
> I'm not sure what you mean by "overhead" here... the runtime overhead  
> of setting things up is not particularly relevant as long as it's  
> reasonable.  If you mean development and maintenance effort, keeping  
> things well separated should help.

Overhead in terms of code required and complexity.  More things to
reference count and shut down in the proper order on userspace exit.
Thanks,

Alex
