From: Alex Williamson
Subject: Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
Date: Tue, 02 Apr 2013 15:32:04 -0600
On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
> On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
> > On Tue, 2013-04-02 at 17:32 +0000, Yoder Stuart-B08248 wrote:
> > > 2. MSI window mappings
> > >
> > > The more problematic question is how to deal with MSIs. We need to
> > > create mappings for up to 3 MSI banks that a device may need to target
> > > to generate interrupts. The Linux MSI driver can allocate MSIs from
> > > the 3 banks any way it wants, and currently user space has no way of
> > > knowing which bank may be used for a given device.
> > >
> > > There are 3 options we have discussed and would like your direction:
> > >
> > > A. Implicit mappings -- with this approach user space would not
> > > explicitly map MSIs. User space would be required to set the
> > > geometry so that there are 3 unused windows (the last 3 windows)
> > > for MSIs, and it would be up to the kernel to create the mappings.
> > > This approach requires some specific semantics (leaving 3 windows)
> > > and it potentially gets a little weird -- when should the kernel
> > > actually create the MSI mappings? When should they be unmapped?
> > > Some convention would need to be established.
> >
> > VFIO would have control of SET/GET_ATTR, right? So we could reduce the
> > number exposed to userspace on GET and transparently add MSI entries on
> > SET.
>
> What do you mean by "reduce the number exposed"? Userspace decides how
> many entries there are, but it must be a power of two between 1 and 256.
I didn't understand the API.
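To make option A concrete: userspace picks the window count, which must be a power of two between 1 and 256, and option A reserves the last 3 windows for the kernel's MSI mappings. The following is a minimal C sketch of that bookkeeping; the helper names and the MSI_BANKS constant are hypothetical and not part of any VFIO interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: number of MSI banks a device may need to target. */
#define MSI_BANKS 3u

/* The window count must be a power of two between 1 and 256. */
static bool valid_window_count(uint32_t n)
{
    return n >= 1 && n <= 256 && (n & (n - 1)) == 0;
}

/* Option A: the last MSI_BANKS windows belong to the kernel.
 * Returns how many windows remain for userspace DMA, or -1 if the
 * geometry is invalid or too small to hold the MSI windows. */
static int user_windows(uint32_t n)
{
    if (!valid_window_count(n) || n < MSI_BANKS)
        return -1;
    return (int)(n - MSI_BANKS);
}

/* True if window idx is one of the trailing windows reserved for MSIs.
 * Precondition: n is a valid window count. */
static bool is_msi_window(uint32_t n, uint32_t idx)
{
    assert(valid_window_count(n));
    return n >= MSI_BANKS && idx < n && idx >= n - MSI_BANKS;
}
```

With 8 windows, userspace keeps windows 0 to 4 and windows 5, 6 and 7 are left for the kernel; with fewer than 4 windows there is no room for the MSI banks at all.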
> > On x86 the interrupt remapper handles this transparently when MSI
> > is enabled and userspace never gets direct access to the device MSI
> > address/data registers.
>
> x86 has a totally different mechanism here, as far as I understand --
> even before you get into restrictions on mappings.
So what control will userspace have over programming the actual MSI
vectors on PAMU?
> > What kind of restrictions do you have around
> > adding and removing windows while the aperture is enabled?
>
> Subwindows can be modified while the aperture is enabled, but the
> aperture size and number of subwindows cannot be changed.
>
> > > B. Explicit mapping using DMA map flags. The idea is that a new
> > > flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
> > > a mapping is to be created for the supplied iova. No vaddr
> > > is given though. So in the above example there would be a
> > > dma map at 0x10000000 for 24KB (and no vaddr). It's
> > > up to the kernel to determine which bank gets mapped where.
> > > So, this option puts user space in control of which windows
> > > are used for MSIs and when MSIs are mapped/unmapped. There
> > > would need to be some semantics as to how this is used-- it
> > > only makes sense
> >
> > This could also be done as another "type2" ioctl extension.
>
> Again, what is "type2", specifically? If someone else is adding their
> own IOMMU that is kind of, sort of like PAMU, how would they know if
> it's close enough? What assumptions can a user make when they see that
> they're dealing with "type2"?
Naming always has been and always will be a problem. I assume this is named
type2 rather than PAMU because it's trying to expose a generic windowed
IOMMU fitting the IOMMU API. Like type1, it doesn't really make sense
to name it "IOMMU API" because that's a kernel internal interface and
we're designing a userspace interface that just happens to use that.
Tagging it to a piece of hardware makes it less reusable. Type1 is
arbitrary. It might as well be named "brown" and this one can be
"blue".
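For option B, the request would look like an ordinary DMA map with a new flag and no vaddr. The sketch below copies the field layout of VFIO's type1 DMA map argument (argsz, flags, vaddr, iova, size) into a local struct; the MSI flag bit, the rule that a mapping must cover whole subwindows, and the 8 KiB subwindow size in the example are assumptions for illustration, not the proposed ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the type1 DMA map field layout; not the kernel header. */
struct dma_map_req {
    uint32_t argsz;
    uint32_t flags;
    uint64_t vaddr;
    uint64_t iova;
    uint64_t size;
};

#define DMA_MAP_FLAG_READ  (1u << 0)
#define DMA_MAP_FLAG_WRITE (1u << 1)
/* Hypothetical bit standing in for the proposed VFIO_DMA_MAP_FLAG_MSI. */
#define DMA_MAP_FLAG_MSI   (1u << 2)

/* Check a map request against a subwindow size (a power of two).
 * Returns the number of subwindows it covers, or -1 if it is invalid. */
static long check_dma_map(const struct dma_map_req *req, uint64_t win_size)
{
    assert(win_size != 0 && (win_size & (win_size - 1)) == 0);

    if (req->argsz < sizeof(*req) || req->size == 0)
        return -1;
    /* Assumption: mappings must start and end on subwindow boundaries. */
    if ((req->iova | req->size) & (win_size - 1))
        return -1;
    /* An MSI mapping carries no vaddr; the kernel would choose which
     * bank backs which subwindow inside [iova, iova + size). */
    if ((req->flags & DMA_MAP_FLAG_MSI) && req->vaddr != 0)
        return -1;
    return (long)(req->size / win_size);
}
```

With an assumed 8 KiB subwindow, the 24KB MSI request at 0x10000000 is accepted and covers three subwindows; the same request with a non-zero vaddr, or with an iova that is not subwindow aligned, is rejected.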
> > What's the value to userspace in determining which windows are used
> > by which banks?
>
> That depends on who programs the MSI config space address. What is
> important is userspace controlling which iovas will be dedicated to
> this, in case it wants to put something else there.
So userspace is programming the MSI vectors, targeting a user programmed
iova? But an iova selects a window and I thought there were some number
of MSI banks and we don't really know which ones we'll need... still
confused.
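Alex's point that an iova selects a window can be made concrete. Assuming an aperture split into equal subwindows, a hypothetical helper maps an iova to the index of the subwindow that translates it; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a windowed IOMMU aperture. */
struct aperture {
    uint64_t base;     /* first iova covered by the aperture */
    uint64_t size;     /* aperture size in bytes */
    uint32_t windows;  /* number of equal subwindows: power of two, 1..256 */
};

/* Returns the index of the subwindow that translates iova, or -1 if the
 * iova falls outside the aperture. */
static int iova_to_window(const struct aperture *ap, uint64_t iova)
{
    assert(ap->windows != 0 && ap->size >= ap->windows);
    uint64_t win_size = ap->size / ap->windows;

    if (iova < ap->base || iova - ap->base >= ap->size)
        return -1;
    return (int)((iova - ap->base) / win_size);
}
```

With a 4 GiB aperture at iova 0 split into 256 subwindows (16 MiB each), the whole 24KB range at 0x10000000 from option B lands in subwindow 16, so under that geometry a single window would have to serve every bank placed there.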
> > It sounds like the case that there are X banks and if userspace wants to
> > use MSI it needs to leave X windows available for that. Is this just
> > buying userspace a few more windows to allow them the choice between MSI
> > or RAM?
>
> Well, there could be that. But also, userspace will generally have a
> much better idea of the type of mappings it's creating, so it's easier
> to keep everything explicit at the kernel/user interface than require
> more complicated code in the kernel to figure things out automatically
> (not just for MSIs but in general).
>
> If the kernel automatically creates the MSI mappings, when does it
> assume that userspace is done creating its own? What if userspace
> doesn't need any DMA other than the MSIs? What if userspace wants to
> continue dynamically modifying its other mappings?
Yep, valid arguments.
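Keeping everything explicit, together with the constraint that subwindows can be modified while the aperture is enabled but their number cannot, suggests userspace-side bookkeeping along these lines. This is a hypothetical sketch, not code from QEMU or the kernel: MSI windows are just another kind of entry that userspace places and can later repurpose.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_WINDOWS 256

/* Hypothetical userspace record of what each subwindow is used for. */
enum win_use { WIN_FREE = 0, WIN_RAM, WIN_MSI };

struct win_table {
    uint32_t count;                  /* fixed once the aperture is enabled */
    enum win_use use[MAX_WINDOWS];
};

/* Set the geometry once; count must be a power of two in 1..256. */
static int win_table_init(struct win_table *t, uint32_t count)
{
    if (count < 1 || count > MAX_WINDOWS || (count & (count - 1)) != 0)
        return -1;
    memset(t, 0, sizeof(*t));        /* every entry starts as WIN_FREE */
    t->count = count;
    return 0;
}

/* Repurpose one subwindow while the aperture stays enabled. */
static int win_table_set(struct win_table *t, uint32_t idx, enum win_use use)
{
    assert(t->count >= 1 && t->count <= MAX_WINDOWS);
    if (idx >= t->count)
        return -1;
    t->use[idx] = use;
    return 0;
}

/* Count the subwindows currently assigned to a given use. */
static uint32_t win_table_count(const struct win_table *t, enum win_use use)
{
    uint32_t n = 0;
    for (uint32_t i = 0; i < t->count; i++)
        if (t->use[i] == use)
            n++;
    return n;
}
```

Because the table never changes size after init, only individual entries move between RAM, MSI and free, which mirrors the rule that subwindows may change but the geometry may not.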
> > > C. Explicit mapping using normal DMA map. The last idea is that
> > > we would introduce a new ioctl to give user-space an fd to
> > > the MSI bank, which could be mmapped. The flow would be
> > > something like this:
> > > -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
> > > -user space mmaps the fd, getting a vaddr
> > > -user space does a normal DMA map for desired iova
> > > This approach makes everything explicit, but adds a new ioctl
> > > applicable most likely only to the PAMU (type2 iommu).
> >
> > And the DMA_MAP of that mmap then allows userspace to select the window
> > used? This one seems like a lot of overhead, adding a new ioctl, new
> > fd, mmap, special mapping path, etc.
>
> There's going to be special stuff no matter what. This would keep it
> separated from the IOMMU map code.
>
> I'm not sure what you mean by "overhead" here... the runtime overhead
> of setting things up is not particularly relevant as long as it's
> reasonable. If you mean development and maintenance effort, keeping
> things well separated should help.
Overhead in terms of code required and complexity. More things to
reference count and shut down in the proper order on userspace exit.
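The reference-counting concern with option C can be illustrated with a toy model. Assuming the MSI fd, the mmap of it, and the DMA mapping of that vaddr each hold a reference on the bank (an assumption about how such an interface might be built; VFIO_GROUP_GET_MSI_FD was only a proposal), the bank can be released only after the last holder goes away, whatever order userspace tears things down in.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one MSI bank exposed through the proposed option C. */
struct msi_bank {
    int refs;       /* fd + mmap + DMA mapping that currently pin the bank */
    bool released;  /* set once the last holder drops its reference */
};

static void bank_get(struct msi_bank *b)
{
    assert(!b->released);           /* no new users after release */
    b->refs++;
}

static void bank_put(struct msi_bank *b)
{
    assert(b->refs > 0);
    if (--b->refs == 0)
        b->released = true;
}

/* Setup: each step of the option C flow takes its own reference. */
static void option_c_setup(struct msi_bank *b)
{
    bank_get(b);  /* hypothetical VFIO_GROUP_GET_MSI_FD hands out an fd */
    bank_get(b);  /* mmap() of that fd gives userspace a vaddr */
    bank_get(b);  /* normal DMA map of that vaddr at the chosen iova */
}
```

If userspace closes the fd first, two references remain, so the bank stays in use until both the mmap and the DMA mapping are torn down; each of those release paths is extra code that has to get the ordering right on exit.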
Thanks,
Alex