
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc


From: Scott Wood
Subject: Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
Date: Wed, 3 Apr 2013 16:19:36 -0500

On 04/02/2013 10:37:20 PM, Alex Williamson wrote:
On Tue, 2013-04-02 at 17:50 -0500, Scott Wood wrote:
> On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
> > On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
> > > On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood
> > <address@hidden> wrote:
> > > >> > C. Explicit mapping using normal DMA map. The last idea
> > is that
> > > >> > we would introduce a new ioctl to give user-space an fd
> > to
> > > >> > the MSI bank, which could be mmapped. The flow would be
> > > >> >        something like this:
> > > >> >           -for each group user space calls new ioctl
> > > >> > VFIO_GROUP_GET_MSI_FD
> > > >> >           -user space mmaps the fd, getting a vaddr
> > > >> >           -user space does a normal DMA map for desired iova
> > > >> > This approach makes everything explicit, but adds a new
> > ioctl
> > > >> > applicable most likely only to the PAMU (type2 iommu).
> > > >>
> > > >> And the DMA_MAP of that mmap then allows userspace to select the
> > window
> > > >> used?  This one seems like a lot of overhead, adding a new
> > ioctl, new
> > > >> fd, mmap, special mapping path, etc.
> > > >
> > > >
> > > > There's going to be special stuff no matter what.  This would
> > keep it
> > > > separated from the IOMMU map code.
> > > >
> > > > I'm not sure what you mean by "overhead" here... the runtime
> > overhead of
> > > > setting things up is not particularly relevant as long as it's
> > reasonable.
> > > > If you mean development and maintenance effort, keeping things
> > well
> > > > separated should help.
> > >
> > > We don't need to change DMA_MAP. If we can simply add a new "type
> > 2"
> > > ioctl that allows user space to set which windows are MSIs, it
> > seems vastly
> > > less complex than an ioctl to supply a new fd, mmap of it, etc.
> > >
> > > So maybe 2 ioctls:
> > >     VFIO_IOMMU_GET_MSI_COUNT
>
> Do you mean a count of actual MSIs or a count of MSI banks used by the
> whole VFIO group?

I hope the latter, which would clarify how this is distinct from
DEVICE_GET_IRQ_INFO.  Is hotplug even on the table?  Presumably
dynamically adding a device could bring along additional MSI banks?

I'm not sure -- maybe we could say that hotplug can add banks, but not remove them or change the order, so userspace would just need to check if the number of banks changed, and map the extras.
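
As a rough sketch of that append-only rule, userspace could remember how many banks it has already mapped and, after a hotplug event, map only the ones that appeared since. Everything below is hypothetical: the bank count would come from something like the count ioctl being discussed, and msi_new_banks is just an illustrative helper, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the number of banks already mapped and the bank count now
 * reported, work out which banks still need mapping.
 *
 * Assumes hotplug only appends banks (never removes or reorders), so
 * banks 0..mapped-1 keep their meaning.  On success stores the index of
 * the first unmapped bank in *first and returns how many banks to map.
 * Returns -1 if the count shrank, which would break the assumption.
 */
static int msi_new_banks(unsigned int mapped, unsigned int count,
                         unsigned int *first)
{
    assert(first != NULL);

    if (count < mapped)
        return -1;

    *first = mapped;
    return (int)(count - mapped);
}
```

Userspace would then map banks first through first + n - 1 by whatever mechanism we settle on, and leave the existing mappings alone.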

The current VFIO MSI support has the host handling everything about MSI.
The user never programs an MSI vector to the physical device, they set
up everything through ioctl. On interrupt, we simply trigger an eventfd
and leave it to things like KVM irqfd or QEMU to do the right thing in a
virtual machine.

Here the MSI vector has to go through a PAMU window to hit the correct
MSI bank.  So that means it has some component of the iova involved,
which we're proposing here is controlled by userspace (whether that
vector uses an offset from 0x10000000 or 0x00000000 depending on which
window slot is used to map the MSI bank). I assume we're still working
in a model where the physical interrupt fires into the host and a
host-based interrupt handler triggers an eventfd, right?

Yes (subject to possible future optimizations).

So that means the vector also has host components so we trigger the
correct ISR. How is that coordinated?

Everything but the iova component needs to come from the host MSI allocator.
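
To make that split concrete, here is a minimal sketch of how the message programmed into the device might be put together: the base address is the iova of the window userspace picked for the bank, while the register offset within the bank and the message data come from the host. The struct and function names are made up for illustration, and the 4 KiB page limit on the offset is an assumption, not something taken from the PAMU or MSI code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: what the host MSI allocator supplies for one vector. */
struct host_msi {
    uint32_t reg_offset;  /* offset of the MSI register within the bank page */
    uint32_t data;        /* message data that selects the host interrupt */
};

/* Hypothetical: the message as it would be written to the device. */
struct msi_msg {
    uint64_t address;
    uint32_t data;
};

/*
 * Combine the userspace-chosen window iova with the host-chosen parts.
 * Assumes the bank is mapped through a single 4 KiB page.
 */
static struct msi_msg compose_msi(uint64_t window_iova, struct host_msi h)
{
    struct msi_msg m;

    assert(h.reg_offset < 0x1000);

    m.address = window_iova + h.reg_offset;
    m.data = h.data;
    return m;
}
```

With a bank mapped at window iova 0x10000000 and a host-supplied offset of 0x140, the device would be told to write to 0x10000140; with the same bank in the window at 0x00000000 it would be 0x140, with the data unchanged in both cases.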

Would it be possible for userspace to simply leave room for MSI bank
mapping (how much room could be determined by something like
VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
DMA_MAP starting at the 0x0 address of the aperture, growing up, and
VFIO will map banks on demand at the top of the aperture, growing down?
Wouldn't that avoid a lot of issues with userspace needing to know
anything about MSI banks (other than count) and coordinating irq numbers
and enabling handlers?

This would restrict a (possibly unlikely) use case where the user wants to map something near the top of the aperture but has another place MSIs can go (or is willing to live without MSIs). Otherwise it could be workable, as long as we can require an explicit MSI enabling on a device to happen after the aperture and subwindow count are set up. I'm not sure it would really buy anything over having userspace iterate over the MSI bank count, though -- it would probably be a bit more complicated.
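
For reference, the grow-up/grow-down scheme could be modelled roughly like this. It is only a sketch under the assumption that the aperture is split into equal-sized subwindows and that each DMA mapping or MSI bank takes exactly one of them; struct aperture and both helpers are hypothetical, not part of VFIO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one aperture split into equal subwindows. */
struct aperture {
    uint64_t base;          /* iova of the start of the aperture */
    uint64_t window_size;   /* size of each subwindow */
    unsigned int windows;   /* subwindow count */
    unsigned int dma_used;  /* windows taken by DMA maps, from the bottom */
    unsigned int msi_used;  /* windows taken by MSI banks, from the top */
};

/* Take the next free window from the bottom for a user DMA mapping. */
static int aperture_map_dma(struct aperture *ap, uint64_t *iova)
{
    assert(ap->dma_used + ap->msi_used <= ap->windows);

    if (ap->dma_used + ap->msi_used == ap->windows)
        return -1;          /* the two regions have met */

    *iova = ap->base + (uint64_t)ap->dma_used * ap->window_size;
    ap->dma_used++;
    return 0;
}

/* Take the next free window from the top for an MSI bank. */
static int aperture_map_msi_bank(struct aperture *ap, uint64_t *iova)
{
    assert(ap->dma_used + ap->msi_used <= ap->windows);

    if (ap->dma_used + ap->msi_used == ap->windows)
        return -1;          /* no room left for another bank */

    ap->msi_used++;
    *iova = ap->base +
            (uint64_t)(ap->windows - ap->msi_used) * ap->window_size;
    return 0;
}
```

The restriction I mentioned shows up directly in this model: a user who wants a DMA mapping in the top window can't have one once a bank has been placed there.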

> > On x86 MSI count is very
> > device specific, which means it would be a VFIO_DEVICE_* ioctl
> > (actually
> > VFIO_DEVICE_GET_IRQ_INFO does this for us on x86). The trouble with
> > it
> > being a device ioctl is that you need to get the device FD, but the
> > IOMMU protection needs to be established before you can get that... so
> > there's an ordering problem if you need it from the device before
> > configuring the IOMMU.  Thanks,
>
> What do you mean by "IOMMU protection needs to be established"?
> Wouldn't we just start with no mappings in place?

If no mappings blocks all DMA, sure, that's fine. Once the VFIO device
FD is accessible by userspace we have to protect the host against DMA.
If any IOMMU_SET_ATTR calls temporarily disable DMA protection, that
could be exploitable.  Thanks,

Unless the PAMU is globally in bypass mode (which it wouldn't be), there's no way to disable protection other than creating one giant mapping.

-Scott


