Re: [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma


From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps
Date: Thu, 21 Jun 2012 07:52:53 +1000

On Wed, 2012-06-20 at 16:25 -0500, Anthony Liguori wrote:

> So this cancellation stuff is hopelessly broken
> 
> It's simply not possible to fully cancel pending DMA in a synchronous 
> callback.

Well, at least for PAPR H_PUT_TCE, cancellation must be synchronous, i.e.
the hypercall must not return until the cancellation is complete.

> Indeed, bdrv_aio_cancel ends up having a nasty little loop in it:
> 
>      if (active) {
>          /* fail safe: if the aio could not be canceled, we wait for
>             it */
>          while (qemu_paio_error(acb) == EINPROGRESS)
>              ;
>      }
> 
> That spins w/100% CPU.
> 
> Can you explain when DMA cancellation really happens and what the effect 
> would 
> be if we simply ignored it?

It will almost never happen in practice, and it will actually never
happen with the current patch series. Where it -will- eventually happen
in the long run is if the guest removes a translation that is "in use"
by a dma_map() mapping established by a device. It's always a guest
programming error, though, and it's not an attack vector since the guest
can only shoot itself in the foot, but it might make things like kdump
less reliable inside the guest.

We need a way to signal the device that the translation is going away,
and we need -a- way to synchronize, though that could be a two-step
process (see below).

> Can we do something more clever like use an asynchronous callback to handle 
> flushing active DMA mappings?
> 
> There's just no way a callback like this is going to work.

Ok, so first let's see what happens on real HW: one of the DMA accesses
gets a target abort back from the host bridge. The device interrupts
its operations and signals an error.

Now, I agree that requiring a cancel callback to act synchronously might
be a bit fishy, so what if we define the following semantics:

 - First, this assumes our iommu backend decides to implement that level
of correctness; as I said above, none do in this patch series (yet).

 - The basic idea is that for most iommus there's an MMIO operation to
start a TLB flush and an MMIO register the guest spins on to learn
whether the flush has completed, so we can do things asynchronously that
way. However, we -still- need to act synchronously for the hypercall
used by PAPR, but as we discussed earlier that can be done without
spinning, by delaying the completion of the hypercall.

 - So step 1: no callback at all. When an iommu TLB flush operation is
started, we tag all pending maps (see below) and signal completion once
all of those maps have been unmapped.

 - The above tagging can be done using some kind of generation count
along with an ordered list of maps, keeping track of the "oldest" map
still active, that sort of thing. Not too hard.
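To make the step-1 scheme concrete, here's a minimal C sketch of the
generation-count idea: each map records the generation it was created
in, a flush bumps the generation, and the flush is "done" (the status
MMIO reads complete, or the delayed hypercall finishes) once no active
map from an older generation remains. All names and the fixed-size map
table are illustrative assumptions, not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_MAPS 16

typedef struct {
    bool active;
    uint64_t gen;          /* generation current when the map was made */
} IOMMUMap;

typedef struct {
    uint64_t cur_gen;      /* bumped on every flush request */
    IOMMUMap maps[MAX_MAPS];
} IOMMUState;

/* Establish a map, tagging it with the current generation. */
static int iommu_map(IOMMUState *s)
{
    for (int i = 0; i < MAX_MAPS; i++) {
        if (!s->maps[i].active) {
            s->maps[i].active = true;
            s->maps[i].gen = s->cur_gen;
            return i;
        }
    }
    return -1;             /* table full */
}

static void iommu_unmap(IOMMUState *s, int idx)
{
    s->maps[idx].active = false;
}

/* Start a TLB flush: everything tagged <= the returned generation
 * must drain before the flush may be reported complete. */
static uint64_t iommu_flush_start(IOMMUState *s)
{
    return s->cur_gen++;
}

/* What the guest-visible status register (or the delayed hypercall
 * completion) would poll: done once no pre-flush map is still active. */
static bool iommu_flush_done(IOMMUState *s, uint64_t flush_gen)
{
    for (int i = 0; i < MAX_MAPS; i++) {
        if (s->maps[i].active && s->maps[i].gen <= flush_gen) {
            return false;
        }
    }
    return true;
}
```

Note that a map created -after- the flush started carries a newer
generation and so never holds that flush up, which is what lets
long-running flushes coexist with new DMA.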

 - Step 2: because some maps can be long-lived and we don't want to
delay invalidations forever, we add a cancel callback which a device can
optionally install along with a map. This callback is only meant to
-initiate- a cancellation, in order to speed up when the unmap will
occur.
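The step-2 hook could look something like the sketch below: an optional
per-map callback that only kicks off cancellation (e.g. requests an aio
cancel) and returns immediately, with completion still signalled by the
eventual unmap as in step 1. The type and function names here are
hypothetical, chosen just to show the shape of the API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Initiates (does not wait for) cancellation of whatever the map backs. */
typedef void (*DMACancelFn)(void *opaque);

typedef struct {
    bool active;
    DMACancelFn cancel;    /* NULL for maps with no hook installed */
    void *opaque;
} DMAMap;

/* Called by the flush path when it finds this map still pending:
 * nudge the device, but completion is still the unmap itself. */
static void dma_map_request_cancel(DMAMap *map)
{
    if (map->active && map->cancel) {
        map->cancel(map->opaque);
    }
}

/* Example hook for illustration: just records that cancellation
 * was requested, standing in for a real "start aio cancel" call. */
static void note_cancel(void *opaque)
{
    *(bool *)opaque = true;
}
```

Making the hook optional matters: maps without one simply drain at
their natural pace under the step-1 generation tracking.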

What do you think? Would that work? As I explained in my email exchange
with Gerd, there are quite a few issues in actually implementing
cancellation properly anyway. For example, today on PAPR, H_PUT_TCE is
implemented by KVM in the kernel, in real mode, for performance reasons,
so we would need to change the KVM API to keep the kernel informed that
there are maps covering portions of the iommu space (a bitmap?) in order
to force exits to qemu when an invalidation collides with a map, for
example.

Additionally, to be totally -correct- we would need synchronization with
qemu to a much larger extent: any invalidation must also make sure that
anything that used a previous translation has completed, i.e., even the
simple dma_ld/st ops must be synchronized in theory.

My conclusion is that the complexity of solving the problem is huge,
while the actual problem scope is close to non-existent. So I think we
can safely merge the series and ignore the issue for the time being.

Cheers,
Ben.
