On Tue, Jul 14, 2015 at 10:21:54PM +1000, Alexey Kardashevskiy wrote:
This makes use of the new "memory registering" feature. The idea is
to give userspace the ability to notify the host kernel about pages
which are going to be used for DMA. With this information, the host
kernel can pin them all once per user process and do locked-pages
accounting once, rather than spending time doing that in real time with
possible failures which in some cases cannot be handled nicely.
This adds a guest RAM memory listener which notifies a VFIO container
about memory which needs to be pinned/unpinned. VFIO MMIO regions
(i.e. "skip dump" regions) are skipped.
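A minimal sketch of what such a pre-registration listener does, assuming a simplified region model. The struct and function names (guest_region, prereg_container, prereg_region_add, prereg_region_del) are hypothetical and are not QEMU's MemoryListener API; the byte counter stands in for the host kernel's register/unregister calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified guest memory region; QEMU's real
 * MemoryRegionSection carries much more state. */
struct guest_region {
    uint64_t gpa;     /* guest physical address */
    uint64_t size;    /* length in bytes */
    void *hva;        /* host virtual address backing the region */
    bool is_ram;
    bool skip_dump;   /* set for VFIO MMIO regions mapped into the guest */
};

/* Hypothetical container state: what is currently pre-registered. */
struct prereg_container {
    uint64_t registered_bytes;
    unsigned nr_regions;
};

/* Only plain guest RAM is pinned; "skip dump" MMIO regions are skipped. */
static bool prereg_wants(const struct guest_region *r)
{
    return r->is_ram && !r->skip_dump;
}

/* Region appeared: the real code would ask the host kernel to pin and
 * account [hva, hva + size) once here. */
static void prereg_region_add(struct prereg_container *c,
                              const struct guest_region *r)
{
    if (!prereg_wants(r)) {
        return;
    }
    c->registered_bytes += r->size;
    c->nr_regions++;
}

/* Region went away: undo exactly what prereg_region_add() did. */
static void prereg_region_del(struct prereg_container *c,
                              const struct guest_region *r)
{
    if (!prereg_wants(r)) {
        return;
    }
    c->registered_bytes -= r->size;
    c->nr_regions--;
}
```

Because the pinning happens once per region add/remove, later DMA map/unmap operations no longer need to pin or account pages themselves.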
The feature is only enabled for SPAPR IOMMU v2 and requires the
corresponding host kernel changes. Since v2 does not need or support
VFIO_IOMMU_ENABLE, this patch does not call it when v2 is detected and enabled.
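Roughly, the v2 special case is a per-type setup decision. This is a hypothetical model: the enum values are illustrative rather than the kernel's actual SPAPR TCE container type constants, and struct setup_plan and plan_spapr_setup() are invented for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SPAPR TCE v1 and v2 container types. */
enum spapr_iommu_type { SPAPR_TCE_V1, SPAPR_TCE_V2 };

struct setup_plan {
    bool call_iommu_enable;   /* issue VFIO_IOMMU_ENABLE on the container */
    bool use_prereg;          /* install the memory-registering listener */
};

static struct setup_plan plan_spapr_setup(enum spapr_iommu_type type)
{
    struct setup_plan p;

    if (type == SPAPR_TCE_V2) {
        /* v2: VFIO_IOMMU_ENABLE is neither needed nor supported;
         * guest RAM is pre-registered instead. */
        p.call_iommu_enable = false;
        p.use_prereg = true;
    } else {
        /* v1: keep the existing behaviour unchanged. */
        p.call_iommu_enable = true;
        p.use_prereg = false;
    }
    return p;
}
```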
This does not change the guest visible interface.
Signed-off-by: Alexey Kardashevskiy <address@hidden>
I've looked at this in more depth now, and attempting to unify the
pre-reg and mapping listeners like this can't work - they need to be
listening on different address spaces: mapping actions need to be
listening on the PCI address space, whereas the pre-reg needs to be
listening on address_space_memory. For x86 - for now - those end up
being the same thing, but on Power they're not.
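To make the address-space point concrete, here is a toy model (all names hypothetical, not QEMU's AddressSpace API) that counts how much RAM a RAM-only pre-reg listener would see in a given address space. It assumes Power's PCI address space exposes only a guest IOMMU window, while the RAM itself lives in address_space_memory.

```c
#include <stdint.h>

enum region_kind { REGION_RAM, REGION_IOMMU };

struct region {
    enum region_kind kind;
    uint64_t size;
};

/* Hypothetical flat view of an address space: just a list of regions. */
struct address_space {
    const char *name;
    const struct region *regions;
    unsigned nr;
};

/* Bytes a pre-reg listener attached to this address space would register,
 * given that it only ever pre-registers RAM. */
static uint64_t prereg_bytes_seen(const struct address_space *as)
{
    uint64_t total = 0;

    for (unsigned i = 0; i < as->nr; i++) {
        if (as->regions[i].kind == REGION_RAM) {
            total += as->regions[i].size;
        }
    }
    return total;
}
```

Under these assumptions, a listener attached to the Power-style PCI address space pre-registers nothing, whereas on x86 (where the two spaces coincide) it would see all of RAM.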
We do need to be clear about which differences are due to the presence
of a guest IOMMU and which are due to the arch or the underlying IOMMU
type. For now Power has a guest IOMMU and x86 doesn't, but that could
well change: we may implement a guest-side IOMMU for x86 in future
(or x86 could invent a paravirt IOMMU interface).
On the other side, BenH's experimental powernv machine type could
introduce Power machines without a guest-side IOMMU (or at least with
an optional one).
The quick and dirty approach here is:
1. Leave the main listener as is
2. Add a new pre-reg notifier to the spapr iommu specific code,
which listens on address_space_memory, *not* the PCI space
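The quick and dirty arrangement boils down to two listener attachments. As before this is a hypothetical model: struct container_model and container_connect() are stand-ins, not the existing VFIO container code, and an address space is reduced to a name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: an address space identified only by name. */
struct as_model {
    const char *name;
};

struct listener_model {
    const struct as_model *as;    /* NULL when not registered */
};

struct container_model {
    struct listener_model mapping;  /* step 1: left exactly as is */
    struct listener_model prereg;   /* step 2: SPAPR-specific addition */
};

static void container_connect(struct container_model *c,
                              const struct as_model *pci_as,
                              const struct as_model *system_memory,
                              bool spapr_v2)
{
    /* The main listener keeps listening on the device's PCI space. */
    c->mapping.as = pci_as;
    /* The pre-reg listener listens on guest RAM, *not* the PCI space,
     * and only exists for the SPAPR v2 IOMMU. */
    c->prereg.as = spapr_v2 ? system_memory : NULL;
}
```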
The more generally correct approach, which allows for more complex
IOMMU arrangements and the possibility of new IOMMU types with pre-reg
is:
1. Have the core implement both a mapping listener and a pre-reg
listener (the latter optionally enabled by a per-iommu-type flag).
Basically the first one sees what *is* mapped; the second sees
what *could* be mapped.
2. As now, the mapping listener listens on the PCI address space. If
RAM blocks are added, it immediately maps them into the host IOMMU;
if guest IOMMU blocks appear, it registers a notifier which will
mirror guest IOMMU mappings to the host IOMMU (this is what we
do now).
3. The pre-reg listener also listens on the PCI address space. RAM
blocks added are pre-registered immediately.
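The more general scheme might look like the following, again as a hypothetical model rather than QEMU code: a core dispatcher feeds each section from the PCI address space to the mapping listener and, when the IOMMU type's needs_prereg flag is set, to the pre-reg listener. Counters stand in for host IOMMU mapping, notifier registration and pre-registration; guest IOMMU sections are ignored on the pre-reg side because only the RAM case is spelled out in step 3.

```c
#include <stdbool.h>
#include <stdint.h>

enum section_kind { SEC_RAM, SEC_GUEST_IOMMU };

struct section {
    enum section_kind kind;
    uint64_t iova;
    uint64_t size;
};

/* Hypothetical per-IOMMU-type description. */
struct iommu_type_ops {
    const char *name;
    bool needs_prereg;         /* per-iommu-type flag enabling pre-reg */
};

struct core_container {
    const struct iommu_type_ops *ops;
    uint64_t host_mapped;      /* bytes mapped into the host IOMMU */
    unsigned guest_notifiers;  /* guest IOMMU notifiers registered */
    uint64_t preregistered;    /* bytes pre-registered */
};

/* Mapping listener: sees what *is* mapped. */
static void mapping_region_add(struct core_container *c,
                               const struct section *s)
{
    if (s->kind == SEC_RAM) {
        c->host_mapped += s->size;   /* map straight into the host IOMMU */
    } else {
        c->guest_notifiers++;        /* mirror guest IOMMU mappings later */
    }
}

/* Pre-reg listener: sees what *could* be mapped (RAM case only here). */
static void prereg_region_add(struct core_container *c,
                              const struct section *s)
{
    if (s->kind == SEC_RAM) {
        c->preregistered += s->size; /* pre-register immediately */
    }
}

/* Core dispatch: both listeners see the same PCI-space section; the
 * pre-reg one only runs when the IOMMU type asks for it. */
static void core_region_add(struct core_container *c,
                            const struct section *s)
{
    mapping_region_add(c, s);
    if (c->ops->needs_prereg) {
        prereg_region_add(c, s);
    }
}
```

Keeping the flag in the per-type ops, rather than keying off the architecture, matches the point above that guest IOMMU presence and host IOMMU type should be treated as independent.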