From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions
Date: Mon, 12 Feb 2018 16:19:54 +1100
User-agent: Mutt/1.9.2 (2017-12-15)

On Fri, Feb 09, 2018 at 06:55:01PM +1100, Alexey Kardashevskiy wrote:
> At the moment if vfio_memory_listener is registered in the system memory
> address space, it maps/unmaps every RAM memory region for DMA.
> It expects system-page-size-aligned memory sections, so vfio_dma_map
> would not fail, and so far this has been the case. A mapping failure
> would be fatal. A side effect of this behavior is that some MMIO pages
> would silently not be mapped.
> 
> However we are going to change MSIX BAR handling, so we will end up
> having non-aligned sections in vfio_memory_listener (more details are in
> the next patch) and vfio_dma_map will exit QEMU.
> 
> In order to avoid fatal failures on what previously was not a failure and
> was just silently ignored, this checks the section alignment against
> the smallest supported IOMMU page size and prints an error if it is not
> aligned; it also prints an error if vfio_dma_map failed despite the page
> size check. Neither error is fatal; only MMIO RAM regions are checked
> (aka "RAM device" regions).
> 
> If the number of errors printed is overwhelming, the MSIX relocation
> could be used to avoid excessive error output.
> 
> This is unlikely to cause any behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy <address@hidden>

There are some relatively superficial problems noted below.

But more fundamentally, this feels like it's extending an existing
hack past the point of usefulness.

The explicit check for is_ram_device() here has always bothered me -
it's not like a real bus bridge magically knows whether a target
address maps to RAM or not.

What I think is really going on is that even for systems without an
IOMMU, it's not really true to say that the PCI address space maps
directly onto address_space_memory.  Instead, there's a large (but
much smaller than 2^64) "upstream window" at address 0 on the PCI
bus, which is identity mapped to the system bus.  Details will vary
with the system, but in practice we expect nothing but RAM to be in
that window.  Addresses not within that window won't be mapped to the
system bus but will just be broadcast on the PCI bus and might be
picked up as a p2p transaction.  Actually on classic PC, I suspect
there may be two windows, below and above the "ISA IO hole".
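
To illustrate what I mean (an untested sketch of mine, with made-up
names and a made-up window size, not code from the tree), such a
window could be modelled with the memory API as an alias of system
memory installed into the host bridge's PCI address space:

    /* Hypothetical host bridge: a 2GiB "upstream window" at PCI
     * address 0, identity mapped onto the system bus.  DMA to a PCI
     * address outside the alias never reaches the system bus; it
     * stays on the PCI bus as a potential p2p transaction. */
    memory_region_init(&bridge->pci_space, OBJECT(bridge),
                       "pci-bus-address-space", UINT64_MAX);
    memory_region_init_alias(&bridge->upstream_window, OBJECT(bridge),
                             "upstream-window", get_system_memory(),
                             0, 0x80000000ULL /* 2GiB, made up */);
    memory_region_add_subregion(&bridge->pci_space, 0,
                                &bridge->upstream_window);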

With PCIe it gets more complicated, of course.  I still suspect
there's some sort of upstream window to the host, but whether things
outside the window get reflected back down the PCIe hierarchy will
depend on those p2p relevant configuration parameters.

Maybe it's time we had a detailed look at what really happens in
physical bridges, rather than faking it with the is_ram_device()
check?

Anyway, that said, the patch below might still be a reasonable interim
hack, once the smaller problems are fixed.

> ---
>  hw/vfio/common.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 49 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f895e3c..736f271 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -544,18 +544,40 @@ static void vfio_listener_region_add(MemoryListener *listener,
>  
>      llsize = int128_sub(llend, int128_make64(iova));
>  
> +    if (memory_region_is_ram_device(section->mr)) {
> +        hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
> +
> +        if ((iova & pgmask) || (llsize & pgmask)) {
> +            error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
> +                         " is not aligned to 0x%"HWADDR_PRIx
> +                         " and cannot be mapped for DMA",
> +                         section->offset_within_region,
> +                         int128_getlo(section->size),
> +                         pgmask + 1);
> +            return;
> +        }
> +    }
> +
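
(For reference, the mask arithmetic works out as you'd expect.  A
worked example of my own, not from the patch, for a host IOMMU
supporting 64KiB and 16MiB pages:

    uint64_t iova_pgsizes = 0x1000000 | 0x10000; /* 16MiB | 64KiB bitmap */
    hwaddr pgmask = (1ULL << ctz64(iova_pgsizes)) - 1;  /* = 0xffff */

    /* iova = 0x12340000 passes; iova = 0x12345000 trips the check,
     * as does any llsize that isn't a multiple of 64KiB. */

so the smallest set bit in the page size bitmap decides the required
alignment.)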
>      ret = vfio_dma_map(container, iova, int128_get64(llsize),
>                         vaddr, section->readonly);
>      if (ret) {
>          error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>                       "0x%"HWADDR_PRIx", %p) = %d (%m)",
>                       container, iova, int128_get64(llsize), vaddr, ret);
> +        if (memory_region_is_ram_device(section->mr)) {
> +            /* Allow unexpected mappings not to be fatal for RAM devices */
> +            return;
> +        }
>          goto fail;
>      }
>  
>      return;
>  
>  fail:
> +    if (memory_region_is_ram_device(section->mr)) {
> +        error_report("failed to vfio_dma_map. pci p2p may not work");

Isn't this logic exactly backwards?  p2p will be affected when a
*non-RAM* device (itself probably a PCI MMIO window) can't be mapped in.

> +        return;
> +    }
>      /*
>       * On the initfn path, store the first error in the container so we
>       * can gracefully fail.  Runtime, there's not much we can do other
> @@ -577,6 +599,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>      hwaddr iova, end;
>      Int128 llend, llsize;
>      int ret;
> +    bool try_unmap = true;
>  
>      if (vfio_listener_skipped_section(section)) {
>          trace_vfio_listener_region_del_skip(
> @@ -629,13 +652,33 @@ static void vfio_listener_region_del(MemoryListener *listener,
>  
>      trace_vfio_listener_region_del(iova, end);
>  
> -    ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
> +    if (memory_region_is_ram_device(section->mr)) {
> +        hwaddr pgmask;
> +        VFIOHostDMAWindow *hostwin;
> +        bool hostwin_found = false;
> +
> +        QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +            if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> +                hostwin_found = true;
> +                break;
> +            }
> +        }
> +        assert(hostwin_found); /* or region_add() would have failed */
> +
> +        pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
> +        try_unmap = !((iova & pgmask) || (llsize & pgmask));

Explicit page mask checks seem bogus here.  We should unmap based on
whether we mapped, not on some other criterion which may or may not
evaluate to the same thing.
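
At a minimum, pulling the test into one helper that both
region_add() and region_del() call would stop the two paths from
drifting apart.  An untested sketch, the helper name is my invention:

    /* True iff the section is aligned to the smallest IOMMU page size
     * of the host window, i.e. iff region_add() would have mapped it. */
    static bool vfio_section_mappable(VFIOHostDMAWindow *hostwin,
                                      hwaddr iova, Int128 llsize)
    {
        hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;

        return !((iova & pgmask) || (int128_get64(llsize) & pgmask));
    }

Better still would be to record the decision at map time so that
unmap can't second-guess it.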

> +    }
> +
> +    if (try_unmap) {
> +        ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
> +        if (ret) {
> +            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> +                         "0x%"HWADDR_PRIx") = %d (%m)",
> +                         container, iova, int128_get64(llsize), ret);
> +        }
> +    }
> +
>      memory_region_unref(section->mr);
> -    if (ret) {
> -        error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> -                     "0x%"HWADDR_PRIx") = %d (%m)",
> -                     container, iova, int128_get64(llsize), ret);
> -    }
>  
>      if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
>          vfio_spapr_remove_window(container,

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
