|
From: | David Hildenbrand |
Subject: | Re: [PATCH 1/2] s390x/pci: add support for guests that request direct mapping |
Date: | Tue, 10 Dec 2024 09:58:36 +0100 |
User-agent: | Mozilla Thunderbird |
On 10.12.24 00:22, Matthew Rosato wrote:
On 12/9/24 5:09 PM, David Hildenbrand wrote:On 09.12.24 22:45, Matthew Rosato wrote:On 12/9/24 4:01 PM, David Hildenbrand wrote:On 09.12.24 20:29, Matthew Rosato wrote: Hi, Trying to wrap my head around that ... you mention that "pin the entirety of guest memory". Do you mean that we will actually end up longterm pinning all guest RAM in the kernel, similar to what vfio ends up doing?Yes. Actually, the usecase here is specifically PCI passthrough via vfio-pci on s390. Unlike other platforms, the default s390 approach only pins on-demand and doesn't longterm pin all guest RAM, which is nice from a memory footprint perspective but pays a price via all those guest-2 RPCIT instructions. The goal here is now provide the optional alternative to longterm pin like other platforms.Okay, thanks for confirming. One more question: who will trigger this longterm-pinning? Is it vfio? (the code flow from your code to the pinning code would be nice)
Thanks for the details.
Yes, the vfio IOMMU code triggers it. My s390_pci_setup_stage2_map added by this patch calls memory_region_notify_iommu in a loop such that we trigger iommu notifications to map iova X+pba -> GPA X for all GPA, where pba = a "base address" offset that has to be applied when mapping on s390.
Ah, now I see that you use "ms->ram_size", so indeed, this will map initial RAM only.
The more-future-proof approach would indeed be to register a memory listener on &address_space_memory, to then map/unmap whenever notified about map/unmap.
See "struct vfio_memory_listener" with its region_add/region_del callbacks that do that, and also implement the RAMDiscardManager plumbing.
And now I wonder if there would be a way to just reuse "struct vfio_memory_listener" here some way? Or at least reuse the map/unmap functions, because ...
The notifications are sent in the largest batches possible to minimize vfio ioctls / use the least number of vfio dma mappings.
The iommu notifications get picked up in vfio_iommu_map_notify where we will follow the container DMA path down to vfio_legacy_dma_map; then ultimately vfio is handling the pinning via the VFIO_IOMMU_MAP_DMA ioctl.
... vfio_listener_region_add() will also just call vfio_container_dma_map().Maybe there is a reason s390x needs to handle this using memory_region_notify_iommu() callbacks instead of doing it similar to "struct vfio_memory_listener" when registered on &address_space_memory without a viommu.
In that case, it would be incompatible with virtio-balloon (and without modifications with upcoming virtio-mem). Is there already a mechanism in place to handle that -- a call to ram_block_discard_disable() -- or even a way to support coordinated discarding of RAM (e.g., virtio-mem + vfio)?Good point, should be calling add ram_block_discard_disable(true) when set register + a corresponding (false) during deregister... Will add for v2. As for supporting coordinated discard, I was hoping to subsequently look at virtio-mem for this.As long as discarding is blocked for now, we're good. To support it, the RAMDiscardManager would have to be wired up, similar to vfio. I think the current way of handling it via vf + IOMMUTLBEvent event = { + .type = IOMMU_NOTIFIER_MAP, + .entry = { + .target_as = &address_space_memory, + .translated_addr = 0, + .perm = IOMMU_RW, + }, + }; Is probably not ideal: it cannot cope with memory holes (which virtio-mem would create). Likely, you'd instead want an address space notifier, and really only map the memory region sections you get notified about. There, you can test for RAMDiscardManager and handle it like vfio does.I'll start looking into this; for the moment I'll plan on blocking discarding in this series with a follow-on to then enable virtio-mem, but if I get something working sooner I'll add it to this series. Either way I'll put you on CC.
Thanks, exploring whether we can reuse the vfio bits to map/unmap might be very valuable and simplify things.
-- Cheers, David / dhildenb
[Prev in Thread] | Current Thread | [Next in Thread] |