Re: [PATCH v3] intel-iommu: Document iova_tree
From: Jason Wang
Subject: Re: [PATCH v3] intel-iommu: Document iova_tree
Date: Fri, 23 Dec 2022 15:48:01 +0800
On Wed, Dec 7, 2022 at 6:13 AM Peter Xu <peterx@redhat.com> wrote:
>
> It seems not super clear on when iova_tree is used, and why. Add a rich
> comment above iova_tree to track why we needed the iova_tree, and when we
> need it.
>
> Also comment for the map/unmap messages, on how they're used and
> implications (e.g. unmap can be larger than the mapped ranges).
>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> v3:
> - Adjust according to Eric's comment
> ---
> include/exec/memory.h | 28 ++++++++++++++++++++++++++
> include/hw/i386/intel_iommu.h | 38 ++++++++++++++++++++++++++++++++++-
> 2 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 91f8a2395a..269ecb873b 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -129,6 +129,34 @@ struct IOMMUTLBEntry {
> /*
> * Bitmap for different IOMMUNotifier capabilities. Each notifier can
> * register with one or multiple IOMMU Notifier capability bit(s).
> + *
> + * Normally there're two use cases for the notifiers:
> + *
> + * (1) When the device needs accurate synchronizations of the vIOMMU page
> + * tables, it needs to register with both MAP|UNMAP notifies (which
> + * is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).
> + *
> + * Regarding to accurate synchronization, it's when the notified
> + * device maintains a shadow page table and must be notified on each
> + * guest MAP (page table entry creation) and UNMAP (invalidation)
> + * events (e.g. VFIO). Both notifications must be accurate so that
> + * the shadow page table is fully in sync with the guest view.
> + *
> + * (2) When the device doesn't need accurate synchronizations of the
> + * vIOMMU page tables, it needs to register only with UNMAP or
> + * DEVIOTLB_UNMAP notifies.
> + *
> + * It's when the device maintains a cache of IOMMU translations
> + * (IOTLB) and is able to fill that cache by requesting translations
> + * from the vIOMMU through a protocol similar to ATS (Address
> + * Translation Service).
> + *
> + * Note that in this mode the vIOMMU will not maintain a shadowed
> + * page table for the address space, and the UNMAP messages can be
> + * actually larger than the real invalidations (just like how the
> + * Linux IOMMU driver normally works, where an invalidation can be
> + * enlarged as long as it still covers the target range). The IOMMU
I just spotted this assertion while testing your fix for DSI:
assert(entry->iova >= notifier->start && entry_end <= notifier->end);
Do we need to remove it? (Doing so would look like a partial revert of
03c7140c1a0336af3d4fca768de791b9c0e2b128, though.)
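
To make the concern concrete, here is a minimal sketch of how an over-sized
invalidation interacts with a range check of that shape. Everything in it
(SketchRange, sketch_range_within, sketch_range_clip) is a hypothetical name
for illustration, not QEMU code, and clipping is only one possible
alternative to asserting, not necessarily what the tree should do:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive IOVA range [start, end]; not a QEMU type. */
typedef struct {
    uint64_t start;
    uint64_t end;
} SketchRange;

/*
 * Same shape as the condition in the assert quoted above: the event
 * must lie fully inside the notifier's window.
 */
static bool sketch_range_within(SketchRange event, SketchRange notifier)
{
    return event.start >= notifier.start && event.end <= notifier.end;
}

/*
 * Assumed alternative to asserting: clip the event to the notifier's
 * window. Returns false when the two ranges do not overlap at all,
 * in which case *out is left untouched.
 */
static bool sketch_range_clip(SketchRange event, SketchRange notifier,
                              SketchRange *out)
{
    if (event.end < notifier.start || event.start > notifier.end) {
        return false;
    }
    out->start = event.start > notifier.start ? event.start : notifier.start;
    out->end = event.end < notifier.end ? event.end : notifier.end;
    return true;
}
```

With a notifier window of [0x1000, 0x1fff], an enlarged invalidation of
[0x0, 0x3fff] fails the containment check even though it fully covers the
window; clipping it yields exactly the window.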
Thanks
> + * notifiee should be able to take care of over-sized invalidations.
> */
> typedef enum {
> IOMMU_NOTIFIER_NONE = 0,
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 46d973e629..89dcbc5e1e 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -109,7 +109,43 @@ struct VTDAddressSpace {
> QLIST_ENTRY(VTDAddressSpace) next;
> /* Superset of notifier flags that this address space has */
> IOMMUNotifierFlag notifier_flags;
> - IOVATree *iova_tree; /* Traces mapped IOVA ranges */
> + /*
> + * @iova_tree traces mapped IOVA ranges.
> + *
> + * The tree is not needed if no MAP notifier is registered with current
> + * VTD address space, because all guest invalidate commands can be
> + * directly passed to the IOMMU UNMAP notifiers without any further
> + * reshuffling.
> + *
> + * The tree OTOH is required for MAP typed iommu notifiers for a few
> + * reasons.
> + *
> + * Firstly, there's no way to identify whether an PSI (Page Selective
> + * Invalidations) or DSI (Domain Selective Invalidations) event is an
> + * MAP or UNMAP event within the message itself. Without having prior
> + * knowledge of existing state vIOMMU doesn't know whether it should
> + * notify MAP or UNMAP for a PSI message it received when caching mode
> + * is enabled (for MAP notifiers).
> + *
> + * Secondly, PSI messages received from guest driver can be enlarged in
> + * range, covers but not limited to what the guest driver wanted to
> + * invalidate. When the range to invalidates gets bigger than the
> + * limit of a PSI message, it can even become a DSI which will
> + * invalidate the whole domain. If the vIOMMU directly notifies the
> + * registered device with the unmodified range, it may confuse the
> + * registered drivers (e.g. vfio-pci) on either:
> + *
> + * (1) Trying to map the same region more than once (for
> + * VFIO_IOMMU_MAP_DMA, -EEXIST will trigger), or,
> + *
> + * (2) Trying to UNMAP a range that is still partially mapped.
> + *
> + * That accuracy is not required for UNMAP-only notifiers, but it is a
> + * must-to-have for notifiers registered with MAP events, because the
> + * vIOMMU needs to make sure the shadow page table is always in sync
> + * with the guest IOMMU pgtables for a device.
> + */
> + IOVATree *iova_tree;
> };
>
> struct VTDIOTLBEntry {
> --
> 2.37.3
>
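
For readers following the iova_tree comment in the quoted patch, the role of
the tracked state can be illustrated with a much simplified sketch. It assumes
a flat per-page array in place of an interval tree, and every name in it
(SketchShadow, sketch_sync_page, SKETCH_PAGES) is hypothetical rather than
taken from QEMU. The point is only that the invalidation itself does not say
MAP or UNMAP, so the decision comes from comparing the guest's current page
table state with what was previously notified:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGES 16 /* arbitrary size for the illustration */

typedef enum { SKETCH_NONE, SKETCH_MAP, SKETCH_UNMAP } SketchEvent;

/* Hypothetical stand-in for the tracked mapped ranges. */
typedef struct {
    bool mapped[SKETCH_PAGES];
} SketchShadow;

/*
 * Decide what to notify for one page covered by an invalidation.
 * guest_present is what a page table walk would report for that page.
 */
static SketchEvent sketch_sync_page(SketchShadow *s, size_t page,
                                    bool guest_present)
{
    assert(page < SKETCH_PAGES);
    if (guest_present && !s->mapped[page]) {
        s->mapped[page] = true;   /* new mapping: notify MAP once */
        return SKETCH_MAP;
    }
    if (!guest_present && s->mapped[page]) {
        s->mapped[page] = false;  /* mapping went away: notify UNMAP */
        return SKETCH_UNMAP;
    }
    /* Already in sync: avoids a duplicate MAP or a spurious UNMAP. */
    return SKETCH_NONE;
}
```

An enlarged invalidation that covers an already-notified page then produces
no event for that page, which is how the duplicate-map case described in the
comment is avoided in this sketch.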