qemu-devel

Re: [PATCH 3/3] intel-iommu: build iova tree during IOMMU translation


From: Jason Wang
Subject: Re: [PATCH 3/3] intel-iommu: build iova tree during IOMMU translation
Date: Mon, 5 Dec 2022 12:12:04 +0800


On Thu, Dec 1, 2022 at 10:59 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Dec 01, 2022 at 04:35:48PM +0800, Jason Wang wrote:
> > On Wed, Nov 30, 2022 at 11:17 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Wed, Nov 30, 2022 at 02:33:51PM +0800, Jason Wang wrote:
> > > > On Tue, Nov 29, 2022 at 11:57 PM Peter Xu <peterx@redhat.com> wrote:
> > > > >
> > > > > On Tue, Nov 29, 2022 at 04:10:37PM +0800, Jason Wang wrote:
> > > > > > The IOVA tree is only built during page walk this breaks the device
> > > > > > that tries to use UNMAP notifier only. One example is vhost-net, it
> > > > > > > tries to use UNMAP notifier when vIOMMU doesn't support DEVIOTLB_UNMAP
> > > > > > notifier (e.g when dt mode is not enabled). The interesting part is
> > > > > > that it doesn't use MAP since it can query the IOMMU translation by
> > > > > > itself upon a IOTLB miss.
> > > > > >
> > > > > > This doesn't work since Qemu doesn't build IOVA tree in IOMMU
> > > > > > translation which means the UNMAP notifier won't be triggered during
> > > > > > the page walk since Qemu think it is never mapped. This could be
> > > > > > noticed when vIOMMU is used with vhost_net but dt is disabled.
> > > > > >
> > > > > > Fixing this by build the iova tree during IOMMU translation, this
> > > > > > makes sure the UNMAP notifier event could be identified during page
> > > > > > walk. And we need to walk page table not only for UNMAP notifier but
> > > > > > for MAP notifier during PSI.
> > > > > >
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > > >  hw/i386/intel_iommu.c | 43 ++++++++++++++++++-------------------------
> > > > > >  1 file changed, 18 insertions(+), 25 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > > > index d025ef2873..edeb62f4b2 100644
> > > > > > --- a/hw/i386/intel_iommu.c
> > > > > > +++ b/hw/i386/intel_iommu.c
> > > > > > > @@ -1834,6 +1834,8 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > > > >      uint8_t access_flags;
> > > > > >      bool rid2pasid = (pasid == PCI_NO_PASID) && s->root_scalable;
> > > > > >      VTDIOTLBEntry *iotlb_entry;
> > > > > > +    const DMAMap *mapped;
> > > > > > +    DMAMap target;
> > > > > >
> > > > > >      /*
> > > > > >       * We have standalone memory region for interrupt addresses, we
> > > > > > @@ -1954,6 +1956,21 @@ out:
> > > > > > >      entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
> > > > > >      entry->addr_mask = ~page_mask;
> > > > > >      entry->perm = access_flags;
> > > > > > +
> > > > > > +    target.iova = entry->iova;
> > > > > > +    target.size = entry->addr_mask;
> > > > > > +    target.translated_addr = entry->translated_addr;
> > > > > > +    target.perm = entry->perm;
> > > > > > +
> > > > > > +    mapped = iova_tree_find(vtd_as->iova_tree, &target);
> > > > > > +    if (!mapped) {
> > > > > > > +        /* To make UNMAP notifier work, we need build iova tree here
> > > > > > > +         * in order to have the UNMAP iommu notifier to be triggered
> > > > > > > +         * during the page walk.
> > > > > > > +         */
> > > > > > +        iova_tree_insert(vtd_as->iova_tree, &target);
> > > > > > +    }
> > > > > > +
> > > > > >      return true;
> > > > > >
> > > > > >  error:
> > > > > > > @@ -2161,31 +2178,7 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > > > > > >          ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > > > > > >                                         vtd_as->devfn, &ce);
> > > > > > >          if (!ret && domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
> > > > > > -            if (vtd_as_has_map_notifier(vtd_as)) {
> > > > > > -                /*
> > > > > > > -                 * As long as we have MAP notifications registered in
> > > > > > > -                 * any of our IOMMU notifiers, we need to sync the
> > > > > > > -                 * shadow page table.
> > > > > > > -                 */
> > > > > > > -                vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
> > > > > > > -            } else {
> > > > > > > -                /*
> > > > > > > -                 * For UNMAP-only notifiers, we don't need to walk the
> > > > > > -                 * page tables.  We just deliver the PSI down to
> > > > > > -                 * invalidate caches.
> > > > > > -                 */
> > > > > > -                IOMMUTLBEvent event = {
> > > > > > -                    .type = IOMMU_NOTIFIER_UNMAP,
> > > > > > -                    .entry = {
> > > > > > -                        .target_as = &address_space_memory,
> > > > > > -                        .iova = addr,
> > > > > > -                        .translated_addr = 0,
> > > > > > -                        .addr_mask = size - 1,
> > > > > > -                        .perm = IOMMU_NONE,
> > > > > > -                    },
> > > > > > -                };
> > > > > > > -                memory_region_notify_iommu(&vtd_as->iommu, 0, event);
>
> [1]
>
> > > > >
> > > > > > Isn't this path the one that will be responsible for pass-through the
> > > > > > UNMAP events from guest to vhost when there's no MAP notifier requested?
> > > >
> > > > Yes, but it doesn't do the iova tree removing. More below.
> > > >
> > > > >
> > > > > > At least that's what I expected when introducing the iova tree, because
> > > > > > for unmap-only device hierarchy I thought we didn't need the tree at all.
> > > >
> > > > > Then the problem is the UNMAP notifier won't be triggered at all during
> > > > > DSI page walk in vtd_page_walk_one() because there's no DMAMap stored
> > > > > in the iova tree:
> > > >
> > > >         if (!mapped) {
> > > >             /* Skip since we didn't map this range at all */
> > > > >             trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
> > > >             return 0;
> > > >         }
> > > >
> > > > > So I chose to build the iova tree during translation, so that we won't
> > > > > hit the above condition.
> > >
> > > > That's also why it's weird because IIUC we should never walk a page table
> > > > at all if there's no MAP notifier registered.
> >
> > If this is true, we probably need to document this somewhere.
>
> Agree.  I'll post a patch.
>
> >
> > >
> > > > When I'm looking at the walk callers I found that indeed there's one path
> > > > missing that can cause it to actually walk the pgtables with !MAP, then
> > > > I also noticed commit f7701e2c7983b6, and I'm wondering whether what we
> > > > really want is something like this:
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index a08ee85edf..c46f3db992 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -1536,7 +1536,7 @@ static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
> > > >      VTDContextEntry ce;
> > > >      IOMMUNotifier *n;
> > > >
> > > > -    if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
> > > +    if (!vtd_as_has_map_notifier(vtd_as)) {
> > >          return 0;
> > >      }
> > >
> > > So I'm not sure whether this patch is the problem resolver; so far I feel
> > > like it's patch 2 who does the real fix.  Then we can have the above
> > > oneliner so we stop any walks when there's no map notifiers.
> > >
> > > Thanks,
> >
> > I may be missing something, but as stated above, the problem is a missing
> > UNMAP notification during DSI when there's only an UNMAP notifier.
>
> I got confused too on why we didn't notify UNMAP for DSI already, that's so
> weird because I thought it should be there or it should be broken for a
> long time.. as we discussed multiple times around this one:
>
>                 /*
>                  * Fallback to domain selective flush if no PSI support or
>                  * the size is too big.
>                  */
>                 if (!cap_pgsel_inv(iommu->cap) ||
>                     mask > cap_max_amask_val(iommu->cap))
>                         iommu->flush.flush_iotlb(iommu, did, 0, 0,
>                                                         DMA_TLB_DSI_FLUSH);
>                 else
>                         iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
>                                                         DMA_TLB_PSI_FLUSH);
>
> I guess we were just always lucky?..

Probably, or the reason is that a notifier with UNMAP only was not
commonly used until patch 2.
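
To make the skipped case concrete, here is a minimal sketch in plain C of
why an UNMAP-only notifier sees nothing during the DSI page walk. It is a
toy model, not the QEMU code: SketchMap, SketchTree and the sketch_*
functions are hypothetical stand-ins for DMAMap, the iova tree and the
unmap branch of vtd_page_walk_one(), and a flat array replaces the real
tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MAPS 16

/* Hypothetical stand-in for a recorded mapping; not QEMU's DMAMap. */
typedef struct {
    uint64_t iova;
    uint64_t size;   /* inclusive length: covers [iova, iova + size] */
} SketchMap;

/* Hypothetical stand-in for the per-address-space iova tree. */
typedef struct {
    SketchMap maps[SKETCH_MAX_MAPS];
    size_t count;
} SketchTree;

/* Index of the first recorded range overlapping target, or -1 if none. */
static int sketch_find(const SketchTree *t, const SketchMap *target)
{
    for (size_t i = 0; i < t->count; i++) {
        const SketchMap *m = &t->maps[i];
        if (m->iova <= target->iova + target->size &&
            target->iova <= m->iova + m->size) {
            return (int)i;
        }
    }
    return -1;
}

/* Record a range, as a MAP walk (or the proposed translate path) would. */
static bool sketch_insert(SketchTree *t, const SketchMap *m)
{
    assert(t && m);
    if (t->count == SKETCH_MAX_MAPS || sketch_find(t, m) >= 0) {
        return false;
    }
    t->maps[t->count++] = *m;
    return true;
}

/*
 * Unmap side of the page walk hook: returns true when an UNMAP event
 * would be delivered, false when the range is skipped as never mapped.
 */
static bool sketch_walk_unmap(SketchTree *t, const SketchMap *target)
{
    int idx = sketch_find(t, target);

    if (idx < 0) {
        return false;                     /* nothing recorded: no UNMAP is sent */
    }
    t->maps[idx] = t->maps[--t->count];   /* drop the recorded range */
    return true;
}
```

With an empty tree, sketch_walk_unmap() always returns false, which is the
vhost-net case: it never registers MAP, so nothing is ever recorded and
the walk skips every range.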

>
> >
> > To solve it we might have two ways:
> >
> > 1) build the iova tree during iommu translation then we can correctly
> > trigger UNMAP during page walk caused by DSI
> > 2) don't do the iova tree walk for !MAP notifier, need new logic to
> > trigger UNMAP notifier in PSI/DSI
> >
> > This patch chooses to go with 1) (which seems easier at least for -stable).
> > Do you mean you prefer to go with 2)?
>
> Yes.
>
> IOVA tree is unnecessary overhead IMHO because UNMAP (both IOTLB or
> DEVIOTLB unmap) shouldn't need that complexity at all.  Using the iova tree
> can be accurate on which page got unmapped when the kernel driver used DSI
> for a large PSI as shown above, however IMHO it needs more justification
> that the pgtable walk is worth the effort.  Not to mention if a device in
> QEMU that wants to use the iova tree for some reason, one can just register
> with MAP and ignore all MAP events, while by default we keep UNMAP simple.
>
> So we could do:
>
>   (1) Rename vtd_sync_shadow_page_table() to vtd_sync_domain()
>   (2) instead of optimizing dev-iotlb only there:
>
>       if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
>           return 0;
>       }
>
>       we should firstly check if UNMAP or DEVUNMAP registered, we directly
>       send a notification to the whole domain.  We need to choose the event
>       that the register happens with but not both.
>
> This also reminded me that whether we should sanity check on iommu
> notifiers on some invalid cases.  E.g. it seems to me when registered with
> DEVIOTLB_UNMAP it should not register with either MAP or UNMAP anymore or
> it doesn't make sense.

It seems it doesn't do any harm to allow both UNMAP and DEVIOTLB_UNMAP to work.

>
> One step further, I'm wondering whether the DEV_IOTLB event should exist at
> all.  Maybe we want to have DEV_IOTLB typed iommu notifier only,

This seems cleaner (I remember we had some discussion about it before).

> but then
> when we got dev-iotlb PSI/DSI we notify with type=UNMAP just like normal
> UNMAP events, then in above (2) we can send UNMAP constantly as long as
> !MAP.  But even if so that'll just be another optional change on top.

I'm fine with going without the iova tree. Would you mind posting patches
for the fix? I can test them and include them in this series then.
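
As I understand the proposal, the domain sync would dispatch on the
registered notifier flags roughly as in the sketch below. This is only an
illustration under assumptions: the SKETCH_* names and
sketch_sync_domain_action() are hypothetical and not QEMU identifiers,
and giving UNMAP precedence when both UNMAP and DEVIOTLB_UNMAP are
registered is an arbitrary choice made here, not something we have
agreed on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical notifier flags for illustration; not QEMU's definitions. */
enum {
    SKETCH_NOTIFIER_UNMAP          = 1 << 0,
    SKETCH_NOTIFIER_MAP            = 1 << 1,
    SKETCH_NOTIFIER_DEVIOTLB_UNMAP = 1 << 2,
    SKETCH_NOTIFIER_ALL            = (1 << 3) - 1,
};

/* What a domain-wide sync would do for one address space. */
typedef enum {
    SKETCH_SYNC_NONE,               /* no notifier registered */
    SKETCH_SYNC_WALK,               /* MAP registered: walk the page table */
    SKETCH_SYNC_UNMAP_ALL,          /* send one UNMAP for the whole domain */
    SKETCH_SYNC_DEVIOTLB_UNMAP_ALL, /* send one DEVIOTLB_UNMAP instead */
} SketchSyncAction;

static SketchSyncAction sketch_sync_domain_action(uint32_t flags)
{
    assert((flags & ~(uint32_t)SKETCH_NOTIFIER_ALL) == 0);

    if (flags & SKETCH_NOTIFIER_MAP) {
        /* Only MAP users need the shadow page table kept in sync. */
        return SKETCH_SYNC_WALK;
    }
    if (flags & SKETCH_NOTIFIER_UNMAP) {
        /* Assumption: UNMAP wins if both unmap types are registered. */
        return SKETCH_SYNC_UNMAP_ALL;
    }
    if (flags & SKETCH_NOTIFIER_DEVIOTLB_UNMAP) {
        return SKETCH_SYNC_DEVIOTLB_UNMAP_ALL;
    }
    return SKETCH_SYNC_NONE;
}
```

The point is that the !MAP branches never touch the page table or the
iova tree; they send exactly one event of the type that was registered.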

Thanks

>
> Thanks,
>
> --
> Peter Xu
>



