qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 00/10] intel-iommu: nested vIOMMU, cleanups,


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v2 00/10] intel-iommu: nested vIOMMU, cleanups, bug fixes
Date: Wed, 16 May 2018 14:30:09 +0800
User-agent: Mutt/1.9.3 (2018-01-21)

On Fri, May 04, 2018 at 11:08:01AM +0800, Peter Xu wrote:
> v2:
> - fix patchew code style warnings
> - interval tree: postpone malloc when inserting; simplify node remove
>   a bit where proper [Jason]
> - fix up comment and commit message for iommu lock patch [Kevin]
> - protect context cache too using the iommu lock [Kevin, Jason]
> - add vast comment in patch 8 to explain the modify-PTE problem
>   [Jason, Kevin]

We can hold a bit on reviewing this series.  Jintack reported a scp
DMAR issue that might happen even with L1 guest with this series, and
the scp can stall after copied tens or hundreds of MBs randomly.  I'm
still investigating the problem.  This problem should be related to
deferred flushing of VT-d kernel driver, since the problem will go
away if we use "intel_iommu=on,strict".  However I'm still trying to
figure out what's the thing behind the scene even with that deferred
flushing feature.

Meanwhile, during the investigation I found another "possibly valid"
use case about the modify-PTE problem that Jason has mentioned when
with deferred flushing:

         vcpu1                        vcpu2
      map page A
  explicitly send PSI for A
   queue unmap page A [1]
                                 map the same page A [2]
                                explcitly send PSI for A [3]
   flush unmap page A [4]

Due to deferred flushing, the UNMAP PSI might be postponed (or it can
be finally a DSI) from step [1] to step [4].  If we allocate the same
page somewhere else, we might trigger this modify-PTE at [2] since we
haven't yet received the deferred PSI to unmap A from vcpu1.

Note that this will not happen with latest upstream Linux, since the
IOVA allocation algorithm in current Linux kernel made sure that the
IOVA range won't be freed until [4], so we can't really allocate the
same page address at [2].  However this let me tend to agree with
Jason and Kevin's worry on future potential issues if that can be
triggered easily by common guest kernel bugs.  So now I'm considering
to drop my mergable interval tree but just use a simpler tree to cache
everything including translated addresses.  The metadata will possibly
take 2% of managed memory if with that.

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]