From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v4 10/10] vhost: iommu: cache static mapping if there is
Date: Mon, 29 May 2017 12:29:51 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Thu, May 25, 2017 at 09:14:55PM +0300, Michael S. Tsirkin wrote:
> On Mon, May 22, 2017 at 10:42:00AM +0800, Peter Xu wrote:
> > On Fri, May 19, 2017 at 07:55:26PM +0300, Michael S. Tsirkin wrote:
> > > On Fri, May 19, 2017 at 11:19:49AM +0800, Peter Xu wrote:
> > > > This patch pre-heats the vhost iotlb cache when passthrough mode is enabled.
> > > > 
> > > > Sometimes, even if the user specified iommu_platform for vhost
> > > > devices, the IOMMU might still be disabled. One such case is
> > > > passthrough mode in the VT-d implementation. We can detect this by
> > > > observing iommu_list: if it is empty, IOMMU translation is disabled,
> > > > and we can pre-heat the translation (it will be a static mapping) by
> > > > first invalidating all IOTLB entries, then caching the existing
> > > > memory ranges into the vhost backend iotlb using a 1:1 mapping.
> > > > 
> > > > Signed-off-by: Peter Xu <address@hidden>
> > > 
> > > I don't really understand. Is this a performance optimization?
> > > Can you post some #s please?
> > 
> > Yes, it is. Vhost can work even without this patch, but it should be
> > faster with it.
> 
> You'll have to include perf testing numbers then.

My mistake for not comparing the numbers earlier; it just seemed so
obvious to me that this patch would help.

However, after some simple streaming tests, it turns out that this patch
didn't boost performance at all. I added some traces in the vhost kernel
to find out what happened; please see below.

Without this patch, booting with iommu=pt, I see IOTLB cache insertions
like this:

vhost_process_iotlb_msg: iotlb new: 1 (0x17879b240-0x17fffffff)
vhost_process_iotlb_msg: iotlb new: 2 (0x17879d240-0x17fffffff)
vhost_process_iotlb_msg: iotlb new: 3 (0x17879a000-0x17fffffff)
vhost_process_iotlb_msg: iotlb new: 4 (0x178570000-0x17fffffff)
vhost_process_iotlb_msg: iotlb new: 5 (0x178532606-0x17fffffff)
vhost_process_iotlb_msg: iotlb new: 6 (0x177bad0e2-0x17fffffff)
vhost_process_iotlb_msg: iotlb new: 7 (0x1768560e2-0x17fffffff)

(Note: the pattern is (ADDR-0x17fffffff), with the covered range growing
 as ADDR moves downward. Eventually the range covers all addresses that
 vhost needs for DMA, so there are no further cache misses; 0x17fffffff
 is the upper limit for a guest with 4G of memory.)

While with this patch applied, the entries look like this (as expected):

vhost_process_iotlb_msg: iotlb new: 1 (0x100000000-0x17fffffff)
vhost_process_iotlb_msg: iotlb new: 2 (0x0-0x9ffff)
vhost_process_iotlb_msg: iotlb new: 3 (0xc0000-0x7fffffff)

(Note: this is one entry per RAM memory region)

So this explains why performance didn't really change even without this
patch: currently, when iommu=pt is on, address_space_get_iotlb_entry()
can return an IOTLB entry bigger than the page size (looking at the code,
plen decides the page mask, and plen is only limited by the memory region
sizes when PT is enabled). So by the 7th cache-miss IOTLB request, the
cached ranges are big enough to cover the rest of the DMA addresses.
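
To illustrate (a hedged sketch only, not QEMU's actual code; the types
and helper name below are made up): when the translation is 1:1 and plen
is bounded only by the RAM region, the entry handed back to vhost can
stretch from the missed address to the end of that region, which matches
the (ADDR-0x17fffffff) pattern in the trace above.

#include <stdint.h>

/* Illustration only: hypothetical types, not QEMU structures. */
struct ram_region { uint64_t start, size; };
struct iotlb_entry { uint64_t iova, translated, len; };

/* With iommu=pt the translation is identity, so the usable length is
 * limited only by how much of the RAM region lies above the missed
 * address, not by the 4K page size. */
static struct iotlb_entry pt_translate(const struct ram_region *r,
                                       uint64_t miss_iova)
{
    struct iotlb_entry e = {
        .iova       = miss_iova,
        .translated = miss_iova,                      /* 1:1 mapping      */
        .len        = r->start + r->size - miss_iova, /* up to region end */
    };
    return e;  /* covers [miss_iova, region end], e.g. (ADDR-0x17fffffff) */
}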

My preference is that we still apply this patch even though there is no
performance gain in the simple streaming test. Reasons:

- the good performance of the old code depends on the implementation of
  address_space_get_iotlb_entry(), which may change in the future

- after applying the patch, we are 100% sure that we won't see a cache
  miss, while we cannot guarantee that without it. Without the patch, we
  may still encounter cache misses (e.g., accessing an address below
  0x1768560e2 after the 7th cache miss in the above test), which would
  delay the IO that missed the cache (a rough sketch of the pre-heat
  idea follows this list).
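
For reference, here is a minimal sketch of the pre-heat idea itself (not
the actual patch; the region layout and the two backend hooks below are
stand-ins for the real vhost structures and IOTLB update path): invalidate
everything, then push one 1:1 entry per RAM region into the vhost backend
iotlb, which gives exactly the three entries shown earlier for a 4G guest.

#include <inttypes.h>
#include <stdio.h>

/* Stand-in region layout for illustration only. */
struct mem_region { uint64_t gpa, size, uaddr; };

/* Hypothetical backend hooks standing in for the real IOTLB update path. */
static void backend_iotlb_invalidate_all(void)
{
    printf("iotlb: invalidate all\n");
}

static void backend_iotlb_update(uint64_t iova, uint64_t uaddr, uint64_t len)
{
    (void)uaddr;  /* the real update also carries the host virtual address */
    printf("iotlb new: (0x%" PRIx64 "-0x%" PRIx64 ")\n", iova, iova + len - 1);
}

/* When no vIOMMU translation is active (iommu_list is empty), the mapping
 * is static, so every RAM region can be cached up front with iova == gpa
 * (a 1:1 mapping).  After this, vhost never takes an IOTLB miss. */
static void preheat_static_mapping(const struct mem_region *regions,
                                   unsigned n)
{
    backend_iotlb_invalidate_all();

    for (unsigned i = 0; i < n; i++) {
        backend_iotlb_update(regions[i].gpa,    /* iova == gpa       */
                             regions[i].uaddr,  /* host virtual addr */
                             regions[i].size);
    }
}

int main(void)
{
    /* RAM layout matching the 4G guest trace above (sizes derived from it). */
    struct mem_region regions[] = {
        { 0x100000000ULL, 0x080000000ULL, 0 },  /* 4G .. 6G    */
        { 0x000000000ULL, 0x0000a0000ULL, 0 },  /* 0 .. 640K   */
        { 0x0000c0000ULL, 0x07ff40000ULL, 0 },  /* 768K .. 2G  */
    };

    preheat_static_mapping(regions, sizeof(regions) / sizeof(regions[0]));
    return 0;
}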

>
> > As mentioned in the commit message and in the comment [1] below, this
> > patch pre-heats the cache for vhost. Currently the number of cache
> > entries depends on the system memory ranges (dev->mem->nregions), which
> > should be far smaller than vhost's cache capacity (currently statically
> > defined as max_iotlb_entries=2048 in the kernel). With this patch, those
> > cache entries cover all the DMA ranges that PT mode would allow, so we
> > won't have any cache misses.
> > 
> > For the comments, do you have any better suggestion besides the commit
> > message and [1]?
> > 
> > > 
> > > Also, if it's PT, can't we bypass iommu altogether? That would be
> > > even faster ...
> > 
> > Yes, but I don't yet know a good way to do it... Any suggestion is
> > welcome as well.
> > 
> > Btw, do you have any comments on the other patches besides this one?
> > This patch can really be isolated from the rest of the PT support series.
> > 
> > Thanks,
> 
> I've applied the rest of the series.

Thank you very much.

-- 
Peter Xu


