[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 2/8] migration: stop allocating and freeing memo
From: |
Peter Xu |
Subject: |
Re: [Qemu-devel] [PATCH 2/8] migration: stop allocating and freeing memory frequently |
Date: |
Tue, 27 Mar 2018 15:07:03 +0800 |
User-agent: |
Mutt/1.9.1 (2017-09-22) |
On Thu, Mar 22, 2018 at 07:57:54PM +0800, Xiao Guangrong wrote:
>
>
> On 03/21/2018 05:06 PM, Peter Xu wrote:
> > On Tue, Mar 13, 2018 at 03:57:33PM +0800, address@hidden wrote:
> > > From: Xiao Guangrong <address@hidden>
> > >
> > > Current code uses compress2()/uncompress() to compress/decompress
> > > memory, these two function manager memory allocation and release
> > > internally, that causes huge memory is allocated and freed very
> > > frequently
> > >
> > > More worse, frequently returning memory to kernel will flush TLBs
> > > and trigger invalidation callbacks on mmu-notification which
> > > interacts with KVM MMU, that dramatically reduce the performance
> > > of VM
> > >
> > > So, we maintain the memory by ourselves and reuse it for each
> > > compression and decompression
> > >
> > > Signed-off-by: Xiao Guangrong <address@hidden>
> > > ---
> > > migration/qemu-file.c | 34 ++++++++++--
> > > migration/qemu-file.h | 6 ++-
> > > migration/ram.c | 142
> > > +++++++++++++++++++++++++++++++++++++-------------
> > > 3 files changed, 140 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > > index 2ab2bf362d..1ff33a1ffb 100644
> > > --- a/migration/qemu-file.c
> > > +++ b/migration/qemu-file.c
> > > @@ -658,6 +658,30 @@ uint64_t qemu_get_be64(QEMUFile *f)
> > > return v;
> > > }
> > > +/* return the size after compression, or negative value on error */
> > > +static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t
> > > dest_len,
> > > + const uint8_t *source, size_t source_len)
> > > +{
> > > + int err;
> > > +
> > > + err = deflateReset(stream);
> >
> > I'm not familiar with zlib, but I saw this in manual:
> >
> > https://www.zlib.net/manual.html
> >
> > This function is equivalent to deflateEnd followed by deflateInit,
> > but does not free and reallocate the internal compression state. The
> > stream will leave the compression level and any other attributes that
> > may have been set unchanged.
> >
> > I thought it was deflateInit() who is slow? Can we avoid the reset as
>
> deflateEnd() is worse as it frees memory to kernel which triggers
> TLB flush and mmu-notifier.
>
> > long as we make sure to deflateInit() before doing anything else?
>
> Actually, deflateReset() is cheap... :)
>
> >
> > Meanwhile, is there any performance number for this single patch?
> > Since I thought the old code is calling compress2() which contains
> > deflateInit() and deflateEnd() too, just like what current patch do?
>
> No, after the patch, we just call deflateInit() / deflateEnd() one
> time (in _setup() handler and _cleanup handler).
>
> Yes. This is the perf data from our production,
> after revert this patch:
> + 57.88% kqemu [kernel.kallsyms] [k] queued_spin_lock_slowpath
> + 10.55% kqemu [kernel.kallsyms] [k] __lock_acquire
> + 4.83% kqemu [kernel.kallsyms] [k] flush_tlb_func_common
>
> - 1.16% kqemu [kernel.kallsyms] [k] lock_acquire
> ▒
> - lock_acquire
> ▒
> - 15.68% _raw_spin_lock
> ▒
> + 29.42% __schedule
> ▒
> + 29.14% perf_event_context_sched_out
> ▒
> + 23.60% tdp_page_fault
> ▒
> + 10.54% do_anonymous_page
> ▒
> + 2.07% kvm_mmu_notifier_invalidate_range_start
> ▒
> + 1.83% zap_pte_range
> ▒
> + 1.44% kvm_mmu_notifier_invalidate_range_end
>
>
> apply our work:
> + 51.92% kqemu [kernel.kallsyms] [k] queued_spin_lock_slowpath
> + 14.82% kqemu [kernel.kallsyms] [k] __lock_acquire
> + 1.47% kqemu [kernel.kallsyms] [k] mark_lock.clone.0
> + 1.46% kqemu [kernel.kallsyms] [k] native_sched_clock
> + 1.31% kqemu [kernel.kallsyms] [k] lock_acquire
> + 1.24% kqemu libc-2.12.so [.] __memset_sse2
>
> - 14.82% kqemu [kernel.kallsyms] [k] __lock_acquire
> ▒
> - __lock_acquire
> ▒
> - 99.75% lock_acquire
> ▒
> - 18.38% _raw_spin_lock
> ▒
> + 39.62% tdp_page_fault
> ▒
> + 31.32% __schedule
> ▒
> + 27.53% perf_event_context_sched_out
> ▒
> + 0.58% hrtimer_interrupt
>
>
> You can see the TLB flush and mmu-lock contention have gone after this patch.
Yes. Obviously I misunderstood the documentation for deflateReset().
It's not really a combined "End+Init", a quick glance in zlib code
shows that deflateInit() will do the mallocs, then call deflateReset()
at last. So the buffers should be kept for reset(), as you explained.
>
> >
> > It would be nice too if we can split the patch into two (decode,
> > encode) if you want, but that's optional.
>
> That's good to me, thank you, Peter.
Thanks for explaining.
--
Peter Xu
Re: [Qemu-devel] [PATCH 2/8] migration: stop allocating and freeing memory frequently, Peter Xu, 2018/03/21
[Qemu-devel] [PATCH 3/8] migration: support to detect compression and decompression errors, guangrong . xiao, 2018/03/13