qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB


From: Zhong, Yang
Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB
Date: Thu, 16 Mar 2017 02:47:37 +0000

Hello Paolo,

As for your said tcmalloc or jemalloc for optimization memory allocation, we 
also tried the malloc_trim() in glibc, which can get the same memory 
optimization.

diff --git a/util/rcu.c b/util/rcu.c
index 9adc5e4..3643e43 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -32,7 +32,7 @@
#include "qemu/atomic.h"
#include "qemu/thread.h"
#include "qemu/main-loop.h"
-
+#include "malloc.h"
/*
  * Global grace period counter.  Bit 0 is always one in rcu_gp_ctr.
  * Bits 1 and above are defined in synchronize_rcu.
@@ -271,6 +271,7 @@ static void *call_rcu_thread(void *opaque)
             n--;
             node->func(node);
         }
+        malloc_trim(0);
         qemu_mutex_unlock_iothread();
     }
     abort();

The man info in malloc_trim  is only release free memory from the top of the 
heap, in fact, we found this function also release hole memory below top of 
heap.

Thanks!

ZhongYang


-----Original Message-----
From: Paolo Bonzini [mailto:address@hidden 
Sent: Thursday, March 16, 2017 4:22 AM
To: Xu, Anthony <address@hidden>
Cc: Zhong, Yang <address@hidden>; Peng, Chao P <address@hidden>; address@hidden
Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 
12252kB to 2752KB



----- Original Message -----
> From: "Anthony Xu" <address@hidden>
> To: "Paolo Bonzini" <address@hidden>
> Cc: "Yang Zhong" <address@hidden>, "Chao P Peng" 
> <address@hidden>, address@hidden
> Sent: Wednesday, March 15, 2017 8:05:48 PM
> Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size 
> from 12252kB to 2752KB
> 
> > The first unref is done after as->current_map is overwritten.
> > as->current_map is accessed under RCU, so it needs call_rcu.  It
> > balances the initial reference that is present since flatview_init.
> 
> Got it, thanks for explanation.
> 
> > > but it is not clear to me, is this a bug or by design? Is 
> > > flatview_destroy called
> > in current thread
> > > or RCU thread?
> > 
> > You cannot know, because there are also other callers of 
> > address_space_get_flatview.  Usually it's from the RCU thread.
> 
> If it is from current thread, do we need to check if the global lock 
> is acquired?
> As you mentioned flatview_unref may call memory_region_unref.

No, you don't need to check it.  Most functions (in this case,
memory_region_transaction_commit) can only be called under the global lock.
In fact, only address_space_read/write/ld*/st*, plus memory_region_find can be 
called without the global lock.

> In the comments of memory_region_finalize
> "   /* We know the region is not visible in any address space (it
>      * does not have a container and cannot be a root either because
>      * it has no references,
> "
> 
> We know the memory region is not a root region, and the memory region 
> has already been removed from address space even it has sub regions.
> memory_region_transaction_commit should be called when the memory 
> region is removed from address space.
> memory_region_transaction_commit seems not be needed in 
> memory_region_finalize.
> Let me know if you think otherwise.

Yes, you can replace memory_region_del_subregion in memory_region_finalize with 
special code that does

    assert(!mr->enabled);
    assert(subregion->container == mr);
    subregion->container = NULL;
    QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
    memory_region_unref(subregion);

The last four lines are shared with memory_region_del_subregion, so please 
factor them in a new function, for example memory_region_del_subregion_internal.

> > > How about fall back to synchronize_rcu?
> > 
> > I'm afraid it would be too slow, but you can test.
> 
> It is not slow, the contention is not high. Yes we can test.

It's not a matter of contention.  It's a matter of how long an RCU critical 
section can be---not just at startup, but at any point during QEMU execution.

Have you tried tcmalloc or jemalloc?  They use the brk region less and 
sometimes are more aggressive in releasing mmap-ed memory areas.

> Please review below patch, MemoryRegion is protected by RCU.

Removing transaction begin/commit in memory_region_finalize makes little sense 
because memory_region_del_subregion calls those two functions again.  See above 
for an alternative.  Apart from this, the patch is at least correct, though I'm 
not sure it's a good idea (synchronize_rcu needs testing).  I'm not sure I 
understand the address_space_write change).

Paolo

> Thanks,
> Anthony
> 
> 
> 
> diff --git a/exec.c b/exec.c
> index a22f5a0..6631668 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2521,7 +2521,8 @@ static void mem_commit(MemoryListener *listener)
> 
>      atomic_rcu_set(&as->dispatch, next);
>      if (cur) {
> -        call_rcu(cur, address_space_dispatch_free, rcu);
> +        synchronize_rcu();
> +        address_space_dispatch_free(cur);
>      }
>  }
> 
> @@ -2567,7 +2568,8 @@ void address_space_destroy_dispatch(AddressSpace 
> *as)
> 
>      atomic_rcu_set(&as->dispatch, NULL);
>      if (d) {
> -        call_rcu(d, address_space_dispatch_free, rcu);
> +        synchronize_rcu();
> +        address_space_dispatch_free(d);
>      }
>  }
> 
> @@ -2712,19 +2714,22 @@ static bool prepare_mmio_access(MemoryRegion *mr)
>      return release_lock;
>  }
> 
> -/* Called within RCU critical section.  */ -static MemTxResult 
> address_space_write_continue(AddressSpace *as, hwaddr addr,
> -                                                MemTxAttrs attrs,
> -                                                const uint8_t *buf,
> -                                                int len, hwaddr addr1,
> -                                                hwaddr l, MemoryRegion *mr)
> +MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
> +                  MemTxAttrs attrs, const uint8_t *buf, int len)
>  {
>      uint8_t *ptr;
>      uint64_t val;
> +    hwaddr l=len;
> +    hwaddr addr1;
> +    MemoryRegion *mr;
>      MemTxResult result = MEMTX_OK;
>      bool release_lock = false;
> 
>      for (;;) {
> +        rcu_read_lock();
> +        mr = address_space_translate(as, addr, &addr1, &l, true);
> +        memory_region_ref(mr);
> +        rcu_read_unlock();
>          if (!memory_access_is_direct(mr, true)) {
>              release_lock |= prepare_mmio_access(mr);
>              l = memory_access_size(mr, l, addr1); @@ -2764,7 +2769,7 
> @@ static MemTxResult address_space_write_continue(AddressSpace *as, 
> hwaddr addr,
>              memcpy(ptr, buf, l);
>              invalidate_and_set_dirty(mr, addr1, l);
>          }
> -
> +        memory_region_unref(mr);
>          if (release_lock) {
>              qemu_mutex_unlock_iothread();
>              release_lock = false;
> @@ -2779,27 +2784,6 @@ static MemTxResult 
> address_space_write_continue(AddressSpace *as, hwaddr addr,
>          }
> 
>          l = len;
> -        mr = address_space_translate(as, addr, &addr1, &l, true);
> -    }
> -
> -    return result;
> -}
> -
> -MemTxResult address_space_write(AddressSpace *as, hwaddr addr, 
> MemTxAttrs attrs,
> -                                const uint8_t *buf, int len)
> -{
> -    hwaddr l;
> -    hwaddr addr1;
> -    MemoryRegion *mr;
> -    MemTxResult result = MEMTX_OK;
> -
> -    if (len > 0) {
> -        rcu_read_lock();
> -        l = len;
> -        mr = address_space_translate(as, addr, &addr1, &l, true);
> -        result = address_space_write_continue(as, addr, attrs, buf, len,
> -                                              addr1, l, mr);
> -        rcu_read_unlock();
>      }
> 
>      return result;
> diff --git a/memory.c b/memory.c
> index 64b0a60..d12437c 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1505,13 +1505,10 @@ static void memory_region_finalize(Object *obj)
>       * and cause an infinite loop.
>       */
>      mr->enabled = false;
> -    memory_region_transaction_begin();
>      while (!QTAILQ_EMPTY(&mr->subregions)) {
>          MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
>          memory_region_del_subregion(mr, subregion);
>      }
> -    memory_region_transaction_commit();
> -
>      mr->destructor(mr);
>      memory_region_clear_coalescing(mr);
>      g_free((char *)mr->name);
> 
> 
> 

reply via email to

[Prev in Thread] Current Thread [Next in Thread]