From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache
Date: Fri, 24 Apr 2015 10:26:21 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Thu, Apr 23, 2015 at 01:50:28PM +0200, Alberto Garcia wrote:
> On Thu 23 Apr 2015 12:15:04 PM CEST, Stefan Hajnoczi wrote:
>
> >> For a cache size of 128MB, the PSS is actually ~10MB larger without
> >> the patch, which seems to come from posix_memalign().
> >
> > Do you mean RSS or are you using a tool that reports a "PSS" number
> > that I don't know about?
> >
> > We should understand what is going on instead of moving the code
> > around to hide/delay the problem.
>
> Both RSS and PSS ("proportional set size", also reported by the kernel).
>
> I'm not an expert in memory allocators, but I measured the overhead like
> this:
>
> An L2 cache of 128MB implies a refcount cache of 32MB, in total 160MB.
> With a default cluster size of 64k, that's 2560 cache entries.
>
> So I wrote a test case that allocates 2560 blocks of 64k each using
> posix_memalign and mmap, and here's how their /proc/<pid>/smaps compare:
>
> -Size: 165184 kB
> -Rss: 10244 kB
> -Pss: 10244 kB
> +Size: 161856 kB
> +Rss: 0 kB
> +Pss: 0 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 0 kB
> Private_Clean: 0 kB
> -Private_Dirty: 10244 kB
> -Referenced: 10244 kB
> -Anonymous: 10244 kB
> +Private_Dirty: 0 kB
> +Referenced: 0 kB
> +Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap: 0 kB
> KernelPageSize: 4 kB
>
> Those are the 10MB I saw. For the record I also tried with malloc() and
> the results are similar to those of posix_memalign().
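For reference, the entry count quoted above works out as follows. This reads "MB" as MiB and assumes every cache entry is exactly one 64 KiB cluster; the second expression is the footprint of a completely full cache:

```latex
\[
\frac{(128 + 32)\ \text{MiB}}{64\ \text{KiB}}
  = \frac{160 \times 1024\ \text{KiB}}{64\ \text{KiB}}
  = 2560\ \text{entries},
\qquad
2560 \times 64\ \text{KiB} = 163840\ \text{KiB}
\]
```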

Calling posix_memalign() once per entry wastes memory. I compared:

    /* one 160 MB block holding all 2560 entries */
    posix_memalign(&memptr, 65536, 2560 * 65536);
    memset(memptr, 0, 2560 * 65536);

with:

    /* 2560 separate 64 KiB blocks */
    for (i = 0; i < 2560; i++) {
        posix_memalign(&memptr, 65536, 65536);
        memset(memptr, 0, 65536);
    }
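
A comparison like this one can be reproduced with a small Linux-only harness. This is a minimal sketch, not the program used for the numbers in this thread: it assumes glibc's posix_memalign(), reads the process-wide VmRSS field from /proc/self/status instead of the per-mapping smaps entries, and the names rss_kb, rss_growth_single, rss_growth_many, ENTRY_SIZE and NUM_ENTRIES are hypothetical. Absolute numbers will vary with the C library and kernel.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ENTRY_SIZE  65536  /* one 64 KiB cluster per cache entry */
#define NUM_ENTRIES 2560   /* 160 MiB of L2 + refcount cache */

/* Process-wide resident set size in kB, parsed from the VmRSS line of
 * /proc/self/status (Linux only). Returns -1 if it cannot be read. */
static long rss_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f) {
        return -1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1) {
            break;
        }
    }
    fclose(f);
    return kb;
}

/* RSS growth (kB) for one big aligned block holding every entry.
 * If touch is nonzero, memset() simulates a fully occupied cache. */
static long rss_growth_single(int touch)
{
    size_t total = (size_t)NUM_ENTRIES * ENTRY_SIZE;
    long before = rss_kb(), growth = -1;
    void *p = NULL;

    if (posix_memalign(&p, ENTRY_SIZE, total) != 0) {
        return -1;
    }
    if (touch) {
        memset(p, 0, total);
    }
    long after = rss_kb();
    if (before >= 0 && after >= 0) {
        growth = after - before;
    }
    free(p);
    return growth;
}

/* RSS growth (kB) for NUM_ENTRIES separate aligned blocks. */
static long rss_growth_many(int touch)
{
    static void *ptrs[NUM_ENTRIES];
    long before = rss_kb(), growth = -1;
    int n;

    for (n = 0; n < NUM_ENTRIES; n++) {
        if (posix_memalign(&ptrs[n], ENTRY_SIZE, ENTRY_SIZE) != 0) {
            break;  /* out of memory: growth stays -1 */
        }
        if (touch) {
            memset(ptrs[n], 0, ENTRY_SIZE);
        }
    }
    if (n == NUM_ENTRIES) {
        long after = rss_kb();
        if (before >= 0 && after >= 0) {
            growth = after - before;
        }
    }
    while (n > 0) {
        free(ptrs[--n]);  /* release whatever was allocated */
    }
    return growth;
}
```

Calling rss_growth_single(1) and rss_growth_many(1) from a small main() and printing the results should expose the same kind of gap, while rss_growth_single(0) typically stays close to zero because untouched pages of a large allocation are not yet backed by RAM.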

Here are the results ("-" is the single allocation, "+" is the loop):
-Size: 163920 kB
-Rss: 163860 kB
-Pss: 163860 kB
+Size: 337800 kB
+Rss: 183620 kB
+Pss: 183620 kB

Note that the memset() simulates a fully occupied cache.

The ~19 MB RSS difference between the two (183620 kB vs 163860 kB)
seems wasteful. The large "Size" difference hints that the mmap pattern
is very different when posix_memalign() is called multiple times.

We could avoid the 19 MB overhead by switching to a single allocation.
What's more, dropping the memset() to simulate no cache entry usage
(like your example) gives a grand total of 20 kB RSS, so there is no
point in delaying allocations if we do a single big upfront allocation.

I'd prefer a patch that replaces the small allocations with a single big
one. That's a win in both the empty and the full cache cases.
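
One way to picture this proposal, as a sketch only: every table lives inside one aligned block and entry i sits at a fixed offset. The struct and function names (table_cache, table_cache_init, table_cache_entry, table_cache_destroy) are hypothetical and are not QEMU's qcow2 cache API; the overflow check is an assumption of this sketch, and table_size must be a power of two because posix_memalign() rejects other alignments.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache layout: all tables share one allocation. */
struct table_cache {
    void   *table_array;  /* num_entries * table_size bytes */
    size_t  table_size;   /* e.g. the 64 KiB cluster size */
    size_t  num_entries;
};

/* Allocate every table up front with a single posix_memalign() call.
 * Assuming an allocator that serves requests this large with mmap()
 * (as glibc does), pages stay unbacked until first written, so an
 * empty cache costs little RSS. Returns 0 on success, -1 on error. */
static int table_cache_init(struct table_cache *c, size_t num_entries,
                            size_t table_size)
{
    void *p = NULL;

    if (num_entries == 0 || table_size == 0 ||
        num_entries > SIZE_MAX / table_size) {
        return -1;  /* reject empty or overflowing sizes */
    }
    if (posix_memalign(&p, table_size, num_entries * table_size) != 0) {
        return -1;
    }
    c->table_array = p;
    c->table_size = table_size;
    c->num_entries = num_entries;
    return 0;
}

/* The i-th table is a fixed offset into the shared block. */
static void *table_cache_entry(const struct table_cache *c, size_t i)
{
    assert(i < c->num_entries);
    return (uint8_t *)c->table_array + i * c->table_size;
}

/* A single free() releases every table at once. */
static void table_cache_destroy(struct table_cache *c)
{
    free(c->table_array);
    c->table_array = NULL;
    c->num_entries = 0;
}
```

With one allocation there is no per-block allocator overhead like the one measured above, and demand paging keeps an unused cache cheap, which is the "win in both cases" argument.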
Stefan