

From: Avi Kivity
Subject: Re: [Qemu-devel] Re: [Qemu-commits] [COMMIT 3086844] Instead of writing a zero page, madvise it away
Date: Mon, 22 Jun 2009 19:38:40 +0300
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1b3pre) Gecko/20090513 Fedora/3.0-2.3.beta2.fc11 Lightning/1.0pre Thunderbird/3.0b2

On 06/22/2009 07:25 PM, Anthony Liguori wrote:
> Avi Kivity wrote:
>> On 06/22/2009 06:51 PM, Anthony Liguori wrote:
>>> From: Anthony Liguori <address@hidden>
>>>
>>> Otherwise, after migration, we end up with a much larger RSS size than
>>> we ought to have.
>>
>> We have the same issue on the migration source node.  I don't see a
>> simple way to solve it, though.
>
> I don't follow.  In this case, the issue is:
>
> 1) Start a guest with 1024MB, balloon down to 128MB.  RSS size is now
>    ~128MB
> 2) Live migrate to a different node
> 3) RSS on different node jumps to ~1GB

3.5) RSS on source node jumps to ~1GB, since reading the page instantiates the pte

> 4) Weep at all your lost memory

4.5) And at the swapping going on in the source node
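
(To make the destination-side fix concrete, here is a minimal sketch of
the idea, not the actual commit -- buffer_is_zero() and load_ram_page()
are illustrative names, and the real RAM load path is more involved:

    #include <string.h>
    #include <sys/mman.h>

    /* Returns 1 if the buffer contains only zero bytes. */
    static int buffer_is_zero(const void *buf, size_t len)
    {
        const unsigned char *p = buf;
        size_t i;

        for (i = 0; i < len; i++) {
            if (p[i]) {
                return 0;
            }
        }
        return 1;
    }

    /* On the destination, skip writing all-zero incoming pages: an
     * untouched anonymous page already reads as zero and costs no RSS,
     * and MADV_DONTNEED drops any previously instantiated page. */
    static void load_ram_page(void *host, const void *incoming,
                              size_t page_size)
    {
        if (buffer_is_zero(incoming, page_size)) {
            madvise(host, page_size, MADV_DONTNEED);
        } else {
            memcpy(host, incoming, page_size);
        }
    }

The madvise() call here is what the note further down is about: it is
only advice, so correctness must not depend on it.)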


> Xen had a similar issue.  This ends up biting people who overcommit
> their VMs via ballooning and live migration, and badness ensues.  At
> least for us the failure mode is swapping, but madvise also avoids the
> issue by never consuming that memory to begin with.

Right. I'd love to do madvise() on the source node as well if we fault in a page and find out it's zero, but the guest (and aio) is still running and we might drop live data. We need a madvise(MADV_DONTNEED_IFZERO), or a mincore() flag that tells us if the page exists (vs. swapped). ksm would also do this, but it is overkill for some applications.
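
(The race is easy to see in sketch form; reusing buffer_is_zero() from
the sketch above, and assuming the guest runs concurrently:

    /* UNSAFE on a live source: between the zero check and the
     * madvise(), the guest or an aio completion may dirty the page,
     * and MADV_DONTNEED would then silently discard that write.  A
     * hypothetical atomic MADV_DONTNEED_IFZERO, checked under the page
     * lock inside the kernel, is what would close the window. */
    static void reclaim_if_zero_racy(void *host, size_t page_size)
    {
        if (buffer_is_zero(host, page_size)) {
            /* guest may write to the page right here */
            madvise(host, page_size, MADV_DONTNEED);
        }
    }

mincore() as it stands reports only residency, so it cannot distinguish
a never-instantiated page from one that has been swapped out -- hence
the wish for a new flag.)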

Note that the patch contains a small bug -- the kernel is allowed to ignore the advice according to the manual page, so it's better to memset() the memory before dropping it.
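
(In sketch form, with the same headers as above, the zero branch of the
earlier fragment becomes:

    static void write_zero_page(void *host, size_t page_size)
    {
        /* memset() first: the page contents are then correct even if
         * the kernel ignores the advice, as madvise(2) permits. */
        memset(host, 0, page_size);
        /* Best effort -- if honored, the RSS is released as well. */
        madvise(host, page_size, MADV_DONTNEED);
    }

If the advice is taken the memory is given back; if not, the guest
still reads zeros, so correctness holds either way.)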

--
error compiling committee.c: too many arguments to function




