
Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule


From: Uri Lublin
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
Date: Wed, 20 May 2009 20:34:28 +0300
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1b3pre) Gecko/20090513 Fedora/3.0-2.3.beta2.fc11 Lightning/1.0pre Thunderbird/3.0b2

On 05/20/2009 08:28 PM, Blue Swirl wrote:
On 5/20/09, Uri Lublin <address@hidden> wrote:
On 05/19/2009 09:17 PM, Anthony Liguori wrote:

Glauber Costa wrote:

On Tue, May 19, 2009 at 05:59:14PM +0300, Dor Laor wrote:


We can also make it configurable using the monitor migrate command.
For example:
migrate -d -no_progress -threshold=x tcp:....

it can be done, but it fits better as a different monitor command

Anthony, do you have any strong opinions here, or is this scheme
acceptable?

Threshold is a bad metric. There's no way to choose a right number. If
we were going to have a means to support metrics-based forced
convergence (and I really think this belongs in libvirt) I'd rather see
something based on bandwidth or wall clock time.

Let me put it this way, why 50? What were the guidelines for choosing
that number and how would you explain what number a user should choose?

I've changed the threshold of the first convergence rule to 50 from 10.
Why 10? For this rule the threshold (number of dirty pages) and the number
of bytes to transfer are equivalent.

50 pages is about 200K, which can still be sent quickly.
I've added debug messages and noticed we never hit a number smaller than 10
(excluding 0). The truth is there were very few runs with fewer than 50 dirty
pages too. I don't mind leaving it at 10 (it should be configurable too).
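
(For illustration only, a rough sketch of what such a page-count rule could
look like -- the names and the helper are invented here, not the actual
QEMU code or the patch:

    #include <stdint.h>

    /* Sketch of the first convergence rule: the live phase may end once
     * the number of pages still dirty is small enough to send in the
     * final, stopped stage.  With fixed-size pages, comparing page
     * counts is equivalent to comparing bytes left to transfer. */
    #define DIRTY_PAGE_THRESHOLD 50          /* ~200K with 4K pages */

    static int first_rule_converged(uint64_t remaining_dirty_pages)
    {
        return remaining_dirty_pages <= DIRTY_PAGE_THRESHOLD;
    }

)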

For the second migration convergence rule I've set the limit to 10, which
seems much larger than what I've needed (in all the runs I've made, 2-4
no-progress iterations were good enough, as the behavior seems repetitive
afterwards), but I've enlarged it "just in case". No real research work
was done here.

  Note that a no-progress iteration depends on both network bandwidth and
guest actions.
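
(Again purely as a sketch, with invented names rather than the patch's own:
the second rule amounts to counting consecutive iterations in which the
dirty-page count did not drop, and forcing convergence after a fixed number
of them:

    #include <stdint.h>

    #define MAX_NO_PROGRESS_ITERATIONS 10

    static int no_progress_iterations;

    static int second_rule_converged(uint64_t prev_dirty, uint64_t curr_dirty)
    {
        if (curr_dirty >= prev_dirty) {
            no_progress_iterations++;    /* guest dirtied pages at least as
                                            fast as they were sent */
        } else {
            no_progress_iterations = 0;  /* made progress, reset */
        }
        return no_progress_iterations >= MAX_NO_PROGRESS_ITERATIONS;
    }

)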

Instead of freezing the guest or aborting the migration, the guest
could be throttled a bit by giving it less CPU time relative to
migration, or by incurring a small delay for each page dirtying write
access. Maybe this method would find the balance faster.

Enlarging the bandwidth limitation (while migration is active) is one way of
doing that (giving more CPU to the migration code and less to the guest).
I think it's not currently possible, though.
I don't think I want to "play" with delaying write accesses.
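
(For completeness, the "small delay per page-dirtying write" suggestion
amounts to something like the following; this hook does not exist in QEMU,
and the name and delay value are invented purely to illustrate the idea:

    #include <time.h>

    /* Slow the guest's dirtying rate so migration can catch up, by
     * penalizing the vCPU each time it dirties a page during migration. */
    static void throttle_page_dirtying_write(void)
    {
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000 }; /* 100us */
        nanosleep(&ts, NULL);
    }

)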




