qemu-devel



From: Uri Lublin
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
Date: Wed, 20 May 2009 20:17:39 +0300
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1b3pre) Gecko/20090513 Fedora/3.0-2.3.beta2.fc11 Lightning/1.0pre Thunderbird/3.0b2

On 05/19/2009 09:15 PM, Anthony Liguori wrote:
> Dor Laor wrote:
>> The problem is that if migration is not progressing, because the guest is
>> dirtying pages faster than the migration protocol can send them, then we
>> just waste time and CPU.
>> The minimum is to notify the monitor interface in order to let the mgmt
>> daemon trap it.
>> We can easily see this issue while running iperf in the guest or in any
>> other high-load, page-dirtying scenario.

> The problem is, what's the metric for determining the guest isn't
> progressing? A raw iteration count is not a valid metric. It may be
> expected that the migration take 50 iterations.

We've defined "no-progress" as a memory transfer iteration in which the number of pages that got dirtied is larger than the number of pages transferred. After such an iteration, we have more data left to transfer than when the iteration started. Note that we did not limit the number of iterations (yet); we want to limit the number of no-progress iterations. Migrations with many such iterations just waste resources (CPU, network, etc.).
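
The no-progress definition above can be sketched in C as a predicate over per-iteration counters. This is a minimal sketch under assumptions: the struct and function names are hypothetical, chosen for illustration, and are not taken from QEMU's ram_save_live code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-iteration counters for one pass of the live RAM
 * transfer loop (illustrative names, not QEMU's). */
struct iter_stats {
    uint64_t pages_sent;    /* pages transferred during the iteration */
    uint64_t pages_dirtied; /* pages the guest dirtied during the iteration */
};

/* Per the definition above: an iteration makes no progress when the guest
 * dirtied more pages than were transferred during it. */
static bool is_no_progress(const struct iter_stats *s)
{
    return s->pages_dirtied > s->pages_sent;
}

/* Count the no-progress iterations in a recorded history of n iterations. */
static size_t count_no_progress(const struct iter_stats *iters, size_t n)
{
    size_t count = 0;

    assert(iters != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_no_progress(&iters[i])) {
            count++;
        }
    }
    return count;
}
```

An iteration that dirties exactly as many pages as it sends is not counted, since the rule requires the dirtied count to be strictly larger.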


> The management tool knows the guest isn't progressing when it decides
> that a guest isn't progressing :-)

Currently the management tool only knows the migration is still active.


>> We can also make it configurable using the monitor migrate command.
>> For example:
>> migrate -d -no_progress -threshold=x tcp:....

> Threshold is really a bad metric to use. You have no idea how much data
> has been passed in each iteration. If you only needed one more
> iteration, then stopping the migration short was a really bad idea.

You can never know that only one more iteration is needed, no matter what metric you use.
Again, this threshold limits the number of no-progress iterations, not the total number of iterations.
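
A minimal sketch of such a threshold, assuming it counts all no-progress iterations (not only consecutive ones) and that migration gives up, or notifies management, once the count reaches the threshold. The struct, the function, and the mapping of `threshold` to the `x` in the proposed `-threshold=x` option are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical convergence state; threshold plays the role of the x in
 * the proposed "migrate -d -no_progress -threshold=x" command. */
struct convergence {
    unsigned no_progress_count; /* no-progress iterations seen so far */
    unsigned threshold;         /* assumed: give up when the count reaches this */
};

/* Record one finished iteration. Returns true when the number of
 * no-progress iterations has reached the threshold. The total number of
 * iterations is not limited. */
static bool convergence_update(struct convergence *c,
                               uint64_t pages_sent, uint64_t pages_dirtied)
{
    assert(c->threshold > 0);
    if (pages_dirtied > pages_sent) {
        c->no_progress_count++;
    }
    return c->no_progress_count >= c->threshold;
}
```

Iterations that do make progress leave the counter unchanged, so a migration that alternates between good and bad iterations is stopped only after `threshold` bad ones.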

We can extend this rule (or add another flag/command) to raise the bandwidth limit upon a no-progress iteration.
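
One possible form of that extension, sketched under assumptions: on each no-progress iteration the bandwidth cap (in bytes per second) is doubled, clamped to an upper bound. The function name, the doubling factor, and the `max_bw` ceiling are hypothetical policy choices, not part of the proposed patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy: double the bandwidth cap after a no-progress
 * iteration, never exceeding max_bw. Caps are in bytes per second. */
static uint64_t adjust_bandwidth(uint64_t bw, uint64_t max_bw,
                                 uint64_t pages_sent, uint64_t pages_dirtied)
{
    assert(bw > 0 && bw <= max_bw);
    if (pages_dirtied <= pages_sent) {
        return bw; /* the iteration made progress: keep the current cap */
    }
    if (bw > max_bw / 2) {
        return max_bw; /* doubling would pass the ceiling (or overflow) */
    }
    return bw * 2;
}
```

Clamping before multiplying keeps the arithmetic free of overflow even when `max_bw` is close to `UINT64_MAX`.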


> The only thing that this does is give a false sense of security.
> Management tools have to deal with forcing migration convergence based
> on policies. If a management tool isn't doing this today, it's broken IMHO.

I agree migration convergence rules should be based on policies.

What Dor is suggesting is that the management tool do that by passing parameters to the migrate command (or using other migrate_X monitor commands).

I'm not sure management tools can have such good policies today. The only information they have is how much time has passed since the migration started.
The only actions they can take are to stop the guest or to cancel the migration.


> Basically, threshold introduces a regression. If you run iperf and
> migrate a guest with a very large memory size, after migration, you'll
> get soft lockups because the guest hasn't been running for 10 seconds.
> This is bad.

Just continuing to resend pages that are constantly changing is bad too, probably worse.
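
The soft-lockup argument above rests on simple arithmetic: once the guest is stopped for the final stage, it stays paused roughly as long as it takes to send the remaining dirty memory at the available bandwidth. A back-of-envelope sketch, ignoring protocol overhead and any page compression; the function name and the numbers in the note below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rough estimate of how long the guest stays paused while the remaining
 * dirty pages are sent: remaining bytes divided by bandwidth. */
static double estimated_downtime_s(uint64_t remaining_pages,
                                   uint64_t page_size,
                                   uint64_t bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return (double)(remaining_pages * page_size) / (double)bytes_per_sec;
}
```

For example, 327680 remaining 4 KiB pages (1.25 GiB) at 128 MiB/s works out to about 10 seconds of downtime, long enough for the guest to report soft lockups once it resumes.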


Regards,
    Uri.



