qemu-devel

Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration


From: Dor Laor
Subject: Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
Date: Sun, 01 Jan 2012 11:52:27 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20111222 Thunderbird/9.0

On 12/30/2011 12:39 AM, Anthony Liguori wrote:
On 12/28/2011 07:25 PM, Isaku Yamahata wrote:
Intro
=====
This patch series implements postcopy live migration.[1]
As discussed at KVM Forum 2011, a dedicated character device is used for
distributed shared memory between the migration source and destination.
Now we can discuss/benchmark/compare with precopy. I believe there is
much room for improvement.

[1] http://wiki.qemu.org/Features/PostCopyLiveMigration


Usage
=====
You need to load the umem character device on the host before starting
migration.
Postcopy can be used with both the tcg and kvm accelerators. The
implementation depends only on the Linux umem character device, but the
driver-dependent code is split out into a separate file.
I tested only the host page size == guest page size case, but the
implementation allows the host page size != guest page size case.

The following options are added with this patch series.
- incoming part
  command line options
    -postcopy [-postcopy-flags <flags>]
  where flags is for changing behavior for benchmark/debugging
  Currently the following flags are available
    0: default
    1: enable touching page request
example:
qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm

- outgoing part
  options for migrate command
    migrate [-p [-n]] URI
    -p: indicate postcopy migration
    -n: disable background transfer of pages; this is for
        benchmark/debugging

example:
migrate -p -n tcp:<dest ip address>:4444


TODO
====
- benchmark/evaluation. Especially how async page fault affects the
result.

I'll review this series next week (Mike/Juan, please also review when
you can).

But we really need to think hard about whether this is the right thing
to take into the tree. I worry a lot about the fact that we don't test
pre-copy migration nearly enough and adding a second form just
introduces more things to test.

It is an issue, but it can't be a merge criterion; Isaku is not to blame for the lack of testing of pre-copy live migration.

I would say that 90% of live migration problems are not related to the pre/post stage but to saving the device model state. So post-copy shouldn't add a significant regression here.

It would probably be good to ask every migration patch author to write an additional unit test for migration.

It's also not clear to me why post-copy is better. If you were going to
sit down and explain to someone building a management tool when they
should use pre-copy and when they should use post-copy, what would you
tell them?

Today, we have a default max-downtime of 100ms.
If the guest working set is too large, or the host network throughput too low, to meet that downtime, migration won't end.
The management user's options are:
 - keep increasing the downtime, up to an actual stop of the guest
 - fail the migration

With post-copy there is another option.
Performance measurements will teach us (probably prior to commit) when this stage is valuable. Most likely, we should try pre-copy first, and if we can't meet the downtime we can optionally use post-copy.
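
To make the reasoning above concrete, here is a rough sketch in Python. It is a toy model, not QEMU code: the names precopy_rounds and choose_strategy, the MB and MB/s units, and the max_rounds cap are all assumptions made up for illustration. It only shows why pre-copy fails to finish when the guest dirties memory faster than the link can send it, unless the working set is small enough to fit in the downtime budget anyway.

```python
def precopy_rounds(ram_mb, working_set_mb, dirty_mb_s, bandwidth_mb_s,
                   max_downtime_s=0.1, max_rounds=30):
    """Toy model of iterative pre-copy (an illustration, not QEMU's algorithm).

    Returns the number of live copy rounds needed before the remaining
    dirty memory can be sent within max_downtime_s, or None if that
    does not happen within max_rounds.
    """
    remaining_mb = ram_mb  # the first round sends all of guest RAM
    for rounds_done in range(max_rounds + 1):
        # Stop-and-copy is possible once the leftover fits the downtime budget.
        if remaining_mb / bandwidth_mb_s <= max_downtime_s:
            return rounds_done
        # While this round is on the wire the guest keeps dirtying pages,
        # but it cannot dirty more than its working set.
        send_time_s = remaining_mb / bandwidth_mb_s
        remaining_mb = min(working_set_mb, dirty_mb_s * send_time_s)
    return None


def choose_strategy(ram_mb, working_set_mb, dirty_mb_s, bandwidth_mb_s,
                    max_downtime_s=0.1):
    """Hypothetical policy: try pre-copy first, fall back to post-copy."""
    rounds = precopy_rounds(ram_mb, working_set_mb, dirty_mb_s,
                            bandwidth_mb_s, max_downtime_s)
    return "pre-copy" if rounds is not None else "post-copy"


if __name__ == "__main__":
    # Mostly idle guest: the dirty rate is well below the link speed.
    print(choose_strategy(4096, 2048, dirty_mb_s=100, bandwidth_mb_s=1000))
    # Busy guest: dirties memory faster than the link can carry it.
    print(choose_strategy(4096, 2048, dirty_mb_s=1200, bandwidth_mb_s=1000))
```

In this model the first guest converges after two live rounds, while the second never gets below the 100ms budget, which is the case where the management tool today can only raise the downtime or fail the migration.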

Here's a paper by Umesh (the migration thread writer):
http://osnet.cs.binghamton.edu/publications/hines09postcopy_osr.pdf

Regards,
Dor


Regards,

Anthony Liguori




