
Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration


From: Takuya Yoshikawa
Subject: Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
Date: Wed, 04 Jan 2012 10:30:09 +0900
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.2.25) Gecko/20111213 Thunderbird/3.1.17

(2012/01/01 18:52), Dor Laor wrote:
But we really need to think hard about whether this is the right thing
to take into the tree. I worry a lot about the fact that we don't test
pre-copy migration nearly enough and adding a second form just
introduces more things to test.

It is an issue, but it can't be a merge criterion; Isaku is not to blame for
the lack of testing of pre-copy live migration.

I would say that 90% of live migration problems are not related to the
pre-/post-copy stage but rather to saving device model state. So post-copy
shouldn't add a significant regression here.

Though they may be only 10%, the remaining issues tend to be hard to find.


It would probably be good to ask every migration patch author to write an
additional unit test for migration.

It's also not clear to me why post-copy is better. If you were going to
sit down and explain to someone building a management tool when they
should use pre-copy and when they should use post-copy, what would you
tell them?

Today, max-downtime defaults to 100ms.
If either the guest's working set size or the host's network throughput can't
meet that downtime, migration won't end.
The mgmt user's options are:
- keep increasing the downtime, up to an actual stop of the guest
- fail the migration
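
To make the convergence condition above concrete, here is a rough sketch of
a toy model. It is not QEMU code: the function name, its parameters, and the
constant dirty rate are all assumptions made for illustration. Each round
resends whatever was dirtied during the previous round, and the final
stop-and-copy is allowed only once the remaining data fits in max-downtime.

```python
def precopy_rounds(ram_bytes, working_set_bytes, dirty_rate, bandwidth,
                   max_downtime=0.1, max_rounds=30):
    """Toy model of iterative pre-copy (hypothetical, not QEMU's algorithm).

    Rates are in bytes per second, max_downtime in seconds.
    Returns the round in which the remaining data can be sent within
    max_downtime, or None if that never happens within max_rounds.
    """
    remaining = ram_bytes  # the first pass sends all guest RAM
    for round_no in range(1, max_rounds + 1):
        transfer_time = remaining / bandwidth
        if transfer_time <= max_downtime:
            return round_no  # stop the guest and send the rest
        # Data dirtied while this round was on the wire, bounded by the
        # guest's working set (a page can only be dirty once per round).
        remaining = min(dirty_rate * transfer_time, working_set_bytes)
    return None


# Guest dirties memory slower than the link can carry it: converges.
print(precopy_rounds(4_000_000_000, 1_000_000_000, 100_000_000, 1_000_000_000))
# Guest dirties memory faster than the link and the working set is too big
# to send in 100ms: never converges.
print(precopy_rounds(4_000_000_000, 1_000_000_000, 2_000_000_000, 1_000_000_000))
```

In this model the second case stays stuck at the working set size every
round, which is the situation where the only choices are raising the
downtime or failing the migration.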

With post-copy there is another option.
Performance measurements will teach us (probably prior to commit) when this
stage is valuable. Most likely, we had better try pre-copy first, and if we
can't meet the downtime we can optionally switch to post-copy.

It is difficult to recommend that users mix two methods which have
different requirements:

        post-copy cannot be canceled and, probably, needs some
        dedicated/reliable lines to make sure that guests will not be
        broken during the copy stage.

What we want to know, from the user's point of view, are clear/simple criteria:

        what is needed for post-copy
        for which services we should select post-copy


        Takuya


Here's a paper by Umesh (the migration thread writer):
http://osnet.cs.binghamton.edu/publications/hines09postcopy_osr.pdf

Regards,
Dor




