Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
From: Stefan Hajnoczi
Date: Sun, 1 Jan 2012 16:27:56 +0000

On Sun, Jan 1, 2012 at 9:43 AM, Orit Wasserman <address@hidden> wrote:
> On 12/30/2011 12:39 AM, Anthony Liguori wrote:
>> On 12/28/2011 07:25 PM, Isaku Yamahata wrote:
>>> Intro
>>> =====
>>> This patch series implements postcopy live migration.[1]
>>> As discussed at KVM Forum 2011, a dedicated character device is used for
>>> distributed shared memory between migration source and destination.
>>> Now we can discuss/benchmark/compare with precopy. I believe there is
>>> much room for improvement.
>>>
>>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>>
>>>
>>> Usage
>>> =====
>>> You need to load the umem character device on the host before starting migration.
>>> Postcopy can be used with the tcg and kvm accelerators. The implementation depends
>>> only on the Linux umem character device, but the driver-dependent code is split
>>> into its own file.
>>> I tested only the host page size == guest page size case, but the implementation
>>> allows the host page size != guest page size case.
>>>
>>> The following options are added with this patch series.
>>> - incoming part
>>>   command line options
>>>     -postcopy [-postcopy-flags <flags>]
>>>   where flags is for changing behavior for benchmark/debugging.
>>>   Currently the following flags are available:
>>>     0: default
>>>     1: enable touching page request
>>>
>>>   example:
>>>     qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>>
>>> - outgoing part
>>>   options for migrate command
>>>     migrate [-p [-n]] URI
>>>     -p: indicate postcopy migration
>>>     -n: disable background transferring of pages. This is for
>>>         benchmark/debugging
>>>
>>>   example:
>>>     migrate -p -n tcp:<dest ip address>:4444
>>>
>>>
>>> TODO
>>> ====
>>> - benchmark/evaluation. Especially how async page fault affects the result.
>>
>> I'll review this series next week (Mike/Juan, please also review when you
>> can).
>>
>> But we really need to think hard about whether this is the right thing to
>> take into the tree. I worry a lot about the fact that we don't test
>> pre-copy migration nearly enough and adding a second form just introduces
>> more things to test.
>>
>> It's also not clear to me why post-copy is better. If you were going to sit
>> down and explain to someone building a management tool when they should use
>> pre-copy and when they should use post-copy, what would you tell them?
>
> Start with pre-copy; if it doesn't converge, switch to post-copy.

Post-copy throttles the guest when page faults are encountered because
the destination machine waits for memory pages from the source
machine. Is there a reason this page fault-based throttling cannot be
done on the source machine with pre-copy migration? I'm not sure
post-copy provides new behavior in terms of convergence; we could do
the same with pre-copy migration.
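
To make the convergence question concrete, here is a toy model of
iterative pre-copy. It is only a sketch and not QEMU code: the function
name, its parameters, and the per-round "throttle" factor are all
assumptions made up for illustration. Each round sends the pages that
are currently dirty, and the guest dirties more pages while that
transfer is in flight. Slowing the guest on the source is modelled by
scaling its dirty rate after every round.

```python
def precopy_rounds(total_pages, bandwidth, dirty_rate, threshold,
                   max_rounds=30, throttle=1.0):
    """Toy model of iterative pre-copy (hypothetical, not QEMU's algorithm).

    bandwidth and dirty_rate are in pages per second. throttle is the
    factor applied to the guest's dirty rate after each round; 1.0 means
    the guest is never slowed down.

    Returns the round in which the dirty set drops to `threshold` pages
    or fewer (the point where the guest could be stopped for the final
    copy), or None if that does not happen within max_rounds.
    """
    remaining = float(total_pages)  # first round sends all of guest RAM
    rate = float(dirty_rate)
    for round_no in range(1, max_rounds + 1):
        seconds = remaining / bandwidth
        # Pages dirtied while this round was being sent; the dirty set
        # can never be larger than guest RAM itself.
        remaining = min(float(total_pages), rate * seconds)
        if remaining <= threshold:
            return round_no
        rate *= throttle  # assumed source-side throttling of the guest
    return None


if __name__ == "__main__":
    ram, bw, stop_at = 1_000_000, 100_000, 1_000
    print("dirty rate below bandwidth:",
          precopy_rounds(ram, bw, 50_000, stop_at))
    print("dirty rate above bandwidth:",
          precopy_rounds(ram, bw, 120_000, stop_at))
    print("above bandwidth, guest throttled:",
          precopy_rounds(ram, bw, 120_000, stop_at, throttle=0.5))
```

In this model the dirty set shrinks by the ratio dirty_rate/bandwidth
each round, so pre-copy only fails to converge when the guest dirties
pages faster than they can be sent, and slowing the guest on the source
restores convergence.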

Post-copy has other advantages, though: it immediately frees logical
CPUs on the source machine (though RAM and network bandwidth are still
required until migration completes).

Stefan