From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 13/18] arch_init: adjust ram_save_setup() for migrate_is_localhost
Date: Fri, 23 Aug 2013 09:48:42 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8

On 23/08/2013 08:25, Lei Li wrote:
> On 08/21/2013 06:48 PM, Paolo Bonzini wrote:
>> On 21/08/2013 09:18, Lei Li wrote:
>>> Send all the ram blocks through the save_page hook, which will copy
>>> each ram page and MADV_DONTNEED the page just copied.
>> You should implement this entirely in the hook.
>>
>> It will be a little less efficient because of the dirty bitmap overhead,
>> but you should aim at having *zero* changes in arch_init.c and
>> migration.c.
> 
> Yes, the reason I modified migration_thread() to add a new flow for
> localhost migration, sending all the ram pages in an adjusted
> qemu_savevm_state_begin stage and the device states in the
> qemu_savevm_device_state stage, is to avoid the dirty bitmap, which is
> a little less efficient, as you mentioned above.
> 
> Performance is very important for this feature; our goal is 100ms of
> downtime for a 1TB guest.

Do not _start_ by introducing encapsulation violations all over the place.

Juan has been working on optimizing the dirty bitmap code.  His patches
could speed it up by a factor of up to 64, so it is possible that his
work will help you enough that you can work with the dirty bitmap.

Also, this feature (not looking at the dirty bitmap if the machine is
stopped) is not limited to localhost migration; add it later, once the
basic vmsplice plumbing is in place.  This will also let you profile the
code and understand whether the goal is attainable.

I honestly doubt that 100ms of downtime is possible while the machine is
stopped.  A 1TB guest has 2^28 (roughly 268*10^6) pages, which you want
to
process in 100*10^6 nanoseconds.  Thus, your approach would require 0.4
nanoseconds per page, or roughly 2 clock cycles per page.  This is
impossible without _massive_ parallelization at all levels, starting
from the kernel.
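
To make the arithmetic explicit (my assumptions: 4 KiB pages, and a
clock around 5 GHz for the cycle estimate):

\[
\frac{1\,\mathrm{TiB}}{4\,\mathrm{KiB/page}} = 2^{28} \approx 2.68 \times 10^{8}\ \mathrm{pages},
\qquad
\frac{10^{8}\,\mathrm{ns}}{2.68 \times 10^{8}\ \mathrm{pages}} \approx 0.37\,\mathrm{ns/page} \approx 2\ \mathrm{cycles}.
\]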

As a matter of fact, 2^28 madvise system calls will take much, much
longer than 100ms.
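
As a rough way to check this, below is a minimal standalone sketch (my
own, not from the series) that times per-page MADV_DONTNEED calls on a
small anonymous mapping and extrapolates to 2^28 pages; the mapping
size is an arbitrary illustrative choice:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    size_t page = sysconf(_SC_PAGESIZE);
    size_t npages = 1 << 16;            /* 64Ki pages, 256MB with 4K pages */
    size_t len = npages * page;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct timespec t0, t1;
    double ns;
    size_t i;

    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(buf, 1, len);                /* fault all the pages in */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < npages; i++) {
        /* one syscall per page, as a per-page MADV_DONTNEED scheme would */
        madvise(buf + i * page, page, MADV_DONTNEED);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%zu madvise calls: %.1f ns/call\n", npages, ns / npages);
    printf("extrapolated to 2^28 pages: %.0f ms\n",
           ns / npages * (1 << 28) / 1e6);

    munmap(buf, len);
    return 0;
}

Even at a few hundred nanoseconds per call, 2^28 calls add up to more
than a minute.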

Have you thought of using shared memory (with -mem-path) instead of vmsplice?

Paolo


