From: Lai Jiangshan
Subject: Re: [Qemu-devel] [PATCH V4] migration: add capability to bypass the shared memory
Date: Thu, 12 Apr 2018 10:34:27 +0800
On Tue, Apr 10, 2018 at 1:30 AM, Dr. David Alan Gilbert
<address@hidden> wrote:
> Hi,
>
> * Lai Jiangshan (address@hidden) wrote:
>> 1) What's this
>>
>> When the migration capability 'bypass-shared-memory'
>> is set, shared memory is bypassed during migration.
>>
>> It is the key feature that enables several powerful features
>> for qemu, such as qemu-local-migration, qemu-live-update,
>> extremely-fast-save-restore, vm-template, vm-fast-live-clone,
>> yet-another-post-copy-migration, etc.
>>
>> The philosophy behind this key feature, and the advanced
>> features built on it, is that part of the memory management
>> is separated out from qemu, so that other toolkits
>> such as libvirt, kata-containers (https://github.com/kata-containers),
>> runv (https://github.com/hyperhq/runv/), or several cooperating
>> qemu commands can directly access it, manage it, and provide features on it.
>>
>> 2) Status in real world
>>
>> The hyperhq project (http://hyper.sh http://hypercontainer.io/)
>> introduced the feature vm-template (vm-fast-live-clone)
>> to the hyper container several years ago, and it works perfectly.
>> (see https://github.com/hyperhq/runv/pull/297).
>>
>> The vm-template feature lets containers (VMs) be
>> started in 130 ms and saves 80 MB of memory for every
>> container (VM), so that hyper containers are as fast
>> and as dense as normal containers.
>>
>> The kata-containers project (https://github.com/kata-containers),
>> which was launched by hyper, intel and friends and which descended
>> from runv (and clear-container), should have this feature enabled.
>> Unfortunately, due to code conflicts between runv & cc,
>> this feature was temporarily disabled; it is being brought
>> back by the hyper and intel teams.
>>
>> 3) How to use it and bring up the advanced features.
>>
>> On the current qemu command line, shared memory has
>> to be configured via a memory backend object.
>>
>> a) feature: qemu-local-migration, qemu-live-update
>> Set the mem-path to a file on tmpfs and set share=on for it when
>> starting the vm. example:
>> -object \
>> memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \
>> -numa node,nodeid=0,cpus=0-7,memdev=mem
>>
>> When you want to migrate the vm locally (after fixing a security bug
>> in the qemu binary, or for another reason), you can start a new qemu with
>> the same command line plus -incoming, then migrate the
>> vm from the old qemu to the new qemu with the migration capability
>> 'bypass-shared-memory' set. The migration will migrate the device state
>> *ONLY*; the memory is the original memory backed by the tmpfs file.
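The local-migration flow above can be sketched as follows. This is an illustrative sketch, not part of the patch: the socket path and machine flags are made up, and only the capability name 'bypass-shared-memory' comes from this patch; migrate_set_capability and migrate are the standard HMP monitor commands.

```shell
# 1. Start the replacement qemu with the SAME tmpfs-backed shared memory,
#    waiting for an incoming migration on a unix socket (path illustrative):
qemu-system-x86_64 \
    -object memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \
    -numa node,nodeid=0,cpus=0-7,memdev=mem \
    -incoming unix:/tmp/mig.sock &

# 2. On the OLD qemu's HMP monitor, enable the capability and migrate;
#    only the device state travels, the RAM stays in /dev/shm/memory:
#    (qemu) migrate_set_capability bypass-shared-memory on
#    (qemu) migrate unix:/tmp/mig.sock
```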
>>
>> b) feature: extremely-fast-save-restore
>> the same as above, but the mem-path is on a persistent file system.
>>
>> c) feature: vm-template, vm-fast-live-clone
>> the template vm is started as in a), paused when the guest reaches
>> the template point (example: the guest app is ready), and then the template
>> vm is saved. (the qemu process of the template can be killed now, because
>> we need only the memory and the device-state files (in tmpfs)).
>>
>> Then we can launch one or multiple VMs based on the template vm state.
>> The new VMs are started without "share=on"; all the new VMs share
>> the initial memory from the memory file, which saves a lot of memory.
>> All the new VMs start from the template point, so the guest app can get
>> to work quickly.
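The template/clone flow above might look roughly like this. A hypothetical sketch only: file paths are illustrative, the capability name comes from this patch, and "exec:" migration plus migrate_set_capability/stop are the standard qemu mechanisms assumed here.

```shell
# 1. On the template VM (started with share=on as in a)), pause at the
#    template point and save the device state to a file; with
#    bypass-shared-memory set, RAM is NOT written, it stays in the tmpfs file:
#    (qemu) stop
#    (qemu) migrate_set_capability bypass-shared-memory on
#    (qemu) migrate "exec:cat > /dev/shm/state"

# 2. Launch each clone WITHOUT share=on, so it maps the template memory
#    privately (copy-on-write) and restores the saved device state:
qemu-system-x86_64 \
    -object memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory \
    -numa node,nodeid=0,cpus=0-7,memdev=mem \
    -incoming "exec:cat /dev/shm/state"
```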
>
> How do you handle the storage in this case, or giving each VM it's own
> MAC address?
The user or the upper-layer tools can copy/clone the storage
(on xfs, btrfs, ceph, ...), and can handle the
interface MAC addresses themselves; this patch focuses only on memory.
hyper/runv clones the vm before the interfaces are inserted;
vm-template is often used along with hotplugging.
>
>> A new VM booted from a template vm can't become a template again;
>> if you need this unusual chained-template feature, you can write
>> a cloneable-tmpfs kernel module for it.
>>
>> The libvirt toolkit can't manage vm-template currently; in
>> hyperhq/runv we use a qemu wrapper script to do it. I hope someone adds
>> a "libvirt-managed template" feature to libvirt.
>
>> d) feature: yet-another-post-copy-migration
>> It is a possible feature; no toolkit can do it well yet.
>> Using an nbd server/client on the memory file is reluctantly OK but
>> inconvenient. A special feature for tmpfs might be needed to
>> fully realize this feature.
>> No one needs yet another post-copy migration method,
>> but it is possible should someone crazy need it.
>
> As the crazy person who did the existing postcopy; one is enough!
>
Very true. This part of the comments just shows how much
potential there is in such a simple migration capability.
> Some minor fix requests below, but this looks nice and simple.
>
Will do soon. Thanks for your review.
> Shared memory is interesting because there are lots of different uses;
> e.g. your uses, but also vhost-user which is sharing for a completely
> different reason.
>
>> Cc: Samuel Ortiz <address@hidden>
>> Cc: Sebastien Boeuf <address@hidden>
>> Cc: James O. D. Hunt <address@hidden>
>> Cc: Xu Wang <address@hidden>
>> Cc: Peng Tao <address@hidden>
>> Cc: Xiao Guangrong <address@hidden>
>> Cc: Xiao Guangrong <address@hidden>
>> Signed-off-by: Lai Jiangshan <address@hidden>
>> ---
>>
>> Changes in V4:
>> fixes checkpatch.pl errors
>>
>> Changes in V3:
>> rebased on upstream master
>> update the available version of the capability to
>> v2.13
>>
>> Changes in V2:
>> rebased on 2.11.1
>>
>> migration/migration.c | 14 ++++++++++++++
>> migration/migration.h | 1 +
>> migration/ram.c | 27 ++++++++++++++++++---------
>> qapi/migration.json | 6 +++++-
>> 4 files changed, 38 insertions(+), 10 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 52a5092add..6a63102d7f 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1509,6 +1509,20 @@ bool migrate_release_ram(void)
>> return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM];
>> }
>>
>> +bool migrate_bypass_shared_memory(void)
>> +{
>> + MigrationState *s;
>> +
>> + /* it is not workable with postcopy yet. */
>> + if (migrate_postcopy_ram()) {
>> + return false;
>> + }
>
> Please change this to work in the same way as the check for
> postcopy+compress in migration.c migrate_caps_check.
>
>> + s = migrate_get_current();
>> +
>> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY];
>> +}
>> +
>> bool migrate_postcopy_ram(void)
>> {
>> MigrationState *s;
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 8d2f320c48..cfd2513ef0 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void);
>>
>> bool migrate_postcopy(void);
>>
>> +bool migrate_bypass_shared_memory(void);
>> bool migrate_release_ram(void);
>> bool migrate_postcopy_ram(void);
>> bool migrate_zero_blocks(void);
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 0e90efa092..bca170c386 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>> unsigned long *bitmap = rb->bmap;
>> unsigned long next;
>>
>> + /* when this ramblock is requested bypassing */
>> + if (!bitmap) {
>> + return size;
>> + }
>> +
>> if (rs->ram_bulk_stage && start > 0) {
>> next = start + 1;
>> } else {
>> @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs)
>> qemu_mutex_lock(&rs->bitmap_mutex);
>> rcu_read_lock();
>> RAMBLOCK_FOREACH(block) {
>> - migration_bitmap_sync_range(rs, block, 0, block->used_length);
>> + if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) {
>> + migration_bitmap_sync_range(rs, block, 0, block->used_length);
>> + }
>> }
>> rcu_read_unlock();
>> qemu_mutex_unlock(&rs->bitmap_mutex);
>> @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp)
>> qemu_mutex_init(&(*rsp)->src_page_req_mutex);
>> QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
>>
>> - /*
>> - * Count the total number of pages used by ram blocks not including any
>> - * gaps due to alignment or unplugs.
>> - */
>> - (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
>> -
>> ram_state_reset(*rsp);
>>
>> return 0;
>> }
>>
>> -static void ram_list_init_bitmaps(void)
>> +static void ram_list_init_bitmaps(RAMState *rs)
>> {
>> RAMBlock *block;
>> unsigned long pages;
>> @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void)
>> /* Skip setting bitmap if there is no RAM */
>> if (ram_bytes_total()) {
>
> I think you need to add here a :
> rs->migration_dirty_pages = 0;
>
> I don't see anywhere else that initialises it, and there is the case of
> a migration that fails, followed by a 2nd attempt.
>
>> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> +            if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) {
>> + continue;
>> + }
>> pages = block->max_length >> TARGET_PAGE_BITS;
>> block->bmap = bitmap_new(pages);
>> bitmap_set(block->bmap, 0, pages);
>> + /*
>> + * Count the total number of pages used by ram blocks not
>> + * including any gaps due to alignment or unplugs.
>> + */
>> + rs->migration_dirty_pages += pages;
>> if (migrate_postcopy_ram()) {
>> block->unsentmap = bitmap_new(pages);
>> bitmap_set(block->unsentmap, 0, pages);
>> @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs)
>> qemu_mutex_lock_ramlist();
>> rcu_read_lock();
>>
>> - ram_list_init_bitmaps();
>> + ram_list_init_bitmaps(rs);
>> memory_global_dirty_log_start();
>> migration_bitmap_sync(rs);
>>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 9d0bf82cf4..45326480bd 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -357,13 +357,17 @@
>> # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps.
>> # (since 2.12)
>> #
>> +# @bypass-shared-memory: the shared memory region will be bypassed on
>> +# migration.
>> +# This feature allows the memory region to be reused by new qemu(s)
>> +# or be migrated separately. (since 2.13)
>> +#
>> # Since: 1.2
>> ##
>> { 'enum': 'MigrationCapability',
>> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>> 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>> 'block', 'return-path', 'pause-before-switchover', 'x-multifd',
>> - 'dirty-bitmaps' ] }
>> + 'dirty-bitmaps', 'bypass-shared-memory' ] }
>>
>> ##
>> # @MigrationCapabilityStatus:
>> --
>> 2.14.3 (Apple Git-98)
>>
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK