
From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH] add migration capability to bypass the shared memory
Date: Tue, 9 Aug 2016 20:12:17 +0100
User-agent: Mutt/1.6.2 (2016-07-01)

* Lai Jiangshan (address@hidden) wrote:
> When the migration capability 'bypass-shared-memory'
> is set, shared memory is bypassed during migration.
> 
> It is the key feature needed to enable several excellent
> features for qemu, such as qemu-local-migration, qemu-live-update,
> extremely-fast-save-restore, vm-template, vm-fast-live-clone,
> yet-another-post-copy-migration, etc.
> 
> The philosophy behind this key feature and the advanced
> features built on it is that part of the memory management is
> separated out from qemu, so that other toolkits
> such as libvirt, runv (https://github.com/hyperhq/runv/)
> or the next qemu-cmd can directly access it, manage it,
> and provide features on top of it.
> 
> hyperhq (http://hyper.sh  http://hypercontainer.io/)
> introduced the vm-template (vm-fast-live-clone) feature
> to the hyper container several months ago, and it works perfectly.
> (see https://github.com/hyperhq/runv/pull/297)
> 
> The vm-template feature allows containers (VMs) to
> be started in 130ms and saves 80M of memory for every
> container (VM), so hyper containers are as fast
> and high-density as normal containers.

Very nice.

> On the current qemu command line, shared memory has
> to be configured via a memory-backend object. A
> -mem-path-share option could be added to the qemu command
> line to combine with -mem-path for this feature. This patch
> doesn't include such a -mem-path-share change.
> 
> Advanced features:
> 1) qemu-local-migration, qemu-live-update
> Set the mem-path on tmpfs and set share=on for it when
> starting the VM. Example:
> -object \
> memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \
> -numa node,nodeid=0,cpus=0-7,memdev=mem
> 
> When you want to migrate the vm locally (after fixing a security bug
> in the qemu binary, or for another reason), you can start a new qemu with
> the same command line plus -incoming, then migrate the
> vm from the old qemu to the new qemu with the migration capability
> 'bypass-shared-memory' set. The migration will migrate the device state
> *ONLY*; the memory is the original memory backed by the tmpfs file.

Interesting; I was wondering about using the xen-save-devices and
xen-load-devices commands to do the same trick; but you're allowing
it to be specified on individual RAM blocks which is probably a good
thing.
(I'm not sure what happens to things like vram).
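For what it's worth, the QMP side of that flow might look like the sketch below (the socket URI and helper name are illustrative; `migrate-set-capabilities` and `migrate` are the existing QMP commands, and `bypass-shared-memory` is the capability name this patch proposes):

```python
import json

def qmp_bypass_migration_cmds(uri="unix:/tmp/migrate.sock"):
    """Build the two QMP messages: enable the proposed
    'bypass-shared-memory' capability, then start migration.
    The unix socket URI is illustrative."""
    set_caps = {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [
                {"capability": "bypass-shared-memory", "state": True}
            ]
        },
    }
    migrate = {"execute": "migrate", "arguments": {"uri": uri}}
    return [json.dumps(set_caps), json.dumps(migrate)]

for line in qmp_bypass_migration_cmds():
    print(line)
```

The same two messages would be sent on the destination side's source peer; the new qemu only needs `-incoming` on its command line.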

> 2) extremely-fast-save-restore
> The same as above, but the mem-path is on a persistent file system.
> 
> 3) vm-template, vm-fast-live-clone
> The template vm is started as in 1), paused when the guest reaches
> the template point (for example, when the guest app is ready), and then
> the template vm is saved. (The qemu process of the template can be killed
> now, because only the memory and device state files (in tmpfs) are needed.)
> 
> Then we can launch one or multiple VMs based on the template vm state;
> the new VMs are started without "share=on", and all the new VMs share
> the initial memory from the memory file, which saves a lot of memory.
> All the new VMs start from the template point, so the guest app can get
> to work quickly.
> 
> A new VM booted from the template vm can't become a template
> again; if you need this special feature, you could write a cloneable-tmpfs
> kernel module for it.

You could probably use the new write detection mode in userfaultfd to do
that.
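To make the clone step concrete, something like the following could assemble a clone's command line (a sketch; the file paths, size, and helper name are illustrative, and `share=off` is what gives the private copy-on-write mapping of the template memory file; `-incoming exec:` is the existing qemu option for reading saved device state from a command):

```python
def clone_vm_args(mem_file="/dev/shm/template-memory",
                  state_file="/dev/shm/template-state", size="128M"):
    """Build qemu arguments for a clone VM: map the saved template
    memory file privately (share=off => copy-on-write) and resume
    the device state saved from the paused template VM."""
    return [
        "-object",
        "memory-backend-file,id=mem,size=%s,mem-path=%s,share=off"
        % (size, mem_file),
        "-numa", "node,nodeid=0,cpus=0-7,memdev=mem",
        # read the saved device state stream on startup
        "-incoming", "exec:cat %s" % state_file,
    ]
```

Each clone then dirties only the pages it writes; the unmodified bulk of guest memory stays shared with the template file.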

> The libvirt toolkit can't manage vm-template currently; in
> hyperhq/runv, we use a qemu wrapper script to do it. I hope someone
> adds a "libvirt-managed template" feature to libvirt.
> 
> 4) yet-another-post-copy-migration
> It is a possible feature, but no toolkit can do it well now.
> Using an nbd server/client on the memory file is reluctantly OK but
> inconvenient. A special feature for tmpfs might be needed to
> fully complete this feature.
> No one needs yet another post-copy migration method,
> but it is possible should some crazy man need it.

Well, as the crazy man who wrote the current postcopy code: what would
you need it to do that the current one can't?

I can't see anything noticeably wrong with your code; but I see the bot
gave you some style warnings.
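For anyone following along, the core of the patch, seeding the migration bitmap only from non-shared RAM blocks and counting dirty pages per block instead of from ram_bytes_total(), can be modelled roughly like this (a sketch, not qemu code; RAMBlock is reduced to the fields the hunk touches, and the page size is assumed to be 4 KiB):

```python
TARGET_PAGE_BITS = 12  # assume 4 KiB target pages

class RAMBlock:
    def __init__(self, offset, used_length, shared):
        self.offset = offset            # offset in the ram address space
        self.used_length = used_length  # bytes in use
        self.shared = shared            # models the RAM_SHARED flag

def migration_bitmap_init(blocks, bypass_shared_memory):
    """Return (set of dirty page indexes, dirty page count).
    Mirrors the patch: when the capability is set, shared blocks
    are neither marked dirty nor counted."""
    bitmap = set()
    dirty_pages = 0
    for b in blocks:
        if bypass_shared_memory and b.shared:
            continue  # skip shared memory entirely
        start = b.offset >> TARGET_PAGE_BITS
        npages = b.used_length >> TARGET_PAGE_BITS
        bitmap.update(range(start, start + npages))
        dirty_pages += npages
    return bitmap, dirty_pages
```

The same shared-block test guards migration_bitmap_sync(), so skipped blocks never re-enter the bitmap during later sync rounds.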

Dave

> Signed-off-by: Lai Jiangshan <address@hidden>
> ---
>  exec.c                        |  5 +++++
>  include/exec/cpu-common.h     |  1 +
>  include/migration/migration.h |  1 +
>  migration/migration.c         |  9 +++++++++
>  migration/ram.c               | 37 ++++++++++++++++++++++++++++---------
>  qapi-schema.json              |  6 +++++-
>  qmp-commands.hx               |  3 +++
>  7 files changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 8ffde75..888919a 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1402,6 +1402,11 @@ static void qemu_ram_setup_dump(void *addr, ram_addr_t size)
>      }
>  }
>  
> +bool qemu_ram_is_shared(RAMBlock *rb)
> +{
> +    return rb->flags & RAM_SHARED;
> +}
> +
>  const char *qemu_ram_get_idstr(RAMBlock *rb)
>  {
>      return rb->idstr;
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 952bcfe..7c18db9 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -58,6 +58,7 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
>  void qemu_ram_set_idstr(RAMBlock *block, const char *name, DeviceState *dev);
>  void qemu_ram_unset_idstr(RAMBlock *block);
>  const char *qemu_ram_get_idstr(RAMBlock *rb);
> +bool qemu_ram_is_shared(RAMBlock *rb);
>  
>  void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
>                              int len, int is_write);
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 3c96623..080b6b2 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -290,6 +290,7 @@ void migrate_add_blocker(Error *reason);
>   */
>  void migrate_del_blocker(Error *reason);
>  
> +bool migrate_bypass_shared_memory(void);
>  bool migrate_postcopy_ram(void);
>  bool migrate_zero_blocks(void);
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index 955d5ee..c87d136 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1189,6 +1189,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
>      max_downtime = (uint64_t)value;
>  }
>  
> +bool migrate_bypass_shared_memory(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY];
> +}
> +
>  bool migrate_postcopy_ram(void)
>  {
>      MigrationState *s;
> diff --git a/migration/ram.c b/migration/ram.c
> index 815bc0e..880972d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -605,6 +605,28 @@ static void migration_bitmap_sync_init(void)
>      num_dirty_pages_period = 0;
>      xbzrle_cache_miss_prev = 0;
>      iterations_prev = 0;
> +    migration_dirty_pages = 0;
> +}
> +
> +static void migration_bitmap_init(unsigned long *bitmap)
> +{
> +    RAMBlock *block;
> +
> +    bitmap_clear(bitmap, 0, last_ram_offset() >> TARGET_PAGE_BITS);
> +    rcu_read_lock();
> +    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +        if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) {
> +            bitmap_set(bitmap, block->offset >> TARGET_PAGE_BITS,
> +                       block->used_length >> TARGET_PAGE_BITS);
> +
> +            /*
> +             * Count the total number of pages used by ram blocks not including
> +             * any gaps due to alignment or unplugs.
> +             */
> +         migration_dirty_pages += block->used_length >> TARGET_PAGE_BITS;
> +     }
> +    }
> +    rcu_read_unlock();
>  }
>  
>  static void migration_bitmap_sync(void)
> @@ -631,7 +653,9 @@ static void migration_bitmap_sync(void)
>      qemu_mutex_lock(&migration_bitmap_mutex);
>      rcu_read_lock();
>      QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> -        migration_bitmap_sync_range(block->offset, block->used_length);
> +        if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) {
> +            migration_bitmap_sync_range(block->offset, block->used_length);
> +        }
>      }
>      rcu_read_unlock();
>      qemu_mutex_unlock(&migration_bitmap_mutex);
> @@ -1926,19 +1950,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      ram_bitmap_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>      migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
>      migration_bitmap_rcu->bmap = bitmap_new(ram_bitmap_pages);
> -    bitmap_set(migration_bitmap_rcu->bmap, 0, ram_bitmap_pages);
> +    migration_bitmap_init(migration_bitmap_rcu->bmap);
>  
>      if (migrate_postcopy_ram()) {
>          migration_bitmap_rcu->unsentmap = bitmap_new(ram_bitmap_pages);
> -        bitmap_set(migration_bitmap_rcu->unsentmap, 0, ram_bitmap_pages);
> +        bitmap_copy(migration_bitmap_rcu->unsentmap,
> +                 migration_bitmap_rcu->bmap, ram_bitmap_pages);
>      }
>  
> -    /*
> -     * Count the total number of pages used by ram blocks not including any
> -     * gaps due to alignment or unplugs.
> -     */
> -    migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
> -
>      memory_global_dirty_log_start();
>      migration_bitmap_sync();
>      qemu_mutex_unlock_ramlist();
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 5658723..453e6d9 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -553,11 +553,15 @@
> #          been migrated, pulling the remaining pages along as needed. NOTE: If
>  #          the migration fails during postcopy the VM will fail.  (since 2.6)
>  #
> +# @bypass-shared-memory: the shared memory region will be bypassed on migration.
> +#          This feature allows the memory region to be reused by new qemu(s)
> +#          or be migrated separately. (since 2.8)
> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
> -           'compress', 'events', 'postcopy-ram'] }
> +           'compress', 'events', 'postcopy-ram', 'bypass-shared-memory'] }
>  
>  ##
>  # @MigrationCapabilityStatus
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index c8d360a..c31152c 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3723,6 +3723,7 @@ Enable/Disable migration capabilities
>  - "compress": use multiple compression threads to accelerate live migration
>  - "events": generate events for each migration state change
>  - "postcopy-ram": postcopy mode for live migration
> +- "bypass-shared-memory": bypass shared memory region
>  
>  Arguments:
>  
> @@ -3753,6 +3754,7 @@ Query current migration capabilities
>           - "compress": Multiple compression threads state (json-bool)
>           - "events": Migration state change event state (json-bool)
>           - "postcopy-ram": postcopy ram state (json-bool)
> +         - "bypass-shared-memory": bypass shared memory state (json-bool)
>  
>  Arguments:
>  
> @@ -3767,6 +3769,7 @@ Example:
>       {"state": false, "capability": "compress"},
>       {"state": true, "capability": "events"},
>       {"state": false, "capability": "postcopy-ram"}
> +     {"state": false, "capability": "bypass-shared-memory"}
>     ]}
>  
>  EQMP
> -- 
> 2.7.4 (Apple Git-66)
> 
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


