[PATCH V4 00/19] Live update: cpr-transfer
From: Steve Sistare
Subject: [PATCH V4 00/19] Live update: cpr-transfer
Date: Mon, 2 Dec 2024 05:19:52 -0800
What?
This patch series adds the live migration cpr-transfer mode, which
allows the user to transfer a guest to a new QEMU instance on the same
host with minimal guest pause time, by preserving guest RAM in place,
albeit with new virtual addresses in new QEMU, and by preserving device
file descriptors.
The new user-visible interfaces are:
* cpr-transfer (MigMode migration parameter)
* cpr (MigrationChannelType)
* incoming MigrationChannel (command-line argument)
* aux-ram-share (machine option)
The user sets the mode parameter before invoking the migrate command.
In this mode, the user starts new QEMU on the same host as old QEMU, with
the same arguments as old QEMU, plus two -incoming options: one for the main
channel, and one for the CPR channel. The user issues the migrate command to
old QEMU, which stops the VM, saves state to the migration channels, and
enters the postmigrate state. Execution resumes in new QEMU.
Memory-backend objects must have the share=on attribute, and memory-backend-epc
is not supported. The VM must be started with the '-machine aux-ram-share=on'
option, which allows auxiliary guest memory to be transferred in place to the
new process.
This mode requires a second migration channel of type "cpr", in the channel
arguments on the outgoing side, and in a second -incoming command-line
parameter on the incoming side. The channel must be of a type that supports
SCM_RIGHTS, such as a unix socket.
Why?
This mode has less impact on the guest than any other method of updating
in place. The pause time is much lower, because devices need not be torn
down and recreated, DMA does not need to be drained and quiesced, and minimal
state is copied to new QEMU. Further, there are no constraints on the guest.
By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
and suspending plus resuming vfio devices adds multiple seconds to the
guest pause time.
These benefits all derive from the core design principle of this mode,
which is preserving open descriptors. This approach is very general and
can be used to support a wide variety of devices that do not have hardware
support for live migration, including but not limited to: vfio, chardev,
vhost, vdpa, and iommufd. Some devices need new kernel software interfaces
to allow a descriptor to be used in a process that did not originally open it.
How?
All memory that is mapped by the guest is preserved in place. Indeed,
it must be, because it may be the target of DMA requests, which are not
quiesced during cpr-transfer. All such memory must be mmap'able in new QEMU.
This is easy for named memory-backend objects, as long as they are mapped
shared, because they are visible in the file system in both old and new QEMU.
Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
so the memfd's can be sent to new QEMU. Pages that were locked in memory
for DMA in old QEMU remain locked in new QEMU, because the descriptor of
the device that locked them remains open.
cpr-transfer preserves descriptors by sending them to new QEMU via the CPR
channel, which must support SCM_RIGHTS, and by sending the unique name of
each descriptor to new QEMU via CPR state.
For device descriptors, new QEMU reuses the descriptor when creating the
device, rather than opening it again. For memfd descriptors, new QEMU
mmap's the preserved memfd when a ramblock is created.
CPR state cannot be sent over the normal migration channel, because devices
and backends are created prior to reading the channel, so this mode sends
CPR state over a second "cpr" migration channel. New QEMU reads the second
channel prior to creating devices or backends.
Example:
In this example, we simply restart the same version of QEMU, but in
a real scenario one would use a new QEMU binary path in terminal 2.
Terminal 1: start old QEMU
# qemu-kvm -qmp stdio -object \
    memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
    -m 4G -machine aux-ram-share=on ...
Terminal 2: start new QEMU
# qemu-kvm -monitor stdio ... -incoming tcp:0:44444 \
    -incoming '{"channel-type": "cpr",
                "addr": { "transport": "socket", "type": "unix",
                          "path": "cpr.sock"}}'
Terminal 1:
{"execute":"qmp_capabilities"}
{"execute": "query-status"}
{"return": {"status": "running",
"running": true}}
{"execute":"migrate-set-parameters",
"arguments":{"mode":"cpr-transfer"}}
{"execute": "migrate", "arguments": { "channels": [
{"channel-type": "main",
"addr": { "transport": "socket", "type": "inet",
"host": "0", "port": "44444" }},
{"channel-type": "cpr",
"addr": { "transport": "socket", "type": "unix",
"path": "cpr.sock" }}]}}
{"execute": "query-status"}
{"return": {"status": "postmigrate",
"running": false}}
Terminal 2:
QEMU 10.0.50 monitor - type 'help' for more information
(qemu) info status
VM status: running
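For scripting the outgoing side, the two QMP commands from the example can be
generated rather than typed. The sketch below only builds the JSON text shown
above; the helper name cpr_transfer_commands is hypothetical, and sending the
lines to the QMP monitor is left to the caller.

```python
import json


def cpr_transfer_commands(host, port, cpr_path):
    """Return the QMP commands for old QEMU as JSON strings, in order."""
    set_mode = {
        "execute": "migrate-set-parameters",
        "arguments": {"mode": "cpr-transfer"},
    }
    main_channel = {
        "channel-type": "main",
        "addr": {"transport": "socket", "type": "inet",
                 "host": host, "port": str(port)},
    }
    cpr_channel = {
        "channel-type": "cpr",
        "addr": {"transport": "socket", "type": "unix", "path": cpr_path},
    }
    migrate = {
        "execute": "migrate",
        "arguments": {"channels": [main_channel, cpr_channel]},
    }
    # The mode must be set before the migrate command is issued.
    return [json.dumps(set_mode), json.dumps(migrate)]


commands = cpr_transfer_commands("0", 44444, "cpr.sock")
for line in commands:
    print(line)
```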
This patch series implements a minimal version of cpr-transfer. Additional
series are ready to be posted to deliver the complete vision described
above, including
* vfio
* chardev
* vhost and tap
* blockers
* cpr-exec mode
* iommufd
Changes in V2:
* cpr-transfer is the first new mode proposed, and cpr-exec is deferred
* anon-alloc does not apply to memory-backend-object
* replaced hack with proper synchronization between source and target
* defined QEMU_CPR_FILE_MAGIC
* addressed misc review comments
Changes in V3:
* added cpr-transfer to migration-test
* documented cpr-transfer in CPR.rst
* fix size_t trace format for 32-bit build
* drop explicit fd value in VMSTATE_FD
* defer cpr_walk_fd() and cpr_resave_fd() to later series
* drop "migration: save cpr mode".
delete mode from cpr state, and use cpr_uri to infer transfer mode.
* drop "migration: stop vm earlier for cpr"
* dropped cpr helpers, to be re-added later when needed
* fixed an unreported bug for cpr-transfer and migrate cancel
* documented cpr-transfer restrictions in qapi
* added trace for cpr_state_save and cpr_state_load
* added ftruncate to "preserve ram blocks"
Changes in V4:
* cleaned up qtest deferred connection code
* renamed pass_fd -> can_pass_fd
* squashed patch "split qmp_migrate"
* deleted cpr-uri and its patches
* added cpr channel and its patches
* added patch "hostmem-shm: preserve for cpr"
* added patch "fd-based shared memory"
* added patch "factor out allocation of anonymous shared memory"
* added RAM_PRIVATE and its patch
* added aux-ram-share and its patch
The first 8 patches below are foundational and are needed for both cpr-transfer
mode and the proposed cpr-exec mode. The next 6 patches are specific to
cpr-transfer and implement the mechanisms for sharing state across a socket
using SCM_RIGHTS. The last 5 patches supply tests and documentation.
Steve Sistare (19):
backends/hostmem-shm: factor out allocation of "anonymous shared
memory with an fd"
physmem: fd-based shared memory
memory: add RAM_PRIVATE
machine: aux-ram-share option
migration: cpr-state
physmem: preserve ram blocks for cpr
hostmem-memfd: preserve for cpr
hostmem-shm: preserve for cpr
migration: incoming channel
migration: cpr channel
migration: SCM_RIGHTS for QEMUFile
migration: VMSTATE_FD
migration: cpr-transfer save and load
migration: cpr-transfer mode
tests/migration-test: memory_backend
tests/qtest: defer connection
tests/migration-test: defer connection
migration-test: cpr-transfer
migration: cpr-transfer documentation
backends/hostmem-epc.c | 2 +-
backends/hostmem-file.c | 2 +-
backends/hostmem-memfd.c | 14 ++-
backends/hostmem-ram.c | 2 +-
backends/hostmem-shm.c | 51 ++-------
docs/devel/migration/CPR.rst | 176 ++++++++++++++++++++++++++++++-
hw/core/machine.c | 18 ++++
include/exec/memory.h | 10 ++
include/hw/boards.h | 1 +
include/migration/cpr.h | 31 ++++++
include/migration/misc.h | 2 +
include/migration/vmstate.h | 9 ++
include/qemu/osdep.h | 2 +
meson.build | 8 +-
migration/cpr-transfer.c | 76 ++++++++++++++
migration/cpr.c | 226 ++++++++++++++++++++++++++++++++++++++++
migration/meson.build | 2 +
migration/migration.c | 126 +++++++++++++++++++++-
migration/migration.h | 4 +-
migration/options.c | 12 ++-
migration/qemu-file.c | 83 ++++++++++++++-
migration/qemu-file.h | 2 +
migration/ram.c | 2 +
migration/trace-events | 11 ++
migration/vmstate-types.c | 24 +++++
qapi/migration.json | 40 ++++++-
qemu-options.hx | 32 ++++++
stubs/vmstate.c | 7 ++
system/physmem.c | 135 ++++++++++++++++++++++--
system/trace-events | 1 +
system/vl.c | 47 ++++++++-
tests/qtest/libqtest.c | 80 ++++++++------
tests/qtest/libqtest.h | 19 +++-
tests/qtest/migration-helpers.c | 19 ++--
tests/qtest/migration-test.c | 120 +++++++++++++++++++--
util/oslib-posix.c | 58 +++++++++++
util/oslib-win32.c | 11 ++
37 files changed, 1339 insertions(+), 126 deletions(-)
create mode 100644 include/migration/cpr.h
create mode 100644 migration/cpr-transfer.c
create mode 100644 migration/cpr.c
--
1.8.3.1