From: zhanghailiang
Subject: [Qemu-devel] [PATCH COLO-Frame v14 00/40] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Sat, 6 Feb 2016 17:28:12 +0800

This is the 14th version of COLO (it still supports only periodic checkpoints).

This series contains only the COLO frame part; the complete code is available on GitHub:
https://github.com/coloft/qemu/commits/colo-v2.5-periodic-mode

There are few changes in this series except for the network-related part,
which we have re-implemented according to Jason's suggestion. Most of the other
parts have been reviewed by Dave.

QEMU is approaching soft freeze for 2.6. We hope the COLO prototype can be
merged in 2.6, but we are not sure whether we have enough time to catch this
train. So please help us, thanks very much.

Test procedure:
1. Start up QEMU
Primary side:
# x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 \
    -qmp stdio -vnc :7 -name primary -cpu qemu64,+kvmclock \
    -device piix3-usb-uhci -device usb-tablet \
    -netdev tap,id=hn0,vhost=off \
    -device virtio-net-pci,id=net-pci0,netdev=hn0 \
    -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/rhel_6.5_64_2U_ide,children.0.driver=raw
Secondary side:
# x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 \
    -name secondary -enable-kvm -cpu qemu64,+kvmclock \
    -device piix3-usb-uhci -device usb-tablet \
    -netdev tap,id=hn0,vhost=off \
    -device virtio-net-pci,id=net-pci0,netdev=hn0 \
    -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/rhel_6.5_64_2U_ide,driver=raw,node-name=node0 \
    -drive if=virtio,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/mnt/ramfs/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.backing=colo-disk0 \
    -incoming tcp:0:8888
2. On the Secondary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '192.168.2.88', 'port': '8889'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }

3. On the Primary VM's QEMU monitor, issue the following commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0,if=none,id=blk-buddy0'}}
{'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.2.88:8888' } }
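
For scripted testing, the primary-side sequence in step 3 could be generated
rather than typed. The following is only a minimal Python sketch: qmp_cmd()
and primary_setup() are hypothetical helper names, not part of QEMU, and the
script only builds and prints the JSON. It does not connect to a monitor. The
addresses are the ones used in this test procedure.

```python
import json


def qmp_cmd(name, arguments=None):
    """Build one QMP command as a dict; 'arguments' is omitted when empty."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return cmd


def primary_setup(nbd_host, nbd_port, migrate_uri):
    """Return the step 3 commands, in the order they are issued above."""
    # HMP command line passed through human-monitor-command.
    drive_add = (
        "drive_add buddy driver=replication,mode=primary,"
        "file.driver=nbd,file.host={host},file.port={port},"
        "file.export=colo-disk0,node-name=node0,if=none,id=blk-buddy0"
    ).format(host=nbd_host, port=nbd_port)
    return [
        qmp_cmd("qmp_capabilities"),
        qmp_cmd("human-monitor-command", {"command-line": drive_add}),
        qmp_cmd("x-blockdev-change",
                {"parent": "colo-disk0", "node": "node0"}),
        qmp_cmd("migrate-set-capabilities",
                {"capabilities": [{"capability": "x-colo", "state": True}]}),
        qmp_cmd("migrate", {"uri": migrate_uri}),
    ]


if __name__ == "__main__":
    # One JSON object per line, ready to paste into a '-qmp stdio' session.
    for cmd in primary_setup("9.61.1.7", 8889, "tcp:192.168.2.88:8888"):
        print(json.dumps(cmd))
```

The output is strict JSON (double quotes), which the monitor accepts in the
same way as the single-quoted form shown above.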

4. After the above steps, you will see that whenever you make changes to the
PVM, the SVM is synced.
You can change the checkpoint period by issuing the command
'{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'.
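
If the period is changed from a test script, a small helper can build the
command. This is only a sketch: checkpoint_delay_cmd() is a hypothetical name,
the value is assumed to be in milliseconds, and rejecting negative values is
our own assumption, not something the series enforces in this form.

```python
import json


def checkpoint_delay_cmd(delay_ms):
    """Build the migrate-set-parameters command for x-checkpoint-delay.

    delay_ms is assumed to be the checkpoint period in milliseconds.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(delay_ms, bool) or not isinstance(delay_ms, int):
        raise ValueError("delay_ms must be an integer")
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")
    return {
        "execute": "migrate-set-parameters",
        "arguments": {"x-checkpoint-delay": delay_ms},
    }


if __name__ == "__main__":
    # Same value as in the example above.
    print(json.dumps(checkpoint_delay_cmd(2000)))
```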

5. Failover test
You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary
VM's monitor at the same time; the SVM will then take over and the client will
not notice the change.

Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
issue block-related commands to stop block replication.
Primary:
  Remove the nbd child from the quorum:
  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
  Note: there is currently no QMP command to remove the blockdev.

Secondary:
  The primary host is down, so we should do the following:
  { 'execute': 'nbd-server-stop' }
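
The ordering described above (block commands first, then the failover command)
can be expressed as follows. This is a sketch only: BLOCK_TEARDOWN and
failover_sequence() are hypothetical names, and it assumes that the surviving
side issues its own block commands followed by x-colo-lost-heartbeat.

```python
import json

# Block-related commands that stop block replication, per side.
BLOCK_TEARDOWN = {
    "primary": [
        # Remove the nbd child from the quorum, then delete the drive.
        {"execute": "x-blockdev-change",
         "arguments": {"parent": "colo-disk0", "child": "children.1"}},
        {"execute": "human-monitor-command",
         "arguments": {"command-line": "drive_del blk-buddy0"}},
    ],
    "secondary": [
        {"execute": "nbd-server-stop"},
    ],
}


def failover_sequence(side):
    """Return the commands for 'side' with x-colo-lost-heartbeat last."""
    if side not in BLOCK_TEARDOWN:
        raise ValueError("side must be 'primary' or 'secondary'")
    # List concatenation builds a new list; BLOCK_TEARDOWN is not modified.
    return BLOCK_TEARDOWN[side] + [{"execute": "x-colo-lost-heartbeat"}]


if __name__ == "__main__":
    for cmd in failover_sequence("secondary"):
        print(json.dumps(cmd))
```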

TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint

v14:
 - Re-implement the network processing based on netfilter (Jason Wang)
 - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
 - Split two new patches (patch 27/28) from patch 29
 - Fix some other comments from Dave and Markus.

v13:
 - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
   instead of return value to indicate success or failure. (patch 10)
 - Remove the optional error message for COLO_EXIT event. (patch 25)
 - Use semaphore to notify colo/colo incoming loop that failover work is
   finished. (patch 26)
 - Move COLO shutdown related code to colo.c file. (patch 28)
 - Fix memory leak bug for colo incoming loop. (new patch 31)
 - Re-use some existing helper functions to implement the process of
   saving/loading ram and device. (patch 32)
 - Fix some other comments from Dave and Markus.

zhanghailiang (40):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  COLO/migration: Create a new communication path from destination to
    source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  qmp event: Add COLO_EXIT event to notify users while exited from COLO
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Introduce two helper functions for save/find loadvm_handlers
    entry
  migration/savevm: Add new helpers to process the different stages of
    loadvm
  migration/savevm: Export two helper functions for savevm process
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  net/filter: Add a 'status' property for filter object
  net/filter: Introduce a helper to add a filter to the netdev
  filter-buffer: Accept zero interval
  net: Add notifier/callback for netdev init
  COLO/filter: add each netdev a buffer filter
  net/filter: Add a helper to traverse all the filters
  COLO: enable buffer filters for PVM
  filter-buffer: make filter_buffer_flush() public
  COLO: flush buffered packets in checkpoint process or exit COLO
  COLO: Add block replication into colo process

 configure                     |  11 +
 docs/qmp-events.txt           |  16 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  15 +
 hmp.h                         |   1 +
 include/exec/ram_addr.h       |   1 +
 include/migration/colo.h      |  42 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  16 +
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h          |  12 +
 include/net/net.h             |   8 +
 include/sysemu/sysemu.h       |   9 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  76 ++++
 migration/colo-failover.c     |  83 +++++
 migration/colo.c              | 846 ++++++++++++++++++++++++++++++++++++++++++
 migration/migration.c         | 109 +++++-
 migration/qemu-file-buf.c     |  61 +++
 migration/ram.c               | 175 ++++++++-
 migration/savevm.c            | 114 ++++--
 net/filter-buffer.c           |  14 +-
 net/filter.c                  |  79 ++++
 net/net.c                     |  57 +++
 qapi-schema.json              | 104 +++++-
 qapi/event.json               |  15 +
 qmp-commands.hx               |  23 +-
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  54 +++
 trace-events                  |   8 +
 vl.c                          |  31 +-
 31 files changed, 1959 insertions(+), 75 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.8.3.1




