
Re: [Qemu-devel] [PATCH COLO-Frame v9 00/32] COarse-grain LOck-stepping(


From: zhanghailiang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v9 00/32] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Wed, 9 Sep 2015 11:36:41 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

Ping...

Hi Juan & Amit,

Could you please help review this series?
Since it has already reached v9, I really hope to get your feedback on it :)

Thanks,
zhanghailiang

On 2015/9/2 16:22, zhanghailiang wrote:
This is the 9th version of COLO.

Please note that this version is very different from the previous versions,
since we have decided to implement the proxy in qemu, based on slirp.
We have dropped all of the original colo proxy related parts.

It will take a long time for the proxy to be ready for merging, so here we
extract the basic periodic checkpoint part, which does not depend on the proxy,
into this series. The 'periodic' mode is also something we want to support in
COLO. It is based on Yang Hongyang's netfilter series, and it is very similar
to MicroCheckpointing and Remus.
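
To make the periodic mode concrete, here is a toy Python model of Remus-style checkpointing with output buffering. It is only a sketch under stated assumptions, not QEMU code: the names Primary, Secondary and checkpoint are hypothetical, and the model only captures the idea that guest output is held back until the secondary has received the state that produced it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Primary:
    """Toy primary VM: guest state plus a buffer of unreleased packets."""
    state: Dict[str, int] = field(default_factory=dict)
    buffered: List[str] = field(default_factory=list)

    def run(self, key: str, value: int, packet: str) -> None:
        # Guest work changes state and emits output; the output is held
        # back until the next checkpoint has been taken.
        self.state[key] = value
        self.buffered.append(packet)


@dataclass
class Secondary:
    """Toy secondary VM: holds the last checkpointed copy of the state."""
    state: Dict[str, int] = field(default_factory=dict)


def checkpoint(primary: Primary, secondary: Secondary) -> List[str]:
    """Copy the primary state to the secondary, then release the buffer."""
    secondary.state = dict(primary.state)
    released, primary.buffered = primary.buffered, []
    return released


pvm, svm = Primary(), Secondary()
pvm.run("x", 1, "pkt-1")
pvm.run("y", 2, "pkt-2")
released = checkpoint(pvm, svm)
print(released, svm.state)
```

In the real series the buffering is done by the netfilter buffer on the primary's netdev, and the checkpoint transfers dirty RAM and device state rather than a dictionary.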

You can find the discussion about why and how to implement the colo proxy in
qemu at the following link:
http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html

As usual, this series contains only the COLO frame part; you can get the complete code from github:
https://github.com/coloft/qemu/commits/colo-v2.0-periodic-mode

Compared with previous versions, this version is easier to test.

Test procedure:
1. Startup qemu
Primary side:
# x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=bn0 \
  -netfilter buffer,id=f0,netdev=bn0,chain=in \
  -device virtio-net-pci,id=net-pci0,netdev=bn0 -boot c \
  -drive if=virtio,id=disk1,driver=quorum,read-pattern=fifo,cache=none,aio=native,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw \
  -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet \
  -monitor stdio -S

Secondary side:
# x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=bn0 \
  -device virtio-net-pci,id=net-pci0,netdev=bn0 \
  -drive if=none,driver=raw,file=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,id=colo1,cache=none,aio=native \
  -drive if=virtio,driver=replication,mode=secondary,throttling.bps-total=70000000,file.file.filename=/mnt/ramfs/active_disk.img,file.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.driver=qcow2,file.backing.backing.backing_reference=colo1,file.backing.allow-write-backing-file=on \
  -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet \
  -monitor stdio -incoming tcp:0:8888

2. On Secondary VM's QEMU monitor, issue command
(qemu) nbd_server_start 192.168.2.88:8889
(qemu) nbd_server_add -w colo1

3. On Primary VM's QEMU monitor, issue command:
(qemu) child_add disk1 child.driver=replication,child.mode=primary,child.file.host=192.168.2.88,child.file.port=8889,child.file.export=colo1,child.file.driver=nbd,child.ignore-errors=on
(qemu) migrate_set_capability colo on
(qemu) migrate tcp:192.168.2.88:8888
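
The long child_add option string in step 3 is easy to mistype, so the following Python sketch shows one way to assemble it from a nested dictionary. The helper name flatten is hypothetical and is not part of QEMU; the option keys and values are copied from the command above.

```python
def flatten(prefix, options):
    """Turn a nested dict into dotted key=value pairs, keeping insertion order."""
    parts = []
    for key, value in options.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            parts.extend(flatten(name, value))
        else:
            parts.append(f"{name}={value}")
    return parts


# Options for the primary-side replication child, as used in step 3.
child = {
    "driver": "replication",
    "mode": "primary",
    "file": {
        "host": "192.168.2.88",
        "port": 8889,
        "export": "colo1",
        "driver": "nbd",
    },
    "ignore-errors": "on",
}

child_add_args = ",".join(flatten("child", child))
print("child_add disk1 " + child_add_args)
```

The printed line can be pasted at the (qemu) prompt of the primary; change the host, port and export values to match the NBD server started on the secondary in step 2.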

4. After the above steps, you will see that whenever you make changes to the
PVM, the SVM will be synced.
You can change the checkpoint period by issuing the command
"migrate_set_parameter checkpoint-delay 2000".
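
For scripted testing, the same setting could be sent over QMP instead of the human monitor. The sketch below only builds the JSON message. It assumes that the QMP argument is named "checkpoint-delay", mirroring the HMP parameter above; this should be checked against qapi-schema.json in the series. The function name checkpoint_delay_command is hypothetical.

```python
import json


def checkpoint_delay_command(delay):
    """Build a QMP migrate-set-parameters message for the checkpoint delay.

    Assumption: the QMP argument name "checkpoint-delay" matches the HMP
    parameter name used in this series.
    """
    if not isinstance(delay, int) or delay <= 0:
        raise ValueError("delay must be a positive integer")
    message = {
        "execute": "migrate-set-parameters",
        "arguments": {"checkpoint-delay": delay},
    }
    return json.dumps(message)


qmp_line = checkpoint_delay_command(2000)
print(qmp_line)
```

The resulting line would be written to a QMP socket after the usual qmp_capabilities handshake.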

5. Failover test
You can kill the PVM and run 'colo_lost_heartbeat' in the SVM's
monitor at the same time; the SVM will then fail over and the client will not
notice the change.

COLO is a totally new feature which is still at an early stage;
your comments and feedback are warmly welcome.

TODO:
1. checkpoint based on proxy in qemu
2. The capability of continuous FT

v9:
- Drop colo proxy related part (colo-nic.c file)
- Convert COLO protocol name definition to QAPI
- Squash failover related patches (patch 19/20/23)
- Fix colo exit event according to Eric's comments.
- Fix some typos from Eric's comments
- Fix bug "invalid runstate transition: 'colo' -> 'prelaunch'" reported
   by Dave (patch 27)
- Use migrate_set_parameter instead of ecolo-set-checkpoint-period to set
   checkpoint delay time (patch 25)
- Add new patches (patch 29/30) to separate the process of saving/loading
   RAM and device state during checkpoint, which reduces the data size
   for sending and also reduces the qsb size used in checkpoint.

Wen Congyang (1):
   COLO: Add block replication into colo process

zhanghailiang (31):
   configure: Add parameter for configure to enable/disable COLO support
   migration: Introduce capability 'colo' to migration
   COLO: migrate colo related info to slave
   migration: Add state records for migration incoming
   migration: Integrate COLO checkpoint process into migration
   migration: Integrate COLO checkpoint process into loadvm
   migration: Rename the'file' member of MigrationState and
     MigrationIncomingState
   COLO/migration: establish a new communication path from destination to
     source
   COLO: Implement colo checkpoint protocol
   COLO: Add a new RunState RUN_STATE_COLO
   QEMUSizedBuffer: Introduce two help functions for qsb
   COLO: Save PVM state to secondary side when do checkpoint
   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
   COLO: Load VMState into qsb before restore it
   COLO: Flush PVM's cached RAM into SVM's memory
   COLO: synchronize PVM's state to SVM periodically
   COLO failover: Introduce a new command to trigger a failover
   COLO failover: Introduce state to record failover process
   COLO: Implement failover work for Primary VM
   COLO: Implement failover work for Secondary VM
   COLO: implement default failover treatment
   qmp event: Add event notification for COLO error
   COLO failover: Shutdown related socket fd when do failover
   COLO failover: Don't do failover during loading VM's state
   COLO: Control the checkpoint delay time by migrate-set-parameters
     command
   COLO: Implement shutdown checkpoint
   COLO: Update the global runstate after going into colo state
   savevm: Split load vm state function qemu_loadvm_state
   COLO: Separate the process of saving/loading ram and device state
   COLO: Split qemu_savevm_state_begin out of checkpoint process
   COLO: Add net packets treatment into COLO

  configure                     |  11 +
  docs/qmp/qmp-events.txt       |  17 +
  hmp-commands.hx               |  15 +
  hmp.c                         |  16 +
  hmp.h                         |   1 +
  include/exec/cpu-all.h        |   1 +
  include/migration/colo.h      |  44 +++
  include/migration/failover.h  |  33 ++
  include/migration/migration.h |  16 +-
  include/migration/qemu-file.h |   3 +-
  include/sysemu/sysemu.h       |   8 +
  migration/Makefile.objs       |   2 +
  migration/colo-comm.c         |  75 ++++
  migration/colo-failover.c     |  83 +++++
  migration/colo.c              | 782 ++++++++++++++++++++++++++++++++++++++++++
  migration/exec.c              |   4 +-
  migration/fd.c                |   4 +-
  migration/migration.c         | 184 +++++++---
  migration/qemu-file-buf.c     |  58 ++++
  migration/ram.c               | 185 +++++++++-
  migration/savevm.c            | 309 +++++++++++++----
  migration/tcp.c               |   4 +-
  migration/unix.c              |   4 +-
  qapi-schema.json              | 101 +++++-
  qapi/event.json               |  17 +
  qmp-commands.hx               |  20 ++
  stubs/Makefile.objs           |   1 +
  stubs/migration-colo.c        |  45 +++
  trace-events                  |   8 +
  vl.c                          |  37 +-
  30 files changed, 1930 insertions(+), 158 deletions(-)
  create mode 100644 include/migration/colo.h
  create mode 100644 include/migration/failover.h
  create mode 100644 migration/colo-comm.c
  create mode 100644 migration/colo-failover.c
  create mode 100644 migration/colo.c
  create mode 100644 stubs/migration-colo.c





