Re: [Qemu-devel] [PATCH RESEND 0/2] PoC: Block replication for continuou
From: Wen Congyang
Subject: Re: [Qemu-devel] [PATCH RESEND 0/2] PoC: Block replication for continuous checkpointing
Date: Wed, 28 Jan 2015 14:42:07 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
On 12/27/2014 11:23 PM, Paolo Bonzini wrote:
>
>
> On 26/12/2014 04:31, Yang Hongyang wrote:
>> Please feel free to comment.
>> We would like as many comments and as much feedback as possible. Thanks in advance.
>
> Hi Yang,
>
> I think it's possible to build COLO block replication from many basic
> blocks that are already in QEMU. The only new piece would be the disk
> buffer on the secondary.
>
>                      virtio-blk       ||
>                          ^            ||                            .----------
>                          |            ||                            | Secondary
>                     1 Quorum          ||                            '----------
>                      /      \         ||
>                     /        \        ||
>                Primary      2 NBD  ------->  2 NBD
>                  disk       client    ||     server                  virtio-blk
>                                       ||        ^                         ^
> --------.                             ||        |                         |
> Primary |                             ||  Secondary disk <--------- COLO buffer 3
> --------'                             ||                   backing
>
>
> 1) The disk on the primary is represented by a block device with two
> children, providing replication between a primary disk and the host that
> runs the secondary VM. The read pattern patches for quorum
> (http://lists.gnu.org/archive/html/qemu-devel/2014-08/msg02381.html) can
> be used/extended to make the primary always read from the local disk
> instead of going through NBD.
>
> 2) The secondary disk receives writes from the primary VM through QEMU's
> embedded NBD server (speculative write-through).
>
> 3) The disk on the secondary is represented by a custom block device
> ("COLO buffer"). The disk buffer's backing image is the secondary disk,
> and the disk buffer uses bdrv_add_before_write_notifier to implement
> copy-on-write, similar to block/backup.c.
>
> 4) Checkpointing can use new bdrv_prepare_checkpoint and
> bdrv_do_checkpoint members in BlockDriver to discard the COLO buffer,
> similar to your patches (you did not explain why you do checkpointing in
> two steps). Failover instead is done with bdrv_commit or can even be
> done without stopping the secondary (live commit, block/commit.c).
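As a rough illustration of points 3) and 4), the copy-on-write buffer could be modelled as below. This is a self-contained toy, not QEMU code: ColoBuffer, the function names, and the tiny fixed-size in-memory "disk" are all assumptions made for the sketch, and the checkpoint is shown as a single discard step rather than a prepare/do pair.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NB_SECTORS  8
#define SECTOR_SIZE 4

/* Toy model only: these types and functions are hypothetical, not QEMU APIs. */
typedef struct ColoBuffer {
    uint8_t *backing;                        /* secondary disk image */
    uint8_t  data[NB_SECTORS][SECTOR_SIZE];  /* buffered sectors */
    bool     present[NB_SECTORS];            /* which sectors are buffered */
} ColoBuffer;

static void colo_init(ColoBuffer *b, uint8_t *backing)
{
    memset(b, 0, sizeof(*b));
    b->backing = backing;
}

/* Before-write hook: save the old backing contents once per sector. */
static void colo_before_backing_write(ColoBuffer *b, int sector)
{
    if (!b->present[sector]) {
        memcpy(b->data[sector], b->backing + sector * SECTOR_SIZE, SECTOR_SIZE);
        b->present[sector] = true;
    }
}

/* Write arriving from the primary (step 2): goes straight to the backing. */
static void primary_write(ColoBuffer *b, int sector, const uint8_t *src)
{
    colo_before_backing_write(b, sector);
    memcpy(b->backing + sector * SECTOR_SIZE, src, SECTOR_SIZE);
}

/* Write from the secondary VM: lands in the buffer only. */
static void secondary_write(ColoBuffer *b, int sector, const uint8_t *src)
{
    memcpy(b->data[sector], src, SECTOR_SIZE);
    b->present[sector] = true;
}

/* Read from the secondary VM: buffer first, then backing. */
static const uint8_t *secondary_read(const ColoBuffer *b, int sector)
{
    return b->present[sector] ? b->data[sector]
                              : b->backing + sector * SECTOR_SIZE;
}

/* Checkpoint: drop the buffer, exposing the primary's data. */
static void colo_checkpoint(ColoBuffer *b)
{
    memset(b->present, 0, sizeof(b->present));
}

/* Failover: commit buffered sectors into the backing image. */
static void colo_failover_commit(ColoBuffer *b)
{
    for (int s = 0; s < NB_SECTORS; s++) {
        if (b->present[s]) {
            memcpy(b->backing + s * SECTOR_SIZE, b->data[s], SECTOR_SIZE);
            b->present[s] = false;
        }
    }
}
```

In this model the secondary VM keeps seeing its own view of the disk between checkpoints, even though the primary's writes already sit in the backing image. Discarding the buffer switches the secondary to the primary's state, and committing it keeps the secondary's state.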
>
>
> The missing parts are:
>
> 1) NBD server on the backing image of the COLO buffer. This means the
> backing image needs its own BlockBackend. Apart from this, no new
> infrastructure is needed to receive writes on the secondary.
The backing image is always opened read-only. How can this limitation be
removed? Should we add an option to control it?
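A very rough sketch of what such an option might look like when the flags for the backing image are computed. The SKETCH_O_* constants and the allow_write_backing parameter are invented for this example only; they stand in for the real open flags and are not existing QEMU code or an existing option name.

```c
#include <stdbool.h>

/* Stand-in flag values for this sketch; QEMU has its own open flags. */
enum {
    SKETCH_O_RDWR     = 1 << 0,
    SKETCH_O_SNAPSHOT = 1 << 1,
};

/*
 * Hypothetical helper: derive the open flags of a backing image from the
 * flags of the image on top of it. By default the backing image loses
 * write access; the (assumed) option makes it writable instead.
 */
static int backing_open_flags(int parent_flags, bool allow_write_backing)
{
    int flags = parent_flags & ~(SKETCH_O_RDWR | SKETCH_O_SNAPSHOT);

    if (allow_write_backing) {
        flags |= SKETCH_O_RDWR;
    }
    return flags;
}
```

With the option off this matches today's behaviour (the backing image is read-only); with it on, the NBD server could write to the backing image.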
Thanks
Wen Congyang
>
> 2) Read pattern support for quorum must be extended for the needs of
> the COLO primary. It may be simpler or faster to write a simple
> "replication" driver that writes to N children but always reads from the
> first. But in any case initial tests can be done with the quorum
> driver, even without read pattern support. Again, all the network
> infrastructure to replicate writes already exists in QEMU.
>
> 3) Of course the disk buffer itself.
>
> Paolo
>
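The simple "replication" driver from missing part 2) (write to N children, always read from the first) can be sketched as follows. This is a standalone toy with in-memory children; Replication, Child, and the two functions are hypothetical names for illustration and do not correspond to the quorum driver or to any existing QEMU interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHILDREN 4
#define DISK_SIZE    16

/* Toy child "disk": child 0 is the local primary disk, the others are
 * replicas (for COLO, the NBD client). Hypothetical types, not QEMU's. */
typedef struct Child {
    uint8_t data[DISK_SIZE];
} Child;

typedef struct Replication {
    Child *children[MAX_CHILDREN];
    size_t nb_children;
} Replication;

/* Writes are sent to every child. Returns 0 on success, -1 on bad range. */
static int replication_write(Replication *r, size_t off,
                             const uint8_t *buf, size_t len)
{
    if (off > DISK_SIZE || len > DISK_SIZE - off) {
        return -1;
    }
    for (size_t i = 0; i < r->nb_children; i++) {
        memcpy(r->children[i]->data + off, buf, len);
    }
    return 0;
}

/* Reads are always served by the first child and never touch the replicas. */
static int replication_read(const Replication *r, size_t off,
                            uint8_t *buf, size_t len)
{
    if (r->nb_children == 0 || off > DISK_SIZE || len > DISK_SIZE - off) {
        return -1;
    }
    memcpy(buf, r->children[0]->data + off, len);
    return 0;
}
```

Compared with quorum there is no voting on reads, so the primary never waits on the network to read its own disk.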
>> Thanks,
>> Yang.
>>
>> Wen Congyang (1):
>> PoC: Block replication for COLO
>>
>> Yang Hongyang (1):
>> Block: Block replication design for COLO
>>
>>  block.c                   |  48 +++++++
>>  block/blkcolo.c           | 338 ++++++++++++++++++++++++++++++++++++++++++++++
>>  docs/blkcolo.txt          |  85 ++++++++++++
>>  include/block/block.h     |   6 +
>>  include/block/block_int.h |  21 +++
>>  5 files changed, 498 insertions(+)
>>  create mode 100644 block/blkcolo.c
>>  create mode 100644 docs/blkcolo.txt
>>