From: Wen Congyang
Subject: Re: [Qemu-devel] [PATCH RESEND 0/2] PoC: Block replication for continuous checkpointing
Date: Wed, 28 Jan 2015 14:42:07 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 12/27/2014 11:23 PM, Paolo Bonzini wrote:
> 
> 
> On 26/12/2014 04:31, Yang Hongyang wrote:
>> Please feel free to comment.
>> We would like as many comments and as much feedback as possible; thanks in advance.
> 
> Hi Yang,
> 
> I think it's possible to build COLO block replication from many basic
> blocks that are already in QEMU.  The only new piece would be the disk
> buffer on the secondary.
> 
>          virtio-blk       ||
>              ^            ||                            .----------
>              |            ||                            | Secondary
>         1 Quorum          ||                            '----------
>          /      \         ||
>         /        \        ||
>    Primary      2 NBD  ------->  2 NBD
>      disk       client    ||     server                  virtio-blk
>                           ||        ^                         ^
> --------.                 ||        |                         |
> Primary |                 ||  Secondary disk <--------- COLO buffer 3
> --------'                 ||                   backing
> 
> 
> 1) The disk on the primary is represented by a block device with two
> children, providing replication between a primary disk and the host that
> runs the secondary VM.  The read pattern patches for quorum
> (http://lists.gnu.org/archive/html/qemu-devel/2014-08/msg02381.html) can
> be used/extended to make the primary always read from the local disk
> instead of going through NBD.
> 
> 2) The secondary disk receives writes from the primary VM through QEMU's
> embedded NBD server (speculative write-through).
> 
> 3) The disk on the secondary is represented by a custom block device
> ("COLO buffer").  The disk buffer's backing image is the secondary disk,
> and the disk buffer uses bdrv_add_before_write_notifier to implement
> copy-on-write, similar to block/backup.c.
> 
> 4) Checkpointing can use new bdrv_prepare_checkpoint and
> bdrv_do_checkpoint members in BlockDriver to discard the COLO buffer,
> similar to your patches (you did not explain why you do checkpointing in
> two steps).  Failover instead is done with bdrv_commit or can even be
> done without stopping the secondary (live commit, block/commit.c).
> 
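
To make points 3) and 4) concrete, here is a minimal toy model of the
proposed buffer semantics. It is only a sketch under assumptions: it is
not QEMU code, one byte stands in for a sector, and every name in it
(colo_model, colo_primary_write, and so on) is hypothetical. It shows the
copy-on-write before a primary write, the discard at checkpoint, and the
commit at failover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSECT 8 /* toy disk size; one byte stands in for one sector */

/* Hypothetical toy model of the secondary side, not a QEMU structure. */
struct colo_model {
    uint8_t secondary_disk[NSECT]; /* backing image, written by the primary */
    uint8_t buffer[NSECT];         /* COLO buffer contents */
    bool buffered[NSECT];          /* true if the buffer holds this sector */
};

void colo_init(struct colo_model *m)
{
    memset(m, 0, sizeof(*m));
}

/* Write arriving from the primary (through NBD in the proposal): save the
 * old sector into the buffer first, then update the secondary disk. */
void colo_primary_write(struct colo_model *m, int s, uint8_t v)
{
    assert(s >= 0 && s < NSECT);
    if (!m->buffered[s]) {
        m->buffer[s] = m->secondary_disk[s]; /* copy-on-write */
        m->buffered[s] = true;
    }
    m->secondary_disk[s] = v;
}

/* Write issued by the secondary VM: it lands only in the buffer. */
void colo_secondary_write(struct colo_model *m, int s, uint8_t v)
{
    assert(s >= 0 && s < NSECT);
    m->buffer[s] = v;
    m->buffered[s] = true;
}

/* Read issued by the secondary VM: buffer first, backing image otherwise. */
uint8_t colo_secondary_read(const struct colo_model *m, int s)
{
    assert(s >= 0 && s < NSECT);
    return m->buffered[s] ? m->buffer[s] : m->secondary_disk[s];
}

/* Checkpoint: discard the buffer, so the secondary sees the primary's data. */
void colo_checkpoint(struct colo_model *m)
{
    memset(m->buffered, 0, sizeof(m->buffered));
}

/* Failover: commit the buffer into the secondary disk. */
void colo_failover(struct colo_model *m)
{
    for (int s = 0; s < NSECT; s++) {
        if (m->buffered[s]) {
            m->secondary_disk[s] = m->buffer[s];
            m->buffered[s] = false;
        }
    }
}
```

In this model the secondary VM's view never changes because of a primary
write; it changes only at a checkpoint, when the buffer is dropped. The
two-step prepare/do split of the checkpoint is not modelled here.
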
> 
> The missing parts are:
> 
> 1) NBD server on the backing image of the COLO buffer.  This means the
> backing image needs its own BlockBackend.  Apart from this, no new
> infrastructure is needed to receive writes on the secondary.

A backing image is always opened read-only. How can we remove this
limitation? Should we add an option to control it?

Thanks
Wen Congyang

> 
> 2) Read pattern support for quorum needs to be extended for the needs of
> the COLO primary.  It may be simpler or faster to write a simple
> "replication" driver that writes to N children but always reads from the
> first.  But in any case initial tests can be done with the quorum
> driver, even without read pattern support.  Again, all the network
> infrastructure to replicate writes already exists in QEMU.
> 
> 3) Of course the disk buffer itself.
> 
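
The simple "replication" driver mentioned in 2) could behave roughly as
in the sketch below: every write goes to all N children, and every read
is served by the first child only. This is a self-contained toy over
in-memory disks, not a BlockDriver; the names toy_disk, toy_replication,
toy_repl_write and toy_repl_read are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DISK_SIZE 16   /* toy disk size in bytes */
#define MAX_CHILDREN 4 /* arbitrary limit for this sketch */

/* Hypothetical in-memory stand-in for a child block device. */
struct toy_disk {
    uint8_t data[DISK_SIZE];
};

/* children[0] plays the local primary disk; the rest play NBD clients. */
struct toy_replication {
    struct toy_disk *children[MAX_CHILDREN];
    size_t n_children;
};

/* Write the same data to every child. Returns 0, or -1 if out of range. */
int toy_repl_write(struct toy_replication *r, size_t off,
                   const uint8_t *buf, size_t len)
{
    assert(r->n_children <= MAX_CHILDREN);
    if (off > DISK_SIZE || len > DISK_SIZE - off) {
        return -1;
    }
    for (size_t i = 0; i < r->n_children; i++) {
        memcpy(r->children[i]->data + off, buf, len);
    }
    return 0;
}

/* Always read from the first child. Returns 0, or -1 on a bad request. */
int toy_repl_read(const struct toy_replication *r, size_t off,
                  uint8_t *buf, size_t len)
{
    if (r->n_children == 0 || off > DISK_SIZE || len > DISK_SIZE - off) {
        return -1;
    }
    memcpy(buf, r->children[0]->data + off, len);
    return 0;
}
```

A real driver would also have to decide what happens when a write to one
child fails; the sketch ignores that.
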
> Paolo
> 
>> Thanks,
>> Yang.
>>
>> Wen Congyang (1):
>>   PoC: Block replication for COLO
>>
>> Yang Hongyang (1):
>>   Block: Block replication design for COLO
>>
>>  block.c                   |  48 +++++++
>>  block/blkcolo.c           | 338 ++++++++++++++++++++++++++++++++++++++++++++++
>>  docs/blkcolo.txt          |  85 ++++++++++++
>>  include/block/block.h     |   6 +
>>  include/block/block_int.h |  21 +++
>>  5 files changed, 498 insertions(+)
>>  create mode 100644 block/blkcolo.c
>>  create mode 100644 docs/blkcolo.txt
>>
> 
> .
> 



