Re: [Qemu-devel] [RFC PATCH] replication agent module


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC PATCH] replication agent module
Date: Tue, 7 Feb 2012 13:50:25 +0000

On Tue, Feb 7, 2012 at 1:34 PM, Kevin Wolf wrote:
> On 07.02.2012 11:29, Ori Mamluk wrote:
>> Repagent is a new module that allows an external replication system to
>> replicate a volume of a Qemu VM.

I recently joked with Kevin that QEMU is on its way to reimplementing
the Linux block and device-mapper layers.  Now we have drbd, thanks!
:P

Except for image files, the way to do this on a Linux host would be
using drbd block devices.  We still haven't figured out a nice way to
make image files full-fledged Linux block devices, so we're
reimplementing all the block code in QEMU userspace.

>> This RFC patch adds the repagent client module to Qemu.
>>
>> Documentation of the module role and API is in the patch at
>> replication/qemu-repagent.txt
>>
>> The main motivation behind the module is to allow replication of VMs in
>> a virtualization environment like RhevM.
>>
>> To achieve this we need basic replication support in Qemu.
>>
>> This is the first submission of this module, which was written as a
>> Proof Of Concept, and used successfully for replicating and recovering a
>> Qemu VM.
>
> I'll mostly ignore the code for now and just comment on the design.
>
> One thing to consider for the next version of the RFC would be to split
> this into a series of smaller patches. This one has become quite large, which
> makes it hard to review (and yes, please use git send-email).
>
>> Points and open issues:
>>
>> * The module interfaces with the Qemu storage stack at the block.c
>> generic layer. Is this the right place to intercept/inject IOs?
>
> There are two ways to intercept I/O requests. The first one is what you
> chose, just add some code to bdrv_co_do_writev, and I think it's
> reasonable to do this.
>
> The other one would be to add a special block driver for a replication:
> protocol that writes to two different places (the real block driver for
> the image, and the network connection). Generally this feels even a bit
> more elegant, but it brings new problems with it: For example, when you
> create an external snapshot, you need to pay attention not to lose the
> replication because the protocol is somewhere in the middle of a backing
> file chain.
>
>> * The patch performs IO reads from a new thread (a TCP listener
>> thread). See repaget_read_vol in repagent.c. It is not protected by any
>> lock. Is this OK?
>
> No, definitely not. Block layer code expects that it holds
> qemu_global_mutex.
>
> I'm not sure if a thread is the right solution. You should probably use
> something that resembles other asynchronous code in qemu, i.e. either
> callback or coroutine based.

There is a flow control problem here which is interesting.  If the
rephub is slower than the writer or unavailable, then eventually we
either need to stop replicating writes or we need to throttle the
guest writes.  I haven't read through the whole patch yet but the flow
control solution is very closely tied to how you use
threads/coroutines and how you use network sockets.
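
To make the trade-off concrete, here is a minimal sketch of the two
policies, using a bounded queue between the guest write path and whatever
sends data to the rephub.  It is not taken from the patch: ReplicationQueue,
the policy names and the capacity are all hypothetical, and a real
implementation would sit in the block layer rather than in Python.

```python
import queue


class ReplicationQueue:
    """Bounded buffer between the guest write path and the rephub sender.

    Hypothetical illustration only; none of these names come from the patch.
    """

    def __init__(self, capacity, policy="throttle"):
        if policy not in ("throttle", "drop"):
            raise ValueError("policy must be 'throttle' or 'drop'")
        self._q = queue.Queue(maxsize=capacity)
        self.policy = policy
        self.in_sync = True  # becomes False once a write was not replicated

    def submit(self, sector, data, timeout=None):
        """Called for each guest write. Returns True if the write was queued."""
        if self.policy == "throttle":
            # Throttle the guest: block until the sender frees a slot.
            # Raises queue.Full if a timeout is given and it expires.
            self._q.put((sector, data), timeout=timeout)
            return True
        try:
            self._q.put_nowait((sector, data))
            return True
        except queue.Full:
            # Stop replicating: the guest carries on, the replica is stale
            # and needs a resync later.
            self.in_sync = False
            return False

    def drain_one(self):
        """Called by the sender. Returns the next write, or None if empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None
```

With "throttle" a slow rephub directly slows the guest; with "drop" the
guest is unaffected but the replica falls out of sync and has to be
resynchronized, which is where reading the protected volume comes in.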

>> + * Read a protected volume - allows the Rephub to read a protected
>> volume, to enable the protected hub to synchronize the content of a
>> protected volume.
>
> We were discussing using NBD as the protocol for any data that is
> transferred from/to the replication hub, so that we can use the existing
> NBD client and server code that qemu has. It seems you came to the
> conclusion to use a different protocol? What are the reasons?
>
> The other message types could possibly be implemented as QMP commands. I
> guess we might need to attach multiple QMP monitors for this to work
> (one for libvirt, one for the rephub). I'm not sure if there is a
> fundamental problem with this or if it just needs to be done.

Agreed.  You can already query block devices using QMP 'query-block'.
By adding in-process NBD server support you could then launch an NBD
server for each volume which you wish to replicate.  However, in this
case it sounds almost like you want the reverse - you could provide an
NBD server on the rephub and QEMU would mirror writes to it (the NBD
client code is already in QEMU).
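
As a rough sketch of the QMP side, the snippet below builds the two
commands a rephub would send on a monitor connection (capability
negotiation, then 'query-block') and pulls the device names out of a reply.
The helper names and the sample reply are made up for illustration, and the
reply is trimmed to the single field used here; socket handling is left out.

```python
import json


def qmp_command(name, arguments=None):
    """Encode one QMP command as a line of JSON."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg) + "\n"


def device_names(reply_line):
    """Extract the device names from a 'query-block' reply."""
    reply = json.loads(reply_line)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("desc", "QMP error"))
    return [entry["device"] for entry in reply["return"]]


# A QMP session must negotiate capabilities before any other command.
handshake = qmp_command("qmp_capabilities")
query = qmp_command("query-block")

# Hand-written sample reply (assumption), trimmed to the "device" field.
sample_reply = '{"return": [{"device": "ide0-hd0"}, {"device": "virtio0"}]}'
print(device_names(sample_reply))
```

The rephub could use the resulting list to decide which volumes to set up
NBD connections for.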

There is also interest from other external software (like libvirt) to
be able to read volumes while the VM is running.

BTW, how do you handle hotplug: do you poll the volumes?  Does
anything special need to be done when a volume is unplugged?

Stefan


