qemu-devel


From: Kevin Wolf
Subject: [Qemu-devel] BlockDriverState stack and BlockListeners (was: [RFC] Replication agent design)
Date: Tue, 21 Feb 2012 10:03:14 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20120131 Thunderbird/10.0

On 20.02.2012 15:32, Paolo Bonzini wrote:
> On 02/19/2012 02:40 PM, Ori Mamluk wrote:
>>
>> I think it might be better to go back to my original less generic design.
>> We can regard it as a 'plugin' for a specific application - in this
>> case, replication.
>> I can add a plugin interface in the generic block layer that allows
>> building a proper storage stack.
>> The plugin will have capabilities like a filter driver - getting hold of
>> the request on its way down (from VM to storage) and on its way up (IO
>> completion), allowing to block or stall both.
> 
> I and Stefan talked about this recently... we called it a BlockListener.
>  It seems like a good idea, and probably copy-on-read should be
> converted in due time to a BlockListener, too.

After thinking a bit about it, I tend to agree. However, I wouldn't call
it a BlockListener because it could do much more than just observe
requests: it can also modify them. Basically it would take a request and do
anything with it. It could enqueue the request and do nothing for the
moment (I/O throttling), it could use a different buffer and do copy on
read, it could mirror writes, etc.
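
To make the idea concrete, here is a minimal sketch of one such filter,
a write mirror. It is a toy model and not QEMU code: Node, disk_write and
mirror_write are hypothetical names, and the "disks" are plain memory
buffers. The filter takes the request on its way down, forwards it to the
node below, and then repeats it on a second target.

```c
#include <stddef.h>
#include <string.h>

/* Toy model, not QEMU code: every name here is hypothetical. */
enum { DISK_SIZE = 16 };

typedef struct Node Node;
struct Node {
    Node *below;   /* next node down the stack, NULL at the bottom */
    int (*write)(Node *n, size_t off, const char *buf, size_t len);
    char *data;    /* backing buffer, used by bottom nodes only */
    Node *mirror;  /* secondary target, used by the mirror filter only */
};

/* Bottom of the stack: copy the data into the in-memory "disk". */
static int disk_write(Node *n, size_t off, const char *buf, size_t len)
{
    if (off > DISK_SIZE || len > DISK_SIZE - off) {
        return -1; /* request is out of range */
    }
    memcpy(n->data + off, buf, len);
    return 0;
}

/* Filter: pass the request down, then repeat it on the mirror target.
 * The request only completes successfully if both writes succeed. */
static int mirror_write(Node *n, size_t off, const char *buf, size_t len)
{
    int ret = n->below->write(n->below, off, buf, len);
    if (ret < 0) {
        return ret;
    }
    return n->mirror->write(n->mirror, off, buf, len);
}
```

A throttling or copy-on-read filter would have the same shape; only the
body of the write (or read) callback would differ.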

So let's check which features could make use of it:

- Copy on read
- I/O throttling
- blkmirror for precopy storage migration
- replication agent
- Old style block migration (btw, we should deprecate this)
- Maybe even bdrv_check_request and high watermark? However, they are
  not optional, so it probably makes less sense.

I think these are enough cases to justify it. Now, which operations do
we need to intercept?

- bdrv_co_read
- bdrv_co_write
- bdrv_drain (btw, we need a version for only one BDS)
- Probably bdrv_co_discard as well

Anything I missed? Now the interesting question that comes to mind is:
What is really the difference between the proposed BlockListener and a
BlockDriver? Sure, a listener would implement much less functionality,
but we also have BlockDrivers today that implement very few of the
callbacks.
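
As a rough illustration of why the two look so similar, consider a
callback table in which every entry is optional and a missing entry
simply means "pass the request to the layer below". This is only a
sketch under that assumption: LayerOps, Layer and layer_write are
made-up names, not the real BlockDriver interface, and the callback
signatures are simplified.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not QEMU's BlockDriver struct. */
typedef struct Layer Layer;

typedef struct LayerOps {
    int (*co_read)(Layer *l, long sector, int nb_sectors);
    int (*co_write)(Layer *l, long sector, int nb_sectors);
    int (*co_discard)(Layer *l, long sector, int nb_sectors);
    int (*drain)(Layer *l);
} LayerOps;

struct Layer {
    const LayerOps *ops;
    Layer *below;     /* next layer down, NULL at the bottom */
    int writes_seen;  /* bookkeeping for the example only */
};

/* Dispatch a write to the first layer that implements co_write.
 * Layers without the callback are skipped transparently. */
static int layer_write(Layer *l, long sector, int nb_sectors)
{
    for (; l != NULL; l = l->below) {
        if (l->ops->co_write != NULL) {
            return l->ops->co_write(l, sector, nb_sectors);
        }
    }
    return -1; /* no layer in the stack implements writes */
}

/* A "listener": counts the write, then forwards it unchanged. */
static int count_write(Layer *l, long sector, int nb_sectors)
{
    l->writes_seen++;
    return layer_write(l->below, sector, nb_sectors);
}

/* A stand-in for a real driver at the bottom of the stack. */
static int sink_write(Layer *l, long sector, int nb_sectors)
{
    (void)sector;
    (void)nb_sectors;
    l->writes_seen++;
    return 0;
}

static const LayerOps counter_ops = { .co_write = count_write };
static const LayerOps passthrough_ops = { .co_write = NULL };
static const LayerOps sink_ops = { .co_write = sink_write };
```

With such a table, a listener is just a driver that fills in one or two
entries and leaves the rest empty.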

A bdrv_drain callback doesn't exist yet in BlockDrivers, but I consider
this a bug (qemu_aio_flush() is really the implementation for raw-posix
and possibly some network protocols), so we should just add this to
BlockDriver.

The main difference that I see is that listeners always stay on top.
For example, let's assume that you implemented a blkmirror driver in
today's infrastructure: you would get a BlockDriverState stack like
blkmirror -> qcow2 -> file. If you take a live snapshot now, you don't
want to have the blkmirror applied to the old top-level image, which is
now a read-only backing file. Instead, it should move to the new
top-level image. I believe the same applies to I/O throttling and, to
some degree, to copy on read, etc.

So maybe we just need to extend the current BlockDriverState stack to
distinguish "normal" and "always on top" BlockDrivers, where the latter
would roughly correspond to BlockListeners?
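
A minimal sketch of what that distinction could mean for a live
snapshot, assuming a simple per-node flag. StackNode, always_on_top and
take_snapshot are hypothetical names, and the model flattens the stack
into a single linked list (it ignores the difference between a protocol
child and a backing file). The new top-level image is inserted below any
leading "always on top" nodes, so they end up applied to the new image
rather than to the old one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the stack; all names are hypothetical. */
typedef struct StackNode StackNode;
struct StackNode {
    const char *name;
    bool always_on_top;  /* true for listener-like drivers */
    StackNode *below;
};

/* Insert new_image as the new top-level image: above the current
 * top-level image, but below any leading "always on top" nodes.
 * Returns the head of the stack, which changes only if there are
 * no such nodes. */
static StackNode *take_snapshot(StackNode *head, StackNode *new_image)
{
    StackNode **link = &head;

    /* Skip over the run of "always on top" nodes at the head. */
    while (*link != NULL && (*link)->always_on_top) {
        link = &(*link)->below;
    }

    new_image->below = *link;
    *link = new_image;
    return head;
}
```

Starting from blkmirror -> qcow2 -> file, this yields
blkmirror -> new image -> qcow2 -> file, which is the behaviour
described above for blkmirror.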

Kevin


