Re: [Qemu-devel] Block Filters


From: Kevin Wolf
Subject: Re: [Qemu-devel] Block Filters
Date: Fri, 6 Sep 2013 10:45:14 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 06.09.2013 at 09:56, Fam Zheng wrote:
> On Tue, 09/03 18:24, Benoît Canet wrote:
> > 
> > Hello list,
> > 
> > I have been thinking about QEMU block filters lately.
> > 
> > I am not a block.c/blockdev.c expert, so tell me what you think of the
> > following.
> > 
> > The use cases I see would be:
> > 
> > -$user wants to have some real cryptography on top of qcow2/qed or another
> > format.
> >  snapshots and other block features should continue to work
> > 
> > -$user wants to use a RAID-like feature such as QUORUM in QEMU.
> >  other features should continue to work
> > 
> > -$user wants to use the future SSD deduplication implementation with
> > metadata on SSD and data on spinning disks.
> >  other features should continue to work
> > 
> > -$user wants to I/O throttle one drive of his VM.
> > 
> > -$user wants to do Copy On Read
> > 
> > -$user wants to do a combination of the above
> > 
> > -$developer wants to take the minimum of required steps to keep changes small
> > 
> > -$developer wants to keep user interface changes for later
> > 
> > Let's take the example of a user wanting to do I/O throttled encrypted
> > QUORUM on top of QCOW2.
> > 
> > Assuming we want to implement throttling and encryption as something remotely
> > like a block filter, this makes a pretty complex BlockDriverState tree.
> > 
> > The tree would look like the following:
> > 
> >                     I/O throttling BlockDriverState (bs)
> >                                |
> >                                |
> >                                |
> >                                |
> >                     Encryption BlockDriverState (bs)
> >                                |
> >                                |
> >                                |
> >                                |
> >                     Quorum BlockDriverState (bs)
> >                    /           |           \
> >                   /            |            \
> >                  /             |             \
> >                 /              |              \
> >             QCOW2 bs       QCOW2 bs        QCOW2 bs
> >                |               |               |
> >                |               |               |
> >                |               |               |
> >                |               |               |
> >             RAW bs         RAW bs           RAW bs
> > 
> > An external snapshot should result in a tree like the following.
> >                     I/O throttling BlockDriverState (bs)
> >                                |
> >                                |
> >                                |
> >                                |
> >                     Encryption BlockDriverState (bs)
> >                                |
> >                                |
> >                                |
> >                                |
> >                     Quorum BlockDriverState (bs)
> >                    /           |           \
> >                   /            |            \
> >                  /             |             \
> >                 /              |              \
> >             QCOW2 bs       QCOW2 bs         QCOW2 bs
> >                |               |               |
> >                |               |               |
> >                |               |               |
> >                |               |               |
> >             QCOW2 bs       QCOW2 bs         QCOW2 bs
> >                |               |               |
> >                |               |               |
> >                |               |               |
> >                |               |               |
> >             RAW bs         RAW bs           RAW bs
> > 
> > In the current state of QEMU we can code some block drivers to implement
> > this tree.
> > 
> > However, when doing operations like snapshots, blockdev.c would have no real
> > idea of what should be snapshotted and how. (The 3 top bs should be kept on
> > top.)
> > 
> > Moreover, it would have no easy way to manipulate this tree of
> > BlockDriverStates, as each one is encapsulated in its parent.
> > 
> > Also, there is no generic way to tell the block layer that two or more
> > BlockDriverStates are siblings.
> > 
> > This mail proposes some additional structures in order to cope with these
> > problems.
> > 
> > The overall strategy of the proposed structures is to push the
> > BlockDriverState relationships out of each BlockDriverState.
> > 
> > The idea is that it would make it easier for the block layer to manipulate a
> > well-known structure instead of being forced to enter into each
> > BlockDriverState's specifics.
> > 
> > The first structure is the BlockStackNode.
> > 
> > The BlockStackNode would be used to represent the relationship between the
> > various BlockDriverStates.
> > 
> > struct BlockStackNode {
> >     BlockDriverState *bs;  /* the BlockDriverState held by this node */
> > 
> >     /* this doubly linked list entry points to the child node and the parent
> >      * node
> >      */
> >     QLIST_ENTRY(BlockStackNode) down;
> > 
> >     /* this doubly linked list entry points to the siblings of this node
> >      */
> >     QLIST_ENTRY(BlockStackNode) siblings;
> > 
> >     /* a hash or an array of the siblings of this node for fast access;
> >      * should be recomputed when updating the tree */
> >     QHASH_ENTRY<BlockStackNode, index> siblings_hash;
> > };
> > 
> > The BlockBackend would be the structure used to hold the "drive" the guest
> > uses.
> > 
> > struct BlockBackend {
> >     /* the following doubly linked list header points to the top
> >      * BlockStackNode; in our case it's the one containing the I/O
> >      * throttling bs
> >      */
> >     QLIST_HEAD(, BlockStackNode) block_stack_head;
> >     /* this is a pointer to the topmost node below the block filter chain;
> >      * in our case the first QCOW2 sibling
> >      */
> >     BlockStackNode *top_node_below_filter;
> > };
> > 
> > 
> > Updated diagram:
> > 
> > (Here bsn means BlockStackNode)
> > 
> >     ------------------------BlockBackend
> >     |                             |
> >     |                          block_stack_head
> >     |                             |
> >     |                             |
> >     |                       I/O throttling BlockStackNode (contains its bs)
> >     |                             |
> >     |                            down
> >     |                             |
> >     |                             |
> > top_node_below_filter     Encryption BlockStackNode (contains its bs)
> >     |                             |
> >     |                            down
> >     |                             |
> >     |                             |
> >     |                Quorum BlockStackNode (contains its bs)
> >     |               /
> >     |             down
> >     |             /               
> >     |            /     S              S
> >     ------  QCOW2 bsn--i---QCOW2 bsn--i------ QCOW2 bsn (each bsn contains a bs)
> >                |       b       |      b         |
> >              down      l      down    l        down
> >                |       i       |      i         |
> >                |       n       |      n         |
> >                |       g       |      g         |
> >                |       s       |      s         |
> >                |               |                |
> >             RAW bsn         RAW bsn           RAW bsn  (each bsn contains a bs)
> > 
> > 
> > Block driver point of view:
> > 
> > To construct the tree, each BlockDriver would have some utility functions
> > looking like:
> > 
> > bdrv_register_child_bs(bs, child_bs, int index);
> > 
> > Multiple calls to this function could be made to register multiple sibling
> > children identified by their index.
> > 
> > This way something like quorum could register multiple QCOW2 instances.
> > 
> > The driver would have a
> > BlockDriverState *bdrv_access_child(bs, int index);
> > 
> > to access its children.
> > 
> > These functions can be implemented without the driver knowing about
> > BlockStackNodes using container_of.
> > 
> > blockdev point of view: (here I need your help)
> > 
> > When doing a snapshot, blockdev.c would access
> > BlockBackend->top_node_below_filter and make a snapshot of the bs contained
> > in this node and its siblings.
> > 
> Since BlockDriver.bdrv_snapshot_create() is an optional operation, blockdev.c
> can navigate down the tree from the top node until hitting some layer where
> the op is implemented (the QCow2 bs), so we get rid of this
> top_node_below_filter pointer.

Is it even inherent to a block driver (like a filter) whether a snapshot is
to be taken at its level? Or is it rather a policy decision that should
be made by the user?

In our example, the quorum driver, it's not at all clear to me that you
want to snapshot all children. In order to roll back to a previous
state, one snapshot is enough; you don't need multiple copies of the
same one. Perhaps you want two so that you can still compare them for
verification. Or all of them because you can afford the disk space and
want ultimate safety. I don't think qemu can know which one is true.
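
To make the "policy, not driver property" point concrete, here is a rough
sketch of what a user-supplied policy for the quorum case might look like.
None of this is existing QEMU code: SnapPolicy, SnapPolicyKind and
select_children() are hypothetical names invented for illustration, and the
children are reduced to plain indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy (not a QEMU type): how many quorum children to snapshot. */
typedef enum {
    SNAP_ONE,    /* a single copy is enough to roll back */
    SNAP_COUNT,  /* a caller-chosen number, e.g. two for cross-checking */
    SNAP_ALL     /* every child, for maximum safety */
} SnapPolicyKind;

typedef struct {
    SnapPolicyKind kind;
    size_t count;            /* only consulted for SNAP_COUNT */
} SnapPolicy;

/*
 * Mark the children to snapshot: selected[i] is set to 1 for the first
 * `want` children and to 0 for the rest. Returns how many were selected.
 * The request is clamped to the number of children that actually exist.
 */
size_t select_children(const SnapPolicy *policy, size_t n_children,
                       int *selected)
{
    size_t want;

    assert(policy != NULL && selected != NULL);

    switch (policy->kind) {
    case SNAP_ONE:
        want = 1;
        break;
    case SNAP_COUNT:
        want = policy->count;
        break;
    case SNAP_ALL:
    default:
        want = n_children;
        break;
    }
    if (want > n_children) {
        want = n_children;
    }
    for (size_t i = 0; i < n_children; i++) {
        selected[i] = i < want;
    }
    return want;
}
```

The point of the sketch is only that the choice comes in from the caller;
the quorum driver itself has no way to pick between the three cases.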

In the same way, in a typical case you may want to keep I/O throttling
for the whole drive, including the new snapshot. But what if the
throttling was used in order to not overload the network where the image
is stored, and you're now doing a local snapshot, to which you want to
stream the image? The I/O throttling should apply only to the backing
file, not the new snapshot.

So perhaps what we really need is a more flexible snapshot/BDS tree
manipulation command that describes in detail which structure you want
to have in the end.
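
As a minimal sketch of what "describing the structure" could mean for the
throttling example, assume a simplified single-child chain. Node and
insert_overlay() are hypothetical stand-ins, not QEMU's BlockDriverState or
any existing function; the caller names the link at which the new snapshot
overlay goes in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one node of a BDS chain (not a QEMU type). */
typedef struct Node {
    const char *name;
    struct Node *child;      /* the node this one reads from, or NULL */
} Node;

/*
 * Insert `overlay` at the position designated by `link`: the overlay takes
 * over whatever the link pointed to as its child, and the link is redirected
 * to the overlay. Everything above the link stays above the overlay,
 * everything below it ends up underneath.
 */
void insert_overlay(Node **link, Node *overlay)
{
    assert(link != NULL && overlay != NULL);
    assert(overlay->child == NULL);   /* a fresh overlay has no backing yet */

    overlay->child = *link;
    *link = overlay;
}
```

With a chain top -> throttle -> image, passing the link from the throttle
node to its child gives throttle -> snapshot -> image, so throttling covers
the whole drive. Passing the top link instead gives
snapshot -> throttle -> image, so only the backing file is throttled. The
same primitive covers both cases; which one is wanted has to come from the
user.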

Kevin


