qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] block: Make op blockers recursive


From: Fam Zheng
Subject: Re: [Qemu-devel] [PATCH] block: Make op blockers recursive
Date: Wed, 10 Sep 2014 16:54:19 +0800
User-agent: Mutt/1.5.23 (2014-03-12)

On Thu, 09/04 21:42, Stefan Hajnoczi wrote:
> On Tue, Aug 26, 2014 at 06:45:54AM +0000, Benoît Canet wrote:
> > On Tue, Aug 26, 2014 at 12:42:04PM +0800, Fam Zheng wrote:
> > > On Mon, 08/25 12:12, Benoît Canet wrote:
> > > > On Mon, Aug 25, 2014 at 05:37:37PM +0800, Fam Zheng wrote:
> > > > > On Mon, 08/25 09:06, Benoît Canet wrote:
> > > > > > On Mon, Aug 25, 2014 at 02:04:24PM +0800, Fam Zheng wrote:
> > > > > > > On Fri, 08/22 18:11, Benoît Canet wrote:
> > > > > > > > Since the block layer code is starting to modify the BDS graph 
> > > > > > > > right in the
> > > > > > > > middle of BDS chains (block-mirror's replace parameter for 
> > > > > > > > example) QEMU needs
> > > > > > > > to properly block and unblock whole BDS subtrees; recursion is 
> > > > > > > > a neat way to
> > > > > > > > achieve this task.
> > > > > > > > 
> > > > > > > > This patch also takes care of modifying the op blockers users.
> > > > > > > 
> > > > > > > Is this going to replace backing_blocker?
> > > > > > > 
> > > > > > > I think it is too general an approach to control the operation 
> > > > > > > properly,
> > > > > > > because the op blocker may not work in the same way for all types 
> > > > > > > of BDS
> > > > > > > connections.  In other words, the choosing of op blockers are 
> > > > > > > likely
> > > > > > > conditional on graph edge types, that's why backing_blocker was 
> > > > > > > added here. For
> > > > > > > example, A VMDK extent connection will probably need a different 
> > > > > > > set of
> > > > > > > blockers than bs->file connection.
> > > > > > > 
> > > > > > > So could you explain in which cases is the recursive 
> > > > > > > blocking/unblocking
> > > > > > > useful?
> > > > > > 
> > > > > > It's designed for the new crop of block operations operating on BDS 
> > > > > > located in
> > > > > > the middle of the backing chain: Jeff's patches, intermediate live 
> > > > > > streaming or
> > > > > > intermediate mirroring.
> > > > > > Recursively blocking BDS allows to do these operations safely.
> > > > > 
> > > > > Sorry I may be slow on this, but it's still not clear to me.
> > > > > 
> > > > > That doesn't immediately show how backing_blocker doesn't work. These
> > > > > operations are in the category of operations that update graph 
> > > > > topology,
> > > > > meaning that they drop, add or swap some nodes in the middle of the 
> > > > > chain. It
> > > > > is not safe because there are used by the other nodes, but they are 
> > > > > supposed to
> > > > > be protected by backing_blocker. Could you be more specific?
> > > > 
> > > > I don't know particularly about the backing blocker case.
> > > > 
> > > > > 
> > > > > I can think of something more than backing_hd: there are also link 
> > > > > types other
> > > > > than backing_hd, for example ->file, (vmdk)->extents or 
> > > > > (quorum)->qcrs, etc.
> > > > > They should be protected as well.
> > > > 
> > > > This patch takes cares of recursing everywhere.
> > > > 
> > > > I can give you an example for quorum.
> > > > 
> > > > If a streaming operation is running on a quorum block backend the 
> > > > recursive
> > > > blocking will help to block any operation done directly on any of the 
> > > > children.
> > > 
> > > At what points should block layer recursively block/unblock the 
> > > operations in
> > > this quorum case?
> > 
> > When the streaming starts it should block all the top bs children.
> > So after when an operation tries to operate on a child of the top bs it 
> > will be
> > forbidden.
> > 
> > The beauty of it is that recursive blockers can easily replace regular 
> > blockers.
> 
> Let's think of a situation that recursive blockers protect but
> backing_blocker does not:
> 
> a <- b <- c <- d
> 
> c is the backing file and is therefore protected by the op blocker.
> 
> The block-commit command works with node-names, however, so we can
> manipulate any nodes in the graph, not just the topmost one.  Try this:
> 
> block-commit d
> block-commit b
> 
> I haven't checked yet but I suspect it will launch two block-commit jobs
> on the same partial chain (that's a bad thing because it can lead to
> corruption).

1) Does block-commit work with node-names already? In other words, is
   block-commit b possible now? I only see drive-mirror works with it, but not
   drive-backup, block-mirror or block-commit.

2) Regardless of the answer to 1), I think we could use a similar approach as
   drive-backup here: split BLOCK_OP_TYPE_COMMIT to
   BLOCK_OP_TYPE_COMMIT_{SOURCE,TARGET}, and only unblock
   BLOCK_OP_TYPE_COMMIT_TARGET in bdrv_set_backing_hd.

Unblocking BLOCK_OP_TYPE_COMMIT on backing_hd is wrong as long as we expose "d"
to block-commit, with node-name.

As the next step, let's think about what it takes to safely allow commit d to b
with:

a <- b <- c <- d <- e <- f <- g

I think the answer is: if the commit job checks and sets blockers on range [d, 
b],
it is safe. Because from e, f and g's points of view, the whole backing chain
is always complete and working, just like how device model thinks of the top 
node
when it's being active committed nowadays.

Based on that, we can even start commit on non-intersecting partial chains at
the same time:

block-commit g up to e
block-commit d up to b

, if we want to.

That is only possible with a more specific blocking logic, instead of recursive
blocker.

Fam

> 
> With recursive blockers, not just c but also a and b are protected.
> 
> To me, this demonstrates that recursive op blockers are safer than
> non-recursive backing_blocker and will prevent real-world cases that
> could lead to corruption.
> 
> BTW I'm fairly happy with the patch itself, but like Fam I'm still
> thinking through different cases to convince myself the design is sound.
> 
> Stefan





reply via email to

[Prev in Thread] Current Thread [Next in Thread]