qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end
Date: Thu, 30 Nov 2017 16:10:38 +0100
User-agent: Mutt/1.9.1 (2017-09-22)

Am 30.11.2017 um 15:34 hat Fam Zheng geschrieben:
> On Thu, 11/30 11:31, Kevin Wolf wrote:
> > What you really mean is probably connected components in the graph, but
> > do we really want to manage merging and splitting object representing
> > connected components when a node is added or removed from the graph?
> > Especially when that graph change occurs in a drain callback?
> > 
> > You can also still easily introduce bugs where graph changes during a
> > drain end up in nodes not being drained, possibly drained twice, you
> > still access the next pointer of a deleted node or you accidentally
> > switch to draining a different component.
> > 
> > It's probably possible to get this right, but essentially you're just
> > switching from iterating a tree to iterating a list. You get roughly the
> > same set of problems that you have to consider as today, and getting it
> > right should be about the same difficulty.
> 
> Not quite. The essential idea is redo the drain begin/end in a correct way:
> 
> bdrv_drained_begin(bs):
>     1.a) stop all sources of new requests in the connected components, 
> including
>         * aio_disable_external()
>         * stop QED timer
>         * stop block job
>     1.b) aio_poll() until:
>         * no request is in flight for bs
>         * no progress is made (any progress may generate new requests)
> 
> bdrv_drained_end(bs):
>     2.a) resume stopped sources of new requests
>     2.b) no need to do aio_poll() because the main loop will move on
> 
> 1.a) can be either done with either a graph recursion like now, or a list
> traversing if managed somewhere, like in this series. Or better, BlockGraph 
> will
> be the manager. The rule is that these operations will not cause graph change,
> so it's an improvement.
> 
> 1.b) doesn't need recursion IMO, only looking at bs->in_flight and the return
> value of aio_poll() should be enough.

The trouble is with 1.b), which can cause graph changes, including nodes
being added to or removed from a connected component. This means that
you can get nodes in the component for which 1.a) hasn't been executed,
or nodes that are outside the component, but for which 1.a) has still
been executed.

You then need to compensate for that somehow, and basically execute 1.a)
for any new nodes (otherwise you have nodes in the component that could
still be issuing requests; skipping drained_end for them is not enough)
and execute 2.*) for the removed ones.

> For 2.a) if the drain begin/end callbacks are mananged in a list, it's
> easy to only call drained_end() for entries that got drained_begin()
> earlier, as shown in patch 2.

Yes, but not draining them was a bug in the first place.

I'll give you that trying to separate 1.a) and 1.b) so that we have only
a single BDRV_POLL_WHILE() after the most critical code, is an
interesting idea. But as long as 1.b) can involve graph changes, it
remains quite tricky.

And of course, it can't solve the problem with graph changes in callers
of bdrv_drained_begin/end because at least one BDRV_POLL_WHILE() will
stay there that can cause graph changes.

> > > > And finally, I don't really think that the recursion is even a problem.
> > > > The problem is with graph changes made by callbacks that drain allows to
> > > > run. With your changes, it might be a bit easier to avoid that
> > > > bdrv_drain() itself gets into trouble due to graph changes, but this
> > > > doesn't solve the problem for any (possibly indirect) callers of
> > > > bdrv_drain().
> > > 
> > > The recursion is the one big place that can be easily broken by graph 
> > > changes,
> > > fixing this doesn't make the situation any worse. We could still fix the
> > > indirect callers by taking references or by introducing "ubiquitous 
> > > coroutines".
> > 
> > But hiding a bug in 80% of the cases where it shows isn't enough.
> 
> I think they are separate bugs. And with the one that this series is fixing,
> others (bdrv_drain*() callers') may even not show up, because
> bdrv_drained_begin() crashed already.
> 
> > 
> > I think the only real solution is to forbid graph changes until after
> > any critical operation has completed. I haven't tried it out in
> > practice, but I suppose we could use a CoMutex around them and take it
> > in bdrv_drained_begin/end() and all other places that can get into
> > trouble with graph changes.
> 
> Yes, I agree, but that (using CoMutex around graph change) requires
> everything, especially the defer_to_main_loop_bh, runs in a coroutine
> context, which is exactly what I mean by "introducing 'ubiquitous
> coroutines'", because currently we don't have them.

Is it hard to do, though? Instead of using a BH to switch to the main
loop and outside of coroutine context, you could use aio_co_schedule()
and yield, which would leave you in the main loop, but still in
coroutine context.

Would this have any bad side effects I'm not aware of?

I'm not sure about other places, of course. We'd have to check that.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]