[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 16/24] block: Mark bdrv_replace_node() GRAPH_WRLOCK
From: |
Kevin Wolf |
Subject: |
Re: [PATCH 16/24] block: Mark bdrv_replace_node() GRAPH_WRLOCK |
Date: |
Fri, 3 Nov 2023 11:32:39 +0100 |
Am 27.10.2023 um 23:33 hat Eric Blake geschrieben:
> On Fri, Oct 27, 2023 at 05:53:25PM +0200, Kevin Wolf wrote:
> > Instead of taking the writer lock internally, require callers to already
> > hold it when calling bdrv_replace_node(). Its callers may already want
> > to hold the graph lock and so wouldn't be able to call functions that
> > take it internally.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> > include/block/block-global-state.h | 6 ++++--
> > block.c | 26 +++++++-------------------
> > block/commit.c | 13 +++++++++++--
> > block/mirror.c | 26 ++++++++++++++++----------
> > blockdev.c | 5 +++++
> > tests/unit/test-bdrv-drain.c | 6 ++++++
> > tests/unit/test-bdrv-graph-mod.c | 13 +++++++++++--
> > 7 files changed, 60 insertions(+), 35 deletions(-)
> >
> > +++ b/block.c
> > @@ -5484,25 +5484,7 @@ out:
> > int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
> > Error **errp)
> > {
> > - int ret;
> > -
> > - GLOBAL_STATE_CODE();
> > -
> > - /* Make sure that @from doesn't go away until we have successfully
> > attached
> > - * all of its parents to @to. */
>
> Useful comment that you just moved here in the previous patch...
>
> > - bdrv_ref(from);
> > - bdrv_drained_begin(from);
> > - bdrv_drained_begin(to);
> > - bdrv_graph_wrlock(to);
> > -
> > - ret = bdrv_replace_node_common(from, to, true, false, errp);
> > -
> > - bdrv_graph_wrunlock();
> > - bdrv_drained_end(to);
> > - bdrv_drained_end(from);
> > - bdrv_unref(from);
> > -
> > - return ret;
> > + return bdrv_replace_node_common(from, to, true, false, errp);
> > }
> >
> > int bdrv_drop_filter(BlockDriverState *bs, Error **errp)
> > @@ -5717,9 +5699,15 @@ BlockDriverState *bdrv_insert_node(BlockDriverState
> > *bs, QDict *options,
> > goto fail;
> > }
> >
> > + bdrv_ref(bs);
>
> ...but now it is gone. Intentional?
I figured it was obvious enough that bdrv_ref() is always called to make
sure that the node doesn't go away too early, but I can add it back.
> > bdrv_drained_begin(bs);
> > + bdrv_drained_begin(new_node_bs);
> > + bdrv_graph_wrlock(new_node_bs);
> > ret = bdrv_replace_node(bs, new_node_bs, errp);
> > + bdrv_graph_wrunlock();
> > + bdrv_drained_end(new_node_bs);
> > bdrv_drained_end(bs);
> > + bdrv_unref(bs);
> >
> > if (ret < 0) {
> > error_prepend(errp, "Could not replace node: ");
> > diff --git a/block/commit.c b/block/commit.c
> > index d92af02ead..2fecdce86f 100644
> > --- a/block/commit.c
> > +++ b/block/commit.c
> > @@ -68,6 +68,7 @@ static void commit_abort(Job *job)
> > {
> > CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
> > BlockDriverState *top_bs = blk_bs(s->top);
> > + BlockDriverState *commit_top_backing_bs;
> >
> > if (s->chain_frozen) {
> > bdrv_graph_rdlock_main_loop();
> > @@ -94,8 +95,12 @@ static void commit_abort(Job *job)
> > * XXX Can (or should) we somehow keep 'consistent read' blocked even
> > * after the failed/cancelled commit job is gone? If we already wrote
> > * something to base, the intermediate images aren't valid any more. */
> > - bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
> > - &error_abort);
> > + commit_top_backing_bs = s->commit_top_bs->backing->bs;
> > + bdrv_drained_begin(commit_top_backing_bs);
> > + bdrv_graph_wrlock(commit_top_backing_bs);
>
> Here, and elsewhere in the patch, drained_begin/end is outside
> wr(un)lock...
>
> > + bdrv_replace_node(s->commit_top_bs, commit_top_backing_bs,
> > &error_abort);
> > + bdrv_graph_wrunlock();
> > + bdrv_drained_end(commit_top_backing_bs);
> >
> > bdrv_unref(s->commit_top_bs);
> > bdrv_unref(top_bs);
> > @@ -425,7 +430,11 @@ fail:
> > /* commit_top_bs has to be replaced after deleting the block job,
> > * otherwise this would fail because of lack of permissions. */
> > if (commit_top_bs) {
> > + bdrv_graph_wrlock(top);
> > + bdrv_drained_begin(top);
> > bdrv_replace_node(commit_top_bs, top, &error_abort);
> > + bdrv_drained_end(top);
> > + bdrv_graph_wrunlock();
>
> ...but here you do it in the opposite order. Intentional?
No, this is actually wrong. bdrv_drained_begin() has a nested event
loop, and running a nested event loop while holding the graph lock can
cause deadlocks, so it's forbidden. Thanks for catching this!
> > +++ b/tests/unit/test-bdrv-drain.c
> > @@ -2000,7 +2000,13 @@ static void do_test_replace_child_mid_drain(int
> > old_drain_count,
> > parent_s->was_undrained = false;
> >
> > g_assert(parent_bs->quiesce_counter == old_drain_count);
> > + bdrv_drained_begin(old_child_bs);
> > + bdrv_drained_begin(new_child_bs);
> > + bdrv_graph_wrlock(NULL);
>
> Why is this locking on NULL instead of new_child_bs?
The parameter for bdrv_graph_wrlock() is a BDS whose AioContext is
locked and needs to be temporarily unlocked to avoid deadlocks. We don't
hold any AioContext lock here, so NULL is right.
> > bdrv_replace_node(old_child_bs, new_child_bs, &error_abort);
> > + bdrv_graph_wrunlock();
> > + bdrv_drained_end(new_child_bs);
> > + bdrv_drained_end(old_child_bs);
> > g_assert(parent_bs->quiesce_counter == new_drain_count);
> >
> > if (!old_drain_count && !new_drain_count) {
Since the two comments above are the only thing you found in the review,
I'll just directly fix them while applying the series.
Kevin
- Re: [PATCH 16/24] block: Mark bdrv_replace_node() GRAPH_WRLOCK,
Kevin Wolf <=