From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH for-2.9-rc5 v4 2/2] block: Drain BH in bdrv_drained_begin
Date: Wed, 7 Feb 2018 11:51:10 +0100
User-agent: Mutt/1.9.1 (2017-09-22)

On 07.02.2018 at 02:48, Fam Zheng wrote:
> On Tue, Feb 6, 2018 at 11:32 PM, Kevin Wolf <address@hidden> wrote:
> > On 18.04.2017 at 16:30, Fam Zheng wrote:
> >> During block job completion, nothing prevents
> >> block_job_defer_to_main_loop_bh from being called in a nested
> >> aio_poll(), which causes trouble, for example in this code path:
> >>
> >>     qmp_block_commit
> >>       commit_active_start
> >>         bdrv_reopen
> >>           bdrv_reopen_multiple
> >>             bdrv_reopen_prepare
> >>               bdrv_flush
> >>                 aio_poll
> >>                   aio_bh_poll
> >>                     aio_bh_call
> >>                       block_job_defer_to_main_loop_bh
> >>                         stream_complete
> >>                           bdrv_reopen
> >>
> >> block_job_defer_to_main_loop_bh is the last step of the stream job,
> >> which should have been "paused" by the bdrv_drained_begin/end in
> >> bdrv_reopen_multiple, but it is not done because it's in the form of a
> >> main loop BH.
> >>
> >> Similar to why block jobs should be paused between drained_begin and
> >> drained_end, BHs they schedule must be excluded as well.  To achieve
> >> this, this patch forces draining the BH in BDRV_POLL_WHILE.
> >>
> >> As a side effect this fixes a hang in block_job_detach_aio_context
> >> during system_reset when a block job is ready:
> >>
> >>     #0  0x0000555555aa79f3 in bdrv_drain_recurse
> >>     #1  0x0000555555aa825d in bdrv_drained_begin
> >>     #2  0x0000555555aa8449 in bdrv_drain
> >>     #3  0x0000555555a9c356 in blk_drain
> >>     #4  0x0000555555aa3cfd in mirror_drain
> >>     #5  0x0000555555a66e11 in block_job_detach_aio_context
> >>     #6  0x0000555555a62f4d in bdrv_detach_aio_context
> >>     #7  0x0000555555a63116 in bdrv_set_aio_context
> >>     #8  0x0000555555a9d326 in blk_set_aio_context
> >>     #9  0x00005555557e38da in virtio_blk_data_plane_stop
> >>     #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd
> >>     #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd
> >>     #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd
> >>     #13 0x00005555559f6a18 in virtio_pci_reset
> >>     #14 0x00005555559139a9 in qdev_reset_one
> >>     #15 0x0000555555916738 in qbus_walk_children
> >>     #16 0x0000555555913318 in qdev_walk_children
> >>     #17 0x0000555555916738 in qbus_walk_children
> >>     #18 0x00005555559168ca in qemu_devices_reset
> >>     #19 0x000055555581fcbb in pc_machine_reset
> >>     #20 0x00005555558a4d96 in qemu_system_reset
> >>     #21 0x000055555577157a in main_loop_should_exit
> >>     #22 0x000055555577157a in main_loop
> >>     #23 0x000055555577157a in main
> >>
> >> The rationale is that the loop in block_job_detach_aio_context cannot
> >> make any progress in pausing/completing the job, because bs->in_flight
> >> is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
> >> BH. With this patch, it does.
> >>
> >> Reported-by: Jeff Cody <address@hidden>
> >> Signed-off-by: Fam Zheng <address@hidden>
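
For context, if I remember the 2.9 code correctly, the BH in question is set
up by block_job_defer_to_main_loop() roughly like this (an illustration from
memory, not the literal code):

    /* The job does not run its completion code directly; it defers it as a
     * one-shot bottom half in the main loop's AioContext.  This is the BH
     * that a plain "while (cond) aio_poll(ctx, true)" loop never gets to
     * run once cond is already false. */
    aio_bh_schedule_oneshot(qemu_get_aio_context(),
                            block_job_defer_to_main_loop_bh, data);
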
> >
> > Fam, do you remember whether this was really only about drain? Because
> > in that case...
> 
> Yes I believe so.
> 
> >
> >> diff --git a/include/block/block.h b/include/block/block.h
> >> index 97d4330..5ddc0cf 100644
> >> --- a/include/block/block.h
> >> +++ b/include/block/block.h
> >> @@ -381,12 +381,13 @@ void bdrv_drain_all(void);
> >>
> >>  #define BDRV_POLL_WHILE(bs, cond) ({                       \
> >>      bool waited_ = false;                                  \
> >> +    bool busy_ = true;                                     \
> >>      BlockDriverState *bs_ = (bs);                          \
> >>      AioContext *ctx_ = bdrv_get_aio_context(bs_);          \
> >>      if (aio_context_in_iothread(ctx_)) {                   \
> >> -        while ((cond)) {                                   \
> >> -            aio_poll(ctx_, true);                          \
> >> -            waited_ = true;                                \
> >> +        while ((cond) || busy_) {                          \
> >> +            busy_ = aio_poll(ctx_, (cond));                \
> >> +            waited_ |= !!(cond) | busy_;                   \
> >>          }                                                  \
> >>      } else {                                               \
> >>          assert(qemu_get_current_aio_context() ==           \
> >> @@ -398,11 +399,16 @@ void bdrv_drain_all(void);
> >>           */                                                \
> >>          assert(!bs_->wakeup);                              \
> >>          bs_->wakeup = true;                                \
> >> -        while ((cond)) {                                   \
> >> -            aio_context_release(ctx_);                     \
> >> -            aio_poll(qemu_get_aio_context(), true);        \
> >> -            aio_context_acquire(ctx_);                     \
> >> -            waited_ = true;                                \
> >> +        while (busy_) {                                    \
> >> +            if ((cond)) {                                  \
> >> +                waited_ = busy_ = true;                    \
> >> +                aio_context_release(ctx_);                 \
> >> +                aio_poll(qemu_get_aio_context(), true);    \
> >> +                aio_context_acquire(ctx_);                 \
> >> +            } else {                                       \
> >> +                busy_ = aio_poll(ctx_, false);             \
> >> +                waited_ |= busy_;                          \
> >> +            }                                              \
> >>          }                                                  \
> >>          bs_->wakeup = false;                               \
> >>      }                                                      \
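
Spelled out, and ignoring the macro plumbing, the first branch now behaves
roughly like this (an illustration, not the literal macro):

    /* Keep going until the caller's condition is false AND one aio_poll()
     * pass reports no progress (no BH or handler ran).  Blocking is only
     * allowed while the condition still holds; afterwards only non-blocking
     * passes are made, which is what flushes pending BHs such as
     * block_job_defer_to_main_loop_bh. */
    bool waited = false;
    bool busy = true;

    while (cond || busy) {
        busy = aio_poll(ctx, cond);    /* blocking iff cond is still true */
        waited |= cond || busy;
    }
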
> >
> > ...fixing it in BDRV_POLL_WHILE() rather than the drain functions may
> > have been a bad idea.
> >
> > I got a bug assigned where we have a large (200 GB) fragmented qcow2
> > image, and qemu-img convert takes two hours before it even starts to
> > copy data. All it does during this time is iterate through the whole
> > image with bdrv_block_status_above() in order to estimate the work to be
> > done (and of course it will repeat the same calls while actually copying
> > data).
> 
> The convert_iteration_sectors loop looks wasteful. Why can't we
> report progress simply as (offset/size) * 100%, so we don't need to do
> this estimation?

The assumption is that it's quick and that it makes the progress report much
more meaningful. You know those progress bars that slowly crawl towards 50%
and then suddenly complete within a second, or where the first 20% are quick,
but then things get really slow. They are not a great example to follow.

There must have been something wrong with the image file in that report,
because they said it was the only image causing trouble, and that if they
copied it away, it became quick, too.

Even for a maximally fragmented 100 GB qcow2 image, the loop only takes about
a second on my laptop. So while I don't feel as certain about it as before,
in practice it normally doesn't seem to hurt.
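
Roughly speaking, the pre-scan we are talking about amounts to something like
this (modelled from memory on convert_iteration_sectors(); names and details
are illustrative, not the actual qemu-img code):

    /* Walk the image once with block-status queries and add up the bytes
     * that convert will actually have to copy, so that progress can later
     * be reported as copied/allocated instead of offset/size. */
    static int64_t estimate_allocated_bytes(BlockDriverState *bs, int64_t size)
    {
        int64_t offset = 0;
        int64_t allocated = 0;

        while (offset < size) {
            int64_t pnum;
            int ret = bdrv_block_status_above(bs, NULL, offset, size - offset,
                                              &pnum, NULL, NULL);
            if (ret < 0) {
                return ret;
            }
            if (ret & BDRV_BLOCK_ALLOCATED) {
                allocated += pnum;
            }
            offset += pnum;
        }
        return allocated;
    }

On a badly fragmented image that is one block-status query per small chunk,
and each query currently also goes through BDRV_POLL_WHILE().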

> >
> > One of the things I saw is that between each pair of lseek() calls, we
> > also have unnecessary poll() calls, and these were introduced by this
> > patch. If I modify bdrv_common_block_status_above() so that it doesn't
> > poll if data.done == true already, the time needed to iterate through my
> > test image is cut in half.
> >
> > Now, of course, I'm still only seeing a few seconds for a 100 GB image,
> > so there must be more that is wrong for the reporter, but it suggests to
> > me that BDRV_POLL_WHILE() shouldn't be polling unconditionally when only
> > one of its users actually needs this.
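
For reference, the change I tried is essentially just a guard around the poll
(a sketch; the surrounding code is from memory and may not match block/io.c
exactly):

    /* In bdrv_common_block_status_above(): if the coroutine completed
     * synchronously, data.done is already true and there is nothing to wait
     * for, so skip BDRV_POLL_WHILE() and its unconditional aio_poll()
     * passes entirely. */
    bdrv_coroutine_enter(bs, co);
    if (!data.done) {
        BDRV_POLL_WHILE(bs, !data.done);
    }
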
> 
> Sounds fine to me. Maybe we could add a boolean parameter to
> BDRV_POLL_WHILE?

Why not call aio_poll() explicitly in the BDRV_POLL_WHILE() condition in
the one place that needs it?
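
Untested, but I mean something in this direction in bdrv_drain_recurse(), so
that plain users of BDRV_POLL_WHILE() don't pay for the extra poll:

    /* Only the drain path needs pending BHs flushed even when no request is
     * in flight, so fold a non-blocking aio_poll() into its own condition.
     * Ordering and the macro's blocking behaviour would need care in a real
     * patch; this just illustrates the shape. */
    waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0 ||
                                 aio_poll(bdrv_get_aio_context(bs), false));
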

Kevin


