qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 o


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite
Date: Wed, 16 Mar 2016 18:18:19 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Feb 16, 2016 at 06:56:12PM +0100, Paolo Bonzini wrote:
> So the fine-grained locking series has grown from 2 parts to 3...
> 
> This first part stops where we remove RFifoLock.
> 
> In the previous submission, the bulk of the series took care of
> associating an AioContext to a thread, so that aio_poll could run event
> handlers only on that thread.  This was done to fix a race in bdrv_drain.
> There were two disadvantages:
> 
> 1) reporting progress from aio_poll to the main thread was Really Hard.
> Despite being relatively sure of the correctness---also thanks to formal
> models---it's not the kind of code that I'd lightly donate to posterity.
> 
> 2) the strict dependency between bdrv_drain and a single AioContext
> would have been bad for multiqueue.
> 
> Instead, this series does the same trick (do not run dataplane event handlers
> from the main thread) but reports progress directly at the BlockDriverState
> level.
> 
> To do so, the mechanism to track in-flight requests is changed to a
> simple counter.  This is more flexible than BdrvTrackedRequest, and lets
> the block/io.c code track precisely the initiation and completion of the
> requests.  In particular, the counter remains nonzero when a bottom half
> is required to "really" complete the request.  bdrv_drain doesn't rely
> anymore on a non-blocking aio_poll to execute those bottom halves.
> 
> It is also more efficient; while BdrvTrackedRequest has to remain
> for request serialisation, with this series we can in principle make
> BdrvTrackedRequest optional and enable it only when something (automatic
> RMW or copy-on-read) requires request serialisation.
> 
> In general this flows nicely, the only snag being patch 5.  The problem
> here is that mirror is calling bdrv_drain from an AIO callback, which
> used to be okay (because bdrv_drain more or less tried to guess when
> all AIO callbacks were done) but now causes a deadlock (because bdrv_drain
> really checks for AIO callbacks to be complete).  To add to the complication,
> the bdrv_drain happens far away from the AIO callback, in the coroutine that
> the AIO callback enters.  The situation here is admittedly underdefined,
> and Stefan pointed out that the same solution is found in many other places
> in the QEMU block layer.  Therefore I think it's acceptable.  I'm pointing
> it out explicitly though, because (together with patch 8) it is an example
> of the issues caused by the new, stricter definition of bdrv_drain.
> 
> Patches 1-7 organize bdrv_drain into separate phases, with a well-defined
> order between the BDS and it children.  The phases are: 1) disable
> throttling; 2) disable io_plug; 3) call bdrv_drain callbacks; 4) wait
> for I/O to finish; 5) re-enable io_plug and throttling.  Previously,
> throttling of throttling and io_plug were handled by bdrv_flush_io_queue,
> which was repeatedly called as part of the I/O wait loop.  This part of
> the series removes bdrv_flush_io_queue.
> 
> Patch 8-11 replace aio_poll with bdrv_drain so that the work done
> so far applies to all former callers of aio_poll.
> 
> Patches 12-13 introduce the logic that lets the main thread wait remotely
> for an iothread to drain the BDS.  Patches 14-16 then achieve the purpose
> of this series, removing the long-running AioContext critical section
> from iothread_run and removing RFifoLock.
> 
> The series passes check-block.sh with a few fixes applied for pre-existing
> bugs:
> 
>     vl: fix migration from prelaunch state
>     migration: fix incorrect memory_global_dirty_log_start outside BQL
>     qed: fix bdrv_qed_drain
> 
> Paolo
> 
> Paolo Bonzini (16):
>   block: make bdrv_start_throttled_reqs return void
>   block: move restarting of throttled reqs to block/throttle-groups.c
>   block: introduce bdrv_no_throttling_begin/end
>   block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
>   mirror: use bottom half to re-enter coroutine
>   block: add BDS field to count in-flight requests
>   block: change drain to look only at one child at a time
>   blockjob: introduce .drain callback for jobs
>   block: wait for all pending I/O when doing synchronous requests
>   nfs: replace aio_poll with bdrv_drain
>   sheepdog: disable dataplane
>   aio: introduce aio_context_in_iothread
>   block: only call aio_poll from iothread
>   iothread: release AioContext around aio_poll
>   qemu-thread: introduce QemuRecMutex
>   aio: convert from RFifoLock to QemuRecMutex
> 
>  async.c                         |  28 +---
>  block.c                         |   4 +-
>  block/backup.c                  |   7 +
>  block/io.c                      | 281 
> +++++++++++++++++++++++++---------------
>  block/linux-aio.c               |  13 +-
>  block/mirror.c                  |  37 +++++-
>  block/nfs.c                     |  50 ++++---
>  block/qed-table.c               |  16 +--
>  block/qed.c                     |   4 +-
>  block/raw-aio.h                 |   2 +-
>  block/raw-posix.c               |  16 +--
>  block/sheepdog.c                |  19 +++
>  block/throttle-groups.c         |  19 +++
>  blockjob.c                      |  16 ++-
>  docs/multiple-iothreads.txt     |  40 +++---
>  include/block/aio.h             |  13 +-
>  include/block/block.h           |   3 +-
>  include/block/block_int.h       |  22 +++-
>  include/block/blockjob.h        |   7 +
>  include/block/throttle-groups.h |   1 +
>  include/qemu/rfifolock.h        |  54 --------
>  include/qemu/thread-posix.h     |   6 +
>  include/qemu/thread-win32.h     |  10 ++
>  include/qemu/thread.h           |   3 +
>  iothread.c                      |  20 +--
>  stubs/Makefile.objs             |   1 +
>  stubs/iothread.c                |   8 ++
>  tests/.gitignore                |   1 -
>  tests/Makefile                  |   2 -
>  tests/qemu-iotests/060          |   8 +-
>  tests/qemu-iotests/060.out      |   4 +-
>  tests/test-aio.c                |  22 ++--
>  tests/test-rfifolock.c          |  91 -------------
>  util/Makefile.objs              |   1 -
>  util/qemu-thread-posix.c        |  13 ++
>  util/qemu-thread-win32.c        |  25 ++++
>  util/rfifolock.c                |  78 -----------
>  37 files changed, 471 insertions(+), 474 deletions(-)
>  delete mode 100644 include/qemu/rfifolock.h
>  create mode 100644 stubs/iothread.c
>  delete mode 100644 tests/test-rfifolock.c
>  delete mode 100644 util/rfifolock.c

Looks good overall.  I'm a little nervous about merging it for QEMU 2.6
but the block job, NBD, and data plane tests should give it a good
workout.

I have posted comments on a few patches.

Stefan

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]