qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH 07/16] block: change drain to look only at one c


From: Stefan Hajnoczi
Subject: Re: [Qemu-block] [PATCH 07/16] block: change drain to look only at one child at a time
Date: Wed, 16 Mar 2016 16:39:08 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Feb 16, 2016 at 06:56:19PM +0100, Paolo Bonzini wrote:
> bdrv_requests_pending is checking children to also wait until internal
> requests (such as metadata writes) have completed.  However, checking
> children is in general overkill because, apart from this special case,
> the parent's in_flight count will always be incremented by at least one
> for every request in the child.
> 
> Since internal requests are only generated by the parent in the child,
> instead visit the tree parent first, and then wait for internal I/O in
> the children to complete.

This assumption is true if the BDS graph is a tree.  When there a
multiple roots (i.e. it's a directed acyclic graph), especially with
roots at different levels, then I wonder if pre-order bdrv_drain
traversal leaves open the possibility that I/Os sneak into parts of the
BDS graph that we thought was already drained.

The advantage of the current approach is that it really wait until the
whole (sub)tree is idle.

Concrete example: image fleecing involves adding an ephemeral qcow2 file
which is exported via NBD.  The guest keeps writing to the disk image as
normal but the original data will be backed up to the ephemeral qcow2
file before being overwritten, so that the client sees a point-in-time
snapshot of the disk image.

The tree looks like this:

          [NBD export]
             /
            v
[guest] temporary qcow2
   \    /
    v  v
    disk

Block backend access is in square brackets.  Nodes without square
brackets are BDS nodes.

If the guest wants to drain the disk, it's possible for new I/O requests
to enter the disk BDS while we're recursing to disk's children because
the NBD export socket fd is in the same AIOContext.  The socket fd is
therefore handled during aio_poll() calls.

I'm not 100% sure that this is a problem, but I wonder if you've thought
about this?

> Signed-off-by: Paolo Bonzini <address@hidden>
> ---
>  block/io.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index a9a23a6..e0c9215 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -249,6 +249,23 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
>      }
>  }
>  
> +static bool bdrv_drain_io_recurse(BlockDriverState *bs)
> +{
> +    BdrvChild *child;
> +    bool waited = false;
> +
> +    while (atomic_read(&bs->in_flight) > 0) {
> +        aio_poll(bdrv_get_aio_context(bs), true);
> +        waited = true;
> +    }
> +
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        waited |= bdrv_drain_io_recurse(child->bs);
> +    }
> +
> +    return waited;
> +}
> +
>  /*
>   * Wait for pending requests to complete on a single BlockDriverState 
> subtree,
>   * and suspend block driver's internal I/O until next request arrives.
> @@ -265,10 +282,7 @@ void bdrv_drain(BlockDriverState *bs)
>      bdrv_no_throttling_begin(bs);
>      bdrv_io_unplugged_begin(bs);
>      bdrv_drain_recurse(bs);
> -    while (bdrv_requests_pending(bs)) {
> -        /* Keep iterating */
> -         aio_poll(bdrv_get_aio_context(bs), true);
> -    }
> +    bdrv_drain_io_recurse(bs);
>      bdrv_io_unplugged_end(bs);
>      bdrv_no_throttling_end(bs);
>  }
> @@ -319,10 +333,7 @@ void bdrv_drain_all(void)
>              aio_context_acquire(aio_context);
>              while ((bs = bdrv_next(bs))) {
>                  if (aio_context == bdrv_get_aio_context(bs)) {
> -                    if (bdrv_requests_pending(bs)) {
> -                        aio_poll(aio_context, true);
> -                        waited = true;
> -                    }
> +                    waited |= bdrv_drain_io_recurse(bs);
>                  }
>              }
>              aio_context_release(aio_context);
> -- 
> 2.5.0
> 
> 

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]