From: Jeff Cody
Subject: Re: [Qemu-devel] [PATCH v14 08/14] block: Support dropping active in bdrv_drop_intermediate
Date: Thu, 20 Feb 2014 00:57:31 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Feb 20, 2014 at 12:37:17PM +0800, Fam Zheng wrote:
> On Wed, 02/19 18:24, Jeff Cody wrote:
> > On Wed, Feb 19, 2014 at 04:22:30PM -0500, Jeff Cody wrote:
> > > On Wed, Feb 19, 2014 at 09:42:25PM +0800, Fam Zheng wrote:
> > > > Dropping intermediate could be useful both for commit and stream, and
> > > > BDS refcnt plus bdrv_swap could do most of the job nicely. It also needs
> > > > to work with op blockers.
> > > > 
> > > > Signed-off-by: Fam Zheng <address@hidden>
> > > > ---
> > > >  block.c        | 146 +++++++++++++++++++++++++--------------------------------
> > > >  block/commit.c |   1 +
> > > >  2 files changed, 66 insertions(+), 81 deletions(-)
> > > > 
> > > > diff --git a/block.c b/block.c
> > > > index a2bf24c..cf41f3d 100644
> > > > --- a/block.c
> > > > +++ b/block.c
> > > > @@ -2485,115 +2485,99 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
> > > >      return overlay;
> > > >  }
> > > >  
> > > > -typedef struct BlkIntermediateStates {
> > > > -    BlockDriverState *bs;
> > > > -    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
> > > > -} BlkIntermediateStates;
> > > > -
> > > > -
> > > >  /*
> > > > - * Drops images above 'base' up to and including 'top', and sets the 
> > > > image
> > > > - * above 'top' to have base as its backing file.
> > > > + * Drops images above 'base' up to and including 'top', and sets new 
> > > > 'base'
> > > > + * as backing_hd of top_overlay (the image orignally has 'top' as 
> > > > backing
> > > 
> > > What is 'top_overlay'?  Do you mean "top's overlay" by this?
> 
> Yes, as noted in the parenthesis.
>

I would just say "top's overlay".  What I found confusing is that a
reference like 'top_overlay' looks like an actual variable name.  So I
went searching for that variable, and wondered if it was just
vestigial from an earlier revision.  Maybe that is just me, though :)

> > > 
> > > > + * file). top_overlay may be NULL if 'top' is active, no such update 
> > > > needed.
> > > > + * Requires that the top_overlay to 'top' is opened r/w.
> > > >   *
> > > > - * Requires that the overlay to 'top' is opened r/w, so that the 
> > > > backing file
> > > > - * information in 'bs' can be properly updated.
> > > > + * 1) This will convert the following chain:
> > > >   *
> > > > - * E.g., this will convert the following chain:
> > > > - * bottom <- base <- intermediate <- top <- active
> > > > + *     ... <- base <- ... <- top <- overlay <-... <- active
> > > >   *
> > > >   * to
> > > >   *
> > > > - * bottom <- base <- active
> > > > + *     ... <- base <- overlay <- active
> > > >   *
> > > > - * It is allowed for bottom==base, in which case it converts:
> > > > + * 2) It is allowed for bottom==base, in which case it converts:
> > > >   *
> > > > - * base <- intermediate <- top <- active
> > > > + *     base <- ... <- top <- overlay <- ... <- active
> > > >   *
> > > >   * to
> > > >   *
> > > > - * base <- active
> > > > + *     base <- overlay <- active
> > > > + *
> > > > + * 2) It also allows active==top, in which case it converts:
> > > > + *
> > > > + *     ... <- base <- ... <- top (active)
> > > > + *
> > > > + * to
> > > > + *
> > > > + *     ... <- base == active == top
> > > > + *
> > > > + * i.e. only base and lower remains: *top == *base when return.
> > > > + *
> > > > + * 3) If base==NULL, it will drop all the BDS below overlay and set its
> > > > + * backing_hd to NULL. I.e.:
> > > > + *
> > > > + *     base(NULL) <- ... <- overlay <- ... <- active
> > > > + *
> > > > + * to
> > > >   *
> > > > - * Error conditions:
> > > > - *  if active == top, that is considered an error
> > > > + *     overlay <- ... <- active
> > > >   *
> > > >   */
> > > >  int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState 
> > > > *top,
> > > >                             BlockDriverState *base)
> > > 
> > > With the active case, we aren't necessarily really just dropping
> > > intermediate images anymore. Maybe we should rename this function now to
> > > 'bdrv_rebase_chain()'?
> > > 
> > > >  {
> > > > -    BlockDriverState *intermediate;
> > > > -    BlockDriverState *base_bs = NULL;
> > > > -    BlockDriverState *new_top_bs = NULL;
> > > > -    BlkIntermediateStates *intermediate_state, *next;
> > > > -    int ret = -EIO;
> > > > -
> > > > -    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) 
> > > > states_to_delete;
> > > > -    QSIMPLEQ_INIT(&states_to_delete);
> > > > -
> > > > -    if (!top->drv || !base->drv) {
> > > > -        goto exit;
> > > > -    }
> > > > -
> > > > -    new_top_bs = bdrv_find_overlay(active, top);
> > > > +    BlockDriverState *drop_start, *overlay;
> > > > +    int ret = -EINVAL;
> > > >  
> > > > -    if (new_top_bs == NULL) {
> > > > -        /* we could not find the image above 'top', this is an error */
> > > > +    if (!top->drv || (base && !base->drv)) {
> > > >          goto exit;
> > > >      }
> > > > -
> > > > -    /* special case of new_top_bs->backing_hd already pointing to base 
> > > > - nothing
> > > > -     * to do, no intermediate images */
> > > > -    if (new_top_bs->backing_hd == base) {
> > > > +    if (top == base) {
> > > >          ret = 0;
> > > > -        goto exit;
> > > > -    }
> > > > -
> > > > -    intermediate = top;
> > > > -
> > > > -    /* now we will go down through the list, and add each BDS we find
> > > > -     * into our deletion queue, until we hit the 'base'
> > > > -     */
> > > > -    while (intermediate) {
> > > > -        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
> > > > -        intermediate_state->bs = intermediate;
> > > > -        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, 
> > > > entry);
> > > > -
> > > > -        if (intermediate->backing_hd == base) {
> > > > -            base_bs = intermediate->backing_hd;
> > > > -            break;
> > > > +    } else if (top == active) {
> > > > +        assert(base);
> > > > +        drop_start = active->backing_hd;
> > > > +        bdrv_swap(active, base);
> > > > +        base->backing_hd = NULL;
> > > > +        bdrv_unref(drop_start);
> > > > +        ret = 0;
> > > > +    } else {
> > > > +        /* If there's an overlay, its backing_hd points to top's BDS 
> > > > now,
> > > > +         * the top image is dropped but this BDS structure is kept and 
> > > > swapped
> > > > +         * with base, this way we keep the pointers valid after 
> > > > dropping top */
> > > > +        overlay = bdrv_find_overlay(active, top);
> > > > +        if (!overlay) {
> > > > +            goto exit;
> > > > +        }
> > > > +        if (base) {
> > > > +            ret = bdrv_change_backing_file(overlay, base->filename,
> > > > +                                           base->drv->format_name);
> > > > +        } else {
> > > > +            ret = bdrv_change_backing_file(overlay, NULL, NULL);
> > > > +        }
> > > > +        if (ret) {
> > > > +            goto exit;
> > > > +        }
> > > > +        if (base) {
> > > > +            drop_start = top->backing_hd;
> > > > +            bdrv_swap(top, base);
> > > > +            /* Break the loop formed by bdrv_swap */
> > > > +            bdrv_set_backing_hd(base, NULL);
> > > 
> > > And in the non-active case here, everything between top->backing_hd
> > > and the original base is orphaned as well.  These should all be
> > > explicitly unreferenced.
> > 
> > Same here, bdrv_unref() will eventually go through the chain, starting
> > from top->backing_hd.  But this is a problem; won't we end up in a
> > loop then?
> 
> Although the content is swapped, the pointer is not:
> 
> (I presume your "[base]" and "[top]" are denoting content, not pointer)
>

Correct.  But the backing_hd pointers are part of the content that is
swapped.

> > 
> > Take this chain:
> > 
> > drop_start = [A]
> > 
> >     |||-- ([base]) <-- [B] <--- [A] <--- ([top]) <--- [active]
>                ^                              ^
>                |                              |
>               base                           top
> > 
> > 
> > bdrv_swap(top, base):
> > 
> >     -- [B] <-- [A] <-- ([top])    |||--- ([base]) <-- [active]
>                             ^                 ^
>                             |                 |
>                            base               top
> >     |                    ^
> >     |                    |
> >     ---------------------
> > 

Correct, those are the pointers.

> > Then we call bdrv_unref(drop_start) (or bdrv_set_backing_hd() does),
> > and we end up with:
> > 

dropping an anchor here: [1]

> > bdrv_unref(A)
> >     bdrv_unref(B)
> >         bdrv_unref(top)
> >             bdrv_unref(A) <--- assert
> >                 .....
> >             
> > 
> > So I think we want this line:
> > 
> > > > +            bdrv_set_backing_hd(base, NULL);
> 
> so, this breaks the chain,

Yes, you are right, we want base->backing_hd to be NULL.  But the
chain has not been broken yet.

The loop [1] still exists, because once we enter bdrv_set_backing_hd()
we begin to call bdrv_unref(A).  At that point base_ptr->backing_hd
still points to A, and B still points to base_ptr.
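
To make the loop concrete, here is a toy sketch in plain C.  It is not
QEMU code: Node, node_swap() and chain_has_loop() are hypothetical
stand-ins I made up for illustration, and node_swap() swaps the whole
struct, which only approximates what bdrv_swap() does.  It builds
base <- B <- A <- top, swaps the contents of top and base, and then
walks the backing links starting from drop_start (A):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a BlockDriverState; only the backing link is modelled. */
typedef struct Node {
    const char *name;      /* labels the *content* currently in this struct */
    struct Node *backing;  /* plays the role of backing_hd */
} Node;

/* Swap the contents of two structs; the pointers to them stay where they
 * are.  This mimics the spirit of bdrv_swap(), not its exact behaviour. */
static void node_swap(Node *a, Node *b)
{
    Node tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Return 1 if following backing links from 'start' leads back to 'start'
 * within max_steps hops, 0 otherwise. */
static int chain_has_loop(const Node *start, int max_steps)
{
    const Node *n = start->backing;
    for (int i = 0; n != NULL && i < max_steps; i++) {
        if (n == start) {
            return 1;
        }
        n = n->backing;
    }
    return 0;
}

/* Build base <- B <- A <- top, swap top and base, and report whether the
 * chain below drop_start now loops back on itself. */
static int swap_forms_loop(void)
{
    Node base = { "base", NULL };
    Node b    = { "B",    &base };
    Node a    = { "A",    &b };
    Node top  = { "top",  &a };

    Node *drop_start = top.backing;   /* A */
    node_swap(&top, &base);
    /* The struct at &base now holds [top], whose backing is A, while B
     * still points at &base: A -> B -> &base -> A. */
    return chain_has_loop(drop_start, 8);
}
```

In this model swap_forms_loop() reports a loop, which is the situation
at anchor [1]: any recursive unref that starts at A comes back around
to A.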

Here is the first part of bdrv_set_backing_hd():
    if (bs->backing_hd) {
        bdrv_op_unblock_all(bs->backing_hd, bs->backing_blocker);
        bdrv_unref(bs->backing_hd);


> 
> > 
> > To be:
> > 
> > > > +            bdrv_set_backing_hd(top, NULL);
> 
> This will lose track of original base's backing_hd.

Right, we don't want that, sorry...  I shouldn't have written that, my
brain failed me.  I mentally conflated top and [top].

> 
> So I think we are OK here.
>

I don't think we are; we still need to address the backing_hd loop,
and I think it needs to be done here, where we have the information.

> But I find that a fix is needed in bdrv_set_backing_hd to handle the rebase
> correctly.
>

What we really want, before we start to unref anything, is to set
drop_start = base_ptr->backing_hd, and then set
base_ptr->backing_hd = NULL.  Then the bdrv_unref(drop_start) will
perform as expected (see [2], below).
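
As a rough sketch of that ordering, again in toy C rather than real
QEMU code (Node, node_unref() and drop_after_swap_fixed() are
hypothetical names, and the refcounting is simplified to "unref the
backing node when the count hits zero"):

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted node; 'closed' records that the node was released. */
typedef struct Node {
    int refcnt;
    int closed;
    struct Node *backing;  /* plays the role of backing_hd */
} Node;

/* Drop one reference; on the last one, release the node and then drop
 * the reference it held on its backing node. */
static void node_unref(Node *n)
{
    if (n == NULL) {
        return;
    }
    assert(n->refcnt > 0);  /* this is what would fire if the loop remained */
    if (--n->refcnt == 0) {
        Node *next = n->backing;
        n->backing = NULL;
        n->closed = 1;
        node_unref(next);
    }
}

/* Start from the post-swap state, where base_ptr holds [top]:
 *   base_ptr -> A -> B -> base_ptr
 * Break the loop first, then unref from drop_start.  Returns how many
 * nodes were released. */
static int drop_after_swap_fixed(void)
{
    Node base_ptr = { 1, 0, NULL };       /* referenced by B */
    Node b        = { 1, 0, &base_ptr };  /* referenced by A */
    Node a        = { 1, 0, &b };         /* referenced by base_ptr */
    base_ptr.backing = &a;

    Node *drop_start = base_ptr.backing;  /* take over the reference on A */
    base_ptr.backing = NULL;              /* break the loop before any unref */
    node_unref(drop_start);               /* releases A, B, then base_ptr */

    return a.closed + b.closed + base_ptr.closed;
}
```

The point of the sketch is only the order of operations: the
reference that base_ptr held on A is handed to drop_start, the link is
cleared, and only then does the recursive unref run, so it terminates
at base_ptr instead of coming back around to A.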

And, at least in the usage here, we probably don't want
bdrv_set_backing_hd() to unref anything for us, but I'm sure there is
some way to make that work if it is cleaner that way.

That will get you what I was originally trying to get at in my
previous email, when I unfortunately conflated top contents with top
pointer:

> > 
> > 
> > Right?  Or, just set top->backing_hd = NULL, so we get:
                         ^^^ 
please read this as 'base' (as in base_ptr)
    

anchor [2]:

> > 
> >     -- [B] <-- [A]   |||-- ([top])    |||--- ([base]) <-- [active]
> >     |                         ^
> >     |                         |
> >     ---------------------------

base_ptr->backing_hd is set to NULL first ^^

then the bdrv_unref(drop_start):

> > 
> > bdrv_unref(A)
> >     bdrv_unref(B)
> >         bdrv_unref(top)
> > 
> > 
> > Which leaves:
> > 
> >     |||--- ([base]) <-- [active]
> > 
> > 
> > So this part above still needs addressing, I think.
> > 
> > > 
> > > Also, side effect:
> > > Caller needs to beware now that base and top are now swapped [1].
> > > 
> > > > +        } else {
> > > > +            bdrv_set_backing_hd(overlay, NULL);
> > > > +            drop_start = top;
> > > 
> > > Again, everything between top and the original base is orphaned, but
> > > should be cleaned up.
> > >
> > > Caller does not have to worry about base and top being swapped [1].
> > >
> > 
> > This should be fine, I think.
> > 
> > 
> > I think everything else I mentioned below this point is still
> > relevant, however.
> > 
> > > 
> > > >          }
> > > > -        intermediate = intermediate->backing_hd;
> > > > -    }
> > > > -    if (base_bs == NULL) {
> > > > -        /* something went wrong, we did not end at the base. safely
> > > > -         * unravel everything, and exit with error */
> > > > -        goto exit;
> > > > -    }
> > > > -
> > > > -    /* success - we can delete the intermediate states, and link 
> > > > top->base */
> > > > -    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
> > > > -                                   base_bs->drv ? 
> > > > base_bs->drv->format_name : "");
> > > > -    if (ret) {
> > > > -        goto exit;
> > > > -    }
> > > > -    new_top_bs->backing_hd = base_bs;
> > > > -
> > > > -    bdrv_refresh_limits(new_top_bs);
> > > >  
> > > > -    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, 
> > > > entry, next) {
> > > > -        /* so that bdrv_close() does not recursively close the chain */
> > > > -        intermediate_state->bs->backing_hd = NULL;
> > > > -        bdrv_unref(intermediate_state->bs);
> > > > +        bdrv_unref(drop_start);
> > > 
> > > We will get an assertion here.  In the non-active case, the backing_hd
> > > is explicitly set to NULL via bdrv_set_backing_hd().  That function
> > > will call bdrv_unref() on the same BDS that drop_start was assigned,
> > > so we have a double call to bdrv_unref().
> > > 
> > > >      }
> > > > -    ret = 0;
> > > > -
> > > >  exit:
> > > > -    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, 
> > > > entry, next) {
> > > > -        g_free(intermediate_state);
> > > > -    }
> > > >      return ret;
> > > >  }
> > > >  
> > > > -
> > > >  static int bdrv_check_byte_request(BlockDriverState *bs, int64_t 
> > > > offset,
> > > >                                     size_t size)
> > > >  {
> > > > diff --git a/block/commit.c b/block/commit.c
> > > > index acec4ac..b10eb79 100644
> > > > --- a/block/commit.c
> > > > +++ b/block/commit.c
> > > > @@ -142,6 +142,7 @@ wait:
> > > >      if (!block_job_is_cancelled(&s->common) && sector_num == end) {
> > > >          /* success */
> > > >          ret = bdrv_drop_intermediate(active, top, base);
> > > > +        base = top;
> > > 
> > > This is where it is highlighted to me how odd it is to use the side
> > > effects of bdrv_swap() in bdrv_drop_intermediate() for the non-active
> > > layer case.
> > > 
> > > The function bdrv_drop_intermediate() is now actually pretty complex
> > > and tricky to use, with side effects that the caller needs to beware
> > > of, that change depending on the nature of the arguments passed.
> > > 
> > > [1] Side effects, depending on active, top, and base:
> > > 
> > >       active = top |  base = NULL  |   side effect
> > >     -----------------------------------------------
> > > (A) false          |     false     |  top and base are swapped
> > > (B) false          |     true      |  none
> > > (C) true           |     false     |  top and base are swapped
> > > (D) true           |     true      |  assert()
> > > 
> > > 
> > > Case (C) is reasonable, because active and base need to be swapped,
> > > and top == active.  It is expected almost by definition.
> > > 
> > > Case (A) is a bit odd, especially in light of case (B).
> 
> Makes sense, I will remove this side effect.
>
> > > 
> > > 
> > > >      }
> > > >  
> > > >  exit_free_buf:
> > > 
> > > 
> > > Further down, out of the context of this patch, we have:
> > > 
> > > 
> > >  exit_restore_reopen:
> > >      /* restore base open flags here if appropriate (e.g., change the 
> > > base back
> > >       * to r/o). These reopens do not need to be atomic, since we won't 
> > > abort
> > >       * even on failure here */
> > >      if (s->base_flags != bdrv_get_flags(base)) {
> > >          bdrv_reopen(base, s->base_flags, NULL);
> > >      }
> > > 
> > > OK, 'base' is the one we want to operate on now, that was set to
> > > 'top', which has the contents of the old 'base'.
> > > 
> 
> If I remove the swap, we don't need to set base to top here.
>

Yes, that will keep the usage more consistent, thanks.

> > > 
> > >      overlay_bs = bdrv_find_overlay(active, top);
> > > 
> > > Will we find the right overlay here?  I think now overlay_bs will
> > > always be NULL, so we won't restore the r/o flags (if set) for the
> > > overlay of the original 'top'.
> > > 
> > >      if (overlay_bs && s->orig_overlay_flags != 
> > > bdrv_get_flags(overlay_bs)) {
> > >          bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
> > >      }
> 
> Yes, need a fix here.
> 
> Thanks,
> Fam
