From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
Date: Wed, 12 Nov 2014 10:55:34 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 2014-11-12 at 05:15, Eric Blake wrote:
On 11/10/2014 06:45 AM, Max Reitz wrote:
Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz <address@hidden>
---
  block/qcow2-refcount.c | 424 +++++++++++++++++++++++++++++++++++++++++++++++++
  block/qcow2.h          |   4 +
  2 files changed, 428 insertions(+)
This is fairly big; you may want to get a second review from a
maintainer rather than blindly trusting me.

I didn't really see the point in splitting it up. Introducing the static helper functions first would not be very helpful either, so I thought about what I'd do as a reviewer, and that was to apply the patch and just read through all the code. Splitting it into multiple patches would not have helped there (and I don't see how I could split this patch into logical changes, where at the start we have some inefficient but simple implementation and it gets better over time).

I'll need a review from a maintainer anyway, but I won't get one without being able to show another review first...

My review was not linear, but I left the email in linear order.  Feel
free to ask for clarification if my presentation is too hard to follow.

+
+/**
+ * This "operation" for walk_over_reftable() allocates the refblock on disk (if
+ * it is not empty) and inserts its offset into the new reftable. The size of
+ * this new reftable is increased as required.
+ */
+static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
+                          uint64_t reftable_index, uint64_t *reftable_size,
+                          void *refblock, bool refblock_empty, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+
+    if (!refblock_empty && reftable_index >= *reftable_size) {
+        uint64_t *new_reftable;
+        uint64_t new_reftable_size;
+
+        new_reftable_size = ROUND_UP(reftable_index + 1,
+                                     s->cluster_size / sizeof(uint64_t));
+        if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
+            error_setg(errp,
+                       "This operation would make the refcount table grow "
+                       "beyond the maximum size supported by QEMU, aborting");
+            return -ENOTSUP;
+        }
+
+        new_reftable = g_try_realloc(*reftable, new_reftable_size *
+                                                sizeof(uint64_t));
Safe from overflow based on checks a few lines earlier.  Good.

+        if (!new_reftable) {
+            error_setg(errp, "Failed to increase reftable buffer size");
+            return -ENOMEM;
+        }
+
+        memset(new_reftable + *reftable_size, 0,
+               (new_reftable_size - *reftable_size) * sizeof(uint64_t));
+
+        *reftable      = new_reftable;
+        *reftable_size = new_reftable_size;
Just to check my math here:

Suppose we have an image with 512-byte clusters, and are changing from
16-bit refcount (order 4) to 64-bit refcount.  Also, suppose the
existing image has exactly filled a one-cluster refcount table (that is,
there are 64 refcount blocks, each with all 256 of its refcount entries
in use, for a total image size of exactly 8M).  The
original image occupies a header (1 cluster), L1 and L2 tables, and
data; but 65 of the 16k clusters tied up in the image are dedicated to
the refcount structures.

Meanwhile, the new refcount table will have to point to 256 refcount
blocks, each holding only 64 entries, which in turn implies that the
refcount table now has to be at least 4 clusters long.  But as this
requires at least 260 clusters to represent, then even if we were able
to reuse the 65 clusters of the original table, we'd still be allocating
at least 195 clusters; in reality, your code doesn't free any old
clusters until after allocating the new, because it is easier to keep
the old table live until the new table is populated.  The process of
allocating the new clusters means we actually end up with a new refcount
table 5 clusters long, where not all 320 refblocks will be populated.
But as long as we are keeping the old table up-to-date for the refblock
allocations, it ALSO means that we caused a rollover of the old table
from 1 cluster into 2, which itself consumes several clusters (the
larger table must be contiguous, and we must also set up a refblock to
describe the larger table, so we've added at least three clusters
associated to the original table during the course of preparing the new
table).
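
To make that arithmetic concrete, here is a quick standalone sanity check
(plain C, nothing qcow2-specific; not part of the patch) that reproduces
the numbers above:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t cluster_size = 512;
        /* one reftable cluster holds 64 eight-byte entries */
        const uint64_t reftable_entries = cluster_size / sizeof(uint64_t);
        /* 16-bit refcounts: 256 entries per refblock; 64-bit: 64 */
        const uint64_t old_refblock_entries = cluster_size * 8 / 16;
        const uint64_t new_refblock_entries = cluster_size * 8 / 64;

        uint64_t clusters = reftable_entries * old_refblock_entries;
        printf("clusters described: %" PRIu64 " (%" PRIu64 " MB)\n",
               clusters, clusters * cluster_size >> 20);

        uint64_t new_refblocks = (clusters + new_refblock_entries - 1)
                                 / new_refblock_entries;
        uint64_t new_reftable_clusters =
            (new_refblocks * sizeof(uint64_t) + cluster_size - 1)
            / cluster_size;
        printf("new refblocks: %" PRIu64 ", new reftable clusters: %"
               PRIu64 "\n", new_refblocks, new_reftable_clusters);
        /* prints 16384 (8 MB), then 256 and 4: at least 260 new clusters */
        return 0;
    }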

Hmm - that means I found a bug in your implementation.  See [3] below.

+    }
+
+    if (refblock_empty) {
+        if (reftable_index < *reftable_size) {
+            (*reftable)[reftable_index] = 0;
Necessary since you used g_try_realloc which leaves the new reftable
uninitialized.  Reasonable (rather than a memset) since the caller will
be visiting every single refblock in the table anyways.

+        }
+    } else {
+        offset = qcow2_alloc_clusters(bs, s->cluster_size);
As mentioned above, this action will potentially change
s->refcount_table_size of the original table, which in turn makes the
caller execute its loops more often to cover the increased allocation.
Does qcow2_alloc_clusters() guarantee that the just-allocated cluster is
zero-initialized (and/or should we add a flag to the function to allow
the caller to choose whether to force zero allocation instead of leaving
uninitialized)?  See [4] below for why I ask.

+        if (offset < 0) {
+            error_setg_errno(errp, -offset, "Failed to allocate refblock");
+            return offset;
+        }
+        (*reftable)[reftable_index++] = offset;
+    }
+
+    return 0;
+}
+
+/**
+ * This "operation" for walk_over_reftable() writes the refblock to disk at the
+ * offset specified by the new reftable's entry. It does not modify the new
+ * reftable or change any refcounts.
+ */
+static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
+                          uint64_t reftable_index, uint64_t *reftable_size,
+                          void *refblock, bool refblock_empty, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+    int ret;
+
+    if (refblock_empty) {
+        if (reftable_index < *reftable_size) {
+            assert((*reftable)[reftable_index] == 0);
+        }
+    } else {
+        /* The first pass with alloc_refblock() made the reftable large enough
+         */
+        assert(reftable_index < *reftable_size);
Okay, I see why you couldn't hoist this assert outside of the if - the
caller may call this with refblock_empty for any refblocks at the tail
of the final partial reftable cluster.

+        offset = (*reftable)[reftable_index];
+        assert(offset != 0);
+
+        ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Overlap check failed");
+            return ret;
+        }
+
+        ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to write refblock");
+            return ret;
If we fail here, do we leak all clusters written so far?  At least the
image is still consistent.  After reading further, I think I answered
myself at point [5].

+        }
+    }
+
+    return 0;
+}
+
+/**
+ * This function walks over the existing reftable and every referenced refblock;
+ * if @new_set_refcount is non-NULL, it is called for every refcount entry to
+ * create an equal new entry in the passed @new_refblock. Once that
+ * @new_refblock is completely filled, @operation will be called.
+ *
+ * @operation is expected to combine the @new_refblock and its entry in the new
+ * reftable (which is described by the parameters starting with "reftable").
+ * @refblock_empty is set if all entries in the refblock are zero.
+ *
+ * @status_cb and @cb_opaque are used for the amend operation's status callback.
+ * @index is the index of the walk_over_reftable() calls and @total is the total
+ * number of walk_over_reftable() calls per amend operation. Both are used for
+ * calculating the parameters for the status callback.
Nice writeup; I was referring to it frequently during review.

+ */
+static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,
+                              uint64_t *new_reftable_index,
+                              uint64_t *new_reftable_size,
+                              void *new_refblock, int new_refblock_size,
+                              int new_refcount_bits,
+                              int (*operation)(BlockDriverState *bs,
+                                               uint64_t **reftable,
+                                               uint64_t reftable_index,
+                                               uint64_t *reftable_size,
+                                               void *refblock,
+                                               bool refblock_empty,
+                                               Error **errp),
Worth a typedef?  Maybe not; I managed.

+                              Qcow2SetRefcountFunc *new_set_refcount,
+                              BlockDriverAmendStatusCB *status_cb,
+                              void *cb_opaque, int index, int total,
+                              Error **errp)
After several reads of the patch, I see that this walk function gets
called twice - first with a NULL new_set_refcount (merely to figure out
how big the new reftable should be, as well as allocating all necessary
non-zero refcount blocks, but not committing the top-level reftable to
any particular file location); the second walk then commits the new
refcounts to disk (updating each non-zero entry in all the new refcount
blocks to match their original counterparts, but no allocation
required).  Pretty slick to ensure the new table is feasible before
actually swapping over to it, while still allowing a fairly clean
rollback on early failure.

+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t reftable_index;
+    bool new_refblock_empty = true;
+    int refblock_index;
+    int new_refblock_index = 0;
+    int ret;
+
+    for (reftable_index = 0; reftable_index < s->refcount_table_size;
+         reftable_index++)
Outer loop - for each cluster of the top-level reference table, visit
each child table and update the status callback.  On the first walk,
s->refcount_table_size might be increasing during calls to operation().

+    {
+        uint64_t refblock_offset = s->refcount_table[reftable_index]
+                                 & REFT_OFFSET_MASK;
+
+        status_cb(bs, (uint64_t)index * s->refcount_table_size + reftable_index,
+                  (uint64_t)total * s->refcount_table_size, cb_opaque);
+
This never quite reaches 100%, and the caller also never reaches 100%.
I think you want one more call to status_cb() at the end of the loop (at
either site [1] or [2]) that passes an equal index and total to make it
obvious that this (portion of the) long-running conversion is complete.
Since s->refcount_table_size may grow during the loop, the callback
does not necessarily have a constant total size; good thing we already
documented that progress bars need not have a constant total.

+        if (refblock_offset) {
+            void *refblock;
+
+            if (offset_into_cluster(s, refblock_offset)) {
+                qcow2_signal_corruption(bs, true, -1, -1, "Refblock offset %#"
+                                        PRIx64 " unaligned (reftable index: %#"
+                                        PRIx64 ")", refblock_offset,
+                                        reftable_index);
+                error_setg(errp,
+                           "Image is corrupt (unaligned refblock offset)");
+                return -EIO;
+            }
+
+            ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offset,
+                                  &refblock);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to retrieve refblock");
+                return ret;
+            }
+
+            for (refblock_index = 0; refblock_index < s->refcount_block_size;
+                 refblock_index++)
+            {
If a child table (refcount block) exists, visit each refcount entry
within the table (at least one refcount in that visit should be
non-empty, otherwise we could garbage-collect the refblock and put a 0
entry in the outer loop).

+                uint64_t refcount;
+
+                if (new_refblock_index >= new_refblock_size) {
+                    /* new_refblock is now complete */
+                    ret = operation(bs, new_reftable, *new_reftable_index,
+                                    new_reftable_size, new_refblock,
+                                    new_refblock_empty, errp);
The new refcount table will either be filled faster than the original
(when going from small to large refcount - calling operation() multiple
times per inner loop) or will be filled slower than the original (when
going from large to small; operation() will only be called after several
outer loops).
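
(Concretely, with the 512-byte-cluster example from above: widening from
order 4 to order 6 gives s->refcount_block_size == 256 but
new_refblock_size == 64, so operation() fires four times per old
refblock; shrinking from order 6 to order 4 reverses the ratio, one
operation() call per four old refblocks.)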

+                    if (ret < 0) {
+                        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+                        return ret;
+                    }
+
+                    (*new_reftable_index)++;
+                    new_refblock_index = 0;
+                    new_refblock_empty = true;
+                }
+
+                refcount = s->get_refcount(refblock, refblock_index);
+                if (new_refcount_bits < 64 && refcount >> new_refcount_bits) {
Technically, this get_refcount() call is dead code on the second walk,
since the first walk already validated things, so you could push all of
this code...

+                    uint64_t offset;
+
+                    qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+                    offset = ((reftable_index << s->refcount_block_bits)
+                              + refblock_index) << s->cluster_bits;
+
+                    error_setg(errp, "Cannot decrease refcount entry width to "
+                               "%i bits: Cluster at offset %#" PRIx64 " has a "
+                               "refcount of %" PRIu64, new_refcount_bits,
+                               offset, refcount);
+                    return -EINVAL;
+                }
+
+                if (new_set_refcount) {
+                    new_set_refcount(new_refblock, new_refblock_index++, refcount);
+                } else {
...here, in the branch only run on the first walk.

Well, yes, but I wanted to keep this function as agnostic as possible to what the caller wants to do with it. I'd rather decide based on whether index == 0, because that's a better way of discerning the first walk.

+                    new_refblock_index++;
+                }
+                new_refblock_empty = new_refblock_empty && refcount == 0;
Worth condensing to 'new_refblock_empty &= !refcount'?  Maybe not.

I personally would find that harder to read.

+            }
+
+            ret = qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to put refblock back into "
+                                 "the cache");
+                return ret;
+            }
+        } else {
+            /* No refblock means every refcount is 0 */
+            for (refblock_index = 0; refblock_index < s->refcount_block_size;
+                 refblock_index++)
Again, visiting each (implied) entry for the given refcount block of the
outer loop.  When enlarging the width, each of the new blocks will also
be all zero; but when shrinking the width, even though all entries on
this pass are zero, we may be combining this pass with another outer
loop with non-zero data for a non-zero block in the resulting new table.

+            {
+                if (new_refblock_index >= new_refblock_size) {
+                    /* new_refblock is now complete */
+                    ret = operation(bs, new_reftable, *new_reftable_index,
+                                    new_reftable_size, new_refblock,
+                                    new_refblock_empty, errp);
+                    if (ret < 0) {
+                        return ret;
+                    }
+
+                    (*new_reftable_index)++;
+                    new_refblock_index = 0;
+                    new_refblock_empty = true;
+                }
+
+                if (new_set_refcount) {
+                    new_set_refcount(new_refblock, new_refblock_index++, 0);
Would it be worth guaranteeing that every new refblock is 0-initialized
when allocated, so that you can skip setting a refcount to 0?  This
question depends on the answer about block allocation asked at [4] above.

This function sets a value in the buffer new_refblock, not in the cluster on disk. Therefore, in order to be able to omit this call, we'd have to call a memset() with 0 on new_refblock after each call to operation(). I don't think it's worth it. This is more explicit and won't cost much performance.

+                } else {
+                    new_refblock_index++;
+                }
+            }
+        }
+    }
+
+    if (new_refblock_index > 0) {
+        /* Complete the potentially existing partially filled final refblock */
+        if (new_set_refcount) {
+            for (; new_refblock_index < new_refblock_size;
+                 new_refblock_index++)
+            {
+                new_set_refcount(new_refblock, new_refblock_index, 0);
Again, if you 0-initialize refblocks when allocated, you could skip this
(another instance of [4] above).

+            }
+        }
+
+        ret = operation(bs, new_reftable, *new_reftable_index,
+                        new_reftable_size, new_refblock, new_refblock_empty,
+                        errp);
+        if (ret < 0) {
+            return ret;
+        }
+
+        (*new_reftable_index)++;
+    }
site [1] mentioned above, as a good place to make a final status
callback at 100%.  But if you do it here, it means that we call the
status callback twice with the same values (the 100% value of the first
loop is the 0% value of the second loop) - not the end of the world, but
may impact any testsuite that tracks progress reports.

Well, why not.
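
Something along these lines at the end of walk_over_reftable() should do
(just a sketch; as you note, the first walk's 100% value will then equal
the second walk's 0% value):

    /* walk complete: report progress == total for this portion */
    status_cb(bs, (uint64_t)(index + 1) * s->refcount_table_size,
              (uint64_t)total * s->refcount_table_size, cb_opaque);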

+
+    return 0;
+}
+
+int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
+                                BlockDriverAmendStatusCB *status_cb,
+                                void *cb_opaque, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    Qcow2GetRefcountFunc *new_get_refcount;
+    Qcow2SetRefcountFunc *new_set_refcount;
+    void *new_refblock = qemu_blockalign(bs->file, s->cluster_size);
+    uint64_t *new_reftable = NULL, new_reftable_size = 0;
+    uint64_t *old_reftable, old_reftable_size, old_reftable_offset;
+    uint64_t new_reftable_index = 0;
+    uint64_t i;
+    int64_t new_reftable_offset;
+    int new_refblock_size, new_refcount_bits = 1 << refcount_order;
+    int old_refcount_order;
+    int ret;
+
+    assert(s->qcow_version >= 3);
+    assert(refcount_order >= 0 && refcount_order <= 6);
+
+    /* see qcow2_open() */
+    new_refblock_size = 1 << (s->cluster_bits - (refcount_order - 3));
Safe (cluster_bits is always at least 9, and at most 21 in our current
implementation, so we are shifting anywhere from 6 to 24 positions).
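
(Checking the bounds: the smallest shift is 9 - (6 - 3) = 6, i.e.
512-byte clusters with 64-bit refcounts giving 64 entries per refblock;
the largest is 21 - (0 - 3) = 24, i.e. 2 MB clusters with 1-bit
refcounts giving 16M entries per refblock.)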

+
+    get_refcount_functions(refcount_order,
+                           &new_get_refcount, &new_set_refcount);
+
+
+    /* First, allocate the structures so they are present in the refcount
+     * structures */
+    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
+                             &new_reftable_size, NULL, new_refblock_size,
+                             new_refcount_bits, &alloc_refblock, NULL,
+                             status_cb, cb_opaque, 0, 2, errp);
+    if (ret < 0) {
+        goto done;
+    }
+
+    /* The new_reftable_size is now valid and will not be changed anymore,
+     * so we can now allocate the reftable */
+    new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
+                                                   sizeof(uint64_t));
And here is your bug, that I hinted at with the mention of [3] above.
This allocation can potentially cause an overflow of the existing
reftable to occupy one more cluster.

An additional bug, though: the new reftable may end up referenced by an existing refblock which was completely empty (or, more precisely, referenced by a part of an existing refblock which was to be turned into a new refblock that was then completely empty, and thus omitted from allocation).

Remember my thought experiment
above, how an 8 megabyte image rolls from 1 to 2 clusters during the
course of allocating refblocks for the new table?  Suppose the original
image wasn't completely full, but things are sized just right: all of the
refblock allocations done during the first walk still fit into existing
free clusters, and it is only this final allocation of the new reftable
that causes the rollover, at which point we've failed to account for the
new refblock size.  That is, I think I could craft an
image that would trigger either an assertion failure or an out-of-bounds
array access during the second walk.

You're completely right; this will be a pain to fix, though... The simplest way would probably be to check whether new_reftable_size had to be increased due to this operation, and if it did, rerun walk_over_reftable() with the alloc_refblock() function only allocating refblocks if the new reftable does not already point to a refblock for a given index. This would be repeated until new_reftable_size stays constant. And the really simplest incarnation of this would be to have a flag indicating whether any allocations were done and to repeat until everything is fine.

Another way would be to somehow integrate the allocation of the new reftable into walk_over_reftable() and then to only mind the additional reftable entries.

But probably the first way is the correct one because, due to reallocation of the old reftable, some intermediate refblocks which were empty before are now filled.

I'll have to craft some test images myself, not least to be able to include them in the iotest.
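
As a rough sketch of that "really simplest incarnation" (hypothetical:
the new_allocation flag would have to be plumbed through
walk_over_reftable() down to alloc_refblock(), which would set it
whenever it actually allocates and skip indices for which the new
reftable already holds an offset; new_reftable_offset starts at 0):

    bool new_allocation;
    uint64_t allocated_reftable_size = 0;

    do {
        new_allocation = false;
        new_reftable_index = 0;

        /* only indices not yet covered by the new reftable get a fresh
         * refblock, so this converges once the old refcount structures
         * stop growing */
        ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
                                 &new_reftable_size, NULL,
                                 new_refblock_size, new_refcount_bits,
                                 &alloc_refblock, NULL, status_cb,
                                 cb_opaque, 0, 2, errp);
        if (ret < 0) {
            goto done;
        }

        if (new_allocation) {
            /* allocating the reftable itself may grow the old refcount
             * structures again, so throw the previous attempt away and
             * retry */
            if (new_reftable_offset) {
                qcow2_free_clusters(bs, new_reftable_offset,
                                    allocated_reftable_size *
                                    sizeof(uint64_t),
                                    QCOW2_DISCARD_NEVER);
            }

            new_reftable_offset =
                qcow2_alloc_clusters(bs, new_reftable_size *
                                         sizeof(uint64_t));
            if (new_reftable_offset < 0) {
                error_setg_errno(errp, -new_reftable_offset,
                                 "Failed to allocate the new reftable");
                ret = new_reftable_offset;
                goto done;
            }
            allocated_reftable_size = new_reftable_size;
        }
    } while (new_allocation);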

+    if (new_reftable_offset < 0) {
+        error_setg_errno(errp, -new_reftable_offset,
+                         "Failed to allocate the new reftable");
+        ret = new_reftable_offset;
+        goto done;
If we fail here, do we leak allocations of the refblocks?  I guess not;
based on another forward reference to point [5].

+    }
+
+    new_reftable_index = 0;
+
+    /* Second, write the new refblocks */
+    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
+                             &new_reftable_size, new_refblock,
+                             new_refblock_size, new_refcount_bits,
+                             &flush_refblock, new_set_refcount,
+                             status_cb, cb_opaque, 1, 2, errp);
+    if (ret < 0) {
+        goto done;
+    }
If we fail here, it looks like we DO leak the clusters allocated for the
new reftable (again, point [5]).

+
+
+    /* Write the new reftable */
+    ret = qcow2_pre_write_overlap_check(bs, 0, new_reftable_offset,
+                                        new_reftable_size * sizeof(uint64_t));
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Overlap check failed");
+        goto done;
+    }
+
+    for (i = 0; i < new_reftable_size; i++) {
+        cpu_to_be64s(&new_reftable[i]);
+    }
+
+    ret = bdrv_pwrite(bs->file, new_reftable_offset, new_reftable,
+                      new_reftable_size * sizeof(uint64_t));
+
+    for (i = 0; i < new_reftable_size; i++) {
+        be64_to_cpus(&new_reftable[i]);
+    }
+
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to write the new reftable");
+        goto done;
+    }
Looks like you correctly maintain the in-memory copy in preferred cpu
byte order, while writing to disk in big-endian order.

+
+
+    /* Empty the refcount cache */
+    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to flush the refblock cache");
+        goto done;
+    }
+
+    /* Update the image header to point to the new reftable; this only updates
+     * the fields which are relevant to qcow2_update_header(); other fields
+     * such as s->refcount_table or s->refcount_bits stay stale for now
+     * (because we have to restore everything if qcow2_update_header() fails) */
+    old_refcount_order  = s->refcount_order;
+    old_reftable_size   = s->refcount_table_size;
+    old_reftable_offset = s->refcount_table_offset;
+
+    s->refcount_order        = refcount_order;
+    s->refcount_table_size   = new_reftable_size;
+    s->refcount_table_offset = new_reftable_offset;
+
+    ret = qcow2_update_header(bs);
+    if (ret < 0) {
+        s->refcount_order        = old_refcount_order;
+        s->refcount_table_size   = old_reftable_size;
+        s->refcount_table_offset = old_reftable_offset;
+        error_setg_errno(errp, -ret, "Failed to update the qcow2 header");
+        goto done;
+    }
Failures up to here still have issues leaking the new reftable
allocation (point [5]).

+
+    /* Now update the rest of the in-memory information */
+    old_reftable = s->refcount_table;
+    s->refcount_table = new_reftable;
+
+    /* For cleaning up all old refblocks */
+    new_reftable      = old_reftable;
+    new_reftable_size = old_reftable_size;
+
+    s->refcount_bits = 1 << refcount_order;
+    if (refcount_order < 6) {
+        s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
+    } else {
+        s->refcount_max = INT64_MAX;
+    }
Is it worth factoring this computation into a common helper, since it
appeared in an earlier patch as well?

Well, it does appear in qcow2_open(), as do the next two lines. The only reason to factor them out would be so that neither place is forgotten if one of them is changed; but the only change I could imagine there would be to replace INT64_MAX with UINT64_MAX at some point in the future, and I guess that point is still far away (because there's no reasonable way someone would be needing the 64th bit, as you agreed), so it should be fine that way.
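
If we ever do factor them out, something along these lines would cover
both call sites (hypothetical name, not in this series):

    /* Derive refcount_bits and refcount_max from the refcount order;
     * mirrors the corresponding code in qcow2_open() */
    static void update_refcount_limits(BDRVQcowState *s, int refcount_order)
    {
        s->refcount_bits = 1 << refcount_order;
        if (refcount_order < 6) {
            s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
        } else {
            /* the 64th bit remains deliberately unused */
            s->refcount_max = INT64_MAX;
        }
    }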

+
+    s->refcount_block_bits = s->cluster_bits - (refcount_order - 3);
+    s->refcount_block_size = 1 << s->refcount_block_bits;
+
+    s->get_refcount = new_get_refcount;
+    s->set_refcount = new_set_refcount;
+
+    /* And free the old reftable (the old refblocks are freed below the "done"
+     * label) */
+    qcow2_free_clusters(bs, old_reftable_offset,
+                        old_reftable_size * sizeof(uint64_t),
+                        QCOW2_DISCARD_NEVER);
site [2] mentioned above, as a possible point where you might want to
ensure the callback is called with equal progress and total values to
ensure the caller knows the job is done.  Except this site doesn't have
quite as much information as site [1] about what total size all the
other status callbacks were using.

+
+done:
+    if (new_reftable) {
+        /* On success, new_reftable actually points to the old reftable (and
+         * new_reftable_size is the old reftable's size); but that is just
+         * fine */
+        for (i = 0; i < new_reftable_size; i++) {
+            uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
+            if (offset) {
+                qcow2_free_clusters(bs, offset, s->cluster_size,
+                                    QCOW2_DISCARD_NEVER);
+            }
+        }
+        g_free(new_reftable);
So here is point [5] - if we failed early, this tries to clean up all
allocated refblocks associated with the new table.  It does NOT clean up
any refblocks allocated due to resizing the old table to be slightly
larger, but that should be fine (not a leak, so much as an image that is
now a couple clusters larger than the minimum required size).  However,
while you clean up the clusters associated with refblocks (layer 2), the
cleanup of old clusters associated with the reftable (layer 1) happened
before the done: label on success, but that means that on failure, you
are NOT cleaning up the clusters associated with the new reftable.

Oops, right, will fix.
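
A minimal sketch of that fix, assuming new_reftable_offset is initialized
to 0 at its declaration: after the refblock cleanup loop under "done",
also free the new reftable's clusters when we are on the error path:

    if (ret < 0 && new_reftable_offset > 0) {
        /* on failure, the clusters allocated for the new reftable would
         * otherwise be orphaned as well */
        qcow2_free_clusters(bs, new_reftable_offset,
                            new_reftable_size * sizeof(uint64_t),
                            QCOW2_DISCARD_NEVER);
    }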

+    }
+
+    qemu_vfree(new_refblock);
+    return ret;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index fe12c54..5b96519 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -526,6 +526,10 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
  int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
                                    int64_t size);
+int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
+                                BlockDriverAmendStatusCB *status_cb,
+                                void *cb_opaque, Error **errp);
+
  /* qcow2-cluster.c functions */
  int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                          bool exact_size);

Interesting patch.  Hope my review helps you prepare a better v2.

If everything else fails, I'll just split the amend stuff from this series. But I'll work it out somehow. And your review will definitely help, thanks a lot!

Max


