[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH] qcow2: Optimize the refcount-block overlap chec
From: |
Max Reitz |
Subject: |
Re: [Qemu-block] [PATCH] qcow2: Optimize the refcount-block overlap check |
Date: |
Wed, 1 Feb 2017 02:46:20 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 |
On 30.01.2017 17:14, Alberto Garcia wrote:
> The metadata overlap checks introduced in a40f1c2add help detect
> corruption in the qcow2 image by verifying that data writes don't
> overlap with existing metadata sections.
>
> The 'refcount-block' check in particular iterates over the refcount
> table in order to get the addresses of all refcount blocks and check
> that none of them overlap with the region where we want to write.
>
> The problem with the refcount table is that since it always occupies
> complete clusters its size is usually very big.
Actually the main problem is that BDRVQcow2State.refcount_table_size is
updated very generously as opposed to BDRVQcow2State.l1_size. The latter
is usually exactly as large as it needs to be (because the L1 table size
usually doesn't change), whereas the refcount_table_size is just bumped
up every time the image gets bigger until it reaches or exceeds the
value it needs to be.
> With the default
> values of cluster_size=64KB and refcount_bits=16 this table holds 8192
> entries, each one of them enough to map 2GB worth of host clusters.
>
> So unless we're using images with several TB of allocated data this
> table is going to be mostly empty, and iterating over it is a waste of
> CPU. If the storage backend is fast enough this can have an effect on
> I/O performance.
>
> This patch keeps the index of the last used (i.e. non-zero) entry in
> the refcount table and updates it every time the table changes. The
> refcount-block overlap check then uses that index instead of reading
> the whole table.
>
> In my tests with a 4GB qcow2 file stored in RAM this doubles the
> amount of write IOPS.
>
> Signed-off-by: Alberto Garcia <address@hidden>
> ---
> block/qcow2-refcount.c | 21 ++++++++++++++++++++-
> block/qcow2.c | 1 +
> block/qcow2.h | 1 +
> 3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index cbfb3fe064..5e4d36587f 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -83,6 +83,16 @@ static Qcow2SetRefcountFunc *const set_refcount_funcs[] = {
> /*********************************************************/
> /* refcount handling */
>
> +static void update_max_refcount_table_index(BDRVQcow2State *s)
> +{
> + unsigned i = s->refcount_table_size - 1;
> + while (i > 0 && (s->refcount_table[i] & REFT_OFFSET_MASK) == 0) {
> + i--;
> + }
> + /* Set s->max_refcount_table_index to the index of the last used entry */
> + s->max_refcount_table_index = i;
> +}
> +
> int qcow2_refcount_init(BlockDriverState *bs)
> {
> BDRVQcow2State *s = bs->opaque;
> @@ -111,6 +121,7 @@ int qcow2_refcount_init(BlockDriverState *bs)
> }
> for(i = 0; i < s->refcount_table_size; i++)
> be64_to_cpus(&s->refcount_table[i]);
> + update_max_refcount_table_index(s);
> }
> return 0;
> fail:
> @@ -439,6 +450,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
> }
>
> s->refcount_table[refcount_table_index] = new_block;
> + s->max_refcount_table_index = refcount_table_index;
Should be MAX(s->max_refcount_table_index, refcount_table_index) or this
won't support refcount tables with holes in them.
Apart from this, the patch looks good to me. (Just a nagging comment below.)
>
> /* The new refcount block may be where the caller intended to put its
> * data, so let it restart the search. */
> @@ -580,6 +592,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
> s->refcount_table = new_table;
> s->refcount_table_size = table_size;
> s->refcount_table_offset = table_offset;
> + update_max_refcount_table_index(s);
>
> /* Free old table. */
> qcow2_free_clusters(bs, old_table_offset, old_table_size *
> sizeof(uint64_t),
> @@ -2171,6 +2184,7 @@ write_refblocks:
> s->refcount_table = on_disk_reftable;
> s->refcount_table_offset = reftable_offset;
> s->refcount_table_size = reftable_size;
> + update_max_refcount_table_index(s);
>
> return 0;
>
> @@ -2383,7 +2397,11 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs,
> int ign, int64_t offset,
> }
>
> if ((chk & QCOW2_OL_REFCOUNT_BLOCK) && s->refcount_table) {
> - for (i = 0; i < s->refcount_table_size; i++) {
> + unsigned last_entry = s->max_refcount_table_index;
> + assert(last_entry < s->refcount_table_size);
> + assert(last_entry + 1 == s->refcount_table_size ||
> + (s->refcount_table[last_entry + 1] & REFT_OFFSET_MASK) == 0);
I'm not sure we need this second assertion, but I don't mind it too much
either.
I'd actually find an assertion that last_entry is less than UNSIGNED_MAX
more important because otherwise the loop below would be an endless one. :-)
(Yes, it's pretty obvious that it's less than UNSIGNED_MAX because it's
less than s->refcount_table_size, which is an unsigned int. I'm just
trying to say something while I'm not sure exactly what I'm trying to
say. Sorry.)
Max
> + for (i = 0; i <= last_entry; i++) {
> if ((s->refcount_table[i] & REFT_OFFSET_MASK) &&
> overlaps_with(s->refcount_table[i] & REFT_OFFSET_MASK,
> s->cluster_size)) {
> @@ -2871,6 +2889,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs,
> int refcount_order,
> /* Now update the rest of the in-memory information */
> old_reftable = s->refcount_table;
> s->refcount_table = new_reftable;
> + update_max_refcount_table_index(s);
>
> s->refcount_bits = 1 << refcount_order;
> s->refcount_max = UINT64_C(1) << (s->refcount_bits - 1);
signature.asc
Description: OpenPGP digital signature