Re: [Qemu-devel] [PATCH v5 2/3] qcow2: add update refcount table realiza

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 2/3] qcow2: add update refcount table realiza

From:	Eric Blake
Subject:	Re: [Qemu-devel] [PATCH v5 2/3] qcow2: add update refcount table realization for update_refcount
Date:	Mon, 24 Nov 2014 11:11:22 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 11/21/2014 05:41 AM, Max Reitz wrote:
> On 2014-10-26 at 16:20, Jun Li wrote:
>> When every item of refcount block is NULL, free refcount block and
>> reset the
>> corresponding item of refcount table with NULL. At the same time, put
>> this
>> cluster to s->free_cluster_index.
>>
>> Signed-off-by: Jun Li <address@hidden>
>> ---

>> +        refcount_table_index = cluster_index >> s->refcount_block_bits;
>> +
>> +        uint64_t refcount_block_offset =
>> +            s->refcount_table[refcount_table_index] & REFT_OFFSET_MASK;
>> +
>> +        int64_t self_block_index = refcount_block_offset >>
>> s->cluster_bits;
>> +        int self_block_index2 = (refcount_block_offset >>
>> s->cluster_bits) &
>> +            ((1 << s->refcount_block_bits) - 1);
> 
> You are assuming here that every refblock is described by itself. While
> that may be true for QEMU's current qcow2 driver, it's by no means
> defined that way.

In fact, it is NOT true for qemu's qcow2 driver.  Remember, the reftable
MUST be contiguous clusters.  I can create an image with enough data
such that the reftable itself will occupy more clusters than what a
single refblock will describe.  Worst case, with 512-byte clusters, and
after the series allowing for refcount_order != 4, the smallest such
image is a 64-bit refcount-width, where each cluster of the reftable
describes 64 refblocks, and where each refblock describes 64 clusters.
An image that is large enough to need 127 or more clusters in the
reftable (merely 127*64*64*512 == 266,338,304 host bytes) means that at
least 64 consecutive clusters are all tied up by the reftable and
therefore those clusters cannot contain a self-referential refblock.

The sizes get (much) bigger for the defaults of 64k clusters and 16-bit
refcounts (each cluster of the reftable covers 8192 refblocks, each
refblock covers 32768 clusters, so you need 65535 consecutive clusters
in the reftable before guaranteeing a refblock is not self-referential;
but this works out to an image of 65535*8192*32768*65536 ==
1,152,903,912,420,802,560 bytes; and I'm not going to try to create an
exabyte image any time soon).  But even if I can't guarantee such an
image in a reasonable size does not mean that such an image won't be
created at smaller sizes due to other allocation patterns.

You DO need to be able to recognize self-referential refblocks, in order
clean them up if the only refcount remaining in the refblock is
self-referential, but you must ALSO recognize that not all refblocks are
self-referential.

> So, about this whole patch: I'm not sure whether we need it in this
> series but I remember you saying that it's actually worth it, so I'm
> trusting you.
> 
> Second, I somehow don't like iterating over the whole refblock for each
> refcount updated or at least for each refcount set to 0. There are 32768
> entries per refblock with 16 bit refcounts and 64 kB clusters.
> Therefore, when freeing all clusters referenced by a refblock (and if
> that refblock doesn't have any free entries yet), we will loop over
> 16384 entries (in average) before noticing there are still entries with
> a non-zero refcount, and that we will do 32768 times (okay, 32767,
> because you're assuming a refblock always contains its own refcount). I
> think that a bit much. But I don't have any idea to fix this other than
> keeping the number of entries per refblock with refcount != 0 in a table
> in memory (which is probably not too bad, but not too nice either).

On the other hand, since we have a proposal on the floor to track an
in-memory representation of used clusters to (tremendously) speed up
cluster overlap checking, THAT would be a perfect place to add in-memory
tracking of each refblock's current count of used refcounts, and could
therefore make it much faster to tell when a refblock is
self-referential, as well as its current count of occupied refcounts, so
that we don't have to scan the entire cluster on every operation.  It
does mean more bookkeeping when allocating or freeing clusters, but the
benefits may be worth it.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [PATCH v5 2/3] qcow2: add update refcount table realization for update_refcount, Max Reitz, 2014/11/21
- Re: [Qemu-devel] [PATCH v5 2/3] qcow2: add update refcount table realization for update_refcount, Eric Blake <=

Prev by Date: Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table for qcow2 shrinking
Next by Date: Re: [Qemu-devel] [PATCH v4 33/47] Postcopy: Postcopy startup in migration thread
Previous by thread: Re: [Qemu-devel] [PATCH v5 2/3] qcow2: add update refcount table realization for update_refcount
Next by thread: Re: [Qemu-devel] [PATCH v5 3/3] qcow2: Add qemu-iotests for qcow2 shrinking
Index(es):
- Date
- Thread