Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
From: Alberto Garcia
Subject: Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
Date: Fri, 07 Apr 2017 16:24:44 +0200
User-agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (i586-pc-linux-gnu)

On Fri 07 Apr 2017 02:41:21 PM CEST, Kevin Wolf <address@hidden> wrote:
>> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
>> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> **<----> <-----------------------------------------------><---------->*
>>   Rsrved            host cluster offset of data             Reserved
>>  (6 bits)                    (44 bits)                     (11 bits)
>>
>> where you have 17 bits plus the "all zeroes" bit to play with, thanks to
>> the three bits of host cluster offset that are now guaranteed to be zero
>> due to cluster size alignment (but you're also right that the "all
>> zeroes" bit is now redundant information with the 8 subcluster-is-zero
>> bits, so repurposing it does not hurt)
>>
>> >
>> > * Pros:
>> > + Simple. Few changes compared to the current qcow2 format.
>> >
>> > * Cons:
>> > - Only 8 subclusters per cluster. We would not be making the
>> > most of this feature.
>> >
>> > - No reserved bits left for the future.
>>
>> I just argued you have at least one, and probably 2, bits left over for
>> future in-word expansion.
>
> I think only 8 subclusters is just too few. That the subcluster status
> would be split in two halves doesn't make me like this layout much
> better either.

I also agree that 8 are too few (splitting the subcluster field would
not be strictly necessary, but that's not so important).

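For reference, here is a rough Python sketch of how the fields in the
diagram quoted above partition a 64-bit L2 entry. The constant and
function names are made up for illustration and are not taken from the
QEMU source. The meaning of bits 63, 62 and 0 (COPIED, compressed, all
zeroes) follows the existing qcow2 format, and I'm assuming that the
offset field holds the offset bits in place, as it does today. The two
reserved ranges are the 6 + 11 = 17 spare bits mentioned above.

```python
# Hypothetical names; the bit positions follow the quoted diagram.
FLAG_COPIED     = 1 << 63                 # '*' at bit 63
FLAG_COMPRESSED = 1 << 62                 # '*' at bit 62
RESERVED_HIGH   = ((1 << 6) - 1) << 56    # bits 56-61 (6 bits)
HOST_OFFSET     = ((1 << 44) - 1) << 12   # bits 12-55 (44 bits)
RESERVED_LOW    = ((1 << 11) - 1) << 1    # bits 1-11 (11 bits)
FLAG_ALL_ZEROES = 1 << 0                  # '*' at bit 0


def decode_l2_entry(entry):
    """Split a 64-bit L2 entry into the fields shown in the diagram."""
    return {
        "copied": bool(entry & FLAG_COPIED),
        "compressed": bool(entry & FLAG_COMPRESSED),
        # Assumption: the masked value is the byte offset itself.
        "host_offset": entry & HOST_OFFSET,
        # Bits that alternative (1) could use for subcluster status.
        "spare": entry & (RESERVED_HIGH | RESERVED_LOW),
        "all_zeroes": bool(entry & FLAG_ALL_ZEROES),
    }


def spare_bit_count():
    """Number of reserved bits available, not counting 'all zeroes'."""
    return bin(RESERVED_HIGH | RESERVED_LOW).count("1")
```

With 8 subclusters and two status bits each, 16 of those 17 spare bits
would be used, which is where the "one, and probably 2" leftover bits
come from.
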
>> > (2) Making L2 entries 128-bit wide.
>> >
>> > In this alternative we would double the size of L2 entries. The
>> > first half would remain unchanged and the second one would store
>> > the bitmap. That would leave us with 32 subclusters per cluster.
>>
>> Although for smaller cluster sizes (such as 4k clusters), you'd still
>> want to restrict that subclusters are at least 512-byte sectors, so
>> you'd be using fewer than 32 of those subcluster positions until the
>> cluster size is large enough.
>>
>> >
>> > * Pros:
>> > + More subclusters per cluster. We could have images with
>> > e.g. 128k clusters with 4k subclusters.
>>
>> Could allow variable-sized subclusters (your choice of 32 subclusters of
>> 4k each, or 16 subclusters of 8k each)
>
> I don't think using less subclusters is desirable if it doesn't come
> with savings elsewhere. We already need to allocate two clusters for an
> L2 table now, so we want to use it.
>
> The more interesting kind of variable-sized subclusters would be if you
> could select any multiple of 32, meaning three or more clusters per L2
> table (with 192 bits or more per entry).

Yeah, I agree. I think it's worth considering. One more drawback that I
can think of: if we make L2 entries wider and the image has compressed
clusters, the extra space in their entries would be wasted.

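To make the numbers in this part of the thread concrete, here is a small
Python sketch of the subcluster arithmetic. It assumes that the first 64
bits of an entry stay as they are today and that each subcluster takes
two status bits in the rest (which is what 64 extra bits for 32
subclusters implies). It also assumes that the number of entries per L2
table stays the same, so the table grows with the entry width. The
function names are made up; the 512-byte minimum is the one Eric
mentions.

```python
def subcluster_layout(cluster_size, entry_bits=128, min_subcluster=512):
    """Return (subclusters in use, subcluster size in bytes).

    Assumption: 64 bits of the entry keep their current meaning and
    the remaining bits hold two status bits per subcluster.
    """
    bitmap_bits = entry_bits - 64
    max_subclusters = bitmap_bits // 2
    # Small clusters cannot be split below the minimum subcluster size,
    # so they leave some of the bitmap positions unused.
    in_use = min(max_subclusters, cluster_size // min_subcluster)
    return in_use, cluster_size // in_use


def l2_table_clusters(entry_bits):
    """Clusters per L2 table, relative to today's 64-bit entries."""
    return entry_bits // 64
```

For example, subcluster_layout(128 * 1024) gives 32 subclusters of 4k,
a 4k cluster only uses 8 of the 32 positions, and 192-bit entries would
mean 64 subclusters and three clusters per L2 table.
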
>> > - One more metadata structure to be updated for each
>> > allocation. This would probably impact I/O negatively.
>>
>> Having the subcluster table directly in the L2 means that updating
>> the L2 table is done with a single write. You are definitely right
>> that having the subcluster table as a bitmap in a separate cluster
>> means two writes instead of one, but as always, it's hard to predict
>> how much of an impact that is without benchmarks.
>
> Note that it's not just additional write requests, but that we can't
> update the L2 table entry and the bitmap atomically any more, so we
> have to worry about ordering. The ordering between L2 table and
> refcount blocks is already painful enough, I'm not sure if I would
> want to add a third type. Ordering also means disk flushes, which are
> a lot slower than just additional writes.

You're right, I think you just convinced me that this is a bad idea and
I'm also more inclined towards (2) now.

Berto