Re: [Qemu-block] [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster


From: Kevin Wolf
Subject: Re: [Qemu-block] [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
Date: Fri, 7 Apr 2017 14:41:21 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 06.04.2017 at 18:40, Eric Blake wrote:
> On 04/06/2017 10:01 AM, Alberto Garcia wrote:
> > I thought of three alternatives for storing the subcluster bitmaps. I
> > haven't made up my mind completely about which one is the best one, so
> > I'd like to present all three for discussion. Here they are:
> > 
> > (1) Storing the bitmap inside the 64-bit entry
> > 
> >     This is a simple alternative and is the one that I chose for my
> >     prototype. There are 14 unused bits plus the "all zeroes" one. If
> >     we steal one from the host offset we have the 16 bits that we need
> >     for the bitmap and we have 46 bits left for the host offset, which
> >     is more than enough.
> 
> Note that because you are using exactly 8 subclusters, you can require
> that the minimum cluster size when subclusters are enabled be 4k (since
> we already have a lower-limit of 512-byte sector operation, and don't
> want subclusters to be smaller than that); in which case you are
> guaranteed that the host cluster offset will be 4k aligned.  So in
> reality, once you turn on subclusters, you have:
> 
> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> **<----> <-----------------------------------------------><---------->*
>   Rsrved              host cluster offset of data             Reserved
>   (6 bits)                (44 bits)                           (11 bits)
> 
> where you have 17 bits plus the "all zeroes" bit to play with, thanks to
> the three bits of host cluster offset that are now guaranteed to be zero
> due to cluster size alignment (but you're also right that the "all
> zeroes" bit is now redundant information with the 8 subcluster-is-zero
> bits, so repurposing it does not hurt)
> 
> > 
> >     * Pros:
> >       + Simple. Few changes compared to the current qcow2 format.
> > 
> >     * Cons:
> >       - Only 8 subclusters per cluster. We would not be making the
> >         most of this feature.
> > 
> >       - No reserved bits left for the future.
> 
> I just argued you have at least one, and probably 2, bits left over for
> future in-word expansion.

I think only 8 subclusters is just too few. That the subcluster status
would be split in two halves doesn't make me like this layout much
better either.
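
For reference, the bit budget of the layout quoted above can be checked
with a short sketch. The field positions are read off Eric's diagram;
the helper names (mask, popcount) are made up for this illustration and
do not come from the QEMU source. It assumes 2 bits per subcluster,
which is what "16 bits for 8 subclusters" implies.

```python
def mask(hi, lo):
    """Return a mask with bits lo..hi (inclusive) set."""
    return ((1 << (hi - lo + 1)) - 1) << lo


def popcount(value):
    """Count the bits set in value."""
    return bin(value).count("1")


# Field positions taken from the diagram, assuming 4k-aligned clusters.
FLAG_BITS = mask(63, 62)       # the two bits marked '**'
RESERVED_HIGH = mask(61, 56)   # 6 reserved bits
HOST_OFFSET = mask(55, 12)     # 44 bits of host cluster offset
RESERVED_LOW = mask(11, 1)     # 11 bits freed up or already reserved
ALL_ZEROES_BIT = mask(0, 0)    # the "all zeroes" bit

# Bits available for a subcluster bitmap if the "all zeroes" bit is reused.
spare_bits = (popcount(RESERVED_HIGH) + popcount(RESERVED_LOW)
              + popcount(ALL_ZEROES_BIT))

bitmap_bits = 8 * 2            # 8 subclusters, 2 bits each (assumption)
left_over = spare_bits - bitmap_bits

print(spare_bits, left_over)   # prints: 18 2
```

The spare bits sit on both sides of the host offset, which is the
"split in two halves" I was referring to.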

Intuitively one might think that squeezing everything into the existing
data structures will save us memory, but as you correctly state below,
the difference between 8 and 32 subclusters means that we can use a
larger cluster size, and the doubled L2 entry size actually uses less
space to cover the whole image than the existing one.
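
A rough worked example of that arithmetic, as a sketch only: the 1 TiB
image size and the 4k subcluster size are values I picked for
illustration, and l2_metadata_bytes is a hypothetical helper, not an
existing QEMU function. It compares 64-bit entries with 8 subclusters
against 128-bit entries with 32 subclusters at the same subcluster size.

```python
def l2_metadata_bytes(image_size, subcluster_size,
                      subclusters_per_cluster, entry_bytes):
    """Total size of all L2 entries needed to map image_size bytes."""
    cluster_size = subcluster_size * subclusters_per_cluster
    # One L2 entry per cluster; round up for a partial last cluster.
    n_clusters = -(-image_size // cluster_size)
    return n_clusters * entry_bytes


TIB = 1 << 40
MIB = 1 << 20

# Option (1): 8 subclusters of 4k -> 32k clusters, 8-byte entries.
option1 = l2_metadata_bytes(TIB, 4096, 8, 8)

# Option (2): 32 subclusters of 4k -> 128k clusters, 16-byte entries.
option2 = l2_metadata_bytes(TIB, 4096, 32, 16)

print(option1 // MIB, option2 // MIB)   # prints: 256 128
```

Under these assumptions the wider entries need half the L2 metadata,
because four times fewer entries outweigh entries twice as large.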

I can see how few changes compared to the current format can look
attractive, but I also think that there is a danger of forgetting to
support subclusters in some place. When the layout is radically
different, such mistakes would be caught very quickly, so maybe being
more different is actually a plus.

> > 
> > (2) Making L2 entries 128-bit wide.
> > 
> >     In this alternative we would double the size of L2 entries. The
> >     first half would remain unchanged and the second one would store
> >     the bitmap. That would leave us with 32 subclusters per cluster.
> 
> Although for smaller cluster sizes (such as 4k clusters), you'd still
> want to restrict that subclusters are at least 512-byte sectors, so
> you'd be using fewer than 32 of those subcluster positions until the
> cluster size is large enough.
> 
> > 
> >     * Pros:
> >       + More subclusters per cluster. We could have images with
> >         e.g. 128k clusters with 4k subclusters.
> 
> Could allow variable-sized subclusters (your choice of 32 subclusters of
> 4k each, or 16 subclusters of 8k each)

I don't think using fewer subclusters is desirable if it doesn't come
with savings elsewhere. We already need to allocate two clusters for an
L2 table now, so we want to use it.

The more interesting kind of variable-sized subclusters would be if you
could select any multiple of 32, meaning three or more clusters per L2
table (with 192 bits or more per entry).

I doubt that this would be worth the effort, though.
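
To make the numbers concrete, here is a small sketch of how the entry
width would grow. It assumes 2 bits of bitmap per subcluster on top of
the existing 64-bit entry, and that an L2 table keeps the same number of
entries as today's one-cluster table; entry_bits and
clusters_per_l2_table are hypothetical names used only here.

```python
BASE_ENTRY_BITS = 64       # the existing L2 entry
BITS_PER_SUBCLUSTER = 2    # assumption: same as 64 bitmap bits for 32


def entry_bits(subclusters):
    """Width of an L2 entry for a given number of subclusters."""
    if subclusters <= 0 or subclusters % 32 != 0:
        raise ValueError("subcluster count must be a multiple of 32")
    return BASE_ENTRY_BITS + BITS_PER_SUBCLUSTER * subclusters


def clusters_per_l2_table(subclusters):
    """Clusters per L2 table, relative to one cluster with 64-bit entries."""
    return entry_bits(subclusters) // BASE_ENTRY_BITS


for n in (32, 64, 96):
    print(n, entry_bits(n), clusters_per_l2_table(n))
```

With 32 subclusters this gives the 128-bit entry and two clusters per L2
table of option (2); 64 subclusters is the first step to 192 bits and
three clusters.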

> > 
> >     * Cons:
> >       - More space needed for L2 entries. The same cluster size would
> >         require a cache twice as large, although having subcluster
> >         allocation would compensate for this.
> > 
> >       - More changes to the code to handle 128-bit entries.
> 
> Dealing with variable-sized subclusters, or with unused subcluster
> entries when the cluster size is too small (such as a 4k cluster should
> not be allowed any subclusters smaller than 512 bytes, but that's at
> most 8 out of the 32 slots available), can get tricky.

If it's too tricky, we just don't allow it. 32 * 512 = 16k would be the
minimal cluster size that allows enabling subclusters.
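
As a sketch, the rule would look something like this. The 512-byte lower
limit is the sector size mentioned above; the function names are made up
for this example and are not existing QEMU code.

```python
MIN_SUBCLUSTER_SIZE = 512   # subclusters must not be smaller than a sector


def min_cluster_size(subclusters_per_cluster):
    """Smallest cluster size that keeps every subcluster >= 512 bytes."""
    return subclusters_per_cluster * MIN_SUBCLUSTER_SIZE


def subclusters_allowed(cluster_size, subclusters_per_cluster=32):
    """True if subclusters may be enabled for this cluster size."""
    return cluster_size >= min_cluster_size(subclusters_per_cluster)


print(min_cluster_size(32))          # prints: 16384
print(subclusters_allowed(4096))     # prints: False
print(subclusters_allowed(65536))    # prints: True
```

The same rule with 8 subclusters gives the 4k minimum that Eric
mentioned for option (1).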

> > 
> >       - We would still be wasting the 14 reserved bits that L2 entries
> >         have.

We discussed the concept of subclusters multiple times in the past, and
I think every time my conclusion was that option (2) is really what we
want if we implement this.

> > (3) Storing the bitmap somewhere else
> > 
> >     This would involve storing the bitmap separate from the L2 tables
> >     (perhaps using the bitmaps extension? I haven't looked much into
> >     this).
> > 
> >     * Pros:
> >       + Possibility to make the number of subclusters configurable
> >         by the user (32, 64, 128, ...)
> >       + All existing metadata structures would remain untouched
> >         (although the "all zeroes" bit in L2 entries would probably
> >         become unused).
> 
> It might still remain useful for optimization purposes, although then we
> get into image consistency questions (if the all zeroes bit is set but
> subcluster map claims allocation, or if the all zeroes bit is clear but
> all subclusters claim zero, which one wins).
> 
> > 
> >     * Cons:
> >       - As with alternative (2), more space needed for metadata.
> > 
> >       - The bitmap would also need to be cached for performance
> >         reasons.
> > 
> >       - Possibly one more *_cache_size option.
> > 
> >       - One more metadata structure to be updated for each
> >         allocation. This would probably impact I/O negatively.
> 
> Having the subcluster table directly in the L2 means that updating the
> L2 table is done with a single write. You are definitely right that
> having the subcluster table as a bitmap in a separate cluster means two
> writes instead of one, but as always, it's hard to predict how much of
> an impact that is without benchmarks.

Note that it's not just additional write requests, but that we can't
update the L2 table entry and the bitmap atomically any more, so we have
to worry about ordering. The ordering between L2 table and refcount
blocks is already painful enough; I'm not sure if I would want to add a
third type. Ordering also means disk flushes, which are a lot slower
than just additional writes.

We wouldn't have these ordering problems with a journal (because we
could then commit things atomically), but I suppose we don't want to
make two major changes at once. :-)

> > === Compressed clusters ===
> > 
> > My idea is that compressed clusters would remain the same. They are
> > read-only anyway so they would not be affected by any of these
> > changes.

Yes, this makes sense to me. Compression already uses its own kind of
splitting clusters.

Kevin
