
Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table


From: Jun Li
Subject: Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table for qcow2 shrinking
Date: Mon, 19 Jan 2015 21:16:11 +0800
User-agent: Mutt/1.5.23 (2014-03-12)

On Thu, 01/15 13:47, Max Reitz wrote:
> On 2015-01-03 at 07:23, Jun Li wrote:
> >On Fri, 11/21 11:56, Max Reitz wrote:
> >>So, as for what I think we do need to do when shrinking (and keep in mind:
> >>The offset given to qcow2_truncate() is the guest size! NOT the host image
> >>size!):
> >>
> >>(1) Determine the first L2 table and the first entry in the table which will
> >>lie beyond the new guest disk size.
> >This is not always correct. Because of COW, using the offset to calculate the
> >first entry of the first L2 table will be incorrect.
> 
> Again: This is *not* about the host disk size or the host offset of some
> cluster, but about the *guest* disk size.
> 
> Let's make up an example. You have a 2 GB disk but you want to resize it to
> 1.25 GB. The cluster size is 64 kB, therefore we have 2 GB / 64 kB = 32,768
> data clusters (as long as there aren't any internal snapshots, which is a
> prerequisite for resizing qcow2 images).
> 
> Every L2 table contains 65,536 / 8 = 8,192 entries; there are thus 32,768 /
> 8,192 = 4 L2 tables.
> 
> As you can see, one can directly derive the number of data clusters and L2
> tables from the guest disk size (as long as there aren't any internal
> snapshots).
> 
> So of course we can do the same for the target disk size: 1.25 GB / 64 kB =
> 20,480 data clusters; 20,480 / 8,192 = 2.5 L2 tables, therefore we need
> three L2 tables but only half of the last one (4,096 entries).
> 
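
The arithmetic in the example above can be written down as a minimal C sketch. The struct and function names here are hypothetical and not taken from the qcow2 code; the sketch only assumes 8-byte L2 entries and no internal snapshots, as in the example.

```c
#include <assert.h>
#include <stdint.h>

#define L2_ENTRY_SIZE 8 /* bytes per L2 entry, as in "65,536 / 8" above */

/* Hypothetical helper type, for illustration only. */
struct l2_layout {
    uint64_t data_clusters;   /* guest clusters needed for the size */
    uint64_t l2_tables;       /* L2 tables needed (the last may be partial) */
    uint64_t last_l2_entries; /* entries used in the last L2 table */
};

/* Derive cluster and L2 table counts from the guest disk size alone. */
static struct l2_layout l2_layout_for_size(uint64_t guest_size,
                                           uint64_t cluster_size)
{
    assert(guest_size > 0 && cluster_size >= L2_ENTRY_SIZE);

    uint64_t entries_per_l2 = cluster_size / L2_ENTRY_SIZE;
    struct l2_layout l;

    /* Round up so that a partially used cluster or table still counts. */
    l.data_clusters = (guest_size + cluster_size - 1) / cluster_size;
    l.l2_tables = (l.data_clusters + entries_per_l2 - 1) / entries_per_l2;
    l.last_l2_entries = l.data_clusters - (l.l2_tables - 1) * entries_per_l2;
    return l;
}
```

Calling l2_layout_for_size(1280ULL << 20, 65536) corresponds to the 1.25 GB case: 20,480 data clusters, three L2 tables, and 4,096 entries in the last one.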

Sorry, that was my misunderstanding last time. If we do not use
qcow2_truncate(), I think the issue above does not exist.

What I originally meant was:
Sometimes the second L2 table will contain an entry whose pointer refers to a
cluster whose address is larger than 1.25 GB.

So if we do not use qcow2_truncate(), the cluster above, whose address is
larger than 1.25 GB, will not be discarded.

But I still have another worry.

Suppose "virtual size" and "disk size" are both 2G. After we resize the image to
1.25G, it seems "virtual size" will be 1.25G but "disk size" will still be 2G
if we do not use qcow2_truncate() to truncate the file (yes, I know that using
qcow2_truncate() is not a solution). This seems strange and not ideal.

> We know that every cluster referenced somewhere after that limit (that is,
> every entry in the fourth L2 table and every entry starting with index 4,096
> in the third L2 table) is a data cluster with a guest offset somewhere
> beyond 1.25 GB, so we don't need it anymore.
> 
> Thus, we simply discard all those data clusters and after that we can
> discard the fourth L2 table. That's it.
> 
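
To make "the first entry beyond the new guest size" concrete, here is a small sketch that splits the first guest cluster past the new size into an L1 index (which L2 table) and an L2 index (which entry in that table). The names are hypothetical and 8-byte L2 entries are assumed; this is not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, for illustration only. */
struct l2_position {
    uint64_t l1_index; /* zero-based index of the L2 table */
    uint64_t l2_index; /* zero-based entry inside that L2 table */
};

/* Locate the first L2 entry that lies entirely beyond new_guest_size. */
static struct l2_position first_entry_beyond(uint64_t new_guest_size,
                                             uint64_t cluster_size)
{
    assert(cluster_size >= 8);

    uint64_t entries_per_l2 = cluster_size / 8; /* 8 bytes per L2 entry */
    /* Round up: a cluster that is still partly inside the disk is kept. */
    uint64_t first_cluster =
        (new_guest_size + cluster_size - 1) / cluster_size;
    struct l2_position pos;

    pos.l1_index = first_cluster / entries_per_l2;
    pos.l2_index = first_cluster % entries_per_l2;
    return pos;
}
```

For 1.25 GB and 64 kB clusters this yields l1_index 2 and l2_index 4096, that is, entry 4,096 of the third L2 table. Everything from that entry onwards, plus every later L2 table, is what gets discarded.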
> If we really want to we can calculate the highest cluster host offset in use
> and truncate the image accordingly. But that's optional, see the last point
> in my "problems with this approach" list (having discarded the clusters
> should save us all the space already). Furthermore, as I'm saying in that
> list, to really solve this issue, we'd need qcow2 defragmentation.
> 

Do we already have an implementation of "qcow2 defragmentation"?

Jun Li

> >What I have done for this scenario:
> >(1) If the first entry is the first entry of the L2 table, scan "the previous
> >L2 table" (the one located just before this L2 table in the L1 table). If an
> >entry of the previous L2 table is larger than offset, discard that entry,
> >too.
> >(2) If the first entry is not the first entry of the L2 table, still scan
> >the whole L2 table to make sure no entry is beyond offset.
> >
> >>(2) Discard all clusters beginning from there.
> >>(3) Discard all L2 tables which are then completely empty.
> >>(4) Update the header size.
> >I think the current implementation of this patch includes the 4 steps above.
> >The current patch also has another step 5:
> >(5) Truncate the file.
> 
> As I wrote above, you can do that but it shouldn't matter much because the
> discarded clusters should not use any disk space.
> 
> >Here I think we should also discard the refcount table and refcount block
> >table when they are completely empty.
> >
> >>And that's it. We can either speed up step (2) by implementing it manually,
> >>or we just use bdrv_discard() on the qcow2 BDS (in the simplest case:
> >>bdrv_discard(bs, DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE), bs->total_sectors -
> >>DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE));.
> >>
> >>We can incorporate step (3) by extending qcow2_discard_clusters() to free L2
> >>tables when they are empty after discard_single_l2(). But we don't even have
> >>to do that now. It's an optimization we can go about later.
> >>
> >>So, we can do (1), (2) and (3) in a single step: Just one bdrv_discard()
> >>call. But it's probably better to use qcow2_discard_clusters() instead and
> >>set the full_discard parameter to true.
> >>
> >>So: qcow2_discard_clusters(bs, offset, bs->total_sectors - offset /
> >>BDRV_SECTOR_SIZE, true);. Then update the guest disk size field in the
> >>header. And we're done.
> >>
> >>There are four problems with this approach:
> >>- qcow2_discard_clusters() might be slower than optimal. I personally don't
> >>care at all.
> >>- If "bs->total_sectors * BDRV_SECTOR_SIZE - offset" is greater than
> >>INT_MAX, this won't work. Trivially solvable by encapsulating the
> >>qcow2_discard_clusters() call in a loop which limits nb_clusters to INT_MAX
> >>/ BDRV_SECTOR_SIZE.
> >>- The L1 table is not resized. Should not matter in practice at all.
> >Yes, agree with you.
> >
> >>- The file is not truncated. Does not matter either (because holes in the
> >>file are good enough), and we can't actually solve this problem without
> >>defragmentation anyway.
> >>
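
The loop suggested for the INT_MAX problem above could look roughly like the sketch below. It does not call qcow2_discard_clusters() directly: discard_fn is a hypothetical hook standing in for that call, SKETCH_SECTOR_SIZE is assumed to equal BDRV_SECTOR_SIZE (512), and record_discard() is only a stand-in that records what it was asked to discard.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512 /* assumption: same value as BDRV_SECTOR_SIZE */

/* Hypothetical hook; a real caller would wrap qcow2_discard_clusters(). */
typedef int (*discard_fn)(uint64_t offset, int nb_sectors, void *opaque);

/* Discard the guest range [new_size, old_size) in int-sized chunks. */
static int discard_tail_in_chunks(uint64_t new_size, uint64_t old_size,
                                  discard_fn discard, void *opaque)
{
    /* Cap each call so that its length in bytes still fits in an int. */
    const uint64_t max_sectors = INT_MAX / SKETCH_SECTOR_SIZE;
    uint64_t offset = new_size;

    /* Sector alignment guarantees that every iteration makes progress. */
    assert(new_size % SKETCH_SECTOR_SIZE == 0);
    assert(old_size % SKETCH_SECTOR_SIZE == 0);

    while (offset < old_size) {
        uint64_t remaining = (old_size - offset) / SKETCH_SECTOR_SIZE;
        uint64_t n = remaining < max_sectors ? remaining : max_sectors;
        int ret = discard(offset, (int)n, opaque);

        if (ret < 0) {
            return ret; /* propagate the first error */
        }
        offset += n * SKETCH_SECTOR_SIZE;
    }
    return 0;
}

/* Stand-in hook for illustration: counts calls and discarded sectors. */
struct discard_stats {
    uint64_t calls;
    uint64_t sectors;
};

static int record_discard(uint64_t offset, int nb_sectors, void *opaque)
{
    struct discard_stats *st = opaque;

    (void)offset;
    st->calls++;
    st->sectors += (uint64_t)nb_sectors;
    return 0;
}
```

With this cap, shrinking from 2 GB to 1.25 GB needs a single call, while discarding 4 GB takes three calls.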
> >>There is one advantage:
> >>- It's extremely simple. It's literally below ten lines of code.
> >>
> >>I think the advantage far outweighs the disadvantage. But I may be wrong.
> >>What do you think?
> >Hi Max,
> >
> >   Sorry for the late reply; I have been very busy recently. I think it is
> >better to first agree on how to implement qcow2 shrinking and then write the
> >code.
> 
> Yes, this will probably be for the best. :-)
> 
> >Another issue: as Gmail currently cannot be used in China, I have to use
> >this email address to reply. :)
> 
> No problem.
> 
> Max



