Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table for qcow2 shrinking
Date: Thu, 22 Jan 2015 14:14:06 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 2015-01-19 at 08:16, Jun Li wrote:
On Thu, 01/15 13:47, Max Reitz wrote:
On 2015-01-03 at 07:23, Jun Li wrote:
On Fri, 11/21 11:56, Max Reitz wrote:
So, as for what I think we do need to do when shrinking (and keep in mind:
The offset given to qcow2_truncate() is the guest size! NOT the host image
size!):

(1) Determine the first L2 table and the first entry in the table which will
lie beyond the new guest disk size.
This is not always correct. Due to COW, using the offset to calculate the
first entry of the first L2 table can give the wrong result.
Again: This is *not* about the host disk size or the host offset of some
cluster, but about the *guest* disk size.

Let's make up an example. You have a 2 GB disk but you want to resize it to
1.25 GB. The cluster size is 64 kB, therefore we have 2 GB / 64 kB = 32,768
data clusters (as long as there aren't any internal snapshots, which is a
prerequisite for resizing qcow2 images).

Every L2 table contains 65,536 / 8 = 8,192 entries; there are thus 32,768 /
8,192 = 4 L2 tables.

As you can see, one can directly derive the number of data clusters and L2
tables from the guest disk size (as long as there aren't any internal
snapshots).

So of course we can do the same for the target disk size: 1.25 GB / 64 kB =
20,480 data clusters; 20,480 / 8,192 = 2.5 L2 tables, therefore we need
three L2 tables but only half of the last one (4,096 entries).
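The arithmetic above can be checked mechanically. Here is a small bash sketch (numbers taken straight from the example: 64 kB clusters, 8-byte L2 entries):

```shell
cluster_size=65536                            # 64 kB clusters, as in the example
l2_entries=$((cluster_size / 8))              # 8,192 entries per L2 table

old_size=$((2 * 1024 * 1024 * 1024))          # 2 GB guest disk
old_clusters=$((old_size / cluster_size))     # 32,768 data clusters
echo "old L2 tables: $(( (old_clusters + l2_entries - 1) / l2_entries ))"   # 4

new_size=1342177280                           # 1.25 GB guest disk
new_clusters=$((new_size / cluster_size))     # 20,480 data clusters
echo "new L2 tables: $(( (new_clusters + l2_entries - 1) / l2_entries ))"   # 3
```

The rounding up in the last line is exactly the "2.5 L2 tables, therefore we need three" step: the third table exists but only its first 4,096 entries are in use.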

Sorry, last time was my misunderstanding. If we do not use qcow2_truncate(),
I think the above issue does not exist.

What I originally meant to say is:
Sometimes the second L2 table will contain an entry whose pointer points to a
cluster whose host address is larger than 1.25 GB.

Correct.

So if we do not use qcow2_truncate(), the clusters whose host addresses are
larger than 1.25 GB will not be discarded.

I'm sorry, I can't really follow what you are trying to say here, so I'll just try to reply with things that may or may not be what you wanted to talk about.

If you are using qemu-img resize and thus subsequently qcow2_truncate() to shrink an image, you cannot expect the image to shrink to the specified file length, for several reasons.

First, if you shrink it to 1 GB, but only half of that is actually used, the image might of course very well have a length below 1 GB.

Second, there is metadata overhead. So if you are changing the guest disk size to 1 GB (all of which is occupied), the host file size will exceed 1 GB because of that overhead.

Third, I keep repeating myself here, but file length is not file size. So you may observe a file length of 10 GB or more because the clusters are spread all over the image file. This is something we'd have to combat with defragmentation; but the question is whether we really need to (see below for more on that). The point is that it doesn't matter whether the image has a file length of 10 GB; the file size will be around 1 GB anyway.
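The length/size distinction is not qcow2-specific; any sparse file shows it. A quick demonstration with plain coreutils (the file name is made up for the demo):

```shell
# A sparse file: large length, almost no actual size.
truncate -s 10G sparse-demo.img
ls -l sparse-demo.img    # reports the file *length*: 10 GB
du -h sparse-demo.img    # reports the file *size* (allocated blocks): near zero
rm sparse-demo.img
```

A qcow2 file with discarded clusters behaves the same way: the length stays large, but the blocks backing the discarded clusters are no longer allocated.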

But I still have another worry.

Suppose "virtual size" and "disk size" are both 2 GB. After we resize the image
to 1.25 GB, it seems we will get a "virtual size" of 1.25 GB while the "disk
size" is still 2 GB

No, it won't. I can prove it to you:

$ qemu-img create -f qcow2 test.qcow2 64M
$ qemu-io -c 'write 0 64M' test.qcow2
$ qemu-img info test.qcow2
...
disk size: 64M
...

Okay, so far it's just what we'd expect. Now let's implement my proposal for truncation: Let's assume the image should be shrunk to 32 MB, so we discard all clusters starting at 32 MB (guest offset) (which is 64 MB - 32 MB = 32 MB of data):

$ qemu-io -c 'discard 32M 32M' test.qcow2
$ qemu-img info test.qcow2
...
disk size: 32M
...

Great!

if we do not use qcow2_truncate() to truncate the file (yes, I know that using
qcow2_truncate() is not a solution). This seems strange, not quite perfect.

We know that every cluster references somewhere after that limit (that is,
every entry in the fourth L2 table and every entry starting with index 4,096
in the third L2 table) is a data cluster with a guest offset somewhere
beyond 1.25 GB, so we don't need it anymore.
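Where exactly that limit falls follows from the same arithmetic as before. A small bash sketch (same assumed geometry: 64 kB clusters, 8,192 entries per L2 table):

```shell
cluster_size=65536
l2_entries=8192
new_size=1342177280                           # 1.25 GB guest disk
first_unused=$((new_size / cluster_size))     # first guest cluster to drop: 20,480
l2_table=$((first_unused / l2_entries))       # table index 2, i.e. the third L2 table
l2_entry=$((first_unused % l2_entries))       # entry index 4,096 within that table
echo "cut at L2 table $l2_table, entry $l2_entry"
```

Everything at or beyond that cut point (the rest of the third table, plus all of the fourth) maps guest offsets beyond the new disk size.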

Thus, we simply discard all those data clusters and after that we can
discard the fourth L2 table. That's it.

If we really want to we can calculate the highest cluster host offset in use
and truncate the image accordingly. But that's optional, see the last point
in my "problems with this approach" list (having discarded the clusters
should save us all the space already). Furthermore, as I'm saying in that
list, to really solve this issue, we'd need qcow2 defragmentation.

Do we already have a "qcow2 defragmentation" implementation?

No, we don't. The only way to defragment a qcow2 image right now is using qemu-img convert to create a (defragmented) copy and then delete the old image, which has the disadvantage of temporarily requiring double the disk space and being an offline operation.

So far, nobody has implemented online defragmentation, mainly for two reasons: First, it would probably be pretty complicated (it'd probably need to be a block job which links into a pretty low-level function provided by qcow2 (defragment_some_clusters or something)); and second, so far there has been little demand. Disk space is not an issue (as said before), because it doesn't really matter to a modern file system whether your file has a length of 100 MB or 100 GB; that's just some number. What really matters is how much of that space is actually used; and if all unused clusters are discarded, there won't be any space used for them (well, maybe there is some metadata overhead, but that should be negligible).

There are a couple of reasons why you'd want to defragment an image:

First, it makes you feel better. I can relate to that, but it's not a real reason.

Second, it may improve performance: The guest may expect consecutive reads to be fast; but if the clusters are sprinkled all over the host, consecutive guest reads no longer necessarily translate to consecutive reads on the host (same for writes, of course). Defragmentation would probably fix that, but if you want to rely on this, you'd better use preallocated image files.

Third, it looks better. People expect the file length to be a direct indicator of the file size. However, for me this is related to "it makes you feel better", because this also is not a really good reason.

Fourth, using a non-modern file system may let your file size explode because suddenly, file length is actually equal to the file size. But I think, in this case you should just use a better file system.

I don't know whether "cp" copies holes in files; its manpage says it can create sparse copies, but I don't know how well that works; I just assume it works well enough.
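For what it's worth, GNU cp has an explicit knob for this; a quick check (GNU coreutils assumed, file names made up for the demo):

```shell
truncate -s 1G sparse-src.img                 # 1 GB of holes, no data
cp --sparse=always sparse-src.img sparse-dst.img
du -h sparse-src.img sparse-dst.img           # both should report (almost) nothing
rm sparse-src.img sparse-dst.img
```

With --sparse=always, cp punches holes in the destination wherever the source reads back as zeros, so even a copy of a fully-allocated-but-zero file comes out sparse.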

Max

Jun Li


