qemu-block

Re: [PATCH 4/7] qcow2: make subclusters discardable


From: Jean-Louis Dupond
Subject: Re: [PATCH 4/7] qcow2: make subclusters discardable
Date: Fri, 19 Apr 2024 11:06:29 +0200
User-agent: Thunderbird Daily

On 16/04/2024 21:56, Andrey Drobyshev wrote:
On 10/27/23 14:10, Jean-Louis Dupond wrote:
[...]

I've checked all the code paths, and as far as I see it nowhere breaks
the discard_no_unref option.
It's important that we don't introduce new code paths that can make
holes in the qcow2 image when this option is enabled :)

If you can confirm my conclusion, that would be great.


Thanks
Jean-Louis

Hi Jean-Louis,

I've finally got to working on v2 for this series.  However I'm failing
to get a grasp on what this option is supposed to be doing and what are
we trying to avoid here.
The discard-no-unref option makes QEMU only zero the clusters that get discarded; it does NOT drop the reference to the cluster.
So the cluster stays allocated/referenced, but is marked as zero.
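
To make the difference concrete, here is a minimal toy model in Python. It is only a sketch: `ToyImage`, its fields and `demo()` are made-up names for illustration, not QEMU code, and the real L2/refcount handling has more cases (backing files, subclusters, compat level). It assumes three written clusters and a discard of the middle one, like the 192K example quoted further down.

```python
from dataclasses import dataclass, field


@dataclass
class ToyImage:
    """Hypothetical stand-in for a qcow2 image, not QEMU's data structures."""
    l2: dict = field(default_factory=dict)        # guest cluster -> host cluster
    zero: set = field(default_factory=set)        # guest clusters flagged "reads as zero"
    refcount: dict = field(default_factory=dict)  # host cluster -> reference count

    def write(self, guest, host):
        # Map the guest cluster on first write and take a reference on the host cluster.
        if guest not in self.l2:
            self.l2[guest] = host
            self.refcount[host] = self.refcount.get(host, 0) + 1
        self.zero.discard(guest)

    def discard(self, guest, no_unref):
        host = self.l2.get(guest)
        if host is None:
            return
        if no_unref:
            # Keep the mapping and the reference, only flag the cluster as zero.
            self.zero.add(guest)
        else:
            # Drop the mapping and the reference: the host cluster becomes free.
            del self.l2[guest]
            self.refcount[host] -= 1


def demo(no_unref):
    img = ToyImage()
    for guest in range(3):
        img.write(guest, host=guest)
    img.discard(1, no_unref=no_unref)
    return img


if __name__ == "__main__":
    for flag in (False, True):
        img = demo(flag)
        print(f"no_unref={flag}: {len(img.l2)}/3 clusters still mapped")
```

In this model the default case ends with 2 of 3 clusters mapped and the no-unref case with 3 of 3, which is the same shape as the "2/3" and "3/3" allocated figures in the qemu-img check output below.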

There are multiple scenarios where you would need this.
First of all, when you have a pre-allocated image, you most likely created it that way because you don't want fragmentation. But without discard-no-unref enabled you will end up with a fragmented image anyway, because discards create holes in your image and those holes are later reallocated at random.

Another scenario (and the reason we implemented it) is a sparse image, where new clusters are allocated at the 'allocation pointer' (which points to the first available clusters in your image). When you do discards, afaik the pointer is not moved back to the freed clusters; allocation continues at the end until you reopen the image. And even then: say you created a hole of 5 free clusters and you need to allocate 4 new clusters. It will use 4 of those 5 and leave 1 empty cluster. If the next allocation needs 2 clusters, it will jump to the next free space with at least 2 clusters, leaving that 1 cluster unallocated. This caused us to have 'sparse' images of 110GB for 100GB images, for example, simply because the qcow2 image was full of small empty clusters and completely fragmented.
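
The 5/4/2 cluster example can be sketched with a small first-fit allocator in Python. This is an illustration under assumptions, not QEMU's allocator: `alloc_contiguous` is a hypothetical helper, the image is just a list of in-use flags, and the starting layout (3 used, 5 free, 3 used) is made up to match the numbers above.

```python
def alloc_contiguous(used, count):
    """First-fit: claim the first run of `count` free clusters, growing the image if none fits.

    `used` is a list of booleans, one per host cluster (True = in use).
    Returns the index of the first cluster of the claimed run.
    """
    run_start, run_len = 0, 0
    for i, in_use in enumerate(used):
        if in_use:
            # A used cluster ends the current free run; the next run can start after it.
            run_start, run_len = i + 1, 0
            continue
        run_len += 1
        if run_len == count:
            break
    else:
        # No hole was large enough: extend the image from the trailing free run (if any).
        used.extend([False] * (run_start + count - len(used)))
    for j in range(run_start, run_start + count):
        used[j] = True
    return run_start


if __name__ == "__main__":
    # Assumed layout: 3 used clusters, a hole of 5, then 3 used clusters.
    image = [True] * 3 + [False] * 5 + [True] * 3
    print("4 clusters placed at", alloc_contiguous(image, 4))
    print("2 clusters placed at", alloc_contiguous(image, 2))
    print("stranded free clusters:", image.count(False), "image length:", len(image))
```

With that layout the 4-cluster request lands in the hole, the 2-cluster request cannot use the single leftover cluster and extends the image instead, and one free cluster stays stranded in the middle.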

Consider this simple example:

# cd build
# ./qemu-img create -f qcow2   unref.qcow2 192K
# ./qemu-img create -f qcow2 nounref.qcow2 192K
# ./qemu-io -c "write 0 192K"   unref.qcow2
# ./qemu-io -c "write 0 192K" nounref.qcow2
#
# strace -fv -e fallocate ./qemu-io -c "discard 64K 64K" unref.qcow2
[pid 887710] fallocate(9, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 393216, 65536) = 0
discard 65536/65536 bytes at offset 65536
64 KiB, 1 ops; 00.00 sec (252.123 MiB/sec and 4033.9660 ops/sec)
#
# strace -fv -e fallocate ./qemu-io -c "reopen -o discard-no-unref=on" -c "discard 64K 64K" nounref.qcow2
[pid 887789] fallocate(9, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 393216, 65536) = 0
discard 65536/65536 bytes at offset 65536
64 KiB, 1 ops; 00.00 sec (345.457 MiB/sec and 5527.3049 ops/sec)
#
# ./qemu-img check unref.qcow2

No errors were found on the image.
2/3 = 66.67% allocated, 50.00% fragmented, 0.00% compressed clusters
Image end offset: 524288
# ./qemu-img check nounref.qcow2
No errors were found on the image.
3/3 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 524288
#
# ls -la *.qcow2

-rw-r--r-- 1 root root 524288 Apr 16 22:42 nounref.qcow2
-rw-r--r-- 1 root root 524288 Apr 16 22:41 unref.qcow2
# du --block-size=1 *.qcow2
397312  nounref.qcow2
397312  unref.qcow2
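
For readers comparing the ls and du figures above: ls reports the apparent file size, while du reports the space actually backed by blocks, so a file with holes shows a smaller du value. A minimal Python sketch of the same comparison follows. It assumes a Unix-like system where `st_blocks` counts 512-byte units and a filesystem that supports sparse files; `apparent_and_allocated` and `make_sparse_demo` are hypothetical helper names, and the hole here is created with truncate rather than with fallocate punch-hole.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes), roughly what ls -l and du report."""
    st = os.stat(path)
    # Assumption: st_blocks is expressed in 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


def make_sparse_demo(path, size=512 * 1024, data_len=64 * 1024):
    """Write `data_len` bytes of data, then extend the file to `size` without writing."""
    with open(path, "wb") as f:
        f.write(b"\xff" * data_len)
        f.truncate(size)  # the tail becomes a hole if the filesystem supports sparse files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        make_sparse_demo(path)
        apparent, allocated = apparent_and_allocated(path)
        print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem with sparse-file support the allocated figure should come out well below the apparent size, which is the same kind of gap as between 524288 and 397312 in the output above.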

I understand that by keeping the L2 entry we achieve that cluster
remains formally allocated, but no matter whether "discard-no-unref"
option is enabled fallocate(FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE) is
being called, leaving a hole in the file (i.e. the file becomes sparse).
However you say in the comment above that we can't allow making new
holes in the file when this option is enabled.  How does that correlate
and what do we achieve?  And which logic do you think we need to follow
when discarding separate subclusters?

Andrey


