Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img


From: Martin Kletzander
Subject: Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
Date: Mon, 29 Apr 2019 09:27:34 +0200
User-agent: Mutt/1.11.4 (2019-03-13)

On Wed, Apr 24, 2019 at 09:19:17AM +0200, Kevin Wolf wrote:
On 24.04.2019 at 08:40, Vladimir Sementsov-Ogievskiy wrote:
23.04.2019 18:08, Kevin Wolf wrote:
> On 23.04.2019 at 16:26, Martin Kletzander wrote:
>> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
>>> On 23.04.2019 at 13:30, Martin Kletzander wrote:
>>>> Hi,
>>>>
>>>> I am using qemu-img with nbdkit to transfer a disk image and then update it with
>>>> extra data from newer snapshots.  The end image cannot be transferred because
>>>> the snapshots will be created later than the first transfer and we want to save
>>>> some time up front.  You might think of it as a continuous synchronisation.  It
>>>> looks something like this:
>>>>
>>>> I first transfer the whole image:
>>>>
>>>>   qemu-img convert -p $nbd disk.raw
>>>>
>>>> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
>>>>
>>>> Then, after the next snapshot is created, I can update it thanks to the `-n`
>>>> parameter (the $nbd now points to the newer snapshot with unchanged data looking
>>>> like holes in the file):
>>>>
>>>>   qemu-img convert -p -n $nbd disk.raw
>>>>
>>>> This is fast and efficient as it uses the block status NBD extension, so it only
>>>> transfers new data.
>>>
>>> This is an implementation detail. Don't rely on it. What you're doing is
>>> abusing 'qemu-img convert', so problems like what you describe are to be
>>> expected.
>>>
>>>> This can be done over and over again to keep the local
>>>> `disk.raw` image up to date with the latest remote snapshot.
>>>>
>>>> However, when the guest OS zeroes some of the data and it gets written into the
>>>> snapshot, qemu-img scans for those zeros and does not write them to the
>>>> destination image.  Checking the output of `qemu-img map --output=json $nbd`
>>>> shows that the zeroed data is properly marked as `data: true`.
>>>>
>>>> Using `-S 0` would write zeros even where the holes are, effectively overwriting
>>>> the data from the last snapshot even though they should not be changed.
>>>>
>>>> Having gone through some workarounds I would like there to be another way.  I
>>>> know this is far from the typical usage of qemu-img, but is this really the
>>>> expected behaviour or is this just something nobody really needed before?  If it
>>>> is the former, would it be possible to have a parameter that would control this
>>>> behaviour?  If the latter is the case, can that behaviour be changed so that it
>>>> properly replicates the data when `-n` parameter is used?
>>>>
>>>> Basically the only thing we need is to either:
>>>>
>>>> 1) write zeros where they actually are or
>>>>
>>>> 2) turn off explicit sparsification without requesting dense image (basically
>>>>     sparsify only the part that is reported as hole on the source) or
>>>>
>>>> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
>>>>     but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
>>>>     believe, is effectively the same)
>>>>
>>>> If you want to try this out, I found the easiest reproducible way is using
>>>> nbdkit's data plugin, which can simulate whatever source image you like.
>>>
>>> I think what you _really_ want is a commit block job. The problem is
>>> just that you don't have a proper backing file chain, but just a bunch
>>> of NBD connections.
>>>
>>> Can't you get an NBD connection that already provides the condensed form
>>> of the whole snapshot chain directly at the source? If the NBD server
>>> was QEMU, this would actually be easier than providing each snapshot
>>> individually.
>>>
>>> If this isn't possible, I think you need to replicate the backing chain
>>> on the destination instead of converting into the same image again and
>>> again so that qemu-img knows that it must take existing data of the
>>> backing file into consideration:
>>>
>>>     qemu-img convert -O qcow2 nbd://... base.qcow2
>>>     qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
>>>     qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
>>>     ...


So I spoke too soon.  This approach fixed the one thing that I was struggling
with, but broke the rest, because it completely replicates the last image even
when the source provides proper allocation data.  Best to show with an
illustration:

 $ rm -f disk.img snap.img
 $ dd if=/dev/urandom of=disk.img bs=2M count=1
 $ dd if=/dev/zero of=snap.img bs=1M count=1
 $ truncate -s 2M snap.img
 $ qemu-img map --output=json snap.img
 [{ "start": 0, "length": 1048576, "depth": 0, "zero": false, "data": true, "offset": 0},
 { "start": 1048576, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": 1048576}]
 $ qemu-img convert -f raw -O qcow2 disk.img disk.qcow2
 $ qemu-img convert -f raw -O qcow2 -B disk.qcow2 snap.img snap.qcow2
 $ qemu-img convert -f qcow2 -O raw snap.qcow2 output.raw
 $ hexdump -C output.raw
 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
 00200000

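The whole output is zeros, although the second MiB is a hole in snap.img and
should therefore still hold the data from disk.img.  And the final qemu-img
convert from qcow2 to raw is not the step that is broken.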
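
For comparison, the result one would expect from the illustration can be built
by hand with plain coreutils.  This is only a sketch: it assumes GNU dd and cmp,
and the file name expected.raw is made up for the example.  It starts from the
old image and overwrites only the range that the snapshot reports as data.

```shell
# Recreate the two test files from the illustration (same sizes).
rm -f disk.img snap.img expected.raw
dd if=/dev/urandom of=disk.img bs=2M count=1 iflag=fullblock status=none
dd if=/dev/zero of=snap.img bs=1M count=1 status=none
truncate -s 2M snap.img

# Expected result: keep the old image and overwrite only the first MiB,
# which is the only range the snapshot reports as data (all zeros here).
cp disk.img expected.raw
dd if=snap.img of=expected.raw bs=1M count=1 conv=notrunc status=none

# The first MiB must be zeros, the second MiB must still match disk.img.
cmp -n 1048576 expected.raw /dev/zero && echo "first MiB: zeros"
cmp -i 1048576 expected.raw disk.img && echo "second MiB: unchanged"
```

Comparing such a file against output.raw with cmp would show the difference
starting at offset 1048576.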

So it looks like either we add support for this specific feature in qemu-img or
we need to use our own client that does that.
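
To make the "own client" idea a bit more concrete, here is a rough sketch of
what such a client would have to do.  It is not an existing tool: copy_extents
is a hypothetical helper, it relies on GNU dd byte-offset flags, and it assumes
the source can be opened as a local file, which is not true for an NBD export
(a real client would have to read through an NBD library instead).  It copies
exactly the listed ranges, zeros included, and leaves the rest of the
destination alone.

```shell
# Hypothetical helper: copy only the byte ranges listed on stdin from SRC
# to DST.  Each input line is "START LENGTH" in bytes.  Data inside a range
# is copied verbatim, so zeros are written as zeros; anything outside the
# listed ranges in DST is not touched.
copy_extents() {
    src=$1
    dst=$2
    while read -r start length; do
        dd if="$src" of="$dst" bs=1M conv=notrunc status=none \
           iflag=skip_bytes,count_bytes oflag=seek_bytes \
           skip="$start" seek="$start" count="$length"
    done
}
```

With jq installed (an assumption), the range list for a local image could be
produced from the map output, for example:

  qemu-img map --output=json snap.img |
      jq -r '.[] | select(.data) | "\(.start) \(.length)"' |
      copy_extents snap.img disk.raw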

Unless someone has other ideas, that is.

Martin
