Re: [Qemu-block] [Qemu-discuss] Converting qcow2 image to raw thin lv
From: Wolfgang Bumiller
Subject: Re: [Qemu-block] [Qemu-discuss] Converting qcow2 image to raw thin lv
Date: Mon, 13 Feb 2017 13:11:51 +0100
User-agent: Mutt/1.5.23 (2014-03-12)
On Mon, Feb 13, 2017 at 11:04:30AM +0100, Kevin Wolf wrote:
> On 12.02.2017 at 01:58, Nir Soffer wrote:
> > On Sat, Feb 11, 2017 at 12:23 AM, Nir Soffer <address@hidden> wrote:
> > > Hi all,
> > >
> > > I'm trying to convert images (mostly qcow2) to raw format on thin lv,
> > > hoping to write only the allocated blocks on the thin lv, but
> > > it seems that qemu-img cannot write sparse image on a block
> > > device.
> > >
> > > (...)
> >
> > So it seems that qemu-img is trying to write a sparse image.
> >
> > I tested again with an empty file:
> >
> > truncate -s 20m empty
> >
> > Using strace, qemu-img checks the device discard_zeroes_data:
> >
> > ioctl(11, BLKDISCARDZEROES, 0) = 0
> >
> > Then it finds that the source is empty:
> >
> > lseek(10, 0, SEEK_DATA) = -1 ENXIO (No such device or address)
> >
> > Then it issues one call
> >
> > [pid 10041] ioctl(11, BLKZEROOUT, 0x7f6049c82ba0) = 0
> >
> > Then it fsyncs and closes the destination.
> >
> > # grep -s "" /sys/block/dm-57/queue/discard_*
> > /sys/block/dm-57/queue/discard_granularity:65536
> > /sys/block/dm-57/queue/discard_max_bytes:17179869184
> > /sys/block/dm-57/queue/discard_zeroes_data:0
> >
> > I wonder why discard_zeroes_data is 0, while discarding
> > blocks seems to zero them.
> >
> > This seems to be this bug:
> > https://bugzilla.redhat.com/835622
> >
> > A thin lv does promise (by default) to zero newly allocated blocks,
> > and it does return zeros when reading unallocated data, like
> > a sparse file.
> >
> > Since qemu does not know that the thin lv is not allocated, it cannot
> > skip empty blocks safely.
> >
> > It would be useful if it had a flag to force sparseness when the
> > user knows that this operation is safe, or maybe we need a thin lvm
> > driver?
>
> Yes, I think your analysis is correct; I seem to remember that I've seen
> this happen before.
>
> The Right Thing (TM) to do, however, seems to be fixing the kernel so
> that BLKDISCARDZEROES correctly returns that discard does in fact zero
> out blocks on this device. As soon as this ioctl works correctly,
> qemu-img should just automatically do what you want.
>
> Now if it turns out it is important to support older kernels without the
> fix, we can think about a driver-specific option for the 'file' driver
> that overrides the kernel's value. But I really want to make sure that
> we use such workarounds only in addition, not instead of doing the
> proper root cause fix in the kernel.
>
> So can you please bring it up with the LVM people?
I'm not sure it's that easy. The discard granularity of LVM thin is not
equal to their reported block/sector sizes, but to the size of the
chunks they allocate.
# blockdev --getss /dev/dm-9
512
# blockdev --getbsz /dev/dm-9
4096
# blockdev --getpbsz /dev/dm-9
4096
# cat /sys/block/dm-9/queue/discard_granularity
131072
#
I currently don't see qemu using the discard_granularity property for
this purpose. IIRC the code for write_zeroes(), e.g., simply checks the
discard_zeroes flag but not what size it is trying to zero out/discard.
We have an experimental semi-complete "can-do-footshooting" 'zeroinit'
filter for this purpose to basically explicitly set the "has_zero_init"
flag and drop "write_zeroes()" calls to blocks at an address greater
than the highest written one up to that point.
It should use a dirty bitmap instead and is sort of dangerous this way,
which is why it's not on the qemu-devel list. But if this approach is at
all acceptable (despite being a hack) I could improve it and send it to
the list?
https://github.com/Blub/qemu/commit/6f6f38d2ef8f22a12f72e4d60f8a1fa978ac569a
(you'd just prefix the destination with `zeroinit:` in the qemu-img
command)
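For example (destination path made up, and this assumes a build from that
branch since the zeroinit driver isn't in upstream qemu), converting onto a
thin LV would look roughly like:
# qemu-img convert -p -O raw source.qcow2 zeroinit:/dev/vg0/thin_lv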
Additionally I'm currently still playing with the details and quirks of
various storages (lvm/dm thin, rbd, zvols) in an attempt to create a
tool to convert between various storages. (I did some successful tests
converting disk images between these storages & qcow2 together with
their snapshots in a COW-aware way...) I'm planning on releasing some
experimental code soon-ish (there's still some polishing to do, though, on
the documentation, the library's API and the format - and the qcow2
support is a patch for qemu-img to use the library.)
My adventures into dm-thin metadata allow me to answer this one though:
> > or maybe we need a thin lvm driver?
Probably not. It does not support SEEK_DATA/SEEK_HOLE and to my
knowledge also has no other sane metadata querying methods. You'd have
to read the metadata device instead. To do this properly you have to
reserve a metadata snapshot and there can only ever be one of those per
pool, which means you could only have 1 such disk in total running on a
system and no other dm-thin metadata-aware tool could be used during
that time (otherwise the reserve operations will fail with an error and
qemu would have to wait&retry a lot...).
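For completeness, reading the mappings that way means something along these
lines (pool and metadata device names are made up, and this is just a sketch
of the dance, not anything qemu does today):
# dmsetup message /dev/mapper/vg0-pool-tpool 0 reserve_metadata_snap
# thin_dump --metadata-snap /dev/mapper/vg0-pool_tmeta
# dmsetup message /dev/mapper/vg0-pool-tpool 0 release_metadata_snap
and while that snapshot is reserved, any other tool trying to reserve one
gets an error back.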