From: Denis V. Lunev
Subject: Re: [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroes()
Date: Wed, 27 Apr 2016 10:07:35 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 04/26/2016 01:19 PM, Kevin Wolf wrote:

[skipped]
> Did you ever talk to the kernel people?
>
> We can try to make the best out of suboptimal requests in qemu, but it
> looks to me as if the real fix were in the kernel, and if we don't get
> it fixed there, we'll see more and more problems of this kind. I think
> this is relevant not only for VMs, but probably on real hardware as
> well.

It is difficult, and it can only be done partially :(


>> Locally with qemu-io the reproduction is simple. We can repeat the
>> above requests or simply do the following:
>>      qemu-io -c "write 0xff 32k 1M" 1.img
> I assume you mean "-z" instead of "0xff"?
Sure.
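That is, the corrected reproduction command is:
     qemu-io -c "write -z 32k 1M" 1.img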


>> The code without the patch will allocate two clusters, for guest
>> offsets 0-64k and 1M-(1M+64k), and perform writes there. The code with
>> the patch will skip the creation of those clusters if possible.
> Okay, that's the second of the optimisations you mentioned in your
> commit message. I can see how this adds something that the generic
> block layer can't easily add, if it can be made safe (I think it can,
> even though your patch doesn't get it completely right yet, see below).
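
For illustration, the allocation difference should show up in qemu-img map
(a sketch, assuming a fresh image with the default 64k cluster size):

     qemu-img create -f qcow2 1.img 64M
     qemu-io -c "write -z 32k 1M" 1.img
     qemu-img map --output=json 1.img

Without the patch, the two partially covered clusters at offsets 0 and 1M
are reported as allocated in 1.img; with it, they can stay unallocated as
long as they already read as zeroes.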

>> I have recorded the parameters of qcow2_co_write_zeroes for reference
>> (1 MB is written, the memory is not aligned, as in the letter above;
>> [sudo ./a.out /dev/nbd3]):
>>
>> qcow2_co_write_zeroes off=0 size=10000
>> qcow2_co_write_zeroes off=1fe00 size=200
>> qcow2_co_write_zeroes off=3fe00 size=200
>> qcow2_co_write_zeroes off=5fe00 size=200
>> qcow2_co_write_zeroes off=7fe00 size=200
>> qcow2_co_write_zeroes off=9fe00 size=200
>> qcow2_co_write_zeroes off=bfe00 size=200
>> qcow2_co_write_zeroes off=dfe00 size=200
>> qcow2_co_write_zeroes off=ffe00 size=200
>> qcow2_co_write_zeroes off=10000 size=fe00
>> qcow2_co_write_zeroes off=20000 size=10000
>> qcow2_co_write_zeroes off=30000 size=fe00
>> qcow2_co_write_zeroes off=60000 size=10000
>> qcow2_co_write_zeroes off=70000 size=fe00
>> qcow2_co_write_zeroes off=c0000 size=10000
>> qcow2_co_write_zeroes off=d0000 size=fe00
>> qcow2_co_write_zeroes off=e0000 size=10000
>> qcow2_co_write_zeroes off=f0000 size=fe00
>> qcow2_co_write_zeroes off=80000 size=10000
>> qcow2_co_write_zeroes off=90000 size=fe00
>> qcow2_co_write_zeroes off=a0000 size=10000
>> qcow2_co_write_zeroes off=b0000 size=fe00
>> qcow2_co_write_zeroes off=40000 size=10000
>> qcow2_co_write_zeroes off=50000 size=fe00
> I don't see any requests here where your code actually ends up splitting
> the request into head, aligned part and tail. Which is expected because
> bdrv_co_do_write_zeroes() already does that.
>
> What I can't see here is whether this actually happened (10000 + fe00
> could be a split request) or whether it already came in this way over
> NBD.
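
Here is the same run traced with bdrv_aligned_pwritev included. Each
10000 + fe00 pair sits under a single bdrv_aligned_pwritev call, i.e. the
splitting happens inside qemu and the requests arrive unsplit over NBD: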

bdrv_aligned_pwritev off=0 size=1fe00
qcow2_co_write_zeroes off=0 size=10000
qcow2_co_write_zeroes off=10000 size=fe00
bdrv_aligned_pwritev off=1fe00 size=20000
qcow2_co_write_zeroes off=1fe00 size=200
qcow2_co_write_zeroes off=20000 size=10000
qcow2_co_write_zeroes off=30000 size=fe00
bdrv_aligned_pwritev off=3fe00 size=20000
qcow2_co_write_zeroes off=3fe00 size=200
qcow2_co_write_zeroes off=40000 size=10000
qcow2_co_write_zeroes off=50000 size=fe00
bdrv_aligned_pwritev off=5fe00 size=20000
qcow2_co_write_zeroes off=5fe00 size=200
qcow2_co_write_zeroes off=60000 size=10000
qcow2_co_write_zeroes off=70000 size=fe00
bdrv_aligned_pwritev off=7fe00 size=20000
qcow2_co_write_zeroes off=7fe00 size=200
qcow2_co_write_zeroes off=80000 size=10000
qcow2_co_write_zeroes off=90000 size=fe00
bdrv_aligned_pwritev off=9fe00 size=20000
qcow2_co_write_zeroes off=9fe00 size=200
qcow2_co_write_zeroes off=a0000 size=10000
qcow2_co_write_zeroes off=b0000 size=fe00
bdrv_aligned_pwritev off=bfe00 size=20000
qcow2_co_write_zeroes off=bfe00 size=200
qcow2_co_write_zeroes off=c0000 size=10000
qcow2_co_write_zeroes off=d0000 size=fe00
bdrv_aligned_pwritev off=dfe00 size=20000
qcow2_co_write_zeroes off=dfe00 size=200
qcow2_co_write_zeroes off=e0000 size=10000
qcow2_co_write_zeroes off=f0000 size=fe00
bdrv_aligned_pwritev off=ffe00 size=200
qcow2_co_write_zeroes off=ffe00 size=200


[skipped]

> Here is the trick that I think will save us:
>
> On a misaligned call, we call bdrv_get_block_status_above() for the
> whole cluster that we're in. We know that it's only a single cluster
> because bdrv_co_do_write_zeroes() splits things this way; only aligned
> requests can be longer than a sector (we can even assert this).
>
> If the result is that the cluster already reads as zero, instead of
> doing nothing and possibly breaking backing chain manipulations, we
> simply extend the write zeroes operation to the whole cluster and
> continue as normal with an aligned request. This way we end up with a
> zero cluster instead of an unallocated one, and that should be safe.
>
> If the result is that the cluster isn't completely zeroed, return
> -ENOTSUP as you did in the snippet above.
>
> That approach should probably result in an (even) simpler patch, too.
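
In code, the suggestion could look roughly like this (a sketch against the
2.6-era block layer, not the actual patch; is_zero_cluster() is an assumed
helper name, and the final qcow2_zero_clusters() call just follows the
existing aligned path):

/* Assumed helper: true iff the whole cluster containing @start reads as
 * zeroes through the backing chain. */
static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
{
    BDRVQcow2State *s = bs->opaque;
    int nr;
    BlockDriverState *file;
    int64_t res = bdrv_get_block_status_above(bs, NULL, start,
                                              s->cluster_sectors, &nr, &file);
    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == s->cluster_sectors;
}

static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
    int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
{
    int ret;
    BDRVQcow2State *s = bs->opaque;
    int head = sector_num % s->cluster_sectors;
    int tail = (sector_num + nb_sectors) % s->cluster_sectors;

    if (head != 0 || tail != 0) {
        int64_t cl_start = sector_num - head;

        /* bdrv_co_do_write_zeroes() has already split the request, so a
         * misaligned fragment never crosses a cluster boundary. */
        assert(cl_start + s->cluster_sectors >= sector_num + nb_sectors);

        if (!is_zero_cluster(bs, cl_start)) {
            /* Cluster contains data: fall back to an explicit write. */
            return -ENOTSUP;
        }

        /* The cluster already reads as zero, so widen the request and
         * leave a zero cluster behind instead of an unallocated one. */
        sector_num = cl_start;
        nb_sectors = s->cluster_sectors;
    }

    qemu_co_mutex_lock(&s->lock);
    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS,
                              nb_sectors);
    qemu_co_mutex_unlock(&s->lock);
    return ret;
}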

Let's start with the simple thing. I'll send a patch in a couple of
minutes. It really looks MUCH better than the original one. Thank you for
this cool suggestion.


> Hm... Or actually, if we want something more complex that will help all
> block drivers, extending the range of the request could even be done in
> bdrv_co_do_write_zeroes(), I guess. I won't insist on it, though.
I do not think that this should be done now. This patch should go into
our own stable branch, and thus it had better be as simple as possible.

Den


