[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
From: |
Kevin Wolf |
Subject: |
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above |
Date: |
Wed, 20 Nov 2019 12:44:08 +0100 |
User-agent: |
Mutt/1.12.1 (2019-06-15) |
Am 20.11.2019 um 11:20 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 16.11.2019 19:34, Vladimir Sementsov-Ogievskiy wrote:
> > Hi all!
> >
> > I wanted to understand, what is the real difference between
> > bdrv_block_status_above
> > and bdrv_is_allocated_above, IMHO bdrv_is_allocated_above should work
> > through
> > bdrv_block_status_above..
> >
> > And I found the problem: bdrv_is_allocated_above considers space after EOF
> > as
> > UNALLOCATED for intermediate nodes..
> >
> > UNALLOCATED is not about allocation at fs level, but about should we go to
> > backing or
> > not.. And it seems incorrect for me, as in case of short backing file,
> > we'll read
> > zeroes after EOF, instead of going further by backing chain.
> >
> > This leads to the following effect:
> >
> > ./qemu-img create -f qcow2 base.qcow2 2M
> > ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
> >
> > ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
> > ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
> >
> > Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
> > ./qemu-io -c "read -P 0 1M 1M" top.qcow2
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
> >
> > But after commit guest visible state is changed, which seems wrong for me:
> > ./qemu-img commit top.qcow2 -b mid.qcow2
> >
> > ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
> > Pattern verification failed at offset 1048576, 1048576 bytes
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
> >
> > ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
> >
> >
> > I don't know, is it a real bug, as I don't know, do we support backing file
> > larger than
> > its parent. Still, I'm not sure that this behavior of
> > bdrv_is_allocated_above don't lead
> > to other problems.
> >
> > =====
> >
> > Hmm, bdrv_block_allocated_above behaves strange too:
> >
> > with want_zero=true, it may report unallocated zeroes because of short
> > backing files, which
> > are actually "allocated" in POV of backing chains. But I see this may
> > influence only
> > qemu-img compare, and I don't see can it trigger some bug..
> >
> > with want_zero=false, it may do no progress because of short backing file.
> > Moreover it may
> > report EOF in the middle!! But want_zero=false used only in
> > bdrv_is_allocated, which considers
> > onlyt top layer, so it seems OK.
> >
> > =====
> >
> > So, I propose these series, still I'm not sure is there a real bug.
> >
> > Vladimir Sementsov-Ogievskiy (4):
> > block/io: fix bdrv_co_block_status_above
> > block/io: bdrv_common_block_status_above: support include_base
> > block/io: bdrv_common_block_status_above: support bs == base
> > block/io: fix bdrv_is_allocated_above
> >
> > block/io.c | 104 ++++++++++++++++++-------------------
> > tests/qemu-iotests/154.out | 4 +-
> > 2 files changed, 53 insertions(+), 55 deletions(-)
> >
>
>
> Interesting that the problem illustrated here is not fixed by the series,
> it's actually
> relates to the fact that mirror does truncation with PREALLOC_MODE_OFF, which
> leads
> to unallocated qcow2 clusters, which I think should be fixed too.
Yes, this is what I posted yesterday. (With a suggested quick fix, but
it turns out it was not quite correct, see below.)
> To illustrate the problem fixed by the series, we should commit to base:
>
> # ./qemu-img commit top.qcow2 -b base.qcow2
> Image committed.
> # ./qemu-io -c "read -P 0 1M 1M" base.qcow2
> Pattern verification failed at offset 1048576, 1048576 bytes
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (5.366 GiB/sec and 5494.4149 ops/sec)
Ok, I'll try that later.
> Hmm, but how to fix the problem about truncate? I think truncate must
> not make underlying backing available for read.. Discard operation
> doesn't do it.
>
> So, actually on PREALLOC_MODE_OFF we must allocated L2 tables and mark
> new clusters as ZERO?
Yes, we need to write zeroes to the new area if the backing file covers
it. We need to do this not only in mirror/commit/bdrv_commit(), but in
fact for all truncate operations: Berto mentioned on IRC yesterday that
you can get into the same situation with 'block_resize' monitor
commands.
So I tried to fix this yesterday, and I thought that I had a fix, when I
noticed that bdrv_co_do_zero_pwritev() takes a 32 bit bytes parameter.
So I'll still need to fix this. Other than that, I suppose the following
fix should work (but is probably a bit too invasive for -rc3).
Kevin
diff --git a/block/io.c b/block/io.c
index f75777f5ea..4118bf0118 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3382,6 +3382,32 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child,
int64_t offset, bool exact,
goto out;
}
+ /*
+ * If the image has a backing file that is large enough that it would
+ * provide data for the new area, we cannot leave it unallocated because
+ * then the backing file content would become visible. Instead, zero-fill
+ * the area where backing file and new area overlap.
+ */
+ if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
+ int64_t backing_len;
+
+ backing_len = bdrv_getlength(backing_bs(bs));
+ if (backing_len < 0) {
+ ret = backing_len;
+ goto out;
+ }
+
+ if (backing_len > old_size) {
+ /* FIXME bytes parameter is 32 bits */
+ ret = bdrv_co_do_zero_pwritev(child, old_size,
+ MIN(new_bytes, backing_len -
old_size),
+ BDRV_REQ_ZERO_WRITE |
BDRV_REQ_MAY_UNMAP, &req);
+ if (ret < 0) {
+ goto out;
+ }
+ }
+ }
+
ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not refresh total sector count");
- Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, (continued)
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Kevin Wolf, 2019/11/19
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Stefan Hajnoczi, 2019/11/19
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Vladimir Sementsov-Ogievskiy, 2019/11/20
[PATCH 5/4] iotests: add commit top->base cases to 274, Vladimir Sementsov-Ogievskiy, 2019/11/20
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Vladimir Sementsov-Ogievskiy, 2019/11/25