[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above |
Date: |
Tue, 19 Nov 2019 12:17:49 +0000 |
19.11.2019 15:05, Kevin Wolf wrote:
> Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi all!
>>
>> I wanted to understand, what is the real difference between
>> bdrv_block_status_above and bdrv_is_allocated_above, IMHO
>> bdrv_is_allocated_above should work through bdrv_block_status_above..
>>
>> And I found the problem: bdrv_is_allocated_above considers space after
>> EOF as UNALLOCATED for intermediate nodes..
>>
>> UNALLOCATED is not about allocation at fs level, but about should we
>> go to backing or not.. And it seems incorrect for me, as in case of
>> short backing file, we'll read zeroes after EOF, instead of going
>> further by backing chain.
>
> We actually have documentation what it means:
>
> * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
> * layer rather than any backing, set by block layer
>
> Say we have a short overlay, like this:
>
> base.qcow2: AAAAAAAA
> overlay.qcow2: BBBB
>
> Then of course the content of block 5 (the one after EOF of
> overlay.qcow2) is still determined by overlay.qcow2, which can be easily
> verified by reading it from overlay.qcow2 (produces zeros) and from
> base.qcow2 (produces As).
>
> So the correct result when querying the block status of block 5 on
> overlay.qcow2 is BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
>
> Interestingly, we already fixed the opposite case (large overlay over
> short backing file) in commit e88ae2264d9 from May 2014 according to the
> same logic.
>
>> This leads to the following effect:
>>
>> ./qemu-img create -f qcow2 base.qcow2 2M
>> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
>>
>> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
>> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
>>
>> Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
>> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
>> read 1048576/1048576 bytes at offset 1048576
>> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
>>
>> But after commit guest visible state is changed, which seems wrong for me:
>> ./qemu-img commit top.qcow2 -b mid.qcow2
>>
>> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
>> Pattern verification failed at offset 1048576, 1048576 bytes
>> read 1048576/1048576 bytes at offset 1048576
>> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
>>
>> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
>> read 1048576/1048576 bytes at offset 1048576
>> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
>>
>>
>> I don't know, is it a real bug, as I don't know, do we support backing
>> file larger than its parent. Still, I'm not sure that this behavior of
>> bdrv_is_allocated_above don't lead to other problems.
>
> I agree it's a bug.
>
> Your fix doesn't look right to me, though. You leave the buggy behaviour
> of bdrv_co_block_status() as it is and then add four patches to work
> around it in some (but not all) callers of it.
>
> All that it should take to fix this is making the bs->backing check
> independent from want_zero and let it set ALLOCATED. What I expected
> would be something like the below patch.
>
> But it doesn't seem to fully fix the problem (though 'alloc 1M 1M' in
> qemu-io shows that the range is now considered allocated), so probably
> there is still a separate bug in bdrv_is_allocated_above().
>
> And I think we'll want an iotests case for both cases (short overlay,
> short backing file).
>
> Kevin
>
>
> diff --git a/block/io.c b/block/io.c
> index f75777f5ea..5eafcff01a 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2359,16 +2359,17 @@ static int coroutine_fn
> bdrv_co_block_status(BlockDriverState *bs,
>
> if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
> ret |= BDRV_BLOCK_ALLOCATED;
> - } else if (want_zero) {
> - if (bdrv_unallocated_blocks_are_zero(bs)) {
> - ret |= BDRV_BLOCK_ZERO;
> - } else if (bs->backing) {
> - BlockDriverState *bs2 = bs->backing->bs;
> - int64_t size2 = bdrv_getlength(bs2);
> -
> - if (size2 >= 0 && offset >= size2) {
> + } else if (want_zero && bdrv_unallocated_blocks_are_zero(bs)) {
> + ret |= BDRV_BLOCK_ZERO;
> + } else if (bs->backing) {
> + BlockDriverState *bs2 = bs->backing->bs;
> + int64_t size2 = bdrv_getlength(bs2);
> +
> + if (size2 >= 0 && offset >= size2) {
> + if (want_zero) {
> ret |= BDRV_BLOCK_ZERO;
> }
> + ret |= BDRV_BLOCK_ALLOCATED;
No, I think it's wrong, as it's not allocated at bs, but at bs->backing->bs.
So, bdrv_co_block_status works correct, what we can change about it, is not
to return pnum=0 if requested range after eof, but return pnum=bytes, together
with BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO | BDRV_BLOCK_EOF.
And actual problem is in bdrv_block_status_above and bdrv_is_allocated_above,
which
I'm fixing.
--
Best regards,
Vladimir
- Re: [PATCH 1/4] block/io: fix bdrv_co_block_status_above, (continued)
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Max Reitz, 2019/11/19
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Kevin Wolf, 2019/11/19
- Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above,
Vladimir Sementsov-Ogievskiy <=
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Kevin Wolf, 2019/11/19
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Kevin Wolf, 2019/11/19
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Stefan Hajnoczi, 2019/11/19
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Vladimir Sementsov-Ogievskiy, 2019/11/20