[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above |
Date: |
Wed, 20 Nov 2019 13:37:23 +0000 |
20.11.2019 15:04, Vladimir Sementsov-Ogievskiy wrote:
> 20.11.2019 14:44, Kevin Wolf wrote:
>> Am 20.11.2019 um 11:20 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> 16.11.2019 19:34, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all!
>>>>
>>>> I wanted to understand, what is the real difference between
>>>> bdrv_block_status_above
>>>> and bdrv_is_allocated_above, IMHO bdrv_is_allocated_above should work
>>>> through
>>>> bdrv_block_status_above..
>>>>
>>>> And I found the problem: bdrv_is_allocated_above considers space after EOF
>>>> as
>>>> UNALLOCATED for intermediate nodes..
>>>>
>>>> UNALLOCATED is not about allocation at fs level, but about should we go to
>>>> backing or
>>>> not.. And it seems incorrect for me, as in case of short backing file,
>>>> we'll read
>>>> zeroes after EOF, instead of going further by backing chain.
>>>>
>>>> This leads to the following effect:
>>>>
>>>> ./qemu-img create -f qcow2 base.qcow2 2M
>>>> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
>>>>
>>>> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
>>>> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
>>>>
>>>> Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
>>>> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
>>>> read 1048576/1048576 bytes at offset 1048576
>>>> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
>>>>
>>>> But after commit guest visible state is changed, which seems wrong for me:
>>>> ./qemu-img commit top.qcow2 -b mid.qcow2
>>>>
>>>> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
>>>> Pattern verification failed at offset 1048576, 1048576 bytes
>>>> read 1048576/1048576 bytes at offset 1048576
>>>> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
>>>>
>>>> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
>>>> read 1048576/1048576 bytes at offset 1048576
>>>> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
>>>>
>>>>
>>>> I don't know, is it a real bug, as I don't know, do we support backing
>>>> file larger than
>>>> its parent. Still, I'm not sure that this behavior of
>>>> bdrv_is_allocated_above don't lead
>>>> to other problems.
>>>>
>>>> =====
>>>>
>>>> Hmm, bdrv_block_allocated_above behaves strange too:
>>>>
>>>> with want_zero=true, it may report unallocated zeroes because of short
>>>> backing files, which
>>>> are actually "allocated" in POV of backing chains. But I see this may
>>>> influence only
>>>> qemu-img compare, and I don't see can it trigger some bug..
>>>>
>>>> with want_zero=false, it may do no progress because of short backing file.
>>>> Moreover it may
>>>> report EOF in the middle!! But want_zero=false used only in
>>>> bdrv_is_allocated, which considers
>>>> onlyt top layer, so it seems OK.
>>>>
>>>> =====
>>>>
>>>> So, I propose these series, still I'm not sure is there a real bug.
>>>>
>>>> Vladimir Sementsov-Ogievskiy (4):
>>>> block/io: fix bdrv_co_block_status_above
>>>> block/io: bdrv_common_block_status_above: support include_base
>>>> block/io: bdrv_common_block_status_above: support bs == base
>>>> block/io: fix bdrv_is_allocated_above
>>>>
>>>> block/io.c | 104 ++++++++++++++++++-------------------
>>>> tests/qemu-iotests/154.out | 4 +-
>>>> 2 files changed, 53 insertions(+), 55 deletions(-)
>>>>
>>>
>>>
>>> Interesting that the problem illustrated here is not fixed by the series,
>>> it's actually
>>> relates to the fact that mirror does truncation with PREALLOC_MODE_OFF,
>>> which leads
>>> to unallocated qcow2 clusters, which I think should be fixed too.
>>
>> Yes, this is what I posted yesterday. (With a suggested quick fix, but
>> it turns out it was not quite correct, see below.)
>>
>>> To illustrate the problem fixed by the series, we should commit to base:
>>>
>>> # ./qemu-img commit top.qcow2 -b base.qcow2
>>> Image committed.
>>> # ./qemu-io -c "read -P 0 1M 1M" base.qcow2
>>> Pattern verification failed at offset 1048576, 1048576 bytes
>>> read 1048576/1048576 bytes at offset 1048576
>>> 1 MiB, 1 ops; 00.00 sec (5.366 GiB/sec and 5494.4149 ops/sec)
>>
>> Ok, I'll try that later.
>>
>>> Hmm, but how to fix the problem about truncate? I think truncate must
>>> not make underlying backing available for read.. Discard operation
>>> doesn't do it.
>>>
>>> So, actually on PREALLOC_MODE_OFF we must allocated L2 tables and mark
>>> new clusters as ZERO?
>>
>> Yes, we need to write zeroes to the new area if the backing file covers
>> it. We need to do this not only in mirror/commit/bdrv_commit(), but in
>> fact for all truncate operations: Berto mentioned on IRC yesterday that
>> you can get into the same situation with 'block_resize' monitor
>> commands.
>>
>> So I tried to fix this yesterday, and I thought that I had a fix, when I
>> noticed that bdrv_co_do_zero_pwritev() takes a 32 bit bytes parameter.
>> So I'll still need to fix this. Other than that, I suppose the following
>> fix should work (but is probably a bit too invasive for -rc3).
>>
>> Kevin
>>
>> diff --git a/block/io.c b/block/io.c
>> index f75777f5ea..4118bf0118 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -3382,6 +3382,32 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child,
>> int64_t offset, bool exact,
>> goto out;
>> }
>>
>> + /*
>> + * If the image has a backing file that is large enough that it would
>> + * provide data for the new area, we cannot leave it unallocated because
>> + * then the backing file content would become visible. Instead,
>> zero-fill
>> + * the area where backing file and new area overlap.
>> + */
>> + if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
>> + int64_t backing_len;
>> +
>> + backing_len = bdrv_getlength(backing_bs(bs));
>> + if (backing_len < 0) {
>> + ret = backing_len;
>> + goto out;
>> + }
>> +
>> + if (backing_len > old_size) {
>> + /* FIXME bytes parameter is 32 bits */
>> + ret = bdrv_co_do_zero_pwritev(child, old_size,
>> + MIN(new_bytes, backing_len -
>> old_size),
>> + BDRV_REQ_ZERO_WRITE |
>> BDRV_REQ_MAY_UNMAP, &req);
>> + if (ret < 0) {
>> + goto out;
>> + }
>> + }
>> + }
>> +
>> ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>> if (ret < 0) {
>> error_setg_errno(errp, -ret, "Could not refresh total sector
>> count");
>>
>
> I'm not sure that it is safe enough: we may not have opened backing at the
> moment, but
> still it may exist and managed by user.
>
> PREALLOC_MODE_OFF is documented as
> # @off: no preallocation
>
> - not very descriptive, but I think it's nothing about making backing file
> available
> through new clusters.
>
> I think PREALLOC_MODE_OFF should always make new clusters
> "BDRV_BLOCK_ALLOCATED". If
> for some scenarios (are they exist at all?) we need to preallocate cluster in
> manner
> that backing file would be visible through them, we'd better add another
> preallocation
> mode which will directly document this behaviour, like PREALLOC_MODE_BACKING.
>
> So, I'd consider PREALLOC_MODE_OFF as something that must not create
> UNALLOCATED (in POV
> of backing chains) clusters, and should be fixed in all formats.. Or as a
> quick fix may
> we may write zeros from bdrv_co_truncate, but independently of backing file
> existence
> and length.
>
> ===
Or visa-versa, if we can't change current qemu-img-create default, which means
preallocation="off"
=== UNALLOCATED transaprent image, we may improve preallocation:"off"
specification, mentioning
backing files, and add preallocation:"zero" mode, to be used in truncate.
>
> Also I think it's a wrong thing at all that qcow2 new file is transparent by
> default..
> It should be transparent only when we create snapshots and incremental
> backups. But when
> we create new disk for new vm it should be zeroed (and extending L1 table
> entry spec by
> "zero bit" may help)
>
--
Best regards,
Vladimir
- Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, (continued)
[PATCH 5/4] iotests: add commit top->base cases to 274, Vladimir Sementsov-Ogievskiy, 2019/11/20
Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above, Vladimir Sementsov-Ogievskiy, 2019/11/25