[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v3 1/3] block: zero data data corruption using prealloc-filte
From: |
Kevin Wolf |
Subject: |
Re: [PATCH v3 1/3] block: zero data data corruption using prealloc-filter |
Date: |
Thu, 18 Jul 2024 17:51:20 +0200 |
Am 16.07.2024 um 16:41 hat Andrey Drobyshev geschrieben:
> From: "Denis V. Lunev" <den@openvz.org>
>
> We have observed that some clusters in the QCOW2 files are zeroed
> while preallocation filter is used.
>
> We are able to trace down the following sequence when prealloc-filter
> is used:
> co=0x55e7cbed7680 qcow2_co_pwritev_task()
> co=0x55e7cbed7680 preallocate_co_pwritev_part()
> co=0x55e7cbed7680 handle_write()
> co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes()
> co=0x55e7cbed7680 raw_do_pwrite_zeroes()
> co=0x7f9edb7fe500 do_fallocate()
>
> Here coroutine 0x55e7cbed7680 is being blocked waiting while coroutine
> 0x7f9edb7fe500 will finish with fallocate of the file area. OK. It is
> time to handle next coroutine, which
> co=0x55e7cbee91b0 qcow2_co_pwritev_task()
> co=0x55e7cbee91b0 preallocate_co_pwritev_part()
> co=0x55e7cbee91b0 handle_write()
> co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes()
> co=0x55e7cbee91b0 raw_do_pwrite_zeroes()
> co=0x7f9edb7deb00 do_fallocate()
>
> The trouble comes here. Coroutine 0x55e7cbed7680 has not advanced
> file_end yet and coroutine 0x55e7cbee91b0 will start fallocate() for
> the same area. This means that if (once fallocate is started inside
> 0x7f9edb7deb00) original fallocate could end and the real write will
> be executed. In that case write() request is handled at the same time
> as fallocate().
>
> The patch moves s->file_lock assignment before fallocate and that is
s/file_lock/file_end/?
> crucial. The idea is that all subsequent requests into the area
> being preallocation will be issued as just writes without fallocate
> to this area and they will not proceed thanks to overlapping
> requests mechanics. If preallocation will fail, we will just switch
> to the normal expand-by-write behavior and that is not a problem
> except performance.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> Tested-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
> block/preallocate.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/block/preallocate.c b/block/preallocate.c
> index d215bc5d6d..ecf0aa4baa 100644
> --- a/block/preallocate.c
> +++ b/block/preallocate.c
> @@ -383,6 +383,13 @@ handle_write(BlockDriverState *bs, int64_t offset,
> int64_t bytes,
>
> want_merge_zero = want_merge_zero && (prealloc_start <= offset);
>
> + /*
> + * Assign file_end before making actual preallocation. This will ensure
> + * that next request performed while preallocation is in progress will
> + * be passed without preallocation.
> + */
> + s->file_end = prealloc_end;
> +
> ret = bdrv_co_pwrite_zeroes(
> bs->file, prealloc_start, prealloc_end - prealloc_start,
> BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
> @@ -391,7 +398,6 @@ handle_write(BlockDriverState *bs, int64_t offset,
> int64_t bytes,
> return false;
> }
>
> - s->file_end = prealloc_end;
> return want_merge_zero;
> }
Until bdrv_co_pwrite_zeroes() completes successfully, the file size is
unchanged. In other words, if anything calls preallocate_co_getlength()
in the meantime, don't we run into...
ret = bdrv_co_getlength(bs->file->bs);
if (has_prealloc_perms(bs)) {
s->file_end = s->zero_start = s->data_end = ret;
}
...and reset s->file_end back to the old value, re-enabling the bug
you're trying to fix here?
Or do we know that no such code path can be called concurrently for some
reason?
Kevin