From: Michael Tokarev
Subject: [Stable-8.2.1 64/71] block/blklogwrites: Fix a bug when logging "write zeroes" operations.
Date: Sun, 28 Jan 2024 20:50:27 +0300

From: Ari Sundholm <ari@tuxera.com>

There is a bug in the blklogwrites driver in how it logs "write zeroes"
operations, which causes log corruption. This can be easily observed
by setting detect-zeroes to something other than "off" for the driver.

The issue is a concurrency bug stemming from the fact that "write
zeroes" operations have to be logged in two parts: first the log entry
metadata, then the zeroed-out region. While the log entry metadata is
being written by bdrv_co_pwritev(), another operation may begin in the
meantime and modify the state of the blklogwrites driver. This is as
intended by the coroutine-driven I/O model in QEMU, of course.

Unfortunately, this specific scenario is mishandled. A short example:

 1. Initially, in the current operation (#1), the current log sector
    number in the driver state is only incremented by the number of
    sectors taken by the log entry metadata, after which the log entry
    metadata is written. The current operation yields.

 2. Another operation (#2) may start while the log entry metadata is
    being written. It uses the current log position as the start offset
    for its log entry. This is in the sector right after the operation
    #1 log entry metadata, which is bad!

 3. After bdrv_co_pwritev() returns (#1), the current log sector number
    is reread from the driver state in order to find out the start
    offset for bdrv_co_pwrite_zeroes(). This is an obvious blunder, as
    the offset will be the sector right after the (misplaced) operation
    #2 log entry, which means that the zeroed-out region begins at the
    wrong offset.

 4. As a result of the above, the log is corrupt.

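The interleaving in steps 1 to 3 can be replayed deterministically. The following is only a minimal sketch of the pre-fix bookkeeping, not the driver code: `struct log_state`, `old_begin_entry()`, `old_place_zeroes()` and `replay_old_interleaving()` are hypothetical names, and the sector counts (1 metadata sector, an 8-sector zeroed-out region, a 9-sector entry for operation #2) are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver state. */
struct log_state {
    uint64_t cur_log_sector;
};

/* Pre-fix step 1: reserve only the sectors written right now. */
static uint64_t old_begin_entry(struct log_state *s, uint64_t nr_sectors)
{
    uint64_t start = s->cur_log_sector;
    s->cur_log_sector += nr_sectors;
    return start;
}

/* Pre-fix step 3: reread the shared state after the yield. */
static uint64_t old_place_zeroes(struct log_state *s, uint64_t zero_sectors)
{
    uint64_t start = s->cur_log_sector;
    s->cur_log_sector += zero_sectors;
    return start;
}

struct replay {
    uint64_t op1_entry;   /* start sector of the op #1 metadata      */
    uint64_t op2_entry;   /* start sector of the op #2 log entry     */
    uint64_t op1_zeroes;  /* start sector of the op #1 zeroed region */
};

/* Replays the interleaving with assumed sizes; the log starts at sector 1. */
static struct replay replay_old_interleaving(void)
{
    struct log_state s = { .cur_log_sector = 1 };
    struct replay r;

    r.op1_entry  = old_begin_entry(&s, 1);  /* op #1 writes metadata, yields */
    r.op2_entry  = old_begin_entry(&s, 9);  /* op #2 starts in the meantime  */
    r.op1_zeroes = old_place_zeroes(&s, 8); /* op #1 resumes                 */
    return r;
}
```

With these assumed sizes, operation #2 lands in the sector directly after the operation #1 metadata, where the zeroed-out region of operation #1 belongs, and that region is then placed after the operation #2 entry.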
Fix this by reading the driver state only once, computing the offsets
and sizes in one go (including the optional zeroed-out region) and
setting the log sector number to the appropriate value for the next
operation in line, all before any I/O is issued.

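The "one go" computation can be sketched as a pure function over a snapshot of the state. This is an illustration under stated assumptions rather than the driver code: `plan_entry()`, `struct entry_layout` and `ROUND_UP_TO` are hypothetical names (the macro is a generic stand-in for QEMU's `ROUND_UP`), while the arithmetic follows the description above.

```c
#include <stdint.h>

/* Generic stand-in for QEMU's ROUND_UP(); hypothetical name. */
#define ROUND_UP_TO(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Everything an entry needs, derived from a single read of the state. */
struct entry_layout {
    uint64_t entry_offset;     /* byte offset of the entry metadata         */
    uint64_t zeroes_offset;    /* byte offset of the optional zeroed region */
    uint64_t next_log_sector;  /* value to store for the next operation     */
};

static struct entry_layout plan_entry(uint64_t cur_log_sector,
                                      unsigned sectorbits,
                                      uint64_t qiov_size,
                                      uint64_t zero_size)
{
    const uint64_t sectorsize = UINT64_C(1) << sectorbits;
    const uint64_t qiov_aligned = ROUND_UP_TO(qiov_size, sectorsize);
    const uint64_t entry_aligned =
        qiov_aligned + ROUND_UP_TO(zero_size, sectorsize);
    struct entry_layout l;

    l.entry_offset = cur_log_sector << sectorbits;
    /* The zeroed region follows the sector-aligned metadata directly. */
    l.zeroes_offset = l.entry_offset + qiov_aligned;
    /* Reserve metadata and zeroed region together, before any I/O. */
    l.next_log_sector = cur_log_sector + (entry_aligned >> sectorbits);
    return l;
}
```

For example, with 512-byte sectors (`sectorbits` of 9), a log position of sector 1, 512 bytes of metadata and a 4096-byte zeroed region, the sketch yields an entry offset of 512, a zeroes offset of 1024 and a next log sector of 10. A concurrent operation that starts during the metadata write therefore begins at sector 10 and cannot land inside the reserved range.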
Signed-off-by: Ari Sundholm <ari@tuxera.com>
Cc: qemu-stable@nongnu.org
Message-ID: <20240109184646.1128475-1-megari@gmx.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit a9c8ea95470c27a8a02062b67f9fa6940e828ab6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 3678f6cf42..84e03f309f 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -328,22 +328,39 @@ static void coroutine_fn GRAPH_RDLOCK
 blk_log_writes_co_do_log(BlkLogWritesLogReq *lr)
 {
     BDRVBlkLogWritesState *s = lr->bs->opaque;
-    uint64_t cur_log_offset = s->cur_log_sector << s->sectorbits;
 
-    s->nr_entries++;
-    s->cur_log_sector +=
-            ROUND_UP(lr->qiov->size, s->sectorsize) >> s->sectorbits;
+    /*
+     * Determine the offsets and sizes of different parts of the entry, and
+     * update the state of the driver.
+     *
+     * This needs to be done in one go, before any actual I/O is done, as the
+     * log entry may have to be written in two parts, and the state of the
+     * driver may be modified by other driver operations while waiting for the
+     * I/O to complete.
+     */
+    const uint64_t entry_start_sector = s->cur_log_sector;
+    const uint64_t entry_offset = entry_start_sector << s->sectorbits;
+    const uint64_t qiov_aligned_size = ROUND_UP(lr->qiov->size, s->sectorsize);
+    const uint64_t entry_aligned_size = qiov_aligned_size +
+        ROUND_UP(lr->zero_size, s->sectorsize);
+    const uint64_t entry_nr_sectors = entry_aligned_size >> s->sectorbits;
 
-    lr->log_ret = bdrv_co_pwritev(s->log_file, cur_log_offset, lr->qiov->size,
+    s->nr_entries++;
+    s->cur_log_sector += entry_nr_sectors;
+
+    /*
+     * Write the log entry. Note that if this is a "write zeroes" operation,
+     * only the entry header is written here, with the zeroing being done
+     * separately below.
+     */
+    lr->log_ret = bdrv_co_pwritev(s->log_file, entry_offset, lr->qiov->size,
                                   lr->qiov, 0);
 
     /* Logging for the "write zeroes" operation */
     if (lr->log_ret == 0 && lr->zero_size) {
-        cur_log_offset = s->cur_log_sector << s->sectorbits;
-        s->cur_log_sector +=
-            ROUND_UP(lr->zero_size, s->sectorsize) >> s->sectorbits;
+        const uint64_t zeroes_offset = entry_offset + qiov_aligned_size;
 
-        lr->log_ret = bdrv_co_pwrite_zeroes(s->log_file, cur_log_offset,
+        lr->log_ret = bdrv_co_pwrite_zeroes(s->log_file, zeroes_offset,
                                             lr->zero_size, 0);
     }
 
-- 
2.39.2