qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] block/blklogwrites: Fix a bug when logging "write zeroes" op


From: Ari Sundholm
Subject: Re: [PATCH] block/blklogwrites: Fix a bug when logging "write zeroes" operations.
Date: Wed, 10 Jan 2024 17:21:28 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

Hi, Kevin!

On 1/10/24 15:39, Kevin Wolf wrote:
Am 09.01.2024 um 19:46 hat megari@gmx.com geschrieben:
From: Ari Sundholm <ari@tuxera.com>

There is a bug in the blklogwrites driver pertaining to logging "write
zeroes" operations, causing log corruption. This can be easily observed
by setting detect-zeroes to something other than "off" for the driver.

The issue is caused by a concurrency bug pertaining to the fact that
"write zeroes" operations have to be logged in two parts: first the log
entry metadata, then the zeroed-out region. While the log entry
metadata is being written by bdrv_co_pwritev(), another operation may
begin in the meanwhile and modify the state of the blklogwrites driver.
This is as intended by the coroutine-driven I/O model in QEMU, of
course.

Unfortunately, this specific scenario is mishandled. A short example:
     1. Initially, in the current operation (#1), the current log sector
number in the driver state is only incremented by the number of sectors
taken by the log entry metadata, after which the log entry metadata is
written. The current operation yields.
     2. Another operation (#2) may start while the log entry metadata is
being written. It uses the current log position as the start offset for
its log entry. This is in the sector right after the operation #1 log
entry metadata, which is bad!
     3. After bdrv_co_pwritev() returns (#1), the current log sector
number is reread from the driver state in order to find out the start
offset for bdrv_co_pwrite_zeroes(). This is an obvious blunder, as the
offset will be the sector right after the (misplaced) operation #2 log
entry, which means that the zeroed-out region begins at the wrong
offset.
     4. As a result of the above, the log is corrupt.

Fix this by only reading the driver metadata once, computing the
offsets and sizes in one go (including the optional zeroed-out region)
and setting the log sector number to the appropriate value for the next
operation in line.

Signed-off-by: Ari Sundholm <ari@tuxera.com>
Cc: qemu-stable@nongnu.org

Thanks, applied to the block branch.


Thank you.

Note that while this fixes the single threaded case, it is not thread
safe and will still break with multiqueue enabled (see the new
iothread-vq-mapping option added to virtio-blk very recently). Most
block drivers already take a lock when modifying their global state, and
it looks like we missed blklogwrites when checking what needs to be
prepared for the block layer to be thread safe.


I see. Thanks for the heads up. I had missed this new development.

So I think we'll want another patch to add a QemuMutex that can be taken
while you do the calculations on s->cur_log_sector. But this patch is
a good first step because it means that we don't need to keep a lock
across an I/O request (just for the sake of completeness, this would
have had to be a CoMutex rather than a QemuMutex).


OK. I guess I have a bit of additional work to do, then. What release would these fixes realistically make it to? Just trying to gauge the urgency for the fix for the multi-threaded case for prioritization purposes.

And yes, holding a CoMutex while doing I/O would have fixed this issue, with the tiny drawback of killing any concurrency in the driver. ;)

Best regards,
Ari Sundholm
ari@tuxera.com

Kevin





reply via email to

[Prev in Thread] Current Thread [Next in Thread]