
Re: [PATCH v7 3/4] qcow2: add zoned emulation capability


From: Sam Li
Subject: Re: [PATCH v7 3/4] qcow2: add zoned emulation capability
Date: Mon, 23 Sep 2024 15:40:54 +0200

Hi Damien,

Damien Le Moal <dlemoal@kernel.org> 于2024年9月23日周一 15:22写道:
>
> On 2024/09/23 13:06, Sam Li wrote:
>
> [...]
>
> >>> @@ -2837,6 +3180,19 @@ qcow2_co_pwritev_part(BlockDriverState *bs, int64_t offset, int64_t bytes,
> >>>          qiov_offset += cur_bytes;
> >>>          trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
> >>>      }
> >>> +
> >>> +    if (bs->bl.zoned == BLK_Z_HM) {
> >>> +        index = start_offset / zone_size;
> >>> +        wp = &bs->wps->wp[index];
> >>> +        if (!QCOW2_ZT_IS_CONV(*wp)) {
> >>> +            /* Advance the write pointer when the write completes */
> >>
> >> Updating the write pointer after I/O does not prevent other write
> >> requests from beginning at the same offset as this request. Multiple
> >> write request coroutines can run concurrently and only the first one
> >> should succeed. The others should fail if they are using the same
> >> offset.
> >>
> >> The comment above says "Real drives change states before it can write to
> >> the zone" and I think it's appropriate to update the write pointer
> >> before performing the write too. The qcow2 zone emulation code is
> >> different from the file-posix.c passthrough code. We are responsible for
> >> maintaining zoned metadata state and cannot wait for the result of the
> >> I/O to tell us what happened.
>
> Yes, correct. The wp MUST be updated when issuing the IO, with the assumption
> that the write IO will succeed (errors are rare!).
>
> > The problem with updating the write pointer before IO completion is the
> > failure case. It can't be predicted in advance whether an IO will fail.
> > When a write I/O fails, the wp should not be updated.
>
> Correct, if an IO fails, the wp should not be updated. However, that is not
> difficult to deal with:
> 1) Under the zone lock, advance the wp position when issuing the write IO.
> 2) When the write IO completes with success, nothing else needs to be done.
> 3) When *any* write IO completes with error you need to:
>         - Lock the zone
>         - Do a report zone for the target zone of the failed write to get
>           the current wp location
>         - Update bs->wps->wp[index] using that current wp location
>         - Unlock the zone
>
> With that, one may get a few errors if multiple async writes are being issued,
> but that behavior is consistent with what happens with a real drive. So no
> issue. And since report zones gets you the current wp location, the user can
> restart writing from that location once it has dealt with all the previous
> write failures.

I see. To allow concurrent writes, the lock will then only be taken on
the failure path when processing append writes.
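
Something like the locking/ordering below, then. This is only a rough
standalone model (pthread mutex instead of the block layer's CoMutex,
made-up names, no real IO), not the actual qcow2 code:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Zone {
    pthread_mutex_t lock;      /* stands in for the per-wps lock */
    uint64_t wp;               /* emulated write pointer, in bytes */
} Zone;

/* Stand-in for "report zones": in the real driver this would re-read
 * the authoritative wp from the zoned metadata, not the cached copy. */
static uint64_t report_zone_wp(const Zone *z)
{
    return z->wp;
}

/* On submission: claim the zone's wp and advance it, assuming the
 * write will succeed. A second write aimed at the same offset loses
 * the race and is rejected, as it would be on a real drive. */
static bool zone_issue_write(Zone *z, uint64_t offset, uint64_t len)
{
    bool ok = false;

    pthread_mutex_lock(&z->lock);
    if (offset == z->wp) {
        z->wp += len;          /* advance before the IO completes */
        ok = true;
    }
    pthread_mutex_unlock(&z->lock);
    return ok;
}

/* Error path only: resync the wp from the reported zone state. */
static void zone_write_failed(Zone *z)
{
    pthread_mutex_lock(&z->lock);
    z->wp = report_zone_wp(z);
    pthread_mutex_unlock(&z->lock);
}

int main(void)
{
    Zone z = { PTHREAD_MUTEX_INITIALIZER, 0 };

    printf("first write accepted: %d\n", zone_issue_write(&z, 0, 4096));
    printf("same-offset write:    %d\n", zone_issue_write(&z, 0, 4096));
    zone_write_failed(&z);     /* would roll wp back to the reported value */
    return 0;
}

On the success path nothing touches the lock after submission, so
concurrent sequential writes are not serialized.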

>
> > The alternative is to hold the wps lock, which is also required for
> > accessing the wp. Then only one of multiple concurrent write requests
> > will succeed.
>
> That is a very simple solution that avoids the above error recovery, but it
> would be very bad for performance (especially for a pure sequential write
> workload, as we would limit IOs to queue depth 1). So if we can avoid this
> simple approach, that would be a lot better.

Yeah, I'll drop this approach. Although, it reminds me of how the
file-posix driver emulates zone_append: it holds the lock whenever it
accesses the wps. Does that limit IOs to QD 1 too? If so, it can be
improved (a possible reordering is sketched after the link below). The
current flow is:
-- one zone_append starts
>> wp_lock()
>>> IO processing
>>>> wp_update
>>>>> wp_unlock()
-- ends

https://github.com/qemu/qemu/blob/master/block/file-posix.c#L2492
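
If it is, a possible improvement (just a sketch of the ordering, relying
on the error handling described above) would be to keep the lock off the
IO itself:

-- one zone_append starts
>> wp_lock()
>>> wp_update (assume the write will succeed)
>>>> wp_unlock()
>>>>> IO processing (other appends to the zone may run concurrently)
-- on error: wp_lock(); report zone; fix up wp; wp_unlock()
-- ends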

Sam

>
>
> --
> Damien Le Moal
> Western Digital Research


