[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-stable] [PATCH v0 2/2] block: postpone the coroutine executing
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-stable] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained |
Date: |
Thu, 13 Sep 2018 10:44:50 +0200 |
User-agent: |
Mutt/1.9.1 (2017-09-22) |
Am 12.09.2018 um 19:03 hat Denis V. Lunev geschrieben:
> On 09/12/2018 04:15 PM, Kevin Wolf wrote:
> > Am 12.09.2018 um 14:03 hat Denis Plotnikov geschrieben:
> >> On 10.09.2018 15:41, Kevin Wolf wrote:
> >>> Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:
> >>>> Fixes the problem of ide request appearing when the BDS is in
> >>>> the "drained section".
> >>>>
> >>>> Without the patch the request can come and be processed by the main
> >>>> event loop, as the ide requests are processed by the main event loop
> >>>> and the main event loop doesn't stop when its context is in the
> >>>> "drained section".
> >>>> The request execution is postponed until the end of "drained section".
> >>>>
> >>>> The patch doesn't modify ide specific code, as well as any other
> >>>> device code. Instead, it modifies the infrastructure of asynchronous
> >>>> Block Backend requests, in favor of postponing the requests arisen
> >>>> when in "drained section" to remove the possibility of request appearing
> >>>> for all the infrastructure clients.
> >>>>
> >>>> This approach doesn't make vCPU processing the request wait untill
> >>>> the end of request processing.
> >>>>
> >>>> Signed-off-by: Denis Plotnikov <address@hidden>
> >>> I generally agree with the idea that requests should be queued during a
> >>> drained section. However, I think there are a few fundamental problems
> >>> with the implementation in this series:
> >>>
> >>> 1) aio_disable_external() is already a layering violation and we'd like
> >>> to get rid of it (by replacing it with a BlockDevOps callback from
> >>> BlockBackend to the devices), so adding more functionality there
> >>> feels like a step in the wrong direction.
> >>>
> >>> 2) Only blk_aio_* are fixed, while we also have synchronous public
> >>> interfaces (blk_pread/pwrite) as well as coroutine-based ones
> >>> (blk_co_*). They need to be postponed as well.
> >> Good point! Thanks!
>
> Should we really prohibit all public interfaces, as they are reused
> inside block level?
We need to fix that. blk_*() should never be called from inside the BDS
layer.
> There is also a problem which is not stated in the clear words yet.
> We have potential deadlock in the code under the following
> conditions, which should be also taken into the consideration.
>
> <path from the controller>
> bdrv_co_pwritev
> bdrv_inc_in_flight
> bdrv_aligned_pwritev
> notifier_list_with_return_notify
> backup_before_write_notify
> backup_do_cow
> backup_cow_with_bounce_buffer
> blk_co_preadv
>
> Here blk_co_preadv() must finish its work before we will release the
> notifier and finish request initiated from the controller and which
> has incremented in-fligh counter.
Yes, before_write notifiers are evil. I've objected to them since the
day they were introduced and I'm surprised it's becoming a problem only
now.
We should probably change the backup job to insert a job node rather
sooner than later. Then it doesn't need to call blk_*() any more.
> Thus we should differentiate requests initiated at the controller
> level and requests initiated in the block layer. This is sad but
> true.
The difference is supposed to be whether a request goes through a
BlockBackend or not.
Kevin