[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queu
From: |
Eric Blake |
Subject: |
Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup |
Date: |
Mon, 25 Mar 2024 11:00:31 -0500 |
User-agent: |
NeoMutt/20240201 |
I've seen (and agree with) Stefan's reply that a more thorough audit
is needed, but am still providing a preliminary review based on what I
see here.
On Mon, Mar 25, 2024 at 05:18:50PM +0800, zhuyangyang via wrote:
> If g_main_loop_run()/aio_poll() is called in the coroutine context,
> the pending coroutine may be woken up repeatedly, and the co_queue_wakeup
> may be disordered.
coroutine context should not be doing anything that can block; you may
have uncovered a bigger bug that we need to address.
>
> When the poll() syscall exited in g_main_loop_run()/aio_poll(), it means
> some listened events is completed. Therefore, the completion callback
> function is dispatched.
>
> If this callback function needs to invoke aio_co_enter(), it will only
> wake up the coroutine (because we are already in coroutine context),
> which may cause that the data on this listening event_fd/socket_fd
> is not read/cleared. When the next poll () exits, it will be woken up again
> and inserted into the wakeup queue again.
>
> For example, if TLS is enabled in NBD, the server will call g_main_loop_run()
> in the coroutine, and repeatedly wake up the io_read event on a socket.
> The call stack is as follows:
>
> aio_co_enter()
> aio_co_wake()
> qio_channel_restart_read()
> aio_dispatch_handler()
> aio_dispatch_handlers()
> aio_dispatch()
> aio_ctx_dispatch()
> g_main_context_dispatch()
> g_main_loop_run()
> nbd_negotiate_handle_starttls()
> nbd_negotiate_options()
> nbd_negotiate()
> nbd_co_client_start()
> coroutine_trampoline()
>
> Signed-off-by: zhuyangyang <zhuyangyang14@huawei.com>
> ---
> util/async.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/util/async.c b/util/async.c
> index 0467890052..25fc1e6083 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -705,7 +705,18 @@ void aio_co_enter(AioContext *ctx, Coroutine *co)
> if (qemu_in_coroutine()) {
> Coroutine *self = qemu_coroutine_self();
> assert(self != co);
> - QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
> + /*
> + * If the Coroutine *co is already in the co_queue_wakeup, this
> + * repeated insertion will causes the loss of other queue element
s/causes/cause/
> + * or infinite loop.
> + * For examplex:
s/examplex/example/
> + * Head->a->b->c->NULL, after insert_tail(head, b) =>
> Head->a->b->NULL
> + * Head->a-b>->NULL, after insert_tail(head, b) => Head->a->b->b...
s/b>->/b->/
> + */
> + if (!co->co_queue_next.sqe_next &&
> + self->co_queue_wakeup.sqh_last != &co->co_queue_next.sqe_next) {
> + QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
> + }
> } else {
> qemu_aio_coroutine_enter(ctx, co);
> }
Intuitively, attacking the symptoms (avoiding bogus list insertion
when it is already on the list) makes sense; but I want to make sure
we attack the root cause.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org