[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queu
From: |
Zhu Yangyang |
Subject: |
Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup |
Date: |
Mon, 1 Apr 2024 16:04:40 +0800 |
On Thu, 28 Mar 2024 07:40:14 -0500, Eric Blake wrote:
> On Mon, Mar 25, 2024 at 05:18:50PM +0800, zhuyangyang via wrote:
> > If g_main_loop_run()/aio_poll() is called in the coroutine context,
> > the pending coroutine may be woken up repeatedly, and the co_queue_wakeup
> > may be disordered.
> >
> > When the poll() syscall exited in g_main_loop_run()/aio_poll(), it means
> > some listened events is completed. Therefore, the completion callback
> > function is dispatched.
> >
> > If this callback function needs to invoke aio_co_enter(), it will only
> > wake up the coroutine (because we are already in coroutine context),
> > which may cause that the data on this listening event_fd/socket_fd
> > is not read/cleared. When the next poll () exits, it will be woken up again
> > and inserted into the wakeup queue again.
> >
> > For example, if TLS is enabled in NBD, the server will call
> > g_main_loop_run()
> > in the coroutine, and repeatedly wake up the io_read event on a socket.
> > The call stack is as follows:
> >
> > aio_co_enter()
> > aio_co_wake()
> > qio_channel_restart_read()
> > aio_dispatch_handler()
> > aio_dispatch_handlers()
> > aio_dispatch()
> > aio_ctx_dispatch()
> > g_main_context_dispatch()
> > g_main_loop_run()
> > nbd_negotiate_handle_starttls()
> > nbd_negotiate_options()
> > nbd_negotiate()
> > nbd_co_client_start()
> > coroutine_trampoline()
>
> zhuyangyang, do you have a reliable reproduction setup for how you
> were able to trigger this? Obviously, it only happens when TLS is
> enabled (we aren't creating a g_main_loop_run for any other NBD
> command), and only when the server is first starting to serve a
> client; is this a case where you were hammering a long-running qemu
> process running an NBD server with multiple clients trying to
> reconnect to the server all near the same time?
This problem cannot be reproduced after
7c1f51bf38 ("nbd/server: Fix drained_poll to wake coroutine in right
AioContext")
that avoids repeatedly waking up the same coroutine.
Invoking g_main_loop_run() in the coroutine will cause that
event completion callback function qio_channel_restart_read() is called
repeatedly,
but the coroutine is woken up only once.
The key modifications are as follows:
static void qio_channel_restart_read(void *opaque)
{
QIOChannel *ioc = opaque;
- Coroutine *co = ioc->read_coroutine;
+ Coroutine *co = qatomic_xchg(&ioc->read_coroutine, NULL);
+
+ if (!co) {
+ return;
+ }
/* Assert that aio_co_wake() reenters the coroutine directly */
assert(qemu_get_current_aio_context() ==
qemu_coroutine_get_aio_context(co));
aio_co_wake(co);
}
The root cause is that poll() is invoked in coroutine context.
>
> If we can come up with a reliable formula for reproducing the
> corrupted coroutine list, it would make a great iotest addition
> alongside the existing qemu-iotests 233 for ensuring that NBD TLS
> traffic is handled correctly in both server and client.
>
> >
> > Signed-off-by: zhuyangyang <zhuyangyang14@huawei.com>
>
> Side note: this appears to be your first qemu contribution (based on
> 'git shortlog --author zhuyangyang'). While I am not in a position to
> presume how you would like your name Anglicized, I will point out that
> the prevailing style is to separate given name from family name (just
> because your username at work has no spaces does not mean that your
> S-o-b has to follow suit). It is also permissible to list your name
> in native characters alongside or in place of the Anglicized version;
> for example, 'git log --author="Stefano Dong"' shows this technique.
--
Best Regards,
Zhu Yangyang
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup,
Zhu Yangyang <=