From: Fam Zheng
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH] aio-posix: honor is_external in AioContext polling
Date: Tue, 24 Jan 2017 20:04:31 +0800
User-agent: Mutt/1.7.1 (2016-10-04)
On Tue, 01/24 09:53, Stefan Hajnoczi wrote:
> AioHandlers marked ->is_external must be skipped when aio_node_check()
> fails. bdrv_drained_begin() needs this to prevent dataplane from
> submitting new I/O requests while another thread accesses the device and
> relies on it being quiesced.
>
> This patch fixes the following segfault:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
> 2650 bdrv_io_plug(child->bs);
> [Current thread is 1 (Thread 0x7ff5c4bd1c80 (LWP 10917))]
> (gdb) bt
> #0 0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
> #1 0x00005577f6114363 in blk_io_plug (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:1561
> #2 0x00005577f5d4091d in virtio_blk_handle_vq (s=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/virtio-blk.c:589
> #3 0x00005577f5d4240d in virtio_blk_data_plane_handle_output (vdev=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/dataplane/virtio-blk.c:158
> #4 0x00005577f5d88acd in virtio_queue_notify_aio_vq (vq=0x5577f9b3d2a0) at qemu/hw/virtio/virtio.c:1304
> #5 0x00005577f5d8aaaf in virtio_queue_host_notifier_aio_poll (opaque=0x5577f9b3d308) at qemu/hw/virtio/virtio.c:2134
> #6 0x00005577f60ca077 in run_poll_handlers_once (ctx=0x5577f79ddbb0) at qemu/aio-posix.c:493
> #7 0x00005577f60ca268 in try_poll_mode (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:569
> #8 0x00005577f60ca331 in aio_poll (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:601
> #9 0x00005577f612722a in bdrv_flush (bs=0x5577f7c20970) at qemu/block/io.c:2403
> #10 0x00005577f60c1b2d in bdrv_close (bs=0x5577f7c20970) at qemu/block.c:2322
> #11 0x00005577f60c20e7 in bdrv_delete (bs=0x5577f7c20970) at qemu/block.c:2465
> #12 0x00005577f60c3ecf in bdrv_unref (bs=0x5577f7c20970) at qemu/block.c:3425
> #13 0x00005577f60bf951 in bdrv_root_unref_child (child=0x5577f7a2de70) at qemu/block.c:1361
> #14 0x00005577f6112162 in blk_remove_bs (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:491
> #15 0x00005577f6111b1b in blk_remove_all_bs () at qemu/block/block-backend.c:245
> #16 0x00005577f60c1db6 in bdrv_close_all () at qemu/block.c:2382
> #17 0x00005577f5e60cca in main (argc=20, argv=0x7ffea6eb8398, envp=0x7ffea6eb8440) at qemu/vl.c:4684
>
> The key thing is that bdrv_close() uses bdrv_drained_begin(), so
> virtio_queue_host_notifier_aio_poll() must not be called.
>
> Thanks to Fam Zheng <address@hidden> for identifying the root cause of
> this crash.
>
> Reported-by: Alberto Garcia <address@hidden>
> Signed-off-by: Stefan Hajnoczi <address@hidden>
> ---
> aio-posix.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/aio-posix.c b/aio-posix.c
> index 9453d83..a8d7090 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -508,7 +508,8 @@ static bool run_poll_handlers_once(AioContext *ctx)
>
>      QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
>          if (!node->deleted && node->io_poll &&
> -            node->io_poll(node->opaque)) {
> +            aio_node_check(ctx, node->is_external) &&
> +            node->io_poll(node->opaque)) {
>              progress = true;
>          }
>
> --
> 2.9.3
>
>
The patch is not wrong, and I believe it is enough to fix this crash; however,
it's not enough...
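For readers following along: the check in question is tiny. Roughly, and
simplified from memory rather than quoted verbatim, aio_node_check() and the
counter that bdrv_drained_begin() raises via aio_disable_external() look like
this:

    /* Simplified sketch, not verbatim QEMU source: the AioContext keeps a
     * counter of aio_disable_external() calls; while it is non-zero, any
     * handler marked is_external must be skipped. */
    static inline void aio_disable_external(AioContext *ctx)
    {
        atomic_inc(&ctx->external_disable_cnt);
    }

    static inline bool aio_node_check(AioContext *ctx, bool is_external)
    {
        return !is_external || !atomic_read(&ctx->external_disable_cnt);
    }

So the patch simply makes the polling loop honor the same rule that the
fd-based dispatch path already honors.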
All in all, I think we should skip external handlers regardless of
aio_disable_external(), or even skip try_poll_mode() entirely, in nested
aio_poll() calls. The reasons are: 1) many nested aio_poll() calls are not
wrapped in bdrv_drained_begin(), so this check alone is not sufficient;
2) aio_poll() on qemu_aio_context didn't look at ioeventfds before, but that
changed when try_poll_mode() was added, which is not quite correct.
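To illustrate the stronger rule, here is a rough sketch with a hypothetical
ctx->nested flag that does not exist in this patch; it just shows the idea of
treating every nested aio_poll() as if aio_disable_external() were in effect:

    /* Hypothetical illustration only: skip guest-facing (is_external)
     * poll handlers whenever we are inside a nested aio_poll(), even if
     * no bdrv_drained_begin() section is active. */
    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
        if (!node->deleted && node->io_poll &&
            !(node->is_external && ctx->nested) &&   /* made-up flag */
            aio_node_check(ctx, node->is_external) &&
            node->io_poll(node->opaque)) {
            progress = true;
        }
    }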
These two factors combined make it possible for bdrv_flush() etc. to spin
longer than necessary, if not forever, when the guest keeps submitting new
requests via ioeventfd.
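To spell out the spinning scenario: the synchronous wait in frame #9 of the
backtrace is roughly the usual pattern below (sketched, not quoted):

    /* Roughly the bdrv_flush() wait loop from the backtrace: each nested
     * aio_poll() may enter try_poll_mode() and run guest io_poll handlers.
     * If the guest keeps the virtqueue busy, every iteration makes
     * "progress" on new guest I/O, and the completion that would set
     * flush_co.ret can be starved. */
    while (flush_co.ret == NOT_DONE) {
        aio_poll(bdrv_get_aio_context(bs), true);
    }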
Fam