aio-poll dead-lock
From: Vladimir Sementsov-Ogievskiy
Subject: aio-poll dead-lock
Date: Thu, 17 Dec 2020 15:16:27 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1
I don't think this is anything new, but just to keep it in mind:

blk_prw() polls with an increased in-flight counter. So, if some bottom half
wants to do a drain, we reliably dead-lock in a nested aio_poll loop: the
drain waits for in-flight to reach zero, but the request that incremented it
can only complete after the nested loop returns.
Here is a backtrace (it comes from the Virtuozzo branch, so I don't have a
reproducer for master, but I'll probably return to this later):
#0 0x00007f895d751b56 in ppoll () at /lib64/libc.so.6
#1 0x0000558f664e371c in qemu_poll_ns (fds=0x558f6778e630, nfds=1, timeout=-1)
at util/qemu-timer.c:335
#2 0x0000558f664bf5a5 in fdmon_poll_wait (ctx=0x558f67769480,
ready_list=0x7ffe3d6be730, timeout=-1) at util/fdmon-poll.c:79
#3 0x0000558f664beed3 in aio_poll (ctx=0x558f67769480, blocking=true) at
util/aio-posix.c:600
#4 0x0000558f663e8f82 in bdrv_do_drained_begin (bs=0x558f6855bcc0,
recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at
block/io.c:435
#5 0x0000558f663e9067 in bdrv_drained_begin (bs=0x558f6855bcc0) at
block/io.c:441
#6 0x0000558f66411df3 in bdrv_backup_top_drop (bs=0x558f6855bcc0) at
block/backup-top.c:296
#7 0x0000558f6640a0de in backup_clean (job=0x558f6814f130) at
block/backup.c:109
#8 0x0000558f66372019 in job_clean (job=0x558f6814f130) at job.c:678
#9 0x0000558f66372094 in job_finalize_single (job=0x558f6814f130) at job.c:694
#10 0x0000558f66370c41 in job_txn_apply (job=0x558f6814f130, fn=0x558f6637201c
<job_finalize_single>) at job.c:158
#11 0x0000558f6637243b in job_do_finalize (job=0x558f6814f130) at job.c:803
#12 0x0000558f663725d8 in job_completed_txn_success (job=0x558f6814f130) at
job.c:853
#13 0x0000558f66372678 in job_completed (job=0x558f6814f130) at job.c:866
#14 0x0000558f663726cb in job_exit (opaque=0x558f6814f130) at job.c:886
#15 0x0000558f664d48eb in aio_bh_call (bh=0x558f683ee370) at util/async.c:136
#16 0x0000558f664d49f5 in aio_bh_poll (ctx=0x558f67769480) at util/async.c:164
#17 0x0000558f664bf0c6 in aio_poll (ctx=0x558f67769480, blocking=true) at
util/aio-posix.c:650
#18 0x0000558f663d357a in blk_prw (blk=0x558f677804d0, offset=0, buf=0x558f67f34000 '\253'
<repeats 200 times>..., bytes=65536, co_entry=0x558f663d339f <blk_write_entry>,
flags=0) at block/block-backend.c:1336
#19 0x0000558f663d3be3 in blk_pwrite (blk=0x558f677804d0, offset=0,
buf=0x558f67f34000, count=65536, flags=0) at block/block-backend.c:1502
#20 0x0000558f66374355 in do_pwrite (blk=0x558f677804d0, buf=0x558f67f34000 '\253'
<repeats 200 times>..., offset=0, bytes=65536, flags=0, total=0x7ffe3d6bec38)
at qemu-io-cmds.c:551
#21 0x0000558f6637566a in write_f (blk=0x558f677804d0, argc=4,
argv=0x558f685600d0) at qemu-io-cmds.c:1192
#22 0x0000558f66373244 in command (blk=0x558f677804d0, ct=0x558f67544a58,
argc=4, argv=0x558f685600d0) at qemu-io-cmds.c:118
#23 0x0000558f66377d80 in qemuio_command (blk=0x558f677804d0, cmd=0x558f67ff0ee0
"write -P0xab 0 64k") at qemu-io-cmds.c:2465
#24 0x0000558f6608badd in hmp_qemu_io (mon=0x7ffe3d6bee50,
qdict=0x558f68125010) at block/monitor/block-hmp-cmds.c:628
#25 0x0000558f662c76b2 in handle_hmp_command (mon=0x7ffe3d6bee50, cmdline=0x7f8948007688
"drive0 \"write -P0xab 0 64k\"") at monitor/hmp.c:1082
#26 0x0000558f65fb12c6 in qmp_human_monitor_command (command_line=0x7f8948007680 "qemu-io
drive0 \"write -P0xab 0 64k\"", has_cpu_index=false, cpu_index=0,
errp=0x7ffe3d6bef58)
at /work/src/qemu/vz-8.0/monitor/misc.c:141
#27 0x0000558f662facb1 in qmp_marshal_human_monitor_command
(args=0x7f8948007930, ret=0x7ffe3d6befe0, errp=0x7ffe3d6befd8) at
qapi/qapi-commands-misc.c:653
#28 0x0000558f66468ff9 in qmp_dispatch (cmds=0x558f66a9bd10 <qmp_commands>,
request=0x7f8948005600, allow_oob=false) at qapi/qmp-dispatch.c:155
#29 0x0000558f662c416c in monitor_qmp_dispatch (mon=0x558f67790ab0,
req=0x7f8948005600) at monitor/qmp.c:145
#30 0x0000558f662c451b in monitor_qmp_bh_dispatcher (data=0x0) at
monitor/qmp.c:234
#31 0x0000558f664d48eb in aio_bh_call (bh=0x558f67594bb0) at util/async.c:136
#32 0x0000558f664d49f5 in aio_bh_poll (ctx=0x558f675945f0) at util/async.c:164
#33 0x0000558f664be7ca in aio_dispatch (ctx=0x558f675945f0) at
util/aio-posix.c:380
#34 0x0000558f664d4e26 in aio_ctx_dispatch (source=0x558f675945f0,
callback=0x0, user_data=0x0) at util/async.c:306
#35 0x00007f895f6bf570 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#36 0x0000558f664dcd13 in glib_pollfds_poll () at util/main-loop.c:217
#37 0x0000558f664dcd8d in os_host_main_loop_wait (timeout=985763000) at
util/main-loop.c:240
#38 0x0000558f664dce92 in main_loop_wait (nonblocking=0) at util/main-loop.c:516
#39 0x0000558f65fcfe66 in qemu_main_loop () at
/work/src/qemu/vz-8.0/softmmu/vl.c:1676
#40 0x0000558f664625e4 in main (argc=20, argv=0x7ffe3d6bf468,
envp=0x7ffe3d6bf510) at /work/src/qemu/vz-8.0/softmmu/main.c:49
As far as I know, the only way to fight this is to move things into a
coroutine, which can yield instead of entering a nested poll loop. So I think
moving backup_clean to a coroutine is necessary.
Any thoughts?
--
Best regards,
Vladimir