Re: [Qemu-devel] AioContext locking vs. coroutines
From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] AioContext locking vs. coroutines
Date: Thu, 29 Nov 2018 12:15:43 +0000
29.11.2018 13:42, Kevin Wolf wrote:
> Am 28.11.2018 um 21:13 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi all!
>>
>> We've faced the following mirror bug:
>>
>> Just run mirror on a qcow2 image larger than 1G, and qemu ends up in a
>> deadlock.
>>
>> Note: I've decided to send this as a patch with a reproducer, to make
>> it easier to reproduce. No need to commit this before the mirror fix;
>> after it, the commit message may be shortened a bit, maybe even to just
>> "iotests: add simple mirror test".
>>
>> Note2: if the 'kvm' option is dropped from the test, it still
>> reproduces, but with iotests.VM() it does not (maybe because of
>> qtest?). With an iothread added it doesn't reproduce either. Also, it
>> doesn't reproduce with raw instead of qcow2, and doesn't reproduce for
>> small images.
>>
>> So, here is the deadlock:
>>
>> (gdb) info thr
>> Id Target Id Frame
>> 3 Thread 0x7fd7fd4ea700 (LWP 145412) "qemu-system-x86" syscall () at
>> ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
>> 2 Thread 0x7fd7fc205700 (LWP 145416) "qemu-system-x86"
>> __lll_lock_wait () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> * 1 Thread 0x7fd8102cfcc0 (LWP 145411) "qemu-system-x86"
>> __lll_lock_wait () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> (gdb) bt
>> #0 0x00007fd80e8864cd in __lll_lock_wait () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1 0x00007fd80e881dcb in _L_lock_812 () at /lib64/libpthread.so.0
>> #2 0x00007fd80e881c98 in __GI___pthread_mutex_lock (mutex=0x561032dce420
>> <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:79
>> #3 0x0000561032654c3b in qemu_mutex_lock_impl (mutex=0x561032dce420
>> <qemu_global_mutex>, file=0x5610327d8654 "util/main-loop.c", line=236) at
>> util/qemu-thread-posix.c:66
>> #4 0x00005610320cb327 in qemu_mutex_lock_iothread_impl
>> (file=0x5610327d8654 "util/main-loop.c", line=236) at
>> /work/src/qemu/up-mirror-dead-lock/cpus.c:1849
>> #5 0x000056103265038f in os_host_main_loop_wait (timeout=480116000) at
>> util/main-loop.c:236
>> #6 0x0000561032650450 in main_loop_wait (nonblocking=0) at
>> util/main-loop.c:497
>> #7 0x00005610322575c9 in main_loop () at vl.c:1892
>> #8 0x000056103225f3c7 in main (argc=13, argv=0x7ffcc8bb15a8,
>> envp=0x7ffcc8bb1618) at vl.c:4648
>> (gdb) p qemu_global_mutex
>> $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 145416,
>> __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev =
>> 0x0, __next = 0x0}},
>> __size = "\002\000\000\000\000\000\000\000\b8\002\000\001", '\000'
>> <repeats 26 times>, __align = 2}, file = 0x56103267bcb0
>> "/work/src/qemu/up-mirror-dead-lock/exec.c", line = 3197, initialized = true}
>>
>> So we see that thread 1 is the main loop, and it now wants the BQL,
>> which is owned by thread 2.
>>
>> (gdb) thr 2
>> [Switching to thread 2 (Thread 0x7fd7fc205700 (LWP 145416))]
>> #0 __lll_lock_wait () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> 135 2: movl %edx, %eax
>> (gdb) bt
>> #0 0x00007fd80e8864cd in __lll_lock_wait () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1 0x00007fd80e881de6 in _L_lock_870 () at /lib64/libpthread.so.0
>> #2 0x00007fd80e881cdf in __GI___pthread_mutex_lock (mutex=0x561034d25dc0)
>> at ../nptl/pthread_mutex_lock.c:114
>> #3 0x0000561032654c3b in qemu_mutex_lock_impl (mutex=0x561034d25dc0,
>> file=0x5610327d81d1 "util/async.c", line=511) at util/qemu-thread-posix.c:66
>> #4 0x000056103264d840 in aio_context_acquire (ctx=0x561034d25d60) at
>> util/async.c:511
>> #5 0x0000561032254554 in dma_blk_cb (opaque=0x7fd7f41463b0, ret=0) at
>> dma-helpers.c:169
>> #6 0x000056103225479d in dma_blk_io (ctx=0x561034d25d60,
>> sg=0x561035a80210, offset=0, align=512, io_func=0x5610322547a3
>> <dma_blk_read_io_func>, io_func_opaque=0x561034d40870, cb=0x561032340b35
>> <ide_dma_cb>, opaque=0x561035a7fee8, dir=DMA_DIRECTION_FROM_DEVICE) at
>> dma-helpers.c:227
>> #7 0x0000561032254855 in dma_blk_read (blk=0x561034d40870,
>> sg=0x561035a80210, offset=0, align=512, cb=0x561032340b35 <ide_dma_cb>,
>> opaque=0x561035a7fee8) at dma-helpers.c:245
>> #8 0x0000561032340e6e in ide_dma_cb (opaque=0x561035a7fee8, ret=0) at
>> hw/ide/core.c:910
>> #9 0x000056103234a912 in bmdma_cmd_writeb (bm=0x561035a81030, val=9) at
>> hw/ide/pci.c:240
>> #10 0x000056103234b5bd in bmdma_write (opaque=0x561035a81030, addr=0,
>> val=9, size=1) at hw/ide/piix.c:76
>> #11 0x00005610320e5763 in memory_region_write_accessor (mr=0x561035a81180,
>> addr=0, value=0x7fd7fc204678, size=1, shift=0, mask=255, attrs=...) at
>> /work/src/qemu/up-mirror-dead-lock/memory.c:504
>> #12 0x00005610320e596d in access_with_adjusted_size (addr=0,
>> value=0x7fd7fc204678, size=1, access_size_min=1, access_size_max=4,
>> access_fn=
>> 0x5610320e5683 <memory_region_write_accessor>, mr=0x561035a81180,
>> attrs=...) at /work/src/qemu/up-mirror-dead-lock/memory.c:570
>> #13 0x00005610320e86ce in memory_region_dispatch_write (mr=0x561035a81180,
>> addr=0, data=9, size=1, attrs=...) at
>> /work/src/qemu/up-mirror-dead-lock/memory.c:1452
>> #14 0x0000561032083770 in flatview_write_continue (fv=0x7fd7f4108090,
>> addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1, addr1=0, l=1,
>> mr=0x561035a81180)
>> at /work/src/qemu/up-mirror-dead-lock/exec.c:3233
>> #15 0x00005610320838cf in flatview_write (fv=0x7fd7f4108090, addr=49216,
>> attrs=..., buf=0x7fd810309000 "\t\371\061", len=1) at
>> /work/src/qemu/up-mirror-dead-lock/exec.c:3272
>> #16 0x0000561032083bef in address_space_write (as=0x561032dcdc40
>> <address_space_io>, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061",
>> len=1) at /work/src/qemu/up-mirror-dead-lock/exec.c:3362
>> #17 0x0000561032083c40 in address_space_rw (as=0x561032dcdc40
>> <address_space_io>, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061",
>> len=1, is_write=true) at /work/src/qemu/up-mirror-dead-lock/exec.c:3373
>> #18 0x0000561032104766 in kvm_handle_io (port=49216, attrs=...,
>> data=0x7fd810309000, direction=1, size=1, count=1) at
>> /work/src/qemu/up-mirror-dead-lock/accel/kvm/kvm-all.c:1775
>> #19 0x0000561032104f0e in kvm_cpu_exec (cpu=0x561034d60240) at
>> /work/src/qemu/up-mirror-dead-lock/accel/kvm/kvm-all.c:2021
>> #20 0x00005610320ca396 in qemu_kvm_cpu_thread_fn (arg=0x561034d60240) at
>> /work/src/qemu/up-mirror-dead-lock/cpus.c:1281
>> #21 0x00005610326557ab in qemu_thread_start (args=0x561034d81a70) at
>> util/qemu-thread-posix.c:498
>> #22 0x00007fd80e87fdd5 in start_thread (arg=0x7fd7fc205700) at
>> pthread_create.c:308
>> #23 0x00007fd804c97b3d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>> (gdb) fr 2
>> #2 0x00007fd80e881cdf in __GI___pthread_mutex_lock (mutex=0x561034d25dc0)
>> at ../nptl/pthread_mutex_lock.c:114
>> 114 LLL_MUTEX_LOCK (mutex);
>> (gdb) p *mutex
>> $2 = {__data = {__lock = 2, __count = 16, __owner = 145411, __nusers = 1,
>> __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next =
>> 0x0}},
>> __size =
>> "\002\000\000\000\020\000\000\000\003\070\002\000\001\000\000\000\001",
>> '\000' <repeats 22 times>, __align = 68719476738}
>>
>> And thread 2 wants the AioContext mutex, which is owned by thread 1.
>> A classic deadlock.
>>
>> But who took the AioContext? Suppose that it is mirror, and check:
>>
>> (gdb) p blk_next(0)->root->bs->job
>> $3 = (BlockJob *) 0x5610358e2520
>> (gdb) p (MirrorBlockJob *)$3
>> $4 = (MirrorBlockJob *) 0x5610358e2520
>> [...]
>> (gdb) p $4->target->root->bs->tracked_requests.lh_first->co
>> $7 = (Coroutine *) 0x561035dd0860
>> [...]
>> (gdb) qemu coroutine 0x561035dd0860
>> #0 0x0000561032670ab2 in qemu_coroutine_switch (from_=0x561035dd0860,
>> to_=0x7fd8102cfb80, action=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
>> #1 0x000056103266f0fc in qemu_coroutine_yield () at
>> util/qemu-coroutine.c:194
>> #2 0x000056103266fb76 in qemu_co_mutex_lock_slowpath (ctx=0x561034d25d60,
>> mutex=0x561035c5a4b0) at util/qemu-coroutine-lock.c:235
>> #3 0x000056103266fc82 in qemu_co_mutex_lock (mutex=0x561035c5a4b0) at
>> util/qemu-coroutine-lock.c:273
>> #4 0x0000561032544f4e in qcow2_co_pwritev (bs=0x561035b86890,
>> offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at
>> block/qcow2.c:2063
>> #5 0x000056103258e5df in bdrv_driver_pwritev (bs=0x561035b86890,
>> offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at
>> block/io.c:1092
>> #6 0x000056103259004b in bdrv_aligned_pwritev (child=0x561035b178c0,
>> req=0x7fd7a23eae30, offset=531628032, bytes=1048576, align=1,
>> qiov=0x561034fd08a8, flags=0) at block/io.c:1719
>> #7 0x0000561032590cb2 in bdrv_co_pwritev (child=0x561035b178c0,
>> offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at
>> block/io.c:1961
>> #8 0x0000561032578478 in blk_co_pwritev (blk=0x561035dd0400,
>> offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at
>> block/block-backend.c:1203
>> #9 0x00005610325866a0 in mirror_read_complete (op=0x561034fd08a0, ret=0)
>> at block/mirror.c:232
>> #10 0x0000561032586e1a in mirror_co_read (opaque=0x561034fd08a0) at
>> block/mirror.c:370
>> #11 0x000056103267081d in coroutine_trampoline (i0=903678048, i1=22032) at
>> util/coroutine-ucontext.c:116
>> #12 0x00007fd804be0fc0 in __start_context () at /lib64/libc.so.6
>> #13 0x00007fd7fafff600 in ()
>> #14 0x0000000000000000 in ()
>>
>> Ok, look at mirror_read_complete:
>> static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
>> {
>>     MirrorBlockJob *s = op->s;
>>
>>     aio_context_acquire(blk_get_aio_context(s->common.blk));
>>     if (ret < 0) {
>>         BlockErrorAction action;
>>
>>         bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
>>         action = mirror_error_action(s, true, -ret);
>>         if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
>>             s->ret = ret;
>>         }
>>
>>         mirror_iteration_done(op, ret);
>>     } else {
>>         ret = blk_co_pwritev(s->target, op->offset,
>>                              op->qiov.size, &op->qiov, 0);
>>         mirror_write_complete(op, ret);
>>     }
>>     aio_context_release(blk_get_aio_context(s->common.blk));
>> }
>>
>> Line 232 is the blk_co_pwritev call. So we acquire the AioContext and
>> then go on to write, which may yield, and that is what makes the
>> deadlock possible.
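The problematic shape, stripped down (a hypothetical sketch, not actual
code from the tree):

```c
/* In coroutine context: */
aio_context_acquire(ctx);   /* recursive: the lock count only goes up  */
blk_co_pwritev(...);        /* may yield with ctx still locked; the    */
                            /* vCPU thread, holding the BQL, then      */
                            /* blocks on ctx, while the main loop      */
                            /* blocks on the BQL                       */
aio_context_release(ctx);
```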
>>
>> And bisect points to
>> commit 2e1990b26e5aa1ba1a730aa6281df123bb7a71b6 (refs/bisect/bad)
>> Author: Max Reitz <address@hidden>
>> Date: Wed Jun 13 20:18:11 2018 +0200
>>
>> block/mirror: Convert to coroutines
>>
>> So, yes, 3.0 has the bug.
>>
>> The interesting thing is that this aio_context_acquire/release pair
>> existed even before that patch; it was added in 2.9:
>> commit b9e413dd3756f71abe4e8cafe1d7a459ce74ccf4
>> Author: Paolo Bonzini <address@hidden>
>> Date: Mon Feb 13 14:52:32 2017 +0100
>>
>> block: explicitly acquire aiocontext in aio callbacks that need it
>>
>> In 2.9 mirror_read_complete was a callback for blk_aio_preadv. But
>> honestly, I don't understand why it needed to acquire the context: it
>> is inserted into this context as a BH, so why does it need to
>> additionally acquire the context?
>
> BHs don't automatically take the AioContext lock of the AioContext from
> which they are run.
So, if I call aio_bh_schedule_oneshot(aio_context, cb), I can't be sure
that this cb will run in this aio_context, and I should acquire the
aio_context in the cb code? Then what does the first parameter of
aio_bh_schedule_oneshot() mean, and is the text in
docs/devel/multiple-iothreads.txt wrong, which says:

Side note: the best way to schedule a function call across threads is to call
aio_bh_schedule_oneshot(). No acquire/release or locking is needed.
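For context, the pattern under discussion would look roughly like this
(a hypothetical fragment, not from the tree; it assumes Kevin's point
that the BH does run in the target context's thread, but without its
lock held):

```c
/* Hypothetical: the callback is dispatched by target_ctx's event loop,
 * but -- if BHs really don't take the AioContext lock automatically --
 * it must acquire the lock itself before touching protected state. */
static void my_bh_cb(void *opaque)
{
    AioContext *ctx = opaque;

    aio_context_acquire(ctx);   /* apparently not implicit */
    /* ... access state protected by ctx's lock ... */
    aio_context_release(ctx);
}

/* From any thread: */
aio_bh_schedule_oneshot(target_ctx, my_bh_cb, target_ctx);
```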
>
>> Anyway, after 2e1990b26e5 mirror_read/write_complete are no longer
>> callbacks, so even if b9e413dd3 was OK, it no longer applies to this
>> place.
>
> I guess in this specific instance we can just remove the locking from
> mirror_read/write_complete because it's only locking an AioContext whose
> lock we already hold anyway because we're in a coroutine owned by the
> AioContext, so yielding will release the lock and avoid the deadlock.
>
> But more generally, does this mean that it is never correct to have an
> explicit aio_context_acquire/release() pair in coroutine context, at
> least for all practical cases where you call things that could yield?
> And if you're inside a coroutine and want to access something in a
> different AioContext, you need to drop out of the coroutine first?
>
> If so, should aio_context_acquire() then assert that it's not called
> from coroutine context?
>
> Paolo, will this restriction of coroutines be solved for good when you
> get your patches merged that remove the AioContext lock?
>
> Kevin
>
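For reference, if I read the suggestion right, removing the locking
would look roughly like this (an untested sketch against the
mirror_read_complete quoted above; mirror_write_complete would get the
same treatment):

--- a/block/mirror.c
+++ b/block/mirror.c
 static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
 {
     MirrorBlockJob *s = op->s;

-    aio_context_acquire(blk_get_aio_context(s->common.blk));
     if (ret < 0) {
         BlockErrorAction action;
 ...
         ret = blk_co_pwritev(s->target, op->offset,
                              op->qiov.size, &op->qiov, 0);
         mirror_write_complete(op, ret);
     }
-    aio_context_release(blk_get_aio_context(s->common.blk));
 }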
--
Best regards,
Vladimir