qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion proces


From: Roman Penyaev
Subject: Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
Date: Tue, 27 Sep 2016 19:55:06 +0200

On Tue, Sep 27, 2016 at 5:25 PM, Stefan Hajnoczi <address@hidden> wrote:
> On Tue, Sep 27, 2016 at 04:29:55PM +0200, Roman Penyaev wrote:
>> On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <address@hidden> wrote:
>> > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
>> > completions from ioq_submit()") added an optimization that processes
>> > completions each time ioq_submit() returns with requests in flight.
>> > This commit introduces a "Co-routine re-entered recursively" error which
>> > can be triggered with -drive format=qcow2,aio=native.
>> >
>> > Fam Zheng <address@hidden>, Kevin Wolf <address@hidden>, and I
>> > debugged the following backtrace:
>> >
>> >   (gdb) bt
>> >   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
>> >   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
>> >   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at 
>> > util/qemu-coroutine.c:113
>> >   #3  0x0000555555a4b663 in qemu_laio_process_completions (address@hidden) 
>> > at block/linux-aio.c:218
>> >   #4  0x0000555555a4b874 in ioq_submit (address@hidden) at 
>> > block/linux-aio.c:331
>> >   #5  0x0000555555a4ba12 in laio_do_submit (address@hidden, 
>> > address@hidden, address@hidden, address@hidden) at block/linux-aio.c:383
>> >   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, 
>> > s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) 
>> > at block/linux-aio.c:402
>> >   #7  0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden, 
>> > address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
>> >   #8  0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden, 
>> > address@hidden, address@hidden, address@hidden, address@hidden, 
>> > address@hidden, flags=0) at block/io.c:1041
>> >   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
>> > offset=2932727808, bytes=8192, address@hidden, address@hidden) at 
>> > block/io.c:1133
>> >   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, 
>> > offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) 
>> > at block/qcow2.c:1509
>> >   #11 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden, 
>> > address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
>> >   #12 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden, 
>> > address@hidden, address@hidden, address@hidden, address@hidden, 
>> > address@hidden, flags=0) at block/io.c:1041
>> >   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
>> > address@hidden, address@hidden, address@hidden, address@hidden) at 
>> > block/io.c:1133
>> >   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, 
>> > offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at 
>> > block/block-backend.c:783
>> >   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at 
>> > block/block-backend.c:991
>> >   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, 
>> > i1=<optimized out>) at util/coroutine-ucontext.c:78
>> >
>> > It turned out that re-entrant ioq_submit() and completion processing
>> > between three requests caused this error.  The following check is not
>> > sufficient to prevent recursively entering coroutines:
>> >
>> >   if (laiocb->co != qemu_coroutine_self()) {
>> >       qemu_coroutine_enter(laiocb->co);
>> >   }
>> >
>> > As the following coroutine backtrace shows, not just the current
>> > coroutine (self) can be entered.  There might also be other coroutines
>> > that are currently entered and transferred control due to the qcow2 lock
>> > (CoMutex):
>>
>> I doubt that that was introduced by the commit you've specified:
>> 0ed93d84edab.
>>
>> Before my patch coroutine was unconditionally entered.  The following
>> is what was changed by 0ed93d84edab:
>>
>>      if (laiocb->co) {
>> -        qemu_coroutine_enter(laiocb->co);
>> +        /* Jump and continue completion for foreign requests, don't do
>> +         * anything for current request, it will be completed shortly. */
>> +        if (laiocb->co != qemu_coroutine_self()) {
>> +            qemu_coroutine_enter(laiocb->co);
>> +        }
>
> Unconditionally entering was safe prior to 0ed93d84edab since all
> coroutines yielded and qemu_coroutine_entered() would be false all the
> time.  Therefore it wasn't necessary to protect against re-entering a
> coroutine.
>
>> If you have a strong reproduction, could you please verify that.
>
> The bug is 100% deterministic.  Just boot up a guest with -drive
> format=qcow2,aio=native.

It turns out to be that everything is broken.  I started all my
tests with format=raw,aio=native and immediately got coroutine
recursive.  That is completely weird.

So, what I did is the following:

1. Took latest master (nothing works)
2. Did interactive rebase to 12c8720
12c8720 2016-06-28 | Merge remote-tracking branch
'remotes/stefanha/tags/block-pull-request' into staging [Peter
Maydell]

this merge request includes all your patches related to
virtio-blk and MQ support.

3. Applied 0ed93d84edab. Everything works fine.

4. Rebased up till 0647d47:
0647d47 2016-09-13 | qcow2: avoid memcpy(dst, NULL, len) [Stefan Hajnoczi]

this is the point, after which 0ed93d84edab was applied
on master.

Got recursive coroutine, so nothing works.

5. Did a besect, which shows this commit:

--
commit 1a62d0accdf85fbeac149018ee8d1728e754de73
Author: Eric Blake <address@hidden>
Date:   Fri Jul 15 12:31:59 2016 -0600

    block: Fragment reads to max transfer length
--

So after this commit my commit 0ed93d84edab stops working.
And now for me is completely not clear what is happening there.

--
Roman



reply via email to

[Prev in Thread] Current Thread [Next in Thread]