From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race
Date: Tue, 28 Jul 2015 11:34:18 +0100

On Tue, Jul 28, 2015 at 11:31 AM, Stefan Hajnoczi <address@hidden> wrote:
> On Tue, Jul 28, 2015 at 11:26 AM, Cornelia Huck
> <address@hidden> wrote:
>> On Tue, 28 Jul 2015 09:34:46 +0100
>> Stefan Hajnoczi <address@hidden> wrote:
>>
>>> On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
>>> > On Tue, 28 Jul 2015 09:07:00 +0200
>>> > Cornelia Huck <address@hidden> wrote:
>>> >
>>> > > On Mon, 27 Jul 2015 17:33:37 +0100
>>> > > Stefan Hajnoczi <address@hidden> wrote:
>>> > >
>>> > > > See Patch 2 for details on the deadlock after two aio_context_acquire()
>>> > > > calls race.  This caused dataplane to hang on startup.
>>> > > >
>>> > > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
>>> > > >
>>> > > > Stefan Hajnoczi (2):
>>> > > >   AioContext: avoid leaking BHs on cleanup
>>> > > >   AioContext: force event loop iteration using BH
>>> > > >
>>> > > >  async.c             | 29 +++++++++++++++++++++++++++--
>>> > > >  include/block/aio.h |  3 +++
>>> > > >  2 files changed, 30 insertions(+), 2 deletions(-)
>>> > > >
>>> > >
>>> > > Just gave this a try: The stripped-down guest that hangs during startup
>>> > > on master is working fine with these patches applied, and my full setup
>>> > > works as well.
>>> > >
>>> > > So,
>>> > >
>>> > > Tested-by: Cornelia Huck <address@hidden>
>>> >
>>> > Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
>>> > get
>>> >
>>> > qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: 
>>> > Assertion `ctx->first_bh->deleted' failed.
>>>
>>> Please pretty-print ctx->first_bh in gdb.  In particular, which function
>>> is ctx->first_bh->cb pointing to?
>>
>> (gdb) p/x *(QEMUBH *)ctx->first_bh
>> $2 = {ctx = 0x9aab3730, cb = 0x801b7c5c, opaque = 0x3ff9800dee0, next =
>>     0x3ff9800dfb0, scheduled = 0x0, idle = 0x0, deleted = 0x0}
>>
>> cb is pointing at spawn_thread_bh_fn.
>>
>>>
>>> I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
>>> couldn't trigger the assertion failure.
>>
>> I use the old x-data-plane attribute; if I turn it off, I don't hit the
>> assertion.
>
> Thanks.  I understand how to reproduce it now: use -drive aio=threads
> and do I/O during managedsave.
>
> I suspect there are more cases of this.  We need to clean it up during
> QEMU 2.5.
>
> For now let's continue leaking these BHs as we've always done.

Actually, this case can be fixed in the patch by moving
thread_pool_free() before the BH cleanup loop.
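
(For illustration only, a rough C sketch of that ordering, not the actual
patch: the names come from the assertion message and the gdb dump earlier in
the thread (aio_ctx_finalize, ctx->first_bh, the QEMUBH "deleted" flag), plus
thread_pool_free()/ctx->thread_pool from include/block/thread-pool.h and
include/block/aio.h; locking and the rest of the teardown are omitted.)

/* Sketch against QEMU-internal declarations (block/aio.h, block/thread-pool.h).
 * Free the thread pool first so its internal BH (the spawn_thread_bh_fn seen
 * in the gdb dump) is deleted before the cleanup loop walks the BH list.
 */
static void aio_ctx_finalize(GSource *source)
{
    AioContext *ctx = (AioContext *) source;

    thread_pool_free(ctx->thread_pool);   /* deletes the pool's own BH */

    while (ctx->first_bh) {
        QEMUBH *next = ctx->first_bh->next;

        /* every remaining BH must already have been qemu_bh_delete()d */
        assert(ctx->first_bh->deleted);

        g_free(ctx->first_bh);
        ctx->first_bh = next;
    }

    /* ... remaining finalization elided ... */
}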

But I still fear other parts of QEMU may be leaking BHs and we should
use a full release cycle to weed them out before introducing the
assertion.
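
(Again purely illustrative: the leak pattern the assertion is meant to catch
is, roughly, a BH created on an AioContext and never deleted before the
context is finalized.  my_cb and opaque below are placeholders, not code from
any particular QEMU subsystem.)

/* Hypothetical leak: the BH is created and scheduled but never
 * qemu_bh_delete()d, so first_bh still holds a non-deleted entry when
 * aio_ctx_finalize() runs and trips assert(ctx->first_bh->deleted).
 */
QEMUBH *bh = aio_bh_new(ctx, my_cb, opaque);
qemu_bh_schedule(bh);
/* ... AioContext torn down without qemu_bh_delete(bh) ... */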

Stefan


