qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, al


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, all hail ctx->notify_me
Date: Fri, 17 Jul 2015 06:44:45 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1


On 16/07/2015 21:05, Richard W.M. Jones wrote:
> 
> Sorry to spoil things, but I'm still seeing this bug, although it is
> now a lot less frequent with your patch.  I would estimate it happens
> more often than 1 in 5 runs with qemu.git, and probably 1 in 200 runs
> with qemu.git + the v2 patch series.
> 
> It's the exact same hang in both cases.
> 
> Is it possible that this patch doesn't completely close any race?
> 
> Still, it is an improvement, so there is that.

Would seem at first glance like a different bug.

Interestingly, adding some "tracing" (qemu_clock_get_ns) makes the bug
more likely: now it reproduces in about 10 tries.  Of course :) adding
other kinds of tracing instead make it go away again (>50 tries).

Perhaps this:

   i/o thread         vcpu thread                   worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                      lock_iothread
                      notify_me = 3
                      ppoll
                      notify_me = 1
                                                     bh->scheduled = 1
                                                     event_notifier_set
                      event_notifier_test_and_clear
   ppoll
    ^^ hang
        
In the exact shape above, it doesn't seem too likely to happen, but
perhaps there's another simpler case.  Still, the bug exists.

The above is not really related to notify_me.  Here the notification is
not being optimized away!  So I wonder if this one has been there forever.

Fam suggested putting the event_notifier_test_and_clear before
aio_bh_poll(), but it does not work.  I'll look more close

However, an unconditional event_notifier_test_and_clear is pretty
expensive.  On one hand, obviously correctness comes first.  On the
other hand, an expensive operation at the wrong place can mask the race
very easily; I'll let the fix run for a while, but I'm not sure if a
successful test really says anything useful.

Paolo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]