qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_n


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear
Date: Mon, 20 Jul 2015 07:32:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1


On 20/07/2015 05:55, Fam Zheng wrote:
> On Sat, 07/18 22:21, Paolo Bonzini wrote:
>> event_notifier_test_and_clear must be called before processing events.
>> Otherwise, an aio_poll could "eat" the notification before the main
>> I/O thread invokes ppoll().  The main I/O thread then never wakes up.
>> This is an example of what could happen:
>>
>>    i/o thread         vcpu thread                   worker thread
>>    ---------------------------------------------------------------------
>>    lock_iothread
>>    notify_me = 1
>>    ...
>>    unlock_iothread
>>                       lock_iothread
>>                       notify_me = 3
>>                       ppoll
>>                       notify_me = 1
>>                                                      bh->scheduled = 1
>>                                                      event_notifier_set
>>                       event_notifier_test_and_clear
>>    ppoll
>>    *** hang ***
> 
> It still seems possible for event_notifier_test_and_clear to happen before 
> main
> thread ppoll, will that be a problem? How does main thread get waken up in 
> that
> case?

You could have this:


   i/o thread         vcpu thread                   worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                      lock_iothread
                      notify_me = 3
                      ppoll
                      notify_me = 1
                                                    bh->scheduled = 1
                                                    event_notifier_set
                      event_notifier_test_and_clear
   ppoll

But then the bottom half *will* be processed by the VCPU thread.  It's a
similar "rule" to the one that you use with lock-free algorithms: the
consumer, 99% of the time, uses the variables in the opposite order of
the producer.

Here the producer calls event_notifier_set after bh->scheduled (of
course...); the consumer *must* clear the EventNotifier before reading
bh->scheduled.

Paolo


> 
>>
>> In contrast with the previous bug, there wasn't really much debugging
>> to do here.  "Tracing" with qemu_clock_get_ns shows pretty much the
>> same behavior as in the previous patch, so the only wait to find the
>> bug is staring at the code.
>>
>> One could also use a formal model, of course.  The included one shows
>> this with three processes: notifier corresponds to a QEMU thread pool
>> worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
>> waiter to the main I/O thread.  I would be happy to say that the
>> formal model found the bug for me, but actually I wrote it after the
>> fact.
>>
>> This patch is a bit of a big hammer.  The next one optimizes it,
>> with help (this time for real rather than a posteriori :)) from
>> another, similar formal model.
>>
>> Reported-by: Richard W. M. Jones <address@hidden>
>> Signed-off-by: Paolo Bonzini <address@hidden>
>> ---
>>  aio-posix.c                 |   2 +
>>  aio-win32.c                 |   2 +
>>  async.c                     |   8 ++-
>>  docs/aio_notify_bug.promela | 140 
>> ++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 151 insertions(+), 1 deletion(-)
>>  create mode 100644 docs/aio_notify_bug.promela
>>
>> diff --git a/aio-posix.c b/aio-posix.c
>> index 249889f..5c8b266 100644
>> --- a/aio-posix.c
>> +++ b/aio-posix.c
>> @@ -276,6 +276,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>>          aio_context_acquire(ctx);
>>      }
>>  
>> +    event_notifier_test_and_clear(&ctx->notifier);
>> +
>>      /* if we have any readable fds, dispatch event */
>>      if (ret > 0) {
>>          for (i = 0; i < npfd; i++) {
>> diff --git a/aio-win32.c b/aio-win32.c
>> index ea655b0..3e0db20 100644
>> --- a/aio-win32.c
>> +++ b/aio-win32.c
>> @@ -337,6 +337,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>>              aio_context_acquire(ctx);
>>          }
>>  
>> +        event_notifier_test_and_clear(&ctx->notifier);
>> +
>>          if (first && aio_bh_poll(ctx)) {
>>              progress = true;
>>          }
>> diff --git a/async.c b/async.c
>> index a232192..d625e8a 100644
>> --- a/async.c
>> +++ b/async.c
>> @@ -203,6 +203,8 @@ aio_ctx_check(GSource *source)
>>      QEMUBH *bh;
>>  
>>      atomic_and(&ctx->notify_me, ~1);
>> +    event_notifier_test_and_clear(&ctx->notifier);
>> +
>>      for (bh = ctx->first_bh; bh; bh = bh->next) {
>>          if (!bh->deleted && bh->scheduled) {
>>              return true;
>> @@ -279,6 +281,10 @@ static void aio_rfifolock_cb(void *opaque)
>>      aio_notify(opaque);
>>  }
>>  
>> +static void event_notifier_dummy_cb(EventNotifier *e)
>> +{
>> +}
>> +
>>  AioContext *aio_context_new(Error **errp)
>>  {
>>      int ret;
>> @@ -293,7 +299,7 @@ AioContext *aio_context_new(Error **errp)
>>      g_source_set_can_recurse(&ctx->source, true);
>>      aio_set_event_notifier(ctx, &ctx->notifier,
>>                             (EventNotifierHandler *)
>> -                           event_notifier_test_and_clear);
>> +                           event_notifier_dummy_cb);
>>      ctx->thread_pool = NULL;
>>      qemu_mutex_init(&ctx->bh_lock);
>>      rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
>> diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela
>> new file mode 100644
>> index 0000000..b3bfca1
>> --- /dev/null
>> +++ b/docs/aio_notify_bug.promela
>> @@ -0,0 +1,140 @@
>> +/*
>> + * This model describes a bug in aio_notify.  If ctx->notifier is
>> + * cleared too late, a wakeup could be lost.
>> + *
>> + * Author: Paolo Bonzini <address@hidden>
>> + *
>> + * This file is in the public domain.  If you really want a license,
>> + * the WTFPL will do.
>> + *
>> + * To verify the buggy version:
>> + *     spin -a -DBUG docs/aio_notify_bug.promela
>> + *     gcc -O2 pan.c
>> + *     ./a.out -a -f
>> + *
>> + * To verify the working version:
>> + *     spin -a docs/aio_notify_bug.promela
>> + *     gcc -O2 pan.c
>> + *     ./a.out -a -f
>> + *
>> + * Add -DCHECK_REQ to test an alternative invariant and the
>> + * "notify_me" optimization.
>> + */
>> +
>> +int notify_me;
>> +bool event;
>> +bool req;
>> +bool notifier_done;
>> +
>> +#ifdef CHECK_REQ
>> +#define USE_NOTIFY_ME 1
>> +#else
>> +#define USE_NOTIFY_ME 0
>> +#endif
>> +
>> +active proctype notifier()
>> +{
>> +    do
>> +        :: true -> {
>> +            req = 1;
>> +            if
>> +               :: !USE_NOTIFY_ME || notify_me -> event = 1;
>> +               :: else -> skip;
>> +            fi
>> +        }
>> +        :: true -> break;
>> +    od;
>> +    notifier_done = 1;
>> +}
>> +
>> +#ifdef BUG
>> +#define AIO_POLL                                                    \
>> +    notify_me++;                                                    \
>> +    if                                                              \
>> +        :: !req -> {                                                \
>> +            if                                                      \
>> +                :: event -> skip;                                   \
>> +            fi;                                                     \
>> +        }                                                           \
>> +        :: else -> skip;                                            \
>> +    fi;                                                             \
>> +    notify_me--;                                                    \
>> +                                                                    \
>> +    req = 0;                                                        \
>> +    event = 0;
>> +#else
>> +#define AIO_POLL                                                    \
>> +    notify_me++;                                                    \
>> +    if                                                              \
>> +        :: !req -> {                                                \
>> +            if                                                      \
>> +                :: event -> skip;                                   \
>> +            fi;                                                     \
>> +        }                                                           \
>> +        :: else -> skip;                                            \
>> +    fi;                                                             \
>> +    notify_me--;                                                    \
>> +                                                                    \
>> +    event = 0;                                                      \
>> +    req = 0;
>> +#endif
>> +
>> +active proctype waiter()
>> +{
>> +    do
>> +       :: true -> AIO_POLL;
>> +    od;
>> +}
>> +
>> +/* Same as waiter(), but disappears after a while.  */
>> +active proctype temporary_waiter()
>> +{
>> +    do
>> +       :: true -> AIO_POLL;
>> +       :: true -> break;
>> +    od;
>> +}
>> +
>> +#ifdef CHECK_REQ
>> +never {
>> +    do
>> +        :: req -> goto accept_if_req_not_eventually_false;
>> +        :: true -> skip;
>> +    od;
>> +
>> +accept_if_req_not_eventually_false:
>> +    if
>> +        :: req -> goto accept_if_req_not_eventually_false;
>> +    fi;
>> +    assert(0);
>> +}
>> +
>> +#else
>> +/* There must be infinitely many transitions of event as long
>> + * as the notifier does not exit.
>> + *
>> + * If event stayed always true, the waiters would be busy looping.
>> + * If event stayed always false, the waiters would be sleeping
>> + * forever.
>> + */
>> +never {
>> +    do
>> +        :: !event    -> goto accept_if_event_not_eventually_true;
>> +        :: event     -> goto accept_if_event_not_eventually_false;
>> +        :: true      -> skip;
>> +    od;
>> +
>> +accept_if_event_not_eventually_true:
>> +    if
>> +        :: !event && notifier_done  -> do :: true -> skip; od;
>> +        :: !event && !notifier_done -> goto 
>> accept_if_event_not_eventually_true;
>> +    fi;
>> +    assert(0);
>> +
>> +accept_if_event_not_eventually_false:
>> +    if
>> +        :: event     -> goto accept_if_event_not_eventually_false;
>> +    fi;
>> +    assert(0);
>> +}
>> +#endif
>> -- 
>> 2.4.3
>>
>>
> 
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]