qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_n


From: Fam Zheng
Subject: Re: [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear
Date: Mon, 20 Jul 2015 11:55:35 +0800
User-agent: Mutt/1.5.23 (2014-03-12)

On Sat, 07/18 22:21, Paolo Bonzini wrote:
> event_notifier_test_and_clear must be called before processing events.
> Otherwise, an aio_poll could "eat" the notification before the main
> I/O thread invokes ppoll().  The main I/O thread then never wakes up.
> This is an example of what could happen:
> 
>    i/o thread         vcpu thread                   worker thread
>    ---------------------------------------------------------------------
>    lock_iothread
>    notify_me = 1
>    ...
>    unlock_iothread
>                       lock_iothread
>                       notify_me = 3
>                       ppoll
>                       notify_me = 1
>                                                      bh->scheduled = 1
>                                                      event_notifier_set
>                       event_notifier_test_and_clear
>    ppoll
>    *** hang ***

It still seems possible for event_notifier_test_and_clear to happen before main
thread ppoll, will that be a problem? How does main thread get waken up in that
case?

Fam

> 
> In contrast with the previous bug, there wasn't really much debugging
> to do here.  "Tracing" with qemu_clock_get_ns shows pretty much the
> same behavior as in the previous patch, so the only wait to find the
> bug is staring at the code.
> 
> One could also use a formal model, of course.  The included one shows
> this with three processes: notifier corresponds to a QEMU thread pool
> worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
> waiter to the main I/O thread.  I would be happy to say that the
> formal model found the bug for me, but actually I wrote it after the
> fact.
> 
> This patch is a bit of a big hammer.  The next one optimizes it,
> with help (this time for real rather than a posteriori :)) from
> another, similar formal model.
> 
> Reported-by: Richard W. M. Jones <address@hidden>
> Signed-off-by: Paolo Bonzini <address@hidden>
> ---
>  aio-posix.c                 |   2 +
>  aio-win32.c                 |   2 +
>  async.c                     |   8 ++-
>  docs/aio_notify_bug.promela | 140 
> ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 151 insertions(+), 1 deletion(-)
>  create mode 100644 docs/aio_notify_bug.promela
> 
> diff --git a/aio-posix.c b/aio-posix.c
> index 249889f..5c8b266 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -276,6 +276,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>          aio_context_acquire(ctx);
>      }
>  
> +    event_notifier_test_and_clear(&ctx->notifier);
> +
>      /* if we have any readable fds, dispatch event */
>      if (ret > 0) {
>          for (i = 0; i < npfd; i++) {
> diff --git a/aio-win32.c b/aio-win32.c
> index ea655b0..3e0db20 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -337,6 +337,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>              aio_context_acquire(ctx);
>          }
>  
> +        event_notifier_test_and_clear(&ctx->notifier);
> +
>          if (first && aio_bh_poll(ctx)) {
>              progress = true;
>          }
> diff --git a/async.c b/async.c
> index a232192..d625e8a 100644
> --- a/async.c
> +++ b/async.c
> @@ -203,6 +203,8 @@ aio_ctx_check(GSource *source)
>      QEMUBH *bh;
>  
>      atomic_and(&ctx->notify_me, ~1);
> +    event_notifier_test_and_clear(&ctx->notifier);
> +
>      for (bh = ctx->first_bh; bh; bh = bh->next) {
>          if (!bh->deleted && bh->scheduled) {
>              return true;
> @@ -279,6 +281,10 @@ static void aio_rfifolock_cb(void *opaque)
>      aio_notify(opaque);
>  }
>  
> +static void event_notifier_dummy_cb(EventNotifier *e)
> +{
> +}
> +
>  AioContext *aio_context_new(Error **errp)
>  {
>      int ret;
> @@ -293,7 +299,7 @@ AioContext *aio_context_new(Error **errp)
>      g_source_set_can_recurse(&ctx->source, true);
>      aio_set_event_notifier(ctx, &ctx->notifier,
>                             (EventNotifierHandler *)
> -                           event_notifier_test_and_clear);
> +                           event_notifier_dummy_cb);
>      ctx->thread_pool = NULL;
>      qemu_mutex_init(&ctx->bh_lock);
>      rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
> diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela
> new file mode 100644
> index 0000000..b3bfca1
> --- /dev/null
> +++ b/docs/aio_notify_bug.promela
> @@ -0,0 +1,140 @@
> +/*
> + * This model describes a bug in aio_notify.  If ctx->notifier is
> + * cleared too late, a wakeup could be lost.
> + *
> + * Author: Paolo Bonzini <address@hidden>
> + *
> + * This file is in the public domain.  If you really want a license,
> + * the WTFPL will do.
> + *
> + * To verify the buggy version:
> + *     spin -a -DBUG docs/aio_notify_bug.promela
> + *     gcc -O2 pan.c
> + *     ./a.out -a -f
> + *
> + * To verify the working version:
> + *     spin -a docs/aio_notify_bug.promela
> + *     gcc -O2 pan.c
> + *     ./a.out -a -f
> + *
> + * Add -DCHECK_REQ to test an alternative invariant and the
> + * "notify_me" optimization.
> + */
> +
> +int notify_me;
> +bool event;
> +bool req;
> +bool notifier_done;
> +
> +#ifdef CHECK_REQ
> +#define USE_NOTIFY_ME 1
> +#else
> +#define USE_NOTIFY_ME 0
> +#endif
> +
> +active proctype notifier()
> +{
> +    do
> +        :: true -> {
> +            req = 1;
> +            if
> +               :: !USE_NOTIFY_ME || notify_me -> event = 1;
> +               :: else -> skip;
> +            fi
> +        }
> +        :: true -> break;
> +    od;
> +    notifier_done = 1;
> +}
> +
> +#ifdef BUG
> +#define AIO_POLL                                                    \
> +    notify_me++;                                                    \
> +    if                                                              \
> +        :: !req -> {                                                \
> +            if                                                      \
> +                :: event -> skip;                                   \
> +            fi;                                                     \
> +        }                                                           \
> +        :: else -> skip;                                            \
> +    fi;                                                             \
> +    notify_me--;                                                    \
> +                                                                    \
> +    req = 0;                                                        \
> +    event = 0;
> +#else
> +#define AIO_POLL                                                    \
> +    notify_me++;                                                    \
> +    if                                                              \
> +        :: !req -> {                                                \
> +            if                                                      \
> +                :: event -> skip;                                   \
> +            fi;                                                     \
> +        }                                                           \
> +        :: else -> skip;                                            \
> +    fi;                                                             \
> +    notify_me--;                                                    \
> +                                                                    \
> +    event = 0;                                                      \
> +    req = 0;
> +#endif
> +
> +active proctype waiter()
> +{
> +    do
> +       :: true -> AIO_POLL;
> +    od;
> +}
> +
> +/* Same as waiter(), but disappears after a while.  */
> +active proctype temporary_waiter()
> +{
> +    do
> +       :: true -> AIO_POLL;
> +       :: true -> break;
> +    od;
> +}
> +
> +#ifdef CHECK_REQ
> +never {
> +    do
> +        :: req -> goto accept_if_req_not_eventually_false;
> +        :: true -> skip;
> +    od;
> +
> +accept_if_req_not_eventually_false:
> +    if
> +        :: req -> goto accept_if_req_not_eventually_false;
> +    fi;
> +    assert(0);
> +}
> +
> +#else
> +/* There must be infinitely many transitions of event as long
> + * as the notifier does not exit.
> + *
> + * If event stayed always true, the waiters would be busy looping.
> + * If event stayed always false, the waiters would be sleeping
> + * forever.
> + */
> +never {
> +    do
> +        :: !event    -> goto accept_if_event_not_eventually_true;
> +        :: event     -> goto accept_if_event_not_eventually_false;
> +        :: true      -> skip;
> +    od;
> +
> +accept_if_event_not_eventually_true:
> +    if
> +        :: !event && notifier_done  -> do :: true -> skip; od;
> +        :: !event && !notifier_done -> goto 
> accept_if_event_not_eventually_true;
> +    fi;
> +    assert(0);
> +
> +accept_if_event_not_eventually_false:
> +    if
> +        :: event     -> goto accept_if_event_not_eventually_false;
> +    fi;
> +    assert(0);
> +}
> +#endif
> -- 
> 2.4.3
> 
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]