Re: [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race


From: Stefan Hajnoczi
Subject: Re: [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
Date: Tue, 6 Mar 2018 16:18:12 +0000
User-agent: Mutt/1.9.2 (2017-12-15)

On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote:
> On 2018-03-05 16:59, Stefan Hajnoczi wrote:
> > There is a race between the test's 'query-migrate' QMP command after the
> > QMP 'STOP' event and completing the migration:
> > 
> > The test case invokes 'query-migrate' upon receiving 'STOP'.  At this
> > point the migration thread may still be in the process of completing.
> > Therefore 'query-migrate' can return 'status': 'active' for a brief
> > window of time instead of 'status': 'completed'.  This results in
> > qemu-iotests 203 hanging.
> > 
> > Solve the race by enabling the 'events' migration capability, which
> > causes QEMU to emit migration-specific QMP events that do not suffer
> > from this race condition.  Wait for the QMP 'MIGRATION' event with
> > 'status': 'completed'.
> > 
> > Reported-by: Max Reitz <address@hidden>
> > Signed-off-by: Stefan Hajnoczi <address@hidden>
> > ---
> >  tests/qemu-iotests/203     | 15 +++++++++++----
> >  tests/qemu-iotests/203.out |  5 +++++
> >  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> So much for "the ppoll() dungeon"...

It was still a pain to debug :).

I added ring buffers to the QMP monitor input and output code.  With those
in place, GDB on a hung QEMU was enough to figure out the issue:

  (gdb) p current_run_state
  RUN_STATE_POSTMIGRATE
  (gdb) p current_migration->state
  MIGRATION_STATUS_COMPLETED
  (gdb) p monitor_out_ring
  ...'STOP' event...
  (gdb) p monitor_in_ring
  ...query-migrate...  <-- okay, the test checked if migration finished
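
For anyone who wants to try the same trick, here is a rough sketch of the
ring-buffer idea in Python.  It is only an illustration: the real
instrumentation was a local hack in QEMU's C monitor code and is not shown
in this thread, and MessageRing, its capacity, and the "older-N" messages
are made-up names.

```python
from collections import deque


class MessageRing:
    """Fixed-size log that keeps only the most recent messages."""

    def __init__(self, capacity):
        # deque(maxlen=N) drops the oldest entry once N entries are stored.
        self._buf = deque(maxlen=capacity)

    def record(self, message):
        self._buf.append(message)

    def dump(self):
        # Oldest surviving message first, newest last.
        return list(self._buf)


# One ring per direction, mirroring the two rings in the GDB session.
# The capacity of 2 is an arbitrary value chosen for this illustration.
monitor_out_ring = MessageRing(capacity=2)
monitor_in_ring = MessageRing(capacity=2)

for message in ("older-1", "older-2", {"event": "STOP"}):
    monitor_out_ring.record(message)
monitor_in_ring.record({"execute": "query-migrate"})

last_out = monitor_out_ring.dump()
last_in = monitor_in_ring.dump()
```

The point of the fixed capacity is that the buffer can stay enabled for the
whole run at a bounded cost, and the last few messages are still there to
inspect once the process hangs.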

Then looking at the code:

  static void migration_completion(MigrationState *s)
  {
      ...
      if (s->state == MIGRATION_STATUS_ACTIVE) {
          qemu_mutex_lock_iothread();
          s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
          qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
          s->vm_was_running = runstate_is_running();
          ret = global_state_store();

          if (!ret) {
              bool inactivate = !migrate_colo_enabled();

              /* The STOP event comes from here */
              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
              ...
          }
          qemu_mutex_unlock_iothread(); /* <--- oh, no!  The main loop can
                                         * handle 'query-migrate' from here on */
      ...
      if (!migrate_colo_enabled()) {
          migrate_set_state(&s->state, current_active_state,
                            MIGRATION_STATUS_COMPLETED); /* <-- too late! */
      }
      }

      return;
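
The ordering problem can be replayed with a toy model.  This is a sketch
under assumptions, not QEMU code and not the iotests API: ToyMigration and
saw_completed are hypothetical names, and the model assumes (as the commit
message describes) that with the 'events' capability a MIGRATION event is
emitted when the state changes.

```python
class ToyMigration:
    """Toy model of the step ordering in migration_completion()."""

    def __init__(self, events_capability):
        self.status = "active"
        self.events_capability = events_capability
        self.events = []  # (name, status) tuples in emission order

    def stop_vm(self):
        # Step 1: stopping the VM emits STOP; status is still 'active'.
        self.events.append(("STOP", None))

    def complete(self):
        # Step 2: only later does the status become 'completed'.
        self.status = "completed"
        if self.events_capability:
            self.events.append(("MIGRATION", "completed"))

    def query_migrate(self):
        return self.status


def saw_completed(events):
    """Return True once a MIGRATION event with status 'completed' exists."""
    return any(name == "MIGRATION" and status == "completed"
               for name, status in events)


mig = ToyMigration(events_capability=True)
mig.stop_vm()
status_at_stop = mig.query_migrate()      # what the old test looked at
completed_at_stop = saw_completed(mig.events)
mig.complete()
final_status = mig.query_migrate()
completed_at_end = saw_completed(mig.events)
```

A test that queries once after STOP lands between the two steps and sees
'active'.  A test that waits for the MIGRATION event cannot land in that
window, because the event only exists after the status has changed.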
