qemu-devel

Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires


From: Zhoujian (jay)
Subject: Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires
Date: Thu, 16 Aug 2018 07:22:03 +0000

> -----Original Message-----
> From: Paolo Bonzini [mailto:address@hidden
> Sent: Tuesday, August 14, 2018 9:07 PM
> To: Zhoujian (jay) <address@hidden>; Dr. David Alan Gilbert
> <address@hidden>
> Cc: address@hidden; address@hidden; wangxin (U)
> <address@hidden>
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 15:03, Zhoujian (jay) wrote:
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:address@hidden
> >> Sent: Tuesday, August 14, 2018 8:02 PM
> >> To: Dr. David Alan Gilbert <address@hidden>
> >> Cc: Zhoujian (jay) <address@hidden>; address@hidden;
> >> address@hidden; wangxin (U) <address@hidden>
> >> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> >>
> >> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >>>  a) Should the watchdog expire when the VM is stopped; I think it
> >>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> >>> is the bug here that the watchdog being used didn't use a virtual timer?
> >>
> >> All watchdogs do.
> >>
> >>>  b) If the watchdog expires just before the VM gets stopped, is
> >>> there a race which could hit this?  Possibly.
> >>
> >> Yes, I think it is a race that happens just before vm_stop, but I
> >> don't understand why the "qemu_clock_enable" in pause_all_vcpus does not
> >> prevent it.
> >
> > Hi Paolo,
> > The sequence is like this I think
> >
> >          |
> >          |  <-----  watchdog expired, which set reset_requested to
> >          |          SHUTDOWN_CAUSE_GUEST_RESET
> >          |
> >          |  <-----  migration thread sets to RUN_STATE_FINISH_MIGRATE, it
> >          |          will disable QEMU_CLOCK_VIRTUAL clock,
> >          |          but it is done after the setting of reset_requested
> 
> So the fix would be to process the reset request here?  (In do_vm_stop or
> pause_all_vcpus).  The code is currently in main_loop_should_exit().

On second thought, I think the reset request should still be processed in
main_loop_should_exit(): pause_all_vcpus() (or do_vm_stop()) does not run in
a loop, so it cannot detect every reset request immediately.
Processing the reset request in both main_loop_should_exit() and
do_vm_stop()/pause_all_vcpus() would also introduce a race on the global
variable 'reset_requested'.
Could we add a !runstate_check(RUN_STATE_FINISH_MIGRATE) check before setting
RUN_STATE_PRELAUNCH, alongside the existing !runstate_check(RUN_STATE_RUNNING)
and !runstate_check(RUN_STATE_INMIGRATE) checks? I'm not sure whether this
would cause any side effects, though.
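
To make the proposal concrete, here is a minimal standalone sketch of the
extra check. It is a toy model, not the actual vl.c code: the RunState enum is
trimmed to the states named in this thread, and handle_reset_request() and
replay_race() are hypothetical helpers that only mimic the run state
transition at the end of the reset handling in main_loop_should_exit().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy subset of QEMU's run states; only those discussed in this thread. */
typedef enum {
    RUN_STATE_RUNNING,
    RUN_STATE_INMIGRATE,
    RUN_STATE_FINISH_MIGRATE,
    RUN_STATE_POSTMIGRATE,
    RUN_STATE_PRELAUNCH,
    RUN_STATE_PAUSED,
} RunState;

RunState current_run_state = RUN_STATE_RUNNING;
bool reset_requested = false;

bool runstate_check(RunState state)
{
    return current_run_state == state;
}

void runstate_set(RunState state)
{
    current_run_state = state;
}

/*
 * Hypothetical helper: models only the run state transition that the main
 * loop performs after it has detected and handled a reset request.
 */
void handle_reset_request(void)
{
    assert(reset_requested);    /* only called when a request is pending */
    reset_requested = false;

    if (!runstate_check(RUN_STATE_RUNNING) &&
        !runstate_check(RUN_STATE_INMIGRATE) &&
        !runstate_check(RUN_STATE_FINISH_MIGRATE)) {  /* proposed new check */
        runstate_set(RUN_STATE_PRELAUNCH);
    }
}

/* Hypothetical helper: replays the ordering from the timeline above. */
RunState replay_race(void)
{
    runstate_set(RUN_STATE_RUNNING);
    reset_requested = true;                  /* watchdog expired */
    runstate_set(RUN_STATE_FINISH_MIGRATE);  /* migration thread */
    handle_reset_request();                  /* main loop sees the request */
    return current_run_state;
}
```

In this model, replay_race() leaves the state at RUN_STATE_FINISH_MIGRATE, so
the migration thread's later switch to RUN_STATE_POSTMIGRATE is not preceded
by RUN_STATE_PRELAUNCH. A reset handled from any other stopped state, such as
RUN_STATE_PAUSED, still moves to RUN_STATE_PRELAUNCH as before.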

Regards,
Jay Zhou

> 
> Paolo
> 
> >          |  <-----  main loop thread sets to RUN_STATE_PRELAUNCH since it
> >          |          detected a reset request
> >          |
> >          |  <-----  migration thread sets to RUN_STATE_POSTMIGRATE
> >
> >
> > Regards,
> > Jay Zhou
> >
> >>
> >> It should be possible to write a deterministic testcase with qtest...
> >>
> >> Paolo

