qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Bug?] BQL about live migration


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [Bug?] BQL about live migration
Date: Fri, 3 Mar 2017 15:03:42 +0000
User-agent: Mutt/1.7.1 (2016-10-04)

* Paolo Bonzini (address@hidden) wrote:
> 
> 
> On 03/03/2017 14:26, Dr. David Alan Gilbert wrote:
> > * Paolo Bonzini (address@hidden) wrote:
> >>
> >>
> >> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
> >>> * Paolo Bonzini (address@hidden) wrote:
> >>>>
> >>>>
> >>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
> >>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago 
> >>>>> that
> >>>>> their were times when run_on_cpu would have to drop the BQL and I 
> >>>>> worried about it,
> >>>>> but this is the 1st time I've seen an error due to it.
> >>>>>
> >>>>> Do you know what the migration state was at that point? Was it 
> >>>>> MIGRATION_STATUS_CANCELLING?
> >>>>> I'm thinking perhaps we should stop 'cont' from continuing while 
> >>>>> migration is in
> >>>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED 
> >>>>> - so that
> >>>>> perhaps libvirt could avoid sending the 'cont' until then?
> >>>>
> >>>> No, there's no event, though I thought libvirt would poll until
> >>>> "query-migrate" returns the cancelled state.  Of course that is a small
> >>>> consolation, because a segfault is unacceptable.
> >>>
> >>> I think you might get an event if you set the new migrate capability 
> >>> called
> >>> 'events' on!
> >>>
> >>> void migrate_set_state(int *state, int old_state, int new_state)
> >>> {
> >>>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
> >>>         trace_migrate_set_state(new_state);
> >>>         migrate_generate_event(new_state);
> >>>     }
> >>> }
> >>>
> >>> static void migrate_generate_event(int new_state)
> >>> {
> >>>     if (migrate_use_events()) {
> >>>         qapi_event_send_migration(new_state, &error_abort); 
> >>>     }
> >>> }
> >>>
> >>> That event feature went in sometime after 2.3.0.
> >>>
> >>>> One possibility is to suspend the monitor in qmp_migrate_cancel and
> >>>> resume it (with add_migration_state_change_notifier) when we hit the
> >>>> CANCELLED state.  I'm not sure what the latency would be between the end
> >>>> of migrate_fd_cancel and finally reaching CANCELLED.
> >>>
> >>> I don't like suspending monitors; it can potentially take quite a 
> >>> significant
> >>> time to do a cancel.
> >>> How about making 'cont' fail if we're in CANCELLING?
> >>
> >> Actually I thought that would be the case already (in fact CANCELLING is
> >> internal only; the outside world sees it as "active" in query-migrate).
> >>
> >> Lei, what is the runstate?  (That is, why did cont succeed at all)?
> > 
> > I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the 
> > device
> > save, and that's what we get at the end of a migrate and it's legal to 
> > restart
> > from there.
> 
> Yeah, but I think we get there at the end of a failed migrate only.  So
> perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE and forbid
> "cont" from finish-migrate (only allow it from failed-migrate)?

OK, I was wrong in my previous statement; we actually go 
FINISH_MIGRATE->POSTMIGRATE
so no new state is needed; you shouldn't be restarting the cpu in 
FINISH_MIGRATE.

My preference is to get libvirt to wait for the transition to POSTMIGRATE before
it issues the 'cont'.  I'd rather not block the monitor with 'cont' but I'm
not sure how we'd cleanly make cont fail without breaking existing libvirts
that usually don't hit this race. (cc'ing in Jiri).

Dave

> Paolo
> 
> >> Paolo
> >>
> >>> I'd really love to see the 'run_on_cpu' being more careful about the BQL;
> >>> we really need all of the rest of the devices to stay quiesced at times.
> >>
> >> That's not really possible, because of how condition variables work. :(
> > 
> > *Really* we need to find a solution to that - there's probably lots of 
> > other things that can spring up in that small window other than the
> > 'cont'.
> > 
> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> > 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]