qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME


From: Peter Xu
Subject: Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
Date: Fri, 4 Aug 2017 17:22:36 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Fri, Aug 04, 2017 at 09:30:01AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (address@hidden) wrote:
> > On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> > > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> > > 
> > > [...]
> > > 
> > > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > > > +{
> > > > > +    /*
> > > > > +     * This means source VM is ready to resume the postcopy 
> > > > > migration.
> > > > > +     * It's time to switch state and release the fault thread to
> > > > > +     * continue service page faults.
> > > > > +     */
> > > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > > > +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > > > +    qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > > > 
> > > > Is it worth sanity checking that you were in RECOVER at this point?
> > > 
> > > Yeah, it never hurts.  Will do.
> > 
> > Not sure whether this would be good (note: I returned 0 in the if):
> > 
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index b7843c2..b34f59b 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1709,6 +1709,12 @@ static int 
> > loadvm_postcopy_handle_run(MigrationIncomingState *mis)
> >  
> >  static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> >  {
> > +    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> > +        error_report("%s: illegal resume received", __func__);
> > +        /* Don't fail the load, only for this. */
> > +        return 0;
> > +    }
> > +
> >      /*
> >       * This means source VM is ready to resume the postcopy migration.
> >       * It's time to switch state and release the fault thread to
> > 
> > Basically I just don't want to crash the dest VM (it holds hot dirty
> > pages) even if it receives a faulty RESUME command.
> 
> Yes, so now that's a fun problem; effectively you then have 3 valid
> failure modes:
>     a) An IO failure so we need to go into POSTCOPY_PAUSE
>     b) A fatal migration stream problem to quit
>     c) A non-fatal migration stream problem to go .. back into PAUSE?

Hmm yes...

So I got at least three TODO ITEMs now:

- support manual switch source into PAUSED state
- support migrate_cancel during PAUSED/RECOVER state
- when anything wrong happens during PAUSED/RECOVER, switching back to
  PAUSED state on both sides

It just depends on whether we would like to postpone these work, or we
think any of them are essential even for the first version.

IMHO we can postpone this 3rd one as well.

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]