From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PULL 15/28] migration: create new section to store global state
Date: Wed, 8 Jul 2015 11:43:27 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Christian Borntraeger (address@hidden) wrote:
> On 08.07.2015 at 12:14, Dr. David Alan Gilbert wrote:
> > * Christian Borntraeger (address@hidden) wrote:
> >> On 07.07.2015 at 15:08, Juan Quintela wrote:
> >>> This includes a new section that for now just stores the current qemu
> >>> state.
> >>>
> >>> Right now, there is only one way to control the state of the
> >>> target after migration.
> >>>
> >>> - If you run the target qemu with -S, it starts stopped.
> >>> - If you run the target qemu without -S, it runs as soon as
> >>>   migration finishes.
> >>>
> >>> The problem is what happens if we start the target without -S and
> >>> an error occurs during migration that sets the current state to
> >>> -EIO.  Migration would end (note that the error happened while doing
> >>> block IO or network IO, i.e. nothing related to migration), and when
> >>> migration finishes, we would just "continue" running on the destination,
> >>> probably hanging the guest, corrupting data, or whatever.
> >>>
> >>> Signed-off-by: Juan Quintela <address@hidden>
> >>> Reviewed-by: Dr. David Alan Gilbert <address@hidden>
> >>
> >> I bisected a regression on s390 to this patch.
> >>
> >> After managedsave/start, the guest reboots instead of resuming.
> >>
> >> Do you have any idea what might be wrong?
> > 
> > I'd add some debug to the pre_save and post_load to see what state value is
> > being saved/restored.
> > 
> > Also, does that regression happen when doing the save/restore using the
> > same/latest git, or is it a load from an older version?
> 
> Seems to happen only with some guest definitions, but I can't really
> pinpoint it yet.
> E.g. removing queues='4' from my network card solved it for a reduced xml,
> but doing the same on a bigger xml was not enough :-/

Nasty.  Still, the 'paused' value in the pre-save/post-load feels right.
I've read through the patch again and it still feels right to me, so I don't
see anything obviously wrong.
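
To make the intent concrete, here is a toy model of what I'd expect the
saved state to do on the destination.  This is only a sketch, not the code
from the patch: the names RunState, pre_save, post_load and started_stopped
are made up for illustration, and the decision rule is my assumption of the
intended behaviour rather than something taken from the source tree.

```python
from enum import Enum


class RunState(Enum):
    # Hypothetical subset of run states, named only for this sketch.
    RUNNING = "running"
    PAUSED = "paused"
    IO_ERROR = "io-error"


def pre_save(source_state):
    """Source side: record the current run state in the new section."""
    return source_state.value


def post_load(saved, started_stopped):
    """Destination side: pick the state to enter once migration finishes.

    started_stopped models running the target qemu with -S.
    """
    state = RunState(saved)
    if state is RunState.RUNNING:
        # A running source only resumes if the target was not started with -S.
        return RunState.PAUSED if started_stopped else RunState.RUNNING
    # Assumption: any non-running state (paused, I/O error) is preserved,
    # so the destination never blindly "continues".
    return state
```

With this model a value of 'paused' saved on the source comes back as
'paused' on the destination whether or not -S was used, and an I/O error on
the source no longer turns into a running guest.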

Perhaps it's worth turning on the migration tracing on both sides and seeing
what's different with that 'queues=4'?
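
If the traces get long, something like the following could help find the
first point where a run with 'queues=4' and a run without it diverge.  It is
a rough sketch: it assumes each trace line starts with a
"<pid>@<seconds>.<microseconds>:" prefix followed by the event name and its
arguments, which may not match your trace backend, and the helper names are
invented here.

```python
import re

# Assumed prefix shape "<pid>@<sec>.<usec>:"; adjust for your trace backend.
_PREFIX = re.compile(r"^\d+@\d+\.\d+:")


def events(log_text):
    """Return the non-empty trace lines with the pid/timestamp prefix removed."""
    out = []
    for line in log_text.splitlines():
        line = line.strip()
        if line:
            out.append(_PREFIX.sub("", line))
    return out


def first_divergence(log_a, log_b):
    """Return (index, line_a, line_b) for the first differing event, or None.

    A missing line (one trace is shorter than the other) is reported as None.
    """
    ea, eb = events(log_a), events(log_b)
    for i in range(max(len(ea), len(eb))):
        la = ea[i] if i < len(ea) else None
        lb = eb[i] if i < len(eb) else None
        if la != lb:
            return i, la, lb
    return None
```

Feed it the text of the two trace files (for example the destination side
of a working and a failing restore) and look at the event it reports first.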

Dave

--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


