qemu-devel

Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy


From: Peter Xu
Subject: Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
Date: Wed, 11 Oct 2017 11:00:58 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Oct 10, 2017 at 01:30:18PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (address@hidden) wrote:
> > On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > > We have to be careful about this; a network can fail in a way it
> > > gets stuck rather than fails - this can get stuck until a full TCP
> > > disconnection; and that takes about 30mins (from memory).
> > > The nice thing about using 'shutdown' is that you can kill the existing
> > > connection if it's hung. (Which then makes an interesting question;
> > > the rules in your migrate-incoming command become different if you
> > > want to declare it's failed!).  Having said that, you're right that at
> > > this point stuff has already failed - so do we need the shutdown?
> > > (You might want to do the shutdown as part of the recovery earlier
> > > or as a separate command to force the failure)
> > 
> > I assume if I call shutdown before the lock then we'll be good then.
> 
> The question is what happens if you only allow recovery if we're already
> in postcopy-paused state; in the case of a hung socket, since no IO has
> actually failed yet, you will still be in postcopy-active.

Hmm, but isn't that a problem of the kernel rather than QEMU?  After
all, sockets are managed by the kernel.

I don't really know what the best way is to detect whether a socket
is stuck.  Assuming we can observe that (say, we see the migration
transferred bytes stay static for 30 seconds), IIRC you mentioned
iptables tricks to break an existing (e.g. TCP) connection; then we
can trigger the -EIO path.

Or do you think we should provide a way to manually trigger the paused
state?  That brings us back to what we discussed with Dan in the
earlier post - I'd appreciate it if we could postpone the manual
trigger support a bit (to keep this series small, which it already
isn't...).

Thanks,

-- 
Peter Xu


