Re: [Qemu-devel] [PATCH 0/3] Migration cancel with dead network
From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [PATCH 0/3] Migration cancel with dead network
Date: Thu, 8 Jan 2015 11:33:40 +0000
User-agent: Mutt/1.5.23 (2014-03-12)

On Thu, Jan 08, 2015 at 11:29:59AM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (address@hidden) wrote:
> > On Thu, Jan 08, 2015 at 11:11:29AM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <address@hidden>
> > >
> > > If the remote host or the network dies during a migration, the socket can
> > > be waiting for a long timeout, and migration_cancel can't complete the
> > > cancel for a long time (and you can't start a new migration to somewhere
> > > else). (Here 'long' is the TCP timeout, which is ~15 mins.)
> > >
> > > This patch set uses the shutdown(2) syscall to unblock any write/sends
> > > that are in progress, to let the migrate_cancel happen quickly.
> > >
> > > 1/3: socket shutdown - An updated patch from my postcopy world to
> > > add a shut_down method on QEMUFile - only
> > > for 'socket' (where the syscall is supported).
> > >
> > > 2/3: Handle bi-directional communication for fd migration
> > > - A patch from Cristian Klein to use the socket
> > > QEMUFile for FDs that are passed in, if the FDs
> > > are sockets; this is needed so that libvirt
> > > migrations can take advantage of the other
> > > patches.
> > > Again this patch (and its naming) come from the
> > > postcopy world.
> > >
> > > 3/3: migration_cancel: shutdown migration socket
> > > - A new patch that uses the shutdown in
> > > migrate_fd_cancel
> > >
> > >
> > > Note this does not fix the timeout if you try to migrate to an already
> > > dead host; the connect timeout is typically a much shorter 2 minutes
> > > anyway.
> >
> > In any libvirt managed setup, you'd need to address the connect timeout
> > issue in libvirt instead, since libvirt always uses fd based migration,
> > i.e. libvirt establishes the connection & then passes the TCP socket to
> > QEMU.
> >
> > It should be possible for libvirt to use a non-blocking connect()
> > call and catch use of its virDomainMigrateCancel API to stop the
> > connection attempt.
>
> Yes, although as I say I've not fixed the (much shorter) connect side timeout
> on the QEMU side either.
>
> How does libvirt behave if you cancel a tunnelled migration with the network
> dying in the middle?

I think it's just reliant on the TCP timeout - we don't have any of the
clever use of 'shutdown' that you're adding here. I guess it could make
sense for libvirt to use shutdown.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|