Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support

From:	Daniel P . Berrangé
Subject:	Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
Date:	Wed, 5 Jun 2019 15:36:05 +0100
User-agent:	Mutt/1.11.4 (2019-03-13)

On Tue, Jun 04, 2019 at 03:43:21PM +0200, Jens Freimann wrote:
> On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
> > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Jens Freimann (address@hidden) wrote:
> > > Why is it bad to fully re-create the device in case of a failed migration?
> > 
> > Bad or not, I thought the whole point of doing it inside QEMU was
> > to do something libvirt wouldn't be able to do (namely,
> > unplugging the device while not freeing resources).  If we are
> > doing something that management software is already capable of
> > doing, what's the point?
> 
> Even though management software seems to be capable of it, a failover
> implementation has never happened. As Michael says network failover is
> a mechanism (there's no good reason not to use a PT device if it is
> available), not a policy. We are now trying to implement it in a
> simple way, contained within QEMU.
> 
> > Quoting a previous message from this thread:
> > 
> > On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > | > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > | > >  This patch series is very
> > | > > odd precisely because it's trying to do the unplug itself in the
> > | > > migration phase rather than let the management layer do it - so unless
> > | > > it's nailed down how to make sure that's really really bullet proof
> > | > > then we've got to go back and ask the question about whether we should
> > | > > really fix it so it can be done by the management layer.
> > | > >
> > | > > Dave
> > | >
> > | > management already said they can't because files get closed and
> > | > resources freed on unplug and so they might not be able to re-add device
> > | > on migration failure. We do it in migration because that is
> > | > where failures can happen and we can recover.
> 
> This is something that I can work on as well, but it doesn't have to
> be part of this patch set in my opinion. Let's say migration fails and we
> can't re-plug the primary device. We can still use the standby (virtio-net)
> device which would only mean slower networking. How likely is it that
> the primary device is grabbed by another VM between unplugging and
> migration failure anyway?

The case of another VM taking the primary device is *not* a problem for
libvirt. We keep track of which device is allocated for use by which
guest, so even if it's not currently plugged into the guest, we won't
give it away to a second guest.
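
To illustrate the bookkeeping idea only, here is a toy sketch. It is not
libvirt's code; the class name, method names and the PCI address are all
made up for the example. The point it shows is that the reservation
outlives the unplug, so a second guest cannot claim the device in between.

```python
class HostDeviceTracker:
    """Toy model: a host device stays reserved for its guest while unplugged."""

    def __init__(self):
        self._owner = {}       # host device address -> guest name
        self._plugged = set()  # addresses currently attached to their guest

    def reserve(self, addr, guest):
        owner = self._owner.get(addr)
        if owner is not None and owner != guest:
            raise RuntimeError(f"{addr} is already reserved for {owner}")
        self._owner[addr] = guest

    def plug(self, addr, guest):
        self.reserve(addr, guest)
        self._plugged.add(addr)

    def unplug(self, addr):
        # Detach from the guest, but deliberately keep the reservation.
        self._plugged.discard(addr)

    def release(self, addr):
        # Only an explicit release makes the device available to others.
        self._plugged.discard(addr)
        self._owner.pop(addr, None)


tracker = HostDeviceTracker()
tracker.plug("0000:3b:10.0", "guest-a")   # hypothetical VF address
tracker.unplug("0000:3b:10.0")            # e.g. before migration starts

try:
    tracker.reserve("0000:3b:10.0", "guest-b")
    stolen = True
except RuntimeError:
    stolen = False
```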

The failure scenario is the edge case where replugging the device fails
for some reason outside libvirt's control, e.g. running out of file
descriptors, or a memory allocation failure when pinning guest RAM.
Essentially any failure path that may arise from "device-add vfio..."

In such a case the device won't get replugged. So the mgmt app will think
the migration was rolled back, but the rollback won't be complete as the
original device will be missing.
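
A rough model of that incomplete rollback, purely as a sketch: the
function names, the state dictionary and the ReplugError exception are
invented for the example and do not correspond to any QEMU or libvirt
API. The replug step is passed in as a callable so a failure can be
simulated.

```python
class ReplugError(Exception):
    """Stands in for any failure while re-adding the primary device."""


def rollback_failed_migration(replug_primary):
    """Return the guest's network state after a failed migration."""
    # The primary was unplugged before migration; the standby stays attached.
    state = {"standby_virtio_net": True, "primary_plugged": False}
    try:
        replug_primary()
        state["primary_plugged"] = True
    except ReplugError as err:
        # The guest keeps running on the standby device only.
        state["replug_error"] = str(err)
    # The rollback is only complete if the primary device came back.
    state["rollback_complete"] = state["primary_plugged"]
    return state


def replug_ok():
    pass


def replug_out_of_fds():
    raise ReplugError("out of file descriptors")


good = rollback_failed_migration(replug_ok)
bad = rollback_failed_migration(replug_out_of_fds)
```

In the second case the guest still has networking through virtio-net,
but a mgmt app that only looks at the migration status would not notice
that the primary device is gone unless the replug error is reported to
it separately.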

I guess the question is whether that's really something to worry about?

Can we justifiably just leave this as a docs problem, given that it would
be a very rare failure?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
