[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate


From: Alex Williamson
Subject: [Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
Date: Mon, 13 Dec 2010 10:43:22 -0700

On Mon, 2010-12-13 at 02:55 +0530, Juan Quintela wrote:
> Alex Williamson <address@hidden> wrote:
> > On Sun, 2010-12-12 at 20:07 +0530, Juan Quintela wrote:
> >> "Michael S. Tsirkin" <address@hidden> wrote:
> >> > On Sun, Dec 12, 2010 at 05:23:39PM +0530, Juan Quintela wrote:
> >> >> "Michael S. Tsirkin" <address@hidden> wrote:
> >> >> > On Thu, Dec 09, 2010 at 03:14:17PM -0700, Alex Williamson wrote:
> >> 
> >> >> > How about we keep migrating the index for the benefit of
> >> >> > old versions, but ignore the value on load?
> >> >> > Something like the following:
> >> >> 
> >> >> This was my 1st suggestion to Alex O:-)
> >> >
> >> > The difference here is that instead of sending garbage to the
> >> > old version we send an actual index value.
> >> >
> >> >> So, I am in.  He thinks this is bad for upstream; I don't think so (but
> >> >> I understand that it is debatable).
> >> >> 
> >> >> Later, Juan.
> >> >
> >> > I think it makes sense to fix this for the stable branch,
> >> > and I think we should try as hard as we can to avoid bumping up the
> >> > version number there.
> >> >
> >> > For master we can bump the version number but it might be easier to
> >> > just keep the code the same there.
> >> 
> >> I think that your solution is better.  For older versions, it works as
> >> expected.  For new versions, problem is fixed.  Solution is not the
> >> "purest", but you can say the same about upping the version for a state
> >> that is exactly the same length & fields O:-)
> >
> > I disagree, without bumping the version number, we can never guarantee
> > the problem is behind us.
> 
> we can, if we use the latest version.

And we determine we're using the latest version via the vmsd
version_id...

> > We can always migrate to the bad version,
> 
> That is the whole point.  Bumping the version makes this impossible.

Which seems like a good thing to me.  Yes, it sucks that a user may
upgrade a host, migrate a guest to it, and suddenly not be able to
migrate back to the original host.  On the other hand, isn't it better
that we don't allow a migration that could potentially risk the
integrity of the guest?  I think so.

> > which puts our users at risk.  The responsible behavior is to allow
> > forward migrations and prevent migrations to a version with an issue
> > known to compromise VM integrity.  Perhaps I feel more strongly about
> > this because I actually had to debug this problem.  Obvious in
> > retrospect, but a huge pain in the butt to get there.
> 
> Obviously, my point of view is different, and is related to
> maintaining a stable migration ABI. So, ... I am also "biased".
> 
> We have to make a decision (in general, not just this case):
> - we are going to never bump the version:
>   this gives a stable ABI, but bugs stay with us forever

This is impossible.

> - we are not ever going to pretend that we care
>   this makes changes trivial, as we don't have to maintain
>   backward compatibility.

That's a little dramatic.  If we can come up with a way to not bump the
version number, I'm all for it.  I haven't seen one so far.

> 
> And that is it.  Basically anything in the middle doesn't matter.  If I
> have a machine definition, with only a single device that has bumped
> version, I can't migrate to the backwards one.

Sorry, it's for your own good.  AIUI, there is plenty of grey between
your criteria above.  Yes we should try to preserve the migration ABI.
However, we will hit bugs where that's impossible.  Then it's good to
have discussions like this and investigate whether we can safely make a
change without bumping the version_id.  IMHO, the integrity of the guest
is always more important than maintaining a static ABI.

> This is the reason why I am against changes like this, if we are
> pretending that we are going to maintain the versions stable.
> 
> Notice that there are (at least) two ways to look at this specific
> problem:
> - don't bump the version.
>   * new -> new : works
>   * old -> new : works
>   * new -> old : works (at least as well as old -> old that existed
>                         before)

If it worked, I wouldn't be working on this bug ;)  Here are some
failure scenarios:

a)
   1. Boot guest with single rtl8139
   2. Hot add 2nd rtl8139
   3. Migrate guest
   4. Hot remove 2nd rtl8139
   Result: 1st NIC stops working, guest segfaults on reboot

Too complicated?  How about this:

b)
   1. Boot guest with 2 rtl8139 NICs
   2. Boot migration target with NICs listed in reverse order
   3. Migrate
   Result: NICs get swapped at reboot!!

Or how about:

c)
   1. Boot guest with e1000, rtl8139
   2. Boot migration target with rtl8139, e1000
   3. Migrate
   Result: rtl8139 now points at e1000 mmio space, fails on reboot,
           e1000 fails if rtl8139 is removed

I don't think it's fair to call any of these working, and in fact, I
retract my patch that sets the mmio space to unassigned if the device is
hotplugged, since issues can clearly happen without hotplug involved.
The index the device uses depends entirely on instantiation ordering,
which is bound to cause confusing, hard to reproduce, and difficult to
debug issues.
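
To make the ordering dependence concrete, here is a toy model of scenario (c). It is only a sketch, not the QEMU code: `struct io_table`, `toy_register_io()` and `toy_owner_seen_by_rtl8139()` are made-up names, and the model assumes the I/O memory index is simply "next free slot" and that the saved index is reused verbatim on the destination.

```c
#include <assert.h>
#include <string.h>

#define MAX_IO_REGIONS 8

/* Hypothetical stand-in for a global table of registered I/O regions. */
struct io_table {
    const char *owner[MAX_IO_REGIONS];
    int next;
};

/* Hands out the next free slot, so the index depends only on call order. */
static int toy_register_io(struct io_table *t, const char *owner)
{
    assert(t->next < MAX_IO_REGIONS);
    t->owner[t->next] = owner;
    return t->next++;
}

/*
 * Instantiate two devices on the source and on the destination, carry the
 * rtl8139's source-side index across as if it were part of the saved state,
 * and report which device owns that slot on the destination.
 */
static const char *toy_owner_seen_by_rtl8139(const char *src0, const char *src1,
                                             const char *dst0, const char *dst1)
{
    struct io_table src = {0}, dst = {0};
    int idx0 = toy_register_io(&src, src0);
    int idx1 = toy_register_io(&src, src1);
    int saved = strcmp(src0, "rtl8139") == 0 ? idx0 : idx1;

    toy_register_io(&dst, dst0);
    toy_register_io(&dst, dst1);
    return dst.owner[saved];   /* the stale index is used as-is */
}
```

In this model, a source started as (e1000, rtl8139) and a target started as (rtl8139, e1000) leaves the rtl8139 holding the slot that belongs to the e1000 on the target. With identical ordering on both sides the index happens to line up, which is why the problem is so hard to reproduce.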

> - bump the version
>   * new -> new: works
>   * old -> new: works
>   * new -> old: fails always

Correction here too: new->old is prevented.  The migration source will
continue running after the incompatible version is transmitted and the
migration is canceled.
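
Roughly, the behavior I'm describing looks like the sketch below. Again, this is a simplified model and not the actual vmstate code: `struct toy_vmsd`, `toy_load()` and `toy_source_keeps_running()` are hypothetical names, and the only assumption is that the destination refuses any incoming section whose version is newer than its own `version_id` or older than its `minimum_version_id`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down device state description. */
struct toy_vmsd {
    int version_id;          /* newest format this build writes and reads */
    int minimum_version_id;  /* oldest format this build still accepts */
};

enum toy_result { TOY_LOADED, TOY_REJECTED };

/* Destination side: accept only versions inside the supported range. */
static enum toy_result toy_load(const struct toy_vmsd *dest, int incoming_version)
{
    assert(dest->minimum_version_id <= dest->version_id);
    if (incoming_version > dest->version_id) {
        return TOY_REJECTED;   /* sender is newer than we understand */
    }
    if (incoming_version < dest->minimum_version_id) {
        return TOY_REJECTED;   /* sender is too old to be trusted */
    }
    return TOY_LOADED;
}

/*
 * Source side: if the destination rejects the stream, the migration is
 * canceled and the guest simply keeps running where it was.
 */
static bool toy_source_keeps_running(const struct toy_vmsd *src,
                                     const struct toy_vmsd *dest)
{
    return toy_load(dest, src->version_id) == TOY_REJECTED;
}
```

With, say, an old build at version 4 and a fixed build at version 5 (illustrative numbers only), old -> new loads, while new -> old is rejected on the target and the guest stays up on the source. Nothing is lost except the ability to move back to the build with the bug.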

So, unfortunately, I stand by my original patch.  I understand your
desire to avoid bumping the migration ABI, but doing so is the only way
we can guarantee integrity of the guest... and that's what it's there
for.  Thanks,

Alex



