From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
Date: Mon, 25 Jul 2011 23:10:57 +0200

On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini <address@hidden> wrote:
> With the current migration format, VMS_STRUCTs with subsections
> are ambiguous.  The protocol cannot tell whether a 0x05 byte after
> the VMS_STRUCT starts a subsection or is part of the parent's data
> stream.  In the past QEMU assumed it always started a subsection;
> since commit eb60260 (savevm: fix corruption in vmstate_subsection_load(),
> 2011-02-03) the choice depends on whether the receiving side has
> subsections defined for that VMS_STRUCT.
>
> Unfortunately, this means that if a destination has no subsections
> defined for the struct, it will happily read subsection data into
> its own fields.  And if you are "lucky" enough to stumble on a
> zero byte at the right time, it will be interpreted as QEMU_VM_EOF
> and migration will stop with the state only half loaded.
>
> There is no way out of this except defining an incompatible
> migration protocol.  In the not-so-long term we should really try to
> define one that is not a joke, but the bug is serious, so we need a
> solution for 0.15.  A sentinel at the end of embedded structs does
> remove the ambiguity: the loader then knows exactly where the
> struct's subsections end.
>
> Of course, this can be restricted to new machine types, and this
> is what the patch series does.  (Note that only patch 3 is specific
> to the short-term solution; everything else is entirely generic.)
>
> Untested beyond compilation.

I have now tested this series (exactly as sent), both by manually
examining the differences between the two formats on the same guest
state and by a mix of saves and restores (new on new, 0.14 on new
pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL).  It
always behaves as expected.

Michael Tsirkin objected that the format should instead be passed as a
parameter to the migrate command.  I kind of agree; however, since this
is a real bug, you would still need to bump the default for new machine
types, and that default would still go in the QEMUMachine struct as I
am doing.  So I consider the two settings orthogonal.  Also, the
alternative requires changes to the whole management stack, and if the
default is not changed, it imposes the broken format unless you update
the management tools.  Clearly much less bang for the buck.

I think this is ready to go into 0.15.  The bug triggers when a pc-0.14
machine that has a floppy drive, created with QEMU 0.15, is migrated to
QEMU 0.14.  The "media changed" subsection is almost always included,
and 0.14 has no subsections defined for the floppy device, so it reads
the subsection data into its own fields.  While QEMU's support for
migration to older versions admittedly depends on luck, this isn't true
of certain downstreams :) which would like to have an unambiguous
migration format.

Paolo


