qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Question about odd snapshot behaviour


From: Sam Bobroff
Subject: Re: [Qemu-devel] Question about odd snapshot behaviour
Date: Fri, 23 Oct 2015 10:54:00 +1100
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, Oct 19, 2015 at 02:50:17PM +0200, Kevin Wolf wrote:
> [ CC: qemu-block ]
> 
> Am 19.10.2015 um 07:27 hat Sam Bobroff geschrieben:
> > Hi all,
> > 
> > While working through the QEMU code that saves and loads snapshots, I've
> > noticed some confusing behaviour when using named VM snapshots that may 
> > need to
> > be fixed somehow. This is what I've noticed:
> > 
> > * If I create two empty qcow2 filesystems: fs1.qcow2 and fs2.qcow2:
> >     * e.g. qemu-image create -f qcow2 fs1.qcow2 10G
> >     * e.g. qemu-image create -f qcow2 fs2.qcow2 10G
> > * Then boot the guest with the above filesystems as hda and hdb (my guest's 
> > rootfs
> >   is an initramfs, not either of these):
> >     * e.g. qemu-system-ppc64 ... -hda fs1.qcow2 -hdb fs2.qcow2
> > * In the QEMU monitor, create a blockdev snapshot on hd2 called "foo":
> >     e.g. snapshot_blkdev_internal scsi0-hd1 foo
> > * Check fs2 on the host command line:
> >     e.g. qemu-img snapshot -l fs2.qcow2
> >     Snapshot list:
> >     ID        TAG                 VM SIZE                DATE       VM CLOCK
> >     1         foo                       0 2015-10-19 15:56:29   00:00:20.969
> > * In the QEMU monitor, save the vm state as "bar":
> >     e.g. savevm bar
> > * Check the snapshot on fs1:
> >     e.g. info snapshots
> >     ID        TAG                 VM SIZE                DATE       VM CLOCK
> >     1         bar                    215M 2015-10-19 04:57:13   00:01:05.589
> > * Check fs2 again on the host command line:
> >     e.g. qemu-img snapshot -l fs2.qcow2
> >     Snapshot list:
> >     ID        TAG                 VM SIZE                DATE       VM CLOCK
> >     1         foo                       0 2015-10-19 15:56:29   00:00:20.969
> >     2         bar                       0 2015-10-19 15:57:13   00:01:05.589
> > 
> > * Now the fun part: overwrite the bar snapshot on fs1 by ID:
> >     * savevm 1
> > * Examine the results from the monitor:
> >     * e.g. info snapshots
> >     There is no suitable snapshot available
> > * Examine fs1 and fs2 from the command line:
> >     * e.g. qemu-img snapshot -l fs1.qcow2 
> >     Snapshot list:
> >     ID        TAG                 VM SIZE                DATE       VM CLOCK
> >     1         bar                    215M 2015-10-19 16:00:45   00:04:36.065
> >     * e.g. qemu-img snapshot -l fs2.qcow2 
> >     Snapshot list:
> >     ID        TAG                 VM SIZE                DATE       VM CLOCK
> >     2         bar                       0 2015-10-19 15:57:13   00:01:05.589
> >     3         bar                       0 2015-10-19 16:00:45   00:04:36.065
> 
> I must admit that I wouldn't have been able to predict this result. :-)
> 
> Generally, internal snapshots with multiple disks are somewhat tricky
> because we have individual names and IDs in every image file, and each
> file can individually be snapshotted, but need to create a consistent
> view from it.
> 
> > So what seems odd is:
> > 
> > * QEMU has lost the bar snapshot on fs1, although qemu-img still shows
> >   it and it re-appears in QEMU if the guest is started with only
> >   hd1.qcow attached.
> 
> This one I think is a feature: A snapshot can only be loaded if it's
> present on all image files. Loading the snapshot only on some disks and
> leaving the other disks alone is likely to result in an inconsistent
> state.

How about expanding the output from "info snapshots" to include the invalid
ones but with enough information so that you can see that they're invalid, and
why that is?

> > * QEMU has deleted the snapshot named "foo" on hd2, even though we issued
> >   "savevm 1" and the old snapshot (on hd1) with ID 1 was named "bar". This
> >   surprised me.
> 
> You mean we should check whether a snapshot identified by an ID has the
> same name on all image files before it's considered valid? That might be
> a reasonable constraint.
> 
> What I would have expected is that the old snapshot with ID 1 indeed
> gets deleted, but a new one with ID 1 is created again. I'm not sure if
> I would have expected "foo" or "bar" as its name, I could argue either
> way.

A large part of the confusion comes from the way the search functions work:
they take a string but will match either the ID or name, and there's no way to
specify which. Could we change the HMP command so that by default it works only
by name, but can optionally take an ID (via an explicit flag or something)?

> > * QEMU has created a snapshot on hd2 with a duplicate name ("bar") and if we
> >   issue a "loadvm bar" QEMU loads the snapshot with ID 1 on hd1 and the 
> > first
> >   snapshot with the name "bar" on hd2 which is the one with ID 2. That is 
> > not
> >   the snapshot that matches the vm state (which is the one with ID 3) and
> >   presumably can lead to filesystem corruption on hd2.
> 
> Yup, this doesn't seem very nice. I'm not sure whether we can forbid
> duplicate names, though, without breaking compatibility in an
> unacceptable way.

It seems like the code is currently trying to prevent duplicate names: if you
create a snapshot with the name of an existing one, existing snapshots with
that name are (silently) deleted. It's just that the code gets confused by
passing an ID and the above happens. So I don't think really forbidding it is
going to break anything.

There seems to be an assumption that the name or ID both uniquely identify a
snapshot "set" (both in the loadvm code and "info snapshots", see
hmp_info_snapshots()) but this just isn't the case. I think we need to change
the ID so that it really is unique to a "snapshot set" (maybe make it a UUID?)
and change the existing code to prevent duplicate names or the loading of
"mismatched" snapshots (with the changed ID this would just mean they need to
have matching IDs).

So I'm proposing:
* Change ID from an incrementing integer to something "fairly unique" like a
  UUID.
* Snapshots can only be created with a name if:
        * There are no snapshots on any device with that name, or
        * There are already snapshots on all devices and they currently all
          have the same ID.
* Snapshots are only valid if other snapshots exist on all devices with the
  same name (if named) and the same ID.

Do you think it would be too incompatible to change the ID field? Would it be
better to leave the ID field (and deprecate it somehow) and add an additional
"set ID" (UUID)?

> On the other hand, IDs and names don't really mix well and make it too
> easy to shoot yourself in the foot. Maybe it's bad enough to justify
> even some incompatibilities.

Agreed.

> > Do these seem like bugs? Should I try to find a way to fix some or all of 
> > them?
> > 
> > (Note: I checked the launchpad bug list but didn't see these listed.)
> 
> I'm not sure whether to call them bugs, but there's definitely a safety
> check or two missing. Let's discuss what the exact behaviour should be,
> and then you can go ahead and write the fixes to get that behaviour.

Sounds good, thanks for your time :-)

> Kevin

Cheers,
Sam.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]