[Qemu-devel] qcow2 - safe on kill? safe on power fail?


From: Jamie Lokier
Subject: [Qemu-devel] qcow2 - safe on kill? safe on power fail?
Date: Mon, 21 Jul 2008 19:10:31 +0100
User-agent: Mutt/1.5.13 (2006-08-11)

Quite a while ago, Anthony Liguori wrote:
> David Barrett wrote:
> >I'm tracking down an image corruption issue and I'm curious if you can 
> >answer the following:
> >
> >1) Is there any difference between sending a "TERM" signal to the QEMU 
> >process and typing "quit" at the monitor?
> 
> Yes.  Since QEMU is single threaded, when you issue a quit, you know you 
> aren't in the middle of writing qcow2 metadata to disk.
> 
> >2) Will sending TERM corrupt the 'qcow2' image (in ways other than 
> >normal guest OS dirty shutdown)?
> 
> Possibly, yes.
> 
> >3) Assuming I always start QEMU using "-loadvm", is there any risk in 
> >using 'kill' to send SIGTERM to the QEMU process when done?
> 
> Yes.  If you want to SIGTERM QEMU, the safest thing to do is use -snapshot.

Just today, a running KVM instance for an important server (our busy
mail server) locked up.  It was still accepting VNC connections, but
sending keystrokes, mouse movements and so on didn't do anything.  It
had been running for several weeks without any problem.  I don't know
whether the VNC display was still showing a picture.

Our system manager decided there was nothing else to do, and killed
that process (SIGTERM), then restarted it.

(Unfortunately, he didn't know about the monitor and "quit".)

So far, it's looking ok, but I'm concerned about the possibility of
qcow2 corruption which the above mail says is possible.

Even if we could have used the monitor *this* time, QEMU is quite a
complex piece of software which we can't assume to be bug-free.  What
happens if KVM/QEMU locks up or crashes in one of the following ways?

    - Some emulated driver crashes.  I *have* seen this happen.
      (Try '-net user -net user' on the command line.  Ok, now we know not
      to do it...).  The process dies.

    - Some emulated driver gets stuck in a loop.  You know, a bug.
      No choice but to kill the process.

    - The host machine loses power.  Host's journalled filesystem is
      fine, but what about the qcow2 images of guests?

I'm imagining that qcow2 is like a very simple filesystem format.
Real filesystems have "fsck" and/or use journalling or similar to be
robust.  Is there a "fsck" equivalent for qcow2?  (Maybe running
qemu-img convert is that?)  Does it use journalling or other
techniques internally to make sure it is difficult to corrupt, even if
the host dies unexpectedly?
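
To make the "fsck" question concrete, here is a minimal sketch of the
shallowest possible check, written in Python.  It is emphatically not
an fsck: it only reads the first 32 bytes of the file and confirms they
look like a qcow2 header.  The function name read_qcow2_header is made
up for illustration, and the field layout (big-endian magic "QFI\xfb",
version, backing file offset and length, cluster_bits, virtual size) is
my understanding of the on-disk format, so treat it as an assumption.

```python
import struct

QCOW_MAGIC = b"QFI\xfb"          # 'Q', 'F', 'I', 0xfb
HEADER_FMT = ">4sIQIIQ"          # big-endian, first 32 bytes of the header only


def read_qcow2_header(path):
    """Return a few header fields, or raise ValueError if this is not qcow2.

    Hypothetical helper: it validates only the start of the header and says
    nothing about the L1/L2 tables or refcounts, where corruption would
    actually matter.
    """
    size_needed = struct.calcsize(HEADER_FMT)
    with open(path, "rb") as f:
        raw = f.read(size_needed)
    if len(raw) < size_needed:
        raise ValueError("file too short to hold a qcow2 header")

    magic, version, backing_off, backing_len, cluster_bits, virt_size = \
        struct.unpack(HEADER_FMT, raw)
    if magic != QCOW_MAGIC:
        raise ValueError("bad magic, not a qcow2 image")
    if version < 2:
        raise ValueError("version %d is not qcow2" % version)

    return {
        "version": version,
        "has_backing_file": backing_off != 0,
        "cluster_size": 1 << cluster_bits,
        "virtual_size": virt_size,
    }
```

A real consistency check would have to walk the mapping tables and
compare them against the reference counts; this sketch only shows where
such a tool would start.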

If qcow2 is not resistant to sudden failures, would it be difficult to
make it more robust?
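
By "more robust" I mean something like the usual ordering discipline:
write the new version somewhere else, flush it, and only then switch
over to it, so that a kill or power cut leaves either the old state or
the new state but never a half-written one.  The Python sketch below
shows that principle on a whole small file.  It is not how qcow2
updates its metadata, the name durable_replace is hypothetical, and it
assumes a POSIX host where a directory can be opened and fsync'ed.

```python
import os


def durable_replace(path, data):
    """Replace the contents of `path` with `data` so that a crash leaves
    either the complete old file or the complete new file.

    Illustrative only: a disk image format would apply the same ordering
    to individual metadata blocks rather than to the whole file.
    """
    tmp = path + ".tmp"

    # 1. Write the new contents to a separate file and force them to disk.
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # 2. Atomically switch the name over to the new contents.
    os.replace(tmp, path)

    # 3. Flush the directory so the rename itself survives a power failure.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is the extra flushes, which is exactly the kind of
performance trade-off I would expect a more robust image format to face.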

(One method which comes to mind is to use a daemon process just to
handle the disk image, communicating with QEMU.  QEMU is complex and
may occasionally have problems, but the daemon would do just one
thing, so quite likely to survive.  It won't be robust against power
failure, though, and it sounds like performance might suck.)

Or should we avoid using qcow2 for important guest servers that would
be expensive or impossible to reconstruct?

If not qcow2, are any of the other supported incremental formats
robust in these ways, e.g. the VMware one?

Thanks,
-- Jamie



