qemu-devel

Re: [Qemu-devel] qcow2 - safe on kill? safe on power fail?


From: Anthony Liguori
Subject: Re: [Qemu-devel] qcow2 - safe on kill? safe on power fail?
Date: Mon, 21 Jul 2008 14:43:45 -0500
User-agent: Thunderbird 2.0.0.14 (X11/20080501)

Jamie Lokier wrote:
Quite a while ago, Anthony Liguori wrote:
David Barrett wrote:
I'm tracking down a image corruption issue and I'm curious if you can answer the following:

1) Is there any difference between sending a "TERM" signal to the QEMU process and typing "quit" at the monitor?
Yes. Since QEMU is single-threaded, when you issue a quit, you know you aren't in the middle of writing qcow2 metadata to disk.

2) Will sending TERM corrupt the 'qcow2' image (in ways other than normal guest OS dirty shutdown)?
Possibly, yes.

3) Assuming I always start QEMU using "-loadvm", is there any risk in using 'kill' to send SIGTERM to the QEMU process when done?
Yes.  If you want to SIGTERM QEMU, the safest thing to do is use -snapshot.

Just today, I had a running KVM instance for an important server (our
busy mail server) lock up.  It was accepting VNC
connections, but sending keystrokes, mouse movements and so on didn't
do anything.  It had been running for several weeks without any problem.
I don't have a report on whether there was a picture from VNC.

Our system manager decided there was nothing else to do, and killed
that process (SIGTERM), then restarted it.

SIGTERM is about the worst thing you could do, but you're probably okay.

QCOW2 files have no journal, so they are not safe against unexpected power outages or hard crashes. If you need a great deal of reliability, you should use a raw image.

With that said, let me explain exactly under what circumstances corruption can occur. As it turns out, in practice, the corruption window isn't that big.

Obviously there are no issues on the read path, so we'll stick strictly to the write path.

QEMU is single-threaded and QCOW2 supports asynchronous write operations. There are two parts in this operation. The first discovers what offset within the QCOW2 file to write to. If the sector has been previously allocated, this will consist only of read operations. It will then issue an asynchronous write operation to the allocated sector.

Since your guest probably is using a journalled file system, you will be okay if something happens before that data gets written to disk[1].

If the sector hasn't been previously allocated, then a new sector in the file needs to be allocated. This is going to change metadata within the QCOW2 file and this is where it is possible to corrupt a disk image. The operation of allocating a new disk sector is completely synchronous so no other code runs until this completes. Once the disk sector is allocated, you're safe again[1].
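The two cases above can be sketched with a toy model. This is only an illustration, not the real QCOW2 on-disk layout or QEMU's code: the `ToyImage` class, its dictionary lookup table, and the `CLUSTER` size are all hypothetical names chosen for the example. It shows that a write to an already-allocated location touches no metadata, while a first write has to grow the file and record the mapping.

```python
# Toy model only: a dict stands in for the image's lookup metadata and a
# bytearray stands in for the file. This is NOT the real qcow2 layout.
CLUSTER = 4096  # hypothetical allocation unit, chosen for illustration


class ToyImage:
    def __init__(self):
        self.table = {}           # guest cluster index -> offset in self.data
        self.data = bytearray()   # stands in for the image file
        self.metadata_writes = 0  # counts updates that change metadata

    def lookup(self, guest_cluster):
        """Read-only step: find where a guest cluster lives, if anywhere."""
        return self.table.get(guest_cluster)

    def allocate(self, guest_cluster):
        """Grow the file and record the mapping; this changes metadata."""
        offset = len(self.data)
        self.data.extend(bytes(CLUSTER))
        # An interruption between the extend above and the assignment
        # below is the kind of window the text describes.
        self.table[guest_cluster] = offset
        self.metadata_writes += 1
        return offset

    def write(self, guest_cluster, payload):
        if len(payload) > CLUSTER:
            raise ValueError("payload larger than one cluster")
        offset = self.lookup(guest_cluster)
        if offset is None:
            # First write to this cluster: metadata must change.
            offset = self.allocate(guest_cluster)
        # Data write only; the lookup table is not touched here.
        self.data[offset:offset + len(payload)] = payload
        return offset
```

Writing twice to the same guest cluster in this model increments `metadata_writes` only once, which is why the risky window is limited to first-time allocations.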

Since no other code runs during this period, bugs in the device emulation, a user closing the SDL window, or issuing quit in the monitor will not corrupt the disk image. Your guest may require an fsck but the QCOW2 image will be fine.

The only ways that you can cause corruption are if the QCOW2 sector allocation code is faulty (and you would be screwed no matter what here) or if you issue a SIGTERM/SIGKILL that interrupts the code while it's allocating a new sector. If your guest is hung, chances are it's not actively writing to disk, but this is why SIGTERM/SIGKILL is really a terrible thing to do. It's really the only practical way to corrupt a disk image (short of a hard power outage).

If someone were sufficiently concerned, it's probably relatively straightforward to implement an fsck or journal for QCOW2. This would allow the image to be recovered if the metadata somehow got corrupted.
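As a rough idea of what an fsck-style pass looks for, the sketch below checks a toy lookup table for entries that are misaligned, point outside the file, or share an offset with another entry. It is an assumption-laden illustration: `check_table`, the dictionary table, and the 4096-byte cluster size are hypothetical and do not reflect the real QCOW2 structures.

```python
def check_table(table, file_size, cluster=4096):
    """Return a list of problems found in a toy guest->offset table.

    table maps a guest cluster index to a byte offset in the image file.
    This is a hypothetical structure, not the real qcow2 metadata.
    """
    problems = []
    owner = {}  # offset -> first guest cluster seen using it
    for guest, offset in sorted(table.items()):
        if offset % cluster != 0:
            problems.append(f"cluster {guest}: offset {offset} is misaligned")
        if offset < 0 or offset + cluster > file_size:
            problems.append(f"cluster {guest}: offset {offset} is outside the file")
        if offset in owner:
            problems.append(
                f"cluster {guest}: offset {offset} already used by cluster {owner[offset]}"
            )
        else:
            owner[offset] = guest
    return problems
```

An empty result means only that these three invariants hold; it says nothing about whether the data itself is intact.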

With all this said, I've definitely seen corruption in QCOW2 images that was caused by crashing my host kernel. I beat up on QEMU pretty badly though. I think under normal circumstances, it's unlikely a user would see this in practice.

[1] It's not quite that simple. Your host doesn't necessarily guarantee integrity unless:

    1) you've got battery-backed cache on your disks (commodity disks typically aren't battery backed) or you've disabled write-back caching;
    2) you have a file system that supports barriers, and barriers are enabled (they aren't enabled by default with ext2/3);
    3) you are running QEMU with cache=off to disable host write caching.

Basically, chances are your data is not as safe as you assume it is, and QEMU adds very little additional uncertainty to that unless you do something nasty like SIGKILL/SIGTERM while doing heavy disk IO.
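The host-side point can be made concrete with a small sketch of what an application has to do before it may assume data has left the host page cache. It assumes a POSIX host; `durable_write` is a hypothetical helper, not part of QEMU. Even after `os.fsync` returns, the data may still sit in a volatile disk write cache if the conditions listed above are not met.

```python
import os


def durable_write(path, payload):
    """Write payload to path and ask the host to flush it to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Without this, the data may live only in the host page cache.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the directory too, so the new file's entry is persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A plain `write` without the `fsync` calls returns as soon as the host has buffered the data, which is the gap the footnote is warning about.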

Regards,

Anthony Liguori

(Unfortunately, he didn't know about the monitor and "quit".)

So far, it's looking ok, but I'm concerned about the possibility of
qcow2 corruption which the above mail says is possible.

Even if we could have used the monitor *this* time, QEMU is quite a
complex piece of software which we can't assume to be bug free.  What
happens if KVM/QEMU locks up or crashes, in the following ways:

    - Some emulated driver crashes.  I *have* seen this happen.
      (Try '-net user -net user' on the command line.  Ok, now we know not
      to do it...).  The process dies.

    - Some emulated driver gets stuck in a loop.  You know, a bug.
      No choice but to kill the process.

    - The host machine loses power.  Host's journalled filesystem is
      fine, but what about the qcow2 images of guests?

I'm imagining that qcow2 is like a very simple filesystem format.
Real filesystems have "fsck" and/or use journalling or similar to be
robust.  Is there a "fsck" equivalent for qcow2?  (Maybe running
qemu-img convert is that?)  Does it use journalling or other
techniques internally to make sure it is difficult to corrupt, even if
the host dies unexpectedly?

If qcow2 is not resistant to sudden failures, would it be difficult to
make it more robust?

(One method which comes to mind is to use a daemon process just to
handle the disk image, communicating with QEMU.  QEMU is complex and
may occasionally have problems, but the daemon would do just one
thing, so quite likely to survive.  It won't be robust against power
failure, though, and it sounds like performance might suck.)

Or should we avoid using qcow2, for important guest servers that would
be expensive or impossible to reconstruct?

If not qcow2, are any of the other supported incremental formats
robust in these ways, e.g. the VMware one?

Thanks,
-- Jamie