qemu-devel

[Qemu-devel] qcow2 corruption repair can not proceed due to bad snapshot


From: Brian Taber
Subject: [Qemu-devel] qcow2 corruption repair can not proceed due to bad snapshot
Date: Fri, 20 Nov 2015 14:33:34 -0500
User-agent: Roundcube Webmail/1.0.0

I recently ran into an issue (completely my own fault) that others have encountered, with varying details and varying success in fixing it.  I had a VM stuck in shutdown (Windows was waiting for confirmation to kill a program) that I thought was already down when I created a snapshot on the 3 disks attached to the VM.  After running the snapshot command I went back to the machine and, instead of just powering it off (which would have been better), I let the shutdown complete.

Needless to say, all 3 images were corrupted to varying degrees.  The first disk, the system disk, was the worst.  The other 2 hold databases and were repairable via the "qemu-img check -r all image.img" command (with a bunch of messages/warnings).  I suspect the limited activity during shutdown helped save them.  The system disk could not even be checked; qemu-img failed with:

qemu-img: Could not open 'image.img': Could not read snapshots: File too large

Searching online for this error turns up various repair methods.  Some posts suggest using an older qemu-img, which I did not want to do, so I pulled the latest QEMU source and compiled it to get a newer qemu-img, but I got the same error when trying to check or convert the image.  I dug into the qcow2 code, silenced that particular error, and was able to get the check to actually run.  Specifically, I modified block/qcow2.c at about line 1136 to ignore the return result if it is 27 (EFBIG) and to set res to 0; this is probably a really bad thing to do, and I only did it to get to the checks.  The repair run fixed the image to the point where the checks came back OK.  Unfortunately the image was still broken: trying to list snapshots or use the image returned the "File too large" error again.
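
For anyone following along: the value 27 mentioned above is the Linux errno number for EFBIG, whose standard message is the "File too large" text in the qemu-img error.  The following is only a small Python sketch for looking that up, not QEMU code; the function name describe_errno is made up for illustration.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # EFBIG is 27 on Linux; the numeric value may differ on other platforms,
    # so the symbolic constant is used here instead of a hard-coded 27.
    print(errno.EFBIG, describe_errno(errno.EFBIG))
```

Looking the code up by its symbolic name avoids depending on the numeric value, which is platform specific.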

Ultimately I was able to repair the system disk by converting the now-repaired image to raw, as suggested in other posts, and was able to start the machine again right where it left off (or at least it appears so).  Disk checks within the machine return OK.  One thing I am unsure of is how safe the qcow2 images are with regard to snapshots.  I dare not do anything with them as they are, so I will convert all of them to raw and then back into qcow2 images.
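
The sequence described above (repair, convert to raw, convert back to qcow2) can be sketched roughly as follows.  This is a hedged outline rather than a tested recovery procedure: the file names and the helper names recovery_commands and run are placeholders, and it assumes a qemu-img that can open the image at all, which was not the case for the system disk without the source modification.  Since raw has no internal snapshots, the round trip leaves an image without the old snapshot table.

```python
import subprocess


def recovery_commands(damaged, raw_copy, rebuilt):
    """Build the qemu-img command lines for a repair and raw round trip."""
    return [
        # Attempt to repair all detected errors in place.
        ["qemu-img", "check", "-r", "all", damaged],
        # Copy the active disk contents out to a raw image.
        ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", damaged, raw_copy],
        # Build a fresh qcow2 image from the raw copy.
        ["qemu-img", "convert", "-f", "raw", "-O", "qcow2", raw_copy, rebuilt],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it unless dry_run is set.

    Stops at the first non-zero exit status and returns it, so that a
    human can decide whether it is safe to continue.
    """
    for argv in commands:
        print(" ".join(argv))
        if dry_run:
            continue
        status = subprocess.run(argv).returncode
        if status != 0:
            return status
    return 0


if __name__ == "__main__":
    # Placeholder file names; work on copies of the images, not the originals.
    run(recovery_commands("image.img", "image.raw", "image-new.qcow2"))
```

The default dry run only prints the commands, which makes it easy to review them before touching any image.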

Even though this is entirely due to creating a snapshot while the disk was in use, some thoughts:

- If a user is trying to run a repair, qemu-img should not fail on the snapshot error; it should proceed with the checks/repairs and allow a convert if possible.
- If possible, check whether the file is in use before actually taking a snapshot, to avoid this situation altogether.
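
To illustrate the second point, here is a minimal sketch of an "is this file in use" check done from the outside, assuming a Linux host with /proc mounted.  It is not how qemu-img works; the function name pids_with_file_open is hypothetical, and processes owned by other users are only visible when run with sufficient privileges.

```python
import os


def pids_with_file_open(path):
    """Return the PIDs that have `path` open, by scanning /proc/<pid>/fd."""
    target = os.path.realpath(path)
    pids = []
    try:
        entries = os.listdir("/proc")
    except OSError:
        return pids  # no /proc available (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # descriptor was closed while we were scanning
    return pids


if __name__ == "__main__":
    users = pids_with_file_open("image.img")  # placeholder file name
    if users:
        print("image is open in PIDs:", users)
    else:
        print("no process found with the image open")
```

A check like this is inherently racy (a VM could open the image right after the scan), so it would only be a guard against the obvious mistake, not a guarantee.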

I would submit a patch, but I do not know enough about the possible repercussions of ignoring an error and repairing/converting.

If you have any questions, please reply.

 

