On Wed, Dec 23, 2015 at 11:14:12AM +0800, Fam Zheng wrote:
> On Tue, 12/22 17:46, Kevin Wolf wrote:
> > Enough innocent images have died because users called 'qemu-img
> > snapshot' while the VM was still running. Educating the users doesn't
> > seem to be a working strategy, so this series adds locking to qcow2
> > that refuses to access the image read-write from two processes.
> >
> > Eric, this will require a libvirt update to deal with qemu crashes
> > which leave locked images behind. The simplest thinkable way would be
> > to unconditionally override the lock in libvirt whenever the option is
> > present. In that case, libvirt VMs would be protected against
> > concurrent non-libvirt accesses, but not the other way round. If you
> > want more than that, libvirt would have to check somehow if it was its
> > own VM that used the image and left the lock behind. I imagine that
> > can't be too hard either.
> The motivation is great, but I'm not sure I like the side effect that an
> unclean shutdown will require a "forced" open, because it makes using
> qcow2 in development cumbersome, and like you said, management/users
> also need to handle this explicitly. This is a bit of a personal
> preference, but it's strong enough that I want to speak up.
Yeah, I am also not really a big fan of locking mechanisms which are not
automatically cleaned up on process exit. On the other hand, you could
say that people who choose to run qemu-img manually are already taking
fate into their own hands, and ending up with a dirty image on unclean
exit is still miles better than losing all your data.
> As an alternative, can we introduce .bdrv_flock() in protocol drivers,
> with similar semantics to flock(2) or lockf(3)? That way all formats can
> benefit, and a program crash will automatically drop the lock.
FWIW, the libvirt locking daemon (virtlockd) will already attempt to take
out locks using fcntl()/lockf() on all disk images associated with a VM.