qemu-devel

From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
Date: Thu, 27 Mar 2014 10:10:40 +0200

On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" <address@hidden> writes:
> 
> > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> >> Hi List!
> >> I hope someone can help me. We had a big issue in our cloud the other
> >> day: in a couple of our OpenStack regions (2000+ KVM guests with qcow2
> >> images), the guests' filesystems went read-only because the backing
> >> files directory (the OpenStack _base directory) was compromised and
> >> the data was lost. Once we realized the data was lost, it took us 5
> >> minutes to restore the backing files from backup, but by that time all
> >> the KVM guests had received some kind of I/O error from the hypervisor
> >> layer and their root filesystems had gone read-only.
> >> 
> >> My question is: is there a way to hold I/O operations against the
> >> backing files (I assume they are 99% reads) for a little longer when
> >> the backing files are missing (no I/O possible) but recoverable within
> >> minutes? I'm asking because I don't quite understand the process, or
> >> at what point the error is raised.
> >> 
> >> Any tips on how to achieve this, if possible, or information about how
> >> backing files work in KVM, would be amazing.
> >> Waiting for feedback!
> >> 
> >> Kindest regards,
> >> Alejandro Comisario
> >
> >
> > I'm guessing this is what happened: the guests' disk requests timed out
> > in the meantime. You can increase the timeout within the guest:
> > echo 600 > /sys/block/sda/device/timeout
> > to time out after 10 minutes.
> >
> > If you have the QEMU guest agent installed on your system, you can do
> > this from the host. Unfortunately, by default its memory can be pushed
> > out to swap, and then, on a disk error, accessing it might fail :(
> > Maybe we should consider mlocking all its memory, at least as an option.
> >
> > You could pause your guests and resume them after the issue is
> > resolved, and I guess we could add functionality to pause the VM on
> > disk errors automatically.
> > Stefan?
> 
> Would -drive rerror=stop do?

I think it will. It's a pity it doesn't appear in the --help output; that
would make it easier to find.
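
A minimal sketch of what the rerror=stop suggestion could look like on a plain QEMU command line. QEMU documents the defaults as rerror=report and werror=enospc, so read errors are passed straight to the guest; rerror=stop pauses the VM instead. Adding werror=stop as well is an assumption here (copy-on-write into a qcow2 overlay may also need the backing file). The drive_opts helper and the image path are hypothetical, and the sketch assumes a virtio disk and a path without commas.

```shell
#!/bin/sh
# Hypothetical helper: print a -drive option string that makes QEMU pause
# the guest on I/O errors instead of reporting them to the guest.
# Assumes a qcow2 image attached as a virtio disk and a path with no commas
# (QEMU treats "," as an option separator).
drive_opts() {
    image="$1"
    # rerror: action on read errors  (ignore | stop | report)
    # werror: action on write errors (ignore | stop | report | enospc)
    printf 'file=%s,format=qcow2,if=virtio,rerror=stop,werror=stop\n' "$image"
}

# Example invocation (the path is a placeholder, not a real Nova layout):
#   qemu-system-x86_64 -m 1024 -drive "$(drive_opts /path/to/instance/disk)"
# After the backing file is restored, resume the paused guest with "cont"
# in the QEMU monitor, or "virsh resume <domain>" when running under libvirt.
```

Under libvirt, the corresponding settings live on the disk's driver element (error_policy and rerror_policy) rather than on a hand-written command line.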

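The guest-side timeout change quoted earlier touches a single disk; a sketch that applies it to every SCSI disk (sd*) might look like this. It assumes disks that expose device/timeout in sysfs (virtio-blk vd* disks, for example, may not), and set_disk_timeouts and SYSFS_ROOT are hypothetical names introduced so the loop can be tried against a scratch directory. Values written to sysfs do not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: raise the SCSI command timeout (seconds) on every
# sd* disk. SYSFS_ROOT defaults to /sys; overriding it lets the loop run
# against a scratch directory. Writing to the real /sys requires root.
set_disk_timeouts() {
    secs="${1:-600}"                 # 600 s = 10 minutes
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/block/sd*/device/timeout; do
        # Skip the literal pattern if nothing matched.
        [ -e "$f" ] || continue
        echo "$secs" > "$f"
    done
}
```

Run as root inside the guest, e.g. set_disk_timeouts 600.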
-- 
MST


