Re: [Qemu-block] RFC block/iscsi command timeout
From: Kevin Wolf
Date: Tue, 26 May 2015 12:06:26 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 26.05.2015 at 11:44, Paolo Bonzini wrote:
>
>
> On 26/05/2015 11:37, Kevin Wolf wrote:
> > > If we run into a timeout we theoretically have the following options:
> > > - reconnect
> > > - retry
> > > - error
> > >
> > > I would reconnect as Ronnie proposed.
> >
> > Just trying to reconnect indefinitely might not be the best option.
> > Consider the situation where you're inside a bdrv_drain_all(), which
> > blocks qemu completely. Trying to reconnect once or twice is probably
> > fine, but if that doesn't work, eventually you want to return an error
> > so that qemu is unstuck.
>
> Whenever the topic of timeouts comes up, I'm worried that
> introducing timeouts (and doing anything except reconnecting) is the
> same as NFS's soft option, which can actually cause data corruption.
> So, why would it be safe?

How would it cause data corruption for qemu, i.e. which of the block
layer assumptions would be broken?

> Considering that, unlike a process stuck on NFS, QEMU can always be
> SIGKILLed, reconnection seems like a pretty good default.

Having to kill a whole VM just because one disk is on an NFS server that
has gone down might somehow be good enough, but I wouldn't call it
"pretty good".

> Perhaps we can have a limited number of retries (like NFS's retrans)
> followed by either reconnect or error?

Perhaps. And, unless there is a real corruption scenario, a limited
number of reconnects before we error out.

Kevin