qemu-block

Re: [Qemu-block] RFC block/iscsi command timeout


From: Peter Lieven
Subject: Re: [Qemu-block] RFC block/iscsi command timeout
Date: Tue, 02 Jun 2015 16:45:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 26.05.2015 at 12:21, Paolo Bonzini wrote:

On 26/05/2015 12:06, Kevin Wolf wrote:
On 26.05.2015 at 11:44, Paolo Bonzini wrote:

On 26/05/2015 11:37, Kevin Wolf wrote:
If we run into a timeout we theoretically have the following options:
  - reconnect
  - retry
  - error

I would reconnect as Ronnie proposed.
Just trying to reconnect indefinitely might not be the best option.
Consider the situation where you're inside a bdrv_drain_all(), which
blocks qemu completely. Trying to reconnect once or twice is probably
fine, but if that doesn't work, eventually you want to return an error
so that qemu is unstuck.
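
A minimal sketch of the bounded-reconnect idea described above, written in Python for illustration only. It is not QEMU or libiscsi code; `run_with_bounded_reconnect`, `issue_command`, `reconnect`, and `max_reconnects` are hypothetical names, and the limit of two reconnects simply mirrors "once or twice".

```python
class TimeoutExhausted(Exception):
    """Raised when every permitted reconnect attempt has failed."""


def run_with_bounded_reconnect(issue_command, reconnect, max_reconnects=2):
    """Run issue_command(); on TimeoutError, reconnect and retry.

    After max_reconnects failed reconnect-and-retry rounds, raise an
    error so the caller (e.g. a drain loop) does not block forever.
    """
    attempts = 0
    while True:
        try:
            return issue_command()
        except TimeoutError:
            if attempts >= max_reconnects:
                raise TimeoutExhausted(
                    f"gave up after {attempts} reconnect attempts")
            attempts += 1
            reconnect()
```

Whether returning the error is actually safe is a separate question, which the rest of the thread discusses.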
Whenever the topic of timeouts is brought up, I'm worried that
introducing timeouts (and doing anything except reconnecting) is the
same as NFS's soft option, which can actually cause data corruption.
So, why would it be safe?
How would it cause data corruption for qemu, i.e. which of the block
layer assumptions would be broken?
Reordering of operations.  Say you have:

      guest -> QEMU        write A to sector 1
      QEMU -> NFS          write A to sector 1
      QEMU -> guest        write A to sector 1 timed out
      guest -> QEMU        write B to sector 1

At this point the two outstanding writes target the same sector
with different payloads, so it's undefined which one wins.

      QEMU -> NFS          write B to sector 1
      NFS -> QEMU          write B to sector 1 completed
      QEMU -> guest        write B to sector 1 completed
      NFS -> QEMU          write A to sector 1 completed
                           (QEMU doesn't report this to the guest)

The guest thinks it has written B, but it's possible that the storage
has written A.
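
The sequence above can be replayed with a small Python model. This is only an illustrative sketch: the event names and the `replay` helper are made up for this example, and it models just one of the possible orderings (the late write A landing after write B), since the real outcome is undefined.

```python
def replay(events):
    """Replay events in order and return (disk, guest_view).

    disk       maps sector -> payload actually held by the storage
    guest_view maps sector -> payload the guest believes was written
    """
    disk = {}
    guest_view = {}
    for kind, sector, payload in events:
        if kind == "storage_applies":
            disk[sector] = payload
        elif kind == "guest_sees_completion":
            guest_view[sector] = payload
        # "guest_sees_timeout" changes neither side: the write is still
        # outstanding on the storage even though the guest got an error.
    return disk, guest_view


events = [
    ("guest_sees_timeout", 1, "A"),     # QEMU -> guest: write A timed out
    ("storage_applies", 1, "B"),        # NFS -> QEMU: write B completed
    ("guest_sees_completion", 1, "B"),  # QEMU -> guest: write B completed
    ("storage_applies", 1, "A"),        # NFS -> QEMU: late write A completed
]
disk, guest_view = replay(events)
```

With this ordering, `disk[1]` ends up as `"A"` while `guest_view[1]` is `"B"`, which is the mismatch described above.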

So you would go for infinite reconnecting? We can SIGKILL then anyway.

As said before, my idea would be a default of 5000 ms for all sync calls
and no timeout for all async calls coming from the block layer.

A user-settable timeout could optionally be specified via cmdline options
to define a timeout for both sync and async calls.
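
A rough sketch of that policy in Python, purely for illustration. The function and parameter names are hypothetical and do not correspond to any existing QEMU option; `None` stands for "no timeout".

```python
DEFAULT_SYNC_TIMEOUT_MS = 5000  # proposed default for sync calls


def effective_timeout_ms(is_sync, user_timeout_ms=None):
    """Return the timeout in milliseconds, or None for no timeout.

    A user-supplied value applies to both sync and async calls.
    Without one, sync calls get the 5000 ms default and async calls
    coming from the block layer never time out.
    """
    if user_timeout_ms is not None:
        return user_timeout_ms
    return DEFAULT_SYNC_TIMEOUT_MS if is_sync else None
```

For example, `effective_timeout_ms(True)` gives 5000, `effective_timeout_ms(False)` gives `None`, and passing `user_timeout_ms=30000` gives 30000 in both cases.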

Peter


