Re: [Qemu-block] RFC block/iscsi command timeout


From: ronnie sahlberg
Subject: Re: [Qemu-block] RFC block/iscsi command timeout
Date: Tue, 2 Jun 2015 09:43:49 -0700


On Tue, Jun 2, 2015 at 7:45 AM, Peter Lieven <address@hidden> wrote:
On 26.05.2015 at 12:21, Paolo Bonzini wrote:

On 26/05/2015 12:06, Kevin Wolf wrote:
On 26.05.2015 at 11:44, Paolo Bonzini wrote:

On 26/05/2015 11:37, Kevin Wolf wrote:
If we run into a timeout we theoretically have the following options:
  - reconnect
  - retry
  - error

I would reconnect as Ronnie proposed.

Just trying to reconnect indefinitely might not be the best option.
Consider the situation where you're inside a bdrv_drain_all(), which
blocks qemu completely. Trying to reconnect once or twice is probably
fine, but if that doesn't work, eventually you want to return an error
so that qemu is unstuck.
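
A minimal sketch of the "reconnect once or twice, then return an error" idea, in Python rather than QEMU's C. The names reconnect_bounded, try_reconnect and ReconnectFailed are hypothetical and not part of QEMU or libiscsi; the retry budget of 2 is only an illustration of "once or twice".

```python
from typing import Callable


class ReconnectFailed(Exception):
    """Hypothetical error raised once the retry budget is exhausted."""


def reconnect_bounded(try_reconnect: Callable[[], bool], max_attempts: int = 2) -> int:
    """Call try_reconnect up to max_attempts times.

    Returns the number of attempts used on success and raises
    ReconnectFailed otherwise, so the caller gets unstuck instead of
    waiting forever.
    """
    for attempt in range(1, max_attempts + 1):
        if try_reconnect():
            return attempt
    raise ReconnectFailed(f"gave up after {max_attempts} attempts")
```

A caller stuck in something like a drain loop could catch ReconnectFailed and fail the outstanding requests instead of blocking indefinitely.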

Whenever the topic of timeouts is brought up, I'm worried that
introducing timeouts (and doing anything except reconnecting) is the
same as NFS's soft option, which can actually cause data corruption.
So, why would it be safe?

How would it cause data corruption for qemu, i.e. which of the block
layer assumptions would be broken?

Reordering of operations.  Say you have:

      guest -> QEMU        write A to sector 1
      QEMU -> NFS          write A to sector 1
      QEMU -> guest        write A to sector 1 timed out
      guest -> QEMU        write B to sector 1


If we change this to iSCSI, we can actually avoid this by using task management functions:
      guest -> QEMU        write A to sector 1
      QEMU -> iSCSI        write A to sector 1
      ... timeout ...
      QEMU -> iSCSI        task management: abort task for write A     (**A)
      QEMU -> guest        write A to sector 1 timed out
      guest -> QEMU        write B to sector 1         (**B)

I think that if a task times out, and you immediately generate and send a task management abort task to the
target, and you do this before you tell the guest that the I/O failed, then all should be good.

That should guarantee that **A is always sent to the target before **B,
so the race should not happen.
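
The ordering above can be sketched with a toy model in Python. This is not libiscsi or QEMU code: the Target class and its methods are hypothetical, and the model assumes the abort takes effect at the target before write B is processed, which is the property being argued for here.

```python
class Target:
    """Toy target: writes stay pending until completed, unless aborted first."""

    def __init__(self):
        self.sectors = {}
        self.pending = []  # list of (tag, sector, data)

    def submit_write(self, tag, sector, data):
        self.pending.append((tag, sector, data))

    def abort_task(self, tag):
        # Models a task management "abort task": drop the write if still pending.
        self.pending = [p for p in self.pending if p[0] != tag]

    def complete(self, tag):
        """Apply the pending write with this tag; return False if it is gone."""
        for entry in self.pending:
            if entry[0] == tag:
                self.sectors[entry[1]] = entry[2]
                self.pending.remove(entry)
                return True
        return False


def timed_out_write_with_abort():
    target = Target()
    target.submit_write("A", 1, "A")   # QEMU -> iSCSI: write A to sector 1
    # ... timeout ...
    target.abort_task("A")             # (**A) abort goes out first
    # Only now is "write A timed out" reported to the guest.
    target.submit_write("B", 1, "B")   # (**B) guest retries with new data
    target.complete("B")
    late_a_landed = target.complete("A")  # the stale write A is already gone
    return target.sectors[1], late_a_landed
```

In this model timed_out_write_with_abort() returns ("B", False): sector 1 holds B and the late write A never lands.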




At this point the two outstanding writes are for the same
sector and have different payloads, so it's undefined which one wins.

      QEMU -> NFS          write B to sector 1
      NFS -> QEMU          write B to sector 1 completed
      QEMU -> guest        write B to sector 1 completed
      NFS -> QEMU          write A to sector 1 completed
                           (QEMU doesn't report this to the guest)

The guest thinks it has written B, but it's possible that the storage
has written A.
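
The sequence above can be replayed as a small Python simulation. It is only an illustration of one possible completion order; the function name and the dictionaries are made up for this sketch and do not model real NFS behaviour beyond "a timed-out write may still complete later".

```python
def soft_timeout_race():
    storage = {}      # what the server has actually written, per sector
    guest_view = {}   # what the guest was told was written, per sector
    in_flight = {}    # tag -> (sector, data) still outstanding at the server

    in_flight["A"] = (1, "A")   # QEMU -> NFS: write A to sector 1
    # QEMU -> guest: write A timed out. Nothing cancels the request,
    # so it remains in flight.
    in_flight["B"] = (1, "B")   # QEMU -> NFS: write B to sector 1

    # NFS -> QEMU: write B completed, and this is reported to the guest.
    sector, data = in_flight.pop("B")
    storage[sector] = data
    guest_view[sector] = data

    # NFS -> QEMU: write A completed late; QEMU doesn't report it.
    sector, data = in_flight.pop("A")
    storage[sector] = data

    return guest_view[1], storage[1]
```

With this completion order soft_timeout_race() returns ("B", "A"): the guest believes sector 1 holds B while the storage holds A.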

So you would go for infinite reconnecting? We can SIGKILL then anyway.

As said before my idea would be default of 5000ms for all sync calls and
no timeout for all async calls coming from the block layer.

A user settable timeout can be optionally specified via cmdline options
to define a timeout for both sync and async calls.

Sounds sane to me.
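
The proposed defaults could be expressed roughly as follows. This is a sketch in Python, not the actual block/iscsi option handling; effective_timeout_ms and its parameters are hypothetical names, and None stands for "no timeout".

```python
from typing import Optional

DEFAULT_SYNC_TIMEOUT_MS = 5000  # default proposed in this thread for sync calls


def effective_timeout_ms(is_sync: bool, user_timeout_ms: Optional[int] = None) -> Optional[int]:
    """Return the timeout in milliseconds, or None for no timeout."""
    if user_timeout_ms is not None:
        # A user-supplied value applies to both sync and async calls.
        return user_timeout_ms
    # Without a user setting: 5000 ms for sync calls, unlimited for async calls.
    return DEFAULT_SYNC_TIMEOUT_MS if is_sync else None
```

For example, effective_timeout_ms(True) gives 5000, effective_timeout_ms(False) gives None, and a user value such as 30000 is returned for both kinds of call.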

As for infinite reconnect: I guess that since these disks are not exposed as "removable" to the
guest, there is not really much recovery that the guest kernel can do if the disks go away and never return,
so there might not be much point in not having infinite reconnect attempts.



