Re: [Qemu-block] RFC block/iscsi command timeout
From: Paolo Bonzini
Subject: Re: [Qemu-block] RFC block/iscsi command timeout
Date: Wed, 03 Jun 2015 09:16:29 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
On 02/06/2015 18:43, ronnie sahlberg wrote:
> If we change this to iSCSI, we can actually avoid this by using task
> management functions:
> guest -> QEMU write A to sector 1
> QEMU -> iSCSI write A to sector 1
> ... timeout...
> QEMU -> iSCSI task management: abort task for Write A (**A)
> QEMU -> guest write A to sector 1 timed out
> guest -> QEMU write B to sector 1 (**B)
>
> I think that IF a task times out and IF you then immediately
> generate and send a task management abort task to the
> target, and you do this before you tell the guest the I/O failed, then
> all should be good.
You still have to wait for the answer to the TMF, so this doesn't help
much. :-(
Paolo
> That should guarantee that **A is always sent to the
> target before **B,
> so the race should not happen.
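
The ordering argued for above can be sketched with a toy model. This is only an illustration: `Target`, `abort_task` and `timed_out_write_then_retry` are hypothetical names, not libiscsi or QEMU APIs, and the target is assumed to drop an aborted task that it has not applied yet. The sketch also makes the objection above concrete: the failure is reported to the guest only after the TMF response has come back.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Toy iSCSI target (hypothetical): records requests in arrival order."""
    log: list = field(default_factory=list)
    sectors: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # task tag -> (sector, data)

    def write(self, tag, sector, data):
        self.log.append(("write", tag))
        self.pending[tag] = (sector, data)

    def abort_task(self, tag):
        # ABORT TASK: assumed to drop the task if it has not been applied yet.
        self.log.append(("abort", tag))
        self.pending.pop(tag, None)
        return "function complete"  # stands in for the TMF response

    def complete(self, tag):
        # Apply a still-pending write; an aborted task is no longer pending.
        if tag in self.pending:
            sector, data = self.pending.pop(tag)
            self.sectors[sector] = data


def timed_out_write_then_retry(target):
    target.write("A", 1, "A")          # QEMU -> iSCSI write A to sector 1
    # ... timeout ...
    response = target.abort_task("A")  # (**A), sent before failing the guest
    if response != "function complete":
        raise RuntimeError("cannot fail the guest request yet")
    guest_events = ["write A timed out"]  # reported only after the TMF answer
    target.write("B", 1, "B")          # (**B), the guest retries with new data
    target.complete("A")               # late completion of A: nothing to apply
    target.complete("B")
    guest_events.append("write B completed")
    return guest_events
```

In this model the abort always reaches the target before write B, so sector 1 ends up holding B. The wait for the TMF response is still on the path before the guest hears anything.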
>
>
>
>
> At this point the two outstanding writes are for the same
> sector and have different payloads, so it's undefined which one
> wins.
>
> QEMU -> NFS write B to sector 1
> NFS -> QEMU write B to sector 1 completed
> QEMU -> guest write B to sector 1 completed
> NFS -> QEMU write A to sector 1 completed
> (QEMU doesn't report this to the guest)
>
> The guest thinks it has written B, but it's possible that the
> storage
> has written A.
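
A minimal sketch of the race described in the quoted NFS sequence, assuming the storage simply applies writes in the order it finishes them (last writer wins). `replay` is a hypothetical helper written for this illustration, not code from QEMU or any NFS client.

```python
def replay(completion_order):
    """Apply two in-flight writes to sector 1 in the order the server finishes them.

    Write A has already been reported to the guest as timed out, so its late
    completion is not passed on to the guest.
    """
    in_flight = {"A": "A", "B": "B"}   # write id -> payload
    timed_out = {"A"}                  # already failed towards the guest
    sector_1 = None
    guest_sees = []
    for write_id in completion_order:
        sector_1 = in_flight[write_id]  # last completed write wins on storage
        if write_id not in timed_out:
            guest_sees.append(f"write {write_id} completed")
    return sector_1, guest_sees
```

With completion order `["B", "A"]` the guest is told only that B completed, while the sector holds A. With `["A", "B"]` the guest sees the same message and the sector holds B, so the guest cannot tell the two outcomes apart.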
>
>
> So you would go for infinite reconnecting? We can SIGKILL then anyway.
>
> As said before, my idea would be a default of 5000 ms for all sync calls and
> no timeout for all async calls coming from the block layer.
>
> A user settable timeout can be optionally specified via cmdline options
> to define a timeout for both sync and async calls.
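
The proposed policy is small enough to write down as a sketch. The function and constant names here are hypothetical, and the actual command-line option is not specified in this thread, so it is represented only by the `user_timeout_ms` argument.

```python
from typing import Optional

DEFAULT_SYNC_TIMEOUT_MS = 5000  # default proposed above for sync calls


def effective_timeout_ms(is_sync: bool,
                         user_timeout_ms: Optional[int] = None) -> Optional[int]:
    """Return the timeout in milliseconds, or None for "no timeout".

    A user-supplied value applies to both sync and async calls; without one,
    sync calls get the default and async calls from the block layer get none.
    """
    if user_timeout_ms is not None:
        return user_timeout_ms
    return DEFAULT_SYNC_TIMEOUT_MS if is_sync else None
```

For example, `effective_timeout_ms(True)` gives 5000, `effective_timeout_ms(False)` gives `None`, and `effective_timeout_ms(False, 30000)` gives 30000.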
>
>
> Sounds sane to me.
>
> As for infinite reconnect: I guess that since these disks are not
> exposed as "removable" to the
> guest, there is not really much recovery that the guest kernel can do if
> the disk goes away and never returns,
> so there might not be much point in not having infinite reconnect attempts.
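
For illustration, the difference between bounded and infinite reconnecting could look like the sketch below. `reconnect` and `try_connect` are hypothetical names; a real implementation would also wait between attempts, which is left out here.

```python
import itertools


def reconnect(try_connect, max_attempts=None):
    """Call try_connect() until it returns True.

    max_attempts=None means retry forever (the "infinite reconnect" case).
    Returns the number of attempts used, or None if the limit was reached.
    """
    if max_attempts is None:
        attempts = itertools.count(1)
    else:
        attempts = range(1, max_attempts + 1)
    for attempt in attempts:
        if try_connect():
            return attempt
    return None
```

With a limit, the caller has to decide what to report to the guest once `None` comes back. Without one, the request simply stays pending until the target returns or QEMU is killed.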
>
>
>