From: Eric Blake
Subject: Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
Date: Thu, 2 Apr 2020 08:33:20 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0
On 4/2/20 1:41 AM, Vladimir Sementsov-Ogievskiy wrote:
> 02.04.2020 1:38, Eric Blake wrote:
>> I was trying to test qemu's reconnect-delay parameter by using nbdkit
>> as a server that I could easily make disappear and resume. A bit of
>> experimenting shows that when nbdkit is abruptly killed (SIGKILL),
>> qemu detects EOF on the socket and manages to reconnect just fine;
>> but when nbdkit is gracefully killed (SIGTERM), it merely fails all
>> further guest requests with NBD_ESHUTDOWN until the client
>> disconnects first, and qemu was blindly failing the I/O request with
>> ESHUTDOWN from the server instead of attempting to reconnect.
>>
>> Most NBD server failures are unlikely to change by merely retrying
>> the same transaction, so our decision to not start a retry loop in
>> the common case is correct. But NBD_ESHUTDOWN is rare enough, and
>> really is indicative of a transient situation, that it is worth
>> special-casing.
> Interesting. I see that prior to this patch we don't handle ESHUTDOWN
> at all in the nbd client.. What does the spec say?
>
> > On a server shutdown, the server SHOULD wait for inflight requests
> > to be serviced prior to initiating a hard disconnect. A server MAY
> > speed this process up by issuing error replies. The error value
> > issued in respect of these requests and any subsequently received
> > requests SHOULD be NBD_ESHUTDOWN.
> >
> > If the client receives an NBD_ESHUTDOWN error it MUST initiate a
> > soft disconnect.
Perhaps the spec should be relaxed to state that a client SHOULD initiate soft disconnect (as there are existing clients that do not). If a server knows it wants to initiate hard disconnect soon, it shouldn't be forced to wait for a client to respond to NBD_ESHUTDOWN, since not all clients do. Then again, it is indeed nicer if the client does initiate soft disconnect (as soft is always cleaner than hard).
> > The client MAY issue a soft disconnect at any time, but SHOULD wait
> > until there are no inflight requests first.
> >
> > The client and the server MUST NOT initiate any form of disconnect
> > other than in one of the above circumstances.
>
> Hmm. So, actually we MUST initiate a soft disconnect, which means
> that we must send NBD_CMD_DISC..
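For illustration only (this is not code from the patch or from qemu): a minimal sketch of what "send NBD_CMD_DISC" means on the wire, assuming the 28-byte request header layout and the constants given in the NBD protocol document. The helper names put_be() and build_disc_request() are hypothetical, and the flags, offset, and length fields are simply set to zero here.

```c
#include <stdint.h>

/* Constants as listed in the NBD protocol document. */
#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_CMD_DISC      2
#define NBD_REQUEST_SIZE  28

/* Hypothetical helper: store the low nbytes of v in big-endian order. */
void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Hypothetical helper: fill buf with a disconnect request.
 * Layout: magic (4), command flags (2), type (2), handle (8),
 * offset (8), length (4), all in network byte order.
 */
void build_disc_request(uint8_t buf[NBD_REQUEST_SIZE], uint64_t handle)
{
    put_be(buf + 0, NBD_REQUEST_MAGIC, 4);
    put_be(buf + 4, 0, 2);              /* no command flags */
    put_be(buf + 6, NBD_CMD_DISC, 2);   /* request type */
    put_be(buf + 8, handle, 8);         /* opaque to the server */
    put_be(buf + 16, 0, 8);             /* offset, zero in this sketch */
    put_be(buf + 24, 0, 4);             /* length, zero in this sketch */
}
```

A client doing a soft disconnect would write these 28 bytes to the socket and then close it; per the protocol document the server does not reply to NBD_CMD_DISC.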
With this patch as-is, qemu as client initiates hard disconnect in response to NBD_ESHUTDOWN (but only if it plans on trying to reconnect).
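To make that behaviour concrete, here is a rough sketch of the decision as described in this thread, not the actual qemu code. The enum, the function name, and the wants_reconnect flag are hypothetical; only the NBD_ESHUTDOWN value (108) comes from the NBD protocol document.

```c
#include <stdbool.h>
#include <stdint.h>

#define NBD_ESHUTDOWN 108  /* error value from the NBD protocol document */

/* Hypothetical action names, not taken from qemu. */
enum client_action {
    ACTION_FAIL_REQUEST,       /* pass the server's error up to the guest */
    ACTION_DROP_AND_RECONNECT  /* hard disconnect, then try to reconnect */
};

/*
 * Hypothetical policy function. wants_reconnect stands for "the client
 * plans on trying to reconnect"; how that is derived (for example from
 * reconnect-delay) is an assumption left outside this sketch.
 */
enum client_action on_server_error(uint32_t nbd_error, bool wants_reconnect)
{
    if (nbd_error == NBD_ESHUTDOWN && wants_reconnect) {
        /* Treat the shutdown error as transient instead of failing I/O. */
        return ACTION_DROP_AND_RECONNECT;
    }
    /* Any other error, or no reconnect planned: fail the request as before. */
    return ACTION_FAIL_REQUEST;
}
```

A strictly spec-conforming client would instead have a third outcome for NBD_ESHUTDOWN, namely sending NBD_CMD_DISC before closing the socket; the sketch omits it because the patch as described does not do that.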
> Then, what about "SHOULD wait until no inflight requests"? We don't do
> it either.. Should we?
qemu as server doesn't send NBD_ESHUTDOWN. It probably should (the way nbdkit does), but that's orthogonal to qemu as client responding to NBD_ESHUTDOWN.
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org