[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when ser
From: |
Eric Blake |
Subject: |
Re: [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage |
Date: |
Tue, 15 Aug 2017 09:54:24 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 |
On 08/15/2017 09:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2017 00:34, Eric Blake wrote:
>> When we switched NBD to use coroutines for qemu 2.9 (in particular,
>> commit a12a712a), we introduced a regression: if a server sends us
>> garbage (such as a corrupted magic number), we quit the read loop
>> but do not stop sending further queued commands, resulting in the
>> client hanging when it never reads the response to those additional
>> commands. In qemu 2.8, we properly detected that the server is no
>> longer reliable, and cancelled all existing pending commands with
>> EIO, then tore down the socket so that all further command attempts
>> get EPIPE.
>>
>> Restore the proper behavior of quitting (almost) all communication
>> with a broken server: Once we know we are out of sync or otherwise
>> can't trust the server, we must assume that any further incoming
>> data is unreliable and therefore end all pending commands with EIO,
>> and quit trying to send any further commands. As an exception, we
>> still (try to) send NBD_CMD_DISC to let the server know we are going
>> away (in part, because it is easier to do that than to further
>> refactor nbd_teardown_connection, and in part because it is the
>> only command where we do not have to wait for a reply).
>>
>> Based on a patch by Vladimir Sementsov-Ogievskiy.
>>
>> +++ b/block/nbd-client.c
>> @@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>> int ret;
>> Error *local_err = NULL;
>>
>> - for (;;) {
>> + while (!s->quit) {
>
> looks like quit will never be true here
>
>> assert(s->reply.handle == 0);
>> ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>> if (ret < 0) {
>> @@ -107,6 +107,9 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>> qemu_coroutine_yield();
>> }
>>
>> + if (ret < 0) {
>> + s->quit = true;
>
> so, you set quit only here.. if we fail on some write, reading coroutine
> will not
> know about it and will continue reading..
Looks like you are correct - your version set the flag in more places
than me, but it looks like you're right that we DO want to set the flag
when writing hits a known failure.
Here's what I plan to squash in:
diff --git i/block/nbd-client.c w/block/nbd-client.c
index bb17e3da45..422ecb4307 100644
--- i/block/nbd-client.c
+++ w/block/nbd-client.c
@@ -161,6 +161,9 @@ static int nbd_co_send_request(BlockDriverState *bs,
} else {
rc = nbd_send_request(s->ioc, request);
}
+ if (rc < 0) {
+ s->quit = true;
+ }
qemu_co_mutex_unlock(&s->send_mutex);
return rc;
}
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature