qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when ser


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage
Date: Tue, 15 Aug 2017 09:54:24 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 08/15/2017 09:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2017 00:34, Eric Blake wrote:
>> When we switched NBD to use coroutines for qemu 2.9 (in particular,
>> commit a12a712a), we introduced a regression: if a server sends us
>> garbage (such as a corrupted magic number), we quit the read loop
>> but do not stop sending further queued commands, resulting in the
>> client hanging when it never reads the response to those additional
>> commands.  In qemu 2.8, we properly detected that the server is no
>> longer reliable, and cancelled all existing pending commands with
>> EIO, then tore down the socket so that all further command attempts
>> get EPIPE.
>>
>> Restore the proper behavior of quitting (almost) all communication
>> with a broken server: Once we know we are out of sync or otherwise
>> can't trust the server, we must assume that any further incoming
>> data is unreliable and therefore end all pending commands with EIO,
>> and quit trying to send any further commands.  As an exception, we
>> still (try to) send NBD_CMD_DISC to let the server know we are going
>> away (in part, because it is easier to do that than to further
>> refactor nbd_teardown_connection, and in part because it is the
>> only command where we do not have to wait for a reply).
>>
>> Based on a patch by Vladimir Sementsov-Ogievskiy.
>>

>> +++ b/block/nbd-client.c
>> @@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>>       int ret;
>>       Error *local_err = NULL;
>>
>> -    for (;;) {
>> +    while (!s->quit) {
> 
> looks like quit will never be true here
> 
>>           assert(s->reply.handle == 0);
>>           ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>>           if (ret < 0) {
>> @@ -107,6 +107,9 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>>           qemu_coroutine_yield();
>>       }
>>
>> +    if (ret < 0) {
>> +        s->quit = true;
> 
> so, you set quit only here.. if we fail on some write, reading coroutine
> will not
> know about it and will continue reading..

Looks like you are correct - your version set the flag in more places
than me, but it looks like you're right that we DO want to set the flag
when writing hits a known failure.

Here's what I plan to squash in:

diff --git i/block/nbd-client.c w/block/nbd-client.c
index bb17e3da45..422ecb4307 100644
--- i/block/nbd-client.c
+++ w/block/nbd-client.c
@@ -161,6 +161,9 @@ static int nbd_co_send_request(BlockDriverState *bs,
     } else {
         rc = nbd_send_request(s->ioc, request);
     }
+    if (rc < 0) {
+        s->quit = true;
+    }
     qemu_co_mutex_unlock(&s->send_mutex);
     return rc;
 }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]