
Re: [Qemu-block] [PATCH] nbd-client: avoid spurious qio_channel_yield() re-entry


From: Stefan Hajnoczi
Subject: Re: [Qemu-block] [PATCH] nbd-client: avoid spurious qio_channel_yield() re-entry
Date: Wed, 23 Aug 2017 15:26:34 +0100

On Wed, Aug 23, 2017 at 3:20 PM, Eric Blake <address@hidden> wrote:
> On 08/22/2017 07:51 AM, Stefan Hajnoczi wrote:
>> The following scenario leads to an assertion failure in
>> qio_channel_yield():
>>
>> 1. Request coroutine calls qio_channel_yield() successfully when sending
>>    would block on the socket.  It is now yielded.
>> 2. nbd_read_reply_entry() calls nbd_recv_coroutines_enter_all() because
>>    nbd_receive_reply() failed.
>> 3. Request coroutine is entered and returns from qio_channel_yield().
>>    Note that the socket fd handler has not fired yet so
>>    ioc->write_coroutine is still set.
>> 4. Request coroutine attempts to send the request body with nbd_rwv()
>>    but the socket would still block.  qio_channel_yield() is called
>>    again and assert(!ioc->write_coroutine) is hit.
>>
>> The problem is that nbd_read_reply_entry() does not distinguish between
>> request coroutines that are waiting to receive a reply and those that
>> are not.
>>
>> This patch adds a per-request bool receiving flag so
>> nbd_read_reply_entry() can avoid spurious aio_co_wake() calls.
>>
>> Reported-by: Dr. David Alan Gilbert <address@hidden>
>> Signed-off-by: Stefan Hajnoczi <address@hidden>
>> ---
>> This should fix the issue that Dave is seeing but I'm concerned that
>> there are more problems in nbd-client.c.  We don't have good
>> abstractions for writing coroutine socket I/O code.  Something like Go's
>> channels would avoid manual low-level coroutine calls.  There is
>> currently no way to cancel qio_channel_yield() so requests doing I/O may
>> remain in-flight indefinitely and nbd-client.c doesn't join them...
>
> Is this patch needed for 2.10-rc4, or does Fam's series cover the issue?

Fam's series fixes non-shared storage migration.

This patch addresses the failure case when the server closes the
connection prematurely.
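
To illustrate the idea behind the per-request receiving flag, here is a
minimal standalone sketch. It is not the code from the patch: ToyRequest,
toy_wake_receivers() and MAX_REQUESTS are hypothetical names made up for
this example, and a counter stands in for actually waking a coroutine.
The point is only that the wake-all path skips requests that are not
waiting for a reply, such as one that is still yielded in the send path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQUESTS 16   /* hypothetical table size for this sketch */

/* Toy stand-in for a request slot; not QEMU's real structure. */
typedef struct {
    bool in_use;      /* slot holds an in-flight request */
    bool receiving;   /* request is parked waiting for its reply */
    int wakeups;      /* how many times this slot was woken */
} ToyRequest;

/*
 * Wake only the requests that are waiting to receive a reply.
 * A request that is still blocked while sending has receiving == false
 * and is left alone, so it is never re-entered behind the back of the
 * fd handler.  Returns the number of requests woken.
 */
static int toy_wake_receivers(ToyRequest *reqs, size_t n)
{
    int woken = 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].in_use && reqs[i].receiving) {
            reqs[i].wakeups++;   /* stands in for waking the coroutine */
            woken++;
        }
    }
    return woken;
}
```

With one request still sending (receiving == false) and another waiting
for its reply (receiving == true), calling toy_wake_receivers() on the
table wakes only the second one.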

Stefan
