qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4


From: ronnie sahlberg
Subject: Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4
Date: Tue, 4 Aug 2015 16:23:30 -0700



On Tue, Aug 4, 2015 at 5:53 AM, Peter Lieven <address@hidden> wrote:
Am 04.08.2015 um 14:29 schrieb Peter Lieven:
Am 04.08.2015 um 14:09 schrieb Paolo Bonzini:

On 04/08/2015 13:57, Peter Lieven wrote:
Okay, what I found out is that in aio_poll I get revents = POLLIN for
the nfs file descriptor. But there is no data available on the socket.
Does read return 0 or EAGAIN?

If it returns EAGAIN, the bug is in the QEMU main loop or the kernel.
It should never happen that poll returns POLLIN and read returns EAGAIN.

If it returns 0, it means the other side called shutdown(fd, SHUT_WR).
Then I think the bug is in the libnfs driver or more likely libnfs.  You
should stop polling the POLLIN event after read has returned 0 once.

You might be right. Ronnie originally used the FIONREAD ioctl before every read and considered
the socket as disconnected if the available bytes returned where 0.
I found that I get available bytes == 0 from that ioctl even if the socket was not closed.

You only get >0 from this call if there are actual bytes available to read.

For context,  the problem was that

75 static void nfs_process_read(void *arg)
76 {
77     NFSClient *client = arg;
78     nfs_service(client->context, POLLIN);
79     nfs_set_events(client);
80 }

sometimes trigger and call nfs_service(POLLIN) eventhough the socket is not readable.
I verified this by adding an extra call to poll() at around line 78
to check whether POLLIN was actually set on the fd or not. Sometimes it would not be but I got lost in the sources and could not find if or where this happens or even if qemu even guarantees "only call the POLLIN callbacks if the filedescriptor is actually readable".


The old code in libnfs used to assume that IF we are called for POLLIN and the if ioctl(FIONREAD) returns that there are 0 bytes available to read then there was a problem with the socket.

:-(


 
This seems to be some kind of bug in Linux - at least what I have thought.

See BUGS in the select(2) manpage.

       Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data  has  arrived  but
       upon  examination  has  wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready.  Thus it may be safer to use O_NON‐
       BLOCK on sockets that should not block.

I will debug further, but it seems to be that I receive a POLLIN even if there is no data available. I see 0 bytes from the recv call inside libnfs and continue without a deadlock - at least
so far.

Would it be a good idea to count the number of 0 bytes from recv and react after I received 0 bytes for a number of consecutive times?

And then: stop polling POLLIN or reconnect?

Okay, got it. Ronnie was using FIONREAD without checking for EAGAIN or EINTR.

I will send a patch for libnfs to reconnect if count == 0. Libiscsi is not affected, it reconnects if count is 0.

Thanks, and merged.
 


Peter



reply via email to

[Prev in Thread] Current Thread [Next in Thread]