qemu-devel

Re: [Qemu-devel] [bug] busy-loop in send_all()


From: Chris Friesen
Subject: Re: [Qemu-devel] [bug] busy-loop in send_all()
Date: Tue, 27 May 2014 08:58:06 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 05/26/2014 10:41 PM, Amit Shah wrote:
On (Fri) 23 May 2014 [13:55:40], Stefan Hajnoczi wrote:
On Thu, May 15, 2014 at 11:23:54AM -0600, Chris Friesen wrote:

Looking at the implementation of send_all(), the core loop looks like:

      while (len > 0) {
          ret = write(fd, buf, len);
          if (ret < 0) {
              if (errno != EINTR && errno != EAGAIN)
                  return -1;
          } else if (ret == 0) {
              break;
          } else {
              buf += ret;
              len -= ret;
          }
      }


So if we get EAGAIN, we'll just immediately retry.

I'm not sure where the unix socket would get opened, but I'm assuming it's
set as non-blocking?  And by default /proc/sys/net/unix/max_dgram_qlen is
set to 10.

So if the other end of that unix socket is connected but isn't actually
paying attention to the messages then the first 10 messages will get
buffered but after that we'll end up with qemu spinning forever in a
busy-loop trying to send a message into a full buffer.

This seems less than ideal.  Either we should block, or else we should
discard the data.  And I don't think discarding the data makes sense.
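For illustration, the "block" option could look roughly like the sketch below: on EAGAIN, wait in poll() until the descriptor is writable again instead of retrying immediately. This is not QEMU code. The names send_all_wait() and demo_blocking_send() are made up for this example, and the demo uses a stream socketpair for simplicity. Note that waiting like this stalls whichever thread calls it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: like send_all(), but sleeps in poll() on EAGAIN.
 * Returns the number of bytes written, or -1 on a hard error. */
ssize_t send_all_wait(int fd, const void *data, size_t len)
{
    const char *buf = data;
    size_t left = len;

    while (left > 0) {
        ssize_t ret = write(fd, buf, left);
        if (ret > 0) {
            buf += ret;
            left -= (size_t)ret;
        } else if (ret == 0) {
            break;
        } else if (errno == EINTR) {
            continue;                       /* interrupted: just retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            /* Sleep until writable instead of spinning. */
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR) {
                return -1;
            }
        } else {
            return -1;
        }
    }
    return (ssize_t)(len - left);
}

/* Usage sketch: a slow reader in a child process, a non-blocking writer
 * in the parent. Returns 1 if the whole payload was delivered. */
int demo_blocking_send(void)
{
    enum { TOTAL = 1 << 20 };
    static char payload[TOTAL];
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        char tmp[4096];
        close(sv[0]);
        poll(NULL, 0, 100);                 /* let the writer fill the buffer */
        while (read(sv[1], tmp, sizeof(tmp)) > 0) {
            /* drain until the writer closes its end */
        }
        _exit(0);
    }
    close(sv[1]);
    ssize_t sent = send_all_wait(sv[0], payload, TOTAL);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return sent == TOTAL;
}
```

If the reader never drains the socket, this variant blocks forever rather than spinning, so it only trades CPU burn for a hang.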

Chardev flow control was added to 1.5.0.  Can you re-try with that
release and let us know if it still behaves similarly?

http://wiki.qemu.org/Features/ChardevFlowControl


It's tricky for me to test a different version since qemu is part of a unified product.

Someone who understands the flow control code should be able to easily determine whether the current code is still susceptible.

send_all() is still implemented exactly the same way as before. Assuming the tx buffer is big enough, the first /proc/sys/net/unix/max_dgram_qlen messages will get queued up at the receiver, and on the next message after that we'll get stuck in the while loop. write() will return -1 with errno set to EAGAIN, so we just spin trying to write to a full buffer.

The only way flow control would prevent this is if it ensured that we never queued up /proc/sys/net/unix/max_dgram_qlen messages in the first place.

Assuming it is still a problem, we have to treat EAGAIN differently. Getting EAGAIN should result in aborting send_all() since we have no idea when the socket will become writable. At that point we might want to trigger flow control to prevent the guest from writing additional data into that channel.
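As a rough sketch of what "aborting on EAGAIN" could mean (not a patch against the tree): stop the loop on EAGAIN and return how many bytes were actually written, so the caller can see the short write and decide what to do, e.g. trigger flow control. The names send_all_nonblock() and demo_stops_when_full() are made up for this example, and the demo assumes a non-blocking datagram socketpair to match the max_dgram_qlen case.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variant of send_all(): returns the number of bytes written
 * (possibly less than len if the fd would block), or -1 on a hard error. */
ssize_t send_all_nonblock(int fd, const void *data, size_t len)
{
    const char *buf = data;
    size_t left = len;

    while (left > 0) {
        ssize_t ret = write(fd, buf, left);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted: retrying is fine */
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                break;          /* peer queue full: give up, don't spin */
            }
            return -1;
        } else if (ret == 0) {
            break;
        }
        buf += ret;
        left -= (size_t)ret;
    }
    return (ssize_t)(len - left);
}

/* Usage sketch: nobody reads the other end, so after a few messages the
 * helper reports a short write instead of looping. Returns 1 if it did. */
int demo_stops_when_full(void)
{
    int sv[2];
    char chunk[4096] = {0};
    int stopped = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
        return -1;
    }
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    for (int i = 0; i < 100000 && !stopped; i++) {
        ssize_t n = send_all_nonblock(sv[0], chunk, sizeof(chunk));
        assert(n >= 0);         /* no hard error expected on a socketpair */
        if ((size_t)n < sizeof(chunk)) {
            stopped = 1;        /* short write: caller can apply flow control */
        }
    }
    close(sv[0]);
    close(sv[1]);
    return stopped;
}
```

A return value smaller than len tells the caller the channel is full. What to do with the unsent remainder (queue it, or stop accepting guest data) is left to the caller here.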

Chris


