[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] Re: [PATCH 2/3] NBD library: add aio-compatible read/wr
From: |
Stefan Hajnoczi |
Subject: |
Re: [Qemu-devel] Re: [PATCH 2/3] NBD library: add aio-compatible read/write function |
Date: |
Tue, 22 Feb 2011 20:14:42 +0000 |
On Tue, Feb 22, 2011 at 3:26 PM, Nicholas Thomas <address@hidden> wrote:
> On Mon, 2011-02-21 at 20:10 +0000, Stefan Hajnoczi wrote:
>> On Mon, Feb 21, 2011 at 12:37 PM, Kevin Wolf <address@hidden> wrote:
>> > Am 18.02.2011 13:55, schrieb Nick Thomas:
>> >> +retry:
>> >> + if (do_read) {
>> >> + ret = recvmsg(sockfd, &msg, 0);
>> >> + } else {
>> >> + ret = sendmsg(sockfd, &msg, 0);
>> >> + }
>> >> +
>> >> + /* recoverable error */
>> >> + if (ret == -1 && (errno == EAGAIN || errno == EINTR)) {
>> >> + goto retry;
>> >> + }
>> >
>> > Make this a do...while loop, no reason for goto here.
>>
>> This is not asynchronous.
>>
>> If we're writing and the in-kernel socket buffer is full then we'll spin
>> retrying - this burns CPU for no reason. Instead we should get a
>> callback when the socket becomes writable again so we can fill it with
>> more data.
>>
>> If we're reading and enough data hasn't arrived yet we will spin
>> retrying. Again, we should set up a callback to be notified when the
>> socket becomes readable again.
>>
>> It seems sheepdog suffers from the same problem.
>
> Here, I'm a little concerned by the possibility of getting reads and
> writes from different requests interleaved - so, writing half of the
> data for one write request, returning, then getting another write
> request through and starting that before the other one is completed.
>
> Obviously, I 'just' need to write clever code that avoids that ;)
>
> Since sheepdog has the same issue, it might be worth me looking at this
> as a separate issue in both code bases, starting from a patch that gets
> nbd to the same kind of state as sheepdog in this area? The intermediate
> stage is definitely better than the starting point, after all.
Please try to solve it in the same patch series. This is not a
limitation where we're being sub-optimal but well-behaved. Spinning
is a bug and introducing it is a problem.
ui/vnc.c has a solution to this problem, it has a Buffer abstraction.
Data can be copied into the buffer. It then drains the buffer to the
socket in vnc_client_write(). For reads the caller says how many
bytes they are expecting and their callback will not be invoked until
the buffer has filled at least that much.
For NBD I would look at something more like QEMUIOVector instead of
copying between a Buffer. It should be possible to concatenate iovec
arrays that need to be transmitted without any interleaving issues.
Stefan