qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Re: [PATCH 2/3] NBD library: add aio-compatible read/wr


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Re: [PATCH 2/3] NBD library: add aio-compatible read/write function
Date: Tue, 22 Feb 2011 20:14:42 +0000

On Tue, Feb 22, 2011 at 3:26 PM, Nicholas Thomas <address@hidden> wrote:
> On Mon, 2011-02-21 at 20:10 +0000, Stefan Hajnoczi wrote:
>> On Mon, Feb 21, 2011 at 12:37 PM, Kevin Wolf <address@hidden> wrote:
>> > Am 18.02.2011 13:55, schrieb Nick Thomas:
>> >> +retry:
>> >> +    if (do_read) {
>> >> +        ret = recvmsg(sockfd, &msg, 0);
>> >> +    } else {
>> >> +        ret = sendmsg(sockfd, &msg, 0);
>> >> +    }
>> >> +
>> >> +    /* recoverable error */
>> >> +    if (ret == -1 && (errno == EAGAIN || errno == EINTR)) {
>> >> +        goto retry;
>> >> +    }
>> >
>> > Make this a do...while loop, no reason for goto here.
>>
>> This is not asynchronous.
>>
>> If we're writing and the in-kernel socket buffer is full then we'll spin
>> retrying - this burns CPU for no reason.  Instead we should get a
>> callback when the socket becomes writable again so we can fill it with
>> more data.
>>
>> If we're reading and enough data hasn't arrived yet we will spin
>> retrying.  Again, we should set up a callback to be notified when the
>> socket becomes readable again.
>>
>> It seems sheepdog suffers from the same problem.
>
> Here, I'm a little concerned by the possibility of getting reads and
> writes from different requests interleaved - so, writing half of the
> data for one write request, returning, then getting another write
> request through and starting that before the other one is completed.
>
> Obviously, I 'just' need to write clever code that avoids that ;)
>
> Since sheepdog has the same issue, it might be worth me looking at this
> as a separate issue in both code bases, starting from a patch that gets
> nbd to the same kind of state as sheepdog in this area? The intermediate
> stage is definitely better than the starting point, after all.

Please try to solve it in the same patch series.  This is not a
limitation where we're being sub-optimal but well-behaved.  Spinning
is a bug and introducing it is a problem.

ui/vnc.c has a solution to this problem, it has a Buffer abstraction.
Data can be copied into the buffer.  It then drains the buffer to the
socket in vnc_client_write().  For reads the caller says how many
bytes they are expecting and their callback will not be invoked until
the buffer has filled at least that much.

For NBD I would look at something more like QEMUIOVector instead of
copying between a Buffer.  It should be possible to concatenate iovec
arrays that need to be transmitted without any interleaving issues.

Stefan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]