From: Anton Ivanov
Subject: Re: [Qemu-devel] [PATCH v6] net: L2TPv3 transport
Date: Mon, 31 Mar 2014 16:48:35 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131103 Icedove/17.0.10
>>> if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>>     size_t size = iov_size(iov, iovcnt);
>>>     uint8_t *buf;
>>>
>>>     buf = g_malloc(size);
>>>     iov_to_buf(iov, iovcnt, 0, buf, size);
>>>     ret = net_l2tpv3_receive_dgram(nc, buf, size);
>>>     g_free(buf);
>>>     return ret;
>>> }

>> iov_to_buf is a memcpy of the data and a malloc down the call chain. That is quite a significant cost compared to just shifting the iov a bit to add an element for the header. I tried that early on and it introduced a noticeable penalty. If I am to merge it, I would actually merge it the other way around: express both net_l2tpv3_receive_dgram and net_l2tpv3_receive_dgram_iov via an iov-based backend. That will allow adding sendmmsg one day, as the data structures will be the same in all cases.

> I agree with you that the iov code path should be used. The context for my comment was "if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {". I'm just suggesting that we fall back to linearizing the iovec if it has too many elements, instead of erroring out.
I will send this one as an incremental patch, in addition to a proposal to improve iov buffer linearization.
I have some ideas on how to avoid the copy there for the cases where the data ends up in only one of the iov_base pointers.
A.
>> +    /* deliver directly to peer bypassing queueing and
>> +     * buffering
>> +     */
>> +    size = qemu_deliver_packet(
>> +        &s->nc,
>> +        QEMU_NET_PACKET_FLAG_NONE,
>> +        vec->iov_base,
>> +        data_size,
>> +        s->nc.peer
>> +    );

> There is no callback when the peer re-enables receive. In other words, packets will be stuck in s->msgvec until a new packet arrives and net_l2tpv3_send() is invoked. I think you need to modify qemu_flush_queued_packets() to invoke a new callback. You could convert the existing code:

I had the same thoughts, just did not want to touch it yet. Something along those lines would have been necessary for sendmmsg too, as there you would want to work off a pre-configured queue in the first place.
I tried adding a quick and ugly if, similar to the one used by HUB; this gives a ~5% improvement on a benchmark, and more when the machine is under "other" load (which makes it easier to trigger the buffering).
However, it is better if it is introduced across the board. There are other drivers which (if I understand them correctly) should not invoke the legacy queue, e.g. netmap. I will submit this separately.
A.