To be honest I'm not entirely sure what scenario you are sketching there, or
what you are comparing it to. To make sure we're talking about the same
thing, allow me to elaborate on what I have in mind, and then please feel
free to point out flaws in my logic :)
Take the scenario of a 4-byte chunk and then a 256-byte chunk of data being
sent by an application:
application:
- write(fd, "...", 4) // chunk #1
my side:
- allocate a 512-byte RAM pbuf: my_pbuf
- remote-copy-in requested 4 bytes to &my_pbuf->payload[0]
- call tcp_write(&my_pbuf->payload[0], 4, noflags)
lwip side, in tcp_write():
- allocate a REF pbuf
- point REF pbuf to (&my_pbuf->payload[0], 4)
- enqueue REF pbuf as part of unsent segment
application:
- write(fd, "...", 256) // chunk #2
my side:
- remote-copy-in requested 256 bytes to &my_pbuf->payload[4]
- call tcp_write(&my_pbuf->payload[4], 256, noflags)
lwip side, in tcp_write(ptr, len):
- allocate a REF pbuf
- point REF pbuf to (&my_pbuf->payload[4], 256)
- enqueue REF pbuf as part of unsent segment
At this point there are two reference pbufs attached to the same unsent
segment, and those two reference pbufs will end up at the NIC as well. Now,
with the kind of merging I am proposing, sending the second chunk of data
would proceed like this:
application:
- write(fd, "...", 256) // chunk #2
my side:
- remote-copy-in requested 256 bytes to &my_pbuf->payload[4]
- call tcp_write(&my_pbuf->payload[4], 256, noflags)
lwip side, in tcp_write(ptr, len):
- extend the last REF pbuf on the last unsent segment from
(&my_pbuf->payload[0], 4) to (&my_pbuf->payload[0], 260)
And ultimately only that one reference pbuf will end up at the NIC.
If I were to use WRITE_COPY, I would first have to do a remote-copy-in of
the data to... somewhere temporary, after which I would hand the data to
tcp_write(), which would allocate an oversized RAM pbuf once and fill it
with the two chunks of data. Without WRITE_COPY, I have thus saved copying
260 bytes at the cost of having lwIP allocate a single REF pbuf. Given that
my expectation is that most writes are actually large, this would generally
be a tradeoff between copying 512-byte chunks of data and allocating REF
pbufs. I'd expect/hope the latter is faster...
David
On 9/30/2016 12:54, Dirk Ziegelmeier wrote:
I still don't understand why you do that. You need to get a pbuf_ref
from somewhere, initialize it, append it to the pbuf chain, in case of
sending iterate through that chain to create a consecutive buffer for
the MAC (=copy the data), dechain the pbuf and put it back into some
pbuf pool.
This is more effective/faster than just copying around a few bytes and
increasing the pbuf->len/tot_len?
Dirk
On Fri, Sep 30, 2016 at 12:45 PM, David van Moolenbroek
<address@hidden> wrote:
Follow-up Comment #6, bug #49218 (project lwip):
> And that was my point: I don't expect you have a scatter-gather-MAC that
> supports TX frames with that many chunks. Being like that, you probably
> end up copying all bytes per CPU anyway. Thus, I don't see the advantage
> of doing it with ref pbufs. On the contrary, it's probably less
> performant.
That is correct, and that is the other side of why I think the ideal
solution is to merge multiple consecutive buffer-local references. In that
case we do eliminate an extra memory copy for all data in such
circumstances, while keeping the number of chunks very low. As such I'll
most probably be adding support for that locally in any case. If you think
that is (in principle) something worthwhile that might also be merged into
lwIP itself, I'd be happy to make a proper patch out of it and submit it..
_______________________________________________________
Reply to this item at:
<http://savannah.nongnu.org/bugs/?49218>
_______________________________________________
Message sent via/by Savannah
http://savannah.nongnu.org/
_______________________________________________
lwip-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/lwip-devel