lwip-devel


RE: [lwip-devel] Re: [task #7040] Work on tcp_enqueue


From: bill
Subject: RE: [lwip-devel] Re: [task #7040] Work on tcp_enqueue
Date: Mon, 2 Feb 2009 13:52:26 -0500

>I don't think I fully understand you: we didn't have a MAC that supports
>HW-checksum, we had a DMA-less MAC with a custom DMA engine writing the
>packets to/from the MAC (of course all handled by the MAC driver). The
>goal was to use a custom DMA-copy HW that supports checksum calculation
>while copying, but we only had a very small FPGA, so it didn't fit.

Yes, I mean your goal (DMA from p->payload to the MAC, generating a checksum
and plugging it in when needed).  I guess that is the definition of TCP
checksum offloading.  It's not quite that simple, since you have to know the
packet type and where and how much to DMA/checksum.  Then zero-copy *is*
zero-copy, with no reads except the requisite copy to the MAC.
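The checksum-while-copying step that the custom DMA hardware would have performed can be sketched in software as a single loop. This is a minimal illustration that assumes one contiguous byte buffer rather than a pbuf chain, and it follows the RFC 1071 ones'-complement sum; `copy_and_sum()` is a hypothetical helper, not part of lwIP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy len bytes from src to dst and, in the same
 * pass, accumulate the RFC 1071 ones'-complement sum of the data read as
 * big-endian 16-bit words.  Returns the folded sum; the checksum field
 * would get ~sum.  A 32-bit accumulator is safe up to about 128 KiB. */
static uint16_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t acc = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        acc += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {
        /* odd trailing byte: high byte of a zero-padded word */
        dst[i] = src[i];
        acc += (uint32_t)src[i] << 8;
    }
    /* fold the carries back into the low 16 bits */
    while (acc >> 16)
        acc = (acc & 0xffffu) + (acc >> 16);
    return (uint16_t)acc;
}
```

For the eight bytes 00 01 f2 03 f4 f5 f6 f7 (the worked example in RFC 1071) it returns 0xddf2, so the checksum field would be 0x220d. Doing this in the copy engine is what makes the send path "no-read" for the CPU.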

> Of course the fastest solution is HW-checksum, but the goal is for lwIP
> to be as fast as possible without that (while still being as small as
> possible).

Ok.  What wins?  A little extra code for more speed, or removing some code
and losing a little speed?  As you know, you don't usually get both.

> I think trying to create full packets complies with the LW goal (given
> RAM size and execution overhead, not code size), since we need fewer
> segments. The savings of one-pbuf-per-segment are even bigger since we
> save a lot of pbuf structs.

From what I saw, this adds more code and, in my tests, degraded performance.
Even though the loss is slight, it is a loss on both counts.  If it's going to
be added for these minimal systems, then there needs to be an option for it,
because a minimal system might have the memory to offset the added time
required to append and split the pbufs.  I know it adds time, because I
benchmarked building full pbufs in tcp_sent against doing it with Stoklund's
patch.  And we know from the +'s and -'s in the diff that more code was added
than removed.

>We do that for small writes, so why not do it for big writes? I'd favour
>to get rid of the long pbuf chains altogether since they waste RAM (the
>pbuf structs) and are inefficient (when calculating checksum).
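On the checksum point: in a long chain, a pbuf of odd length leaves the next pbuf's bytes in the opposite halves of the 16-bit words, so a chain-aware checksum needs per-fragment bookkeeping that one contiguous segment avoids. The sketch below shows one common way to handle it, relying on the byte-order independence of the ones'-complement sum described in RFC 1071. `struct frag` and the helpers are hypothetical illustrations, not lwIP's `struct pbuf` or its `inet_chksum_pbuf()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pbuf: only the fields the checksum needs. */
struct frag {
    const struct frag *next;
    const uint8_t *payload;
    size_t len;
};

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffffu) + (acc >> 16);
    return (uint16_t)acc;
}

/* Sum one fragment as big-endian 16-bit words, zero-padding an odd tail. */
static uint16_t frag_sum(const uint8_t *p, size_t len)
{
    uint32_t acc = 0;
    while (len > 1) {
        acc += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        acc += (uint32_t)p[0] << 8;
    return fold(acc);
}

/* Ones'-complement sum over a chain.  A fragment that starts at an odd
 * offset was summed with its bytes in the wrong halves of each word, so
 * its partial sum is byte-swapped before being added. */
static uint16_t chain_sum(const struct frag *f)
{
    uint32_t acc = 0;
    int odd_offset = 0;

    for (; f != NULL; f = f->next) {
        uint16_t part = frag_sum(f->payload, f->len);
        if (odd_offset)
            part = (uint16_t)((part << 8) | (part >> 8));
        acc += part;
        if (f->len & 1)
            odd_offset = !odd_offset;
    }
    return fold(acc);
}
```

Splitting the RFC 1071 example bytes into fragments of 1, 3 and 4 bytes gives the same sum (0xddf2) as summing them contiguously; the extra loop, fold and swap per fragment is the overhead that fewer, larger pbufs would save.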

Isn't it done for small writes by the Nagle option?

Wouldn't it be transparent if tcp_sndbuf returned a multiple of MSS whenever
more than one MSS is available in snd_buf?  Then the standard code in the
tcp_sent callback would send a multiple of MSS, or the remainder when less
than that is available.  In fact, if TCP_SND_WND is a multiple of TCP_MSS,
isn't the problem solved then as well?
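The application-level version of this idea needs only a little arithmetic. The sketch below assumes the application reads the free space from lwIP's `tcp_sndbuf()` and knows the connection's MSS, then passes the resulting length to `tcp_write()` from its sent callback; `mss_aligned_len()` and `next_write_len()` are hypothetical names, not lwIP API.

```c
#include <stddef.h>

/* Hypothetical helper: round the free send-buffer space down to a whole
 * number of MSS-sized segments when at least one MSS is free; otherwise
 * return the free space unchanged. */
static size_t mss_aligned_len(size_t avail, size_t mss)
{
    if (mss == 0 || avail < mss)
        return avail;
    return avail - (avail % mss);
}

/* Hypothetical helper: how much the application's sent callback would
 * write next, given the free space, the bytes still queued in the
 * application, and the MSS.  When little data is left, the remainder
 * goes out as is. */
static size_t next_write_len(size_t avail, size_t pending, size_t mss)
{
    size_t n = mss_aligned_len(avail, mss);
    return n < pending ? n : pending;
}
```

With an MSS of 1460 and 5000 bytes free, this writes 4380 bytes (three full segments) and leaves the other 620 bytes of space for a later call, instead of queuing a short fourth segment.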

I'm trying not to be a pain, but I sense that changes are being proposed which
will hurt transmit bandwidth on some systems, when there is a simple solution
at the application level, or maybe even in the lwIP settings.  Recently an
lwIP user was really fighting a transmit bandwidth problem, and I don't know
whether they abandoned lwIP, or their project, or what.

Bill






