Re: [lwip-users] TCP Checksum = 0xFFFF


From: Valery Ushakov
Subject: Re: [lwip-users] TCP Checksum = 0xFFFF
Date: Wed, 14 May 2014 14:22:11 +0000 (UTC)
User-agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (NetBSD/6.1_RC1 (macppc))

Niall Donovan <address@hidden> wrote:

> If you saw my follow-up email, you will notice that I identified the code
> that is causing my problem. It is caused by line 1146 of tcp_out.c
> (note that I have LWIP_CHECKSUM_ON_COPY = 1):
> "acc += (u16_t)~(seg->chksum);"
> 
> acc is a one's complement checksum obtained from a call to
> inet_chksum_pseudo_partial()
> and seg->chksum is a checksum of the payload.
> What is happening is that occasionally during operation acc is resulting in
> a value of M and seg->chksum has, by coincidence, a value of M. Then M +
> (~M) always gives 0xFFFF.
> 
[...] 
> Mathematically speaking, using one's complement maths, ~(sum(a+b+c+d)) is not
> the same as [(~sum(a+b)) + (~sum(c+d))] for the special corner case where
> sum(a+b) = ~sum(c+d). In this special case the answer will be 0xFFFF
> instead of 0x0000. Which is what is happening in my case!

Your analysis is correct.
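
For illustration, here is a minimal standalone C sketch of that corner
case (the value of M is made up; only the final addition mirrors the
acc += (u16_t)~(seg->chksum) line quoted above):

#include <stdint.h>
#include <stdio.h>

/* 16-bit one's-complement addition with end-around carry. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

int main(void)
{
    /* Hypothetical corner case: acc (the pseudo-header partial sum) and
       seg->chksum (the payload sum) happen to hold the same value M, so
       the combination step effectively computes M + ~M. */
    uint16_t M = 0x1234;
    uint16_t result = oc_add(M, (uint16_t)~M);  /* acc += ~seg->chksum */

    /* Prints 0xffff: the negative-zero form, not 0x0000. */
    printf("M + ~M = 0x%04x\n", result);
    return 0;
}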


As for the likelihood of stumbling into just the right numbers, I
recently had an interesting problem.  I have a diskless machine that
has been running off NFS.  For ages.  After an NFS server upgrade, the
diskless machine would reliably wedge quite early in the boot process.

Eventually I tracked it down to a hardware checksum bug in the
Ethernet hardware of the diskless machine.  In UDP the checksum value
0 means no checksum, so if the computed checksum of a datagram is
actually 0, it's replaced with 0xffff - which gives the same result if
verification is done properly (computing the sum with the checksum
field included).  Apparently the hardware checksum used a
recompute-and-compare method instead, so it flagged such valid UDP
datagrams as having a bad checksum.
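
To make the difference concrete, here is a rough sketch of the two
checks (not any particular driver or stack; the function names and the
pre-folded pseudo-header argument are invented for illustration):

#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator down to a 16-bit one's-complement sum. */
static uint16_t oc_fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* Sum 16-bit words into an accumulator (byte-order details omitted). */
static uint16_t oc_sum(const uint16_t *words, size_t n, uint32_t acc)
{
    while (n--)
        acc += *words++;
    return oc_fold(acc);
}

/* Proper verification: sum the whole UDP header + data with the
   checksum field left in place, plus the (pre-folded) pseudo-header.
   A transmitted checksum of 0xffff passes exactly like 0x0000 would,
   since both represent zero in one's complement. */
int udp_cksum_ok(const uint16_t *dgram, size_t nwords, uint32_t pseudo)
{
    return oc_sum(dgram, nwords, pseudo) == 0xffff;
}

/* Recompute-and-compare: zero the checksum field, recompute, and
   compare bit-for-bit.  For the 0xffff-on-the-wire case the recomputed
   value comes out as 0x0000, so the comparison fails and a perfectly
   valid datagram gets flagged as corrupt. */
int udp_cksum_ok_buggy(uint16_t stored, uint16_t recomputed)
{
    return stored == recomputed;
}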

The hardware bug had always been there and I had never seen it.  Then,
when the NFS server upgrade changed the numerology of the NFS handles
just right, the bug was triggered reliably by some particular NFS
response datagram.

So don't underestimate luck as a factor in system stability :)

-uwe



