From: Pettinato, Jim
Subject: RE: [lwip-devel] long term stability - ping no response after few days - HELP ME
Date: Fri, 7 Nov 2008 09:31:36 -0500
Even if you've done stress testing, I still would not rule out a driver issue - if your corporate LAN traffic is anything like ours, it's about as random a stress test as one could ever design. Tolerating a steady high rate of packets generated by another device is not the same as handling, for example, a sudden flood of several dozen or more broadcast packets within a few milliseconds. I found a bug, for instance, that only occurred when I had completely filled my incoming packet queue (128 packets deep!) and then got one more LAN interrupt between two specific lines of code in the driver. Since it is rare to even fill the queue, this was very difficult to reproduce and locate.
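
(A minimal sketch of that kind of hazard, with made-up names - rx_queue, rx_count, QUEUE_DEPTH and the function names are illustrative, not from any real driver. The ISR and the foreground code share the queue bookkeeping, and one interrupt landing between the read and the write of the counter is enough to corrupt it:)

/* Hypothetical driver fragment illustrating the race described above. */
#include <stddef.h>

#define QUEUE_DEPTH 128

static void *rx_queue[QUEUE_DEPTH];
static volatile unsigned rx_head, rx_tail, rx_count;

/* Ethernet ISR: queue one received frame. */
void eth_isr_enqueue(void *frame)
{
  if (rx_count < QUEUE_DEPTH) {
    rx_queue[rx_head] = frame;
    rx_head = (rx_head + 1u) % QUEUE_DEPTH;
    rx_count++;
  }
  /* else: queue full, frame is dropped */
}

/* Main loop: take one frame off the queue. If the ISR fires between the
 * read and the write of rx_count below, the counter is corrupted - the
 * "one more interrupt between two specific lines of code" failure mode. */
void *eth_dequeue(void)
{
  void *frame = NULL;
  if (rx_count > 0) {
    frame = rx_queue[rx_tail];
    rx_tail = (rx_tail + 1u) % QUEUE_DEPTH;
    rx_count--;   /* non-atomic read-modify-write shared with the ISR */
  }
  return frame;
}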
Since you are using the sockets API, you are likely in a different situation than I am... I have no pre-emptive multitasking OS and use SYS_LIGHTWEIGHT_PROT (i.e. enabling/disabling interrupts) to ensure critical sections run to completion, so I would likely see different failure modes than you would.
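
(For reference, a rough sketch of that style of protection applied to the dequeue above, assuming your sys_arch port provides the usual SYS_ARCH_DECL_PROTECT / SYS_ARCH_PROTECT / SYS_ARCH_UNPROTECT macros, typically implemented as "save interrupt state and disable" / "restore":)

/* The same hypothetical dequeue, with the critical section guarded by
 * lwIP's lightweight protection macros (SYS_LIGHTWEIGHT_PROT). */
#include <stddef.h>
#include "lwip/sys.h"

void *eth_dequeue_protected(void)
{
  void *frame = NULL;
  SYS_ARCH_DECL_PROTECT(lev);

  SYS_ARCH_PROTECT(lev);          /* interrupts off: the ISR cannot preempt */
  if (rx_count > 0) {
    frame = rx_queue[rx_tail];
    rx_tail = (rx_tail + 1u) % QUEUE_DEPTH;
    rx_count--;
  }
  SYS_ARCH_UNPROTECT(lev);        /* restore previous interrupt state */

  return frame;
}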
As for stress testing your applications using the stats, what I was suggesting was to look at lwip_stats.pbuf.used after exercising each application and verify that, post test (in a no-traffic situation, i.e. disconnected from the network), the used count has returned to 0, indicating there were no pbuf pool memory leaks.
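
(Roughly, assuming LWIP_STATS and the pbuf statistics are compiled in - the exact counter names in lwip_stats vary a little between lwIP versions, so treat this as illustrative:)

/* Illustrative post-test check: exercise one application, then disconnect
 * from the network so nothing is legitimately in flight; the pool usage
 * should fall back to zero. Counter names follow lwip_stats.pbuf.* as
 * used above; adjust for your lwIP version. */
#include <stdio.h>
#include "lwip/stats.h"

void check_pbuf_pool_idle(const char *test_name)
{
  printf("%s: pbuf used=%u max=%u err=%u\n", test_name,
         (unsigned)lwip_stats.pbuf.used,
         (unsigned)lwip_stats.pbuf.max,
         (unsigned)lwip_stats.pbuf.err);

  if (lwip_stats.pbuf.used != 0) {
    printf("%s: %u pbuf(s) still held - possible pool leak\n",
           test_name, (unsigned)lwip_stats.pbuf.used);
  }
}

Run it once per application test; if the used count creeps upward from test to test, the leak is in whatever that test exercised.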
From: address@hidden [mailto:address@hidden] On Behalf Of Piero 74
Sent: Thursday, November 06, 2008 10:02 AM
To: lwip-devel
Subject: Re: [lwip-devel] long term stability - ping no response after few days - HELP ME

2008/11/6 Pettinato, Jim <address@hidden>
> As far as I recall, all of the problems reported that were related to
> long-term stability ended up being either driver issues or application
> issues...

I'm using the debugger to understand where the bug is...

> i.e. not handling incoming packet buffer overruns properly,
I had boards connected to the business LAN, and no requests were made to the application for a few days... I'd exclude an application problem.
> not properly protecting critical code (both resulting in a corrupt pbuf
> pool)

I didn't understand... could it be related to my driver implementation?
> or having application execution paths which did not free received pbufs
> properly in all cases (memory leaks eventually leading to pool depletion).

I'm using the BSD socket API, so the high-level API... I didn't write any code that touches pbufs (except the driver).
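
(For what it's worth: with the socket API the stack frees received pbufs itself, so a leak usually hides in the driver's receive path. A rough sketch of the usual hand-off, loosely modelled on the ethernetif skeleton that ships with lwIP - low_level_input() is the skeleton's placeholder name for the hardware-specific receive routine; the easy mistake is dropping the pbuf_free() on the error path:)

/* Sketch of a driver receive path. low_level_input() (driver-specific,
 * assumed here) allocates a PBUF_POOL pbuf and copies the frame into it.
 * If the stack rejects the packet, the driver still owns the pbuf and
 * must free it, or every rejected frame leaks one pool entry. */
#include <stddef.h>
#include "lwip/pbuf.h"
#include "lwip/netif.h"
#include "lwip/err.h"

extern struct pbuf *low_level_input(struct netif *netif);

void ethernetif_input(struct netif *netif)
{
  struct pbuf *p = low_level_input(netif);

  if (p == NULL) {
    return;                 /* nothing received, or pbuf_alloc() failed */
  }

  if (netif->input(p, netif) != ERR_OK) {
    pbuf_free(p);           /* stack did not take ownership: free it here */
  }
}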
> Since these issues are often hard to nail down as they take time to
> occur, I would suggest enabling the stats,

Yes... I have stats enabled.
> and checking that you aren't slowly leaking pools by exercising each
> application and verifying the pbuf pool count returns to starting
> conditions afterward.

Sorry... can you explain that better??? How should I check?? :O(
> If the instability is a result of broadcast traffic only, I'd suspect a
> hole in the driver...

OK... I'll check my driver code again... but, in the past, I did some stress tests (sending a lot of packets very fast to my application through a TCP connection)... do you think the bug in the driver should have shown up then?
> If you do get the error to occur on your debug setup, and your stats show
> that you should have pbufs remaining in the pool but your pbuf_pool
> pointer is NULL (resulting in pbuf_alloc() always failing) - that's a
> sure sign of pool corruption.
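
(A hypothetical spot check along those lines, using only the public pbuf API plus the stats counters mentioned earlier - field names again depend on the lwIP version:)

/* Hypothetical spot check, run from the lwIP context with the network
 * quiet: if the counters say pbufs are free but an allocation still
 * fails, the pool free list has probably been corrupted as described. */
#include <stdio.h>
#include "lwip/pbuf.h"
#include "lwip/stats.h"

void pbuf_pool_spot_check(void)
{
  struct pbuf *p = pbuf_alloc(PBUF_RAW, 1, PBUF_POOL);

  if (p != NULL) {
    pbuf_free(p);           /* pool looks healthy; give the pbuf back */
  } else if (lwip_stats.pbuf.used < lwip_stats.pbuf.avail) {
    /* counters claim free entries remain, yet pbuf_alloc() failed */
    printf("pbuf pool corruption suspected: used=%u avail=%u err=%u\n",
           (unsigned)lwip_stats.pbuf.used,
           (unsigned)lwip_stats.pbuf.avail,
           (unsigned)lwip_stats.pbuf.err);
  }
}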
OK... I hope the problem will appear again on my board with the debugger attached, and I will post everything I see here.
In that case I suppose I will need your help... if you can! :O)
Thanks
Piero
__
James M. Pettinato, Jr.
Software Engineer
E: address@hidden | P: 814 898 5250
FMC Technologies Measurement Solutions Inc.
1602 Wagner Avenue | Erie PA | 16510 USA
Phone: 814 898 5000 | Fax: 814 899-3414
www.fmctechnologies.com