Re: Status of CVS for 0.3.0 branch?
From: Chris Crowley
Subject: Re: Status of CVS for 0.3.0 branch?
Date: Thu, 09 Feb 2006 15:29:04 -0500
User-agent: Mozilla Thunderbird 1.0.7-1.1.fc4 (X11/20050929)
Dan -
Thanks for the e-mail.
I'll work with these suggestions:
>If you can get a stack trace out of the process (gdb it and run "thread
>apply all bt"), that would help narrow down what's hanging. Also try
>upgrading to a less-buggy glibc, or set the environment variable
>LD_ASSUME_KERNEL=2.4.1. Any time I see a process hang on a futex,
>setting that has fixed it (it disables futexes entirely).
>
>
and report what I find. First, I'll run the stack trace on the process
as it is. I don't use gdb frequently, so if you have recommendations
that are more specific than what you've already provided, please e-mail
me again. I'll try the env fix, then the glibc update if I can (I'll
have to check some dependencies). It probably won't be until next week.
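
For my own notes, here is roughly how I expect to capture the trace: a minimal sketch, assuming gdb is installed and can attach to the process, and using PID 13360 from the strace output quoted below. The helper names are hypothetical, and I have not run this against the milter yet.

```python
import subprocess

def gdb_backtrace_cmd(pid):
    """Build a gdb command line that prints a backtrace of every thread in pid."""
    # -p attaches to a running process, -batch exits after the commands finish,
    # -ex runs the single gdb command suggested in the quoted advice.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

def capture_backtrace(pid, out_path):
    """Run gdb against pid and save its output to out_path (hypothetical helper).

    Attaching normally requires root or the same user as the target process.
    """
    with open(out_path, "w") as out:
        result = subprocess.run(gdb_backtrace_cmd(pid),
                                stdout=out, stderr=subprocess.STDOUT)
    return result.returncode
```

Calling capture_backtrace(13360, "spamass-milter.bt") should leave a file that can be attached to a follow-up message.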
Chris Crowley
Dan Nelson wrote:
>In the last episode (Feb 09), Chris Crowley said:
>
>
>>My question: "Is CVS for the 0.3.0 branch improved from the distro,
>>and stable for production use?" If not, I'll drill down into the
>>problems with the 0.3.0 tar file that I've got; otherwise, I'll
>>install the CVS version and see if the problems persist.
>>
>>
>
>Only minor changes have been made since 0.3.0; none should affect
>stability one way or the other. My milters never seem to crash or
>hang, but I only process 1 message every 5-10 seconds. Each milter
>thread is independent, so (barring OS bugs) hangs/crashes due to race
>conditions should not be possible.
>
>
>
>>...details...
>>I've been running 0.2.0, and plan to upgrade soon. I've built 0.3.0,
>>and have noticed in some high load testing that it fails differently
>>than the 0.2.0 spamass-milter. By failure I mean that I see error
>>messages in the log. For example:
>><log>
>>Milter (spamassassin): local socket name /var/run/sendmail/spamass.sock unsafe
>>sendmail[10000]: ###ID: Milter (spamassassin): to error state
>>spamass-milter[13360]: SpamAssassin, mi_rd_cmd: read returned -1: Connection reset by peer
>>spamass-milter[19980]: SpamAssassin: thread_create() failed: 12, try again
>></log>
>>
>>and a strace on the process shows that it is "hung":
>><strace>
>>strace -p 13360
>>Process 13360 attached - interrupt to quit
>>futex(0xc9e20c, FUTEX_WAIT, 2, NULL <unfinished ...>
>></strace>
>>
>>
>
>If you can get a stack trace out of the process (gdb it and run "thread
>apply all bt"), that would help narrow down what's hanging. Also try
>upgrading to a less-buggy glibc, or set the environment variable
>LD_ASSUME_KERNEL=2.4.1. Any time I see a process hang on a futex,
>setting that has fixed it (it disables futexes entirely).
>
>
>
>>From the logs, and a quick non-scientific assessment, I don't think
>>that 0.3.0 is failing any less frequently than 0.2.0 was. It's just
>>that the 0.3.0 process actually persists after it fails, so my
>>restart script (which checks whether the socket exists) doesn't work
>>to repair things.
>>
>>Thanks for any insight you can provide. Of course, I'm able to
>>provide more details if they would be beneficial.
>>
>>
>
>
>
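
On the restart script: since the hung 0.3.0 process leaves its socket in place, testing for the socket file alone cannot see the failure. One possible replacement is to ask whatever is listening for a reply and treat silence as a hang. The sketch below is only that, a sketch: the function name is hypothetical, the packet layout is my understanding of the milter option-negotiation command (SMFIC_OPTNEG, 'O', protocol version 2), and it has not been tried against spamass-milter.

```python
import socket
import struct

def milter_responds(sock_path, timeout=5.0):
    """Return True if something at sock_path answers a milter-style probe in time."""
    # Assumed packet layout: 4-byte big-endian length, then the command byte 'O'
    # followed by three uint32 fields (version, actions, protocol flags).
    payload = b"O" + struct.pack("!III", 2, 0, 0)
    packet = struct.pack("!I", len(payload)) + payload
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)   # fails if the socket file is missing or stale
            s.sendall(packet)
            # A hung process never accepts or replies, so recv() times out.
            return len(s.recv(1)) == 1
    except OSError:
        # Covers a missing file, a refused connection, and the timeout.
        return False
```

The restart script could then restart the milter whenever milter_responds("/var/run/sendmail/spamass.sock") returns False, which also covers the case where the file exists but nothing answers.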
--
Christopher Crowley
Network Administrator
Tulane Technology Services
address@hidden
phone: (504) 324-2249