Re: Jamming up with mutex_lock
From: Joe Maimon
Subject: Re: Jamming up with mutex_lock
Date: Tue, 19 Jun 2007 11:50:59 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.11) Gecko/20050728 MultiZilla/1.7.0.1j
Andrew Daviel wrote:
I have been running a modified version of spamass-milter-0.3.1
(match_gecos, per-user rejection threshold). It worked fine in testing,
but in production it jams up after a day or so. The milter continues to
run, but sendmail cannot connect to it, logging
"error connecting to filter". Sometimes there are a few messages
"Milter (spamassassin): to error state"
"milter_read(spamassassin): cmd read returned 0"
This means that the read call timed out.
earlier, though the milter continues to operate for a while - maybe a
couple of hours.
The other threads continue to operate.
You then probably run into a ulimit condition.
When I look at the processes, I see two or more copies of spamass-milter
in sleep (S) state as well as the parent in sleep (Ssl) state.
Are you displaying all the threads?
If those are all the threads, then apparently the deadlock extends to
the engine thread so that no new connections can be accepted.
If I connect to one of the processes with gdb and do a backtrace, I
typically see something like
in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
in _L_mutex_lock_29 () from /lib/tls/libc.so.6
in strdup () from /lib/tls/libc.so.6
in SpamAssassin::Connect (this=0x8bb01f8) at spamass-milter.cpp:1506
in mlfi_header ... at spamass-milter.cpp:1148
from which I assume that two threads have got in a deadlocked state.
Sometimes I see "debug" instead of "strdup".
I have tried replacing localtime() and strerror(), which are not
threadsafe on Linux, with localtime_r and strerror_r(), but
that does not help.
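For reference, a minimal sketch of what such a replacement can look like. The helper names format_timestamp and describe_errno are hypothetical and do not come from spamass-milter; the overloaded pick() is only there because strerror_r() returns char* in the GNU variant and int in the XSI variant, and which one you get depends on the build flags.

```cpp
#include <cstring>
#include <ctime>
#include <string>

// Hypothetical helper: format a time using the reentrant localtime_r(),
// which writes into a caller-supplied struct tm instead of static storage.
std::string format_timestamp(time_t when)
{
    struct tm parts;
    char buf[32];
    if (localtime_r(&when, &parts) == nullptr)
        return std::string();
    if (strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &parts) == 0)
        return std::string();
    return std::string(buf);
}

// XSI strerror_r() returns int and fills buf.
static const char *pick(int rc, const char *buf)
{
    return rc == 0 ? buf : "unknown error";
}

// GNU strerror_r() returns the message pointer (which may or may not be buf).
static const char *pick(const char *ret, const char *)
{
    return ret;
}

// Hypothetical helper: thread-safe errno description via strerror_r().
std::string describe_errno(int err)
{
    char buf[128];
    buf[0] = '\0';
    return std::string(pick(strerror_r(err, buf, sizeof buf), buf));
}
```

Both helpers return a std::string copy, so no caller ever holds a pointer into a buffer that another thread could overwrite.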
Elsewhere on the Web I see a comment that mutex lock may be caused by
calling malloc or printf inside a signal handler. I don't think
spamass-milter is a signal handler, though strdup and vsyslog would call
malloc and printf, so it's a not-impossible explanation.
I had earlier seen mutex_lock called from strlwr, but have now replaced
the complex tolower() call with a much simpler 7-bit ASCII routine.
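A 7-bit ASCII lowercase routine of the kind described could be sketched as follows. The name ascii_lower_inplace is hypothetical and this is not the code from the modified milter; it only shows the idea of avoiding any locale-dependent library call.

```cpp
#include <string>

// Hypothetical replacement for a locale-aware lowercase helper.
// Only the bytes 'A'..'Z' are changed; everything else, including
// bytes above 127, passes through untouched.
void ascii_lower_inplace(std::string &s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        char c = s[i];
        if (c >= 'A' && c <= 'Z')
            s[i] = static_cast<char>(c - 'A' + 'a');
    }
}
```

Because it touches no shared state, the routine can be called from any number of threads as long as each thread works on its own string.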
If you suspect the milter calls unsafe functions, surround them with
mutexes.
Carefully.
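A rough sketch of what that could look like with pthreads. ScopedLock, unsafe_call_mutex and locked_strerror are made-up names for illustration, and strerror() merely stands in for whichever non-reentrant call is under suspicion.

```cpp
#include <pthread.h>
#include <cstring>
#include <string>

// One process-wide mutex guarding the suspect call (hypothetical name).
static pthread_mutex_t unsafe_call_mutex = PTHREAD_MUTEX_INITIALIZER;

// Small RAII guard so the mutex is released on every exit path.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t &m) : m_(m) { pthread_mutex_lock(&m_); }
    ~ScopedLock() { pthread_mutex_unlock(&m_); }
private:
    pthread_mutex_t &m_;
    ScopedLock(const ScopedLock &);             // not copyable
    ScopedLock &operator=(const ScopedLock &);  // not assignable
};

// strerror() may return a pointer to shared static storage, so the
// result is copied into a std::string while the lock is still held.
std::string locked_strerror(int err)
{
    ScopedLock guard(unsafe_call_mutex);
    return std::string(strerror(err));
}
```

The "carefully" part: keep the locked region as short as possible, copy results out before unlocking, and do not call anything under the lock that might try to take the same mutex again, since a default pthread mutex is not recursive.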
The somewhat similar smf-clamd milter runs OK with no problem (similar
in that it uses the same libraries and also passes mail to a daemon
for processing).
RHEL 4.3
sendmail-8.13.1-3.2.el4.i386
glibc-2.3.4-2.25.i686
kernel 2.6.9-34.0.1.ELsmp
Try running this on a recent Debian instead.
(I doubt that my changes are directly responsible, because I've been
playing with them without affecting the lock-up. Trying the stock milter
on the production machine is an issue because the users expect their
whitelists to work based on match_gecos - address@hidden
-> user "juser")
Perhaps you could show the patch?