[Savannah-hackers] Re: cron problem on fencepost
From: Joel N. Weber II
Subject: [Savannah-hackers] Re: cron problem on fencepost
Date: Thu, 18 Oct 2001 13:54:16 -0400
> I had a report on savannah about a list that had not yet been created in
> two days. I've seen that /usr/local/bin/mailing_lists_create.pl had not
> been executed these last two days. It should have been launched by
> /com/sys/cron/hourly. Therefore I suspect there is a cron problem since
> launching this file by hand reported no problems...
Yep. The ssh process that pushes the aliases file to delysid had hung
two days ago; I just killed it off, but I haven't fixed the real
problem, so it will recur sooner or later.
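As a stopgap until the underlying cause is understood, one option (not something proposed in this thread, just a minimal sketch) is to bound how long the push is allowed to run, so that a hung client is killed instead of lingering for days. The helper name and the timeout values below are hypothetical, and the real ssh command line is not shown in this message, so a placeholder child process stands in for it.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit status, or None if it had to be killed.

    subprocess.run() kills the child and reaps it before raising
    TimeoutExpired, so no stray process is left behind.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


# Placeholder command standing in for the real ssh push (hypothetical).
status = run_with_timeout([sys.executable, "-c", "pass"], 60)
if status is None:
    print("push timed out and was killed", file=sys.stderr)
```

This only limits the damage of a hang; it does not explain why the connection went stale in the first place.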
The failure mode had been that fencepost reported an ESTABLISHED
connection in netstat -an output, yet delysid didn't have any such
connection. Apparently, -o KeepAlive=yes doesn't improve the
situation either, which puzzles me a bit.
In /proc/sys/net/ipv4, fencepost has tcp_keepalive_time 7200 and
tcp_keepalive_probes 9; I was guessing that this meant the ssh
process would be able to die in 18 hours, but apparently that was not
the case.
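For reference, the arithmetic behind that guess can be sketched as follows. On Linux, the first keepalive probe goes out after tcp_keepalive_time seconds of idleness, further probes follow tcp_keepalive_intvl seconds apart, and the connection is dropped after tcp_keepalive_probes unanswered probes. The interval is not quoted above; 75 seconds is assumed here because it is the usual default. Keepalive also only applies to sockets that have SO_KEEPALIVE set and that are otherwise idle.

```python
def keepalive_detection_seconds(idle_time, probes, probe_interval):
    """Seconds until the kernel gives up on an idle peer that never answers."""
    return idle_time + probes * probe_interval


KEEPALIVE_TIME = 7200    # tcp_keepalive_time, from the message
KEEPALIVE_PROBES = 9     # tcp_keepalive_probes, from the message
KEEPALIVE_INTVL = 75     # tcp_keepalive_intvl: ASSUMED default, not in the message

# The 18-hour guess corresponds to multiplying the two quoted values.
guess = KEEPALIVE_TIME * KEEPALIVE_PROBES
estimate = keepalive_detection_seconds(
    KEEPALIVE_TIME, KEEPALIVE_PROBES, KEEPALIVE_INTVL
)

print(guess / 3600)     # 18.0 hours
print(estimate / 60)    # 131.25 minutes
```

Either figure is far shorter than two days, which fits the observation that keepalive was not ending the hung process.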
This started becoming a problem in the last few weeks because I added
locking so that if one hourly run starts when another is still
running, it will just complain rather than actually running. It had
previously been a problem in that ssh client processes would pile up
on fencepost.
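The locking behaviour described above can be sketched roughly like this. It is a minimal illustration, not the actual hourly script: the lock path and function name are hypothetical, and it assumes a non-blocking flock() on a lock file is an acceptable stand-in for whatever mechanism is really used.

```python
import fcntl
import os
import sys

LOCK_PATH = "/tmp/hourly.lock"  # hypothetical path, for illustration only


def try_lock(path):
    """Return a descriptor holding an exclusive lock, or None if already held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


fd = try_lock(LOCK_PATH)
if fd is None:
    # Another run still holds the lock: complain and do nothing.
    print("hourly: previous run still active, skipping", file=sys.stderr)
else:
    try:
        pass  # the real hourly work would go here
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

With a lock like this, a single hung run keeps the lock indefinitely and every later run is skipped, which matches the two days of missed runs reported at the top of this message.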