From: | drich |
Subject: | Re: Monit dependency problem (bug?) |
Date: | Thu, 08 Dec 2011 09:11:03 -0800 |
User-agent: | Roundcube Webmail/0.6 |
Eric,
That's where I started. The problem is that it starts ospfd every time apache fails to restart, so I end up with entries in the log like:
Dec 6 08:47:39 tecate monit[9988]: 'apache' process is not running
Dec 6 08:47:39 tecate monit[9988]: 'apache' trying to restart
Dec 6 08:47:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:47:39 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
Dec 6 08:47:40 tecate monit[9988]: 'ospfd' unmonitor on user request
Dec 6 08:47:40 tecate monit[9988]: monit daemon at 9988 awakened
Dec 6 08:48:09 tecate monit[9988]: 'apache' failed to start
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' unmonitor action done
Dec 6 08:48:09 tecate monit[9988]: Awakened by User defined signal 1
The biggest problem is that when this happens, ospfd is left running even if apache isn't. Martin commented that dependencies are "soft": they define the start/stop order but don't wait for the parent to recover before starting the dependent service.
I'm going to take a look at the code today. The problem I'm seeing right now looks like a race condition. My guess is that when I call "monit stop ospfd", monit hasn't yet marked apache as not existing, so the "if does not exist" block is executed again and again.
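To make the "soft" behaviour concrete, here is a toy Python model of the sequence in the log above. It is not monit's code: the function name and the state dictionary are made up for illustration, and it simply assumes the ordering is what the log shows (stop ospfd, try to start apache, start ospfd regardless).

```python
# Toy model only, NOT monit's implementation. The service names come from
# the log above; the function and the state dict are hypothetical.
def restart_with_soft_dependency(parent_start_succeeds):
    """Return the ordered actions and the final running state."""
    running = {"apache": False, "ospfd": True}
    actions = []

    # The dependent service is stopped before the parent is restarted.
    running["ospfd"] = False
    actions.append("stop ospfd")

    # The parent start is attempted; it may fail.
    running["apache"] = parent_start_succeeds
    actions.append("start apache")

    # "Soft" dependency: the dependent is started next, whether or not
    # the parent actually came back.
    running["ospfd"] = True
    actions.append("start ospfd")
    return actions, running


actions, state = restart_with_soft_dependency(parent_start_succeeds=False)
print(actions)
print(state)
```

With a failed apache start, the model ends with ospfd running and apache down, which is the state I keep ending up in.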
Here is the config I am working with now:
check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start"
    stop program = "/etc/init.d/httpd stop"
    if does not exist
        then exec "/usr/bin/monit stop ospfd"
        else if recovered then exec "/usr/bin/monit monitor ospfd"
    if failed host localhost port 80 protocol http
        and request "/" then restart
    if children > 50 then restart
    if 2 restarts within 2 cycles then timeout
    group server

check process ospfd with pidfile /var/run/quagga/ospfd.pid
    start program = "/etc/init.d/ospfd start"
    stop program = "/etc/init.d/ospfd stop"
    group network
On 08.12.2011 00:10, Eric Pailleau wrote:
Hello, did you simply try this ?
---8<---
[...] 50 then restart
    if 2 restarts within 2 cycles then timeout
    group server
    depends on tomcat

check process ospfd with pidfile /var/run/quagga/ospfd.pid
    start program = "/etc/init.d/ospfd start"
    stop program = "/etc/init.d/ospfd stop"
    depends on apache
    depends on fcserver
    depends on mysql
    depends on tomcat
    group network
---8<---

Taking out the depends doesn't make a difference, it still stays in that loop where it is spewing to the logs. I'm off-site today, I'll look at this more tomorrow morning when I can pay attention to it rather than to the lecture I'm supposed to be listening to. :-)

On 07.12.2011 13:13, Martin Pala wrote:
Yes, Eric is correct. The "monit stop…" in the exec action cannot be combined in this case with the "depends on…"