On 20. feb. 2010, at 18.57, Dylan Stamat wrote:
> Hello!
>
> I'm using Monit to monitor some processes, and I can't seem to get my simple configuration working correctly.
> When a resource limit is exceeded, I keep getting "failed to stop" messages over and over.
>
> Here is the output in my logs:
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
> monit[4823]: 'thin8007' total mem amount of 205988kB matches resource limit [total mem amount>163840kB]
> monit[4823]: 'thin8007' trying to restart
> monit[4823]: 'thin8007' stop: /usr/bin/kill
> monit[4823]: 'thin8007' failed to stop
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Here is my configuration:
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
> set daemon 20
> set logfile syslog facility log_daemon
> check process thin8007 with pidfile /shared/pids/thin.8007.pid
> start program = "/usr/bin/thin start -C /etc/thin/application.yml --only 8000"
> stop program = "/usr/bin/kill -9 `cat /shared/pids/thin.8007.pid` && rm -f /shared/pids/thin.8007.pid"
> if totalmem > 160.0 MB for 1 cycles then restart
> if cpu > 90% for 1 cycles then restart
> group thin
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
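One likely cause, if I read the Monit documentation correctly, is that Monit does not run the start/stop program through a shell. It splits the string into arguments and executes the first word directly, which is consistent with the log line `stop: /usr/bin/kill`. In that case the backticks and `&&` reach `kill` as literal arguments instead of being interpreted. The Python sketch below only approximates that difference: `shlex.split` stands in for Monit's own tokenizing (an assumption, not Monit's actual code), and `STOP_CMD` is a hypothetical stand-in command using the same shell syntax.

```python
import shlex
import subprocess

# Hypothetical stand-in for a "stop program" string that relies on shell
# syntax (backticks and &&), like the configuration above.
STOP_CMD = "echo `echo 42` && echo done"


def run_without_shell(cmd: str) -> str:
    """Split the string into argv and exec it directly, with no shell.

    Assumption: this roughly mirrors how Monit launches a program string.
    """
    argv = shlex.split(cmd)  # ['echo', '`echo', '42`', '&&', 'echo', 'done']
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def run_with_shell(cmd: str) -> str:
    """Hand the whole string to /bin/sh -c so backticks and && are interpreted."""
    result = subprocess.run(["/bin/sh", "-c", cmd],
                            capture_output=True, text=True, check=True)
    return result.stdout


print(repr(run_without_shell(STOP_CMD)))  # backticks and && printed literally
print(repr(run_with_shell(STOP_CMD)))     # substitution and chaining happen
```

If that is what is happening, wrapping the entire stop command in `/bin/sh -c '...'`, or moving it into a small script, should let the shell interpret the backticks and `&&`.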
>
> As you can see, the "stop" directive is a bit of a brute-force method. Before that, I was using the "stop" command
> of the application I'm monitoring (thin). I ran into problems when the application didn't clean up after itself and
> left stale pid files behind. So I decided to SIGKILL the process and remove the pid file manually.
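The kill-and-clean-up logic described above could also live in a small stop script rather than a shell one-liner, which avoids the quoting question entirely. A minimal sketch, assuming a hypothetical helper named `stop_from_pidfile` and a pidfile containing a single PID, as in the config above:

```python
import os
import signal


def stop_from_pidfile(pidfile: str) -> bool:
    """Send SIGKILL to the PID stored in `pidfile`, then delete the file.

    Hypothetical helper for illustration. Returns True if a signal was
    delivered, False if the pidfile was missing or the process was
    already gone (stale pidfile).
    """
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return False  # nothing to stop

    try:
        os.kill(pid, signal.SIGKILL)  # same effect as `kill -9 <pid>`
        delivered = True
    except ProcessLookupError:
        delivered = False  # stale pidfile: the process no longer exists

    # Remove the pidfile in both cases so no stale file is left behind.
    try:
        os.remove(pidfile)
    except FileNotFoundError:
        pass
    return delivered
```

Saved as an executable script that reads the pidfile path from its arguments, this could be referenced by its absolute path in `stop program`, so Monit would not need a shell at all.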
>
> If I run the stop command manually, the process is killed and the pid file is removed. However, when Monit runs it, I
> get the "failed to stop" message. Monit runs as root on this system, but could it still be a permissions issue?
> Is there any way to get more verbose output about why it "failed to stop"? Is there anything Monit could glean from
> the output of the system calls it makes? I'd be happy to write a patch if that's a possibility!
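On getting more detail: Monit's `-v` (verbose) flag gives more diagnostic output, though I'm not sure it includes the stop program's own stderr. One workaround is to point the stop program at a small wrapper that records the command's exit status and stderr before returning. A hypothetical sketch (`run_and_report` and the log format are my own invention, not part of Monit):

```python
import logging
import subprocess
from typing import Sequence, Tuple

log = logging.getLogger("stop-wrapper")  # hypothetical logger name


def run_and_report(argv: Sequence[str]) -> Tuple[int, str]:
    """Run `argv`, log its exit status and stderr, and return both.

    Makes the reason behind a failed stop visible in your own logs
    instead of only "failed to stop".
    """
    try:
        result = subprocess.run(list(argv), capture_output=True, text=True)
    except OSError as exc:  # e.g. program not found or not executable
        log.error("could not execute %s: %s", argv[0], exc)
        return 127, str(exc)

    if result.returncode != 0:
        log.error("%s exited with %d: %s",
                  argv[0], result.returncode, result.stderr.strip())
    else:
        log.info("%s exited with 0", argv[0])
    return result.returncode, result.stderr
```

For example, `run_and_report(["/usr/bin/kill", "-9", "1234"])` would log kill's own error message if the PID did not exist.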
>
> Any suggestions would be welcome!
> Thanks!
> ==
> Dylan
>