monit-general

Re: [monit] Problem with monit's "not monitoring" status


From: Martin Pala
Subject: Re: [monit] Problem with monit's "not monitoring" status
Date: Thu, 25 Feb 2010 00:56:14 +0100

Thanks for the data.

It seems to me that the problem could be that the service start was requested 
before the service had finished stopping, and the start flag was then reset 
after the stop completed => the service stayed in unmonitored mode (as a result 
of the stop). To confirm this, however, the log should contain an additional 
entry with the stop program result (depending on the result, either "stopped" 
or "failed to stop"):

1.) are you able to reproduce the issue at will?

2.) please upgrade to monit-5.1.1 ... it contains the following fix, which 
could play a role, as it seems that the pending stop was woken up by the start
--8<--
 * Fixed #27784: wait_start/wait_stop can advance too quickly.
  Thanks to Randy Puro for report.
--8<--

(you can get monit-5.1.1 here: 
http://www.mmonit.com/monit/dist/monit-5.1.1.tar.gz)
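
To check an existing monit.log for this ordering, a small script along the 
following lines could help. It is only a sketch: the function name 
find_early_starts is made up for illustration, and the wording of the stop 
result entries ("'<service>' stopped" / "'<service>' failed to stop") is an 
assumption modelled on the messages quoted in this thread, so adjust the 
patterns to what your monit version actually logs. It reports every user start 
request that was logged after the stop program was executed but before any 
stop result appeared.

```python
import re

# Matches lines such as:
# [EST Feb 19 11:45:10] info     : 'backgroundrb' stop: /usr/local/bin/...
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s*:\s*(?P<msg>.*)$")


def find_early_starts(lines, service):
    """Return timestamps of start requests logged while a stop was pending.

    A stop counts as pending from the "'<service>' stop: ..." entry until a
    stop result entry is seen. The result wording is an assumption.
    """
    quoted = "'%s'" % service
    stop_results = (quoted + " stopped", quoted + " failed to stop")
    start_request = "start service " + quoted + " on user"
    stop_pending = False
    early = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip wrapped or unrelated lines
        msg = match.group("msg")
        if msg.startswith(quoted + " stop:"):
            stop_pending = True
        elif msg in stop_results:
            stop_pending = False
        elif msg.startswith(start_request) and stop_pending:
            early.append(match.group("ts"))
    return early


# Log excerpt from this thread, with the mail line wrapping undone.
sample = """\
[EST Feb 19 11:44:48] debug    : stop service 'backgroundrb' on user request
[EST Feb 19 11:44:48] info     : monit daemon at 19023 awakened
[EST Feb 19 11:45:10] error    : 'syslog-ng' failed to start
[EST Feb 19 11:45:10] info     : 'backgroundrb' stop: /usr/local/bin/backgroundrb_wrapper
[EST Feb 19 11:45:19] debug    : start service 'backgroundrb' on user request
[EST Feb 19 11:45:19] info     : monit daemon at 19023 awakened
[EST Feb 19 11:45:31] info     : 'backgroundrb' start action done
[EST Feb 19 11:45:32] info     : Awakened by User defined signal 1
""".splitlines()

print(find_early_starts(sample, "backgroundrb"))
```

On the excerpt above this flags the start request at 11:45:19, since no stop 
result was logged between the stop at 11:45:10 and that request. An empty list 
would mean every start request came after a logged stop result.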


... I'll try to replicate the problem in parallel

Best regards,
Martin


On Feb 24, 2010, at 4:20 PM, David Bristow wrote:

> Here is a copy of the configuration for backgroundrb:
> 
> check process backgroundrb with pidfile
> /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid
>  group backgroundrb
>  start program = "/usr/local/bin/backgroundrb_wrapper start qa
> /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid" with
> timeout 40 seconds
>  stop program = "/usr/local/bin/backgroundrb_wrapper stop qa
> /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid" with
> timeout 60 seconds
>  if memory > 240 Mb then restart
> 
> There are no more interesting things in the logs at around this time.
> Nothing related to backgroundrb, at least.
> 
> On Mon, Feb 22, 2010 at 4:56 PM, Martin Pala <address@hidden> wrote:
>> Hi David,
>> 
>> the service is unmonitored on stop ... the service start enables monitoring 
>> again, so an unmonitored service is not expected after start.
>> 
>> It seems to me that your 'backgroundrb' service has no "start program = ..." 
>> in your monit config file. If "start program" were defined, monit would 
>> log a message similar to "'backgroundrb' stop: 
>> /usr/local/bin/backgroundrb_wrapper", but with the word "start" instead of 
>> "stop". That message is missing from the log, so either it was logged after 
>> 11:45:32 (which is likely if start is defined), or the start program is not 
>> defined and thus the service was not started - maybe the check timed out (I 
>> don't know your configuration, so I cannot say) ... or maybe somebody 
>> stopped it again.
>> 
>> Could you please provide the full monit configuration for the 
>> 'backgroundrb' service and the rest of the debug log between 11:44:48 and 
>> 12:08:33?
>> 
>> Are you able to reproduce the issue at will? I tried to replicate the 
>> problem, but it works fine for me.
>> 
>> Best regards,
>> Martin
>> 
>> 
>> 
>> On Feb 22, 2010, at 3:02 PM, David Bristow wrote:
>> 
>>> We are having trouble with certain services managed by monit that do
>>> not restart as they should after being shut down and then started up
>>> again.
>>> 
>>> For example, we use backgroundrb.  Someone shut it down for updating,
>>> and started it up afterwards.  Here is a sample section of the
>>> monit.log  that shows what was happening at the time:
>>> 
>>> [EST Feb 19 11:44:48] debug    : stop service 'backgroundrb' on user request
>>> [EST Feb 19 11:44:48] info     : monit daemon at 19023 awakened
>>> [EST Feb 19 11:45:10] error    : 'syslog-ng' failed to start
>>> [EST Feb 19 11:45:10] info     : 'backgroundrb' stop: /usr/local/bin/backgroundrb_wrapper
>>> [EST Feb 19 11:45:19] debug    : start service 'backgroundrb' on user request
>>> [EST Feb 19 11:45:19] info     : monit daemon at 19023 awakened
>>> [EST Feb 19 11:45:31] info     : 'backgroundrb' start action done
>>> [EST Feb 19 11:45:32] info     : Awakened by User defined signal 1
>>> 
>>> And at 12:09 PM, this is the "monit status" output for backgroundrb:
>>> 
>>> Process 'backgroundrb'
>>>  status                            not monitored
>>>  monitoring status                 not monitored
>>>  data collected                    Fri Feb 19 12:08:33 2010
>>> 
>>> Why does this happen?  We are using monit 5.0.3.
>>> 
>>> --
>>> David Bristow <address@hidden>
>>> 
>>> 
>>> --
>>> To unsubscribe:
>>> http://lists.nongnu.org/mailman/listinfo/monit-general
> -- 
> David Bristow <address@hidden>