[monit-dev] Monit incorrect restart behaviour


From: Valentin Avram
Subject: [monit-dev] Monit incorrect restart behaviour
Date: Tue, 28 Dec 2010 14:59:54 +0200

Hello.

I have encountered incorrect behaviour in monit when using the restart command for a monitored service.

The problem is that after executing the stop command for the service, monit does not wait for the stop command to complete before running the start command. Instead, it monitors the pidfile and, as soon as the pidfile disappears, calls the start command. This is wrong, since the stop command might do more than just stop the service.

Timeline of events:

- User issues the restart command for service X
- monit receives the restart command
- monit calls the stop script
- (stop script running, pidfile exists) monit verifies the pidfile
- (stop script running, pidfile is gone) monit sees the pidfile is gone
- (stop script running) monit calls the start script
- (stop script running) the start script detects that the stop script is still running and refuses to run
- (stop script running) monit keeps waiting for the pidfile (according to the "with timeout" setting)
- stop script finishes
- monit keeps waiting for the pidfile
- timeout is reached, monit gives up on start script
- time passes according to daemon check interval
- monit detects it's time to check for the service
- the service is down, monit calls the start script
- start script is running, pidfile is created
- monit is happy
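
The sequence above can be written down as a small model. This is only a sketch of the behaviour as observed from the outside, not monit's actual code; the function name, its parameters and the example durations (a 5-second stop script whose pidfile disappears after 1 second) are assumptions chosen for illustration.

```python
# Hypothetical model of the sequence described above; this is not monit's code.
# All names, parameters and durations are illustrative assumptions.

def observed_restart(stop_duration, pidfile_gone_at, start_timeout):
    """Return (time, event) pairs, in seconds since the stop script was called.

    stop_duration   -- how long the stop script runs
    pidfile_gone_at -- when the pidfile disappears
    start_timeout   -- the "with timeout" value of the start program
    """
    events = [
        (0, "monit calls the stop script"),
        (pidfile_gone_at, "monit sees the pidfile is gone"),
        (pidfile_gone_at, "monit calls the start script"),
    ]
    if pidfile_gone_at < stop_duration:
        # The start script is launched while the stop script is still running,
        # so it refuses to run and no pidfile is created before the timeout.
        events.append((pidfile_gone_at, "start script refuses: stop script still running"))
        events.append((stop_duration, "stop script finishes"))
        events.append((pidfile_gone_at + start_timeout, "monit gives up on the start script"))
    else:
        events.append((pidfile_gone_at, "start script runs, pidfile is created"))
    # Stable sort by time only, so same-time events keep their listed order.
    return sorted(events, key=lambda e: e[0])


for t, event in observed_restart(stop_duration=5, pidfile_gone_at=1, start_timeout=60):
    print(f"t={t:>3}s  {event}")
```

In this model the failure only appears when the pidfile is removed before the stop script exits, which matches the case reported here.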

The server is running Gentoo. I have tried monit 5.1.1 (marked as stable) as well as monit 5.2.2 (marked as testing).

The correct behaviour, in my opinion, is that monit should:
- either wait for the stop command to finish, check that it returned exit code 0 (all OK), and only then run the start script. REASON: the stop command may do something else besides stopping the process, so it is unsafe to assume the process should be started regardless of whether the stop command succeeded.
- or accept a keyword in the monitrc config file that forces monit to wait for the stop command and check its return code (if the keyword is not specified, the current default behaviour applies).
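
To make the first option concrete, here is a minimal sketch of the ordering I have in mind, written as a standalone Python function. It is not a patch for monit and does not use any monit API; the `restart` function and its `stop_timeout` parameter are hypothetical names used only for this illustration.

```python
import subprocess
import sys


def restart(stop_cmd, start_cmd, stop_timeout=60):
    """Run stop_cmd to completion, then run start_cmd only if stop exited with 0.

    Hypothetical helper: commands are argument lists, stop_timeout is in seconds.
    Returns True only if both commands exited with code 0.
    """
    try:
        # subprocess.run blocks until the stop command itself has exited,
        # independently of when the pidfile disappears.
        stop = subprocess.run(stop_cmd, timeout=stop_timeout)
    except subprocess.TimeoutExpired:
        return False
    if stop.returncode != 0:
        # The stop command failed, so the start command is not attempted.
        return False
    start = subprocess.run(start_cmd)
    return start.returncode == 0


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; a real setup would pass
    # something like ["/etc/init.d/X", "stop"] and ["/etc/init.d/X", "start"].
    ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(restart(ok, ok))
```

The second option would simply make this ordering opt-in, with the pidfile-based behaviour kept when the keyword is absent.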

Example of configuration:

- /etc/monitrc:
check process X
        with pidfile "/var/run/X/X_eth1.pid"
        start program = "/etc/init.d/X start" with timeout 60 seconds
        stop program = "/etc/init.d/X stop" with timeout 60 seconds
        if 4 restarts within 5 cycles then timeout
        if totalmem > 250 Mb then alert
        if children > 255 for 5 cycles then stop
        if cpu usage > 95% for 3 cycles then restart
        group X

- monit log in verbose mode:
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: restart service 'X' on user request
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: monit daemon at 13661 awakened
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: Awakened by User defined signal 1
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: monit: Cannot open proc file /proc/15007/stat -- No such file or directory
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: system statistic error -- cannot read /proc/15007/stat
2010-12-28T14:08:49.936555+02:00 myserver monit[13661]: 'X' trying to restart
2010-12-28T14:08:49.936555+02:00 myserver monit[13661]: Monitoring disabled -- service X
2010-12-28T14:08:49.936555+02:00 myserver monit[13661]: 'X' stop: /etc/init.d/X
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: 'X' start: /etc/init.d/X
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
[skip repeated lines]
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: 'X' failed to start
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: -------------------------------------------------------------------------------
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: /usr/bin/monit [0x8056216]
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: -------------------------------------------------------------------------------
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: Execution failed notification is sent to [EDITED_EMAIL_ADDRESS]
2010-12-28T14:09:50.586555+02:00 myserver monit[13661]: Monitoring enabled -- service X
2010-12-28T14:09:50.586555+02:00 myserver monit[13661]: 'X' restart action done
2010-12-28T14:09:50.586555+02:00 myserver monit[13661]: Action done notification is sent to [EDITED_EMAIL_ADDRESS]
2010-12-28T14:09:50.626555+02:00 myserver monit[13661]: 'X' check skipped -- service already handled in a dependency chain
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: 'X' process is not running
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: -------------------------------------------------------------------------------
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: /usr/bin/monit [0x8056216]
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: -------------------------------------------------------------------------------
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: Does not exist notification is sent to [EDITED_EMAIL_ADDRESS]
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: 'X' trying to restart
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: Monitoring disabled -- service X
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: 'X' start: /etc/init.d/X
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
[skip repeated lines]
2010-12-28T14:11:04.816555+02:00 myserver monit[13661]: monit: pidfile '/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:11:05.826555+02:00 myserver monit[13661]: 'X' started
2010-12-28T14:11:05.826555+02:00 myserver monit[13661]: Execution succeeded notification is sent to [EDITED_EMAIL_ADDRESS]
2010-12-28T14:11:05.866555+02:00 myserver monit[13661]: Monitoring enabled -- service X
2010-12-28T14:12:05.876555+02:00 myserver monit[13661]: 'X' process is running with pid 15609
2010-12-28T14:12:05.876555+02:00 myserver monit[13661]: Exists notification is sent [EDITED_EMAIL_ADDRESS]
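
To put numbers on the delay, the intervals can be computed directly from the timestamps in the log. The snippet below is just a convenience sketch: the constant names and the `seconds_between` helper are my own, and the timestamps are copied from the log lines noted in the comments.

```python
from datetime import datetime

# Timestamps copied from the log above; the constant names are illustrative.
STOP_CALLED = "2010-12-28T14:08:49.936555+02:00"   # 'X' stop: /etc/init.d/X
START_CALLED = "2010-12-28T14:08:50.946555+02:00"  # 'X' start: /etc/init.d/X
START_FAILED = "2010-12-28T14:09:50.536555+02:00"  # 'X' failed to start
NEXT_CHECK = "2010-12-28T14:10:50.636555+02:00"    # 'X' process is not running
STARTED = "2010-12-28T14:11:05.826555+02:00"       # 'X' started


def seconds_between(earlier, later):
    """Return the number of seconds between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(later) - datetime.fromisoformat(earlier)
    return delta.total_seconds()


if __name__ == "__main__":
    print("waiting on the first start:", seconds_between(START_CALLED, START_FAILED))
    print("idle until the next check: ", seconds_between(START_FAILED, NEXT_CHECK))
    print("stop called to 'X' started:", seconds_between(STOP_CALLED, STARTED))
```

With these timestamps the first start attempt is abandoned after 59.59 seconds, the next check comes 60.1 seconds later, and the service is reported as started 135.89 seconds after the stop was issued.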

Thank you for your time.

Valentin Avram



