Re: EXEC does not work properly when using CYCLES
From: Martin Pala
Subject: Re: EXEC does not work properly when using CYCLES
Date: Thu, 29 Sep 2005 09:59:01 +0200
User-agent: Mozilla Thunderbird 1.0.6 (Windows/20050716)
This is a (documented) feature: monit retries the exec and restart
actions on every failed cycle.
For the restart action, the following statement can be used to limit
the number of attempts:
'if X restarts within Y cycles then timeout'
(there is currently no similar alternative for the exec action)
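Until something like that exists, the limiting can be done inside the
executed script itself. The following is only a minimal sketch and not
part of monit: the function name run_once_per_state and the state file
path are assumptions made for illustration. It remembers the last state
in a file and runs the real action only when the state changes.

```shell
#!/bin/sh
# Sketch only: run an action once per state change, even if monit
# calls the script on every cycle.
# STATE_FILE and run_once_per_state are hypothetical names.
STATE_FILE="${STATE_FILE:-/tmp/monit_test.state}"

run_once_per_state() {
    new_state="$1"; shift
    old_state=""
    # Read the state recorded by the previous call, if any.
    [ -f "$STATE_FILE" ] && old_state=$(cat "$STATE_FILE")
    # Same state as on the last call: skip the action.
    [ "$new_state" = "$old_state" ] && return 0
    # State changed: record it, then run the given command.
    printf '%s\n' "$new_state" > "$STATE_FILE"
    "$@"
}
```

A suspend script could then call, for example,
run_once_per_state suspended echo "Suspending", and a resume script
run_once_per_state resumed echo "Resuming". Note that the very first
call always runs its action, because no state has been recorded yet.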
Snippet from monit manual:
--8<--
Constant object tests are related to failed/passed state. In the
case of error, monit will watch whether the failed parameter will
recover - in such case it will handle recovery related
action. General format:
IF <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION
[ELSE IF PASSED [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION]
For constant object tests if the <TEST> should validate to true,
then the selected action is executed each cycle the condition
remains true. The value for comparison is constant. Recovery
action is evaluated only once (on failed->recovered state change
only). The 'ELSE IF PASSED' part is optional - if omitted,
monit will do alert action on recovery by default. The alert is
delivered only once on each state change unless overridden by
'reminder' alert option.
--8<--
It would probably be good to allow setting the action frequency in the
failed state as well.
Possible solutions (for discussion):
1.) restrict the action by some option, for example 'retry'. Examples:
... then exec '/foo'
(=> retry each failed cycle - current monit's default)
... then exec '/foo' retry 3 times
(=> retry for 3 consecutive cycles and then give up)
... then exec '/foo' retry each 3 cycles
(=> retry each 3rd cycle)
2.) add an environment variable such as MONIT_EVENT_COUNT which
describes how many times the service has been in the given failed state.
The executed script can then use this variable to modify its behavior.
3.) add the 'if X execs within Y cycles then timeout' statement
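To illustrate option 2, here is a rough sketch of how an executed script
might use such a variable. MONIT_EVENT_COUNT is only the proposal above,
not something monit sets today, and the helper name should_run as well
as the assumption that the count starts at 1 are made up for this
example.

```shell
#!/bin/sh
# Sketch only: MONIT_EVENT_COUNT is the variable proposed in option 2.
# It is assumed to be 1 on the first failed cycle and to grow by one
# on each further failed cycle. should_run is a hypothetical helper.
should_run() {
    count="${1:-1}"
    # Act on the first failed cycle only; later cycles are skipped.
    [ "$count" -eq 1 ]
}

if should_run "${MONIT_EVENT_COUNT:-1}"; then
    echo "suspending"
else
    echo "skipped (event count ${MONIT_EVENT_COUNT})"
fi
```

The same test could be changed to act on every third cycle or to stop
after a fixed number of attempts, which would cover the cases from
option 1 inside the script.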
Martin
Andreas Oesterer wrote:
I recently got monit-4.6-beta1 and a cycle-specific bug fix that
updated event.c and util.c.
Everything works as expected when using ALERT; however, EXEC causes the
scripts to be executed too often.
My setup:
--------------
set daemon 30
check host monit_test with address 192.168.0.3
  if failed port 8081 protocol http and request "/index.jsp"
     with timeout 5 seconds for 3 times within 3 cycles
  then EXEC "/root/monit_test/suspend"
  else if passed for 3 cycles then EXEC "/root/monit_test/resume"
Test execution
---------------------
1) Begin where the service is running
2) Stopped service at 14:09:31
3) Restarted service at 14:14:54
My test output:
----------------------
Wed Sep 28 14:09:52 PDT 2005 Resuming *
Wed Sep 28 14:10:22 PDT 2005 Resuming *
Wed Sep 28 14:10:52 PDT 2005 Suspeding -> "failed" Email is sent
Wed Sep 28 14:11:22 PDT 2005 Suspeding *
Wed Sep 28 14:11:52 PDT 2005 Suspeding *
Wed Sep 28 14:12:22 PDT 2005 Suspeding *
Wed Sep 28 14:12:52 PDT 2005 Suspeding *
Wed Sep 28 14:13:22 PDT 2005 Suspeding *
Wed Sep 28 14:13:52 PDT 2005 Suspeding *
Wed Sep 28 14:14:22 PDT 2005 Suspeding *
Wed Sep 28 14:14:57 PDT 2005 Suspeding *
Wed Sep 28 14:15:27 PDT 2005 Suspeding *
Wed Sep 28 14:15:57 PDT 2005 Resuming -> "passed" Email is sent
I marked the log entries where the script should not have been called
with a "*". While monit is still detecting that the service is down, it
executes the resume script, and as long as the service is down it calls
the suspend script on every cycle. There is no issue while the service
stays up.
Essentially, if the EXEC action were run only when the email is sent,
everything would work fine.
Thanks, Andreas