Re: slow to take action
From: Jan-Henrik Haukeland
Subject: Re: slow to take action
Date: Sat, 18 Jun 2011 00:16:08 +0200
On Jun 17, 2011, at 8:07 PM, Nick Upson wrote:
>
> Hi,
>
> I have a monit configuration where it is monitoring 25 hosts (ping
> test) and several local processes.
>
> doing anything with monit except a summary takes a long time. It seems
> that the tests are each done sequentially
>
> a) this means that there is the possibility of one set of tests not
> being complete when the next is due to start as the number of hosts increases
>
> b) restarting a local process takes too long
>
> Is there any way I can adjust the configuration to improve the situation?
a) Monit runs all tests serially in a single thread. This means that the list
of tests is run from start to finish. If some tests take a long time to
complete, Monit simply takes longer to run through the list. What is important
is that each and every test is run, and Monit will do that. What is (usually)
less important is whether a test runs a bit later, depending on how long the
previous tests take.
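
To illustrate the serial model described above, here is a small Python
simulation. It is not Monit's code, only a sketch: run_cycle, make_test and the
test names are made up for the example, and the sleeps stand in for real
checks. It shows that every test still runs, but a slow test pushes back the
start of the ones after it.

```python
import time

def run_cycle(tests):
    """Run every test in order on one thread.

    Returns a list of (name, start_offset_seconds, ok) tuples, where
    start_offset_seconds is how long after the cycle began the test started.
    """
    cycle_start = time.monotonic()
    results = []
    for name, test in tests:
        started = time.monotonic() - cycle_start
        ok = test()  # blocks until this test finishes
        results.append((name, started, ok))
    return results

def make_test(duration, ok=True):
    """Build a fake test that takes `duration` seconds (hypothetical helper)."""
    def test():
        time.sleep(duration)
        return ok
    return test

# Hypothetical test list: two quick ping checks around one slow check.
tests = [
    ("ping-host-1", make_test(0.05)),
    ("slow-start", make_test(0.30)),
    ("ping-host-2", make_test(0.05)),
]

results = run_cycle(tests)
for name, started, ok in results:
    print(f"{name}: started at +{started:.2f}s, ok={ok}")
```

The last ping only starts once the slow test has returned, which is the delay
Nick is seeing, but no test is skipped.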
b) Monit forks a new process, and this operation takes just milliseconds, but
Monit will wait, if I remember correctly, up to one poll cycle to see if the
process comes up. If your program is slow to start (that is, from Monit's point
of view, slow to create its pidfile), this will delay all the other tests
since, as mentioned, testing is single-threaded. So yes, this model could be
improved [1]. But there are a few things you can do now. For instance, make
sure that the program writes its pidfile as soon as possible. If you cannot
modify the program, create a wrapper script that writes the pidfile first and
then does an exec on the program. You may also fiddle with the connection
timeout in the configuration, but if it is set too low you risk false-positive
alerts, which is probably worse.
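
A minimal sketch of such a wrapper, written in Python here only as an example
(a shell script would do just as well). The names write_pidfile and
exec_with_pidfile and the command-line layout are assumptions for the
illustration, not anything Monit requires. Because exec replaces the wrapper
process with the program, the pid already written to the pidfile remains
valid.

```python
import os
import sys

def write_pidfile(path):
    """Write the current process id to `path` and return it."""
    pid = os.getpid()
    with open(path, "w") as f:
        f.write(f"{pid}\n")
    return pid

def exec_with_pidfile(pidfile, command):
    """Write the pidfile first, then replace this process with `command`.

    `command` is a list such as ["/path/to/program", "arg1"] (hypothetical).
    os.execvp does not return on success; the program keeps our pid.
    """
    write_pidfile(pidfile)
    os.execvp(command[0], command)

if __name__ == "__main__":
    # Assumed usage: wrapper.py PIDFILE PROGRAM [ARGS...]
    if len(sys.argv) >= 3:
        exec_with_pidfile(sys.argv[1], sys.argv[2:])
    else:
        print("usage: wrapper.py PIDFILE PROGRAM [ARGS...]", file=sys.stderr)
```

Note that this only helps if the program stays in the foreground. If it forks
and daemonizes itself, the running pid will differ from the one in the
pidfile.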
1. We are about to release a new version of Monit which implements a new
'check program' test, meant for checking the exit status of a script or
program. This implementation uses another model which does not delay other
tests, and we may also use it when checking processes.
Best regards
--
Jan-Henrik Haukeland
http://tildeslash.com/
☏: +47 97141255