Monit triggering restart storm

Hi,

I have a set of Monit rules that check a service:

One check process rule (existence and port checks):

if does not exist for 5 cycles then start
if failed port XXXX for 6 times within 8 cycles then restart
if failed port YYYY for 6 times within 8 cycles then restart
if failed port ZZZZ for 6 times within 8 cycles then restart

Three check program rules with custom checks

if status != 0 for 5 times within 10 cycles then restart
if status != 0 for 5 times within 10 cycles then restart
if status != 0 for 5 times within 10 cycles then restart

One check file rule to check the log content:

check file + if content = "BIG ERROR" then restart

The start/stop rules are:

start program = "/bin/systemctl start myservice"

stop program = "/bin/systemctl stop myservice"

There is no dependency at the Monit level, but the checks all belong to the same set of groups.

The problem is that, due to multiple issues, I got a "restart" storm:

some port check failed -> restart issued
this led to an error in a custom script -> restart issued
reading of the log content lagged -> restart issued

Either MyService or its systemd configuration is not well designed, so I got an "already bind exception" as systemd tried to start several instances at the same time 🤔

So the port check failed again, systemd killed the wrong instance, MyService was blocked, another restart was issued, and so on.

I had to shut down Monit to prevent further actions (I could also have used monit -g <group> unmonitor), kill every instance of my service, start it correctly, and then reactivate Monit.
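
The recovery steps above could be scripted roughly as follows. This is only a sketch: recover_service is a hypothetical helper, the group name, unit name and process pattern are placeholders, and it uses the group unmonitor variant instead of shutting Monit down completely. It has to run as root.

```shell
# Hypothetical helper: pause Monit for one group, clean up, start once, resume.
# $1 = Monit group, $2 = systemd unit, $3 = pattern matching stray processes
recover_service() {
    group="$1"
    service="$2"
    pattern="$3"

    # Stop Monit from acting on any check in the group.
    monit -g "$group" unmonitor

    # Ask systemd to stop the unit, then kill whatever instances are left.
    systemctl stop "$service"
    pkill -f "$pattern" || true   # pkill exits non-zero when nothing matched

    # Start a single clean instance and hand control back to Monit.
    systemctl start "$service"
    monit -g "$group" monitor
}
```

It would be called as, for example, recover_service mygroup myservice 'myservice', with the pattern adapted to the real command line of the service.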

Questions:

Is there a native way to prevent Monit from issuing the same start/stop commands more than once within a defined time frame?
Could Monit's dependency feature between checks help? I don't see how it would.
Any other hints or proposals (aside from increasing the values of "for N times within T cycles" to delay the restart)?

Remark: exploring the systemd settings StartLimitIntervalSec and StartLimitBurst might help.
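
To make that remark concrete, here is a minimal sketch of how such a limit might be set through a systemd drop-in. The helper name write_start_limit_dropin, the file name start-limit.conf and the values 300 and 3 are assumptions chosen for illustration. It also assumes a systemd version that reads StartLimitIntervalSec from the [Unit] section; older releases used StartLimitInterval in [Service] instead.

```shell
# Hypothetical helper: write a drop-in that caps start attempts for a unit.
# $1 = drop-in directory, e.g. /etc/systemd/system/myservice.service.d
write_start_limit_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/start-limit.conf" <<'EOF'
[Unit]
# Example values: at most 3 start attempts per 300 seconds.
StartLimitIntervalSec=300
StartLimitBurst=3
EOF
}

# Usage (as root), followed by a reload so systemd picks up the file:
#   write_start_limit_dropin /etc/systemd/system/myservice.service.d
#   systemctl daemon-reload
```

As far as I understand, once the limit is hit systemd refuses further starts until the interval has passed or systemctl reset-failed myservice is run, so Monit's start command would then fail instead of piling up new instances. Whether this is enough to stop the storm is untested.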

Best Regards.

From:	Guillaume François
Subject:	Monit triggering restart storm
Date:	Thu, 9 Nov 2017 12:07:35 +0100