monit-dev

Re: [PATCH] Total-#-of-starts counter


From: Martin Pala
Subject: Re: [PATCH] Total-#-of-starts counter
Date: Thu, 03 Feb 2005 21:21:34 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20050105 Debian/1.7.5-1

Michel Marti wrote:
Christian Hopp wrote:

On Wed, 2 Feb 2005, Martin Pala wrote:

I think it is better to separate such functionality from monit. The new m/monit version we are working on already has this functionality integrated. It can collect all types of events from all monitored nodes, sort them by various criteria, etc. For illustration, the attachment contains a screenshot of the related m/monit part (just a basic screenshot; it has much more functionality). It can also graph runtime service properties, such as space usage, load average, etc.


Remember: I am using monit on an autonomous embedded device that should run without any user intervention. I don't use monit alerts (No SMTP available) and I certainly cannot use m/monit.

I can only agree. IMHO it doesn't give you much of an advantage to take different actions based on a certain number of restarts, especially without the context of the timeframe. => -1


It would still give some hints on which services are "unstable" when comparing the restart counter between different services, e.g. if service A has 2 restarts within a given timeframe but service B has 10, this might indicate a problem with service B that should be investigated...


I think that the feature you requested will be provided by the planned "Event ratio dependant action rules" (http://www.tildeslash.com/monit/doc/next.php#20). It will work at the event level (counting events) and allow handling various error levels based on timeframe context. This way you can set monit to watch and restart a process normally, but if the error frequency is too high (let's say 10 errors per day), it can take some other action. This solution is more general than the patch.
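
To illustrate the idea of counting events within a timeframe, here is a minimal Python sketch. It is not monit code and not the planned rule syntax; the class name `EventRateWatch`, its parameters, and the action names "restart" and "escalate" are all hypothetical, chosen only for this example.

```python
from collections import deque


class EventRateWatch:
    """Hypothetical helper: counts events inside a sliding time window."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events          # threshold, e.g. 10 errors
        self.window_seconds = window_seconds  # timeframe, e.g. one day
        self._times = deque()                 # timestamps of recent events

    def record(self, now):
        """Record one error event at time `now` (in seconds) and
        return the name of the action to take."""
        self._times.append(now)
        # Forget events that have fallen out of the timeframe.
        while now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        # Too many events in the window: do something other than restart.
        if len(self._times) >= self.max_events:
            return "escalate"
        return "restart"


# Example: 10 errors per day as the limit.
watch = EventRateWatch(max_events=10, window_seconds=24 * 60 * 60)
action = watch.record(now=0)  # first error: normal restart
```

A plain total-restarts counter corresponds to an unbounded window; bounding the window is what adds the timeframe context.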

IMO the patch is really not intrusive and might provide some benefit to some monit users... Others might just ignore the counter when looking at monit's status. It really doesn't hurt to have it.

That is true, but the usage is very specialized. I think that for the majority of users (including the case you described), the event ratio triggers will provide a more flexible solution.

By the way: the service monitor included in "Sun Cluster" (pmfctl) also provides such a counter ;-)

Sun Cluster's PMF (process monitor facility) has event counters for the purpose of timeframe context:

--8<--
The process monitor facility provides a means of monitoring processes, and their descendents, and restarting them if they fail to remain alive. The total number of failures allowed can be specified, and limited to a specific time period. After the maximum number of failures has occurred within the specified time period, a message is logged to the console, and the process is no longer restarted.
--8<--

Very similar functionality is provided by monit's service timeout facility: http://www.tildeslash.com/monit/doc/manual.php#service_timeout
This functionality will be extended by event triggers.


Martin



