Re: [PATCH] Total-#-of-starts counter

monit-dev

From:	Martin Pala
Subject:	Re: [PATCH] Total-#-of-starts counter
Date:	Thu, 03 Feb 2005 21:21:34 +0100
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20050105 Debian/1.7.5-1

Michel Marti wrote:

Christian Hopp wrote:
On Wed, 2 Feb 2005, Martin Pala wrote:
I think it is better to separate such functionality from monit. The new m/monit version which we are working on has this functionality integrated already. It can collect all types of events from all monitored nodes, sort them by various criteria, etc. For illustration, the attachment contains a screenshot of the related m/monit part (just a basic screenshot; it has much more functionality). It can also make graphs of runtime service properties, such as space usage, load average, etc.
Remember: I am using monit on an autonomous embedded device that should run without any user intervention. I don't use monit alerts (no SMTP available) and I certainly cannot use m/monit.
I can only agree. IMHO it doesn't give you much of an advantage to do different actions based upon a certain number of restarts, especially without the context of the timeframe. => -1
It would still give some hints on which services are "unstable" when comparing the restart counter between different services, e.g. if service A has 2 restarts within a given timeframe but service B has 10, this might indicate a problem with service B that should be investigated...

I think that the feature you requested will be provided by the planned "Event ratio dependant action rules" (http://www.tildeslash.com/monit/doc/next.php#20). It will work on the event level (count the events) and allow handling of various error levels based on timeframe context. This way you can set monit to watch and restart the process normally, but in the case that the error frequency is too high (let's say 10 errors per day), it can do some other action. This solution is more general than the patch.
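
A minimal sketch of how such an event-ratio rule could behave, using the "10 errors per day" figure from above. This is an illustration in Python, not monit's implementation or configuration syntax; the class `EventRateRule`, its parameters, and the action names "restart" and "escalate" are assumptions made for the example.

```python
from collections import deque


class EventRateRule:
    """Hypothetical rule: pick a different action when too many
    error events fall inside a sliding time window."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events          # e.g. 10 errors ...
        self.window_seconds = window_seconds  # ... per 86400 seconds (one day)
        self._timestamps = deque()

    def record(self, now):
        """Record one error event at time `now` (in seconds) and
        return the name of the action to take."""
        self._timestamps.append(now)
        # Forget events that have aged out of the window.
        while now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_events:
            return "escalate"   # error frequency too high: do something else
        return "restart"        # normal handling


rule = EventRateRule(max_events=10, window_seconds=86400)
actions = [rule.record(minute * 60) for minute in range(10)]
```

With these settings the first nine errors are handled by a normal restart and the tenth error within the same day selects the other action. Once the old events fall out of the window, the rule goes back to plain restarts.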

IMO the patch is really not intrusive and might provide some benefit to some monit users... Others might just ignore the counter when looking at monit's status. It really doesn't hurt to have it.

That is true, but the use case is very specific. I think that for the majority of users (including the case you described), the event ratio triggers will provide a more flexible solution.

By the way: the service monitor included in "Sun Cluster" (pmfctl) also provides such a counter ;-)

Sun Cluster's PMF (process monitor facility) has event counters for the purpose of timeframe context:


--8<--

The process monitor facility provides a means of monitoring processes, and their descendents, and restarting them if they fail to remain alive. The total number of failures allowed can be specified, and limited to a specific time period. After the maximum number of failures has occurred within the specified time period, a message is logged to the console, and the process is no longer restarted.

--8<--

Very similar functionality is provided by monit's service timeout facility: http://www.tildeslash.com/monit/doc/manual.php#service_timeout

This functionality will be extended by event triggers.
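
The behaviour quoted from the PMF description (a limited number of failures within a time period, after which the process is no longer restarted) can be sketched roughly as follows. This is only an illustrative Python sketch; `RestartLimiter`, `on_failure` and the parameter names are hypothetical and do not reflect the actual PMF or monit code.

```python
class RestartLimiter:
    """Hypothetical limiter: allow restarts until `max_failures`
    failures have occurred within `period` seconds, then give up."""

    def __init__(self, max_failures, period):
        self.max_failures = max_failures
        self.period = period
        self.failures = []      # timestamps of recent failures
        self.gave_up = False    # True once restarting has been stopped

    def on_failure(self, now):
        """Register a failure at time `now` (in seconds).
        Return True if the process should be restarted."""
        if self.gave_up:
            return False
        # Keep only failures that are still inside the time period.
        self.failures = [t for t in self.failures if now - t < self.period]
        self.failures.append(now)
        if len(self.failures) >= self.max_failures:
            # Limit reached: log a message and stop restarting.
            print("too many failures, process is no longer restarted")
            self.gave_up = True
            return False
        return True


limiter = RestartLimiter(max_failures=3, period=60)
decisions = [limiter.on_failure(t) for t in (0, 10, 20, 500)]
```

Here the third failure within 60 seconds stops further restarts, and later failures are ignored. Failures spread further apart than the period never reach the limit.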


Martin
