intermittent user process tracking with monit

monit-general



From:	Sean Penticoff
Subject:	intermittent user process tracking with monit
Date:	Tue, 17 Sep 2013 02:22:04 -0700
User-agent:	Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

Hi,
Let me take a moment to describe what I'm trying to do, in case my tack is all wrong.
We have several systems that process data for users. The programs the users run all live in a shared space and run in user space at the users' discretion. I would like to use monit to alert when one of these processes is started and to track its memory and CPU usage, alerting again if the CPU or memory usage of that process exceeds a certain threshold (and possibly renicing it via some script).
I've currently set up checks like this:
check process process1
    matching "process1"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert
check process process2
    matching "process2"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert
check process process3
    matching "process3"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert

...and it goes on for another dozen or so processes
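
Since every stanza above differs only in the process name, the repetition could be generated rather than maintained by hand. What follows is only a minimal sketch, not anything monit ships: the `render_checks` helper, its default thresholds, and the idea of writing the result to a file that monit includes are all assumptions made for illustration.

```python
# Sketch: render one monit "check process" stanza per process name.
# The stanza text mirrors the checks quoted in this message; the helper
# name and its parameters are hypothetical.

STANZA = """check process {name}
    matching "{name}"
    mode passive
    group processing
    if cpu is greater than {cpu}% for {cycles} cycles then alert
    if memory is greater than {mem}% for {cycles} cycles then alert
"""


def render_checks(names, cpu=90, mem=90, cycles=5):
    """Return monit control-file text with one stanza per name."""
    return "\n".join(
        STANZA.format(name=name, cpu=cpu, mem=mem, cycles=cycles)
        for name in names
    )


if __name__ == "__main__":
    # Print the stanzas; redirect to a file of your choosing.
    print(render_checks(["process1", "process2", "process3"]))
```

The output would still need to be checked with monit's own syntax test before being loaded.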

This "works" but is not ideal.
What would be ideal is something more along the lines of:
check process process1
    matching "process1"
    alert on statechange (basically, ignore the fact that this process is not running, but let me know when it starts and ends [i.e. alert on a state change] and monitor it while it is running)
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert
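
To pin down what "alert on state change" is meant to do, here is a small sketch of the intended semantics outside of monit. It is not monit configuration and not a claim about how monit works internally; the `state_changes` function and the dictionaries of running flags are hypothetical. A process that stays stopped produces no event, and only a flip between polls does.

```python
# Sketch: report only transitions between two polls, never the steady state.
# Inputs map process name -> True if running; both names are hypothetical.

def state_changes(previous, current):
    """Return (name, event) pairs for processes whose running state flipped."""
    events = []
    for name in sorted(set(previous) | set(current)):
        was_running = previous.get(name, False)
        is_running = current.get(name, False)
        if is_running and not was_running:
            events.append((name, "started"))
        elif was_running and not is_running:
            events.append((name, "stopped"))
    return events


if __name__ == "__main__":
    before = {"process1": False, "process2": True, "process3": False}
    after = {"process1": True, "process2": False, "process3": False}
    for name, event in state_changes(before, after):
        print(name, event)
```

Here process3, which is not running in either poll, is deliberately silent, which is the behaviour I am after.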

Also, we are using M/Monit, and every process on every machine that is NOT running shows up as a hit against overall health.
For example, under the host status:
Status 10 out of 27 services are available

and on the main dashboard:

[Sep 16 2013 15:59:47] Host myhost.example.com reported a problem with process1: process is not running
[Sep 16 2013 15:59:44] Host myhost.example.com reported a problem with process2: process is not running
[Sep 16 2013 15:59:40] Host myhost.example.com reported a problem with process3: process is not running
[Sep 16 2013 15:59:35] Host myhost.example.com reported a problem with process4: process is not running

Multiply that by 20+ hosts and you get the idea.

The fact that a process isn't running is never a problem. I would like to reflect that somehow and still have some insight into what's running where.

Another thing I would really like to be able to do is pass the process's arguments into the alert emails.

For example, when the command process1 -t foo -o bar -cfg process1.cfg -v -X -s
is run, I'd be tickled if I could get "-t foo -o bar -cfg process1.cfg -v -X -s" (or even the entire output of monit procmatch) into the alert somehow.
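
For what it's worth, the argument string itself is easy to recover on Linux, where `/proc/<pid>/cmdline` holds the command line as NUL-separated fields. The sketch below shows only that extraction step, under the assumption of a Linux host; the function names are hypothetical, and how the PID reaches such a script and how its output gets into an alert email are left open.

```python
# Sketch: turn the raw contents of /proc/<pid>/cmdline (Linux only) into the
# argument string wanted in the alert. All function names are hypothetical.

def parse_cmdline(raw):
    """Split NUL-separated command-line bytes into a list of strings."""
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty field left by the trailing NUL
    return [part.decode("utf-8", errors="replace") for part in parts]


def alert_args(raw):
    """Return everything after the program name, joined with spaces."""
    return " ".join(parse_cmdline(raw)[1:])


def read_cmdline(pid):
    """Read the raw command line of a running process from /proc."""
    with open("/proc/{}/cmdline".format(pid), "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    sample = b"process1\0-t\0foo\0-o\0bar\0-cfg\0process1.cfg\0-v\0-X\0-s\0"
    print(alert_args(sample))
```

Joining with spaces loses quoting for arguments that themselves contain spaces, which is acceptable for a human-readable alert but not for re-running the command.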

I've only had this up and running for about a month, and monit has already saved my bacon on filesystem checks and dead services several times. I just want to do a bit more with it than the system side of things.
