From: | Sean Penticoff |
Subject: | intermittent user process tracking with monit |
Date: | Tue, 17 Sep 2013 02:22:04 -0700 |
User-agent: | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 |
Hi,

Let me take a moment to describe what I'm trying to do, in case my tack is all wrong.

We have several systems that process data for users. The programs the users run all live in a shared space and run in user space at the users' discretion. I would like to use monit to alert when one of these processes is started and to track its memory and CPU usage, alerting further when the CPU or memory use of that process exceeds a certain threshold (and possibly renicing it via some script).

I've currently set up alerts like this:

    check process process1 matching "process1"
        mode passive
        group processing
        if cpu is greater than 90% for 5 cycles then alert
        if memory is greater than 90% for 5 cycles then alert

    check process process2 matching "process2"
        mode passive
        group processing
        if cpu is greater than 90% for 5 cycles then alert
        if memory is greater than 90% for 5 cycles then alert

    check process process3 matching "process3"
        mode passive
        group processing
        if cpu is greater than 90% for 5 cycles then alert
        if memory is greater than 90% for 5 cycles then alert

...and it goes on for another dozen or so processes.

This "works" but is not ideal. What would be ideal is something more along the lines of:

    check process process1 matching "process1"
        alert on statechange
        mode passive
        group processing
        if cpu is greater than 90% for 5 cycles then alert
        if memory is greater than 90% for 5 cycles then alert

where "alert on statechange" means: basically ignore the fact that this process is not running, but let me know when it starts and ends (i.e. alert on a state change), and monitor it while it is running.

Also, we are using M/Monit, and every process on every machine that is NOT running shows up as a hit against overall health, i.e. under the host status:

    Status 10 out of 27 services are available

and on the main dashboard:

    ×[Sep 16 2013 15:59:47] Host myhost.example.com reported a problem with process1: process is not running
    ×[Sep 16 2013 15:59:44] Host myhost.example.com reported a problem with process2: process is not running
    ×[Sep 16 2013 15:59:40] Host myhost.example.com reported a problem with process3: process is not running
    ×[Sep 16 2013 15:59:35] Host myhost.example.com reported a problem with process4: process is not running
Multiplied by 20+ hosts, you get the idea. The fact that a process isn't running is never a problem here. I would like to reflect that somehow, and also to have some insight into what's running where.

Another thing I would really like to be able to do is pass the process arguments into the alert emails. For example, when the command

    process1 -t foo -o bar -cfg process1.cfg -v -X -s

is run, I'd be tickled if I could get "-t foo -o bar -cfg process1.cfg -v -X -s" (or even the entire output of monit procmatch) into the alert somehow.

I've only had this up and running for about a month, and monit has already saved my bacon on filesystem checks and dead services several times. I'm just wanting to do a bit more with it than the system side of things.
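
To make the "renice it via some script" and "get the arguments into the alert" ideas concrete, here is a rough sketch of the kind of helper I have in mind. It is only an illustration under several assumptions: that it would be started from a monit exec action, that monit exports the PID of the matched process in a MONIT_PROCESS_PID environment variable, and that the host is Linux with /proc mounted. The function names, the proc_root parameter, and the niceness value of 10 are made up for the example. Python is used just for the sketch.

```python
import os
import sys


def read_cmdline(pid, proc_root="/proc"):
    """Return the argument vector of a process from <proc_root>/<pid>/cmdline.

    Assumes a Linux-style procfs; proc_root is a hypothetical parameter
    that exists only to make the function easy to try out.
    """
    path = os.path.join(proc_root, str(pid), "cmdline")
    with open(path, "rb") as fh:
        raw = fh.read()
    # Arguments are NUL-separated and the list ends with a trailing NUL.
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode("utf-8", "replace") for p in parts]


def format_args(argv):
    """Join everything after the program name into one line for an alert."""
    return " ".join(argv[1:])


def renice(pid, niceness=10):
    """Set the niceness of one process (10 is an arbitrary example value).

    Changing another user's process normally requires root.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)


def main():
    # Assumption: the caller (e.g. a monit exec action) sets this variable.
    pid = int(os.environ.get("MONIT_PROCESS_PID", "0"))
    if pid <= 0:
        print("MONIT_PROCESS_PID is not set; nothing to do", file=sys.stderr)
        return
    argv = read_cmdline(pid)
    print("pid %d args: %s" % (pid, format_args(argv)))
    renice(pid)


if __name__ == "__main__":
    main()
```

The printed line could then be mailed or logged by whatever wraps the script; whether monit itself can place that text into its own alert email is exactly the part I don't know.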