RE: [monit] monit 4.10.1 is driving me crazy!
From: Gilad Benjamini
Subject: RE: [monit] monit 4.10.1 is driving me crazy!
Date: Fri, 23 Jan 2009 15:30:02 -0800

I ran into similar problems.
Several attempts to work around them failed.
See the postings I made to this list recently.
The recommended solution in that thread was an upgrade to 5.0beta6.
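
For anyone who cannot move to a beta right away, one possible stopgap is a guard inside the start wrapper itself, so that a second invocation exits instead of launching another copy. The following is only a rough, untested sketch and not the fix recommended in that thread: the function names (pid_is_alive, acquire_start_lock, release_start_lock, guarded_start) and the lock-directory argument are hypothetical, and whether this helps with the behavior described below is an open question.

```shell
#!/bin/sh
# Hypothetical guard helpers for a monit start wrapper.
# None of these names come from monit or from the original scripts.

# Return 0 if the pid file names a process that is still running.
# Note: kill -0 also fails when the process belongs to another user,
# so run this as the same uid that owns the job.
pid_is_alive() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1          # missing or empty pid file
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;           # reject non-numeric contents
    esac
    kill -0 "$pid" 2>/dev/null
}

# Take an exclusive lock by creating a directory. mkdir is atomic,
# so only one of several concurrent callers succeeds.
acquire_start_lock() {
    mkdir "$1" 2>/dev/null
}

release_start_lock() {
    rmdir "$1" 2>/dev/null
}

# guarded_start PIDFILE LOCKDIR COMMAND [ARGS...]
# Starts COMMAND in the background only if no live instance is recorded
# in PIDFILE and no other start is currently in progress.
guarded_start() {
    pidfile=$1
    lockdir=$2
    shift 2
    if pid_is_alive "$pidfile"; then
        echo "already running, not starting again" >&2
        return 0
    fi
    if ! acquire_start_lock "$lockdir"; then
        echo "another start is in progress" >&2
        return 0
    fi
    "$@" &
    echo $! > "$pidfile"                   # record the new instance
    release_start_lock "$lockdir"
}
```

A wrapper could then call something like "guarded_start /var/run/jboss/tm_prod03catalogedge01.pid /tmp/tm_prod03catalogedge01.lock ./run.sh ..." in place of launching the server directly; the lock path here is made up for illustration. The lock only covers the short window between the liveness check and the pid file being written, after which the pid file check takes over.
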
> -----Original Message-----
> From: address@hidden [mailto:address@hidden] On
> Behalf Of David Paper
> Sent: Friday, January 23, 2009 3:22 PM
> To: This is the general mailing list for monit
> Subject: [monit] monit 4.10.1 is driving me crazy!
>
>
> Hi monit gurus,
>
> I'm absolutely stumped, and have been stumped for more than a month
> trying to chase a problem down. I'm using Monit 4.10.1 on OpenSuse
> 11.0 64-bit.
>
> Monit SOMETIMES starts multiple copies of the same job. Not always,
> not never, SOMETIMES.
>
> Monit can read the PID file for the job, the PID is defined, written
> out to the file, permissions are correct, ownership is correct, and
> the PID file contains a PID of one of the multiple executions of the
> same job.
>
> The job in question is the tm_prod03catalogedge01 job (see the -v output
> below for more specifics). The start/stop commands call helper scripts
> that do the heavy lifting. The "sleep 30" at the end of the script is
> an attempt to slow monit down so it doesn't try to start multiple
> instances of the same job. It doesn't work. When multiple copies of
> the same job are started, there is NOT a 30-second gap between the
> start times shown by ps.
>
> Has anyone else run into a bug where Monit very quickly starts
> multiple instances of the same job? I'm seeing this on dozens of
> different hosts at different times, so it's not isolated to a single
> monit instance or a single job definition. The only thing they have in
> common is that all of the jobs are JBoss servers.
>
> I've been anxiously watching the Monit 5.0 betas, hoping a final
> version is released soon. These are production servers, and I'd
> rather not run beta code if at all possible. However, I will if this
> is a known bug that has been fixed, and I just couldn't match this
> problem up to the entries in the Changelog.
>
> --
>
> monit_run.sh:
>
> #!/bin/ksh
> DATE=`date +%Y%m%d-%H%M%S`
> CONSOLE_LOG=/opt/jboss/server/${4}/log/console.log
> if [ -a ${CONSOLE_LOG} ]; then
>     mv ${CONSOLE_LOG} ${CONSOLE_LOG}-${DATE}
> fi
>
> logger "Running /opt/jboss/bin/run.sh for ${2}"
> cd /opt/jboss/bin; ./${4} $* | tee ${CONSOLE_LOG}
>
> # sticking in a sleep to try to get monit to stop spawning multiple procs
> sleep 30
>
> --
>
> monitrc:
>
> set daemon 20
> set logfile syslog facility log_daemon
> set mailserver localhost # primary mailserver
> set eventqueue
>     basedir /opt/monit/eventqueue # set the base directory where events will be stored
> set mail-format { Subject: monit alert for $HOST -- $EVENT $SERVICE }
> set alert address@hidden # receive all alerts
> set httpd port 2812 and
>     use address localhost # only accept connection from localhost
>     allow localhost # allow localhost to connect to the server and
> include /opt/monit/jobs/*
> check system localhost
> noalert address@hidden
>
> --
>
> monit -v output:
>
> [dpaper]:[18:07:48]:/opt/jboss/bin> sudo monit -v
> monit: Debug: Adding host allow 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> Runtime constants:
> Control file = /opt/monit/etc/monitrc
> Log file = syslog
> Pid file = /var/run/monit.pid
> Debug = True
> Log = True
> Use syslog = True
> Is Daemon = True
> Use process engine = True
> Poll time = 20 seconds
> Event queue = base directory /opt/monit/eventqueue with unlimited slots
> Mail server(s) = localhost:25
> Mail from = address@hidden
> Mail subject = monit alert for $HOST -- $EVENT $SERVICE
> Mail message = $EVENT Service $SERV..(truncated)
> Start monit httpd = True
> httpd bind address = localhost
> httpd portnumber = 2812
> httpd signature = True
> Use ssl encryption = False
> httpd auth. style = Host/Net allow list
> Alert mail to = address@hidden
> Alert on = All events
>
> The service list contains the following entries:
>
> Process Name = tm_prod03catalogedge01
> Pid file = /var/run/jboss/tm_prod03catalogedge01.pid
> Monitoring mode = active
> Start program = '/opt/jboss/bin/monit_run.sh -b prod03catalogedge01.dc03.totalmusic.net -c prod03catalogedge01' as uid 8002 as gid 8002 timeout 1 cycle(s)
> Stop program = '/bin/bash -c /opt/jboss/bin/monit_stop.sh prod03catalogedge01.dc03.totalmusic.net > /tmp/stop.log 2>&1' as uid 8002 as gid 8002 timeout 1 cycle(s)
> Pid = if changed 1 times within 1 cycle(s) then alert
> Ppid = if changed 1 times within 1 cycle(s) then alert
> Port = if failed prod03catalogedge01.dc03.totalmusic.net:8080 [DEFAULT via TCP] with timeout 5 seconds 5 times within 10 cycle(s) then alert else if passed 1 times within 1 cycle(s) then alert
>
> System Name = localhost
> Monitoring mode = active
> Alert mail to = address@hidden
> Alert on = No events
>
> -------------------------------------------------------------------------------
> monit daemon at 1850 awakened
>
> --
>
> Thanks!
>
> -dave
>
> --
> Dave Paper address@hidden
>
> MCSE is to computers as McDonalds Certified Chef is to fine cuisine.