Re: Monit daemon Hangs


From: Martin Pala
Subject: Re: Monit daemon Hangs
Date: Thu, 11 Oct 2012 00:09:31 +0200

Hello,

could you please post the monit configuration from the host where you saw 
the hang? (You can send it directly to my address so that it is not disclosed 
to the list, and obfuscate anything you don't want to reveal, such as 
hostnames, IPs, credentials, etc.)

When monit hangs again, could you please collect a backtrace?
--8<--
gdb <path>/monit <monit's pid>
(gdb) thr apply all bt
--8<--
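
If attaching interactively is awkward on a large number of hosts, the same backtrace can probably be collected non-interactively. The following is only a minimal sketch: the function names `gdb_bt_cmd` and `collect_bt` are hypothetical, and the binary, pidfile, and output paths in the example call are assumptions that need to be adjusted to the actual install.

```shell
#!/bin/sh
# Sketch: dump a backtrace of every monit thread without an interactive
# gdb session. Function names and example paths are assumptions.

# Print the gdb command line for binary $1 and pid $2 (useful as a dry run).
gdb_bt_cmd() {
    printf 'gdb -batch -ex "thread apply all bt" %s %s' "$1" "$2"
}

# Attach to the pid stored in pidfile $2, using binary $1,
# and write the backtrace of all threads to file $3.
collect_bt() {
    bin=$1 pidfile=$2 out=$3
    [ -r "$pidfile" ] || { echo "no readable pidfile: $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    # -batch makes gdb run the -ex command, detach, and exit.
    gdb -batch -ex "thread apply all bt" "$bin" "$pid" > "$out" 2>&1
}

# Example (paths are assumptions):
#   collect_bt /usr/bin/monit /var/run/monit.pid /tmp/monit-bt.txt
```

This needs to be run while the daemon is still hung, i.e. before the monit quit, and as a user allowed to attach to the monit process.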

Thanks,
Martin



On Oct 10, 2012, at 2:30 PM, Thomas Vaccarino <address@hidden> wrote:

> Hello,
> 
> I am in the process of deploying Monit across hundreds of machines.  This 
> number will grow to well over a thousand machines once the work is complete.  
> I've been running into a situation where Monit hangs after a monit reload is 
> issued.  It's random, but when it does hang a monit quit, followed by a 
> monit, gets everything running again.  In order to support the deployment of 
> Monit across so many systems in a hands off fashion, two RPMs had to be 
> written that facilitate the deployment of the binaries and configs.  The 
> config RPM changes the most often and as such a mass deployment of this RPM 
> isn't uncommon.  However, I am running into cases where, after the monit 
> config RPM is installed and a monit reload is issued, any subsequent monit 
> command (e.g. monit summary) fails.  The log snippet below provides some 
> details:
> 
> I am deploying Monit 5.5 on CentOS 5.
> 
> Please note that the monit daemon is running when the following messages 
> appear in the logs.
> 
> [EDT Oct 10 06:48:32] error    : monit: Openssl read timeout error!
> [EDT Oct 10 06:48:32] error    : monit: Cannot connect to the monit daemon. Did you start it with http support?
> [EDT Oct 10 06:48:53] error    : 'service goes here' process is not running
> [EDT Oct 10 06:49:16] error    : monit: Openssl read timeout error!
> [EDT Oct 10 06:49:16] error    : monit: Cannot connect to the monit daemon. Did you start it with http support?
> [EDT Oct 10 06:55:11] info     : Shutting down monit HTTP server
> [EDT Oct 10 06:55:11] info     : monit HTTP server stopped
> [EDT Oct 10 06:55:11] info     : monit daemon with pid [12988] killed
> [EDT Oct 10 06:55:11] info     : 'hostname goes here' Monit stopped
> [EDT Oct 10 06:56:31] error    : monit: Status not available -- the monit daemon is not running
> [EDT Oct 10 06:56:33] info     : Starting monit daemon with http interface at [*:2812]
> [EDT Oct 10 06:56:33] info     : Starting monit HTTP server at [*:2812]
> [EDT Oct 10 06:56:33] info     : monit HTTP server started
> 
> Restart fixes it:
> 
> address@hidden ~]# monit quit
> monit daemon with pid [12988] killed
> 
> address@hidden bin]# monit
> Starting monit daemon with http interface at [*:2812]
> 
> Monit is deployed with SSL enabled as well as PAM.  This issue doesn't happen 
> all the time.  In a plant of about 35 or so hosts I've had as many as 3 cases 
> where the daemon just stops responding to commands after the reload.  To make 
> things more interesting, there are monit commands in the Application RPMs pre 
> and post install scripts that perform an unmonitor in order to cut down on 
> the number of e-mails when an application deployment is in progress.  As it 
> stands right now, when the monit daemon isn't responding this is causing me 
> some issues when the unmonitor is run from those scripts.  I can probably 
> work around this, but it would be great if the hanging issue could be solved.
> 
> Thanks for the help.
> 
> Tom Vaccarino
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
