monit-general

Re: confused on message


From: Christopher Johnston
Subject: Re: confused on message
Date: Fri, 3 Feb 2012 16:37:50 -0500

In some cases this is useful, in some cases not.  There is a bit of a chicken-and-egg problem with my situation here.  My systems are all completely stateless, so at boot time we determine which applications are to be loaded on the host, and they are all pulled down and started up.  Once they are under monit control we will either:

1) start stop apps on an individual basis as needed (or restart when they crash)

2) stop all apps in one shot, or start all apps in one shot (monit restart all to the rescue)

3) stop "groupings" of apps in one shot, or start "groupings" of apps in one shot.  The problem with grouping them is that since we are a dynamic environment (apps get stopped, started, uninstalled, installed) there is no way for us to group them together in our configs, as it's usually done in an ad hoc manner.  For example, stop all apps whose names start with ABC and end in a number from 50 to 100.

4) Lastly, our apps have a built-in "restart yourself" function that can be triggered by our end-users through a control port and that is one way they will do it.  The apps will be sent a trigger to the control port and they will exit and monit attempts to restart them.  I am also dealing with another monit issue there but I am hoping the latest version rectifies that.
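The ad hoc grouping in 3) could be sketched as a small name filter. This is only an illustration, not how our launch script actually works: the function name select_apps, its parameters, and the sample app names are all hypothetical, and the list of names is assumed to have been collected already (for example from 'monit summary').

```python
import re


def select_apps(names, prefix, low, high):
    """Return names that start with `prefix` and end in an integer in [low, high].

    Hypothetical helper: the naming scheme (prefix plus trailing number) is an
    assumption based on the "starts with ABC and ends in 50-100" example.
    """
    # Lazy middle part so the capture group takes the whole trailing number.
    pattern = re.compile(r"^" + re.escape(prefix) + r".*?(\d+)$")
    selected = []
    for name in names:
        match = pattern.match(name)
        if match and low <= int(match.group(1)) <= high:
            selected.append(name)
    return selected


if __name__ == "__main__":
    apps = ["ABC-50", "ABC-075", "ABC-101", "XYZ-60", "ABC"]
    print(select_apps(apps, "ABC", 50, 100))
```

Each selected name could then be passed to 'monit stop <name>' or 'monit start <name>' one at a time, which keeps the grouping out of the monit config entirely.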

So far I have had no issues using 'stop all' and 'start all' for maintenance windows, but my users tend to lean on 4) since they have a ton of maintenance scripts written around this technology.

-Chris

On Fri, Feb 3, 2012 at 4:15 PM, Wayne Lawrence <address@hidden> wrote:
Have you thought about putting the programs in groups?  I use this method to stop and start groups of apps with monit without any issues, and I am starting between 10 and 16 processes.


On 3 Feb 2012, at 20:45, Christopher Johnston <address@hidden> wrote:

Okay, we switched our central 'launch' script, which essentially takes the list of apps from 'monit summary', stops some of them, then does a start on the list that matches the regex.  If 10 sequential commands get sent to monit, it will fail to start 1 or 2 of them, and I see this error in my logs.  Does monit have issues receiving multiple commands all at once?  It seems like a problem to me that monit can't scale to handle requests like this.  This is a multi-user environment where app owners stop and start their apps at their leisure.

<27> Feb  3 12:41:00.441595 -08:00 dev001 monit[25592]: monit: action failed -- Other action already in progress -- please try again later
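One possible workaround on the caller side is to retry a command whenever monit reports that another action is in progress. The sketch below is a guess at how that might look, not a confirmed fix: it assumes the "Other action already in progress" text also shows up in the output of the monit command itself (the thread only shows it in the log), and the names run_monit and monit_with_retry, the attempt count, and the delays are all made up for illustration.

```python
import subprocess
import time

# Text taken from the log line above; assumed to also appear in CLI output.
BUSY_MARKER = "Other action already in progress"


def run_monit(action, service):
    """Run `monit <action> <service>` and return (exit_code, combined_output)."""
    proc = subprocess.run(
        ["monit", action, service], capture_output=True, text=True
    )
    return proc.returncode, proc.stdout + proc.stderr


def monit_with_retry(action, service, attempts=5, delay=1.0,
                     runner=run_monit, sleep=time.sleep):
    """Retry a monit action with exponential backoff while monit is busy.

    Returns True if a non-busy attempt exited with code 0, False otherwise.
    `runner` and `sleep` are injectable so the logic can be tried without monit.
    """
    for attempt in range(attempts):
        code, output = runner(action, service)
        if BUSY_MARKER not in output:
            # Not a "busy" failure: report success or failure as-is.
            return code == 0
        if attempt < attempts - 1:
            # Wait delay, 2*delay, 4*delay, ... before the next attempt.
            sleep(delay * (2 ** attempt))
    return False
```

A launch script could call monit_with_retry("restart", name) for each app in turn instead of firing all the commands back to back. Whether the backoff values fit depends on how long the apps take to stop and start.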


On Thu, Feb 2, 2012 at 12:53 PM, Christopher Johnston <address@hidden> wrote:
Ok - I grokked the script that handles the restart.  I think this could be the cause: it is essentially doing a 'stop && start', so the initiating start is producing that message since there is already another action going (to stop the app).  We will modify this to use 'restart' instead.


On Thu, Feb 2, 2012 at 10:55 AM, Christopher Johnston <address@hidden> wrote:
I am a little confused about why I am seeing this.  I have 4 applications on my host (in some cases up to 10) where we need to do a daily/weekly rolling restart of all the apps on the host.  If I send 4 monit restart commands to the apps in sequence, I end up in a situation where only 2 or 3 of the apps come up and monit complains that an action is already in progress (I assume it's from the other commands).  Monit can't handle getting signaled 4x to take down apps and restart them?  This creates some issues for us when we are doing a mass code rollout to 100s of applications.  We end up having to go and clean things up manually, and the driver behind using monit is to provide an automated framework for managing apps and guaranteeing uptime.

Is there any way to remedy this?  We are using a very low timeout in monit since we can't risk having apps down for long periods.  Could this have something to do with it?

<27> Feb  2 07:48:43.202228 -08:00 dev001 monit[3263]: monit: action failed -- Other action already in progress -- please try again later



