monit-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: exec stuff topic


From: Jan-Henrik Haukeland
Subject: Re: exec stuff topic
Date: Sun, 06 Jul 2003 23:23:11 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) XEmacs/21.4 (Civil Service, linux)

Martin Pala <address@hidden> writes:

> I went through new code - it very very nice (i'm starting to like C
> more and more :)

It was nice to demonstrate in practice the new proposed implementation
style. Not everything should be an object but when it's natural to use
objects I hope (or would like to insist :) that we try to use this style.

# 

I'll cut&paste from your two mails on the exec subject. I agree with
much of your proposal but disagree on a few issues.

> I think your events implementation provides way for method
> registration which i proposed few weeks ago.

The Event model will certainly be part of the execution of methods,
but I'm not sure if registering methods via the event interface will
work. (If you have a working implementation I'll happily eat these
words :)

#

To remove the Exec_T object and extend the Command_T object like you
suggest here is a very good idea and an important part of how we
should implement the exec statement.

> /* mycommand structure addopts name of the method specified in
> configuration, for example:
>  *    method NAME "..."
>  * and events registration mask from Exec_T which can be removed.
>  * mycommand structure is transformed to the list
>  */
> typedef struct mycommand {
>   char *arg[ARGMAX];                             /**< Program with
>   arguments */
>   int length;                         /**< The length of the arguments
>   array */
>   int has_uid;            /**< TRUE if a new uid is defined for this
>   Command */
>   uid_t uid;         /**< The user id to switch to when running this
>   Command */
>   int has_gid;            /**< TRUE if a new gid is defined for this
>   Command */
>   gid_t gid;        /**< The group id to switch to when running this
>   Command */
>   /**********NEW ***********/
>   char name; /** method name */
>   unsigned int events;    /**< Events triggering this program to be
>   executed */
>   struct mycommand *next;
>   /**********NEW ***********/
>
> } *Command_T;

#

> ---
> CONFIGURATION EXAMPLE:
>
> method start "/etc/init.d/foo start" /* automaticaly registered for
> start method because of 'start' name */
> method stop "/etc/init.d/foo start" /* automaticaly registered for
> stop method because of 'stop' name */
> method restart "/etc/init.d/foo restart" on { failed resorce } /*
> automaticaly registered for restart method by name + additionaly
> explicitly named events */
> method reload "/etc/init.d/foo reload" on { timestamp } /* registered
> just for timestamp event because there's no default event tag 'reload'
> */
>
>
> What do you think? Is this way possible?

Everything is possible :-) but unfortunately I'm not sure it will work
so well. To make this work you are right that a method *must* be
associated with an event and one way to do this is like you have
outlined above. But it is quite cumbersome. For instance lets say you
have:

 set method restart "/etc/init.d/foo restart" on { failed resorce }

 and in a service definition:

 if cpu usage > 80% for 5 cycles then restart

 but then you find out that you want to also use the restart method in
 a timestamp statement like this.

 if timestamp blabla changed then restart

 You must then remember to go back and change the set method to
 include timestamp event, like so:

 set method restart "/etc/init.d/foo restart" on { failed resorce timestamp }

This is error prone and awkward. Besides, the user must know which
event is generated - she must know which statements generate which
events and remember to put it inside the methods 'on {events}' block.

One way to work around this is to do the association in the parser
itself. For instance when we parse this statement:

 if cpu usage > 80% for 5 cycles then restart

we know it's a resource event and we can set up the link between
EVENT_RESOURCE and the restart method directly in the parser so the
user does not have to do this explicit with a on { events }. One
problem with this though is that the language will be stricter in the
sense that every method definition *must* be mentioned before they are
used. For instance, this will work:

check apache ..
  method restart  "/etc/init.d/apache restart"
 if cpu usage > 80% for 5 cycles then restart
  
but this will not because the method is not known in the if-statement:

 check apache ..
 if cpu usage > 80% for 5 cycles then restart
  method restart  "/etc/init.d/apache restart"
  

[Context-free ramble: I also think there is an even easier way to
register methods based on your language proposal. The Run-time object
Run should have a method list consisting of a Command_T linked list -
these are the global methods set via a set statement. In the parser
you will have a function called e.g. add_global_method(Command_T c)
which adds methods to the Run.methodlist variable. A local method list
exists in a Service_T object as you propose and a function in the
parser called add_local_method(Command_T c) add methods defined in the
local check { scope} to this list.]


Other stuff:

> 2.) methods are responsible for indication whether it was successfull
> or not (return code TRUE/FALSE). Monit can't know what is the goal of
> particular method - the only supercontrol over method
> successesfullness which monit have is that in the case of problem, the
> same event will occure soon again (thats why 'timeout' statement is
> here for). In the case that (generaly) the method will return FALSE,
> EVENT_FAILED will be generated.
> 
> Following constructs won't be possible in the case of proposed scheme
> -
> it must be replaced by simple return code control.
> 
> OBSOLETE:
> ---
>     check_service(E->source->name, "stop");
> 
>     if(is_process_running(E->source)) {
>       E->id= EVENT_FAILED;
>       swap_message(E, "Failed to stop service '%s'", E->source->name);
>     }
> ---

The *best* way to be sure that a start/stop succeeded is to do this
because spawn() do fork-fork-execve it's impossible to know for
certain if the command succeded (execve does not return, well okay, it
returns on direct error but if the command fails later you will not
now. For instance most of the commands are scripts in /etc/init.d/
which later starts a binary. If the binary fails you will not know).


> 3.) dependency stuff should support private-like method
> classification, which won't be pushed to dependants. Default
> (public) methods will post the event to all dependants so in my
> illustration instead of simple spawn something like following is
> more appropriate:

Ugh! The depend stuff is very complicated and one reason why I think
start/stop should be left as is and *not* be included in the method
implementation. And if it's complicated now it will be very, very
complicated doing this.

I'm sorry for this not beeing so constructive as I would like it to
be, altough I definitely think you are onto something in your
proposal.

Another thing, the method stuff may be nice but it will not do much to
improve this rather messy check statement:

check process apache with pidfile "/usr/local/apache/logs/httpd.pid"
 start program = "/usr/local/apache/bin/http start" 
   as UID nobody and with GID nobody 
 stop program  = "/usr/local/apache/bin/http stop"
 port 123 type udp # ntp time server
 host localhost port 80 protocol http #request "/cgi-bin/printenv"
 host localhost 443 type TCPSSL protocol http
 checksum /usr/local/apache/bin/http 
        and expect 63f8362dfcb7da0ca524c80412285ee3
 alert address@hidden with the mail-format { subject: alarm! }
 alert address@hidden  { checksum }
 timeout if 2 restarts within 3 cycles
 group server
 if cpu usage is greater than 60 percent for 2 cycles then alert
 if cpu usage > 80% for 5 cycles then restart
 if mem > 100 Mb then stop
 if loadavg(5min) greater than 10.0 for 8 cycles then stop
 depends sybase


We should also start looking at the many statements and improve them
to be more consistent. Looking at the above I can already see one
thing that would be a small improvement. The timeout statement may be
written like this:

 if 2 restarts within 3 cycles then timeout

Port and checksum could be prefixed with test (optional) so it says:

 test port 123 type udp # ntp time server
 test checksum /usr/local/apache/bin/http 
    and expect 63f8362dfcb7da0ca524c80412285ee3

Maybe group NAME could be: set groupe server?

Only suggestion from the top of my head, but we *should* look into how
to make all statements more conform and equal in syntax, IMO.

-- 
Jan-Henrik Haukeland






reply via email to

[Prev in Thread] Current Thread [Next in Thread]