Hi,
Reading your responses is useful; there are some good ideas.
BTW, I see a shortcoming in the protocol test interface. I think there
should be a way to kick back error reports to validate.c so they can be
included in the alert. At the moment lots of interesting errors can only
be logged. We can change the signature of a protocol test to:
int check_foobar(Socket_T s, char **errors);
where the protocol test can allocate an error string upon errors and
assign it to the errors parameter. validate.c will use this error string
in the alert if it is non-null, and validate.c is also responsible for
freeing the errors buffer. What do you think?
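To make the proposed ownership rule concrete, here is a minimal sketch of how the callee and caller could divide the work. It is only an illustration: the struct socket_stub type, the "OK" reply check, the message text and the run_test() helper are all made up for this example and are not taken from Monit's sources.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Monit's Socket_T, for illustration only. */
struct socket_stub { const char *reply; };
typedef struct socket_stub *Socket_T;

/* Proposed shape of a protocol test: returns 1 on success, 0 on failure.
 * On failure it may hand back a heap-allocated message through *errors. */
static int check_foobar(Socket_T s, char **errors) {
    assert(s != NULL && errors != NULL);
    *errors = NULL;
    if (strcmp(s->reply, "OK") == 0)
        return 1;
    /* Allocate the message; ownership passes to the caller. */
    const char *fmt = "FOOBAR: unexpected reply '%s'";
    size_t len = strlen(fmt) + strlen(s->reply) + 1;
    *errors = malloc(len);
    if (*errors != NULL)
        snprintf(*errors, len, fmt, s->reply);
    return 0;
}

/* Caller side, roughly the role validate.c would play: put the message
 * in the alert text if there is one, then free the buffer. */
static int run_test(Socket_T s, char *alert, size_t alert_size) {
    char *errors = NULL;
    int ok = check_foobar(s, &errors);
    if (!ok)
        snprintf(alert, alert_size, "%s",
                 errors ? errors : "protocol test failed");
    free(errors); /* free(NULL) is a no-op */
    return ok;
}
```

Because the caller always frees, a protocol test that has nothing specific to report can simply leave *errors as NULL and the generic alert text is used.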
I agree that giving more information in the alert is a good idea, since
currently the information goes to the logs, while the alerts are just
general ones for the protocol. I think this change needs input from
everyone, not just me!
1.) the limits in the patch are defined as percentages, but this is not
obvious at first sight. Currently the '%' character is used in the monit
control file for other tests where a percentage limit is supported (cpu,
memory,
This sounds like a good idea. I used percentages since they cope with
changes in the total number of Apache children, but making it clearer
would be good.
2.) it could be good to support comparison operators as well, so that
various combinations are possible. It would also be more consistent
with the syntax of other tests (such as in the case of the 'space'
example above). We could then check, for example, that at least 10% of
child processes are always waiting for a connection (i.e. ready to serve
requests immediately):
if failed port 80
protocol apache-status waitlimit < 10% then alert
This would also allow actions to be stacked based on various error levels:
if failed port 80
protocol apache-status loglimit > 50% then alert
if failed port 80
protocol apache-status loglimit > 90% then restart
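As a rough sketch of how such stacked rules could be evaluated, each rule can carry its own operator, limit and action, and every rule is checked independently against the measured percentage. The Rule_T, Operator_T and Action_T names and the bitmask return value are assumptions made for this example; they are not identifiers from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
typedef enum { ACTION_ALERT = 1, ACTION_RESTART = 2 } Action_T;
typedef enum { OP_GREATER, OP_LESS } Operator_T;

typedef struct {
    Operator_T op;      /* comparison given in the control file */
    int limit_percent;  /* limit as a whole percentage */
    Action_T action;    /* what to do when the rule matches */
} Rule_T;

/* Check every rule against the measured percentage and return a
 * bitmask of the actions whose rule matched (0 if none did). */
static unsigned fired_actions(const Rule_T *rules, size_t n, int measured) {
    assert(rules != NULL || n == 0);
    unsigned fired = 0;
    for (size_t i = 0; i < n; i++) {
        int hit = rules[i].op == OP_GREATER
                      ? measured > rules[i].limit_percent
                      : measured < rules[i].limit_percent;
        if (hit)
            fired |= (unsigned)rules[i].action;
    }
    return fired;
}
```

With the two loglimit rules above, a measured value of 60% would fire only the alert, while 95% would fire both the alert and the restart.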
These comparisons are already there, but they are not given in the
control file. For all the monitored limits except waitlimit, an action
is taken when the measured quantity exceeds the limit. For waitlimit, an
action is triggered when the measured quantity is below the limit.
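The built-in comparisons described above can be sketched roughly as follows. This is not the code from the patch: the function names, the lookup by limit name and the integer percentage rounding are assumptions made for the example. Only the rule itself (waitlimit triggers below the limit, everything else above it) comes from the description.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum { OP_GREATER, OP_LESS } Operator_T;

/* waitlimit triggers when the value falls below the limit;
 * every other limit triggers when the value exceeds it. */
static Operator_T default_operator(const char *limit_name) {
    assert(limit_name != NULL);
    return strcmp(limit_name, "waitlimit") == 0 ? OP_LESS : OP_GREATER;
}

/* Share of children in one state as a whole percentage of all children,
 * so the limit keeps its meaning when the number of children changes. */
static int percent_of(int count, int total) {
    assert(count >= 0 && total >= 0);
    return total > 0 ? (100 * count) / total : 0;
}

/* Returns 1 if the action should be taken, 0 otherwise. */
static int limit_triggered(const char *limit_name,
                           int measured_percent, int limit_percent) {
    return default_operator(limit_name) == OP_LESS
               ? measured_percent < limit_percent
               : measured_percent > limit_percent;
}
```

For example, with 3 of 20 children waiting, percent_of() gives 15, so a waitlimit of 10% would not trigger, whereas a loglimit of 10% with the same measured value would.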
The 'escalation' approach of alert and restart is a good idea, but I
haven't tried it yet. I agree that some other name for the *limits would
be good in this case; perhaps *trigger or *level would work.
I have other work to do until after Christmas at the earliest, so I won't
make any changes now. It has taken quite a lot of work to get the patch
this far! What do people want to do? I am happy if other people want to
make changes and integrate the patch better with the rest of Monit.
Regards,
David.