monit-general

Re: uptime test failed!!


From: Stephan Gomes Higuti
Subject: Re: uptime test failed!!
Date: Tue, 3 Jun 2014 11:50:25 -0300

Hello Martin,

This is my conf for the process:

check process Process1-M-aaa matching "/home/user/binn/GK/Process1 -f/home/user/binn/GK/Process1.aaa.properties --dispatch=M --daeMn" 
    start program "/home/user/binn/GK/Process1 -f/home/user/binn/GK/Process1.aaa.properties --dispatch=M --pidfile /tmp/Process1-M.aaa.pid" as uid user and gid user
    stop program "/bin/bash -c '/bin/kill -9 `/bin/cat /tmp/Process1-M.aaa.pid`'" as uid user and gid user
    if does not exist then exec "/bin/bash -c '/etc/monit.d/scripts/nagios_cmd.sh proc_m_aaa_crit'" as uid user and gid user
    else if succeeded then exec "/bin/bash -c '/etc/monit.d/scripts/nagios_cmd.sh proc_m_aaa_ok'"
    if uptime > 15 seconds then exec "/bin/bash -c '/etc/monit.d/scripts/nagios_cmd.sh proc_m_aaa_ok'" 
    group gateway

My idea is:
If the process goes down, Monit executes the nagios_cmd.sh script, which triggers a critical alarm in Nagios and also starts the process. I couldn't get Monit to both start the process and execute the script, so I had to put both actions together in the script.
If the process is running, it sends an OK signal to Nagios.
I added the uptime test because I need to keep sending OK to Nagios.


Here is the status:

Process 'Process1-M-aaa'
  status                            Uptime failed
  monitoring status                 Monitored
  pid                               10601
  parent pid                        1
  uid                               1000
  effective uid                     1000
  gid                               1000
  uptime                            14h 51m 
  children                          0
  memory kilobytes                  20064
  memory kilobytes total            20064
  memory percent                    0.2%
  memory percent total              0.2%
  cpu percent                       0.1%
  cpu percent total                 0.1%
  data collected                    Tue, 03 Jun 2014 11:43:02


The service is up and meets the uptime condition (> 15 seconds), but I still get the status "Uptime failed".

Regards,

Att,

Stephan Gomes Higuti


On 3 June 2014 11:37, Martin Pala <address@hidden> wrote:
Hello,

can you please send full Monit configuration for the given service?

Is there only one uptime test or two?

Regards,
Martin


On 02 Jun 2014, at 15:09, Stephan Gomes Higuti <address@hidden> wrote:

Hello Martin.

Thank you. 
Yes, I understand the part about the action, but I still have no clue why the process has a failed status, since the uptime is greater than 15 seconds.

[BRT Jun  2 09:39:25] debug    : 'Process1' uptime check succeeded [current uptime=247571 seconds]
247571 seconds fits my condition, so it shouldn't get an "uptime failed" status.

Regards,

Stephan Gomes Higuti


On 2 June 2014 09:59, Martin Pala <address@hidden> wrote:
Hello,

testing rules define a failure condition and the given action is executed if matched ... in your case the script will be executed ca. every 15 seconds.
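
To illustrate the point above, here is a minimal Python sketch of the rule semantics described in this message. It is a toy model under stated assumptions, not Monit's actual code: `Rule`, `run_cycle`, and the `script.sh` action string are hypothetical names used only for illustration. It shows that a condition such as `uptime > 15 seconds` is the failure condition itself, so a long-running process matches it on every check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical stand-in for one 'if <condition> then <action>' line."""
    name: str
    matches: Callable[[int], bool]  # receives the current uptime in seconds
    action: str


def run_cycle(rules: List[Rule], uptime_seconds: int) -> List[str]:
    """Evaluate every rule once, as a single polling cycle would."""
    report = []
    for rule in rules:
        if rule.matches(uptime_seconds):
            # A matched condition counts as a failure, and its action runs.
            report.append(f"{rule.name} failed -> exec {rule.action}")
        else:
            report.append(f"{rule.name} ok")
    return report


# Models: if uptime > 15 seconds then exec "script.sh"
uptime_rule = Rule("uptime", lambda uptime: uptime > 15, "script.sh")

print(run_cycle([uptime_rule], 10))
print(run_cycle([uptime_rule], 247571))
```

In this model, an uptime of 247571 seconds matches the rule, so the check is reported as failed and the action runs, which is consistent with the "Uptime failed" status reported in this thread.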

Regards,
Martin


On 02 Jun 2014, at 14:55, Stephan Gomes Higuti <address@hidden> wrote:

Hello,

I'm having trouble with uptime testing on a process.
I'm getting a failed status, and I just don't understand why.
My configuration is:

if uptime > 15 seconds then exec "/bin/bash -c '/etc/monit.d/scripts/script.sh'"

However, looking into my log, I get:

[BRT Jun  2 09:39:25] debug    : 'Process1' zombie check succeeded [status_flag=0000]
[BRT Jun  2 09:39:25] debug    : 'Process1' uptime check succeeded [current uptime=247571 seconds]
[BRT Jun  2 09:39:25] error    : 'Process1' uptime test failed for /home/user/binn/GW/Process1 -f/home/user/Process1.properties -- current uptime is 247571 seconds
[BRT Jun  2 09:39:25] debug    : -------------------------------------------------------------------------------
[BRT Jun  2 09:39:25] debug    :     monit() [0x41bc93]
[BRT Jun  2 09:39:25] debug    :     monit(LogError+0x9f) [0x41c3ef]
[BRT Jun  2 09:39:25] debug    :     monit(Event_post+0x206) [0x418826]
[BRT Jun  2 09:39:25] debug    :     monit(check_process+0x2d4) [0x42aeb4]
[BRT Jun  2 09:39:25] debug    :     monit(validate+0x22e) [0x42ab1e]
[BRT Jun  2 09:39:25] debug    :     monit(main+0x527) [0x415fb7]
[BRT Jun  2 09:39:25] debug    :     /lib64/libc.so.6(__libc_start_main+0xfd) [0x7fcf020e1b7d]
[BRT Jun  2 09:39:25] debug    :     monit() [0x40c539]
[BRT Jun  2 09:39:25] debug    : -------------------------------------------------------------------------------
[BRT Jun  2 09:39:25] debug    : M/Monit: event message sent to http://172.xxx.yyy.zzz:8080/collector
[BRT Jun  2 09:39:25] info     : 'Process1' exec: /bin/bash

This seems weird: the uptime does exceed 15 seconds, which fits my condition, yet I get a failed status in M/Monit.
Any ideas of why this is happening?



Regards,

Stephan Gomes Higuti
--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general

