I think it is either the watchdog or some STONITH method (power off/cycle
the machine). You can try, for example, 'lsof | grep watchdog' to see
whether the watchdog device is open.
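
A slightly stricter variant of that check, as a rough sketch only: it keeps
just the lsof lines whose last column is the device node itself, so a process
that merely has "watchdog" in its name does not match. The /dev/watchdog path
and the helper name watchdog_holders are assumptions for illustration; adjust
them to your system.

```shell
#!/bin/sh
# Assumed device node; override with WATCHDOG_DEV=/path if yours differs.
WATCHDOG_DEV="${WATCHDOG_DEV:-/dev/watchdog}"

# Hypothetical helper: read lsof-style output on stdin and print only the
# lines whose last field (the NAME column) is exactly the given device.
watchdog_holders() {
    awk -v dev="$1" '$NF == dev'
}

# Run the check only if lsof is installed; no output means nothing holds
# the device open (or lsof could not see the holder; run it as root).
if command -v lsof >/dev/null 2>&1; then
    lsof 2>/dev/null | watchdog_holders "$WATCHDOG_DEV"
fi
```

If this prints a line, its first two columns give the command and PID of
whatever is holding the watchdog open.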
If you can supply your heartbeat, monit and script configuration as
Hauk described, it will be much easier to find the problem.
Martin
Hi Martin/Jan-Henrik,
I worked my way through documenting everything I did to cause this
"kill all processes" problem after a fresh reboot, and wouldn't you
know it, I couldn't reproduce it again!
I'll spend some more time tomorrow getting to the bottom of this and
report back to the list.
The group problem is still very much reproducible for me, however, so
I'll fork this thread with the group-specific information:
Jan-Henrik suggested the syntax "monit -g groupname start", which I've
tried below, with the results pasted.
What am I doing wrong here?
tara root # monit summary
The monit daemon 4.5.1 uptime: 1h 18m
System 'tara' [0.07] [0.32] [0.17]
Device 'drbd' not monitored
Directory 'drbdfs' not monitored
Process 'nfsd' not monitored
tara root # monit -g node1 start
monit: invalid argument -- start (-h will show valid arguments)
tara root # monit summary
The monit daemon 4.5.1 uptime: 1h 19m
System 'tara' [0.02] [0.26] [0.16]
Device 'drbd' not monitored
Directory 'drbdfs' not monitored
Process 'nfsd' not monitored
tara root # monit -t
Control file syntax OK
my monitrc config:
tara root # cat /etc/monitrc |grep -v \#
set daemon 60
set logfile syslog facility log_daemon
set mailserver willow.griffous.net
set mail-format { from: address@hidden }
set alert address@hidden
set httpd port 2812 and
allow 192.168.1.133
check device drbd path /proc/drbd
start program = "/etc/ha.d/resource.d/drbddisk r0 start"
stop program = "/etc/ha.d/resource.d/drbddisk r0 stop"
mode manual
group node1
check directory drbdfs path /mnt/nfstest/nfs
start program = "/etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/nfstest reiserfs start"
stop program = "/etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/nfstest reiserfs stop"
mode manual
depends drbd
group node1
check process nfsd with pidfile /var/run/rpc.statd.pid
start program = "/etc/init.d/nfs start"
stop program = "/etc/init.d/nfs stop"
mode manual
depends on drbdfs
group node1
Thanks,
Jonathan