monit 3.2 proc and fd problems (solaris)


From: Martin Pala
Subject: monit 3.2 proc and fd problems (solaris)
Date: Fri, 04 Jul 2003 11:18:59 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

After upgrading the LDAP server to iDS5.2 (SUN ONE/iPlanet), monit is no longer able to read process data for ldap ("failed to get process data" error).

Monit is watching two services:

---monitrc---
set daemon  120
set logfile syslog
set mailserver 192.168.100.106
set mail-format { from: address@hidden }
set httpd port 2812 and
 address 192.168.100.132
 allow user:password

check sshd with pidfile /var/run/sshd.pid
 start program  "/etc/init.d/sshd start"
 stop program  "/etc/init.d/sshd stop"
 host 192.168.100.132 port 22 protocol ssh

check ldap with pidfile /usr/iplanet/ldap1/slapd-ldap1/logs/pid
 timeout(5, 5)
 host 192.168.100.132 port 389 protocol ldap3
 mode passive
---monitrc---

address@hidden # monit -v validate
(...)
-------------------------------------------------------------------------------
'sshd' is running with pid 568
'sshd' zombie check passed [status_flag=0000]
'sshd' check_process_state() passed.
'sshd' succeeded connecting to INET[192.168.100.132:22]
'sshd' succeeded testing protocol [SSH] at INET[192.168.100.132:22]
'ldap' is running with pid 10340
'ldap' failed to get process data
'ldap' succeeded connecting to INET[192.168.100.132:389]
'ldap' succeeded testing protocol [LDAP3] at INET[192.168.100.132:389]

address@hidden # ps -Leaf | grep monit | grep -v grep
   root 16093     1     1     5  0 10:16:31 ?        0:00 monit
   root 16093     1     2     5  0 10:16:31 ?        0:00 monit
   root 16093     1     3     5  0 10:16:31 ?        0:00 monit
   root 16093     1     5     5  0 10:16:31 ?        0:00 monit
   root 16093     1     6     5  0 10:16:31 ?        0:00 monit

address@hidden # lsof | grep monit
monit     16093    root  cwd   VDIR          85,0        512        2 /
monit     16093    root  txt   VREG 85,0 476792 1430633 /usr/bin/monit
monit     16093    root  txt   VREG 85,0 44844 34341 /usr/lib/nss_files.so.1
monit     16093    root  txt   VREG 85,0 191996 33969 /usr/lib/libthread.so.1
monit     16093    root  txt   VREG 85,0 1157924 34367 /usr/lib/libc.so.1
monit     16093    root  txt   VREG 85,0 908044 34018 /usr/lib/libnsl.so.1
monit     16093    root  txt   VREG 85,0 24968 33978 /usr/lib/libmp.so.2
monit     16093    root  txt   VREG 85,0 4848 433665 /usr/platform/sun4u-us3/lib/libc_psr.so.1
monit     16093    root  txt   VREG 85,0 70864 34293 /usr/lib/libsocket.so.1
monit     16093    root  txt   VREG 85,0 382600 34056 /usr/lib/libresolv.so.2
monit     16093    root  txt   VREG 85,0 38904 34374 /usr/lib/libpthread.so.1
monit     16093    root  txt   VREG 85,0 5292 33958 /usr/lib/libdl.so.1
monit     16093    root  txt   VREG 85,0 238776 33970 /usr/lib/ld.so.1
monit     16093    root  0u    VCHR 13,2 0t0 1847319 /devices/pseudo/address@hidden:null
monit     16093    root  1u    VCHR 13,2 0t0 1847319 /devices/pseudo/address@hidden:null
monit     16093    root  2u    VCHR 13,2 0t0 1847319 /devices/pseudo/address@hidden:null
monit     16093    root  3w    VCHR 21,0 0t0 1847315 /devices/pseudo/address@hidden:conslog->LOG
monit     16093    root  4r    PSTA 294,0 1990 /proc/10340/status
monit     16093    root  5u    IPv4 0x30001ac22f8 0t0 TCP ldap1:2812 (LISTEN)
monit     16093    root  7r    PSTA 294,0 1990 /proc/10340/status
monit     16093    root  8r    PSTA 294,0 1990 /proc/10340/status
monit     16093    root  9r    PSTA 294,0 1990 /proc/10340/status
monit     16093    root  10r   PSTA 294,0 1990 /proc/10340/status
monit     16093    root  11r   PSTA 294,0 1990 /proc/10340/status
monit     16093    root  12r   PSTA 294,0 1990 /proc/10340/status

After a few hours monit uses up all of its file descriptors (monit has a limit of 255 fds on our system) by repeatedly opening /proc/10340/status (250 times). Further problems caused by the lack of free descriptors then follow, and monitoring generally fails from that point on.
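
To watch the leak grow between monit cycles, the lsof output can be counted mechanically. The following is only a rough Python sketch: the helper name count_open_refs is made up for illustration, and it simply counts whitespace-separated tokens equal to the path, which assumes the path appears only in lsof's NAME column.

```python
def count_open_refs(lsof_text, path):
    """Count how many times `path` appears as a token in lsof output.

    Assumption: the path shows up only in the NAME column, so each
    matching token corresponds to one open descriptor.
    """
    return lsof_text.split().count(path)


if __name__ == "__main__":
    # Three rows copied from the lsof output above.
    sample = (
        "monit 16093 root 4r PSTA 294,0 1990 /proc/10340/status\n"
        "monit 16093 root 5u IPv4 0x30001ac22f8 0t0 TCP ldap1:2812 (LISTEN)\n"
        "monit 16093 root 7r PSTA 294,0 1990 /proc/10340/status\n"
    )
    print(count_open_refs(sample, "/proc/10340/status"))
```

Running it against successive lsof snapshots should show the count rising by one per failed check if the descriptor is never closed.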

address@hidden # ps -elf | grep 10340 | grep -v grep
8 R iplanet 10340 1 26 79 20 ? 202788 Jul 03 ? 1054:25 ./ns-slapd -D /usr/iplanet/ldap1/sl

address@hidden # ls -l /proc/10340/status
-r--------   1 iplanet  giplan      1232 Jul  3 07:23 /proc/10340/status

address@hidden # cat /proc/10340/status
cat: input error on /proc/10340/status: Value too large for defined data type

address@hidden # truss -r all -w all -f monit validate
(...)
16372:  open("/proc/10340/status", O_RDONLY)            = 3
16372:  read(3, 0xFFBEE7F8, 4095)                       Err#79 EOVERFLOW
16372:  fstat(-1, 0xFFBEEAC8)                           Err#9 EBADF
(...)

The complete truss output is in the attachment.

This is a little strange. It may be caused by the 64-bit support in iDS5.2 (it was a 32-bit program before this version).
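
One way to check the 64-bit guess is to look at the ELF header of the server binary: byte 4 of the file (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit one. The Python sketch below does only that; elf_class is a hypothetical helper name, and the path of the binary to inspect has to be supplied by hand, since the ps output above truncates it.

```python
import sys


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)
    # An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64; anything else is unknown.
    return {1: 32, 2: 64}.get(ident[4])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is up to the reader): python elfclass.py /path/to/binary
    print(elf_class(sys.argv[1]))
```

Comparing the result for the ns-slapd binary with the result for /usr/bin/monit would show whether a 32-bit monit is trying to read the status of a 64-bit process.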


It seems that there are two problems:

1.) monit isn't able to read the status of 64-bit processes
2.) there is some loop in the proc code that leaks file descriptors in monit
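
The truss output suggests how the two problems may be connected: open() succeeds, read() fails with EOVERFLOW, and the next call is fstat(-1, ...), so the descriptor returned by open() is apparently never closed. The pattern that avoids this can be sketched as follows. This is a Python illustration only, not monit's code (monit is written in C); read_status is a made-up name, and the 4095-byte read size is taken from the truss output.

```python
import errno
import os


def read_status(path, size=4095):
    """Read up to `size` bytes from `path`.

    Returns (data, None) on success or (None, errno_value) on failure.
    The descriptor is closed on every path, including a failed read.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return None, e.errno
    try:
        return os.read(fd, size), None
    except OSError as e:
        # For example EOVERFLOW, as seen in the truss output.
        return None, e.errno
    finally:
        os.close(fd)  # runs whether the read succeeded or failed


if __name__ == "__main__":
    data, err = read_status("/proc/10340/status")
    if err == errno.EOVERFLOW:
        print("status not readable: value too large for defined data type")
    elif err is not None:
        print("read failed:", os.strerror(err))
```

With the close in a finally block, a process whose status file cannot be read costs one failed check per cycle instead of one leaked descriptor per cycle.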

I tried the current CVS version too; both problems are present there as well.


Martin


Attachment: monit.truss.gz
Description: GNU Zip compressed data

