monit-dev

Re: monit 3.2 proc and fd problems (solaris)


From: Martin Pala
Subject: Re: monit 3.2 proc and fd problems (solaris)
Date: Fri, 04 Jul 2003 12:35:18 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

A description relevant to the first problem, from the Solaris 64-bit Developer's Guide:

---
What Does EOVERFLOW Mean?

The EOVERFLOW return value is returned from a system call whenever one or more fields of the data structure used to pass information out of the kernel is too small to hold the value.

A number of 32-bit system calls now return EOVERFLOW when faced with large objects on the 64-bit kernel. While this was already true when dealing with large files, the fact that daddr_t, dev_t, time_t, and its derivative types struct timeval and timespec_t now contain 64-bit quantities might allow more EOVERFLOW return values to be observed by 32-bit applications.
---
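The following user-space sketch illustrates the check the guide describes. It is not Solaris kernel code. A 64-bit quantity is copied into a 32-bit field only if it fits; otherwise the call fails with EOVERFLOW. The function name `copy_out_32` is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Illustration only: mimic a 32-bit syscall interface handing a 64-bit
 * kernel quantity (e.g. a time_t or dev_t value) to a 32-bit caller.
 * Returns 0 and stores the value if it fits in 32 bits; otherwise
 * leaves *out untouched, sets errno to EOVERFLOW and returns -1.
 */
static int copy_out_32(int64_t kernel_value, int32_t *out)
{
    if (kernel_value < INT32_MIN || kernel_value > INT32_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    *out = (int32_t)kernel_value;
    return 0;
}
```

A 32-bit monit reading /proc/<pid>/status of a 64-bit process probably hits the same kind of check. That would explain the EOVERFLOW in the truss output later in this thread.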

Martin

Martin Pala wrote:

Here is a patch for the second problem (the file descriptor leak). CVS doesn't seem to be accessible from our site at the moment; I'll check it in later (or, if you have time and CVS access, please check it in :)

Martin

Martin Pala wrote:

After upgrading the LDAP server to iDS 5.2 (Sun ONE/iPlanet), monit is no longer able to read process data for ldap (it reports a "failed to get process data" error).

Monit is watching two services:

---monitrc---
set daemon  120
set logfile syslog
set mailserver 192.168.100.106
set mail-format { from: address@hidden }
set httpd port 2812 and
 address 192.168.100.132
 allow user:password

check sshd with pidfile /var/run/sshd.pid
 start program  "/etc/init.d/sshd start"
 stop program  "/etc/init.d/sshd stop"
 host 192.168.100.132 port 22 protocol ssh

check ldap with pidfile /usr/iplanet/ldap1/slapd-ldap1/logs/pid
 timeout(5, 5)
 host 192.168.100.132 port 389 protocol ldap3
 mode passive
---monitrc---

address@hidden # monit -v validate
(...)
-------------------------------------------------------------------------------
'sshd' is running with pid 568
'sshd' zombie check passed [status_flag=0000]
'sshd' check_process_state() passed.
'sshd' succeeded connecting to INET[192.168.100.132:22]
'sshd' succeeded testing protocol [SSH] at INET[192.168.100.132:22]
'ldap' is running with pid 10340
'ldap' failed to get process data
'ldap' succeeded connecting to INET[192.168.100.132:389]
'ldap' succeeded testing protocol [LDAP3] at INET[192.168.100.132:389]

address@hidden # ps -Leaf | grep monit | grep -v grep
   root 16093     1     1     5  0 10:16:31 ?        0:00 monit
   root 16093     1     2     5  0 10:16:31 ?        0:00 monit
   root 16093     1     3     5  0 10:16:31 ?        0:00 monit
   root 16093     1     5     5  0 10:16:31 ?        0:00 monit
   root 16093     1     6     5  0 10:16:31 ?        0:00 monit

address@hidden # lsof | grep monit
monit     16093    root  cwd   VDIR          85,0        512        2 /
monit     16093    root  txt   VREG          85,0     476792  1430633 /usr/bin/monit
monit     16093    root  txt   VREG          85,0      44844    34341 /usr/lib/nss_files.so.1
monit     16093    root  txt   VREG          85,0     191996    33969 /usr/lib/libthread.so.1
monit     16093    root  txt   VREG          85,0    1157924    34367 /usr/lib/libc.so.1
monit     16093    root  txt   VREG          85,0     908044    34018 /usr/lib/libnsl.so.1
monit     16093    root  txt   VREG          85,0      24968    33978 /usr/lib/libmp.so.2
monit     16093    root  txt   VREG          85,0       4848   433665 /usr/platform/sun4u-us3/lib/libc_psr.so.1
monit     16093    root  txt   VREG          85,0      70864    34293 /usr/lib/libsocket.so.1
monit     16093    root  txt   VREG          85,0     382600    34056 /usr/lib/libresolv.so.2
monit     16093    root  txt   VREG          85,0      38904    34374 /usr/lib/libpthread.so.1
monit     16093    root  txt   VREG          85,0       5292    33958 /usr/lib/libdl.so.1
monit     16093    root  txt   VREG          85,0     238776    33970 /usr/lib/ld.so.1
monit     16093    root   0u   VCHR          13,2        0t0  1847319 /devices/pseudo/address@hidden:null
monit     16093    root   1u   VCHR          13,2        0t0  1847319 /devices/pseudo/address@hidden:null
monit     16093    root   2u   VCHR          13,2        0t0  1847319 /devices/pseudo/address@hidden:null
monit     16093    root   3w   VCHR          21,0        0t0  1847315 /devices/pseudo/address@hidden:conslog->LOG
monit     16093    root   4r   PSTA         294,0                1990 /proc/10340/status
monit     16093    root   5u   IPv4 0x30001ac22f8        0t0      TCP ldap1:2812 (LISTEN)
monit     16093    root   7r   PSTA         294,0                1990 /proc/10340/status
monit     16093    root   8r   PSTA         294,0                1990 /proc/10340/status
monit     16093    root   9r   PSTA         294,0                1990 /proc/10340/status
monit     16093    root  10r   PSTA         294,0                1990 /proc/10340/status
monit     16093    root  11r   PSTA         294,0                1990 /proc/10340/status
monit     16093    root  12r   PSTA         294,0                1990 /proc/10340/status

Over the following few hours, monit uses up all of its file descriptors (monit has a 255-descriptor limit on our system) by opening /proc/10340/status 250 times. After that, further problems caused by the lack of free descriptors begin, and monitoring generally fails.
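A leak like this can be observed from inside a process before it reaches the limit. The sketch below uses a generic POSIX technique, not anything monit itself does. It counts open descriptors by probing each slot below the soft RLIMIT_NOFILE limit with fcntl(F_GETFD), which fails with EBADF for unused slots. The function name `count_open_fds` and the `FD_PROBE_CAP` bound are hypothetical.

```c
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical upper bound on how many descriptor slots to probe. */
#define FD_PROBE_CAP 65536L

/*
 * Count the descriptors currently open in the calling process by probing
 * every slot below the soft RLIMIT_NOFILE limit (capped at FD_PROBE_CAP)
 * with fcntl(F_GETFD). Stores the number of slots probed in *probed and
 * returns the count, or returns -1 if getrlimit() fails.
 */
static long count_open_fds(long *probed)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    long limit = FD_PROBE_CAP;
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < (rlim_t)FD_PROBE_CAP)
        limit = (long)rl.rlim_cur;

    long open_count = 0;
    for (long fd = 0; fd < limit; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1)   /* -1 (EBADF) means unused */
            open_count++;
    }
    *probed = limit;
    return open_count;
}
```

If the count is logged once per monitoring cycle, it should rise by one per cycle toward the 255-descriptor limit. That pattern would match the leak described above.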

address@hidden # ps -elf | grep 10340 | grep -v grep
8 R iplanet 10340 1 26 79 20 ? 202788 Jul 03 ? 1054:25 ./ns-slapd -D /usr/iplanet/ldap1/sl

address@hidden # ls -l /proc/10340/status
-r--------   1 iplanet  giplan      1232 Jul  3 07:23 /proc/10340/status

address@hidden # cat /proc/10340/status
cat: input error on /proc/10340/status: Value too large for defined data type

address@hidden # truss -r all -w all -f monit validate
(...)
16372:  open("/proc/10340/status", O_RDONLY)            = 3
16372:  read(3, 0xFFBEE7F8, 4095)                       Err#79 EOVERFLOW
16372:  fstat(-1, 0xFFBEEAC8)                           Err#9 EBADF
(...)

The complete truss output is attached.

This is a little strange; it may be caused by the 64-bit support in iDS 5.2 (it was a 32-bit program before this version).


It seems that there are two problems:

1.) monit is unable to read the status of 64-bit processes
2.) there is some loop in the proc code that causes a file descriptor leak in monit

I tried the current CVS version as well; both problems remain there too.


Martin

------------------------------------------------------------------------

_______________________________________________
monit-dev mailing list
address@hidden
http://mail.nongnu.org/mailman/listinfo/monit-dev


------------------------------------------------------------------------

diff -Naur monit/process/common.c monit.cvs-20030704/process/common.c
--- monit/process/common.c      2003-07-04 12:25:44.000000000 +0200
+++ monit.cvs-20030704/process/common.c 2003-07-04 12:18:54.000000000 +0200
@@ -95,6 +95,8 @@

  if ( (bytes = read(fd, buf, buf_size-1)) < 0 ) {

+    close(fd);
+
    return FALSE;

  }
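For comparison, here is a sketch of a proc-file reader in the spirit of the patch. Once open() succeeds, the descriptor is closed on every return path. EOVERFLOW is also reported separately from other read errors, so a caller could log the 64-bit case explicitly. The name `read_proc_status` and the `PROC_*` result codes are hypothetical; this is not monit's actual process/common.c code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for read_proc_status(). */
enum proc_read_result {
    PROC_READ_OK = 0,
    PROC_OPEN_FAILED,
    PROC_READ_OVERFLOW,   /* read() failed with EOVERFLOW */
    PROC_READ_FAILED      /* any other read() error */
};

/*
 * Read up to buf_size-1 bytes from path into buf and NUL-terminate it.
 * buf_size must be at least 1. After a successful open(), the descriptor
 * is closed before returning on every path, so repeated read failures
 * cannot leak descriptors.
 */
static enum proc_read_result read_proc_status(const char *path, char *buf,
                                              size_t buf_size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return PROC_OPEN_FAILED;

    ssize_t bytes = read(fd, buf, buf_size - 1);
    int read_errno = errno;   /* save before close() can change errno */
    close(fd);

    if (bytes < 0)
        return read_errno == EOVERFLOW ? PROC_READ_OVERFLOW : PROC_READ_FAILED;

    buf[bytes] = '\0';
    return PROC_READ_OK;
}
```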





