2009/2/24 Martin Pala <address@hidden>:
floating point exception could be a divide by zero ... one matching bug was
fixed in the upcoming monit-5.x:
* Fixed possible crash when monit is watching VPS environment on
Linux which reports number of CPUs as 0. Thanks to Marius
Schmidt for report.
=> is monit running in a virtual environment? if so (and if the linux host
reports 0 CPUs), upgrading to monit 5.x will solve the problem. You can get
the monit source code here: http://www.mmonit.com/monit/download/
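
For illustration, here is a minimal sketch of how a reported CPU count of 0
can lead to a divide by zero, and one way to guard against it. This is not
monit's actual code or its actual fix; the function names count_processors
and per_cpu_load are hypothetical. In a C program on Linux, an integer
division by zero is typically reported as "Floating point exception"
(SIGFPE); in Python the same mistake raises ZeroDivisionError. Note also that
the "processor" field in /proc/cpuinfo is a zero-based index, so a single-CPU
guest shows "processor : 0" for its one CPU.

```python
def count_processors(cpuinfo_text):
    """Count 'processor' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        # Each CPU has exactly one line whose key is 'processor'.
        if sep and key.strip() == "processor":
            count += 1
    return count


def per_cpu_load(total_load, reported_cpus):
    """Spread a load figure over the CPUs, treating a count below 1 as 1."""
    # Assumption: falling back to 1 CPU is an acceptable default when the
    # host reports 0; without this guard the division below would fail.
    cpus = reported_cpus if reported_cpus > 0 else 1
    return total_load / cpus


if __name__ == "__main__":
    sample = "processor\t: 0\nmodel name\t: example\n"
    print(count_processors(sample))
    print(per_cpu_load(4.0, 0))
```

The guard sits at the point of division rather than at the point of
detection, so it still holds if the count comes from some other source.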
Yes, it's a xen virtual machine.
/proc/cpuinfo shows processor : 0
and the gdb/bt shows
#0 0x08053f62 in ?? ()
#1 0x0000522f in ?? ()
#2 0x082bc788 in ?? ()
#3 0x00000046 in ?? ()
#4 0xfbad8001 in ?? ()
#5 0x082b9210 in ?? ()
#6 0x082b9210 in ?? ()
#7 0x80000000 in ?? ()
#8 0x082b9210 in ?? ()
#9 0x082bdb40 in ?? ()
#10 0x082b930f in ?? ()
#11 0x037f0c7f in ?? ()
#12 0x00000000 in ?? ()
To get the root cause, you can analyze the coredump using gdb:
1.) gdb <path to monit> <path to coredump>
2.) bt
This will show the backtrace and also allows you to compare the data. If you
need help, let me know.
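
The two gdb steps above can also be run non-interactively. The sketch below
only builds the command line; the helper name gdb_backtrace_command and the
example paths are hypothetical, so substitute the real monit binary and core
file. Frames printed as "?? ()" mean gdb could not resolve symbol names,
which usually points to a stripped binary or missing debug information.

```python
import shlex
from typing import List


def gdb_backtrace_command(binary_path: str, core_path: str) -> List[str]:
    """Build a gdb invocation that prints a backtrace and then exits."""
    # --batch makes gdb exit once the -ex commands have run;
    # -ex bt runs the same 'bt' command as step 2 above.
    return ["gdb", "--batch", "-ex", "bt", binary_path, core_path]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = gdb_backtrace_command("/usr/sbin/monit", "core")
    print(shlex.join(cmd))
```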
I don't quite know what to do with the data above, or if it can be
used to make a fix?
Debian unstable still uses monit-1.4, so to get past this
reported/fixed bug I suppose I'll have to install from source rather
than the debian package, as you suggest.
(By the way: do I need to turn coredump enabling off again?)
coredumps are usually helpful for root causing crashes - you can disable them,
but in general they don't hurt - I usually enable them everywhere in
production, so if something bad happens to any process, it's possible to
backtrace the reason
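
As a rough sketch of what enabling coredumps means at the process level: the
core file size limit (the value "ulimit -c" shows in a shell) can be read and
raised with Python's standard resource module. The thread does not say how
coredumps were enabled here, so this is an illustration only, and the
function name enable_core_dumps is hypothetical.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A soft limit of 0 disables core files; raising it up to the hard
    # limit needs no special privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print(enable_core_dumps())
```

The limit is inherited by child processes, so it has to be set in whatever
starts the daemon; setting the soft limit back to 0 turns coredumps off again.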
This is all new to me, and most helpful.
Many thanks for the help.
Jenny