RE: Coredumping problem on Mandrake 10/2.6 kernel
From: Brian Thomas
Subject: RE: Coredumping problem on Mandrake 10/2.6 kernel
Date: Fri, 5 Nov 2004 15:11:08 -0800
Well... progress, at least on the cfagent front:
I can now run cfagent without a problem when it is statically linked
against a libdb-3.3.11 build. I don't know yet whether cfservd will stay
up through the next run of the clients today, since that's the only time
there's enough load. Whatever the problem is, it seems (unsurprisingly)
linked to libdb. But it is NOT an issue with dynamic loading weirdness,
since, as far as I can tell, it's all static.
Brian
-----Original Message-----
From: Brian Thomas
Sent: Friday, November 05, 2004 2:46 PM
To: help-cfengine@gnu.org
Subject: Coredumping problem on Mandrake 10/2.6 kernel
So I'd originally thought I'd solved my coredump problems on Mandrake
10.x, but my excitement was premature. Furthermore, in testing I
realized my locally-compiled version is not just having a problem with
cfservd; it looks like cfagent is crashing as well.
I originally was, and still am, having problems with 'cfservd'
coredumping after running for a while, usually under heavy-ish load. At
my half-hour intervals it would crap out, and the crash appeared to be
related to libdb.
So in an effort to solve this, I set out to compile statically against
libdb. I'll skip the intervening frustration; suffice it to say I
decided during the ordeal that compiling my own static libdb and openssl
libraries and building against them was probably better anyway than
using the system static libdb.a. Once I did that there was no problem
with the compile process itself, and I can verify (with ldd) that I am
relying on neither a dynamic libdb nor a dynamic libcrypto.
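A minimal sketch of the ldd check described above. The helper name and the sample ldd output are made up for illustration and are not taken from this system; on a real box the text would come from running `ldd ./cfagent` or `ldd ./cfservd`.

```python
import re


def dynamic_deps(ldd_output, names=("libdb", "libcrypto")):
    """Return which of `names` show up as shared libraries in ldd output."""
    found = set()
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first field is the library name (or a path, for the loader).
        base = parts[0].rsplit("/", 1)[-1]
        for name in names:
            # Matches e.g. libdb-4.2.so or libcrypto.so.0.9.7, but not libdbus.
            if re.match(re.escape(name) + r"[-.\d]*\.so", base):
                found.add(name)
    return sorted(found)


# Hypothetical ldd output for a binary with libdb and libcrypto linked statically.
SAMPLE = """\
    libpthread.so.0 => /lib/tls/libpthread.so.0 (0x4002a000)
    libc.so.6 => /lib/tls/libc.so.6 (0x40049000)
    /lib/ld-linux.so.2 (0x40000000)
"""

print(dynamic_deps(SAMPLE))  # prints []
```

An empty list means neither library is being picked up dynamically; anything else names the library that is still being resolved at load time.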
The problem is, I have twice the problems! Why? Because now cfagent is
coredumping, and much more spectacularly (Read: Immediately) than
cfservd, although cfservd is still crashing under load.
Included below is lots of relevant information, maybe too much. I'm not
sure what to do at this point; originally I thought this was an issue
with the tls (/lib/tls) versions of the libraries, and tried
compiling/executing against the tls and non-tls versions individually,
with the same results either way.
So first, the software versions. Bear in mind I have the exact same
problems when compiling against the Mandrake-installed versions of
openssl and berkeleydb:
Openssl 0.9.7e
BerkeleyDB 4.2.52
Cfengine 2.1.11
Next, configure line (After this it's just a 'make'):
./configure --with-berkeleydb=/var/tmp/db-4.2.52 \
    --with-openssl=/var/tmp/openssl-0.9.7e
Next, OS config:
# uname -a
Linux amd-usa 2.6.3-7mdk-p3-smp-64GB #1 SMP Wed Mar 17 15:34:39 CET 2004
i686 unknown unknown GNU/Linux
# cat /etc/issue
Mandrake Linux release 10.0 (Official) for i586
Kernel 2.6.3-7mdk-p3-smp-64GB on a 4-processor i686 / \l
Next, gdb output. This first one is from the cfservd crash:
# gdb -c ./core.32076 cfservd
GNU gdb 6.0-2mdk (Mandrake Linux)
[warranty deletia]
This GDB was configured as "i586-mandrake-linux-gnu"...Using host
libthread_db library "/lib/libthread_db.so.1".
Core was generated by `./cfservd -m'.
Program terminated with signal 11, Segmentation fault.
warning: current_sos: Can't read pathname for load map: Input/output
error
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_nisplus.so.2...done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0 0x080b0a6f in __bam_pinsert ()
(gdb) backtrace
#0 0x080b0a6f in __bam_pinsert ()
#1 0x080af683 in __bam_page ()
#2 0x080af070 in __bam_split ()
#3 0x080f3bf9 in __bam_c_put ()
#4 0x080dc06b in __db_c_put ()
#5 0x080d588f in __db_put ()
#6 0x080e250e in __db_put_pp ()
#7 0x08063d97 in LastSeen (hostname=0x40427900 "hostfoo.shopping.com", role=cf_accept) at ip.c:443
#8 0x0804e130 in VerifyConnection (conn=0x8254e68, buf=0x4042e966 "10.20.3.50 hostfoo.shopping.com root 0") at cfservd.c:1777
#9 0x0804d06c in BusyWithConnection (conn=0x8254e68) at cfservd.c:1234
#10 0x0804cbc1 in HandleConnection (conn=0x8254e68) at cfservd.c:1133
#11 0x4002c7d3 in start_thread () from /lib/tls/libpthread.so.0
#12 0x40144b4a in clone () from /lib/tls/libc.so.6
Next is the output from the cfagent crash. Note, these two crashes DO
NOT happen at the same time! Usually I can crank up a cfservd, and as
long as there's no significant load it will run fine, while cfagent will
crash every time. Similarly, cfservd will always eventually crash,
whether or not I run the locally-compiled cfagent against it. I am still
guessing the two crashes have the same or a similar root cause, but they
do not trigger each other!
# gdb -c ./core.32098 cfagent
GNU gdb 6.0-2mdk (Mandrake Linux)
[warranty deletia]
This GDB was configured as "i586-mandrake-linux-gnu"...Using host
libthread_db library "/lib/libthread_db.so.1".
Core was generated by `./cfagent --debug'.
Program terminated with signal 11, Segmentation fault.
warning: current_sos: Can't read pathname for load map: Input/output
error
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_nisplus.so.2...done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0 0x40115b47 in memcpy () from /lib/libc.so.6
(gdb) backtrace
#0 0x40115b47 in memcpy () from /lib/libc.so.6
#1 0x080cf5ec in __bam_copy ()
#2 0x080cf01e in __bam_psplit ()
#3 0x080cd86c in __bam_page ()
#4 0x080cd280 in __bam_split ()
#5 0x08111e09 in __bam_c_put ()
#6 0x080fa27b in __db_c_put ()
#7 0x080f3a9f in __db_put ()
#8 0x0810071e in __db_put_pp ()
#9 0x0805ba27 in LastSeen (hostname=0xbfff4650 "serverfoo.shopping.com", role=cf_connect) at ip.c:443
#10 0x0805b265 in RemoteConnect (host=0xbfff4650 "serverfoo.shopping.com", forceipv4=110 'n') at ip.c:192
#11 0x080590c7 in OpenServerConnection (ip=0x8290c40) at client.c:57
#12 0x08054308 in MakeImages () at do.c:2435
#13 0x0804d70e in DoTree (passes=1, info=0x81cdf00 "Update") at cfagent.c:1274
#14 0x0804b435 in main (argc=2, argv=0xbfffe7a4) at cfagent.c:107
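
Both backtraces above pass through the same cfengine function before entering libdb. As a rough sketch (the helper names and the prefix list are assumptions made for illustration, and the traces are abbreviated excerpts of the gdb output above), the first frame outside libdb and libc can be picked out like this:

```python
import re

# Matches the start of a gdb frame line such as
#   "#7 0x08063d97 in LastSeen (hostname=..."
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)\s*\(")


def frame_functions(backtrace):
    """Return function names in frame order, skipping wrapped continuation lines."""
    names = []
    for line in backtrace.splitlines():
        m = FRAME.match(line.strip())
        if m:
            names.append(m.group(2))
    return names


def first_caller_outside(functions, prefixes=("__bam_", "__db_", "memcpy")):
    """Return the first frame that is not a libdb internal or memcpy."""
    for name in functions:
        if not name.startswith(prefixes):
            return name
    return None


CFSERVD_TRACE = """\
#0 0x080b0a6f in __bam_pinsert ()
#2 0x080af070 in __bam_split ()
#6 0x080e250e in __db_put_pp ()
#7 0x08063d97 in LastSeen (hostname=0x40427900 "hostfoo.shopping.com",
role=cf_accept) at ip.c:443
"""

CFAGENT_TRACE = """\
#0 0x40115b47 in memcpy () from /lib/libc.so.6
#1 0x080cf5ec in __bam_copy ()
#4 0x080cd280 in __bam_split ()
#8 0x0810071e in __db_put_pp ()
#9 0x0805ba27 in LastSeen (hostname=0xbfff4650
"serverfoo.shopping.com", role=cf_connect) at ip.c:443
"""

for trace in (CFSERVD_TRACE, CFAGENT_TRACE):
    print(first_caller_outside(frame_functions(trace)))  # prints LastSeen both times
```

In both cores the entry point is LastSeen at ip.c:443, calling into a btree page split (__bam_split) via __db_put_pp.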