bug-gdb

The thread/float/NaN bug in GDB 5.1, a survey


From: bart . durinck
Subject: The thread/float/NaN bug in GDB 5.1, a survey
Date: Fri, 7 Dec 2001 15:22:17 +0100

Hi!

I _also_ found the thread/float/NaN bug in GDB. But Kevin Buettner
beat me to supplying the patch... I saw Kevin's message while I was
preparing this one. How funny/strange that two people find the same
bug at the same moment when it's been there for half a year...

But I am not writing only to tell you this. The bug gets fixed, and
that's what matters. During my bug-hunt, though, I discovered more. I
am pretty sure there is one more bug: maybe in GDB, maybe in GCC, or
in the Linux kernel... Read on for what I found out.
The addendum lists some pointers to problems which I think are related
(found through Google). I hope my explanation can contribute to
solving some "old mysteries", including some where GDB was never a
suspect...

During 2001 I have occasionally tried to use the snapshots of GDB to
debug the Linux port of our real-time DSP software. The port's sole
purpose is to let us debug in a more "friendly environment". Several
times I ran into strange problems with NaNs popping up all over when
debugging code with floating-point operations. I did not bother to
look into it. It behaved quite oddly: sometimes it worked, sometimes
it didn't. It didn't seem easy to reproduce in a small test program,
and I had hopes that it would be fixed by the final 5.1 release.

But now I tried 5.1, and it was still there. And I saw some posts on
the bug-gdb mailing list describing the problem with a very small test
program. I searched Google and found a lot of related stuff... In the
end, I set out to debug GDB _with_ GDB (compiled from source, with
optimisation level -O0). Quite a daunting task requiring a certain
degree of schizophrenia ;-) BTW I did not succeed in attaching GDB to
a running GDB (it gets stuck in poll()), so I really had to run GDB
inside GDB, which confused DDD a bit... Is not being able to attach a
known problem?

To make a long story short, I found the same problem in the
conversion of the ftag register (in i387-nat.c).
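
For readers who have not met the ftag register: the x87 FPU keeps a
16-bit tag word with two bits per register, while the FXSAVE area
stores only an 8-bit "abridged" tag with one bit per register, so a
debugger has to convert between the two forms. The sketch below only
illustrates that conversion. It is not the code from i387-nat.c and
not Kevin's patch, and the function names are made up for this
example. If the tags written back to the FPU are wrong, the x87 can
see a stack fault on a later load or store and (with exceptions
masked) produce a NaN.

```c
#include <assert.h>
#include <stdint.h>

/* Full x87 tag word: two bits per physical register R0..R7.
   00 = valid, 01 = zero, 10 = special (NaN, infinity, denormal),
   11 = empty. */
enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* Hypothetical helper: compress the full 16-bit tag word into the
   8-bit abridged form used by FXSAVE (1 = in use, 0 = empty). */
uint8_t ftag_to_abridged(uint16_t ftag)
{
    uint8_t abridged = 0;
    for (int reg = 0; reg < 8; reg++) {
        unsigned tag = (ftag >> (2 * reg)) & 3u;
        if (tag != TAG_EMPTY)
            abridged |= (uint8_t)(1u << reg);
    }
    return abridged;
}

/* Hypothetical helper: expand the abridged tag back to 16 bits.
   Simplification (an assumption of this sketch): every in-use
   register is tagged "valid". A faithful expansion would inspect the
   register contents to tell valid, zero and special apart. */
uint16_t abridged_to_ftag(uint8_t abridged)
{
    uint16_t ftag = 0;
    for (int reg = 0; reg < 8; reg++) {
        unsigned tag = (abridged & (1u << reg)) ? TAG_VALID : TAG_EMPTY;
        ftag |= (uint16_t)(tag << (2 * reg));
    }
    return ftag;
}

/* Small sanity check of both directions. */
void ftag_self_check(void)
{
    assert(ftag_to_abridged(0xFFFF) == 0x00);  /* all empty */
    assert(ftag_to_abridged(0x0000) == 0xFF);  /* all valid */
    assert(ftag_to_abridged(0x3FFF) == 0x80);  /* only R7 in use */
    assert(abridged_to_ftag(0x80) == 0x3FFF);
}
```

Note that the empty/in-use information survives a round trip, but the
valid/zero/special distinction does not, which is why the expansion
direction is the delicate one.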

But when I recompiled GDB with the default optimisation level -O2
and ran it on another "production" machine (vanilla Mandrake 8.1,
Pentium II), I got NaNs again! Same GDB on an identically configured
Mandrake 8.1 (Pentium III), no problem... It is CPU related. But how?!

The default Mandrake 8.1 compiler is the "controversial" gcc 2.96
(20000731). I decided to try gcc 3.0.1 (also supplied by MDK) to
compile GDB with -O2. Now the problem was gone. It looks very much
like a gcc 2.96 optimisation issue which is CPU dependent. I can't
think what that could be, but it is what I see... Or is it some
incorrect assumption in the GDB sources about volatile variables or
aliased pointers that gets broken by GCC's optimisation? (I vaguely
remember some issues with the -fstrict-aliasing switch.)
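
To make the aliasing suspicion concrete, here is a minimal sketch of
the kind of code that -fstrict-aliasing can break, next to a safe
alternative. It is a generic illustration only: I am not claiming GDB
contains this pattern, the function name is made up, and the expected
bit pattern assumes IEEE 754 single-precision floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Risky pattern (shown as a comment only, because it is undefined
   behaviour under the C aliasing rules):

       uint32_t bits = *(uint32_t *)&f;

   With strict aliasing enabled the compiler may assume that a
   uint32_t pointer never refers to a float object and reorder or
   drop the accesses. */

/* Safe alternative: copy the object representation byte by byte.
   Compilers typically turn this memcpy into a single move. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, float_bits(1.0f) gives 0x3F800000 on an IEEE 754
machine. Code that moves register contents between buffers of
different types is a natural place for this kind of mistake.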

I am also thinking of the last two lines in the test matrix supplied by
Emmanuel Blindauer on http://manu.agat.net/bug.html. Two identical
Debian Unstable setups, except for the kernel (2.4.13 vs. 2.4.14) and
the CPU (K6-2 vs. Athlon), but only one (the Athlon) shows the
problem... Does this mean that gcc 2.95.4 (default compiler for Debian
Unstable) has the same problem as gcc 2.96? Can someone with Debian
Unstable check this out? And I am still wondering how one can explain
that the bug can disappear by booting Linux 2.2 instead of 2.4?
Anyway, there seem to be at least 2 bugs. One was just found and fixed
in GDB. The other shows up when using gcc 2.96, but seems related to
the CPU type and the Linux kernel version...

For me, the problems with NaNs are totally fixed by using the patch
for GDB and gcc 3.0.1. I suggest people having the same problems try
the same. I will report this to Mandrake too. I haven't been using
RedHat recently, but there's a good chance they're bitten too...


Kudos to the makers of GDB (and Linux, GCC and the whole
GNU/Linux/FreeSoftware/OpenSource community for that matter). Stepping
with GDB through GDB gives one an enormous sense of "standing on the
shoulders of giants" :-)

Kind regards,

Bart

---

Addendum:

ONE
***
From: Blindauer (address@hidden)
Subject: thread in linux 2.2 and 2.4 where is the difference?
Newsgroups: gnu.gdb.bug
Date: 2001-12-02 06:12:45 PST

Provides the 10 C-line test-program with the strtod() call I used to
reproduce and debug the problem.

TWO
***
From: Alexander Enchevich (address@hidden)
Subject: gdb+pthreads+strtod=nan
Newsgroups: gnu.gdb.bug
Date: 2001-07-04 17:39:27 PST

Describes the same problem with a somewhat less minimal test program,
using 2 threads and strtod.


THREE
*****
From: Ken Whaley (address@hidden)
Subject: CVS: stepping over function returning float returns NaN with shared pthreads
Newsgroups: gnu.gdb.bug
Date: 2001-07-12 16:56:07 PST

GNATS GDB bug number 175 (see sources.redhat.com/gdb/bugs)
Synopsis:       smp + pthreads + breakpoints in FP code = corrupt FP state on x86
Arrival-Date:   Fri Jul 13 15:38:00 PDT 2001
Originator:     address@hidden
Release:        snapshot 2001-07-13

Problem only seen on SMP machine. Not on single processor. Maybe the
problem is not the SMP, but the different type of processor (PII
vs. PIII?!)...

FOUR
****
GNATS GDB bug number 178
Synopsis:       Floating Point NaN on first fp operation when compiling with -lpthread
Arrival-Date:   Mon Jul 23 12:48:01 PDT 2001
Originator:     Marius VLAD (Digital Media Institute, Tampere Univ. of Technology, FINLAND)
From: address@hidden

Problem with small program using printf("%f", ...)...

FIVE
****
From: Brendan Doherty (address@hidden)
Subject: gdb 5.1 with shared libraries
Newsgroups: gnu.gdb.bug
Date: 2001-12-05 19:36:51 PST

3 problems: Problem 1 (Linux 2.4/libc 2.2.4/Debian Woody): value of a
double (in static data section of exe) gets changed when run through
5.1.

SIX
***
From: Rychard Bouwens (address@hidden)
Subject: Possible bug in my microprocessor/memory
Newsgroups: comp.lang.asm.x86
Date: 2000/07/07

A thread about hunting for a memory overwrite. The overwrite is found,
and the discussion ends with no answer to the question "Where was the
overwrite?  I would love to know how it related to the problem you
saw." (address@hidden) The GDB bug might be the answer...


SEVEN
*****
From: Mikael Djurfeldt <address@hidden>
Subject: Really weird things happening in Guile/GDB
To: guile-devel mailing-list
Date: Wed, 19 Sep 2001 22:10:33 +0200

Also NaN stuff. Discussion focuses on possibly wrongly generated
assembly, down to the level of looking up the hex codes of the fildll
instruction in the Intel manual.  It could be GDB again that makes
people think an instruction is not working...

