Non-deterministic NaN bug

bug-gplusplus

From:	Ron Miller
Subject:	Non-deterministic NaN bug
Date:	Fri, 1 Jun 2001 16:51:02 -0700

Hi,

I have spent a lot of the last week trying to track down a bug that appears
to be OS/hardware related.

The problem that I am seeing is that NaN (not a number) is mysteriously
appearing in some of my floating point variables even when it should not.

A program that exhibits the behaviour is included at the end of this
message. It performs floating-point calculations over and over, and it is
written so that the calculations should never overflow or divide by zero.
The results are stored in a vector, and after the calculations are
complete, the program checks the vector for NaN. I have seen some VERY
strange behavior from this program. The results are completely
non-deterministic: sometimes the machine runs for hours without any
failures, and then it suddenly spits out a couple within a few minutes.

The vectors are stored using doubles.

Here is an example of the strange output:

    i=67131    -14.3721 + -5.43502e-06 = -14.3722    mag=nan

A snippet of the code that produced this is:

        mag = vect[i];
        if (mag < 0)
        {
          mag = -mag;
        }
        else if (!(mag >= 0))
        {
          printf("i=%d    %g %s %g = %g    mag=%g\n",
                 i, vect2[i], op_str[op], vect2[i + 1], vect[i], mag);
        }

Note that "mag" was simply read from "vect[i]", and that mag was not < 0 and
not >= 0. From the printout, you can see that "mag" is NaN. However, "mag"
was simply copied from "vect[i]" which is also printed out and can be seen
as a valid number (-14.3722). Somehow the assignment was corrupted and NaN
was inserted.
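
For readers following the test above: the `else if (!(mag >= 0))` branch can only be reached by a NaN, because under IEEE 754 every ordered comparison involving NaN is false. Below is a minimal sketch of that check pulled out into a helper, next to the C99 `isnan` macro. The helper names `is_nan_by_compare` and `is_nan_c99` are made up for illustration and are not part of the original program, and the sketch assumes a compiler and libc that provide `isnan` in `<math.h>`.

```c
#include <math.h>

/* Hypothetical helper: the same test the posted loop performs.
   A NaN is neither < 0 nor >= 0, so both comparisons fail. */
int is_nan_by_compare(double x)
{
    return !(x < 0) && !(x >= 0);
}

/* Equivalent check using isnan(); assumes <math.h> provides it. */
int is_nan_c99(double x)
{
    return isnan(x) != 0;
}
```

Both helpers should return 0 for an ordinary value such as -14.3722 and non-zero for a NaN. Aggressive floating-point optimization flags may let a compiler assume NaN never occurs, so checks like these are best compiled without them.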

Other times, the result of a calculation is stored in vect[i] as NaN, even
though the operands are valid. For example:

    i=5505086    -1.5886e-07 / -5.56503e-05 = nan    mag=nan
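
Since `%g` shows only a rounded decimal value, it may help to log the raw bit pattern of each suspicious double as well, so that the stored value can be compared bit for bit with the value that was read back. The following is only a sketch under stated assumptions: it assumes 64-bit IEEE 754 doubles and a `<stdint.h>` header, and the helper names `double_bits`, `bits_are_nan` and `print_double_bits` are hypothetical, not part of the original program.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the 8 bytes of a double into an integer
   so the exact bit pattern can be inspected. */
uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* An IEEE 754 double is NaN when all 11 exponent bits are set
   and the 52-bit fraction is non-zero. */
int bits_are_nan(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7ffu;
    uint64_t fraction = bits & 0xfffffffffffffULL;
    return exponent == 0x7ffu && fraction != 0;
}

/* Print a labelled value together with its raw bits in hex. */
void print_double_bits(const char *label, double x)
{
    printf("%s = %g  bits=0x%016llx\n",
           label, x, (unsigned long long)double_bits(x));
}
```

Calling `print_double_bits("vect[i]", vect[i])` and `print_double_bits("mag", mag)` inside the failure branch would show whether the two values really differ in memory and in which bits.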

The program is relatively simple, although it could probably be simplified
further.

My operating system / environment is:

        Redhat 6.2
        gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
        (also tried gcc version 2.95.2 19991024 (release))
        2 x 700 MHz Intel Pentium III
        100 MHz Bus Clock Speed
        2 GB RAM
        4 GB Swap Space

I also tried many different flags, but had no luck. Sometimes the program
will run for days without failing at all.

Has anyone seen anything like this before? It's beginning to look like the
machine is overheating or something, but would an overheated machine suddenly
print out NaN with no other strange behaviour?

Ron Miller
address@hidden

P.S. Here is the sample program:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <math.h>
#include <assert.h>
#include <string.h>
#include <fpu_control.h>

#define N    2500

char *op_str[6] =
{
  "+",
  "-",
  "--",
  "*",
  "//",
  "/"
};

int
main()
{
  double *vect, *vect2, mag;
  int i, pass, set, op;
  fpu_control_t word;

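  /* Unmask the invalid-operation, divide-by-zero and overflow exceptions
     so that the FPU traps on them instead of quietly producing a result. */
  word = _FPU_DEFAULT & ~(_FPU_MASK_IM | _FPU_MASK_ZM | _FPU_MASK_OM);
  _FPU_SETCW( word );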

  vect = (double*)malloc(N * sizeof(double));
  vect2 = (double*)malloc(N * sizeof(double));
  assert(vect != NULL && vect2 != NULL);  /* stop early if allocation failed */

  for (set = 0;; ++set)
  {
    if ( set % 10000 == 0 ) {
        printf("set %d\n", set);
        fflush(stdout);
    }

    for (i = 0; i < N; ++i)
      vect[i] = rand() % 100 + 1;

    for (pass = 0; pass < 100; ++pass)
    {
      memcpy(vect2, vect, N * sizeof(double));

      op = rand() % 6;

      switch (op)
      {
      case 0:
        for (i = 0; i < N - 1; ++i)
          vect[i] = vect[i] + vect[i + 1];
        break;
      case 1:
        for (i = 0; i < N - 1; ++i)
          vect[i] = vect[i] - vect[i + 1];
        break;
      case 2:
        for (i = 0; i < N - 1; ++i)
          vect[i] = vect[i + 1] - vect[i];
        break;
      case 3:
        for (i = 0; i < N - 1; ++i)
          vect[i] = vect[i + 1] * vect[i];
        break;
      case 4:
        for (i = 0; i < N - 1; ++i)
          vect[i] = vect[i + 1] / vect[i];
        break;
      case 5:
        for (i = 0; i < N - 1; ++i)
          vect[i] = vect[i] / vect[i + 1];
        break;
      }
      for (i = 0; i < N; ++i)
      {
        mag = vect[i];
        if (mag < 0)
        {
          mag = -mag;
        }
        else if (!(mag >= 0))
        {
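          /* Clamp the second operand index so that i == N - 1 does not
             read one element past the end of vect2. */
          printf("i=%d    %g %s %g = %g    mag=%g\n",
                 i, vect2[i], op_str[op], vect2[i + 1 < N ? i + 1 : i],
                 vect[i], mag);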
          fflush(stdout);
          vect[i] = rand() % 100 + 1;
        }
        if (mag > 1e100 || mag < 1e-100)
          vect[i] = rand() % 100 + 1;
      }
    }
  }
}
