bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: FYI, new, rare failure on rawhide


From: Jim Meyering
Subject: Re: FYI, new, rare failure on rawhide
Date: Wed, 16 Nov 2011 22:24:42 +0100

Paul Eggert wrote:
> Possibly a bug in the file system or kernel?
> An earlier part of the code does this:
>
>     /* lines 57 and 58 of test-utimens.h */
>     ASSERT (func (BASE "file", ts) == 0);
>     ASSERT (stat (BASE "file", &st1) == 0);
>
> and this relies on st_ctime being marked for update in the first
> line, and being actually updated before the second line takes effect.
> This is pretty easy to get wrong, and many file systems do get similar
> things wrong, particularly under high load.  If my guess is right,
> it should be easy to verify by instrumenting those lines.
>
> There is a similar issue here:
>
>     /* lines 98 and 99 (/
>     ASSERT (func (BASE "file", ts) == 0);
>     ASSERT (stat (BASE "file", &st2) == 0);
>
> Of course this is just a long-distance guess....

I think it comes down to either a bug, or
that somehow "nap()" is not sleeping long enough.

Details on how this file system is mounted:
(it's SSD-backed in a rawhide guest, 3.2.0-0.rc1.git4.1.fc17.x86_64)

  $ grep vda4 /proc/mounts
  /dev/vda4 / ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0

There seem to be at least three points at which I'm seeing failed assertions.
I caught this by changing the abort() in macros.h's ASSERT definition to
be sleep(9999999) and by running parallel "make check" jobs until one hung.
Then attached with gdb:

#2  0x0000000000403c28 in test_lutimens (func=0x401e10 <do_lutimens1>,
    print=false) at test-lutimens.h:66
(gdb) l
61        }
62        ASSERT (stat (BASE "file", &st2) == 0);
63        ASSERT (st2.st_atime == Y2K);
64        ASSERT (st2.st_mtime == Y2K);
65        if (check_ctime)
66          ASSERT (st1.st_ctime < st2.st_ctime
67                  || (st1.st_ctime == st2.st_ctime
68                      && get_stat_ctime_ns (&st1) < get_stat_ctime_ns 
(&st2)));
69
70        /* Play with symlink timestamps.  */

(gdb) p st1.st_ctim
$1 = {
  tv_sec = 1321473919,
  tv_nsec = 914482118
}
(gdb) p st2.st_ctim
$2 = {
  tv_sec = 1321473919,
  tv_nsec = 914482118
}

Equal structs.  Definitely indicates failure, somewhere.

======================================================================
Trying again, I hit a sleep on the 8th trial.
Note, it's in a different file and with different var names,
but the same condition: unchanged ctime.  This time, I was using a
40-millisecond delay, just in case 20 was close but not enough.

#2  0x0000000000402ad3 in test_utimens (func=0x401e50 <do_utimens>, print=false)
    at test-utimens.h:135
135           ASSERT (st2.st_ctime < st3.st_ctime
(gdb) p st2.st_ctim
$1 = {
  tv_sec = 1321476466,
  tv_nsec = 27092765
}
(gdb) p st3.st_ctim
$2 = {
  tv_sec = 1321476466,
  tv_nsec = 27092765
}

======================================================================
I did it one more time:

    #2  0x0000000000405153 in test_futimens (print=<optimized out>,
        func=0x402930 <do_futimens>) at test-futimens.h:148
    148           ASSERT (st2.st_ctime < st3.st_ctime
    (gdb) p st2.st_ctim
    $1 = {
      tv_sec = 1321478037,
      tv_nsec = 577808187
    }
    (gdb) p st3.st_ctim
    $2 = {
      tv_sec = 1321478037,
      tv_nsec = 577808187
    }

Then examined the ctime of the actual file:

    $ stat --fo=%.Z test-utimens.tfile
    1321478037.577808187

Yes, it matches.  This was also with the 40-millisecond delay.
This time it took 65 iterations.

Tomorrow I'll write a better test.
I suspect that "make check" is not exercising it thoroughly enough.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]