[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: FYI, new, rare failure on rawhide
From: |
Jim Meyering |
Subject: |
Re: FYI, new, rare failure on rawhide |
Date: |
Wed, 16 Nov 2011 22:24:42 +0100 |
Paul Eggert wrote:
> Possibly a bug in the file system or kernel?
> An earlier part of the code does this:
>
> /* lines 57 and 58 of test-utimens.h */
> ASSERT (func (BASE "file", ts) == 0);
> ASSERT (stat (BASE "file", &st1) == 0);
>
> and this relies on st_ctime being marked for update in the first
> line, and being actually updated before the second line takes effect.
> This is pretty easy to get wrong, and many file systems do get similar
> things wrong, particularly under high load. If my guess is right,
> it should be easy to verify by instrumenting those lines.
>
> There is a similar issue here:
>
> /* lines 98 and 99 (/
> ASSERT (func (BASE "file", ts) == 0);
> ASSERT (stat (BASE "file", &st2) == 0);
>
> Of course this is just a long-distance guess....
I think it comes down to either a bug, or
that somehow "nap()" is not sleeping long enough.
Details on how this file system is mounted:
(it's SSD-backed in a rawhide guest, 3.2.0-0.rc1.git4.1.fc17.x86_64)
$ grep vda4 /proc/mounts
/dev/vda4 / ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0
There seem to be at least three points at which I'm seeing failed assertions.
I caught this by changing the abort() in macros.h's ASSERT definition to
be sleep(9999999) and by running parallel "make check" jobs until one hung.
Then attached with gdb:
#2 0x0000000000403c28 in test_lutimens (func=0x401e10 <do_lutimens1>,
print=false) at test-lutimens.h:66
(gdb) l
61 }
62 ASSERT (stat (BASE "file", &st2) == 0);
63 ASSERT (st2.st_atime == Y2K);
64 ASSERT (st2.st_mtime == Y2K);
65 if (check_ctime)
66 ASSERT (st1.st_ctime < st2.st_ctime
67 || (st1.st_ctime == st2.st_ctime
68 && get_stat_ctime_ns (&st1) < get_stat_ctime_ns
(&st2)));
69
70 /* Play with symlink timestamps. */
(gdb) p st1.st_ctim
$1 = {
tv_sec = 1321473919,
tv_nsec = 914482118
}
(gdb) p st2.st_ctim
$2 = {
tv_sec = 1321473919,
tv_nsec = 914482118
}
Equal structs. Definitely indicates failure, somewhere.
======================================================================
Trying again, I hit a sleep on the 8th trial.
Note, it's in a different file and with different var names,
but the same condition: unchanged ctime. This time, I was using a
40-millisecond delay, just in case 20 was close but not enough.
#2 0x0000000000402ad3 in test_utimens (func=0x401e50 <do_utimens>, print=false)
at test-utimens.h:135
135 ASSERT (st2.st_ctime < st3.st_ctime
(gdb) p st2.st_ctim
$1 = {
tv_sec = 1321476466,
tv_nsec = 27092765
}
(gdb) p st3.st_ctim
$2 = {
tv_sec = 1321476466,
tv_nsec = 27092765
}
======================================================================
I did it one more time:
#2 0x0000000000405153 in test_futimens (print=<optimized out>,
func=0x402930 <do_futimens>) at test-futimens.h:148
148 ASSERT (st2.st_ctime < st3.st_ctime
(gdb) p st2.st_ctim
$1 = {
tv_sec = 1321478037,
tv_nsec = 577808187
}
(gdb) p st3.st_ctim
$2 = {
tv_sec = 1321478037,
tv_nsec = 577808187
}
Then examined the ctime of the actual file:
$ stat --fo=%.Z test-utimens.tfile
1321478037.577808187
Yes, it matches. This was also with the 40-millisecond delay.
This time it took 65 iterations.
Tomorrow I'll write a better test.
I suspect that "make check" is not exercising it thoroughly enough.