From: Peter Bray
Subject: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh
Date: Thu, 16 Jul 2015 16:09:46 +1000
User-agent: Mozilla/5.0 (X11; SunOS i86pc; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 15/07/15 08:30 PM, Pádraig Brady wrote:
> On 15/07/15 10:22, Peter Bray wrote:
>> Greetings,
>>
>> N.B. This bug report is for reference only, and documents only a
>>      partially reproducible check failure. No action requested.
>>
>> On Solaris 10 (Update 8 and Update 11) and Solaris 11.2 X86 VMs, and
>> one Solaris 10 Update 10 (non-VM) system, I see random "gmake check"
>> failures for "tests/misc/timeout-parameters.sh".
>>
>> Running the test by itself (with the command line below) on the same
>> VMs / real system will sometimes succeed and sometimes fail.
>>
>>     gmake check TESTS=tests/misc/timeout-parameters.sh VERBOSE=yes SUBDIRS=.
>>
>> Looking through the attached "failure.log" file, I extracted the
>> following command-line test, which may exhibit the failure without all
>> the make(1) and test infrastructure code:
>>
>>     failures=0
>>     for i in `./src/seq 1 100`
>>     do
>>       ./src/timeout 2.34e+5d sleep 0 \
>>         || { echo fail; failures=`expr ${failures} + 1`; }
>>     done
>>     echo "Total Failures: ${failures}"
>>
>> On a real hardware system (Xeon E3-1245v2) with a 64-bit kernel,
>> failures are very rare (only one test harness failure seen, and no
>> failures of the sample code above, even with 1..1000 runs).
>>
>> On virtual machines (also using Xeon E3-1245v2, running VMware ESXi
>> 5.5d with the latest patches - two identical ESXi systems running
>> similarly configured VMs), test harness failures and failures in the
>> above command-line check are rare for the 64-bit Solaris kernels.
>>
>> Failures on Solaris 10 32-bit kernels (on both of these ESXi servers)
>> are easily reproduced and vary between 5% (common) and 45% (rare).
>
> Interesting. I'm not reproducing that in 5000 loops of the above test
> script on 32-bit baremetal Solaris 10 Update 10.
>
> I presume the large timeout value is causing early timer firing
> on your systems for some reason? What does this return?
>
>    time src/timeout 2.34e+5d sleep inf
>
> Note that on 32 bit, the 234000 days will be truncated to an itimerspec of:
>    { {0,0}, {2147483647,999999999} }
>
> A wild guess is that perhaps NTP is adjusting the system time,
> which causes the above timer to be adjusted in the kernel
> and roll over to 0, thus triggering early?
>
> thanks,
> Pádraig.


Pádraig,

Here is the additional information you requested. Unfortunately I have
yet to install gdb(1), so I am using system tools for this response.
The installation of coreutils-8.24 has been completed on all compile
server VMs, so the commands now have a 'g' prefix.

% gtimeout 2.34e+5d gsleep inf

No output and exit status of 124 [$?=124] (32-bit kernel S10U11 / GCC 4.9.3)

% truss gtimeout 2.34e+5d gsleep inf 2>&1 | tee truss.out

File Attached "truss.out" (also exits with 124)

Note: adding the -v option below on a separate run did not yield a
great deal of information on the data provided to the timer*() calls.

% truss -tall -v timer_create,timer_settime gtimeout 2.34e+5d gsleep inf 2>&1

timer_create(3, 0x00000000, 0x080471AC)         = 0
timer_settime(0, 0, 0x080471B0, 0x00000000)     = 0
    Received signal #14, SIGALRM, in waitid() [caught]
      siginfo: SIGALRM pid=12118 uid=100 code=-3
waitid(P_PID, 12119, 0x08047130, WEXITED|WTRAPPED) Err#91 ERESTART

Also captured truss -l output via:

% truss -l -tall -v timer_create,timer_settime gtimeout 2.34e+5d gsleep inf \
  2>&1 | tee truss-l.out

Normal apptrace shows nothing of great value (it does not show actual
data, just addresses):

% apptrace gtimeout 2.34e+5d gsleep inf

but it is attached as "apptrace.out".

Note that the following apptrace command coredumps on each invocation:

% apptrace -v timer_settime gtimeout 2.34e+5d gsleep inf
-> gtimeout -> librt.so.1:int timer_settime(timer_t = 0x0, int = 0x0, const struct itimerspec * = 0x8047130, struct itimerspec * = 0x0)
        arg0 = (timer_t) 0x0
        arg1 = (int) 0x0
        arg2 = (const struct itimerspec *) 0x8047130 (struct itimerspec) {
        it_interval: (struct timespec) {
                tv_sec: (time_t) 0
                tv_nsec: (long) 0
        it_value: (struct timespec) {
                tv_sec: (time_t) 0x7fffffff
                tv_nsec: (long) 0x3b9ac9ff
        }
        arg3 = (struct itimerspec *) 0x0 (struct itimerspec) {
        it_interval: (struct timespec) {
                tv_sec: (time_t)
apptrace: gtimeout: Segmentation Fault(Core dump)

This coredump occurs even on 64-bit systems, where the gtimeout command
waits for the sleep command to finish (which it never will).

The timer_settime(3RT) manual page states that the last argument is
permitted to be NULL, so that does not seem to be a problem.

And regarding the NTP question, all compile server VMs have NTP disabled.

% svcs -a | grep -i ntp

disabled       Jul_13   svc:/network/ntp:default
disabled       Jul_13   svc:/network/ntp4:default

That is, NTP has been disabled not just since the last boot but since
installation. NTP is, however, running on both ESXi 5.5 hosts.

Regards,
Peter

PS: Investigating with my limited mdb(1) skills shows that it is
    apptrace(1) coredumping, not gtimeout(1).

% mdb =gtimeout core
> $C
08046a98 LMc9bfeea8`apptrace.so.1`print_int+0xbd(7, 0, 8046cc0)
08046bc4 LMc9bfeea8`apptrace.so.1`elt_print+0x137(c91e47b6, 2f, 0, 2, 8046cc0)
08046bf4 LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x56(c91f3930, 2f, c9b63588, 8046cc0, c91e47b6, 0)
08046c2c LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x15e(c91f3930, 30, c9b63588, 8046cc0, c91e489b, 0)
08046c64 LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x15e(c91f3930, 44, c9b63588, 8046cc0, c9ad9048, 0)
08046c8c LMc9bfeea8`libctf.so.1`ctf_type_visit+0x2c(c91f3930, 44, c9b63588, 8046cc0)
08046ce0 LMc9bfeea8`apptrace.so.1`print_value+0x127(c91f3930, 45, 0)
08046fac LMc9bfeea8`apptrace.so.1`la_i86_pltenter+0x3d1(8047030, 38, c9af06f0, c9af0e48, 8047084, c9940304)
08047000 ld.so.1`_audit_pltenter+0x11e(c9af06c0, c9bfea18, c9af0ac0, 8047030, 38, 8047084)
08047050 ld.so.1`audit_pltenter+0x98(c9bfea18, c9af0ac0, c9940308, 38, 8047084, c9940304)
080470d8 ld.so.1`elf_plt_trace+0x4d(0, 0, 8047130, 0)
08047158 settimeout+0x11d(60000000, 4212d440, 8047218, 8047218)
080471e8 main+0x2c6(8052e50, 4, 8047218)
0804720c _start+0x80(4, 80473d0, 80473d9, 80473e2, 80473e9, 0)

Attachment: apptrace.out
Description: Text document

Attachment: truss.out
Description: Text document

Attachment: truss-l.out
Description: Text document

