Re: testsuite failure - 193 parallel execution
From: Paul Eggert
Subject: Re: testsuite failure - 193 parallel execution
Date: Tue, 20 Jul 2010 14:05:26 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.10) Gecko/20100527 Thunderbird/3.0.5
On 07/20/10 12:27, Ralf Wildenhues wrote:
> Good point, but wouldn't that be at least a QoI issue for the shell?
I'd think so, yes.
Staring at the code some more I see another race condition
that could explain the problem. Suppose the parent, just
after the last fork, races ahead of the child and jumps
ahead to the "read at_token" cleanup loop. The parent then
executes the last "read at_token" cleanup at a point
where the second-to-the-last child has already output
its token, but before the second-to-the-last child has
closed the fifo. The "read at_token" will then read 0 bytes (because
it sees end-of-file), but the parent incorrectly thinks
that it has seen a token and then closes down the fifo
before the last child gets a chance to write its token.
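
The miscount described above can be reproduced without any fifo. The sketch below is only an illustration, not code from general.m4: the function names are hypothetical, and a one-line pipe stands in for a fifo whose writer has gone away. A loop that discards the status of "read" counts an end-of-file as a token, while a loop that checks the status does not.

```shell
#!/bin/sh
# Hypothetical helpers, not part of general.m4.

# Read $1 "tokens" from stdin and discard read's exit status,
# as the token cleanup loop does.
count_unchecked() {
  n=0
  while [ "$n" -lt "$1" ]; do
    read token || :    # status deliberately discarded
    n=$((n + 1))
  done
  echo "$n"
}

# Same loop, but stop as soon as read reports end-of-file.
count_checked() {
  n=0
  while [ "$n" -lt "$1" ]; do
    read token || break
    n=$((n + 1))
  done
  echo "$n"
}

# Only one token (an empty line) is available; two are expected.
printf '\n' | count_unchecked 2   # prints 2: the EOF was counted as a token
printf '\n' | count_checked 2     # prints 1
```

Checking the status alone would not cure the race, since an end-of-file on a fifo can be temporary while more writers are still to come; it only shows where the parent's count goes wrong.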
If this guess is right, the following (untested) patch
might fix the problem. The basic idea is to open the
fifo just once for reading and once for writing in the
parent, so that no child needs to open a fifo and no
child is left behind.
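
Outside of M4, the idea looks roughly like the stand-alone sketch below. It is a hedged illustration, not the patch itself: the function name, the fifo path and the job count are made up, and descriptors 6 and 7 merely mirror AT_JOB_INFIFO_FD and AT_JOB_OUTFIFO_FD. It also assumes that opening a fifo read-write does not block, which holds on Linux but is left unspecified by POSIX; that open is used only to get past the initial blocking open.

```shell
#!/bin/sh
# Hypothetical demo, not part of general.m4.
collect_tokens() {
  njobs=$1
  fifo=${TMPDIR:-/tmp}/demo_fifo.$$
  rm -f "$fifo"
  mkfifo "$fifo" || return 1

  # Assumption: a read-write open of a fifo does not block (Linux).
  exec 6<>"$fifo"
  exec 7>"$fifo"      # write end, inherited by every child
  exec 6<"$fifo"      # reopen read-only; cannot block, fd 7 is a writer

  i=0
  while [ "$i" -lt "$njobs" ]; do
    (
      exec 6<&-       # children never read tokens
      echo >&7        # hand back a token through the inherited fd
    ) &
    i=$((i + 1))
  done
  exec 7>&-           # parent drops its own write end

  got=0
  while [ "$got" -lt "$njobs" ] && read token <&6; do
    got=$((got + 1))
  done
  exec 6<&-
  wait
  rm -f "$fifo"
  echo "collected $got tokens"
}

collect_tokens 3
```

Because the write end exists before the first fork and every child inherits it, the parent cannot see end-of-file until it has closed its own copy and all children have exited. On a system where the read-write open behaves as assumed, the run above should report "collected 3 tokens".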
--- general.m4 2010-07-20 11:12:58.055141603 -0700
+++ /tmp/general.m4 2010-07-20 13:59:28.607141344 -0700
@@ -959,7 +959,8 @@ export PATH
# Setting up the FDs.
m4_define([AS_MESSAGE_LOG_FD], [5])
-m4_define([AT_JOB_FIFO_FD], [6])
+m4_define([AT_JOB_INFIFO_FD], [6])
+m4_define([AT_JOB_OUTFIFO_FD], [7])
[#] AS_MESSAGE_LOG_FD is the log file. Not to be overwritten if `-d'.
if $at_debug_p; then
at_suite_log=/dev/null
@@ -1366,6 +1367,9 @@ dnl cause changed test semantics; e.g.,
at_joblist=`AS_ECHO([" $at_groups_all "]) | \
sed 's/\( '$at_jobs'\) .*/\1/'`
+ exec AT_JOB_INFIFO_FD<"$at_job_fifo"
+ exec AT_JOB_OUTFIFO_FD>"$at_job_fifo"
+
set X $at_joblist
shift
for at_group in $at_groups; do
@@ -1376,7 +1380,7 @@ dnl avoid all the status output by the s
(
# Start one test group.
$at_job_control_off
- exec AT_JOB_FIFO_FD>"$at_job_fifo"
+ exec AT_JOB_INFIFO_FD<&-
dnl When a child receives PIPE, be sure to write back the token,
dnl so the master does not hang waiting for it.
dnl errexit and xtrace should not be set in this shell instance,
@@ -1386,7 +1390,7 @@ dnl optimize away the _AT_CHECK subshell
dnl Ignore PIPE signals that stem from writing back the token.
trap "" PIPE
echo stop > "$at_stop_file"
- echo token >&AT_JOB_FIFO_FD
+ echo >&AT_JOB_OUTFIFO_FD
dnl Do not reraise the default PIPE handler.
dnl It wreaks havoc with ksh, see above.
dnl trap - 13
@@ -1395,26 +1399,24 @@ dnl kill -13 $$
at_fn_group_prepare
if cd "$at_group_dir" &&
at_fn_test $at_group &&
- . "$at_test_source" # AT_JOB_FIFO_FD>&-
+ . "$at_test_source" # AT_JOB_OUTFIFO_FD>&-
then :; else
AS_WARN([unable to parse test group: $at_group])
at_failed=:
fi
at_fn_group_postprocess
- echo token >&AT_JOB_FIFO_FD
+ echo >&AT_JOB_OUTFIFO_FD
) &
$at_job_control_off
- if $at_first; then
- at_first=false
- exec AT_JOB_FIFO_FD<"$at_job_fifo"
- fi
+ at_first=false
shift # Consume one token.
if test $[@%:@] -gt 0; then :; else
- read at_token <&AT_JOB_FIFO_FD || break
+ read at_token <&AT_JOB_INFIFO_FD || break
set x $[*]
fi
test -f "$at_stop_file" && break
done
+ exec AT_JOB_OUTFIFO_FD>&-
# Read back the remaining ($at_jobs - 1) tokens.
set X $at_joblist
shift
@@ -1423,9 +1425,9 @@ dnl kill -13 $$
for at_job
do
read at_token
- done <&AT_JOB_FIFO_FD
+ done <&AT_JOB_INFIFO_FD
fi
- exec AT_JOB_FIFO_FD<&-
+ exec AT_JOB_INFIFO_FD<&-
wait
else
# Run serially, avoid forks and other potential surprises.