GNU Parallel Bug Reports Bug in parallel using --pipe
From: Bill Wyatt
Subject: GNU Parallel Bug Reports Bug in parallel using --pipe
Date: Tue, 6 Sep 2011 20:09:44 -0400
I found a bug in 20110722 that is also present in the latest release.
It involves the use of --pipe: parallel does not appear to recognize
right away that stdin has been exhausted.
mmtobs 1> parallel --version
GNU parallel 20110822
[...]
Dell PowerEdge R900, 8 cpus, 128 GB
CentOS 5.6
Linux tdc2 2.6.18-238.12.1.el5 #1 SMP \
Tue May 31 13:22:04 EDT 2011 \
x86_64 x86_64 x86_64 GNU/Linux
I have a file of MD5 checksums, in the usual output format of
md5sum(1). I want to use parallel to split the large file into
pieces and run the checking function, "md5sum -c", on each piece.
However, md5sum sometimes prints an error message that suggests it was
invoked with no input, even though I passed the "-r"
(--no-run-if-empty) argument to parallel.
The error is obvious if you have, say, a 1300-line file, set -L to a
number larger than that, and also allocate more than one cpu:
mmtobs 0> wc -l < fire.md5.txt
1300
mmtobs 1> < fire.md5.txt parallel -r -j3 -L1500 --pipe md5sum -c >/dev/null
md5sum: standard input: no properly formatted MD5 checksum lines found
md5sum: standard input: no properly formatted MD5 checksum lines found
Here's a simple case: 3 lines or fewer, using 3 cpus and one line
per execution of md5sum (and note the output of command 4):
mmtobs 2> head -3 fire.md5.txt | parallel -r -j3 -L 1 --pipe md5sum -c
2011.0320/dif2.fits.bz2: OK
2011.0320/diff.fits.bz2: OK
2011.0320/fire_0001.fits.bz2: OK
mmtobs 3> head -2 fire.md5.txt | parallel -r -j3 -L 1 --pipe md5sum -c
md5sum: standard input: no properly formatted MD5 checksum lines found
2011.0320/dif2.fits.bz2: OK
2011.0320/diff.fits.bz2: OK
mmtobs 4> head -1 fire.md5.txt | parallel -r -j3 -L 1 --pipe md5sum -c
md5sum: standard input: no properly formatted MD5 checksum lines found
md5sum: standard input: no properly formatted MD5 checksum lines found
md5sum: standard input: no properly formatted MD5 checksum lines found
md5sum: standard input: no properly formatted MD5 checksum lines found
2011.0320/dif2.fits.bz2: OK
mmtobs 5> head -1 fire.md5.txt | parallel -r -j3 -L 1 --pipe md5sum -c
md5sum: standard input: no properly formatted MD5 checksum lines found
md5sum: standard input: no properly formatted MD5 checksum lines found
2011.0320/dif2.fits.bz2: OK
Note that when the number of cpus times the number of lines per
execution divides evenly into the number of input lines, all is well.
And yes, that strange output of _4_ error lines at command 4 really
does sometimes come out (compare command 5, the same command, which
printed only 2). My tests seem to show that the buffering is usually
correct with larger files, but I have sometimes, though not always,
seen inconsistent behavior with my normal-sized files.
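
To make the pattern above easier to check against other inputs, here is
a rough sketch of the arithmetic. It rests on an assumption that is only
inferred from the transcripts, not from parallel's source: records are
handed out in rounds of one per job slot, and any slots left over in the
last round are started with empty stdin. The function name empty_jobs
and its argument order are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU parallel): estimate how many
# invocations would receive empty stdin, ASSUMING the last round of
# job slots is always filled out with empty inputs.
# Usage: empty_jobs INPUT_LINES LINES_PER_RECORD JOBS
empty_jobs() {
    lines=$1
    per_record=$2
    jobs=$3
    # Number of records the input is split into (rounded up).
    records=$(( (lines + per_record - 1) / per_record ))
    # Records left over after the last complete round of job slots.
    remainder=$(( records % jobs ))
    if [ "$remainder" -eq 0 ]; then
        echo 0
    else
        echo $(( jobs - remainder ))
    fi
}

empty_jobs 1300 1500 3   # command 1: 1300 lines, -L1500, -j3
empty_jobs 3 1 3         # command 2: 3 lines, -L 1, -j3
empty_jobs 2 1 3         # command 3: 2 lines, -L 1, -j3
empty_jobs 1 1 3         # commands 4 and 5: 1 line, -L 1, -j3
```

This prints 2, 0, 1 and 2, which matches the number of error lines in
commands 1, 2, 3 and 5. It does not account for the 4 error lines seen
in command 4, so the assumption is at best part of the story.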
Bill Wyatt (wyatt at cfa harvard edu)
Smithsonian Astrophysical Observatory (Cambridge, MA, USA)