
Broken Pipe


From: Joseph White
Subject: Broken Pipe
Date: Tue, 25 Sep 2012 02:47:57 +0100

Hi.

I'm using GNU Parallel 20120822, and it is very useful for feeding large lists to bash scripts that take a single command-line parameter ($1) as input. e.g.,

cat url-list | parallel --eta --progress --joblog jobnew.log -j0 ./linkcheck {} >> errors.log

(I'm using -j0 because my scripts call programs such as host or wget, which I hope benefit from being run many times in parallel.)

But I have large url-lists of around 250 MB that I need to pipe. I've tried changing my scripts so they read from stdin using bash's read builtin. e.g.,

#!/bin/bash

while read -r domain; do
    host "$domain"
done

Here is an example of what the url-list contains:

http://www.hairforsale.com
http://www.rdhjobs.com
http://www.gdha.org
http://www.hotdogsafari.com
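For instance, feeding a couple of lines like those through the loop (with echo standing in for host here, purely to illustrate the stdin flow) behaves like this:

```shell
#!/bin/bash
# Sketch only: echo stands in for host so the loop's behavior is visible.
# read -r keeps any backslashes in the input literal.
printf 'www.gdha.org\nwww.hotdogsafari.com\n' |
while read -r domain; do
    echo "lookup: $domain"
done
```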

Would using the --pipe option speed up the process significantly? I've tried many variations of it, e.g.,

cat url-list | parallel --pipe --block 10M --eta --progress --joblog jobnew.log -j100 ./checknew {} >> errornew.log

It works for smaller files, but as soon as I try it on the bigger lists I always get this type of error:

'10879 broken pipe  cat url-list |'

I'm assuming this error message comes from cat. The command appears to work on smaller lists, so perhaps the bigger lists contain special characters -- ones I can't see -- that get interpreted by the shell as a command and break the pipe? I've tried scanning for such characters, but I don't know what to look for; and the files contain well over 370,000 lines, so it is like looking for a needle in a haystack.
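One scan I've tried (just a sketch -- the character class below is my guess at what counts as "suspicious") is to look for any bytes outside the printable-ASCII range with grep:

```shell
# List line numbers of lines containing bytes outside printable
# ASCII and whitespace. LC_ALL=C makes grep match raw bytes rather
# than interpreting multibyte characters.
LC_ALL=C grep -n '[^[:print:][:space:]]' url-list
```

If this prints nothing, the file is plain ASCII and the problem presumably lies elsewhere.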

Also, would the -u option speed up the process?

Thanks for reading.
