Re: Broken Pipe


From: Ole Tange
Subject: Re: Broken Pipe
Date: Thu, 27 Sep 2012 11:12:33 +0200

On Thu, Sep 27, 2012 at 2:11 AM, Joseph White <hybris246@gmail.com> wrote:

> Thanks for the suggestions. Going by what I read about what causes broken
> pipe message it may well be that parallel cannot spawn the jobs in time. My
> ./linkcheck program is a bashscript which also calls a perl script and the
> 'host' program from inside it.

It is unclear to me whether linkcheck takes a URL as an argument on the
command line or reads URLs from STDIN. The first type requires
'parallel' without --pipe; the second requires 'parallel --pipe'.

If you cannot explain to others the difference between --pipe and no
--pipe, please watch the videos at
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
especially the FOSDEM release.
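
A minimal sketch of the two input styles, in case it helps. The
functions check_arg and check_stdin and the example.com URLs are
hypothetical stand-ins for ./linkcheck and urllist, not the real
script; the parallel invocations appear only as comments so the sketch
runs without parallel installed.

```shell
#!/bin/sh
# Hypothetical stand-ins for ./linkcheck; the real script is not shown in this thread.

# Style 1: takes one URL as a command-line argument.
# Matching invocation:  cat urllist | parallel ./linkcheck
check_arg() {
  echo "checked: $1"
}

# Style 2: reads URLs from STDIN, one per line.
# Matching invocation:  cat urllist | parallel --pipe ./linkcheck
check_stdin() {
  while IFS= read -r url; do
    echo "checked: $url"
  done
}

# Illustrative URL list (assumption, stands in for urllist).
urls='http://example.com/a
http://example.com/b'

# Style 1 is called once per URL, with the URL as its argument.
printf '%s\n' "$urls" | while IFS= read -r u; do check_arg "$u"; done

# Style 2 is started once and handed a whole block of lines on STDIN.
printf '%s\n' "$urls" | check_stdin
```

Both halves print the same lines; the difference is only in how the
URLs reach the program, which is what decides between --pipe and no
--pipe.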

> So perhaps parallel cannot spawn enough jobs
> because it has to load all the additional programs as well. I changed a few
> things around and forgot to do backups, so I can't find exactly what I
> changed in my script, but I can now do:
>
> cat urllist | parallel --pipe -j0 ./linkcheck
>
> This runs 506 jobs with split input and is what I wanted initially. But
> going by your example and using moderate values:
>
> cat urllist | parallel -j10 --pipe parallel -j5 ./linkcheck >>  errors.log

That will spawn only 50 checks at a time (10 outer jobs, each running
5 inner jobs). I suggested:

 cat urllist | parallel -j10 --pipe parallel -j0 ./linkcheck >>  errors.log

which will run about 5000 (10 outer jobs, each running the roughly 500
that -j0 gives you). What I warned you against is:

 cat urllist | parallel -j0 --pipe parallel -j0 ./linkcheck >>  errors.log

which will try to spawn 500*500 = 250000 jobs.
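
The arithmetic behind those three numbers, as a small sketch. It
assumes that -j0 ends up at roughly 500 jobs on this machine (in line
with the 506 reported above); max_checks and J0 are illustrative names
only, and the result is a rough upper bound, not a measurement.

```shell
#!/bin/sh
# Rough upper bound on simultaneous ./linkcheck processes when one
# parallel starts another: (outer job slots) * (inner job slots).
max_checks() {
  echo $(( $1 * $2 ))
}

# Assumption: -j0 reaches roughly 500 jobs here.
J0=500

echo "-j10 outer, -j5 inner: $(max_checks 10 5)"
echo "-j10 outer, -j0 inner: $(max_checks 10 "$J0")"
echo "-j0 outer, -j0 inner:  $(max_checks "$J0" "$J0")"
```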

> This fails with the broken pipe message, and is it not less jobs than the
> first example? Do I need to find a way to delay my scripts to give parallel
> a chance to keep up? Sorry if this is just totally going over my head. I am
> trying to understand.

If you are serious about wanting to understand, you should:

* Watch all the videos on
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1. If there are
parts you do not understand, watch the video at least 3 times.

* Try out all the examples in the man page.

For debugging, --dryrun is a good way to see what is going on:

 cat urllist | parallel -j10 --pipe parallel -j0 --dryrun ./linkcheck

This prints the commands that would be run, without running them.


/Ole


