
Re: How should --onall work?


From: Ole Tange
Subject: Re: How should --onall work?
Date: Tue, 7 Jun 2011 00:07:09 +0200

First: Did you read the man page? Specifically: EXAMPLE: Running the
same command on remote computers (unimplemented)

(The '(unimplemented)' is not really true anymore.)

time ./src/parallel -j0 --nonall -S c,d,e,f "hostname ; uptime"

/Ole

On Mon, Jun 6, 2011 at 3:30 PM, Hans Schou <chlor@schou.dk> wrote:
> I might be doing something wrong that makes it take longer. What is
> the right syntax for what I am trying to do? (I expect you can read my
> mind)
>
> $ time ./src/parallel "ssh {} 'hostname ; uptime'" ::: c d e f
> castor
>  13:18:12 up 48 days,  6:14,  1 user,  load average: 0.00, 0.00, 0.00
> elvis
>  13:18:12 up 32 days,  2:59,  2 users,  load average: 0.34, 0.34, 0.28
> frank
>  13:18:12 up 160 days,  1:45,  0 users,  load average: 4.74, 5.83, 5.70
> daimi
>  15:18:12 up 47 days, 22:58,  0 users,  load average: 0.00, 0.00, 0.00
>
> real    0m1.031s
> user    0m0.152s
> sys     0m0.032s
>
> $ time ./src/parallel --onall -S c,d,e,f "hostname ; uptime #" ::: 1
> castor
>  13:18:18 up 48 days,  6:14,  1 user,  load average: 0.00, 0.00, 0.00
> elvis
>  13:18:19 up 32 days,  2:59,  2 users,  load average: 0.32, 0.33, 0.28
> frank
>  13:18:19 up 160 days,  1:45,  0 users,  load average: 4.76, 5.82, 5.70
> daimi
>  15:18:19 up 47 days, 22:58,  0 users,  load average: 0.00, 0.00, 0.00
>
> real    0m1.291s
> user    0m0.564s
> sys     0m0.112s
>
>
> /hans
> 2011/5/26 Ole Tange <tange@gnu.org>:
>> I have been convinced that GNU Parallel should have an --onall option.
>>
>>       --onall (unimplemented)
>>                Run all the jobs on all computers given with --sshlogin. GNU
>>                parallel will log into --jobs number of computers in parallel
>>                and run one job at a time on the computer. The order of the
>>                jobs will not be changed, but some computers may finish
>>                before others.
>>
>> I intend this:
>>
>>  parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>>
>> to do:
>>
>>  parallel -S eos '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>>  parallel -S iris '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>>
>> In practice I believe this could be easily implemented by having GNU
>> Parallel call parallel like this:
>>
>>  parallel -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3} {2}) | awk \{print\ \$2}'
>>  parallel -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo {3} {2}) | awk \{print\ \$2}'
>>
>> where I simply put 'a\nb\nc\n' and '1\n2\n3\n' into /tmp/abc and
>> /tmp/123 respectively. As they are already being put into temporary
>> files then the change may be small. I believe this would work out
>> fine.
>>
>> A small penalty is that if you run n jobs in parallel and have 2n
>> hosts, it will do all the jobs for hosts 1 to n first and then all the
>> jobs for hosts n+1 to 2n. It will not run the first job on all hosts
>> first and then the second.
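
The per-host expansion described above can be sketched in plain shell. This only illustrates the intended job order and is not how GNU Parallel implements --onall. The names run_on_host and onall_expand are hypothetical, run_on_host prints the command instead of calling ssh, the command is simplified to a plain echo of the two arguments, and the hosts are visited one after another rather than --jobs at a time.

```shell
#!/bin/sh
# Hypothetical stand-in for "ssh HOST COMMAND": it only prints what would
# run where, so the sketch works without any remote machines.
run_on_host() {
  rh_host=$1
  shift
  printf '%s: %s\n' "$rh_host" "$*"
}

# onall_expand HOSTS_CSV ARGFILE1 ARGFILE2  (hypothetical helper)
# For every host, run every combination of the lines in the two argument
# files, one job at a time and in input order (the -j1 per-host behaviour).
onall_expand() {
  oe_hosts=$1
  oe_file1=$2
  oe_file2=$3
  for oe_host in $(printf '%s' "$oe_hosts" | tr ',' ' '); do
    while IFS= read -r oe_a; do
      while IFS= read -r oe_b; do
        run_on_host "$oe_host" echo "$oe_a" "$oe_b"
      done < "$oe_file2"
    done < "$oe_file1"
  done
}

# Demo: same inputs as the example above, kept in temporary files.
demo_dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$demo_dir/abc"
printf '1\n2\n3\n' > "$demo_dir/123"
onall_expand eos,iris "$demo_dir/abc" "$demo_dir/123"
rm -rf "$demo_dir"
```

All nine jobs for eos are listed before any job for iris, which is the ordering penalty mentioned above in miniature.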
>>
>> - o -
>>
>> I have a harder time figuring how to deal with stdin:
>>
>>  cat | parallel --onall -S eos,iris
>>
>> This should run whatever comes from cat on both eos and iris. While
>> the above is easy:
>>
>>  cat | tee >(ssh eos) >(ssh iris) >/dev/null
>>
>> it becomes harder if you have so many hosts (10000s) that you cannot
>> login to all of them at the same time.
>>
>> Also this one is tricky as you have to keep the {n} working:
>>
>>  cat | parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\ \$2}' :::: - ::: a b c ::: 1 2 3
>>
>> Maybe the solution is to accept that we have to read all of stdin
>> first, put that in a file and use -a as above?
>>
>> So the tricky one will be executed like:
>>
>>  # Stuff everything from stdin into a file
>>  cat > /tmp/stdin
>>  # Call parallel for each host in parallel
>>  parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3} {2}) | awk \{print\ \$2}' &
>>  parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo {3} {2}) | awk \{print\ \$2}' &
>>
>> The price will be that if you have a slow program generating the stdin
>> then that program has to finish before GNU Parallel can even begin
>> executing the jobs. Ideally GNU Parallel should start executing the
>> jobs that it already knows have to be run.
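
The "read all of stdin first" approach can be sketched as follows. This is a minimal illustration under assumptions, not GNU Parallel code. The names replay_on_host and onall_stdin are hypothetical, replay_on_host tags each line with the host name instead of piping it to ssh, and the hosts are processed sequentially (the proposal above backgrounds them with &) so the output order is predictable.

```shell
#!/bin/sh
# Hypothetical stand-in for "ssh HOST sh": instead of executing the lines
# remotely, tag each line read from stdin with the host name.
replay_on_host() {
  rp_host=$1
  while IFS= read -r rp_line; do
    printf '%s: %s\n' "$rp_host" "$rp_line"
  done
}

# onall_stdin HOSTS_CSV  (hypothetical helper)
# Buffer everything from stdin in a temporary file, then replay that file
# once per host. Nothing is sent to any host before stdin reaches EOF.
onall_stdin() {
  os_buf=$(mktemp)
  cat > "$os_buf"
  for os_host in $(printf '%s' "$1" | tr ',' ' '); do
    replay_on_host "$os_host" < "$os_buf"
  done
  rm -f "$os_buf"
}

# Demo: two commands arriving on stdin, replayed for two hosts.
printf 'hostname\nuptime\n' | onall_stdin eos,iris
```

Because cat must see end-of-file before the loop starts, a slow producer on stdin delays every host, which is the price described above.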
>>
>> One way of solving that would be having a jobqueue for each sshlogin.
>> That, however, looks like a big change to the code.
>>
>> - o -
>>
>> People wanting to use GNU Parallel for running the same commands on a
>> list of hosts: can you please describe your situation, so the design
>> will work well? At the very least I need to know:
>>
>> * number of hosts (can we just log in to all of them simultaneously?)
>> * number of commands to be run (is it just 1 or is it a script
>> generated on stdin?)
>> * is speed an issue? (would it be OK to ssh for each command?)
>> * how are the commands generated? (is it a fast program, so it is OK
>> to read everything before executing anything?)
>>
>>
>> /Ole
>>
>>
>
>


