Re: How should --onall work?
From: Ole Tange
Subject: Re: How should --onall work?
Date: Tue, 7 Jun 2011 00:07:09 +0200
First: Did you read the man page? Specifically: EXAMPLE: Running the
same command on remote computers (unimplemented)
(The '(unimplemented)' is no longer really true).
time ./src/parallel -j0 --nonall -S c,d,e,f "hostname ; uptime"
/Ole
On Mon, Jun 6, 2011 at 3:30 PM, Hans Schou <chlor@schou.dk> wrote:
> I might be doing something wrong, which makes it take longer. What is
> the right syntax for what I am trying to do? (I expect you can read my
> mind.)
>
> $ time ./src/parallel "ssh {} 'hostname ; uptime'" ::: c d e f
> castor
> 13:18:12 up 48 days, 6:14, 1 user, load average: 0.00, 0.00, 0.00
> elvis
> 13:18:12 up 32 days, 2:59, 2 users, load average: 0.34, 0.34, 0.28
> frank
> 13:18:12 up 160 days, 1:45, 0 users, load average: 4.74, 5.83, 5.70
> daimi
> 15:18:12 up 47 days, 22:58, 0 users, load average: 0.00, 0.00, 0.00
>
> real 0m1.031s
> user 0m0.152s
> sys 0m0.032s
>
> $ time ./src/parallel --onall -S c,d,e,f "hostname ; uptime #" ::: 1
> castor
> 13:18:18 up 48 days, 6:14, 1 user, load average: 0.00, 0.00, 0.00
> elvis
> 13:18:19 up 32 days, 2:59, 2 users, load average: 0.32, 0.33, 0.28
> frank
> 13:18:19 up 160 days, 1:45, 0 users, load average: 4.76, 5.82, 5.70
> daimi
> 15:18:19 up 47 days, 22:58, 0 users, load average: 0.00, 0.00, 0.00
>
> real 0m1.291s
> user 0m0.564s
> sys 0m0.112s
>
>
> /hans
> 2011/5/26 Ole Tange <tange@gnu.org>:
>> I have been convinced that GNU Parallel should have an --onall option.
>>
>> --onall (unimplemented)
>> Run all the jobs on all computers given with --sshlogin. GNU
>> parallel will log into --jobs number of computers in parallel
>> and run one job at a time on the computer. The order of the
>> jobs will not be changed, but some computers may finish
>> before others.
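
A minimal simulation of those semantics, for illustration: every
sshlogin receives every job in the original order, while the hosts
themselves proceed in parallel. The host names, job names, and the
`onall_sim` helper are all made up, and `echo` stands in for the real
`ssh "$host" "$job"`:

```shell
# Simulate --onall: each host runs all jobs, one at a time, in order;
# hosts run concurrently, so some may finish before others.
onall_sim() {
  for host in "$@"; do
    (
      for job in job1 job2 job3; do
        echo "$host: $job"   # real version would be: ssh "$host" "$job"
      done
    ) &
  done
  wait
}

onall_sim eos iris
```

Per-host output order is preserved; only the interleaving between
hosts is nondeterministic.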
>>
>> I intend this:
>>
>> parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\ \$2}'
>> ::: a b c ::: 1 2 3
>>
>> to do:
>>
>> parallel -S eos '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>> parallel -S iris '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>>
>> In practice I believe this could be easily implemented by having GNU
>> Parallel call parallel like this:
>>
>> parallel -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3} {2}) | awk
>> \{print\ \$2}'
>> parallel -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo {3} {2}) | awk
>> \{print\ \$2}'
>>
>> where I simply put 'a\nb\nc\n' and '1\n2\n3\n' into /tmp/abc and
>> /tmp/123 respectively. As the arguments are already being put into
>> temporary files, the change may be small. I believe this would work
>> out fine.
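
The proposal above can be sketched as a runnable outline. The host
names are hypothetical, and a leading `echo` stands in for the real
inner `parallel` invocation so the sketch runs without GNU Parallel
installed:

```shell
# Write each ::: argument source to a temp file, then start one inner
# 'parallel -a ... -j1 -S host' per sshlogin, all in parallel.
printf 'a\nb\nc\n' > /tmp/abc
printf '1\n2\n3\n' > /tmp/123

for host in eos iris; do
  # real version would drop the leading 'echo':
  echo parallel -a /tmp/abc -a /tmp/123 -j1 -S "$host" \
    '(echo {3} {2}) | awk \{print\ \$2}' &
done
wait
```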
>>
>> A small penalty is that if you run n jobs in parallel and have 2n
>> hosts, it will do all the jobs for hosts 1..n first and then all the
>> jobs for hosts n+1..2n. It will not run the first job on all hosts
>> first and then the second.
>>
>> - o -
>>
>> I have a harder time figuring how to deal with stdin:
>>
>> cat | parallel --onall -S eos,iris
>>
>> This should run whatever comes from cat on both eos and iris. While
>> the above is easy:
>>
>> cat | tee >(ssh eos) >(ssh iris) >/dev/null
>>
>> it becomes harder if you have so many hosts (10000s) that you cannot
>> login to all of them at the same time.
>>
>> Also this one is tricky as you have to keep the {n} working:
>>
>> cat | parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\
>> \$2}' :::: - ::: a b c ::: 1 2 3
>>
>> Maybe the solution is to accept that we have to read all of stdin
>> first, put that in a file and use -a as above?
>>
>> So the tricky one will be executed like:
>>
>> # Stuff everything from stdin into a file
>> cat > /tmp/stdin
>> # Call parallel for each host in parallel
>> parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3}
>> {2}) | awk \{print\ \$2}' &
>> parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo
>> {3} {2}) | awk \{print\ \$2}' &
>>
>> The price will be that if a slow program is generating the stdin,
>> then that program has to finish before GNU Parallel can even begin
>> executing the jobs. Ideally GNU Parallel should start executing the
>> jobs that it already knows have to be run.
>>
>> One way of solving that would be having a jobqueue for each sshlogin.
>> That, however, looks like a big change to the code.
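
For illustration only, the per-sshlogin jobqueue idea could look
roughly like this: a producer appends each incoming job to one queue
per host, and a consumer per host runs its queue in order. Everything
here (queue layout, host names, the here-document input) is a made-up
sketch, with `echo` standing in for `ssh`:

```shell
# One job queue per host: producer appends jobs as they arrive,
# consumers drain their own queue independently.
mkdir -p /tmp/q
for h in eos iris; do : > "/tmp/q/$h"; done

# producer: append each job line to every host's queue
while IFS= read -r job; do
  for h in eos iris; do printf '%s\n' "$job" >> "/tmp/q/$h"; done
done <<'EOF'
hostname
uptime
EOF

# one consumer per host runs its queue in order ('echo' stands in for ssh)
for h in eos iris; do
  ( while IFS= read -r job; do echo "$h: $job"; done < "/tmp/q/$h" ) &
done
wait
```

A real streaming version would need the consumers to block on a queue
that is still being filled, which is exactly the bigger code change the
paragraph above describes.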
>>
>> - o -
>>
>> People wanting to use GNU Parallel for running the same commands on a
>> list of hosts: can you please describe your situation, so the design
>> will work well. At the very least I need to know:
>>
>> * number of hosts (can we just log in to all of them simultaneously?)
>> * number of commands to be run (is it just 1 or is it a script
>> generated on stdin?)
>> * is speed an issue? (would it be OK to ssh for each command?)
>> * how are the commands generated? (is it a fast program, so it is OK
>> to read everything before executing anything?)
>>
>>
>> /Ole
>>
>>
>
>