
Re: Splitting STDIN to parallel processes (map-reduce on blocks of data)


From: Ole Tange
Subject: Re: Splitting STDIN to parallel processes (map-reduce on blocks of data)
Date: Wed, 19 Jan 2011 17:22:33 +0100

On Wed, Jan 19, 2011 at 1:25 AM, Ole Tange <tange@gnu.org> wrote:
> On Tue, Jan 11, 2011 at 4:32 PM, Ole Tange <tange@gnu.org> wrote:
>> You are hereby invited to help design a block-wise-map-reduce feature
>> of GNU Parallel. These are my current thoughts. Feel free to give your
>> input - especially if you need something similar.
>

The git version now contains --files, which does not remove the output
files for stdout but instead prints their filenames on stdout. It also
sends only one block to each process. This makes it possible to keep
order and do a parallel sort like this:

nice seq 1 1000000 | shuf | parallel --files --recend "\n" -j10 --spreadstdin sort -n |
  parallel -Xj1 sort -nm {} ";" rm {} >/tmp/sorted
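
As a quick sanity check (a sketch; it assumes bash process substitution
and that the pipeline above has been run), the merged output should be
identical to the input in numeric order, which for this input is simply
seq 1 1000000:

cmp <(seq 1 1000000) /tmp/sorted && echo "sorted output matches"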

Or gzipping a file in chunks in parallel:

seq 1 1000000 | parallel -k --recend "\n" -j10 --spreadstdin gzip -9 > foo.gz
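
Concatenated gzip members form a valid multi-member gzip file, so the
chunked output should round-trip cleanly. A small check (a sketch;
assumes bash process substitution):

gunzip -c foo.gz | wc -l                               # should print 1000000
cmp <(seq 1 1000000) <(gunzip -c foo.gz) && echo "round-trip OK"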

Get the git version:

git clone git://git.savannah.gnu.org/parallel.git
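
GNU Parallel is a single Perl script, so the checkout can typically be
tried without installing; the path below assumes the usual src/ layout
of the repository:

perl src/parallel --version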

Please give me feedback on whether sending only one block to each worker
process is better or worse for your use case.

Advantage:
- it is possible to --keep-order

Disadvantage:
- it requires a fork for each block instead of one fork per jobslot in total
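
One rough way to gauge the extra per-block fork cost is to time a
do-nothing command over the same input with and without --spreadstdin;
this is only a sketch, and the numbers will depend heavily on the
machine and the block size:

time seq 1 1000000 | parallel --recend "\n" -j10 --spreadstdin cat >/dev/null
time seq 1 1000000 | cat >/dev/null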


/Ole


