Re: Using gnu-parallel with processes that have an extended startup time


From: Ole Tange
Subject: Re: Using gnu-parallel with processes that have an extended startup time
Date: Tue, 16 Jul 2013 15:02:22 +0200

On Tue, Jul 16, 2013 at 2:20 PM, Diaa Sami <diaasami@gmail.com> wrote:

> Hi,
> I'm using gnu parallel with a custom python script that processes lines, one 
> line in, one or more lines out, and this script happens to have a long 
> startup time because of the kind of processing it has to perform on the 
> input (it has to load a dictionary into memory first).
> I was wondering if gnu parallel can just keep the processes running and just 
> feed them records rather than starting a process for each block.

So you are doing something like:

cat bigfile | parallel --pipe yourprogram > output

And no: GNU Parallel currently does not have an option for feeding
more blocks to a running instance.

It is a feature that I have been thinking of implementing, either:

* as a round-robin (pass a block to yourprogram in slot 1, then a
block to yourprogram in slot 2, then back to 1 and then 2 again).

or:

* as a non-blocking write that writes to any slot that is ready to
receive a new block. This would be the coolest solution, because it
would cope with some blocks being easy to process while others are
harder, but it might also be the hardest to implement.
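
Until something like this exists in GNU Parallel, the round-robin
variant can be approximated outside it. The following is only a sketch
under some assumptions: it relies on GNU coreutils split (the --filter
and -n r/N options), it hands out single lines rather than blocks, and
the sed command is a hypothetical stand-in for yourprogram. The slot
count of 2 is arbitrary.

```shell
# Deal input lines round-robin to 2 filter processes, each started
# only once and kept running until the input ends.
# The sed command is a stand-in for yourprogram (an assumption); it
# must read lines on stdin and write its results to stdout.
seq 1 10 |
  split -n r/2 -u --filter="sed 's/^/processed /'" > output
```

Each filter instance receives every second line, so the startup cost
is paid once per slot instead of once per block. The order of lines in
output is not guaranteed to match the input order, and a slow line in
one slot still holds up that slot, which is exactly the weakness the
non-blocking variant would address.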

A secondary problem is that this may be rather slow. As it is now, GNU
Parallel's --pipe tops out at 100 MB/s, and I expect either of the
implementations above would make this drop considerably.


/Ole


