
Re: Splitting STDIN to parallel processes (map-reduce on blocks of data)


From: Jay Hacker
Subject: Re: Splitting STDIN to parallel processes (map-reduce on blocks of data)
Date: Wed, 12 Jan 2011 13:56:48 -0500

I think this is a great feature, and I would definitely use it.

> Will it make more sense to do max-block size? What should happen if a single 
> record is bigger than max-block size?

I think you could do both, and just stipulate that at least one record
will always be read, regardless of max-block size.

> how do I check if a write to a pipe would block?

Spawn a thread to feed each jobslot, and just let it block.  Give each
"feeder" thread a record queue, and have an input reader thread fill
them round-robin.  In buffered mode, just keep filling them to allow
fast processes to go at their own pace.  In unbuffered mode, limit the
max queue size to 1 and block the reader on a full queue.  Overall
progress is limited by the slowest machine in the unbuffered case, but
you don't need any extra memory.  Preferably keep the queues on the
remote machines with -S (using a wrapper script) to take advantage of
distributed memory.
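The feeder design above can be sketched with one bounded queue per jobslot. A minimal illustration, assuming in-process Python threads stand in for jobslot pipes (the names `start_feeders`, `reader`, and `feeder` are mine, not anything in GNU parallel):

```python
import queue
import threading

def start_feeders(n_jobslots, unbuffered=False):
    """One queue per jobslot. In unbuffered mode maxsize=1, so the
    reader blocks on put() until the feeder has taken the last block;
    in buffered mode (maxsize=0, unlimited) fast slots run ahead."""
    maxsize = 1 if unbuffered else 0
    return [queue.Queue(maxsize=maxsize) for _ in range(n_jobslots)]

def reader(blocks, queues, sentinel=None):
    """Input reader thread: distribute blocks round-robin. put() blocking
    on a full queue is exactly the back-pressure unbuffered mode wants."""
    for i, block in enumerate(blocks):
        queues[i % len(queues)].put(block)
    for q in queues:
        q.put(sentinel)  # signal end of input to each feeder

def feeder(q, write_block, sentinel=None):
    """Feeder thread for one jobslot: drain its queue and write each
    block to the process. If the pipe write blocks, only this thread
    stalls; the other jobslots keep going."""
    while True:
        block = q.get()
        if block is sentinel:
            break
        write_block(block)
```

With two jobslots and four blocks, blocks 0 and 2 land on slot 0 and blocks 1 and 3 on slot 1; swapping `unbuffered=True` caps queued memory at one block per slot at the cost of pacing everything to the slowest consumer.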

Great idea.


