
parallel, splitting input files and starting


From: Christian Meesters
Subject: parallel, splitting input files and starting
Date: Mon, 2 Jan 2017 15:27:18 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.5.1

Hi and a happy new year to everyone reading this list,

I would like to call a program with "parallel" so that each instance of the program takes a certain chunk of the work from two input files. One condition is that the input files, or the chunks created from them, must have a multiple of 4 lines (e.g. 1e6 lines each), because each record to work on consists of 4 lines. There are two big input files. The first solution I came up with was to use "split", then create a file list and start the tasks with "parallel". The second, slightly faster solution was to use "split" to split the input files, and inotifywait plus plain bash to trigger the concurrent tasks and keep track of the running tasks.
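For reference, the first (split-based) approach looks roughly like the sketch below. It is only an illustration: the demo inputs, the chunk size of 8 lines, the `chunk1.`/`chunk2.` prefixes and the `pairs.tsv` file list are placeholders, and `echo` stands in for the real program so that only the command lines are printed.

```shell
#!/bin/sh
set -eu

# Hypothetical demo inputs: 12 lines each, i.e. three 4-line records per file.
workdir=$(mktemp -d)
seq 1 12 > "$workdir/input1"
seq 101 112 > "$workdir/input2"

# Chunk size in lines; it must be a multiple of 4 so that no record is cut.
lines_per_chunk=8

# Split both inputs the same way, so chunk N of input1 matches chunk N of input2.
split -l "$lines_per_chunk" "$workdir/input1" "$workdir/chunk1."
split -l "$lines_per_chunk" "$workdir/input2" "$workdir/chunk2."

# Build the file list: one tab-separated pair of chunk files per line.
ls "$workdir"/chunk1.* > "$workdir/list1"
ls "$workdir"/chunk2.* > "$workdir/list2"
paste "$workdir/list1" "$workdir/list2" > "$workdir/pairs.tsv"

# Run one task per pair. "echo" only prints the command line that would run.
if parallel --version 2>/dev/null | grep -q 'GNU'; then
    parallel --colsep '\t' -j 2 \
        echo cmd some/path/to/a/global/reference {1} {2} \
        :::: "$workdir/pairs.tsv"
else
    # Sequential fallback when GNU parallel is not installed.
    tab=$(printf '\t')
    while IFS="$tab" read -r chunk_a chunk_b; do
        echo cmd some/path/to/a/global/reference "$chunk_a" "$chunk_b"
    done < "$workdir/pairs.tsv"
fi
```

The drawback of this variant is that every chunk is written to disk before any task can start, which is what the inotifywait variant tries to avoid.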

My first question is: is "parallel" able to read a given number of lines from two input files and distribute the work without using "split", while keeping the advantage of the inotifywait approach?

I tried

parallel --no-notice --max-lines=200 -j x cmd some/path/to/a/global/reference {1} {2} ::: ./input1 ::: ./input2

which starts only one instance of cmd rather than x instances.

I also tinkered with --pipe and the options discussed in this thread: http://unix.stackexchange.com/questions/66463/using-gnu-parallel-with-split - to no avail. With --pipe and --max-lines, "parallel" starts two perl processes to read the buffers and does not proceed.

Any ideas on how this could be achieved with two input files?

The second question would be: if this more direct approach does not work, could "parallel" support inotify? If so, how?

And a last, unrelated question: if I start several tasks with "parallel" on several hosts of a cluster, supplying -S with a host list from a batch system, does "parallel" copy the input files to the hosts, or can it, when given a full path, use a parallel file system? I think I would need to supply a bash function that extracts the path from a file list which is copied to the hosts, but perhaps someone has an idea how a parallel file system could be used instead?

I hope this was understandable.

Best regards,

Christian Meesters



