parallel, splitting input files and starting
From: Christian Meesters
Subject: parallel, splitting input files and starting
Date: Mon, 2 Jan 2017 15:27:18 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.5.1
Hi and a happy new year to everyone reading this list,
Well, I would like to call a program with "parallel" where each instance
of the program takes a certain chunk of the work of two input files. One
condition is that the input files or the created chunks need to have a
multiple of 4 lines (e.g. 1e6 lines each), because each record to work
on consists of 4 lines. There are two big input files and the first
solution I came up with was using "split" and then creating a file list
and starting the tasks with "parallel". The second, slightly faster
solution was to use "split" to split the input files, and inotifywait plus
plain bash to trigger the concurrent tasks and keep track of the running tasks.
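For reference, the first approach can be sketched roughly as follows. This is a minimal sketch, not my actual script: the function name chunk_pairs, the chunk size, the output directory and "-j 4" are illustrative assumptions, and "cmd" stands for the real program. It splits both inputs into chunks whose line count is a multiple of 4 and prints one tab-separated pair of chunk files per line.

```shell
# chunk_pairs IN1 IN2 LINES OUTDIR
# Split IN1 and IN2 into chunks of LINES lines each (LINES must be a
# multiple of 4, since one record is 4 lines) and print the matching
# chunk files as "chunk_of_IN1<TAB>chunk_of_IN2", one pair per line.
# All names here are hypothetical placeholders.
chunk_pairs() {
    in1=$1; in2=$2; lines=$3; outdir=$4

    if [ $((lines % 4)) -ne 0 ]; then
        echo "chunk size must be a multiple of 4" >&2
        return 1
    fi

    # -a 4 gives four-letter suffixes (aaaa, aaab, ...), so the chunks
    # of both inputs get the same suffix in the same order.
    split -l "$lines" -a 4 "$in1" "$outdir/a."
    split -l "$lines" -a 4 "$in2" "$outdir/b."

    for a in "$outdir"/a.*; do
        b="$outdir/b.${a##*.}"
        if [ -e "$b" ]; then
            printf '%s\t%s\n' "$a" "$b"
        fi
    done
}

# Usage sketch (cmd and the paths are placeholders):
#   mkdir -p ./chunks
#   chunk_pairs ./input1 ./input2 1000000 ./chunks |
#       parallel --colsep '\t' -j 4 cmd some/path/to/a/global/reference {1} {2}
```

The drawback of this variant is that no task can start before both "split" runs have finished, which is what the inotifywait variant avoids.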
My first question is: Is "parallel" able to read a given number of lines
from two input files and distribute the work without using "split",
while keeping the advantage of the inotifywait approach?
I tried
    parallel --no-notice --max-lines=200 -j x cmd \
        some/path/to/a/global/reference {1} {2} ::: ./input1 ::: ./input2
which starts one instance of cmd and not x instances.
I also tinkered with --pipe and the options suggested in this thread:
http://unix.stackexchange.com/questions/66463/using-gnu-parallel-with-split
but to no avail. With --pipe and --max-lines, "parallel" starts two perl
processes to read the buffers and does not proceed.
Any ideas on how this could be achieved with two input files?
The second question would be: If this more direct approach does not
work, could "parallel" support inotify? If so, how?
And a last, unrelated question: If I start several tasks with
"parallel" on several hosts of a cluster, supplying -S with a host list
from a batch system, does parallel copy the input files to the hosts or,
given a full path, can it use a parallel file system? I think I would
need to supply a bash function that extracts the path from a copied
file list, but perhaps someone has an idea how a parallel file system
could be used instead?
I hope this is understandable.
Best regards,
Christian Meesters