Anders,
Take a look at the --sqlmaster and --sqlworker options.
I use them to effectively create a job queue that any node
can pull tasks from. I do this for long-running backups on a
parallel filesystem (all nodes have read/write access to the
data and the SQL joblog file).
1. Create a list of "tasks" and send it to parallel
invoked with the --sqlmaster option. With --sqlmaster, parallel
creates the joblog and exits without running anything.
2. On any machine that has access to the joblog file AND
the data, run parallel with the --sqlworker option. As new
machines become available, you can start parallel on them in the
same manner. To stop work on a particular node, send a TERM
signal (not KILL, which cannot be caught) to the parallel
process on that node; it will stop spawning new jobs and exit
after existing tasks have completed. Newer releases use HUP
for this, so check the man page for your version.
In my case, each "task" is a bash script file, and I list
them, one per line, in a tasklist file, such as:
/path/to/001.cmd
/path/to/002.cmd
...
/path/to/675.cmd
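A tasklist like the one above can be generated rather than written by hand. The following is only a minimal sketch: build_tasklist is a hypothetical helper name, and it assumes all the .cmd scripts sit directly in one directory and that find supports -maxdepth (GNU and BSD find do).

```shell
#!/usr/bin/env bash
# build_tasklist DIR OUT   (hypothetical helper, not part of GNU Parallel)
# Writes the path of every *.cmd file directly inside DIR to OUT,
# one per line, sorted so 001.cmd, 002.cmd, ... stay in order.
# Paths are absolute only if DIR is given as an absolute path.
build_tasklist() {
    local dir=$1 out=$2
    find "$dir" -maxdepth 1 -type f -name '*.cmd' | sort > "$out"
}
```

For example, build_tasklist /path/to /path/to/tasklist would produce the file passed to parallel with -a.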
The parallel sqlmaster cmdline is then:
parallel -a "/path/to/tasklist" --sqlmaster "$DBURL" bash
The DBURL is now a task queue as well as a joblog.
The parallel sqlworker cmdline is:
parallel --sqlworker "$DBURL"
Some advantages here are:
+ The original (sqlmaster) host does not have to control
the parallel process and keep spawning new tasks on all the
workers.
+ The worker nodes can each run at their own width (-j
option). This might allow you to run a low task count on the
worker nodes without interfering with other users on the
node. You could even stop and restart with different -j
values as needed throughout the day.
+ Worker nodes can be started simply by running parallel on
each, and stopped gracefully by sending a TERM signal (HUP in
newer releases) to the local parallel on that node.
NOTE: The sql* options have changed very recently, so
make sure you are using the most recent version of parallel.
Hope this is helpful.
Cheers,
--Andy
On 14/03/2017 at 10:54, Anders Lind <anders.lind@icm.uu.se> wrote:
> I could perhaps set this up using the ssh functionality of parallel, but I
> would need to be able to on the fly stop some machines from running jobs,
> since the computers belong to co-workers who sometimes need their computers
> for their own work.
Hi Anders,
The following thread may interest you:
Dynamically changing remote servers list
https://lists.nongnu.org/archive/html/parallel/2014-08/msg00012.html
Based on that, at the time I made a shell script that keeps parallel's
sshloginfile updated by filtering out unreachable remote servers and also
allowing the user to edit (include and/or exclude remote servers) on the fly:
https://github.com/daaugusto/gnuparallel
PS: It worked with older versions of GNU Parallel (I haven't tested it with
more recent ones yet), so your mileage may vary.
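The filtering idea can be illustrated as follows. This is not the script from the repository above, only a minimal sketch: is_reachable and update_sshloginfile are hypothetical names, the ssh probe options are one possible choice, and it assumes (as the linked thread discusses) that parallel picks up changes to the sshloginfile while running.

```shell
#!/usr/bin/env bash
# is_reachable HOST   (hypothetical probe)
# Succeeds if a non-interactive ssh login works within 5 seconds.
is_reachable() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# update_sshloginfile CANDIDATES OUT   (hypothetical helper)
# Copies every reachable host from CANDIDATES (one per line, blank
# lines and '#' comments skipped) into OUT. The result is written to
# a temp file and renamed so readers never see a half-written list.
update_sshloginfile() {
    local candidates=$1 out=$2 tmp host
    tmp=$(mktemp "${out}.XXXXXX") || return 1
    while IFS= read -r host; do
        if [ -z "$host" ]; then continue; fi
        case $host in \#*) continue ;; esac
        # </dev/null keeps ssh from swallowing the rest of the list.
        if is_reachable "$host" </dev/null; then
            printf '%s\n' "$host"
        fi
    done < "$candidates" > "$tmp"
    mv "$tmp" "$out"
}
```

Run periodically (from cron or a sleep loop), with co-workers removing their machine from the candidates file when they need it.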
--
Douglas A. Augusto