Re: Parallel ... thank you (and a couple of questions)


From: Ole Tange
Subject: Re: Parallel ... thank you (and a couple of questions)
Date: Thu, 3 Apr 2014 18:11:14 +0200

On Thu, Apr 3, 2014 at 12:51 AM, Bryan Dongray <btd@dongrays.com> wrote:
> Hello Ole,
>
> I recently found "GNU parallel" and wish to congratulate you AND was pleased
> to see someone finally got an official program that enables users to easily
> run commands in parallel.
:
> Honestly, I have been VERY surprised that some version of a code that runs
> processes in parallel has only recently made it considering how long
> multi-CPU machines, or machines connected on networks, have been around.
>
> Anyway I'd like to mention that I've had my own version of parallel

There is a high percentage of GNU Parallel users that have at some
point written their own parallelizing script, so you are in good
company :-)

> I'll admit I've yet to understand ALL the arguments possible in GNU
> parallel, I found it a surprisingly big list,

Absolutely true. Even I have a hard time remembering them all: Some of
them are due to xargs compatibility, others are used by a certain
branch of users (e.g. --results for researchers and --nonall for
sysadmins). And I am starting to see users combine options in creative
ways that I never thought of.

My recommendation would be to learn the 20% that makes the most sense
to you, and only when you hit a problem that cannot easily be solved
with those, look to see if there is an option which solves that.

> but overall I say it looks
> like your parallel is written to enable the user to not write a separate
> script, but to do it via the command line, well done!

Thanks. If you like GNU Parallel:

* Walk through the tutorial
(http://www.gnu.org/software/parallel/parallel_tutorial.html)
* Give a demo at your local user group/team/colleagues
* Post the intro videos and tutorial on Reddit/Diaspora*/forums/blogs/
Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
* Request or write a review for your favourite blog or magazine
* Invite me for your next conference

If you use GNU Parallel for research:

* Please cite GNU Parallel in your publications (use --bibtex)

If GNU Parallel saves you money:

* (Have your company) donate to FSF https://my.fsf.org/donate/

> Since it looks like you've been coding it since 2007, I'd be interested in
> stories you have about it, not necessarily just coding, but also any strange
> experiences of getting it into the official GNU source tree.

The year 2007 is deceptive: This was when git was introduced. For a
longer history see:
http://www.gnu.org/software/parallel/history.html

The craziest bug I experienced was a bug that was only present on
RedHat 5.6's version of Perl. It was a race condition which only
happened in 1:1000000 - that is, until I figured out what the problem
was, then I could craft input so it would be reproduced with a chance
of 1:20 even on other systems.
https://savannah.gnu.org/bugs/index.php?33352

> I use it on Linux systems and my Windows systems!

Have you been able to use it on Windows through ssh (i.e. automated)?

> I do have a couple of questions...
>
> I quickly learned that an environment variable was very useful, so I have
> $PARALLELID in my version of parallel, but I don't see an equivalent
> variable.

And there is not.

> This is not quite like $PARALLEL_SEQ, mine is like the jobslot,
:
> My question: Is there a way in GNU parallel that subprocesses can know which
> jobslot they're in?

The jobslot is a virtual construct that is only used in the
documentation: While you might think it would be a central construct,
it does not exist at all in the code!

> If currently not, might I suggest adding some code to give an environment
> variable $PARALLEL_JOBSLOT - or something named like that?

Should definitely be doable. The obvious solution would simply be to have

PARALLEL_JOBSLOT = PARALLEL_SEQ mod Number_of_jobslots.

There is one thing that makes it a bit more tricky. Number_of_jobslots is
not a constant: It can change over time (e.g. with '--jobs procfile'
where procfile is changed). So you cannot depend on jobslot 1 to be
followed by jobslot 2.
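
A minimal sketch of the formula above, in Python, to show what happens
when the number of jobslots changes mid-run. This is only an
illustration, not GNU Parallel's code: the function names and the
example slot counts are assumptions, and the formula is taken literally,
so slots are numbered 0..N-1.

```python
def naive_jobslot(seq: int, number_of_jobslots: int) -> int:
    """Literal reading of: PARALLEL_JOBSLOT = PARALLEL_SEQ mod Number_of_jobslots."""
    return seq % number_of_jobslots


def slots_for_run(slot_counts: list[int]) -> list[int]:
    """slot_counts[i] is the number of jobslots in effect when job i+1 starts.

    Job sequence numbers start at 1, like PARALLEL_SEQ.
    """
    return [naive_jobslot(seq, n) for seq, n in enumerate(slot_counts, start=1)]


# Constant number of jobslots: the slot numbers cycle predictably.
constant = slots_for_run([4, 4, 4, 4, 4, 4, 4, 4])
print(constant)  # [1, 2, 3, 0, 1, 2, 3, 0]

# The number of jobslots drops from 4 to 3 after job 4
# (e.g. because the procfile was edited): the cycle is broken.
changing = slots_for_run([4, 4, 4, 4, 3, 3, 3, 3])
print(changing)  # [1, 2, 3, 0, 2, 0, 1, 2]
```

In the second run, slot 0 is followed by slot 2 rather than slot 1, so a
job cannot infer its neighbours from its own slot number.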

Transferring yet another variable through SSH will also lower the
maximal command line length - making '-vv' look even more crazy.
Probably not a big problem, but still worth mentioning.

But maybe it is worth just trying to implement a simple version first
and see how many problems arise.

https://savannah.gnu.org/bugs/index.php?42041
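
One possible shape for such a simple version, sketched in Python: hand
out the lowest free slot number when a job starts and take it back when
the job finishes. This is a different technique from the modulo formula,
and it is purely hypothetical: the class `SlotPool` and its methods are
made up for illustration and say nothing about how GNU Parallel would
actually implement it.

```python
import heapq


class SlotPool:
    """Hypothetical slot allocator: always hands out the lowest free slot number."""

    def __init__(self) -> None:
        self._free: list[int] = []  # min-heap of slot numbers that were released
        self._next_new = 1          # next slot number that has never been used

    def acquire(self) -> int:
        """Called when a job starts: reuse a released slot if there is one."""
        if self._free:
            return heapq.heappop(self._free)
        slot = self._next_new
        self._next_new += 1
        return slot

    def release(self, slot: int) -> None:
        """Called when a job finishes: make its slot available again."""
        heapq.heappush(self._free, slot)


pool = SlotPool()
a = pool.acquire()  # first job gets slot 1
b = pool.acquire()  # second job gets slot 2
c = pool.acquire()  # third job gets slot 3
pool.release(b)     # the second job finishes first
d = pool.acquire()  # the next job reuses slot 2
e = pool.acquire()  # no free slot left, so a new slot 4 is created
print(a, b, c, d, e)  # 1 2 3 2 4
```

Because slots are only numbers taken from and returned to a pool, the
sketch does not need to know the total number of jobslots, so a changing
'--jobs' value would only affect how many jobs are allowed to call
acquire() at the same time.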

> Lastly, one more thought... Have you ever thought to get parallel as part of
> the cygwin release? I didn't see it there.

GNU Parallel is on its way to becoming part of OIN's Linux Definition:
http://www.openinventionnetwork.com/pat_linuxdef.php

While I would welcome getting GNU Parallel in Cygwin, I will probably
not be the one pushing for it: I very rarely use Microsoft
Windows and I do not think I have ever used Cygwin for anything real.
Without an automated way of testing GNU Parallel on Cygwin, chances
are also that some options will not work well on Cygwin. I will be
happy to accept patches for GNU Parallel on Cygwin, but I do not see
myself spending time maintaining it on Cygwin - at least not without
being paid to do so. If someone else steps up to maintain it on
Cygwin, I will be fine with that.


/Ole


