Re: First time user: when to use Parallel?


From: Shlomi Fish
Subject: Re: First time user: when to use Parallel?
Date: Sat, 15 Feb 2014 07:56:21 +0200

Hi Junchen,

see below for my response. Just note that I'm not a Parallel core developer and
am speaking from general knowledge.

On Fri, 14 Feb 2014 17:00:28 -0600
Junchen Gu <jcgu113@gmail.com> wrote:

> Hi,
> 
> Thanks for making this cool tool. I've tried to use it but I have a
> question about the rationale behind it and would like to hear your
> opinions on how to best utilize the tool.
> 
> I work in a computational biology lab. Recently I have to align many
> short reads onto the genome and I'm using novoalign which has an option
> to let me specify the maximum number of threads to use. I'm using a
> shared server so I typically set it at 8 threads max for one novoalign
> job and run all novoalign jobs one by one.
> 
> I just tried to run all 6 novoalign with parallel still with 8 threads
> each then my own job would basically use up 95% of the memory, which
> is not very good... so I have to lower the maximum number of threads
> for each job.
> 
> I'm wondering then which way of running all the jobs is faster?
> Running each job one by one with a higher number of threads or using
> Parallel with a lower number of threads?
> 
> In a more general sense, when do you choose to use Parallel instead of
> running jobs sequentially?

Well, I'm not familiar with how threads help with your use case, but note the
following:

1. In general, if $number_of_jobs * $number_of_threads_per_job exceeds the
number of available “processors” (where “processors” can also mean the total
number of cores or hyperthreads, although separate processors, or even
separate physical, networked machines, will likely yield better throughput),
then the threads have to share the processors and cannot all run at the same
time. So while there is a chance that the whole batch will finish slightly
sooner, it may just as well be somewhat slower, and it is unlikely to scale
linearly with the total number of assigned threads.

2. In general, multitasking works best if the tasks are independent and do not
involve a lot of synchronisation between one another. To take my work on
Freecell Solver ( http://fc-solve.shlomifish.org/ ) as an example, I split a
“large” (= 32,000 in number) range of https://en.wikipedia.org/wiki/FreeCell
deals (generated using this algorithm -
http://rosettacode.org/wiki/Deal_cards_for_FreeCell - which is based on the
predictable https://en.wikipedia.org/wiki/Pseudorandom_number_generator that
originated in Microsoft’s C compilers) into chunks of 10 or so, and solve
each chunk separately using a completely serial range solver, which writes its
output to a file, in a certain directory, named after the sequential number of
the chunk. After all is done, I just "cat *" the files together to join them.

You can look at the code here:

https://bitbucket.org/shlomif/fc-solve/src/b41030005b17bdc79601fdaf197fb322e5d09dd9/fc-solve/scripts/?at=master

(see the *gnu-par* related scripts, though I'm not sure how easy they would be
to deploy as is).
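
As a rough illustration of the chunking idea from point 2, here is a minimal
Python sketch. It is not the actual fc-solve scripts: the names chunk_ranges,
solve_range and run are hypothetical, and solve_range only writes placeholder
lines where a real serial range solver would write its results.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def chunk_ranges(first, last, size):
    """Yield (index, start, end) for consecutive inclusive ranges of deals."""
    for index, start in enumerate(range(first, last + 1, size)):
        yield index, start, min(start + size - 1, last)


def solve_range(job):
    """Hypothetical stand-in for a completely serial range solver.

    Writes one placeholder line per deal to a file named after the
    sequential number of the chunk, and returns the path of that file.
    """
    out_dir, index, start, end = job
    # Zero-padded names keep the lexicographic order equal to the chunk order.
    path = Path(out_dir) / f"{index:05d}.txt"
    with open(path, "w") as fh:
        for deal in range(start, end + 1):
            fh.write(f"deal {deal}: (solver output would go here)\n")
    return path


def run(first, last, size, out_dir, workers=2):
    """Solve all chunks in independent processes, then join the outputs."""
    jobs = [(out_dir, i, s, e) for i, s, e in chunk_ranges(first, last, size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(solve_range, jobs))
    # Rough equivalent of "cat *" in the output directory.
    return "".join(p.read_text() for p in sorted(Path(out_dir).iterdir()))
```

The chunks never talk to each other, so the only "synchronisation" is waiting
for all of them to finish before joining the files. When calling run() from a
script, it is safest to do so under an "if __name__ == '__main__':" guard.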

This works pretty well, but I had the opposite problem (= over-utilisation
instead of under-utilisation): running the tasks made my Core i3 processor
(on my desktop machine) overheat at times.
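
To make the arithmetic from point 1 concrete, here is a small Python sketch
that compares $number_of_jobs * $number_of_threads_per_job against the number
of processors. The function name thread_budget is hypothetical, and the figure
of 16 processors in the usage note is only an assumption for illustration. It
counts CPU threads only and says nothing about memory, which may well be the
tighter limit on a shared server.

```python
import os


def thread_budget(jobs, threads_per_job, processors=None):
    """Compare the requested thread count with the available processors."""
    if processors is None:
        # Logical processors as seen by the OS (cores or hyperthreads).
        processors = os.cpu_count() or 1
    total = jobs * threads_per_job
    return {
        "total_threads": total,
        "processors": processors,
        # True when the threads must share processors.
        "oversubscribed": total > processors,
        # Largest per-job thread count that avoids sharing (at least 1).
        "max_threads_per_job": max(1, processors // jobs),
    }
```

For example, thread_budget(6, 8, processors=16) reports 48 threads requested
on 16 processors, i.e. oversubscribed, with room for 2 threads per job if all
6 jobs run at once.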

-----------------

What I said is not specific to GNU parallel, and as you probably realise it
uses autonomous processes - not threads. Hope it helps and it wasn't too
obvious.

Regards,

        Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Rethinking CPAN - http://shlom.in/rethinking-cpan

He who reinvents the wheel will likely design a square wheel and spend a year
trying to figure out why it doesn’t work properly.
    — Nadav Har’El

Please reply to list if it's a mailing list post - http://shlom.in/reply .


