
Re: parallel + blast + LSF


From: Giuseppe Aprea
Subject: Re: parallel + blast + LSF
Date: Wed, 15 Apr 2015 19:20:40 +0200

Hi, 

Again, I am sorry, but I do not understand your problem or what you actually need. It seems you could simply put all the options between bsub and the command in your script:
"bsub -o %J.out -q queue command [options]"
If you use %J, you get a separate LSF stdout file for each job, which helps with debugging (of course, as an avid LSF user you already know that).

Incidentally, if you use parallel instead of a for loop to submit jobs, I would guess you are more likely to run into those file descriptor limits, etc.
Moreover, a for loop, in your example, isn't that much more difficult, in my opinion.
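
For illustration, a minimal sketch of such a for loop follows. It is only a sketch: "queue" and "my_command" are placeholders rather than names from this thread, the BSUB variable and submit_all function are made up for the example, and the loop runs in dry-run mode (it only prints the bsub command lines) unless BSUB=bsub is set.

```shell
#!/bin/bash
# Dry run by default: BSUB expands to "echo bsub", so nothing is submitted.
# Set BSUB=bsub in the environment to submit for real.
BSUB=${BSUB:-echo bsub}

# submit_all FIRST LAST: one bsub call per index in FIRST..LAST.
# "queue" and "my_command" are placeholders, not names from this thread.
submit_all() {
    local first=$1 last=$2 i
    for ((i = first; i <= last; i++)); do
        # Single quotes keep %J literal; LSF replaces it with the job ID.
        $BSUB -o '%J.out' -q queue my_command "$i"
    done
}

submit_all 1 3
```

Note that a plain loop like this submits everything immediately; it does not throttle by itself.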

g






On Wed, Apr 15, 2015 at 5:48 PM, Martin d'Anjou <martin.danjou14@gmail.com> wrote:
Hi,

Thanks for clarifying. I want to use GNU Parallel to bsub jobs. This way I can use GNU Parallel to throttle the number of jobs that are submitted to LSF, and it is easier than writing a loop.

parallel -j 100 my_script [bsub options] ::: {1..2000}

my_script (pseudo-code):
#!/bin/bash
...
bsub [bsub options] command ...
post-process data

This way I can submit jobs, say, 100 at a time. When I submit all 2000 jobs at once, it gets problematic and I start hitting limits with file descriptors, etc.
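
A hedged sketch of what such a wrapper might look like is given here. It assumes the wrapper blocks until the LSF job has finished (bsub -K is used for that), since otherwise GNU Parallel would have nothing to throttle on. "queue" and "my_command" are placeholders, the BSUB variable is made up for the example, and the wrapper only prints the bsub command line unless BSUB=bsub is set.

```shell
#!/bin/bash
# Dry run by default: BSUB expands to "echo bsub", so nothing is submitted.
# Set BSUB=bsub in the environment to submit for real.
BSUB=${BSUB:-echo bsub}

# my_script N: submit one job, wait for it, then post-process its output.
# "queue" and "my_command" are placeholders, not names from this thread.
my_script() {
    local n=$1
    # -K makes bsub wait until the job has completed, so each GNU Parallel
    # job slot stays occupied for as long as its LSF job is running.
    $BSUB -K -o '%J.out' -q queue my_command "$n" || return 1
    # Stand-in for the real post-processing step.
    echo "post-process $n"
}

# Make the function and the BSUB setting visible to child bash shells.
export -f my_script
export BSUB

my_script 1
```

With GNU Parallel installed and bash as the shell, running "parallel -j 100 my_script ::: {1..2000}" after these definitions would then keep at most 100 such wrappers, and hence at most 100 submitted jobs, active at any time.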

Thanks for sharing,
Martin


On 15-04-15 11:35 AM, Giuseppe Aprea wrote:
Hi Martin,

I am not sure I understand. As far as I can see, things work the other way around: you have an LSF script which launches GNU Parallel on the hosts provided by LSF. Something like:

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
#!/bin/bash

#BSUB -J gnuParallel_blast_test   # Name of the job.
#BSUB -o %J.out                   # Appends std output to file %J.out. (%J is the Job ID)
#BSUB -e %J.err                   # Appends std error to file %J.err.
#BSUB -q large                    # Queue name.
#BSUB -n 30                       # Number of CPUs.

module load 4.8.3/ncbi/12.0.0
module load 4.8.3/parallel/20150122

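# Count the lines of the LSF host file (taken here as the number of slots).
SLOTS=$(wc -l < "${LSB_DJOB_HOSTFILE}")

# Build the sshlogin file for GNU Parallel, starting from an empty file
# so that a rerun in the same directory does not append duplicates.
: > servers
for i in $(sort "${LSB_DJOB_HOSTFILE}")
do
    echo "/afs/enea.it/software/bin/blaunch.sh ${i}" >> servers
done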

cat absolute_path_to_sequences.fasta | parallel --no-notice -vv -j ${SLOTS} --slf servers --plain --recstart '>' -N 1 --pipe blastp -evalue 1e-05 -outfmt 6 -db absolute_path_to_db_file -query - -out absolute_path_to_result_file_{%}
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

LSF is the one that gives you the execution hosts, so if you are launching bsub from GNU Parallel, how do you know what to put in the --slf option?
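
To make the dependency on LSF concrete, here is a small sketch of turning the host file into an sshlogin file with per-host slot counts. It assumes the file named by LSB_DJOB_HOSTFILE has one line per allocated slot, and it uses the "ncpu/host" form that GNU Parallel accepts in --slf files. The function name hostfile_to_slf and the node names are made up for the example, and the blaunch.sh prefix from the script above is left out.

```shell
#!/bin/bash
# hostfile_to_slf FILE: collapse a host file with one line per slot
# (an assumption about LSB_DJOB_HOSTFILE) into "slots/host" lines.
hostfile_to_slf() {
    sort "$1" | uniq -c | awk '{ print $1 "/" $2 }'
}

# Demo with a made-up host file; inside a job, pass "$LSB_DJOB_HOSTFILE".
demo=$(mktemp)
printf '%s\n' node02 node01 node02 node01 node01 > "$demo"
hostfile_to_slf "$demo"
rm -f "$demo"
```

For the made-up file this prints 3/node01 and 2/node02. Either way, the host list only exists once the job is running, which is why the sshlogin file has to be built inside the LSF script.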


g



On Wed, Apr 15, 2015 at 4:24 PM, Martin d'Anjou <martin.danjou14@gmail.com> wrote:
On 15-04-15 09:34 AM, Giuseppe Aprea wrote:
Hi all,

I would like to ask for some help using parallel with the blast alignment software.

I am trying to use GNU parallel v. 20150122 with blast for a very large sequence alignment. I am using Parallel on a cluster which uses LSF as its queue system.

Hello Giuseppe,

I am an avid LSF user, and I want to use GNU Parallel to dispatch jobs to LSF. Could you please explain a little bit to me how GNU Parallel works with LSF? I do not see it in the on-line tutorials. For example, I would like to understand how to pass "bsub" options like -oo, -q queue_name, etc. to LSF from GNU Parallel.

Thanks,
Martin





