parallel + blast + LSF


From: Giuseppe Aprea
Subject: parallel + blast + LSF
Date: Wed, 15 Apr 2015 15:34:55 +0200

Hi all,

I would like to ask for some help using GNU parallel with the BLAST alignment software.


I am trying to use GNU parallel v. 20150122 with BLAST to align a very large set of sequences. I am running parallel on a cluster that uses LSF as its queue system.
The input file is a text file containing the query sequences, with this structure:

>seq1
ACCGGTTTGATTAGATTCCA
CCGGTTTGATTAGATTCCAC
CCACCGGTTTGATTAGATTC
CCACCGGTTTGATATTCCAT
>seq2
ACCGGATTAGATTCCACCTA
ACCGGATTAGATTCCACCTA
ACCGGATTAGATTCCACCTA
ACCGGATTAGATTCCACCTA
....
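
Since every record in this layout starts with a header line beginning with ">", the number of records and the content of each one can be checked in plain shell, independently of GNU parallel. The following is only a sketch: the function names and the output prefix are made up for illustration and are not part of the actual job.

```shell
# Count FASTA records: every record starts with a header line beginning with '>'.
# $1: path of the FASTA file
count_fasta_records() {
    grep -c '^>' "$1"
}

# Split a FASTA stream read from stdin into one file per record,
# named PREFIX_1, PREFIX_2, ... (lines before the first header are dropped).
# $1: output prefix (hypothetical, chosen by the caller)
split_fasta_records() {
    awk -v prefix="$1" '
        /^>/ { if (out) close(out); n++; out = prefix "_" n }
        out  { print > out }
    '
}
```

For example, `count_fasta_records sequences.fasta` prints how many query sequences the file holds, and `split_fasta_records /tmp/rec < sequences.fasta` writes each record to its own file so a single one can be passed to blastp by hand.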

In this example I submit the following command, asking LSF for 24 slots:

cat absolute_path_to_sequences.fasta | parallel --no-notice -vv -j 24 --slf servers --plain --recstart '>' -N 1 --pipe blastp -evalue 1e-05 -outfmt 6 -db absolute_path_to_db_file -query - -out absolute_path_to_result_file_{%}

"servers" is this file:

/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
/afs/enea.it/software/bin/blaunch.sh cresco3x013.portici.enea.it
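
A file like this has one line per slot. For reference, such a file could be produced inside the job from LSF's LSB_HOSTS environment variable, which lists the execution hosts with one entry per allocated slot. This is a sketch under the assumption that LSB_HOSTS is set in the job environment, and not necessarily how the file above was created; the function name make_servers_file is hypothetical.

```shell
# Print one "launcher host" line per slot listed in LSB_HOSTS.
# $1: path of the launcher wrapper (e.g. blaunch.sh); it is copied verbatim.
# Assumption: LSB_HOSTS holds whitespace-separated host names, one per slot.
make_servers_file() {
    launcher=$1
    # LSB_HOSTS is deliberately unquoted so the shell splits it on whitespace.
    for host in $LSB_HOSTS; do
        printf '%s %s\n' "$launcher" "$host"
    done
}
```

Inside a job it could be called as `make_servers_file /afs/enea.it/software/bin/blaunch.sh > servers`.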

I have to use blaunch instead of ssh here. From the command above I expect GNU parallel to run 24 BLAST instances on the slots assigned by LSF and listed in "servers". I also expect 24 result files, which I will concatenate later.
Unfortunately that is not what happens.
The DB and the query sequences are exactly the same, so output is expected, but it is missing. I only get SOME empty files named absolute_path_to_result_file_XX.
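
The intended post-processing step, concatenating the per-slot result files, can also flag the empty ones. This is a minimal sketch, assuming the result files share a common prefix followed by "_" and a number, as in the command above; merge_results and its arguments are hypothetical names.

```shell
# Concatenate all non-empty files named PREFIX_* into MERGED and
# report the empty ones on stderr. Files are visited in glob
# (lexicographic) order, so PREFIX_10 comes before PREFIX_2.
# $1: common prefix of the per-slot result files
# $2: merged output file (must not itself match PREFIX_*)
merge_results() {
    prefix=$1
    merged=$2
    : > "$merged"                      # start from an empty merged file
    for f in "$prefix"_*; do
        [ -e "$f" ] || continue        # the glob matched nothing
        if [ -s "$f" ]; then
            cat "$f" >> "$merged"
        else
            echo "empty result file: $f" >&2
        fi
    done
}
```

For example, `merge_results absolute_path_to_result_file all_hits.tsv` would collect the tabular hits into all_hits.tsv and list each empty file on stderr.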

My problems are:

- I should be able to control exactly the number of BLAST instances. How should I set up the servers file and the "-j" option for that?
- The result files are empty and I see the following messages:

"......
 sh -c 'dd bs=1 count=1 of=/tmp/par9piqe.chr 2>/dev/null';  test ! -s "/tmp/par9piqe.chr" && rm -f "/tmp/par9piqe.chr" && exec true;  (cat /tmp/par9piqe.chr; rm /tmp/par9piqe.chr; cat - ) | (/afs/enea.it/software/bin/blaunch.sh cresco3x018.portici.enea.it exec perl -e \\\$ENV\\\{\\\"PARALLEL_PID\\\"\\\}=\\\"30669\\\"\\\;\\\$ENV\\\{\\\"PARALLEL_SEQ\\\"\\\}=\\\"648\\\"\\\;\\\$bashfunc\\\ =\\\ \\\"\\\"\\\;@ARGV=\\\"blastp\\\ -evalue\\\ 1e-05\\\ -outfmt\\\ 6\\\ -db\\\ /gporq1_1M/usr/aprea/bio/solanum_melongena/analysis/orthomcl_00/goodProteins_first_0010000\\\ -query\\\ -\\\ -out\\\ /gporq1_1M/usr/aprea/bio/solanum_melongena/analysis/orthomcl_00/resultd_39\\\"\\\;\\\$SIG\\\{CHLD\\\}=sub\\\{\\\$done=1\\\;\\\}\\\;\\\$pid=fork\\\;unless\\\(\\\$pid\\\)\\\{setpgrp\\\;exec\\\$ENV\\\{SHELL\\\},\\\"-c\\\",\\\(\\\$bashfunc.\\\"@ARGV\\\"\\\)\\\;die\\\"exec:\\\$\\\!\\\\n\\\"\\\;\\\}do\\\{\\\$s=\\\$s\\\<1\\\?0.001+\\\$s\\\*1.03:\\\$s\\\;select\\\(undef,undef,undef,\\\$s\\\)\\\;\\\}until\\\(\\\$done\\\|\\\|getppid==1\\\)\\\;kill\\\(SIGHUP,-\\\$\\\{pid\\\}\\\)unless\\\$done\\\;wait\\\;exit\\\(\\\$\\\?\\\&127\\\?128+\\\(\\\$\\\?\\\&127\\\):1+\\\$\\\?\\\>\\\>8\\\););
 sh -c 'dd bs=1 count=1 of=/tmp/pariINik.chr 2>/dev/null';  test ! -s "/tmp/pariINik.chr" && rm -f "/tmp/pariINik.chr" && exec true;  (cat /tmp/pariINik.chr; rm /tmp/pariINik.chr; cat - ) | (/afs/enea.it/software/bin/blaunch.sh cresco3x018.portici.enea.it exec perl -e \\\$ENV\\\{\\\"PARALLEL_PID\\\"\\\}=\\\"30669\\\"\\\;\\\$ENV\\\{\\\"PARALLEL_SEQ\\\"\\\}=\\\"687\\\"\\\;\\\$bashfunc\\\ =\\\ \\\"\\\"\\\;@ARGV=\\\"blastp\\\ -evalue\\\ 1e-05\\\ -outfmt\\\ 6\\\ -db\\\ /gporq1_1M/usr/aprea/bio/solanum_melongena/analysis/orthomcl_00/goodProteins_first_0010000\\\ -query\\\ -\\\ -out\\\ /gporq1_1M/usr/aprea/bio/solanum_melongena/analysis/orthomcl_00/resultd_24\\\"\\\;\\\$SIG\\\{CHLD\\\}=sub\\\{\\\$done=1\\\;\\\}\\\;\\\$pid=fork\\\;unless\\\(\\\$pid\\\)\\\{setpgrp\\\;exec\\\$ENV\\\{SHELL\\\},\\\"-c\\\",\\\(\\\$bashfunc.\\\"@ARGV\\\"\\\)\\\;die\\\"exec:\\\$\\\!\\\\n\\\"\\\;\\\}do\\\{\\\$s=\\\$s\\\<1\\\?0.001+\\\$s\\\*1.03:\\\$s\\\;select\\\(undef,undef,undef,\\\$s\\\)\\\;\\\}until\\\(\\\$done\\\|\\\|getppid==1\\\)\\\;kill\\\(SIGHUP,-\\\$\\\{pid\\\}\\\)unless\\\$done\\\;wait\\\;exit\\\(\\\$\\\?\\\&127\\\?128+\\\(\\\$\\\?\\\&127\\\):1+\\\$\\\?\\\>\\\>8\\\););
 sh -c 'dd bs=1 count=1 of=/tmp/parUO93M.chr 2>/dev/null';  test ! -s "/tmp/parUO93M.chr" && rm -f "/tmp/parUO93M.chr" && exec true;  (cat /tmp/parUO93M.chr; rm /tmp/parUO93M.chr; cat - ) | (/afs/enea.it/software/bin/blaunch.sh cresco3x010.portici.enea.it exec perl -e \\\$ENV\\\{\\\"PARALLEL_PID\\\"\\\}=\\\"30669\\\"\\\;\\\$ENV\\\{\\\"PARALLEL_SEQ\\\"\\\}=\\\"692\\\"\\\;\\\$bashfunc\\\ =\\\ \\\"\\\"\\\;@ARGV=\\\"blastp\\\ -evalue\\\ 1e-05\\\ -outfmt\\\ 6\\\ -db\\\ /gporq1_1M/usr/aprea/bio/solanum_melongena/analysis/orthomcl_00/goodProteins_first_0010000\\\ -query\\\ -\\\ -out\\\ /gporq1_1M/usr/aprea/bio/solanum_melongena/analysis/orthomcl_00/resultd_8\\\"\\\;\\\$SIG\\\{CHLD\\\}=sub\\\{\\\$done=1\\\;\\\}\\\;\\\$pid=fork\\\;unless\\\(\\\$pid\\\)\\\{setpgrp\\\;exec\\\$ENV\\\{SHELL\\\},\\\"-c\\\",\\\(\\\$bashfunc.\\\"@ARGV\\\"\\\)\\\;die\\\"exec:\\\$\\\!\\\\n\\\"\\\;\\\}do\\\{\\\$s=\\\$s\\\<1\\\?0.001+\\\$s\\\*1.03:\\\$s\\\;select\\\(undef,undef,undef,\\\$s\\\)\\\;\\\}until\\\(\\\$done\\\|\\\|getppid==1\\\)\\\;kill\\\(SIGHUP,-\\\$\\\{pid\\\}\\\)unless\\\$done\\\;wait\\\;exit\\\(\\\$\\\?\\\&127\\\?128+\\\(\\\$\\\?\\\&127\\\):1+\\\$\\\?\\\>\\\>8\\\););
 sh -c 'dd bs=1 count=1 of=/tmp/par3v_gT.chr 2>/dev/null';  test ! -s "/tmp/par3v_gT.chr" && rm -f "/tmp/par3v_gT.chr" && exec true;  (cat /tmp/par3v_gT.chr; rm /tmp/par3v_gT.chr; cat - ) | (/afs/enea.it/software/bin/blaunch.sh cresco3x012.portici.enea.it exec perl -e \\\$ENV\\\{\\\"PARALLEL_PID\\\"\\\}=\\\"30669\\\"\\\;\\\$ENV\\\{\\\"PARALLEL_SEQ\\\"\\\}=\\\"666\\\"\\\;\\\$bashfunc\\\ =\\\ \\\"\\\"\\\;@ARGV=\\\"blastp\\\ -evalue\\\ 1e-05\\\ -outfmt\\\ 6\\\ -db\\\ /gporq1_1M/usr/aprea/bio/solanum_melongena/analysis/orthomcl_00/goodProteins_first_0010000\\\ -query\\\ -\\\ -out\\\ /gporq1_1M/usr/aprea/bio/solanum_melongena/analysis/orthomcl_00/resultd_28\\\"\\\;\\\$SIG\\\{CHLD\\\}=sub\\\{\\\$done=1\\\;\\\}\\\;\\\$pid=fork\\\;unless\\\(\\\$pid\\\)\\\{setpgrp\\\;exec\\\$ENV\\\{SHELL\\\},\\\"-c\\\",\\\(\\\$bashfunc.\\\"@ARGV\\\"\\\)\\\;die\\\"exec:\\\$\\\!\\\\n\\\"\\\;\\\}do\\\{\\\$s=\\\$s\\\<1\\\?0.001+\\\$s\\\*1.03:\\\$s\\\;select\\\(undef,undef,undef,\\\$s\\\)\\\;\\\}until\\\(\\\$done\\\|\\\|getppid==1\\\)\\\;kill\\\(SIGHUP,-\\\$\\\{pid\\\}\\\)unless\\\$done\\\;wait\\\;exit\\\(\\\$\\\?\\\&127\\\?128+\\\(\\\$\\\?\\\&127\\\):1+\\\$\\\?\\\>\\\>8\\\););
......"


"......
Apr 15 15:29:11 2015 21417 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:10 2015 21469 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:10 2015 21491 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:12 2015 20721 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:14 2015 21426 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:18 2015 20543 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:21 2015 20702 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:21 2015 20862 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:21 2015 20942 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:21 2015 21109 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:21 2015 21287 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:23 2015 20655 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:23 2015 20999 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:23 2015 21259 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:27 2015 20764 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:27 2015 21275 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:29 2015 21422 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:32 2015 20992 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
Apr 15 15:29:32 2015 21286 3 7.03 lsb_launch(): Failed while waiting for tasks to finish.
....."


The first block is from stdout and the second from stderr.


Does anyone have any idea what I am doing wrong?

Many thanks in advance,


Giuseppe