Hi and thank you for your reply.
-j issue: I used "-j 192" because 192 is the total number of slots the queue system allocates across the different hosts. Rereading the manual, I now see why, given my options and my server file, GNU parallel could run 192 jobs on the same host. Still, in my opinion this point isn't really clear: a user could just as well get the idea that -j sets the total number of cores, which is then divided among the hosts according to the ncpus values in the server file. At least, that is what I expected.
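To make the two readings concrete, here is a toy Python model of the slot arithmetic. It is not GNU parallel's code. The server file of 24 hosts with 8 slots each is made up so that the total is 192, and slots_per_host is a hypothetical helper. The per-host branch assumes -j acts as a limit on each machine, which is the reading that explains 192 jobs landing on one host.

```python
# Toy model of the two readings of -j; NOT GNU parallel's code.
# The server file below (24 hosts x 8 slots = 192) is hypothetical.


def slots_per_host(ncpus: dict[str, int], j: int, split_total: bool) -> dict[str, int]:
    """Return the maximum number of simultaneous jobs per host.

    split_total=False: -j is a per-host limit (the reading that explains
    why up to 192 jobs could land on a single host).
    split_total=True: -j is a total shared out in proportion to each
    host's ncpus (the reading I expected).
    """
    if not split_total:
        return {host: j for host in ncpus}
    total = sum(ncpus.values())
    # Integer share of the j slots, proportional to the host's ncpus.
    return {host: j * n // total for host, n in ncpus.items()}


server_file = {f"node{i:02d}": 8 for i in range(24)}  # hypothetical hosts

per_host = slots_per_host(server_file, 192, split_total=False)
shared = slots_per_host(server_file, 192, split_total=True)
print(per_host["node00"], shared["node00"])  # 192 8
```

With the per-host reading every host may receive 192 jobs; with the shared reading each host gets exactly its 8 slots.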
--wait: I am running on a shared cluster, which means the queue system may give me 8 slots on a 16-core host while the other 8 slots are used by a different user at the same time. Fair sharing of resources means I must not run more than 8 simultaneous blastp instances on that host. That is why, once 8 blastp instances are running at the same time, I want GNU parallel to wait for one of them to complete before starting another. That is what I expect from "--semaphore". I added --wait for that reason, and also to make sure the queue system waits for all background GNU parallel jobs to complete before considering the whole job finished. Does that make sense now?
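For what it's worth, here is a rough Python sketch of the behaviour I am after on one host. It is only a model, not how GNU parallel implements --semaphore or --wait. MAX_SLOTS, run_capped and fake_blastp are hypothetical stand-ins for my 8 granted slots, the scheduler, and one blastp run.

```python
# Sketch of the behaviour I want on one host; NOT GNU parallel's code.
# MAX_SLOTS, run_capped and fake_blastp are hypothetical stand-ins.
import threading
import time

MAX_SLOTS = 8  # slots the queue system granted me on this host


def run_capped(jobs, max_slots=MAX_SLOTS):
    """Run jobs with at most max_slots running at once; wait for all."""
    slots = threading.BoundedSemaphore(max_slots)
    lock = threading.Lock()
    running = peak = done = 0
    threads = []

    def worker(job):
        nonlocal running, peak, done
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            job()
        finally:
            with lock:
                running -= 1
                done += 1
            slots.release()  # free the slot so the next job can start

    for job in jobs:
        slots.acquire()  # like --semaphore: block while all slots are busy
        t = threading.Thread(target=worker, args=(job,))
        t.start()
        threads.append(t)

    for t in threads:  # like --wait: return only after every job finished
        t.join()
    return peak, done


def fake_blastp():
    time.sleep(0.01)  # stand-in for one blastp run


peak, done = run_capped([fake_blastp] * 40)
print(peak <= MAX_SLOTS, done)  # True 40
```

A new job starts only when a running one releases its slot, so no more than 8 ever run at once, and the final joins play the role of --wait for the queue system.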
g