Fwd: Warning: Semaphore stuck for 30 seconds


From: Meli Massimiliano
Subject: Fwd: Warning: Semaphore stuck for 30 seconds
Date: Tue, 6 Mar 2018 10:49:40 +0100

The home directories are exported to every cluster node via NFS:

/data01/home 192.168.4.0/24(rw,no_root_squash)

Each node has 32 processors. The scripts are:

===========  main script  =============
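#!/bin/bash
# bash is needed for the {1..1000} brace expansion
mkdir -p Files
for i in {1..1000}
do
    # run at most 30 jobs at the same time
    sem -j 30 ./ene_calc "$i"
done
# block until every queued job has finished
sem --wait
echo "all done"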

==================================

=========== ene_calc ===============
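#!/bin/bash
# bash is needed for `source`
source program_ene_calc.sh
# work in a per-job scratch directory named after the job index
mkdir "$1"
cd "$1" || exit 1
echo "$1"

CALC_program.py -O ................. -o "RESULTS.dat.$1"

mv "RESULTS.dat.$1" ../Files/
# compress in the background; the script does not wait for gzip
gzip "../Files/RESULTS.dat.$1" &

cd ..
rm -rf "$1"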
===================================

I'm running this version: GNU parallel 20180222.
The warning message no longer appears after changing the temporary directory to a node-local directory.
It seems there is a problem distinguishing the jobs on different nodes, even though each job name in the
semaphores directory also has a node ID extension.

Thanks a lot,
Massimiliano

2018-03-05 1:11 GMT+01:00 Ole Tange <ole@tange.dk>:
On Wed, Feb 28, 2018 at 12:38 PM, Meli Massimiliano
<massimiliano.meli@gmail.com> wrote:

> The error message that sometimes blocks the production of the output is:
>
> parallel: Warning: Semaphore stuck for 30 seconds. Consider using
> --semaphoretimeout.
>
> I think the problem comes from the hidden directory in the shared
> home of the cluster:
>
> .parallel
>
> Is there any way to move this directory to a different location?

The semaphores are in: ~/.parallel/semaphores so you can symlink that
to somewhere else.
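
A minimal sketch of the symlink approach, assuming a node-local path such as /tmp/$USER-parallel-semaphores (a hypothetical location, not something GNU parallel prescribes). The function name relocate_semaphores is made up for illustration. Because the symlink itself lives in the NFS-shared home, every node follows the same link, so the target directory has to exist on each node's local disk.

```shell
#!/bin/sh
# Hypothetical helper: make <home>/.parallel/semaphores a symlink
# to a node-local directory.
#   $1 = home directory, $2 = node-local target directory (assumed path)
relocate_semaphores() {
    home_dir=$1
    local_dir=$2
    link=$home_dir/.parallel/semaphores

    # Create the local target and the .parallel parent if they are missing
    mkdir -p "$local_dir" "$home_dir/.parallel" || return 1

    if [ -L "$link" ]; then
        # An old symlink is simply replaced
        rm "$link" || return 1
    elif [ -e "$link" ]; then
        # A real directory is renamed, not deleted
        mv "$link" "$link.bak.$$" || return 1
    fi

    ln -s "$local_dir" "$link"
}

# Example call (not executed here):
#   relocate_semaphores "$HOME" "/tmp/$USER-parallel-semaphores"
```

The call would need to run on each node before the first sem invocation there, so that the local target directory exists everywhere the link is followed.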

Or you can do:

  export XDG_CACHE_HOME=/somedir/with/write/access
  mkdir $XDG_CACHE_HOME/parallel

This should create semaphores in $XDG_CACHE_HOME/parallel (but it is
not tested very well).

I would, however, prefer if we can find the root cause and fix it. But
if you only see this sometimes it will make it harder. How is the home
shared? I recall doing a fix for NFS a year ago or so, so if you are
not running the newest version, try upgrading.


/Ole


