Re: [help-gnubatch] How do I run jobs on a remote node


From: Trygve Laugstøl
Subject: Re: [help-gnubatch] How do I run jobs on a remote node
Date: Fri, 25 Sep 2009 14:48:03 +0200
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

John M Collins wrote:
On Thu, 2009-09-24 at 19:04 +0200, Trygve Laugstøl wrote:
John M Collins wrote:
> On Wed, 2009-09-23 at 23:18 +0200, Trygve Laugstøl wrote:
>> Hi
>>
>> I'm trying to package GNUBatch for OpenCSW [1]. I've created a package
>> and installed GNUBatch quite successfully. I can start it, run jobs with
>> gbch-r and they're executed. I even get an email about it every time.
>>
>> Now, the question is how do I get it to run the job on other nodes? I've
>> been through the manuals but haven't been able to find much info on the
>> subject. I can't find much info on how to create different queues, only
>> how to write expressions to select them.
>>
>> [1]: http://opencsw.org
>>
>> --
>> Trygve
>>
> You first need to have each other node set up so it sees "exported" jobs
> and variables from its peers - you should be able to change job
> parameters remotely etc.
>
> You may have to run gbch-hostedit to set up other nodes' IP addresses
> and stop/restart the scheduler.

I think I've got the host file correctly configured. When I'm in gbch-hostedit I can see the (correct) IP of the host.

However, I'm not sure how to verify the file and the connection to the other hosts.

This is my setup:

$ cat /opt/csw/etc/gnubatch.hosts
# Host file created on 24/09/09 at 18:59:19

skybert-6       s6      probe,manual,trusted


When I try to access a variable on a remote node (assuming this is the correct syntax), I get:

$ gbch-var skybert-6:CLOAD
gbch-var: Unknown variable skybert-6:CLOAD

I did try to run "gbch-conn skybert-6" which seemed to work just fine after switching the connection type to manual.
If there had been anything wrong with the hosts file it would have given some error message at that point.

Are there any log files I can look at?
I think it has probably worked OK.

You have got each machine with a hosts file entry pointing to the other one, haven't you?

Yep:

telestes:$ cat /opt/csw/etc/gnubatch.hosts
# Host file created on 24/09/09 at 18:59:19

skybert-6       s6      probe,manual,trusted

skybert-6:$ cat /opt/csw/etc/gnubatch.hosts
# Host file created on 24/09/09 at 18:58:09

telestes        -       probe,manual,trusted

Now they're both in manual mode. I haven't seen any messages in the "btsched-reps" file.
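
As a rough way to double-check that the two host files above really do point at each other, something like the following Python sketch could be used. It assumes the file layout is exactly what the listings show (host name, then an alias or "-", then comma-separated flags, with "#" starting a comment line); treating "-" as "no alias" is a guess. The function names parse_hosts and mutually_listed are made up for illustration and are not part of GNUBatch.

```python
def parse_hosts(text):
    """Parse hosts-file text into {host name: {"alias": ..., "flags": [...]}}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        name = fields[0]
        # Assumption: "-" in the second column means no alias is defined.
        alias = fields[1] if len(fields) > 1 and fields[1] != "-" else None
        flags = fields[2].split(",") if len(fields) > 2 else []
        entries[name] = {"alias": alias, "flags": flags}
    return entries


def mutually_listed(local_name, local_hosts, peer_name, peer_hosts):
    """True if each machine's hosts file has an entry for the other one."""
    return (peer_name in parse_hosts(local_hosts)
            and local_name in parse_hosts(peer_hosts))


# The two files as posted in this message.
TELESTES_HOSTS = """# Host file created on 24/09/09 at 18:59:19

skybert-6       s6      probe,manual,trusted
"""
SKYBERT_HOSTS = """# Host file created on 24/09/09 at 18:58:09

telestes        -       probe,manual,trusted
"""

print(mutually_listed("telestes", TELESTES_HOSTS, "skybert-6", SKYBERT_HOSTS))
# prints True
```

This only confirms that the entries exist on both sides; it says nothing about whether the names resolve to the right addresses or whether the schedulers are actually connected.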

telestes:$ netstat -a|grep gnubatch
*.gnubatch                          Idle
*.gnubatch-netsrv                      Idle
*.gnubatch           *.*                0      0 49152      0 LISTEN
*.gnubatch-feeder       *.*                0      0 49152      0 LISTEN
*.gnubatch-netsrv       *.*                0      0 49152      0 LISTEN
*.gnubatch-api       *.*                0      0 49152      0 LISTEN


skybert-6:$ netstat -a|grep gnubatch
*.gnubatch                          Idle
*.gnubatch-netsrv                      Idle
*.gnubatch           *.*                0      0 49152      0 LISTEN
*.gnubatch-feeder       *.*                0      0 49152      0 LISTEN
*.gnubatch-netsrv       *.*                0      0 49152      0 LISTEN
*.gnubatch-api       *.*                0      0 49152      0 LISTEN


If you look in the "btsched-reps" file there will be messages if it doesn't understand a connection attempt.

After you've run "gbch-conn" check for a connection on the gnubatch port using "netstat -a|grep gnubatch".
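
A small Python sketch of that check, for anyone who wants to script it: it scans netstat output for lines that mention one of the gnubatch service names and end in ESTABLISHED, as opposed to the LISTEN and Idle lines a scheduler shows when nothing is connected. The column layout is assumed to match the Solaris-style output quoted in this thread, and the helper name established_gnubatch is invented for the example.

```python
def established_gnubatch(netstat_text):
    """Return the netstat lines for gnubatch services in ESTABLISHED state."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumption: the connection state is the last column, as in the
        # output quoted in this thread.
        if fields and "gnubatch" in line and fields[-1] == "ESTABLISHED":
            hits.append(line.strip())
    return hits


# Output as posted from telestes: only listening sockets, no peer connection.
LISTEN_ONLY = """*.gnubatch                          Idle
*.gnubatch-netsrv                      Idle
*.gnubatch           *.*                0      0 49152      0 LISTEN
*.gnubatch-feeder       *.*                0      0 49152      0 LISTEN
*.gnubatch-netsrv       *.*                0      0 49152      0 LISTEN
*.gnubatch-api       *.*                0      0 49152      0 LISTEN
"""

print(established_gnubatch(LISTEN_ONLY))
# prints []
```

An empty list, as with the output posted so far, would suggest that the two schedulers are listening but have not connected to each other.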

You won't "see" the variables on the other machine until you've marked them for export on that machine with "gbch-var -E varname". The same is true of jobs. (I had to make it like that because the network traffic would otherwise be too great, especially when you have several hosts.)

I tried this on telestes as the user "gnubatch":

$ gbch-var -C TRYGVE -s 123
gbch-var: Unknown variable TRYGVE

Is this the right syntax?

$ gbch-vlist
CLOAD     0                         Export # Current value of load level
LOADLEVEL 20000                            # Maximum value of load level
LOGJOBS                                    # File to save job record in
LOGVARS                                    # File to save variable record in
MACHINE   telestes-nge0.vs.inamo.no        # Name of current host
STARTLIM  15                               # Number of jobs to start at once
STARTWAIT 30                               # Wait time in seconds for job start

On skybert-6:
$ gbch-vlist -R
CLOAD                           #
LOADLEVEL                       #
LOGJOBS                         #
LOGVARS                         #
MACHINE   skybert-6.vs.inamo.no # Name of current host
STARTLIM                        #
STARTWAIT                       #

--
Trygve




