help-gnubatch

Re: [help-gnubatch] How do I run jobs on a remote node


From: Trygve Laugstøl
Subject: Re: [help-gnubatch] How do I run jobs on a remote node
Date: Fri, 25 Sep 2009 15:49:02 +0200
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

John M Collins wrote:
On Fri, 2009-09-25 at 14:48 +0200, Trygve Laugstøl wrote:
John M Collins wrote:
> On Thu, 2009-09-24 at 19:04 +0200, Trygve Laugstøl wrote:
>> John M Collins wrote:
>> > On Wed, 2009-09-23 at 23:18 +0200, Trygve Laugstøl wrote:
>> >> Hi
>> >>
>> >> I'm trying to package GNUBatch for OpenCSW [1]. I've created a package
>> >> and installed GNUBatch quite successfully. I can start it, run jobs with
>> >> gbch-r and they're executed. I even get an email about it every time.
>> >>
>> >> Now, the question is how do I get it to run the job on other nodes? I've
>> >> been through the manuals but haven't been able to find much info on the
>> >> subject. I can't find much info on how to create different queues, only
>> >> how to write expressions to select them.
>> >>
>> >> [1]: http://opencsw.org
>> >>
>> >> --
>> >> Trygve
>> >>
>> > You first need to have each other node set up so it sees "exported" jobs
>> > and variables from its peers - you should be able to change job
>> > parameters remotely etc.
>> >
>> > You may have to run gbch-hostedit to set up other nodes' IP addresses
>> > and stop/restart the scheduler.
>>
>> I think I've got the host file correctly configured. When I'm in
>> gbch-hostedit I can see the (correct) IP of the host.
>>
>> However, I'm not sure how to verify the file and the connection to the
>> other hosts.
>>
>> This is my setup:
>>
>> $ cat /opt/csw/etc/gnubatch.hosts
>> # Host file created on 24/09/09 at 18:59:19
>>
>> skybert-6       s6      probe,manual,trusted
>>
>>
>> When I'm trying to access a variable on a remote node (assuming this is
>> the correct syntax) I'm getting:
>>
>> $ gbch-var skybert-6:CLOAD
>> gbch-var: Unknown variable skybert-6:CLOAD
>>
>> I did try to run "gbch-conn skybert-6" which seemed to work just fine
>> after switching the connection type to manual.
> If there had been anything wrong with the hosts file it would have given
> some error message at that point.
>
>> Are there any log files I can look at?
> I think it has probably worked OK
>
> You have got each machine with a hosts file entry pointing to the other
> one haven't you?

Yep:

telestes:$ cat /opt/csw/etc/gnubatch.hosts
# Host file created on 24/09/09 at 18:59:19

skybert-6       s6      probe,manual,trusted

skybert-6:]$ cat /opt/csw/etc/gnubatch.hosts
# Host file created on 24/09/09 at 18:58:09

telestes        -       probe,manual,trusted
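
The two listings above can be cross-checked by hand, but a small shell helper makes the "does each host list the other?" check repeatable. This is only a sketch: `peer_flags` is a hypothetical name, and it assumes the three-column layout (host name, alias, flags) shown in this thread; any other kind of line the real file format may allow is ignored.

```shell
# Hypothetical helper, not part of GNUBatch: print the flags column for one
# host entry in a gnubatch.hosts-style file.
# Assumes the three-column layout shown in this thread (name, alias, flags).
peer_flags() {
    # $1 = path to the hosts file, $2 = host name to look up
    awk -v host="$2" '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        NF >= 3 && $1 == host { print $3; found = 1; exit }
        END { exit !found }                       # non-zero status if no entry
    ' "$1"
}
```

On telestes, `peer_flags /opt/csw/etc/gnubatch.hosts skybert-6` would print the flags for the skybert-6 entry, and return a non-zero status if there is no such entry.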

Now they're both in manual mode. I haven't seen any messages in the "btsched-reps" file.

telestes:$ netstat -a|grep gnubatch
*.gnubatch                          Idle
*.gnubatch-netsrv                      Idle
*.gnubatch           *.*                0      0 49152      0 LISTEN
*.gnubatch-feeder       *.*                0      0 49152      0 LISTEN
*.gnubatch-netsrv       *.*                0      0 49152      0 LISTEN
*.gnubatch-api       *.*                0      0 49152      0 LISTEN


skybert-6:$ netstat -a|grep gnubatch
*.gnubatch                          Idle
*.gnubatch-netsrv                      Idle
*.gnubatch           *.*                0      0 49152      0 LISTEN
*.gnubatch-feeder       *.*                0      0 49152      0 LISTEN
*.gnubatch-netsrv       *.*                0      0 49152      0 LISTEN
*.gnubatch-api       *.*                0      0 49152      0 LISTEN

You may need to set up the "local address" in each "gnubatch.hosts" file to give the right IP address to use. You can do that with the "l" key in gbch-hostedit.

I see that the IPs of the hosts are correct when I use gbch-hostedit. When I try to set the local address it actually refuses it as it is the same as the configured one.

I have no "127.0.0.1 telestes" lines in my /etc/hosts files and use a local DNS for resolving names (which is why it resolved telestes to "telestes-nge0"; nge0 is the name of the interface on the box).

With "manual" on, it won't attempt a connection until you tell it to with gbch-conn.

I've run "gbch-conn skybert-6" and "gbch-conn telestes" on each of the hosts. I also tried to add an offline node (skybert-2) and connect to it but I got no log messages in the "btsched-reps" file.

> If you look in the "btsched-reps" file there will be messages if it
> doesn't understand a connection attempt.
>
> After you've run "gbch-conn" check for a connection on the gnubatch port
> using "netstat -a|grep gnubatch".
>
> You won't "see" the variables on the other machine until you've marked
> them for export on the other machine with "gbch-var -E varname". The
> same is true of jobs. (I had to make it like that as the network traffic
> is too great especially when you have several hosts).
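
The netstat check described above can be wrapped in a small filter so that "listening only" and "actually connected" are easy to tell apart. This is a rough sketch, not a GNUBatch tool: `has_gnubatch_connection` is a hypothetical name, and it assumes netstat reports a live TCP connection with the state `ESTABLISHED`, which does not appear anywhere in the output quoted in this thread.

```shell
# Hypothetical helper: read "netstat -a" output on stdin and succeed only if
# some gnubatch line shows an established connection.
# Assumption: the platform's netstat uses the state name ESTABLISHED.
has_gnubatch_connection() {
    awk '/gnubatch/ && /ESTABLISHED/ { found = 1 } END { exit !found }'
}

# Intended use, after running gbch-conn for the peer:
#   netstat -a | has_gnubatch_connection && echo "connected"
```

With only LISTEN and Idle lines, as in the listings in this message, the helper returns a non-zero status.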

I tried this on telestes as the user "gnubatch":

$ gbch-var -C TRYGVE -s 123
gbch-var: Unknown variable TRYGVE
Is this the right syntax?
No, the variable name should come last; the "-C" just says to create it if it doesn't exist.

gbch-var -C -s 123 TRYGVE

Also include "-E" if you want the other connected hosts to see it:

gbch-var -CE -s 123 TRYGVE

Aha, that did the trick.
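
To keep the argument order straight (options first, variable name last), the two forms above can be captured in a tiny wrapper. This is only an illustration: `gbch_var_create_cmd` is a hypothetical name, it prints the command line instead of running it, and it uses no gbch-var options beyond the `-C`, `-E` and `-s` that appear in this thread.

```shell
# Hypothetical helper: print (not run) a gbch-var command line that creates
# a variable, with the options first and the variable name last.
gbch_var_create_cmd() {
    # $1 = variable name, $2 = initial value, $3 = "export" to add -E
    if [ "${3:-}" = "export" ]; then
        printf 'gbch-var -CE -s %s %s\n' "$2" "$1"
    else
        printf 'gbch-var -C -s %s %s\n' "$2" "$1"
    fi
}
```

For example, `gbch_var_create_cmd TRYGVE 123 export` prints `gbch-var -CE -s 123 TRYGVE`, the exported form given above.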


$ gbch-vlist
CLOAD     0                         Export # Current value of load level
LOADLEVEL 20000                            # Maximum value of load level
LOGJOBS                                    # File to save job record in
LOGVARS                                    # File to save variable record in
MACHINE   telestes-nge0.vs.inamo.no        # Name of current host
STARTLIM  15                               # Number of jobs to start at once
STARTWAIT 30                               # Wait time in seconds for job start

I would definitely make sure a "local address" is set as I've suggested above if there is a possibility that the "bare" machine name (without the domain) will just give 127.0.0.1 or something like that.

On skybert-6:
$ gbch-vlist -R
CLOAD                           #
LOADLEVEL                       #
LOGJOBS                         #
LOGVARS                         #
MACHINE   skybert-6.vs.inamo.no # Name of current host
STARTLIM                        #
STARTWAIT                       #

PS: What does the "ch" part in "gbch-*" mean?

--
Trygve



