help-gnubatch



From: Reuti
Subject: Re: [help-gnubatch] ( [PATCH] Bump version in debian/changelog to 1.4 also ) + newbie question
Date: Thu, 7 Jun 2012 23:58:51 +0200

On 06.06.2012 at 14:46, Peter Valdemar Mørch wrote:

> Thanks for replying!
> 
> On Wed, Jun 6, 2012 at 12:52 PM, Reuti <address@hidden> wrote:
>> You could put a sleep with the necessary amount of seconds in the jobscript 
>> and start it at the full minute before, so that you can start it at any 
>> second.
> 
> :-( I was afraid of that. That also means, I take it, that all the
> details of the scheduling - when to run what - are left up to me/us.

Well, with GNUbatch you can specify the absolute start time by the minute, but 
not with GridEngine, and I think not with Torque either.

This could be circumvented by a minimum start time for the job and an unlimited 
number of slots, though. Then it's guaranteed: when the minimum start time is 
reached, the job will start (limited by the number of available slots defined 
on this machine). I think this is what you are looking for: don't start 1200 
jobs at once, but only 50 or whatever at a time.
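
For start times to the second, the sleep workaround mentioned earlier comes 
down to splitting the wanted offset into whole minutes, which the scheduler can 
handle, and leftover seconds, which the jobscript sleeps away. A minimal shell 
sketch of that arithmetic; split_offset is a made-up helper name, not part of 
GNUbatch:

```shell
# split_offset SECONDS: print "<whole minutes> <leftover seconds>".
# The minutes go into the scheduled start time, the leftover seconds
# into a "sleep" at the top of the jobscript.
split_offset() {
    offset=$1
    printf '%s %s\n' "$((offset / 60))" "$((offset % 60))"
}

# Example: a job that should begin 754 seconds after the reference time.
set -- $(split_offset 754)
echo "schedule $1 minutes ahead, then sleep $2 seconds in the jobscript"
```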


>> Is it to run in a cluster of machines or only on a local one? To me it 
>> sounds like you need some features from GridEngine (like the accounting 
>> which you can access later), and on the other hand some kind of 
>> "real-time-feature" from GNUbatch.
> 
> No, it is to run all 1200 mostly tiny, light-weight jobs on one
> machine, just like nagios does. Typically ping, check_tcp which
> basically only tries to connect to a TCP port. But could be anything.
> 
> Didn't know about GridEngine, will look into that. I'm just getting
> worried that the scheduler's overhead is much much higher than the
> jobs' with this many tiny jobs!

Yes, GridEngine is geared more toward scheduling by certain conditions, such as 
past usage or backfilling, i.e. classic batch processing. But it keeps 
accounting records for each job, which can be checked after the job has run.


> What we're currently doing is starting a process every 15 min, that
> runs N child processes at a time in parallel until all 1200 are done.
> The consequence of that is we get high-ish load around 0:15, 0:30,
> 0:45 etc. and would love to use "a scheduler" to smooth out that load,
> and hopefully use existing tool infrastructure to get more debugging
> insight (execution times, output etc.) and runtime control.

I think GridEngine is one size too large for this task, and you can achieve 
similar scheduling with GNUbatch:

Use one variable, preset beforehand with the number of jobs allowed to run at 
once, and give each job a condition on it:

$ gbch-var -C -s 5 master
$ gbch-r -c "master>0" -s "master-=1" test.sh

As -s will undo the assignment it made at the start of the job, "master" will 
always reflect the number of free slots. (You could also do it the other way 
round: start with 0, test for -c "master<5" and increment the variable, so that 
it counts the used slots.)
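
To queue a whole batch against that counter, the two commands above can be 
wrapped in a loop. The following is only a sketch: it uses nothing beyond the 
gbch-var and gbch-r options shown above, the check_N.sh script names are 
hypothetical, and by default it prints the commands instead of running them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN=1 to really call gbch-var and gbch-r (needs a working GNUbatch setup).
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# submit_checks SLOTS JOBS: create the counter, then queue JOBS scripts.
submit_checks() {
    slots=$1
    jobs=$2
    # Preset the counter with the number of jobs allowed to run at once.
    run gbch-var -C -s "$slots" master
    i=1
    while [ "$i" -le "$jobs" ]; do
        # check_$i.sh is a hypothetical script name.
        run gbch-r -c "master>0" -s "master-=1" "check_$i.sh"
        i=$((i + 1))
    done
}

# 5 slots and 3 jobs for illustration; the real case would be 1200 jobs.
submit_checks 5 3
```

All jobs are queued at once, but each one only starts while "master" is above 
zero, so at most 5 run at any moment.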

But AFAIK there is no accounting of memory usage or runtime for each job in 
GNUbatch. What information would you need from the last runs?


>> Do you have more machines than jobs, i.e. all 1200 jobs should run at the 
>> same time on a bunch of machines?
> 
> One machine. Cool if it lends itself to more machines, but this is
> handled fine by one machine with our brute force N parallel approach.
> The trick is picking N since execution times vary. And then I thought:
> "Somebody must have tackled this previously and created an open source
> project!" :-)

Setting up GridEngine on only one machine is also possible (we use it to 
serialize our workflow on local machines and use them over the weekend for 
computations), but I fear it needs more knowledge to get started with than 
GNUbatch. And: there is no built-in repeating mechanism.

As a consequence, you would need a cron job to submit the jobs every 15 
minutes, a few minutes before they are due to start.
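
A crontab entry along these lines would do the periodic submission. It is only 
a sketch: the script path is a placeholder, and the three-minute lead before 
each quarter hour is an arbitrary choice.

```
# m            h  dom mon dow  command
12,27,42,57    *  *   *   *    /path/to/submit-checks.sh
```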

-- Reuti

