
Re: job race!


From: Ozgur Akgun
Subject: Re: job race!
Date: Thu, 25 Apr 2013 15:37:43 +0100

Hi,

On 25 April 2013 12:19, Ole Tange <ole@tange.dk> wrote:
On Wed, Apr 24, 2013 at 9:25 PM, Ozgur Akgun <ozgurakgun@gmail.com> wrote:

> I want to be able to say, something like `parallel --timeout (fastest * 2)`
> and let get the same output.

I have been pondering if I could somehow make a '--timeout 5%'. It should:

1. Run the first 3 jobs to completion (no --timeout)
2. Compute the average and standard deviation for all completed jobs
3. Adjust --timeout based on the new average, standard deviation and user input
4. Go to 2 until all jobs are finished
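
The loop in steps 1 to 4 could be sketched roughly as below. The thread does not say how the average, the standard deviation and the user's percentage are combined, so the rule used here, (mean + k * stdev) * percent / 100, is purely an assumption, as are the names adaptive_timeout, warmup and k. The value 200 is borrowed from the '--dynamic-timeout 200%' example further down, not from the '5%' above.

```python
import statistics


def adaptive_timeout(runtimes, percent, warmup=3, k=1.0):
    """Timeout in seconds from completed runtimes, or None during warm-up.

    Assumed rule (not from the thread): (mean + k * stdev) * percent / 100.
    """
    if len(runtimes) < warmup:
        return None  # step 1: the first jobs run without a timeout
    mean = statistics.mean(runtimes)
    stdev = statistics.pstdev(runtimes)  # population standard deviation
    return (mean + k * stdev) * percent / 100.0


# Steps 2-4: recompute the timeout each time a job completes.
completed = []
for runtime in [10, 10, 10, 12]:
    completed.append(runtime)
    print(len(completed), adaptive_timeout(completed, percent=200))
```

Recomputing from the full list keeps the sketch short; a real implementation would more likely keep running sums.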

I like this story. Actually I was thinking of using the median (or average + X standard deviations) as the winning criterion instead of proximity to the winner. However, using a multiple of the fastest runtime has one great property: it is very easy to calculate.
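
To make the comparison concrete, here is a small sketch that computes all three candidate thresholds for one set of runtimes. The function name candidate_thresholds and the parameters multiple and x are made up for illustration. Note that the fastest-runtime rule only needs a running minimum, while the median needs every runtime seen so far.

```python
import statistics


def candidate_thresholds(runtimes, multiple=2.0, x=1.0):
    """Return the three candidate timeouts discussed above, in seconds."""
    return {
        # Needs only the smallest runtime seen so far.
        "fastest_multiple": min(runtimes) * multiple,
        # Needs all completed runtimes.
        "median": statistics.median(runtimes),
        "mean_plus_x_stdev": statistics.mean(runtimes)
        + x * statistics.pstdev(runtimes),
    }


# One slow outlier (30 s) among otherwise similar jobs.
print(candidate_thresholds([4, 5, 6, 5, 30]))
```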

Also, I limited my original question to the case where number of jobs = number of jobslots, but your algorithm above doesn't have such a limitation. Accomplishing this would be great. However, the order in which the jobs are started has a huge impact on the total runtime, which disturbs me a bit.

Going back to my original question, and extending it to the case where number of jobs > number of jobslots, what would you think about something like the following?

I keep the existing --timeout unchanged, and add a new option --dynamic-timeout. This new option takes a percentage as you say.

I guess the following could work as an implementation. Given '-jX --dynamic-timeout 200% --timeout 3600':
1. Set current_timeout = timeout.
2. Run the first X jobs.
3. After every job that doesn't time out, update current_timeout if needed. [1,2]
4. Run new jobs as older jobs finish.

[1] This is probably obvious, but "if needed" is basically: `candidate_timeout = job_time * percentage ; if (candidate_timeout < current_timeout) { current_timeout = candidate_timeout }` where job_time is the time taken by the job at hand.
[2] Updating current_timeout will need to also update any timer that is attached to existing jobs. This might be tricky to implement. But consider an extreme case where job 1 takes 100 seconds to complete, and another job down the line takes only 5 seconds to complete. At this point the implementation should be clever enough to kill job 1.
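
To check that the scheme behaves the way I expect, here is a small simulation. It is a sketch only: it takes known job durations instead of running anything, and the function name simulate and its return shape are my own invention, not anything in GNU Parallel. The tightened timeout is applied to jobs that are already running, as described in [2].

```python
def simulate(durations, slots, percentage, timeout):
    """Simulate the dynamic timeout on known job durations (seconds).

    Returns (finished, killed, total_time); the lists hold job indices.
    """
    factor = percentage / 100.0
    current_timeout = float(timeout)
    pending = list(enumerate(durations))[::-1]  # pop() yields jobs in order
    running = {}  # job index -> (start time, duration)
    finished, killed = [], []
    now = 0.0
    while pending or running:
        # Fill free jobslots.
        while pending and len(running) < slots:
            idx, duration = pending.pop()
            running[idx] = (now, duration)
        # Each running job ends when it completes or hits the timeout.
        # A job already past a newly lowered timeout is killed right away.
        events = [
            (max(now, start + min(d, current_timeout)), idx)
            for idx, (start, d) in running.items()
        ]
        now, idx = min(events)
        start, d = running.pop(idx)
        if d <= current_timeout:
            finished.append(idx)
            # Footnote [1]: only ever tighten the timeout.
            current_timeout = min(current_timeout, d * factor)
        else:
            killed.append(idx)
    return finished, killed, now


# The extreme case from [2]: a 100 s job next to a 5 s job, two jobslots.
print(simulate([100, 5], slots=2, percentage=200, timeout=3600))
# Same four jobs in one jobslot, started in two different orders.
print(simulate([5, 6, 7, 100], slots=1, percentage=200, timeout=3600))
print(simulate([100, 5, 6, 7], slots=1, percentage=200, timeout=3600))
```

In the first call the 5 second job finishes, the timeout drops to 10 seconds, and the 100 second job is killed at t=10. The last two calls show the ordering problem mentioned earlier: with the slow job last it is killed and everything is done after 28 seconds, but with the slow job first it completes before any faster job has tightened the timeout, so the run takes 118 seconds.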

If this doesn't sound too insane, I am happy to have a go at it / help anyone who wants to. I'll need some pointers as to where it should be done though.

- Ozgur.

