Optimizing -j parameter?

parallel

From:	PD
Subject:	Optimizing -j parameter?
Date:	Tue, 9 Jan 2018 14:57:26 -0800
User-agent:	SquirrelMail/1.4.22-7.fc17

Hi;

I'm trying to download a large number of files from AWS S3, and am using a
command like this:

parallel -j loadfile.txt --eta --joblog joblog.txt --resume-failed \
    aws s3 cp {} files/{#}.doc :::: todo.txt > /dev/null

I've currently got '10' in loadfile.txt, but I'd like to change that
number to the best possible number, where 'best' means 'minimum total
transfer time'.  It's a network-bound job, so the number of CPU cores
doesn't limit the speed the way it would for a CPU-bound job, but AWS does
have throttling rules that start rejecting requests if you send them too fast.

If I change the contents of loadfile.txt to '20', the numbers in the ETA
display change, but it's not clear whether the final ETA is based on all
of the jobs that have gone before, or just the ones since loadfile.txt was
changed, or something else.
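
One way to sidestep the question of what the ETA averages over is to compute
the rate yourself from the joblog. This is a minimal sketch, not a Parallel
feature: a hypothetical shell function `joblog_rate` that reads a --joblog
file, assuming the standard tab-separated layout (a header line, then Seq,
Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command), and
reports throughput only for jobs started at or after an optional epoch cutoff.

```shell
# joblog_rate JOBLOG [SINCE_EPOCH]   (hypothetical helper, not part of Parallel)
# Counts jobs started at or after SINCE_EPOCH (default 0 = all jobs), how many
# exited non-zero, and completed jobs per second over the wall-clock span from
# the earliest start to the latest finish.
# Assumes the standard tab-separated GNU parallel joblog with a header line.
joblog_rate() {
  awk -F'\t' -v since="${2:-0}" '
    NR == 1 { next }                 # skip the header line
    $3 < since { next }              # Starttime (col 3) before the cutoff
    {
      n++
      if ($7 != 0) failed++          # Exitval is column 7
      end = $3 + $4                  # finish time = Starttime + JobRuntime
      if (n == 1 || $3 < first) first = $3
      if (end > last) last = end
    }
    END {
      if (n == 0) { print "jobs=0"; exit }
      span = last - first
      printf "jobs=%d failed=%d span_s=%.1f rate_per_s=%.2f\n", n, failed, span, (span > 0 ? n / span : 0)
    }' "$1"
}
```

For example, `joblog_rate joblog.txt "$(stat -c %Y loadfile.txt)"` uses the
modification time of loadfile.txt as the cutoff, so only jobs started since
the last change are counted (that `stat` syntax is GNU coreutils; BSD and
macOS `stat` take different options).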

In general, how do you find the optimal number of jobs to run in parallel?
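
The usual answer is empirical: run the same subset of the input at several
values and compare. Below is a rough sketch of that sweep as a hypothetical
bash function `sweep_jobs`; the `SWEEP_CMD` override, the `sweep/` output
directory and the `sweep_jN.log` file names are assumptions of this sketch.
It counts failed jobs per run because, once S3 starts throttling, a higher -j
may finish sooner only by failing more requests.

```shell
# sweep_jobs LISTFILE J...   (hypothetical helper)
# Runs every line of LISTFILE once per -j value and prints the wall-clock
# seconds and the number of non-zero exits for each value.
sweep_jobs() {
  local list=$1 j start end failed cmd
  shift
  # Per-item command from the question; SWEEP_CMD is an assumed override.
  cmd='aws s3 cp {} sweep/{#}.doc'
  if [ -n "${SWEEP_CMD:-}" ]; then cmd=$SWEEP_CMD; fi
  mkdir -p sweep
  for j in "$@"; do
    start=$(date +%s)
    # parallel exits non-zero when jobs fail; keep going and count them below
    parallel -j "$j" --joblog "sweep_j$j.log" "$cmd" :::: "$list" > /dev/null || true
    end=$(date +%s)
    # Exitval is column 7 of the tab-separated joblog; line 1 is the header
    failed=$(awk -F'\t' 'NR > 1 && $7 != 0 { n++ } END { print n + 0 }' "sweep_j$j.log")
    echo "j=$j elapsed_s=$((end - start)) failed=$failed"
  done
}
```

For example, `head -n 200 todo.txt > sample.txt; sweep_jobs sample.txt 5 10 20 40`.
Network conditions vary from run to run, so repeating each value a few times
gives a fairer comparison.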

Is there a way to graph the number of processes and the job rate over
time?  (I'm a visual kind of guy.)
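
For a graph, the joblog already records when each job started and how long
it ran, which is enough to reconstruct concurrency and completion rate over
time. A sketch under the same joblog-format assumption, using a hypothetical
function `joblog_timeline` that emits a CSV with one row per second. A job is
counted as running in every whole second from the one it started in to the
one it finished in, so this is an approximation at one-second resolution.

```shell
# joblog_timeline JOBLOG   (hypothetical helper)
# Prints CSV rows: seconds since the first job started, jobs running during
# that second, and jobs that finished in that second.
joblog_timeline() {
  awk -F'\t' '
    NR == 1 { next }                       # skip the header line
    {
      s = int($3); e = int($3 + $4)        # start and finish, whole seconds
      delta[s]++; delta[e + 1]--           # running in seconds s..e inclusive
      done[e]++
      if (NR == 2 || s < t0) t0 = s
      if (e > t1) t1 = e
    }
    END {
      print "t,running,finished"
      for (t = t0; t <= t1; t++) {
        running += delta[t]                # prefix sum of start/stop events
        printf "%d,%d,%d\n", t - t0, running, done[t]
      }
    }' "$1"
}
```

For example, `joblog_timeline joblog.txt > timeline.csv`, then plot `running`
and `finished` against `t` in gnuplot or a spreadsheet. Parallel writes a
joblog line when a job finishes, so on a run still in progress the jobs that
are currently running do not appear yet.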

Does Parallel have an automatic optimizer?  After all, it's got every
other feature under the sun, why not this?  :-)

I did see the 'niceload' utility that comes with Parallel, but I'm not
sure how I could use it effectively here.

Cheers,
PD