
Re: GNU parallel - resumable jobs


From: Ole Tange
Subject: Re: GNU parallel - resumable jobs
Date: Sat, 14 Jan 2012 00:49:02 +0100

On Mon, Jan 9, 2012 at 5:47 PM, rambach <rambachr@yahoo.com> wrote:
> On 1/7/2012 4:31 AM, Ole Tange wrote:
>> On Fri, Dec 16, 2011 at 12:45 PM, Ole Tange<tange@gnu.org>  wrote:
>>> On Fri, Dec 16, 2011 at 9:01 AM, rambach<rambachr@yahoo.com>  wrote:
>>>> On 12/15/2011 11:35 PM, Ole Tange wrote:
>>>>> On Wed, Dec 14, 2011 at 2:35 PM, rambach<rambachr@yahoo.com>    wrote:
>>>>>> On 12/12/2011 11:07 PM, Ole Tange wrote:
>>>>>>>
>>>>>>> * Only look for the job-number.
>>
>> This is now implemented. You can do:
>>
>>   timeout -k 1 1 parallel -j2 --resume --joblog /tmp/joblog2 sleep {} ::: 1.1 2.2 3.3 4.4
>>   parallel -j2 --resume --joblog /tmp/joblog2 sleep {} ::: 1.1 2.2 3.3 4.4;
>>
>> Please test it.
>
> thanks, very good job.
> the functionality works nice and smooth.
>
> i'm sure others will benefit from this feature as well.
>
> however, what i found during testing is that GNU Parallel has some sort of
> memleak:
> the following command
> seq 100000 | parallel -j200 "echo {}; sleep 1"
> starts with a virtual mem usage of about 38 MB, and reaches 50 MB at around
> 25000 finished jobs.
> the size of used memory increases steadily, so at 12MB per 25000 jobs, you'd
> run out of mem on a 128 MB sys pretty quick.
> the leak is independent of the --resume option and even --joblog.

This is not really a leak: when you use multiple input sources, GNU
Parallel has to generate all combinations of their arguments, so it
has to remember every argument seen so far. Only in the special case
of a single input source can GNU Parallel safely forget arguments it
has already seen.

The new git version now implements this optimization, which covers
single-source runs such as your seq example.
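
To illustrate why the two cases differ in memory use, here is a minimal
Python sketch. It is not GNU Parallel's actual code; the function names
combinations_remembering and single_source_streaming are hypothetical,
and for simplicity the multi-source version reads every source fully
before combining, which is an assumption of this sketch only.

```python
import itertools


def combinations_remembering(sources):
    """Yield every combination of arguments from several input sources.

    Each source can only be read once, so its values are stored in a list.
    That storage grows with the number of arguments read.
    """
    seen = [list(src) for src in sources]  # every argument stays in memory
    for combo in itertools.product(*seen):
        yield combo


def single_source_streaming(source):
    """With one source, hand out each argument and keep nothing afterwards."""
    for arg in source:
        yield (arg,)  # nothing is retained once the job has its argument


if __name__ == "__main__":
    # Two sources: all values must be remembered to build the combinations.
    for combo in combinations_remembering([iter("ab"), iter("12")]):
        print(*combo)
    # One source: arguments are passed through one at a time.
    for combo in single_source_streaming(iter(range(3))):
        print(*combo)
```

With a single source the generator holds only the current argument, so
memory stays flat no matter how many jobs have finished.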


/Ole


