Re: Limiting memory used by parallel?


From: hubert depesz lubaczewski
Subject: Re: Limiting memory used by parallel?
Date: Mon, 29 Jan 2018 18:19:28 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Sun, Jan 28, 2018 at 02:45:42AM +0100, Ole Tange wrote:
> On Thu, Jan 25, 2018 at 4:33 PM, hubert depesz lubaczewski
> You can also use --cat:
> 
>   tar cf - /some/directory | parallel -j 5 --pipe --block 5G --cat
> --recend '' 'cat {} | ./handle-single-part.sh {#}'
> 
> This way each block is saved to the tempdir before the job starts. By
> my limited testing this should make GNU Parallel only keep 1-2 blocks
> in memory.

So, I tried it.
To keep it as simple as possible, I generated the source data with:
dd if=/dev/zero bs=8k count=13107200

which generates 100 GiB of \x00 bytes.
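
As a sanity check on that size: bs=8k is 8192 bytes, so the total is
8192 * 13107200 bytes. A minimal sketch in shell arithmetic (the variable
names are only illustrative, and 64-bit shell arithmetic is assumed):

```shell
# Size produced by: dd if=/dev/zero bs=8k count=13107200
block_size=$((8 * 1024))        # dd's "8k" means 8192 bytes
block_count=13107200
total_bytes=$((block_size * block_count))
total_gib=$((total_bytes / 1024 / 1024 / 1024))
echo "${total_bytes} bytes = ${total_gib} GiB"
```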

This was then piped to:
1. in the "normal" test:
    /home/depesz/parallel/bin/parallel \
    -j 5 \
    --pipe \
    --block 2000M \
    --recend '' \
    /home/depesz/test/handle-single-part.sh \
    "/tmp/depesz/out/tarball.part-{#}.gz.aes"
2. in the "cat" test:
    /home/depesz/parallel/bin/parallel \
    -j 5 \
    --pipe \
    --block 2000M \
    --recend '' \
    --cat \
    /home/depesz/test/handle-single-part.sh {} \
    "/tmp/depesz/out/tarball.part-{#}.gz.aes"

The handle-single-part.sh script was modified for each test. In the "normal"
case it did:
cat - | gzip -9c - | openssl enc -pass "file:pass.file" -aes-256-cbc \
    > "${output_file}"
and in the "cat" test it did:
gzip -9c "${input_file}" | openssl enc -pass "file:pass.file" -aes-256-cbc \
    > "${output_file}"

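For illustration only, the two script variants could be folded into one
shell function along these lines. This is a hypothetical reconstruction, not
the actual handle-single-part.sh: the function name handle_part and the
argument handling are assumptions, chosen to match the two invocations above
(one argument: output file, data on stdin; two arguments: input file, then
output file). It also assumes pass.file exists in the current directory.

```shell
# Hypothetical sketch; not the real handle-single-part.sh.
# Usage: handle_part OUTPUT_FILE              ("normal" test, data on stdin)
#        handle_part INPUT_FILE OUTPUT_FILE   ("cat" test, data in a temp file)
handle_part() {
    if [ "$#" -eq 2 ]; then
        input_file=$1
        output_file=$2
    else
        input_file=-            # gzip treats "-" as standard input
        output_file=$1
    fi
    # Compress, then encrypt with the passphrase stored in pass.file.
    gzip -9c "${input_file}" \
        | openssl enc -pass "file:pass.file" -aes-256-cbc > "${output_file}"
}
```

The only difference between the two modes is where gzip reads from, so any
timing difference comes from the extra write and re-read of the temp file.
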
Results of the test:
Time of tests:
normal:
real    3m45.748s
user    12m51.147s
sys     7m6.878s

cat:
real    5m38.099s
user    13m7.587s
sys     9m11.370s

So cat is evidently slower (as it has to write uncompressed data to disk, and
then re-read it).
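
To put a number on the slowdown, the two "real" times can be converted to
seconds and divided. A small sketch (the helper name to_seconds is made up
for this example; it only handles the XmY.Zs format shown above):

```shell
# Convert a "time"-style value such as 3m45.748s to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

normal_real=$(to_seconds 3m45.748s)
cat_real=$(to_seconds 5m38.099s)
# Ratio of wall-clock times: cat test relative to normal test.
slowdown=$(awk -v a="$normal_real" -v b="$cat_real" \
    'BEGIN { printf "%.2f\n", b / a }')
echo "normal: ${normal_real}s, cat: ${cat_real}s, ratio: ${slowdown}"
```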

What's worse: every second, I logged the output of
"ps uwf t <terminal-that-i-was-running-the-test-on>".
Then, for each such "ps dump", I summed the rss column of all processes.
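
The summing step can be sketched roughly like this. It assumes the usual
"ps u" layout, where the first line is a header and RSS (in kB) is the 6th
column; the function name sum_rss, the dump file naming, and the sample rows
are made up for illustration:

```shell
# Sum the RSS column (kB) of one "ps u"-style dump read from stdin.
# Assumes a header on line 1 and RSS in column 6.
sum_rss() {
    awk 'NR > 1 { total += $6 } END { printf "%d\n", total }'
}

# Hypothetical logging loop (left commented out; TTY_NAME is a placeholder):
# while sleep 1; do ps uwf t "$TTY_NAME" > "ps-dump.$(date +%s)"; done

# Example with a small made-up dump:
sample_total=$(sum_rss <<'EOF'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
depesz    1000  0.0  0.1  20000  3000 pts/3    Ss   18:00   0:00 -bash
depesz    1001 50.0  5.0 900000 50000 pts/3    S+   18:01   0:10  \_ perl parallel
EOF
)
echo "total RSS: ${sample_total} kB"
```

Running sum_rss over every dump and keeping the largest total gives the peak
figure for a test run.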

In the normal test, the peak memory usage was 12,402,552 kB. In the cat
test it was 12,382,736 kB.

So there is no real memory usage difference, but the cat approach is
significantly slower.

You can see the whole ps output at:
normal test: https://share.riseup.net/#IfwBFcQEr0qI3HuBKTpvDA
cat test: https://share.riseup.net/#QhtkCvfjrM6zu4oua5FhRg

Best regards,

depesz
