Limiting memory used by parallel?
From: hubert depesz lubaczewski
Subject: Limiting memory used by parallel?
Date: Thu, 25 Jan 2018 16:33:11 +0100
User-agent: Mutt/1.5.23 (2014-03-12)
Hi,
I'm writing a tool that creates a tarball and pipes it to parallel,
which splits the stream into 5GB blocks and sends each block to
a separate pipe.
The call looks like:
tar cf - /some/directory | parallel -j 5 --pipe --block 5G --recend '' \
    ./handle-single-part.sh "{#}"
handle-single-part.sh does the following:
- gzips input
- encrypts with openssl
- writes output to temp file
- uploads the file to aws s3
- removes temp file.
A relatively simple task.
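For illustration, a minimal sketch of what a script with these steps might
look like. This is not the actual script: the bucket, the object naming, the
passphrase file and the cipher are all placeholders chosen for the example.

```shell
#!/bin/sh
# Sketch of a handle-single-part.sh: reads one block on stdin.
# BUCKET and PASSFILE are hypothetical placeholders, not taken from the real script.
set -eu

BUCKET="${BUCKET:-s3://example-bucket/backup}"
PASSFILE="${PASSFILE:-/path/to/passphrase-file}"

handle_part() {
    part="$1"
    tmp="$(mktemp)"
    # gzip stdin, encrypt it, and write the result to the temp file
    gzip -c | openssl enc -aes-256-cbc -pass "file:${PASSFILE}" > "$tmp"
    # upload the temp file, then remove it
    aws s3 cp "$tmp" "${BUCKET}/part-${part}.gz.enc"
    rm -f "$tmp"
}

# parallel passes the block number ({#}) as the first argument
if [ "$#" -gt 0 ]; then
    handle_part "$1"
fi
```

Since the compressed block goes to a temp file, a script of this shape streams
its input and should not need to hold a whole block in memory itself.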
But it looks like parallel itself is consuming a HUGE amount of memory,
comparable with the size of /some/directory itself.
The server that I run it on has 64GB of RAM, and the script gets killed
after ~3 minutes with an "Out of memory!" error.
I logged "ps" output every 5 seconds during the run.
This is how it looked for the main parallel process (each line is 5 seconds
after the previous one):
#v+
depesz 11036 12.0 0.8 5291388 510088 pts/4 S+ 15:11 0:00 | \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend /home/depesz/handle-single-part.sh {#}
depesz 11036 12.2 1.7 5291388 1112536 pts/4 S+ 15:11 0:01 | \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend /home/depesz/handle-single-part.sh {#}
depesz 11036 27.0 6.4 5291388 4064320 pts/4 R+ 15:11 0:04 | \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend /home/depesz/handle-single-part.sh {#}
depesz 11036 45.3 16.5 10534272 10406392 pts/4 R+ 15:11 0:09 | \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend /home/depesz/handle-single-part.sh {#}
depesz 11036 56.0 15.5 10534272 9774768 pts/4 R+ 15:11 0:14 | \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend /home/depesz/handle-single-part.sh {#}
depesz 11036 63.4 16.6 10534272 10498972 pts/4 R+ 15:11 0:19 | \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend /home/depesz/handle-single-part.sh {#}
#v-
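To put the listing in perspective: assuming ps reports VSZ and RSS (the two
columns after %MEM) in KiB, as procps ps does on Linux, a small helper like
the one below converts them to GiB. The name kib_to_gib is made up for this
example.

```shell
# Convert a KiB value, as assumed for the ps VSZ/RSS columns, to GiB.
kib_to_gib() {
    awk -v kib="$1" 'BEGIN { printf "%.2f\n", kib / 1048576 }'
}

kib_to_gib 10534272   # VSZ of the last sample
kib_to_gib 10498972   # RSS of the last sample
```

Under that assumption the last sample works out to about 10 GiB for both
values, which is roughly twice the 5G block size.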
It looks like parallel keeps a buffer of all the data in memory, which
in my case is pretty problematic.
Is there anything that can be done about it? Or is there something I could
do to make the problem less significant?
Best regards,
depesz