parallel

Limiting memory used by parallel?


From: hubert depesz lubaczewski
Subject: Limiting memory used by parallel?
Date: Thu, 25 Jan 2018 16:33:11 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

Hi,
I'm writing a tool that creates a tarball and pipes it to parallel,
which splits it into 5GB blocks and sends each block to a separate
pipe.

The call looks like:

tar cf - /some/directory | parallel -j 5 --pipe --block 5G --recend '' \
    ./handle-single-part.sh "{#}"

Where handle-single-part.sh does:
- gzips input
- encrypts with openssl
- writes output to temp file
- uploads the file to aws s3
- removes temp file.
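
The steps above could be sketched roughly as follows. This is not the
actual handle-single-part.sh, only an illustration of the same flow.
BUCKET, PASSFILE, the cipher choice and the object naming are all
placeholder assumptions, not values from the real script.

```shell
#!/bin/sh
# Hypothetical sketch of handle-single-part.sh.
# BUCKET and PASSFILE are assumed placeholders, not the real settings.
set -eu

BUCKET="${BUCKET:-s3://example-bucket/backups}"
PASSFILE="${PASSFILE:-/etc/backup/passphrase}"

# Encrypt stdin to stdout (cipher choice is an assumption).
encrypt_part() {
    openssl enc -aes-256-cbc -salt -pass "file:${PASSFILE}"
}

# Upload a finished file; $1 = local path, $2 = part number.
upload_part() {
    aws s3 cp "$1" "${BUCKET}/part-$2.gz.enc"
}

# Read one block on stdin: gzip, encrypt, write to a temp file,
# upload it, and remove the temp file even if a step fails.
handle_part() {
    part=$1
    tmp=$(mktemp)
    if ! gzip -c | encrypt_part > "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi
    status=0
    upload_part "$tmp" "$part" || status=$?
    rm -f -- "$tmp"
    return "$status"
}

# parallel passes the job number ({#}) as the first argument.
if [ "$#" -gt 0 ]; then
    handle_part "$1"
fi
```

Each invocation only streams its own block through gzip and openssl, so
the script itself should need very little memory.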

A relatively simple task.

But it looks like parallel itself is consuming a HUGE amount of memory,
comparable to the size of /some/directory itself.

The server I run it on has 64GB of RAM, and the script gets killed
after about 3 minutes with an "Out of memory!" error.

I logged "ps" output every 5 seconds during the process.
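
One way to collect such samples is a small polling loop. This is only a
sketch of the idea, not the exact command used here; the log_ps name,
its arguments and the chosen ps columns are assumptions.

```shell
#!/bin/sh
# log_ps PID INTERVAL MAX_SAMPLES
# Print one ps line for PID every INTERVAL seconds, stopping after
# MAX_SAMPLES lines or as soon as the process is gone.
log_ps() {
    pid=$1 interval=$2 max=$3 n=0
    while [ "$n" -lt "$max" ] && kill -0 "$pid" 2>/dev/null; do
        # A trailing "=" after a column name suppresses the header line.
        ps -o pid=,pcpu=,pmem=,vsz=,rss=,stat=,args= -p "$pid" || break
        n=$((n + 1))
        sleep "$interval"
    done
}
```

For example, `log_ps 11036 5 1000 >> ps.log` would sample process 11036
every 5 seconds.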

This is how it looked for the main parallel process (each line is 5
seconds after the previous one):

#v+
depesz   11036 12.0  0.8 5291388 510088 pts/4  S+   15:11   0:00  |   \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend  /home/depesz/handle-single-part.sh {#}
depesz   11036 12.2  1.7 5291388 1112536 pts/4 S+   15:11   0:01  |   \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend  /home/depesz/handle-single-part.sh {#}
depesz   11036 27.0  6.4 5291388 4064320 pts/4 R+   15:11   0:04  |   \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend  /home/depesz/handle-single-part.sh {#}
depesz   11036 45.3 16.5 10534272 10406392 pts/4 R+ 15:11   0:09  |   \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend  /home/depesz/handle-single-part.sh {#}
depesz   11036 56.0 15.5 10534272 9774768 pts/4 R+  15:11   0:14  |   \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend  /home/depesz/handle-single-part.sh {#}
depesz   11036 63.4 16.6 10534272 10498972 pts/4 R+ 15:11   0:19  |   \_ perl /usr/bin/parallel -j 5 --no-notice --pipe --block 5G --recend  /home/depesz/handle-single-part.sh {#}
#v-
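
To read the numbers more easily: assuming this is procps ps, where the
VSZ and RSS columns are reported in KiB, they can be converted to GiB
with a small helper. The kib_to_gib name is made up for illustration.

```shell
#!/bin/sh
# kib_to_gib KIB
# Convert a KiB count (as in the ps VSZ/RSS columns) to GiB, 2 decimals.
kib_to_gib() {
    awk -v k="$1" 'BEGIN { printf "%.2f\n", k / 1048576 }'
}

kib_to_gib 510088      # RSS in the first sample  -> 0.49
kib_to_gib 4064320     # RSS in the third sample  -> 3.88
kib_to_gib 10498972    # RSS in the last sample   -> 10.01
```

So in these samples the resident size of the main parallel process
grows from about 0.5 GiB to about 10 GiB within roughly 25 seconds.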

It looks like parallel keeps a buffer of all of the data in memory,
which in my case is pretty problematic.

Is there anything that can be done about it? Or perhaps I could do
something that would make the problem less significant?

Best regards,

depesz



