Sort: optimal memory usage with multithreaded sort


From: Assaf Gordon
Subject: Sort: optimal memory usage with multithreaded sort
Date: Tue, 15 Jan 2013 14:07:58 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4

Hello,

Sort's memory usage (specifically, sort_buffer_size()) has been discussed a
few times before, but I couldn't find any mention of the following issue:

If given a regular input file, sort tries to guesstimate the optimal buffer
size based on the file size.
But this value is calculated for a single thread (the calculation predates
multi-threaded sort).
The default "--parallel" value is 8 (or fewer, if fewer cores are available),
and sorting with multiple threads requires more memory.

The result is that on a somewhat powerful machine (e.g. 128GB RAM, 32 cores,
not uncommon for a computer cluster), sorting a big file (e.g. 10GB) will
always allocate too little memory, and will always resort to saving temporary
files in "/tmp".
The disk activity makes the sort slower than an all-in-memory sort would be.
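
Until the estimate accounts for threads, the buffer can be set by hand with
sort's existing "--buffer-size" (-S) option. The following is only a rough
sketch, assuming a GNU sort new enough to support "--parallel"; the helper
name sort_with_buffer and the sizes used are illustrative, and whether a given
size actually avoids temporary files depends on the data and on the machine.

```shell
# Illustrative helper (not part of coreutils): run sort with an explicit
# buffer size and thread count instead of relying on the default estimate.
sort_with_buffer() {
    input=$1      # file to sort
    buffer=$2     # value for --buffer-size, e.g. 1M or 20G
    threads=$3    # value for --parallel
    LC_ALL=C sort --buffer-size="$buffer" --parallel="$threads" "$input"
}

# Tiny demonstration input; the case described above would be a 10GB file.
demo=$(mktemp)
printf '%s\n' pear apple fig > "$demo"
sort_with_buffer "$demo" 1M 2
rm -f "$demo"
```

For a large file, the same call with a buffer well above the input size (an
assumption, for example 20G for a 10GB file) and "--parallel=8" is the kind of
invocation one would compare against the default behaviour.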

Based on this:
http://lists.gnu.org/archive/html/coreutils/2010-12/msg00084.html ,
perhaps it would be beneficial to take the number of threads into account in
the memory allocation?

Regards,
 -gordon


