coreutils

Re: Sort: optimal memory usage with multithreaded sort


From: Pádraig Brady
Subject: Re: Sort: optimal memory usage with multithreaded sort
Date: Tue, 15 Jan 2013 20:26:55 +0000

On 01/15/2013 07:07 PM, Assaf Gordon wrote:
> Hello,
>
> Sort's memory usage (specifically, sort_buffer_size()) has been discussed a few
> times before, but I couldn't find any mention of the following issue:
>
> If given a regular input file, sort tries to guesstimate the optimal buffer
> size based on the file size.
> But this value is calculated for one thread (as it was before sort became
> multi-threaded).
> The default "--parallel" value is 8 (or less, if fewer cores are available),
> which requires more memory.
>
> The result is that on a somewhat powerful machine (e.g. 128GB RAM, 32 cores,
> not uncommon for a computer cluster), sorting a big file (e.g. 10GB) will
> always allocate too little memory, and will always resort to saving temporary
> files on "/tmp".
> The disk activity will result in slower sorting times than could be achieved
> with an all-memory sort.
>
> Based on this:
> http://lists.gnu.org/archive/html/coreutils/2010-12/msg00084.html ,
> perhaps it would be beneficial to consider the number of threads in the memory
> allocation?

It's a fair point, but note that since then the default memory allocation
for sort has been doubled, and subsequently capped at 75% of physical memory
due to external factors, as discussed at:
http://lists.gnu.org/archive/html/coreutils/2012-06/msg00019.html

It's easy to look at sort's performance in isolation,
but one must be careful to consider other system loads
and architectures too.
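
Where the default heuristic does not suit a particular machine, the buffer
size and thread count can be given explicitly rather than guessed. Below is a
minimal sketch in Python that builds such a command line, assuming GNU sort's
--parallel and --buffer-size (-S) options. The helper names and the 50%
fraction are illustrative assumptions, not anything taken from sort itself.

```python
def buffer_budget_kib(phys_bytes, fraction=0.5):
    """Return a buffer budget in KiB as a fraction of physical memory.

    The 0.5 default is an arbitrary illustrative choice, deliberately below
    the 75% cap mentioned above so that other system load has headroom.
    """
    return int(phys_bytes * fraction) // 1024


def sort_argv(path, threads, buffer_kib):
    """Build an argv list for GNU sort with explicit threads and buffer size.

    The K suffix tells sort to read the size in units of 1024 bytes.
    """
    return [
        "sort",
        f"--parallel={threads}",
        f"--buffer-size={buffer_kib}K",
        path,
    ]


# Example: the 128GB, 8-thread case from the quoted message.
budget = buffer_budget_kib(128 * 2**30)
command = sort_argv("big.txt", 8, budget)
```

The resulting list can be passed to subprocess.run(). Whether half of
physical memory is appropriate depends on what else the machine is running.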

thanks,
Pádraig.


