coreutils

Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency


From: Jim Meyering
Subject: Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
Date: Wed, 16 Mar 2011 16:32:32 +0100

Pádraig Brady wrote:
> # SUBTHREAD_LINES_HEURISTIC = 4
> $ for i in $(seq 22); do
>     j=$((2<<$i))
>     yes | head -n$j > t.sort
>     strace -f -c -e clone ./sort --parallel=16 t.sort -o /dev/null 2>&1 |
>     join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
>     sed -n "s/\([0-9]*\) clone/$j\t\1/p"
>   done
> 4       1
> 8       3
> 16      7
> 32      15
> 64      15
> 128     15
> 256     15
> 512     15
> 1024    15
> 2048    15
> 4096    15
> 8192    15
> 16384   15
> 32768   15
> 65536   15
> 131072  15
> 262144  15
> 524288  15
> 1048576 15
> 2097152 15
> 4194304 30
> 8388608 45
>
> # As above, but add -S1M option to sort
>
> 4       1
> 8       3
> 16      7
> 32      15
> 64      15
> 128     15
> 256     15
> 512     15
> 1024    15
> 2048    15
> 4096    15
> 8192    15
> 16384   30
> 32768   45
> 65536   90
> 131072  165
> 262144  315
> 524288  622
> 1048576 1245
> 2097152 2475
> 4194304 4935
> 8388608 9855
>
> With SUBTHREAD_LINES_HEURISTIC=128k and -S1M option to sort we get no
> threads as nlines never gets above 12787 (there looks to be around 80
> bytes overhead per line?).
> Only when -S >= 12M do we get nlines high enough to create threads.
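
As a rough cross-check of the figures quoted above, here is a small sketch. The numbers (1 MiB buffer, 12787 lines, 15 clone calls with --parallel=16) are taken from the quoted message. The model itself is an assumption, not something read from sort.c: each time the buffer fills, sort is assumed to spawn one full set of threads. The function names are made up for illustration.

```shell
#!/bin/sh
# Figures from the quoted message; the model built on them is an assumption.
buf_bytes=$((1024 * 1024))   # -S1M
lines_per_fill=12787         # reported ceiling on nlines with -S1M
clones_per_fill=15           # clone calls seen for one full buffer, --parallel=16

# Approximate bytes consumed per input line (integer division).
bytes_per_line() {
  echo $(( buf_bytes / lines_per_fill ))
}

# Predicted clone count for a file of $1 lines:
# ceil($1 / lines_per_fill) * clones_per_fill.
predicted_clones() {
  fills=$(( ($1 + lines_per_fill - 1) / lines_per_fill ))
  echo $(( fills * clones_per_fill ))
}

bytes_per_line            # prints 82
predicted_clones 16384    # prints 30
predicted_clones 131072   # prints 165
predicted_clones 8388608  # prints 9855
```

Each input line here is "y" plus a newline, i.e. 2 bytes, so 82 bytes per line agrees with the "around 80 bytes overhead" estimate. The clone model reproduces the -S1M rows from 16384 lines upward except for 524288, where it predicts 630 against the observed 622. Under this model the final partial fill for that row would hold only 21 lines, which may be too few to start all 15 threads.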

Thanks for pursuing this.
Here's a proposed patch to address the other problem.
It has little effect (if any) on your issue when sort is
given very little memory, but when a sort user specifies
-S1M, I think they probably want to avoid the memory
expense of going multi-threaded.

What do you think?

From 4f591fdd0bb78f621d2b72021de883fc4df1e179 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Wed, 16 Mar 2011 16:09:31 +0100
Subject: [PATCH] sort: avoid memory pressure of 130MB/thread when reading
 from pipe

* src/sort.c (INPUT_FILE_SIZE_GUESS): Decrease initial allocation
factor used to size buffer used when reading a non-regular file.
For motivation, see discussion here:
http://thread.gmane.org/gmane.comp.gnu.coreutils.general/878/focus=887
---
 src/sort.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index 9b8666a..07d6765 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -319,8 +319,12 @@ static size_t merge_buffer_size = MAX (MIN_MERGE_BUFFER_SIZE, 256 * 1024);
    specified by the user.  Zero if the user has not specified a size.  */
 static size_t sort_size;

-/* The guessed size for non-regular files.  */
-#define INPUT_FILE_SIZE_GUESS (1024 * 1024)
+/* The initial allocation factor for non-regular files.
+   This is used, e.g., when reading from a pipe.
+   Don't make it too big, since it is multiplied by ~130 to
+   obtain the size of the actual buffer sort will allocate.
+   Also, there may be 8 threads all doing this at the same time.  */
+#define INPUT_FILE_SIZE_GUESS (128 * 1024)

 /* Array of directory names in which any temporary files are to be created. */
 static char const **temp_dirs;
--
1.7.4.1.430.g5aa4d
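
For a sense of scale, the arithmetic behind the new comment can be sketched as follows. The multiplier (~130) and the thread count (8) are the approximate figures from the patch comment, so the results are estimates only. est_mib is a hypothetical helper, not part of sort.

```shell
#!/bin/sh
# Approximate multiplier from the patch comment ("multiplied by ~130").
factor=130

# Estimated total buffer size in whole MiB (truncated) for an
# INPUT_FILE_SIZE_GUESS of $1 bytes across $2 concurrent threads.
est_mib() {
  echo $(( $1 * factor * $2 / 1024 / 1024 ))
}

old_guess=$((1024 * 1024))   # value before the patch
new_guess=$((128 * 1024))    # value after the patch

est_mib "$old_guess" 1   # prints 130  (the "130MB/thread" in the subject)
est_mib "$old_guess" 8   # prints 1040
est_mib "$new_guess" 1   # prints 16   (16.25 truncated)
est_mib "$new_guess" 8   # prints 130
```

So with the smaller guess, eight threads reading from a pipe would together allocate about what a single thread did before, under the assumptions above.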


