From: Jim Meyering
Subject: Re: [PATCH] sort: Add --threads option, which parallelizes internal sort.
Date: Thu, 02 Apr 2009 13:06:43 +0200

Ralf Wildenhues wrote:
> Hi Paul, all,
> Paul Eggert writes:
>>
>> This patch is by Glen Lenker, Matt Pham, Benjamin Nuernberger, Sky
>> Lin, TaeSung Roh, and Paul Eggert.  It adds support for parallelism
>> within an internal sort.  On our simple tests on a 2-core desktop x86,
>> overall performance improved by roughly a factor of 1.6.
>
> This is too interesting to pass up.
>
> Example run on an 8-way system, with cat'ed instances of the dictionary
> on tmpfs; timings are the best of three:
>
> runtime [s], by number of threads:
>
> file size [MB]       1       2       4       8
>   1               0.06    0.04    0.03    0.04
>   2               0.13    0.09    0.07    0.07
>   4               0.28    0.20    0.16    0.15
>   8               0.61    0.43    0.34    0.32
>  16               1.34    0.94    0.74    0.72
>  32               3.00    2.06    1.63    1.57
>  64               6.36    4.38    3.44    3.32
> 128              13.49    9.30    7.13    7.24
> 256              28.62   19.49   15.17   15.18
>
> Here's the abbreviated 'time' output for the last row:
>
> 26.95user 1.67system 0:28.62elapsed 100%CPU
> 30.78user 1.98system 0:19.49elapsed 168%CPU
> 37.41user 2.04system 0:15.17elapsed 260%CPU
> 60.68user 2.79system 0:15.18elapsed 417%CPU
>
> It suggests to me that too much time is spent busy-waiting in pthread_join,
> or that sort is computing too much (I haven't looked at the patch in detail).
>
> Also, I'd have expected the rate going from 1 to 2 threads to get at least
> a bit better with bigger file size, but it remains remarkably constant,
> around 1.45 for this setup.  What am I missing?

Thanks for doing that, Ralf.
Just to give everyone a heads-up,
I'm a little reluctant to apply this patch in its current
state, since it's not achieving reasonable efficiency.
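
One back-of-the-envelope way to read the numbers above, assuming
Amdahl's law with a serial fraction f that is independent of input
size (a serial final merge, say): the 256 MB row gives

  \[ \frac{1}{f + (1 - f)/2} = \frac{28.62}{19.49} \approx 1.47
     \quad\Rightarrow\quad f \approx 0.36, \]

which predicts a 4-thread speedup of 1/(0.36 + 0.64/4) ≈ 1.9, close
to the observed 28.62/15.17 ≈ 1.89.  If the serial and parallel
phases both grow like n log n, that fraction, and with it the 1-to-2
ratio Ralf measured, would also stay flat as the file grows.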

I noticed that running the sole test,

    env time make RUN_VERY_EXPENSIVE_TESTS=yes check -C tests \
      TESTS=misc/sort-benchmark-random VERBOSE=yes

took about 36s on a pretty fast system.

Changing the inner loop to this (a single "print" per line, rather
than 100 one-character "printf" calls),
    print ((map {chr(32+rand(94))} (1..100)) , "\n");
reduced that to 25s.

I changed it to write "exp" from within the script
that generated the data, but that didn't make a big difference.

Finally I realized that those 50M calls to "rand" (500k lines of 100
bytes each) were the real problem: only about 4% of them, i.e. the
first four bytes of each line, ever influenced a comparison while
sorting, since 94^4 is far more than 500k.
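
To put a number on that: with 500k random 4-byte prefixes drawn from
94^4 (about 7.8e7) possible values, the expected number of colliding
pairs is only about

  \[ \binom{5 \times 10^5}{2} \cdot \frac{1}{94^4}
     \approx \frac{(5 \times 10^5)^2}{2 \cdot 7.8 \times 10^7}
     \approx 1600, \]

so nearly every comparison is decided within the first four bytes.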

Adjusting accordingly brought the test's run time down to ~8s.
However, the resulting sort invocation took barely 1 second here,
and that's obviously too short to be a useful benchmark.  Ramping up
to 5M lines, the test took almost 2 minutes in total, with the sort
itself taking 34s on this particular quad-core system.

A more interesting test would be to ensure that, when run on a
multi-core system, sorting with --threads=2 is at least X% faster
than sorting with --threads=1.
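
Something along these lines, perhaps (an untested sketch; it assumes
the patched sort on a multi-core machine, and the 20% threshold below
is just a stand-in for X):

    # Time the same large input with 1 and with 2 threads.  The input
    # must be big enough that whole-second timings are meaningful.
    start=$(date +%s)
    sort --threads=1 in > out1 || fail=1
    t1=$(expr $(date +%s) - $start)

    start=$(date +%s)
    sort --threads=2 in > out2 || fail=1
    t2=$(expr $(date +%s) - $start)

    # The two runs must produce identical output,
    compare out1 out2 || fail=1

    # and the 2-thread run must be at least ~20% faster.
    test $(expr $t2 \* 100) -le $(expr $t1 \* 80) || fail=1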

Also, it'd be nice to add the following:
  - a test that uses --threads=N to exercise the new option-parsing code,
      though if you adjust as suggested to compare --threads=N for
      N=1 and N=2, that's not needed.
  - a NEWS item


From b1332b9b429dfa9a8bba7286a78bbc7e37a2e419 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Thu, 2 Apr 2009 12:35:08 +0200
Subject: [PATCH] tests: make sort-benchmark-random more efficient, then use a larger test

* tests/misc/sort-benchmark-random: cut 75% off run time, then
increase input size by a factor of 10, so that sort runs for more
than 1 second on fast hardware.
---
 tests/misc/sort-benchmark-random |   42 +++++++++++++++++++-------------------
 1 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/tests/misc/sort-benchmark-random b/tests/misc/sort-benchmark-random
index 2bace4f..0c4d61d 100755
--- a/tests/misc/sort-benchmark-random
+++ b/tests/misc/sort-benchmark-random
@@ -34,33 +34,33 @@ export LC_ALL
 fail=0

 perl -e '
-my $num_lines = 500000;
+my $num_lines = 5000000;
 my $length = 100;

+# Only the first few bytes of each line need to be random.
+# the remaining bytes are solely to simulate a more realistic work load.
+my $rand_bytes = 5;  # per line, so that 94^5 >> $num_lines
+
+my @line;
 for (my $i=0; $i < $num_lines; $i++)
 {
-    for (my $j=0; $j < $length; $j++)
-    {
-       printf "%c", 32 + rand(94);
-    }
-    print "\n";
-}' > in || framework_failure
-
-# We need to generate a lot of data for sort to show a noticeable
-# improvement in performance. Sorting it in PERL may take awhile.
-
-perl -e '
-open (FILE, "<in");
-my @list = <FILE>;
-print sort(@list);
-close (FILE);
-' > exp || framework_failure
-
-#start=$(date +%s)
+    my $line = join ("", map {chr(32+rand(94))} (1..$rand_bytes))
+                . ("0" x ($length - $rand_bytes)) . "\n";
+    print $line;
+    push @line, $line;
+}
+END {
+  open FH, ">", "exp" or die;
+  print FH sort @line;
+  close FH or die;
+}
+' > in || framework_failure
+
+start=$(date +%s)
 time sort in > out || fail=1
-#duration=$(expr $(date +%s) - $start)
+duration=$(expr $(date +%s) - $start)

-#echo sorting the random data took $duration seconds
+echo sorting the random data took $duration seconds

 compare out exp || fail=1

--
1.6.2.rc1.285.gc5f54



