
Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency


From: Pádraig Brady
Subject: Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
Date: Wed, 16 Mar 2011 13:33:08 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 16/03/11 12:07, Jim Meyering wrote:
> Pádraig Brady wrote:
>> I've not fully analyzed this yet, and I'm not saying it's wrong,
>> but the above change seems to have a large effect on thread
>> creation when smaller buffers are used (you hinted previously
>> that being less aggressive with the amount of mem used by default
>> might be appropriate, and I agree).
>>
>> Anyway with the above I seem to need a buffer size more
>> than 10M to have any threads created at all.
>>
>> Testing the original 4 lines heuristic with the following, shows:
>> (note I only get > 4 threads after 4M of input, not 7 for 16 lines
>> as indicated in NEWS).
>>
>> $ for i in $(seq 30); do
>>>   j=$((2<<$i))
>>>   yes | head -n$j > t.sort
>>>   strace -c -e clone sort --parallel=16 t.sort -o /dev/null 2>&1 |
>>>    join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
>>>    sed -n "s/\([0-9]*\) clone/$j\t\1/p"
>>> done
>> 4       1
>> 8       2
>> 16      3
>> 32      4
>> 64      4
>> 128     4
> ...
>> 1048576 4
>> 2097152 4
>> 4194304 8
>> 8388608 16
>>
>> When I restrict the buffer size with '-S 1M', many more threads
>> are created (a max of 16 in parallel with the above command)
>> 4       1
>> 8       2
>> 16      3
>> 32      4
>> 64      4
>> 128     4
>> 256     4
>> 512     4
>> 1024    4
>> 2048    4
>> 4096    4
>> 8192    4
>> 16384   8
>> 32768   12
>> 65536   24
>> 131072  44
>> 262144  84
>> 524288  167
>> 1048576 332
>> 2097152 660
>> 4194304 1316
>> 8388608 2628
>>
>> After increasing the heuristic to 128K, I get _no_ threads until -S > 10M
>> and this seems to be independent of line length.
> 
> Thanks for investigating that.
> Could strace -c -e clone be doing something unexpected?
> When I run this (without my patch), it would use 8 threads:
> 
>     seq 16 > in; strace -ff -o k ./sort --parallel=16 in -o /dev/null
> 
> since it created eight k.PID files:
> 
>     $ ls -1 k.*|wc -l
>     8
> 
> Now, for such a small file, it does not call clone at all.
> 

Oops, yep, I forgot to add -f to strace.
So NEWS is correct.

# SUBTHREAD_LINES_HEURISTIC = 4
$ for i in $(seq 22); do
    j=$((2<<$i))
    yes | head -n$j > t.sort
    strace -f -c -e clone ./sort --parallel=16 t.sort -o /dev/null 2>&1 |
    join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
    sed -n "s/\([0-9]*\) clone/$j\t\1/p"
  done
4       1
8       3
16      7
32      15
64      15
128     15
256     15
512     15
1024    15
2048    15
4096    15
8192    15
16384   15
32768   15
65536   15
131072  15
262144  15
524288  15
1048576 15
2097152 15
4194304 30
8388608 45

# As above, but add -S1M option to sort

4       1
8       3
16      7
32      15
64      15
128     15
256     15
512     15
1024    15
2048    15
4096    15
8192    15
16384   30
32768   45
65536   90
131072  165
262144  315
524288  622
1048576 1245
2097152 2475
4194304 4935
8388608 9855
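
The -S1M column can be reproduced with a simple model. This is a rough sketch inferred only from the numbers in this message, not from sort.c. It assumes each buffer fill holds 12787 lines (the nlines ceiling seen with -S1M), and that one fill of n lines produces 1, 3, 7 or 15 clones as n reaches 4, 8, 16 or 32, as in the first rows of the table. The names per_fill, clones_for and predict are made up for illustration.

```shell
#!/bin/sh
# Hypothetical model of the clone counts; inferred from the table, not from sort.c.
per_fill=12787   # assumed lines per 1M buffer fill (nlines ceiling seen with -S1M)

# Clones for one buffer fill of n lines: 0 below 4 lines, then 1, 3, 7,
# capped at 15 (the count seen with --parallel=16).
clones_for() {
  n=$1 c=0 t=4
  while [ "$n" -ge "$t" ] && [ "$c" -lt 15 ]; do
    c=$((2 * c + 1))
    t=$((t * 2))
  done
  echo "$c"
}

# Total clones: full buffer fills plus one final partial fill.
predict() {
  total=$1
  full=$((total / per_fill))
  rest=$((total % per_fill))
  echo $((full * $(clones_for "$per_fill") + $(clones_for "$rest")))
}

for j in 8192 16384 524288 8388608; do
  printf '%s\t%s\n' "$j" "$(predict "$j")"
done
```

For 524288 lines this gives 41 full fills (615 clones) plus a final fill of 21 lines (7 clones), i.e. 622, which matches the measured value. The other rows of the -S1M table come out the same way under these assumptions.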

With SUBTHREAD_LINES_HEURISTIC=128k and the -S1M option to sort we get no
threads, as nlines never gets above 12787 (there looks to be around 80 bytes
of overhead per line?).
Only when -S >= 12M does nlines get high enough to create threads.
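
The per-line figure can be checked with a little arithmetic. This is a back-of-the-envelope sketch that assumes "128k" means 128 * 1024 lines and that the whole -S buffer is shared evenly among the lines. The variable names are only for illustration.

```shell
#!/bin/sh
# Back-of-the-envelope check; assumes the whole buffer is divided among lines.
buf=$((1024 * 1024))              # -S1M in bytes
nlines=12787                      # highest nlines seen with -S1M
per_line=$((buf / nlines))        # bytes per line, rounded down
heuristic=$((128 * 1024))         # assumed meaning of "128k" lines
need=$((heuristic * per_line))    # bytes needed to hold that many lines

echo "bytes per line:     $per_line"
echo "buffer needed:      $need bytes"
echo "buffer needed, MiB: $((need / buf)) (rounded down)"
```

That gives about 82 bytes per line, of which only 2 are the "y\n" input itself, and a buffer a little over 10M to reach 128k lines. This is in line with no threads appearing until -S is somewhere between 10M and 12M.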

cheers,
Pádraig.


