Re: [coreutils] added ability in sort to skip n number of lines for each


From: Pádraig Brady
Subject: Re: [coreutils] added ability in sort to skip n number of lines for each file
Date: Tue, 23 Nov 2010 16:21:07 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 23/11/10 15:57, Jim Hester wrote:
> Below I have an updated proper patch, it is quite a bit larger than my
> first, but should address all of the concerns from Assaf and Pádraig.
> 
> My main motivation here is not just to make this common operation less
> annoying, it was mostly for increased performance.  I made a test
> dataset of 10 files with 3 header lines each and 500,000 lines to sort,
> then ran sort by using head and tail as Pádraig suggests, and then again
> using my implemented header skip on an 8 core machine.  Larger files
> seem to show similar speed up as well.  I believe this speedup comes
> from the fact that the multithreaded sort is trying to read from the
> buffer faster than tail can write to the buffer.
> 
>>time { (head -q -n 3 test[0-9] | head -n 3; tail -q -n+4 test[0-9] |
> ./sort -n ) > out2; }
> 
> real    0m51.660s
> user    2m0.324s
> sys     0m4.115s
> 
>>time ./sort -n -l 3 test[0-9] > out
> 
> real    0m31.834s
> user    2m17.775s
> sys     0m3.981s
>>diff out out2

The user time of the head/tail | sort pipeline
is lower than that of sort -l, which suggests that
the first invocation was just waiting on disk.

Could you please repeat the test using precached data?
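
One way to do that, as a rough sketch: read every input file once
before timing, so that both runs are served from the page cache
rather than from disk. The `precache` helper name is hypothetical,
and `-l` is the option from the patch under discussion, not a flag
in any released `sort`.

```shell
# Hypothetical helper: read each file once and discard the data,
# so that subsequent reads should come from the page cache.
precache() {
    cat -- "$@" > /dev/null
}

# Assumed usage with the test files from the quoted message:
#   precache test[0-9]
#   time { (head -q -n 3 test[0-9] | head -n 3; \
#           tail -q -n +4 test[0-9] | ./sort -n) > out2; }
#   time ./sort -n -l 3 test[0-9] > out   # -l: proposed patch only
```

This only helps if the files fit in free memory; otherwise the
second run may still go to disk.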

Currently the threads in `sort` are passed data that is read
sequentially from the input files. Otherwise `sort`
would have to start worrying about device IDs,
/sys/block/<blockdev>/queue/rotational, etc.,
so as not to thrash disk heads. That kind of
logic is probably always best kept outside of `sort`.
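
For illustration only, a wrapper outside of `sort` could consult
that sysfs flag along these lines. The `list_rotational` name and
its optional directory argument are assumptions made for this
sketch; nothing like it exists in coreutils.

```shell
# Hypothetical helper: print "<device> <flag>" for each block device.
# A flag of 1 means a rotational (spinning) disk, 0 means non-rotational.
# The optional argument overrides the sysfs root (default: /sys/block).
list_rotational() {
    sysroot=${1:-/sys/block}
    for f in "$sysroot"/*/queue/rotational; do
        [ -r "$f" ] || continue          # skip unmatched glob or unreadable entries
        dev=${f#"$sysroot"/}             # strip the leading sysfs root
        dev=${dev%%/*}                   # keep only the device name
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

A caller could then decide whether to read files on the same
rotational device one at a time or in parallel.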

cheers,
Pádraig.


