bug-coreutils

Re: efficient version of 'sort | uniq -c | sort -n'?


From: Philip Rowlands
Subject: Re: efficient version of 'sort | uniq -c | sort -n'?
Date: Mon, 21 May 2007 21:48:13 +0100 (BST)

On Mon, 21 May 2007, Matthew Woehlke wrote:

> I thought about that, but /maximum/ efficiency is only achievable doing everything in one go. Anyway I think 'countitems' would still be a big improvement; I would do that as 'sort --unique-with-count' (preferably aliased 'sort -U') since IMO this is a missing feature of 'sort -u'.

You don't really want to do the first sort at all; it's just a convenient way of creating the buckets. The relative order of the buckets is unimportant, but computing that order is what sort spends a long time on.

A fundamentally more efficient approach would be something like:

perl -lne '$bucket{$_}++; END { foreach $key (keys %bucket) { print "$bucket{$key} $key" } }' | \
  sort -n

The trailing "sort" could be done inside perl, but it doesn't help the (algorithmic) efficiency, and we're not playing perl golf...


Cheers,
Phil



