bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#32099: New uniq option for speed


From: Paul Eggert
Subject: bug#32099: New uniq option for speed
Date: Mon, 9 Jul 2018 09:35:35 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

Kingsley G. Morse Jr. wrote:
     $ echo -e "b\na\nb" | awk '!seen[$0]++'

It basically avoids sorting by using hashed
indexes into an associative array to find
previously seen values in about O(N) time.

No, it's still O(N log N) because hash table lookup is really O(log N), despite what many textbooks say. Though no doubt we could make it faster than than the sorting pipeline, it wouldn't be algorithmically faster. See, for example:

https://lemire.me/blog/2009/08/18/do-hash-tables-work-in-constant-time/





reply via email to

[Prev in Thread] Current Thread [Next in Thread]