coreutils

Re: How to sort and count efficiently?


From: Peng Yu
Subject: Re: How to sort and count efficiently?
Date: Sun, 30 Jun 2019 12:10:34 -0500

The problem with this kind of awk program is that everything is loaded
into memory, whereas plain `sort` spills to temporary files on disk to
limit memory use. When the awk hash grows too large, accessing it can
become very slow (perhaps because of cache misses, or because hash
lookups slow down as the table grows).
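
For comparison, a minimal sketch of the sort-based approach, assuming GNU
sort and uniq. The helper name `count_lines`, the 256M buffer size, and the
temporary directory are illustrative choices, not measured recommendations:

```shell
# Count occurrences of each input line without keeping every key in memory.
# count_lines is a hypothetical helper name used only for this sketch.
count_lines() {
    # LC_ALL=C      compare raw bytes instead of using locale collation
    # -S 256M       in-memory buffer size; larger inputs spill to temp files
    # -T DIR        where those temp files go (here $TMPDIR, else /tmp)
    # uniq -c       counts runs of identical adjacent lines in sorted input
    LC_ALL=C sort -S 256M -T "${TMPDIR:-/tmp}" | uniq -c
}

# Same sample input as in the quoted awk example.
printf '%s\n' a c b b b b b b c | count_lines
```

Because `uniq -c` only ever compares a line with the previous one, the
counting stage needs constant memory; the cost moves into the sort, which
is bounded by the `-S` buffer plus disk space under `-T`.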

On Sun, Jun 30, 2019 at 11:52 AM Assaf Gordon <address@hidden> wrote:

> Correcting myself:
>
> On Sun, Jun 30, 2019 at 10:08:46AM -0600, Assaf Gordon wrote:
> > On Sun, Jun 30, 2019 at 07:34:19AM -0500, Peng Yu wrote:
> > >
> > > I have a long list of strings (one string per line). I need to
> > > count the number of occurrences of each string.
> > >
> > > [...] Does anybody know any better way
> > > to make the sort and count run more efficiently?
> > >
> >
> > Or using gnu awk:
>
> use 'asorti' instead of 'asort', with the two-parameter variant:
>
>
>   $ printf "%s\n" a c b b b b b b c \
>         | awk '{ a[$1]++ }
>                END { n = asorti(a,b)
>                      for (i = 1; i <= n; i++) {
>                         print b[i], a[b[i]]
>                      }
>                    }'
>   a 1
>   b 6
>   c 2
>
>
> For more details see:
>
> https://www.gnu.org/software/gawk/manual/html_node/Array-Sorting-Functions.html#Array-Sorting-Functions
>
> -assaf
>
> --
Regards,
Peng

