bug-coreutils

bug#21000: coreutils 8.24 sort -h gets ordering wrong


From: Linda Walsh
Subject: bug#21000: coreutils 8.24 sort -h gets ordering wrong
Date: Wed, 15 Jul 2015 06:41:52 -0700
User-agent: Thunderbird

Pádraig Brady wrote:
tag 21000 wontfix
close 21000
stop

On 07/07/15 03:00, Christopher Samuel wrote:
Hi there,

When trying to sort with the -h option (--human-numeric-sort) it seems
to fail to get the ordering correct, for instance in a column of values
of memory usage from the Slurm HPC batch system you get this:

  2868768K
  2875504K
  3278652K
  3435484K
  3461744K
  4050208K
   419.50M
      421M
      422M
   447.50M
      451M
      467M
   478.50M
      479M
      496M
      998M
     1.09G
     1.31G
     1.31G
     1.31G




I.E. sort -h is not as general as you require.
You can leverage numfmt(1) though to do the required adjustments.
For example, copy/pasting this command:

g 420M
h 421M
i 422M
j 448M
k 451M
l 467M
m 479M
n 479M
o 496M
p 998M
q 1.1G
r 1.4G
s 1.4G
t 1.4G
a 2.9G
b 2.9G
c 3.3G
d 3.5G
e 3.5G
f 4.1G

But that's the wrong output.  sort -h uses power-of-2 units.
And you expect users to use that as a workaround?

*puhshaw*... it's not that hard and this:
Paul Eggert wrote:

Looking at both would require arbitrary-precision arithmetic, something that 'sort' doesn't do (it does only arbitrary-precision comparison).

Arbitrary precision... that's a strawman argument IMHO, since the output
is targeted at 3-4 digits plus a suffix; one would just normalize them.
You can do up to exabytes in 64-bit integers (binary or decimal).
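
The normalization idea above can be sketched in C++ roughly as follows. This is only an illustration, not code from hsort or from coreutils: the names `to_bytes` and `human_less` are hypothetical, binary (power-of-2) units are assumed, and a `double` intermediate is used, so values above 2^53 bytes would lose low-order bits.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert text such as "419.50M" or "2868768K" to bytes.
// Assumes binary units and at most one trailing suffix from K, M, G, T, P, E.
// Returns false on malformed input or when the result does not fit in 64 bits.
bool to_bytes(const std::string& text, uint64_t& bytes) {
    static const std::string suffixes = "KMGTPE";
    char* end = nullptr;
    const double mantissa = std::strtod(text.c_str(), &end);
    if (end == text.c_str() || mantissa < 0.0) return false;

    int exponent = 0;  // power of two implied by the suffix
    if (*end != '\0') {
        const std::string::size_type pos = suffixes.find(*end);
        if (pos == std::string::npos || end[1] != '\0') return false;
        exponent = 10 * (static_cast<int>(pos) + 1);
    }

    const double scaled = std::ldexp(mantissa, exponent);
    // Reject NaN, infinity, and anything at or above 2^64.
    if (!(scaled < 18446744073709551616.0)) return false;
    bytes = static_cast<uint64_t>(scaled + 0.5);  // round to nearest byte
    return true;
}

// Hypothetical comparator: order two human-readable sizes by byte count.
// Unparsable input is treated as 0 in this sketch.
bool human_less(const std::string& a, const std::string& b) {
    uint64_t x = 0, y = 0;
    to_bytes(a, x);
    to_bytes(b, y);
    return x < y;
}
```

Passing `human_less` to `std::sort` over the column from the original report would place the `K` values after `998M` and `1.31G`, since every entry is compared as a plain byte count.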

I added a file to mine as well: (file-{001-020}):

420M  file-006
421M  file-007
422M  file-008
448M  file-009
451M  file-010
467M  file-011
478M  file-012
479M  file-013
496M  file-014
998M  file-015
1.1G  file-016
1.3G  file-017
1.3G  file-018
1.3G  file-019
2.7G  file-000
2.7G  file-001
3.1G  file-002
3.3G  file-003
3.3G  file-004
3.9G  file-005
----  -----
~29G  TOTAL

The above is from a 10-y/o horrid perl prog, "hsort", that started in perl4.
I've thought about refactoring it, but it works, so it sits on a back-shelf.

Admittedly, I just added the normalization, but I never had a case that
needed it before I saw the above input. I would presume that the O.P.
(Christopher) wouldn't find normalizing it to fewer digits to be a
problem -- as long as it sorts the numbers in the correct order...

My word... sort -h really does numeric compares only within the same prefix?
It doesn't convert them to a number, then sort, then reapply units?
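
For contrast, here is a minimal sketch of a suffix-first comparison, following the coreutils manual's description of -h (numeric sign first, then SI suffix, then numeric value). It is not the actual sort source: `suffix_rank` and `suffix_first_less` are hypothetical names, and only non-negative inputs are handled.

```cpp
#include <cstdlib>
#include <string>

// Rank of the trailing unit suffix: none = 0, K or k = 1, M = 2, G = 3, ...
int suffix_rank(const std::string& text) {
    static const std::string order = "KMGTPEZY";
    if (text.empty()) return 0;
    char last = text.back();
    if (last == 'k') last = 'K';
    const std::string::size_type pos = order.find(last);
    return pos == std::string::npos ? 0 : static_cast<int>(pos) + 1;
}

// Suffix-first ordering: a larger suffix always sorts later, and the digits
// are compared only when both suffixes match. Non-negative inputs only.
bool suffix_first_less(const std::string& a, const std::string& b) {
    const int ra = suffix_rank(a);
    const int rb = suffix_rank(b);
    if (ra != rb) return ra < rb;
    return std::strtod(a.c_str(), nullptr) < std::strtod(b.c_str(), nullptr);
}
```

Under this ordering `4050208K` sorts before `419.50M` simply because K ranks below M, which matches the order shown in the original report.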

Urg.....and I was so impressed that you guys finally added that to sort.
Oh well....*whistling innocently*....;-)

(it's not a completely trivial prog to write even in perl -- likely just
that much harder in C).
If you leave out support for Zetta and Yotta, you can do the rest in
64-bit integers...

Hey... I just wrote the output portion of that (and it has normalization)
in C++ for another prog...  It's in C++, but I wouldn't think it that
hard to reuse for C.  Yeah, I know, the output's the easier part.
But for 36 lines, it seems to work ok.


/********#*********#*********#*********#*********#*********#*********#*********/
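// Produce num+binary or SI suffix  {{{

#include <cstdint>   // uint64_t, int64_t
#include <cstdio>    // snprintf
#include <string>
using std::string;

// Divide value by scale until it drops below 999.5 or the suffixes run out.
static char * Scale(char buf[], double value, const int scale = 1024) {
 static const char suffixes[] = { ' ', 'K', 'M', 'G', 'T', 'P', 'E' };
 constexpr unsigned last_i = sizeof(suffixes) / sizeof(char) - 1;
 unsigned i;
 for (i = 0; value >= 999.5 && i < last_i; ++i ) value /= (double) scale;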
 snprintf(buf, 10, value == 0.0  ? "0"
                   : value < 9.95  ? "%.1f%c"
                                   : "%.0f%c", value, suffixes[i]);
 return buf;
}

string Scale(double value, const int scale = 1024) {
 char buf[16];
 return string(Scale(buf, value, scale));
}

string Binary_Scale (double value) { return Scale(value); }
string Binary_Scale (uint64_t value) { return Binary_Scale((double) value); }
string Binary_Scale (int64_t value) { return Binary_Scale((double) value); }
string SI_Scale (uint64_t value) { return Scale((double) value, 1000); }

char * Binary_Scale (char buf[], double value) { return Scale(buf, value); }

char * Binary_Scale (char buf[], uint64_t value) {
 return Binary_Scale(buf, (double) value); }

char * Binary_Scale (char buf[], int64_t value) {
 return Binary_Scale(buf, (double) value); }

char * SI_Scale (char buf[], uint64_t value) {
 return Scale(buf, (double) value, 1000); }







