2 issues with binning
From: Andreas Schamanek
Subject: 2 issues with binning
Date: Sun, 19 Jun 2022 23:54:48 +0200 (CEST)
Hi everyone,
Recently, I ran into some issues when I tried to calculate histograms
using an Awk script of my own. Eventually, I found that I needed to
fix my script. As I dug deeper, I started to look for
alternatives for the binning. That's how I found datamash, which I
wish I had known about all along, as it does so many things I frequently need. So,
thanks for this great tool.
Comparing the outputs of my script with those of datamash, it seems I
hit two possible "bugs" in datamash. (Disclaimer: I am not a programmer;
I just have some scripting skills.)
## Possible issue due to binary floating-point arithmetic:
$ printf '%s\n' 4.19 4.2 4.21 | datamash --full bin:0.1 1
4.19 4.1
4.2 4.1
4.21 4.2
Of course, 4.2 lies exactly on the lower edge of the bin [4.2,4.3) and
should therefore bin to 4.2, unless I am mistaken.
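A minimal sketch of how this can happen, assuming (not confirmed from the datamash source) that the bin is computed as floor(x / w) * w in binary floating point. Neither 0.1 nor 4.2 is exactly representable, so the quotient can come out a hair below a whole number and floor then drops the value into the previous bin. Whether 4.2 itself is hit depends on the precision used internally, so the sketch uses 0.3, where 0.3 / 0.1 evaluates to 2.9999999999999996 in double precision. The helpers naive_bin and snapped_bin and the tolerance 1e-9 are hypothetical:

```shell
#!/bin/sh
# Sketch only: assumes bin(x) = floor(x / w) * w in double precision.
# naive_bin and snapped_bin are hypothetical helpers, not datamash code.
# Both read one number per line and print "value<TAB>bin", like --full bin:W 1.

naive_bin() {
  awk -v w="$1" '
    # awk int() truncates toward zero; adjust to get a true floor
    function floor(q,  t) { t = int(q); return (q < t) ? t - 1 : t }
    { printf "%s\t%s\n", $1, floor($1 / w) * w }'
}

snapped_bin() {
  awk -v w="$1" '
    BEGIN { eps = 1e-9 }             # tolerance, measured in bin widths
    function floor(q,  t) { t = int(q); return (q < t) ? t - 1 : t }
    {
      q = $1 / w
      n = floor(q + 0.5)             # nearest integer to the quotient
      # treat quotients within eps of an integer as that integer,
      # absorbing rounding error such as 0.3 / 0.1 = 2.9999999999999996
      if (q - n < eps && n - q < eps) q = n
      printf "%s\t%s\n", $1, floor(q) * w
    }'
}

printf '%s\n' 0.3 | naive_bin 0.1                  # 0.3 lands in bin 0.2
printf '%s\n' 0.3 4.19 4.2 4.21 | snapped_bin 0.1  # 0.3 -> 0.3, 4.2 -> 4.2
```

Snapping trades one edge case for another: a value that genuinely lies within 1e-9 bin widths of an edge is also moved onto it. Working with scaled integers is another option when the data have a fixed number of decimals.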
## Possible issue with binning negative numbers:
$ printf '%s\n' 0 1 2 | datamash --full bin:2 1
0 0
1 0
2 2
For positive numbers, the bins include their lower bound ("[") and
exclude their upper bound (")"); here they are [0,2) and [2,4).
I expected this type of binning to continue for negative numbers, i.e.
that the bin to the left of [0,2) is [-2,0) and the next one is [-4,-2).
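Stated as a formula (a restatement of the expectation above, not necessarily how datamash defines bin), half-open bins of width w map each value x to b_w(x), and the same rule covers negative x. Worked through for bin:2:

```latex
\[
  b_w(x) = w \left\lfloor \frac{x}{w} \right\rfloor,
  \qquad x \in \bigl[\, b_w(x),\; b_w(x) + w \,\bigr)
\]
\[
  b_2(-2) = 2\lfloor -1 \rfloor = -2, \qquad
  b_2(-1) = 2\lfloor -0.5 \rfloor = -2, \qquad
  b_2(0)  = 2\lfloor 0 \rfloor = 0
\]
```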
However:
$ printf '%s\n' -2 -1 0 | datamash --full bin:2 1
-2 -4
-1 -2
0 0
I was expecting -2 to map to -2, since it is the lower bound of [-2,0).
Perhaps bin:1 shows my concern even better:
$ printf '%s\n' -2 -1 0 | datamash --full bin:1 1
-2 -3
-1 -2
0 0
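The bin:2 and bin:1 outputs above are consistent with a rule that truncates x / w toward zero and then subtracts one whenever the quotient is negative, even when it is already a whole number. That rule is only a guess inferred from the outputs, not taken from the datamash source. The awk sketch below (helper names trunc_minus_one_bin and floor_bin are hypothetical) reproduces the reported numbers under that guess and contrasts it with a true floor, which keeps exact multiples of the width in their own bin:

```shell
#!/bin/sh
# Hypothesis only: trunc_minus_one_bin mimics the outputs reported above by
# truncating x / w toward zero and then subtracting 1 for ANY negative
# quotient, even one that is already a whole number. floor_bin subtracts 1
# only when truncation actually moved the value, i.e. a true floor.
# Both helper names are hypothetical.

trunc_minus_one_bin() {
  awk -v w="$1" '{
    q = $1 / w
    k = int(q)                  # truncate toward zero
    if (q < 0) k = k - 1        # unconditional step down for negatives
    printf "%s\t%s\n", $1, k * w
  }'
}

floor_bin() {
  awk -v w="$1" '{
    q = $1 / w
    k = int(q)
    if (q < k) k = k - 1        # step down only if q had a fractional part
    printf "%s\t%s\n", $1, k * w
  }'
}

printf '%s\n' -2 -1 0 | trunc_minus_one_bin 2   # -2 -> -4, as reported
printf '%s\n' -2 -1 0 | floor_bin 2             # -2 -> -2, -1 -> -2, 0 -> 0
printf '%s\n' -2 -1 0 | floor_bin 1             # -2 -> -2, -1 -> -1, 0 -> 0
```

For non-negative input both helpers agree, which would explain why the positive case looks correct.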
Curious to hear what you think!
--
-- Andreas
:-)