bug-coreutils

bug#6366: join can't join on numeric fields


From: Alex Shinn
Subject: bug#6366: join can't join on numeric fields
Date: Wed, 9 Jun 2010 10:47:05 +0900

2010/6/8 Pádraig Brady <address@hidden>:
> On 07/06/10 06:19, Alex Shinn wrote:
>>
>> Ideally join should be able to handle files sorted in any order
>> that sort provides, but as a bare minimum it should at least
>> be able to join files sorted on numeric fields.
>
> Well if there were no aliases in the numbers, you could always
> sort the output numerically after the join if it was important.

By first sorting lexicographically, you mean?
In the use case I had, the data was already sorted
numerically.  So whenever I want to join two files,
currently I have to do:

  sort file1 > file1.tmp
  sort file2 > file2.tmp
  join file1.tmp file2.tmp | sort -n > out
  rm -f file1.tmp file2.tmp

instead of just

  join -n file1 file2 > out

The small tools philosophy says to avoid adding redundancy,
but in this case join wouldn't be duplicating what sort does;
it would just be cooperating with it better.  Not to mention
that sort is an expensive operation to have to perform
multiple times.  It isn't just an extra O(n) filter you can
throw into the middle of a pipeline.
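
For reference, the sort/join/sort workaround above can be wrapped
in a small function.  This is only a sketch: the name numjoin is
made up for illustration, it assumes the join key is the first
whitespace-separated field, and it relies on mktemp being
available.  It forces the C locale so that sort and join agree on
the collation order.

```shell
# Hypothetical helper (not part of coreutils): join two files that are
# sorted numerically on field 1, using only the stock sort and join.
numjoin() {
    t1=$(mktemp) || return 1
    t2=$(mktemp) || { rm -f "$t1"; return 1; }
    # Re-sort both inputs in the lexicographic order that join expects.
    LC_ALL=C sort -k 1b,1 "$1" > "$t1"
    LC_ALL=C sort -k 1b,1 "$2" > "$t2"
    # Join, then restore numeric order on the joined output.
    LC_ALL=C join "$t1" "$t2" | LC_ALL=C sort -n
    rm -f "$t1" "$t2"
}
```

Usage would be "numjoin file1 file2 > out".  It still pays for
three sorts, which is exactly the cost a numeric mode in join
would avoid.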

> However if you wanted to join "01" and "1" then your patch is required.
> Are numeric aliases common enough to warrant this? I think so.

Leading zeros may not be so common, but don't forget
"1.0" and "1" or "1e2" and "100" and "100.0", etc.
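
Without support in join itself, one way to make such aliases
compare equal is to rewrite the key in a canonical form before
sorting and joining.  A minimal sketch, assuming the key is
field 1 and that awk's default number formatting is acceptable
(the name normkey is hypothetical, and non-integer keys with more
than six significant digits would be rounded by awk's default
CONVFMT):

```shell
# Hypothetical helper: replace field 1 with its numeric value so that
# "01", "1" and "1.0" all become "1", and "1e2" and "100.0" become "100".
# Assigning to $1 also rebuilds the line with single-space separators.
normkey() {
    LC_ALL=C awk '{ $1 = $1 + 0; print }' "$@"
}
```

Each input would go through "normkey file | LC_ALL=C sort -k 1b,1"
before being handed to join, which is one more pass over the data
for every file.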

> I'd use -g, --general-numeric to correspond with `sort`.

Yes, that's probably better.

-- 
Alex
