bug#6366: join can't join on numeric fields
From: Alex Shinn
Subject: bug#6366: join can't join on numeric fields
Date: Wed, 9 Jun 2010 10:47:05 +0900
2010/6/8 Pádraig Brady <address@hidden>:
> On 07/06/10 06:19, Alex Shinn wrote:
>>
>> Ideally join should be able to handle files sorted in any order
>> that sort provides, but as a bare minimum it should at least
>> be able to join files sorted on numeric fields.
>
> Well if there were no aliases in the numbers, you could always
> sort the output numerically after the join if it was important.
By first sorting lexicographically, you mean?

In the use case I had, the data was already sorted
numerically. So whenever I want to join two files,
currently I have to do:

  sort file1 > file1.tmp
  sort file2 > file2.tmp
  join file1.tmp file2.tmp | sort -n > out
  rm -f file1.tmp file2.tmp

instead of just

  join -n file1 file2 > out
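
For anyone stuck with the workaround in the meantime, it can be wrapped
in a small shell function. This is only a sketch: the name numjoin is
made up for illustration and is not part of coreutils, it assumes the
join key is field 1 separated by blanks, and it uses mktemp for the
temporary files. The sort key "-k 1b,1" and LC_ALL=C keep the ordering
that sort produces consistent with the ordering that join expects.

```shell
# numjoin FILE1 FILE2
# Hypothetical helper (not a coreutils command): joins two files whose
# first field is in numeric order by re-sorting both lexicographically,
# joining, and then restoring numeric order on the result.
numjoin() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }

    # join needs its inputs ordered as strings on the join field.
    LC_ALL=C sort -k 1b,1 "$1" > "$tmp1"
    LC_ALL=C sort -k 1b,1 "$2" > "$tmp2"

    # Join on field 1, then put the output back in numeric order.
    LC_ALL=C join "$tmp1" "$tmp2" | sort -n
    status=$?

    rm -f "$tmp1" "$tmp2"
    return $status
}
```

It would be called as "numjoin file1 file2 > out". Note that it still
pays for three sorts, which is exactly the cost I'd like to avoid.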
The small-tools philosophy says to avoid adding
redundancy, but in this case join wouldn't be duplicating
what sort does; it would just cooperate with it better.
Not to mention that sorting is an expensive operation to
have to perform multiple times, unlike an extra O(n)
filter thrown into the middle of a pipeline.
> However if you wanted to join "01" and "1" then your patch is required.
> Are numeric aliases common enough to warrant this? I think so.
Leading zeros may not be so common, but don't forget
"1.0" and "1" or "1e2" and "100" and "100.0", etc.
> I'd use -g, --general-numeric to correspond with `sort`.
Yes, that's probably better.
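
Without such an option, one way to approximate general-numeric matching
for the aliases mentioned above is to rewrite the keys in a canonical
form before sorting and joining. The following is a rough sketch, not
what the patch does: normkey is a made-up name, it assumes the key is
field 1 with blank separators, and it relies on awk's string-to-number
conversion. Non-integer keys are printed using awk's default number
format, so keys with many significant digits could lose precision.

```shell
# normkey [FILE...]
# Hypothetical helper: rewrites field 1 of each line as a canonical
# number, so that aliases such as "01", "1" and "1.0" become the same
# string. Reads standard input when no files are given.
normkey() {
    # Adding 0 forces a numeric conversion; assigning the result back
    # to $1 rebuilds the line with single spaces between fields.
    awk '{ $1 = $1 + 0; print }' "$@"
}
```

The normalized files still have to be sorted lexicographically before
being passed to join, so this adds a pass rather than removing one.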
--
Alex