bug#6529: --key option problem
From: Victor Grishchenko
Subject: bug#6529: --key option problem
Date: Mon, 28 Jun 2010 20:42:01 +0200
On 28 June 2010 18:07, Eric Blake <address@hidden> wrote:
> On 06/28/2010 08:26 AM, Victor Grishchenko wrote:
> Thanks for the report. However, I don't think this is a bug in sort,
> but rather a misunderstanding on your part. Your command says to use as
> your primary key the substring consisting of fields 17 through 30, and
> as secondary key the entire line.
My fault.
It would probably make sense for the -k option description to
reference the explanation of the POS format.
> What did you intend to sort by? If you were typing 17,30 thinking you
> were getting bytes instead of fields, thus meaning:
>> 0_01_19_377_086 vtt1_100 vtt2_9#8 Tdata (0,8132)
> ................^^^^^^^^^^^^^^..................
Well, that would be closer to the intended result.
As I see now, I need --key=2 --stable, i.e. sort on everything from
the 2nd field to the end of the line, keeping the input order of
lines whose keys compare equal.
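To make the difference concrete, here is a minimal sketch on made-up data (the three sample lines are hypothetical, not taken from my file). It uses the short forms -k and -s of --key and --stable, and assumes a sort that supports -s, as GNU sort does:

```shell
# Hypothetical three-line sample; fields are separated by single spaces.
sample='b x 2
a y 1
c x 1'

# -k2 (short for --key=2): the key runs from field 2 to the end of the line.
# -s  (short for --stable): lines with equal keys keep their input order.
by_rest=$(printf '%s\n' "$sample" | LC_ALL=C sort -s -k2)

# -k2,2: the key is field 2 only, so the third column plays no part.
by_field=$(printf '%s\n' "$sample" | LC_ALL=C sort -s -k2,2)

printf '%s\n--\n%s\n' "$by_rest" "$by_field"
```

With -k2 the lines come out as "c x 1", "b x 2", "a y 1", because the third column is part of the key. With -k2,2 the two "x" lines tie and stay in input order: "b x 2", "c x 1", "a y 1".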
By the way, regarding the LC_ALL warning in the man page:
my colleague and I have "independently discovered" that non-C
locales can slow sort down by an order of magnitude.
It would probably make sense to add that to the warning.
$ time ( gzcat vtt2_98.gz | LC_ALL=ru_RU.UTF-8 sort > /dev/null )
real 1m52.153s
user 1m41.614s
sys 0m1.395s
$ time ( gzcat vtt2_98.gz | LC_ALL=C sort > /dev/null )
real 0m10.096s
user 0m4.255s
sys 0m1.186s
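One caveat with switching to LC_ALL=C for speed is that the order becomes plain byte order, so for ASCII text all uppercase letters sort before all lowercase ones. A minimal sketch on four made-up lines, with the variable set for the sort process only:

```shell
# LC_ALL=C applies to this one sort invocation; the shell's own locale
# settings are left untouched.
c_order=$(printf '%s\n' b A a B | LC_ALL=C sort)

printf '%s\n' "$c_order"
```

This prints A, B, a, b on separate lines. What another locale would print depends on its collation rules and on whether it is installed, so I do not show that here.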
> Also, the next version of coreutils will include 'sort --debug' that
> gives you a visual indication of what bytes are actually being compared,
> which would have given you a clue that your --key=17,30 was selecting
> data outside the range of your input.
That is really good, because the absence of any error message
contributed to the confusion.
--
Victor