bug-coreutils

bug#6529: --key option problem


From: Victor Grishchenko
Subject: bug#6529: --key option problem
Date: Mon, 28 Jun 2010 20:42:01 +0200

On 28 June 2010 18:07, Eric Blake <address@hidden> wrote:
> On 06/28/2010 08:26 AM, Victor Grishchenko wrote:
> Thanks for the report.  However, I don't think this is a bug in sort,
> but rather a misunderstanding on your part.  Your command says to use as
> your primary key the substring consisting of fields 17 through 30, and
> as secondary key the entire line.

My fault.
It would probably make sense for the -k option description to point to
the explanation of the POS format.
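
For illustration, here is a minimal sketch of the difference between a field position and a character position in a key (POS has the form F[.C], where F is a field number and C a character offset within that field). The three input lines are made up for this example and are not the data from the report; LC_ALL=C is set only to keep the ordering byte-based and predictable.

```shell
# Hypothetical sample data, not the file from the original report.
input=$(printf '%s\n' 'b_01 zeta' 'a_03 alpha' 'c_02 mid')

# -k 2: the key starts at field 2 and runs to the end of the line.
by_field=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)

# -k 1.3,1.4: the key is characters 3 through 4 of field 1.
by_chars=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1.3,1.4)

printf 'by field 2:\n%s\n' "$by_field"
printf 'by chars 3-4 of field 1:\n%s\n' "$by_chars"
```

The first sort orders the lines by the word in the second column, the second by the two digits inside the first column.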

> What did you intend to sort by?  If you were typing 17,30 thinking you
> were getting bytes instead of fields, thus meaning:
>> 0_01_19_377_086 vtt1_100 vtt2_9#8 Tdata (0,8132)
>  ................^^^^^^^^^^^^^^..................

Well, that would be closer to the intended result.
As I see now, what I need is --key=2 --stable, i.e. a key running from
the 2nd field to the end of the line, with equal keys left in input order.
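
A small sketch of what --stable changes, again on made-up input: without it, sort breaks ties between equal keys by comparing the whole line as a last resort; with it, tied lines keep their input order.

```shell
# Hypothetical sample data: two lines share the same key (field 2 onward).
input=$(printf '%s\n' 'z same' 'a same' 'm other')

# With --stable, lines with equal keys stay in input order.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort --key=2 --stable)

# Without --stable, equal keys fall back to a whole-line comparison.
default=$(printf '%s\n' "$input" | LC_ALL=C sort --key=2)

printf 'stable:\n%s\n' "$stable"
printf 'default:\n%s\n' "$default"
```

In the stable run "z same" stays ahead of "a same"; in the default run the last-resort comparison swaps them.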

By the way, regarding the LC_ALL warning in the man page:
a colleague and I have independently discovered that non-C locales
can slow sort down by an order of magnitude.
It would probably make sense to add that to the warning.

$ time ( gzcat vtt2_98.gz | LC_ALL=ru_RU.UTF-8 sort > /dev/null )

real    1m52.153s
user    1m41.614s
sys     0m1.395s
$ time ( gzcat vtt2_98.gz | LC_ALL=C sort > /dev/null )

real    0m10.096s
user    0m4.255s
sys     0m1.186s

> Also, the next version of coreutils will include 'sort --debug' that
> gives you a visual indication of what bytes are actually being compared,
> which would have given you a clue that your --key=17,30 was selecting
> data outside the range of your input.

That is really good, because the absence of any error message
contributed to the confusion.
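
As a rough sketch of how this could be checked, assuming a sort that supports --debug (it was not yet released when this was written, and the annotation format may differ between versions), the sample line quoted above can be run through the original key:

```shell
# The sample line quoted earlier in this thread.
line='0_01_19_377_086 vtt1_100 vtt2_9#8 Tdata (0,8132)'

# --debug annotates each output line with the part used as the key.
# The line has far fewer than 17 fields, so --key=17,30 selects nothing.
debug_out=$(printf '%s\n' "$line" | LC_ALL=C sort --debug --key=17,30)

printf '%s\n' "$debug_out"
```

The annotation printed under the line should make it visible that the key matched no data.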

--
Victor
