bug#18273: closed (Re: bug#18273: sort seems to misbehave if both -u and
From: Lennart Sorensen
Subject: bug#18273: closed (Re: bug#18273: sort seems to misbehave if both -u and -n or -k are used)
Date: Fri, 15 Aug 2014 17:05:23 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Aug 15, 2014 at 02:32:14PM -0600, Eric Blake wrote:
> 'info sort' says:
>
> The '--stable' ('-s') option
> disables this "last-resort comparison" so that lines in which all fields
> compare equal are left in their original relative order. The '--unique'
> ('-u') option also disables the last-resort comparison.
>
> and later on:
>
> '-u'
> '--unique'
>
> Normally, output only the first of a sequence of lines that compare
> equal. For the '--check' ('-c' or '-C') option, check that no pair
> of consecutive lines compares equal.
>
> This option also disables the default last-resort comparison.
>
> The commands 'sort -u' and 'sort | uniq' are equivalent, but this
> equivalence does not extend to arbitrary 'sort' options. For
> example, 'sort -n -u' inspects only the value of the initial
> numeric string when checking for uniqueness, whereas 'sort -n |
> uniq' inspects the entire line. *Note uniq invocation::.
OK, I guess that does somewhat point out the behaviour.
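For reference, a minimal sketch of the difference the quoted text describes,
assuming GNU sort and the C locale. The three-line input is made up for
illustration; the variable names are hypothetical.

```shell
# Made-up sample input: two lines share the leading number 1.
input='1 b
1 a
2 c'

# -n -u: lines compare equal when their leading numbers match,
# so only one of the two "1 ..." lines is output.
unique_numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u)

# -n | uniq: uniq compares entire lines, so all three lines survive.
numeric_then_uniq=$(printf '%s\n' "$input" | LC_ALL=C sort -n | uniq)

printf '%s\n' "sort -n -u:" "$unique_numeric"
printf '%s\n' "sort -n | uniq:" "$numeric_then_uniq"
```

The first command should print two lines and the second three, which is
the non-equivalence the info page warns about.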
> -u is the only option that implicitly enables -s.
>
> You are welcome to propose a patch to the documentation that would
> clarify the situation; we can reopen this bug if a patch materializes.
> Maybe even a change to 'sort --help' output to mention that -u implies
> -s (which would also feed the 'man sort' page).
I do wonder why there isn't an option to undo that implicit option,
but perhaps it would not actually make sense.
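The last-resort comparison that -s (and, per the above, -u) turns off can
be seen in a small sketch, assuming GNU sort and the C locale. The two-line
input is invented for illustration.

```shell
# Made-up input: both lines have the same numeric value.
input='1 b
1 a'

# Default: equal keys fall back to a whole-line byte comparison,
# so "1 a" is placed before "1 b".
default_order=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -s: no fallback comparison, so equal lines keep their input order.
stable_order=$(printf '%s\n' "$input" | LC_ALL=C sort -n -s)

printf '%s\n' "sort -n:" "$default_order"
printf '%s\n' "sort -n -s:" "$stable_order"
```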
> The info page DOES mention this:
>
> '-n'
> '--numeric-sort'
> '--sort=numeric'
> Sort numerically. The number begins each line and consists of
> optional blanks, an optional '-' sign, and zero or more digits
> possibly separated by thousands separators, optionally followed by
> a decimal-point character and zero or more digits. An empty number
> is treated as '0'. The 'LC_NUMERIC' locale specifies the
> decimal-point character and thousands separator. By default a
> blank is a space or a tab, but the 'LC_CTYPE' locale can change
> this.
>
> The --help output is intentionally terse, so I don't know what we could
> do there to make it more obvious without exploding the size of what is
> supposed to be brief.
Well, I always thought info was meant to be the complete documentation.
I see nothing in the -n description above that makes me think it would
ignore the part of the line that isn't a number. The part in -u does seem
to point out that this is the behaviour.
I think this might be the first time I ever used -n when the input was
not pure numbers, so I never hit this before.
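A minimal sketch of what -n does with input that is not pure numbers,
assuming GNU sort and the C locale. The input lines are made up; lines
without a leading number are treated as 0, as the quoted -n text says.

```shell
# Made-up input mixing numeric prefixes and lines with no number at all.
input='abc
-5 x
3 y
xyz'

# "abc" and "xyz" both count as 0, so they land between -5 and 3;
# the last-resort comparison orders them against each other.
numeric_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# With -u, "abc" and "xyz" compare equal (both 0), so only one is kept.
numeric_unique=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u)

printf '%s\n' "sort -n:" "$numeric_sorted"
printf '%s\n' "sort -n -u:" "$numeric_unique"
```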
--
Len Sorensen