From: Eric Blake
Subject: bug#18273: closed (Re: bug#18273: sort seems to misbehave if both -u and -n or -k are used)
Date: Fri, 15 Aug 2014 14:32:14 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0

On 08/15/2014 02:22 PM, Lennart Sorensen wrote:
> OK I accept that it is correct behaviour.
>
> The documentation on the other hand is awful in that case. I went and
> checked the documentation to try and make sense of what it was doing
> before sending the report, and there was nothing there that gave any
> hint that this was expected behaviour.

'info sort' says:

    The '--stable' ('-s') option
    disables this "last-resort comparison" so that lines in which all fields
    compare equal are left in their original relative order. The '--unique'
    ('-u') option also disables the last-resort comparison.

and later on:

    '-u'
    '--unique'
         Normally, output only the first of a sequence of lines that compare
         equal. For the '--check' ('-c' or '-C') option, check that no pair
         of consecutive lines compares equal.

         This option also disables the default last-resort comparison.

         The commands 'sort -u' and 'sort | uniq' are equivalent, but this
         equivalence does not extend to arbitrary 'sort' options. For
         example, 'sort -n -u' inspects only the value of the initial
         numeric string when checking for uniqueness, whereas 'sort -n |
         uniq' inspects the entire line. *Note uniq invocation::.
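
A minimal sketch of that last point, assuming GNU coreutils and the C
locale; the three sample lines and the variable names are invented for
illustration:

```shell
# Hypothetical sample data: two lines share the numeric prefix 1.
export LC_ALL=C
input='1 apple
1 banana
2 cherry'

# -n -u compares only the leading number, so "1 banana" is dropped
# as a duplicate of "1 apple" (the first line of the equal run is kept).
with_u=$(printf '%s\n' "$input" | sort -n -u)

# sort -n | uniq compares entire lines, so all three lines survive.
with_uniq=$(printf '%s\n' "$input" | sort -n | uniq)

printf '%s\n' "--- sort -n -u ---" "$with_u"
printf '%s\n' "--- sort -n | uniq ---" "$with_uniq"
```
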
>
> Why does it have a blob talking about which options implicitly enable -s,
> rather than mention that in the documentation for the options that do it.
-u is the only option that implicitly enables -s.
You are welcome to propose a patch to the documentation that would
clarify the situation; we can reopen this bug if a patch materializes.
Maybe even a change to 'sort --help' output to mention that -u implies
-s (which would also feed the 'man sort' page).
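
For anyone drafting such a patch, here is a small sketch of the behaviour
that would need describing. It assumes GNU sort in the C locale, and the
two input lines are made up; both have the same numeric key, so only the
last-resort comparison can reorder them:

```shell
# Hypothetical input: equal numeric keys, deliberately in reverse byte order.
export LC_ALL=C
input='1 b
1 a'

# Plain -n: keys tie, so the last-resort whole-line comparison puts "1 a" first.
plain=$(printf '%s\n' "$input" | sort -n)

# -n -s: last-resort comparison disabled, so the original order is kept.
stable=$(printf '%s\n' "$input" | sort -n -s)

# -n -u: also no last-resort comparison; the first line of the tie is output.
unique=$(printf '%s\n' "$input" | sort -n -u)

printf '%s\n' "--- sort -n ---" "$plain"
printf '%s\n' "--- sort -n -s ---" "$stable"
printf '%s\n' "--- sort -n -u ---" "$unique"
```
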
>
> Why does it not mention for -n that anything that isn't a number is
> ignored and treated as if it didn't exist when it comes to deciding
> things like uniqueness? Are people expected to go read the posix
> standard instead?
The info page DOES mention this:

    '-n'
    '--numeric-sort'
    '--sort=numeric'
         Sort numerically. The number begins each line and consists of
         optional blanks, an optional '-' sign, and zero or more digits
         possibly separated by thousands separators, optionally followed by
         a decimal-point character and zero or more digits. An empty number
         is treated as '0'. The 'LC_NUMERIC' locale specifies the
         decimal-point character and thousands separator. By default a
         blank is a space or a tab, but the 'LC_CTYPE' locale can change
         this.
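
As a rough illustration of the "empty number is treated as '0'" rule
(again assuming GNU sort in the C locale, with invented input), lines
that do not start with a number all compare equal to a literal 0 under
-n, so -u collapses them into one:

```shell
# Hypothetical input: two non-numeric lines, a literal 0, and a 5.
export LC_ALL=C
input='foo
bar
0
5'

# -n -s shows the ordering: foo, bar and 0 all have numeric value 0 and
# keep their input order; 5 sorts after them.
ordered=$(printf '%s\n' "$input" | sort -n -s)

# -n -u keeps only the first line of the run that compares equal to 0.
unique=$(printf '%s\n' "$input" | sort -n -u)

printf '%s\n' "--- sort -n -s ---" "$ordered"
printf '%s\n' "--- sort -n -u ---" "$unique"
```
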
The --help output is intentionally terse, so I don't know what we could
do there to make it more obvious without exploding the size of what is
supposed to be brief.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org