bug#11006: But in sort or WAD?
From: Pádraig Brady
Subject: bug#11006: But in sort or WAD?
Date: Tue, 13 Mar 2012 12:43:03 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0
On 03/13/2012 12:29 PM, Eric Blake wrote:
> tag 11006 notabug
> thanks
>
> On 03/13/2012 06:20 AM, Philipp Thomas wrote:
>> I got this bug report for coreutils 8.14:
>>
>> ----------------------------------------------------
>>
>> export LANG=en_US.UTF-8
>> { echo 16301 3.574885; echo 163 0.171036; } | sort
>>
>> Produces
>>
>> 16301 3.574885
>> 163 0.171036
>>
>>
>> which is incorrect. The lines should be in the other order
>>
>> With "LANG=C" it works correctly.
>>
>> ----------------------------------------------------
>>
>> Is this really a bug or is this because of differing collating rules?
>
> This is correct behavior, and not a bug in sort. The use of LANG=C to
> switch the behavior is indeed intended, as the en_US.UTF-8 really does
> collate with punctuation and whitespace elided, where '163013' is before
> '163017'. I suggest you point the original poster to the FAQ.
> https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
What Eric said is correct, but note that it is the en_US locale itself,
rather than anything UTF-8 specific, that causes this:
$ { echo 16301 3.574885; echo 163 0.171036; } | LANG=en_US sort --debug
sort: using `en_US' sorting rules
16301 3.574885
______________
163 0.171036
____________
We had considered updating the --debug output to make this apparent,
but that was thought too invasive for the benefit provided.
The following confirms that the ' ' and '.' are discounted from the sort:
$ { echo 16301 3.574885; echo 163 0.121036; } | LANG=en_US sort --debug
sort: using `en_US' sorting rules
163 0.121036
____________
16301 3.574885
______________
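A rough way to picture this, as a sketch only: delete the ' ' and '.' yourself and compare what is left byte by byte. The helper name emulate_en_US_keys is made up for illustration, and real locale collation is more involved than this; it only approximates the first comparison level for these particular inputs.

```shell
# Hypothetical helper: strip the blank and '.' that en_US discounts,
# then sort the remaining digits in plain byte order (C locale).
emulate_en_US_keys() {
  tr -d ' .' | LC_ALL=C sort
}

# Original report: 163013... sorts before 163017...
printf '%s\n' '16301 3.574885' '163 0.171036' | emulate_en_US_keys
# prints:
#   163013574885
#   1630171036

# Modified input: 163012... now sorts before 163013...
printf '%s\n' '16301 3.574885' '163 0.121036' | emulate_en_US_keys
# prints:
#   1630121036
#   163013574885
```

The stripped strings come out in the same order as the corresponding lines in the two en_US runs above.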
Also note above that the whole line is compared.
If you want to compare only field 1 first:
$ { echo 16301 3.574885; echo 163 0.171036; } | LANG=en_US sort -k1,1 --debug
sort: using `en_US' sorting rules
163 0.171036
___
____________
16301 3.574885
_____
______________
Or only field 1 in isolation:
$ { echo 16301 3.574885; echo 163 0.171036; } | LANG=en_US sort -k1,1 -s --debug
sort: using `en_US' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
163 0.171036
___
16301 3.574885
_____
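The difference between the last two invocations only shows up when lines tie on field 1. A small sketch with made-up input (the lines '163 b' and '163 a' are not from the report), run in the C locale so the result does not depend on which locales are installed:

```shell
# Two lines that tie on field 1; only the second field differs.
input='163 b
163 a'

# Without -s, ties on the key fall back to a whole-line comparison.
printf '%s\n' "$input" | LC_ALL=C sort -k1,1
# prints:
#   163 a
#   163 b

# With -s, that last-resort comparison is disabled and input order is kept.
printf '%s\n' "$input" | LC_ALL=C sort -s -k1,1
# prints:
#   163 b
#   163 a
```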
Or you can implicitly restrict to field 1 with a numeric sort like:
$ { echo 16301 3.574885; echo 163 0.171036; } | LANG=en_US sort -n --debug
sort: using `en_US' sorting rules
163 0.171036
___
16301 3.574885
_____
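For scripts where the user's locale should not influence the order at all, one option is to force the C locale for the sort command alone. The sketch below uses LC_ALL rather than LANG because LC_ALL takes precedence over any LC_COLLATE or LANG setting already in the environment; the choice of keys in the second command is only an illustration.

```shell
# Byte-order comparison of the whole line, regardless of the caller's locale.
printf '%s\n' '16301 3.574885' '163 0.171036' | LC_ALL=C sort
# prints:
#   163 0.171036
#   16301 3.574885

# Explicit numeric keys: field 1 first, then field 2 to break ties.
printf '%s\n' '16301 3.574885' '163 0.171036' | LC_ALL=C sort -k1,1n -k2,2n
# prints:
#   163 0.171036
#   16301 3.574885
```

Setting the variable on the command line like this affects only that one sort process, so the rest of the script keeps the user's locale.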
cheers,
Pádraig.