bug-coreutils

From: Michael Speer
Subject: Re: [bug #27373] sort -h performs incorrectly if in utf8 locale.
Date: Thu, 3 Sep 2009 11:15:19 -0400

On Sep 3, 2009 10:08am, C de-Avillez <address@hidden> wrote:
>
> Interestingly, it works here with the string you used, and fails in the
> following case:
>
> ~ $ for LANG in $(locale -a); do printf "A b\nAA b\nAAA b\n"   | sort -h|tr -d '\n'; echo; done | uniq -c
>       1 A bAA bAAA b
>      21 AAA bAA bA b
>       1 A bAA bAAA b
>       2 AAA bAA bA b
> ~ $
>

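I get the same divergent ordering for the C and POSIX locales, both
with and without the -h option.  I believe these two discrepancies
come from the C and POSIX collation rules, which specify plain byte
ordering: a space (0x20) sorts before an 'A' (0x41), so "A b" comes
before "AA b".  Since none of these lines begins with a number, -h
treats them all as equal and sort falls back to comparing whole lines
with the locale's collation, which is why -h makes no difference.  In
the other locales it seems that longer words are given preference,
most likely because the space is ignored at the first comparison
level, so the lines are effectively compared as "Ab", "AAb" and "AAAb".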
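To see the byte-order explanation in isolation, here is a minimal sketch assuming a POSIX shell and GNU sort. LC_ALL=C forces the C locale regardless of LANG; only that case is shown, because the en_* results depend on which locales happen to be installed:

```shell
#!/bin/sh
# A leading quote makes printf print the character's code:
# space is 32 (0x20), 'A' is 65 (0x41).
printf '%d %d\n' "' " "'A"

# Byte-wise whole-line comparison in the C locale:
# at position 2, ' ' < 'A', so the shortest run of A's comes first.
printf 'A b\nAA b\nAAA b\n' | LC_ALL=C sort | tr '\n' '|'; echo

# -h finds no leading number on any line, so all keys tie and GNU sort
# falls back to its last-resort whole-line comparison: same order.
printf 'A b\nAA b\nAAA b\n' | LC_ALL=C sort -h | tr '\n' '|'; echo

# -s (--stable) disables that last-resort comparison,
# so the tied lines keep their input order.
printf 'AAA b\nA b\nAA b\n' | LC_ALL=C sort -s -h | tr '\n' '|'; echo
```

Replacing LC_ALL=C with one of the en_* locales from the listing should give the reversed order, if that locale is installed.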

~/src/core/fake$ locale -a | ./bin/sort | while read LANG ; do printf "%10s " $LANG ; echo -e 'A b\nAA b\nAAA b\n' | ./bin/sort -h | tr -d '\n' ; echo ; done ;
         C A bAA bAAA b
     POSIX A bAA bAAA b
en_AU.utf8 AAA bAA bA b
en_BW.utf8 AAA bAA bA b
en_CA.utf8 AAA bAA bA b
en_DK.utf8 AAA bAA bA b
en_GB.utf8 AAA bAA bA b
en_HK.utf8 AAA bAA bA b
en_IE.utf8 AAA bAA bA b
     en_IN AAA bAA bA b
     en_NG AAA bAA bA b
en_NZ.utf8 AAA bAA bA b
en_PH.utf8 AAA bAA bA b
en_SG.utf8 AAA bAA bA b
en_US.utf8 AAA bAA bA b
en_ZA.utf8 AAA bAA bA b
en_ZW.utf8 AAA bAA bA b
~/src/core/fake$ locale -a | ./bin/sort | while read LANG ; do printf "%10s " $LANG ; echo -e 'A b\nAA b\nAAA b\n' | ./bin/sort | tr -d '\n' ; echo ; done ;
         C A bAA bAAA b
     POSIX A bAA bAAA b
en_AU.utf8 AAA bAA bA b
en_BW.utf8 AAA bAA bA b
en_CA.utf8 AAA bAA bA b
en_DK.utf8 AAA bAA bA b
en_GB.utf8 AAA bAA bA b
en_HK.utf8 AAA bAA bA b
en_IE.utf8 AAA bAA bA b
     en_IN AAA bAA bA b
     en_NG AAA bAA bA b
en_NZ.utf8 AAA bAA bA b
en_PH.utf8 AAA bAA bA b
en_SG.utf8 AAA bAA bA b
en_US.utf8 AAA bAA bA b
en_ZA.utf8 AAA bAA bA b
en_ZW.utf8 AAA bAA bA b
~/src/core/fake$
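The en_* rows above are consistent with the space being skipped at the first comparison level. A rough, locale-independent way to reproduce that order is sketched below. It assumes only that spaces are ignored; real glibc collation is multi-level and also handles case, so this is an approximation for illustration, not how strcoll works:

```shell
#!/bin/sh
# Approximation (assumption): build a key with the spaces removed,
# sort bytewise on that key in the C locale, then drop the key.
# 'A' (0x41) < 'b' (0x62), so the line with the most A's sorts first.
tab=$(printf '\t')
printf 'A b\nAA b\nAAA b\n' |
  awk '{ k = $0; gsub(/ /, "", k); print k "\t" $0 }' |  # "Ab<TAB>A b", ...
  LC_ALL=C sort -t "$tab" -k1,1 |                        # sort on key only
  cut -f2 |                                              # keep original line
  tr '\n' '|'; echo
```

This prints the lines in the same order as the en_* rows in the transcripts.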



