Re: sort --ignore-case option changes underscore sort position
From: Bob Proulx
Subject: Re: sort --ignore-case option changes underscore sort position
Date: Thu, 21 Aug 2008 23:27:13 -0600
User-agent: Mutt/1.5.13 (2006-08-11)
I am CC'ing address@hidden because that is actually the home
mailing list for the sort command. Followups should go there and I
have set Mail-Followup-To: appropriately.
jrw32982 wrote:
> I couldn't find this previously reported. I didn't see anything in
> the documentation which indicates that this is expected behavior.
Thank you for the report. However, even though this is perhaps
surprising, I think it does count as expected behavior.
> To replicate the bug:
Thank you for that very nice test case! That was excellent.
> $ sort --version
> sort (coreutils) 5.2.1
> ...
> $ export LC_ALL=C
> $ { echo a_; echo ax; } | sort
> a_
> ax
> $ { echo a_; echo ax; } | sort --ignore-case
> ax
> a_
The documentation for --ignore-case explains what is happening.
In the man page for sort:
  -f, --ignore-case
         fold lower case to upper case characters
And of course the info documentation has the full authoritative
documentation.
  `-f'
  `--ignore-case'
       Fold lowercase characters into the equivalent uppercase characters
       when comparing so that, for example, `b' and `B' sort as equal.
       The `LC_CTYPE' locale determines character types.
Therefore your test case:
  { echo a_; echo ax; } | sort --ignore-case
compares the lines as if they had been folded to upper case first.
Compare the following, all run with LC_ALL=C:
  $ { echo a_; echo ax; } | sort
  a_
  ax
  $ { echo A_; echo AX; } | sort
  AX
  A_
  $ { echo A_; echo AX; } | sort --ignore-case
  AX
  A_
With upper case input you can see that a plain sort already gives the
same order that --ignore-case produces. Perhaps the option would have
been more accurately called --convert-to-upper-case-before-sorting.
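If you want to see the folding for yourself, here is a rough sketch.
It folds the input to upper case with tr and then does a plain sort,
and separately runs sort --ignore-case on the original input. The tr
step is only my illustration of the effect, not a claim about how sort
implements the option internally, and the variable names are made up
for the example.

```shell
# Sketch: emulate the case folding with tr, then sort in the C locale.
export LC_ALL=C

# Fold to upper case first, then sort by plain byte value.
folded=$({ echo a_; echo ax; } | tr 'a-z' 'A-Z' | sort)

# Let sort do the folding itself while comparing.
ignore=$({ echo a_; echo ax; } | sort --ignore-case)

echo "$folded"
echo "$ignore"
```

Both results put the x line ahead of the underscore line. Only the
case of the printed letters differs, because sort --ignore-case folds
for comparison but prints the original lines.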
The surprising part might be realizing that underscore collates
between the upper and lower case letters when using the C/POSIX
standard sort ordering. That is the standard legacy behavior. The
same is true of [ \ ] ^ and ` which, like underscore, all occur
between Z and a in the US-ASCII code table. To ignore these
characters when sorting, look at the --dictionary-order option.
  `-d'
  `--dictionary-order'
       Sort in "phone directory" order: ignore all characters except
       letters, digits and blanks when sorting. The `LC_CTYPE' locale
       determines character types.
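To make both points concrete, here is a small sketch of my own,
assuming a POSIX shell and a sort that accepts -d and -f. It prints
the decimal codes of Z, _ and a, then reruns your test case with
dictionary order added to the case folding. The variable names are
just for illustration.

```shell
export LC_ALL=C

# A leading quote in the argument makes printf print the character's
# numeric code instead of parsing the argument as a number.
codes=$(printf '%d\n' "'Z" "'_" "'a")
echo "$codes"

# With -d the underscore is ignored, so after folding with -f the
# compared keys are "A" and "AX", and the shorter key sorts first.
dict=$({ echo a_; echo ax; } | sort -d -f)
echo "$dict"
```

This should print 90, 95 and 97, showing underscore sitting between
the two alphabets, followed by a_ and then ax.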
And of course an alternative sort ordering is available through the
locale. For example, the en_US.UTF-8 locale collates in what amounts
to dictionary order.
Bob