coreutils

Re: Does sort handle -t / correctly


From: Eric Blake
Subject: Re: Does sort handle -t / correctly
Date: Fri, 17 Apr 2015 10:26:44 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 04/17/2015 10:10 AM, Peng Yu wrote:
> Hi, I got the following results when I call sort with -t /. It seems
> that 'a/1.txt' should be right after 'a'. Is it the case? Or I am not
> using sort correctly?

Your second guess is correct: you are using sort incorrectly, both by
failing to take locales into account and by failing to limit each sort
key to a single field.

> 
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4
> a
> a!
> a/1.txt
> aB
> ab

sort --debug is your friend:

$ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1 -k 2 -k 3 -k 4
sort: using ‘en_US.UTF-8’ sorting rules
a
_
 ^ no match for key
 ^ no match for key
 ^ no match for key
_
a!
__
  ^ no match for key
  ^ no match for key
  ^ no match for key
__
a/1.txt
_______
  _____
       ^ no match for key
       ^ no match for key
_______
ab
__
  ^ no match for key
  ^ no match for key
  ^ no match for key
__
aB
__
  ^ no match for key
  ^ no match for key
  ^ no match for key
__


As shown in the debug trace, the line 'a!' sorts before the line
'a/1.txt' because your first sort key (-k 1) spans the entire line, and
in the locale you are using (where '!', '/', and '.' are all ignored in
collation order), the collation string "a" really does come before
"a1txt".

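To make the "a" versus "a1txt" comparison concrete, here is a rough
sketch that simply deletes the three punctuation characters with tr.
It is only an approximation of what the locale does at its first
comparison level, not the real collation algorithm, and the variable
name is just for illustration.

```shell
# Approximation only: drop the characters described above as ignored
# ('!', '/' and '.') to see roughly what a first-pass comparison sees.
# "stripped" is a hypothetical variable name used for this sketch.
stripped=$(printf '%s\n' 'a!' 'a/1.txt' | tr -d '!/.')
printf '%s\n' "$stripped"
# prints:
# a
# a1txt
```
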
What you REALLY want is to limit each sort key to a single field
(-k 1,1 rather than -k 1), as in:

$ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1,1 -k 2,2
sort: using ‘en_US.UTF-8’ sorting rules
a
_
 ^ no match for key
_
a/1.txt
_
  _____
_______
a!
__
  ^ no match for key
__
ab
__
  ^ no match for key
__
aB
__
  ^ no match for key
__
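
A rough way to see what each single-field key contains is to split the
same input on '/' with awk. This only approximates the field splitting
that sort -t / performs (it is not how sort is implemented), and the
variable name is just for illustration.

```shell
# Show the text that -k 1,1 and -k 2,2 would roughly see for each line.
# "fields" is a hypothetical variable name used for this sketch.
fields=$(printf '%s\n' a 'a!' ab aB a/1.txt |
  awk -F/ '{ printf "key1=[%s] key2=[%s]\n", $1, $2 }')
printf '%s\n' "$fields"
# prints:
# key1=[a] key2=[]
# key1=[a!] key2=[]
# key1=[ab] key2=[]
# key1=[aB] key2=[]
# key1=[a] key2=[1.txt]
```

Only 'a/1.txt' has a second field, which matches the "no match for key"
lines reported for the other inputs.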


In addition, you can run sort in a locale that does not treat
punctuation as unimportant, such as the C locale, as in:

$ printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort --debug -t / -k 1,1 -k 2
sort: using simple byte comparison
a
_
 ^ no match for key
_
a/1.txt
_
  _____
_______
a!
__
  ^ no match for key
__
aB
__
  ^ no match for key
__
ab
__
  ^ no match for key
__
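
As a small side-by-side sketch, the following compares a whole-line key
(-k 1) with a single-field key (-k 1,1) on the same input. It pins
LC_ALL=C so that the result does not depend on the locale of the
machine running it; the helper function and variable names are made up
for this example.

```shell
# Hypothetical helper: emit the five sample lines used in this thread.
sample() { printf '%s\n' a 'a!' ab aB a/1.txt; }

# -k 1: the key runs from field 1 to end of line, so the '/' and what
# follows it are compared as part of the first key.
whole_line=$(sample | LC_ALL=C sort -t / -k 1)

# -k 1,1: the key is field 1 only, so 'a' and 'a/1.txt' share the key
# "a" and end up next to each other.
first_field=$(sample | LC_ALL=C sort -t / -k 1,1)

printf 'whole-line key:\n%s\n\nfirst-field key:\n%s\n' \
  "$whole_line" "$first_field"
# whole-line key:  a, a!, a/1.txt, aB, ab
# first-field key: a, a/1.txt, a!, aB, ab
```

In byte order '!' (0x21) sorts before '/' (0x2f), which is why 'a!'
lands ahead of 'a/1.txt' whenever the key is allowed to span the
separator.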


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
