coreutils

Re: Does sort handle -t / correctly


From: Peng Yu
Subject: Re: Does sort handle -t / correctly
Date: Fri, 17 Apr 2015 12:03:25 -0500

On Fri, Apr 17, 2015 at 11:26 AM, Eric Blake <address@hidden> wrote:
> On 04/17/2015 10:10 AM, Peng Yu wrote:
>> Hi, I got the following results when I call sort with -t /. It seems
>> that 'a/1.txt' should come right after 'a'. Is that the case? Or am I
>> not using sort correctly?
>
> Your second guess is correct - you are using sort incorrectly, by failing
> to take locales into account, and by failing to limit the amount of data
> being compared to single field widths.

Thanks for the explanation.

If I don't know the number of fields, but I want to sort according to
all fields (from 1 to whatever the max number of fields), is there a
way to do it?
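
One possible approach, sketched here rather than taken from the sort
documentation: count the fields on the widest line with awk, then generate
one "-k N,N" option per field. The function name sort_all_fields is
hypothetical, the separator is assumed to be a single ordinary character
such as '/', and the input is assumed to be a regular file so it can be
read twice.

```shell
# Hypothetical helper: sort FILE on every SEP-delimited field in turn.
# Usage: sort_all_fields SEP FILE
sort_all_fields() {
    sep=$1
    file=$2
    # Find the largest field count on any line (0 for an empty file).
    max=$(awk -F"$sep" 'NF > max { max = NF } END { print max + 0 }' "$file")
    keys=
    i=1
    while [ "$i" -le "$max" ]; do
        # Limit each key to exactly one field: -k i,i
        keys="$keys -k $i,$i"
        i=$((i + 1))
    done
    # $keys is deliberately unquoted so it splits into separate options.
    # LC_ALL=C is an assumption here: it forces plain byte comparison.
    LC_ALL=C sort -t "$sep" $keys "$file"
}
```

For example, "sort_all_fields / names.txt" (names.txt being a placeholder)
would run sort with -k 1,1 -k 2,2 and so on, up to the deepest path in
the file.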

>> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4
>> a
>> a!
>> a/1.txt
>> aB
>> ab
>
> sort --debug is your friend:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1 -k 2 -k 3 -k 4
> sort: using ‘en_US.UTF-8’ sorting rules
> a
> _
>  ^ no match for key
>  ^ no match for key
>  ^ no match for key
> _
> a!
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
> a/1.txt
> _______
>   _____
>        ^ no match for key
>        ^ no match for key
> _______
> ab
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
>
>
> As shown in the debug trace, the line 'a!' sorts before the line
> 'a/1.txt' because your first sort key is the entire line, and in the
> locale you are using (where '!', '/', and '.' are all ignored
> in collation orders), the collation string "a" really does come before
> "a1txt".
>
> What you REALLY want is to limit each sort key to a single field
> (-k1,1 rather than -k1, which runs to the end of the line), as in:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1,1 -k 2,2
> sort: using ‘en_US.UTF-8’ sorting rules
> a
> _
>  ^ no match for key
> _
> a/1.txt
> _
>   _____
> _______
> a!
> __
>   ^ no match for key
> __
> ab
> __
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
> __
>
>
> Additionally, you may want to sort in a locale that does not discard
> punctuation as unimportant, as in:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort --debug -t / -k 1,1 -k 2
> sort: using simple byte comparison
> a
> _
>  ^ no match for key
> _
> a/1.txt
> _
>   _____
> _______
> a!
> __
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
> __
> ab
> __
>   ^ no match for key
> __
>
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
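
To see the difference between -k 1 and -k 1,1 without depending on which
locales are installed, the two forms can be compared under LC_ALL=C. This
is only a small sketch using the same five input lines as above; the
variable names whole and fields are made up for the illustration.

```shell
# -k 1 with no end position: the key runs from field 1 to end of line,
# so the '/' and everything after it take part in the first comparison.
whole=$(printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort -t / -k 1)

# -k 1,1 stops at the end of field 1; -k 2,2 then breaks ties on field 2.
fields=$(printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort -t / -k 1,1 -k 2,2)

# Print both orderings, separated by a marker line.
printf '%s\n' "$whole" '--' "$fields"
```

With the whole-line key, 'a/1.txt' lands after 'a!' because '/' is a
larger byte than '!'. With per-field keys it lands directly after 'a',
since both lines have the same first field.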



-- 
Regards,
Peng


