coreutils

Re: What is the 'associated field'? (about sort)


From: Eric Blake
Subject: Re: What is the 'associated field'? (about sort)
Date: Tue, 05 Jul 2011 08:30:22 -0600

On 07/01/2011 07:22 PM, Peng Yu wrote:
> Hi,
> 
> The following explanation for coreutils manual is not very clear.
> 
> "Also note that the ‘n’ modifier was applied to the field-end
> specifier for the first key. It
> would have been equivalent to specify ‘-k 2n,2’ or ‘-k 2n,2n’. All
> modifiers except ‘b’
> apply to the associated field, regardless of whether the modifier
> character is attached
> to the field-start and/or the field-end part of the key specifier."

Maybe it also helps to read the POSIX wording for this same feature:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

The keydef argument is a restricted sort key field definition. The
format of this definition is:

field_start[type][,field_end[type]]

where field_start and field_end define a key field restricted to a
portion of the line (see the EXTENDED DESCRIPTION section), and type is
a modifier from the list of characters 'b', 'd', 'f', 'i', 'n', 'r'.
The 'b' modifier shall behave like the -b option, but shall apply only
to the field_start or field_end to which it is attached. The other
modifiers shall behave like the corresponding options, but shall apply
only to the key field to which they are attached; they shall have this
effect if specified with field_start, field_end, or both. If any
modifier is attached to a field_start or to a field_end, no option shall
apply to either.
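
As a small sketch of what that wording implies (the three-line input is
made up for illustration, and LC_ALL=C is only there to keep the
comparison predictable), attaching 'n' to the start, the end, or both
should give the same order, and a global -r should not be picked up by a
key that carries its own modifier:

```shell
# Hypothetical sample data: field 2 holds the numbers to sort on.
input='x 10
x 9
x 100'

# The 'n' modifier on field_start, field_end, or both: same key, same result.
on_start=$(printf '%s\n' "$input" | LC_ALL=C sort -k2n,2)
on_end=$(printf '%s\n' "$input" | LC_ALL=C sort -k2,2n)
on_both=$(printf '%s\n' "$input" | LC_ALL=C sort -k2n,2n)

# The key has its own modifier, so the global -r does not apply to it.
with_global_r=$(printf '%s\n' "$input" | LC_ALL=C sort -r -k2,2n)

printf '%s\n' "$on_start"
```

All four variables should hold the lines in ascending numeric order of
field 2 (9, 10, 100).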

> 
> According to the manual and the following output, '-k 1,2n' is the
> same as '-k 1n,2' and '-k 1n,2n'. But isn't this syntax a little
> confusing? Shouldn't '-k 1n,2n' be the same as '-k1,1n -k2,2n'?

No.  '-k 1n,2' says to treat the combination of fields 1 and 2 as a
single numeric string, and is generally not what you want.  Meanwhile,
'-k 1n,1 -k 2n,2' says to treat both field 1 and field 2 as numeric
strings, where field 2 is used to break ties when field 1 compares equal.
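
To see the difference side by side, here is a quick sketch with a
made-up three-line input (the variable names are mine, not anything sort
requires):

```shell
# Hypothetical sample data: two lines tie on field 1, one does not.
input='2 1
1 10
1 9'

# One numeric key spanning fields 1-2: only the leading number of the key
# is compared, so "1 10" and "1 9" tie and fall back to whole-line order.
spanning=$(printf '%s\n' "$input" | LC_ALL=C sort -k1n,2)

# Two separate numeric keys: field 2 breaks the tie numerically.
separate=$(printf '%s\n' "$input" | LC_ALL=C sort -k1n,1 -k2n,2)

printf 'one spanning key:\n%s\n' "$spanning"
printf 'two separate keys:\n%s\n' "$separate"
```

With the spanning key, "1 10" comes out ahead of "1 9"; with two keys,
"1 9" comes first.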

> 
> Also I don't understand what "associated field" refers to?

The "associated field" is the key defined by that -k argument, such as
the -k1,1 portion.  Most letters can be written on the start, end, or
both positions of the -k1,1 key, at which point that entire key takes on
that option letter.  But b is special, in that ignoring blanks at just
the start or just the end makes sense, so it only applies to the half of
the associated -k1,1 key where the b appears.
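
A rough illustration of why 'b' matters on the start position, assuming
the default field separation (no -t), where a field includes its leading
blanks.  The two input lines are invented for the example:

```shell
# Hypothetical sample data: the first line has two blanks before field 2.
input='a  z
a b'

# Without 'b', field 2 is "  z" versus " b"; the extra blank sorts before 'b'.
without_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k2,2)

# With 'b' on field_start, leading blanks are skipped: "z" versus "b".
with_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k2b,2)

printf 'without b:\n%s\n' "$without_b"
printf 'with b:\n%s\n' "$with_b"
```

Without 'b' the "a  z" line should come first; with it, "a b" should.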

Perhaps you might gain further understanding of this by using the
--debug option.

> 
> 
>> cat input1.txt
> 1 10
> 1 9
>> sort --key=1,2n input1.txt
> 1 10
> 1 9

$ printf '1 10\n1 9\n' | LC_ALL=C sort --debug -k1,2n
sort: using simple byte comparison
sort: key 1 is numeric and spans multiple fields
1 10
_
____
1 9
_
___


Here, -k1,2n means to sort the single key composed of fields 1 and 2 as
a number (but the number necessarily ends at the end of field 1, so both
keys compare equal as 1), with a fall-back to a lexicographical
comparison of the entire line.  There, "1 10" sorts before "1 9" because
'1' < '9' byte-wise, even though 10 > 9 numerically.
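
One way to convince yourself that the two keys really compare equal is
to turn off the fall-back comparison with -s (--stable, which GNU sort
provides but POSIX does not require).  A sketch, feeding the lines in the
opposite order from your file:

```shell
# With -s there is no last-resort whole-line comparison, so lines whose
# keys tie keep their input order.
stable=$(printf '1 9\n1 10\n' | LC_ALL=C sort -s -k1,2n)

# Without -s, the whole-line fall-back reorders the tied lines.
fallback=$(printf '1 9\n1 10\n' | LC_ALL=C sort -k1,2n)

printf 'stable:\n%s\n' "$stable"
printf 'with fall-back:\n%s\n' "$fallback"
```

The stable run should leave the input order alone, which is what you
would expect if the numeric key is just "1" on both lines.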

>> sort --key=1n,2n input1.txt
> 1 10
> 1 9
>> sort --key=1,1n --key=2,2n input1.txt
> 1 9
> 1 10

That's better - you have now separated the two numeric keys, as
evidenced by --debug not warning you about spanning multiple fields:

$ printf '1 10\n1 9\n' | LC_ALL=C ../coreutils/src/sort --debug -k1,1n -k2,2n
../coreutils/src/sort: using simple byte comparison
1 9
_
  _
___
1 10
_
  __
____


-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
