Re: Sort with header/skip-lines support


From: Pádraig Brady
Subject: Re: Sort with header/skip-lines support
Date: Fri, 11 Jan 2013 18:13:00 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 01/11/2013 04:10 PM, Assaf Gordon wrote:
Pádraig Brady wrote, On 01/10/2013 07:11 PM:
On 01/10/2013 09:57 PM, Assaf Gordon wrote:

I'd like to re-visit an old issue: adding header-line/skip-lines support to 
'sort'.

[...]

[2] - no pipe support: 
http://lists.gnu.org/archive/html/bug-coreutils/2007-07/msg00215.html

But recent sed can be used for this like: `sed -u 1q`
http://git.sv.gnu.org/gitweb/?p=sed.git;a=commit;h=737ca5e
Note that commit is 4 years old, but only recently released sed 4.2.2 contains 
it.

Thanks for the tip.

Note one can also add -n to the sed command,
to get it to strip the header entirely.


The following indeed works with sed 4.2.2 ( on linux 3.2 ):
    $ ( echo 99 ; seq 10 ) | ( sed -u 1q ; sort -n )

But I'm wondering (as per the link above [2]) if this is POSIX compliant and 
stable (i.e. can this be trusted to work every time, even on non-Linux 
machines?).

No, `sed -u` with this functionality is not portable.
Though it's more portable than `sort --header`
given that it already exists :)
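
For comparison, a minimal sketch of a more portable variant, assuming a POSIX shell: the `read` builtin consumes exactly one line from standard input, so `sort` sees only the remainder. This swaps `sed -u` for `read`; it is not something proposed in the thread, and `header_sort` is a hypothetical name.

```shell
# header_sort: hypothetical helper, not part of coreutils.
# Passes the first line of stdin through unchanged, then sorts the rest.
header_sort() {
    # read stops at the first newline, leaving the rest of stdin for sort
    IFS= read -r header || return 1
    printf '%s\n' "$header"
    # any arguments (e.g. -n) are handed to sort as-is
    sort "$@"
}

printf '%s\n' 99 3 1 2 | header_sort -n
```

To drop the header instead of printing it, the `printf` line can be omitted, which mirrors adding -n to the sed command.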

[3] - Jim's patch: 
http://lists.gnu.org/archive/html/coreutils/2010-11/msg00091.html

Thanks for collating the previous threads on this subject.

I'm on the fence on how warranted this is TBH.
We'd need stronger arguments for it I think.


I'll collate the arguments as well :)

If the "sed" method works reliably, it leaves error checking: how to reliably 
check for error in such a pipe (inside a posix shell script)?
The closest code I found is this: https://github.com/cheusov/pipestatus which 
seems very long.
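
As a rough illustration of what such error checking can look like without a helper library, the sketch below uses the well-known file-descriptor idiom for recovering the exit status of the left side of a pipe in a POSIX shell. The `producer` function and the variable names are made up for the example, and the consumer uses the `read` builtin rather than `sed -u` so that the sketch does not depend on a particular sed version.

```shell
# Hypothetical producer: emits a header and three records, then fails.
producer() { printf '%s\n' id 3 1 2; return 7; }

exec 4>&1    # keep a copy of the real stdout on fd 4

# Inside $(...), fd 1 is the capture pipe; 3>&1 points fd 3 at it too.
# The producer's status travels over fd 3, the sorted data goes to fd 4.
producer_status=$(
    { { rc=0; producer || rc=$?; echo "$rc" >&3; } |
      { IFS= read -r header && printf '%s\n' "$header" && sort -n; } >&4
    } 3>&1
)
consumer_status=$?    # status of the read/sort side of the pipe

exec 4>&-    # close the spare descriptor

echo "producer=$producer_status consumer=$consumer_status" >&2
```

Both statuses have to be checked separately, which is the kind of boilerplate a built-in option would avoid.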


For completeness, showing the current options for such cases...

So additional arguments are:
1. robust error checking
2. simplicity of use: if 'sort' had this option built-in, the following use cases would 
"just work". With sed+sort, each will require a different invocation (and probably 
different pitfalls):
   a. one input file

(sed -u 1q && sort) < file

   b. one input pipe

seq 10 | ( sed -u 1q && sort -n )

   c. multiple input files (without resorting to pipe, as this will cause 
'sort' to use a different amount of memory)

So for multiple files, we'd only take the header from the first, I suppose:

(head -q -n1 file.* | head -n1; tail -q -n+2 file.* | sort)
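
A self-contained sketch of the same idea, avoiding the GNU-specific -q option and assuming every file starts with the same header line. The function name and the sample files are hypothetical.

```shell
# sort_files_with_header: hypothetical helper.
# Prints the header of the first file, then sorts the bodies of all files together.
sort_files_with_header() {
    head -n 1 "$1" || return 1
    for f in "$@"; do
        tail -n +2 "$f"    # body of each file, header line dropped
    done | sort
}

# Sample data in a scratch directory
dir=$(mktemp -d)
printf '%s\n' name pear apple > "$dir/file.1"
printf '%s\n' name fig cherry > "$dir/file.2"
sort_files_with_header "$dir/file.1" "$dir/file.2"
rm -r "$dir"
```

Like the one-liner above, this feeds sort through a pipe, so sort cannot size its buffers from the input files.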

There is also the --merge case.
This is especially awkward with the per file constructs:

(head -q -n1 file.*; sort -m <(tail -n+2 file.1) <(tail -n+2 file.2))

   d. specifying output file (with "-o")

How does -o impact things?


Thanks,
  -gordon

As a side note, I have a hackish Perl script that wraps sort and consumes the 
first line. It's basically a works-for-me kind of script, but I just wish it 
wasn't necessary:
https://github.com/agordon/bin_scripts/blob/master/scripts/sort-header.in

thanks for collating the arguments for --header.
Pádraig


