Re: Sort with header/skip-lines support

coreutils

From:	Pádraig Brady
Subject:	Re: Sort with header/skip-lines support
Date:	Fri, 11 Jan 2013 18:13:00 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 01/11/2013 04:10 PM, Assaf Gordon wrote:

Pádraig Brady wrote, On 01/10/2013 07:11 PM:

On 01/10/2013 09:57 PM, Assaf Gordon wrote:


I'd like to re-visit an old issue: adding header-line/skip-lines support to 
'sort'.

[...]

[2] - no pipe support: 
http://lists.gnu.org/archive/html/bug-coreutils/2007-07/msg00215.html


But a recent sed can be used for this, like: `sed -u 1q`
http://git.sv.gnu.org/gitweb/?p=sed.git;a=commit;h=737ca5e
Note that the commit is 4 years old, but only the recently released sed 4.2.2
contains it.

Thanks for the tip.


Note one can also add -n to the sed command,
to get it to strip the header entirely.
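
For illustration, a minimal sketch of that `-n` variant. It assumes GNU sed 4.2.2 or later, where `-u` also keeps sed from reading past the first line of a pipe; the function name `strip_header_sort` is hypothetical and only wraps the two commands from this thread.

```shell
# strip_header_sort: hypothetical helper, a sketch only.
# Assumes GNU sed >= 4.2.2; an older or non-GNU sed may read more
# than one line from the pipe and leave sort with missing input.
strip_header_sort() {
    sed -n -u 1q     # consume the header line and print nothing
    sort "$@"        # sort whatever is left on stdin
}

# The header "99" is dropped; the remaining numbers come out sorted.
( echo 99 ; seq 10 -1 1 ) | strip_header_sort -n
```

Dropping the `-n` gives back the header-preserving form used elsewhere in this thread.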


The following indeed works with sed 4.2.2 (on Linux 3.2):
    $ ( echo 99 ; seq 10 ) | ( sed -u 1q ; sort -n )

But I'm wondering (as per the link above [2]) if this is POSIX-compliant and
stable (i.e. can this be trusted to work every time, even on non-Linux
machines?).


No, `sed -u` with this functionality is not portable.
Though it's more portable than `sort --header`
given that it already exists :)
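
Where `sed -u` cannot be relied on, one alternative (a swapped-in technique, not something proposed in this thread) is the shell's own `read` builtin, which reads only up to the first newline of its input. A minimal sketch, with the hypothetical name `sort_keep_header`:

```shell
# sort_keep_header: hypothetical helper, not part of coreutils.
# `read` consumes input only up to the first newline, so the rest
# of the stream is still available to sort.
sort_keep_header() {
    # Fail if there is no complete (newline-terminated) first line.
    IFS= read -r header || return 1
    printf '%s\n' "$header"
    sort "$@"
}

# The header "99" is printed first, then the numbers in sorted order.
( echo 99 ; seq 10 -1 1 ) | sort_keep_header -n
```

Note that the `header` variable is left set in the calling shell, since POSIX sh has no local variables.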

[3] - Jim's patch: 
http://lists.gnu.org/archive/html/coreutils/2010-11/msg00091.html


Thanks for collating the previous threads on this subject.

I'm on the fence on how warranted this is TBH.
We'd need stronger arguments for it I think.


I'll collate the arguments as well :)

If the "sed" method works reliably, it leaves error checking: how to reliably
check for errors in such a pipe (inside a POSIX shell script)?
The closest code I found is this: https://github.com/cheusov/pipestatus which
seems very long.
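
A shorter idiom that is sometimes used for this in plain POSIX sh is to send the upstream command's exit status out over a spare file descriptor while the data goes through the pipe. The following is only a sketch under that approach: `checked_header_sort` is a hypothetical name, the choice of descriptors 3 and 4 is arbitrary, and it assumes GNU sed 4.2.2 or later for `sed -u`.

```shell
# checked_header_sort: hypothetical wrapper, a sketch only.
# Runs "$@" as the upstream command, keeps its first output line as
# a header, sorts the rest, and returns non-zero if either side of
# the pipe failed. Assumes GNU sed >= 4.2.2 for `sed -u`.
checked_header_sort() {
    {
        # fd 3: the real stdout; fd 4: captured by $(...)
        upstream=$(
            {
                { "$@" 3>&- 4>&- ; echo "$?" >&4 ; } |
                    { sed -u 1q && sort ; } >&3
            } 4>&1
        )
        downstream=$?    # status of the sed/sort side of the pipe
        [ "${upstream:-1}" -eq 0 ] && [ "$downstream" -eq 0 ]
    } 3>&1
}

checked_header_sort printf 'name\nb\na\n'
checked_header_sort sh -c 'echo name; exit 3' || echo "pipeline failed" >&2
```

The second call still prints the header, but the wrapper's exit status reflects the failed upstream command.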


For completeness, showing the current options for such cases...

So additional arguments are:
1. robust error checking
2. simplicity of use: if 'sort' had this option built-in, the following use cases would
"just work". With sed+sort, each will require a different invocation (and probably
has different pitfalls):
   a. one input file


(sed -u 1q && sort) < file

   b. one input pipe


seq 10 | ( sed -u 1q && sort -n )

   c. multiple input files (without resorting to a pipe, as this will cause
'sort' to use a different amount of memory)


So for multiple files, we'd only take the header from the first, I suppose:

(head -q -n1 file.* | head -n1; tail -q -n+2 file.* | sort)
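
To make the behaviour of that one-liner concrete, here is a small sketch that runs it on made-up data. The file contents and the function name `sort_files_keep_header` are hypothetical; `head -q` and `tail -q` are used as in the command above and are not POSIX options.

```shell
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf 'id\n3\n1\n' > "$dir/file.1"
printf 'id\n2\n4\n' > "$dir/file.2"

# sort_files_keep_header: hypothetical helper, a sketch only.
sort_files_keep_header() {
    head -q -n1 "$@" | head -n1    # header of the first file only
    tail -q -n+2 "$@" | sort -n    # bodies of all files, headers dropped
}

# Prints one "id" header followed by the merged, sorted bodies.
sort_files_keep_header "$dir"/file.*
rm -r "$dir"
```

Because the bodies reach sort through a pipe, this form is subject to the memory-usage caveat raised for case (c).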

There is also the --merge case.
This is especially awkward with the per file constructs:

(head -q -n1 file.*; sort -m <(tail -n+2 file.1) <(tail -n+2 file.2))

   d. specifying output file (with "-o")


How does -o impact things?


Thanks,
  -gordon

As a side note, I have a hackish Perl script that wraps sort and consumes the
first line, and it's basically a works-for-me kind of script - but I just wish it
wasn't necessary:
https://github.com/agordon/bin_scripts/blob/master/scripts/sort-header.in


Thanks for collating the arguments for --header.
Pádraig

Current Thread

Sort with header/skip-lines support, Assaf Gordon, 2013/01/10
- Re: Sort with header/skip-lines support, Pádraig Brady, 2013/01/10
  - Re: Sort with header/skip-lines support, Assaf Gordon, 2013/01/11
    - Re: Sort with header/skip-lines support, Pádraig Brady <=
    - Re: Sort with header/skip-lines support, Assaf Gordon, 2013/01/11
    - Re: Sort with header/skip-lines support, Pádraig Brady, 2013/01/11
