bug-coreutils

Re: comm: summary patch


From: Bob Proulx
Subject: Re: comm: summary patch
Date: Tue, 12 Jul 2005 13:39:26 -0600
User-agent: Mutt/1.5.9i

Paul Eggert wrote:
> Andrew Stribblehill writes:
> > It can sometimes be coded with awk, sure:
> >
> > #! /bin/sh
> > # usage: commsum <file(s)>
> >
> > awk '
> > BEGIN {t[0]=0; t[1]=0; t[2]=0}
> >       {match($0,/^\t*/); t[RLENGTH]++}
> >   END {printf "%d\t%d\t%d\n",t[0],t[1],t[2]}
> > ' "$@"

This is off topic, but since that shell script does nothing but call
awk, I would make it a pure awk script.  There is no need for the
shell to be there at all.  Use '#!/usr/bin/awk -f' or whatever path is
right for your system, and make the appropriate script changes.
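
As a rough sketch of that suggestion, the quoted program could live in
its own file with an awk interpreter line.  The file name commsum.awk
and the /usr/bin/awk path are assumptions for illustration; the -f is
what tells awk to read the program from the script file.

```shell
# Write the awk program to a file; "commsum.awk" is a hypothetical name.
cat > commsum.awk <<'EOF'
#!/usr/bin/awk -f
# usage: comm FILE1 FILE2 | ./commsum.awk
# Counts comm output lines by their number of leading tabs (0, 1 or 2).
BEGIN { t[0] = 0; t[1] = 0; t[2] = 0 }
      { match($0, /^\t*/); t[RLENGTH]++ }
END   { printf "%d\t%d\t%d\n", t[0], t[1], t[2] }
EOF
chmod +x commsum.awk

# Run it through "awk -f" here so the demo does not depend on where awk
# is installed.  The sample input has one line with no leading tab, one
# with a single tab and two with two tabs.
printf 'a\n\tb\n\t\tc\n\t\td\n' | awk -f commsum.awk
```

The same caveat applies as for the original: input lines that
themselves begin with tabs will be miscounted.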

> > However, this presumes that the input has no leading tabs in it.
> 
> Yes, that's a problem: the output of comm is ambiguous.  But how about
> if we solve this more-general problem instead of your particular one?
> That will let "comm" be useful in other situations.
> 
> One way to solve the problem is by having an option that lets "comm"
> quote its output in some way, so that the output is not ambiguous.
> For example, it might quote leading tabs using "\t" and backslashes
> using "\\".  Or perhaps you can think of a better approach.

I admit to being skeptical that such quoting is really useful.  It
would mean that whatever reads the output would have to decode that
quoting.

> > there's no way to avoid that, short of preprocessing:
> 
> How about this?
> 
>   echo $(comm -23 f1 f2) $(comm -13 f1 f2) $(comm -12 f1 f2)
> 
> Admittedly it's not as efficient as one might like, but is there
> really much of an efficiency issue here?

Personally I think a multipass approach would be fine too.  Of course
as soon as someone makes an argument for reading directly from a pipe
then a multipass approach is problematic, because a pipe can only be
read once.  But is there really a case for reading from a pipe here?
I think we would be optimizing for a 0.05% use case.
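
If the pipe case ever did matter, one possible workaround is to spool
standard input to a temporary file first and then run the passes over
that.  This is only a sketch: the function name commsum_pipe is made
up for illustration, and it assumes mktemp is available (it is not
required by POSIX).

```shell
# commsum_pipe FILE2: compare sorted lines on stdin against sorted FILE2.
# "commsum_pipe" is a hypothetical helper, not part of coreutils.
commsum_pipe() {
    spool=$(mktemp) || return 1
    cat > "$spool"              # a pipe can be read once, so save it
    # Three passes over the spooled copy; the unquoted substitutions let
    # the shell drop any padding that wc -l adds on some systems.
    echo $(comm -23 "$spool" "$1" | wc -l) \
         $(comm -13 "$spool" "$1" | wc -l) \
         $(comm -12 "$spool" "$1" | wc -l)
    rm -f "$spool"
}

# Demo with the same ranges as the seq example in this message.
dir=$(mktemp -d)
seq 16 25 > "$dir/f2"
seq 10 19 | commsum_pipe "$dir/f2"    # prints: 6 6 4
rm -r "$dir"
```

The cost is one extra copy of the input on disk, which is the price of
being able to read it more than once.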

By the way...  I think you meant to use wc -l here too, right?

  seq 10 19 > f1
  seq 16 25 > f2
  echo $(comm -23 f1 f2 | wc -l) $(comm -13 f1 f2 | wc -l) $(comm -12 f1 f2 | wc -l)
  6 6 4

And of course any single value is very simple.

  comm -23 f1 f2 | wc -l
  6
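
The commands above could be wrapped in a small shell function, roughly
as follows.  The name commsum is borrowed from the usage comment in
the quoted script and is otherwise hypothetical; both input files are
assumed to be sorted already, as comm requires.

```shell
# commsum FILE1 FILE2: print the number of lines unique to FILE1,
# unique to FILE2, and common to both.  Multipass: runs comm three times.
commsum() {
    only1=$(comm -23 "$1" "$2" | wc -l)
    only2=$(comm -13 "$1" "$2" | wc -l)
    both=$(comm -12 "$1" "$2" | wc -l)
    # Left unquoted on purpose so that any whitespace padding from
    # wc -l is dropped by the shell.
    echo $only1 $only2 $both
}

# Demo with the same data as above, kept in a scratch directory.
dir=$(mktemp -d)
seq 10 19 > "$dir/f1"
seq 16 25 > "$dir/f2"
commsum "$dir/f1" "$dir/f2"    # prints: 6 6 4
rm -r "$dir"
```

Because the counts come from wc -l rather than from counting leading
tabs, input lines that themselves start with tabs do not confuse it.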

> > Does anyone else agree with me, or shall I just crawl back under my
> > rock? ;)
> 
> Let's see whether anyone else chimes in.
> 
> This email exchange is archived, so perhaps someone will read it in
> 2010 and say "Hey, Andrew was right!"  and fix things....

I am skeptical of how much use an option like this would really get.
And it is pretty straightforward to code a similar solution today;
the only disadvantage is that it is multipass.

Bob



