From: Andrew J. Schorr
Subject: Re: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
Date: Sun, 7 Apr 2013 22:18:18 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
On Sun, Apr 07, 2013 at 09:05:16PM -0300, Hermann Peifer wrote:
> The observation:
> When using getline to read from a pipe, as in [1], the processing of
> 50000 records of sample data is more than 60 times slower compared
> to doing basically the same distance calculation via a coprocess,
> see [2]. I am using gawk from git on a MacBook. I also tested with
> gawk 3.1.5 and 3.1.8 which show the same behaviour.
>
> As far as I can see, the close(cmd) call is what slows the data processing down.
> Maybe this behaviour is worth mentioning in the manual.
I think this behavior is to be expected: launching a separate process for
each data point carries far greater overhead than streaming all of the data
through a single long-lived process. Here is a simple shell script that
shows the same kind of performance degradation:
bash-4.1$ cat /tmp/test.sh
#!/bin/sh
dataset () {
    for i in $(seq 1 50000) ; do
        echo "$i + $i"
    done
}
func1 () {
    # One long-lived bc process filters the whole stream.
    dataset | while read data ; do
        echo "$data"
    done | bc -l | sha256sum
}
time func1
func2 () {
    # A new bc process is launched for every single line.
    dataset | while read data ; do
        echo "$data" | bc -l
    done | sha256sum
}
time func2
bash-4.1$ /tmp/test.sh
f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513 -
real 0m1.580s
user 0m1.724s
sys 0m0.942s
f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513 -
real 1m9.101s
user 0m16.962s
sys 0m48.898s
18.686u 49.840s 1:10.68 96.9% 0+0k 320+0io 1pf+0w
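For what it's worth, the same contrast can be sketched directly in gawk.
This is a minimal sketch, not your original [1]/[2] scripts: expr and a
second gawk stand in for the real distance calculation, and the record
count is tiny so it runs instantly.

```shell
# Slow pattern: "cmd | getline" plus close(cmd) forks and reaps one
# child process per record.
gawk 'BEGIN {
    for (i = 1; i <= 3; i++) {
        cmd = "expr " i " + " i
        cmd | getline result
        close(cmd)
        print result
    }
}'

# Fast pattern: a single coprocess ("|&") stays alive for all records;
# note the child must flush each answer, or getline would block.
gawk 'BEGIN {
    cmd = "gawk \"{ print \\$1 + \\$2; fflush() }\""
    for (i = 1; i <= 3; i++) {
        print i, i |& cmd
        cmd |& getline result
        print result
    }
    close(cmd)
}'
```

Both loops print 2, 4 and 6; only the process-creation pattern differs,
which is where all the extra time in your measurements goes.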
Regards,
Andy