From: Andrew J. Schorr
Subject: Re: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
Date: Sun, 7 Apr 2013 22:18:18 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Sun, Apr 07, 2013 at 09:05:16PM -0300, Hermann Peifer wrote:
> The observation:
> When using getline to read from a pipe, as in [1], processing
> 50000 records of sample data is more than 60 times slower than
> doing basically the same distance calculation via a coprocess,
> see [2]. I am using gawk from git on a MacBook. I also tested
> with gawk 3.1.5 and 3.1.8, which show the same behaviour.
> 
> As far as I can see, the close(cmd) is what slows the processing
> down. Maybe this behaviour is worth mentioning in the manual.

I think this behavior is to be expected: launching a separate process for
each data point carries far more overhead than reusing a single long-lived
process.  Here is a simple shell-script example that shows the same kind
of performance degradation:

bash-4.1$ cat /tmp/test.sh
#!/bin/sh

dataset () {
   # emit 50000 trivial expressions for bc to evaluate
   for i in `seq 1 50000` ; do
      echo $i + $i
   done
}

func1 () {
   # one bc process evaluates the entire stream
   dataset | while read data ; do
      echo $data
   done | bc -l | sha256sum
}

time func1

func2 () {
   # a new bc process is launched for every single line
   dataset | while read data ; do
      echo $data | bc -l
   done | sha256sum
}

time func2

bash-4.1$ /tmp/test.sh
f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513  -

real    0m1.580s
user    0m1.724s
sys     0m0.942s
f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513  -

real    1m9.101s
user    0m16.962s
sys     0m48.898s
18.686u 49.840s 1:10.68 96.9%   0+0k 320+0io 1pf+0w
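
In gawk terms, the two approaches in [1] and [2] correspond roughly to
the following patterns.  This is only a sketch, not the actual scripts:
"distcalc" stands in for whatever command really computes the distance.

Pattern [1], a new external process per record, closed after each use:

   {
       cmd = "distcalc " $1 " " $2   # placeholder command, rebuilt per record
       cmd | getline result          # fork/exec a fresh process for this record
       close(cmd)                    # needed, since cmd changes every time
       print result
   }

Pattern [2], a single coprocess started once and reused for every record:

   BEGIN { cmd = "distcalc" }        # placeholder command, started once
   {
       print $1, $2 |& cmd           # send the record to the coprocess
       cmd |& getline result         # read its answer back
       print result
   }
   END { close(cmd) }

Note that the coprocess version only works if the command on the other end
writes and flushes one line of output for each line of input it reads;
otherwise the "cmd |& getline" can block.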

Regards,
Andy


