From: Andrew J. Schorr
Subject: Re: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
Date: Sun, 7 Apr 2013 22:18:18 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
On Sun, Apr 07, 2013 at 09:05:16PM -0300, Hermann Peifer wrote:
> The observation:
> When using getline to read from a pipe, as in [1], the processing of
> 50000 records of sample data is more than 60 times slower compared
> to doing basically the same distance calculation via a coprocess,
> see [2]. I am using gawk from git on a MacBook. I also tested with
> gawk 3.1.5 and 3.1.8 which show the same behaviour.
>
> As far as I can see, the close(cmd) call is what slows the data processing down.
> Maybe this behaviour is worth mentioning in the manual.
I think this behavior is to be expected: launching a separate process for
each data point carries far greater overhead than streaming all of the data
through a single long-lived process. Here is a simple shell script that
shows the same kind of performance degradation:
bash-4.1$ cat /tmp/test.sh
#!/bin/sh
dataset () {
    for i in $(seq 1 50000) ; do
        echo "$i + $i"
    done
}
func1 () {
    # One long-lived bc process filters the whole stream.
    dataset | while read data ; do
        echo "$data"
    done | bc -l | sha256sum
}
time func1
func2 () {
    # A new bc process is launched for every single line.
    dataset | while read data ; do
        echo "$data" | bc -l
    done | sha256sum
}
time func2
bash-4.1$ /tmp/test.sh
f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513 -
real 0m1.580s
user 0m1.724s
sys 0m0.942s
f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513 -
real 1m9.101s
user 0m16.962s
sys 0m48.898s
18.686u 49.840s 1:10.68 96.9% 0+0k 320+0io 1pf+0w
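For what it's worth, the same contrast can be sketched directly in gawk.
This is a minimal sketch, not your original [1]/[2] scripts: expr and a
second gawk stand in for the real distance calculation, and the record
count is tiny so it runs instantly.

```shell
# Slow pattern: "cmd | getline" plus close(cmd) forks and reaps one
# child process per record.
gawk 'BEGIN {
    for (i = 1; i <= 3; i++) {
        cmd = "expr " i " + " i
        cmd | getline result
        close(cmd)
        print result
    }
}'

# Fast pattern: a single coprocess ("|&") stays alive for all records;
# note the child must flush each answer, or getline would block.
gawk 'BEGIN {
    cmd = "gawk \"{ print \\$1 + \\$2; fflush() }\""
    for (i = 1; i <= 3; i++) {
        print i, i |& cmd
        cmd |& getline result
        print result
    }
    close(cmd)
}'
```

Both loops print 2, 4 and 6; only the process-creation pattern differs,
which is where all the extra time in your measurements goes.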
Regards,
Andy