bug-gawk
From: Adam Edgar
Subject: Re: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
Date: Sun, 7 Apr 2013 23:10:09 -0400


On Apr 7, 2013, at 10:18 PM, "Andrew J. Schorr" <address@hidden> wrote:

> On Sun, Apr 07, 2013 at 09:05:16PM -0300, Hermann Peifer wrote:
>> The observation:
>> When using getline to read from a pipe, as in [1], the processing of
>> 50000 records of sample data is more than 60 times slower compared
>> to doing basically the same distance calculation via a coprocess,
>> see [2]. I am using gawk from git on a MacBook. I also tested with
>> gawk 3.1.5 and 3.1.8 which show the same behaviour.
>> 
>> As far as I can see: The close(cmd) slows the data processing down.
>> Maybe this behaviour is worth mentioning in the manual.
> 
> I think this behavior is to be expected.  There is far greater overhead
> for launching a separate process to handle each datapoint.  Here is a simple
> example using a shell script that shows the same type of performance
> degradation:
> 
> bash-4.1$ cat /tmp/test.sh
> #!/bin/sh
> 
> dataset () {
>   for i in `seq 1 50000` ; do
>      echo $i + $i
>   done
> }
> 
> func1 () {
>   dataset | while read data ; do
>      echo $data
>   done | bc -l | sha256sum
> }
> 
> time func1
> 
> func2 () {
>   dataset | while read data ; do
>      echo $data | bc -l
>   done | sha256sum
> }
> 
> time func2
> 
> bash-4.1$ /tmp/test.sh
> f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513  -
> 
> real    0m1.580s
> user    0m1.724s
> sys     0m0.942s
> f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513  -
> 
> real    1m9.101s
> user    0m16.962s
> sys     0m48.898s
> 18.686u 49.840s 1:10.68 96.9%   0+0k 320+0io 1pf+0w
> 
> Regards,
> Andy
> 

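Fork and exec are expensive, so I try to do as much as possible within the
shell itself. Keep in mind that some commands are builtins and are very cheap.
To see what you can use without spawning a new process, list the builtins:
"builtin" with no arguments does this in ksh, while in bash "enable" (or
"help") prints the list.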
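
A minimal sketch of the difference, assuming a POSIX-style sh. The function
names sum_builtin and sum_external are made up for illustration and are not
from the scripts discussed in this thread. Both compute the same sum; the
first uses only shell arithmetic, the second forks and execs expr(1) for
every addition:

```shell
#!/bin/sh

# Sum 1..n with shell arithmetic only: no child process per iteration.
sum_builtin () {
  n=$1 i=1 total=0
  while [ "$i" -le "$n" ]; do
    total=$((total + i))
    i=$((i + 1))
  done
  echo "$total"
}

# Same sum, but each addition runs the external expr(1) program,
# so every loop iteration pays for two fork/exec pairs.
sum_external () {
  n=$1 i=1 total=0
  while [ "$i" -le "$n" ]; do
    total=$(expr "$total" + "$i")
    i=$(expr "$i" + 1)
  done
  echo "$total"
}
```

Running "time sum_builtin 5000" and "time sum_external 5000" in bash should
show the gap, though the exact ratio depends on the system. "type expr" and
"type echo" report whether a given command is external or a builtin in the
shell you are using.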

ASE