Re: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
From: Adam Edgar
Subject: Re: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
Date: Sun, 7 Apr 2013 23:10:09 -0400
On Apr 7, 2013, at 10:18 PM, "Andrew J. Schorr" <address@hidden> wrote:
> On Sun, Apr 07, 2013 at 09:05:16PM -0300, Hermann Peifer wrote:
>> The observation:
>> When using getline to read from a pipe, as in [1], the processing of
>> 50000 records of sample data is more than 60 times slower compared
>> to doing basically the same distance calculation via a coprocess,
>> see [2]. I am using gawk from git on a MacBook. I also tested with
>> gawk 3.1.5 and 3.1.8 which show the same behaviour.
>>
>> As far as I can see: The close(cmd) slows the data processing down.
>> Maybe this behaviour is worth mentioning in the manual.
>
> I think this behavior is to be expected. There is far greater overhead
> for launching a separate process to handle each datapoint. Here is a simple
> example using a shell script that shows the same type of performance
> degradation:
>
> bash-4.1$ cat /tmp/test.sh
> #!/bin/sh
>
> dataset () {
>     for i in `seq 1 50000` ; do
>         echo $i + $i
>     done
> }
>
> func1 () {
>     dataset | while read data ; do
>         echo $data
>     done | bc -l | sha256sum
> }
>
> time func1
>
> func2 () {
>     dataset | while read data ; do
>         echo $data | bc -l
>     done | sha256sum
> }
>
> time func2
>
> bash-4.1$ /tmp/test.sh
> f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513 -
>
> real 0m1.580s
> user 0m1.724s
> sys 0m0.942s
> f79f35ec338f9e859f66ed9d7f19b21df250ba8150af96067f51c5e251b28513 -
>
> real 1m9.101s
> user 0m16.962s
> sys 0m48.898s
> 18.686u 49.840s 1:10.68 96.9% 0+0k 320+0io 1pf+0w
>
> Regards,
> Andy
>
Fork and exec are expensive, so I try to do as much within the shell as
possible. Keep in mind that some commands are builtins and are very cheap. Run
"enable" in bash (or "builtins" in tcsh) to list them, or "type <command>" to
check a specific one, and see what you can use without spawning a new process.
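
To illustrate the point about builtins, here is a minimal sketch, assuming a
POSIX-style shell such as bash. The function names and the loop count n are
made up for this example. Both functions compute the same sum; the first uses
only shell arithmetic, while the second launches the external expr command
once per iteration.

```shell
#!/bin/sh
# Hypothetical example: same arithmetic done with a shell builtin feature
# versus an external command. The loop count n is arbitrary.
n=200

# Shell arithmetic expansion: no new process is created per iteration.
sum_builtin () {
    i=1
    total=0
    while [ "$i" -le "$n" ] ; do
        total=$((total + i))
        i=$((i + 1))
    done
    echo "$total"
}

# External expr: every iteration pays for a fork and an exec.
sum_external () {
    i=1
    total=0
    while [ "$i" -le "$n" ] ; do
        total=`expr "$total" + "$i"`
        i=$((i + 1))
    done
    echo "$total"
}

# Ask the shell how it resolves each command (builtin or a path on disk).
type echo
type expr

sum_builtin
sum_external
```

In bash, running "time sum_builtin" and "time sum_external" after defining the
functions should show the gap. The exact figures depend on the machine, so
none are given here.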
ASE