[bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
From: Hermann Peifer
Subject: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
Date: Sun, 07 Apr 2013 21:05:16 -0300
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130328 Thunderbird/17.0.5
Hi,

I made the performance observations below and thought they were worth
writing down and sending to you. I may simply be stating the obvious,
though.

The context:
I am processing some GPX data and want the Geod utility from the
GeographicLib library [0] to calculate the distance between the
coordinates lat1,lon1 and lat2,lon2.

The observation:
When using getline to read from a pipe, as in [1], processing 50000
records of sample data is more than 60 times slower than doing
essentially the same distance calculation via a coprocess, see [2].

I am using gawk from git on a MacBook. I also tested with gawk 3.1.5 and
3.1.8, which show the same behaviour.

As far as I can see, the close(cmd) call is what slows the data
processing down. Maybe this behaviour is worth mentioning in the manual.

I am not sure whether this is relevant, but under valgrind each
execution of close(cmd) triggers this message:
UNKNOWN task message [id 3403, to mach_task_self(), reply 0x2903]

Regards, Hermann
[0] http://sourceforge.net/projects/geographiclib/
[1]
awk 'BEGIN { while (++x <= 50000) print rand()*90, rand()*180, rand()*-90, rand()*-180 }' > testdata
==> pipe.awk <==
# Geod will be used for distance calculations
BEGIN { str = "Geod -i --input-string " }
{
    cmd = str "'" $0 "'"
    if ((cmd | getline) > 0)
        print $0
    close(cmd)
}
$ time awk -f pipe.awk testdata > out.pipe
real 3m25.636s
user 1m10.757s
sys 1m40.159s
[2]
==> coprocess.awk <==
# Geod will be used for distance calculations
BEGIN { cmd = "Geod -i" }
{
    print $0 |& cmd
    if ((cmd |& getline) > 0)
        print $0
}
END { close(cmd) }
$ time awk -f coprocess.awk testdata > out.coprocess
real 0m3.037s
user 0m2.470s
sys 0m0.459s
$ diff out.pipe out.coprocess
$
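The per-record pattern in [1] can probably be reproduced without Geod installed. The following is only a minimal sketch: echo stands in for Geod, and the 200-record size and the file names sample and out.sample are assumptions chosen for illustration, not part of the original test. As in pipe.awk, the command string differs for every record, so awk opens a new command pipe each time and closes it again.

```shell
# Sample data in the same shape as testdata in [1], but only 200 records.
awk 'BEGIN { while (++x <= 200) print rand()*90, rand()*180, rand()*-90, rand()*-180 }' > sample

# Same loop as pipe.awk, with echo standing in for Geod (an assumption for
# illustration). Each record goes through its own command pipe.
awk -v q="'" '{
    cmd = "echo " q $0 q
    if ((cmd | getline line) > 0)
        print line
    close(cmd)    # release the pipe before the next record opens another
}' sample > out.sample

# The stand-in only echoes its argument, so output should equal input.
cmp sample out.sample && echo "outputs match"
```

Prefixing the second awk call with time should give a rough idea of how much of the run time goes into starting one command per record, independent of what Geod itself costs.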