bug-gawk

From: Hermann Peifer
Subject: [bug-gawk] Performance observations while using getline: reading from a pipe vs. using a coprocess
Date: Sun, 07 Apr 2013 21:05:16 -0300
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130328 Thunderbird/17.0.5

Hi,

I made the performance observations below, which I thought were worth writing down and sending to you, although I may simply be stating the obvious.

The context:
I am processing some GPX data and want the Geod utility from the GeographicLib library [0] to calculate the distance between the coordinates lat1,lon1 and lat2,lon2.

The observation:
When using getline to read from a pipe, as in [1], processing 50000 records of sample data is more than 60 times slower than doing essentially the same distance calculation via a coprocess, see [2]. I am using gawk from git on a MacBook. I also tested with gawk 3.1.5 and 3.1.8, which show the same behaviour.

As far as I can see, it is the close(cmd) call after every record that slows the processing down. Maybe this behaviour is worth mentioning in the manual.

I am not sure whether this is relevant, but under valgrind each execution of close(cmd) triggers this message:

UNKNOWN task message [id 3403, to mach_task_self(), reply 0x2903]

Regards, Hermann


[0] http://sourceforge.net/projects/geographiclib/

[1]

awk 'BEGIN{ while (++x <= 50000) print rand()*90,rand()*180,rand()*-90,rand()*-180}' > testdata

==> pipe.awk <==
# Geod will be used for distance calculations
BEGIN { str = "Geod -i --input-string " }

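{
        # Build a new command line for every record, quoting the input for the shell
        cmd = str "'" $0 "'"

        # Run the command, read its first line of output into $0, and print it
        if ((cmd | getline) > 0)
                print $0
        # Close the pipe and wait for the finished process before the next record
        close(cmd)
}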

$ time awk -f pipe.awk testdata > out.pipe

real    3m25.636s
user    1m10.757s
sys     1m40.159s

[2]

==> coprocess.awk <==
# Geod will be used for distance calculations
BEGIN { cmd = "Geod -i" }

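{
        # Send the record to the single, long-running Geod coprocess
        print $0 |& cmd

        # Read back one line of output from the coprocess and print it
        if ((cmd |& getline) > 0)
                print $0
}

# Shut the coprocess down once all records have been processed
END { close(cmd) }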

$ time awk -f coprocess.awk testdata > out.coprocess

real    0m3.037s
user    0m2.470s
sys     0m0.459s

$ diff out.pipe out.coprocess
$
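
One possible reading of the numbers above is that the time goes into starting a shell and a command for every record, with close(cmd) merely waiting for each of those processes to finish. The following is a minimal sketch for checking that without Geod installed. It is an assumption-laden stand-in, not the original setup: `echo` and `cat` replace Geod, and the file names sample.txt, out.perline and out.once are hypothetical. It uses only POSIX awk features, so it does not exercise the coprocess operator itself.

```shell
# Small sample file (hypothetical name): 200 records of two numbers each.
awk 'BEGIN { for (i = 1; i <= 200; i++) print i, i * 2 }' > sample.txt

# Variant A: one command per record, as in pipe.awk.
# "echo" is only a stand-in for Geod; q holds a single quote for shell quoting.
awk -v q="'" '{
        cmd = "echo " q $0 q
        if ((cmd | getline line) > 0)
                print line
        close(cmd)      # one process is started and waited for per record
}' sample.txt > out.perline

# Variant B: the command is started once and streams all records.
# "cat" is only a stand-in for a filter that reads many lines.
awk 'BEGIN {
        cmd = "cat sample.txt"
        while ((cmd | getline line) > 0)
                print line
        close(cmd)      # a single process for the whole file
}' > out.once

# Both variants should produce the same lines.
cmp out.perline out.once && echo "outputs identical"
```

Prefixing each awk invocation with `time` and raising the record count should show whether the per-record variant scales with the number of processes started, independently of what the command itself computes.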


