Re: [bug-gawk] speed regression when doing math with gawk 5
From: Andrew J. Schorr
Subject: Re: [bug-gawk] speed regression when doing math with gawk 5
Date: Sat, 20 Apr 2019 14:51:08 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Apr 20, 2019 at 12:55:53AM +0000, Tom Gray wrote:
> I noticed a big speed regression in some of my programs when I upgraded from
> 4.2.1 to 5.0
>
> I traced the issue to commit c1f670b26, a fix for "Numeric assignment to $0"
> See discussion here:
> http://lists.gnu.org/archive/html/bug-gawk/2018-07/msg00042.html
>
> The speed hit shows up when you do a lot of numeric computation.
> The "fix" in commit c1f670b26 adds a call to force_string() (line 46,
> interpret.h), which punishes every assignment with a number-to-string
> conversion.
>
> In the examples here,
> gawk5 is built from the current master branch
> gawk5n has force_string() removed
>
> $ time ~/src/gawk/gawk5 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
>
> real 0m3.980s
> user 0m3.962s
> sys 0m0.015s
>
>
> $ time ~/src/gawk/gawk5n 'BEGIN{ for(i=0; i<1e6; i++) y=sin(i) }'
>
> real 0m0.130s
> user 0m0.124s
> sys 0m0.000s
>
>
> The original problem was the numeric assignment to $0 followed by the output
> of $0. Numeric assignment does not automagically trigger generation of the
> string representation required for output. A better way to do that is lazily,
> like all other conversions. The string is not needed until $0 gets sent to
> the output pipe.
>
> Adding force_string() in do_print_rec() gets the job done.
Yikes. This is bad. But I don't think this fix quite works. With the patch
applied, I still have this problem in master:
bash-4.2$ yes a b c | sed 5q | ./gawk '{$0 = ++i; print $1}'
As compared to an older version:
bash-4.2$ yes a b c | sed 5q | /bin/gawk '{$0 = ++i; print $1}'
1
2
3
4
5
So this doesn't quite get the job done...
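A check along the following lines exercises both cases, printing $0 and printing $1 after a numeric assignment to $0. It is only a sketch: the AWK variable and the feed helper are names made up for illustration, and it assumes that a correct awk re-splits the record when $0 is assigned, so both commands should print 1, 2, 3.

```shell
# Hypothetical harness: AWK selects the awk binary under test.
# It defaults to plain "awk"; point it at a freshly built gawk to test that.
AWK=${AWK:-awk}

# Hypothetical helper: emit three identical three-field records.
feed() {
    printf 'a b c\na b c\na b c\n'
}

# Whole record after a numeric assignment to $0.
whole=$(feed | "$AWK" '{ $0 = ++i; print $0 }')

# First field after the same assignment; this requires $0 to be re-split,
# so a string value for $0 is needed even though $0 itself is never printed.
first=$(feed | "$AWK" '{ $0 = ++i; print $1 }')

printf 'print $0:\n%s\n' "$whole"
printf 'print $1:\n%s\n' "$first"
```

Running it as AWK=./gawk sh check.sh (the file name is arbitrary) against a patched build shows whether the field case still breaks when the string conversion is deferred to the print path.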
Regards,
Andy