[Gnucap-devel] floating point optimization
From: al davis
Subject: [Gnucap-devel] floating point optimization
Date: Thu, 7 Dec 2006 04:44:53 -0500
User-agent: KMail/1.9.5
This was asked on the ng-spice developer list, in a thread about
ng-spice and gnucap working together. It is interesting, so I
am reposting my reply here.
On Wednesday 06 December 2006 22:07, John Doe wrote:
> But for the same compiler on the same machine, the results
> are much closer across different optimization levels with 64
> bit rounding. When I run the same regressions on windows,
> freebsd, and linux, the results are off in a much less
> significant decimal place. I want my regression to show
> significant differences due to algorithmic changes, and not
> just the fact that I chose -O2 versus -O3.
I ran some tests on gnucap ...
Two computers
1: Intel, 1.8 GHz, Debian testing, 1 GB memory
2: AMD64 x2, 2.4 GHz, Debian unstable, 2 GB memory
Three compiler option settings
All "-O2"
1. as is
2. "-ffast-math"
3. "-ffloat-store"
Configuration #1 "std" took all defaults except "-O2"
Configuration #2 was the same except for the "-ffast-math"
option, which turns on all available floating point
optimizations, including those considered dangerous.
Configuration #3, same as std except for the "-ffloat-store"
option. This option forces storage of intermediate results,
therefore rounding to 64 bits.
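To make the effect of storing intermediates concrete, here is a small Python sketch. Python cannot reach the 80-bit x87 registers, so this is an analogy one size down and not the real thing: a 64-bit double stands in for the wide register, and a forced round to a 32-bit float stands in for the store to memory. The helper names (store32, sum_wide, sum_stored) are made up for this example.

```python
import struct

def store32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_wide(values):
    """Keep the running sum in the wide format; round once at the end."""
    total = 0.0
    for v in values:
        total += v
    return store32(total)

def sum_stored(values):
    """Round the running sum to the narrow format after every add."""
    total = 0.0
    for v in values:
        total = store32(total + v)
    return total

# 2**24 is the point where 32-bit floats can no longer resolve "+1".
values = [2.0 ** 24, 1.0, 1.0]
wide = sum_wide(values)
stored = sum_stored(values)
print(wide, stored)   # 16777218.0 16777216.0
```

Neither answer is wrong in its own arithmetic. The point is only that the same source gives different numbers depending on when the rounding happens, which is the kind of difference the store option removes.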
Two circuit files, one with 147000 nodes, the other with 590000
nodes. The larger circuit swapped unacceptably on the small
machine, so I tested only the smaller circuit there. These were
used to compare speed.
                AMD large   AMD small   Intel small
std             39 sec      9.5 sec     11.2 sec
-ffast-math     39 sec      9.5 sec     11.2 sec
-ffloat-store   50 sec      12 sec      13 sec
The "small" circuit takes 30 minutes to run on ng-spice, on the
AMD, with equivalent results. Note that the time is 9.5
SECONDS on gnucap, 30 MINUTES in ng-spice. The algorithms are
different.
Also, complete gnucap test suite, 345 test files.
Test suite showed
AMD-64---
no difference between AMD "std" and "-ffloat-store".
13 test differences between AMD "std" and "-ffast-math"
One difference was that an overflow was not properly trapped
with -ffast-math.
Intel ---
intel with -ffloat-store had 4 trivial test differences compared
to AMD std
intel standard had 48 test differences, one is significant,
compared to AMD std. The significantly different test still
gave correct answers with trivial differences, but had
different time steps.
intel with -ffast-math had 43 test differences compared to AMD
std, one is significant. It had the same time stepping as the
standard version. One test had an overflow that was not
properly trapped.
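The split between "trivial" and "significant" differences above can be sketched as a tolerance check. This is a hypothetical helper, not how the gnucap test suite compares outputs, and the tolerances are placeholder assumptions. A run with a different number of rows (for example, different time steps) is flagged as significant even if the values themselves look right.

```python
import math

def classify(reference, candidate, rel_tol=1e-6, abs_tol=1e-12):
    """Compare two lists of numbers taken from two runs of one test.

    Returns "same" if they are identical, "trivial" if every value
    agrees within tolerance, and "significant" otherwise. A different
    number of rows also counts as significant.
    """
    if len(reference) != len(candidate):
        return "significant"
    if reference == candidate:
        return "same"
    for r, c in zip(reference, candidate):
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
            return "significant"
    return "trivial"

ref = [1.0, 2.5, 3.25]
print(classify(ref, [1.0, 2.5, 3.25]))            # same
print(classify(ref, [1.0, 2.5000000001, 3.25]))   # trivial
print(classify(ref, [1.0, 2.6, 3.25]))            # significant
print(classify(ref, [1.0, 2.5]))                  # significant
```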
My conclusion about speed: The AMD-64 and Intel processor speed
difference corresponds to clock speed.
The AMD gives more consistent results, apparently because the
math really is 64 bit, all the time. "-ffast-math" causes
problems and does not improve speed. "-ffloat-store" results
in a significant speed penalty (28% on the big circuit) with no
change in results. The standard setting is therefore the best
choice.
The Intel has more differences. With the "-ffloat-store"
option, only 4 tests had any difference compared to the AMD,
and these were trivial. I think this confirms that it was
doing essentially the same 64 bit rounding. The standard
setting resulted in 48 tests with trivially different results
in all but one. I am assuming this is because of the excess
precision you mention. The "-ffast-math" option gave 43
differences compared to the reference. I do not consider the
gap between these 43 and the 48 with no options to be
significant. There were 25 trivial differences comparing intel
with fast-math to intel with no options. One was the numeric
overflow case.
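As background on why "-ffast-math" can change results at all: among other things it lets the compiler regroup floating point expressions, and floating point addition and multiplication are not associative. The Python sketch below only illustrates that arithmetic fact with made-up numbers. It is one possible mechanism, not a diagnosis of the 25 differences or of the overflow test.

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # evaluation order as written
reassociated = a + (b + c)    # an order allowed once regrouping is permitted

print(left_to_right)                  # 0.6000000000000001
print(reassociated)                   # 0.6
print(left_to_right == reassociated)  # False

# Regrouping can also make an overflow disappear.
x = 1e308
as_written = (x * 10.0) / 10.0   # x * 10 overflows to inf, and inf / 10 is inf
folded = x * (10.0 / 10.0)       # the same expression regrouped: no overflow

print(as_written)   # inf
print(folded)       # 1e+308
```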
As to which option is best, I am not sure. The "-ffast-math"
option causes problems and does not improve speed, so it should
not be used. Whether the "-ffloat-store" option should be used
could be debated. It doesn't give improved accuracy, but it
does give a more predictable error, essentially matching
another 64 bit system. The option does give a speed penalty,
16% in my test.
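For reference, the 28% and 16% penalties follow directly from the timing table above. A short Python check, using only the numbers from that table (the dictionary layout and function name are just for this example):

```python
# Run times in seconds, copied from the timing table above.
times = {
    "AMD large":   {"std": 39.0, "-ffloat-store": 50.0},
    "AMD small":   {"std": 9.5,  "-ffloat-store": 12.0},
    "Intel small": {"std": 11.2, "-ffloat-store": 13.0},
}

def penalty_percent(std, slow):
    """Extra run time of the slow build, as a percentage of the std build."""
    return 100.0 * (slow - std) / std

penalties = {
    name: penalty_percent(t["std"], t["-ffloat-store"])
    for name, t in times.items()
}
for name, p in penalties.items():
    print(f"{name}: {p:.0f}%")
# AMD large: 28%
# AMD small: 26%
# Intel small: 16%
```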
The particular test that resulted in different time stepping
gives believable but incorrect results in ng-spice, with no
warnings. It is a negative resistance oscillator using the
switch element as the negative resistance device. On
resistance is 1 ohm. Off resistance is 1e9 ohms. Gnucap handles
the fast switching correctly, automatically. Spice hops past,
giving a glitch that is really trapezoidal ringing, making it
appear to work.
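For readers who have not met trapezoidal ringing, the sketch below shows the textbook version of it in Python on the simplest possible case, y' = -lam * y, with a step much longer than the time constant. It does not reproduce the oscillator circuit or the code of either simulator, and the values of lam and h are made up. The trapezoidal rule flips sign every step and barely decays, while backward Euler (shown only for contrast) collapses to zero as the true solution does.

```python
def step_trapezoidal(y, h, lam):
    """One trapezoidal-rule step for y' = -lam * y."""
    return y * (1.0 - 0.5 * h * lam) / (1.0 + 0.5 * h * lam)

def step_backward_euler(y, h, lam):
    """One backward Euler step for y' = -lam * y."""
    return y / (1.0 + h * lam)

lam = 1.0e9   # fast decay rate (hypothetical)
h = 1.0e-6    # time step, 1000 times longer than 1/lam

y_trap = 1.0
y_be = 1.0
trap = []
be = []
for _ in range(6):
    y_trap = step_trapezoidal(y_trap, h, lam)
    y_be = step_backward_euler(y_be, h, lam)
    trap.append(y_trap)
    be.append(y_be)

print(trap)   # alternates in sign, magnitude stays close to 1
print(be)     # positive and shrinking by about 1000x per step
```

A simulator that notices the fast event and cuts the step size avoids this; one that steps over it can produce a plausible-looking oscillation that is really an artifact of the integration method.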
One important point here is that differences in algorithms have
much more effect than differences in compiler optimization.
> When I do AMD-64 in 64bit mode, it is going to prefer the
> 64-bit SSE instructions over the 80-bit 387 instructions.
> Now I am going to get closer results to a machine with a
> sparc chip then when I compiled the program on the same
> machine in 32-bit mode.
>
> If my result is rounded to 64-bit in the floating point
> register, less damage is done when that number is written
> back to memory and read back in. I am happier with that than
> having an 80-bit number written from register to memory, read
> back in and zero extended.
I think I just confirmed what you said. The results were as I
expected.
> An excellent paper on this issue is:
> http://www.wrcad.com/linux_numerics.txt
I read this paper a long time ago.
> When an EDA customer gets a new update to their tools,
> they're going to validate and they want an explanation why
> the results no longer match their golden files. EDA
> companies are keenly aware of this, and often provide
> extended precision, but only as a non-default option.
==================
comments?????
Should the intel - Linux version by default compile
with "-ffloat-store"?
How does NetBSD handle this?