[gnugo-devel] Floating point arithmetics
From: Portela Fernando
Subject: [gnugo-devel] Floating point arithmetics
Date: Thu, 16 Sep 2004 12:44:49 +0200
Hi,
I mentioned a platform-dependency problem a couple of weeks ago. Since
I finally got around to getting a Linux box running, I tried to investigate
it myself. The problem statement, in short:
Current CVS, regressing century2002:150.
Linux:
century2002 3.63 561561 681 4589
Win32 (VC++ build):
century2002:150 FAIL A18 [B18]
century2002 3.64 565550 681 4617
A quick analysis showed differences in move valuations, due to territory
erasure by the break-in code.
Then I noticed clear differences in the node counters. For the first
batch:
Linux:
reading          3.91     86443      0       0
owl             85.11  14817938  18479   89309
owl_rot          1.36    246077     96    1983
ld_owl          33.67   4605715  23230    5810
optics           1.56    207448      0     937
filllib          9.82   1001313   1213    7738
atari_atari     15.51   2820607   1968   13834
connection      16.41   3301900      0   38031
break_in         1.87    409148    451    3740
blunder         20.71   3356064   3368   20102
unconditional    0.80     38816      0       0
trevora         75.64  16084241  59073  128449
nngs1          235.76  50957389  68605  391863
strategy       197.62  38937784  75831  311809
Win32 (VC++ build):
reading          3.92     86443      0       0
owl             77.39  14982712  18479   92516
owl_rot          1.36    244683     96    1942
ld_owl          31.75   4606698  23230    5804
optics           1.63    207400      0     934
filllib          9.23   1001164   1213    7739
atari_atari     15.23   2819317   1968   13812
connection      16.23   3301860      0   37888
break_in         1.84    399540    451    3584
blunder         19.94   3321666   3368   20015
unconditional    1.16     38816      0       0
trevora         73.89  16124199  59073  129095
nngs1          228.83  51516879  68605  398756
strategy       189.34  39136720  75831  314446
I quickly concluded that there must be an underlying problem in the
connection code, and I strongly suspected floating point arithmetic.
After some debugging, I found a location where things can (and
actually do) go wrong: the ENQUEUE() macro. Its first comparison
involves values which haven't been normalized, so its outcome can depend
on how each platform rounds intermediate results. As a consequence,
the delta, vulnerable1 and vulnerable2 fields might (or might not) get
overwritten, leading to possible variations in the further processing
of the queue.
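
To illustrate the kind of comparison involved, here is a minimal sketch. It is not the actual ENQUEUE() code: the struct layout, the helper names and the 1/10000 grid are assumptions made for the example; only the field names delta, vulnerable1 and vulnerable2 come from the description above. Comparing on a rounded grid means two distances that differ only by rounding noise are treated as equal, so the fields are left alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue entry; the real GNU Go data layout may differ. */
struct conn_entry {
  float distance;
  float delta;
  int vulnerable1;
  int vulnerable2;
};

#define DIST_GRID 10000.0f  /* assumed resolution of 1/10000 */

/* Round a non-negative distance to the nearest 1/10000, as an integer. */
static long normalize_distance(float x)
{
  assert(x >= 0.0f);
  return (long) (x * DIST_GRID + 0.5f);
}

/* Overwrite the entry only if the new distance is strictly shorter
 * after normalization. Returns true if the entry was updated. */
static bool enqueue_if_closer(struct conn_entry *e, float dist,
                              float delta, int v1, int v2)
{
  if (normalize_distance(dist) >= normalize_distance(e->distance))
    return false;
  e->distance = dist;
  e->delta = delta;
  e->vulnerable1 = v1;
  e->vulnerable2 = v2;
  return true;
}
```

With a raw `dist < e->distance` test instead, the same two paths could compare as "shorter" on one build and "equal" on another, which is enough to change which vulnerabilities survive in the queue.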
As a possible solution, I rejected the idea of spreading lots of
gg_normalize_float() calls throughout the code. It seemed much simpler
and more efficient to turn the floating point arithmetic into fixed
point arithmetic (well, sort of). So I wrote a simple patch, just replacing
the float declarations with int ones and scaling all the constants by 10000
(a factor chosen to accommodate the smallest constant, found in the
ENQUEUE_STONE() macro).
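
A minimal sketch of the scaling idea follows. The names fp_dist, FP_SCALE, FP() and fp_path_length() are hypothetical and do not come from the patch; only the factor 10000 is taken from the text above. Once every constant is an integer multiple of 1/10000, sums and comparisons are exact and give the same result on every platform, whatever the order of the additions.

```c
#include <assert.h>

#define FP_SCALE 10000          /* scale factor mentioned in the text */
typedef int fp_dist;            /* hypothetical fixed-point distance type */

/* Convert a non-negative decimal constant to fixed point, with rounding. */
#define FP(x) ((fp_dist) ((x) * FP_SCALE + 0.5))

/* Sum the step costs along a path; integer addition is exact. */
static fp_dist fp_path_length(const fp_dist *steps, int n)
{
  fp_dist total = 0;
  int i;
  assert(n >= 0);
  for (i = 0; i < n; i++)
    total += steps[i];
  return total;
}
```

For instance, the steps FP(0.1), FP(0.2), FP(0.3) add up to the same value in either order, whereas the corresponding float sums are not guaranteed to be bit-identical across compilers.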
Testing the patch gave the following results:
* Positive
- Node counts are almost identical on Linux and Win32 (there are
still a couple of deltas in trevora, nngs and nngs3, which means there
are problems elsewhere)
- Regression breakage is identical (the century2002:150 problem on Win32
has disappeared)
- Regression breakage is apparently positive compared to CVS, with 1
FAIL and 3 PASSes (not analyzed yet, but at first glance, the PASSes
all look good)
break_in:100 FAIL 1 D9 [0]
nngs:1280 PASS D13 [D13]
connect:70 PASS 0 [0]
global:1 PASS B3 [B3]
* Negative
- The performance impact is heavy: about +2% reading nodes,
+5.7% connection nodes, and roughly +5% run time (imprecise measurement).
My guess is that with CVS and the above-mentioned problem in
ENQUEUE(), there are quite a number of cases where vulnerabilities
are overwritten, resulting overall in fewer checks and less reading.
- A possible issue for us developers: tuning the constants will be
less natural than with floating point values.
Questions:
1. Are we interested in this patch, even at the performance cost
mentioned above?
2. If I submit a patch, should I make the change reversible? In other
words, should I provide typedefs and #defines so that we can switch?
To be honest, I don't see any good reason why we'd want to go
back to floating point, but maybe someone on the list has better
ideas on the topic.
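
For question 2, a reversible version could look roughly like the sketch below. All names (USE_FIXED_POINT, conn_dist_t, CONN_DIST) are made up for illustration and are not part of any existing patch. The point is that constants are written once in their natural decimal form and the build switch decides how they are represented.

```c
#include <assert.h>

/* Hypothetical build switch: 1 = scaled integers, 0 = plain floats. */
#ifndef USE_FIXED_POINT
#define USE_FIXED_POINT 1
#endif

#if USE_FIXED_POINT
typedef int conn_dist_t;
/* Scale by 10000 and round; intended for non-negative constants. */
#define CONN_DIST(x) ((conn_dist_t) ((x) * 10000 + 0.5))
#else
typedef float conn_dist_t;
#define CONN_DIST(x) ((conn_dist_t) (x))
#endif

/* Code using the type stays the same in both configurations. */
static int is_shorter(conn_dist_t a, conn_dist_t b)
{
  return a < b;
}

/* Small usage check that holds in either configuration. */
static void conn_dist_self_check(void)
{
  assert(is_shorter(CONN_DIST(0.1), CONN_DIST(0.2)));
  assert(!is_shorter(CONN_DIST(0.3), CONN_DIST(0.3)));
}
```

This also softens the tuning concern: developers would keep writing CONN_DIST(0.1) rather than 1000.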
-- nando