From: Olivier Baur
Subject: Re: [Bug-gnubg] Bug in sigmoid?
Date: Thu, 17 Apr 2003 19:17:57 +0200
On Thursday, 17 Apr 2003, at 18:28 Europe/Paris, Nis wrote:
> --On Thursday, April 17, 2003 15:33 +0200 Olivier Baur <address@hidden> wrote:
>
>> I think I've found a bug in sigmoid (neuralnet.c), but I'm not sure about
>> its impact on the evaluation function...
>
> I don't understand the role of this function fully, but I would guess that
> the actual function doesn't matter much, as long as it is relatively smooth
> and increasing.
True, but maybe a little sigmoid error at the output of each one of the 128 hidden neurons, combined into the 5 output neurons and sigmoid'ed again, could add up... If I remember correctly, (sigmoid(x)-S(x))/S(x) was less than +/-.0047% in the [-10.0 .. +10.0] range (not counting the discrepancy in the [+9.9 .. +10.0] interval and the two plateaus for x < -10.0 and x > +10.0).
>> Let's call S the real sigmoid function: S(x) = 1 / (1 + e^x). It seems
>> that sigmoid(x) will return a good approximation of S(x) for
>> -10.0 < x < 10.0 (less than +/-.01% error), but then it returns S(9.9)
>> for x >= 10.0 (instead of S(10.0)) and S(-9.9) for x <= -10.0 (instead
>> of S(-10.0)). sigmoid is not even monotonic!
>
> The big question for me is: does it matter if we change the function, for
> instance to get it closer to S(x)? More specifically: is the current
> sigmoid function "imprinted" on the synapses of the current nets?
Yes, it seems so. When I started vectorising the sigmoid function with an algorithm that gives results closer to S(x), I couldn't pass the gnubgtest. Then, instead of building my lookup table with S(x), I built it using the results of the current sigmoid(x); provided the lookup table had enough entries (about 1000 in the [-10..+10] range, with .01 steps), I could successfully pass the test!
>> By the way, I found a simple way of optimising the current sigmoid
>> function: instead of having a lookup table holding pre-computed values of
>> exp(X) and then returning sigmoid(x) = sigmoid(X+dx) = 1/(1+exp(X)(1+dx)),
>> why not have a lookup table holding pre-computed values of S(X) and return
>> sigmoid(x) = sigmoid(X+dx) = S(X) + dx*(S(X+1)-S(X))? The time-consuming
>> operations here are the lookups and the reciprocal (1/x) operations. With
>> the second method, you trade one reciprocal and one lookup for two lookups;
>> and since in the latter case the second lookup will probably already be in
>> the processor cache (since S(X+1) follows S(X) in memory), you end up doing
>> mostly one lookup and no reciprocal at all. On my machine, it gave me a
>> 60% speed increase in sigmoid.
>
> Beautiful. Did you compare the precision of the results?
Yes, I get better precision with a lookup table of the same size (100 elements, from 0.0 to 10.0 with .1 steps), but then gnubg won't pass gnubgtest... Of course, you can reach whatever precision you like by using a more detailed table (i.e., one with more entries), but then you might lose the "cached lookup" advantage if the lookup table can't be kept in the processor cache during repeated calls to the sigmoid function.
Olivier