gnugo-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [gnugo-devel] paul_3_13.6


From: Paul Pogonyshev
Subject: Re: [gnugo-devel] paul_3_13.6
Date: Tue, 10 Dec 2002 01:53:40 +0200

Dave wrote:
> I'm confused...
>
> >      /* go to next state */
> > -    delta = pdfa->states[state].next[dfa_p[dfa_pos + spiral[l][row]]];
> > +    delta = pdfa->states[state].next[dfa_p[dfa_pos + spiral[row][l]]];
> >      state += delta;
> >      row++;
> >    } while (delta != 0); /* while not on error state */
>
> I see the inner loop being over row, not 'l'. So swapping the
> indices hurts the cache, rather than helping it ?
>
> If there are enough registers, compiler should be able to put
> spiral[row] into a register.
>
> x86 can scale memory accesses by small powers of 2, but 8*sizeof(int)
> is probably too big.

i did check that it really improved performance. here is an explanation
(which i think might be true ;).

when we have spiral[8][MAX_ORDER] array, it looks this way in memory:

0123456.....
0123456.....
.............
0123456..... (eight times)

where 0, 1 and so on mean the second index. the dfa matcher uses the
first few dozens of elements of each spiral[ll] array most of the
time (99.9% i think). each this array has DFA_MAX_BOARD
* DFA_MAX_BOARD * 4 > 1600 elements. so, we have eight _different_
regions of frequently accessed memory. this must significantly
decrease caching efficiency.

when the indices had been swapped, all the frequently used elements
combined into a _single_ continuous region of memory. besides, i
think gcc is smart enough, so it doesn't actually evaluate
`row*8*sizeof(int)', but adds 8*sizeof(int) to the pointer it uses.

unfortunately, this might be processor/compiler specific. but
anything involving speed is going to depend on those factors.

the reasoning above should also apply to the transformation[][]
array. there's another factor i mentioned, about multiplying on
8 (using shifts) instead of MAX_OFFSETS.

if we could rearrange the transformation array, it would improve
caching drastically (at least i think so). the thing which stops
me so far is some lines of code in patterns/helpers.c:

  apos = OFFSET_BY(0, -1);  /* O to west */
  bpos = OFFSET_BY(-1, 0);  /* O to north */
  cpos = OFFSET_BY(-1, -1); /* X to northwest */

and such. if we rearrange the array, i can see no easy way to
evaluate OFFSET(x, y) in a macro. any suggestions?

Paul



reply via email to

[Prev in Thread] Current Thread [Next in Thread]