
Re: [gnugo-devel] paul_3_13.6


From: Paul Pogonyshev
Subject: Re: [gnugo-devel] paul_3_13.6
Date: Tue, 10 Dec 2002 16:25:44 +0200

Dave wrote:
> Paul Pogonyshev <address@hidden> writes:
> > [i'm trying to explain (and understand ;) why swapping the indices of
> >  spiral[][] array speeds the things up]
>
> Ah, I'm with you...

does it mean you (partially) agree with my messy explanation? as far as
i understood, some of your arguments were against it.

> Incidentally, an interesting experiment I may have mentioned before
> is to get the initial alignment of memory structures right.
> E.g. making spiral[] 32-byte aligned may help cache behaviour
> some more (whichever order is used).

it might not be easy to do in a compiler-independent way. frankly
speaking, i don't even know how to do it with the gcc or vc++ compilers.
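for what it's worth, here is a sketch of how the request might look, assuming a compiler that accepts C11 `_Alignas` (which arrived with C11, long after this thread), gcc's `__attribute__((aligned))` or msvc's `__declspec(align())`. SPIRAL_ROWS and SPIRAL_COLS are placeholder dimensions, not gnugo's real ones, and the msvc branch is untested:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder dimensions, NOT taken from the GNU Go sources. */
#define SPIRAL_ROWS 64
#define SPIRAL_COLS 8

/* Pick some way the compiler understands to request 32-byte alignment.
   PRE goes before the type, POST after the declarator. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define ALIGN32_PRE  _Alignas(32)          /* standard C11 */
#  define ALIGN32_POST
#elif defined(__GNUC__)
#  define ALIGN32_PRE
#  define ALIGN32_POST __attribute__((aligned(32)))  /* gcc extension */
#elif defined(_MSC_VER)
#  define ALIGN32_PRE  __declspec(align(32)) /* msvc extension, untested */
#  define ALIGN32_POST
#else
#  define ALIGN32_PRE                        /* no known way: default alignment */
#  define ALIGN32_POST
#endif

static ALIGN32_PRE int spiral[SPIRAL_ROWS][SPIRAL_COLS] ALIGN32_POST;

/* Returns 1 if the array starts on a 32-byte boundary. */
static int spiral_is_aligned32(void)
{
    return ((uintptr_t) &spiral[0][0] % 32) == 0;
}
```

spiral_is_aligned32() only checks the start address; whether the alignment actually helps cache behaviour is something to measure rather than assume.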

> > when the indices had been swapped, all the frequently used elements
> > combined into a _single_ continuous region of memory. besides, i
>
> Hmm - continuous region of virtual main memory, but caches are
> probably more important ?

i can't say for sure, but i think it affects the secondary (L2) cache.
i believe that cache reads memory in lines longer than 32 bytes (though
i admit i don't really know).
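to make the "single continuous region" point concrete, here is a small sketch that counts how many cache lines the frequently used elements occupy under each index order. everything in it is an assumption for illustration: 361 positions, 8 entries per position, the first 16 positions being the hot ones, 64-byte cache lines, and each array starting on a line boundary. the names by_l and by_pos are hypothetical, and the sketch says nothing about which order gnugo used before the patch:

```c
#include <assert.h>
#include <stddef.h>

/* All sizes are hypothetical, chosen only for illustration. */
#define NPOS 361   /* e.g. 19x19 board points */
#define NL   8     /* entries per point */
#define HOT  16    /* assume positions 0..HOT-1 are the frequently used ones */
#define LINE 64    /* assumed cache-line size in bytes */

static int by_l[NL][NPOS];    /* accessed as by_l[l][pos] */
static int by_pos[NPOS][NL];  /* accessed as by_pos[pos][l] */

/* Both loops visit the hot elements in increasing address order, so
   counting changes of line number counts distinct cache lines.
   Offsets are measured from the start of each array, i.e. the array
   is assumed to begin on a line boundary. */
static int lines_by_l(void)
{
    ptrdiff_t last = -1;
    int count = 0;
    for (int l = 0; l < NL; l++)
        for (int pos = 0; pos < HOT; pos++) {
            ptrdiff_t line = ((char *) &by_l[l][pos] - (char *) by_l) / LINE;
            if (line != last) {
                count++;
                last = line;
            }
        }
    return count;
}

static int lines_by_pos(void)
{
    ptrdiff_t last = -1;
    int count = 0;
    for (int pos = 0; pos < HOT; pos++)
        for (int l = 0; l < NL; l++) {
            ptrdiff_t line = ((char *) &by_pos[pos][l] - (char *) by_pos) / LINE;
            if (line != last) {
                count++;
                last = line;
            }
        }
    return count;
}
```

with these placeholder numbers, by_pos[pos][l] packs the hot elements into 8 lines (one contiguous 512-byte block), while by_l[l][pos] spreads them over 15, because each of the 8 hot chunks starts at a different offset within a line.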

> > think gcc is smart enough, so it doesn't actually evaluate
> > `row*8*sizeof(int)', but adds 8*sizeof(int) to the pointer it uses.
>
> Hmm - x86 is rather short of registers. I did try playing
> with the source to get good register usage. Eg x86 can scale
> by small powers of 2 when reading memory. spiral[l][row]
> is one instruction if spiral[l] and row are in registers.
> But I don't know if the scaling goes up as far as 32 ?

i'm familiar with x86 assembler. here is how you can address memory
on x86 in one instruction:

  [reg1 + [1 | 2 | 4 | 8] * reg2 + offset]

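where reg1 and reg2 are any of the eight general-purpose registers
(they may be the same register, except that esp cannot serve as the
scaled index reg2) and offset is a constant displacement. any of the
three terms may be omitted. so the largest power of 2 you can scale
by is 8; a stride of 32 cannot be encoded in a single operand.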
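a small C model of that formula may help with the question about a 32-byte stride. x86_ea() and table are hypothetical names; the code only mimics the address arithmetic and does not generate or inspect real instructions. it also assumes sizeof(int) is 4, so that the int scale factor is one of the legal values:

```c
#include <assert.h>
#include <stdint.h>

/* Model of an x86 memory operand: base + scale*index + disp,
   where scale must be 1, 2, 4 or 8. */
static uintptr_t x86_ea(uintptr_t base, uintptr_t index, unsigned scale,
                        intptr_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* unsigned wrap-around makes a negative disp subtract correctly */
    return base + scale * index + (uintptr_t) disp;
}

static int table[4][8];   /* hypothetical: 8 ints = 32 bytes per row */

/* &table[row][l] needs a 32-byte stride for row, which no scale factor
   provides, so the row address is computed separately first; only then
   can one scaled-index operand select element l within the row. */
static uintptr_t addr_of(int row, int l)
{
    uintptr_t row_base = (uintptr_t) table + (uintptr_t) row * sizeof table[0];
    return x86_ea(row_base, (uintptr_t) l, sizeof(int), 0);
}
```

since 32 is not an allowed scale, the row offset has to be computed into a register first (typically with a shift) before a single [reg1 + 4*reg2] access can pick out table[row][l].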

but i didn't mean x86 addressing modes. i believe that modern
compilers (including gcc) can move invariant subexpressions out of
loops and replace multiplications by the loop counter with additions
(strength reduction). they can often see how loop counters and other
local variables are really used. i guess gcc does not really keep that
`row' variable, but rather a pointer that holds `spiral + row*8 + l',
and instead of executing `row++;' it adds 8*sizeof(int) bytes to that
pointer. i can't say whether it really works that way, but i've seen
similar behaviour from the vc++ compiler (in visual studio you can
easily look through the generated assembler code).
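the transformation described above is usually called strength reduction. here is a hand-written sketch of the before and after shapes; it is not actual gcc or vc++ output, and grid, NROWS and NCOLS are hypothetical:

```c
#include <assert.h>

#define NROWS 16   /* hypothetical sizes */
#define NCOLS 8

/* Straightforward version: the address of grid[row][l] is
   grid + row*NCOLS + l, conceptually recomputed on every iteration. */
static long sum_column_indexed(int grid[NROWS][NCOLS], int l)
{
    long sum = 0;
    for (int row = 0; row < NROWS; row++)
        sum += grid[row][l];
    return sum;
}

/* Strength-reduced by hand: no row counter and no multiplication.
   rowp advances by sizeof *rowp == NCOLS * sizeof(int) bytes per step. */
static long sum_column_pointer(int grid[NROWS][NCOLS], int l)
{
    long sum = 0;
    for (int (*rowp)[NCOLS] = grid; rowp != grid + NROWS; rowp++)
        sum += (*rowp)[l];
    return sum;
}
```

both functions return the same sum; the second only shows the shape a compiler may produce on its own, so there is rarely a reason to write it by hand.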

i did observe a speedup after swapping the indices; that's why i
included the change in the patch. if you see a slowdown with your
processor and platform (i use an amd athlon-900 with gcc 2.96), you
can patch it back the other way round.

Paul



