gnugo-devel

Re: [gnugo-devel] paul_3_13.6


From: Dave Denholm
Subject: Re: [gnugo-devel] paul_3_13.6
Date: 10 Dec 2002 20:44:38 +0000

Paul Pogonyshev <address@hidden> writes:

> Dave wrote:
> > Paul Pogonyshev <address@hidden> writes:
> > > [i'm trying to explain (and understand ;) why swapping the indices of
> > >  spiral[][] array speeds the things up]
> >
> > Ah, I'm with you...
> 
> does it mean you (partially) agree with my messy explanation? as far as
> i understood, some of your arguments were against it.
> 

Sorry. I had been thinking only of the inner loop, and missed the
point that we only use the first few entries of each spiral list,
which means that the interesting bits of the table are far apart
with the old index order.

I can see why there could be benefits from the swap, as well
as possible downsides.
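
To put a number on "far apart", here is a rough sketch. The dimensions
(8 lists of 256 entries) and the names spiral_old / spiral_new are
made up for illustration and are not taken from the gnugo source. It
measures the span of memory covering the first `used` entries of every
list under each index order.

```c
#include <stddef.h>

#define N_LISTS   8     /* assumed number of spiral lists */
#define LIST_LEN  256   /* assumed list length, illustration only */

/* Old order: each list is contiguous, lists are LIST_LEN ints apart. */
static int spiral_old[N_LISTS][LIST_LEN];

/* Swapped order: entry k of all the lists sits together. */
static int spiral_new[LIST_LEN][N_LISTS];

/* Bytes from the first hot element to one past the last hot element,
 * when only the first `used` entries of each list are ever read.
 * `used` must be at least 1 and at most LIST_LEN. */
size_t hot_span_old(size_t used)
{
    const char *first = (const char *)&spiral_old[0][0];
    const char *last  = (const char *)&spiral_old[N_LISTS - 1][used - 1];
    return (size_t)(last - first) + sizeof(int);
}

size_t hot_span_new(size_t used)
{
    const char *first = (const char *)&spiral_new[0][0];
    const char *last  = (const char *)&spiral_new[used - 1][N_LISTS - 1];
    return (size_t)(last - first) + sizeof(int);
}
```

With used = 4 the swapped order keeps everything within 32 ints, while
the old order spreads the same elements over nearly the whole table,
with large unused gaps in between.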



> > Incidentally, an interesting experiment I may have mentioned before
> > is to get the initial alignment of memory structures right.
> > Eg making spiral[] 32-byte aligned may help cache behaviour
> > some more (whichever order is used).
> 
> it might not be easy to do in a compiler-independent way. frankly
> speaking, i don't even know how to do it with gcc or vc++ compiler.
> 

gcc has __attribute__ for this.

From the info page:


`aligned (ALIGNMENT)'
     This attribute specifies a minimum alignment for the variable or
     structure field, measured in bytes.  For example, the declaration:

          int x __attribute__ ((aligned (16))) = 0;
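
Applied to our table it might look like the sketch below. This is gcc
syntax only; the array dimensions and the helper spiral_is_aligned()
are assumptions for illustration. (vc++ spells the same request
__declspec(align(32)), so a small macro could hide the difference.)

```c
#include <stdint.h>

#define N_LISTS   8     /* assumed inner dimension */
#define LIST_LEN  256   /* assumed list length, illustration only */

/* Ask gcc to start the table on a 32-byte boundary. */
static int spiral[LIST_LEN][N_LISTS] __attribute__ ((aligned (32)));

/* Returns 1 if the table really starts on a 32-byte boundary. */
int spiral_is_aligned(void)
{
    return ((uintptr_t)&spiral[0][0] % 32) == 0;
}
```

If sizeof(int) is 4, each row of 8 ints is exactly 32 bytes, so with
this alignment no row would straddle a 32-byte boundary.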




> > > when the indices had been swapped, all the frequently used elements
> > > combined into a _single_ continuous region of memory. besides, i
> >
> > Hmm - continuous region of virtual main memory, but caches are
> > probably more important ?
> 
> can't say it for sure, but i think it influences the secondary cache.
> i think it reads memory in lines longer than 32 bytes (but i admit
> i don't really know it).
> 
> > > think gcc is smart enough, so it doesn't actually evaluate
> > > `row*8*sizeof(int)', but adds 8*sizeof(int) to the pointer it uses.
> >

sure enough...


> > Hmm - x86 is rather short of registers. I did try playing
> > with the source to get good register usage. Eg x86 can scale
> > by small powers of 2 when reading memory. spiral[l][row]
> > is one instruction if spiral[l] and row are in registers.
> > But I don't know if the scaling goes up as far as 32 ?
> 
> i'm familiar with x86 assembler. here is how you can address memory
> on x86 in one instruction:
> 
>   [reg1 + [1 | 2 | 4 | 8] * reg2 + offset]
> 
> where reg1 and reg2 are from the eight general-purpose registers
> (and might be the same). any of the addenda may be omitted. so,
> the largest power of 2 you can scale by is 8.
> 
> but i didn't mean addressing modes of x86. i believe that modern
> compilers (including gcc) can get subexpressions out of loops.
> they can often see the real usage of loop counters and other
> local variables. i guess gcc does not really have that `row'
> variable, but rather a variable that stores `spiral + row*8 + l'
> in it. and instead of executing `row++;', it adds 8*sizeof(int)
> to that variable. i can't say whether it really works that way,
> but i've seen similar behaviour with vc++ compiler (in visual
> studio, you can easily look through assembler code generated).
> 


Yes, it can do these things, but I think it may do things differently
since x86 registers aren't quite all general purpose.
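
For reference, the transformation being discussed looks roughly like
this when written out by hand. It is only a sketch: the function names
and the inner dimension of 8 are assumptions, and whether gcc actually
produces the second form for our loop would have to be checked in the
generated assembler.

```c
#define N_LISTS 8   /* assumed inner dimension of the table */

/* Indexed form: the address of table[row][l] is derived from the
 * loop counter on every iteration. */
int sum_indexed(int table[][N_LISTS], int l, int rows)
{
    int sum = 0;
    for (int row = 0; row < rows; row++)
        sum += table[row][l];
    return sum;
}

/* Hand-reduced form: a row pointer is advanced by one row, i.e. by
 * N_LISTS * sizeof(int) bytes, instead of multiplying by row. */
int sum_pointer(int table[][N_LISTS], int l, int rows)
{
    int sum = 0;
    int (*p)[N_LISTS] = table;
    for (int row = 0; row < rows; row++, p++)
        sum += (*p)[l];
    return sum;
}
```

Both functions return the same result; the second one just makes the
"add 8*sizeof(int) to the pointer" step explicit.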

IIRC, it was using EDI for spiral[l] and a plain register for row.
IIRC, EDI is less general-purpose than the other registers ..?



dd
-- 
address@hidden          http://www.insignia.com


