swarm-support

Re: crash on Alpha box


From: Bohdan DURNOTA
Subject: Re: crash on Alpha box
Date: Tue, 11 Feb 1997 13:05:11 +1100 (EST)

> I'm going to guess at this one.  And if I'm right, it won't be
> good news.  The fact that the receiver is at 0x40099310 and
> everything else is in the vicinity of 0x1400lmnop tells me that
> the receiver is mucked up, like you pointed out.  (In fact,
> 0x40099310 is probably not in the data section of the process.)
> It looks, to me, like the "aZone" variable was tampered with,
> going from 0x140099310 to 0x040099310 somewhere between
> the createBegin: aZone call in Create.m (which took an argument
> of "globalZone") and the [aZone allocIVars: self] in Map.m.
> 
> I suspect it was in the [self getZone] in ProbeLibrary.m at
> 
>   classMap = [[Map createBegin: [self getZone]] createEnd] ;
> 
> 
> Now, the "getZone" method is defined on all objects inheriting
> from DefObject; but, it's really just a macro defined in 
> defalloc.h, which reads:
> 
> #define getZone( anObject ) \
> ({ unsigned _zbits_ = (anObject)->zbits; \
>   ( _zbits_ & BitSuballocList ? \
>    (id)((Object_s *)( _zbits_ & ~0x7 ))->zbits : \
>    (id)( _zbits_ & ~0x7 ) ); })
> 
> "zbits" is a variable we use internally for keeping track of which
> zone the object is in.  Now, the zone is basically defined as
> the contents of (zbits & ~0x7), which is 
> 
>  xxxx|xxxx|xxxx|xxxx|xxxx|xxxx|xxxx|x000
> 
> on a 32 bit machine.  The three reserved bits are used for
> keeping track of the allocation properties of the object.  
> #define BitMappedAlloc     0x4  // set by suballoc list or explicit macro 
> #define BitSuballocList    0x2  // set whenever object contains suballoc list
> #define BitComponentAlloc  0x1  // set if object is not in the zone population
> 
> Now, the only things that might allow a number like 0x140099310 to
> get corrupted to 0x040099310 (which you'll notice is corrupted in
> the first digit beyond what's normally available on a 32 bit machine)
> are the sign bit or trash in the higher significant digits.  And I
> don't think the sign bit would be anywhere near 0x0000000x00000000.
> 
> So, I suspect that one of the constants (0x2 or ~0x7) might be screwing
> up the location of the zone by picking up trash in the higher part
> of the word.
> 
> *OR* since we're casting the result of (_zbits_ & ~0x7) to a pointer,
> which could mean promoting an unsigned (presuming the bit-wise
> and of an unsigned and an integer constant gets promoted to an 
> unsigned int) to a pointer, we could be picking up trash in the 
> higher part of the word if unsigned ints are 32 bits on the alpha.
> (Pointers are 64 bits.)
> 
> My advice would be to modify defalloc.h to read:
> 
>    #define getZone( anObject ) \
>    ({ unsigned long _zbits_ = (anObject)->zbits; \
>      ( _zbits_ & BitSuballocList ? \
>       (id)((Object_s *)( _zbits_ & ~0x7L ))->zbits : \
>       (id)( _zbits_ & ~0x7L ) ); })
> 
> [...]
> 
>    #define BitSuballocList    0x2L  // set whenever object contains suballoc list
> 
> (i.e. make _zbits_ an unsigned long and add the "L" to the
> end of the three constants.)
> 
> This assumes that unsigned long is 64 bits.  If it's not, then 
> you may need unsigned long long or some crap like that.
> 
> glen
> p.s. Sorry for such a long winded response.
> 
Glen,

Thanks for the reply. Two of our systems people have had a closer
look at the problem.

We tried the getZone fix, but the same problem remains.

However, we think we have a good handle on what the problem is. There is an
implicit assumption in the code that an 'unsigned' (i.e. unsigned int) is
sufficient to hold the bits of a pointer. This is broken on the alpha, and
it is a severe problem since the code makes this assumption all over the source
tree.
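
A minimal sketch of that assumption, in plain C rather than anything taken
from the Swarm sources (the helper names below are made up for illustration):
an integer type can hold a pointer only if it is at least as wide as the
pointer. The 4-byte and 8-byte sizes in the self-check are the ones we believe
apply to 'unsigned' and to pointers on the alpha.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the Swarm sources: returns 1 if an integer
   type of int_bytes is wide enough to hold a pointer of ptr_bytes. */
static int integer_holds_pointer(size_t int_bytes, size_t ptr_bytes)
{
    return int_bytes >= ptr_bytes;
}

/* The same check for whatever platform this file is compiled on. */
static int unsigned_holds_pointer_here(void)
{
    return integer_holds_pointer(sizeof(unsigned), sizeof(void *));
}

/* Self-check: a 4-byte integer is fine with 4-byte pointers, but not
   with 8-byte pointers. */
static void check_integer_holds_pointer(void)
{
    assert(integer_holds_pointer(4, 4) == 1);
    assert(integer_holds_pointer(4, 8) == 0);
}
```

Calling unsigned_holds_pointer_here() early in a program would report whether
the assumption holds on the machine at hand.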

In other words, a pointer whose integer representation is
0x0000000143804590 is truncated to the 32 bit representation 0x43804590,
and then we get the segfault. We think there are three main trouble areas:

defalloc.h (as pointed out by you)
Zone.m
DefObject.m

But there may be others which are contributing as well. One will need to 
port at least the above source code to the alpha. Whether the callers
of the functions which are altered need to be adjusted as well needs to
be considered .... but hopefully not!
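
The truncation described above can be reproduced with fixed-width types. This
is only a sketch under the assumption that 'unsigned' is 32 bits and pointers
are 64 bits on the alpha; the function names are invented for illustration and
do not appear in the Swarm sources.

```c
#include <assert.h>
#include <stdint.h>

/* Models storing 64-bit pointer bits in a 32-bit 'unsigned' and then
   converting back: the upper 32 bits are discarded and come back as zero. */
static uint64_t through_32_bit_unsigned(uint64_t pointer_bits)
{
    uint32_t narrowed = (uint32_t)pointer_bits;
    return (uint64_t)narrowed;
}

/* Models the same round trip through a 64-bit integer: nothing is lost. */
static uint64_t through_64_bit_unsigned(uint64_t pointer_bits)
{
    uint64_t stored = pointer_bits;
    return stored;
}

/* Self-check using the two addresses quoted in this thread. */
static void check_truncation(void)
{
    assert(through_32_bit_unsigned(UINT64_C(0x0000000143804590))
           == UINT64_C(0x43804590));
    assert(through_32_bit_unsigned(UINT64_C(0x140099310))
           == UINT64_C(0x40099310));
    assert(through_64_bit_unsigned(UINT64_C(0x0000000143804590))
           == UINT64_C(0x0000000143804590));
}
```

Both damaged addresses seen so far match what the 32-bit round trip produces,
which is why widening only the local variable in getZone would not help if the
value has already been stored in a 32 bit field.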

I am just a bit wary of the time all this porting may be taking away
from actual research --- but then again, the alphas here are the main
staff machines. We are still thinking about what to do next.

Ta, Bohdan

_____________________________________________________________________

Bohdan Durnota
Department of Computer Science          Parkville 3052
Melbourne University                    Australia

email:  address@hidden
phone:  +61-3-93449116
fax:    +61-3-93481184
____________________________________________________________________


