Re: Problem with MPITB on IA64 arch
From: Javier Fernández Baldomero
Subject: Re: Problem with MPITB on IA64 arch
Date: Wed, 18 Jan 2006 20:30:58 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)
Hi, Gianvito
Gianvito Quarta wrote:
Hi,
I'm trying to set up a parallel octave environment on an Itanium II,
IA64, 128 cpu cluster.
I have some problems during the mpitb re-compilation because
on the IA64 arch, the cast from pointer to int causes problems
(during the compilation with gcc 3.2.3 the error:
reinterpret_cast from `_comm*' to `int' loses precision
occurs).
I'm sorry I was not able to reply to your e-mail sent at 15:29 on time,
and this question reached the help list at 17:22. Most people here
probably won't be interested in MPITB compilation problems. If you don't
mind, I'd rather continue this dialog by personal e-mail instead of on
the help mailing list.
Thanks for using MPITB. I'm pleasantly surprised you managed to get that
far. I have never used an IA64 machine, but perhaps with a little bit of
help you can manage to build a working MPITB version for that platform.
Please search for "size" and "alignment" in your LAM config.log file.
I'm mostly interested in the "int" and "void*" types size and alignment
on your IA64 architecture. Also check the endianness. On my IA32 PC I
have this:
________________
configure:5363: checking size of int
...
configure:5408: result: 4
configure:5436: checking size of long
...
configure:5481: result: 4
configure:5509: checking size of long long
...
configure:5554: result: 8
...
configure:5801: checking size of void *
...
configure:5846: result: 4
...
configure:6111: checking alignment of int
...
configure:6172: result: 4
...
configure:6265: checking alignment of long long
...
configure:6326: result: 4
...
configure:6573: checking alignment of void *
...
configure:6634: result: 4
...
configure:19090: checking whether byte ordering is bigendian
...
configure:19301: result: no
________________
So on IA32 all alignments are 4 and only "long long" has size 8. That's
why I chose to cast LAM communicators (_comm*) to C ints. BTW, when
returned to Octave they become "flints" (integer values stored in
doubles), so MPITB communicators are Octave scalars (doubles). You are
not expected to do any maths with them, so when later reused they can be
cast again from flints back to C ints and void*.
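The round trip described above can be sketched as follows. This is only an illustration, not MPITB code: FakeComm, handle_to_scalar, scalar_to_handle and fits_in_flint are hypothetical names, and std::intptr_t is assumed to be available (it is optional in the standard, but common). A double holds every integer up to 2^53 exactly, so the trip is safe only while the pointer value stays within that range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for LAM's communicator struct; not the real type.
struct FakeComm { int dummy; };

// Pointer -> integer wide enough to hold it -> double (what Octave stores).
double handle_to_scalar(FakeComm* p) {
    std::intptr_t bits = reinterpret_cast<std::intptr_t>(p);
    return static_cast<double>(bits);
}

// double -> integer -> pointer (what happens when the scalar is reused).
// Assumes d holds an integer value that fits in std::intptr_t.
FakeComm* scalar_to_handle(double d) {
    std::intptr_t bits = static_cast<std::intptr_t>(d);
    return reinterpret_cast<FakeComm*>(bits);
}

// Sufficient (not necessary) condition for an integer to be stored
// exactly in a double: its magnitude is at most 2^53.
bool fits_in_flint(std::int64_t v) {
    const std::int64_t limit = std::int64_t(1) << 53;
    return v >= -limit && v <= limit;
}

// Small self-check showing the intended use.
void demo_round_trip() {
    FakeComm c = {0};
    assert(scalar_to_handle(handle_to_scalar(&c)) == &c);
}
```

A value such as 1099670176 passes fits_in_flint easily. A 64-bit platform that places objects at very high addresses might not, which would be worth checking before relying on this scheme.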
Your error message makes me suspect that IA64 void* is size 8, or at
least greater than 4. In order to be able to cast LAM pointers to Octave
<integers, flints, scalars, whatever>, I would need to know which
integer type is compatible with void* under IA64. BTW, you can also look
for the same information in Octave's own config.log file. I have:
________________
ac_cv_sizeof_int=4
ac_cv_sizeof_long=4
ac_cv_sizeof_long_long=8
ac_cv_sizeof_short=2
________________
Tell me the alignment and size of your GCC integer types and void* type
so we can choose which one matches best. You can find that information
in the LAM config.log file. Please e-mail me directly; we can summarize
here on the list later if you succeed in getting MPITB working under
IA64.
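If digging through config.log is inconvenient, the same facts can probably be obtained by compiling a few lines on the target machine. The sketch below assumes a C++11 compiler (for alignof), which is newer than the gcc 3.2.3 mentioned in this thread; describe_layout, pointer_sized_type and demo_layout are hypothetical helper names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Build a short report of the sizes and alignments discussed in this thread.
std::string describe_layout() {
    std::ostringstream out;
    out << "sizeof(int)=" << sizeof(int)
        << " alignof(int)=" << alignof(int) << '\n'
        << "sizeof(long)=" << sizeof(long)
        << " alignof(long)=" << alignof(long) << '\n'
        << "sizeof(long long)=" << sizeof(long long)
        << " alignof(long long)=" << alignof(long long) << '\n'
        << "sizeof(void*)=" << sizeof(void*)
        << " alignof(void*)=" << alignof(void*) << '\n';
    return out.str();
}

// Name of the first standard integer type whose size equals sizeof(void*),
// or "none" if no candidate matches.
std::string pointer_sized_type() {
    if (sizeof(int) == sizeof(void*)) return "int";
    if (sizeof(long) == sizeof(void*)) return "long";
    if (sizeof(long long) == sizeof(void*)) return "long long";
    return "none";
}

// Small self-check showing the intended use.
void demo_layout() {
    assert(!describe_layout().empty());
    assert(!pointer_sized_type().empty());
}
```

Printing describe_layout() and pointer_sized_type() on the IA64 node would show directly which integer type can hold a void* there.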
I tried to change the casting of pointers to long
and then I successfully compiled MPITB.
Perhaps sizeof(long)==8 in IA64 ?!?
I assume you have edited just MPI_COMM_WORLD.cc, on line 33
from
RET_1_ARG(reinterpret_cast<int>( NAME )) // defined -> expanded
to
RET_1_ARG(reinterpret_cast<long>( NAME )) // defined -> expanded
If you haven't modified that line, or have modified others, please let
me know. There is no hint in your original e-mail about which
files/lines you have edited.
Unfortunately some problems occur at run time,
...
[info rank]=MPI_Comm_rank(MPI_COMM_WORLD)% rank=0
MPI process rank 0 (n0, p31218) caught a SIGSEGV in MPI_Comm_rank.
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Comm_rank()
Rank (0, MPI_COMM_WORLD): - main()
I think the SIGSEGV may come from the communicator argument, since
that's what you have edited (if I guessed correctly above).
So MPI_Init is working ?!? Great!!! It also seems you can invoke
MPI_COMM_WORLD without any problems. Try it out. I get this:
________________
$ octave
Set SSI rpi to tcp with the command:
putenv('LAM_MPI_SSI_rpi','tcp'), MPI_Init
Help on MPI: help mpi
octave:1> MPI_COMM_WORLD
ans = 1099670176
octave:2> MPI_Init
ans = 0
octave:3> a=MPI_COMM_WORLD
a = 1099670176
octave:4> whos a
*** local user variables:
Prot Name Size Bytes Class
==== ==== ==== ===== =====
rwd a 1x1 8 scalar
Total is 1 element using 8 bytes
octave:5> MPI_Finalize
ans = 0
octave:6> quit
address@hidden mpitb]$
________________
So the pointer becomes a flint a=1099670176. Send me a copy of your
output for this command sequence. Of course, if a=0 that's where the
SIGSEGV comes from. Perhaps the pointer is being correctly cast to
long (if you were lucky with your long decision), but it is not being
correctly cast back to a pointer, since that is done by this code:
________________
MPI_Comm comm = (MPI_Comm) args(ARGN).int_value();
________________
That's my fault. Right now I cannot remember why I didn't write any
XXX_cast keyword there. When I learned one shouldn't use C-style casts
in C++, I started to use static_cast and reinterpret_cast. Perhaps I
wrote that line before I learned that. I have forgotten C++ again, so I
guess I must re-read Stroustrup's "The C++ progr. lang" chapter 6.2.7
once more... sigh!
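To see why the C-style cast quoted above is suspicious on a 64-bit platform, here is a minimal sketch of what passing an address through a 32-bit integer does to it. It models the assumed behaviour of int_value() when int is 32 bits wide; through_32_bits, survives_32_bits and demo_truncation are hypothetical names, and 0x6000000000012340 is a made-up 64-bit address, not one taken from a real IA64 run.

```cpp
#include <cassert>
#include <cstdint>

// What remains of a 64-bit address value after it has been squeezed
// through a 32-bit integer: only the low 32 bits.
std::uint64_t through_32_bits(std::uint64_t address) {
    std::uint32_t narrow = static_cast<std::uint32_t>(address); // upper half dropped
    return narrow;
}

// True when the address comes back unchanged, i.e. it fitted in 32 bits.
bool survives_32_bits(std::uint64_t address) {
    return through_32_bits(address) == address;
}

// Small self-check showing the intended use.
void demo_truncation() {
    // The IA32 communicator value shown in the Octave session fits.
    assert(survives_32_bits(1099670176ULL));
    // A made-up 64-bit address does not: its upper bits are lost.
    assert(!survives_32_bits(0x6000000000012340ULL));
}
```

If that is what happens here, the pointer handed to MPI_Comm_rank would no longer be the one MPI_COMM_WORLD returned, which would be consistent with the SIGSEGV, though that remains to be confirmed.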
Ok, summarizing: this is your homework :-)
0.- reply directly to me, not to the mailing list
1.- copy-paste LAM config.log lines related to int and void* sizes,
alignments and endianness
2.- copy-paste Octave config.log lines related to int sizes
3.- tell me if you modified the line I mentioned (MPI_COMM_WORLD.cc, on
line 33)
4.- tell me if you modified (and how) any other line
5.- copy-paste a screen dump with the same Octave command sequence I
showed above
6.- (just a joke) locate in the sources the last line of code shown, the
one with the bad C-style typecast
When I have all that information I'll suggest that you change the
typecast to reinterpret_cast<> (gcc will complain, as it would have if I
had written it correctly from the start), and if it does I'll suggest
that you cast from long instead of from int... and so on until it works
(I hope :-)
-javier
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------