octave-maintainers

Re: JIT test crash


From: Daniel J Sebald
Subject: Re: JIT test crash
Date: Fri, 03 Aug 2012 13:30:29 -0500

On 08/03/2012 01:13 PM, Max Brister wrote:
On Fri, Aug 3, 2012 at 1:05 PM, Michael Goffioul
<address@hidden>  wrote:
On Thu, Aug 2, 2012 at 8:54 PM, Michael Goffioul
<address@hidden>  wrote:

On Thu, Aug 2, 2012 at 6:42 PM, Max Brister<address@hidden>  wrote:

On Thu, Aug 2, 2012 at 8:36 AM, Michael Goffioul
<address@hidden>  wrote:
On Thu, Aug 2, 2012 at 1:57 PM, Max Brister<address@hidden>  wrote:
[snip]


The output with OCTAVE_JIT_DEBUG looks correct to me.

I have attached the patch for llvm 3.1.


I applied it, but it didn't change anything (the generated assembly looks
exactly the same). If I'm reading [1] correctly (X86Subtarget constructor),
the stack alignment was already set to 4 anyway. And in [2], in the
X86_32TargetMachine constructor, the native stack alignment is also
specified as 4 bytes (trailing "-S32" at line 45).

Michael.

[1] https://github.com/earl/llvm-mirror/blob/master/lib/Target/X86/X86Subtarget.cpp
[2] https://github.com/earl/llvm-mirror/blob/master/lib/Target/X86/X86TargetMachine.cpp

Actually, that makes sense. In order to use the SSE instructions, we
really want the stack to be 16-byte aligned, I think. Can you try changing
the stack alignment to 16 bytes instead of 4?


No luck. I've modified your patch to read:

opts.StackAlignmentOverride = 16;

For your information, I've attached the generated assembly for the 4-byte
and 16-byte cases. The code still crashes, but at an earlier location: it
now crashes at the MOVAPD instruction (address 02D300BC). The 4-byte case
uses MOVUPD there instead, so it doesn't crash at that point. Also, if you
compare the two files, you see that in the 16-byte case all stack offsets
are multiples of 16 bytes, but I don't see any code to realign the stack on
a 16-byte boundary.

The bottom line is: within the generated code, the stack is kept 16-byte
aligned, but as there's no forced realignment, it entirely depends on
the stack alignment on function entry.
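
The dependence on the entry alignment can be illustrated with a small C++ sketch. This is not LLVM's code: the stack pointer is modelled as a plain integer, the addresses are made up, and the helper names (is_aligned, align_down, slot_address) are hypothetical. It only shows that a slot at an offset that is a multiple of 16 is 16-byte aligned (as MOVAPD requires) exactly when the entry stack pointer is, and that rounding the stack pointer down to a 16-byte boundary first, as a realigning prologue would, removes that dependence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if addr lies on a `boundary`-byte boundary (boundary: power of two).
inline bool is_aligned(std::uintptr_t addr, std::size_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (addr & (boundary - 1)) == 0;
}

// Round an address down to a boundary. The stack grows downward on x86,
// so this is the direction a forced realignment of the stack pointer takes.
inline std::uintptr_t align_down(std::uintptr_t addr, std::size_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return addr & ~static_cast<std::uintptr_t>(boundary - 1);
}

// Address of a stack slot at a fixed offset below the stack pointer.
inline std::uintptr_t slot_address(std::uintptr_t sp, std::size_t offset) {
    return sp - offset;
}
```

For example, with a made-up entry stack pointer of 0x1004 (only 4-byte aligned), the slot at offset 32 is at 0x0FE4, which is not 16-byte aligned; after align_down(0x1004, 16) the same slot is at 0x0FE0, which is.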


Any update, ideas or suggestions?

Michael.


Michael,

This definitely looks like a bug in LLVM to me. I'll bring it up with
the LLVM people. In the meantime, I'm thinking of not using the SSE
instructions for complex operations. I'm not sure how much benefit
there is, considering complex numbers only have two values.

SSE works in groups of four, if I remember correctly. As an alternative to vectorizing within a single complex number, another way to use SSE might be to parallelize across elements in vector/matrix operations. For example, say 9x1 complex vectors are multiplied. It would be

4 real x real mult
4 imag x imag mult
4 real add
4 real sub
4 real x imag mult
4 imag x real mult
4 imag add
4 imag add
-
4 real x real mult
4 imag x imag mult
4 real add
4 real sub
4 real x imag mult
4 imag x real mult
4 imag add
4 imag add
-
1 real x real mult (3 bogus)
1 imag x imag mult (3 bogus)
1 real add (3 bogus)
1 real sub (3 bogus)
1 real x imag mult (3 bogus)
1 imag x real mult (3 bogus)
1 imag add (3 bogus)
1 imag add (3 bogus)

That would speed things up by roughly a factor of four where it is really needed, e.g., large matrix multiplies.
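
The grouping above can be sketched in C++ as follows. This is a rough illustration under several assumptions, not Octave or LLVM code: it assumes four single-precision lanes (with doubles, a 128-bit SSE register holds only two), it reads the "multiply" as an accumulating, unconjugated product of the two vectors (which would account for the extra real and imaginary adds), it stores real and imaginary parts in separate arrays, and it uses plain scalar loops to stand in for the four-wide instructions. The names kLanes, lane_groups and dot_in_lanes are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;  // assumed width: four single-precision lanes

// Number of four-wide passes for n elements; the last may be partly padding.
inline std::size_t lane_groups(std::size_t n) {
    return (n + kLanes - 1) / kLanes;
}

// Sum of a[k] * b[k] (no conjugation), with real and imaginary parts held
// in separate arrays so each inner-loop step maps onto one lane.
inline std::complex<float> dot_in_lanes(const std::vector<float>& ar,
                                        const std::vector<float>& ai,
                                        const std::vector<float>& br,
                                        const std::vector<float>& bi) {
    assert(ar.size() == ai.size() && ar.size() == br.size() &&
           ar.size() == bi.size());
    const std::size_t n = ar.size();
    std::array<float, kLanes> acc_re{};  // per-lane accumulators, zeroed
    std::array<float, kLanes> acc_im{};
    for (std::size_t g = 0; g < lane_groups(n); ++g) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            const std::size_t k = g * kLanes + lane;
            // Lanes past the end are the "bogus" ones: feed them zeros.
            const float xr = k < n ? ar[k] : 0.0f;
            const float xi = k < n ? ai[k] : 0.0f;
            const float yr = k < n ? br[k] : 0.0f;
            const float yi = k < n ? bi[k] : 0.0f;
            const float rr = xr * yr;   // real x real mult
            const float ii = xi * yi;   // imag x imag mult
            acc_re[lane] += rr - ii;    // real sub, then real add
            const float ri = xr * yi;   // real x imag mult
            const float ir = xi * yr;   // imag x real mult
            acc_im[lane] += ri + ir;    // imag add, then imag add
        }
    }
    float re = 0.0f;
    float im = 0.0f;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        re += acc_re[lane];  // final reduction across the four lanes
        im += acc_im[lane];
    }
    return {re, im};
}
```

For 9 elements, lane_groups(9) gives 3 passes, matching the 4 + 4 + 1 breakdown above, with three padded lanes in the last pass.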

Dan

