inline assembler functions returning double as 80-bit extended-precision

help-gplusplus

From:	DavidIMcIntosh
Subject:	inline assembler functions returning double as 80-bit extended-precision double
Date:	Tue, 10 May 2011 12:37:51 -0700 (PDT)
User-agent:	G2/1.0

I have also posted this in gnu.gcc.help but have had no replies there.

We have a very large library that is compiled on different platforms
and with different compilers.  A few of the routines need to be
carefully coded in assembler on the x86 platform - they make careful
use of 80-bit extended precision floating point arithmetic.  Because
this is part of a larger package that is compiled with different
compilers (MS and GNU are not the only compilers used), we are using
inline assembly for the few functions that require the careful coding,
rather than using separate assembly source files.  We have coded this
for both the Microsoft MASM inline assembler and for the GNU inline
assembler.  One issue remains to be solved with the GNU
inline assembly.

The functions that are hand coded with inline assembly consist entirely
of inline-assembler code - no C++ code in the functions whatsoever.  For
the GNU inline assembler, each is coded as a single
__asm__("" : : : ); statement.  One of the critical features of the
functions, and the reason they must be hand coded in assembler, is
that they use the cdecl calling convention, so that the double values
they return are returned in st(0) of the FPU and are thus returned as
80-bit values.  As an (extremely simple) example, consider this code:

typedef const unsigned char Double80[10];
Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29,
                                0x42, 0xCC, 0xFD, 0x3F };
// 0x3FFDCC42299EA1B28468 = 0.39894228040143267793994605993438

#if defined(__GNUC__)

#if 1
#define ASM_CALLING_CONVENTION cdecl
#define __ASM_NAKED_RETURN \
 "\n\t" "leave" \
 "\n\t" "ret"

#elif 0
#define ASM_CALLING_CONVENTION cdecl, always_inline
#define __ASM_NAKED_RETURN

#else
#define ASM_CALLING_CONVENTION naked
#define __ASM_NAKED_RETURN

#endif

inline double oneOverRootTwoPi()
__attribute__(( ASM_CALLING_CONVENTION ));

inline double oneOverRootTwoPi()
{
     __asm__( "fldt %[s_oneOverRootTwoPi]" __ASM_NAKED_RETURN
              :
              : [s_oneOverRootTwoPi] "m" (*s_oneOverRootTwoPi)
              : "st(7)" );

}

#endif //#if defined(__GNUC__)

The intent of the function oneOverRootTwoPi() is to allow the use of
the constant as an 80-bit constant within expressions, and one would
like it to be inlined.  However, I do not know how to tell the
compiler that my __asm__ code is already setting up the return value
in st(0).  If I do not define __ASM_NAKED_RETURN as above (and I really
shouldn't be doing that), the compiler insists on loading a default
return value into st(0) (a NaN, I think) before returning, even though I
have nowhere instructed it to do so.  To avoid that, I put in my own
return before the compiler-generated load-and-return code can
execute.  That approach of course precludes inlining the function.
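
As a cross-check of the byte layout of the constant, the following sketch
decodes the ten little-endian bytes by hand: bytes 0-7 hold the 64-bit
significand (with its explicit integer bit) and bytes 8-9 hold the sign
and the 15-bit biased exponent.  The name decodeDouble80 is hypothetical,
and the sketch assumes a normal, finite value (no zeros, denormals,
infinities, or NaNs).

```cpp
#include <cmath>
#include <cstdint>

// Same ten bytes as s_oneOverRootTwoPi in the post (little-endian x87 layout).
static const unsigned char s_oneOverRootTwoPi[10] = {
    0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29, 0x42, 0xCC, 0xFD, 0x3F};

// Hypothetical helper: decode an 80-bit extended-precision value by hand.
// Assumes a normal, finite number; special encodings are not handled.
inline long double decodeDouble80(const unsigned char (&bytes)[10])
{
    // Bytes 0..7: 64-bit significand, least significant byte first.
    std::uint64_t significand = 0;
    for (int i = 7; i >= 0; --i)
        significand = (significand << 8) | bytes[i];

    // Bytes 8..9: sign bit (bit 15) and biased exponent (bits 0..14).
    const unsigned signAndExponent = bytes[8] | (bytes[9] << 8);
    const int exponent = static_cast<int>(signAndExponent & 0x7FFF) - 16383;

    // The significand has its binary point after the top bit, hence the -63.
    const long double magnitude =
        std::ldexp(static_cast<long double>(significand), exponent - 63);
    return (signAndExponent & 0x8000) ? -magnitude : magnitude;
}
```

On a target whose long double is narrower than 80 bits the result is
rounded, so this only confirms the value, not the extra precision.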

So how do I instruct the compiler NOT to load its own return value
into st(0) before returning?  How do I tell it that my __asm__ code
has already loaded the return value into st(0)?  I am guessing there
is some output constraint I can add, but I just could not find
anything.

The above example is very simple.  We do have a few substantial
functions, hand coded in assembler, that suffer from the same
issue.

A bit more detail on the above, with specifics:

This source:

typedef const unsigned char Double80[10];
Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29,
                                0x42, 0xCC, 0xFD, 0x3F };
// 0x3FFDCC42299EA1B28468 = 0.39894228040143267793994605993438

double oneOverRootTwoPi() __attribute__(( cdecl ));
double oneOverRootTwoPi() {
        register double dReturnValue;
        __asm__(
                "fldt %[s_oneOverRootTwoPi]"
                : "=t" (dReturnValue)
                : [s_oneOverRootTwoPi] "m" (*s_oneOverRootTwoPi)
                :
                );
        return dReturnValue;

}

generates the following atrocious code, which COMPLETELY kills the
entire purpose of the function:

Dump of assembler code for function _Z14oneOverRootTwov:
   0x00401412 <+0>:     push   %ebp
   0x00401413 <+1>:     mov    %esp,%ebp
   0x00401415 <+3>:     push   %esi
   0x00401416 <+4>:     push   %ebx
=> 0x00401417 <+5>:     sub    $0x10,%esp
   0x0040141a <+8>:     fldt   0x40407a
   0x00401420 <+14>:    fstpl  -0x10(%ebp)
   0x00401423 <+17>:    mov    -0x10(%ebp),%ebx
   0x00401426 <+20>:    mov    -0xc(%ebp),%esi
   0x00401429 <+23>:    mov    %ebx,%eax
   0x0040142b <+25>:    mov    %esi,%edx
   0x0040142d <+27>:    mov    %eax,-0x18(%ebp)
   0x00401430 <+30>:    mov    %edx,-0x14(%ebp)
   0x00401433 <+33>:    fldl   -0x18(%ebp)
   0x00401436 <+36>:    add    $0x10,%esp
   0x00401439 <+39>:    pop    %ebx
   0x0040143a <+40>:    pop    %esi
   0x0040143b <+41>:    leave
   0x0040143c <+42>:    ret
End of assembler dump.

The following pedantically-ugly source:

typedef const unsigned char Double80[10];
Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29,
                                0x42, 0xCC, 0xFD, 0x3F };
// 0x3FFDCC42299EA1B28468 = 0.39894228040143267793994605993438

double oneOverRootTwoPi() __attribute__(( cdecl ));

double oneOverRootTwoPi() {
        __asm__(
                "fldt %[s_oneOverRootTwoPi]"
                "\n\t" "leave"
                "\n\t" "ret"
                :
                : [s_oneOverRootTwoPi] "m" (*s_oneOverRootTwoPi)
                :
                );

}

generates the following almost-ideal code, which nevertheless cannot
be inlined:

Dump of assembler code for function _Z14oneOverRootTwov:
   0x004013e2 <+0>:     push   %ebp
   0x004013e3 <+1>:     mov    %esp,%ebp
=> 0x004013e5 <+3>:     fldt   0x40407a
   0x004013eb <+9>:     leave
   0x004013ec <+10>:    ret
   0x004013ed <+11>:    flds   0x40473c
   0x004013f3 <+17>:    leave
   0x004013f4 <+18>:    ret
End of assembler dump.

The problem is, WHY DOES THE GNU COMPILER INSIST ON GENERATING THE
"flds   0x40473c" before the return code?
Nowhere does the source say "return xxx;", so why does GCC take it
upon itself to return a default value?  If I forget to (or, in
this case, intentionally do not) return a value, why doesn't the
compiler simply issue a stern warning, but still do what I
instruct it to do?

Acceptable code would be:

Dump of assembler code for function _Z14oneOverRootTwov:
   0x004013e2 <+0>:     push   %ebp
   0x004013e3 <+1>:     mov    %esp,%ebp
   0x004013e5 <+3>:     fldt   0x40407a
   0x004013eb <+9>:     leave
   0x004013ec <+10>:    ret
End of assembler dump.

Ideal code would be:

Dump of assembler code for function _Z14oneOverRootTwov:
   0x004013e5 <+3>:     fldt   0x40407a
   0x004013ec <+10>:    ret
End of assembler dump.

so long as, in both cases, the COMPILER is what generates the return
code, so that inlining would happen correctly.

This is where the "naked" attribute would be handy.  I understand the
developers' reluctance to implement this, but what is my alternative?
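
One possible alternative, as a sketch only: declare the return type as
long double and bind a local variable to st(0) with the "t" output
constraint, so that the compiler generates the return itself and remains
free to inline the function.  The names oneOverRootTwoPi80 and
scaleByOneOverRootTwoPi and the non-x86 fallback branch are assumptions
for illustration, not part of the library described above.  The sketch
also assumes a build with optimization enabled (for example -O2); an
unoptimized build is likely to keep spilling the value through memory.

```cpp
#include <cassert>

// Same ten bytes as s_oneOverRootTwoPi in the post (little-endian x87 layout).
static const unsigned char s_oneOverRootTwoPi[10] = {
    0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29, 0x42, 0xCC, 0xFD, 0x3F};

// Hypothetical variant of oneOverRootTwoPi(): returns long double so that
// the 80-bit value is not rounded to a 64-bit double on the way out.
static inline long double oneOverRootTwoPi80()
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    long double result;
    // "=t" tells the compiler the asm leaves its output in st(0).
    // The whole array is passed as the memory operand, so all ten bytes
    // are described to the compiler as being read.
    __asm__("fldt %[src]"
            : "=t"(result)
            : [src] "m"(s_oneOverRootTwoPi));
    return result;
#else
    // Assumed fallback for other targets: an ordinary long double literal.
    return 0.39894228040143267793994605993438L;
#endif
}

// Hypothetical caller that uses the constant inside a larger expression.
static inline long double scaleByOneOverRootTwoPi(long double x)
{
    assert(x == x);  // reject NaN inputs in debug builds
    return oneOverRootTwoPi80() * x;
}
```

Because the function ends with an ordinary return statement, no
hand-written "leave" / "ret" is needed, and the call can be folded into
the surrounding expression.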
