help-octave

Re: -ffast-math option at compiling octave in FreeBSD ports ?


From: Jaroslav Hajek
Subject: Re: -ffast-math option at compiling octave in FreeBSD ports ?
Date: Sun, 7 Dec 2008 16:58:36 +0100

On Sun, Dec 7, 2008 at 9:12 AM, Tatsuro MATSUOKA <address@hidden> wrote:
> Hello
>
> In an Octave thread in Japan, someone asked about the meaning of the
> -ffast-math option in the FreeBSD ports.
>
> I would be glad if someone could give me some information
> about it.
>
> Regards
>
> Tatsuro

I'm not exactly an expert, but I'll try to explain:
-ffast-math in GCC enables certain optimizations that can dramatically
boost performance, but may slightly violate the expected semantics of
a computation.

To get an idea what is allowed under -ffast-math, try this simple
function with g++:

void dscal (double *x, int n, double a)
{
  for (int i = 0; i < n; i++)
    x[i] /= a;
}

Compiled to assembler with "-O3 -fomit-frame-pointer" (I intentionally
omit -funroll-loops so that the assembler stays readable), I get
(g++ 4.3, old Intel Celeron):
        movl    8(%esp), %ecx
        movl    4(%esp), %edx
        fldl    12(%esp)
        testl   %ecx, %ecx
        jle     .L8
        xorl    %eax, %eax
        .p2align 4,,7
.L4:
        fldl    (%edx,%eax,8)
        fdiv    %st(1), %st
        fstpl   (%edx,%eax,8)
        addl    $1, %eax
        cmpl    %ecx, %eax
        jne     .L4
.L8:
        fstp    %st(0)
        ret

whereas with "-O3 -fomit-frame-pointer -ffast-math" I get:
        movl    8(%esp), %ecx
        movl    4(%esp), %edx
        fldl    12(%esp)
        testl   %ecx, %ecx
        jle     .L8
        fld1
        xorl    %eax, %eax
        fdivp   %st, %st(1)
        .p2align 4,,7
.L4:
        fldl    (%edx,%eax,8)
        fmul    %st(1), %st
        fstpl   (%edx,%eax,8)
        addl    $1, %eax
        cmpl    %ecx, %eax
        jne     .L4
.L8:
        fstp    %st(0)
        ret


If you can read assembler at a basic level (as I do), you will see that
in the second case, the compiler essentially transformed the function
like this:
void dscal (double *x, int n, double a)
{
  double ainv = 1.0/a;
  for (int i = 0; i < n; i++)
    x[i] *= ainv;
}
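
To check whether the two forms really agree, one can run both on the same
data and compare. This is only a minimal sketch: the names dscal_div,
dscal_mul and count_mismatches are made up for illustration, and it
assumes IEEE 754 doubles and a build without -ffast-math (with that flag
the compiler may turn both loops into the same code).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: one division per element.
void dscal_div (double *x, int n, double a)
{
  for (int i = 0; i < n; i++)
    x[i] /= a;
}

// Form produced under -ffast-math: one division, then n multiplications.
void dscal_mul (double *x, int n, double a)
{
  double ainv = 1.0 / a;
  for (int i = 0; i < n; i++)
    x[i] *= ainv;
}

// Hypothetical helper: counts the elements on which the two forms disagree.
int count_mismatches (const std::vector<double> &input, double a)
{
  assert (a != 0.0);
  std::vector<double> by_div (input), by_mul (input);
  dscal_div (by_div.data (), static_cast<int> (by_div.size ()), a);
  dscal_mul (by_mul.data (), static_cast<int> (by_mul.size ()), a);

  int mismatches = 0;
  for (std::size_t i = 0; i < input.size (); i++)
    if (by_div[i] != by_mul[i])
      mismatches++;
  return mismatches;
}
```

A call such as count_mismatches ({49.0}, 49.0) is a convenient probe:
49.0 / 49.0 is exactly 1, while 49.0 * (1.0 / 49.0) is a well-known case
that comes out just below 1 in double precision.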

This is much faster, because division is much slower than
multiplication, and the loop can also be better vectorized using SSE
instructions and unrolled.
However, it may produce slightly different results, because, for instance,
while x / x is exactly 1 for any finite nonzero x, x * (1/x) is not always
exactly 1 in FP math, since 1/x is rounded before the multiplication.
Another thing is that with -ffast-math, the compiler is allowed to assume
that NaNs and Infs do not occur in expressions, and thus, for
instance, to replace "x-x" by 0 (which does not hold for x = NaN or x = Inf).
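
The NaN/Inf point can be illustrated with a small sketch. The function
names below are hypothetical, and the described behaviour assumes a
build without -ffast-math; with the flag, the subtraction may be folded
to 0 and even the isnan check may be optimized away.

```cpp
#include <cmath>
#include <limits>

// Returns x - x. Under strict IEEE 754 rules this is 0 for finite x,
// but NaN when x is NaN or infinite.
double self_difference (double x)
{
  return x - x;
}

// Hypothetical helper: true if x - x came out as NaN rather than 0.
bool self_difference_is_nan (double x)
{
  return std::isnan (self_difference (x));
}
```

Passing std::numeric_limits<double>::quiet_NaN () or
std::numeric_limits<double>::infinity () to self_difference_is_nan is
enough to see the difference from the finite case.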

HTH,

-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

