Re: [ft-devel] FT_MulFix assembly
From: Miles Bader
Subject: Re: [ft-devel] FT_MulFix assembly
Date: Mon, 06 Sep 2010 17:02:38 +0900
James Cloos <address@hidden> writes:
> __asm__ __volatile__ (
> "movq %1, %%rax\n"
> "imul %2\n"
> "addq %%rdx, %%rax\n"
> "addq $0x8000, %%rax\n"
> "sarq $16, %%rax\n"
> : "=a"(result)
> : "g"(a), "g"(b)
> : "rdx" );
>
> The above code has a latency of 1+5+1+1+1 = 10 clocks on an amdfam10 cpu.
...
> Is the amd64 version desired, given how little benefit it has?
If this is being used in a context where it might benefit from more
scheduling, etc, perhaps it would help to let the compiler generate the
non-imul insns (since it's pretty good at those)?
E.g. something like:
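static __inline__ long
FT_MulFix_x86_64 (long a, long b)
{
  register long mr1, mr2;

  /* One-operand imul: rdx:rax = rax * operand.  "rm" keeps the operand
     out of immediate form, which one-operand imul cannot encode.  */
  __asm__ ("imulq %3" : "=a" (mr1), "=d" (mr2) : "a" (a), "rm" (b) : "cc");
  return ((mr1 + mr2) + 0x8000) >> 16;
}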
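
For checking a version like this, a plain C reference with no asm at all may be handy. The sketch below is only that: mulfix_ref is a made-up name, not anything in FreeType, and it assumes the operands are 16.16 values small enough that the product fits in 64 bits, in which case the high half of the imul result (rdx) is just 0 or -1, i.e. the sign of the product.

```c
#include <stdint.h>

/* Hypothetical reference helper, not part of FreeType.
   Assumes a * b does not overflow 64 bits, so the high half of the
   128-bit product would be 0 (non-negative) or -1 (negative).  */
static inline int64_t
mulfix_ref (int64_t a, int64_t b)
{
  int64_t prod = a * b;
  int64_t sign = prod < 0 ? -1 : 0;   /* stands in for rdx */

  /* Adding the sign before the rounding constant makes halves round
     away from zero.  Relies on >> of a negative value being an
     arithmetic shift, as it is with gcc.  */
  return (prod + sign + 0x8000) >> 16;
}
```

With these assumptions, 1.0 * 1.0 (0x10000 * 0x10000) gives 0x10000, and a product of exactly half a unit rounds to 1 or -1 depending on sign.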
-miles
--
Omochiroi!