Re: [ft-devel] FT_MulFix assembly
From: James Cloos
Subject: Re: [ft-devel] FT_MulFix assembly
Date: Tue, 07 Sep 2010 15:07:09 -0400
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)
>>>>> "MB" == Miles Bader <address@hidden> writes:
MB> Hm, are you sure that's not backwards? When I tried the git C version[*],
MB> as well as your most recent FT_MulFix_x86_64, it returned 0xFFFF8506...
Odd. Adding your algo to my test app, I get:
# a       , b       , FT      , JC      , MB
  7AFA8000, FFFFFFFF, FFFF8505, FFFF8505, FFFF8506
I see that I have one small error in the C code in my app.
FT has:
    c = (FT_Long)( ( (FT_Int64)a * b + 0x8000L ) >> 16 );
whereas I used:
    c = (int32_t)(((int64_t)a*b + 0x8000L) >> 16);
But changing the int32_t to long does not change the results.
Yours is still always +1 compared to the C whenever the first arg
represents a positive value with a fractional part of exactly 1/2.
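The two rounding behaviours being compared can be checked in isolation. The following is a minimal sketch, not code from FreeType or from either test app: the function names are made up for illustration, both arguments are assumed to be 32-bit signed 16.16 values (so 0xFFFFFFFF means -1), and >> on a negative int64_t is assumed to be an arithmetic shift, as it is with gcc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the one-line formula, which rounds halves
 * toward +infinity because of the floor-style arithmetic shift. */
static int32_t mulfix_half_up(int32_t a, int32_t b)
{
    assert((-1 >> 1) == -1);  /* assumption: arithmetic right shift */
    return (int32_t)(((int64_t)a * b + 0x8000) >> 16);
}

/* Hypothetical helper: round halves away from zero by working on
 * the magnitude of the product and restoring the sign afterwards. */
static int32_t mulfix_half_away(int32_t a, int32_t b)
{
    int64_t p   = (int64_t)a * b;
    int64_t mag = p < 0 ? -p : p;
    int64_t q   = (mag + 0x8000) >> 16;

    return (int32_t)(p < 0 ? -q : q);
}
```

Under those assumptions, a = 0x7AFA8000 and b = -1 give an exact product of -0x7AFA.8: the first helper returns -0x7AFA (0xFFFF8506) and the second returns -0x7AFB (0xFFFF8505), which are the two distinct values in the table. For positive products the two helpers agree.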
Oddly, though, gcc now refuses to compile my asm, even though it did so
before, complaining that it cannot guess what operand size to use for the
imul.... Weird. (The existing executables prove that it used to.)
A simple way around that is to specify "D" and "S" as the constraints
for a and b. (The rdi and rsi registers are where the x86_64 ABI puts
the first two integer args which are passed to a function.)
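For readers who want to try the constraint trick, here is a hedged reconstruction in gcc extended asm. It is written to match the instruction sequence shown for mf in this message, but it is not the original source: the function name, the widening temporaries, and the portable fallback are assumptions added for illustration. Because "D" and "S" force register operands, the assembler can infer the 64-bit operand size of the imul.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; a sketch, not the original mf() source. */
static int32_t mulfix_regs(int32_t a, int32_t b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    int64_t wide_a = a;  /* sign-extend so the 64-bit imul sees a and b */
    int64_t wide_b = b;
    int64_t result;

    __asm__ (
        "mov  %1, %%rax\n\t"       /* rax = a                            */
        "imul %2\n\t"              /* rdx:rax = rax * b, signed          */
        "add  %%rdx, %%rax\n\t"    /* rdx is 0 or -1: sign of product    */
        "add  $0x8000, %%rax\n\t"  /* rounding constant                  */
        "sar  $16, %%rax"          /* scale back to 16.16                */
        : "=&a" (result)
        : "D" (wide_a), "S" (wide_b)   /* pin inputs to rdi and rsi      */
        : "rdx", "cc");
    return (int32_t)result;
#else
    /* Portable fallback computing the same value in plain C. */
    int64_t r = (int64_t)a * b;

    assert((-1 >> 1) == -1);  /* assumption: arithmetic right shift */
    return (int32_t)((r + (r >> 63) + 0x8000) >> 16);
#endif
}
```

Calling mulfix_regs(0x7AFA8000, -1) yields -0x7AFB, i.e. 0xFFFF8505, the value listed in the JC column.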
The disassembly of the final version is:
00000000004006c0 <mf>:
  4006c0:  48 89 f8              mov    %rdi,%rax
  4006c3:  48 f7 ee              imul   %rsi
  4006c6:  48 01 d0              add    %rdx,%rax
  4006c9:  48 05 00 80 00 00     add    $0x8000,%rax
  4006cf:  48 c1 f8 10           sar    $0x10,%rax
  4006d3:  c3                    retq
And I get this disassembly of yours:
0000000000400840 <miles>:
  400840:  48 63 c6              movslq %esi,%rax
  400843:  48 63 ff              movslq %edi,%rdi
  400846:  48 0f af c7           imul   %rdi,%rax
  40084a:  48 05 00 80 00 00     add    $0x8000,%rax
  400850:  48 c1 f8 10           sar    $0x10,%rax
  400854:  c3                    retq
I also just added this version to my test app:
int another (int32_t a, int32_t b) {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
}
That results in:
0000000000400760 <another>:
  400760:  48 63 ff              movslq %edi,%rdi
  400763:  48 63 f6              movslq %esi,%rsi
  400766:  48 0f af f7           imul   %rdi,%rsi
  40076a:  48 89 f0              mov    %rsi,%rax
  40076d:  48 c1 f8 1f           sar    $0x1f,%rax
  400771:  48 8d 84 06 00 80 00  lea    0x8000(%rsi,%rax,1),%rax
  400778:  00
  400779:  48 c1 f8 10           sar    $0x10,%rax
  40077d:  c3                    retq
Since FT's C version uses longs, though, this:
int another (long a, long b) {
    long r = (long)a * (long)b;
    long s = r >> 31;
    return (r + s + 0x8000) >> 16;
}
gives:
0000000000400760 <another>:
  400760:  48 0f af f7           imul   %rdi,%rsi
  400764:  48 89 f0              mov    %rsi,%rax
  400767:  48 c1 f8 1f           sar    $0x1f,%rax
  40076b:  48 8d 84 06 00 80 00  lea    0x8000(%rsi,%rax,1),%rax
  400772:  00
  400773:  48 c1 f8 10           sar    $0x10,%rax
  400777:  c3                    retq
So it would seem that, when compiling for any processor where FT_Long is
the same as int64_t and fits into a single register, that last bit of C
might be optimal, yes?
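One property of the r >> 31 step in another() may be worth checking before relying on it. The shift yields exactly -1 only while the product lies in [-2^31, 0); for larger negative products it yields a larger negative correction. The sketch below compares that form with a variant that takes the correction from the sign bit of the 64-bit product (r >> 63). Both function names are hypothetical, the arguments are assumed to be 32-bit 16.16 values, and an arithmetic right shift is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted form: correction derived from bit 31 upward. */
static int32_t mulfix_shift31(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a * b;
    int64_t s = r >> 31;  /* -1 only while -2^31 <= r < 0 */

    assert((-1 >> 1) == -1);  /* assumption: arithmetic right shift */
    return (int32_t)((r + s + 0x8000) >> 16);
}

/* Variant: correction derived from the sign bit of the product. */
static int32_t mulfix_shift63(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a * b;
    int64_t s = r >> 63;  /* 0 for r >= 0, -1 for r < 0 */

    assert((-1 >> 1) == -1);  /* assumption: arithmetic right shift */
    return (int32_t)((r + s + 0x8000) >> 16);
}
```

Both forms return 0xFFFF8505 for the test pair a = 0x7AFA8000, b = -1. They differ for a = 0x30000000 (12288.0) and b = -0x20000 (-2.0): the exact answer is -24576.0, which is -1610612736 in 16.16, and the r >> 63 variant returns that value while the r >> 31 form returns -1610612737. Whether products of that magnitude occur for FT_MulFix callers is not something this sketch settles.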
-JimC
--
James Cloos <address@hidden> OpenPGP: 1024D/ED7DAEA6