freetype-devel

Re: [ft-devel] FT_MulFix assembly


From: James Cloos
Subject: Re: [ft-devel] FT_MulFix assembly
Date: Mon, 06 Sep 2010 15:28:57 -0400
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

>>>>> "MB" == Miles Bader <address@hidden> writes:

MB> The compiler generates the following assembly:

MB>     mov     %esi, %eax
MB>     mov     %edi, %edi
MB>     imulq   %rdi, %rax
MB>     addq    $32768, %rax
MB>     shrq    $16, %rax

That does not match the C code, though: it rounds negative values
incorrectly.

The C version rounds to nearest, with ties going away from zero.
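
To make the difference concrete, here is a minimal C sketch.  Both
function names are made up for illustration, and it models only the
rounding step on a signed 64-bit product; it is not the FreeType source.
It also assumes that >> on a negative int64_t is an arithmetic shift,
which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical reference: round to nearest, ties away from zero,
 * done on the magnitude and then re-signed. */
static int64_t mulfix_away(int32_t a, int32_t b)
{
    int64_t p   = (int64_t)a * b;        /* exact product, |p| <= 2^62 */
    int64_t mag = p < 0 ? -p : p;        /* magnitude of the product   */
    int64_t r   = (mag + 0x8000) >> 16;  /* round the magnitude        */

    return p < 0 ? -r : r;               /* restore the sign           */
}

/* Add 0x8000 and shift: ties go toward +infinity instead.
 * Assumes an arithmetic right shift for negative values. */
static int64_t mulfix_half_up(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;

    return (p + 0x8000) >> 16;
}
```

With a = -1 and b = 0x8000 the product is exactly -0x8000, a tie: the
first function returns -1 and the second returns 0.  For positive
products the two agree.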

The single-arg version of imulq generates a 128-bit result.  Given
that the multiplicands were only 32 bits, the more significant half of
that result will be 0 if the product is >=0 and -1 if the product is
<0.  Adding that half to rax, in addition to the 32768, ensures that
the result of the >>=16 is rounded the way freetype wants.
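
The same idea can be sketched in C.  This is an illustration under
assumptions, not the actual patch: the function name is hypothetical,
the high half of the product is modelled as a plain 0 or -1 rather than
read from a register, and >> on a negative int64_t is assumed to be an
arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical model of the single-arg imulq approach: the high half
 * of the 128-bit product is only sign extension, so adding it to the
 * low half subtracts 1 exactly when the product is negative. */
static int64_t mulfix_hi_adjust(int32_t a, int32_t b)
{
    int64_t lo = (int64_t)a * b;      /* low 64 bits of the product    */
    int64_t hi = lo < 0 ? -1 : 0;     /* high 64 bits: all sign bits   */

    /* Effective bias is 0x8000 for lo >= 0 and 0x7FFF for lo < 0. */
    return (lo + hi + 0x8000) >> 16;  /* assumes an arithmetic shift   */
}
```

For example, -1 times 0x8000 gives lo = -0x8000, so the sum is -1 and
the shift yields -1, which is the away-from-zero result.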

If you use the two arg version of imul, you have to copy the msb of the
result (or do a compare and jump, like the C code) to determine whether
to add 0x8000 or 0x7FFF.
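
As a rough C sketch of those two options (function names are
hypothetical, and >> on a negative int64_t is again assumed to be an
arithmetic shift), the bias can be chosen either with a comparison or
by copying the sign bit of the 64-bit product:

```c
#include <stdint.h>

/* Option 1: compare, then pick the bias. */
static int64_t mulfix_compare(int32_t a, int32_t b)
{
    int64_t p    = (int64_t)a * b;
    int64_t bias = p < 0 ? 0x7FFF : 0x8000;

    return (p + bias) >> 16;          /* assumes an arithmetic shift */
}

/* Option 2: copy the msb of the product and subtract it from 0x8000. */
static int64_t mulfix_msb(int32_t a, int32_t b)
{
    int64_t p   = (int64_t)a * b;
    int64_t msb = (int64_t)((uint64_t)p >> 63);  /* 1 if p < 0, else 0 */

    return (p + 0x8000 - msb) >> 16;  /* assumes an arithmetic shift */
}
```

Both return the same value for every pair of 32-bit inputs; which one
is faster depends on the compiler and the CPU.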

Matching the rounding was the hardest part.  What provided the most
(in-order) speedups was noting that the upper 64 bits of the 128-bit
product would always be just sign-extension bits and that, because of
the prototype of FT_MulFix() itself, the values are already promoted
to 64 bits before they get to the assembly.

If it can be done better, though, I'd be happy to know!

Thanks for also looking at it.

-JimC
-- 
James Cloos <address@hidden>         OpenPGP: 1024D/ED7DAEA6


