I have seen the problem with double-precision rands too, but it is much easier to reproduce with single-precision ones.
I think loss of precision could account for the problem. Is the division being done in single precision rather than double precision? That was my impression.
I would be more confident that the result is correct if all operands were doubles, the division were done in double precision, and only the final result were cast to a single.
That said, it would probably be much faster to use bit manipulation to place the random bits directly in the appropriate positions. However, I am not sure whether differences between x86 and other architectures would require specialized code. I also don't completely understand the comments on lines 104-108 about the performance impact on the i686 64-bit architecture.
- Mike