From: Lucas Mateus Martins Araujo e Castro
Subject: Re: [RFC PATCH v2 7/7] target/ppc: Implemented [pm]xvbf16ger2*
Date: Tue, 10 May 2022 14:25:04 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1
On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
There's a discrepancy between this implementation and mambo/the
hardware: implementing it with float64_mul then float64r32_muladd
sometimes gives an incorrect result after an underflow, while
implementing it with float32_mul then float32_muladd gives an incorrect
sign in some zero or infinite results. I've not been able to solve this.
I did suggest that the float64_mul needs to be done in round-to-odd.
From what I understood, you meant:
That doesn't solve the problem. I tried other solutions, but overall I found 3 test cases that no single solution could pass all of, those being:
xa = 0x 000923da 28c31f00 00018540 XXXXXXXX
xb = 0x 9d080000 000f97ac b7092f00 XXXXXXXX
xvbf16ger2 at, xa, xb
at = 0x 80000000 XXXXXXXX XXXXXXXX XXXXXXXX
     0x XXXXXXXX 80000016 XXXXXXXX XXXXXXXX
     0x XXXXXXXX XXXXXXXX 80000001 XXXXXXXX
     0x XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Doing the operation with float64 (with or without round_to_odd), or with a new softfloat operation that uses FloatParts64, gives 0x80000015 instead of 0x80000016. Doing it with float32 gives 0x00000000 instead of 0x80000000 and 0x80000002 instead of 0x80000001.
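For reference, the three elements above can be reproduced with plain host arithmetic. This is only a sketch, not the QEMU softfloat code. The helper names are made up for illustration. It assumes an IEEE 754 host with round-to-nearest-even and subnormals enabled, and it assumes each result element is the product of the two high bf16 halves plus the product of the two low bf16 halves. The fused float32 accumulate is stood in for by a double-precision sum, which happens to be exact for these three inputs but is not a general replacement for float32_muladd.

```c
#include <stdint.h>
#include <string.h>

/* A bf16 value is the upper 16 bits of a binary32, so widening is a shift. */
static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/*
 * Hypothetical model of the float32 variant: the first product is rounded
 * to single precision before the second product is accumulated onto it.
 */
static uint32_t dot2_product_rounded_to_f32(uint32_t a, uint32_t b)
{
    double ah = bf16_to_float((uint16_t)(a >> 16));
    double al = bf16_to_float((uint16_t)(a & 0xffff));
    double bh = bf16_to_float((uint16_t)(b >> 16));
    double bl = bf16_to_float((uint16_t)(b & 0xffff));
    float p = (float)(ah * bh);          /* tiny products underflow here */

    /* Sum is exact in double for the inputs discussed, so one rounding. */
    return float_bits((float)(al * bl + (double)p));
}

/*
 * Hypothetical model of the float64 variant: the first product is kept
 * exactly (bf16 x bf16 always fits in a double) and only the final sum
 * is rounded to single precision.
 */
static uint32_t dot2_product_kept_in_f64(uint32_t a, uint32_t b)
{
    double ah = bf16_to_float((uint16_t)(a >> 16));
    double al = bf16_to_float((uint16_t)(a & 0xffff));
    double bh = bf16_to_float((uint16_t)(b >> 16));
    double bl = bf16_to_float((uint16_t)(b & 0xffff));
    double p = ah * bh;

    return float_bits((float)(al * bl + p));
}
```

Under those assumptions, the word pairs (0x000923da, 0x9d080000), (0x28c31f00, 0x000f97ac) and (0x00018540, 0xb7092f00) give 0x00000000, 0x80000016 and 0x80000002 with the first helper, and 0x80000000, 0x80000015 and 0x80000001 with the second, matching the mismatches described above.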
Between those choices I'd go with float64, so as to keep the result
numerically close to the actual value if the next operation treats
those as an integer (with float32 you can end up having 0 instead
of INT32_MIN); the results are also close if they're treated as
floating-point.
Anyway, for this patch,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~