From:	Tom Musta
Subject:	[Qemu-devel] [V2 PATCH 12/14] target-ppc: VSX Stage 4: Add Scalar SP Fused Multiply-Adds
Date:	Tue, 19 Nov 2013 07:40:30 -0600

This patch adds the Single Precision VSX Scalar Fused Multiply-Add
instructions: xsmaddasp, xsmaddmsp, xsmsubasp, xsmsubmsp, xsnmaddasp,
xsnmaddmsp, xsnmsubasp, xsnmsubmsp.

A new VSX_XSMADDSP() macro is added for these instructions; the
existing VSX_MADD() macro is left unchanged.

V2: Re-implemented per feedback from Richard Henderson.  To avoid
double rounding and the incorrect results it causes, the operands are
first converted to true single precision values and then passed to the
single precision fused multiply/add routine.

Signed-off-by: Tom Musta <address@hidden>
---
 target-ppc/fpu_helper.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/helper.h     |    8 +++++
 target-ppc/translate.c  |   16 ++++++++++
 3 files changed, 101 insertions(+), 0 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 8825db2..6428c54 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2250,6 +2250,74 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
     helper_float_check_status(env);                                           \
 }
 
+/* VSX_XSMADDSP - VSX scalar multiply/add single precision variations
+ *   op    - instruction mnemonic
+ *   maddflgs - flags for the float*muladd routine that control the
+ *           various forms (madd, msub, nmadd, nmsub)
+ *   afrm  - A form (1=A, 0=M)
+ */
+#define VSX_XSMADDSP(op, maddflgs, afrm)                                      \
+void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
+{                                                                             \
+    ppc_vsr_t xt_in, xa, xb, xt_out;                                          \
+    ppc_vsr_t *b, *c;                                                         \
+                                                                              \
+    if (afrm) { /* AxB + T */                                                 \
+        b = &xb;                                                              \
+        c = &xt_in;                                                           \
+    } else { /* AxT + B */                                                    \
+        b = &xt_in;                                                           \
+        c = &xb;                                                              \
+    }                                                                         \
+                                                                              \
+    getVSR(xA(opcode), &xa, env);                                             \
+    getVSR(xB(opcode), &xb, env);                                             \
+    getVSR(xT(opcode), &xt_in, env);                                          \
+                                                                              \
+    xt_out = xt_in;                                                           \
+                                                                              \
+    helper_reset_fpstatus(env);                                               \
+                                                                              \
+    /* NOTE: in order to get accurate results, we must first round back */    \
+    /*       to single precision and use the fused multiply add routine */    \
+    /*       for 32-bit floats.                                         */    \
+    float_status tstat = env->fp_status;                                      \
+    float32 a32 = float64_to_float32(xa.f64[0], &tstat);                      \
+    float32 b32 = float64_to_float32(b->f64[0], &tstat);                      \
+    float32 c32 = float64_to_float32(c->f64[0], &tstat);                      \
+                                                                              \
+    set_float_exception_flags(0, &tstat);                                     \
+    float32 t32 = float32_muladd(a32, b32, c32, maddflgs, &tstat);            \
+                                                                              \
+    env->fp_status.float_exception_flags |= tstat.float_exception_flags;      \
+                                                                              \
+    if (unlikely(tstat.float_exception_flags & float_flag_invalid)) {         \
+        if (float64_is_signaling_nan(xa.f64[0]) ||                            \
+            float64_is_signaling_nan(b->f64[0]) ||                            \
+            float64_is_signaling_nan(c->f64[0])) {                            \
+            fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 1);            \
+            tstat.float_exception_flags &= ~float_flag_invalid;               \
+        }                                                                     \
+        if ((float64_is_infinity(xa.f64[0]) && float64_is_zero(b->f64[0])) || \
+            (float64_is_zero(xa.f64[0]) && float64_is_infinity(b->f64[0]))) { \
+            xt_out.f64[0] = fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXIMZ, \
+                                                  1);                         \
+            tstat.float_exception_flags &= ~float_flag_invalid;               \
+        }                                                                     \
+        if ((tstat.float_exception_flags & float_flag_invalid) &&             \
+            ((float64_is_infinity(xa.f64[0]) ||                               \
+              float64_is_infinity(b->f64[0])) &&                              \
+              float64_is_infinity(c->f64[0]))) {                              \
+            fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, 1);             \
+        }                                                                     \
+    }                                                                         \
+                                                                              \
+    xt_out.f64[0] = float32_to_float64(t32, &tstat);                          \
+    helper_compute_fprf(env, xt_out.f64[0], 1);                               \
+    putVSR(xT(opcode), &xt_out, env);                                         \
+    helper_float_check_status(env);                                           \
+}
+
 #define MADD_FLGS 0
 #define MSUB_FLGS float_muladd_negate_c
 #define NMADD_FLGS float_muladd_negate_result
@@ -2264,6 +2332,15 @@ VSX_MADD(xsnmaddmdp, 1, float64, f64, NMADD_FLGS, 0, 1)
 VSX_MADD(xsnmsubadp, 1, float64, f64, NMSUB_FLGS, 1, 1)
 VSX_MADD(xsnmsubmdp, 1, float64, f64, NMSUB_FLGS, 0, 1)
 
+VSX_XSMADDSP(xsmaddasp, MADD_FLGS, 1)
+VSX_XSMADDSP(xsmaddmsp, MADD_FLGS, 0)
+VSX_XSMADDSP(xsmsubasp, MSUB_FLGS, 1)
+VSX_XSMADDSP(xsmsubmsp, MSUB_FLGS, 0)
+VSX_XSMADDSP(xsnmaddasp, NMADD_FLGS, 1)
+VSX_XSMADDSP(xsnmaddmsp, NMADD_FLGS, 0)
+VSX_XSMADDSP(xsnmsubasp, NMSUB_FLGS, 1)
+VSX_XSMADDSP(xsnmsubmsp, NMSUB_FLGS, 0)
+
 VSX_MADD(xvmaddadp, 2, float64, f64, MADD_FLGS, 1, 0)
 VSX_MADD(xvmaddmdp, 2, float64, f64, MADD_FLGS, 0, 0)
 VSX_MADD(xvmsubadp, 2, float64, f64, MSUB_FLGS, 1, 0)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 84c6ee1..655b670 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -293,6 +293,14 @@ DEF_HELPER_2(xsdivsp, void, env, i32)
 DEF_HELPER_2(xsresp, void, env, i32)
 DEF_HELPER_2(xssqrtsp, void, env, i32)
 DEF_HELPER_2(xsrsqrtesp, void, env, i32)
+DEF_HELPER_2(xsmaddasp, void, env, i32)
+DEF_HELPER_2(xsmaddmsp, void, env, i32)
+DEF_HELPER_2(xsmsubasp, void, env, i32)
+DEF_HELPER_2(xsmsubmsp, void, env, i32)
+DEF_HELPER_2(xsnmaddasp, void, env, i32)
+DEF_HELPER_2(xsnmaddmsp, void, env, i32)
+DEF_HELPER_2(xsnmsubasp, void, env, i32)
+DEF_HELPER_2(xsnmsubmsp, void, env, i32)
 
 DEF_HELPER_2(xvadddp, void, env, i32)
 DEF_HELPER_2(xvsubdp, void, env, i32)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index ae80289..672cf0a 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -7348,6 +7348,14 @@ GEN_VSX_HELPER_2(xsdivsp, 0x00, 0x03, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsresp, 0x14, 0x01, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xssqrtsp, 0x16, 0x00, 0, PPC2_VSX207)
 GEN_VSX_HELPER_2(xsrsqrtesp, 0x14, 0x00, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmaddasp, 0x04, 0x00, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmaddmsp, 0x04, 0x01, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmsubasp, 0x04, 0x02, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsmsubmsp, 0x04, 0x03, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmaddasp, 0x04, 0x10, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmaddmsp, 0x04, 0x11, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmsubasp, 0x04, 0x12, 0, PPC2_VSX207)
+GEN_VSX_HELPER_2(xsnmsubmsp, 0x04, 0x13, 0, PPC2_VSX207)
 
 GEN_VSX_HELPER_2(xvadddp, 0x00, 0x0C, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvsubdp, 0x00, 0x0D, 0, PPC2_VSX)
@@ -10163,6 +10171,14 @@ GEN_XX3FORM(xsdivsp, 0x00, 0x03, PPC2_VSX207),
 GEN_XX2FORM(xsresp,  0x14, 0x01, PPC2_VSX207),
 GEN_XX2FORM(xssqrtsp,  0x16, 0x00, PPC2_VSX207),
 GEN_XX2FORM(xsrsqrtesp,  0x14, 0x00, PPC2_VSX207),
+GEN_XX3FORM(xsmaddasp, 0x04, 0x00, PPC2_VSX207),
+GEN_XX3FORM(xsmaddmsp, 0x04, 0x01, PPC2_VSX207),
+GEN_XX3FORM(xsmsubasp, 0x04, 0x02, PPC2_VSX207),
+GEN_XX3FORM(xsmsubmsp, 0x04, 0x03, PPC2_VSX207),
+GEN_XX3FORM(xsnmaddasp, 0x04, 0x10, PPC2_VSX207),
+GEN_XX3FORM(xsnmaddmsp, 0x04, 0x11, PPC2_VSX207),
+GEN_XX3FORM(xsnmsubasp, 0x04, 0x12, PPC2_VSX207),
+GEN_XX3FORM(xsnmsubmsp, 0x04, 0x13, PPC2_VSX207),
 
 GEN_XX3FORM(xvadddp, 0x00, 0x0C, PPC2_VSX),
 GEN_XX3FORM(xvsubdp, 0x00, 0x0D, PPC2_VSX),
-- 
1.7.1
