Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG

From:	Shin-ichiro KAWASAKI
Subject:	Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG
Date:	Mon, 01 Sep 2008 02:28:59 +0900
User-agent:	Thunderbird 2.0.0.16 (Windows/20080708)

Thank your for comments, Blue Swirl!
The new patch is shown at the end of this mail.
Reviews are welcome, again.

Blue Swirl wrote:

> On 8/30/08, Shin-ichiro KAWASAKI <address@hidden> wrote:

>> This patch converts two SH4 float instructions, 'fmov Rm,Rn' and
>>  'fadd' into TCG.  Before converting other float instructions
>>  into TCG, comments on it will be appreciated.
>>
>>  - TCG variables intorudced for float operation : FT[01], and DT[01].
>>  - I think float registers 'fregs' are not to be mapped for
>>   TCG variables, because TCG does not support float operations, now.

>
> In TCG README Fabrice mentioned that he's working on a floating point
> version. It would be nice to get that committed, already m68k has a
> fake version. I did not use TCG variables for float just because I've
> been waiting for that. It's not possible to pass env->fp_status
> cleanly now (for example to call softfloat functions directly). And
> for example on Sparc host, moving data from integer registers to
> floating point registers must use memory in between them so there the
> integer TCG registers should be separate from float TCG registers.


I agree that it is important to conisder mapping fregs ot TCG feature
like m68k implementation does. But I think this work will take too long
time, if I do it, and it will be some break for converting TCG work.
I think it is not too late to do this work after eliminating 'op.c'.

And I have a question on it.
As you pointed out below, two 32 bit registers are concatanated as one
64 bit register.  When floating point TCG variables introduced in the future,
it is straight to map one memory region for 32bit TCG variable and 64bit
TCG variable.  I wonder such two way mapping suit for TCG, or not.

> There is no need to pass env as a parameter, it's available to
> op_helper.c code anyway.


Indeed. Thanks!

> I guess that like in Sparc, two 32 bit fregs can be combined to a 64
> bit double. Please have a look at target-sparc/translate.c on the use
> of CPU_DoubleU, there should be a safer way to access the registers.


Thanks. I wrote new codes for load/store function, following sparc.

Regards,
Shin-ichiro KAWASAKI


Index: trunk/target-sh4/op.c
===================================================================
--- trunk/target-sh4/op.c       (revision 5119)
+++ trunk/target-sh4/op.c       (working copy)
@@ -163,18 +163,6 @@
    RETURN();
}

-void OPPROTO op_fadd_FT(void)
-{
-    FT0 = float32_add(FT0, FT1, &env->fp_status);
-    RETURN();
-}
-
-void OPPROTO op_fadd_DT(void)
-{
-    DT0 = float64_add(DT0, DT1, &env->fp_status);
-    RETURN();
-}
-
void OPPROTO op_fsub_FT(void)
{
    FT0 = float32_sub(FT0, FT1, &env->fp_status);
Index: trunk/target-sh4/helper.h
===================================================================
--- trunk/target-sh4/helper.h   (revision 5119)
+++ trunk/target-sh4/helper.h   (working copy)
@@ -18,3 +18,6 @@
DEF_HELPER(void, helper_macw, (uint32_t, uint32_t))

DEF_HELPER(void, helper_ld_fpscr, (uint32_t))
+DEF_HELPER(uint32_t, helper_fadd_FT, (uint32_t, uint32_t))
+DEF_HELPER(uint64_t, helper_fadd_DT, (uint64_t, uint64_t))
+
Index: trunk/target-sh4/op_helper.c
===================================================================
--- trunk/target-sh4/op_helper.c        (revision 5119)
+++ trunk/target-sh4/op_helper.c        (working copy)
@@ -397,3 +397,16 @@
    else
        set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
}
+
+uint32_t helper_fadd_FT(uint32_t t0, uint32_t t1)
+{
+    float32 ret = float32_add(*(float32*)&t0, *(float32*)&t1, &env->fp_status);
+    return *(uint32_t*)(&ret);
+}
+
+uint64_t helper_fadd_DT(uint64_t t0, uint64_t t1)
+{
+    float64 ret = float64_add(*(float64*)&t0, *(float64*)&t1, &env->fp_status);
+    return *(uint64_t*)(&ret);
+}
+
Index: trunk/target-sh4/translate.c
===================================================================
--- trunk/target-sh4/translate.c        (revision 5119)
+++ trunk/target-sh4/translate.c        (working copy)
@@ -69,6 +69,8 @@

/* dyngen register indexes */
static TCGv cpu_T[2];
+static TCGv cpu_FT[2];
+static TCGv cpu_DT[2];

#include "gen-icount.h"

@@ -90,6 +92,15 @@
    cpu_env = tcg_global_reg_new(TCG_TYPE_PTR, TCG_AREG0, "env");
    cpu_T[0] = tcg_global_reg_new(TCG_TYPE_I32, TCG_AREG1, "T0");
    cpu_T[1] = tcg_global_reg_new(TCG_TYPE_I32, TCG_AREG2, "T1");
+    cpu_FT[0] = tcg_global_mem_new(TCG_TYPE_I32, TCG_AREG0,
+                                  offsetof(CPUState, ft0), "FT0");
+    cpu_FT[1] = tcg_global_mem_new(TCG_TYPE_I32, TCG_AREG0,
+                                  offsetof(CPUState, ft1), "FT1");
+    cpu_DT[0] = tcg_global_mem_new(TCG_TYPE_I64, TCG_AREG0,
+                                  offsetof(CPUState, dt0), "DT0");
+    printf("initial offset : %d\n", offsetof(CPUState, dt0));
+    cpu_DT[1] = tcg_global_mem_new(TCG_TYPE_I64, TCG_AREG0,
+                                  offsetof(CPUState, dt1), "DT1");

    for (i = 0; i < 24; i++)
        cpu_gregs[i] = tcg_global_mem_new(TCG_TYPE_I32, TCG_AREG0,
@@ -337,6 +348,62 @@
    tcg_gen_ori_i32(cpu_flags, cpu_flags, flags);
}

+
+#define DEF_gen_ld_frN_FTx(FTindex)                                           \
+static inline void gen_ld_frN_FT##FTindex(TCGv cpu_env, uint32_t reg)         \
+{                                                                             \
+    tcg_gen_ld_i32(cpu_FT[FTindex], cpu_env, offsetof(CPUState, fregs[reg])); \
+}
+
+DEF_gen_ld_frN_FTx(0)
+DEF_gen_ld_frN_FTx(1)
+
+#define DEF_gen_ld_drN_DTx(DTindex)                                           \
+static inline void gen_ld_drN_DT##DTindex(TCGv cpu_env, uint32_t reg)         \
+{                                                                             \
+    TCGv tmp = tcg_temp_new(TCG_TYPE_I32);                                    \
+    uint32_t offset = offsetof(CPUState, dt##DTindex);                        \
+                                                                              \
+    tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUState, fregs[reg]));             \
+    tcg_gen_st_i32(tmp, cpu_env, offset + offsetof(CPU_DoubleU, l.upper));    \
+    tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUState, fregs[reg + 1]));         \
+    tcg_gen_st_i32(tmp, cpu_env, offset + offsetof(CPU_DoubleU, l.lower));    \
+                                                                              \
+    tcg_temp_free(tmp);                                                       \
+}
+
+DEF_gen_ld_drN_DTx(0)
+DEF_gen_ld_drN_DTx(1)
+
+#define DEF_gen_st_FTx_frN(FTindex)                                           \
+static inline void gen_st_FT##FTindex##_frN(TCGv cpu_env, uint32_t reg)       \
+{                                                                             \
+    tcg_gen_st_i32(cpu_FT[FTindex], cpu_env, offsetof(CPUState, fregs[reg])); \
+}
+
+DEF_gen_st_FTx_frN(0)
+DEF_gen_st_FTx_frN(1)
+
+#define DEF_gen_st_DTx_drN(DTindex)                                           \
+static inline void gen_st_DT##DTindex##_drN(TCGv cpu_env, uint32_t reg)       \
+{                                                                             \
+    TCGv tmp = tcg_temp_new(TCG_TYPE_I32);                                    \
+    uint32_t offset = offsetof(CPUState, dt##DTindex);                        \
+                                                                              \
+    tcg_gen_st_i64(cpu_DT[DTindex], cpu_env, offset); /* keep DTx alive. */   \
+                                                                              \
+    tcg_gen_ld_i32(tmp, cpu_env, offset + offsetof(CPU_DoubleU, l.upper));    \
+    tcg_gen_st_i32(tmp, cpu_env, offsetof(CPUState, fregs[reg]));             \
+    tcg_gen_ld_i32(tmp, cpu_env, offset + offsetof(CPU_DoubleU, l.lower));    \
+    tcg_gen_st_i32(tmp, cpu_env, offsetof(CPUState, fregs[reg + 1]));         \
+                                                                              \
+    tcg_temp_free(tmp);                                                       \
+}
+
+DEF_gen_st_DTx_drN(0)
+DEF_gen_st_DTx_drN(1)
+
+
#define B3_0 (ctx->opcode & 0xf)
#define B6_4 ((ctx->opcode >> 4) & 0x7)
#define B7_4 ((ctx->opcode >> 4) & 0xf)
@@ -788,12 +855,14 @@
        tcg_gen_xor_i32(cpu_gregs[REG(B11_8)], cpu_gregs[REG(B11_8)], 
cpu_gregs[REG(B7_4)]);
        return;
    case 0xf00c: /* fmov {F,D,X}Rm,{F,D,X}Rn - FPSCR: Nothing */
+        /* 64 bit fmov code is not tested, because SH-Linux seems
+          not to set FPSCR_SZ flag true. */
        if (ctx->fpscr & FPSCR_SZ) {
-           gen_op_fmov_drN_DT0(XREG(B7_4));
-           gen_op_fmov_DT0_drN(XREG(B11_8));
+           gen_ld_drN_DT0(cpu_env, XREG(B7_4));
+           gen_st_DT0_drN(cpu_env, XREG(B11_8));
        } else {
-           gen_op_fmov_frN_FT0(FREG(B7_4));
-           gen_op_fmov_FT0_frN(FREG(B11_8));
+           gen_ld_frN_FT0(cpu_env, FREG(B7_4));
+           gen_st_FT0_frN(cpu_env, FREG(B11_8));
        }
        return;
    case 0xf00a: /* fmov {F,D,X}Rm,@Rn - FPSCR: Nothing */
@@ -882,17 +951,22 @@
        if (ctx->fpscr & FPSCR_PR) {
            if (ctx->opcode & 0x0110)
                break; /* illegal instruction */
-           gen_op_fmov_drN_DT1(DREG(B7_4));
-           gen_op_fmov_drN_DT0(DREG(B11_8));
+           gen_ld_drN_DT1(cpu_env, DREG(B7_4));
+           gen_ld_drN_DT0(cpu_env, DREG(B11_8));
        }
        else {
-           gen_op_fmov_frN_FT1(FREG(B7_4));
-           gen_op_fmov_frN_FT0(FREG(B11_8));
+           gen_ld_frN_FT1(cpu_env, FREG(B7_4));
+           gen_ld_frN_FT0(cpu_env, FREG(B11_8));
        }

        switch (ctx->opcode & 0xf00f) {
        case 0xf000:            /* fadd Rm,Rn */
-           ctx->fpscr & FPSCR_PR ? gen_op_fadd_DT() : gen_op_fadd_FT();
+           if (ctx->fpscr & FPSCR_PR)
+               tcg_gen_helper_1_2(helper_fadd_DT, cpu_DT[0],
+                                  cpu_DT[0], cpu_DT[1]);
+           else
+               tcg_gen_helper_1_2(helper_fadd_FT, cpu_FT[0],
+                                  cpu_FT[0], cpu_FT[1]);
            break;
        case 0xf001:            /* fsub Rm,Rn */
            ctx->fpscr & FPSCR_PR ? gen_op_fsub_DT() : gen_op_fsub_FT();
@@ -912,10 +986,10 @@
        }

        if (ctx->fpscr & FPSCR_PR) {
-           gen_op_fmov_DT0_drN(DREG(B11_8));
+           gen_st_DT0_drN(cpu_env, DREG(B11_8));
        }
        else {
-           gen_op_fmov_FT0_frN(FREG(B11_8));
+           gen_st_FT0_frN(cpu_env, FREG(B11_8));
        }
        return;
    }

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG, Shin-ichiro KAWASAKI, 2008/08/30
- Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG, Blue Swirl, 2008/08/30
  - Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG, Shin-ichiro KAWASAKI <=
    - Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG, Blue Swirl, 2008/08/31
    - Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG, Aurelien Jarno, 2008/08/31

Prev by Date: Re: [Qemu-devel] Looking for an SH4 QEMU-ready kernel (for TCG conversion)
Next by Date: Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG
Previous by thread: Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG
Next by thread: Re: [Qemu-devel] [PATCH] SH4: convert fmov/fadd to TCG
Index(es):
- Date
- Thread