qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC 1/1] target/riscv: use tcg ops generation to emulate whole reg


From: Richard Henderson
Subject: Re: [RFC 1/1] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.
Date: Wed, 18 Dec 2024 13:08:46 -0600
User-agent: Mozilla Thunderbird

On 12/18/24 11:08, Paolo Savini wrote:
This patch aims at emulating the whole register loads and stores through
direct generation of tcg operations rather than through the aid of a helper
function.

Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
---
  target/riscv/insn_trans/trans_rvv.c.inc | 104 +++++++++++++-----------
  1 file changed, 56 insertions(+), 48 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index b9883a5d32..63d8b05bf1 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -1100,26 +1100,34 @@ GEN_VEXT_TRANS(vle64ff_v, MO_64, r2nfvm, ldff_op, 
ld_us_check)
  typedef void gen_helper_ldst_whole(TCGv_ptr, TCGv, TCGv_env, TCGv_i32);
static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
-                             gen_helper_ldst_whole *fn,
-                             DisasContext *s)
+                             uint32_t log2_esz, gen_helper_ldst_whole *fn,
+                             DisasContext *s, bool is_load)
  {
-    TCGv_ptr dest;
-    TCGv base;
-    TCGv_i32 desc;
-
-    uint32_t data = FIELD_DP32(0, VDATA, NF, nf);
-    data = FIELD_DP32(data, VDATA, VM, 1);
-    dest = tcg_temp_new_ptr();
-    desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
-                                      s->cfg_ptr->vlenb, data));
-
-    base = get_gpr(s, rs1, EXT_NONE);
-    tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, vd));
+    TCGv addr;
+    addr = get_address(s, rs1, 0);
+    uint32_t size = s->cfg_ptr->vlenb * nf;
- mark_vs_dirty(s);

Did you lose this? You can't rely on the call from finalize_rvv_inst at the end. Dirty must be already be set for the exception path if any of these memory operations fault.

-
-    fn(dest, base, tcg_env, desc);
+    TCGv_i128 t16 = tcg_temp_new_i128();
+ /*
+     * Load/store minimum vlenb bytes per iteration.
+     * When possible do this atomically.
+     * Update vstart with the number of processed elements.
+     */
+    for (int i=0; i < size; i+=16) {

Spaces.  scripts/checkpatch.pl should have complained.

+        addr = get_address(s, rs1, i);
+        if (is_load) {
+           tcg_gen_qemu_ld_i128(t16, addr, s->mem_idx,
+                   MO_LE | MO_128 | MO_ATOM_WITHIN16);

FWIW, for MO_128, MO_ATOM_WITHIN16 is equivalent to MO_ATOM_IFALIGN, which is 
the default.

Where you can improve things if by specifying *less* atomicity.  For instance:

  log2_esz == MO_8:  MO_ATOM_NONE
  When operating on bytes, no larger atomicity is required.

  log2_esz <= MO_64: MO_ATOM_IFALIGN_PAIR
  Require atomicity on the two MO_64 halves, but no more than that.

I don't currently support further granularity, because there doesn't seem to be a need, and it's already somewhat confusing.

There's no handling for vstart != 0.
You could easily fall back to the helper for !s->vstart_eq_zero.

+           tcg_gen_st_i128(t16, tcg_env, vreg_ofs(s, vd) + i);
+           tcg_gen_addi_tl(cpu_vstart, cpu_vstart, 16 >> log2_esz);
+        } else {
+            tcg_gen_ld_i128(t16, tcg_env, vreg_ofs(s, vd) + i);
+            tcg_gen_qemu_st_i128(t16, addr, s->mem_idx,
+                   MO_LE | MO_128 | MO_ATOM_WITHIN16);
+           tcg_gen_addi_tl(cpu_vstart, cpu_vstart, 16 >> log2_esz);

The final iteration must set cpu_vstart to 0.

Otherwise this looks quite plausible.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]