qemu-devel

Re: [PATCH v1 06/15] tcg/riscv: Implement vector load/store


From: LIU Zhiwei
Subject: Re: [PATCH v1 06/15] tcg/riscv: Implement vector load/store
Date: Mon, 19 Aug 2024 09:41:25 +0800
User-agent: Mozilla Thunderbird


On 2024/8/14 17:01, Richard Henderson wrote:
On 8/13/24 21:34, LIU Zhiwei wrote:
@@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                         TCGReg arg1, intptr_t arg2)
  {
-    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    RISCVInsn insn;
+
+    if (type < TCG_TYPE_V64) {
+        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    } else {
+        tcg_debug_assert(arg >= TCG_REG_V1);
+        switch (prev_vece) {
+        case MO_8:
+            insn = OPC_VLE8_V;
+            break;
+        case MO_16:
+            insn = OPC_VLE16_V;
+            break;
+        case MO_32:
+            insn = OPC_VLE32_V;
+            break;
+        case MO_64:
+            insn = OPC_VLE64_V;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
      tcg_out_ldst(s, insn, arg, arg1, arg2);

tcg_out_ld/st are called directly from register allocation spill/fill.
You'll need to set vtype here, and cannot rely on this having been done in tcg_out_vec_op.
OK.

That said, with a little-endian host, the selected element size doesn't matter *too* much.  A write of 8 uint16_t or a write of 2 uint64_t produces the same bits in memory.
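To make the little-endian point concrete, here is a small standalone model (not QEMU code). It treats a 128-bit register as a byte image with the least significant byte first, and the helper names `store_elements_le` and `element_size_is_irrelevant` are made up for this illustration. Each element is read out as an integer and written back least-significant byte first, which is what an element-wise store does on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REG_BYTES 16  /* illustrative 128-bit register image */

/*
 * Model a little-endian element-wise vector store: read each element of
 * `esz` bytes out of the register image as an integer, then write that
 * integer to memory least-significant byte first.
 */
static void store_elements_le(uint8_t *mem, const uint8_t *reg, unsigned esz)
{
    assert(esz >= 1 && esz <= 8 && REG_BYTES % esz == 0);
    for (unsigned i = 0; i < REG_BYTES / esz; i++) {
        uint64_t val = 0;
        for (unsigned b = 0; b < esz; b++) {
            val |= (uint64_t)reg[i * esz + b] << (8 * b);
        }
        for (unsigned b = 0; b < esz; b++) {
            mem[i * esz + b] = (uint8_t)(val >> (8 * b));
        }
    }
}

/* Returns 1 if stores with element sizes 1, 2, 4 and 8 bytes all agree. */
static int element_size_is_irrelevant(void)
{
    uint8_t reg[REG_BYTES], ref[REG_BYTES], out[REG_BYTES];

    for (unsigned i = 0; i < REG_BYTES; i++) {
        reg[i] = (uint8_t)(0xA0 + i);
    }
    store_elements_le(ref, reg, 1);
    for (unsigned esz = 2; esz <= 8; esz *= 2) {
        store_elements_le(out, reg, esz);
        if (memcmp(ref, out, REG_BYTES) != 0) {
            return 0;
        }
    }
    return 1;
}
```

In this model the memory image is the same for every element size, so only the total number of bytes stored matters.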

Therefore you can examine prev_vtype and adjust only if the vector length changes.
OK.
  But we do that -- e.g. load V256, store V256, store V128 to perform a 384-bit store for AArch64 SVE when VQ=3.
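The idea of examining the previous vtype and adjusting only when the vector length changes could be modeled roughly as below. This is a sketch only: `VecCfg` and `vec_cfg_for_ldst` are hypothetical names, not part of the patch or of TCG, and the real backend would emit a configuration instruction where this model merely returns true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache of the last vector configuration emitted. */
typedef struct {
    bool valid;          /* false until the first configuration is emitted */
    unsigned len_bytes;  /* total bytes covered: element count * element size */
    unsigned esz_bytes;  /* element size used for that configuration */
} VecCfg;

/*
 * Decide whether a load/store of `len_bytes` needs a new configuration.
 * On a little-endian host only the total length matters, so a cached
 * configuration with a different element size is reused as long as it
 * covers the same number of bytes.  Returns true if the caller would have
 * to emit a new configuration; the cache is updated in that case.
 */
static bool vec_cfg_for_ldst(VecCfg *prev, unsigned len_bytes,
                             unsigned default_esz_bytes)
{
    assert(default_esz_bytes != 0 && len_bytes % default_esz_bytes == 0);
    if (prev->valid && prev->len_bytes == len_bytes) {
        return false;    /* same length: keep whatever element size is set */
    }
    prev->valid = true;
    prev->len_bytes = len_bytes;
    prev->esz_bytes = default_esz_bytes;
    return true;
}
```

A spill/fill path would call this before emitting the load or store, and a cached configuration left behind by an earlier vector op with a different element size but the same total length would be left alone.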

Is there an advantage to using the vector load/store whole register insns, if the requested length is not too small?
For vector types equal to or larger than the host vlen, we will use the whole-register instructions.
  IIRC the NF field can be used to store multiples, but we can't store half of a register with these.

I think we can still use the unit-stride instructions for them.
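A rough sketch of that selection, assuming the whole-register forms accept only 1, 2, 4 or 8 registers (the `vl<nf>r.v` / `vs<nf>r.v` encodings) and that anything else falls back to unit-stride. `LdstKind` and `choose_ldst` are illustrative names only and do not come from the patch.

```c
#include <assert.h>

typedef enum {
    LDST_WHOLE_REG,    /* vl<nf>r.v / vs<nf>r.v, independent of vtype */
    LDST_UNIT_STRIDE   /* vle<eew>.v / vse<eew>.v, needs vtype and vl set */
} LdstKind;

/*
 * Pick a load/store form for a vector type of `type_bits` on a host whose
 * vector registers are `vlen_bits` wide.  For the whole-register form,
 * *nf receives the register count (1, 2, 4 or 8); otherwise it is 0.
 */
static LdstKind choose_ldst(unsigned type_bits, unsigned vlen_bits,
                            unsigned *nf)
{
    assert(type_bits != 0 && vlen_bits != 0);
    *nf = 0;
    if (type_bits >= vlen_bits && type_bits % vlen_bits == 0) {
        unsigned n = type_bits / vlen_bits;
        if (n == 1 || n == 2 || n == 4 || n == 8) {
            *nf = n;
            return LDST_WHOLE_REG;
        }
    }
    /* Smaller than one register (or an unsupported multiple). */
    return LDST_UNIT_STRIDE;
}
```

For example, a 256-bit type on a 128-bit host would map to the whole-register form with two registers, while a 64-bit type on the same host would take the unit-stride path.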

Thanks,
Zhiwei



r~


