qemu-devel

From: Max Chou
Subject: Re: [PATCH v6 1/1] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores
Date: Wed, 4 Dec 2024 21:44:28 +0800
User-agent: Mozilla Thunderbird

Hi Craig,

I think the unexpected vstart issue persists in this patchset.
This version does not update the vstart CSR to the correct index when it
groups load/store elements.

For instance, if the optimization groups multiple elements into one access
and an element after the first one raises an exception, vstart still holds
the index of the first element rather than the index of the element that
actually raised the exception.
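
To make the concern concrete, here is a minimal standalone model, not QEMU
code. It assumes vstart is advanced only after a whole chunk completes, that
vstart starts at 0, and that a single byte at offset `fault_off` is
inaccessible. The function names and the fault model are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not QEMU code): `evl_b` bytes are accessed in chunks
 * of 8/4/2/1 bytes and vstart is advanced only after a chunk completes.
 * The byte at offset `fault_off` is assumed to be inaccessible.
 * Returns the vstart value that would be visible when the fault is raised.
 */
static uint32_t vstart_seen_at_fault(uint32_t evl_b, uint32_t log2_esz,
                                     uint32_t fault_off)
{
    uint32_t vstart = 0;

    for (uint32_t j = 0; j < evl_b;) {
        uint32_t rem = evl_b - j;
        uint32_t chunk = rem >= 8 ? 8 : rem >= 4 ? 4 : rem >= 2 ? 2 : 1;

        if (fault_off >= j && fault_off < j + chunk) {
            return vstart;      /* fault raised before vstart advances */
        }
        vstart += chunk >> log2_esz;
        j += chunk;
    }
    return 0;                   /* no fault: vstart is reset to 0 */
}

/* Index of the element that actually contains the faulting byte. */
static uint32_t faulting_element(uint32_t log2_esz, uint32_t fault_off)
{
    return fault_off >> log2_esz;
}
```

Under these assumptions, with 16 one-byte elements and a fault at byte 5,
the model reports vstart = 0 while the faulting element is index 5.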

Max

On 2024/12/4 8:29 PM, Craig Blackmore wrote:
This patch improves the performance of the emulation of the RVV unit-stride
loads and stores in the following cases:

- when the data being loaded/stored per iteration amounts to 8 bytes or less.
- when the vector length is 16 bytes (VLEN=128) and there's no grouping of the
   vector registers (LMUL=1).

The optimization avoids the overhead of probing the host machine's RAM and
instead loads/stores the input data in a loop, in chunks of as many bytes as
possible (8, 4, 2 or 1 bytes).
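
The chunking described above can be sketched as a standalone routine. This is
a simplified illustration, not the patch itself: `copy_in_chunks` is a
hypothetical helper, and `memcpy` stands in for the per-size load/store
helpers used in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): copy bytes [start, len) from
 * `src` to `dst`, always using the largest chunk (8, 4, 2 or 1 bytes)
 * that still fits in the remaining length.
 * Returns the number of individual accesses performed.
 */
static unsigned copy_in_chunks(uint8_t *dst, const uint8_t *src,
                               size_t start, size_t len)
{
    unsigned accesses = 0;
    size_t j = start;

    assert(start <= len);
    while (j < len) {
        size_t remaining = len - j;
        size_t chunk = remaining >= 8 ? 8 :
                       remaining >= 4 ? 4 :
                       remaining >= 2 ? 2 : 1;

        /* memcpy stands in for the sized load/store helpers */
        memcpy(dst + j, src + j, chunk);
        j += chunk;
        accesses++;
    }
    return accesses;
}
```

For a 16-byte vector this performs two 8-byte accesses; a 7-byte tail would be
split into 4 + 2 + 1 bytes.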

Co-authored-by: Helene CHELIN <helene.chelin@embecosm.com>
Co-authored-by: Paolo Savini <paolo.savini@embecosm.com>
Co-authored-by: Craig Blackmore <craig.blackmore@embecosm.com>

Signed-off-by: Helene CHELIN <helene.chelin@embecosm.com>
Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
Signed-off-by: Craig Blackmore <craig.blackmore@embecosm.com>
---
  target/riscv/vector_helper.c | 54 ++++++++++++++++++++++++++++++++++++
  1 file changed, 54 insertions(+)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index a85dd1d200..068010ccd2 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -393,6 +393,60 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
          return;
      }
+#if defined(CONFIG_USER_ONLY) && !HOST_BIG_ENDIAN
+    /*
+     * For data sizes <= 64 bits and for LMUL=1 with VLEN=128 bits we get a
+     * better performance by doing a simple simulation of the load/store
+     * without the overhead of probing the host RAM
+     */
+    if ((nf == 1) && ((evl << log2_esz) <= 8 ||
+        ((vext_lmul(desc) == 0) && (simd_maxsz(desc) == 16)))) {
+
+        uint32_t evl_b = evl << log2_esz;
+
+        for (uint32_t j = env->vstart; j < evl_b;) {
+            addr = base + j;
+            if ((evl_b - j) >= 8) {
+                if (is_load) {
+                    lde_d_tlb(env, adjust_addr(env, addr), j, vd, ra);
+                } else {
+                    ste_d_tlb(env, adjust_addr(env, addr), j, vd, ra);
+                }
+                env->vstart += (8 >> log2_esz);
+                j += 8;
+            } else if ((evl_b - j) >= 4) {
+                if (is_load) {
+                    lde_w_tlb(env, adjust_addr(env, addr), j, vd, ra);
+                } else {
+                    ste_w_tlb(env, adjust_addr(env, addr), j, vd, ra);
+                }
+                env->vstart += (4 >> log2_esz);
+                j += 4;
+            } else if ((evl_b - j) >= 2) {
+                if (is_load) {
+                    lde_h_tlb(env, adjust_addr(env, addr), j, vd, ra);
+                } else {
+                    ste_h_tlb(env, adjust_addr(env, addr), j, vd, ra);
+                }
+                env->vstart += (2 >> log2_esz);
+                j += 2;
+            } else {
+                if (is_load) {
+                    lde_b_tlb(env, adjust_addr(env, addr), j, vd, ra);
+                } else {
+                    ste_b_tlb(env, adjust_addr(env, addr), j, vd, ra);
+                }
+                env->vstart++;
+                j += 1;
+            }
+        }
+
+        env->vstart = 0;
+        vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
+        return;
+    }
+#endif
+
      /* Calculate the page range of first page */
      addr = base + ((env->vstart * nf) << log2_esz);
      page_split = -(addr | TARGET_PAGE_MASK);



