qemu-devel


From: Max Chou
Subject: Re: [PATCH v7 2/2] target/riscv: rvv: speed up small unit-stride loads and stores
Date: Wed, 11 Dec 2024 22:41:28 +0800
User-agent: Mozilla Thunderbird

On 2024/12/11 8:51 PM, Craig Blackmore wrote:
Calling `vext_continuous_ldst_tlb` for load/stores smaller than 12 bytes
significantly improves performance.

Co-authored-by: Helene CHELIN <helene.chelin@embecosm.com>
Co-authored-by: Paolo Savini <paolo.savini@embecosm.com>
Co-authored-by: Craig Blackmore <craig.blackmore@embecosm.com>

Signed-off-by: Helene CHELIN <helene.chelin@embecosm.com>
Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
Signed-off-by: Craig Blackmore <craig.blackmore@embecosm.com>
---
  target/riscv/vector_helper.c | 16 ++++++++++++++++
  1 file changed, 16 insertions(+)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 0f57e48cc5..529b4b261e 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -393,6 +393,22 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
          return;
      }
+#if defined(CONFIG_USER_ONLY)
+    /*
+     * For data sizes < 12 bits we get better performance by simply calling
I think `bits` should be replaced with `bytes` here, since the condition below compares `evl << log2_esz`, which is a size in bytes.

It seems that this patch aims to strike a balance between the overhead of the checks and page probing needed for the fast path and the benefit of executing small load/store operations directly through the slow path.
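
To make the size check concrete, here is a minimal standalone sketch of the condition in C. The helper names `ldst_total_bytes` and `takes_small_path` and the constant `SMALL_LDST_THRESHOLD_BYTES` are hypothetical and do not exist in QEMU; only the condition `nf == 1 && (evl << log2_esz) < 12` is taken from the patch. It assumes `log2_esz` is the log2 of the element size in bytes (0 to 3 for 8-bit to 64-bit elements).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the patch hard-codes the literal 12. */
#define SMALL_LDST_THRESHOLD_BYTES 12

/*
 * Total size of the access in bytes: element count times element size.
 * Assumes log2_esz is log2 of the element size in bytes (0..3).
 */
static uint64_t ldst_total_bytes(uint32_t evl, uint32_t log2_esz)
{
    assert(log2_esz <= 3);
    return (uint64_t)evl << log2_esz;
}

/*
 * Mirrors the condition added by the patch: only non-segmented
 * accesses (nf == 1) smaller than the threshold skip the page-range
 * checks and go straight to the element-by-element path.
 */
static bool takes_small_path(uint32_t nf, uint32_t evl, uint32_t log2_esz)
{
    return nf == 1 &&
           ldst_total_bytes(evl, log2_esz) < SMALL_LDST_THRESHOLD_BYTES;
}
```

Under these assumptions, two 32-bit elements (8 bytes) would take the small path, while three 32-bit elements (exactly 12 bytes) would not, because the comparison is strict.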

Could you please share any experiment results (and the steps/commands) that illustrate the performance impact when the load/store size exceeds 12 bytes?

(BTW, this version passes the vstart test case that I created for the unexpected vstart value issue in the previous version: https://github.com/rnax/rise-rvv-tcg-qemu-tooling/commit/bccc43a7be9f636cfdaea3c2bfb020558777c7dc)
+     * vext_continuous_ldst_tlb
+     */
+    if (nf == 1 && (evl << log2_esz) < 12) {
+        addr = base + (env->vstart << log2_esz);
+        vext_continuous_ldst_tlb(env, ldst_tlb, vd, evl, addr, env->vstart, ra,
+                                 esz, is_load);
+
+        env->vstart = 0;
+        vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
+        return;
+    }
+#endif
+
      /* Calculate the page range of first page */
      addr = base + ((env->vstart * nf) << log2_esz);
      page_split = -(addr | TARGET_PAGE_MASK);



