qemu-riscv
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH v4 2/5] target/riscv: rvv: Provide a fast path using dire


From: Max Chou
Subject: Re: [RFC PATCH v4 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store
Date: Tue, 25 Jun 2024 23:14:34 +0800
User-agent: Mozilla Thunderbird

On 2024/6/20 12:29 PM, Richard Henderson wrote:

On 6/13/24 10:51, Max Chou wrote:
This commit references the sve_ldN_r/sve_stN_r helper functions in ARM
target to optimize the vector unmasked unit-stride load/store
instructions by following items:

* Get the loose bound of activate elements
* Probing pages/resolving host memory address/handling watchpoint at beginning
* Provide new interface to direct access host memory

The original element load/store interface is replaced by the new element
load/store functions with _tlb & _host postfix that means doing the
element load/store through the original softmmu flow and the direct
access host memory flow.

Signed-off-by: Max Chou <max.chou@sifive.com>
---
  target/riscv/insn_trans/trans_rvv.c.inc |   3 +
  target/riscv/vector_helper.c            | 637 +++++++++++++++++++-----
  target/riscv/vector_internals.h         |  48 ++
  3 files changed, 551 insertions(+), 137 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 3a3896ba06c..14e10568bd7 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -770,6 +770,7 @@ static bool ld_us_mask_op(DisasContext *s, arg_vlm_v *a, uint8_t eew)
      /* Mask destination register are always tail-agnostic */
      data = FIELD_DP32(data, VDATA, VTA, s->cfg_vta_all_1s);
      data = FIELD_DP32(data, VDATA, VMA, s->vma);
+    data = FIELD_DP32(data, VDATA, VM, 1);
      return ldst_us_trans(a->rd, a->rs1, data, fn, s, false);
  }
  @@ -787,6 +788,7 @@ static bool st_us_mask_op(DisasContext *s, arg_vsm_v *a, uint8_t eew)
      /* EMUL = 1, NFIELDS = 1 */
      data = FIELD_DP32(data, VDATA, LMUL, 0);
      data = FIELD_DP32(data, VDATA, NF, 1);
+    data = FIELD_DP32(data, VDATA, VM, 1);
      return ldst_us_trans(a->rd, a->rs1, data, fn, s, true);
  }
  @@ -1106,6 +1108,7 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
      TCGv_i32 desc;
        uint32_t data = FIELD_DP32(0, VDATA, NF, nf);
+    data = FIELD_DP32(data, VDATA, VM, 1);
      dest = tcg_temp_new_ptr();
      desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
                                        s->cfg_ptr->vlenb, data));

This is ok, and would warrant a separate patch.
Ok, I'll split this part to a separate patch at next version.
Thanks.



+    if (vm == 0) {
+        for (i = vstart; i < evl; ++i) {
+            if (vext_elem_mask(v0, i)) {
+                reg_idx_last = i;
+                if (reg_idx_first < 0) {
+                    reg_idx_first = i;
+                }
+            }
+        }

This isn't great, and isn't used for now, since only unmasked unit-stride is handled so far.  I think this first patch should be simpler and *assume* VM is set.
Indeed. I'll remove this part at next version.
I agree that this first patch should be simpler and assume VM is set.
Thank you for the suggestion.


+/*
+ * Resolve the guest virtual addresses to info->page[].
+ * Control the generation of page faults with @fault.  Return false if
+ * there is no work to do, which can only happen with @fault == FAULT_NO.
+ */
+static bool vext_cont_ldst_pages(CPURISCVState *env, RVVContLdSt *info,
+                                 target_ulong addr, bool is_load,
+                                 uint32_t desc, uint32_t esz, uintptr_t ra,
+                                 bool is_us_whole)
+{
+    uint32_t vm = vext_vm(desc);
+    uint32_t nf = vext_nf(desc);
+    bool nofault = (vm == 1 ? false : true);

Why is nofault == "!vm"?

Also, it's silly to use ?: with true/false -- use the proper boolean expression in the first place.

That said... faults with RVV must interact with vstart.

I'm not sure what the best code organization is.

Perhaps a subroutine, passed the first and last elements for a single page.

  Update vstart, resolve the page, allowing the exception.
  If watchpoints, one call to cpu_check_watchpoint for the entire memory range.   If ram, iterate through the rest of the page using host accesses; otherwise,   iterate through the rest of the page using tlb accesses, making sure vstart
  is always up-to-date.

The main routine looks for the page_split, invokes the subroutine for the first (and likely only) page.  Special case any split-page element.  Invoke the subroutine for the second page.


r~
According to the v spec, the vector unit-stride and constant-stride memory accesses do not guarantee ordering between individual element accesses.
So it may affects how we handle the precise vector traps here.
I'll replace the nofault part and try to update the subroutine flow with the suggestion and think about how to update vstart here.
Thanks you for the suggestions.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]