[PATCH v7 0/2] target/riscv: rvv: reduce the overhead for simple RISC-V
From: Craig Blackmore
Subject: [PATCH v7 0/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores
Date: Wed, 11 Dec 2024 12:51:11 +0000

Changes since v6:
- Limit access size to element size to address Max Chou's review.
- Fix a typo in the name of a function that this patch now calls.
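
One reading of "limit access size to element size" is that each element is moved with its own host access of exactly `esz` bytes, rather than as part of a single wide copy spanning several elements. A minimal sketch of that idea, assuming plain host buffers and element sizes of 1, 2, 4 or 8 bytes. `copy_elems_by_esz` is a hypothetical helper, not a function from `target/riscv/vector_helper.c`, and it ignores QEMU details such as TLB lookups, page crossings, masking and endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): copy `vl` contiguous elements of
 * `esz` bytes each, issuing one access per element so that no single
 * access is wider than one element.
 */
static void copy_elems_by_esz(void *dst, const void *src, size_t vl, size_t esz)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(esz == 1 || esz == 2 || esz == 4 || esz == 8);
    for (size_t i = 0; i < vl; i++, d += esz, s += esz) {
        /* Fixed-size memcpy calls typically compile to one load and one store. */
        switch (esz) {
        case 1: { uint8_t  v; memcpy(&v, s, 1); memcpy(d, &v, 1); break; }
        case 2: { uint16_t v; memcpy(&v, s, 2); memcpy(d, &v, 2); break; }
        case 4: { uint32_t v; memcpy(&v, s, 4); memcpy(d, &v, 4); break; }
        default: { uint64_t v; memcpy(&v, s, 8); memcpy(d, &v, 8); break; }
        }
    }
}
```

Switching on `esz` keeps every access a compile-time constant size, which is what lets the compiler avoid a generic variable-length copy.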

With the access size limited to the element size, this patch still provides a
significant speedup. The `memcpy` benchmark from:

https://github.com/embecosm/rise-rvv-tcg-qemu-tooling/tree/main/strmem-benchmarks

shows a speedup of up to 75% with this patch:

VLEN | Size | ns/inst ratio (before / after)
-----|------|-------------------------------
128 | 1 | 1.50
128 | 2 | 1.42
128 | 3 | 1.35
128 | 4 | 1.29
128 | 5 | 1.23
128 | 7 | 1.18
128 | 8 | 1.09
128 | 9 | 1.06
128 | 11 | 1.01

VLEN | Size | ns/inst ratio (before / after)
-----|------|-------------------------------
1024 | 1 | 1.75
1024 | 2 | 1.62
1024 | 3 | 1.52
1024 | 4 | 1.43
1024 | 5 | 1.35
1024 | 7 | 1.31
1024 | 8 | 1.12
1024 | 9 | 1.12
1024 | 11 | 1.01
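
To connect the tables to the "up to 75%" figure: if each entry is read as baseline ns/inst divided by patched ns/inst (an interpretation; the email does not define the ratio explicitly), then a ratio r corresponds to a speedup of (r - 1) x 100%. A small sketch of that arithmetic, using the VLEN=1024 column above; `max_ratio` and `speedup_percent` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumes each value is (ns/inst before) / (ns/inst after), so a value
 * above 1.0 means the patched build runs faster.
 */
static double max_ratio(const double *ratios, size_t n)
{
    assert(n > 0);
    double best = ratios[0];
    for (size_t i = 1; i < n; i++) {
        if (ratios[i] > best) {
            best = ratios[i];
        }
    }
    return best;
}

/* Convert a time ratio into a percentage speedup, e.g. 1.75 -> 75. */
static double speedup_percent(double ratio)
{
    return (ratio - 1.0) * 100.0;
}
```

For VLEN=1024 the peak ratio is 1.75 (Size 1), i.e. 75%; for VLEN=128 it is 1.50, i.e. 50%.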

It is not clear to me exactly why the patch now helps. At first I thought
it was because it avoids `vext_continuous_ldst_host` calling out to
`memcpy` for small sizes, but making that change directly in
`vext_continuous_ldst_host` was much less beneficial:

VLEN | Size | ns/inst ratio (before / after)
-----|------|-------------------------------
128 | 1 | 1.06
128 | 2 | 1.14
128 | 3 | 1.03
128 | 4 | 1.04
128 | 5 | 1.02
128 | 7 | 1.02
128 | 8 | 0.91
128 | 9 | 0.92
128 | 11 | 1.03

VLEN | Size | ns/inst ratio (before / after)
-----|------|-------------------------------
1024 | 1 | 1.10
1024 | 2 | 1.14
1024 | 3 | 1.04
1024 | 4 | 1.05
1024 | 5 | 0.96
1024 | 7 | 1.07
1024 | 8 | 0.94
1024 | 9 | 0.93
1024 | 11 | 0.90
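
The experiment described above (skipping the `memcpy` call for small sizes inside the host copy path) can be pictured roughly as a size-threshold branch. This is a hypothetical reconstruction for illustration, not the code that was actually measured: `host_copy` and `SMALL_COPY_MAX` are invented names, and the email does not state what cut-off was used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-off; the cover letter does not give a value. */
#define SMALL_COPY_MAX 16

/* Copy `len` bytes, avoiding the library memcpy call for small lengths. */
static void host_copy(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len <= SMALL_COPY_MAX) {
        /*
         * Open-coded byte loop. Note: optimizing compilers may still
         * recognise this pattern and emit a memcpy call.
         */
        uint8_t *d = dst;
        const uint8_t *s = src;
        for (size_t i = 0; i < len; i++) {
            d[i] = s[i];
        }
    } else {
        memcpy(dst, src, len);
    }
}
```

The caveat in the comment matters when interpreting results like those above: whether a library call is really avoided depends on the compiler and optimization flags.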
Previous versions:
- v1:
https://lore.kernel.org/all/20240717153040.11073-1-paolo.savini@embecosm.com/
- v2:
https://lore.kernel.org/all/20241002135708.99146-1-paolo.savini@embecosm.com/
- v3:
https://lore.kernel.org/all/20241014220153.196183-1-paolo.savini@embecosm.com/
- v4:
https://lore.kernel.org/all/20241029194348.59574-1-paolo.savini@embecosm.com/
- v5:
https://lore.kernel.org/all/20241111130324.32487-1-paolo.savini@embecosm.com/
- v6:
https://lore.kernel.org/all/20241204122952.53375-1-craig.blackmore@embecosm.com/
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: Bin Meng <bmeng.cn@gmail.com>
Cc: Weiwei Li <liwei1518@gmail.com>
Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Cc: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Cc: Helene Chelin <helene.chelin@embecosm.com>
Cc: Nathan Egge <negge@google.com>
Cc: Max Chou <max.chou@sifive.com>
Cc: Paolo Savini <paolo.savini@embecosm.com>
Craig Blackmore (2):
target/riscv: rvv: fix typo in vext continuous ldst function names
target/riscv: rvv: speed up small unit-stride loads and stores
target/riscv/vector_helper.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
--
2.43.0