From: LIU Zhiwei
Subject: Re: [PATCH v1 06/15] tcg/riscv: Implement vector load/store
Date: Mon, 19 Aug 2024 09:41:25 +0800
User-agent: Mozilla Thunderbird

On 2024/8/14 17:01, Richard Henderson wrote:
On 8/13/24 21:34, LIU Zhiwei wrote:

@@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                        TCGReg arg1, intptr_t arg2)
 {
-    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    RISCVInsn insn;
+
+    if (type < TCG_TYPE_V64) {
+        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    } else {
+        tcg_debug_assert(arg >= TCG_REG_V1);
+        switch (prev_vece) {
+        case MO_8:
+            insn = OPC_VLE8_V;
+            break;
+        case MO_16:
+            insn = OPC_VLE16_V;
+            break;
+        case MO_32:
+            insn = OPC_VLE32_V;
+            break;
+        case MO_64:
+            insn = OPC_VLE64_V;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
     tcg_out_ldst(s, insn, arg, arg1, arg2);

tcg_out_ld/st are called directly from register allocation spill/fill.
You'll need to set vtype here, and cannot rely on this having been done in tcg_out_vec_op.
OK.
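
The point about setting vtype in the spill/fill path can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `VecCfgCache` and `ensure_vec_cfg` are hypothetical names, not QEMU functions, and "emitting" a configuration instruction is modelled as incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the backend's cached vector configuration. */
typedef struct {
    bool valid;          /* false until a configuration has been emitted */
    unsigned sew_bits;   /* last configured element width: 8, 16, 32 or 64 */
    unsigned len_bytes;  /* last configured vector length in bytes */
    unsigned emitted;    /* number of configuration instructions "emitted" */
} VecCfgCache;

/*
 * Called at the top of every vector load/store, including the ones
 * issued for register spill/fill. Returns true when a new configuration
 * instruction had to be emitted, false when the cached state matched.
 */
bool ensure_vec_cfg(VecCfgCache *c, unsigned sew_bits, unsigned len_bytes)
{
    assert(c);
    if (c->valid && c->sew_bits == sew_bits && c->len_bytes == len_bytes) {
        return false;    /* state already matches, nothing to emit */
    }
    c->valid = true;
    c->sew_bits = sew_bits;
    c->len_bytes = len_bytes;
    c->emitted++;        /* stands in for emitting the real instruction */
    return true;
}
```

The load path then no longer depends on an earlier vector operation having configured the unit; it asks the cache itself.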
That said, with a little-endian host, the selected element size doesn't matter *too* much. A write of 8 uint16_t or a write of 2 uint64_t produces the same bits in memory.

Therefore you can examine prev_vtype and adjust only if the vector length changes.
OK.
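
The little-endian argument can be checked with a byte-level model. The sketch below is illustrative only: it treats a 128-bit register as 16 bytes, stores it element by element at a chosen element width, and compares the resulting memory images. `store_elems` and `same_image` are hypothetical helper names, and the big-endian mode exists only to show the contrast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { REG_BYTES = 16 };   /* model a 128-bit vector register */

/*
 * Store the register to memory one element at a time.
 * width is the element size in bytes (1, 2, 4 or 8).
 */
static void store_elems(const uint8_t *reg, uint8_t *mem,
                        int width, bool big_endian)
{
    assert(width > 0 && width <= 8 && REG_BYTES % width == 0);
    for (int base = 0; base < REG_BYTES; base += width) {
        /* Assemble the element value from the register bytes. */
        uint64_t val = 0;
        for (int k = 0; k < width; k++) {
            val |= (uint64_t)reg[base + k] << (8 * k);
        }
        /* Write the element out in the requested byte order. */
        for (int k = 0; k < width; k++) {
            int shift = big_endian ? 8 * (width - 1 - k) : 8 * k;
            mem[base + k] = (uint8_t)(val >> shift);
        }
    }
}

/* True when storing with either element width yields identical memory. */
bool same_image(int width_a, int width_b, bool big_endian)
{
    uint8_t reg[REG_BYTES], a[REG_BYTES], b[REG_BYTES];
    for (int i = 0; i < REG_BYTES; i++) {
        reg[i] = (uint8_t)(0xA0 + i);   /* distinct byte values */
    }
    store_elems(reg, a, width_a, big_endian);
    store_elems(reg, b, width_b, big_endian);
    return memcmp(a, b, REG_BYTES) == 0;
}
```

With little-endian element stores, 8 x uint16_t and 2 x uint64_t give the same image, so only the total length matters for a spill or fill. With big-endian element stores the images differ.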
But we do that -- e.g. load V256, store V256, store V128 to perform a 384-bit store for AArch64 SVE when VQ=3.

For vector types equal to or larger than the host vlen, we will use the whole-register instructions.

Is there an advantage to using the vector load/store whole register insns, if the requested length is not too small?
IIRC the NF field can be used to store multiples, but we can't store half of a register with these.
I think we can still use the unit-stride instructions for them.

Thanks,
Zhiwei
r~
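
The choice discussed above, whole-register instructions versus unit-stride ones, can be sketched as a small decision function. It assumes the RVV 1.0 rule that whole-register loads and stores move 1, 2, 4 or 8 registers; `VecMemPlan` and `plan_vec_mem` are hypothetical names and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of the decision. */
typedef struct {
    bool whole_reg;   /* true: a whole-register load/store covers the access */
    unsigned nf;      /* registers moved (1, 2, 4 or 8) when whole_reg is set */
} VecMemPlan;

/*
 * len_bytes: size of the access; vlenb: host vector register size in bytes.
 * Anything that is not exactly 1, 2, 4 or 8 full registers falls back to
 * a unit-stride access with the length configured separately.
 */
VecMemPlan plan_vec_mem(size_t len_bytes, size_t vlenb)
{
    VecMemPlan p = { false, 0 };
    assert(len_bytes > 0 && vlenb > 0);
    if (len_bytes % vlenb == 0) {
        size_t n = len_bytes / vlenb;
        if (n == 1 || n == 2 || n == 4 || n == 8) {
            p.whole_reg = true;
            p.nf = (unsigned)n;
        }
    }
    return p;
}
```

For a 16-byte register, a 32-byte access maps to two whole registers, while an 8-byte access (half a register) or a 48-byte access (three registers) takes the unit-stride path.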