On 22/11/24, Richard Henderson wrote:
On 11/20/24 19:49, Anton Johansson wrote:
Adds new functions to the gvec API for truncating, sign- or zero-extending
vector elements. Currently implemented as helper functions, these may be
mapped onto host vector instructions in the future. For the time being, this
allows translation of more complicated vector instructions by helper-to-tcg.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
accel/tcg/tcg-runtime-gvec.c | 41 +++++++++++++++++
accel/tcg/tcg-runtime.h | 22 +++++++++
include/tcg/tcg-op-gvec-common.h | 18 ++++++++
tcg/tcg-op-gvec.c | 78 ++++++++++++++++++++++++++++++++
4 files changed, 159 insertions(+)
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index afca89baa1..685c991e6a 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
     }
     clear_high(d, oprsz, desc);
 }
+
+#define DO_SZ_OP1(NAME, DSTTY, SRCTY)                                      \
+void HELPER(NAME)(void *d, void *a, uint32_t desc)                         \
+{                                                                          \
+    intptr_t oprsz = simd_oprsz(desc);                                     \
+    intptr_t elsz = oprsz / sizeof(DSTTY);                                 \
+    intptr_t i;                                                            \
+                                                                           \
+    for (i = 0; i < elsz; ++i) {                                           \
+        SRCTY aa = *((SRCTY *) a + i);                                     \
+        *((DSTTY *) d + i) = aa;                                           \
+    }                                                                      \
+    clear_high(d, oprsz, desc);                                            \
+}
This formulation is not valid.
(1) Generic forms must *always* operate strictly on columns. This
formulation is either expanding a narrow vector to a wider vector or
compressing a wider vector to a narrow vector.
(2) This takes no care for byte ordering of the data between columns. This
is where sticking strictly to columns helps, in that we can assume that data
is host-endian *within the column*, but we cannot assume anything about the
element indexing of ptr + i.
Concerning (1) and (2), is this a limitation imposed on generic vector
ops to simplify mapping to host vector instructions, where padding/alignment
of elements might differ? From my understanding, the helper above should be
fine, since we can assume contiguous elements?
I see. I don't think we can make this work for Hexagon vector ops. As an
example, consider V6_vadduwsat, which performs an unsigned saturating add
of 32-bit elements. Currently we emit:
void emit_V6_vadduwsat(intptr_t vec2, intptr_t vec7, intptr_t vec6) {
    VectorMem mem = {0};
    intptr_t vec5 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_zext(MO_64, MO_32, vec5, vec7, 256, 128, 256);
    intptr_t vec1 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_zext(MO_64, MO_32, vec1, vec6, 256, 128, 256);
    tcg_gen_gvec_add(MO_64, vec1, vec1, vec5, 256, 256);
    intptr_t vec3 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_dup_imm(MO_64, vec3, 256, 256, 4294967295ull);
    tcg_gen_gvec_umin(MO_64, vec1, vec1, vec3, 256, 256);
    tcg_gen_gvec_trunc(MO_32, MO_64, vec2, vec1, 128, 256, 128);
}
So we really do rely on the size-changing property of zext here: the input
vectors are 128 bytes and we expand them to 256 bytes. We could expand vector
operations within the instruction to the largest vector size, but we would
need to zext and trunc to the destination and source registers anyway.
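For reference, the intent of the emitted zext/add/umin/trunc sequence can be
expressed as a per-lane scalar function. This is only an illustrative sketch;
the function name is mine, not from the Hexagon frontend:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for one 32-bit lane of V6_vadduwsat: widen to 64 bits
 * (the gvec_zext steps), add, clamp to UINT32_MAX (the gvec_umin against
 * the dup'd 0xffffffff), then narrow back (the gvec_trunc step). */
static uint32_t vadduwsat_lane(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b;
    return wide > UINT32_MAX ? UINT32_MAX : (uint32_t)wide;
}
```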