Re: [PATCH v3 26/51] target/arm: Implement SME LD1, ST1
From: Peter Maydell
Subject: Re: [PATCH v3 26/51] target/arm: Implement SME LD1, ST1
Date: Thu, 23 Jun 2022 12:41:23 +0100
On Mon, 20 Jun 2022 at 19:13, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We cannot reuse the SVE functions for LD[1-4] and ST[1-4],
> because those functions accept only a Zreg register number.
> For SME, we want to pass a pointer into ZA storage.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper-sme.h | 82 +++++
> target/arm/sme.decode | 9 +
> target/arm/sme_helper.c | 615 +++++++++++++++++++++++++++++++++++++
> target/arm/translate-sme.c | 69 +++++
> 4 files changed, 775 insertions(+)
> +/*
> + * FIXME: The ARMVectorReg elements are stored in host-endian 64-bit units.
> + * We do not have a defined ordering of the 64-bit units for host-endian
> + * 128-bit quantities. For now, just leave the host words in little-endian
> + * order and hope for the best.
> + */
I don't understand this comment. The architecture ought to specify
the order in which the two halves of a 128-bit quantity go into the
ZArray storage. If we can't guarantee that a host int128_t can be
handled in a way that does the right thing, then we just can't
use int128_t.
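If the architecture does define the ordering (say, bits [63:0] in the
lower 64-bit unit of the ZA slice), then, purely as an illustrative
sketch with made-up names, the store can be written explicitly in
that order, with no dependence on the host int128_t layout:

    static inline void za_store_q(uint64_t *za64, uint64_t lo, uint64_t hi)
    {
        za64[0] = lo;   /* architectural bits [63:0]   */
        za64[1] = hi;   /* architectural bits [127:64] */
    }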
> +#define DO_LDQ(HNAME, VNAME, BE, HOST, TLB) \
> +static inline void HNAME##_host(void *za, intptr_t off, void *host) \
> +{ \
> + uint64_t val0 = HOST(host), val1 = HOST(host + 8); \
> + uint64_t *ptr = za + off; \
> + ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \
> +} \
> +static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \
> +{ \
> + HNAME##_host(za, off * sizeof(ARMVectorReg), host); \
> +} \
> +/*
> + * Common helper for all contiguous predicated loads.
> + */
> +
> +static inline QEMU_ALWAYS_INLINE
> +void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
> + const target_ulong addr, uint32_t desc, const uintptr_t ra,
> + const int esz, uint32_t mtedesc, bool vertical,
> + sve_ldst1_host_fn *host_fn,
> + sve_ldst1_tlb_fn *tlb_fn,
> + ClearFn *clr_fn,
> + CopyFn *cpy_fn)
We now have several rather long functions that are
pretty complicated and pretty similar, handling the various
SVE and SME loads and stores. Is there really no hope for
sharing code?
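For instance (purely a sketch, the names here are made up), the shared
inner routine could take a pointer to the destination storage, with
thin wrappers on top:

    static void do_ld1_common(CPUARMState *env, void *vd, uint64_t *vg,
                              target_ulong addr, uint32_t desc,
                              uintptr_t ra, int esz,
                              sve_ldst1_host_fn *host_fn,
                              sve_ldst1_tlb_fn *tlb_fn)
    {
        /* common contiguous-load loop, operating only on vd */
    }

The SVE entry points would then pass &env->vfp.zregs[rd], and the SME
ones their pointer into ZA storage.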
> diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
> index d2a7232491..978af74d1d 100644
> --- a/target/arm/translate-sme.c
> +++ b/target/arm/translate-sme.c
> @@ -151,3 +151,72 @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a)
>
> return true;
> }
> +
> +static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
> +{
> + typedef void GenLdSt1(TCGv_env, TCGv_ptr, TCGv_ptr, TCGv, TCGv_i32);
> +
> + /*
> + * Indexed by [esz][be][v][mte][st], which is (except for load/store)
> + * also the order in which the elements appear in the function names,
> + * and so how we must concatenate the pieces.
> + */
> +
> +#define FN_LS(F) { gen_helper_sme_ld1##F, gen_helper_sme_st1##F }
> +#define FN_MTE(F) { FN_LS(F), FN_LS(F##_mte) }
> +#define FN_HV(F) { FN_MTE(F##_h), FN_MTE(F##_v) }
> +#define FN_END(L, B) { FN_HV(L), FN_HV(B) }
> +
> + static GenLdSt1 * const fns[5][2][2][2][2] = {
> + FN_END(b, b),
> + FN_END(h_le, h_be),
> + FN_END(s_le, s_be),
> + FN_END(d_le, d_be),
> + FN_END(q_le, q_be),
> + };
> +
> +#undef FN_LS
> +#undef FN_MTE
> +#undef FN_HV
> +#undef FN_END
> +
> + TCGv_ptr t_za, t_pg;
> + TCGv_i64 addr;
> + int desc = 0;
> + bool be = s->be_data == MO_BE;
> + bool mte = s->mte_active[0];
> +
> + if (!dc_isar_feature(aa64_sme, s)) {
> + return false;
> + }
> + if (!sme_smza_enabled_check(s)) {
> + return true;
> + }
> +
> + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v);
> + t_pg = pred_full_reg_ptr(s, a->pg);
> + addr = tcg_temp_new_i64();
> +
> + tcg_gen_shli_i64(addr, cpu_reg(s, a->rn), a->esz);
> + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rm));
Aren't rm and rn the wrong way around here? The pseudocode
says that rn is the base (can be SP, not scaled) and rm is
the offset (can be XZR, scaled by mbytes).
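i.e. presumably this wants to be something like:

    tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
    tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));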
> +
> + if (mte) {
> + desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
> + desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
> + desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
> + desc = FIELD_DP32(desc, MTEDESC, WRITE, a->st);
> + desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << a->esz) - 1);
> + desc <<= SVE_MTEDESC_SHIFT;
> + } else {
> + addr = clean_data_tbi(s, addr);
> + }
> + desc = simd_desc(s->svl, s->svl, desc);
> +
> + fns[a->esz][be][a->v][mte][a->st](cpu_env, t_za, t_pg, addr,
> + tcg_constant_i32(desc));
> +
> + tcg_temp_free_ptr(t_za);
> + tcg_temp_free_ptr(t_pg);
> + tcg_temp_free_i64(addr);
> + return true;
> +}
thanks
-- PMM