qemu-devel
Re: [Qemu-devel] [RFC PATCH 9/9] target/arm/translate-a64: vectorise smu


From: Richard Henderson
Subject: Re: [Qemu-devel] [RFC PATCH 9/9] target/arm/translate-a64: vectorise smull vD.4s, vN.[48]s, vM.h[]
Date: Thu, 17 Aug 2017 13:23:49 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 08/17/2017 11:04 AM, Alex Bennée wrote:
> +    int32_t *rd = (int32_t *) d;
> +    int16_t *rn = (int16_t *) n;
> +    int16_t rm = (int16_t) m;
> +    int i;
> +
> +    #pragma GCC ivdep
> +    for (i = 0; i < opr_elt; ++i) {
> +        rd[i] = rn[i + doff_elt] * rm;
> +    }

You need to run this loop backward to avoid clobbering data when rd == rn.
I thought you'd put m into ADVSIMD_DATA.
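
To illustrate the overlap concern, here is a minimal standalone sketch, not the code from the patch. The helper name smull_lo_inplace is hypothetical, and it assumes doff_elt is 0 (the low half of the source). Each 32-bit result occupies the bytes of two 16-bit inputs, so walking from the highest index down means every store lands on inputs that have already been consumed. With a nonzero source offset the overlap pattern differs and would need to be rechecked. The memcpy accesses only keep the sketch free of aliasing assumptions.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: multiply the low opr_elt 16-bit elements of n by
 * the scalar m and store the widened 32-bit products in d.
 * Assumes the source offset is 0. d and n may be the same buffer.
 */
static void smull_lo_inplace(void *d, const void *n, int16_t m, int opr_elt)
{
    unsigned char *dp = d;
    const unsigned char *np = n;
    int i;

    /* Run backward: the store to d[i] covers n[2i] and n[2i + 1],
     * both of which have index >= i and so were read earlier. */
    for (i = opr_elt - 1; i >= 0; --i) {
        int16_t a;
        int32_t r;

        memcpy(&a, np + i * sizeof(a), sizeof(a));
        r = (int32_t)a * m;
        memcpy(dp + i * sizeof(r), &r, sizeof(r));
    }
}
```

Running the same loop forward with d == n would overwrite n[1] when storing d[0], before n[1] has been read.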

> 
> +                if (is_q) {
> +                    simd_info = deposit32(simd_info,
> +                                          ADVSIMD_DOFF_ELT_SHIFT, ADVSIMD_DOFF_ELT_BITS, 4);
> +                }

It'd probably be useful to have a macro to clean this up:

#define PUT_SIMD_DATA(t, d)  \
    deposit32(0, ADVSIMD_ ## t ## _SHIFT, ADVSIMD_ ## t ## _BITS, (d))

  simd_info |= PUT_SIMD_DATA(DOFF_ELT, 4);

That said, folding DOFF into the pointer that gets passed in the first place
seems a better solution to me.
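
As a rough sketch of how the macro would read in use, the following builds on its own outside the QEMU tree. The deposit32 here is a local stand-in mirroring the QEMU helper of the same name, the shift and width values are placeholders rather than the ones from the patch series, and build_simd_info is a hypothetical wrapper. Note that OR-ing the macro result matches deposit32(simd_info, ...) only while the target field of simd_info is still zero.

```c
#include <stdint.h>

/* Local stand-in for QEMU's deposit32(): insert fieldval into value
 * at bit position start, length bits wide. */
static inline uint32_t deposit32(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Placeholder layout, assumed for illustration only. */
#define ADVSIMD_DOFF_ELT_SHIFT 8
#define ADVSIMD_DOFF_ELT_BITS  4

#define PUT_SIMD_DATA(t, d)  \
    deposit32(0, ADVSIMD_ ## t ## _SHIFT, ADVSIMD_ ## t ## _BITS, (d))

/* Hypothetical wrapper showing the call site from the quoted hunk. */
static uint32_t build_simd_info(uint32_t base, int is_q)
{
    uint32_t simd_info = base;

    if (is_q) {
        /* Valid as long as the DOFF_ELT field in base is clear. */
        simd_info |= PUT_SIMD_DATA(DOFF_ELT, 4);
    }
    return simd_info;
}
```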


r~
