Re: [PATCH v3 30/51] target/arm: Implement FMOPA, FMOPS (non-widening)

qemu-devel

From:	Peter Maydell
Subject:	Re: [PATCH v3 30/51] target/arm: Implement FMOPA, FMOPS (non-widening)
Date:	Fri, 24 Jun 2022 13:31:24 +0100

On Mon, 20 Jun 2022 at 19:07, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> +void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn,
> +                         void *vpm, void *vst, uint32_t desc)
> +{
> +    intptr_t row, col, oprsz = simd_maxsz(desc);
> +    uint32_t neg = simd_data(desc) << 31;
> +    uint16_t *pn = vpn, *pm = vpm;
> +
> +    bool save_dn = get_default_nan_mode(vst);
> +    set_default_nan_mode(true, vst);
> +
> +    for (row = 0; row < oprsz; ) {
> +        uint16_t pa = pn[H2(row >> 4)];
> +        do {
> +            if (pa & 1) {
> +                void *vza_row = vza + row * sizeof(ARMVectorReg);
> +                uint32_t n = *(uint32_t *)(vzn + row) ^ neg;
> +
> +                for (col = 0; col < oprsz; ) {
> +                    uint16_t pb = pm[H2(col >> 4)];
> +                    do {
> +                        if (pb & 1) {
> +                            uint32_t *a = vza_row + col;
> +                            uint32_t *m = vzm + col;
> +                            *a = float32_muladd(n, *m, *a, 0, vst);
> +                        }
> +                        col += 4;
> +                        pb >>= 4;
> +                    } while (col & 15);
> +                }
> +            }
> +            row += 4;
> +            pa >>= 4;
> +        } while (row & 15);
> +    }

The code for the double version seems straightforward:
row counts from 0 up to the number of rows, and we
handle one element per iteration. Why is the single-precision
version using a partially unrolled loop here, stepping row and
col in bytes and consuming the predicate 16 bits at a time?
It's also confusing that 'oprsz' isn't the same thing in the
two functions: in the double version it is simd_oprsz() divided
by the element size (an element count), but here it is
simd_maxsz() with no division (a byte count).

> +
> +    set_default_nan_mode(save_dn, vst);
> +}
> +
> +void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn,
> +                         void *vpm, void *vst, uint32_t desc)
> +{
> +    intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
> +    uint64_t neg = (uint64_t)simd_data(desc) << 63;
> +    uint64_t *za = vza, *zn = vzn, *zm = vzm;
> +    uint8_t *pn = vpn, *pm = vpm;
> +
> +    bool save_dn = get_default_nan_mode(vst);
> +    set_default_nan_mode(true, vst);
> +
> +    for (row = 0; row < oprsz; ++row) {
> +        if (pn[H1(row)] & 1) {
> +            uint64_t *za_row = &za[row * sizeof(ARMVectorReg)];
> +            uint64_t n = zn[row] ^ neg;
> +
> +            for (col = 0; col < oprsz; ++col) {
> +                if (pm[H1(col)] & 1) {
> +                    uint64_t *a = &za_row[col];
> +                    *a = float64_muladd(n, zm[col], *a, 0, vst);
> +                }
> +            }
> +        }
> +    }
> +
> +    set_default_nan_mode(save_dn, vst);
> +}

The pseudocode says that we ignore floating-point exceptions
(i.e. we do not accumulate them in the FPSR): it passes
fpexc == false to FPMulAdd(). Don't we need to do something
special to arrange for that?

thanks
-- PMM
