qemu-arm

Re: [PATCH v3 30/51] target/arm: Implement FMOPA, FMOPS (non-widening)


From: Richard Henderson
Subject: Re: [PATCH v3 30/51] target/arm: Implement FMOPA, FMOPS (non-widening)
Date: Fri, 24 Jun 2022 07:16:57 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 6/24/22 05:31, Peter Maydell wrote:
> On Mon, 20 Jun 2022 at 19:07, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

>> +void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn,
>> +                         void *vpm, void *vst, uint32_t desc)
>> +{
>> +    intptr_t row, col, oprsz = simd_maxsz(desc);
>> +    uint32_t neg = simd_data(desc) << 31;
>> +    uint16_t *pn = vpn, *pm = vpm;
>> +
>> +    bool save_dn = get_default_nan_mode(vst);
>> +    set_default_nan_mode(true, vst);
>> +
>> +    for (row = 0; row < oprsz; ) {
>> +        uint16_t pa = pn[H2(row >> 4)];
>> +        do {
>> +            if (pa & 1) {
>> +                void *vza_row = vza + row * sizeof(ARMVectorReg);
>> +                uint32_t n = *(uint32_t *)(vzn + row) ^ neg;
>> +
>> +                for (col = 0; col < oprsz; ) {
>> +                    uint16_t pb = pm[H2(col >> 4)];
>> +                    do {
>> +                        if (pb & 1) {
>> +                            uint32_t *a = vza_row + col;
>> +                            uint32_t *m = vzm + col;
>> +                            *a = float32_muladd(n, *m, *a, 0, vst);
>> +                        }
>> +                        col += 4;
>> +                        pb >>= 4;
>> +                    } while (col & 15);
>> +                }
>> +            }
>> +            row += 4;
>> +            pa >>= 4;
>> +        } while (row & 15);
>> +    }

> The code for the double version seems straightforward:
> row counts from 0 up to the number of rows, and we
> do something per row. Why is the single precision version
> doing something with an unrolled loop here? It's confusing
> that 'oprsz' in the two functions isn't the same thing --
> in the double version we divide by the element size, but
> here we don't.

It's all about the predicate addressing. For doubles, the bits are spaced 8 bits apart, which makes it easy, as you see. For singles, the bits are spaced 4 bits apart, which is inconvenient. Anyway, just as over in sve_helper.c, I load a uint16_t at a time and shift to find each predicate bit.

So it's not unrolled, exactly. There's a second loop over the predicates. And since this is a matrix op, we get loops nested four deep.

> The pseudocode says that we ignore floating point exceptions
> (ie do not accumulate them in the FPSR) -- it passes fpexc == false
> to FPMulAdd(). Don't we need to do something special to arrange
> for that ?

Oops, somewhere I read that as "do not trap" rather than "do not accumulate".
But R_TGSKG is very clear on this as accumulate.


r~



