Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations


From: Anton Johansson
Subject: Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations
Date: Tue, 3 Dec 2024 19:08:18 +0100

On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > Adds new functions to the gvec API for truncating, sign- or zero
> > extending vector elements.  Currently implemented as helper functions,
> > these may be mapped onto host vector instructions in the future.
> > 
> > For the time being, allows translation of more complicated vector
> > instructions by helper-to-tcg.
> > 
> > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > ---
> >   accel/tcg/tcg-runtime-gvec.c     | 41 +++++++++++++++++
> >   accel/tcg/tcg-runtime.h          | 22 +++++++++
> >   include/tcg/tcg-op-gvec-common.h | 18 ++++++++
> >   tcg/tcg-op-gvec.c                | 78 ++++++++++++++++++++++++++++++++
> >   4 files changed, 159 insertions(+)
> > 
> > diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> > index afca89baa1..685c991e6a 100644
> > --- a/accel/tcg/tcg-runtime-gvec.c
> > +++ b/accel/tcg/tcg-runtime-gvec.c
> > @@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
> >       }
> >       clear_high(d, oprsz, desc);
> >   }
> > +
> > +#define DO_SZ_OP1(NAME, DSTTY, SRCTY)                              \
> > +void HELPER(NAME)(void *d, void *a, uint32_t desc)                 \
> > +{                                                                  \
> > +    intptr_t oprsz = simd_oprsz(desc);                             \
> > +    intptr_t elsz = oprsz/sizeof(DSTTY);                           \
> > +    intptr_t i;                                                    \
> > +                                                                   \
> > +    for (i = 0; i < elsz; ++i) {                                   \
> > +        SRCTY aa = *((SRCTY *) a + i);                             \
> > +        *((DSTTY *) d + i) = aa;                                   \
> > +    }                                                              \
> > +    clear_high(d, oprsz, desc);                                    \
> 
> This formulation is not valid.
> 
> (1) Generic forms must *always* operate strictly on columns.  This
> formulation is either expanding a narrow vector to a wider vector or
> compressing a wider vector to a narrow vector.
> 
> (2) This takes no care for byte ordering of the data between columns.  This
> is where sticking strictly to columns helps, in that we can assume that data
> is host-endian *within the column*, but we cannot assume anything about the
> element indexing of ptr + i.

Concerning (1) and (2), is this a limitation imposed on generic vector
ops to simplify mapping to host vector instructions, where
padding/alignment of elements might differ?  From my understanding, the
helper above should be fine, since we can assume contiguous elements?

But maybe it doesn't make sense to add a gvec op that is only
implemented via a helper, I'm not sure.

> (3) This takes no care for element overlap if A == D.

Ah, good point!
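
A minimal sketch of the hazard, for the archive (the function name is
mine, not from the patch): a forward-iterating loop that widens in place
overwrites source elements before they are read when d == a, since
writing d[i] covers a[2*i] and a[2*i+1]; an in-place form would have to
iterate from the highest element down.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-place widening: zero-extend 32-bit elements to
 * 64-bit within the same buffer (d == a).  Iterating backwards means
 * each source element is read before any write can land on it.  Note
 * this still sidesteps your point (2): it assumes elements are
 * indexed contiguously in host memory. */
static void zext_32to64_inplace(void *d, void *a, intptr_t nelem)
{
    for (intptr_t i = nelem - 1; i >= 0; --i) {
        uint32_t aa = *((uint32_t *) a + i);
        *((uint64_t *) d + i) = aa;
    }
}
```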

> The only form of sign/zero-extract that you may add generically is an alias 
> for
> 
>   d[i] = a[i] & mask
> 
> or
> 
>   d[i] = (a[i] << shift) >> shift
> 
> where A and D use the same element type.  We could add new tcg opcodes for
> these (particularly the second, for sign-extension), though x86_64 does not
> support it, afaics.
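
For my own understanding, the fixed-width forms you describe would
amount to something like this per column (a sketch, names mine; here
extending the low 8 bits within a 32-bit element):

```c
#include <assert.h>
#include <stdint.h>

/* d[i] = a[i] & mask: element size stays MO_32, the high bits are
 * cleared. */
static uint32_t zext8_in_32(uint32_t a)
{
    return a & 0xff;
}

/* d[i] = (a[i] << shift) >> shift: the right shift is arithmetic, so
 * the sign bit of the low byte is replicated into the high bits. */
static uint32_t sext8_in_32(uint32_t a)
{
    return (uint32_t)((int32_t)(a << 24) >> 24);
}
```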

I see.  I don't think we can make this work for Hexagon vector ops.  As
an example, consider V6_vadduwsat, which performs an unsigned saturated
add of 32-bit elements.  Currently we emit:

    void emit_V6_vadduwsat(intptr_t vec2, intptr_t vec7, intptr_t vec6) {
        VectorMem mem = {0};
        intptr_t vec5 = temp_new_gvec(&mem, 256);
        tcg_gen_gvec_zext(MO_64, MO_32, vec5, vec7, 256, 128, 256);

        intptr_t vec1 = temp_new_gvec(&mem, 256);
        tcg_gen_gvec_zext(MO_64, MO_32, vec1, vec6, 256, 128, 256);

        tcg_gen_gvec_add(MO_64, vec1, vec1, vec5, 256, 256);

        intptr_t vec3 = temp_new_gvec(&mem, 256);
        tcg_gen_gvec_dup_imm(MO_64, vec3, 256, 256, 4294967295ull);

        tcg_gen_gvec_umin(MO_64, vec1, vec1, vec3, 256, 256);

        tcg_gen_gvec_trunc(MO_32, MO_64, vec2, vec1, 128, 256, 128);
    }

so we really do rely on the size-changing property of zext here: the
input vectors are 128 bytes and we expand them to 256 bytes.  We could
expand vector operations within the instruction to the largest vector
size, but we would need to zext and trunc to the destination and source
registers anyway.
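
For what it's worth, the saturated add itself can be phrased strictly
per column, without widening, by checking for wrap-around (a sketch,
not what helper-to-tcg emits; I believe tcg_gen_gvec_usadd is the
existing column-strict gvec form of this, if I remember the name
right):

```c
#include <assert.h>
#include <stdint.h>

/* One column of an unsigned saturating 32-bit add: unsigned overflow
 * wraps, so the result is smaller than either operand iff the add
 * overflowed, and we clamp to UINT32_MAX in that case. */
static uint32_t adduwsat32(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    return sum < a ? UINT32_MAX : sum;
}
```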

//Anton


