On 8/13/20 7:48 PM, Frank Chang wrote:
> esz is passed from e.g. GEN_VEXT_LD_STRIDE() macro:
>
>> #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN) \
>> void HELPER(NAME)(void *vd, void * v0, target_ulong base, \
>> target_ulong stride, CPURISCVState *env, \
>> uint32_t desc) \
>> { \
>> uint32_t vm = vext_vm(desc); \
>> vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN, \
>> sizeof(ETYPE), GETPC(), MMU_DATA_LOAD); \
>> }
>>
>> GEN_VEXT_LD_STRIDE(vlse8_v, int8_t, lde_b)
>
> which is calculated by sizeof(ETYPE), so the results would be: 1, 2, 4, 8.
> and vext_max_elems() is called by e.g. vext_ldst_stride():
Ah, yes.
>> uint32_t max_elems = vext_max_elems(desc, esz);
>
> I can add another parameter to the macro and pass the hard-coded log2(esz) value
> if that is better than using ctzl().
> Or is there a more elegant way to get log2(esz)?
Using ctzl(sizeof(type)) in the GEN_VEXT_LD_STRIDE macro will work well. This
will be constant folded by the compiler.
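
As a rough illustration of the suggestion above, the sketch below maps an element type to log2 of its size. It is not QEMU code: log2_size() and LOG2_ESZ() are hypothetical names, and __builtin_ctzl (a GCC/Clang builtin) is assumed here as a stand-in for ctzl(). With a constant sizeof() argument, an optimizing compiler would normally be expected to reduce the call to a constant.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for QEMU's ctzl(); assumes the GCC/Clang
 * builtin __builtin_ctzl is available.
 */
static inline uint32_t log2_size(unsigned long size)
{
    /* Counting trailing zeros equals log2 only for non-zero powers of two. */
    assert(size != 0 && (size & (size - 1)) == 0);
    return (uint32_t)__builtin_ctzl(size);
}

/* Hypothetical macro: log2 of the element size for a given element type. */
#define LOG2_ESZ(ETYPE) log2_size(sizeof(ETYPE))
```

For the element sizes mentioned in this thread (1, 2, 4, 8), this yields 0, 1, 2, 3.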
r~
I checked the code again.
GEN_VEXT_LD_STRIDE() eventually calls vext_ldst_stride() and passes esz as a parameter.
However, esz is used not only in vext_max_elems() but also in other calculations, e.g.:
probe_pages(env, base + stride * i, nf * esz, ra, access_type);
and
target_ulong addr = base + stride * i + k * esz;
If we pass ctzl(sizeof(type)) in GEN_VEXT_LD_STRIDE(),
I would still have to compute (1 << esz) to get the element size in bytes for the calculations above.
Wouldn't that cancel out the performance gain we get in vext_max_elems()?
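
To make the concern concrete, here is a minimal sketch of how the two calculations above could be written if only log2(esz) were available. The helper names elem_addr() and probe_len() are hypothetical and uint64_t stands in for target_ulong; this only shows the arithmetic equivalence (x * esz == x << log2_esz when esz == 1 << log2_esz) and makes no claim about the resulting performance.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: address of field k of element i.
 * Equivalent to base + stride * i + k * esz, with esz == 1 << log2_esz.
 */
static inline uint64_t elem_addr(uint64_t base, uint64_t stride,
                                 uint32_t i, uint32_t k, uint32_t log2_esz)
{
    /* Element sizes in this thread are 1, 2, 4, 8, so log2 is at most 3. */
    assert(log2_esz <= 3);
    return base + stride * i + ((uint64_t)k << log2_esz);
}

/* Hypothetical helper: byte length to probe, equivalent to nf * esz. */
static inline uint64_t probe_len(uint32_t nf, uint32_t log2_esz)
{
    assert(log2_esz <= 3);
    return (uint64_t)nf << log2_esz;
}
```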
Frank Chang