On 8/14/20 7:52 PM, Frank Chang wrote:
> probe_pages(env, base + stride * i, nf * esz, ra, access_type);
> and
> target_ulong addr = base + stride * i + k * esz;
>
> If we pass ctzl(sizeof(type)) in GEN_VEXT_LD_STRIDE(),
> I would still have to do: (1 << esz) to get the correct element size in the
> above calculations.
> Would it eliminate the performance gain we have in vext_max_elems() instead?

Well, no, it will improve performance, because you'll write

    addr = base + stride * i + (k << esz)

I.e. strength-reduce the multiply to a shift.
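
To make the point concrete, here is a minimal standalone sketch of the two
forms. The helper names (elem_addr_mul, elem_addr_shift, seg_bytes,
check_equivalence) are hypothetical and exist only for illustration; they are
not the actual helper code. It assumes esz is passed either as a byte size
(1, 2, 4, 8) or as its log2 (0..3), e.g. from ctzl(sizeof(type)).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from the patch. */

/* esz given as a byte size (1, 2, 4, 8): the element offset needs a multiply. */
static inline uint64_t elem_addr_mul(uint64_t base, uint64_t stride,
                                     uint32_t i, uint32_t k,
                                     uint32_t esz_bytes)
{
    return base + stride * i + (uint64_t)k * esz_bytes;
}

/* esz given as log2 of the byte size (0..3): the multiply becomes a shift. */
static inline uint64_t elem_addr_shift(uint64_t base, uint64_t stride,
                                       uint32_t i, uint32_t k,
                                       uint32_t log2_esz)
{
    return base + stride * i + ((uint64_t)k << log2_esz);
}

/* Byte length of one segment of nf fields, i.e. nf * esz written as a shift. */
static inline uint64_t seg_bytes(uint32_t nf, uint32_t log2_esz)
{
    return (uint64_t)nf << log2_esz;
}

/* Check that both forms agree for every power-of-two element size. */
static inline void check_equivalence(void)
{
    for (uint32_t log2_esz = 0; log2_esz <= 3; log2_esz++) {
        for (uint32_t k = 0; k < 8; k++) {
            assert(elem_addr_shift(0x1000, 16, 3, k, log2_esz) ==
                   elem_addr_mul(0x1000, 16, 3, k, 1u << log2_esz));
        }
    }
}
```

Since the element size is always a power of two, the two forms give the same
address, and the (1 << esz) conversion is never needed on the hot path.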
This works like a charm.
Thanks for the advice.
Frank Chang
r~