From: Richard Henderson
Subject: Re: TCG performance on PPC64
Date: Wed, 18 May 2022 09:09:43 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0

On 5/18/22 07:11, Mark Cave-Ayland wrote:
> Finally another comment from Richard about vector instruction use from [2]: "As an aside, this does suggest to me that target/ppc might be well served in moving the ppc_vsr_t members of CPUPPCState earlier, so that this offset is smaller". Presumably this is because calculating smaller offsets can be done using fewer instructions? However I suppose this would only have an effect on vector-heavy workloads.
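To make the layout argument concrete, here is a minimal C11 sketch using hypothetical, heavily simplified structs. `env_spr_first`, `env_spr_last`, `vsr_t`, `spr_cb_t` and the two macros are illustrative names, not QEMU's real definitions, and the array sizes are placeholders. It only shows that placing large arrays in front of the vector registers pushes their offset from `env` past 0x7fff, while moving those arrays to the end keeps every vector register within reach of a single signed 16-bit displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins, NOT the real CPUPPCState members.
 * Sizes are placeholders chosen only to illustrate the effect.
 */
typedef struct { uint64_t u64[2]; } vsr_t;   /* one 128-bit VSR */
typedef struct { void *cb[8]; } spr_cb_t;    /* placeholder per-SPR callback record */

/* Large SPR arrays placed before the vector registers. */
typedef struct {
    uint64_t gpr[32];
    uint64_t spr[1024];
    spr_cb_t spr_cb[1024];
    vsr_t vsr[64];
} env_spr_first;

/* Same members, with the SPR arrays moved to the end. */
typedef struct {
    uint64_t gpr[32];
    vsr_t vsr[64];
    uint64_t spr[1024];
    spr_cb_t spr_cb[1024];
} env_spr_last;

/* Byte offset of the last VSR from the start of the struct (i.e. from env). */
#define LAST_VSR_OFFSET(T) (offsetof(T, vsr) + 63 * sizeof(vsr_t))

/* A signed 16-bit displacement reaches at most 0x7fff bytes past env. */
#define FITS_D_FORM(off) ((off) <= 0x7fff)

/* Compile-time checks of the layout effect described above. */
static_assert(!FITS_D_FORM(LAST_VSR_OFFSET(env_spr_first)),
              "SPR arrays first: VSRs lie beyond a 16-bit displacement");
static_assert(FITS_D_FORM(LAST_VSR_OFFSET(env_spr_last)),
              "SPR arrays last: every VSR is reachable with one displacement");
```

Reordering only moves members, so both structs have the same total size; what changes is which fields land inside the cheap displacement range.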
Yes, the offsets, quoting from [2],
    ld_vec v128,e8,tmp2,env,$0xd6b0
    st_vec v128,e8,tmp2,env,$0xd4c0
being larger than 0x7fff, require two insns to load. It's not just vectors, but fp, since the register space is shared.

I think just moving the two spr arrays toward the end of CPUArchState would do that job. But I wouldn't expect it to matter *that* much.


r~
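As a rough illustration of why offsets above 0x7fff cost an extra instruction: a PPC64 D-form load or store carries a signed 16-bit displacement, so an offset outside -0x8000..0x7fff must first have its high part added to the base register (typically with `addis`). The C sketch below splits an `env` offset that way. `split_env_offset` and `disp_split` are hypothetical names, not QEMU's tcg/ppc backend, and the sketch ignores the extra alignment rules of some encodings (e.g. DS-form `ld`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of splitting env + off into addressable pieces. */
typedef struct {
    int16_t hi;   /* immediate for "addis tmp, env, hi" (shifted left 16) */
    int16_t lo;   /* signed 16-bit displacement used by the load/store */
    int ninsns;   /* 1 = direct D-form access, 2 = addis + access */
} disp_split;

static disp_split split_env_offset(int32_t off)
{
    disp_split s;

    /* addis can only supply a signed 16-bit high part. */
    assert(off < 0x7fff8000);

    /* Sign-extend the low 16 bits, as the hardware does with the D field. */
    int32_t lo = (int32_t)((((uint32_t)off & 0xffffu) ^ 0x8000u)) - 0x8000;

    /*
     * Because lo is sign-extended, the high part is rounded up whenever
     * bit 15 of off is set. off - lo is an exact multiple of 0x10000, so
     * divide instead of shifting to stay well defined for negative values.
     */
    int32_t hi = (int32_t)(((int64_t)off - lo) / 0x10000);

    s.lo = (int16_t)lo;
    s.hi = (int16_t)hi;
    s.ninsns = (hi == 0) ? 1 : 2;   /* hi == 0 iff off fits in 16 bits */
    return s;
}
```

For the two offsets quoted above, 0xd6b0 would split into `addis tmp, env, 1` plus a displacement of -10576 (0x10000 - 10576 = 0xd6b0), and 0xd4c0 into `addis tmp, env, 1` plus -11072, whereas 0x7fff still fits in the single access.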