From: Richard Henderson
Subject: Re: [Qemu-arm] [RFC PATCH v2 2/2] utils: Add prefetch for Thunderx platform
Date: Wed, 17 Aug 2016 08:34:21 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/16/2016 04:45 PM, Vijay Kilari wrote:
> On Tue, Aug 16, 2016 at 11:32 PM, Richard Henderson <address@hidden> wrote:
>> On 08/16/2016 05:02 AM, address@hidden wrote:
>>> +static inline void prefetch_vector_loop(const VECTYPE *p, int index)
>>> +{
>>> +#if defined(__aarch64__)
>>> +    if (is_thunderx_pass2_cpu()) {
>>> +        /* Prefetch 4 cache lines ahead from index */
>>> +        VEC_PREFETCH(p, index + (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * 4));
>>> +    }
>>> +#endif
>>> +}
>>
>> Oh come now.  This is even worse than before.  A function call protecting a
>> mere prefetch within the main body of an inner loop?
>>
>> Did you not understand what I was asking for?
>
> No, Could you please detail the problem?.

The thunderx check, *if it even needs to exist at all*, must happen outside the loop. Preferably not more than once, at startup time.
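
A minimal sketch of what hoisting the check could look like, assuming GCC or Clang (for `__builtin_prefetch`) and 64-byte cache lines. Every name here (`cpu_wants_prefetch`, `init_prefetch_policy`, `find_nonzero` and friends) is hypothetical and stands in for the patch's own helpers; it is not the QEMU code. The CPU test runs once, its result is cached, and the inner loops contain no function call or CPU check:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's is_thunderx_pass2_cpu().
 * A real version would inspect the CPU ID; this one always says "no". */
static bool cpu_wants_prefetch(void)
{
    return false;
}

/* Cached answer: -1 = not probed yet, 0 = no prefetch, 1 = prefetch. */
static int prefetch_enabled = -1;

/* Probe the CPU once, ideally from startup code. */
static void init_prefetch_policy(void)
{
    prefetch_enabled = cpu_wants_prefetch() ? 1 : 0;
}

#define LINE_WORDS  (64 / sizeof(uint64_t))  /* assumes 64-byte lines */
#define AHEAD_WORDS (4 * LINE_WORDS)         /* 4 cache lines ahead */

/* Return the index of the first nonzero word, or n if all are zero. */
static size_t find_nonzero_plain(const uint64_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i]) {
            return i;
        }
    }
    return n;
}

/* Same scan, issuing one prefetch per cache line, with no CPU check. */
static size_t find_nonzero_prefetch(const uint64_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % LINE_WORDS == 0 && i + AHEAD_WORDS < n) {
            __builtin_prefetch(p + i + AHEAD_WORDS);
        }
        if (p[i]) {
            return i;
        }
    }
    return n;
}

/* The only place the policy is consulted: once per call, outside the loop. */
static size_t find_nonzero(const uint64_t *p, size_t n)
{
    if (prefetch_enabled < 0) {
        init_prefetch_policy();
    }
    return prefetch_enabled ? find_nonzero_prefetch(p, n)
                            : find_nonzero_plain(p, n);
}
```

If the check turns out to be unnecessary, the dispatcher and the plain variant can simply be dropped.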
I strongly suspect that you do not need any check at all. That even for cpus which automatically detect the streaming loop, adding a prefetch will not hurt.
You should repeat your same benchmark, with and without the prefetch, on (1) an A57 or suchlike, and (2) an x86 of some variety.
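
A rough sketch of such a with/without comparison, again assuming GCC or Clang and 64-byte cache lines. The names (`scan_plain`, `scan_prefetch`, `time_scan`) are hypothetical and this is not the benchmark used for the patch; it only shows the shape of timing the same scan both ways on one machine:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define LINE_WORDS  (64 / sizeof(uint64_t))  /* assumes 64-byte lines */
#define AHEAD_WORDS (4 * LINE_WORDS)         /* 4 cache lines ahead */

typedef size_t (*scan_fn)(const uint64_t *p, size_t n);

/* Baseline: index of the first nonzero word, or n if none. */
static size_t scan_plain(const uint64_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i]) {
            return i;
        }
    }
    return n;
}

/* Same scan with an unconditional software prefetch per cache line. */
static size_t scan_prefetch(const uint64_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % LINE_WORDS == 0 && i + AHEAD_WORDS < n) {
            __builtin_prefetch(p + i + AHEAD_WORDS);
        }
        if (p[i]) {
            return i;
        }
    }
    return n;
}

/* Time `reps` passes of fn over a buffer of n words that is zero except
 * for its last word. Stores the last scan result in *result and returns
 * CPU seconds, or -1.0 if n is 0 or allocation fails. */
static double time_scan(scan_fn fn, size_t n, int reps, size_t *result)
{
    if (n == 0) {
        return -1.0;
    }
    uint64_t *buf = calloc(n, sizeof(*buf));
    if (!buf) {
        return -1.0;
    }
    buf[n - 1] = 1;              /* force a full scan of the buffer */

    volatile size_t sink = 0;    /* keep the calls from being optimized out */
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) {
        sink = fn(buf, n);
    }
    clock_t t1 = clock();

    *result = sink;
    free(buf);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_scan(scan_plain, n, reps, &r)` and `time_scan(scan_prefetch, n, reps, &r)` with a buffer well beyond the last-level cache size, on each CPU of interest, gives the two numbers to compare.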
r~