Re: [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx pla

qemu-devel


From:	Richard Henderson
Subject:	Re: [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform
Date:	Sat, 6 Aug 2016 15:47:28 +0530
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/02/2016 03:50 PM, address@hidden wrote:

+#define VEC_PREFETCH(base, index) \
+        asm volatile ("prfm pldl1strm, [%x[a]]\n" : : [a]"r"(&base[(index)]))


Is this not __builtin_prefetch(base + index) ?

I.e. you can define this generically for all targets.
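Something along these lines, as a minimal sketch. The macro name is kept from the patch; the `0, 0` arguments (read access, no temporal locality) are my guess at the closest match for pldl1strm, and `or_words` is a hypothetical helper that exists only to show the macro in use:

```c
#include <stddef.h>

/* Generic stand-in for the aarch64-only asm macro.  __builtin_prefetch is
 * a GCC/Clang builtin taking (addr, rw, locality); rw = 0 requests a
 * prefetch for reading, and locality = 0 marks the data as streaming.
 * Both values are assumptions, not something the patch specifies. */
#define VEC_PREFETCH(base, index) \
    __builtin_prefetch(&(base)[(index)], 0, 0)

/* Hypothetical helper: OR together n words, hinting at the next chunk
 * up front.  The bounds check keeps the prefetch address inside the
 * buffer. */
static unsigned long or_words(const unsigned long *p, size_t n)
{
    unsigned long acc = 0;
    size_t i;

    if (n > 8) {
        VEC_PREFETCH(p, 8);
    }
    for (i = 0; i < n; i++) {
        acc |= p[i];
    }
    return acc;
}
```

On targets without a prefetch instruction the builtin should compile to nothing, so no #ifdef is needed around the call sites.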

+#if defined (__aarch64__)
+    do_prefetch = is_thunder_pass2_cpu();
+    if (do_prefetch) {
+        VEC_PREFETCH(p, 8);
+        VEC_PREFETCH(p, 16);
+        VEC_PREFETCH(p, 24);
+    }
+#endif
+
     for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
          i < len / sizeof(VECTYPE);
          i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
+
+#if defined (__aarch64__)
+        if (do_prefetch) {
+            VEC_PREFETCH(p, i+32);
+            VEC_PREFETCH(p, i+40);
+        }
+#endif
+

Surely we shouldn't be adding a conditional around a prefetch inside of the inner loop. Does it really hurt to perform this prefetch for all aarch64 cpus?

I'll note that you're also prefetching too much, off the end of the block, and that you're probably not prefetching far enough. You'd need to break off the last iteration(s) of the loop.

I'll note that you're also prefetching too close. The loop operates on 8*vecsize units. In the case of aarch64, 128 byte units. Both i+32 and i+40 are within the current loop. I believe you want to prefetch at


  i + BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * N

where N is the number of iterations in advance to be fetched. Probably N is 1 or 2, unless the memory controller is really slow.
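To make the shape concrete, here is a minimal sketch of the loop with the last N iterations split off so that nothing is prefetched past the end of the block. Everything named here is a stand-in: `UNROLL` plays the role of BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR, `vec_t` the role of VECTYPE, `PREFETCH_ITERS` is N, and `find_nonzero_group` is a hypothetical function, not the one in the patch:

```c
#include <stddef.h>
#include <stdint.h>

#define UNROLL 8           /* stand-in for the unroll factor */
#define PREFETCH_ITERS 2   /* N: look-ahead in loop iterations (assumed) */

typedef uint64_t vec_t;    /* stand-in for VECTYPE */

/* Return the index of the first UNROLL-sized group that contains a
 * nonzero element, or n if the buffer is all zero.
 * n must be a multiple of UNROLL. */
static size_t find_nonzero_group(const vec_t *p, size_t n)
{
    const size_t ahead = (size_t)UNROLL * PREFETCH_ITERS;
    const size_t main_end = n > ahead ? n - ahead : 0;
    size_t i = 0, j;

    /* Main loop: i + ahead < n, so the prefetch stays inside the block. */
    for (; i < main_end; i += UNROLL) {
        vec_t acc = 0;
        __builtin_prefetch(&p[i + ahead], 0, 0);
        for (j = 0; j < UNROLL; j++) {
            acc |= p[i + j];
        }
        if (acc) {
            return i;
        }
    }

    /* Tail: the last N iterations, with no prefetch and no conditional
     * inside the main loop. */
    for (; i < n; i += UNROLL) {
        vec_t acc = 0;
        for (j = 0; j < UNROLL; j++) {
            acc |= p[i + j];
        }
        if (acc) {
            return i;
        }
    }
    return n;
}
```

This issues one prefetch per iteration; whether that covers a whole iteration's worth of data depends on the cache line size, so a real version may need more than one.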

r~

