qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coh


From: Richard Henderson
Subject: Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache
Date: Thu, 19 May 2022 11:31:00 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0

On 5/19/22 07:11, Nicholas Piggin wrote:
dcache writeback and icache invalidate is not required when icache is
coherent, a shorter fixed-length sequence can be used which just has to
flush and re-fetch instructions that were in-flight.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

I haven't been able to measure a significant performance difference
with this, qemu isn't flushing large ranges frequently so the old sequence
is not that slow.

Yeah, we should be flushing smallish regions (< 1-4k), as we generate TranslationBlocks. And hopefully the translation cache is large enough that we spend more time executing blocks than re-compiling them. ;-)


+++ b/include/qemu/cacheflush.h
@@ -28,6 +28,10 @@ static inline void flush_idcache_range(uintptr_t rx, 
uintptr_t rw, size_t len)
#else +#if defined(__powerpc__)
+extern bool have_coherent_icache;
+#endif

Ug.  I'm undecided where to put this.  I'm tempted to say...

--- a/util/cacheflush.c
+++ b/util/cacheflush.c
@@ -108,7 +108,16 @@ void flush_idcache_range(uintptr_t rx, uintptr_t rw, 
size_t len)

... here in cacheflush.c, with a comment that the variable is defined and initialized in cacheinfo.c.

I'm even more tempted to merge the two files to put all of the machine-specific cache data in the same place, then this variable can be static. There's even an existing TODO comment in cacheflush.c for aarch64.


      b = rw & ~(dsize - 1);
+
+    if (have_coherent_icache) {
+        asm volatile ("sync" : : : "memory");
+        asm volatile ("icbi 0,%0" : : "r"(b) : "memory");
+        asm volatile ("isync" : : : "memory");
+        return;
+    }

Where can I find definitive rules on this?

Note that rx may not equal rw, and that we've got two virtual mappings for the same memory, one for "data" that is read-write and one for "execute" that is read-execute. (This split is enabled only for --enable-debug-tcg builds on linux, to make sure we don't regress apple m1, which requires the split all of the time.)

In particular, you're flushing one icache line with the dcache address, and that you're not flushing any of the other lines. Is the coherent icache thing really that we may simply skip the dcache flush step, but must still flush all of the icache lines?

Without docs, "icache snoop" to me would imply that we only need the two barriers and no flushes at all, just to make sure all memory writes complete before any new instructions are executed. This would be like the two AArch64 bits, IDC and DIC, which indicate that the two caches are coherent to Point of Unification, which leaves us with just the Instruction Sequence Barrier at the end of the function.


+bool have_coherent_icache = false;

scripts/checkpatch.pl should complain this is initialized to 0.


  static void arch_cache_info(int *isize, int *dsize)
  {
+#  ifdef PPC_FEATURE_ICACHE_SNOOP
+    unsigned long hwcap = qemu_getauxval(AT_HWCAP);
+#  endif
+
      if (*isize == 0) {
          *isize = qemu_getauxval(AT_ICACHEBSIZE);
      }
      if (*dsize == 0) {
          *dsize = qemu_getauxval(AT_DCACHEBSIZE);
      }
+
+#  ifdef PPC_FEATURE_ICACHE_SNOOP
+    have_coherent_icache = (hwcap & PPC_FEATURE_ICACHE_SNOOP) != 0;
+#  endif

Better with only one ifdef, moving this second hunk up.

It would be nice if there were some kernel documentation for this...


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]