Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coh

qemu-ppc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coh

From:	Richard Henderson
Subject:	Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache
Date:	Thu, 19 May 2022 11:31:00 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0

On 5/19/22 07:11, Nicholas Piggin wrote:

dcache writeback and icache invalidate is not required when icache is
coherent, a shorter fixed-length sequence can be used which just has to
flush and re-fetch instructions that were in-flight.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

I haven't been able to measure a significant performance difference
with this, qemu isn't flushing large ranges frequently so the old sequence
is not that slow.

Yeah, we should be flushing smallish regions (< 1-4k), as we generate TranslationBlocks.And hopefully the translation cache is large enough that we spend more time executingblocks than re-compiling them. ;-)

+++ b/include/qemu/cacheflush.h
@@ -28,6 +28,10 @@ static inline void flush_idcache_range(uintptr_t rx, 
uintptr_t rw, size_t len)

#else+#if defined(__powerpc__)

+extern bool have_coherent_icache;
+#endif


Ug.  I'm undecided where to put this.  I'm tempted to say...

--- a/util/cacheflush.c
+++ b/util/cacheflush.c
@@ -108,7 +108,16 @@ void flush_idcache_range(uintptr_t rx, uintptr_t rw, 
size_t len)

... here in cacheflush.c, with a comment that the variable is defined and initialized incacheinfo.c.

I'm even more tempted to merge the two files to put all of the machine-specific cache datain the same place, then this variable can be static. There's even an existing TODOcomment in cacheflush.c for aarch64.

      b = rw & ~(dsize - 1);
+
+    if (have_coherent_icache) {
+        asm volatile ("sync" : : : "memory");
+        asm volatile ("icbi 0,%0" : : "r"(b) : "memory");
+        asm volatile ("isync" : : : "memory");
+        return;
+    }


Where can I find definitive rules on this?

Note that rx may not equal rw, and that we've got two virtual mappings for the samememory, one for "data" that is read-write and one for "execute" that is read-execute.(This split is enabled only for --enable-debug-tcg builds on linux, to make sure we don'tregress apple m1, which requires the split all of the time.)

In particular, you're flushing one icache line with the dcache address, and that you'renot flushing any of the other lines. Is the coherent icache thing really that we maysimply skip the dcache flush step, but must still flush all of the icache lines?

Without docs, "icache snoop" to me would imply that we only need the two barriers and noflushes at all, just to make sure all memory writes complete before any new instructionsare executed. This would be like the two AArch64 bits, IDC and DIC, which indicate thatthe two caches are coherent to Point of Unification, which leaves us with just theInstruction Sequence Barrier at the end of the function.

+bool have_coherent_icache = false;


scripts/checkpatch.pl should complain this is initialized to 0.

  static void arch_cache_info(int *isize, int *dsize)
  {
+#  ifdef PPC_FEATURE_ICACHE_SNOOP
+    unsigned long hwcap = qemu_getauxval(AT_HWCAP);
+#  endif
+
      if (*isize == 0) {
          *isize = qemu_getauxval(AT_ICACHEBSIZE);
      }
      if (*dsize == 0) {
          *dsize = qemu_getauxval(AT_DCACHEBSIZE);
      }
+
+#  ifdef PPC_FEATURE_ICACHE_SNOOP
+    have_coherent_icache = (hwcap & PPC_FEATURE_ICACHE_SNOOP) != 0;
+#  endif


Better with only one ifdef, moving this second hunk up.

It would be nice if there were some kernel documentation for this...


r~

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache, Nicholas Piggin, 2022/05/19
- Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache, Richard Henderson <=
  - Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache, Nicholas Piggin, 2022/05/19
    - Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache, Richard Henderson, 2022/05/20

Prev by Date: Re: [PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering
Next by Date: [PATCH v2 00/12] Change helper declarations to use call flags
Previous by thread: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache
Next by thread: Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache
Index(es):
- Date
- Thread