qemu-devel

Re: [Qemu-devel] TB chaining in QEMU


From: Peter Maydell
Subject: Re: [Qemu-devel] TB chaining in QEMU
Date: Mon, 30 Jan 2012 12:33:14 +0000

On 27 January 2012 02:55, Xin Tong <address@hidden> wrote:
> I think intel new architecture does split instruction cache/data cache.
> http://upload.wikimedia.org/wikipedia/commons/6/64/Intel_Nehalem_arch.svg

It may have separate I and D caches in the implementation, but from
the programmer's point of view they are unified (i.e. the hardware
maintains coherency between the two caches), because the
x86 architecture requires this.

> But I do not know what kind of inconsistency you refer to if the icache and
> dcache are split. can you please give an example.

Basic example: suppose we have a small function at address 0x1000
and another at 0x1010 (so they are in the same cache line), and
a really simple setup with an L1 ICache, an L1 DCache and main memory.

 * we execute the function at 0x1000 -- this pulls the cache line
   into the ICache
 * we then modify the code in the function at 0x1010: this is
   going to be a read, modify, write, which will pull the cache line
   into the DCache. However when we write the new code this change
   will just sit in the DCache.
 * so now we have a copy of the old code in the ICache, and the new
   version in the DCache. At this point the caches are incoherent,
   and if we just tried to call the function at 0x1010 we'd be
   executing the wrong code
 * on ARM, to correct this the program has to perform explicit
   cache maintenance operations:
     1. clean the DCache: this forces 'dirty' lines in the DCache
        to be written out, in this case to main memory
     2. invalidate the ICache: this causes the ICache to forget the
        old, stale cached data it holds, so the next access will reload
        the ICache from main memory
 * now if we call the function at 0x1010 it will see the changed
   code that we wrote
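
To make the sequence above concrete, here is a toy model in C. It is
only a sketch: the single-line caches, the 32-byte line size and all of
the toy_* names are assumptions made up for illustration, not how any
real hardware or QEMU models caches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u        /* assumed line size for this toy model */
#define LINE_BASE 0x1000u    /* the one line we model: 0x1000..0x101f */

/* A cache that can hold exactly one line; enough for the example. */
typedef struct {
    int valid;
    int dirty;
    uint8_t data[LINE_SIZE];
} ToyCache;

typedef struct {
    uint8_t mem[LINE_SIZE];  /* main memory backing that line */
    ToyCache icache;
    ToyCache dcache;
} ToySystem;

/* Instruction fetch: only ever looks at the ICache. */
uint8_t toy_fetch(ToySystem *s, uint32_t addr)
{
    if (!s->icache.valid) {
        memcpy(s->icache.data, s->mem, LINE_SIZE);
        s->icache.valid = 1;
    }
    return s->icache.data[addr - LINE_BASE];
}

/* Data store: write-back, so the new byte only lands in the DCache. */
void toy_store(ToySystem *s, uint32_t addr, uint8_t val)
{
    if (!s->dcache.valid) {
        memcpy(s->dcache.data, s->mem, LINE_SIZE);
        s->dcache.valid = 1;
    }
    s->dcache.data[addr - LINE_BASE] = val;
    s->dcache.dirty = 1;
}

/* Step 1: clean the DCache (write dirty data out to main memory). */
void toy_clean_dcache(ToySystem *s)
{
    if (s->dcache.valid && s->dcache.dirty) {
        memcpy(s->mem, s->dcache.data, LINE_SIZE);
        s->dcache.dirty = 0;
    }
}

/* Step 2: invalidate the ICache (forget the stale copy). */
void toy_invalidate_icache(ToySystem *s)
{
    s->icache.valid = 0;
}
```

In this model, a toy_fetch() of 0x1000 followed by a toy_store() to
0x1010 leaves toy_fetch(0x1010) returning the old byte. It keeps
returning the old byte after toy_clean_dcache() alone, and only
returns the new one once toy_invalidate_icache() has also been called.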

The x86 architecture doesn't need the cache maintenance operations
because it requires the hardware to deal with it (for instance,
by having the ICache "snoop" writes to the DCache and automatically
invalidate any lines it has that are written to) so it can't get
into an incoherent state.
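
A very rough sketch of the snooping idea, again as a toy model in C.
The SnoopModel type and the snoop_* functions are hypothetical, and the
store is modelled as write-through to memory purely to keep the example
short; the only point is that with snooping enabled a store knocks the
stale line out of the ICache by itself.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t mem;         /* one byte of "code" in main memory */
    uint8_t icache;      /* the ICache's copy of that byte */
    int icache_valid;
    int snoop;           /* 1 = ICache snoops stores, 0 = it does not */
} SnoopModel;

/* Instruction fetch: refill from memory only if the line is invalid. */
uint8_t snoop_fetch(SnoopModel *m)
{
    if (!m->icache_valid) {
        m->icache = m->mem;
        m->icache_valid = 1;
    }
    return m->icache;
}

/* Data store (write-through here, a simplification for this sketch). */
void snoop_store(SnoopModel *m, uint8_t val)
{
    m->mem = val;
    if (m->snoop) {
        /* Hardware coherency: the ICache sees the write and drops
         * its copy, so the next fetch reloads the new byte. */
        m->icache_valid = 0;
    }
}
```

With snoop set to 0, a fetch after a store still returns the old byte;
with snoop set to 1, it returns the new one without any explicit
maintenance operation.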

CPU architectures that require explicit cache maintenance for
self-modifying code are, I think, more common than ones like x86
that don't. (x86 is basically forced to be this way to maintain
backwards compatibility with old code written before there were
any caches for x86.)

What this means in practice for TCG is that once we've written out
some generated code we need to call flush_icache_range() for that
memory before executing it. The x86 implementation of that function
is just a no-op.
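
For illustration only, the same "write the code, then flush" pattern
can be sketched outside QEMU using the GCC/Clang builtin
__builtin___clear_cache(). The helper names sync_code_range() and
emit_code() below are hypothetical and this is not QEMU's
flush_icache_range() implementation; as far as I know the builtin has
no effect on targets such as x86 that don't need explicit flushes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: make [start, start + len) coherent for
 * instruction fetch, using the compiler builtin. */
static inline void sync_code_range(void *start, size_t len)
{
    char *begin = (char *)start;
    __builtin___clear_cache(begin, begin + len);
}

/* Hypothetical helper: copy freshly generated code bytes into buf and
 * then do the cache maintenance for exactly the bytes written.
 * Returns the number of bytes emitted, or 0 if they do not fit. */
size_t emit_code(uint8_t *buf, size_t buf_len,
                 const uint8_t *code, size_t code_len)
{
    if (code_len > buf_len) {
        return 0;
    }
    memcpy(buf, code, code_len);     /* the write goes via the DCache */
    sync_code_range(buf, code_len);  /* clean DCache, invalidate ICache */
    return code_len;
}
```

Note that this only shows the ordering (write first, then flush the
range that was written); actually executing the buffer would also need
it to be mapped executable, which is left out here.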

-- PMM


