Re: [Qemu-devel] TB chaining in QEMU
From: Peter Maydell
Subject: Re: [Qemu-devel] TB chaining in QEMU
Date: Mon, 30 Jan 2012 12:33:14 +0000
On 27 January 2012 02:55, Xin Tong <address@hidden> wrote:
> I think intel new architecture does split instruction cache/data cache.
> http://upload.wikimedia.org/wikipedia/commons/6/64/Intel_Nehalem_arch.svg
It may have separate I and D caches in the implementation, but from
the programmer's point of view they are unified (i.e. the hardware
maintains coherency between the two caches), because the
x86 architecture requires this.
> But I do not know what kind of inconsistency you refer to if the icache and
> dcache are split. can you please give an example.
Basic example: suppose we have a small function at address 0x1000
and another at 0x1010 (so they are in the same cache line), and
a really simple setup with an L1 ICache, L1 DCache and main memory.
* we execute the function at 0x1000 -- this pulls the cache line
into the ICache
* we then modify the code in the function at 0x1010: this is
a read-modify-write, which pulls the cache line into the DCache.
However, when we write the new code the change just sits in the
DCache.
* so now we have a copy of the old code in the ICache, and the new
version in the DCache. At this point the caches are incoherent,
and if we just tried to call the function at 0x1010 we'd be
executing the wrong code
* on ARM, to correct this the program has to perform explicit
cache maintenance operations:
  1. clean the DCache: this forces 'dirty' lines in the DCache
     to be written out, in this case to main memory
  2. invalidate the ICache: this causes the ICache to forget the
     old, stale cached data it holds, so the next access will reload
     the ICache from main memory
* now if we call the function at 0x1010 it will see the changed
code that we wrote
The x86 architecture doesn't need the cache maintenance operations
because it requires the hardware to deal with it (for instance,
by having the ICache "snoop" writes to the DCache and automatically
invalidate any lines it has that are written to) so it can't get
into an incoherent state.
CPU architectures that require explicit cache maintenance for
self-modifying code are, I think, more common than ones like x86
which don't. (x86 is basically forced to be this way to maintain
backwards compatibility with old code written before there were
any caches for x86.)
What this means in practice for TCG is that once we've written out
some code we need to call flush_icache_range() for that memory.
The x86 implementation of that function is just a no-op.
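As a rough illustration of the "write the code, then flush" pattern, here is a small sketch. It is not QEMU's actual flush_icache_range(); write_code_and_flush() is a hypothetical helper name, and I'm assuming a GCC or Clang compiler so that the __builtin___clear_cache() builtin is available. That builtin performs whatever the target needs to make the range visible to instruction fetch, and typically compiles to nothing on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): copy freshly generated code into
 * buf, then make that range coherent for instruction fetch.
 */
static void write_code_and_flush(uint8_t *buf, const uint8_t *code,
                                 size_t len)
{
    /* The new code goes through the data side, like any other store. */
    memcpy(buf, code, len);

    /*
     * GCC/Clang builtin: on targets that need explicit maintenance this
     * cleans the data cache and invalidates the instruction cache for
     * [buf, buf + len); on x86 it typically does nothing.
     */
    __builtin___clear_cache((char *)buf, (char *)(buf + len));
}
```

A code generator would call something like this once per block of emitted code, before jumping to it. Actually executing the buffer also needs executable memory, which this sketch leaves out.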
-- PMM