From: Paul Brook
Subject: Re: [Qemu-devel] sh : performance problem
Date: Wed, 4 Mar 2009 02:59:18 +0000
User-agent: KMail/1.9.9

> > > Great :) But we're still far from arm :(

> By the way, does someone know why there is some kind of "tlb management
> code" in exec.c ??
>
> Does the SH4 architecture have special features that can't be handled in
> a generic code ? Or are we just rewriting some code that is already
> there ... ?

I think you're missing the most important difference: SH uses a software-managed 
TLB, whereas ARM uses a hardware-managed TLB.

The main consequence of this is that we don't have to model the actual ARM TLB 
at all; it is never directly visible to the guest. We effectively implement an 
infinitely large TLB.

For SH the TLB is programmed directly, so we end up having to maintain two 
TLBs: the qemu TLB and the architectural SH TLB. For correct operation, pages 
must be removed from the qemu TLB when they are evicted/replaced in the SH 
TLB. The SH TLB is quite small, and flushing qemu TLB entries is quite 
expensive, so this results in fairly poor performance.
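
To make that concrete, here is a rough standalone sketch of the double 
bookkeeping (not the actual target-sh4 code; names and types are simplified, 
and the real flush would go through exec.c's tlb_flush_page()):

/* Standalone sketch only, not qemu source.  The point is the double
 * bookkeeping: replacing an entry in the small architectural TLB has to
 * knock the old page out of the qemu softmmu TLB as well, otherwise a
 * stale translation stays visible to the guest. */
#include <stdint.h>
#include <stdio.h>

#define SH_UTLB_SIZE 64          /* the SH-4 UTLB has 64 entries */

struct sh_tlb_entry {
    uint32_t vpn;                /* virtual page number */
    int valid;
};

static struct sh_tlb_entry sh_utlb[SH_UTLB_SIZE];

/* stand-in for exec.c's tlb_flush_page(env, addr) */
static void qemu_tlb_flush_page(uint32_t vaddr)
{
    printf("flush qemu TLB entry for 0x%08x\n", vaddr);
}

/* guest loads a new entry into slot idx */
static void sh_tlb_write(unsigned idx, uint32_t vpn)
{
    struct sh_tlb_entry *e = &sh_utlb[idx % SH_UTLB_SIZE];

    if (e->valid && e->vpn != vpn) {
        /* the replaced page must disappear from the qemu TLB too; this
         * is the expensive part when the guest recycles its 64 entries
         * frequently */
        qemu_tlb_flush_page(e->vpn);
    }
    e->vpn = vpn;
    e->valid = 1;
}

int main(void)
{
    sh_tlb_write(0, 0x00400000);
    sh_tlb_write(0, 0x00800000);  /* evicts 0x00400000 -> flush */
    return 0;
}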

MIPS has a similar problem. However, in that case the most common TLB 
operations do not directly expose the TLB state. In particular, when setting a 
new TLB entry it is unspecified which TLB entry is replaced. At that point 
the OS can't know which entry was evicted, so we can lie and not evict pages 
until the guest does something that allows it to determine the exact TLB 
state. In practice this is sufficient to make mips-linux work reasonably well.
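
A very rough sketch of that deferral idea, again a standalone illustration 
rather than the real target-mips code:

/* "Lie until caught": a TLBWR-style write replaces an unspecified entry,
 * so instead of flushing the old page from the qemu TLB immediately we
 * just remember it; the flushes only happen when the guest reads the TLB
 * back and could observe the real contents. */
#include <stdint.h>
#include <stdio.h>

static uint32_t deferred[64];
static int n_deferred;

static void qemu_tlb_flush_page(uint32_t vaddr)
{
    printf("flush qemu TLB entry for 0x%08x\n", vaddr);
}

/* write to a randomly-selected entry: defer the eviction */
static void mips_tlb_write_random(uint32_t evicted_vpn)
{
    if (n_deferred < 64) {
        deferred[n_deferred++] = evicted_vpn;   /* don't flush yet */
    } else {
        qemu_tlb_flush_page(evicted_vpn);       /* fall back if full */
    }
}

/* read the TLB contents back: now the guest could notice, so settle up */
static void mips_tlb_read(void)
{
    while (n_deferred > 0) {
        qemu_tlb_flush_page(deferred[--n_deferred]);
    }
}

int main(void)
{
    mips_tlb_write_random(0x00400000);  /* nothing flushed yet */
    mips_tlb_read();                    /* flushes happen here */
    return 0;
}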

I'm not sure if the same is possible for SH. It probably depends on whether URC 
is visible to/used by the guest.

Large pages add even more complications. The qemu TLB can only handle a single 
page size. In practice this means that when large pages are used, invalidating 
a single page entry requires the whole qemu TLB to be flushed. I'm pretty sure 
x86 gets this wrong and works mainly by chance (nothing actually uses large 
pages enough to notice it's broken). ARM takes the hit of a full TLB flush 
(linux breaks if you only flush a 1k region of a 4k entry), but single page 
flushes are rare, so in practice this doesn't hurt too much.
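
In code the large-page problem boils down to something like the following 
(illustrative only; invalidate_guest_mapping is a made-up helper, and the two 
real paths in qemu are roughly tlb_flush_page() versus a full tlb_flush()):

/* Sketch: the qemu TLB only knows about one page size, so invalidating a
 * mapping bigger than that can't be done per-entry and falls back to a
 * full flush. */
#include <stdint.h>
#include <stdio.h>

#define QEMU_PAGE_SIZE 4096u

static void qemu_tlb_flush_page(uint32_t vaddr)
{
    printf("flush qemu TLB page 0x%08x\n", vaddr);
}

static void qemu_tlb_flush_all(void)
{
    printf("flush the whole qemu TLB\n");
}

static void invalidate_guest_mapping(uint32_t vaddr, uint32_t size)
{
    if (size <= QEMU_PAGE_SIZE) {
        qemu_tlb_flush_page(vaddr);
    } else {
        /* e.g. a 1MB ARM section or a 64k SH page: the qemu TLB may hold
         * any number of 4k entries inside it, so flush everything */
        qemu_tlb_flush_all();
    }
}

int main(void)
{
    invalidate_guest_mapping(0x00400000, 4096);          /* cheap */
    invalidate_guest_mapping(0x00800000, 1024 * 1024);   /* full flush */
    return 0;
}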

Paul



