qemu-devel

Re: [Qemu-devel] sh : performance problem


From: Lionel Landwerlin
Subject: Re: [Qemu-devel] sh : performance problem
Date: Tue, 03 Mar 2009 23:28:32 +0100

On Tuesday, 03 March 2009 at 20:25 +0100, Laurent Desnogues wrote:
> On Tue, Mar 3, 2009 at 7:57 PM, Lionel Landwerlin
> <address@hidden> wrote:
> > On Wednesday, 04 March 2009 at 00:46 +0900, Shin-ichiro KAWASAKI wrote:
> [...]
> >>   sh4 : 5.8 [seconds]     O(n) utlb search.
> >>   sh4 : 4.6 [seconds]     O(log2(n)) utlb search.
> >>   sh4 : 4.1 [seconds]     O(1) utlb search by Lionel
> >>   arm : 0.8 [seconds]     (-M versatilepb + Debian ARM)
> >>
> >> Your patch has a nice score!
> >
> > Great :) But we're still far from arm :(
> 
> It would be interesting if you could run oprofile of both platforms
> (and even better if that was with call-graph output).
> 
> >> > +#if !defined(CONFIG_USER_ONLY)
> >> > +    /* vpn to utlb entry caches (too much space for user emulation) */
> >> > +    uint8_t utlbs_1k[4194304]; /* 2^22 => 4 Mb */
> >> > +    uint8_t utlbs_4k[1048576]; /* 2^20 => 1 Mb */
> >> > +    uint8_t utlbs_64k[65536]; /* 2^16 => 64 Kb */
> >> > +    uint8_t utlbs_1m[4096]; /* 2^12 => 4 Kb */
> >> > +#endif
> >> >  } CPUSH4State;
> >>
> >> Isn't that too extravagant?
> >> How about allocating them on demand?
> >> I guess sh-linux generally uses only utlbs_4k[].
> >> If so, the 4 Mb utlbs_1k[] is a waste.
> >>
> >
> > sh-linux can also use huge pages of 64k and 1M.
> >
> > I think it is important to keep the emulation as close as possible to
> > the real CPU capabilities.
> 
> I think Shin-ichiro was rather pointing at the number of 1k pages,
> which are probably not used.  OTOH I agree being close to
> CPU capabilities is important.

The problem is that when a virtual address must be resolved, the CPU
emulator cannot know which page size (if any) was used to map that
address. Thus we must handle the worst case...
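
To make the worst-case point concrete, here is a minimal sketch of a
lookup that probes one table per page size. It is not the code from the
patch: UtlbCache, the utlb_cache_* helpers and the "utlb index + 1, with
0 meaning empty" encoding are names and choices made up for this
illustration. It only assumes 32-bit virtual addresses and a 64-entry
UTLB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define UTLB_ENTRIES 64   /* number of UTLB entries on SH4 */
#define NO_ENTRY     0    /* assumption: 0 = empty slot, else utlb index + 1 */

/* Hypothetical container: one table per page size (1k, 4k, 64k, 1M). */
typedef struct {
    uint8_t *tab[4];
} UtlbCache;

/* log2 of each page size, in the same order as tab[] */
static const unsigned utlb_page_shift[4] = { 10, 12, 16, 20 };

static int utlb_cache_init(UtlbCache *c)
{
    for (int i = 0; i < 4; i++) {
        /* 32-bit address space: 2^(32 - shift) one-byte slots per table */
        c->tab[i] = calloc((size_t)1 << (32 - utlb_page_shift[i]), 1);
        if (!c->tab[i]) {
            while (i-- > 0)
                free(c->tab[i]);
            return -1;
        }
    }
    return 0;
}

static void utlb_cache_free(UtlbCache *c)
{
    for (int i = 0; i < 4; i++)
        free(c->tab[i]);
}

/* Record that UTLB entry idx maps the page of the given size class
 * (0 = 1k ... 3 = 1M) that contains vaddr. */
static void utlb_cache_set(UtlbCache *c, unsigned size_class,
                           uint32_t vaddr, unsigned idx)
{
    assert(size_class < 4 && idx < UTLB_ENTRIES);
    c->tab[size_class][vaddr >> utlb_page_shift[size_class]] =
        (uint8_t)(idx + 1);
}

/* Forget the mapping, e.g. when the UTLB entry is replaced. */
static void utlb_cache_clear(UtlbCache *c, unsigned size_class,
                             uint32_t vaddr)
{
    assert(size_class < 4);
    c->tab[size_class][vaddr >> utlb_page_shift[size_class]] = NO_ENTRY;
}

/* The page size is unknown at lookup time, so every table is probed.
 * Returns the UTLB index, or -1 if no table has a mapping. */
static int utlb_cache_lookup(const UtlbCache *c, uint32_t vaddr)
{
    for (int i = 0; i < 4; i++) {
        uint8_t e = c->tab[i][vaddr >> utlb_page_shift[i]];
        if (e != NO_ENTRY)
            return e - 1;
    }
    return -1;
}
```

The lookup cost is a fixed four array reads whatever the guest mapped,
which is why each table has to cover the whole address space at its own
granularity.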

And after all, about 5 MB isn't that much, considering that in system
emulation the guest physical memory can be mapped anywhere in the host
virtual address space, unlike user emulation, which has tighter
constraints...
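
For reference, the total comes straight from the four array sizes in the
quoted patch. The macro and function names below are made up for this
check; only the sizes are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* One uint8_t slot per possible virtual page number (32-bit addresses). */
#define UTLB_SLOTS_1K   ((size_t)1 << 22)   /* 4194304 */
#define UTLB_SLOTS_4K   ((size_t)1 << 20)   /* 1048576 */
#define UTLB_SLOTS_64K  ((size_t)1 << 16)   /*   65536 */
#define UTLB_SLOTS_1M   ((size_t)1 << 12)   /*    4096 */

/* Hypothetical helper: bytes added to the CPU state by the four caches. */
static size_t utlb_cache_total_bytes(void)
{
    return UTLB_SLOTS_1K + UTLB_SLOTS_4K + UTLB_SLOTS_64K + UTLB_SLOTS_1M;
}

/* 4194304 + 1048576 + 65536 + 4096 = 5312512 bytes, a bit over 5 MiB */
static_assert(UTLB_SLOTS_1K + UTLB_SLOTS_4K + UTLB_SLOTS_64K +
              UTLB_SLOTS_1M == 5312512, "unexpected total size");
```

The 1k table alone accounts for 4 of those 5 MB.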



By the way, does anyone know why there is some kind of "tlb management
code" in exec.c?

Does the SH4 architecture have special features that can't be handled in
generic code? Or are we just rewriting code that is already
there...?


-- 
Lionel Landwerlin <address@hidden>




