
Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
Date: Sun, 29 Mar 2009 16:42:50 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

On Sun, Mar 29, 2009 at 03:34:53PM +0200, Aurelien Jarno wrote:
> On Sat, Mar 28, 2009 at 05:18:34PM -0700, Nathan Froyd wrote:
> > On Sat, Mar 28, 2009 at 11:54:43PM +0100, Aurelien Jarno wrote:
> > > On Sat, Mar 28, 2009 at 02:30:13PM -0700, Nathan Froyd wrote:
> > > > I am not a TCG expert, but there are several loops in TCG over all
> > > > globals and it seems like those loops would go faster if they didn't
> > > > have to consider registers that would never be touched.  If this patch
> > > > series makes no difference in TCG's performance, then I'd be glad to
> > > > have an explanation of why that's the case.
> > > 
> > > Have you actually run a benchmark with those changes? TCG is
> > > sometimes a bit strange: some optimizations do not change the
> > > execution speed, while others improve it a lot. It is very difficult to
> > > predict what will give a gain or not.
> > > 
> > > Suggestions of benchmarks: gzip/bzip2 on a big file using user emulation
> > > or a compilation in system emulation.
> > 
> > Benchmarking?  Pffft. ;)
> > 
> > A benchmarking session with qemu-ppc and bzip2/bunzip2 on ~400MB files
> > and a 603e emulated CPU suggests that these changes are not terribly
> > beneficial (maybe 1% improvement, if that).  I don't imagine that a
> > similarly stressful benchmark in system emulation would be much
> > different.  Consider the patch series withdrawn.
> > 
> 
> I have done some profiling on qemu-system-ppc and qemu-system-mips. You
> are actually right that the loop over the list of TCG variables takes time.
> This is mainly due to the call to save_globals() for TCG ops marked
> as TCG_OPF_CALL_CLOBBER.
> 
> However, it looks like it would be better to address this comment first,
> before trying to reduce the number of TCG variables:
> 
>             /* XXX: for load/store we could do that only for the slow path
>                (i.e. when a memory callback is called) */
> 

Thinking about it a bit more, I think we should avoid mapping FPU registers
as global TCG variables. Those variables are mostly modified by helpers
(except for moves and loads/stores), and they will be written back to
memory before the call to the helper. This means TCG can't delay the
memory accesses, so there is very little (or no) difference in the
generated code whether the FPU register is accessed through a global TCG
variable or through tcg_gen_ld_tl().
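
To illustrate the bookkeeping cost, here is a toy model in C. It is not
QEMU code: ToyTcg, toy_touch(), toy_save_globals() and toy_run() are
made-up names, and the counts of 32 and 64 globals are only illustrative.
It assumes that every helper call is preceded by a pass over all globals
which writes back the ones currently held in a host register.

```c
#include <assert.h>

/* Toy model only: none of these names exist in QEMU. */
enum { TOY_MAX_GLOBALS = 128 };

typedef enum { TOY_VAL_MEM, TOY_VAL_REG } ToyValState;

typedef struct {
    ToyValState state[TOY_MAX_GLOBALS];
    int nb_globals;
    long visits;   /* loop iterations spent scanning globals */
    long stores;   /* write-backs actually performed */
} ToyTcg;

void toy_init(ToyTcg *s, int nb_globals)
{
    assert(nb_globals > 0 && nb_globals <= TOY_MAX_GLOBALS);
    s->nb_globals = nb_globals;
    s->visits = 0;
    s->stores = 0;
    for (int i = 0; i < TOY_MAX_GLOBALS; i++) {
        s->state[i] = TOY_VAL_MEM;
    }
}

/* A generated op modifies global idx, so it now lives in a host register. */
void toy_touch(ToyTcg *s, int idx)
{
    assert(idx >= 0 && idx < s->nb_globals);
    s->state[idx] = TOY_VAL_REG;
}

/* Before a helper call: scan every global, store back the ones in registers. */
void toy_save_globals(ToyTcg *s)
{
    for (int i = 0; i < s->nb_globals; i++) {
        s->visits++;
        if (s->state[i] == TOY_VAL_REG) {
            s->stores++;
            s->state[i] = TOY_VAL_MEM;
        }
    }
}

/* Each iteration touches one integer register, then calls an FPU helper. */
ToyTcg toy_run(int nb_globals, int n_helper_calls)
{
    ToyTcg s;
    toy_init(&s, nb_globals);
    for (int c = 0; c < n_helper_calls; c++) {
        toy_touch(&s, 0);
        toy_save_globals(&s);
    }
    return s;
}
```

In this model, toy_run(32, 1000) and toy_run(64, 1000) both perform 1000
write-backs, but the second one scans twice as many entries (64000 instead
of 32000). The extra globals add scanning work without saving any memory
access, which is the point made above. It says nothing about how large the
effect is in real TCG.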

I ran the test with qemu-system-mips and measured a speed gain of
around 1%.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net



