Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination
Date: Sun, 15 May 2011 16:03:52 +0200
User-agent: Mutt/1.5.20 (2009-06-14)

On Sun, May 15, 2011 at 04:42:05PM +0300, Blue Swirl wrote:
> On Sun, May 15, 2011 at 4:02 PM, Aurelien Jarno <address@hidden> wrote:
> > On Sun, May 15, 2011 at 03:37:00PM +0300, Blue Swirl wrote:
> >> On Sun, May 15, 2011 at 3:14 PM, Laurent Desnogues
> >> <address@hidden> wrote:
> >> > On Sun, May 15, 2011 at 1:33 PM, Blue Swirl <address@hidden> wrote:
> >> > [...]
> >> >>> x86_64 uses r14 as TCG_AREG0. Despite the instructions being quite
> >> >>> simple (only 2 movi_i32), the resulting code makes 2 access to env to
> >> >>> save the two registers. Having to reload the env pointer each time to a
> >> >>> register would clearly increase the size of this TB.
> >> >>
> >> >> I don't think TCG would be that simple, instead the pointer would be
> >> >> loaded only once in this case.
> >> >
> >> > Assuming TCG was able to allocate a register for that,
> >> > it would be live at most for one TB, so you'd have to
> >> > load it at least once per TB, and with block chaining
> >> > that wouldn't be efficient as you'd keep on reloading it.
> >>
> >> Yes, but if there are better uses, the register can be flushed. Now
> >> this is not possible since the register is always unavailable.
> >>
> >
> > What are the better uses, that justify to flush a register that is going
> > to be used three or four host asm later?
> 
> It would obviously replace something else determined by TCG.

The register will be free only for a few host instructions. Could you
please give a more concrete example of such a use?

> > In the current generated code, roughly one every four instruction
> > reference TCG_AREG0, so this register is really needed very often.
> >
> > If you think TCG will be faster by having one more register in between
> > I suggest you to first optimize tcg_reg_alloc(), which simply spill
> > a random register, even if they are some allocated register that won't
> > be used until the end of the TB. You should also should check how often
> > TCG spills a register (in which case it would have benefit from one more
> > register). It happens less than 2000 times when booting an emulated mips
> > system on x86_64, while more than 160000 TB are generated.
> 
> Right, on a modern CPU with lots of registers, one additional register
> won't be helpful, but on i386 the situation should be very different,
> there are very few registers.
> 

On i386, I do indeed get a lot more register spills: 340000. Still,
that number is not that high; it is less than two per TB. If we
consider these register spills to be pure loss (which is not always
the case: sometimes the spilled register is never actually used later,
so it is just an anticipated save), that is 4 loads/stores per TB.

This means that, to compensate, the env register must not be loaded
more than 4 times in a TB, which looks quite difficult to achieve given
how often this register is used.
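
To make the break-even arithmetic explicit, here is a rough sketch. It
assumes each spill costs one store plus one later reload (2 memory
accesses) and uses the "fewer than two spills per TB" bound from above.
The function names are mine, purely for illustration; nothing here is
actual TCG code.

```python
def spill_mem_ops_per_tb(spills_per_tb, mem_ops_per_spill=2):
    """Upper bound on memory accesses per TB caused by register spills.

    Assumes every spill is pure loss: one store plus one later reload.
    """
    return spills_per_tb * mem_ops_per_spill


def freeing_env_register_pays_off(env_loads_per_tb, spills_per_tb):
    """True when reloading env costs no more than the spill traffic
    that the extra register could remove in the best case."""
    return env_loads_per_tb <= spill_mem_ops_per_tb(spills_per_tb)


# Bound from the i386 measurement: at most 2 spills per TB.
budget = spill_mem_ops_per_tb(2)
print(budget)
print(freeing_env_register_pays_off(4, 2))
print(freeing_env_register_pays_off(5, 2))
```

With that bound the budget is 4 env loads per TB, and any TB that needs
to reload env a fifth time is already a net loss, even if the freed
register removed every single spill.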

Please also note that spilling globals currently needs access to the env
pointer, which might not be loaded, so another register spill is needed
to load it. Avoiding a deadlock there (probably by spilling local temps
first) will make the code a lot more complex than it is now.
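
As a toy model of that ordering constraint (not tcg_reg_alloc() itself),
assume each allocated host register holds either a "global", which is
saved through the env pointer, or a "local" temp, which can be saved
without env. The function name and the dict layout are hypothetical.

```python
def choose_spill_victim(allocated, env_loaded):
    """Pick a host register to spill.

    allocated:  dict mapping host register name -> "global" or "local"
    env_loaded: True if the env pointer currently sits in a register
    """
    if not allocated:
        raise ValueError("no allocated register to spill")
    if env_loaded:
        # Any register can be saved; take the first one for simplicity.
        return next(iter(allocated))
    # env is not available: only a local temp can be saved without it.
    for reg, kind in allocated.items():
        if kind == "local":
            return reg
    # Saving a global needs env, and loading env needs a free register.
    raise RuntimeError("deadlock: only globals allocated and env not loaded")


regs = {"eax": "global", "ebx": "local", "ecx": "global"}
print(choose_spill_victim(regs, env_loaded=False))
print(choose_spill_victim(regs, env_loaded=True))
```

The RuntimeError branch is the case the allocator would have to rule out
by construction, which is where the extra complexity comes from.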

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net


