Re: [Qemu-devel] [RFC][PATCH v0 0/8] Improve register allocator


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [RFC][PATCH v0 0/8] Improve register allocator
Date: Tue, 24 May 2011 14:40:03 +0200
User-agent: Mutt/1.5.20 (2009-06-14)

On Tue, May 24, 2011 at 03:31:11PM +0400, Kirill Batuzov wrote:
> 
> 
> On Mon, 23 May 2011, Aurelien Jarno wrote:
> 
> > 
> > Thanks for this patch series. Your approach to solving this issue is
> > really different from mine. Instead I added more states to the dead/live
> > states, and use them to mark some inputs as dead even for globals, and to
> > mark some output arguments to be synced. This information is then used
> > directly in the tcg_reg_alloc_* functions to make better use of the
> > available registers. On the other hand my patch series only tries to
> > lower the number of spills and doesn't try to make better spill
> > choices.
> > 
> > I guess it would be a good idea for me to continue with this approach (I
> > basically just have to fix a few cases where some regs are wrongly copied
> > back to memory), so that we can more easily compare the two approaches.
> > Your last patch is interesting in any case; having some statistics is
> > always helpful.
> > 
> > In any case I really think we need a better register allocator before we
> > can do any serious optimization passes like constant or copy propagation,
> > otherwise we end up with a lot of registers in use for no real reason.
> >
> When I started working on this patch series I first wanted to write a
> better register allocator, something linear scan based.  But TBs
> currently have a quite specific and very simple structure.  They have
> globals, which are alive everywhere, and temps, which are grouped into a
> number of nests.  Each nest is the result of translating one guest
> instruction.  Live ranges of temps in one nest always intersect, while
> live ranges of temps from different nests never intersect.  As a result,
> a more sophisticated algorithm applied to this case behaves very much
> like the simple greedy algorithm we have right now.
> 
> The gathered statistics show some interesting things too. I've run a
> matrix multiplication benchmark (guest - ARM, host - x86, linux-user mode,
> with my patches applied) and here are the results:
> 
> spill count         3916
>   real spills       32
>   spills at bb end  1023
>   spills at call:
>     globals         2755
>     iarg passing    0
>     call clobbers   106
> 
> Real spills are spills generated by the register allocator when it runs
> out of registers.  They are less than 1% of all spills.  Other tests show
> similar behavior.
> 
> I think any further improvement to the register allocator is useless
> unless the conventions about saving globals at calls and BB ends are
> relaxed somehow.
> 

That's actually why in my implementation I distinguish between saving a
global back to memory and synchronizing it with memory while keeping it
in a register if it is not written afterwards. A lot of calls (especially
qemu ld/st) actually do not need the global to be back in memory; they
only need it to be synchronized with the value in memory in case an
exception happens. Doing so saves the memory-to-register moves after the
call.
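
To make the save/sync distinction concrete, here is a minimal sketch in C.
It is only a toy model: the GlobalState struct and the global_sync(),
global_save(), global_use() and loads_after_call() helpers are hypothetical
names made up for illustration, not the actual TCG data structures or the
code from either patch series. It counts the moves emitted for a global
that is written, then crosses a call, then is read again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one TCG global; not QEMU's real TCGTemp. */
typedef struct {
    bool in_reg;     /* value currently lives in a host register   */
    bool mem_valid;  /* memory copy matches the register copy      */
    int  stores;     /* register-to-memory moves emitted so far    */
    int  loads;      /* memory-to-register moves emitted so far    */
} GlobalState;

/* Sync: write the register back if memory is stale, keep the register. */
static void global_sync(GlobalState *g)
{
    if (g->in_reg && !g->mem_valid) {
        g->stores++;
        g->mem_valid = true;
    }
}

/* Save: write back like a sync, then give up the host register. */
static void global_save(GlobalState *g)
{
    global_sync(g);
    g->in_reg = false;
}

/* Read the global: a reload is needed only if it left its register. */
static void global_use(GlobalState *g)
{
    if (!g->in_reg) {
        g->loads++;
        g->in_reg = true;
    }
}

/*
 * Model "write global; call helper; read global" and return how many
 * reloads the read after the call costs.
 */
static int loads_after_call(bool sync_only)
{
    GlobalState g = { .in_reg = true, .mem_valid = false };

    if (sync_only) {
        global_sync(&g);   /* helper only needs memory to be up to date */
    } else {
        global_save(&g);   /* current convention: register is dropped   */
    }
    global_use(&g);
    return g.loads;
}
```

In this model both variants emit the same single store before the call;
the difference is only the reload afterwards, which is what the sync
state is meant to avoid. The sketch assumes the call does not clobber the
register holding the global.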

Anyway, your statistics actually show what I was trying to say in the
TCG_AREG0 thread: if we use the host registers correctly, TCG won't
really be able to use another register.
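
For reference, the shares behind the quoted numbers can be re-derived
with a few lines of C. The counts are copied from the statistics quoted
above; the SpillKind table and the spill_total(), spill_share() and
print_breakdown() helpers are hypothetical names used only for this
sketch and are not part of the statistics patch.

```c
#include <stdio.h>

/* One row of the quoted spill statistics. */
typedef struct {
    const char *name;
    unsigned count;
} SpillKind;

/* Counts taken verbatim from the benchmark output quoted in this mail. */
static const SpillKind spill_kinds[] = {
    { "real spills",        32   },
    { "spills at bb end",   1023 },
    { "call: globals",      2755 },
    { "call: iarg passing", 0    },
    { "call: clobbers",     106  },
};

enum { N_KINDS = sizeof(spill_kinds) / sizeof(spill_kinds[0]) };

/* Sum of all categories. */
static unsigned spill_total(void)
{
    unsigned total = 0;
    for (int i = 0; i < N_KINDS; i++) {
        total += spill_kinds[i].count;
    }
    return total;
}

/* Share of category i in percent of the total. */
static double spill_share(int i)
{
    return 100.0 * spill_kinds[i].count / spill_total();
}

/* Print one line per category with its percentage. */
static void print_breakdown(void)
{
    for (int i = 0; i < N_KINDS; i++) {
        printf("%-20s %5u  %5.1f%%\n",
               spill_kinds[i].name, spill_kinds[i].count, spill_share(i));
    }
    printf("%-20s %5u\n", "total", spill_total());
}
```

The five categories add up to the reported 3916. Globals saved at calls
account for roughly 70% and spills at BB ends for roughly 26%, so the
two conventions together are responsible for about 96% of all spills.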

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net
