
Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination
Date: Sun, 15 May 2011 11:49:48 +0200
User-agent: Mutt/1.5.20 (2009-06-14)

On Sun, May 15, 2011 at 11:27:17AM +0200, Laurent Desnogues wrote:
> On Sun, May 15, 2011 at 9:15 AM, Blue Swirl <address@hidden> wrote:
> > On Sun, May 15, 2011 at 1:04 AM, Aurelien Jarno <address@hidden> wrote:
> >> On Sun, May 15, 2011 at 12:52:35AM +0300, Blue Swirl wrote:
> >>> On Sun, May 15, 2011 at 12:16 AM, Aurelien Jarno <address@hidden> wrote:
> >>> > On Sat, May 14, 2011 at 10:35:20PM +0300, Blue Swirl wrote:
> [...]
> >>> > The env register is used very often (basically for every load/store, but
> >>> > also a lot of helpers), so it makes sense to reserve a register for it.
> >>> >
> >>> > From what I understand of your patch series, you prefer to pass this
> >>> > register explicitly to TCG functions. This basically means this TCG
> >>> > global will be loaded into a host register as soon as it is used, but
> >>> > also regularly, as globals are saved back to their canonical location
> >>> > before a helper or a load/store.
> >>> >
> >>> > So it seems that this patch series will just allow the "env register"
> >>> > to change over time, though it will not spare one more register for the
> >>> > TCG code, and it will emit longer TCG code to regularly reload the env
> >>> > global into a host register.
> >>>
> >>> But there will be one more register available in some cases. In other
> >>
> >> Inside the TCG code, it will basically happen very rarely, given that
> >> load/store are really the most used instructions, and they need to load
> >> the env register.
> >
> > Not exactly, from a sample run with -d op_opt:
> > $ egrep -v -e '^$' -v -e 'OP after' -v -e ' end' -v -e 'Search PC'
> > /tmp/qemu.log | awk '{print $1}' | sort | uniq -c|sort -rn
> > 1673966 movi_i32
> >  653931 ld_i32
> >  607432 mov_i32
> >  428684 st_i32
> >  326878 movi_i64
> >  308626 add_i32
> >  283186 call
> >  256817 exit_tb
> >  207232 nopn
> >  189388 goto_tb
> >  122398 and_i32
> >  117997 shr_i32
> >  89107 qemu_ld32
> >  82926 set_label
> >  82713 brcond_i32
> >  67169 qemu_st32
> >  55109 or_i32
> >  46536 ext32u_i64
> >  44288 xor_i32
> >  38103 sub_i32
> >  26361 shl_i32
> >  23218 shl_i64
> >  23218 qemu_st64
> >  23218 or_i64
> >  20474 shr_i64
> >  20445 qemu_ld64
> >  11161 qemu_ld8u
> >  10409 qemu_st8
> >   5013 qemu_ld16u
> >   3795 qemu_st16
> >   2776 qemu_ld8s
> >   1915 sar_i32
> >   1414 qemu_ld16s
> >    839 not_i32
> >    579 setcond_i32
> >    213 br
> >     42 ext32s_i64
> >     30 mul_i64
> 
> Unless I missed something, this doesn't show the usage of
> ld/st per TB, which is what Aurélien was looking for if I
> understood correctly.  All I can say is that you had at
> most 256817 TB's and 234507 qemu_ld/st, so about one per
> TB.
> 
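
As a rough check of the arithmetic quoted just above, here is a minimal
sketch that sums the qemu_ld*/qemu_st* counts from the histogram and
divides by the exit_tb count. It assumes, as the quoted text does, that
exit_tb is only an upper bound on the number of TBs; the dictionary and
function names are made up for illustration.

```python
# Op counts copied from the quoted "-d op_opt" histogram.
GUEST_MEM_OPS = {
    "qemu_ld32": 89107,
    "qemu_st32": 67169,
    "qemu_st64": 23218,
    "qemu_ld64": 20445,
    "qemu_ld8u": 11161,
    "qemu_st8": 10409,
    "qemu_ld16u": 5013,
    "qemu_st16": 3795,
    "qemu_ld8s": 2776,
    "qemu_ld16s": 1414,
}

# Number of exit_tb ops in the same histogram. A TB can contain more than
# one exit_tb, so this is an upper bound on the TB count, not the TB count.
EXIT_TB_OPS = 256817


def guest_mem_ops_per_tb(mem_ops, tb_upper_bound):
    """Return (total guest ld/st ops, average per TB using the upper bound)."""
    total = sum(mem_ops.values())
    return total, total / tb_upper_bound


total, ratio = guest_mem_ops_per_tb(GUEST_MEM_OPS, EXIT_TB_OPS)
print(f"{total} qemu_ld/st ops, about {ratio:.2f} per TB")
```

The total comes out at 234507, matching the figure quoted above. Since
the divisor is an upper bound, the ratio is a lower bound on the average,
and it still says nothing about how the ops are distributed across TBs.
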
> Anyway I must be thick, because I fail to see how
> generated code could access guest CPU registers without a
> pointer to the CPU env :-)
> 
> IIUC the SPARC translator uses ld_i32/st_i32 mainly for
> accessing the guest CPU registers, which due to register
> windows is held in a dedicated global temp.  Is that
> correct?  If so this is kind of hiding accesses to the
> CPU env;  all other targets read/write registers by using
> CPU env (through the use of global temps in most cases).
> 
> So I think most (if not almost all) TB will need a pointer
> to CPU env, which is why I think Aurélien's proposal to
> keep a dedicated register that'd be loaded in the prologue
> is the only way to not degrade performance of the
> generated code (I'd add that this dedicated register
> should be the one defined by the ABI as holding the first
> parameter value, if that's possible;  I'm afraid this is
> not necessarily a good idea).
> 

I also thought about making it the register that the ABI defines as
holding the first parameter value, to avoid some register moves, but I am
also afraid this is not necessarily a good idea. It is usually a
caller-saved register (because the caller overwrites its value anyway),
which means that it has to be saved before every helper call, and that
probably means more register moves in that case.
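
To make the trade-off concrete, here is a toy cost model, not a
measurement. It assumes every helper takes env as its first argument, that
a caller-saved register costs one save before and one restore after each
call, and that a callee-saved register costs one move into the
first-argument register per call. The function name and both cost figures
are assumptions made for illustration.

```python
def env_moves(n_helper_calls, env_in_first_arg_reg):
    """Count the extra register moves needed to keep env live across helper calls.

    Toy model: real register allocation depends on the host ABI and on
    what TCG can keep live around each call.
    """
    if env_in_first_arg_reg:
        # Caller-saved: env is already in place as the first argument, but
        # the helper may clobber it, so save it before and restore it after.
        return 2 * n_helper_calls
    # Callee-saved dedicated register: env survives the call, and only
    # needs to be copied into the first-argument register beforehand.
    return n_helper_calls


calls = 1000  # hypothetical number of helper calls
print("env in first-arg register:", env_moves(calls, True))
print("env in callee-saved register:", env_moves(calls, False))
```

Under these assumptions the first-argument choice costs twice as many
moves per helper call, which is the concern above.
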

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net
