qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination
Date: Sun, 15 May 2011 13:37:42 +0200
User-agent: Mutt/1.5.20 (2009-06-14)

On Sun, May 15, 2011 at 02:33:55PM +0300, Blue Swirl wrote:
> On Sun, May 15, 2011 at 2:14 PM, Aurelien Jarno <address@hidden> wrote:
> > On Sun, May 15, 2011 at 02:03:40PM +0300, Blue Swirl wrote:
> >> On Sun, May 15, 2011 at 12:27 PM, Laurent Desnogues
> >> <address@hidden> wrote:
> >> > On Sun, May 15, 2011 at 9:15 AM, Blue Swirl <address@hidden> wrote:
> >> >> On Sun, May 15, 2011 at 1:04 AM, Aurelien Jarno <address@hidden> wrote:
> >> >>> On Sun, May 15, 2011 at 12:52:35AM +0300, Blue Swirl wrote:
> >> >>>> On Sun, May 15, 2011 at 12:16 AM, Aurelien Jarno <address@hidden> 
> >> >>>> wrote:
> >> >>>> > On Sat, May 14, 2011 at 10:35:20PM +0300, Blue Swirl wrote:
> >> > [...]
> >> >>>> > The env register is used very often (basically for every 
> >> >>>> > load/store, but
> >> >>>> > also a lot of helpers), so it makes sense to reserve a register for 
> >> >>>> > it.
> >> >>>> >
> >> >>>> > For what I understand from your patch series, you prefer to pass 
> >> >>>> > this
> >> >>>> > register explicitly to TCG functions. This basically means this TCG
> >> >>>> > global will be loaded to host register as soon as it is used, but 
> >> >>>> > also
> >> >>>> > regularly, as globals are saved back to their canonical location 
> >> >>>> > before
> >> >>>> > an helper or a load/store.
> >> >>>> >
> >> >>>> > So it seems that this patch series will just allowing the "env 
> >> >>>> > register"
> >> >>>> > to change over time, though it will not spare one more register for 
> >> >>>> > the
> >> >>>> > TCG code, and it will emit longer TCG code to regularly reload the 
> >> >>>> > env
> >> >>>> > global into a host register.
> >> >>>>
> >> >>>> But there will be one more register available in some cases. In other
> >> >>>
> >> >>> Inside the TCG code, it will basically happens very rarely, given
> >> >>> load/store are really the most used instructions, and they need to load
> >> >>> the env register.
> >> >>
> >> >> Not exactly, from a sample run with -d op_opt:
> >> >> $ egrep -v -e '^$' -v -e 'OP after' -v -e ' end' -v -e 'Search PC'
> >> >> /tmp/qemu.log | awk '{print $1}' | sort | uniq -c|sort -rn
> >> >> 1673966 movi_i32
> >> >>  653931 ld_i32
> >> >>  607432 mov_i32
> >> >>  428684 st_i32
> >> >>  326878 movi_i64
> >> >>  308626 add_i32
> >> >>  283186 call
> >> >>  256817 exit_tb
> >> >>  207232 nopn
> >> >>  189388 goto_tb
> >> >>  122398 and_i32
> >> >>  117997 shr_i32
> >> >>  89107 qemu_ld32
> >> >>  82926 set_label
> >> >>  82713 brcond_i32
> >> >>  67169 qemu_st32
> >> >>  55109 or_i32
> >> >>  46536 ext32u_i64
> >> >>  44288 xor_i32
> >> >>  38103 sub_i32
> >> >>  26361 shl_i32
> >> >>  23218 shl_i64
> >> >>  23218 qemu_st64
> >> >>  23218 or_i64
> >> >>  20474 shr_i64
> >> >>  20445 qemu_ld64
> >> >>  11161 qemu_ld8u
> >> >>  10409 qemu_st8
> >> >>   5013 qemu_ld16u
> >> >>   3795 qemu_st16
> >> >>   2776 qemu_ld8s
> >> >>   1915 sar_i32
> >> >>   1414 qemu_ld16s
> >> >>    839 not_i32
> >> >>    579 setcond_i32
> >> >>    213 br
> >> >>     42 ext32s_i64
> >> >>     30 mul_i64
> >> >
> >> > Unless I missed something, this doesn't show the usage of
> >> > ld/st per TB, which is what Aurélien was looking for if I
> >> > understood correctly.  All I can say is that you had at
> >> > most 256817 TB's and 234507 qemu_ld/st, so about one per
> >> > TB.
> >>
> >> The question was ratio of loads/stores to other instructions. The
> >> statistics are not per TB. There were about 174880 TBs.
> >>
> >> > Anyway I must be thick, because I fail to see how
> >> > generated code could access guest CPU registers without a
> >> > pointer to the CPU env :-)
> >> >
> >> > IIUC the SPARC translator uses ld_i32/st_i32 mainly for
> >> > accessing the guest CPU registers, which due to register
> >> > windows is held in a dedicated global temp.  Is that
> >> > correct?  If so this is kind of hiding accesses to the
> >> > CPU env;  all other targets read/write registers by using
> >> > CPU env (through the use global temps in most cases).
> >> >
> >> > So I think most (if not almost all) TB will need a pointer
> >> > to CPU env, which is why I think Aurélien's proposal to
> >> > keep a dedicated register that'd be loaded in the prologue
> >> > is the only way to not degrade performance of the
> >> > generated code (I'd add that this dedicated register
> >> > should be the one defined by the ABI as holding the first
> >> > parameter value, if that's possible;  I'm afraid this is
> >> > not necessarily a good idea).
> >>
> >> CPU env will be used, but the register could be made available for
> >> other uses too.
> >>
> >
> > What other uses? It is needed for almost every TB, so there is not other
> > possible use. Let's take an example, a simple TB from mips on x86_64:
> >
> > | IN: start_kernel
> > | 0x80510958:  jal        0x8051d50c
> > | 0x8051095c:  nop
> > |
> > | OP after liveness analysis:
> > |  movi_i32 ra,$0x80510960
> > |  movi_i32 PC,$0x8051d50c
> > |  exit_tb $0x0
> > |  end
> >
> > | OUT: [size=28]
> > | 0x40f22960:  mov    $0x80510960,%ebp
> > | 0x40f22965:  mov    %ebp,0x7c(%r14)
> > | 0x40f22969:  mov    $0x8051d50c,%ebp
> > | 0x40f2296e:  mov    %ebp,0x80(%r14)
> > | 0x40f22975:  xor    %eax,%eax
> > | 0x40f22977:  jmpq   0x115042e
> >
> > x86_64 uses r14 as TCG_AREG0. Despite the instructions being quite
> > simple (only 2 movi_i32), the resulting code makes 2 access to env to
> > save the two registers. Having to reload the env pointer each time to a
> > register would clearly increase the size of this TB.
> 
> I don't think TCG would be that simple, instead the pointer would be
> loaded only once in this case.
> 
> > Another question, imagine env is not in a register anymore. What kind of
> > code do you plan to use to load the env pointer into a register? This is
> > something we need to know to evaluate the cost of not having a fixed
> > register for env withing the TCG code.
> 
> The prologue could save it to the first temp dedicated for this.
> 

So basically you mean you want to keep a dedicated temp for env in the
TCG generated code??? It seems to go in the opposite of your previous
mails stating a full elimination of TCG_AREG0.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net



reply via email to

[Prev in Thread] Current Thread [Next in Thread]