qemu-devel

Re: Question on implementation detail of `temp_sync`


From: lrwei
Subject: Re: Question on implementation detail of `temp_sync`
Date: Thu, 6 Aug 2020 23:53:18 +0800

> lrwei <[hidden email]> writes:
> 
> > Sorry for the unintentional sending of an uncompleted message.
> 
> Questions about the internals of the TCG are very much in the remit of
> qemu-devel so are likely to get missed on qemu-discuss which is more
> aimed at user questions.

Thanks for your reply! Next time I will send such questions to qemu-devel.

> 
> <re-pasted to fix html noise>
> 
> > I understand that the current code works, but I am confused about why
> > `ts` needs to be loaded into a register when `free_or_dead` is not
> > set.
> 
> It isn't, the break leaves the switch statement once it stores the
> constant to memory.

It seems to me that if `free_or_dead` is zero, the `&&` short-circuits,
so neither `tcg_out_sti` nor the `break` in the if-block is executed at
all, and control continues to `temp_load`. That is exactly what confused
me.
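
To make the short-circuit point concrete, here is a minimal sketch. It
mirrors only the shape of the condition; `fake_out_sti` and
`takes_direct_store_path` are hypothetical stand-ins, not TCG functions.

```c
#include <stdbool.h>

/* Counts how often the (hypothetical) immediate-store helper is called. */
static int sti_calls;

/* Stand-in for tcg_out_sti(): assumed here to always succeed. */
static bool fake_out_sti(void)
{
    sti_calls++;
    return true;
}

/*
 * Same shape as the condition in the TEMP_VAL_CONST case.
 * Returns true when the direct-store path (the `break`) is taken,
 * false when control would continue on to temp_load().
 */
static bool takes_direct_store_path(bool free_or_dead)
{
    /* && short-circuits: fake_out_sti() runs only if free_or_dead is true. */
    return free_or_dead && fake_out_sti();
}
```

With `free_or_dead` false, the helper is never called and the function
returns false, which corresponds to falling through to `temp_load`.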

> 
> > For example in the following scenario:
> > movi_i32     r0, 0x1
> > add_i32      r1, r1, r0
> > ...
> > (where r0 is not used any more, and both r0 and r1 are globals)
> 
> > If I am not mistaken, the code gen procedure of the first IR will call
> > `temp_sync` with `free_or_dead` not set, which loads the constant into
> > a register and stores it back to memory. At this point, `r0` will be
> > `TEMP_VAL_REG` instead of `TEMP_VAL_CONST`, so the following IR can't
> > embed this constant operand in the assembly instruction it produces.
> > Also, this results in a seemingly useless register allocation (why
> > doesn't the further use of r0 use the constant directly?)
> 
> Is this what you are actually seeing generated? If you run with -d
> in_asm,op,op_opt,out_asm it should be clear what actually happened.

The above example was originally made up to illustrate my question, but
I modified the frontend to emit exactly this IR (it has no meaning
beyond the test), and here is the result:

---- Pasting output of Qemu ----

QEMU 5.0.93 monitor - type 'help' for more information
(qemu) OP:
 ld_i32 tmp0,env,$0xfffffffffffffff0
 movi_i32 tmp1,$0x0
 brcond_i32 tmp0,tmp1,lt,$L0

 ---- 00000000
 movi_i32 r0,$0x1
 add_i32 r1,r1,r0
 movi_i32 pc,$0x4
 call hlt,$0x8,$0,env
 exit_tb $0x0
 set_label $L0
 exit_tb $0x7f8510000043

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xfffffffffffffff0      dead: 1  pref=0xffff
 movi_i32 tmp1,$0x0                       pref=0xffff
 brcond_i32 tmp0,tmp1,lt,$L0              dead: 0 1

 ---- 00000000                         
 movi_i32 r0,$0x1                         sync: 0  pref=0xffff
 add_i32 r1,r1,r0                         sync: 0  dead: 0 1 2  pref=0xffff
 movi_i32 pc,$0x4                         sync: 0  dead: 0  pref=0xffff
 call hlt,$0x8,$0,env                     dead: 0
 set_label $L0                          
 exit_tb $0x7f8510000043                

OUT: [size=72]
0x7f8510000100:  mov    -0x10(%rbp),%ebx                [tb header & initial instruction]
0x7f8510000103:  test   %ebx,%ebx
0x7f8510000105:  jl     0x7f8510000130
0x7f851000010b:  mov    $0x1,%ebx
0x7f8510000110:  mov    %ebx,0x0(%rbp)
0x7f8510000113:  mov    0x4(%rbp),%r12d
0x7f8510000117:  add    %r12d,%ebx
0x7f851000011a:  mov    %ebx,0x4(%rbp)
0x7f851000011d:  movl   $0x4,0x80(%rbp)
0x7f8510000127:  mov    %rbp,%rdi
0x7f851000012a:  callq  *0x10(%rip)        # 0x7f8510000140
0x7f8510000130:  lea    -0xf4(%rip),%rax        # 0x7f8510000043
0x7f8510000137:  jmpq   0x7f8510000018
  data: [size=8]
0x7f8510000140:  .quad  0x0000556e15d000d0

---- End of Qemu output ----

The constant is first loaded into `ebx` and then synced back to
its memory location, and the subsequent addition uses the
register-register form. I then tried again after removing the
`free_or_dead` check in `temp_sync`, which gives:

---- Pasting output of Qemu ----

QEMU 5.0.93 monitor - type 'help' for more information
(qemu) OP:
 ld_i32 tmp0,env,$0xfffffffffffffff0
 movi_i32 tmp1,$0x0
 brcond_i32 tmp0,tmp1,lt,$L0

 ---- 00000000
 movi_i32 r0,$0x1
 add_i32 r1,r1,r0
 movi_i32 pc,$0x4
 call hlt,$0x8,$0,env
 exit_tb $0x0
 set_label $L0
 exit_tb $0x7f9db0000043

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xfffffffffffffff0      dead: 1  pref=0xffff
 movi_i32 tmp1,$0x0                       pref=0xffff
 brcond_i32 tmp0,tmp1,lt,$L0              dead: 0 1

 ---- 00000000                         
 movi_i32 r0,$0x1                         sync: 0  pref=0xffff
 add_i32 r1,r1,r0                         sync: 0  dead: 0 1 2  pref=0xffff
 movi_i32 pc,$0x4                         sync: 0  dead: 0  pref=0xffff
 call hlt,$0x8,$0,env                     dead: 0
 set_label $L0                          
 exit_tb $0x7f9db0000043                

OUT: [size=72]
0x7f9db0000100:  mov    -0x10(%rbp),%ebx                [tb header & initial instruction]
0x7f9db0000103:  test   %ebx,%ebx
0x7f9db0000105:  jl     0x7f9db000012d
0x7f9db000010b:  movl   $0x1,0x0(%rbp)
0x7f9db0000112:  mov    0x4(%rbp),%ebx
0x7f9db0000115:  inc    %ebx
0x7f9db0000117:  mov    %ebx,0x4(%rbp)
0x7f9db000011a:  movl   $0x4,0x80(%rbp)
0x7f9db0000124:  mov    %rbp,%rdi
0x7f9db0000127:  callq  *0x13(%rip)        # 0x7f9db0000140
0x7f9db000012d:  lea    -0xf1(%rip),%rax        # 0x7f9db0000043
0x7f9db0000134:  jmpq   0x7f9db0000018
  data: [size=8]
0x7f9db0000140:  .quad  0x0000564aed2120d0

---- End of Qemu output ----

This stores the constant directly to memory and uses `inc` instead of
a register-register add for `add_i32`, which seems like the intuitive
thing to do. But I'm not sure whether it would break something else.
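
The difference between the two runs can be summarised with a toy model
of the `TEMP_VAL_CONST` case. This is only a sketch under stated
assumptions: `ToyTemp`, `toy_sync_const` and the `sti_ok` flag are
hypothetical names, and the model ignores everything else `temp_sync`
does.

```c
#include <stdbool.h>

/* Toy analogue of TEMP_VAL_CONST / TEMP_VAL_REG. */
typedef enum { VAL_CONST, VAL_REG } ToyValType;

typedef struct {
    ToyValType val_type;   /* where the value currently lives */
    bool mem_coherent;     /* memory slot holds the current value */
    int regs_allocated;    /* host registers taken for this temp */
} ToyTemp;

/*
 * check_free_or_dead = true  models the existing condition,
 * check_free_or_dead = false models the condition with the check removed.
 * sti_ok models whether the backend can store the immediate directly.
 */
static void toy_sync_const(ToyTemp *ts, bool free_or_dead,
                           bool check_free_or_dead, bool sti_ok)
{
    bool may_store_imm = check_free_or_dead ? free_or_dead : true;

    if (may_store_imm && sti_ok) {
        /* Immediate stored straight to memory; temp stays a constant. */
        ts->mem_coherent = true;
        return;
    }
    /* Otherwise: load the constant into a register, then store it. */
    ts->val_type = VAL_REG;
    ts->regs_allocated++;
    ts->mem_coherent = true;
}
```

In this model, a live constant temp (`free_or_dead` false) ends up as
`VAL_REG` with one register allocated under the existing condition, and
stays `VAL_CONST` with no register when the check is removed, which is
the behaviour the two listings show for `r0`.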

Given the above result, I'm curious why the constant is loaded into a
register here. Is it because repeatedly embedding the same large
constant in instructions would inflate the code, or is there some other
reason? Looking forward to your further reply!
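
For this one small constant, the two `out_asm` listings above already
allow a rough size comparison of the code emitted for the `movi_i32` /
`add_i32` pair. The sketch below just subtracts addresses copied from
those listings; the helper names are hypothetical, and the result says
nothing about how large constants would compare.

```c
#include <stdint.h>

/* Bytes emitted between the first instruction of a span and the next one after it. */
static uint64_t span_bytes(uint64_t first_insn, uint64_t next_insn)
{
    return next_insn - first_insn;
}

/* With the free_or_dead check: `mov $0x1,%ebx` up to `movl $0x4,0x80(%rbp)`. */
static uint64_t bytes_with_check(void)
{
    return span_bytes(0x7f851000010bULL, 0x7f851000011dULL);
}

/* Without the check: `movl $0x1,0x0(%rbp)` up to `movl $0x4,0x80(%rbp)`. */
static uint64_t bytes_without_check(void)
{
    return span_bytes(0x7f9db000010bULL, 0x7f9db000011aULL);
}
```

Here the span is 18 bytes with the check and 15 bytes without it, so
for the constant 1 the immediate form is the smaller one.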

Thanks,
lrwei

> 
> > So I wonder whether there is any reason for loading the constant
> > into a register. I would very much appreciate it if someone could
> > point out the reason.
> 
> <snip>
> 
> >
> >
> > Thanks in advance.
> > lrwei
> > ------------------Original------------------
> > From: "lrwei"<[hidden email]>
> > Date: Tue, Aug 4, 2020 12:06 PM
> > To: "qemu-discuss"<[hidden email]>
> > Subject: Question on implementation detail of `temp_sync`
> >
> <re-pasted fixing html noise>
> 
> > Hello to the list,
> > Recently I have been studying the code of TCG, and got confused by
> > the following detail in the function `temp_sync` in tcg/tcg.c:
> 
> >     case TEMP_VAL_CONST:
> >         /* If we're going to free the temp immediately, then we won't
> >            require it later in a register, so attempt to store the
> >            constant to memory directly.  */
> >         if (free_or_dead
> >             && tcg_out_sti(s, ts->type, ts->val,
> >                            ts->mem_base->reg, ts->mem_offset)) {
> >             break;
> >         }
> >         temp_load(s, ts, tcg_target_available_regs[ts->type],
> >                   allocated_regs, preferred_regs);
> >         /* fallthrough */
> 
> > Would it be better to remove the `free_or_dead` in the if statement,
> > i.e. turn the function to be:
> 
> >     case TEMP_VAL_CONST:
> >         if (tcg_out_sti(s, ts->type, ts->val,
> >                            ts->mem_base->reg, ts->mem_offset)) {
> >             break;
> >         }
> >         temp_load(s, ts, tcg_target_available_regs[ts->type],
> >                   allocated_regs, preferred_regs);
> >         /* fallthrough */
> 
> 
> -- 
> Alex Bennée
