Re: [Qemu-ppc] [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers

From: Richard Henderson
Subject: Re: [Qemu-ppc] [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers
Date: Tue, 09 Sep 2014 09:03:43 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0
On 09/04/2014 11:27 AM, Tom Musta wrote:
>> - tcg_gen_trunc_tl_i32(cpu_crf[crf], cpu_so);
>> + tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_SO], cpu_so);
>
> This looks correct to me but is causing problems. The above statement seems
> to get dropped in the generated asm ... at least on a PPC host:
>
> IN:
> 0x00000000100005b4: cmpw cr3,r30,r29
>
> OUT: [size=160]
> 0x6041ad30: lwz r14,-4(r27)
> 0x6041ad34: cmpwi cr7,r14,0
> 0x6041ad38: bne- cr7,0x6041adbc
> 0x6041ad3c: ld r14,240(r27) <<< r30
> 0x6041ad40: ld r15,232(r27) <<< r31
> 0x6041ad44: cmpw cr7,r14,r15 <<< this is the TCG_COND_LTx code
> 0x6041ad48: li r16,1
> 0x6041ad4c: li r0,0
> 0x6041ad50: isel r16,r16,r0,28
> 0x6041ad54: stw r16,576(r27) <<< store cpu_cr[LT]
> 0x6041ad58: cmpw cr7,r14,r15
> 0x6041ad5c: li r16,1
> 0x6041ad60: li r0,0
> 0x6041ad64: isel r16,r16,r0,29
> 0x6041ad68: stw r16,580(r27) <<< store cpu_cr[GT]
> 0x6041ad6c: cmplw cr7,r14,r15
> 0x6041ad70: li r14,1
> 0x6041ad74: li r0,0
> 0x6041ad78: isel r14,r14,r0,30
> 0x6041ad7c: stw r14,584(r27) <<< store cpu_cr[EQ]
> 0x6041ad80: .long 0x0
> 0x6041ad84: .long 0x0
>
> Richard: any ideas or hints on how to proceed?
Check the op dumps (-d op, -d op_opt) and make sure the op is there. If it is,
but gets discarded somewhere further down the pipeline, then try to get me a
testcase.
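For reference, here is a minimal C model of what the translated cmpw is expected to store once the CR lives in 32 one-bit variables indexed as crf * 4 + bit. The CRF_* values are an assumption (LT, GT, EQ, SO at offsets 0 to 3 within a field); they are consistent with the dumps, where cr3's LT/GT/EQ stores land at 576, 580, 584 and cr7's LT/EQ/SO accesses at 640, 648, 652, i.e. 4 bytes per CR bit. model_cmpw is a hypothetical name, not a QEMU function. The point it illustrates: the generated code should contain four stores, and the SO store is the one missing from the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit offsets within one 4-bit CR field (hypothetical values,
 * inferred from the ascending store offsets in the dumps). */
enum { CRF_LT = 0, CRF_GT = 1, CRF_EQ = 2, CRF_SO = 3 };

/* Hypothetical reference model of "cmpw crf,a,b" when the CR is kept as
 * 32 separate 0/1 variables, indexed as crf * 4 + bit. */
static void model_cmpw(uint32_t cr[32], int crf, int32_t a, int32_t b,
                       uint64_t so)
{
    assert(crf >= 0 && crf < 8);
    cr[crf * 4 + CRF_LT] = a < b;
    cr[crf * 4 + CRF_GT] = a > b;
    cr[crf * 4 + CRF_EQ] = a == b;
    /* Mirrors tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_SO], cpu_so),
     * assuming cpu_so already holds 0 or 1. This is the store that is
     * missing from the generated code in the dump. */
    cr[crf * 4 + CRF_SO] = (uint32_t)so;
}
```

With crf = 3 the four writes go to cr[12] through cr[15], matching the three stores at 576, 580, 584 plus the absent fourth one that would land at 588.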
> This is a very nice cleanup ... but it oversteers just a little. For some CR
> logical instructions, the generated code can produce non-zero bits in the i32
> cr variable in places other than the LSB.
> For example, consider crnand, which produces the following on a PPC host:
>
> IN:
> 0x0000000010000578: crnand 4*cr7+so,4*cr7+lt,4*cr7+eq
>
> OUT: [size=112]
> 0x6041a630: lwz r14,-4(r27)
> 0x6041a634: cmpwi cr7,r14,0
> 0x6041a638: bne- cr7,0x6041a68c
> 0x6041a63c: lwz r14,640(r27)
> 0x6041a640: lwz r15,648(r27)
> 0x6041a644: nand r14,r14,r15
> 0x6041a648: andi. r14,r14,1
> 0x6041a64c: stw r14,652(r27)
> 0x6041a650: .long 0x0
> 0x6041a654: .long 0x0
> 0x6041a658: .long 0x0
> 0x6041a65c: .long 0x0
>
> The host nand operation will always produce an i32 value that has 1s in bits
> 0-30, since those bits are presumably zero in both inputs. A brute-force fix
> would be to add a tcg_gen_andi_i32(D,D,1) to your macro. But I think this is
> required only for a subset of the instructions (crnand, crnor, creqv, crorc).
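To see why a stray high bit matters, here is a minimal C sketch, assuming (hypothetically) that the 32 variables must at some point be folded back into one architected CR value, as mfcr requires. pack_cr, nand_raw and nand_masked are illustrative names, not QEMU functions, and the shift-and-OR packing is our assumption about how such a fold would look.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fold the 32 one-bit CR variables back into the
 * architected 32-bit CR image. IBM numbering: CR bit 0 is the MSB, so
 * cr[i] is shifted to position 31 - i. Correct only if each cr[i] is 0 or 1. */
static uint32_t pack_cr(const uint32_t cr[32])
{
    uint32_t v = 0;
    for (int i = 0; i < 32; i++) {
        v |= cr[i] << (31 - i);
    }
    return v;
}

/* What a bare host nand computes: for 0/1 inputs the result is
 * 0xFFFFFFFE or 0xFFFFFFFF, i.e. bits 0-30 (IBM numbering) are all 1s. */
static uint32_t nand_raw(uint32_t a, uint32_t b)
{
    assert(a <= 1 && b <= 1);
    return ~(a & b);
}

/* The brute-force fix from the thread: mask the result back to the LSB,
 * as tcg_gen_andi_i32(D, D, 1) would. */
static uint32_t nand_masked(uint32_t a, uint32_t b)
{
    assert(a <= 1 && b <= 1);
    return ~(a & b) & 1;
}
```

With cr7.lt and cr7.eq set (CR bits 28 and 30), computing cr7.so with nand_masked leaves the packed CR at 0xA, while storing the nand_raw result (0xFFFFFFFE) smears 1s across every other CR field and packs to 0xFFFFFFFE.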
Note that since most hosts don't have nand, the combination

    nand x,y,z
    and  x,x,1

would be better represented with

    and  x,y,z
    xor  x,x,1
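The rewrite is valid because both inputs are 0 or 1: then y & z is itself 0 or 1, and xor with 1 flips only the LSB, which is exactly what the masked nand yields. A small C sketch checks this exhaustively. The nor/eqv/orc variants are our own extrapolation of the same idea to the other instructions listed in the quoted mail, not something stated in the thread, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The suggested form for crnand: and, then xor with 1. */
static uint32_t cr_nand(uint32_t y, uint32_t z) { return (y & z) ^ 1; }

/* Our extrapolation (not from the thread) to the other three ops that
 * need masking, again using only and/or/xor with the constant 1. */
static uint32_t cr_nor(uint32_t y, uint32_t z) { return (y | z) ^ 1; }
static uint32_t cr_eqv(uint32_t y, uint32_t z) { return (y ^ z) ^ 1; }
static uint32_t cr_orc(uint32_t y, uint32_t z) { return y | (z ^ 1); }

/* Exhaustively check each form against "complement, then and 1"
 * for all four combinations of 0/1 inputs. */
static void check_cr_identities(void)
{
    for (uint32_t y = 0; y <= 1; y++) {
        for (uint32_t z = 0; z <= 1; z++) {
            assert(cr_nand(y, z) == (~(y & z) & 1));
            assert(cr_nor(y, z) == (~(y | z) & 1));
            assert(cr_eqv(y, z) == (~(y ^ z) & 1));
            assert(cr_orc(y, z) == ((y | ~z) & 1));
        }
    }
}
```

The design point is that each result stays in {0, 1} without a separate mask, and the sequence uses only two simple ops that every host backend can emit directly, instead of a nand that a host lacking it has to expand before the extra and.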
r~