From: Richard Henderson
Subject: Re: [PATCH 11/13] target/i386: execute multiple REP/REPZ iterations without leaving TB
Date: Sun, 15 Dec 2024 09:46:02 -0600
User-agent: Mozilla Thunderbird

On 12/15/24 09:17, Paolo Bonzini wrote:
> On Sun, 15 Dec 2024, 16:07 Richard Henderson <richard.henderson@linaro.org> wrote:
> > > @@ -1384,6 +1409,12 @@ static void do_gen_rep(DisasContext *s, MemOp ot,
> > > gen_jcc_noeob(s, (JCC_Z << 1) | (nz ^ 1), done);
> > > }
> > >
> > > + if (can_loop) {
> > > + tcg_gen_subi_tl(cx_next, cpu_regs[R_ECX], 1);
> >
> > Since we've just written back cx_next to ECX, this is the same as cx_next
> > -= 1, yes?
>
> Yeah, I wanted to make cx_next die at the assignment to ECX but it probably does not make
> a difference to generated code.

Not really. It would only make a difference if cx_next was never live outside the EBB.
But it is live across the branches to LOOP and LAST.
What might make a difference is to rely more on the known value in ECX and to use
cx_next itself less. Let cx_next die at the two
+ tcg_gen_brcondi_tl(TCG_COND_TSTEQ, cx_next, cx_mask, last);
by repeating the subtraction when updating ECX, i.e.
- tcg_gen_mov_tl(cpu_regs[R_ECX], cx_next);
+ tcg_gen_subi_tl(cpu_regs[R_ECX], cpu_regs[R_ECX], 1);
This would avoid spilling cx_next to the stack.
There's also the ext32u to place somewhere.
I guess you can't hoist it outside the loop, before the first invocation of FN, due to the
fault path. To eliminate it from the main loop you'd have to unroll once.
// no iteration
brcond tsteq ecx, mask, done
sub cxnext, ecx, 1
brcond tsteq cxnext, mask, last
// first iteration
fn
sub ecx, ecx, 1
extu ecx, ecx
sub cxnext, ecx, 1
brcond eq cxnext, 0, last
// subsequent iterations, ecx now known zero-extended.
loop:
fn
sub ecx, ecx, 1
sub cxnext, ecx, 1
brcond tstne cxnext, mask, loop
brcond eq cxnext, 0, last
etc. It doesn't seem worthwhile to eliminate one ext32u, which will almost certainly be
scheduled into the noise.
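
For readers following along, the unroll-once shape above can be modeled in plain C. This is only a sketch, not QEMU code: it assumes the 32-bit count case (mask 0xffffffff in a 64-bit register), it assumes that the "last" label runs FN one more time and writes the zero-extended count back, and it ignores the fault path entirely. The names rep_simple, rep_unrolled, iter_fn and count_iter are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for one string-op iteration (FN in the mail). */
typedef void (*iter_fn)(void *opaque);

/* Hypothetical callback that just counts how often FN ran. */
static void count_iter(void *opaque)
{
    ++*(unsigned *)opaque;
}

/* Reference shape: subtract and zero-extend on every iteration. */
static uint64_t rep_simple(uint64_t rcx, iter_fn fn, void *opaque)
{
    const uint64_t mask = 0xffffffffu;

    while ((rcx & mask) != 0) {          /* TSTNE: (rcx & mask) != 0 */
        fn(opaque);
        rcx = (uint32_t)(rcx - 1);       /* sub + ext32u */
    }
    return rcx;
}

/* Unrolled once: the main loop has no zero-extension. */
static uint64_t rep_unrolled(uint64_t rcx, iter_fn fn, void *opaque)
{
    const uint64_t mask = 0xffffffffu;
    uint64_t cx_next;

    if ((rcx & mask) == 0) {             /* no iteration */
        return rcx;
    }
    cx_next = rcx - 1;
    if ((cx_next & mask) != 0) {
        /* first iteration: the only one in this branch that extends */
        fn(opaque);
        rcx = (uint32_t)(rcx - 1);
        cx_next = rcx - 1;
        /* subsequent iterations: rcx is now known zero-extended */
        while (cx_next != 0) {
            fn(opaque);
            rcx = rcx - 1;
            cx_next = rcx - 1;
        }
    }
    /* last iteration (assumed behavior of the "last" label) */
    fn(opaque);
    return (uint32_t)(rcx - 1);
}
```

Both functions call FN the same number of times and leave the same value in the register, including when the upper 32 bits of the input are nonzero. The saving is a single zero-extension per loop iteration, which is the cost judged above to be lost in the noise.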
r~