qemu-devel




From: Richard Henderson
Subject: Re: [PATCH 11/13] target/i386: execute multiple REP/REPZ iterations without leaving TB
Date: Sun, 15 Dec 2024 09:46:02 -0600
User-agent: Mozilla Thunderbird

On 12/15/24 09:17, Paolo Bonzini wrote:


> On Sun, 15 Dec 2024, 16:07 Richard Henderson <richard.henderson@linaro.org> wrote:

     > @@ -1384,6 +1409,12 @@ static void do_gen_rep(DisasContext *s, MemOp ot,

     >           gen_jcc_noeob(s, (JCC_Z << 1) | (nz ^ 1), done);
     >       }
     >
     > +    if (can_loop) {
     > +        tcg_gen_subi_tl(cx_next, cpu_regs[R_ECX], 1);

    Since we've just written back cx_next to ECX, this is the same as cx_next -= 1, yes?


> Yeah, I wanted to make cx_next die at the assignment to ECX, but it probably does not make a difference to the generated code.

Not really. It would only make a difference if cx_next was never live outside the EBB. But it is live across the branches to LOOP and LAST.

What might make a difference is to rely more on the value already known to be in ECX, and less on cx_next itself. Let cx_next die at the two branches

+        tcg_gen_brcondi_tl(TCG_COND_TSTEQ, cx_next, cx_mask, last);

by repeating the subtraction when updating ECX, i.e.

-    tcg_gen_mov_tl(cpu_regs[R_ECX], cx_next);
+    tcg_gen_subi_tl(cpu_regs[R_ECX], cpu_regs[R_ECX], 1);

This would avoid spilling cx_next to the stack.
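For illustration only, here is a hypothetical host-side C model of the two write-back shapes (not TCG or QEMU code). It assumes TCG_COND_TSTEQ branches when (a & b) == 0, that FN does not modify ECX between computing cx_next and the write-back, and it leaves out the ext32u entirely; the function names are invented. It only shows that both shapes yield the same ECX and the same branch decision. The liveness benefit itself is a register-allocation effect that plain C cannot demonstrate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-in for TCG_COND_TSTEQ: the branch is taken when
   none of the bits selected by mask are set in a. */
static bool tsteq(uint64_t a, uint64_t mask)
{
    return (a & mask) == 0;
}

/* Current shape (hypothetical model): cx_next feeds both the
   "last iteration" test and the write-back, so it stays live
   until ECX is updated. */
static uint64_t next_ecx_mov(uint64_t ecx, uint64_t mask, bool *last)
{
    uint64_t cx_next = ecx - 1;
    *last = tsteq(cx_next, mask);
    return cx_next;                 /* mov ecx, cx_next */
}

/* Suggested shape: cx_next dies at the test, and the write-back
   repeats the subtraction on ECX itself. */
static uint64_t next_ecx_sub(uint64_t ecx, uint64_t mask, bool *last)
{
    uint64_t cx_next = ecx - 1;
    *last = tsteq(cx_next, mask);   /* last use of cx_next */
    return ecx - 1;                 /* subi ecx, ecx, 1 */
}

/* Both shapes must agree on the new ECX and on the branch decision. */
static void check_same(uint64_t ecx, uint64_t mask)
{
    bool last_a, last_b;
    uint64_t a = next_ecx_mov(ecx, mask, &last_a);
    uint64_t b = next_ecx_sub(ecx, mask, &last_b);
    assert(a == b);
    assert(last_a == last_b);
}
```

Testing against the mask rather than comparing with zero matters when RCX has stale upper bits: with a 32-bit mask, ECX = 0x100000001 still counts as the last iteration.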

There's also the ext32u (zero-extending ECX to 32 bits) to place somewhere.

I guess you can't hoist it out of the loop, ahead of the first invocation of FN, because of the fault path. To eliminate it from the main loop you'd have to unroll once.

        // no iteration
        brcond tsteq ecx, mask, done

        sub cxnext, ecx, 1
        brcond tsteq cxnext, mask, last

        // first iteration
        fn
        sub ecx, ecx, 1
        extu ecx, ecx

        sub cxnext, ecx, 1
        brcond eq cxnext, 0, last

        // subsequent iterations, ecx now known zero-extended.
 loop:
        fn
        sub ecx, ecx, 1

        sub cxnext, ecx, 1
        brcond tstne cxnext, mask, loop
        brcond eq cxnext, 0, last

etc. It doesn't seem worthwhile to eliminate one ext32u, which will almost certainly be scheduled into the noise.
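To check that the once-unrolled shape is equivalent to the plain loop, here is a hypothetical host-side C model (not QEMU code) that mirrors the pseudocode above with gotos. It assumes a 32-bit address size on a 64-bit target (mask 0xffffffff, so every write to ECX zero-extends into RCX), treats FN as merely counting an iteration, ignores the fault path and any exit from the TB, and fills in the "etc." with a minimal LAST block; rep_reference, rep_unrolled and check_models are invented names.

```c
#include <assert.h>
#include <stdint.h>

#define CX_MASK 0xffffffffull   /* assumed 32-bit address size */

/* Plain REP semantics: test, run FN, decrement with ext32u, repeat.
   A zero count leaves RCX untouched, upper bits included. */
static uint64_t rep_reference(uint64_t rcx, unsigned *iters)
{
    *iters = 0;
    while ((rcx & CX_MASK) != 0) {
        ++*iters;                       /* fn */
        rcx = (rcx - 1) & CX_MASK;      /* sub + ext32u */
    }
    return rcx;
}

/* Once-unrolled shape: ext32u only in the first iteration. */
static uint64_t rep_unrolled(uint64_t rcx, unsigned *iters)
{
    uint64_t cx_next;

    *iters = 0;
    if ((rcx & CX_MASK) == 0)           /* brcond tsteq ecx, mask, done */
        goto done;
    cx_next = rcx - 1;
    if ((cx_next & CX_MASK) == 0)       /* brcond tsteq cxnext, mask, last */
        goto last;

    ++*iters;                           /* first iteration: fn */
    rcx = (rcx - 1) & CX_MASK;          /* sub + ext32u */
    cx_next = rcx - 1;
    if (cx_next == 0)                   /* brcond eq cxnext, 0, last */
        goto last;

loop:                                   /* here rcx is zero-extended and >= 2 */
    ++*iters;                           /* fn */
    rcx = rcx - 1;                      /* no ext32u needed */
    cx_next = rcx - 1;
    if ((cx_next & CX_MASK) != 0)       /* brcond tstne cxnext, mask, loop */
        goto loop;
    /* falling through, cx_next == 0, i.e. rcx == 1 */

last:
    ++*iters;                           /* final iteration: fn */
    rcx = (rcx - 1) & CX_MASK;
done:
    return rcx;
}

/* Both models must agree on the final RCX and the iteration count. */
static void check_models(uint64_t rcx)
{
    unsigned n_ref, n_unr;
    uint64_t r_ref = rep_reference(rcx, &n_ref);
    uint64_t r_unr = rep_unrolled(rcx, &n_unr);
    assert(r_ref == r_unr);
    assert(n_ref == n_unr);
}
```

The model makes the invariant explicit: after the first iteration ECX is zero-extended and at least 2 on entry to the loop body, so ECX - 1 never borrows out of the low 32 bits and the body needs no ext32u.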


r~


