qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 05/13] target/i386: reorganize ops emitted by do_gen_rep, dr


From: Richard Henderson
Subject: Re: [PATCH 05/13] target/i386: reorganize ops emitted by do_gen_rep, drop repz_opt
Date: Sun, 15 Dec 2024 08:19:10 -0600
User-agent: Mozilla Thunderbird

On 12/15/24 03:06, Paolo Bonzini wrote:
The condition for optimizing repeat instruction is more or less the
opposite of what you imagine: almost always the string instruction
was_not_ optimized and optimizing the loop relied on goto_tb.
This is obviously not great for performance, due to the cost of the
exit-to-main-loop check, but also wrong.  In fact, after expanding
dc->jmp_opt and simplifying "!!x" to "x", the condition for looping used
to be:

    ((cflags & CF_NO_GOTO_TB) ||
     (flags & (HF_RF_MASK | HF_TF_MASK | HF_INHIBIT_IRQ_MASK))) && !(cflags & 
CF_USE_ICOUNT)

In other words, setting aside RF (it requires special handling for REP
instructions and it was completely missing), repeat instruction were
being optimized if TF or inhibit IRQ flags were set.  This is certainly
wrong for TF, because string instructions trap after every execution,
and probably for interrupt shadow too.

Get rid of repz_opt completely.  The next patches will reintroduce the
optimization, applying it in the common case instead of the unlikely
and wrong one.

While at it, place the CX/ECX/RCX=0 case is at the end of the function,
which saves a label and is clearer when reading the generated ops.
For clarity, mark the cc_op explicitly as DYNAMIC even if at the end
of the translation block; the cc_op can come from either the previous
instruction or the string instruction, and currently we rely on
a gen_update_cc_op() that is hidden in the bowels of gen_jcc() to
spill cc_op and mark it clean.

Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
---
  target/i386/tcg/translate.c | 60 ++++++++-----------------------------
  1 file changed, 13 insertions(+), 47 deletions(-)

It might have been clearer inlining gen_jz_ecx_string as a separate step, but no need to do that now. Yes, this is much clearer code generation.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]