From: Richard Henderson
Subject: Re: [Qemu-devel] Re: [PATCH 3/6] tcg-x86_64: Implement setcond and movcond.
Date: Fri, 18 Dec 2009 09:11:31 -0800
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.4pre) Gecko/20090922 Fedora/3.0-3.9.b4.fc12 Thunderbird/3.0b4

On 12/18/2009 03:39 AM, Laurent Desnogues wrote:
>> +static void tcg_out_setcond(TCGContext *s, int cond, TCGArg arg0,
>> +                            TCGArg arg1, TCGArg arg2, int const_arg2, int rexw)
>
> Perhaps renaming arg0 to dest would make things slightly
> more readable.

Ok.

> Also note that tcg_out_modrm will generate an unneeded prefix
> for some registers. cf. the patch I sent to the list months ago.

Huh. Didn't notice since the disassembler printed what I expected to see. Is fixing this at the same time a requirement for acceptance? I'd prefer to tackle that separately, since no doubt it affects every use of P_REXB.
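
For reference, the rule the prefix logic wants to follow (the helper
below is hypothetical, not from this patch or from that earlier fix):
a byte-register operand only needs a REX prefix for %spl/%bpl/%sil/%dil
and %r8b-%r15b; %al/%cl/%dl/%bl encode fine without one, so forcing the
prefix via P_REXB is a wasted byte for those.

    /* Hypothetical helper, not from the patch: a REX prefix is required
       for a byte-register operand only when the encoding is 4-7
       (%spl/%bpl/%sil/%dil, which without REX would mean %ah/%ch/%dh/%bh)
       or 8-15 (%r8b-%r15b).  For 0-3 (%al/%cl/%dl/%bl) the prefix is
       pure overhead. */
    static inline int rex_needed_for_byte_reg(int reg)
    {
        return reg >= 4;
    }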

>> +        tgen_arithi32(s, ARITH_AND, arg0, 0xff);
>
> Wouldn't movzbl be better?

Handled inside tgen_arithi32:

    } else if (c == ARITH_AND && val == 0xffu) {
        /* movzbl */
        tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB, r0, r0);

I didn't feel the need to replicate that.

> Regarding the xor optimization, I tested it on my i7 and it was
> (very) slightly slower running a 64-bit SPEC2k gcc.

Huh. It used to be recommended. The partial-word store used to stall the pipeline until the old value was ready, and the XOR was special-cased as a clear, which both broke the input dependency and prevented a partial-register stall on the output.

Actually, this recommendation is still present: Section 3.5.1.6 in the November 2009 revision of the Intel Optimization Reference Manual.

If it's all the same, I'd prefer to keep what I have there. All other things being equal, the XOR is 2 bytes and the MOVZBL is 3.
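
To make the size question concrete, here is a standalone sketch of the
two orderings in plain GCC inline asm (illustration only, not the
backend code); note that the XOR variant has to clear the register
before the compare, since XOR clobbers the flags:

    #include <stdint.h>

    /* setcc + movzbl, as the patch does it: widen after the compare. */
    uint64_t seteq_movzbl(uint64_t a, uint64_t b)
    {
        uint64_t r;
        __asm__("cmp %2, %1\n\t"
                "sete %b0\n\t"       /* write the low byte of r */
                "movzbl %b0, %k0"    /* 0f b6 /r: 3 bytes when no REX */
                : "=&r"(r) : "r"(a), "r"(b) : "cc");
        return r;
    }

    /* The XOR variant: clear first, because XOR clobbers the flags. */
    uint64_t seteq_xor(uint64_t a, uint64_t b)
    {
        uint64_t r;
        __asm__("xor %k0, %k0\n\t"   /* 31 /r: 2 bytes when no REX */
                "cmp %2, %1\n\t"
                "sete %b0"
                : "=&r"(r) : "r"(a), "r"(b) : "cc");
        return r;
    }

Both return 1 when the inputs are equal and 0 otherwise; the only
difference is where the zero-extension sits relative to the compare.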

>> +static void tcg_out_movcond(TCGContext *s, int cond, TCGArg arg0,
>> +                            TCGArg arg1, TCGArg arg2, int const_arg2,
>> +                            TCGArg arg3, TCGArg arg4, int rexw)
>
> Perhaps renaming arg0 to dest would make things slightly
> more readable.

Ok.

> You should also add a note stating that arg3 != arg4.

I don't believe that's true though. It's caught immediately when we emit the movcond opcode, but there's no check later once copy-propagation has been done within TCG.

I check for that in the i386 and sparc backends, because dest==arg3 && dest==arg4 would actually generate incorrect code. Here in the x86_64 backend, where we always use cmov, it doesn't generate incorrect code, merely inefficient code.

I could add an early out for that case, if you prefer.
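
For anyone following along, this is the shape of the cmov lowering
being discussed, as a minimal standalone sketch in plain inline asm
(not the patch's emitter):

    #include <stdint.h>

    /* dest = (c1 < c2, unsigned) ? vtrue : vfalse, branch-free:
       load the false value first, then conditionally overwrite it.
       At the register level, if dest were the same register as vtrue,
       that initial move would clobber the true value before the cmov,
       which is the kind of operand aliasing a backend has to keep
       in mind when constraining or ordering the operands. */
    uint64_t movcond_ltu(uint64_t c1, uint64_t c2,
                         uint64_t vtrue, uint64_t vfalse)
    {
        uint64_t dest = vfalse;          /* mov   vfalse, dest */
        __asm__("cmp %2, %1\n\t"         /* compare c1 against c2 */
                "cmovb %3, %0"           /* dest = vtrue if c1 < c2 */
                : "+r"(dest)
                : "r"(c1), "r"(c2), "r"(vtrue)
                : "cc");
        return dest;
    }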

>> +    { INDEX_op_setcond_i32, { "r", "r", "re" } },
>> +    { INDEX_op_setcond_i64, { "r", "r", "re" } },
>> +
>> +    { INDEX_op_movcond_i32, { "r", "r", "re", "r", "r" } },
>> +    { INDEX_op_movcond_i64, { "r", "r", "re", "r", "r" } },
>
> For the i32 variants, "ri" instead of "re" is enough.

Ah, quite right.
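
For the record, the 'e' constraint restricts a constant to something a
32-bit sign-extended immediate can hold, roughly the check below
(paraphrased from memory, not the exact backend code); for i32
operations every constant trivially passes, hence plain "ri":

    #include <stdint.h>

    /* Approximation of the 'e' (sign-extended 32-bit immediate) test. */
    static int matches_e_constraint(int64_t val)
    {
        return val == (int32_t)val;
    }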


r~



