
Re: [Qemu-devel] Re: [PATCH 3/6] tcg-x86_64: Implement setcond and movc


From: Laurent Desnogues
Subject: Re: [Qemu-devel] Re: [PATCH 3/6] tcg-x86_64: Implement setcond and movcond.
Date: Fri, 18 Dec 2009 18:41:28 +0100

On Fri, Dec 18, 2009 at 6:11 PM, Richard Henderson <address@hidden> wrote:
>> Also note that tcg_out_modrm will generate an unneeded prefix
>> for some registers. cf. the patch I sent to the list months ago.
>
> Huh.  Didn't notice since the disassembler printed what I expected to see.
>  Is fixing this at the same time a requirement for acceptance?
> I'd prefer to tackle that separately, since no doubt it affects every use of
> P_REXB.

I agree this change can be delayed.

>>> +        tgen_arithi32(s, ARITH_AND, arg0, 0xff);
>>
>> Wouldn't movzbl be better?
>
> Handled inside tgen_arithi32:
>
>    } else if (c == ARITH_AND && val == 0xffu) {
>        /* movzbl */
>        tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB, r0, r0);
>
> I didn't feel the need to replicate that.

Oops, I compared with my code, which has an explicit movzbl :)
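
For anyone following along, here is a minimal sketch of why that special
case is safe: masking a 32-bit value with 0xff gives the same result as
zero-extending its low byte, which is what movzbl does. The function names
are made up for illustration and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical model of "and $0xff, %reg": keep only the low 8 bits. */
static uint32_t and_ff(uint32_t x)
{
    return x & 0xffu;
}

/* Hypothetical model of "movzbl %reg8, %reg32": zero-extend the low byte. */
static uint32_t zext8(uint32_t x)
{
    return (uint32_t)(uint8_t)x;
}
```

Both functions return the same value for every input, so a backend is free
to pick whichever encoding it prefers.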

>> Regarding the xor optimization, I tested it on my i7 and it was
>> (very) slightly slower running a 64-bit SPEC2k gcc.
>
> Huh.  It used to be recommended.  The partial word store used to stall the
> pipeline until the old value was ready, and the XOR was special-cased as a
> clear, which broke both the input dependency and also prevented a
> partial-register stall on the output.
>
> Actually, this recommendation is still present: Section 3.5.1.6 in the
> November 2009 revision of the Intel Optimization Reference Manual.
>
> If it's all the same, I'd prefer to keep what I have there.  All other
> things being equal, the XOR is 2 bytes and the MOVZBL is 3.

I agree too.  Anyway, my measurement is not representative enough
to mean anything, and in that case I think shorter code is
better, so let's go for XOR.
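
To make the size argument concrete, here is a rough sketch of the two
encodings for the simple case where no REX prefix is needed. The helper
names are hypothetical and this is not the backend's emitter. It only
covers registers 0..3 (eax, ecx, edx, ebx), since byte access to the other
registers in 64-bit mode needs a prefix. Keep in mind that the XOR variant
has to be emitted before the compare, because XOR clobbers the flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode "xor %reg, %reg" (32-bit, no REX prefix). */
static size_t emit_xor_self(uint8_t *buf, int reg)
{
    buf[0] = 0x31;                                /* XOR r/m32, r32 */
    buf[1] = (uint8_t)(0xc0 | (reg << 3) | reg);  /* ModRM: mod=11, reg, rm */
    return 2;
}

/* Hypothetical helper: encode "movzbl %reg8, %reg32" for reg in 0..3. */
static size_t emit_movzbl_self(uint8_t *buf, int reg)
{
    buf[0] = 0x0f;                                /* two-byte opcode escape */
    buf[1] = 0xb6;                                /* MOVZX r32, r/m8 */
    buf[2] = (uint8_t)(0xc0 | (reg << 3) | reg);  /* ModRM: mod=11, reg, rm */
    return 3;
}
```

For eax this gives 31 c0 versus 0f b6 c0, which is the 2 bytes versus
3 bytes mentioned above.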

>>> +static void tcg_out_movcond(TCGContext *s, int cond, TCGArg arg0,
>>> +                            TCGArg arg1, TCGArg arg2, int const_arg2,
>>> +                            TCGArg arg3, TCGArg arg4, int rexw)
>>
>> Perhaps renaming arg0 to dest would make things slightly
>> more readable.
>
> Ok.
>
>> You should also add a note stating that arg3 != arg4.
>
> I don't believe that's true though.  It's caught immediately when we emit
> the movcond opcode, but there's no check later once copy-propagation has
> been done within TCG.
>
> I check for that in the i386 and sparc backends, because dest==arg3 &&
> dest==arg4 would actually generate incorrect code.  Here in the x86_64
> backend, where we always use cmov it doesn't generate incorrect code, merely
> inefficient.
>
> I could add an early out for that case, if you prefer.

No, you can leave it as is unless someone else objects.
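
For the record, here is a small model of the cmov-based lowering as I
understand it. It is a sketch under my own assumptions (register file as an
array, hypothetical function names, dest = cond ? arg3 : arg4), not the code
from the patch. It shows why dest == arg3 == arg4 still gives the right
result and only costs one useless cmov.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of cmov: copy regs[src] into regs[dst] only when 'take' is true. */
static void cmov(uint64_t *regs, bool take, int dst, int src)
{
    if (take) {
        regs[dst] = regs[src];
    }
}

/*
 * Hypothetical model of movcond: regs[dest] = cond ? regs[a3] : regs[a4].
 * Returns the number of instructions the model "emits".
 */
static int movcond_model(uint64_t *regs, bool cond, int dest, int a3, int a4)
{
    if (dest == a4) {
        /* dest already holds the "false" value; overwrite it if cond holds. */
        cmov(regs, cond, dest, a3);
        return 1;
    }
    if (dest == a3) {
        /* dest already holds the "true" value; use the inverted condition. */
        cmov(regs, !cond, dest, a4);
        return 1;
    }
    regs[dest] = regs[a4];          /* plain mov of the "false" value */
    cmov(regs, cond, dest, a3);     /* conditionally replace it */
    return 2;
}
```

When dest, a3 and a4 are all the same register, the first branch runs a cmov
from a register to itself, which leaves the value unchanged. An early out
would save that one instruction and nothing else.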


Laurent
