From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v2 07/18] tcg/i386: Implement field extraction opcodes
Date: Tue, 25 Oct 2016 18:48:03 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0


On 25/10/2016 18:46, Richard Henderson wrote:
> On 10/25/2016 05:46 AM, Paolo Bonzini wrote:
>>
>>
>> On 18/10/2016 17:10, Richard Henderson wrote:
>>> +    case INDEX_op_extract_i32:
>>> +        /* On the off-chance that we can use the high-byte registers.
>>> +           Otherwise we emit the same ext16 + shift pattern that we
>>> +           would have gotten from the normal tcg-op.c expansion.  */
>>> +        tcg_debug_assert(args[2] == 8 && args[3] == 8);
>>> +        if (args[1] < 4 && args[0] < 8) {
>>> +            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
>>> +        } else {
>>> +            tcg_out_ext16u(s, args[0], args[1]);
>>> +            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
>>> +        }
>>
>> Since the opcode is pretty rare, perhaps it's worth restricting the
>> constraints to, respectively, a new constraint for 0xff ("R"?) and "Q"?
>> It should generate slightly better code without constraining the
>> register allocator too much.
> 
> I tried that, but since our allocator does nothing to look forward to future
> uses, it will only properly load a value into Q if this is the first use of
> the value within the TB.  Otherwise it'll generate an extra move to satisfy
> the constraint.
> 
> Given that movzwl can operate on any source, and can copy to another
> destination at the same time, it's wasteful to force the register allocator to
> generate the extra move.
> 
> This ext16u+shift form is what we'll generate without the special case here.
> So if you prefer I could drop the %[abcd]h special case entirely.

Nah, as you said there's always a chance of satisfying the constraint
(and of getting a better register allocator).
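
For reference, a minimal sketch in portable C of why the two code paths in
the quoted hunk are interchangeable for the ofs == 8, len == 8 case. The
function names are hypothetical and only model the emitted instruction
sequences; they are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the high-byte path: a zero-extending byte load
   from bits 8..15, as movzbl from %ah/%bh/%ch/%dh would do. */
static uint32_t extract_8_8_highbyte(uint32_t x)
{
    return (uint8_t)(x >> 8);
}

/* Hypothetical model of the fallback path: zero-extend the low 16 bits
   (ext16u), then shift right by 8 (SHIFT_SHR). */
static uint32_t extract_8_8_ext16u_shift(uint32_t x)
{
    uint32_t t = (uint16_t)x;   /* ext16u: keep bits 0..15 */
    return t >> 8;              /* drop bits 0..7, leaving bits 8..15 */
}

/* Self-check: both models agree on a few sample inputs. */
static void check_extract_8_8(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x0000ff00u, 0x12345678u, 0xffffffffu
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(extract_8_8_highbyte(samples[i])
               == extract_8_8_ext16u_shift(samples[i]));
    }
}
```

Either form yields the same value; the high-byte form just needs the source
in one of the four registers that have an addressable second byte.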

> The one that's particularly valuable is the 32-bit shift as extraction from a
> 64-bit input.  That turns out to happen lots for e.g. ppc64abi32 guest.

Sounds good, thanks!
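
A sketch of how I read the 64-bit case, assuming it refers to fields that end
at bit 32 (ofs + len == 32): a 32-bit logical right shift of the truncated
input already zero-extends, so no separate mask is needed. That reading and
the helper names below are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Generic unsigned field extraction: bits [ofs, ofs + len) of x.
   Requires 1 <= len and ofs + len <= 64. Hypothetical helper. */
static uint64_t extract64_generic(uint64_t x, unsigned ofs, unsigned len)
{
    return (x >> ofs) & (~UINT64_C(0) >> (64 - len));
}

/* Assumed special case, ofs + len == 32 with ofs in 0..31: truncate to
   32 bits and shift in 32-bit arithmetic; widening the result to 64 bits
   zero-extends it. Hypothetical helper. */
static uint64_t extract64_via_shr32(uint64_t x, unsigned ofs)
{
    return (uint32_t)x >> ofs;
}

/* Self-check: the shortcut matches the generic form for every valid ofs. */
static void check_extract64_shr32(void)
{
    const uint64_t x = UINT64_C(0x123456789abcdef0);
    for (unsigned ofs = 0; ofs < 32; ofs++) {
        assert(extract64_via_shr32(x, ofs)
               == extract64_generic(x, ofs, 32 - ofs));
    }
}
```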

Paolo