
Re: [Qemu-devel] [PATCH 15/15] tcg: use ext op for deposit


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 15/15] tcg: use ext op for deposit
Date: Sun, 10 Apr 2011 22:17:26 +0200

On 10.04.2011, at 22:08, Aurelien Jarno wrote:

> On Sun, Apr 10, 2011 at 09:25:33PM +0200, Alexander Graf wrote:
>> 
>> On 10.04.2011, at 21:23, Aurelien Jarno wrote:
>> 
>>> On Tue, Apr 05, 2011 at 09:55:09AM +0200, Alexander Graf wrote:
>>>> 
>>>> On 05.04.2011, at 06:54, Aurelien Jarno wrote:
>>>> 
>>>>> On Mon, Apr 04, 2011 at 04:32:24PM +0200, Alexander Graf wrote:
>>>>>> With the s390x target we use the deposit instruction to store 32bit values
>>>>>> into 64bit registers without clobbering the upper 32 bits.
>>>>>> 
>>>>>> This specific operation can be optimized slightly by using the ext operation
>>>>>> instead of an explicit and in the deposit instruction. This patch adds that
>>>>>> special case to the generic deposit implementation.
>>>>>> 
>>>>>> Signed-off-by: Alexander Graf <address@hidden>
>>>>>> ---
>>>>>> tcg/tcg-op.h |    6 +++++-
>>>>>> 1 files changed, 5 insertions(+), 1 deletions(-)
>>>>> 
>>>>> Have you really measured a difference here? This should already be
>>>>> handled, at least on x86, by this code:
>>>>> 
>>>>>      if (TCG_TARGET_REG_BITS == 64) {
>>>>>          if (val == 0xffffffffu) {
>>>>>              tcg_out_ext32u(s, r0, r0);
>>>>>              return;
>>>>>          }
>>>>>          if (val == (uint32_t)val) {
>>>>>              /* AND with no high bits set can use a 32-bit operation.  */
>>>>>              rexw = 0;
>>>>>          }
>>>>>      }
>>>> 
>>>> I've certainly looked at the -d op logs and seen that instead of creating 
>>>> a const tcg variable plus an AND there was now an extu opcode issued, yes. 
>>>> No idea why the case up there didn't trigger.
>>>> 
>>> 
>>> The question is rather what -d out_asm shows. The two should be the same in
>>> the end, as the code I pasted above is from tcg/i386/tcg-target.c.
>> 
>> Yes. I was trying to optimize for maximum op length. TCG defines a maximum 
>> number of tcg ops to be issued by each target instruction. Since s390 is 
>> very CISCy, there are instructions that translate into lots of microops, but 
>> are still faster than a C call (register save/restore mostly).
>> 
>> Without this patch, there are some places where we hit that number :).
> 
> Is it on 32-bit or 64-bit? If we reach this number, it's probably
> better to either implement this instruction with a helper, or maybe
> increase the maximum number of ops. What is this instruction?

This was on x86_64. I hit limits with LMH and LM, but reduced them to fit into 
the picture with this optimization :). If you like, I can give you a statically 
linked binary that could exceed the limits.
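
For reference, here is a minimal sketch in plain C of the special case under discussion. It models the arithmetic only, not the TCG ops the patch emits, and the function names deposit64_generic and deposit64_low32 are hypothetical. The generic path masks the value with a constant, which at the TCG level costs a constant temporary plus an AND, while the offset 0, length 32 case can take a single zero-extension instead.

```c
#include <assert.h>
#include <stdint.h>

/* Generic deposit: insert the low `len` bits of `val` into `dst` at bit
 * offset `ofs`. Assumes 1 <= len <= 64 and ofs + len <= 64. */
static uint64_t deposit64_generic(uint64_t dst, unsigned ofs, unsigned len,
                                  uint64_t val)
{
    uint64_t mask = (len == 64) ? ~0ULL : ((1ULL << len) - 1);

    /* Clear the target field, then OR in the masked, shifted value. */
    return (dst & ~(mask << ofs)) | ((val & mask) << ofs);
}

/* Special case ofs == 0, len == 32: the "val & 0xffffffff" step is just a
 * 32-bit zero-extension, so no mask constant is needed for the value. */
static uint64_t deposit64_low32(uint64_t dst, uint64_t val)
{
    uint64_t ext = (uint32_t)val;   /* zero-extend the low 32 bits */

    return (dst & 0xffffffff00000000ULL) | ext;
}
```

Both functions return the same result for offset 0 and length 32; the difference is only in how many ops are needed to get there, which is what matters when an instruction is close to the per-instruction op limit.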


Alex



