
Re: [Qemu-devel] [Consult] tilegx: About floating point instructions


From: Chen Gang
Subject: Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
Date: Wed, 5 Aug 2015 22:16:44 +0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 8/4/15 23:04, Richard Henderson wrote:
> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>
>> On 8/4/15 04:47, Chen Gang wrote:
>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>> but I cannot find any details about them (the ISA documents give
>>>>> only a summary description, not the details), e.g.
>>>>
>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>> black-box instructions.  You need only really implement one of the
>>>> four, with the rest of them being implemented as nops or moves.
>>>>
>>>> Looking at what gcc produces gives the hints:
>>>>
>>>> fdouble_unpack_min    min, srca, srcb
>>>> fdouble_unpack_max    max, srca, srcb
>>>> fdouble_add_flags     flg, srca, srcb
>>>> fdouble_addsub        max, min, flg
>>>> fdouble_pack1         dst, max, flg
>>>> fdouble_pack2         dst, max, zero
>>>>
>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>> from "flg" to "dst".
>>>>
>>>> Similarly for the single-precision:
>>>>
>>>> fsingle_add1          tmp, srca, srcb
>>>> fsingle_addsub2       tmp, srca, srcb
>>>> fsingle_pack1         flg, tmp
>>>> fsingle_pack2         dst, tmp, flg
>>>>
>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>
>>
>> After checking tilegx.md thoroughly, I think we still need to implement
>> each of them precisely, or we cannot emulate all cases (e.g. muldf3).
> 
> No, you can still implement all of muldf3 in fdouble_mul_flags.
> Again, the fdouble_pack1 copies from the flag input to the output.
> 
> Yes, there is a 64-bit multiply in there, but the tcg optimizer
> should be able to delete all of that as unused.  Especially if you have the
> fdouble_unpack* insns store zero into their destinations.
> 

I am not entirely sure, but I think what you describe should work (at
the very least, it is very useful for the implementation).
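
To make the simplified single-precision scheme concrete, here is a minimal
sketch in plain C. It is not QEMU code: ToyCPU, the toy_* helpers, and the
register numbers are hypothetical names chosen for illustration, and host
float arithmetic stands in for whatever softfloat call a real
implementation would use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy register file, for illustration only (not QEMU state). */
typedef struct {
    uint64_t r[64];
} ToyCPU;

static float bits_to_f32(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof(f));
    return f;
}

static uint32_t f32_to_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof(b));
    return b;
}

/* fsingle_add1: perform the whole addition here (host float as stand-in). */
static void toy_fsingle_add1(ToyCPU *c, int dst, int srca, int srcb)
{
    float a = bits_to_f32((uint32_t)c->r[srca]);
    float b = bits_to_f32((uint32_t)c->r[srcb]);
    c->r[dst] = f32_to_bits(a + b);
}

/* fsingle_addsub2 and fsingle_pack1: treated as nops in this scheme. */
static void toy_nop(ToyCPU *c)
{
    (void)c;
}

/* fsingle_pack2: a plain move from its first source to dst. */
static void toy_fsingle_pack2(ToyCPU *c, int dst, int srca)
{
    c->r[dst] = c->r[srca];
}

/* Run the four-instruction sequence that gcc emits for a float add. */
static float toy_addsf3(float x, float y)
{
    enum { SRCA = 1, SRCB = 2, TMP = 3, DST = 5 };
    ToyCPU c;

    memset(&c, 0, sizeof(c));
    c.r[SRCA] = f32_to_bits(x);
    c.r[SRCB] = f32_to_bits(y);

    toy_fsingle_add1(&c, TMP, SRCA, SRCB);
    toy_nop(&c);                     /* fsingle_addsub2 tmp, srca, srcb */
    toy_nop(&c);                     /* fsingle_pack1   flg, tmp        */
    toy_fsingle_pack2(&c, DST, TMP); /* fsingle_pack2   dst, tmp, flg   */

    return bits_to_f32((uint32_t)c.r[DST]);
}
```

Only the first and last steps touch any state, which is why the middle two
instructions can be dropped without changing the visible result.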


> Don't get me wrong -- more accurate implementation of the actual
> insns would be nice, especially for debugging.  But if the insns
> aren't accurately documented I don't see what choice we have.
> 

I think we can still try to implement the details:

 - The document gives a summary of every floating point instruction, so
   we can work out, or guess, the full implementation from that.

 - gcc uses all of them, so it is a good sample and a good reference
   (but we should not assume gcc is always correct, since we are using
   qemu to run the gcc testsuite).

 - The tilegx floating point format should be standard (or at least based
   on the standard format), so we can find the related information
   through google/baidu.
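
As an illustration of the last point, here is a minimal sketch that splits a
double into its fields, assuming the standard IEEE 754 binary64 layout (1
sign bit, 11 exponent bits, 52 fraction bits). The F64Parts type and the
f64_unpack/f64_pack helpers are hypothetical names; whether the
fdouble_unpack_* instructions use this exact intermediate layout is not
documented here and would have to be checked.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three IEEE 754 binary64 fields. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, still biased by 1023 */
    uint64_t fraction;  /* 52 bits, without the implicit leading 1 */
} F64Parts;

/* Split a double into sign, biased exponent, and fraction. */
static F64Parts f64_unpack(double d)
{
    uint64_t bits;
    F64Parts p;

    memcpy(&bits, &d, sizeof(bits));
    p.sign = (unsigned)(bits >> 63);
    p.exponent = (unsigned)((bits >> 52) & 0x7ff);
    p.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return p;
}

/* Reassemble the fields into a double (inverse of f64_unpack). */
static double f64_pack(F64Parts p)
{
    uint64_t bits = ((uint64_t)p.sign << 63)
                  | ((uint64_t)(p.exponent & 0x7ff) << 52)
                  | (p.fraction & ((UINT64_C(1) << 52) - 1));
    double d;

    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

For example, 1.0 unpacks to sign 0, biased exponent 1023, and fraction 0.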


> On the good side, implementing the entire operation as part of the
> "flags" step probably results in faster emulation.
> 

I guess so, too.


I shall finish the simple implementation first, and then try to
implement the floating point instructions in detail later (that should
be a lower priority).


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed


