qemu-devel

Re: [Qemu-devel] [Consult] tilegx: About floating point instructions


From: Richard Henderson
Subject: Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
Date: Tue, 4 Aug 2015 08:04:32 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 08/04/2015 06:56 AM, Chen Gang wrote:
> 
> On 8/4/15 04:47, Chen Gang wrote:
>> On 8/4/15 00:40, Richard Henderson wrote:
>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>> I am adding floating point instructions (e.g. fsingle_add1), but
>>>> I cannot find any details about them (the ISA documents give only
>>>> a summary description, not details), e.g.
>>>
>>> The tilegx splits the four/six cycle arithmetic into multiple
>>> black-box instructions.  You need only really implement one of the
>>> four, with the rest of them being implemented as nops or moves.
>>>
>>> Looking at what gcc produces gives the hints:
>>>
>>> fdouble_unpack_min  min, srca, srcb
>>> fdouble_unpack_max  max, srca, srcb
>>> fdouble_add_flags   flg, srca, srcb
>>> fdouble_addsub      max, min, flg
>>> fdouble_pack1       dst, max, flg
>>> fdouble_pack2       dst, max, zero
>>>
>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>> insn can perform the whole operation, the pack1 insn performs a move
>>> from "flg" to "dst".
>>>
>>> Similarly for the single-precision:
>>>
>>> fsingle_add1        tmp, srca, srcb
>>> fsingle_addsub2     tmp, srca, srcb
>>> fsingle_pack1       flg, tmp
>>> fsingle_pack2       dst, tmp, flg
>>>
>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>
> 
> After checking tilegx.md completely, it seems to me that we still need
> to implement each of them precisely, or we cannot emulate all cases
> (e.g. muldf3).

No, you can still implement all of muldf3 in fdouble_mul_flags.
Again, the fdouble_pack1 copies from the flag input to the output.

Yes, there is a 64-bit multiply in there, but the TCG optimizer should
be able to delete all of that as unused, especially if you have the
fdouble_unpack* insns store zero into their destinations.
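
To make the scheme concrete, here is a minimal sketch in plain C. It is
not QEMU code: the register-file model, the enum names and the helper
functions are all made up for illustration, and host double arithmetic
stands in for whatever softfloat call the real translator would use.
The unpack insns store zero, the "flags" insn does the whole operation,
pack1 is a move from flg to dst, and addsub/pack2 are nops. For the
multiply, only the flags and pack1 steps are modelled; the integer
multiplies in between are left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register-file model, for illustration only. */
enum { R_SRCA, R_SRCB, R_MIN, R_MAX, R_FLG, R_DST, NUM_REGS };
static uint64_t regs[NUM_REGS];

/* Bit-cast helpers between the raw 64-bit register value and a double. */
static double u64_to_f64(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

static uint64_t f64_to_u64(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* fdouble_unpack_min / fdouble_unpack_max: just store zero. */
static void fdouble_unpack(int dst)
{
    regs[dst] = 0;
}

/* fdouble_add_flags: perform the whole addition, result lands in dst. */
static void fdouble_add_flags(int dst, int srca, int srcb)
{
    regs[dst] = f64_to_u64(u64_to_f64(regs[srca]) + u64_to_f64(regs[srcb]));
}

/* fdouble_mul_flags: perform the whole multiplication the same way. */
static void fdouble_mul_flags(int dst, int srca, int srcb)
{
    regs[dst] = f64_to_u64(u64_to_f64(regs[srca]) * u64_to_f64(regs[srcb]));
}

/* fdouble_pack1: a plain move from the "flg" input to the output. */
static void fdouble_pack1(int dst, int flg)
{
    regs[dst] = regs[flg];
}

/* Walk the adddf3 sequence quoted above. */
static double emulate_adddf3(double a, double b)
{
    regs[R_SRCA] = f64_to_u64(a);
    regs[R_SRCB] = f64_to_u64(b);
    fdouble_unpack(R_MIN);                      /* fdouble_unpack_min */
    fdouble_unpack(R_MAX);                      /* fdouble_unpack_max */
    fdouble_add_flags(R_FLG, R_SRCA, R_SRCB);   /* does all the work  */
    /* fdouble_addsub max, min, flg: nop */
    fdouble_pack1(R_DST, R_FLG);                /* dst = flg          */
    /* fdouble_pack2 dst, max, zero: nop */
    return u64_to_f64(regs[R_DST]);
}

/* Only the two steps that matter for muldf3 in this scheme. */
static double emulate_mul_flags_pack1(double a, double b)
{
    regs[R_SRCA] = f64_to_u64(a);
    regs[R_SRCB] = f64_to_u64(b);
    fdouble_mul_flags(R_FLG, R_SRCA, R_SRCB);
    fdouble_pack1(R_DST, R_FLG);
    return u64_to_f64(regs[R_DST]);
}
```

In this model emulate_adddf3(1.5, 2.25) returns 3.75, and min and max
are left as zero, so nothing computed from them reaches dst.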

Don't get me wrong -- more accurate implementation of the actual
insns would be nice, especially for debugging.  But if the insns
aren't accurately documented I don't see what choice we have.

On the plus side, implementing the entire operation as part of the
"flags" step probably results in faster emulation.


r~


