Re: [Qemu-devel] [Consult] tilegx: About floating point instructions


From: Chen Gang
Subject: Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
Date: Sat, 15 Aug 2015 17:56:07 +0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 8/13/15 22:59, Chen Gang wrote:
> Hello all:
> 
> I guess the single insns are simple, and their calculation insn groups
> cannot be mixed with each other, so the current implementation should
> be OK.
> 
> For double insns, I guess only the mul calculation group can be mixed
> with other calculation groups (add/sub groups or int2float/double
> groups), because of optimization -- the mul calculation group has many
> insns.
> 

Unfortunately, after continuing with the gcc testsuite, I found that the
add/sub floating point insns can also be mixed together! The related C
code, -save-temps output, and objdump files are in the attachment (is it
a gcc issue? I guess not).

So I guess we have to 'crack' all the floating point insns precisely, or
we cannot pass the gcc testsuite.

For now, I shall first fix the other issues found by the gcc testsuite,
and 'crack' the floating point insns last. I guess I cannot finish that
this month (I shall try to finish it next month).


Thanks.

> So the implementation is below:
> 
> /*
>  * Assume floating point mul operation group can mix with other groups.
>  *
>  * fdouble_unpack_max: ; skipped.
>  *  
>  * fdouble_unpack_min: ; skipped.
>  *      
>  * fdouble_add_flags:  ; move calc flags to dest.
>  *                       save calc flags.
>  *                       save calc addsub result.
>  *
>  * fdouble_sub_flags:  ; move calc flags to dest.
>  *                       save calc flags.
>  *                       save calc addsub result.
>  *
>  * fdouble_addsub:     ; move calc addsub result to dest.
>  *                       set "addsub result" flag.
>  *
>  * fdouble_mul_flags:  ; move calc mul result to dest.
>  *
>  * fdouble_pack1:      ; if addsub result set
>  *                         && srca == saved addsub result
>  *                         && srcb == saved calc flags
>  *                           move srca to dest.
>  *                       else 
>  *                           move srcb to dest.
>  *
>  * fdouble_pack2:      ; if srcb == r63 && "addsub result" flag
>  *                           reset "addsub result" flag.
>  *                       else if srcb == r63
>  *                           pack srca dest (dest is orig srcb of pack1)
>  *                           reference from tilegx.md: float(uns)sidf2.
>  *                           get (u)int32_t a, then (u)int32_to_float64.
>  *                       else
>  *                           skipped.
>  */
> 
> 
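The plan in the comment block above can be sketched in plain C as follows. This is only a rough sketch under several assumptions: `FPEnv`, its field names, and `HYPOTHETICAL_FLAGS` are invented for illustration (the real flag layout is not documented in this thread), host `double` arithmetic stands in for QEMU's softfloat calls, and only the flag-reset branch of fdouble_pack2 is shown.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU state; the field names are illustrative only. */
typedef struct {
    uint64_t saved_flags;   /* flags value returned by the last add_flags */
    uint64_t saved_result;  /* bit pattern of the last add/sub result */
    bool     has_addsub;    /* the "addsub result" flag from the plan */
} FPEnv;

/* Placeholder token: the real hardware flag format is an unknown here. */
#define HYPOTHETICAL_FLAGS 0x1ULL

static uint64_t f64_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double bits_f64(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* fdouble_add_flags: compute the sum now, save it, return the flags token. */
uint64_t fdouble_add_flags(FPEnv *env, uint64_t srca, uint64_t srcb)
{
    env->saved_result = f64_bits(bits_f64(srca) + bits_f64(srcb));
    env->saved_flags = HYPOTHETICAL_FLAGS;
    return env->saved_flags;
}

/* fdouble_addsub: move the saved result to dest and mark it as pending. */
uint64_t fdouble_addsub(FPEnv *env)
{
    env->has_addsub = true;
    return env->saved_result;
}

/* fdouble_mul_flags: compute the whole product and return it directly. */
uint64_t fdouble_mul_flags(uint64_t srca, uint64_t srcb)
{
    return f64_bits(bits_f64(srca) * bits_f64(srcb));
}

/* fdouble_pack1: take srca for a pending add/sub, otherwise move srcb. */
uint64_t fdouble_pack1(const FPEnv *env, uint64_t srca, uint64_t srcb)
{
    if (env->has_addsub && srca == env->saved_result
        && srcb == env->saved_flags) {
        return srca;
    }
    return srcb;
}

/* fdouble_pack2: only the "reset the flag" branch is sketched here. */
uint64_t fdouble_pack2(FPEnv *env, uint64_t dest, bool srcb_is_zero_reg)
{
    if (srcb_is_zero_reg && env->has_addsub) {
        env->has_addsub = false;
    }
    return dest;
}
```

With these helpers, an interleaved add and mul in the style of the 20020314-1 sequence quoted below would go: add_flags, addsub, mul_flags, pack1 for the add, pack1 for the mul, then the two pack2 insns. The first pack1 picks the saved sum and the second falls through to the mul result.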
> On 8/11/15 21:18, Chen Gang wrote:
>>
>> Oh, it seems a little complex: one testsuite case interleaves a double
>> add with a double mul! We need to save more information for the correct
>> calculation in pack1.
>>
>> It is 20020314-1.exe, the related code (I guess it is correct):
>>
>>         ...
>>
>>         fdouble_unpack_max      r10, r3, zero
>> .LVL2:
>>         fdouble_unpack_max      r15, r2, zero
>>         fdouble_add_flags       r12, r0, r1
>>         mul_hu_lu       r13, r15, r10
>>         mul_lu_lu       r16, r15, r10
>>         mula_hu_lu      r13, r10, r15
>>         fdouble_unpack_min      r11, r0, r1
>>         {
>>         shli    r14, r13, 32
>>         fdouble_unpack_max      r17, r0, r1
>>         }
>>         {
>>         mul_hu_hu       r15, r15, r10
>>         add     r16, r16, r14
>>         }
>>         {
>>         shrui   r13, r13, 32
>>         fdouble_addsub  r17, r11, r12
>>         }
>>         {
>>         cmpltu  r14, r16, r14
>>         fdouble_mul_flags       r3, r2, r3
>>         }
>> .LVL3:
>>         {
>>         add     r13, r15, r13
>>         fdouble_pack1   r12, r17, r12
>>         }
>>         {
>>         add     r13, r13, r14
>>         fdouble_unpack_max      r10, r0, zero
>>         }
>>         fdouble_pack1   r3, r13, r3
>>         fdouble_pack2   r12, r17, zero
>>         fdouble_pack2   r3, r13, r16
>>
>>         ... 
>>
>> Welcome any additional ideas, suggestions and completions.
>>
>> Thanks.
>>
>> On 8/9/15 09:14, Chen Gang wrote:
>>> On 8/9/15 09:10, Chen Gang wrote:
>>>>
>>>> On 8/9/15 01:23, Chen Gang wrote:
>>>>> Hello all:
>>>>>
>>>>> Below is my current idea for all floating point insns. It is not a
>>>>> precise implementation, and not even a complete one. It assumes the pack
>>>>> insns only pack a (u)int32_t when they are used individually:
>>>>>
>>>>>   fsingle_add1        ; return calc flags, save calc result to env.
>>>>>
>>>>>   fsingle_sub1        ; return calc flags, save calc result to env.
>>>>>
>>>>>   fsingle_addsub2     ; set "has result" flag.
>>>>>
>>>>>   fsingle_mul1        ; skip return value, save calc result to env.
>>>>>                         set "has result" flag.
>>>>>
>>>>>   fsingle_mul2        ; skipped.
>>>>>
>>>>>
>>>>>   fsingle_pack1       ; skipped.
>>>>>
>>>>>   fsingle_pack2       ; if "has result"
>>>>>                             reset "has result" flag.
>>>>>                             return calc result from env.
>>>>>                         else
>>>>>                             pack srca 
>>>>>                             reference from tilegx.md: float(uns)sisf2.
>>>>>                             get (u)int32_t a, then (u)int32_to_float32.
>>>>
>>>> For "pack srca and srcb", the related demo like below (srca and srcb
>>>> are uint64_t):
>>>>
>>>
>>> Oh, sorry, for "pack srca" (not for "pack srca and srcb")
>>>
>>>>     switch (srca & 0x3ff) {
>>>>
>>>>     /* treat it as uint32_t */
>>>>     case 0x9e:
>>>>         return uint32_to_float32(srca >> 32, &FP_STATUS);
>>>>
>>>>     /* treat it as int32_t, must be negative number */
>>>>     case 0x29e:
>>>>         return int32_to_float32(srca >> 32 | 0x80000000, &FP_STATUS);
>>>>
>>>>     default:
>>>>         unimplemented (gen_exception).
>>>>     }
>>>>
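The "pack srca" demo above could look roughly like this as standalone C. It is a sketch only: the function name `fsingle_pack_srca` and the bool/out-parameter shape are made up for illustration, a plain C cast stands in for the softfloat `uint32_to_float32`/`int32_to_float32` calls, and returning false stands in for raising the "unimplemented" exception. The magic values 0x9e and 0x29e are taken as guessed in this thread, not from documentation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode the "pack srca" case for single precision.
 * Returns false where the thread's demo would raise an exception.
 */
bool fsingle_pack_srca(uint64_t srca, float *out)
{
    uint32_t hi = (uint32_t)(srca >> 32);
    int32_t s;

    switch (srca & 0x3ff) {
    case 0x9e:                      /* treat the value as uint32_t */
        *out = (float)hi;
        return true;

    case 0x29e:                     /* treat as int32_t, must be negative */
        hi |= 0x80000000u;
        memcpy(&s, &hi, sizeof s);  /* reinterpret the bits without UB */
        *out = (float)s;
        return true;

    default:                        /* unimplemented in this sketch */
        return false;
    }
}
```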
>>>>>
>>>>>   fdouble_unpack_max: ; skipped.
>>>>>
>>>>>   fdouble_unpack_min: ; skipped.
>>>>>
>>>>>   fdouble_add_flags:  ; return calc flags, save calc result to env.
>>>>>
>>>>>   fdouble_sub_flags:  ; return calc flags, save calc result to env.
>>>>>
>>>>>   fdouble_addsub:     ; set "has result" flag.
>>>>>
>>>>>   fdouble_mul_flags:  ; skip return flags, save calc result to env.
>>>>>                         set "has result" flag.
>>>>>
>>>>>   fdouble_pack1:      ; if "has result" 
>>>>>                             reset "has result" flag.
>>>>>                             return calc result from env.
>>>>>                         else
>>>>>                             pack srca and srcb.
>>>>>                             reference from tilegx.md: float(uns)sidf2.
>>>>>                             get (u)int32_t a, then (u)int32_to_float64.
>>>>>
>>>>  
>>>> For "pack srca and srcb", the related demo like below (srca and srcb
>>>> are uint64_t):
>>>>
>>>>     switch (srcb & 0xffff) {
>>>>
>>>
>>> Oh, sorry, should use 0xfffff instead of 0xffff.
>>>
>>>>     /* treat it as uint32_t */
>>>>     case 0x21b00:
>>>>         return uint32_to_float64(srca >> 4, &FP_STATUS);
>>>>
>>>>     /* treat it as int32_t, must be negative number */
>>>>     case 0xa1b00:
>>>>         return int32_to_float64(srca >> 4 | 0x80000000, &FP_STATUS);
>>>>
>>>>     default:
>>>>         unimplemented (gen_exception).
>>>>     }
>>>>
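For the double case, a similar standalone sketch, already using the corrected 0xfffff mask. Again the function name `fdouble_pack_int` and its return convention are hypothetical, host casts replace the softfloat `uint32_to_float64`/`int32_to_float64` calls, and the constants 0x21b00 and 0xa1b00 are the guesses from this thread.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode the "pack srca and srcb" case for double precision.
 * Returns false where the thread's demo would raise an exception.
 */
bool fdouble_pack_int(uint64_t srca, uint64_t srcb, double *out)
{
    uint32_t v = (uint32_t)(srca >> 4);
    int32_t s;

    /* 0xfffff, not 0xffff: both case labels need 20 bits to match. */
    switch (srcb & 0xfffff) {
    case 0x21b00:                   /* treat the value as uint32_t */
        *out = (double)v;
        return true;

    case 0xa1b00:                   /* treat as int32_t, must be negative */
        v |= 0x80000000u;
        memcpy(&s, &v, sizeof s);   /* reinterpret the bits without UB */
        *out = (double)s;
        return true;

    default:                        /* unimplemented in this sketch */
        return false;
    }
}
```

The mask matters because 0x21b00 & 0xffff is 0x1b00, so with the narrower mask neither case label could ever be reached.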
>>>>>   fdouble_pack2:      ; skipped.
>>>>>
>>>>>
>>>>>   (fsingle_add1/sub1 and fdouble_add/sub_flags can be used individually,
>>>>>    e.g. in the gcc testsuite for complex numbers).
>>>>>
>>>>>
>>>>> Next, I shall implement the floating point insns, welcome any related
>>>>> ideas, suggestions, and completions.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On 8/5/15 22:16, Chen Gang wrote:
>>>>>> On 8/4/15 23:04, Richard Henderson wrote:
>>>>>>> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>>>>>>>
>>>>>>>> On 8/4/15 04:47, Chen Gang wrote:
>>>>>>>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>>>>>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>>>>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>>>>>>>> but for me, I can not find any details about them (the ISA
>>>>>>>>>>> documents only give a summary description, but not details), e.g.
>>>>>>>>>>
>>>>>>>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>>>>>>>> black-box instructions.  You need only really implement one of the
>>>>>>>>>> four, with the rest of them being implemented as nops or moves.
>>>>>>>>>>
>>>>>>>>>> Looking at what gcc produces gives the hints:
>>>>>>>>>>
>>>>>>>>>> fdouble_unpack_min      min, srca, srcb
>>>>>>>>>> fdouble_unpack_max      max, srca, srcb
>>>>>>>>>> fdouble_add_flags       flg, srca, srcb
>>>>>>>>>> fdouble_addsub          max, min, flg
>>>>>>>>>> fdouble_pack1           dst, max, flg
>>>>>>>>>> fdouble_pack2           dst, max, zero
>>>>>>>>>>
>>>>>>>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>>>>>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>>>>>>>> from "flg" to "dst".
>>>>>>>>>>
>>>>>>>>>> Similarly for the single-precision:
>>>>>>>>>>
>>>>>>>>>> fsingle_add1            tmp, srca, srcb
>>>>>>>>>> fsingle_addsub2         tmp, srca, srcb
>>>>>>>>>> fsingle_pack1           flg, tmp
>>>>>>>>>> fsingle_pack2           dst, tmp, flg
>>>>>>>>>>
>>>>>>>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>>>>>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>>>>>>>
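The simple scheme described above for the double-precision sequence can be sketched as below. This is an illustration only: the `_simple` function names are invented, host `double` arithmetic replaces softfloat, and the unpack, addsub, and pack2 insns are omitted because the scheme treats them as no-ops.

```c
#include <stdint.h>
#include <string.h>

static uint64_t f64_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double bits_f64(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* fdouble_add_flags does the whole addition; "flg" carries the result. */
uint64_t fdouble_add_flags_simple(uint64_t srca, uint64_t srcb)
{
    return f64_bits(bits_f64(srca) + bits_f64(srcb));
}

/* fdouble_pack1 is just a move from "flg" to "dst"; "max" is ignored. */
uint64_t fdouble_pack1_simple(uint64_t max, uint64_t flg)
{
    (void)max;
    return flg;
}
```

This works only while each add/sub or mul group runs on its own registers from flags to pack1, which is the assumption the later messages in this thread put to the test.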
>>>>>>>>
>>>>>>>> After checking tilegx.md completely, I think we still need to implement
>>>>>>>> each of them precisely, or we cannot emulate all cases (e.g. muldf3).
>>>>>>>
>>>>>>> No, you can still implement all of muldf3 in fdouble_mul_flags.
>>>>>>> Again, the fdouble_pack1 copies from the flag input to the output.
>>>>>>>
>>>>>>> Yes, there is a 64-bit multiply in there, but the tcg optimizer
>>>>>>> should be able to delete all of that as unused.  Especially if you
>>>>>>> have the fdouble_unpack* insns store zero into their destinations.
>>>>>>>
>>>>>>
>>>>>> For me, I am not quite sure. But I guess, what you said should be OK (at
>>>>>> least, what you said is very useful for the implementation).
>>>>>>
>>>>>>
>>>>>>> Don't get me wrong -- more accurate implementation of the actual
>>>>>>> insns would be nice, especially for debugging.  But if the insns
>>>>>>> aren't accurately documented I don't see what choice we have.
>>>>>>>
>>>>>>
>>>>>> For me, I guess, we can still try to implement the details.
>>>>>>
>>>>>>  - The document has all floating point instructions' summary, so we can
>>>>>>    think of, or guess its implementation entirely.
>>>>>>
>>>>>>  - gcc uses them all and completely, so it is our good sample and good
>>>>>>    reference (but we should not assume gcc must be correct, since we
>>>>>>    just use qemu for gcc testsuite).
>>>>>>
>>>>>>  - Tilegx floating point format should be standard (at least, reference
>>>>>>    to the standard format), so we can reference the related information
>>>>>>    from google/baidu.
>>>>>>
>>>>>>
>>>>>>> On the good side, implementing the entire operation as part of the
>>>>>>> "flags" step probably results in faster emulation.
>>>>>>>
>>>>>>
>>>>>> I guess so, too.
>>>>>>
>>>>>>
>>>>>> I shall try to finish the simple implementation first, then try to
>>>>>> implement the floating point instructions in detail in the future (it
>>>>>> should be lower priority).
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>
>>>
>>
> 

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

Attachment: floating-point-double-add.tar.gz
Description: GNU Zip compressed data

