Re: [Qemu-devel] [RFC PATCH 14/30] softfloat: 16 bit helpers for shr, cl

From:	Alex Bennée
Subject:	Re: [Qemu-devel] [RFC PATCH 14/30] softfloat: 16 bit helpers for shr, clz and rounding and packing
Date:	Mon, 16 Oct 2017 09:20:58 +0100
User-agent:	mu4e 0.9.19; emacs 26.0.90

Richard Henderson <address@hidden> writes:

> On 10/13/2017 09:24 AM, Alex Bennée wrote:
>> Half-precision helpers for float16 maths. I didn't bother hand-coding
>> the count leading zeros as we could always fall-back to host-utils if
>> we needed to.
>>
>> Signed-off-by: Alex Bennée <address@hidden>
>> ---
>>  fpu/softfloat-macros.h | 39 +++++++++++++++++++++++++++++++++++++++
>>  fpu/softfloat.c        | 21 +++++++++++++++++++++
>>  2 files changed, 60 insertions(+)
>>
>> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
>> index 9cc6158cb4..73091a88a8 100644
>> --- a/fpu/softfloat-macros.h
>> +++ b/fpu/softfloat-macros.h
>> @@ -89,6 +89,31 @@ this code that are retained.
>>  # define SOFTFLOAT_GNUC_PREREQ(maj, min) 0
>>  #endif
>>
>> +/*----------------------------------------------------------------------------
>> +| Shifts `a' right by the number of bits given in `count'.  If any nonzero
>> +| bits are shifted off, they are ``jammed'' into the least significant bit of
>> +| the result by setting the least significant bit to 1.  The value of `count'
>> +| can be arbitrarily large; in particular, if `count' is greater than 16, the
>> +| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
>> +| The result is stored in the location pointed to by `zPtr'.
>> +*----------------------------------------------------------------------------*/
>> +
>> +static inline void shift16RightJamming(uint16_t a, int count, uint16_t *zPtr)
>> +{
>> +    uint16_t z;
>> +
>> +    if ( count == 0 ) {
>> +        z = a;
>> +    }
>> +    else if ( count < 16 ) {
>> +        z = ( a>>count ) | ( ( a<<( ( - count ) & 16 ) ) != 0 );
>> +    }
>> +    else {
>> +        z = ( a != 0 );
>> +    }
>> +    *zPtr = z;
>> +
>> +}
>
> When are you going to use a SRJ of a uint16_t?  Isn't most of your actual
> arithmetic actually done on uint32_t?

The add/sub code currently uses it. Arguably it could do what it needs
in 32 bits as well, but the spare exponent bits are enough for operating
on the significand. That said, I'm fairly sure it all ends up as 32-bit
operations in the generated code.
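
For reference, a minimal sketch of a shift-right-with-jamming step on a
16-bit significand held in a 32-bit container. The name shift_right_jam16
is hypothetical and this is not the helper from the patch. It tests the
shifted-off bits with a mask rather than a left shift, because a uint16_t
operand is promoted to int before shifting, so a left-shift test would
also need truncating back to 16 bits (the 32-bit softfloat helper masks
the shift count with 31, for comparison).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift a 16-bit value right by `count`, ORing any
 * nonzero bits that fall off the bottom into bit 0 (the "sticky" bit). */
static inline uint32_t shift_right_jam16(uint32_t a, int count)
{
    assert(count >= 0);
    assert(a <= 0xffff);            /* value must fit in 16 bits */

    if (count == 0) {
        return a;
    }
    if (count < 16) {
        /* Bits that the right shift is about to discard. */
        uint32_t lost = a & ((1u << count) - 1);
        return (a >> count) | (lost != 0);
    }
    /* Everything is shifted out: result is just the sticky bit. */
    return a != 0;
}
```

With this, shifting 0x8000 right by 1 gives 0x4000 with no sticky bit,
while shifting 0x8001 right by 1 gives 0x4001.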

>
>> +/*----------------------------------------------------------------------------
>> +| Returns the number of leading 0 bits before the most-significant 1 bit of
>> +| `a'.  If `a' is zero, 16 is returned.
>> +*----------------------------------------------------------------------------*/
>> +
>> +static int8_t countLeadingZeros16( uint16_t a )
>> +{
>> +    if (a) {
>> +        return __builtin_clz(a);
>> +    } else {
>> +        return 16;
>> +    }
>> +}
>
> __builtin_clz works on "int".  You need to use clz32(a) - 16.

Ahh, my mistake - I'd assumed it had the same smarts as the gcc atomics.
Maybe I should just use our utils functions after all.
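
A rough sketch of the suggested fix, assuming a 32-bit unsigned int and a
32-bit count-leading-zeros helper that returns 32 for zero. Both
clz32_sketch and clz16_sketch are stand-in names for illustration, not
the functions in host-utils.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32-bit clz helper; assumed to return 32 for zero.
 * __builtin_clz itself is undefined for a zero argument. */
static inline int clz32_sketch(uint32_t val)
{
    return val ? __builtin_clz(val) : 32;
}

/* The uint16_t argument is widened to 32 bits, so the upper 16 bits are
 * always zero; subtracting 16 gives the 16-bit leading-zero count. */
static inline int clz16_sketch(uint16_t a)
{
    int n = clz32_sketch(a) - 16;
    assert(n >= 0 && n <= 16);
    return n;
}
```

The zero case falls out without a separate branch, since 32 - 16 is the
16 that the comment in the patch promises.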

>
>> +/*----------------------------------------------------------------------------
>> +| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +| and significand `zSig', and returns the proper single-precision floating-
>
> s/single/half/
>
>> +| point value corresponding to the abstract input.  This routine is just like
>> +| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
>> +| Bit 15 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
>> +| floating-point exponent.
>> +*----------------------------------------------------------------------------*/
>> +
>> +static float16
>> + normalizeRoundAndPackFloat16(flag zSign, int zExp, uint16_t zSig,
>> +                              float_status *status)
>> +{
>> +    int8_t shiftCount;
>> +
>> +    shiftCount = countLeadingZeros16( zSig ) - 1;
>> +    return roundAndPackFloat16(zSign, zExp - shiftCount, zSig<<shiftCount,
>> +                               true, status);
>
> Do I recall correctly that your lsb is between bits 7:6, like
> roundAndPackFloat32?  You've got 11 bits of sig.  Plus 7 bits of extra equals
> 18 bits.  Which doesn't fit in uint16_t.

No, it takes a 32-bit sig in and deals with it internally.

>
> So, the reason that roundAndPackFloat32 uses 7 bits is that 7 + 24 == 31.
>
> We can either use a split at (15 - 11 =) 4 bits, and still fit in a uint16_t,
> or we can drop uint16_t and admit that the compiler is going to promote to int,
> or uint32_t, anyway.  If we do that, we have options of a split between 4 and
> (31 - 11 =) 20 bits.
>
> We talked this week re fp->int conversion, it did seem Really Useful when we
> noted that sig << exp is representable in a uint32_t.  Which does suggest a
> choice at or below (32 - 11 - 14 =) 7.
>
>
> r~
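
The bit budget in the quoted message can be checked mechanically. This is
only a sketch of that arithmetic, assuming the top bit of the container
is kept clear as the numbers above imply; max_extra_bits and the enum
names are made up for illustration.

```c
#include <assert.h>

/* Significand widths including the implicit leading bit. */
enum { F32_SIG_BITS = 24, F16_SIG_BITS = 11 };

/* Largest number of extra rounding bits that fit below the significand
 * when the top bit of the container is left clear. */
static inline int max_extra_bits(int container_bits, int sig_bits)
{
    assert(container_bits > sig_bits);
    return container_bits - 1 - sig_bits;
}
```

That gives 7 for float32 in a uint32_t (7 + 24 == 31), 4 for float16 in a
uint16_t, and 20 for float16 in a uint32_t. Keeping 7 extra bits with an
11-bit significand needs 18 bits, which is why it cannot fit in a
uint16_t.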


--
Alex Bennée
