qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point


From: Chen Gang
Subject: Re: [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point implementation
Date: Sat, 12 Dec 2015 07:38:23 +0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 12/11/15 05:17, Richard Henderson wrote:
> On 12/10/2015 06:15 AM, Chen Gang wrote:
>> +#define TILEGX_F_MAN_HBIT   (1ULL << 59)
> ...
>> +static uint64_t fr_to_man(float64 d)
>> +{
>> +    uint64_t val = get_f64_man(d) << 7;
>> +
>> +    if (get_f64_exp(d)) {
>> +        val |= TILEGX_F_MAN_HBIT;
>> +    }
>> +
>> +    return val;
>> +}
> 
> One presumes that "HBIT" is the ieee implicit one bit.
> A better name or better comments would help there.
> 

OK, thanks. And after think of again, I guess, the real hardware does
not use HBIT internally (use the full 64 bits as mantissa without HBIT).

But what I have done is still OK (use 59 bits + 1 HBIT as mantissa), for
59 bits are enough for double mantissa (52 bits). It makes the overflow
processing easier, but has to process mul operation specially.

It we have to try to match the real hardware have done, I shall rewrite
the related code for mantissa. (I guess, we need to match the real
hardware have done).


> Do we know for sure that "7" is the correct number of guard bits?  From the 
> gcc
> implementation of floatsidf, I might guess that the correct number is "4".
> 

According to floatsidf, it seems "4", but after I expanded the bits, I
guess, it is "7".

/*
 * Double exp analyzing: (0x21b00 << 1) - 0x37(55) = 0x3ff
 *
 *   17  16  15  14  13  12  11  10   9   8   7    6   5   4   3   2   1   0
 *
 *    1   0   0   0   0   1   1   0   1   1   0    0   0   0   0   0   0   0
 *
 *    0   0   0   0   0   1   1   0   1   1   1    => 0x37(55)
 *
 *    0   1   1   1   1   1   1   1   1   1   1    => 0x3ff
 *
 */

I guess, I need restore this comments in helper_fdouble.c.


>> +static uint32_t get_fdouble_vexp(uint64_t n)
>> +{
>> +    return extract32(n, 7, 13);
>> +}
> 
> What's a "vexp"?
> 

It is exp + overflow bit + underflow bit. We can use vexp for internal
calculation, directly, and check uv and ov for the result. I guess the
real hardware will do like this.

The full description of the format is:

typedef union TileGXFPDFmtF {

    struct {
        uint64_t unknown0 : 7;    /* unknown */

        uint64_t vexp : 13;      /* vexp = exp | ov | uv */
#if 0 /* it is only the explanation for vexp above */
        uint64_t exp : 11;        /* exp, 0x21b << 1: 55 + TILEGX_F_EXP_DZERO */
        uint64_t ov : 1;          /* overflow for mul, low priority */
        uint64_t uv : 1;          /* underflow for mul, high priority */
#endif

        uint64_t sign : 1;        /* Sign bit for the total value */

        uint64_t calc: 2;         /* absolute add, sub, or mul */
        uint64_t inf: 1;          /* infinit */
        uint64_t nan: 1;          /* nan */

        /* Come from TILE-Gx ISA document, Table 7-2 for floating point */
        uint64_t unordered : 1;   /* The two are unordered */
        uint64_t lt : 1;          /* 1st is less than 2nd */
        uint64_t le : 1;          /* 1st is less than or equal to 2nd */
        uint64_t gt : 1;          /* 1st is greater than 2nd */
        uint64_t ge : 1;          /* 1st is greater than or equal to 2nd */
        uint64_t eq : 1;          /* The two operands are equal */
        uint64_t neq : 1;         /* The two operands are not equal */

        uint64_t unknown1 : 32;   /* unknown */
    } fmt;
    uint64_t ll;                  /* only for easy using */
} TileGXFPDFmtF;


>> +uint64_t helper_fdouble_unpack_min(CPUTLGState *env,
>> +                                   uint64_t srca, uint64_t srcb)
>> +{
>> +    uint64_t v = 0;
>> +    uint32_t expa = get_f64_exp(srca);
>> +    uint32_t expb = get_f64_exp(srcb);
>> +
>> +    if (float64_is_any_nan(srca) || float64_is_any_nan(srcb)
>> +        || float64_is_infinity(srca) || float64_is_infinity(srcb)) {
>> +        return 0;
>> +    } else if (expa > expb) {
>> +        if (expa - expb < 64) {
>> +            set_fdouble_man(&v, fr_to_man(srcb) >> (expa - expb));
>> +        } else {
>> +            return 0;
>> +        }
>> +    } else if (expa < expb) {
>> +        if (expb - expa < 64) {
>> +            set_fdouble_man(&v, fr_to_man(srca) >> (expb - expa));
> 
> I very sincerely doubt that a simple right-shift is correct.  In order to
> obtain proper rounding for real computation, a sticky bit is required.  That
> is, set bit 0 if any bits are shifted out.  See the implementation of
> shift64RightJamming in fpu/softfloat-macros.h.
> 

Oh, really, thanks.


>> +uint64_t helper_fdouble_addsub(CPUTLGState *env,
>> +                               uint64_t dest, uint64_t srca, uint64_t srcb)
>> +{
>> +    if (get_fdouble_calc(srcb) == TILEGX_F_CALC_ADD) {
>> +        return dest + srca; /* maybe set addsub overflow bit */
> 
> Definitely not.  That would be part of packing.
> 

If we need to try to match the real hardware have done, the related
implementation above is incorrect.

And for my current implementation (I guess, it should be correct):

typedef union TileGXFPDFmtV {
    struct {
        uint64_t mantissa : 60;   /* mantissa */
        uint64_t overflow : 1;    /* carry/overflow bit for absolute add/mul */
        uint64_t unknown1 : 3;    /* unknown */
    } fmt;
    uint64_t ll;                  /* only for easy using */
} TileGXFPDFmtV;


In helper_fdouble_addsub(), both dest and srca are unpacked, so they are
within 60 bits. So one time absolute add are within 61 bits, so let bit
61 as overflow bit is enough.


>> +/* absolute-add/mul may cause add/mul carry or overflow */
>> +static bool proc_oflow(uint64_t *flags, uint64_t *v, uint64_t *srcb)
>> +{
>> +    if (get_fdouble_man_of(*v)) {
>> +        set_fdouble_vexp(flags, get_fdouble_vexp(*flags) + 1);
>> +        *srcb >>= 1;
>> +        *srcb |= *v << 63;
>> +        *v >>= 1;
>> +        clear_fdouble_man_of(v);
>> +    }
>> +    return get_fdouble_vexp(*flags) > TILEGX_F_EXP_DMAX;
>> +}
>> +
>> +uint64_t helper_fdouble_pack2(CPUTLGState *env, uint64_t flags /* dest */,
>> +                              uint64_t srca, uint64_t srcb)
>> +{
>> +    uint64_t v = srca;
>> +    float64 d = float64_set_sign(float64_zero, get_fdouble_sign(flags));
>> +
>> +    /*
>> +     * fdouble_add_flags, fdouble_sub_flags, or fdouble_mul_flags have
>> +     * processed exceptions. So need not process fp_status, again.
>> +     */
> 
> No need to process fp_status at all, actually.  Tile-GX (and pro) do not
> support exception flags, so everything we do with fp_status is discarded.
> 
> Indeed, we should probably not store fp_status in env at all, but create it on
> the stack in any function that actually needs one.
> 

OK, thanks.

>> +
>> +    if (get_fdouble_nan(flags)) {
>> +        return float64_val(float64_default_nan);
>> +    } else if (get_fdouble_inf(flags)) {
>> +        return float64_val(d |= float64_infinity);
> 
> s/|=/|/
> 

OK, thanks.

>> +    /* absolute-mul needs left shift 4 + 1 bytes to match the real mantissa 
>> */
>> +    if (get_fdouble_calc(flags) == TILEGX_F_CALC_MUL) {
>> +        v <<= 5;
>> +        v |= srcb >> 59;
>> +        srcb <<= 5;
>> +    }
> 
> As with single, I don't like this calc thing.  We can infer what's required
> from principals.
> 
> We're given two words containing mantissa, and a "flags" word containing sign,
> exponent, and other flags.  For add, sub, and floatsidf, the compiler passes 
> us
> 0 as the low word; for mul the compiler passes us the result of a 64x64->128
> bit multiply.
> 

OK, thanks. It looks, we have to try to match what the hardware have
done.

> The first step would be to normalize the 128-bit value so that the highest bit
> set is TILEGX_F_MAN_HBIT in the high word, adjusting the exponent in the
> process.  Fold the low word into the sticky bit of the high word (high |= (low
> != 0)) for rounding purposes.
> 

OK, thanks. And my original implementation did not consider about the
sticky bit.

> The second step would be to round and pack, similar to roundAndPackFloat64,
> except that your HBIT is at a different place than softfloat.c.
> 

It sounds good (and originally I really considered about it). If we have
an export common function for it, that will be really good.

At present, I use (u)int64_to_float64(), then process exp again.


>> +    d = calc(fsrca, fsrcb, fp_status); /* also check exceptions */
> 
> There are no exceptions to check.
>

OK, thanks. 

-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed



reply via email to

[Prev in Thread] Current Thread [Next in Thread]