qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 08/14] hardfloat: support float32/64 addition


From: Alex Bennée
Subject: Re: [Qemu-devel] [PATCH v2 08/14] hardfloat: support float32/64 addition and subtraction
Date: Wed, 28 Mar 2018 11:17:06 +0100
User-agent: mu4e 1.1.0; emacs 26.0.91

Emilio G. Cota <address@hidden> writes:

> Note that for float32 we do most checks on the float32 and not on
> the native type; for float64 we do the opposite. This is faster
> than going either way for both, as shown below.

This looks like the difference between SIMD float instructions and
straight mask and cmp. I guess it doesn't pay off until the bigger
sizes. It would be nice not to have to jump through hoops to convince
the compiler of that though. The fpclassify check expands to a compiler
built-in, so the compiler probably knows more about its behaviour.

>
> I am keeping both macro-based definitions to ease testing of
> either option.
>
> Performance results (single and double precision) for fp-bench
> run under aarch64-linux-user on an Intel(R) Core(TM) i7-4790K
> CPU @ 4.00GHz host:
>
> - before:
> add-single: 86.74 MFlops
> add-double: 86.46 MFlops
> sub-single: 83.33 MFlops
> sub-double: 84.57 MFlops
>
> - after this commit:
> add-single: 188.89 MFlops
> add-double: 172.27 MFlops
> sub-single: 187.69 MFlops
> sub-double: 171.89 MFlops
>
> - w/ both using float32/64_is_normal etc.:
> add-single: 187.63 MFlops
> add-double: 143.51 MFlops
> sub-single: 187.91 MFlops
> sub-double: 144.23 MFlops
>
> - w/ both using fpclassify etc.:
> add-single: 166.61 MFlops
> add-double: 172.32 MFlops
> sub-single: 169.13 MFlops
> sub-double: 173.09 MFlops
>
> Signed-off-by: Emilio G. Cota <address@hidden>
> ---
>  fpu/softfloat.c | 120 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 112 insertions(+), 8 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index ffe16b2..e0ab0ca 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -829,8 +829,8 @@ float16  __attribute__((flatten)) float16_add(float16 a, 
> float16 b,
>      return float16_round_pack_canonical(pr, status);
>  }
>
> -float32 __attribute__((flatten)) float32_add(float32 a, float32 b,
> -                                             float_status *status)
> +static float32 __attribute__((flatten, noinline))
> +soft_float32_add(float32 a, float32 b, float_status *status)
>  {
>      FloatParts pa = float32_unpack_canonical(a, status);
>      FloatParts pb = float32_unpack_canonical(b, status);
> @@ -839,8 +839,8 @@ float32 __attribute__((flatten)) float32_add(float32 a, 
> float32 b,
>      return float32_round_pack_canonical(pr, status);
>  }
>
> -float64 __attribute__((flatten)) float64_add(float64 a, float64 b,
> -                                             float_status *status)
> +static float64 __attribute__((flatten, noinline))
> +soft_float64_add(float64 a, float64 b, float_status *status)
>  {
>      FloatParts pa = float64_unpack_canonical(a, status);
>      FloatParts pb = float64_unpack_canonical(b, status);
> @@ -859,8 +859,8 @@ float16 __attribute__((flatten)) float16_sub(float16 a, 
> float16 b,
>      return float16_round_pack_canonical(pr, status);
>  }
>
> -float32 __attribute__((flatten)) float32_sub(float32 a, float32 b,
> -                                             float_status *status)
> +static float32 __attribute__((flatten, noinline))
> +soft_float32_sub(float32 a, float32 b, float_status *status)
>  {
>      FloatParts pa = float32_unpack_canonical(a, status);
>      FloatParts pb = float32_unpack_canonical(b, status);
> @@ -869,8 +869,8 @@ float32 __attribute__((flatten)) float32_sub(float32 a, 
> float32 b,
>      return float32_round_pack_canonical(pr, status);
>  }
>
> -float64 __attribute__((flatten)) float64_sub(float64 a, float64 b,
> -                                             float_status *status)
> +static float64 __attribute__((flatten, noinline))
> +soft_float64_sub(float64 a, float64 b, float_status *status)
>  {
>      FloatParts pa = float64_unpack_canonical(a, status);
>      FloatParts pb = float64_unpack_canonical(b, status);
> @@ -879,6 +879,110 @@ float64 __attribute__((flatten)) float64_sub(float64 a, 
> float64 b,
>      return float64_round_pack_canonical(pr, status);
>  }

I don't know if we want to put a comment in about the inline strategy?

From what I can tell if we do flatten into the ultimate function it just
means the compiler needs a bigger preamble which it can get away with
on hopefully the fast path. So I reluctantly agree the macros are worth
it.

>
> +#define GEN_FPU_ADDSUB(add_name, sub_name, soft_t, host_t,              \
> +                       host_abs_func, min_normal)                       \
> +    static inline __attribute__((always_inline)) soft_t                 \
> +    fpu_ ## soft_t ## _addsub(soft_t a, soft_t b, bool subtract,        \
> +                              float_status *s)                          \
> +    {                                                                   \
> +        soft_t ## _input_flush2(&a, &b, s);                             \
> +        if (likely((soft_t ## _is_normal(a) || soft_t ## _is_zero(a)) && \
> +                   (soft_t ## _is_normal(b) || soft_t ## _is_zero(b)) && \
> +                   s->float_exception_flags & float_flag_inexact &&     \
> +                   s->float_rounding_mode == float_round_nearest_even)) { \
> +            host_t ha = soft_t ## _to_ ## host_t(a);                    \
> +            host_t hb = soft_t ## _to_ ## host_t(b);                    \
> +            host_t hr;                                                  \
> +            soft_t r;                                                   \
> +                                                                        \
> +            if (subtract) {                                             \
> +                hb = -hb;                                               \
> +            }                                                           \
> +            hr = ha + hb;                                               \
> +            r = host_t ## _to_ ## soft_t(hr);                           \
> +            if (unlikely(soft_t ## _is_infinity(r))) {                  \
> +                s->float_exception_flags |= float_flag_overflow;        \
> +            } else if (unlikely(host_abs_func(hr) <= min_normal) &&     \
> +                       !(soft_t ## _is_zero(a) &&                       \
> +                         soft_t ## _is_zero(b))) {                      \
> +                goto soft;                                              \
> +            }                                                           \
> +            return r;                                                   \
> +        }                                                               \
> +    soft:                                                               \
> +        if (subtract) {                                                 \
> +            return soft_ ## soft_t ## _sub(a, b, s);                    \
> +        } else {                                                        \
> +            return soft_ ## soft_t ## _add(a, b, s);                    \
> +        }                                                               \
> +    }                                                                   \
> +                                                                        \
> +    soft_t add_name(soft_t a, soft_t b, float_status *status)           \
> +    {                                                                   \
> +        return fpu_ ## soft_t ## _addsub(a, b, false, status);          \
> +    }                                                                   \
> +                                                                        \
> +    soft_t sub_name(soft_t a, soft_t b, float_status *status)           \
> +    {                                                                   \
> +        return fpu_ ## soft_t ## _addsub(a, b, true, status);           \
> +    }
> +
> +GEN_FPU_ADDSUB(float32_add, float32_sub, float32, float, fabsf, FLT_MIN)
> +#undef GEN_FPU_ADDSUB
> +
> +#define GEN_FPU_ADDSUB(add_name, sub_name, soft_t, host_t,              \
> +                       host_abs_func, min_normal)                       \
> +    static inline __attribute__((always_inline)) soft_t                 \
> +    fpu_ ## soft_t ## _addsub(soft_t a, soft_t b, bool subtract,        \
> +                              float_status *s)                          \
> +    {                                                                   \
> +        double ha, hb;                                                  \
> +                                                                        \
> +        soft_t ## _input_flush2(&a, &b, s);                             \
> +        ha = soft_t ## _to_ ## host_t(a);                               \
> +        hb = soft_t ## _to_ ## host_t(b);                               \
> +        if (likely((fpclassify(ha) == FP_NORMAL ||                      \
> +                    fpclassify(ha) == FP_ZERO) &&                       \
> +                   (fpclassify(hb) == FP_NORMAL ||                      \
> +                    fpclassify(hb) == FP_ZERO) &&                       \
> +                   s->float_exception_flags & float_flag_inexact &&     \
> +                   s->float_rounding_mode == float_round_nearest_even)) { \
> +            host_t hr;                                                  \
> +                                                                        \
> +            if (subtract) {                                             \
> +                hb = -hb;                                               \
> +            }                                                           \
> +            hr = ha + hb;                                               \
> +            if (unlikely(isinf(hr))) {                                  \
> +                s->float_exception_flags |= float_flag_overflow;        \
> +            } else if (unlikely(host_abs_func(hr) <= min_normal) &&     \
> +                       !(soft_t ## _is_zero(a) &&                       \
> +                         soft_t ## _is_zero(b))) {                      \
> +                goto soft;                                              \
> +            }                                                           \
> +            return host_t ## _to_ ## soft_t(hr);                        \
> +        }                                                               \
> +    soft:                                                               \
> +        if (subtract) {                                                 \
> +            return soft_ ## soft_t ## _sub(a, b, s);                    \
> +        } else {                                                        \
> +            return soft_ ## soft_t ## _add(a, b, s);                    \
> +        }                                                               \
> +    }                                                                   \
> +                                                                        \
> +    soft_t add_name(soft_t a, soft_t b, float_status *status)           \
> +    {                                                                   \
> +        return fpu_ ## soft_t ## _addsub(a, b, false, status);          \
> +    }                                                                   \
> +                                                                        \
> +    soft_t sub_name(soft_t a, soft_t b, float_status *status)           \
> +    {                                                                   \
> +        return fpu_ ## soft_t ## _addsub(a, b, true, status);           \
> +    }
> +
> +GEN_FPU_ADDSUB(float64_add, float64_sub, float64, double, fabs, DBL_MIN)
> +#undef GEN_FPU_ADDSUB
> +
>  /*
>   * Returns the result of multiplying the floating-point values `a' and
>   * `b'. The operation is performed according to the IEC/IEEE Standard


Reviewed-by: Alex Bennée <address@hidden>

--
Alex Bennée



reply via email to

[Prev in Thread] Current Thread [Next in Thread]