Re: [Qemu-devel] [PATCH v1 07/14] fpu: introduce hostfloat
From: Alex Bennée
Subject: Re: [Qemu-devel] [PATCH v1 07/14] fpu: introduce hostfloat
Date: Tue, 27 Mar 2018 12:49:48 +0100
User-agent: mu4e 1.1.0; emacs 26.0.91
Emilio G. Cota <address@hidden> writes:
> The appended paves the way for leveraging the host FPU for a subset
> of guest FP operations. For most guest workloads (e.g. FP flags
> aren't ever cleared, inexact occurs often and rounding is set to the
> default [to nearest]) this will yield sizable performance speedups.
>
> The approach followed here avoids checking the FP exception flags register.
> See the comment at the top of hostfloat.c for details.
>
> This assumes that QEMU is running on an IEEE754-compliant FPU and
> that the rounding is set to the default (to nearest). The
> implementation-dependent specifics of the FPU should not matter; things
> like tininess detection and snan representation are still dealt with in
> soft-fp. However, this approach will break on most hosts if we compile
> QEMU with flags such as -ffast-math. We control the flags so this should
> be easy to enforce though.
The thing I would avoid is generating any x87 instructions, as we can
get weird effects if the compiler ever decides to stash a signalling NaN
in an x87 register.
Anyway perhaps -fno-fast-math should be explicit when building fpu/* code?
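
As a rough illustration of how both concerns could be caught from inside the code rather than only through build flags, here is a minimal sketch. It assumes a GCC-compatible compiler, which defines __FAST_MATH__ under -ffast-math, and relies on FLT_EVAL_METHOD from <float.h>, which is typically non-zero when x87 arithmetic is in use. The helper name host_fp_build_is_safe is hypothetical and not part of the patch.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>    /* FLT_EVAL_METHOD, FLT_RADIX */
#include <stdbool.h>

/* The host path only makes sense for binary floating point. */
static_assert(FLT_RADIX == 2, "host FP must be binary");

/*
 * Hypothetical helper (not in the patch): reports whether this
 * translation unit was built in a way that keeps host FP results
 * faithful to IEEE 754 single/double semantics.
 */
static bool host_fp_build_is_safe(void)
{
#if defined(__FAST_MATH__)
    /* -ffast-math lets the compiler assume no NaNs/infinities, reorder ops, etc. */
    return false;
#elif !defined(FLT_EVAL_METHOD) || FLT_EVAL_METHOD != 0
    /* Non-zero means intermediates may carry extra precision (e.g. x87). */
    return false;
#else
    return true;
#endif
}
```

In a real build the same conditions would more likely become an #error, so that a bad configuration fails at compile time instead of being detected at run time.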
>
> The licensing in softfloat.h is complicated at best, so to keep things
> simple I'm adding this as a separate, GPL'ed file.
I don't think we need to worry about this. It's fine to add GPL-only
code to softfloat.c, and since the refactoring (or before, really) we
"own" this code and are unlikely to upstream anything.
My preference would be to include this all in softfloat.c unless there
is a very good reason not to.
>
> This patch just adds some boilerplate code; subsequent patches add
> operations, one per commit to ease bisection.
>
> Signed-off-by: Emilio G. Cota <address@hidden>
> ---
> Makefile.target | 2 +-
> include/fpu/hostfloat.h | 14 +++++++
> include/fpu/softfloat.h | 1 +
> fpu/hostfloat.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++
> target/m68k/Makefile.objs | 2 +-
> tests/fp-test/Makefile | 2 +-
> 6 files changed, 114 insertions(+), 3 deletions(-)
> create mode 100644 include/fpu/hostfloat.h
> create mode 100644 fpu/hostfloat.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 6549481..efcdfb9 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -97,7 +97,7 @@ obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
> obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
> obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
> obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
> -obj-y += fpu/softfloat.o
> +obj-y += fpu/softfloat.o fpu/hostfloat.o
> obj-y += target/$(TARGET_BASE_ARCH)/
> obj-y += disas.o
> obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
> diff --git a/include/fpu/hostfloat.h b/include/fpu/hostfloat.h
> new file mode 100644
> index 0000000..b01291b
> --- /dev/null
> +++ b/include/fpu/hostfloat.h
> @@ -0,0 +1,14 @@
> +/*
> + * Copyright (C) 2018, Emilio G. Cota <address@hidden>
> + *
> + * License: GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#ifndef HOSTFLOAT_H
> +#define HOSTFLOAT_H
> +
> +#ifndef SOFTFLOAT_H
> +#error fpu/hostfloat.h must only be included from softfloat.h
> +#endif
> +
> +#endif /* HOSTFLOAT_H */
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 8fb44a8..8963b68 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -95,6 +95,7 @@ enum {
> };
>
> #include "fpu/softfloat-types.h"
> +#include "fpu/hostfloat.h"
>
> static inline void set_float_detect_tininess(int val, float_status *status)
> {
> diff --git a/fpu/hostfloat.c b/fpu/hostfloat.c
> new file mode 100644
> index 0000000..cab0341
> --- /dev/null
> +++ b/fpu/hostfloat.c
> @@ -0,0 +1,96 @@
> +/*
> + * hostfloat.c - FP primitives that use the host's FPU whenever possible.
> + *
> + * Copyright (C) 2018, Emilio G. Cota <address@hidden>
> + *
> + * License: GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + * Fast emulation of guest FP instructions is challenging for two reasons.
> + * First, FP instruction semantics are similar but not identical, particularly
> + * when handling NaNs. Second, emulating at reasonable speed the guest FP
> + * exception flags is not trivial: reading the host's flags register with a
> + * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
> + * and trapping on every FP exception is not fast nor pleasant to work with.
> + *
> + * This module leverages the host FPU for a subset of the operations. To
> + * do this it follows the main idea presented in this paper:
> + *
> + * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
> + * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
> + *
> + * The idea is thus to leverage the host FPU to (1) compute FP operations
> + * and (2) identify whether FP exceptions occurred while avoiding
> + * expensive exception flag register accesses.
> + *
> + * An important optimization shown in the paper is that given that exception
> + * flags are rarely cleared by the guest, we can avoid recomputing some flags.
> + * This is particularly useful for the inexact flag, which is very frequently
> + * raised in floating-point workloads.
> + *
> + * We optimize the code further by deferring to soft-fp whenever FP
> + * exception detection might get hairy. Fortunately this is not common.
> + */
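
The idea described in the comment above can be sketched for a single-precision add roughly as follows. This is an illustration under stated assumptions, not the code from the later patches in the series: the status struct, soft_add_sketch and host_add_sketch are hypothetical names, float is assumed to be IEEE binary32, and the fast path is deliberately conservative (zeros also go to the fallback).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be IEEE binary32");

/* Hypothetical, much-reduced stand-in for QEMU's float_status. */
struct fp_status_sketch {
    bool inexact;         /* sticky guest inexact flag */
    bool round_nearest;   /* guest rounding mode is to-nearest-even */
    unsigned soft_calls;  /* counts fallbacks, for illustration only */
};

/* Stand-in for the soft-fp routine; a real one would also compute flags. */
static float soft_add_sketch(float a, float b, struct fp_status_sketch *s)
{
    s->soft_calls++;
    return a + b;
}

/* Exponent field of a binary32 value, read via memcpy (no pointer casts). */
static uint32_t f32_exp(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return (bits >> 23) & 0xff;
}

/* Normal means neither zero/denormal (exp 0) nor inf/NaN (exp 0xff). */
static bool f32_is_normal(float f)
{
    uint32_t e = f32_exp(f);
    return e != 0 && e != 0xff;
}

static float host_add_sketch(float a, float b, struct fp_status_sketch *s)
{
    /* If inexact is not already set we would have to detect it: go soft. */
    if (!s->inexact || !s->round_nearest) {
        return soft_add_sketch(a, b, s);
    }
    /* NaNs, infinities, denormals and zeros are left to soft-fp. */
    if (!f32_is_normal(a) || !f32_is_normal(b)) {
        return soft_add_sketch(a, b, s);
    }
    float r = a + b;
    /* A non-normal result may mean overflow, underflow or cancellation. */
    if (!f32_is_normal(r)) {
        return soft_add_sketch(a, b, s);
    }
    /* Normal inputs, normal result: inexact is the only flag that can be raised. */
    return r;
}
```

The point of the structure is that the host's exception flags register is never read: every case in which a flag other than the already-set inexact could be raised is detected from the operands or the result and handed to the fallback.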
> +#include <math.h>
> +
> +#include "qemu/osdep.h"
> +#include "fpu/softfloat.h"
> +
> +#define GEN_TYPE_CONV(name, to_t, from_t) \
> + static inline to_t name(from_t a) \
> + { \
> + to_t r = *(to_t *)&a; \
> + return r; \
> + }
> +
> +GEN_TYPE_CONV(float32_to_float, float, float32)
> +GEN_TYPE_CONV(float64_to_double, double, float64)
> +GEN_TYPE_CONV(float_to_float32, float32, float)
> +GEN_TYPE_CONV(double_to_float64, float64, double)
> +#undef GEN_TYPE_CONV
> +
> +#define GEN_INPUT_FLUSH(soft_t) \
> + static inline __attribute__((always_inline)) void \
> + soft_t ## _input_flush__nocheck(soft_t *a, float_status *s) \
> + { \
> + if (unlikely(soft_t ## _is_denormal(*a))) { \
> + *a = soft_t ## _set_sign(soft_t ## _zero, \
> + soft_t ## _is_neg(*a)); \
> + s->float_exception_flags |= float_flag_input_denormal; \
> + } \
> + } \
> + \
> + static inline __attribute__((always_inline)) void \
> + soft_t ## _input_flush1(soft_t *a, float_status *s) \
> + { \
> + if (likely(!s->flush_inputs_to_zero)) { \
> + return; \
> + } \
> + soft_t ## _input_flush__nocheck(a, s); \
> + } \
> + \
> + static inline __attribute__((always_inline)) void \
> + soft_t ## _input_flush2(soft_t *a, soft_t *b, float_status *s) \
> + { \
> + if (likely(!s->flush_inputs_to_zero)) { \
> + return; \
> + } \
> + soft_t ## _input_flush__nocheck(a, s); \
> + soft_t ## _input_flush__nocheck(b, s); \
> + } \
> + \
> + static inline __attribute__((always_inline)) void \
> + soft_t ## _input_flush3(soft_t *a, soft_t *b, soft_t *c, \
> + float_status *s) \
> + { \
> + if (likely(!s->flush_inputs_to_zero)) { \
> + return; \
> + } \
> + soft_t ## _input_flush__nocheck(a, s); \
> + soft_t ## _input_flush__nocheck(b, s); \
> + soft_t ## _input_flush__nocheck(c, s); \
> + }
> +
> +GEN_INPUT_FLUSH(float32)
> +GEN_INPUT_FLUSH(float64)
Having spent time getting rid of a bunch of macro expansions I'm wary of
adding more in. However for these I guess it's kind of marginal.
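
For comparison, here is a rough sketch of what the float32 case looks like when written out as plain functions instead of being generated by a macro. All names and the reduced status struct are hypothetical; the value is handled as a raw binary32 bit pattern, and the flush flag is checked per operand rather than once per call, which gives the same result as the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Raw IEEE binary32 bit pattern (hypothetical stand-in for float32). */
typedef uint32_t f32_bits_sketch;

/* Hypothetical reduction of float_status to the two fields used here. */
struct flush_status_sketch {
    bool flush_inputs_to_zero;
    bool input_denormal;      /* stands in for float_flag_input_denormal */
};

/* Denormal: exponent field all zeros, fraction non-zero. */
static bool f32_is_denormal_sketch(f32_bits_sketch a)
{
    return (a & 0x7f800000u) == 0 && (a & 0x007fffffu) != 0;
}

/* Flush one operand to a signed zero if flushing is enabled. */
static void f32_input_flush1_sketch(f32_bits_sketch *a,
                                    struct flush_status_sketch *s)
{
    assert(a && s);
    if (!s->flush_inputs_to_zero) {
        return;
    }
    if (f32_is_denormal_sketch(*a)) {
        *a &= 0x80000000u;        /* keep only the sign bit */
        s->input_denormal = true;
    }
}

/* Two-operand variant, built on the one-operand function. */
static void f32_input_flush2_sketch(f32_bits_sketch *a, f32_bits_sketch *b,
                                    struct flush_status_sketch *s)
{
    f32_input_flush1_sketch(a, s);
    f32_input_flush1_sketch(b, s);
}
```

The cost of avoiding the macro is one such set of functions per width, which is the trade-off being weighed here.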
> +#undef GEN_INPUT_FLUSH
> diff --git a/target/m68k/Makefile.objs b/target/m68k/Makefile.objs
> index ac61948..2868b11 100644
> --- a/target/m68k/Makefile.objs
> +++ b/target/m68k/Makefile.objs
> @@ -1,5 +1,5 @@
> obj-y += m68k-semi.o
> obj-y += translate.o op_helper.o helper.o cpu.o
> -obj-y += fpu_helper.o softfloat.o
> +obj-y += fpu_helper.o softfloat.o hostfloat.o
> obj-y += gdbstub.o
> obj-$(CONFIG_SOFTMMU) += monitor.o
> diff --git a/tests/fp-test/Makefile b/tests/fp-test/Makefile
> index 703434f..187cfcc 100644
> --- a/tests/fp-test/Makefile
> +++ b/tests/fp-test/Makefile
> @@ -28,7 +28,7 @@ ibm:
> $(WHITELIST_FILES):
> wget -nv -O $@ http://www.cs.columbia.edu/~cota/qemu/fpbench-$@
>
> -fp-test$(EXESUF): fp-test.o softfloat.o
> +fp-test$(EXESUF): fp-test.o softfloat.o hostfloat.o
>
> clean:
> rm -f *.o *.d $(OBJS)
--
Alex Bennée