qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v1 01/14] tests: add fp-bench, a collection of s


From: Emilio G. Cota
Subject: Re: [Qemu-devel] [PATCH v1 01/14] tests: add fp-bench, a collection of simple floating-point microbenchmarks
Date: Tue, 27 Mar 2018 13:21:26 -0400
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Mar 27, 2018 at 09:45:14 +0100, Alex Bennée wrote:
> Emilio G. Cota <address@hidden> writes:
(snip)
> > +/*
> > + * Disable optimizations (e.g. "a OP b" outside of the inner loop) with
> > + * volatile.
> > + */
> > +#define GEN_BENCH_1OPF(NAME, FUNC, PRECISION)                           \
> > +    static void NAME(volatile PRECISION *res)                           \
> > +    {                                                                   \
> > +        uint64_t ra = SEED_A;                                           \
> > +        uint64_t i, j;                                                  \
> > +                                                                        \
> > +        for (i = 0; i < n_ops; i += OPS_PER_ITER) {                     \
> > +            volatile PRECISION a = glue(get_random_, PRECISION)(&ra);   \
> > +                                                                        \
> > +            for (j = 0; j < OPS_PER_ITER; j++) {                        \
> > +                *res = FUNC(a);                                         \
> > +            }                                                           \
> > +        }                                                               \
> > +    }
> > +
> 
> Have you had a chance to look at if this will vectorise? I have a
> similar benchmark which I compile with multiple options to test normal,
> NEON/AdvSIMD and SVE enabled loops.

It does not. I'm pretty sure the volatile there prevents the compiler
from doing anything smart. In this case I don't want the compiler
to vectorise though, but I can see how that would be a nice
benchmark to have in addition to the above.

> > +        case 'p':
> > +            precision = optarg;
> > +            if (strcmp(precision, "float") &&
> > +                strcmp(precision, "single") &&
> > +                strcmp(precision, "double")) {
> > +                fprintf(stderr, "Unsupported precision '%s'\n", precision);
> > +                exit(EXIT_FAILURE);
> 
> Supporting half-precision if the compiler does would also be useful here.

I wasn't speeding those up so didn't care to test them. But yes I can see how
that could be useful for arm/aarch64; we can add it later.

> > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > index ef9b88c..f6121ee 100644
> > --- a/tests/Makefile.include
> > +++ b/tests/Makefile.include
> > @@ -587,7 +587,7 @@ test-obj-y = tests/check-qnum.o tests/check-qstring.o 
> > tests/check-qdict.o \
> >     tests/rcutorture.o tests/test-rcu-list.o \
> >     tests/test-qdist.o tests/test-shift128.o \
> >     tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o \
> > -   tests/atomic_add-bench.o
> > +   tests/atomic_add-bench.o tests/fp-bench.o
> 
> Not sure why but "make check" didn't build this. I had to explicitly
> "make tests/fp-bench". I guess along with atomic_add_bench though these
> are explicitly guest facing tests so maybe we should move them once
> tests/tcg is working again. I'll have another run at that this week.

That was intentional; these are benchmarks rather than tests so I
wouldn't expect make check to build them or run them at all. So that was 


> >  $(test-obj-y): QEMU_INCLUDES += -Itests
> >  QEMU_CFLAGS += -I$(SRC_PATH)/tests
> > @@ -639,6 +639,7 @@ tests/test-qht-par$(EXESUF): tests/test-qht-par.o 
> > tests/qht-bench$(EXESUF) $(tes
> >  tests/qht-bench$(EXESUF): tests/qht-bench.o $(test-util-obj-y)
> >  tests/test-bufferiszero$(EXESUF): tests/test-bufferiszero.o 
> > $(test-util-obj-y)
> >  tests/atomic_add-bench$(EXESUF): tests/atomic_add-bench.o 
> > $(test-util-obj-y)
> > +tests/fp-bench$(EXESUF): tests/fp-bench.o $(test-util-obj-y)
> >
> >  tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
> >     hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
> 
> Anyway for this version:
> 
> Reviewed-by: Alex Bennée <address@hidden>

Thanks! I'll keep this for v3 (I sent v2 yesterday), since not
much changed.

If I had more time to work on this I'd like to have a -t soft/host flag
like in fp-test. Right now there is no such flag so we default to "host";
IOW, we end up testing the performance of the whole sausage, i.e. guest
compiler + QEMU. This is useful because it represents real-life
scenarios. However, if we tested the functions in fpu/ directly,
we'd get benchmarking that (1) would be more sensitive to the functions
we want to benchmark, and (2) would not depend on the particular
implementation of the QEMU target (e.g. i386 does not emit fma
at all!).

Thanks,

                Emilio




reply via email to

[Prev in Thread] Current Thread [Next in Thread]