qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v1 02/14] tests: add fp-test, a floating point t


From: Emilio G. Cota
Subject: Re: [Qemu-devel] [PATCH v1 02/14] tests: add fp-test, a floating point test suite
Date: Tue, 27 Mar 2018 14:00:18 -0400
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Mar 27, 2018 at 11:13:01 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota <address@hidden> writes:
> 
> > This will allow us to run correctness tests against our
> > FP implementation. The test can be run in two modes (called
> > "testers"): host and soft. With the former we check the results
> > and FP flags on the host machine against the model.
> > With the latter we check QEMU's fpu primitives against the
> > model. Note that in soft mode we are not instantiating any
> > particular CPU (hence the HW_POISON_H hack to avoid macro poisoning);
> > for that we need to run the test in host mode under QEMU.
> >
> > The input files are taken from IBM's FPGen test suite:
> > https://www.research.ibm.com/haifa/projects/verification/fpgen/
> >
> > I see no license file in there so I am just downloading them
> > with wget. We might want to keep a copy on a qemu server though,
> > in case IBM takes those files down in the future.
> 
> Hmm the files themselves have:
> 
>   Copyright of IBM Corp. 2005
> 
> So I'm not sure we can take them into any of our source trees.

Yes I don't think committing these would be appropriate. I guess keeping
our own copy of the tarball somewhere would be OK, though. My worry is
that at some point IBM's server will die and we'll have no more test
files.

> However what are we testing here?

fp-test allows you to test against a model. This model can be anything;
for now, I grabbed model files from IBM's fpgen, which tests for
IEEE compliance.

We can add more model files; the muladd tests are an example of that.

> Do we just want to test our implementation is IEEE compliant or should
> we generate our own test cases on validated hardware to check that our
> emulation of a guest is correct (e.g. correct modulo the valid
> variations in the standard)?

I think what we want is a large core of tests that test the standard,
plus a smaller set of tests that cover ISA particularities. I think
the IBM models are a good starting point for the former -- note
though that some operations are not covered in the models (despite
the syntax description specifying an op for them, e.g. int-to-float
conversions.)

> > The "IBM" syntax of those files (for now the only syntax supported
> > in fp-test) is documented here:
> > https://www.research.ibm.com/haifa/projects/verification/fpgen/papers/ieee-test-suite-v2.pdf
> >
> > Note that the syntax document has some inaccuracies; the appended
> > parsing code works around some of those.
> >
> > The exception flag (-e) is important: many of the optimizations
> > included in the following commits assume that the inexact flag
> > is set, so "-e x" is necessary in order to test those code paths.
> >
> > The whitelist flag (-w) points to a file with test cases to be ignored.
> > I have put some whitelist files online, but we should have them
> > on a QEMU-related server.
> >
> > Thus, a typical of fp-test is as follows:
> >
> >   $ cd qemu/build/tests/fp-test
> >   $ make -j && \
> >     ./fp-test -t soft ibm/*.fptest \
> >     -w whitelist.txt \
> >     -e x
> 
> So this is a unit test of our code rather than a test program running
> under QEMU?

Having the -t host/soft flags allows you flexibility in what to test.

With "host" mode, you're generating a binary that knows nothing
about QEMU, i.e. all its FP operations are native. You can use this
to (1) figure out whether your host diverts from the model [hopefully
it doesn't in anything substantial], and (2) test whether QEMU mimics
the corresponding host by running the binary under *-linux-user.

With "soft" mode, you're testing QEMU's soft-fp implementation. This
allows you to check it against the model. Note that it doesn't let
you check anything specific about a target CPU (hence the HW_POISON_H
hack); for this you'd have to go to point (2) above. Here instead we're
checking QEMU's FP implementation directly against the models.

>  I noted running under x86-64-linux-user fails pretty quick.

As I wrote below, I think this is due to bugs in the i386 target.

On a host x86_64 machine:
$ ./fp-test -t host ibm/* -w whitelist.txt -w whitelist-tininess-after.txt
All tests OK.
Tests passed: 74426. Not handled: 53297, whitelisted: 2748

$ ../../x86_64-linux-user/qemu-x86_64 ./fp-test -t host \
        ibm/* -w whitelist.txt -w whitelist-tininess-after.txt -n 2>/dev/null
Tests failed: 57479. Parsing: 0, result:14, flags:57465
Tests passed: 16947. Not handled: 53297, whitelisted: 2748

The results are different when run under QEMU, which means
this target is not doing things correctly (despite -t soft
passing).

> If so we really should be building this automatically in make check.

Yes, passing -soft mode would certainly be valuable and trivial
to integrate since there is nothing built that is target-dependent.

(snip)
> > --- /dev/null
> > +++ b/tests/fp-test/fp-test.c
(snip)
> > +enum precision {
> > +    PREC_FLOAT,
> > +    PREC_DOUBLE,
> > +    PREC_QUAD,
> > +    PREC_FLOAT_TO_DOUBLE,
> > +};
> 
> Again we will want to include half-precision.

Sure. We'll need to come up with model files for that though.

(snip)
> > +    }
> > +    if (unlikely(!strcmp("-inf", p) || !strcmp("-Inf", p))) {
> > +        ret->val = fp_choose(prec, 0xff800000, 0xfff0000000000000);
> > +        return 0;
> > +    }
> > +
> > +    len = strlen(p);
> > +
> > +    if (strchr(p, 'P')) {
> 
> Given we have already strlen'ed should we use memchr(p, 'P', len) or
> even strchrnul(p,'P') with a check we haven't run over the end? OR maybe
> use glib's string stuff instead.

I think in this case strchr is warranted; p points to a valid string
(otherwise strlen would also be dangerous).

(snip)
> > +/* Syntax of IBM FP test cases:
> > + * 
> > https://www.research.ibm.com/haifa/projects/verification/fpgen/syntax.txt
> > + */
> > +static enum error ibm_test_line(const char *line)
> > +{
> > +    struct test_op t;
> > +    /* at most nine fields; this should be more than enough for each field 
> > */
> > +    char s[9][64];
> > +    char *p;
> > +    int n, field;
> > +    int i;
> > +
> > +    /* data lines start with either b32 or d(64|128) */
> > +    if (unlikely(line[0] != 'b' && line[0] != 'd')) {
> > +        return ERROR_COMMENT;
> > +    }
> > +    n = sscanf(line, "%63s %63s %63s %63s %63s %63s %63s %63s %63s",
> > +               s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[8]);
> > +    if (unlikely(n < 5 || n > 9)) {
> > +        return ERROR_INPUT;
> > +    }
> 
> Personally I find g_strsplit(line, " ", 9) and g_strfreev() handy for
> this sort of parsing.

I figured the above might be simpler since I wouldn't have to worry
about freeing memory. Also, scanf directly returns n, which is used
later.

Thanks,

                E.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]