From: Dave N6NZ
Subject: Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
Date: Sun, 13 Jan 2008 15:19:29 -0800
User-agent: Thunderbird 1.5 (X11/20051201)
Weddington, Eric wrote:
> Hi John, Dave, others,
> Here are some random thoughts about a benchmark test suite:
> - GCC has a page on benchmarks: <http://gcc.gnu.org/benchmarks/>
> However all of those are geared towards larger processors and host
> systems. There is a link to a benchmark that focuses on code size,
> CSiBE, <http://www.inf.u-szeged.hu/csibe/>. Again, that benchmark is
> geared towards larger processors.
> This creates a need to have a benchmark that is geared towards 8-bit
> microcontroller environments in general, and specifically for the AVR.
> What would we like to test?
> Code size for sure. Everyone always seems to be interested in code size.
> There is an interest in seeing how the GCC compiler performs from one
> version to the next, to see if optimizations have improved or if they
> have regressed.

Which I would call regression tests, not "benchmarks", per se. Of
performance regressions, I would guess that code size regressions under
-Os are the #1 priority for the typical user. (A friend is currently
tearing his hair out over a code size regression in a commercial PIC C
compiler -- he needs to release a minor firmware update to the field...
but not even the original code fits his flash any more...)
It's worth drawing a distinction between benchmarks and regression
tests. They need to be written differently. A regression test needs to
sensitize a particular condition, and needs to be small enough to be
debuggable. A benchmark needs to be "realistic", which often makes it
harder to debug. I say we need both. The performance regression tests
can easily roll into release criteria. A suite of performance
benchmarks is more useful as a confirmatory "measure of goodness" -- but
actual mysteries in the aggregate score will most likely be chased with
smaller tests.
My guess is that existing tests may help us a lot in the benchmark
category, but the regression tests will require some elbow grease on our
part to get a good set. There's a good chance we can extract good
regression tests from existing benchmark-sized tests.
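For illustration, a targeted size-regression test might be no more than a tiny, debuggable function that sensitizes one code-generation decision, plus a recorded baseline size. Everything below (the function, the build line, the MCU) is a made-up sketch, not an existing test:

```c
/* Hypothetical targeted code-size regression test.  The idea: keep the
 * test small enough to debug by eye, and sensitive to exactly one
 * code-generation decision (here, how a multiply by a small constant
 * is expanded).  A harness would build it along the lines of
 *     avr-gcc -Os -mmcu=atmega644 -c mul10.c && avr-size mul10.o
 * and compare the .text size against a recorded baseline.
 */
#include <stdint.h>

uint8_t mul10(uint8_t x)
{
    /* Truncating 8-bit multiply by 10; equivalent to (x<<3) + (x<<1). */
    return (uint8_t)(x * 10);
}
```

The point is that when this test regresses, the disassembly of one five-instruction function tells the whole story, which is exactly what a benchmark-sized test can't do.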
A semi-related question is how many of these tests can be pushed up
stream? If we could get a handful of uCtlr-oriented code size
regression tests packaged up so that the developers of the generic
optimizer could run them as release criteria, it would, I would think,
improve the overall quality of gcc for all uCtlr targets.
> There is also an interest in comparing AVR compilers, such as how GCC
> compares to IAR, Codevision or ImageCraft compilers.

Who is interested? gcc developers, as a means to keep gcc competitive?
Or potential users? The former is benchmarking, the latter is moving
towards bench-marketing. Not that marketing is bad, but that sort of
thing can be a distraction. In any case, the tests that are meaningful
here are the benchmark "overall goodness" test suite, not the targeted
test suite.
> And sometimes there is an interest in comparing AVR against other
> microcontrollers, notably Microchip's PIC and TI's MSP430.

Different processor with same compiler? Different processor with best
compiler? -- Now this is beginning to sound like SPEC.
> Because there are these different interests, it is challenging to come
> up with appropriate code samples to showcase and benchmark these
> different issues. But we could also implement this in stages, and focus
> on AVR-specific code, and GCC-specific AVR code at that.

Clarity of classification is important. Different buckets for different
issues.
> If we are going to put together a benchmark test suite, like other
> benchmarks for GCC (for larger processors), then I would think that it
> would be better to model it somewhat after those other benchmarks. I see
> that they tend to use publicly available code, and a variety of
> different types of applications.

For benchmarking, and bench-marketing, that's a good approach. I'll be
redundant and say those are probably not what you want to be debugging.
It would make sense for what I'll call an "avr-gcc dashboard". I see a
web page with a bunch of bar graphs on it. A summary bar at the top
that is the weighted sum of individual test bars. As an avr-gcc user,
that kind of summary page would be very useful from one release to the
next for setting expectations regarding performance on your own
application. As an avr-gcc release master, it's a good dashboard for
tracking progress and release worthiness.
> We should have something similar. Some suggested projects: FreeRTOS
> (for the AVR)

Sounds good.

> , uIP (however, we need to pick a specific implementation of it for
> the AVR; I have a copy of uIP-Crumb644),

Another good one.

> the Atmel 802.15.4 MAC,

Need to check license on that one -- but a good choice otherwise.

> and the GCC version of the Butterfly firmware. I also have a copy of
> the "TI Competitive Benchmark", which they, and other semiconductor
> companies, have used to do comparisons between processors.

Not familiar with it. Also, check the license. Processor manufacturers
(like, oh, for instance, *all* the several I have worked for) are very
touchy about benchmarks and benchmark publications. My sea charts have
a notation: "Here be lawyers".
> Does anyone have other suggestions on projects to include in the
> benchmark? One area that seems to be lacking is some application that
> uses floating point. Any help to find some application in this area
> would be much appreciated.

Yup. Floating point is important, but we could probably make some
synthetic benchmarks pretty quickly that were meaningful. Need to watch
the data sets, though, since run time can vary greatly once you get into
gradual underflow or NaNs and such. Also, remember these may need to
run on a simulator, and need to complete in our lifetime.
> There needs to be some consensus on what we measure, how we measure it,
> what output files we want generated, and hopefully some way to
> automatically generate composite results. I'm certainly open to anything
> in this area. I would think that we need to be as open as possible on
> this, with documentation (minimal, it can be a text file) on what are
> our methods, how the results were arrived at, but importantly that the
> secondary/generated files be available for others to review and verify
> the results.

Agree completely.
> On practicalities: I am certainly willing to host the benchmark test
> suite on the WinAVR project on SourceForge and use its CVS repository.
> If it is desired to have it in a more neutral place, such as avr-libc,
> I'm open to that too, if Joerg Wunsch is willing.

Seems to me that as long as they are publicly available under an
appropriate license, it doesn't really matter much who backs them up :)
> Thoughts?

Test categories:
1. float v. scalar
2. targeted test v. benchmark v. published dashboard metric
3. member of quick v. extended v. full test list
4. size v. speed
That unrolls into 36 test lists, but the same test may appear multiple
times (in both quick and extended, perhaps both size and speed).
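The 36 is just the cross-product of the four categories (2 x 3 x 3 x 2). A throwaway sketch that enumerates them, with illustrative names taken from the list above:

```c
/* Enumerate the test-list cross-product: 2 data types x 3 test kinds
 * x 3 suite extents x 2 metrics = 36 distinct test lists. */
#include <stdio.h>

static const char *data_kind[] = { "float", "scalar" };
static const char *test_kind[] = { "targeted", "benchmark", "dashboard" };
static const char *extent[]    = { "quick", "extended", "full" };
static const char *metric[]    = { "size", "speed" };

int count_test_lists(void)
{
    int n = 0;
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 3; b++)
            for (int c = 0; c < 3; c++)
                for (int d = 0; d < 2; d++, n++)
                    printf("%s/%s/%s/%s\n", data_kind[a], test_kind[b],
                           extent[c], metric[d]);
    return n;  /* 36 */
}
```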
As to priorities, IMO the top two priorities are:
1. targeted scalar size
2. targeted scalar speed
Why? To get tests that target specific optimization regressions. A
size regression is more painful to an embedded developer than a speed
regression. Floating point math is largely in a library so less at risk
for a compiler optimization regression.
I'm not saying other things are not important, that's just my take on
what to tackle first (after infrastructure, of course.)
-dave
BTW -- having a defined place to put a performance regression test is a
good start. Any performance regression that pops up should have a test
written for it and cataloged in the framework.
> Thanks,
> Eric Weddington