My preliminary ideas for that would be the following: a suite of small
C programs, each with a main() function containing the code to be
tested (e.g. function calls, various expressions, complex branching,
accesses to I/O registers, etc.), that together cover a broad base of
functionality. A single global variable in each C program would
receive the final result of the computation, for example the
accumulated value of a complex loop. Once a program is compiled, the
memory location of that global variable is extracted from the symbol
table in the binary. That address and the expected value (which the
variable only ends up holding if the compiled code computes
correctly) can then be passed to the simulator through the test file,
as is done now: simply comments of the form
@Result = <list of state predicates>.
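A minimal sketch of one such test program, using hypothetical names
(`test_result`, `accumulate`, `run_test`). The computation is placed in
`run_test()` rather than directly in main() so the same source can also
be sanity-checked with a host compiler; on the AVR target, main() would
simply call it. The exact @Result predicate syntax is not shown here.

```c
#include <stdint.h>

/* Hypothetical name: the single global that receives the final result.
   `volatile` ensures the final store really reaches memory, even though
   nothing in the program reads the variable back. */
volatile int32_t test_result;

/* Function call plus branching: add even values, subtract 1 for odd ones. */
static int32_t accumulate(int32_t acc, int32_t i)
{
    if (i % 2 == 0)
        return acc + i;
    return acc - 1;
}

/* Body that would otherwise sit in main(). A 32-bit accumulator is used
   because int is only 16 bits on AVR, so this also exercises multi-byte
   arithmetic. Expected result: 2450 (sum of evens 0..98) - 50 = 2400. */
void run_test(void)
{
    int32_t acc = 0;
    for (int32_t i = 0; i < 100; i++)
        acc = accumulate(acc, i);
    test_result = acc;
}
```

The test file for this program would then pair the address of
`test_result` with the expected value 2400.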
As far as Avrora goes, all of this functionality exists and is working.
This could be a good way to build a robust, rigorous test suite for
avr-gcc's code generation.
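For the address-extraction step, a host-side harness could read the
output of `avr-nm` on the compiled ELF file. The sketch below parses a
single `avr-nm` line; `find_symbol_addr` is a hypothetical name. It
assumes the usual avr-gcc/binutils convention of placing data memory at
offset 0x800000 in the ELF address space, which the helper subtracts to
obtain the SRAM address.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: avr-gcc/binutils map SRAM at 0x800000 in the ELF address
   space, so avr-nm reports data symbols with this offset added. */
#define AVR_DATA_OFFSET 0x800000UL

/* Hypothetical helper: parse one line of avr-nm output such as
   "00800100 B test_result". If the line names `symbol`, store the
   SRAM address in *sram_addr and return 1; otherwise return 0. */
int find_symbol_addr(const char *nm_line, const char *symbol,
                     unsigned long *sram_addr)
{
    unsigned long addr;
    char type;
    char name[128];

    /* Expect "<hex address> <type letter> <symbol name>". */
    if (sscanf(nm_line, "%lx %c %127s", &addr, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;

    /* Strip the data-space offset when present. */
    *sram_addr = (addr >= AVR_DATA_OFFSET) ? addr - AVR_DATA_OFFSET : addr;
    return 1;
}
```

In a harness, the lines could come from `popen("avr-nm prog.elf", "r")`
and `fgets()`; the resulting address would then be written into the
test file alongside the expected value.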