
[Qemu-devel] Automatic generation of code-generator components


From: Eliot Moss
Subject: [Qemu-devel] Automatic generation of code-generator components
Date: Tue, 13 Jul 2010 14:09:56 -0400

Dear QEMU developers --

I have had some email conversation with a few active developers,
and with their encouragement, want to open it up for the whole list
to comment.

For several years my research group at UMass has been developing
generic code-generator generator (CGG) technology.  Historically, CGGs
have been tied to a particular code-generation framework,
that is, to a particular intermediate representation (IR) and
compiler.  Our tool, called GIST (for Generator of Instruction
Selectors Tool), is designed to work from any reasonable IR and
to connect to any reasonable framework.  More technical details
follow below.  What we are hoping to be able to say is that, if we
make GIST industrial-strength with funding from the National
Science Foundation, the QEMU community would be interested in
using it.  No commitment -- just that you think it *might* be a
good idea if we can make it go.  We would use QEMU as one of our
"demo" environments.

Ok, more details.  We have an architecture description language
called CISL (CoGenT Instruction Set Language; CoGenT is our overall
project's name).  It is somewhat like C or Java in appearance. You
define the various memories and registers, and the instructions.
To generate an instruction selector from input ISA A (generally
a compiler IR, but not necessarily) to output ISA B (generally,
but not always, a hardware architecture), you start with descriptions
of A and B in CISL -- some of which may already be around.  You
also write what we call a *mapping* from A to B, which simply
indicates where on B each memory/register of A should go.  The
tool then finds instruction selector patterns, at least one for
each instruction of the A machine.
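
To make the idea of a mapping concrete, here is a rough Python sketch.
It is not CISL syntax and not GIST's actual representation.  The
location names (tmp0, env, $s0, ...) and the helpers
unmapped_locations and clashing_registers are hypothetical, chosen only
to show that a mapping says where on B each memory/register of A goes.

```python
# Hypothetical storage locations of machine A (e.g. an IR); names are illustrative.
A_LOCATIONS = ["tmp0", "tmp1", "env", "guest_memory"]

# Hypothetical mapping: where each location of A lives on machine B.
# Each entry is (kind, detail): "reg" is a B register, "mem" is B memory
# reached through a dedicated base register.
MAPPING = {
    "tmp0": ("reg", "$t0"),
    "tmp1": ("reg", "$t1"),
    "env": ("reg", "$s0"),
    "guest_memory": ("mem", "base=$s1"),
}


def unmapped_locations(a_locations, mapping):
    """Return the locations of A that the mapping does not place on B."""
    return [loc for loc in a_locations if loc not in mapping]


def clashing_registers(mapping):
    """Return B registers assigned to more than one location of A."""
    seen, clashes = set(), set()
    for kind, detail in mapping.values():
        if kind != "reg":
            continue
        if detail in seen:
            clashes.add(detail)
        seen.add(detail)
    return sorted(clashes)
```

A check like this only confirms that the mapping is complete and does
not reuse a register; finding the selector patterns is the tool's job.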

For any given retargetable *framework* (compiler, interpreter, emulator),
we write one *adapter* that knows how to take GIST patterns in their
internal form and write them out in the form that the framework
needs.
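
As a sketch of the adapter idea, assuming a much simpler internal form
than GIST really uses: the Pattern class, the two adapter functions, and
the emit() call in the generated text are all hypothetical.  The point
is only that the same patterns can be written out differently for
different frameworks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical internal form of one selector pattern."""
    source_op: str      # instruction of machine A being covered
    target_code: tuple  # instructions of machine B that implement it


def c_switch_adapter(patterns):
    """Render patterns as C-like switch cases (one imagined framework)."""
    lines = ["switch (op) {"]
    for p in patterns:
        lines.append(f"case {p.source_op}:")
        for insn in p.target_code:
            lines.append(f'    emit("{insn}");')
        lines.append("    break;")
    lines.append("}")
    return "\n".join(lines)


def table_adapter(patterns):
    """Render the same patterns as a lookup table (another imagined framework)."""
    return {p.source_op: list(p.target_code) for p in patterns}


PATTERNS = [
    Pattern("OP_MOV", ("move rd, rs",)),
    Pattern("OP_NEG", ("subu rd, $zero, rs",)),
]

print(c_switch_adapter(PATTERNS))
```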

Here's an example.  Suppose we are going from A = QEMU IR to
B = MIPS, that is, doing the job of the TCG back end for an
emulator running on a MIPS host.  We have written a CISL
description for the QEMU IR (yes, already), and suppose we have
one for the MIPS, sufficient for code generation anyway.
[Side note: Compilers do not generally use every instruction
of their target, e.g., not the privileged mode ones, etc.
Also, in the presence of register allocation, they generally
target a slightly virtualized machine -- one with a huge
number of registers, which register allocation then resolves
to real registers and occasional spilled locations.]

The mapping would talk about how to find QEMU memory on the
MIPS (perhaps a dedicated base register), etc., and would
also capture the conventions for calling helper routines,
and so on.

The adapter for QEMU TCG back ends would generate something
like a C switch statement with one case for each QEMU IR
instruction. Each case might have some additional case
analysis. This is because (as you see in QEMU), a given
IR instruction can have special cases depending on values
of constants, whether something is in a register, etc.
GIST will have found different patterns for each of these,
and with each one there would be a *constraint*, indicating
when it applies.  For example, patterns for adding a constant
value on the MIPS would likely have a special case for
constants that fit in 16 bits, since then you can use one
immediate instruction.  Likewise, the constant 0 is a
special case since it can just be a move.  In addition
to constraints, patterns have costs, which one can develop
for any given target, but would typically be based on
number of instructions, number of instruction bytes,
number of memory references, etc.  Thus the case analysis
for a given instruction would check the lowest-cost
patterns first and conclude with the most general
pattern (which may also be the most expensive).
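
The add-a-constant example can be sketched in Python as follows.  This
is an illustration of constraint-plus-cost selection, not GIST output:
the Pattern class, the pattern list, the use of $at as a scratch
register, and cost = instruction count are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Pattern:
    name: str
    constraint: Callable[[int], bool]  # when the pattern applies
    cost: int                          # assumed here: instructions emitted
    code: Tuple[str, ...]              # MIPS-like templates; {c}, {hi}, {lo} filled in


def fits_signed_16(c: int) -> bool:
    """True if c fits the signed 16-bit immediate field of addiu."""
    return -0x8000 <= c <= 0x7FFF


# Hypothetical patterns for "rd = rs + constant c".
ADD_CONST_PATTERNS = [
    Pattern("general", lambda c: True, 3,
            ("lui $at, {hi}", "ori $at, $at, {lo}", "addu rd, rs, $at")),
    Pattern("zero", lambda c: c == 0, 1, ("move rd, rs",)),
    Pattern("imm16", fits_signed_16, 1, ("addiu rd, rs, {c}",)),
]


def select(patterns, c):
    """Try patterns from cheapest to most expensive; return the first that applies.

    sorted() is stable, so equal-cost patterns keep their list order
    ("zero" is tried before "imm16").
    """
    for p in sorted(patterns, key=lambda p: p.cost):
        if p.constraint(c):
            return p
    raise ValueError("no pattern applies")


def expand(pattern, c):
    """Fill in the templates for a concrete 32-bit constant."""
    u = c & 0xFFFFFFFF
    return [t.format(c=c, hi=hex(u >> 16), lo=hex(u & 0xFFFF))
            for t in pattern.code]
```

With these assumptions, select(ADD_CONST_PATTERNS, 0) picks the move,
a constant such as 100 picks the single addiu, and 32768 falls through
to the three-instruction general pattern.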

The adapter would also need to generate the information
needed by the QEMU TCG register allocator.

Now, here are some things of additional interest:

- While QEMU IR -> emulation host code-generation is maybe
  the most obvious case, we can also handle the "front end"
  emulation target -> QEMU IR generation.  This probably
  requires a slightly different description of machine A
  than when A is the emulation host -- after all, we must
  handle *all* instructions, including privileged ones,
  etc.  But it is possible to make the descriptions
  modular in such a way that instructions used in both
  cases are not repeated.

- I noticed that someone is looking at interpretation
  rather than compilation.  We have seen that we can generate
  functional simulators (very close to emulators) from
  CISL descriptions.  Thus, it would be possible to generate
  a simulator for any of the machines of interest.  What
  QEMU provides is a framework with all the memory and
  device modeling, etc.

- An approach like this might make it easier to maintain a
  range of different models of the same ISA.  It might also
  facilitate moving towards multicore emulation, maybe even
  heterogeneous multicore.  It would also make it easier to
  change around how the simulated memory is organized and
  accessed, if that would be helpful.

- It would make it particularly simple to build an emulator for
  a new or extended machine.  Of course you still need a compiler
  for it, but we can use the same description to generate a C
  compiler, etc.
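
On the functional-simulator point above, here is a toy Python sketch of
what "simulator from a description" can mean.  It does not reflect how
CISL descriptions or our generated simulators actually look: the
three-instruction machine, the DESCRIPTION table, and simulate() are
invented for illustration.

```python
MASK32 = 0xFFFFFFFF


# Hypothetical description: each instruction is given by its effect on the state.
def op_li(state, rd, imm):
    state["regs"][rd] = imm & MASK32


def op_add(state, rd, rs, rt):
    state["regs"][rd] = (state["regs"][rs] + state["regs"][rt]) & MASK32


def op_bnez(state, rs, target):
    if state["regs"][rs] != 0:
        state["pc"] = target


DESCRIPTION = {"li": op_li, "add": op_add, "bnez": op_bnez}


def simulate(description, program, max_steps=1000):
    """Generic loop: fetch, advance pc, then apply the described semantics."""
    state = {"regs": [0] * 4, "pc": 0}
    steps = 0
    while state["pc"] < len(program) and steps < max_steps:
        name, *operands = program[state["pc"]]
        state["pc"] += 1  # default fall-through; a branch may overwrite pc
        description[name](state, *operands)
        steps += 1
    return state


# Sum 3 + 2 + 1 into r0, counting r1 down to zero.
PROGRAM = [
    ("li", 0, 0),
    ("li", 1, 3),
    ("li", 2, -1),
    ("add", 0, 0, 1),
    ("add", 1, 1, 2),
    ("bnez", 1, 3),
]

final = simulate(DESCRIPTION, PROGRAM)
```

The loop knows nothing about individual instructions; everything
machine-specific lives in the description table, which is the part a
generator would produce.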

This would be a multi-year project, with real support ($$)
for three or more years.  The goal is for GIST to have its own
self-sustaining open-source community after that.  We are in
conversation with some other software communities of interest
concerning whether they would also be in favor of the project.
These include the Jikes RVM Java Virtual Machine project
(both the optimizing and the non-optimizing compilers), another
compiler framework, and a simulator framework.

I look forward to your thoughts, questions, and reactions.

Regards -- Eliot Moss
==============================================================================
J. Eliot B. Moss, Professor               http://www.cs.umass.edu/~moss    www
Director, Arch. and Lang. Impl. Lab.      +1-413-545-4206                voice
Department of Computer Science            +1-413-695-4226                 cell
140 Governor's Drive, Room 372            +1-413-545-1249                  fax
University of Massachusetts at Amherst    address@hidden              email
Amherst, MA  01003-9264  USA              +1-413-545-2746 Laurie Downey  sec'y
==============================================================================


