Re: [Qemu-devel] qemu vs gcc4


From:	Rob Landley
Subject:	Re: [Qemu-devel] qemu vs gcc4
Date:	Tue, 31 Oct 2006 15:41:44 -0500
User-agent:	KMail/1.9.1
Welcome to Stupid Question Theatre!  With your host, Paul Brook.  Today's 
contestant is: Rob Landley.  How dumb will it get?

On Tuesday 31 October 2006 2:02 pm, Paul Brook wrote:
> The basic principle is very similar. Guest code is decomposed into an 
> intermediate form consisting of simple operations, then native code is 
> generated from those operations.

I got that part.  It's the how I'm still head-scratching over.

The disassembly routines seem relatively compiler-independent, but I'm under 
the impression that turning the intermediate result (the string of qops) into 
large blocks of translated code involves gluing together a bunch of smaller 
blocks of pregenerated code.  These pregenerated blocks were spit out by gcc 
and are where all the compiler dependencies that aren't clear bugs come 
from.

I thought what you were doing was replacing the pregenerated blocks with 
hand-coded assembly statements, but your description here seems to be about 
changing the disassembly routines that figure out which qops to string 
together in part 2.

> In the existing dyngen implementation most operands to ops are implicit,
> with only a few ops taking explicit arguments. The principle with the new
> system is that all operands are explicit.

Having looked ahead to your example before replying to this, I think I 
understand that part now.  (Just barely.)

> The intermediate representation used by the code generator resembles an 
> imaginary machine. This machine has various different instructions (qops), 
> and a nominally infinite register file (qregs).

Each qreg is represented as an integer index?

> Each qop takes zero or more arguments, each of which may be an input or
> output.

The input or output is always one of these qreg indexes?  (Some of the 
existing ones seem to take immediate values...)

> In addition to dynamically allocated qregs there is a fixed set of qregs
> that map onto the guest CPU state. This is to simplify code generation.

These are indexes 0, 1, and 2?

Ok, looking at target-arm/translate.c, we have:

static inline void gen_op_addl_T0_T1(void)
{
    gen_op_add32(QREG_T0, QREG_T0, QREG_T1);
}

So what is QREG_T0 anyway?  This is hard to grep for. 'find . | grep -v svn | 
xargs grep "QREG_T0"' doesn't produce anything useful, so there's got to be 
preprocessor concatenation stuff with ## going on, let's try just QREG on the 
*.h files, and yup at the start of qop.h there's this:

enum target_qregs {
    QREG_NULL,
#define DEFO32(name, offset) QREG_ ## name,
#define DEFO64(name, offset) DEFO32(name, offset)
#define DEFF32(name, reg) DEFO32(name, reg)
#define DEFF64(name, reg) DEFO32(name, reg)
#define DEFR(name, reg, mode) DEFO32(name, reg)
#include "qregs.def"

And that has "DEFR(T0, AREG1, QMODE_I32)" which...  Ok, DEFR() discards the 
third argument ("mode") completely, and then DEFO32() discards the second 
argument (offset), and what's left is just the name, so it's position 
dependent (so why have the darn macros at ALL?)

My brain hurts a lot now.  I'm just letting you know.  What is all this 
complication actually trying to accomplish?
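
One plausible reason for macros that throw away most of their arguments is the
"X-macro" pattern: the same list is expanded several times with different macro
definitions, so the enum expansion keeps only the name while another expansion
keeps the remaining fields. The following is a minimal sketch of that general
technique, not the actual QEMU source. Every DEMO_ name, the register numbers,
and the inline list standing in for a file like qregs.def are assumptions made
up for illustration.

```c
#include <assert.h>
#include <string.h>

enum demo_mode { DEMO_MODE_I32, DEMO_MODE_I64 };

/* Hypothetical stand-in for the contents of a .def file. */
#define DEMO_QREGS(DEFR)            \
    DEFR(T0, 1, DEMO_MODE_I32)      \
    DEFR(T1, 2, DEMO_MODE_I32)      \
    DEFR(D0, 3, DEMO_MODE_I64)

/* Expansion 1: keep only the name, giving enum constants in list order. */
#define AS_ENUM(name, reg, mode) DEMO_QREG_##name,
enum demo_qreg { DEMO_QREG_NULL, DEMO_QREGS(AS_ENUM) DEMO_QREG_COUNT };
#undef AS_ENUM

/* Expansion 2: keep every field, giving a table indexed by that enum. */
struct demo_reginfo { const char *name; int host_reg; enum demo_mode mode; };
#define AS_INFO(name, reg, mode) [DEMO_QREG_##name] = { #name, reg, mode },
static const struct demo_reginfo demo_reginfo[DEMO_QREG_COUNT] = {
    [DEMO_QREG_NULL] = { "NULL", 0, DEMO_MODE_I32 },
    DEMO_QREGS(AS_INFO)
};
#undef AS_INFO

/* Look up the printable name of a register index. */
static const char *demo_qreg_name(enum demo_qreg r)
{
    assert(r < DEMO_QREG_COUNT);
    return demo_reginfo[r].name;
}
```

With a single list, the enum and the table cannot drift out of sync: adding a
register is one new line, and both expansions pick it up.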

> Each qreg has a particular type (32/64 bit, integer or float).

You mean each qop's arguments have a particular type, and the arguments are 
always in qregs?  Or each qreg has a type permanently associated with that 
qreg?  Or the value currently in a qreg has a type associated with it, but 
the next value stored in that qreg may have a different type?

> It's up to 
> you to make sure the argument types match those expected by the qop. It's 
> generally fairly obvious from the name. eg. add32 adds I32 values, addf64 
> adds F64 values, etc. The exception is that I64 values can be used in place 
> of I32. The upper 32 bits of outputs are undefined in this case, and the
> value must be explicitly extended before the full 64 bits are used.

Possible translation: you can feed a qreg containing an I64 value to a qop 
taking an i32 argument, and it'll typecast the sucker down intelligently, but 
if you produce an I32 result and expect to use that qreg's value as an I64 
argument later, you have to call a sign-extending qop on it first?
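
The rule being guessed at here can be modelled in plain C. This is only a
sketch of the idea under the assumption that a 32-bit op writes the low half
of a 64-bit register and leaves the upper half unspecified; the demo_ function
names are hypothetical and are not qops.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit op writing into a 64-bit register: only the low half is defined.
   'stale' stands for whatever happened to be in the register before. */
static uint64_t demo_write_low32(uint64_t stale, uint32_t result)
{
    return (stale & 0xffffffff00000000ull) | result;
}

/* A 32-bit op reading a 64-bit register: the upper half is simply ignored. */
static uint32_t demo_read_as_i32(uint64_t qreg)
{
    return (uint32_t)qreg;
}

/* Explicit extensions, needed before all 64 bits may be trusted.
   The cast to int32_t assumes the usual two's-complement behaviour. */
static int64_t demo_sext32(uint64_t qreg)
{
    return (int64_t)(int32_t)(uint32_t)qreg;
}

static uint64_t demo_zext32(uint64_t qreg)
{
    return qreg & 0xffffffffull;
}

/* Usage: a 32-bit result of -2 sitting under stale upper bits. */
static void demo_extend_usage(void)
{
    uint64_t q = demo_write_low32(0xdeadbeef00000000ull, 0xfffffffeu);
    assert(demo_read_as_i32(q) == 0xfffffffeu); /* fine as an I32 input  */
    assert(demo_sext32(q) == -2);               /* signed 64-bit view    */
    assert(demo_zext32(q) == 0xfffffffeull);    /* unsigned 64-bit view  */
}
```

Whether sign or zero extension is the right one depends on how the value is
going to be used, which is presumably why the extension has to be explicit.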

> The old dyngen ops are actually implemented as special-case qops.

You mean each dyngen op produces multiple qops?  (And/or is a bundle of qops?)

> As an example take the arm instruction
> 
>   add r0, r1, r2, lsl #2
> 
> This is equivalent to the C expression
> 
>  r0 = r1 + (r2 << 2)
> 
> The old dyngen translate.c would do:
> 
>   gen_op_movl_T1_r2();
>   gen_op_shll_T1_im(2);
>   gen_op_movl_T0_r1();
>   gen_op_addl(); /* does T0 = T0 + T1 */
>   gen_op_movl_r0_T0();

Digging down into target-arm/translate.c, function disas_arm_insn(), I'm...  
still having to take your word for it.  All the gen_op_movl_T1 variants I'm 
seeing end with _im which I presume means "immediate".  The alternative is 
_cc, but what does that mean?  (Presumably not "closed captioned".)

> When fully converted to the new system this would become:
> 
>   int tmp = gen_new_qreg(); /* Allocate a temporary reg.  */
>   /* gen_im32 is a helper that allocates a new qreg and
>      initializes it to an immediate value.  */
>   gen_op_shl32(tmp, QREG_R2, gen_im32(2));
>   gen_op_add32(QREG_R0, QREG_R1, tmp);

Ok (still looking at target-arm/translate.c), I think you're not defining 
anything new here, you're just removing wrappers like gen_op_add_T1_im() 
which just wrap a single call to gen_op_add32(), and untangling the result?
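
To make the "all operands are explicit" idea concrete, here is a toy model of
the shifted-add example (r0 = r1 + (r2 << 2)). It is a sketch only: the
demo_ functions, the DEMO_ opcodes, the op buffer layout, and the little
interpreter are all invented for illustration and do not reflect how the real
qop code stores or executes ops.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed register indexes (0 is reserved), then dynamically allocated temps. */
enum { DEMO_R0 = 1, DEMO_R1, DEMO_R2, DEMO_FIRST_TEMP };
enum demo_opc { DEMO_MOVI32, DEMO_ADD32, DEMO_SHL32 };

/* Every op names its destination and sources explicitly. */
struct demo_qop { enum demo_opc opc; int dst, a, b; uint32_t imm; };

#define DEMO_MAX_OPS   16
#define DEMO_MAX_QREGS 32

struct demo_ctx {
    struct demo_qop ops[DEMO_MAX_OPS];
    int nops;
    int next_qreg;
};

static void demo_init(struct demo_ctx *c)
{
    c->nops = 0;
    c->next_qreg = DEMO_FIRST_TEMP;
}

/* Allocate a fresh temporary register index. */
static int demo_new_qreg(struct demo_ctx *c)
{
    assert(c->next_qreg < DEMO_MAX_QREGS);
    return c->next_qreg++;
}

/* Append one op to the buffer. */
static void demo_emit(struct demo_ctx *c, enum demo_opc opc,
                      int dst, int a, int b, uint32_t imm)
{
    assert(c->nops < DEMO_MAX_OPS);
    c->ops[c->nops++] = (struct demo_qop){ opc, dst, a, b, imm };
}

/* Allocate a temp and emit an op that loads an immediate into it. */
static int demo_im32(struct demo_ctx *c, uint32_t value)
{
    int r = demo_new_qreg(c);
    demo_emit(c, DEMO_MOVI32, r, 0, 0, value);
    return r;
}

/* Translate "add r0, r1, r2, lsl #2" into three explicit-operand ops. */
static void demo_translate(struct demo_ctx *c)
{
    int tmp = demo_new_qreg(c);
    demo_emit(c, DEMO_SHL32, tmp, DEMO_R2, demo_im32(c, 2), 0);
    demo_emit(c, DEMO_ADD32, DEMO_R0, DEMO_R1, tmp, 0);
}

/* Interpret the op buffer, standing in for real host code generation. */
static void demo_run(const struct demo_ctx *c, uint32_t *regs)
{
    for (int i = 0; i < c->nops; i++) {
        const struct demo_qop *op = &c->ops[i];
        switch (op->opc) {
        case DEMO_MOVI32: regs[op->dst] = op->imm; break;
        case DEMO_ADD32:  regs[op->dst] = regs[op->a] + regs[op->b]; break;
        case DEMO_SHL32:  regs[op->dst] = regs[op->a] << (regs[op->b] & 31); break;
        }
    }
}
```

Nothing in the translation touches a global T0 or T1: the only state shared
between ops is the register indexes passed as arguments.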

What the heck does gen_intermediate_code() do?  It's a wrapper for a function 
that returns the same value and takes the exact same arguments in the same 
order.  All that's different is the name.  Why does that exist?

> One of the changes I've made to target-arm/translate.c is to replace all uses
> of T2 with new pseudo-regs. In many cases I've left the code structure as it 
> was (using the global T0/T1 temporaries), but replaced the dyngen ops with 
> the equivalent qops. eg. movl and andl now generate mov32 and and32 qops.

Um, is my earlier characterization of "unwrapping stuff" at all close?

> The standard qops are defined in qops.def. A target can also define
> additional qops in qop-target.def. The target specific qops are to simplify 
> implementation of the i386 static flag propagation pass. the expand_op_* 
> routines.

Yeah, I looked at that and the macros that generate it in qops.h.  There seem 
to be exactly two states (QREG_BLAH and QREGHI_BLAH) which can be reached 
from five different macros.  The "offset", "reg", and "mode" entries are 
universally ignored, and all you actually _get_ is a big enum of identifiers 
in a certain order.  I have no idea what's going on.

> For operations that are too complicated to be expressed as qops there is a 
> mechanism for calling helper functions. The m68k target uses this for 
> division and a couple of other things.

Ok, now I'm really lost.
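
One way to picture a helper mechanism: instead of expanding a complicated
operation into simple ops, the op stream carries a call to an ordinary C
function. The sketch below assumes that reading of the paragraph and is not
taken from the m68k target; demo_helper_divu, struct demo_call, and the
"return 0 on divide by zero" behaviour are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*demo_helper_fn)(uint32_t a, uint32_t b);

/* Hypothetical helper: unsigned division with a divide-by-zero check.
   A real target would presumably raise a guest exception instead of
   returning 0; that part is left out of this sketch. */
static uint32_t demo_helper_divu(uint32_t a, uint32_t b)
{
    return b ? a / b : 0;
}

/* A "call helper" op: which function to call and which registers to use. */
struct demo_call { demo_helper_fn fn; int dst, a, b; };

/* Execute the call op against a register file. */
static void demo_exec_call(const struct demo_call *op, uint32_t *regs)
{
    assert(op->fn != 0);
    regs[op->dst] = op->fn(regs[op->a], regs[op->b]);
}
```

The translator then only needs one generic "call" op, and the awkward logic
lives in C where the compiler can deal with it.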

> The implementation makes fairly heavy use of the C preprocessor to generate 
> code from .def files. There's also a small shell script that pulls the 
> definitions of the helper routines out of qop-helper.c.

Ah, hang on.  There's target_reginfo in translate-all.c, that's using some of 
the other values.  So what the heck does translate-all.c do?  (Shared code 
called by all the platform-dependent translate functions?)

> The debug dumps can be quite useful. In particular -d in_asm,op will dump
> the input asm and the resulting OPs.

I'll have to find a system with gcc3 installed on it so I can actually try 
this out.  (Hmmm, I have a Red Hat 9 image I run under qemu, maybe it would 
build under that?)

> For converting targets you can probably ignore most of the translate-all and 
> host-*/ changes. These implement generating code from the qops.

Ok, this implies that qops are a new thing.  Which looking at the code sort of 
supports.  Which means I don't understand what's going on at all.

> This works 
> by the host defining a set of "hard" qregs that correspond to host CPU 
> registers, and constraints for the operands of each qop. Then we do register 
> allocation and spilling to satisfy those constraints. The qops can then be 
> assembled directly into binary code.

I need to re-read this later.  My brain's full and I'm deeply confused.
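
A deliberately naive sketch of the "allocation and spilling" step may help
with the re-read: each virtual register gets a host register while any are
free, and a stack slot once they run out. This ignores per-op operand
constraints and register lifetimes entirely, so it is far simpler than what
the quoted paragraph describes. All names and the limit of three hard
registers are assumptions.

```c
#include <assert.h>

#define DEMO_NUM_HARD  3   /* pretend the host has three usable registers */
#define DEMO_MAX_VREGS 16

enum demo_kind { DEMO_UNASSIGNED, DEMO_IN_HARD_REG, DEMO_SPILLED };

/* Where a virtual register lives: hard register number or spill slot. */
struct demo_loc { enum demo_kind kind; int index; };

struct demo_alloc {
    struct demo_loc loc[DEMO_MAX_VREGS];
    int hard_used;
    int spill_used;
};

static void demo_alloc_init(struct demo_alloc *a)
{
    for (int i = 0; i < DEMO_MAX_VREGS; i++)
        a->loc[i] = (struct demo_loc){ DEMO_UNASSIGNED, -1 };
    a->hard_used = 0;
    a->spill_used = 0;
}

/* Return the location of vreg, assigning one on first use. */
static struct demo_loc demo_locate(struct demo_alloc *a, int vreg)
{
    assert(vreg >= 0 && vreg < DEMO_MAX_VREGS);
    if (a->loc[vreg].kind == DEMO_UNASSIGNED) {
        if (a->hard_used < DEMO_NUM_HARD)
            a->loc[vreg] = (struct demo_loc){ DEMO_IN_HARD_REG, a->hard_used++ };
        else
            a->loc[vreg] = (struct demo_loc){ DEMO_SPILLED, a->spill_used++ };
    }
    return a->loc[vreg];
}
```

Once every operand has a concrete location, each op can be encoded as host
instructions, with loads and stores inserted around the spilled operands.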

> There are also mechanisms for implementing floating point and 64-bit
> arithmetic even if the target doesn't support this natively. The target code
> doesn't need to worry about this, it just generates 64-bit/fp qops and they
> will be decomposed as necessary.

The implementation calls the appropriate host functions to handle the floating 
point, using soft-float if necessary?  (Under the old dyngen thing outputting 
blocks of gcc-produced code, I could understand how that works.  But if 
you're outputting assembly directly...  I'm back in the "totally lost" area 
again, I think.)
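
For the integer half of that question, "decomposed" can be illustrated with a
64-bit add built from 32-bit operations: add the low halves, detect the carry,
and add it into the high halves. This is a generic sketch of the technique,
with hypothetical demo_ names; it says nothing about how the real code handles
floating point.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves. */
struct demo_pair32 { uint32_t lo, hi; };

static struct demo_pair32 demo_split64(uint64_t v)
{
    struct demo_pair32 p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t demo_join64(struct demo_pair32 p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add using only 32-bit arithmetic. */
static struct demo_pair32 demo_add64(struct demo_pair32 a, struct demo_pair32 b)
{
    struct demo_pair32 r;
    r.lo = a.lo + b.lo;              /* wraps modulo 2^32            */
    uint32_t carry = r.lo < a.lo;    /* 1 if the low add overflowed  */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Usage: a carry out of the low half must reach the high half. */
static void demo_add64_usage(void)
{
    struct demo_pair32 sum = demo_add64(demo_split64(0x00000001ffffffffull),
                                        demo_split64(1));
    assert(demo_join64(sum) == 0x0000000200000000ull);
}
```

A code generator on a 32-bit host could emit the equivalent pair of host
instructions (for example an add followed by an add-with-carry) for one
64-bit add op.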

> Paul

Rob
-- 
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery