Re: [Qemu-devel] qemu vs gcc4


From:	Paul Brook
Subject:	Re: [Qemu-devel] qemu vs gcc4
Date:	Tue, 31 Oct 2006 22:08:18 +0000
User-agent:	KMail/1.9.5
On Tuesday 31 October 2006 20:41, Rob Landley wrote:
> Welcome to Stupid Question Theatre!  With your host, Paul Brook.  Today's
> contestant is: Rob Landley.  How dumb will it get?
>
> On Tuesday 31 October 2006 2:02 pm, Paul Brook wrote:
> > The basic principle is very similar. Guest code is decomposed into an
> > intermediate form consisting of simple operations, then native code is
> > generated from those operations.
>
> I got that part.  It's the how I'm still head-scratching over.
>
> The disassembly routines seem relatively compiler-independent, but I'm
> under the impression that turning the intermediate result (the string of
> qops) into large blocks of translated code involves gluing together a bunch
> of smaller blocks of pregenerated code.  These pregenerated blocks were
> spit out by gcc and are where the all the compiler dependencies that aren't
> clear bugs come from.

Correct.

> I thought what you were doing was replacing the pregenerated blocks with
> hand-coded assembly statements, but your description here seems to be about
> changing the disassembly routines that figure out which qops to string
> together in part 2.

Replacing the pregenerated blocks with hand written assembly isn't feasible. 
Each target has its own set of ops, and each host would need its own assembly 
implementation of those ops. Multiply 11 targets by 11 hosts and you get an 
unmaintainable mess :-)

> > In the existing dyngen implementation most operands to ops are implicit,
> > with only a few ops taking explicit arguments. The principle with the new
> > system is that all operands are explicit.
>
> Having looked ahead to your example before replying to this, I think I
> understand that part now.  (Just barely.)
>
> > The intermediate representation used by the code generator resembles an
> > imaginary machine. This machine has various different instructions
> > (qops), and a nominally infinite register file (qregs).
>
> Each qreg is represented as an integer index?

Yes.

> > Each qop takes zero or more arguments, each of which may be an input or
> > output.
>
> The input or output is always one of these qreg indexes?  (Some of the
> existing ones seem to take immediate values...)

It is always a qreg.
Potentially we could decide that some qregs are constants rather than 
variables, and use that information for code generation, but that's a 
slightly different issue.

> > In addition to dynamically allocated qregs there are a fixed set of qregs
> > that map onto the guest CPU state. This is to simplify code generation.
>
> These are indexes 0, 1, and 2?

They are defined by the code you quote below. However this is an implementation 
detail, and could change. You should use the named constants.

> Ok, looking at target-arm/translate.c, we have:
>
> static inline void gen_op_addl_T0_T1(void)
> {
>     gen_op_add32(QREG_T0, QREG_T0, QREG_T1);
> }
>
> So what is QREG_T0 anyway?  This is hard to grep for. 'find . | grep -v svn
> | xargs grep "QREG_T0"' doesn't produce anything useful, so there's got to
> be preprocessor concatenation stuff with ## going on, let's try just QREG
> on the *.h files, and yup at the start of qop.h there's this:

It corresponds to "T0" in dyngen. In addition to the actual CPU state, dyngen 
uses 3 fixed registers as scratch workspace. For qop purposes these are part 
of the guest CPU state. They're only there to aid conversion of the 
translation code; they'll go away eventually.

> enum target_qregs {
>     QREG_NULL,
> #define DEFO32(name, offset) QREG_ ## name,
> #define DEFO64(name, offset) DEFO32(name, offset)
> #define DEFF32(name, reg) DEFO32(name, reg)
> #define DEFF64(name, reg) DEFO32(name, reg)
> #define DEFR(name, reg, mode) DEFO32(name, reg)
> #include "qregs.def"
>
> And that has "DEFR(T0, AREG1, QMODE_I32)" which...  Ok, DEFR() discards the
> third argument ("mode") completely, and then DEFO32() discards the second
> argument (offset), and what's left is just the name, so it's position
> dependent (so why have the darn macros at ALL?)

Because qregs.def is included in at least two other places. This is the C 
preprocessor trickery I mentioned :-)

> My brain hurts a lot now.  I'm just letting you know.  What is all this
> complication actually trying to accomplish?

Generation of 3 different things (QREG_* constants, the target_reginfo 
structure, and qreg_names) from a single source. This avoids having to keep 3 
big hairy arrays in sync with each other.
It's also used to implement 64-bit qregs as a pair of 32-bit qregs on 32-bit 
hosts.
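
As a rough illustration of the single-source trick (not the actual qemu 
code), the sketch below expands one register list three ways. The list is an 
inline macro rather than an included qregs.def, and every xm_/XM_ name and 
offset is made up for the example.

```c
#include <assert.h>

/* Hypothetical register list standing in for qregs.def.
   Each entry is DEF(name, offset); names and offsets are invented. */
#define XM_QREG_LIST(DEF) \
    DEF(T0, 0)            \
    DEF(T1, 4)            \
    DEF(R0, 8)

/* Expansion 1: enum constants.  The offset argument is ignored here. */
enum xm_qregs {
    XM_QREG_NULL,
#define XM_ENUM(name, offset) XM_QREG_##name,
    XM_QREG_LIST(XM_ENUM)
#undef XM_ENUM
    XM_NUM_QREGS
};

/* Expansion 2: printable names, indexed by the enum above. */
static const char *const xm_qreg_names[] = {
    "NULL",
#define XM_NAME(name, offset) #name,
    XM_QREG_LIST(XM_NAME)
#undef XM_NAME
};

/* Expansion 3: per-register offsets, indexed by the enum above. */
static const int xm_qreg_offsets[] = {
    -1,
#define XM_OFFSET(name, offset) offset,
    XM_QREG_LIST(XM_OFFSET)
#undef XM_OFFSET
};

static const char *xm_qreg_name(int r)
{
    assert(r >= 0 && r < XM_NUM_QREGS);
    return xm_qreg_names[r];
}

static int xm_qreg_offset(int r)
{
    assert(r >= 0 && r < XM_NUM_QREGS);
    return xm_qreg_offsets[r];
}
```

Adding a register means adding one line to the list; the enum and both 
tables stay in step automatically, which is why each expansion is free to 
ignore the arguments it doesn't need.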

> > Each qreg has a particular type (32/64 bit, integer or float).
>
> You mean each qop's arguments have a particular type, and the arguments are
> always in qregs?  Or each qreg has a type permanently associated with that
> qreg?  

Both of the above.

> Or the value currently in a qreg has a type associated with it, but 
> the next value stored in that qreg may have a different type?

A qreg has a fixed type. The value stored in that qreg has that type. To 
convert it to a different type you need to use an explicit conversion qop.

> > It's up to
> > you to make sure the argument types match those expected by the qop. It's
> > generally fairly obvious from the name. eg. add32 adds I32 values, addf64
> > adds F64 values, etc. The exception is that I64 values can be used in
> > place of I32. The upper 32 bits of outputs are undefined in this case, and
> > the value must be explicitly extended before the full 64 bits are used.
>
> Possible translation: you can feed a qreg containing an I64 value to a qop
> taking an i32 argument, and it'll typecast the sucker down intelligently,
> but if you produce an I32 result and expect to use that qreg's value as an
> I64 argument later, you have to call a sign-extending qop on it first?

Exactly.
If you mix I32,F32 and/or F64 in this way Bad Things will happen.
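
A plain C analogy of that rule, not qemu code: a 32-bit add whose result is 
kept in a 64-bit container. The w_ names and the junk pattern in the upper 
half are invented; the junk just stands in for "undefined".

```c
#include <assert.h>
#include <stdint.h>

/* Model of an I32 op writing to an I64 register: only the low 32 bits
   are meaningful.  The upper half is filled with junk on purpose. */
static uint64_t w_add32_in_i64(uint64_t a, uint64_t b)
{
    uint32_t low = (uint32_t)a + (uint32_t)b;
    return 0xdeadbeef00000000ULL | low;
}

/* Explicit sign extension of the low 32 bits to a full 64-bit value.
   The uint32_t -> int32_t conversion is implementation-defined for
   values above INT32_MAX, but wraps on all common compilers. */
static int64_t w_ext32s(uint64_t v)
{
    return (int64_t)(int32_t)(uint32_t)v;
}

static void w_check(void)
{
    uint64_t r = w_add32_in_i64(0x7fffffffu, 1);

    /* Using the raw container as a 64-bit value gives garbage... */
    assert(r != 0x80000000ULL);
    /* ...but the low half is right, and extending it is well defined. */
    assert((uint32_t)r == 0x80000000u);
    assert(w_ext32s(r) == INT32_MIN);
}
```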

> > The old dyngen ops are actually implemented as special case qops.
>
> You mean each dyngen op produces multiple qops?  (And/or is a bundle of
> qops?)

A dyngen op is a single qop that does magical unknown things.

> > As an example take the arm instruction
> >
> >   add r0, r1, r2, lsl #2
> >
> > This is equivalent to the C expression
> >
> >  r0 = r1 + (r2 << 2)
> >
> > The old dyngen translate.c would do:
> >
> >   gen_op_movl_T1_r2()
> >   gen_op_shll_T1_im(2)
> >   gen_op_movl_T0_r1();
> >   gen_op_addl(); /* does T0 = T0 + T1 */
> >   gen_op_movl_r0_T0
>
> Digging down into target-arm/translate.c, function disas_arm_insn(), I'm...
> still having to take your word for it.  All the gen_op_movl_T1 variants I'm
> seeing end with _im which I presume means "immediate".  The alternative is
> _cc, but what does that mean?  (Presumably not "closed captioned".)

_cc are variants that set the condition codes. I may have got T0 and T1 
backwards in the first 3 lines.

> > When fully converted to the new system this would become:
> >
> >   int tmp = gen_new_qreg(); /* Allocate a temporary reg.  */
> >   /* gen_im32 is a helper that allocates a new qreg and
> >      initializes it to an immediate value.  */
> >   gen_op_add32(tmp, QREG_R2, gen_im32(2));
> >   gen_op_add32(QREG_R0, QREG_R1, tmp);
>
> Ok (still looking at target-arm/translate.c), I think you're not defining
> anything new here, you're just removing wrappers like gen_op_add_T1_im()
> which just wrap a single call to gen_op_add32(), and untangling the result?
>
> What the heck does gen_intermediate_code() do?  It's a wrapper for a
> function that returns the same value and takes the exact same arguments in
> the same order.  All that's different is the name.  Why does that exist?

Hysterical raisins. ie. nothing useful.

> > One of the changes I've made to target-arm/translate.c is to replace all
> > uses of T2 with new pseudo-regs. In many cases I've left the code structure as
> > it was (using the global T0/T1 temporaries), but replaced the dyngen ops
> > with the equivalent qops. eg. movl and andl now generate mov32 and and32
> > qops.
>
> Um, is my earlier characterization of "unwrapping stuff" at all close?

Not entirely. I'm also replacing fixed locations (T2) with dynamically 
allocated qregs.
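
To make "dynamically allocated qregs" concrete, here is a toy emitter in 
plain C. It is only a sketch: the toy_ names, the op set and the buffer are 
all invented, and the toy MOVI32 op carries its immediate in an argument 
slot purely for brevity. It records r0 = r1 + (r2 << 2) using fresh 
temporaries instead of fixed T0/T1/T2 slots.

```c
#include <assert.h>

enum toy_opc { TOY_OP_MOVI32, TOY_OP_SHL32, TOY_OP_ADD32 };

struct toy_qop {
    enum toy_opc opc;
    int args[3];            /* qreg indexes (MOVI32: args[1] is the value) */
};

/* Fixed qregs for guest state come first; temporaries follow. */
enum { TOY_QREG_NULL, TOY_QREG_R0, TOY_QREG_R1, TOY_QREG_R2, TOY_FIRST_TEMP };

#define TOY_MAX_OPS 16
static struct toy_qop toy_ops[TOY_MAX_OPS];
static int toy_nops;
static int toy_next_qreg = TOY_FIRST_TEMP;

/* A qreg is just an integer index, so allocation is a counter bump. */
static int toy_new_qreg(void)
{
    return toy_next_qreg++;
}

static void toy_emit(enum toy_opc opc, int a0, int a1, int a2)
{
    assert(toy_nops < TOY_MAX_OPS);
    toy_ops[toy_nops].opc = opc;
    toy_ops[toy_nops].args[0] = a0;
    toy_ops[toy_nops].args[1] = a1;
    toy_ops[toy_nops].args[2] = a2;
    toy_nops++;
}

/* Allocate a new qreg and emit an op that loads an immediate into it. */
static int toy_im32(int value)
{
    int r = toy_new_qreg();
    toy_emit(TOY_OP_MOVI32, r, value, TOY_QREG_NULL);
    return r;
}

/* r0 = r1 + (r2 << 2), with every operand explicit. */
static void toy_gen_add_shifted(void)
{
    int tmp = toy_new_qreg();
    toy_emit(TOY_OP_SHL32, tmp, TOY_QREG_R2, toy_im32(2));
    toy_emit(TOY_OP_ADD32, TOY_QREG_R0, TOY_QREG_R1, tmp);
}
```

Because nothing is routed through a shared scratch location, two such 
sequences can't accidentally clobber each other's intermediate values.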

> > The standard qops are defined in qops.def. A target can also define
> > additional qops in qop-target.def. The target specific qops are to
> > simplify implementation of the i386 static flag propagation pass; see the
> > expand_op_* routines.
>
> Yeah, I looked at that and the macros that generate it in qops.h.  There
> seem to be exactly two states (QREG_BLAH and QREGHI_BLAH) which can be
> reached from five different macros.  The "offset", "reg", and "mode"
> entries are universally ignored, and all you actually _get_ is a big enum
> of identifiers in a certain order.  I have no idea what's going on.

As mentioned above, qregs.def is included elsewhere.

> > For operations that are too complicated to be expressed as qops there is
> > a mechanism for calling helper functions. The m68k target uses this for
> > division and a couple of other things.
>
> Ok, now I'm really lost.

Most x86 instructions set the condition code flags. However most of the time 
these flags are ignored. eg. if you have two consecutive add instructions the 
first will set the flags, and the second will immediately overwrite them.

qemu contains a back-propagation pass that will remove the code to set the 
flags after the first instruction. Currently this is implemented by changing 
an addl_cc op into a plain addl op.

The flag-setting code would most likely require several qops to implement, so 
it would be much harder to prove it is not needed and remove it. So there is 
a mechanism for adding extra target qops, doing the flag elimination pass, 
then expanding those to generic qops.

m68k generates the _cc ops necessary for doing this, but is missing the 
back-propagation optimization pass.

On RISC targets like ARM most instructions don't set the condition codes, so 
we don't bother doing this.
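
A minimal sketch of such a back-propagation pass, over a made-up three-op 
instruction set (the cc_ names are hypothetical and this is not how the 
qemu pass is actually written). It walks a block backwards, tracks whether 
the flags are live, and downgrades any flag-setting add whose flags are 
overwritten before anything reads them.

```c
#include <assert.h>
#include <stddef.h>

enum cc_opc {
    CC_ADD,      /* add, flags untouched */
    CC_ADD_CC,   /* add, also sets the condition codes */
    CC_JCC       /* conditional jump: reads the condition codes */
};

/* Returns the number of ops downgraded from ADD_CC to ADD.  Flags are
   conservatively assumed to be live at the end of the block. */
static int cc_kill_dead_flags(enum cc_opc *ops, size_t n)
{
    int flags_live = 1;
    int changed = 0;

    assert(ops != NULL || n == 0);
    for (size_t i = n; i-- > 0; ) {
        switch (ops[i]) {
        case CC_ADD_CC:
            if (!flags_live) {
                ops[i] = CC_ADD;    /* nobody reads these flags */
                changed++;
            }
            flags_live = 0;         /* earlier flag values are dead */
            break;
        case CC_JCC:
            flags_live = 1;         /* a reader: flags needed above */
            break;
        case CC_ADD:
            break;                  /* neither reads nor writes flags */
        }
    }
    return changed;
}
```

With a single addl_cc-style op the check is one comparison per op. If the 
flag computation had already been expanded into several generic qops, the 
pass would have to recognise and remove the whole group.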

> > The implementation makes fairly heavy use of the C preprocessor to
> > generate code from .def files. There's also a small shell script that
> > pulls the definitions of the helper routines out of qop-helper.c
>
> Ah, hang on.  There's target_reginfo in translate-all.c, that's using some
> of the other values.  So what the heck does translate-all.c do?  (Shared
> code called by all the platform-dependent translate functions?)

There are three fairly independent stages:
1) target-*/translate.c converts guest code into qops.
2) translate-all.c messes about with those qops a bit (allocates host 
registers, etc).
3) translate-op.c, translate-qop.c and target-*/ turn those qops into host 
code.

> > The debug dumps can be quite useful. In particular -d in_asm,op will dump
> > the input asm and the resulting OPs.
>
> I'll have to find a system with gcc3 installed on it so I can actually try
> this out.  (Hmmm, I have a Red Hat 9 image I run under qemu, maybe it would
> build under that?)

Probably.

> > For converting targets you can probably ignore most of the translate-all
> > and host-*/ changes. These implement generating code from the qops.
>
> Ok, this implies that qops are a new thing.  Which looking at the code sort
> of supports.  Which means I don't understand what's going on at all.

qops and dyngen ops are both small "functions" that are represented in a 
similar way. The difference is that dyngen ops are target specific fixed 
functions, whereas qops are generic parameterized functions.
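
The difference in shape can be shown with two toy functions in plain C. 
This is only an analogy: the sem_ names are invented, and these functions 
execute the operation directly instead of generating code for it.

```c
#include <assert.h>
#include <stdint.h>

enum { SEM_T0, SEM_T1, SEM_R0, SEM_R1, SEM_R2, SEM_NREGS };
static uint32_t sem_regs[SEM_NREGS];

/* dyngen style: a fixed function.  The operands are implicit, baked
   into the op, so every register combination needs its own op. */
static void sem_op_addl_T0_T1(void)
{
    sem_regs[SEM_T0] += sem_regs[SEM_T1];
}

/* qop style: one generic function with explicit operands. */
static void sem_op_add32(int dst, int a, int b)
{
    assert(dst >= 0 && dst < SEM_NREGS);
    assert(a >= 0 && a < SEM_NREGS && b >= 0 && b < SEM_NREGS);
    sem_regs[dst] = sem_regs[a] + sem_regs[b];
}
```

The fixed op is just one instance of the generic one: 
sem_op_add32(SEM_T0, SEM_T0, SEM_T1) does the same job as 
sem_op_addl_T0_T1().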

While they are really separate things, the details have been chosen so it 
should be possible to adapt the existing translate.c code rather than having 
to rewrite it from scratch. Decoding x86 instruction semantics is 
complicated :-)

Many of the simpler dyngen ops can be replaced with a single qop. Others can 
be replaced with a sequence of a few qops. Some of the more complicated ones 
may need to be moved into helper functions.

> > This works
> > by the host defining a set of "hard" qregs that correspond to host CPU
> > registers, and constraints for the operands of each qop. Then we do
> > register allocation and spilling to satisfy those constraints. The qops
> > can then be assembled directly into binary code.
>
> I need to re-read this later.  My brain's full and I'm deeply confused.

I started off by saying qops were effectively instructions for an imaginary 
machine. translate-all.c rearranges them so they match up very closely with 
the instructions available on the host. Once this has been done turning them 
into binary code is relatively simple.

> > There are also mechanisms for implementing floating point and 64-bit
> > arithmetic even if the target doesn't support this natively. The target
> > code doesn't need to worry about this, it just generates 64-bit/fp qops
> > and they will be decomposed as necessary.
>
> The implementation calls the appropriate host functions to handle the
> floating point, using soft-float if necessary?  (Under the old dyngen thing
> outputting blocks of gcc-produced code, I could understand how that works. 
> But if you're outputting assembly directly...  I'm back in the "totally
> lost" area again, I think.)

Err, sort of. There's a couple of different layers.

In translate.c you'll do something like

  tmp = gen_new_qreg(QMODE_F32);
  gen_op_addf32(tmp, QREG_FOO, QREG_BAR);

If the host implements the floating point qops 'natively' then this will work 
exactly the same as the integer qops and end up as host floating point 
instructions. Currently this is not implemented for any hosts.

If native host FP is not available qemu will include appropriate bits so that 
after macro expansion and inlining you end up with:

  tmp = gen_new_qreg(QMODE_I32);
  gen_op_helper(HELPER_addf32, tmp, QREG_FOO, QREG_BAR);

and the addf32 helper does the floating point addition using the "softfloat" 
library. The qemu softfloat library implementation may actually use hardware 
floating point rather than doing everything manually.
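
A sketch of what the helper side might look like, in plain C. The hlp_ 
names and the table are invented, and it assumes float is a 32-bit IEEE 
single. The helper sees the F32 value as a raw 32-bit pattern and, in this 
toy version, simply uses host floating point for the arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Helper: add two F32 values passed as raw 32-bit patterns.
   Assumes sizeof(float) == 4 and IEEE 754 single precision. */
static uint32_t hlp_addf32(uint32_t a_bits, uint32_t b_bits)
{
    float a, b, r;
    uint32_t r_bits;

    memcpy(&a, &a_bits, sizeof a);
    memcpy(&b, &b_bits, sizeof b);
    r = a + b;                      /* host FP does the real work */
    memcpy(&r_bits, &r, sizeof r);
    return r_bits;
}

/* Helpers are identified by a small integer, like HELPER_addf32. */
enum hlp_id { HLP_ADDF32, HLP_NUM_HELPERS };

typedef uint32_t (*hlp_fn)(uint32_t, uint32_t);

static const hlp_fn hlp_table[HLP_NUM_HELPERS] = {
    hlp_addf32
};

/* What a generic "call helper" op would boil down to at run time. */
static uint32_t hlp_call(enum hlp_id id, uint32_t a, uint32_t b)
{
    assert((int)id >= 0 && (int)id < HLP_NUM_HELPERS);
    return hlp_table[id](a, b);
}
```

For example, hlp_call(HLP_ADDF32, 0x3fc00000, 0x40100000) adds 1.5 and 2.25 
and returns 0x40700000, the bit pattern of 3.75.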

Likewise if the host doesn't have 64-bit operations gen_op_and64 will actually 
expand to a pair of and32 operations.
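
The 64-bit case is easy to picture in plain C (again a sketch, with made-up 
pair_ names): a 64-bit value is kept as two 32-bit halves, and the 64-bit 
AND is just two independent 32-bit ANDs.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as on a 32-bit host. */
struct pair64 { uint32_t lo, hi; };

static struct pair64 pair_split(uint64_t v)
{
    struct pair64 p;
    p.lo = (uint32_t)v;
    p.hi = (uint32_t)(v >> 32);
    return p;
}

static uint64_t pair_join(struct pair64 p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

static uint32_t pair_and32(uint32_t a, uint32_t b)
{
    return a & b;
}

/* and64 expressed purely in terms of and32. */
static struct pair64 pair_and64(struct pair64 a, struct pair64 b)
{
    struct pair64 r;
    r.lo = pair_and32(a.lo, b.lo);
    r.hi = pair_and32(a.hi, b.hi);
    return r;
}

static void pair_check(void)
{
    uint64_t x = 0xf0f0f0f0ffff0000ULL;
    uint64_t y = 0x0ff00ff012345678ULL;

    /* The split version must agree with a native 64-bit AND. */
    assert(pair_join(pair_and64(pair_split(x), pair_split(y))) == (x & y));
}
```

Bitwise ops split cleanly because the halves don't interact; something like 
a 64-bit add would also need to carry from the low half into the high half.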

Paul