dotgnu-general
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[DotGNU]Yet another volunteer


From: Marco Manfredini
Subject: [DotGNU]Yet another volunteer
Date: Mon, 06 Aug 2001 04:40:56 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.3) Gecko/20010801

Hello, my name is Marco Manfredini and some of you probably remember my from #dotgnu as marquise.

I'm reading the discussions regarding the dostgnu-IL support with great interest and would like to make a couple of annotations from here.

I was looking on the gcc->bytecode problem, after I read the call for volunteers and checked out this document:
http://cobolforgcc.sourceforge.net/cobol_toc.html
which has a intriguing chapter 14 that deals with the undocumented internals of gcc, mainly the AST->RTL part.

Was I read was disillusioning. Not only that the author clearly shows the coding problems that are raisen by the current coding practice in gcc, he also made clear that neither RTL or the "tree" ADT are powerful enough to preserve the semantics (i.e. signatures, expression logic) of the source code in a reasonable manner.

MS-IL/Metadata works AFAIK on the premise, that you can reconstruct the signatures of the "exported" entities (functions, classes). This is required for different reasons. One is, that you can #import an assembly, for reuse it in you code, another is for bytecode-validation (i.e. security). I can only suppose that the type-information problem can be solved by interpretation of debugging information (given the "RTL-wants-registers" problem can be solved), which looks like a kludge.

Compiling the gcc languages to IL will raise more issues. For example, it will probably turn out that the ulgy "managed" stuff (including the __gc et.al. keyworkd) has to be supported with gcc, because data under the control of the runtime cannot be handled so frankly as C/C++ allows it.

But what I want to say is, that we should not stick on the byte-code idea for the dotgnu-IL because of these reasons:

* Sure, the dotgnu-IL would support signatures of the external visible entities, but a byte-code representation would loose the expression semantics. This becomes clear if we consider a JIT which tries to generate optimal code for a platform. The usual approach is to reconstruct the expression tree from the byte code, because the expression tree gives a better overview about what happens and allows global optimization and inlining. So why should we remove information that has to be reconstructed later?

* A byte-code representation is too low-level which makes it less expressive. It deals with "values", but no more with "types","functions","scopes". This can make things very hard, for example predicting which variables are active and alive for a given byte-code line, and which associated cleanup code they have. This information would be neccesary to implement dtor-cleanup, and had to be included in the generated code as "Meta-Information". On the other hand, information about the currently active values and their types could be also used to implement an efficient garbage collector scheme, but once we defined the IL-Format we are more or less stuck to the amount of information we put into it.

* I dont't say that byte-code cannot do what I want, but I am sure that interesting applications (instantiation of generic types and function, global code optimization and inlining, aspect-oriented programming) are made unneccesarily hard.

I agree that we should invent before they patent and want to suggest that we should follow, what I can recognize from the previous posts on this list: The idea to use a semantical rich tree structure as a IL, which preserves as many informations about the code as possible (without being to language specific) and that we should give it a stupid name. I'd propose "melody", since this project offers a lot obvious puns on the background of C#.

I am currently developing ideas on how melody could look like, and try to gather information about similar ideas (i.e. anything that's like "compressed binary AST" as IL). Less for inspritation, but for the our invention claims.

Currently my melody boils down to the idea, that we'd have constructor expressions which build the semantitic entities we wish to represent. To give you an example. This C-function:

bool even(unsigned num)
{
 unsigned k=num%2;
 return k==0;
}

would look in a human-readable version of melody as

001 function $even($num: uint32):boolean
002 {
003  let($k:uint32 ; modulo<uint32>($num, 2);)
004  {
005  (equal<uint32>($k,0))
006  }
007 }

The text-version looks like paraphrase of the original C function, which doesn't mean that I cheat, but illustrates how much I would like to preserve.

Just a short description on happes here:

First, the operations which do "stuff" look like this:

op<type>(parameters).

You see for example modulo<uint32>($num,2). This means "calculate modulo with two parameter which are uint32". The compiler found, that $num is unsigned and compiles the right function. This is obvious if you remember that CPU usually has instructions like add.l addu.l addu.w, i.e. the right op-code for the right operand. Btw. If we can maintain a readable textual representation for melody, then it would also make a valuable debugging tool, since you can see the hidden operations (conversions, temporaries etc.) that you compiler has produced to compile you innocent looking code.

line 001 starts the constructor for a function which is named $even (to distiguish the name from the "keywords"). The num parameter has become an uint32, the C-translator decided that "unsigned int" should be a 32-bit value. (Btw. In RTL or gcc-tree the analogous expression wouldn't mention anymore, that the parameter is an /unsigned/ value. It only contains the information that it is a 32-Bit value, but we need this information to extract the signature.).

Now line 3 is interesting, because it shows how locals are introduced. I'd propose to use the let-syntax which you might be familiar with, if you are in functional programming. If not: My idea is, that every statement represents a value. For example the above "function" represents a ..eh..function. The let statement again, is the value of its expression with a certain name binding. In Scheme I can say:

        (let ((x 0)(y 0)) (+ x y)


You can read this as: the value of this statement is x+y where x is 0 and y is 0.

The melody-let statement here says: the value is $k=0 where $k is $num mod 2.

So the let contains a lot of information: It says, that $k is only available during the inner expression (which tells us about its block scope), it says that $k is initialized with '$num mod 2' (which is different from assigment, if you are familiar with C++ you will understand what I mean). Note that let has a third parameter (left blank), which is /the optional cleanup expression/ which has to be evaluated when the let-body is evaluated. This can be used by a compiler to represent its destructor semantics! (Now, do you see the fine difference between initialization and assigment? If the initialization expression generates an exception, then we are /not/ calling the cleanup expression, since there is nothing to cleanup!).

Well, this is just an example I made up to entertain you, I admit that there a still lot of things to consider (data aligment, how do I define classes & operations on them, avoiding machine dependencies etc.), but I hope you get the picture. But here's another teaser for you. Imagine what you can, if melody would allow you to define functors (that is functions in the melody language). For example:

functor even(T)
{
 function ($num:T)
 {
   let($k:T ; modulo<T>($num, 2);)
   {
    (equal<T>($k,0))
   }
 }
}

melodies value-types are functions, structs, expressions. It can operate on a type like on a number. So a "functor" implements essectially genericity, since he f.e. compute a new function by replacing the type-parameter!! This means, you can essentially /compile templates into melody code/ and instantiate these generics on demand! And I can inline them as well, because I'm producing a new expression tree, not stupid bytecode.

Well, I hope I'm not completely on the wrong track and that the idea is useful for the project. Comment please!

--
Marco
bytecodes? we don't need no stinkin' bytecodes.

P.S.
It should be possible to build a frontend to gcc which understands melody.











reply via email to

[Prev in Thread] Current Thread [Next in Thread]