[DotGNU]Yet another volunteer

dotgnu-general

From:	Marco Manfredini
Subject:	[DotGNU]Yet another volunteer
Date:	Mon, 06 Aug 2001 04:40:56 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.3) Gecko/20010801

Hello, my name is Marco Manfredini, and some of you probably remember me from #dotgnu as marquise.

I'm reading the discussions regarding dotgnu-IL support with great interest and would like to add a couple of remarks.

I was looking into the gcc->bytecode problem after I read the call for volunteers, and checked out this document:

http://cobolforgcc.sourceforge.net/cobol_toc.html

which has an intriguing chapter 14 that deals with the undocumented internals of gcc, mainly the AST->RTL part.

What I read was disillusioning. Not only does the author clearly show the coding problems raised by the current coding practice in gcc, he also makes clear that neither RTL nor the "tree" ADT is powerful enough to preserve the semantics (i.e. signatures, expression logic) of the source code in a reasonable manner.

MS-IL/Metadata works, AFAIK, on the premise that you can reconstruct the signatures of the "exported" entities (functions, classes). This is required for different reasons. One is that you can #import an assembly to reuse it in your code; another is bytecode validation (i.e. security). I can only suppose that the type-information problem can be solved by interpreting debugging information (given that the "RTL-wants-registers" problem can be solved), which looks like a kludge.

Compiling the gcc languages to IL will raise more issues. For example, it will probably turn out that the ugly "managed" stuff (including the __gc et al. keywords) has to be supported in gcc, because data under the control of the runtime cannot be handled as freely as C/C++ allows.

But what I want to say is that we should not stick to the byte-code idea for the dotgnu-IL, for these reasons:

* Sure, the dotgnu-IL would support signatures of the externally visible entities, but a byte-code representation would lose the expression semantics. This becomes clear if we consider a JIT which tries to generate optimal code for a platform. The usual approach is to reconstruct the expression tree from the byte code, because the expression tree gives a better overview of what happens and allows global optimization and inlining. So why should we remove information that has to be reconstructed later?

* A byte-code representation is too low-level, which makes it less expressive. It deals with "values", but no longer with "types", "functions", "scopes". This can make things very hard, for example predicting which variables are live at a given byte-code line, and which cleanup code is associated with them. This information would be necessary to implement dtor cleanup, and would have to be included in the generated code as "meta-information". On the other hand, information about the currently active values and their types could also be used to implement an efficient garbage collection scheme, but once we have defined the IL format we are more or less stuck with the amount of information we put into it.

* I don't say that byte-code cannot do what I want, but I am sure that interesting applications (instantiation of generic types and functions, global code optimization and inlining, aspect-oriented programming) are made unnecessarily hard.
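
The reconstruction step mentioned in the first point can be sketched in a few lines of Python. This is only an illustration under assumptions: the stack bytecode and its opcode names (load, const, mod, eq, ret) are invented here and do not correspond to MS-IL or to any dotgnu format. The sketch symbolically executes the stack code for "(num % 2) == 0" and rebuilds the expression tree that an optimizer would want to work on:

```python
# Hypothetical stack bytecode for: return (num % 2) == 0
# The opcode names are invented for this sketch.
BYTECODE = [
    ("load", "num"),
    ("const", 2),
    ("mod",),
    ("const", 0),
    ("eq",),
    ("ret",),
]

BINARY_OPS = {"mod", "eq"}


def rebuild_tree(code):
    """Symbolically execute the stack code to recover the expression tree."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "load":
            stack.append(("var", instr[1]))
        elif op == "const":
            stack.append(("const", instr[1]))
        elif op in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise ValueError("code ended without ret")


tree = rebuild_tree(BYTECODE)
print(tree)
```

The rebuilt tree carries no trace of a named local or of its scope; whatever the bytecode did not record cannot be recovered this way.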

I agree that we should invent before they patent, and I want to suggest that we follow what I can recognize from the previous posts on this list: the idea to use a semantically rich tree structure as an IL, which preserves as much information about the code as possible (without being too language specific). And we should give it a stupid name. I'd propose "melody", since this project offers a lot of obvious puns on the background of C#.

I am currently developing ideas on what melody could look like, and I am trying to gather information about similar ideas (i.e. anything that's like "compressed binary AST" as IL). Less for inspiration than for our invention claims.

Currently my melody boils down to the idea that we'd have constructor expressions which build the semantic entities we wish to represent. To give you an example, this C function:


#include <stdbool.h>  /* bool, as of C99 */

bool even(unsigned num)
{
 unsigned k=num%2;
 return k==0;
}

would look like this in a human-readable version of melody:

001 function $even($num: uint32):boolean
002 {
003  let($k:uint32 ; modulo<uint32>($num, 2);)
004  {
005  (equal<uint32>($k,0))
006  }
007 }

The text version looks like a paraphrase of the original C function, which doesn't mean that I cheat, but illustrates how much I would like to preserve.


Just a short description of what happens here:

First, the operations which do "stuff" look like this:

op<type>(parameters).

You see, for example, modulo<uint32>($num,2). This means "calculate modulo with two parameters which are uint32". The compiler found that $num is unsigned and selects the right operation. This is obvious if you remember that a CPU usually has instructions like add.l, addu.l, addu.w, i.e. the right op-code for the right operand. Btw., if we can maintain a readable textual representation for melody, it would also make a valuable debugging tool, since you can see the hidden operations (conversions, temporaries etc.) that your compiler has produced to compile your innocent-looking code.
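
Why the type parameter of an operation matters can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the table of operations, the int32 variants, and the "less" operation are not from the melody proposal, and the signed remainder follows C99 rules (the sign follows the dividend). The same 32-bit pattern gives different results depending on which typed operation is selected:

```python
def as_uint32(bits):
    """Interpret the low 32 bits as an unsigned value."""
    return bits & 0xFFFFFFFF


def as_int32(bits):
    """Interpret the low 32 bits as a two's-complement signed value."""
    bits &= 0xFFFFFFFF
    return bits - 0x100000000 if bits & 0x80000000 else bits


def c_mod(a, b):
    """Remainder with C99 semantics: the sign follows the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


# Hypothetical operation table keyed by (operation, operand type).
OPS = {
    ("modulo", "uint32"): lambda a, b: as_uint32(a) % as_uint32(b),
    ("modulo", "int32"): lambda a, b: c_mod(as_int32(a), as_int32(b)),
    ("less", "uint32"): lambda a, b: as_uint32(a) < as_uint32(b),
    ("less", "int32"): lambda a, b: as_int32(a) < as_int32(b),
}


def apply_op(name, type_name, *args):
    """Select the operation for the given operand type and apply it."""
    return OPS[(name, type_name)](*args)


bits = 0xFFFFFFFF  # 4294967295 as uint32, -1 as int32
print(apply_op("modulo", "uint32", bits, 2))
print(apply_op("modulo", "int32", bits, 2))
print(apply_op("less", "uint32", bits, 1))
print(apply_op("less", "int32", bits, 1))
```

If the IL only recorded "a 32-bit value", none of these choices could be made from the IL alone.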

Line 001 starts the constructor for a function named $even (the $ distinguishes the name from the "keywords"). The num parameter has become a uint32; the C translator decided that "unsigned int" should be a 32-bit value. (Btw., in RTL or gcc-tree the analogous expression would no longer mention that the parameter is an /unsigned/ value. It only contains the information that it is a 32-bit value, but we need the signedness to extract the signature.)

Now line 003 is interesting, because it shows how locals are introduced. I'd propose to use the let syntax, which you might be familiar with if you are into functional programming. If not: my idea is that every statement represents a value. For example, the above "function" represents a ..eh.. function. The let statement, in turn, is the value of its body expression under a certain name binding. In Scheme I can say:


        (let ((x 0) (y 0)) (+ x y))

You can read this as: the value of this statement is x+y where x is 0 and y is 0.

The melody-let statement here says: the value is $k == 0 where $k is $num mod 2.
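
The "every statement is a value" reading of let can be tried out with a minimal Python evaluator. It is a sketch under assumptions: the tuple encoding of the tree and the node names ("const", "var", "add", "modulo", "equal", "let") are made up for this example, and type annotations are left out to keep it short. It evaluates both the Scheme example and the body of $even:

```python
def evaluate(node, env):
    """Evaluate a tiny expression tree; every node yields a value."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "add":
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == "modulo":
        return evaluate(node[1], env) % evaluate(node[2], env)
    if kind == "equal":
        return evaluate(node[1], env) == evaluate(node[2], env)
    if kind == "let":
        _, bindings, body = node
        inner = dict(env)  # bound names are visible only inside the body
        for name, init in bindings:
            inner[name] = evaluate(init, env)
        return evaluate(body, inner)
    raise ValueError(f"unknown node kind: {kind}")


# (let ((x 0) (y 0)) (+ x y))
scheme_example = ("let", [("x", ("const", 0)), ("y", ("const", 0))],
                  ("add", ("var", "x"), ("var", "y")))

# let k = num mod 2 in k == 0
even_body = ("let", [("k", ("modulo", ("var", "num"), ("const", 2)))],
             ("equal", ("var", "k"), ("const", 0)))

print(evaluate(scheme_example, {}))
print(evaluate(even_body, {"num": 4}))
print(evaluate(even_body, {"num": 7}))
```

Because the let copies the environment before binding, the name k never leaks into the caller's scope, which is the block-scope property the tree is meant to preserve.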

So the let contains a lot of information: it says that $k is only available during the inner expression (which tells us about its block scope), and it says that $k is initialized with '$num mod 2' (which is different from assignment; if you are familiar with C++ you will understand what I mean). Note that let has a third parameter (left blank here), which is /the optional cleanup expression/ that has to be evaluated once the let-body has been evaluated. This can be used by a compiler to represent its destructor semantics! (Now, do you see the fine difference between initialization and assignment? If the initialization expression generates an exception, then we are /not/ calling the cleanup expression, since there is nothing to clean up!)
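
The difference between a failing initialization and a failing body can be sketched in Python. The helper name eval_let and its callable-based interface are hypothetical, and the sketch assumes that cleanup should also run when the body itself raises, as a C++ destructor would; the post does not spell that case out:

```python
def eval_let(init, body, cleanup=None):
    """Evaluate a let with an optional cleanup expression.

    init, body and cleanup are plain callables standing in for the
    three parts of the melody let (hypothetical encoding).
    """
    value = init()  # if this raises, cleanup is never reached
    try:
        return body(value)
    finally:
        # Runs after the body, whether it returned or raised.
        if cleanup is not None:
            cleanup(value)


log = []


def failing_init():
    raise RuntimeError("init failed")


# Initialization succeeds: the body runs, then the cleanup runs.
result = eval_let(lambda: 7 % 2,
                  lambda k: k == 0,
                  lambda k: log.append(("cleanup", k)))

# Initialization fails: there is nothing to clean up, so cleanup is skipped.
try:
    eval_let(failing_init,
             lambda k: k,
             lambda k: log.append(("cleanup", k)))
except RuntimeError:
    log.append("init raised")

print(result, log)
```

Only one cleanup entry appears in the log, because the second let never got a value to clean up.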

Well, this is just an example I made up to entertain you. I admit that there are still a lot of things to consider (data alignment, how to define classes and operations on them, avoiding machine dependencies etc.), but I hope you get the picture. But here's another teaser for you. Imagine what you could do if melody allowed you to define functors (that is, functions in the melody language). For example:


functor even(T)
{
 function ($num:T)
 {
   let($k:T ; modulo<T>($num, 2);)
   {
    (equal<T>($k,0))
   }
 }
}

melody's value types are functions, structs, expressions. It can operate on a type like on a number. So a "functor" essentially implements genericity, since it can, for example, compute a new function by replacing the type parameter!! This means you can essentially /compile templates into melody code/ and instantiate these generics on demand! And I can inline them as well, because I'm producing a new expression tree, not stupid bytecode.
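
Instantiating such a functor amounts to a substitution over the tree, which a short Python sketch can show. The tuple encoding, the ("typeparam", ...) and ("type", ...) markers, and the function name instantiate are all assumptions made for this illustration, not part of the melody proposal:

```python
# Marker for the functor's type parameter (hypothetical encoding).
T = ("typeparam", "T")

# Tree for: functor even(T) { function ($num:T) { let($k:T ; ...) {...} } }
EVEN_TEMPLATE = (
    "function", (("num", T),),
    ("let", ("k", T),
     ("modulo", T, ("var", "num"), ("const", 2)),
     ("equal", T, ("var", "k"), ("const", 0))),
)


def instantiate(node, param, actual):
    """Return a new tree with every occurrence of param replaced by actual."""
    if node == param:
        return actual
    if isinstance(node, tuple):
        return tuple(instantiate(child, param, actual) for child in node)
    return node


even_uint32 = instantiate(EVEN_TEMPLATE, T, ("type", "uint32"))
even_uint8 = instantiate(EVEN_TEMPLATE, T, ("type", "uint8"))

print(even_uint32)
print(even_uint8)
```

Each instantiation is an ordinary expression tree again, so it can be optimized or inlined like any hand-written function, and the template itself stays untouched for later instantiations.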

Well, I hope I'm not completely on the wrong track and that the idea is useful for the project. Comments, please!


--
Marco
bytecodes? we don't need no stinkin' bytecodes.

P.S.
It should be possible to build a frontend to gcc which understands melody.
