help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: intermediary representation and bison?


From: Ilyes Gouta
Subject: Re: intermediary representation and bison?
Date: Mon, 24 Dec 2007 17:29:15 +0100
User-agent: Thunderbird 2.0.0.9 (X11/20071115)


Thank you, Laurence and Evan, for your extensive explanations! I had a look on ANTLR and it seems to be a great tool and probably it will be my second option. I think I'm going to try to get the scanner to construct an AST (instead of emitting code) so I'll stick with bison for a while. As you said, the rest of the compilation phases such as resource binding and code emission would become just direct products of the AST traversal.

Thank you again for your suggestions!

Best regards,
Ilyes Gouta.

Evan Lavelle wrote:
I had the same problem on my first language; here's what I did.

I started off thinking that I needed a simple interpreter, so I wrote the Bison code, and basically did all the required work in the Bison actions. This worked well; It looks like you're at this stage.

A little later, I realised that this wasn't quite good enough. The specific problem, I think, was that I needed some forward declarations - ie. sometimes code earlier in the source file needed to know about something later in the source file. Tricky. After some thought, I decided that I could do this with minimal changes, just by rescanning the input again, with Bison. I scanned it once, remembered the bits I needed to know about, reset yyin to the beginning of the file, and scanned it again properly. I now had a 2-pass interpreter, which worked well.

As time went on, it became obvious that this was very limiting. I needed to do all sorts of extra things; some of these were:

1 - more forward declarations

2 - the language allowed constant expressions, so the sematic checking code had to confirm that an expression was actually constant. how do you do this in one pass? The easy answer is to have an extra pass - you scan the code finding everthing that should be a constant, evaluate any expression you find there, replace the expression with a constant (if you can), and let the semantic checker confirm that there's a constant there.

3 - As semantic checking became more complex, it became obvious that I had to analyse *all* the code before I had enough information to analyse *some* of the code. This will depend on your language.

4 - some things are next to impossible to code generate just be scanning and rescanning the input. since you're looking at C, what about sequence points, and expressions with side-effects? How do you handle the side-effects? Imagine code-generating "y = x++, c(), x" - where do you put in the 'x' increment?

5 - optimisations? A code generator for a new target? What if you have to scan something right-to-left? And so on.

Anyway, it quickly became obvious that handling anything that wasn't trivial is next to impossible just be scanning and rescanning the input. Fortunately, there's a really simple answer. The lexer and scanner do very little, apart from creating an AST. The scanner is no longer the 'compiler'; it's just a very simple front-end that creates a data structure, the AST. The compilation process is now a standard data-processing problem - it's nothing more than analysing, manipulating, and transforming the AST. Each analysis or transformation of the AST is, loosely speaking, a compiler "pass"; you can do this as few, or as many, times as you wish. It's trivial to add a new pass when you have some new feature or requirement. The AST *is* your program; the scanning is basically irrelevant.

And you can still have an "interpreter" at the end of it, if you want one; as soon as you finish "code generation", you just run your "code", or whatever it is that actually does anything. Technically, however, you should probably call this JIT compilation, rather than interpreting.

The bad news is that you need to know a *lot* more to do this, over and above writing a Bison scanner. The (user) scanner code (ie. your "actions") is probably a lot less than 5% of a practical and useful compiler. So, stick with scanning your input until you find out what the problems are and, if it's really necessary to fix them, rewrite your code around an AST. If you need to find out more about ASTs, look at Antlr; you can use the Antlr library to create and maintain your AST.

Evan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]