help-bison

Re: intermediary representation and bison?


From: Laurence Finston
Subject: Re: intermediary representation and bison?
Date: Thu, 27 Dec 2007 10:37:21 +0100 (CET)

On Mon, 24 Dec 2007, Ilyes Gouta wrote:

> Thank you, Laurence and Evan, for your extensive explanations! 

You're welcome.  

It occurred to me that another reason why
GCC creates a syntax tree, or whatever it's called (I know virtually
nothing about the theory of compilers), is that GCC has multiple front ends
for different languages, e.g., Fortran and Pascal.  I'm not sure how C++
is handled, since it requires support for features such as code generation
for templates.

Evan brought up a problem which seems to me to be related to something I
call "the parse before the parse" problem:  sometimes information that
affects the course of parsing can only be obtained by more parsing that
has taken place beforehand.  Making multiple passes is one
solution.  Another possibility _might_ be to spawn a thread and call the
same or a different parser to handle some section of the input and/or code
generated by your main parser, and then make its results available
somewhere.  The technique of "faking" tokens and the use of an object
passed as a parameter to `yyparse' to contain information and/or the state
of the program would probably be useful here.

I find that Bison and threads work together well and the extra effort 
of making `yyparse' thread-safe is minimal.
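A grammar fragment can sketch how these pieces fit together.  This is an
illustration only: `struct parse_state' and its fields are hypothetical
names, and the directives assume a reasonably recent Bison (older
releases spell the first one `%pure-parser'):

```yacc
/* Hypothetical sketch: a reentrant parser whose per-parse state lives
   in a caller-supplied struct instead of in globals, so several
   threads can each run their own yyparse safely. */
%define api.pure full
%parse-param { struct parse_state *st }
%lex-param   { struct parse_state *st }

%code requires {
  struct parse_state {
      int pass;        /* which pass this parse is performing */
      int fake_token;  /* next token to "fake", or 0 for none */
  };
}
```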

Laurence

> 
> Thank you, Laurence and Evan, for your extensive explanations! I had a look at
> ANTLR and it seems to be a great tool; it will probably be my second
> option. I think I'm going to try to get the scanner to construct an AST
> (instead of emitting code), so I'll stick with Bison for a while. As you said,
> the rest of the compilation phases, such as resource binding and code emission,
> would become just direct products of traversing the AST.
> 
> Thank you again for your suggestions!
> 
> Best regards,
> Ilyes Gouta.
> 
> Evan Lavelle wrote:
> > I had the same problem on my first language; here's what I did.
> > 
> > I started off thinking that I needed a simple interpreter, so I wrote the
> > Bison code, and basically did all the required work in the Bison actions.
> > This worked well; it looks like you're at this stage.
> > 
> > A little later, I realised that this wasn't quite good enough. The specific
> > problem, I think, was that I needed some forward declarations - i.e.,
> > sometimes code earlier in the source file needed to know about something
> > later in the source file. Tricky. After some thought, I decided that I could
> > do this with minimal changes, just by rescanning the input again with
> > Bison. I scanned it once, remembered the bits I needed to know about, reset
> > yyin to the beginning of the file, and scanned it again properly. I now had
> > a 2-pass interpreter, which worked well.
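The rewind-and-rescan step Evan describes can be demonstrated with plain
stdio; with a Flex scanner you would also call `yyrestart(yyin)` after the
rewind so the scanner drops its buffered input.  A sketch, not Evan's
actual code (`scan_twice` is an invented name):

```c
/* Minimal demonstration of the rewind-and-rescan idea, using plain
   stdio.  With a Flex scanner you would also call yyrestart(yyin)
   after the rewind so its buffered input is dropped. */
#include <stdio.h>

/* Read every line of `f' twice: pass 1 could record forward
   declarations, pass 2 does the real work.  Returns the total number
   of lines seen across both passes. */
int scan_twice(FILE *f)
{
    char buf[128];
    int lines = 0;

    for (int pass = 1; pass <= 2; pass++) {
        rewind(f);              /* back to the start, like resetting yyin */
        /* yyrestart(f);           needed when a Flex scanner owns `f' */
        while (fgets(buf, sizeof buf, f))
            lines++;
    }
    return lines;
}
```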
> > 
> > As time went on, it became obvious that this was very limiting. I needed to
> > do all sorts of extra things; some of these were:
> > 
> > 1 - more forward declarations
> > 
> > 2 - the language allowed constant expressions, so the semantic checking code
> > had to confirm that an expression was actually constant. How do you do this
> > in one pass? The easy answer is to have an extra pass - you scan the code
> > finding everything that should be a constant, evaluate any expression you
> > find there, replace the expression with a constant (if you can), and let the
> > semantic checker confirm that there's a constant there.
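The folding pass in point 2 might look roughly like this on a toy
expression tree; the node layout and names are hypothetical, not Evan's
code:

```c
/* Sketch of the extra folding pass: walk the expression tree, replace
   constant subtrees with literal nodes, and let the semantic checker
   then simply test for NODE_CONST. */
enum kind { NODE_CONST, NODE_ADD, NODE_VAR };

struct node {
    enum kind    kind;
    long         value;        /* meaningful when kind == NODE_CONST */
    struct node *lhs, *rhs;    /* meaningful when kind == NODE_ADD */
};

/* Fold constant subexpressions in place; return 1 if n is now constant. */
int fold(struct node *n)
{
    switch (n->kind) {
    case NODE_CONST:
        return 1;
    case NODE_VAR:
        return 0;
    case NODE_ADD: {
        int l = fold(n->lhs);            /* fold both sides first */
        int r = fold(n->rhs);
        if (l && r) {
            n->value = n->lhs->value + n->rhs->value;
            n->kind  = NODE_CONST;       /* the ADD becomes a literal */
            return 1;
        }
        return 0;
    }
    }
    return 0;
}
```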
> > 
> > 3 - As semantic checking became more complex, it became obvious that I had
> > to analyse *all* the code before I had enough information to analyse *some*
> > of the code. This will depend on your language.
> > 
> > 4 - some things are next to impossible to code-generate just by scanning and
> > rescanning the input. Since you're looking at C, what about sequence points,
> > and expressions with side-effects? How do you handle the side-effects?
> > Imagine code-generating "y = x++, c(), x" - where do you put in the 'x'
> > increment?
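Reading the example as `y = (x++, c(), x)`, one possible lowering makes
each sequence point its own statement, which shows exactly where the
increment has to land.  This is an illustration, not a prescribed
code-generation scheme; `lowered`, `c`, and `calls` are invented names:

```c
/* One possible lowering of y = (x++, c(), x) into explicit steps.
   An AST lets the compiler make this ordering decision deliberately
   instead of burying it in parser actions. */
static int calls;
static int c(void) { calls++; return 0; }

/* Returns the value y ends up with when x starts at x0. */
int lowered(int x0)
{
    int x = x0, y;

    int t = x;    /* value of x++ (discarded by the comma operator) */
    x = t + 1;    /* the increment is complete before c() runs */
    (void)c();    /* middle operand, value discarded */
    y = x;        /* last operand supplies the expression's value */

    return y;
}
```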
> > 
> > 5 - optimisations? A code generator for a new target? What if you have to
> > scan something right-to-left? And so on.
> > 
> > Anyway, it quickly became obvious that handling anything that wasn't trivial
> > is next to impossible just by scanning and rescanning the input.
> > Fortunately, there's a really simple answer. The lexer and scanner do very
> > little, apart from creating an AST. The scanner is no longer the 'compiler';
> > it's just a very simple front-end that creates a data structure, the AST.
> > The compilation process is now a standard data-processing problem - it's
> > nothing more than analysing, manipulating, and transforming the AST. Each
> > analysis or transformation of the AST is, loosely speaking, a compiler
> > "pass"; you can do this as few, or as many, times as you wish. It's trivial
> > to add a new pass when you have some new feature or requirement. The AST
> > *is* your program; the scanning is basically irrelevant.
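The pass structure Evan describes can be sketched as a list of functions
run in order over the AST root; the node and pass names here are invented
for illustration:

```c
/* Sketch of the pass-per-transformation structure: each pass is just
   a function over the AST root, run in sequence, and adding a feature
   usually means adding one entry to the table. */
struct ast { int checked, folded, emitted; };  /* toy stand-in for a real AST */

static void check_pass(struct ast *a) { a->checked = 1; }  /* semantic analysis */
static void fold_pass (struct ast *a) { a->folded  = 1; }  /* constant folding */
static void emit_pass (struct ast *a) { a->emitted = 1; }  /* code generation */

typedef void (*pass_fn)(struct ast *);

void compile(struct ast *a)
{
    pass_fn passes[] = { check_pass, fold_pass, emit_pass };
    for (unsigned i = 0; i < sizeof passes / sizeof passes[0]; i++)
        passes[i](a);
}
```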
> > 
> > And you can still have an "interpreter" at the end of it, if you want one;
> > as soon as you finish "code generation", you just run your "code", or
> > whatever it is that actually does anything. Technically, however, you should
> > probably call this JIT compilation, rather than interpreting.
> > 
> > The bad news is that you need to know a *lot* more to do this, over and
> > above writing a Bison scanner. The (user) scanner code (i.e. your "actions")
> > is probably a lot less than 5% of a practical and useful compiler. So, stick
> > with scanning your input until you find out what the problems are and, if
> > it's really necessary to fix them, rewrite your code around an AST. If you
> > need to find out more about ASTs, look at Antlr; you can use the Antlr
> > library to create and maintain your AST.
> > 
> > Evan
> 



