Re: intermediary representation and bison?
From: Laurence Finston
Subject: Re: intermediary representation and bison?
Date: Thu, 27 Dec 2007 10:37:21 +0100 (CET)
On Mon, 24 Dec 2007, Ilyes Gouta wrote:
> Thank you, Laurence and Evan, for your extensive explanations!
You're welcome.
It occurred to me that another reason why
GCC creates a syntax tree or whatever it's called (I know virtually
nothing about the theory of compilers) is that GCC has multiple front ends
for different languages, e.g., FORTRAN and Pascal. I'm not sure how C++
is handled, since it requires support for features such as code generation
for templates.
Evan brought up a problem which seems to me to be related to something I
call "the parse before the parse" problem: sometimes the information
needed to steer the course of parsing can only be obtained by parsing
that has already taken place. Multiple passes are one solution. Another
possibility _might_ be to spawn a thread and call the same or a different
parser to handle some section of the input and/or code generated by your
main parser, and then make its results available somewhere. The technique
of "faking" tokens and the use of an object passed as a parameter to
`yyparse' to contain information and/or the state of the program would
probably be useful here.
I find that Bison and threads work together well and the extra effort
of making `yyparse' thread-safe is minimal.
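In case it's useful, here is the sort of skeleton I mean (a minimal,
untested sketch; `parser_state' and its members are names I've made up
for illustration, not anything Bison defines):

    /* sketch.y */
    %{
    #include <stdio.h>

    #define YYSTYPE int            /* keep semantic values trivial */

    /* All of the parser's state lives in one object, so each thread
       can have its own. */
    struct parser_state
    {
      FILE *in;                    /* per-parser input, not a global */
      int fake_next_token;         /* nonzero: yylex returns it once */
      long token_count;            /* information gathered while parsing */
    };

    static int yylex (YYSTYPE *lvalp, struct parser_state *st);
    static void yyerror (struct parser_state *st, const char *msg);
    %}

    %pure-parser                   /* no global yylval/yychar */
    %parse-param { struct parser_state *st }
    %lex-param   { struct parser_state *st }

    %token WORD

    %%

    input : /* empty */
          | input WORD             { st->token_count++; }
          ;

    %%

    static int
    yylex (YYSTYPE *lvalp, struct parser_state *st)
    {
      *lvalp = 0;
      if (st->fake_next_token)     /* "faking" a token */
        {
          int t = st->fake_next_token;
          st->fake_next_token = 0;
          return t;
        }
      /* Stand-in lexer: every remaining character is a WORD. */
      return getc (st->in) == EOF ? 0 : WORD;
    }

    static void
    yyerror (struct parser_state *st, const char *msg)
    {
      (void) st;
      fprintf (stderr, "%s\n", msg);
    }

    int
    main (void)
    {
      struct parser_state st = { stdin, 0, 0 };
      if (yyparse (&st) == 0)      /* each thread could run its own */
        printf ("%ld tokens\n", st.token_count);
      return 0;
    }

With `%pure-parser' and the two parameter declarations, `yyparse'
keeps no global state, so separate threads can each run it on their
own `parser_state' object.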
Laurence
>
> Thank you, Laurence and Evan, for your extensive explanations! I had a look
> at ANTLR and it seems to be a great tool; it will probably be my second
> option. I think I'm going to try to get the parser to construct an AST
> (instead of emitting code) so I'll stick with Bison for a while. As you said,
> the rest of the compilation phases such as resource binding and code emission
> would become just direct products of the AST traversal.
>
> Thank you again for your suggestions!
>
> Best regards,
> Ilyes Gouta.
>
> Evan Lavelle wrote:
> > I had the same problem on my first language; here's what I did.
> >
> > I started off thinking that I needed a simple interpreter, so I wrote the
> > Bison code, and basically did all the required work in the Bison actions.
> > This worked well; it looks like you're at this stage.
> >
> > A little later, I realised that this wasn't quite good enough. The specific
> > problem, I think, was that I needed some forward declarations - i.e.
> > sometimes code earlier in the source file needed to know about something
> > later in the source file. Tricky. After some thought, I decided that I could
> > do this with minimal changes, just by rescanning the input again, with
> > Bison. I scanned it once, remembered the bits I needed to know about, reset
> > yyin to the beginning of the file, and scanned it again properly. I now had
> > a 2-pass interpreter, which worked well.
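(With a Flex scanner, that rescanning step can look roughly like the
following. This is a sketch of mine, not Evan's actual code, and
`pass' and `compile_twice' are just illustrative names for the flag
and driver the grammar actions would use:)

    #include <stdio.h>

    extern FILE *yyin;            /* Flex's current input stream */
    extern int yyparse (void);
    extern void yyrestart (FILE *);

    int pass = 1;                 /* consulted by the grammar actions */

    int
    compile_twice (void)
    {
      if (yyparse () != 0)        /* pass 1: remember the declarations */
        return 1;
      rewind (yyin);              /* back to the start of the file */
      yyrestart (yyin);           /* flush Flex's buffered input */
      pass = 2;
      return yyparse ();          /* pass 2: the "real" one */
    }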
> >
> > As time went on, it became obvious that this was very limiting. I needed to
> > do all sorts of extra things; some of these were:
> >
> > 1 - more forward declarations
> >
> > 2 - the language allowed constant expressions, so the semantic checking code
> > had to confirm that an expression was actually constant. How do you do this
> > in one pass? The easy answer is to have an extra pass - you scan the code
> > finding everything that should be a constant, evaluate any expression you
> > find there, replace the expression with a constant (if you can), and let the
> > semantic checker confirm that there's a constant there (see the sketch
> > after this list).
> >
> > 3 - As semantic checking became more complex, it became obvious that I had
> > to analyse *all* the code before I had enough information to analyse *some*
> > of the code. This will depend on your language.
> >
> > 4 - some things are next to impossible to generate code for just by scanning
> > and rescanning the input. Since you're looking at C, what about sequence
> > points, and expressions with side-effects? How do you handle the
> > side-effects? Imagine code-generating "y = x++, c(), x" - where do you put
> > in the 'x' increment?
> >
> > 5 - optimisations? A code generator for a new target? What if you have to
> > scan something right-to-left? And so on.
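(Evan's point 2, by the way, can be a surprisingly small pass once the
program is a tree. The following is a sketch with made-up node and
function names, not his code:)

    #include <stdlib.h>

    enum kind { K_CONST, K_ADD, K_MUL, K_VAR };

    struct node
    {
      enum kind kind;
      long value;                 /* meaningful when kind == K_CONST */
      struct node *lhs, *rhs;     /* meaningful for K_ADD and K_MUL */
    };

    /* Fold constant subexpressions bottom-up and return the (possibly
       replaced) node; a later semantic pass then only has to check
       that kind == K_CONST wherever a constant is required. */
    static struct node *
    fold (struct node *n)
    {
      if (n == NULL || n->kind == K_CONST || n->kind == K_VAR)
        return n;
      n->lhs = fold (n->lhs);
      n->rhs = fold (n->rhs);
      if (n->lhs->kind == K_CONST && n->rhs->kind == K_CONST)
        {
          long v = n->kind == K_ADD ? n->lhs->value + n->rhs->value
                                    : n->lhs->value * n->rhs->value;
          free (n->lhs);
          free (n->rhs);
          n->kind = K_CONST;
          n->value = v;
          n->lhs = n->rhs = NULL;
        }
      return n;
    }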
> >
> > Anyway, it quickly became obvious that handling anything that wasn't trivial
> > is next to impossible just by scanning and rescanning the input.
> > Fortunately, there's a really simple answer. The lexer and parser do very
> > little, apart from creating an AST. The parser is no longer the 'compiler';
> > it's just a very simple front-end that creates a data structure, the AST.
> > The compilation process is now a standard data-processing problem - it's
> > nothing more than analysing, manipulating, and transforming the AST. Each
> > analysis or transformation of the AST is, loosely speaking, a compiler
> > "pass"; you can do this as few, or as many, times as you wish. It's trivial
> > to add a new pass when you have some new feature or requirement. The AST
> > *is* your program; the scanning is basically irrelevant.
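(Continuing the sketch above: once you have such a `node' type, a
"pass" really is nothing more than another walk over the tree:)

    typedef void (*visit_fn) (struct node *, void *);

    /* Post-order walk: children first, then the node itself.
       Checking, transforming and emitting code are all just visit
       functions handed to this one routine. */
    static void
    walk (struct node *n, visit_fn visit, void *ctx)
    {
      if (n == NULL)
        return;
      walk (n->lhs, visit, ctx);
      walk (n->rhs, visit, ctx);
      visit (n, ctx);
    }

    /* Example pass: count the constants that survived folding. */
    static void
    count_consts (struct node *n, void *ctx)
    {
      if (n->kind == K_CONST)
        ++*(int *) ctx;
    }

A whole pass is then just `walk (root, count_consts, &n);' with
`int n = 0;' declared beforehand.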
> >
> > And you can still have an "interpreter" at the end of it, if you want one;
> > as soon as you finish "code generation", you just run your "code", or
> > whatever it is that actually does anything. Technically, however, you should
> > probably call this JIT compilation, rather than interpreting.
> >
> > The bad news is that you need to know a *lot* more to do this, over and
> > above writing a Bison scanner. The (user) scanner code (i.e. your "actions")
> > is probably a lot less than 5% of a practical and useful compiler. So, stick
> > with scanning your input until you find out what the problems are and, if
> > it's really necessary to fix them, rewrite your code around an AST. If you
> > need to find out more about ASTs, look at Antlr; you can use the Antlr
> > library to create and maintain your AST.
> >
> > Evan
>