help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Parse Trees and Bison


From: Tim Van Holder
Subject: Re: Parse Trees and Bison
Date: Fri, 23 Apr 2004 08:30:07 +0200
User-agent: Mozilla Thunderbird 0.5 (Windows/20040207)

address@hidden wrote:
I'm trying to understand the best way by adding C functions to my BISON
grammer file to being able to generate a syntax tree of my source code.
A simple program source file is as follows:

  Dim x as Integer = 2
  Print x

A reduced version for this example of my grammer file is:

  program : lines ;
  lines   : line lines ;
  line    : stmt term ;
  term    : ':' | '\n' ;
  stmt    : /*empty */ | dim | print ;
  dim     : tDIM ident tAS dtype tEQ expr |
            tDIM ident tAS dtype ;
  print   : tPRINT expr ;
  ident   : tNAME ;
  dtype   : tINTEGER | tREAL | tLONG ;
  expr    : ident | tSTRING | tNUMBER ;

Typically the lexer will create syntax tree nodes for (almost) every
token it returns.  This mostly leaves only the actual tree construction
to the parser.
A pseudocode example from your snippet:

program
: lines
  {
    $$ = new_syntax_node(BASIC_PROGRAM);
    append_children ($$, $1);
  }
;

lines
: line lines
  {
    if ($1 == NULL)
      $$ = $2;
    else {
      $1->next_sibling = $2; /* or set_next_sibling ($1, $2); */
      $$ = $1;
    }
  }
| /* empty - allows for empty programs */
  {
    $$ = NULL;
  }
;

line
: statement terminator
  /* assume terminator does not get its own syntax node; then
   * the default rule ($$ = $1) does what we want */
;

terminator
: ':'
| '\n' /* consider tNEWLINE instead */
;

statement
: dim_statement
| print_statement
| /* empty */
  {
    $$ = NULL;
  }
;

dim_statement
: tDIM identifier tAS datatype opt_initial_value
  {
    append_child ($1, $2);
    /* Assume we store the AS <type> as a subtree;
     * for pure compiler purposes we don't really need the
     * AS in the syntax tree though. */
    append_child ($3, $4);
    append_child ($1, $3);
    if ($5 != NULL)
      append_child ($1, $5);
    $$ = $1;
  }
;

opt_initial_value
: tEQ expression
  {
    append_child ($1, $2);
    $$ = $1;
  }
| /* empty */
  {
    $$ = NULL;
  }
;

print_statement
: tPRINT expression
  {
    append_child ($1, $2);
    $$ = $1;
  }
;

identifier
: tNAME
;

datatype
: tINTEGER
| tLONG
| tREAL
| tSTRING
;

expression
: identifier
| literal
;

literal
: tSTRING_LITERAL
| tINTEGER_LITERAL
| tDECIMAL_LITERAL
;


It's up to you whether you use the same type for all tokens
(SYNTAX_TREE_NODE*), or whether you keep separate types for
the list-type rules (like 'lines').
It also depends on the purpose of your parser whether or not
you need to place all tokens in the parse tree (a compiler
typically doesn't have to, while a source translator/pretty
printer/etc. usually does).

I know when I reach the "ident : tNAME;" rule that is when I need to create a symbol and add the symbol to the current scope's symbol table. There are no other symbols I should create right?

Then going back up the stack, where do I create the nodes for my syntax tree?

And as Alfonso said, it usually makes more sense to hold off on symbol
table creation until you have the full parse tree.
In some circumstances that is not desirable (e.g. if the token returned
by the lexer is different based on what's in the symbol table); in those
cases you should at least wait until you have a usable subtree (e.g. the
'dim' rule above.






reply via email to

[Prev in Thread] Current Thread [Next in Thread]