grammatica-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Grammatica-users] Easier lexer generation.


From: Per Cederberg
Subject: Re: [Grammatica-users] Easier lexer generation.
Date: Tue, 28 Jun 2005 01:13:47 +0200

Well, the Grammatica lexer is essentially identical to most other
tools. It provides support for defining tokens via regexps (and
plain strings). If you want to use EBNF, you can just write those 
"tokens" as productions instead.

The general problem with grammars where tokens are specified in
EBNF is efficiency and ambiguities. That is not a problem in a
language specification document where readability is the highest
concern. When implementing a parser however, using regexp tokens
normally gives much better tokenizer performance (or is at least
easier to optimize). So it is normally worth the trouble to
translate the appropriate EBNF definitions into tokens.

When it comes to grammar ambiguities, there are of course few
popular parsers that handle them well. Grammatica attempts its
best by using an unlimited number of look-ahead tokens to decide
on which production to choose, making it able to handle general
LL(k) grammars. Being a recursive descent parser, it processes
the grammar top-down using the calculated look-ahead lists only
when needed.

That is normally enough to be able to parse tricky
languages such as Java or C++, at least after some minor grammar 
rewrites. For efficiency though, the number of look-ahead tokens
should be reduced to a bare minimum (eg 1). Processing the grammar
takes time and large look-ahead comparisons makes the parser
slower.

Otherwise, I think the best argument for Grammatica in
comparison to other tools are the many features that are enabled
by default, such as error recovery, nice-looking parser error
messages, debugging the grammar without writing code, etc. But
then I'm clearly partial, of course... ;-)

Cheers,

/Per

On mon, 2005-06-27 at 21:04 +0300, Matti Katila wrote:
> Hi!
> 
> I have been lurking around compiler compilers for a week and found
> grammatica from many others (sablecc, runcc, antlr, javacc, yacc to name a
> few). Grammatica seems to be well coded, LL and free.
> 
> My problem is that I would like to have more elegant lexer to easier
> grammar construction. Ok, let me give you some examples to help you
> understand what am I talking about:
> 
> Python - http://docs.python.org/ref/lexical.html
> Scheme R5RS - http://www.swiss.ai.mit.edu/~jaffer/r5rs_9.html#SEC73
> Haskell - http://www.haskell.org/onlinereport/syntax-iso.html
> Java - 
> http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#29542
> 
> What I'm looking for is that many of these lexical syntaxes use ebnf
> grammar to describe lexer. As I have mainly understood from the examples
> of grammatica it doesn't support ebnf for lexer yet. Am I right?
> 
> And another question is: how grammatica supports ambiguities in the
> grammar, e.g., to use it to parse c++ or perl source which is known to
> have ambiguity?
> 
> Yeah, that was all for now.
> 
> 
>    -Matti






reply via email to

[Prev in Thread] Current Thread [Next in Thread]