help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Non-greedy wildcard possible? (Long)


From: Magnus Lie Hetland
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Thu, 20 May 2004 18:40:01 +0200
User-agent: Mutt/1.4i

Hans Aberg <address@hidden>:
>
[snip]
> >> I.e., is this some kind of dynamic language specification?
> >
> >Not necessarily dynamic, but language specification, yes.
> 
> It is dynamic, as it is alterable at runtime.

Depends on what you mean by runtime. The language is defined by the
user format specification, which is fixed at startup time. The
language isn't a fixed part of the program, so if that makes it
dynamic, sure. No more dynamic than Bison, though :)

[snip]
> Thus, from this point of view, there is nothing such as "non-greedy" or
> "greedy" parsing, unless one can tweak the grammar to simulate it, or tweak
> the generated parser to do it.

Sure.

> Further, if you want to make use of error recovery techniques to emulate
> that, then you should probably use a LR(1) parser, because it will stop
> immediately in the grammar rule when an error token is discovered.

Sounds like a good point.

> LALR(1) may export some actions before detecting the error. Both
> algorithms will though neither gulp up extra tokens.

OK. Good to know.

[snip]
> >There are no extensible language specifications in Atox -- I just
> >generate a Bison input file from the user input to Atox and run Bison
> >on that automatically. (Or, at least, that's what I do in an
> >unreleased prototype.) In other words, Bison is used as a "subroutine"
> >in Atox, and will be run once for every input supplied by the user.
> 
> Bison or the Bison generated parser?

Both.

> If you generate a Biosn .y file from the Atox specs, and compile
> that using Bison, and then run the parser, then that amounts to a
> dynamic grammar generation.

OK -- it's done in a batch-oriented fashion, though, and the grammar
isn't modified during the parsing (which means that the language can't
change itself, for example).

> The point of the DINO suggestion would be to avoid that.

OK. I'll have another look at it.

> On the other hand, if you just run a fixed Bison generated parser,
> then that would not be needed.

In a way, there are (or would be -- I'm not currently using Bison) two
phases: (1) generate a parser from the supplied format, (2) use the
parser to mark up the plain text. The parser can be cached for later
uses, and so need not be dynamic, really.

[snip]
> Why is your LL(1) parser slow?

Because my implementation isn't really performance-oriented, I guess
(and, partly, because it's implemented in Python). And because I
ended up with lots of separate regexps to search for, among other
things. So my original plan for optimization was to use Flex for the
tokenization, and to build my LL(1) parser on top of that. Still a
viable option, I think.

> Is it because it is written in Python, or because it is
> non-deterministic?

It's completely deterministic, actually.

> And what problems appear when you translate it into a LALR(1)
> grammar?

The problem is that my parser is really not straight LL(1), but
"LL(1)-like" in that it always uses the first occurrence of a legal
lookahed token in the text. This is quite easy when I have full
control over parser and lexer (and especially easy with the LL(1)
parsing strategy). I suppose using LR(1) might work as well if I can
make it work in this somewhat non-standard way. It seems that making
Flex/Bison do this isn't altogether trivial (as has been demonstrated
repeatedly in these discussions).

> Have you tried using ANTLR <http://antlr.org>, which is LL(1)?

I've used it before, and I've considered it in this context too.
Haven't rule it out -- it may be that it would make things easier. If
I do use LL(1), though, I'll probably need backtracking to deal with
some features (and a full general parser is more user friendly, after
all [1]). Not sure how to do backtracking in Antlr, except using
syntactic predicates (which I guess might cover the need, if I can
figure out how to generate them automatically in the right places, and
make them behave properly with the context-dependent lexing).

[1] http://www.acm.org/crossroads/xrds7-5/bison.html

-- 
Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
http://hetland.org              "Shut up!"  - Linkin Park




reply via email to

[Prev in Thread] Current Thread [Next in Thread]