help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Non-greedy wildcard possible? (Long)


From: Hans Aberg
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Wed, 19 May 2004 21:02:02 +0200

At 23:01 +0200 2004/05/18, Magnus Lie Hetland wrote:
>> What is the Atox?
>
>It's what all this is all about :)
>
>It's a tool for writing plain-text-to-XML converters. See
>http://atox.sf.net for more information.

Thanks for the pointer. Do you have any examples (I do not want to get too
deep into this stuff)? I.e., is this some kind of dynamic language
specification?

>> A dynamic language specification? Then perhaps you might make use of
>> a dynamic parser, such as an Early parser.
>
>I've used an Earley parser in an earlier (no pun intended) version.
>That was a bit slow (probably in part because of my overly redundant
>grammar at the time, and in part because it was implemented in
>Python). I haven't been able to find an Earley parser generator that
>is as easily available as Bison, and since Bison now has GLR parsing
>(which can handle the same set of grammars as Earley, IIUC), I thought
>it to be a better alternative.

I think the DINO one I gave a reference too might be worth having a look at.

>What do you think would be the advantages of an Earley parser as
>opposed to an GLR parser like Bison?

Bison, in all its forms and uses, is not suitable for processing dynamic
(extensible) language specifications. If one makes a very modest operator
precedence implementation with many levels (as in for example Prolog), then
one ends up putting th operators on a stack which later sorts out the
precedences. It is currently hard to combine Bison with small segments of
dynamic grammar specifications.

So if your ATOX is a dynamic language specification, unless you find a way
to work around it, as in the case of operator precedences, you will find
Bison hard to use.

>> It seems that you identify blocks by newlines.
>
>In this example, yes -- can't assume that in general, though. It would
>be up to the user. (There are many cases where it would not be the
>case.) This was just an example to show how (1) full parsing is
>needed, and (2) which tokens are significant (as opposed to "plain
>text") depends on the state of the full (e.g. context-free) parser.

It seems that this depends on the ATOX, which is some kind of dynamic
language specification. Then it might be difficult to handle with a typical
Flex/Bison setup.

>> Then, headers and such may require special lexer start conditions (=
>> lexer contexts), just in order to get around the LALR(1) limitation
>> of Bison.
>
>Not sure exactly what you mean here -- LALR(1) is not a problem, as
>far as I can see (especially since it is no longer a limitation, if
>one uses GLR). The issue lies in skipping the non-significant tokens.

The problem with GLR is that when an LALR(1) ambiguity appears, it merely
splits up the parser to handle all the possibilities. The one writing the
parser then has to add code to treat these ambiguities.

>... The only issue I'm trying to
>resolve is discerning significant tokens from plain text (and this
>"significance" depending on the parse state).

I am not sure how this problem how "discarding (?in-)significant tokens"
appears. Do have any example?

Second, if already have a proper dynamic language specification (the
ATOX?), how do you know which tokens to discard? -- It sounds as the wrong
way to go simply.

I have to ask you these questions, because there is no easy way to do
anything you ask for in Bison.

> I thought I would do
>this by checking for legal lookahead tokens and passing those to Flex,
>but if Bison can (as it has been said) ask for more than a single
>lookahead token before the parse state changes, this list of legal
>tokens may be invalid for those later tokens, and this won't work.

There is currently no way to let Bison split the parser based on
ambiguities in the tokens. I have asked ofr such a feature of Flex in the
Flex list. This would be useful in many cases. Then, the next step would be
to let Bison to parse all the possibilities based on that. But currently,
that is not possible.

Also, when an ambiguity appears in the input, then the GLR parser does not
resolve it by asking for more tokens, but merely splits the parser,
processing each possibility. Each possibility is still though all the same
old LALR(1). So you do not automatically climb LR(k) by this method, but
one still has to indicate how to merge the different possibilities. There
is an example in the Bison manual when this might be useful, resolving a
certain C++ ambiguity.

  Hans Aberg






reply via email to

[Prev in Thread] Current Thread [Next in Thread]