
Re: Non-greedy wildcard possible? (Long)


From: Hans Aberg
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Thu, 20 May 2004 15:44:29 +0200

At 21:41 +0200 2004/05/19, Magnus Lie Hetland wrote:
>> >> What is the Atox?
>> >
>> >It's what all this is all about :)
>> >
>> >It's a tool for writing plain-text-to-XML converters. See
>> >http://atox.sf.net for more information.
>>
>> Thanks for the pointer. Do you have any examples (I do not want to get too
>> deep into this stuff)?
>
>Several examples are available in the tar-file of the distribution. (I
>haven't put them on the Web site yet.)

I have had a look at your atox.pdf file.

>> I.e., is this some kind of dynamic language specification?
>
>Not necessarily dynamic, but language specification, yes.

It is dynamic, as it is alterable at runtime.

I think there is a problem when trying to combine this type of language
description with conventional grammars, because LR parser generators take the
whole grammar into account when generating the parsing algorithm. Thus, even
though the actual parser only looks one token ahead in LALR(1) and LR(1), the
underlying theory parses a whole sentence and tries to build a parse tree from
it.

Thus, from this point of view, there is no such thing as "non-greedy" or
"greedy" parsing, unless one can tweak the grammar to simulate it, or tweak
the generated parser to do it.
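
For example, if the lexer never hands the closing delimiter to the rule for
the body text, the "non-greedy" behaviour falls out of the grammar by itself.
A minimal Bison sketch, with made-up token names (say, emphasis between
asterisks):

%token STAR WORD   /* hypothetical tokens: '*' and any other run of text */

%%

emphasis : STAR words STAR ;   /* necessarily stops at the first closing '*' */

words    : WORD
         | words WORD          /* WORD never covers '*', so nothing extra is gulped up */
         ;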

Further, if you want to make use of error recovery techniques to emulate
that, then you should probably use an LR(1) parser, because it stops
immediately in the grammar rule when an erroneous token is encountered.
LALR(1) may perform some extra reductions before detecting the error. Neither
algorithm, though, will gulp up extra tokens.
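
In Bison, that kind of resynchronization is written with the predefined
error token. A minimal sketch, with made-up rule names:

stmts : /* empty */
      | stmts stmt
      ;

stmt  : expr ';'
      | error ';'   { yyerrok; }   /* discard input up to the next ';' and resume */
      ;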

Bison was originally written by Corbett in order to investigate error recovery
techniques, but that part has since been removed. Error recovery (also in the
context of GLR) is still a research topic, so it will probably not help you.

>> I think the DINO one I gave a reference to might be worth having a look at.
>
>I did have a look, but didn't quite see the relevance. I guess maybe
>I'll have another look.
>
>> >What do you think would be the advantages of an Earley parser as
>> >opposed to a GLR parser like Bison?
>>
>> Bison, in all its forms and uses, is not suitable for processing
>> dynamic (extensible) language specifications.
>
>There are no extensible language specifications in Atox -- I just
>generate a Bison input file from the user input to Atox and run Bison
>on that automatically. (Or, at least, that's what I do in an
>unreleased prototype.) In other words, Bison is used as a "subroutine"
>in Atox, and will be run once for every input supplied by the user.

Bison or the Bison-generated parser? If you generate a Bison .y file from
the Atox specs, compile that using Bison, and then run the resulting parser,
then that amounts to dynamic grammar generation.

The point of the DINO suggestion would be to avoid that.

On the other hand, if you just run a fixed Bison generated parser, then
that would not be needed.

>> I am not sure how this problem of "discarding (in-)significant
>> tokens" appears. Do you have any example?
>
>That's basically what this discussion has been about -- several
>examples in earlier postings. For example, an asterisk may be
>significant normally (as an indicator of emphasis) but not inside a
>piece of verbatim text, for example. Then I would need Bison or Flex
>or whatever to know the difference.

There is no way to do that directly in a Bison grammar. So one has to tweak
the lexer to emit special tokens, which can then be handled by special
grammar rules in Bison.
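
With Flex, the usual way to do that is a start condition, so that the same
character is returned as a different token depending on whether the lexer is
inside verbatim text. A rough sketch, with made-up delimiters and token names
(the token values would come from the generated Bison header):

%x VERB

%%

"``"          { BEGIN(VERB);    return VERB_START; }
<VERB>"''"    { BEGIN(INITIAL); return VERB_END; }
<VERB>.|\n    { return VERB_TEXT; }    /* '*' is plain text in here */

"*"           { return EMPH_MARK; }    /* significant outside verbatim */
.|\n          { return TEXT; }         /* one character at a time, for brevity */

%%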

>> Second, if you already have a proper dynamic language specification (the
>> Atox?), how do you know which tokens to discard?
>
>It's all very simple. I have an LL(1) parser (more or less) that
>supplies the legal lookahead tokens to the scanner, and the scanner
>returns the first occurrence of one of those. Any text up to that
>point is treated as plain text. It works like a charm, but my current
>implementation is a bit slow. That's basically why I started looking
>into using Flex for the regexp matching, and, while I was at it, at
>Bison for the parsing.

Why is your LL(1) parser slow? Is it because it is written in Python, or
because it is non-deterministic? And what problems appear when you translate
it into an LALR(1) grammar? Have you tried using ANTLR <http://antlr.org>,
which generates LL(k) parsers?
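
(For reference, the scanning scheme you describe, returning the first
occurrence of any currently legal token, amounts to something like the C
sketch below, with made-up names and literal token strings rather than the
regexps you actually match; a linear search like this on every request is one
obvious place where a straightforward implementation can get slow.)

#include <stddef.h>
#include <string.h>

/* Return the index of the legal token whose first occurrence in the
   text comes earliest, and its position via *pos; -1 if none occurs. */
int first_legal_token(const char *text, const char **tokens, int ntokens,
                      size_t *pos)
{
    int best = -1;
    size_t best_pos = 0;

    for (int i = 0; i < ntokens; i++) {
        const char *p = strstr(text, tokens[i]);
        if (p != NULL) {
            size_t offset = (size_t)(p - text);
            if (best < 0 || offset < best_pos) {
                best = i;
                best_pos = offset;
            }
        }
    }
    if (best >= 0 && pos != NULL)
        *pos = best_pos;
    return best;
}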

  Hans Aberg





