help-bison

Re: Non-greedy wildcard possible? (Long)


From: Magnus Lie Hetland
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Wed, 19 May 2004 21:41:34 +0200

Hans Aberg <address@hidden>:
>
> At 23:01 +0200 2004/05/18, Magnus Lie Hetland wrote:
> >> What is the Atox?
> >
> >It's what all this is about :)
> >
> >It's a tool for writing plain-text-to-XML converters. See
> >http://atox.sf.net for more information.
> 
> Thanks for the pointer. Do you have any examples (I do not want to get too
> deep into this stuff)?

Several examples are available in the tar-file of the distribution. (I
haven't put them on the Web site yet.)

> I.e., is this some kind of dynamic language specification?

Not necessarily dynamic, but language specification, yes.

[snip]
> 
> I think the DINO one I gave a reference to might be worth having a look at.

I did have a look, but didn't quite see the relevance. I guess maybe
I'll have another look.

> >What do you think would be the advantages of an Earley parser as
> >opposed to a GLR parser like Bison?
> 
> Bison, in all its forms and uses, is not suitable for processing
> dynamic (extensible) language specifications.

There are no extensible language specifications in Atox -- I just
generate a Bison input file from the user input to Atox and run Bison
on that automatically. (Or, at least, that's what I do in an
unreleased prototype.) In other words, Bison is used as a "subroutine"
in Atox, and will be run once for every input supplied by the user.
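
To illustrate the "generate a Bison input file" step, here is a
minimal sketch in Python. It is not the Atox prototype's code: the
function name render_bison_grammar, the token names STAR and TEXT, and
the toy grammar are all made up for the example. It only builds the
grammar text; writing it to a file and invoking bison on it (for
instance through subprocess.run) is left out so that the sketch runs
without Bison installed.

```python
def render_bison_grammar(tokens, rules, start):
    """Render a minimal Bison input file as a string.

    tokens: list of terminal names, e.g. ["STAR", "TEXT"]
    rules:  dict mapping a nonterminal to a list of alternatives; each
            alternative is a list of symbol names, and an empty list
            stands for the empty production
    start:  name of the start symbol
    """
    lines = [f"%token {name}" for name in tokens]
    lines.append(f"%start {start}")
    lines.append("%%")
    for lhs, alternatives in rules.items():
        # An empty alternative is written as a comment, which Bison accepts.
        bodies = [" ".join(alt) if alt else "/* empty */" for alt in alternatives]
        lines.append(f"{lhs}:")
        lines.append("    " + "\n  | ".join(bodies))
        lines.append("  ;")
    lines.append("%%")
    return "\n".join(lines) + "\n"


# Hypothetical toy grammar: a document is a sequence of plain text
# and *emphasized* text.
grammar = render_bison_grammar(
    tokens=["STAR", "TEXT"],
    rules={
        "document": [[], ["document", "item"]],
        "item": [["TEXT"], ["STAR", "TEXT", "STAR"]],
    },
    start="document",
)
print(grammar)
```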

So no dynamism or magic there -- just a plain parser generator with
the added feature of skipping non-significant parts of the input (and
treating them as plain text).

[snip]
> I am not sure how this problem of "discarding (?in-)significant
> tokens" arises. Do you have any example?

That's basically what this discussion has been about -- several
examples in earlier postings. For example, an asterisk may be
significant normally (as an indicator of emphasis) but not inside a
piece of verbatim text. Then I would need Bison or Flex or whatever
to know the difference.

> Second, if you already have a proper dynamic language specification (the
> ATOX?), how do you know which tokens to discard?

It's all very simple. I have an LL(1) parser (more or less) that
supplies the legal lookahead tokens to the scanner, and the scanner
returns the first occurrence of one of those. Any text up to that
point is treated as plain text. It works like a charm, but my current
implementation is a bit slow. That's basically why I started looking
into using Flex for the regexp matching, and, while I was at it, at
Bison for the parsing.
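
A rough sketch of that scanning scheme in Python, under some loud
assumptions: the names next_token, scan, PATTERNS, STAR and TICK are
invented for the example, a backquote is assumed to delimit verbatim
text, and the "parser" is reduced to a single flag that decides which
tokens are currently legal. It is naive (every legal pattern is
searched on every call) and is not the actual Atox implementation.

```python
import re


def next_token(text, pos, patterns, legal):
    """Find the earliest non-empty match of any legal token at or after pos.

    patterns: dict mapping token name -> regular expression string
    legal:    list of token names the parser can accept right now
    Returns (plain_text, token_name, token_value, new_pos). token_name
    is None when no legal token occurs in the rest of the input. If two
    tokens match at the same position, the one listed first in legal wins.
    """
    best = None
    for name in legal:
        m = re.compile(patterns[name]).search(text, pos)
        if m and m.end() > m.start():
            if best is None or m.start() < best[1].start():
                best = (name, m)
    if best is None:
        return text[pos:], None, None, len(text)
    name, m = best
    # Everything skipped before the match is treated as plain text.
    return text[pos:m.start()], name, m.group(), m.end()


PATTERNS = {"STAR": r"\*", "TICK": r"`"}


def scan(text):
    """Toy driver: inside verbatim text only the closing TICK is legal."""
    pos, in_verbatim, events = 0, False, []
    while pos < len(text):
        legal = ["TICK"] if in_verbatim else ["STAR", "TICK"]
        plain, name, value, pos = next_token(text, pos, PATTERNS, legal)
        if plain:
            events.append(("TEXT", plain))
        if name is not None:
            events.append((name, value))
            if name == "TICK":
                in_verbatim = not in_verbatim
    return events


events = scan("a *b* `c*d`")
print(events)
```

In this toy run the asterisks around "b" come back as STAR tokens,
while the asterisk between the backquotes stays inside a TEXT chunk,
because STAR is not in the legal set at that point.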

[snip]

-- 
Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
http://hetland.org              "Shut up!"  - Linkin Park



