help-bison

Re: Non-greedy wildcard possible? (Long)


From: Magnus Lie Hetland
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Sat, 15 May 2004 23:02:51 +0200
User-agent: Mutt/1.4i

Laurence Finston <address@hidden>:
>
> I think it should be possible to solve your problem in a way similar
> to that described in the Bison manual for handling comments.  Since
> the plain text presumably simply needs to be output verbatim there's
> no reason for the parser to parse it. It can be processed in a rule
> for the "plain text begin" token,

Sadly, there is no such token.

> in which you would call `yylex()' repeatedly, or you could have
> `yylex()' process it, so that the parser never sees the plain text.

I don't see how this helps... The problem lies in knowing where the
plain text ends, and for that I need to enlist Bison. Handling the
text isn't a problem -- detecting the next non-text element is.

I see two main possibilities:

  1. For each position where plain text may occur, find out which
     "non-plain-text" tokens may follow it (and thus terminate the
     plain text), and then use those to construct an unambiguous
     grammar (where the plain text will be LL(1)).

  2. Use a plain text rule that will match anything, but let any legal
     rules override it (and thus terminate it).

The first solution (which is the less satisfying of the two) would
require working out which tokens can follow the plain text at each
position (as I said, this has to be done automatically, as the grammar
is user-supplied). I can either compute this myself (hacky/messy) or
try to get it from the LR table constructed by Bison. I don't see a
straightforward way of doing this, but it should be possible.
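
To make the "do this myself" variant concrete, here is a rough Python
sketch that derives, for each rule containing plain text, the set of
tokens that can terminate it, using a textbook FIRST/FOLLOW
computation. The toy grammar, the TEXT placeholder and all function
names are made up for illustration; nothing here reads Bison's LR
table.

```python
TEXT, END = "TEXT", "$"  # TEXT: plain-text wildcard, END: end of input

# Hypothetical user grammar: nonterminal -> alternatives (tuples of symbols).
# Anything that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "doc":    [("item", "doc"), ()],
    "item":   [(TEXT,), ("emph",), ("code",)],
    "emph":   [("*", "einner", "*")],
    "einner": [("eitem", "einner"), ()],
    "eitem":  [(TEXT,), ("code",)],      # no emphasis inside emphasis
    "code":   [("'", TEXT, "'")],        # no markup at all inside code
}

def first_of(seq, first, nullable):
    """Return (FIRST set of a symbol sequence, whether it can be empty)."""
    out = set()
    for sym in seq:
        if sym not in GRAMMAR:           # terminal: sequence cannot be empty
            out.add(sym)
            return out, False
        out |= first[sym]
        if sym not in nullable:
            return out, False
    return out, True

def analyse(start="doc"):
    """Fixed-point computation of nullable, FIRST and FOLLOW."""
    nullable = set()
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, alts in GRAMMAR.items():
            for alt in alts:
                f, empty = first_of(alt, first, nullable)
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
                if empty and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                for i, sym in enumerate(alt):
                    if sym in GRAMMAR:
                        f, empty = first_of(alt[i + 1:], first, nullable)
                        new = f | (follow[lhs] if empty else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    return first, nullable, follow

def text_terminators(start="doc"):
    """Map each rule that contains TEXT to the tokens that may end that text."""
    first, nullable, follow = analyse(start)
    result = {}
    for lhs, alts in GRAMMAR.items():
        for alt in alts:
            for i, sym in enumerate(alt):
                if sym == TEXT:
                    f, empty = first_of(alt[i + 1:], first, nullable)
                    terms = f | (follow[lhs] if empty else set())
                    # Adjacent text just merges, so TEXT is not a terminator.
                    result.setdefault(lhs, set()).update(terms - {TEXT})
    return result
```

For this toy grammar, text inside "code" is terminated only by the
single quote, while text at top level or inside emphasis is also
terminated by the asterisk.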

The second solution is what I'm hoping for, but I don't know how to do
it. If it is at all possible, it seems like I'll need to combine GLR
parsing with (dynamic) priority somehow -- but I don't know how.

Just an example to show why a fixed set of markup tokens (that will
end the plain text) won't do:

  Foo *bar 'baz *fee* fie' foe*.

Let's say the single quote delimits verbatim text (code). Then the
plain text between the two single quotes will contain two asterisks
that are *not* to be interpreted as markup tokens, while the asterisks
outside should be interpreted as markup for emphasis.

This would be deducible from the grammar, because the code production
rule wouldn't allow emphasis inside it. The lexer couldn't possibly
know about this -- it would have to be handled as part of the parsing.
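
As an illustration of the behaviour wanted here (not of how to get it
out of Bison), a small hand-written Python sketch in which the set of
characters that end a plain-text run depends on the rule currently
being parsed. The rule names and the tuple-based tree are assumptions
made up for this example.

```python
def parse(src):
    """Parse text with *emphasis* and 'code' spans into a nested list."""
    pos = 0

    def text(stops):
        # Consume plain text until one of the context-specific stop characters.
        nonlocal pos
        start = pos
        while pos < len(src) and src[pos] not in stops:
            pos += 1
        return ("text", src[start:pos])

    def code():
        # Inside code only the closing quote ends the text; '*' is literal.
        nonlocal pos
        pos += 1                          # skip opening quote
        body = text("'")[1]
        if pos >= len(src):
            raise SyntaxError("unterminated code span")
        pos += 1                          # skip closing quote
        return ("code", body)

    def emph():
        # Inside emphasis, code is allowed but nested emphasis is not.
        nonlocal pos
        pos += 1                          # skip opening asterisk
        children = []
        while pos < len(src) and src[pos] != "*":
            children.append(code() if src[pos] == "'" else text("*'"))
        if pos >= len(src):
            raise SyntaxError("unterminated emphasis")
        pos += 1                          # skip closing asterisk
        return ("emph", children)

    doc = []
    while pos < len(src):
        ch = src[pos]
        if ch == "*":
            doc.append(emph())
        elif ch == "'":
            doc.append(code())
        else:
            doc.append(text("*'"))
    return doc

tree = parse("Foo *bar 'baz *fee* fie' foe*.")
```

Here the asterisks around "fee" end up inside the code node as literal
characters, because the code rule never looks for them. A lexer
working from a fixed set of markup tokens would lack that context.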

Thanks for your help, anyway :)

> Laurence

-- 
Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
http://hetland.org              "Shut up!"  - Linkin Park



