help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Non-greedy wildcard possible? (Long)


From: Laurence Finston
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Sat, 15 May 2004 22:36:55 +0200
User-agent: IMHO/0.98.3+G (Webmail for Roxen)

I think it should be possible to solve your problem in a way similar to that
described in the Bison manual for handling comments.  Since the plain text
presumably simply needs to be output verbatim there's no reason for the parser
to parse it.  It can be processed in a rule for the "plain text begin" token,
in which you would call `yylex()' repeatedly, or you could have `yylex()'
process it, so that the parser never sees the plain text.

Laurence

-------------------
> Hi!
> 
> I'm working on a project (atox.sf.net) where I want to use Bison to
> add markup to plain text. I need a rule to represent the plain text
> segments (between the markup tokens) and that's proving a challenge.
> 
> I have already solved the lexing part, so the entire text is segmented
> into tokens, with a special "between"-token for any text that isn't a
> markup token. The problem is that in some context, something like a
> newline might be a significant part of the markup, while in other
> contexts it may not be.
> 
> Basically, what I need is some form of non-greedy wildcard rule --
> something that will match any token up to the first occurrence of a
> legal rule.
> 
> The simple (but possibly messy, implementation-wise) solution would be
> to analyse the grammar and create specific wildcard rules for each
> plain text segment, allowing only the tokens that can't start any
> legal rules. This would be sort of an LL(1) way if doing it, and it
> would require some possibly quite hairy grammar analysis
> (especially with empty rules). One possibility (if I could figure out
> how to do it) would be to use the LR table in Bison to check which
> tokens could start a rule in any given context.
> 
> *However* -- what I'm *hoping* to be able to do is this: Use a GLR
> parser, define a generic wildcard token that will match any token,
> somehow give it a low priority (it doesn't seem like either normal
> precedence or %prec works quite the way I need it to) so that the
> parser will know when to terminate the wildcard.
> 
> Let's say, for example, that asterisks are used to indicate emphasis
> Let's say the following is the input:
> 
>   Fee fie *foe* fum.
> 
> The grammar would match a series of chunks (either code or emphasis)
> interspersed with wildcards. When it hits the first asterisk, it
> wouldn't know what to do (it could just be a lone asterisk -- a second
> one would be needed to make an emphasis chunk). So the GLR comes into
> play. One parser keeps shifting, while one reduces the wildcard before
> shifting the asterisk.
> 
> It *seems* to me that this approach might work, but I can't get it
> quite right. I've had some success with some simple examples, but once
> I start writing bigger grammars I get lots of ambiguity, conflicts and
> even run out of parser stack (even though I have left recursion).
> 
> I can understand the ambiguity (how would the parser know which one to
> use) -- is there any way to resolve that? (Can %dprec be used here?)
> Is this approach possible at all, when I can't assume much about the
> grammar? (It will be user-supplied...)
> 
> Here is a simple example I've been toying with (after my more complex
> tests started falling apart ;) -- with the actions removed for
> readability:
> 
> %%
> 
> document : fill chunks SEMI
>          ;
> 
> chunks   : /* empty */
>          | chunks emph fill
>          ;
> 
> emph     : STAR fill STAR
>          ;
> 
> fill     : /* empty */
>          | fill token
>          ;
> 
> token    : FOO | BAR | SPACE | STAR
>          ;
> 
> %%
> 
> This is a sample input:
> 
>   foo bar foo *foo bar* foo
> 
> Why doesn't this work? Is there any way of making it work? Any other
> general ways to approach this problem? Any way to determine the
> possible lookaheads in a given situation (for the LL(1) wildcard; i.e.
> in this case I'd be looking for the star)? (I've seen some postings
> about the latter for tab completion, but it seems a bit
> complicated...)
> 
> Sorry that this posting was a bit long. If anyone has the patience to
> read through it and give me some pointers I'd be very grateful.
> 
> Thanks,
> 
> - M
> 
> -- 
> Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
> http://hetland.org              "Shut up!"  - Linkin Park
> 
> 
> _______________________________________________
> address@hidden http://mail.gnu.org/mailman/listinfo/help-bison
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]