[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Non-greedy wildcard possible? (Long)
From: |
Magnus Lie Hetland |
Subject: |
Re: Non-greedy wildcard possible? (Long) |
Date: |
Sun, 16 May 2004 10:30:43 +0200 |
User-agent: |
Mutt/1.4i |
Laurence Finston <address@hidden>:
>
[snip]
> > Sure -- I do signal it. I just mean that there is no fixed such token;
> > it all depends on context. (That doesn't mean that the grammar is/need
> > be context sensitive, though.)
>
> This seems to be a contradiction to me. In a context-free grammar a
> token, i.e., a terminal symbol, is unique and unambiguous.
Sure, that's not a problem. What it means semantically isn't fixed,
though. (Just think of binary vs. unary minus, for example.)
The tokenization isn't really my problem.
> The Bison manual describes a way of handling context dependencies,
> but it's fairly tricky.
Yes, I've seen that (or so I think -- depends on what you mean). There
isn't really any context dependencies here in that way; the wildcard
rule can (at least in the simplest version) be generated as a normal
context-free production. It's just that I have to do some extra
analysis to do it.
> From what you've said about the problem you're trying to solve, I
> have my doubts about whether Bison is the right tool for the job.
Maybe you're right. I just thought that if I could do it with an LL(1)
parser with a little extra functionality (as I've already done), it
should be certainly be possible to use an LALR parser. I guess I could
just use Flex and write my own parser engine on top of it.
> > decide what tokens can be used as markup in a given context) is
> > specified by the user, so I can't make any assumptions about it.
> > (Atox is, in many ways, a parser generator.)
>
> It sounds to me like you want Bison to generate a parser generator,
> but that's not its intended purpose.
Bison *is* a parser generator -- I'm trying to use it as a back-end.
In other words, I read an input format, do some processing, and feed
code to Flex and Bison in order to produce a parser for the user.
How this is outside the scope of Bison, I don't see.
The specifics of the parsing job may be unsuitable, though, of course.
> > Yes, that's the problem. I cannot say that globally. I need the
> > full power of the grammar to decide.
>
> I don't understand how you're using the term "grammar". To me,
> "grammar" (at least of the context-free variety) implies that there
> are syntactical elements with a fixed value.
Grammars don't concern themselves with value (of the semantical kind),
do they? The documents I'm trying to parse can certainly be described
with context-free grammars (I've tried to explain how, but I guess I
haven't been quite clear). The problem is either (1) analyzing the
input-grammar in order to produce a wildcard that doesn't contain any
relevant terminating tokens, or (2) making Bison handle an ambiguous
grammar.
[snip]
> According to my understanding of regular expression matching
> algorithms, they are by nature "greedy" in that they attempt to
> match the longest possible string.
By "nature" is perhaps a bit strong. Most are by "default", yes. In
the current Atox, I'm ussing the non-greedy closure operator in Python
quite a lot.
> I believe that a "wildcard" facility makes sense in a scanner, but
> not in a parser.
That doesn't help me much, my language (and the wildcard) isn't
regular -- it's context-free at best.
> Parsing tries to create a hierarchical structure from input; if
> elements are read that don't fit into the structure, and error
> recovery doesn't discard them or make them fit, it fails.
Sure. I don't see how this conflicts with my goal? Just take a simple
example -- let's say I want to parse a string that ends with an
exclamation point. I might get an input like this:
Foo! Bar! Baz!
I'd like to parse this into a hierarchical structure which would let
me (at the topmost level) get the "Foo! Bar! Baz" and "!" parts
separately. This can certainly be described by a context-free grammar
(although not LL(1)) but not by a regular language (if the string up
to the exclamation point is arbitrary). So in order to handle this
sort of thing, I'd need something stronger than a scanner.
> It's possible to change the way the scanner handles input at
> run-time, and to have it return different tokens depending on
> context, but it's not possible to change the meanings of tokens
> within the parser.
I don't need to.
> From what I know of markup languages, there's no need for this kind
> of hierarchical structure.
I've tried to explain why I need this. I may have rules that say, for
example, that the first single-line paragraph of the document is the
title, but only if there is no multi-line paragraph before it, for
example. (There are many, many other similar examples.)
> While the resulting documents may be structured hierarchically, the
> input can be processed sequentially.
If you have a very simple markup language, yes. I'm trying to add
markup to a plain text file with mostly *implicit* markup, which is
why I need some more complex rules. I've simpler versions, but they
haven't been satisfactory.
> Laurence
--
Magnus Lie Hetland "Wake up!" - Rage Against The Machine
http://hetland.org "Shut up!" - Linkin Park
- Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/15
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/15
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/15
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/15
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/15
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/15
- Re: Non-greedy wildcard possible? (Long),
Magnus Lie Hetland <=
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/16
- How about this, then... (Not quite non-greedy wildcard), Magnus Lie Hetland, 2004/05/16
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/16
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/16
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/17
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/17
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/17
- Re: Non-greedy wildcard possible? (Long), Frank Heckenbach, 2004/05/17
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/18
- Re: Non-greedy wildcard possible? (Long), Frank Heckenbach, 2004/05/18