help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Non-greedy wildcard possible? (Long)


From: Laurence Finston
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Thu, 20 May 2004 20:16:10 +0200
User-agent: IMHO/0.98.3+G (Webmail for Roxen)

-------------------
> Laurence Finston <address@hidden>:
> >
> [snip]
> > I don't like to wait, whether it's linear, logarithmic, or
> > exponential time.  Documents can easily have hundreds of thousands
> > of characters.  If I have to match strings in the input against 100
> > "suspicious" tokens this will take time.  In addition, since there
> > are no hardwired criteria for determining the beginning and end of
> > an item of input, it seems quite likely that every character in the
> > input will have to be tested against the list of "suspicious"
> > tokens, whether individually, or as the character in the nth
> > position of a string.
> 
> I don't understand your point here; if the lexer has created a state
> machine for the tokens (as in flex) you only have to look at each
> character once -- not once for each "possible" token. This is no
> different from ordinar tokenization.
> 

The kind of matching you seem to want is quite complex.  I don't know 
whether it can be formulated in Flex's regexp syntax.  In addition, 
you've stated that the list of suspicious tokens differs depending on 
circumstances.  I don't know whether you've changed your mind in the 
interim, because I haven't been following that thread of the discussion.  
At one time, you wanted the list of suspicious tokens to depend on the 
valid tokens in a given parser state.  In order to know this, you must 
have run Bison on the Bison input file, or be able to figure it out 
yourself.  A further consequence is that one regexp won't be enough.
You would probably have to use start conditions, and feed information 
back from the parser to the scanner in order to switch from one to 
another, if I have understood other remarks of yours correctly. I don't 
know how Flex performs regexp matching, but the manual states that 
backing up is sometimes required and this can degrade performance. 
Since I don't think this approach will lead to success anyway, I don't 
see any point in going any further into objections with respect to the 
details of possible implementations.

> This is, in fact, one of the reasons why my current implementation is
> slow, I think, because I'm using the simplistic implementation of
> searching for the various tokens independently.
> 
> > It might be possible to reduce the number of comparisons by sorting
> > the "suspicious" tokens into an appropriate data structure, but this
> > has associated costs, too, including the necessity of writing code
> > to traverse the data structure.
> 
> This is what is done in Flex anyway -- and it's lightning fast.
> Really.
> 
> [snip]
> > I don't think all of this work is justified, since in most cases the
> > result will be "it's not markup, output it verbatim."  
> 
> Well, if you've got a suggestion for another way of doing it, I'm
> dying to hear it :]
> 

There's a story about an animator who worked in a studio making animated 
cartoons.  He always used to look forward to solving the big drawing 
problems, so he saved them for last and got all the little problems out 
of the way first.  But when he finally got around to working on the big 
ones, he always found that he didn't have anything left to do.

You've said on various occasions that the user specifies certain things, 
such as how titles, headings, lists, etc., are formatted.  This 
specification must somehow be translated into a Bison input file.  My 
approach would be to write one such Bison input file, and a corresponding Flex
input file, or a `yylex()' function, keeping in mind that I would like to
generalize and automate the process later.  I think in this way you will get a
better idea of how Bison works.  I discarded my first Bison grammar for 3DLDF
because of what I learned by working with Bison.
I think this approach would be much more promising, seeing as how you haven't 
worked with Bison before.  I also think this task is quite challenging enough
to be getting on with.  I don't think it would be a good idea to try to write
a program for generating Bison input files without having written a Bison
input file oneself first.

I think that "a heading is a single line of text followed only by 
whitespace and two newlines" is a reasonable specification, although it may
have to be refined a bit.  I _don't_ think that "a heading might be 
that, but it might be something else, it's specified by the user" is a 
specification at all.  

I also think that "less markup" is a reasonable goal, but that "virtually 
no markup" is not.

Laurence



reply via email to

[Prev in Thread] Current Thread [Next in Thread]