[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Non-greedy wildcard possible? (Long)
From: |
Laurence Finston |
Subject: |
Re: Non-greedy wildcard possible? (Long) |
Date: |
Thu, 20 May 2004 20:16:10 +0200 |
User-agent: |
IMHO/0.98.3+G (Webmail for Roxen) |
-------------------
> Laurence Finston <address@hidden>:
> >
> [snip]
> > I don't like to wait, whether it's linear, logarithmic, or
> > exponential time. Documents can easily have hundreds of thousands
> > of characters. If I have to match strings in the input against 100
> > "suspicious" tokens this will take time. In addition, since there
> > are no hardwired criteria for determining the beginning and end of
> > an item of input, it seems quite likely that every character in the
> > input will have to be tested against the list of "suspicious"
> > tokens, whether individually, or as the character in the nth
> > position of a string.
>
> I don't understand your point here; if the lexer has created a state
> machine for the tokens (as in flex) you only have to look at each
> character once -- not once for each "possible" token. This is no
> different from ordinar tokenization.
>
The kind of matching you seem to want is quite complex. I don't know
whether it can be formulated in Flex's regexp syntax. In addition,
you've stated that the list of suspicious tokens differs depending on
circumstances. I don't know whether you've changed your mind in the
interim, because I haven't been following that thread of the discussion.
At one time, you wanted the list of suspicious tokens to depend on the
valid tokens in a given parser state. In order to know this, you must
have run Bison on the Bison input file, or be able to figure it out
yourself. A further consequence is that one regexp won't be enough.
You would probably have to use start conditions, and feed information
back from the parser to the scanner in order to switch from one to
another, if I have understood other remarks of yours correctly. I don't
know how Flex performs regexp matching, but the manual states that
backing up is sometimes required and this can degrade performance.
Since I don't think this approach will lead to success anyway, I don't
see any point in going any further into objections with respect to the
details of possible implementations.
> This is, in fact, one of the reasons why my current implementation is
> slow, I think, because I'm using the simplistic implementation of
> searching for the various tokens independently.
>
> > It might be possible to reduce the number of comparisons by sorting
> > the "suspicious" tokens into an appropriate data structure, but this
> > has associated costs, too, including the necessity of writing code
> > to traverse the data structure.
>
> This is what is done in Flex anyway -- and it's lightning fast.
> Really.
>
> [snip]
> > I don't think all of this work is justified, since in most cases the
> > result will be "it's not markup, output it verbatim."
>
> Well, if you've got a suggestion for another way of doing it, I'm
> dying to hear it :]
>
There's a story about an animator who worked in a studio making animated
cartoons. He always used to look forward to solving the big drawing
problems, so he saved them for last and got all the little problems out
of the way first. But when he finally got around to working on the big
ones, he always found that he didn't have anything left to do.
You've said on various occasions that the user specifies certain things,
such as how titles, headings, lists, etc., are formatted. This
specification must somehow be translated into a Bison input file. My
approach would be to write one such Bison input file, and a corresponding Flex
input file, or a `yylex()' function, keeping in mind that I would like to
generalize and automate the process later. I think in this way you will get a
better idea of how Bison works. I discarded my first Bison grammar for 3DLDF
because of what I learned by working with Bison.
I think this approach would be much more promising, seeing as how you haven't
worked with Bison before. I also think this task is quite challenging enough
to be getting on with. I don't think it would be a good idea to try to write
a program for generating Bison input files without having written a Bison
input file oneself first.
I think that "a heading is a single line of text followed only by
whitespace and two newlines" is a reasonable specification, although it may
have to be refined a bit. I _don't_ think that "a heading might be
that, but it might be something else, it's specified by the user" is a
specification at all.
I also think that "less markup" is a reasonable goal, but that "virtually
no markup" is not.
Laurence
- Re: Non-greedy wildcard possible? (Long), (continued)
Re: Non-greedy wildcard possible? (Long), Frank Heckenbach, 2004/05/18
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/18
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/19
- Re: Non-greedy wildcard possible? (Long), Frank Heckenbach, 2004/05/19
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/19
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/20
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/20
- Re: Non-greedy wildcard possible? (Long),
Laurence Finston <=
- Message not available
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/20
- Re: Non-greedy wildcard possible? (Long), Laurence Finston, 2004/05/20
- Message not available
- Re: Non-greedy wildcard possible? (Long), Magnus Lie Hetland, 2004/05/20
Re: Non-greedy wildcard possible? (Long), Frank Heckenbach, 2004/05/27
Re: Non-greedy wildcard possible? (Long), Hans Aberg, 2004/05/20
RE: Non-greedy wildcard possible? (Long), Vincent Zweije, 2004/05/28