help-bison

Re: Flex or bison?


From: Hans Aberg
Subject: Re: Flex or bison?
Date: Sat, 18 Sep 2010 10:34:08 +0200

On 18 Sep 2010, at 09:15, Hans Lodder wrote:

I am currently performing Search Engine Optimization (SEO) on the HTML pages of my web site (on Win XP Home SP3). For that it is important to know which 3 words are used most frequently on a page, so I wrote a cross-referencer (in C) to find them. The 2nd step is to find the 3 most frequently used word groups consisting of 2 words. The results of both should be combined.

Now I have several possibilities. It is easy to do this in C as well. Alternatives are using flex, or the combination of flex and bison.

To have Flex identify a word is easy:

[-0-9A-Za-z]+

So is the identification of 2 words:

[-0-9A-Za-z]+(" "|\t)[-0-9A-Za-z]+

The easiest way to implement this is to write 2 programs and combine the results manually.

Now my question is: Can both be combined in 1 Flex program, or in a Flex and Bison program? Flex will try to satisfy the longest match, so it will not find the single word. Does this imply that I should introduce some functionality like a 'Moving Average Filter'? Are there better solutions?

In a common Flex/Bison setup, the lexer finds the identifiers, which are handed over to the parser, though one may do it otherwise should the need arise. Translated to your case, the Flex-generated lexer would find the words and hand them over to the Bison-generated parser.

But you might want to be able to identify word sequences with overlap: in a sequence of words w_1, w_2, w_3, ..., finding both w_1 w_2 and w_2 w_3. The parser/lexer combination consumes the input without backtracking, so you need to do this in the code of the actions, for example by remembering the previous word.
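As a rough illustration of handling the overlap in action code, here is a minimal sketch in plain C rather than in Flex or Bison. It reads one word at a time, keeps the previous word, and counts both the word and the pair (previous, current), so w_1 w_2 and w_2 w_3 are both found. All names here (Table, table_add, table_count, count_text) and the fixed limits are hypothetical and chosen only for this example. It also assumes that words are lowercased before counting and that any run of non-word characters separates two words, which is looser than the single space or tab in the question.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64       /* assumed maximum word length, including the NUL */
#define MAX_ENTRIES 1024  /* assumed table capacity */

typedef struct {
    char key[2 * MAX_WORD + 2];  /* "word" or "word1 word2" */
    int count;
} Entry;

typedef struct {
    Entry items[MAX_ENTRIES];
    int size;
} Table;

/* Increment the count for key, adding it if new.
   Returns the new count, or 0 if the table is full. */
int table_add(Table *t, const char *key) {
    for (int i = 0; i < t->size; i++)
        if (strcmp(t->items[i].key, key) == 0)
            return ++t->items[i].count;
    if (t->size == MAX_ENTRIES)
        return 0;
    snprintf(t->items[t->size].key, sizeof t->items[t->size].key, "%s", key);
    t->items[t->size].count = 1;
    t->size++;
    return 1;
}

/* Return the count stored for key, or 0 if it is absent. */
int table_count(const Table *t, const char *key) {
    for (int i = 0; i < t->size; i++)
        if (strcmp(t->items[i].key, key) == 0)
            return t->items[i].count;
    return 0;
}

/* Scan text once, counting single words and overlapping two-word groups.
   A word is a run of letters, digits and '-', as in the Flex pattern. */
void count_text(const char *text, Table *words, Table *pairs) {
    char prev[MAX_WORD] = "";  /* previous word; empty before the first one */
    char cur[MAX_WORD];
    size_t len = 0;

    for (const char *p = text;; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalnum(c) || c == '-') {
            if (len < MAX_WORD - 1)  /* overlong words are truncated */
                cur[len++] = (char)tolower(c);
        } else {
            if (len > 0) {           /* a word just ended */
                cur[len] = '\0';
                table_add(words, cur);
                if (prev[0] != '\0') {
                    char pair[2 * MAX_WORD + 2];
                    snprintf(pair, sizeof pair, "%s %s", prev, cur);
                    table_add(pairs, pair);
                }
                strcpy(prev, cur);   /* current word becomes the previous one */
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
}
```

Calling count_text on a page's text with two zero-initialized tables fills both in a single pass; the 3 most frequent entries of each can then be picked by scanning the tables for the highest counts. In a Flex scanner the same bookkeeping could sit in the action for the word pattern, with the previous word kept in a static buffer.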

Since the parsing is very simple, you might be better off doing it all in a high-level language, for example Haskell (using Hugs or GHC/GHCi) or perhaps Perl, which would cut development time.



