help-bison

Re: Flex or bison?


From: Hans Lodder
Subject: Re: Flex or bison?
Date: Sun, 19 Sep 2010 09:33:48 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100825 Thunderbird/3.1.3

On 18-9-2010 10:34, Hans Aberg wrote:
On 18 Sep 2010, at 09:15, Hans Lodder wrote:

I am currently doing Search Engine Optimization (SEO) of the HTML pages
of my web site (on Win XP Home SP3). To do that, it is important to know
which 3 words are used most frequently on a page, so I wrote a cross
referencer (in C) to find them. The 2nd step is to find the 3 most
frequently used word groups consisting of 2 words. The results of both
should be combined.

Now I have several possibilities. It is easy to do this in C as well.
Alternatives are using flex, or the combination of flex and bison.

To have Flex identify a word is easy:

[-0-9A-Za-z]+

So is the identification of 2 words:

[-0-9A-Za-z]+[ \t][-0-9A-Za-z]+

The easiest way to implement this is to write 2 programs and combine the
results manually.

Now my question is: can both be combined in 1 Flex program, or in 1 Flex
and Bison program? Flex will try to satisfy the longest match, so it will
not find the single word. Does this imply that I should introduce some
functionality like a 'Moving Average Filter'? Are there better solutions?
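
The longest-match concern can be illustrated with a small sketch. It is
written in Python rather than Flex, and it only imitates Flex's behaviour:
Python's `re` module picks the first alternative that matches, so the
two-word alternative is listed first to make it win whenever it can. The
names `WORD`, `TOKEN` and `scan` are hypothetical, and allowing runs of
blanks between the words (`[ \t]+`) is an assumption.

```python
import re

WORD = r"[-0-9A-Za-z]+"

# The pair alternative comes first so that it is preferred whenever it can
# match, imitating a scanner that always takes the longest match.
TOKEN = re.compile(rf"{WORD}[ \t]+{WORD}|{WORD}")


def scan(text):
    """Return the tokens a pair-preferring scanner would report."""
    return TOKEN.findall(text)


tokens = scan("cheap flights and cheap hotels")
# tokens == ['cheap flights', 'and cheap', 'hotels']
```

Each matched pair consumes both of its words, so the single words inside a
pair are never reported, and the overlapping pair "flights and" is skipped.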

In a common Flex/Bison setup, the lexer finds the identifiers, which are
handed over to the parser, though one may do it otherwise should the need
arise. Translated to your case, the Flex-generated lexer would find the
words and hand them over to the Bison-generated parser.

But you might want to be able to identify word sequences with overlap: in
a sequence of words w_1, w_2, w_3, ..., finding both w_1 w_2 and w_2 w_3.
The parser/lexer combination consumes the input without backtracking, so
you need to do this in the code of the actions.
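
One way to read "do this in the code of the actions" is to match single
words only and let the action remember the previous word. The following is
a minimal sketch of that idea in Python, not Flex code: the loop body
stands in for the action, and `previous` stands in for a variable that
would persist between action calls. The function name and the lowercasing
of words are assumptions made for illustration.

```python
import re
from collections import Counter

WORD = re.compile(r"[-0-9A-Za-z]+")


def count_words_and_pairs(text):
    """Count single words and overlapping two-word groups in one pass."""
    words = Counter()
    pairs = Counter()
    previous = None  # the word seen by the previous "action" call, if any
    for match in WORD.finditer(text):
        word = match.group().lower()  # assumption: counting ignores case
        words[word] += 1
        if previous is not None:
            pairs[(previous, word)] += 1  # w_1 w_2, then w_2 w_3, ...
        previous = word
    return words, pairs


words, pairs = count_words_and_pairs("w1 w2 w3")
# pairs now holds both ('w1', 'w2') and ('w2', 'w3')
```

Because every word is matched on its own, nothing is consumed twice and no
backtracking is needed. Note that this sketch also pairs words across
punctuation and line breaks, which may or may not be wanted.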

Since the parsing is very simple, you might be better off doing it all in
a high-level language, for example Haskell (using Hugs or GHC/GHCi) or
perhaps Perl, which would cut development time.
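
To give a feel for how short such a program can be, here is a sketch in
Python, used here as a stand-in for the languages named above. The
function name `top_terms` and its parameter `n` are hypothetical; the word
pattern is the one from the original question. It does not strip HTML
markup, which a real page would need first.

```python
import re
from collections import Counter


def top_terms(text, n=3):
    """Return the n most frequent words and the n most frequent word pairs."""
    words = re.findall(r"[-0-9A-Za-z]+", text.lower())
    top_words = Counter(words).most_common(n)
    # zip(words, words[1:]) yields every adjacent pair, overlaps included.
    top_pairs = Counter(zip(words, words[1:])).most_common(n)
    return top_words, top_pairs


top_words, top_pairs = top_terms("flex and bison and flex and bison")
```

Both result lists hold `(item, count)` tuples, so combining them is a
matter of concatenating or printing the two lists.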

Thank you! I will look into those.


