freecats-dev

[Freecats-Dev] Indexing - Yet Another (BETTER) Version


From: Henri Chorand
Subject: [Freecats-Dev] Indexing - Yet Another (BETTER) Version
Date: Tue, 28 Jan 2003 13:00:10 +0100

Hi all,

Sorry to bother you with all these different files. Once again, the ONLY
part I modified is the fuzzy search algorithm.

Apart from fixing typos in the numbering, I suddenly realized that we need
to weight N-Grams so that "long" words do not get a higher priority than
"short" words. This happened because we generated as many N-Grams as we
could from any given word, so a long word contributed many more of them.
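
To make the weighting idea concrete, here is a minimal sketch in Python. It
is not the algorithm from the attached document. The function names, the
N-Gram lengths (2 to 4) and the choice of giving every word a total weight of
1 are assumptions made purely for illustration.

```python
from collections import Counter


def ngrams(word, n_min=2, n_max=4):
    """Return every contiguous substring of `word` with length n_min..n_max."""
    return [word[i:i + n]
            for n in range(n_min, min(n_max, len(word)) + 1)
            for i in range(len(word) - n + 1)]


def weighted_ngrams(word, n_min=2, n_max=4):
    """Map each N-Gram of `word` to a weight; the weights sum to 1 per word.

    Assumption: every N-Gram occurrence gets an equal share of the word's
    total weight, so a long word (many N-Grams) cannot outweigh a short one.
    """
    grams = ngrams(word, n_min, n_max)
    if not grams:
        return {}
    share = 1.0 / len(grams)
    weights = Counter()
    for gram in grams:
        weights[gram] += share  # repeated N-Grams accumulate their shares
    return dict(weights)
```

With these settings "cat" yields 3 N-Grams and "translation" yields 27, yet
both words end up with the same total weight when match scores are summed.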

Another improvement: when a given word is very long, we now take only its
longest (and most discriminating) N-Grams, and drop the more numerous
(and quite probably less meaningful) shorter ones. This should also make
the search a little faster.
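
The selection of the longest N-Grams could be sketched as follows in Python.
Again, this is only an illustration and not the algorithm from the attached
document. The function name, the cap of 8 N-Grams per word and the rule of
keeping whole length tiers are all assumptions.

```python
def longest_ngrams(word, n_min=2, n_max=4, max_grams=8):
    """Keep N-Grams of `word` from longest to shortest, up to max_grams.

    Assumption: N-Grams are added one whole length tier at a time, and the
    longest tier is always kept even if it alone exceeds max_grams.
    """
    kept = []
    for n in range(min(n_max, len(word)), n_min - 1, -1):
        tier = [word[i:i + n] for i in range(len(word) - n + 1)]
        if kept and len(kept) + len(tier) > max_grams:
            break  # adding the shorter N-Grams would exceed the budget
        kept.extend(tier)
    return kept
```

With these settings a short word such as "cat" keeps all of its N-Grams,
while "translation" keeps only its eight 4-Grams and drops the 2-Grams and
3-Grams.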

I included a basic working algorithm which may, of course, be refined.

All this may be done differently for context search.


Thanks for your patience. I won't publish anything else today, I promise!


Regards,

Henri Chorand

Attachment: DB_indexing 0.2.1.rtf
Description: MS-Word document

