Re: [Freecats-Dev] Python; matching/indexing strategies
From: Stanislav Visnovsky
Subject: Re: [Freecats-Dev] Python; matching/indexing strategies
Date: Tue, 8 Jul 2003 08:28:25 +0200 (CEST)
On Tue, 8 Jul 2003, Tim Morley wrote:
> -- Trujillo also presents a convincing case for "Stoplists", i.e. a short
> list of words to be ignored for each language treated. (He proposes "a, and,
> an, by, from, of, or, the, that, to, with" for English.) This does
> contradict a General Guideline(TM) that was mentioned at the meeting a
> couple of weeks ago, namely that whatever we develop should be as
> language-independent as possible. However, consider the example given:
> Input: "Delete all the files in the folder."
> Match1: "Put all the cartridges in the safe."
> Match2: "Delete folder files."
> Match2 is clearly better to the human eye, but it has only three words in
> common with the Input, whereas Match1 has four. Match1 is also closer in
> overall length to the input. Without eliminating "always irrelevant" words,
> I'm not sure how we could get round this.
This is still language independent. The list will simply be empty for
some languages, and matching for English would improve quite a lot.
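A minimal sketch of how this could look in Python, only to illustrate the
point. The names STOPLISTS, content_words and shared_words are made up for
this example, the English list is the one quoted above (taking its second
entry to be "and"), and counting shared words is just the naive measure
from Tim's example, not a proposed matching algorithm.

```python
import re
from collections import Counter

# Hypothetical per-language stoplists. A language with no entry here
# simply gets an empty list, so the matching code itself stays
# language independent.
STOPLISTS = {
    "en": {"a", "and", "an", "by", "from", "of", "or",
           "the", "that", "to", "with"},
}

def content_words(sentence, lang):
    """Lower-case the sentence, split it into words, drop stoplisted ones."""
    stop = STOPLISTS.get(lang, set())
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in stop]

def shared_words(a, b, lang):
    """Count words common to both sentences (repeated words count each time)."""
    common = Counter(content_words(a, lang)) & Counter(content_words(b, lang))
    return sum(common.values())

query = "Delete all the files in the folder."
match1 = "Put all the cartridges in the safe."
match2 = "Delete folder files."

# "xx" stands for a language without a stoplist: nothing is filtered.
print(shared_words(query, match1, "xx"), shared_words(query, match2, "xx"))  # 4 3
# With the English stoplist, Match2 comes out ahead.
print(shared_words(query, match1, "en"), shared_words(query, match2, "en"))  # 2 3
```

Without a stoplist the counts are the four and three from the example;
with the English list, Match1 drops to two ("all", "in") and Match2 keeps
its three.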
Stanislav