
Re: [Freecats-Dev] Python; matching/indexing strategies


From: Stanislav Visnovsky
Subject: Re: [Freecats-Dev] Python; matching/indexing strategies
Date: Tue, 8 Jul 2003 08:28:25 +0200 (CEST)

On Tue, 8 Jul 2003, Tim Morley wrote:

>  -- Trujillo also presents a convincing case for "Stoplists", i.e. a short
> list of words to be ignored for each language treated. (He proposes "a, and,
> an, by, from, of, or, the, that, to, with" for English.) This does
> contradict a General Guideline(TM) that was mentioned at the meeting a
> couple of weeks ago, namely that whatever we develop should be as
> language-independent as possible. However, consider the example given:
>   Input: "Delete all the files in the folder."
>   Match1: "Put all the cartridges in the safe."
>   Match2: "Delete folder files."
> Match2 is clearly better to the human eye, but it has only three words in
> common with the Input, whereas Match1 has four. Match1 is also closer in
> overall length to the input. Without eliminating "always irrelevant" words,
> I'm not sure how we could get round this.

This is still language-independent: the stoplist mechanism is the same for
every language, and the list will simply be empty for some of them. Matching
for English would improve quite a lot.
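
A rough sketch of the idea in Python, not a proposal for the actual matcher:
the names STOPLISTS, content_words and overlap are made up for illustration,
the scoring is a plain count of shared words, and "and" is assumed to be the
intended second entry of the quoted English list. A language with no entry
simply gets an empty stoplist.

```python
import re
from collections import Counter

# Hypothetical per-language stoplists. The English entries follow the list
# quoted above ("and" is an assumed reading of its second entry).
STOPLISTS = {
    "en": {"a", "and", "an", "by", "from", "of", "or", "the", "that", "to", "with"},
}


def content_words(sentence, lang):
    """Lowercase the sentence, split it into words, drop stoplisted words."""
    stop = STOPLISTS.get(lang, frozenset())  # empty for unlisted languages
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in stop]


def overlap(a, b, lang):
    """Count the words shared by a and b (as multisets) after stoplisting."""
    shared = Counter(content_words(a, lang)) & Counter(content_words(b, lang))
    return sum(shared.values())


source = "Delete all the files in the folder."
match1 = "Put all the cartridges in the safe."
match2 = "Delete folder files."

for lang in ("xx", "en"):  # "xx" has no stoplist, so nothing is filtered
    print(lang, overlap(source, match1, lang), overlap(source, match2, lang))
```

With no stoplist ("xx") Match1 scores higher than Match2, as in Tim's count;
with the English list Match2 comes out ahead, because "the" no longer counts.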

Stanislav





