[Ifile-discuss] Re: Saving ifile database source files


From: Clemens Fischer
Subject: [Ifile-discuss] Re: Saving ifile database source files
Date: 30 Aug 2003 14:54:18 +0200
User-agent: Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3 (berkeley-unix)

* Joe Kelsey:

> Currently, I plan to delete old database files to keep the directory
> sizes under control.

you don't have to do this:  ifile keeps only so many words in its
database.  for this it has a stoplist and throws out rarely used
words.  back when i used ifile for spam/non-spam classification, my
database never grew beyond a few hundred kilobytes and i never had to
trim it.
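
to illustrate the idea, here is a minimal sketch in python.  it is not
ifile's actual code: the stoplist contents, the `min_count` threshold
and the function names are all made up for this example.  it only
shows why word counts stay small when common words are skipped and
rare words are dropped.

```python
from collections import Counter

# hypothetical stoplist; ifile's real list is not shown in this thread
STOPLIST = {"the", "a", "and", "of", "to"}


def add_message(counts, text, stoplist=STOPLIST):
    """Count the words of one message, skipping stoplisted words."""
    for word in text.lower().split():
        if word not in stoplist:
            counts[word] += 1


def prune_rare(counts, min_count=2):
    """Drop every word seen fewer than min_count times (assumed threshold)."""
    for word in [w for w, n in counts.items() if n < min_count]:
        del counts[word]


counts = Counter()
add_message(counts, "buy cheap pills and buy now")
add_message(counts, "the meeting is at noon")
prune_rare(counts)
print(dict(counts))  # prints {'buy': 2}
```

only the pruned counts need to be stored, so the database size follows
the vocabulary that survives pruning, not the amount of mail seen.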

> Why spend so much time on the website talking about organizing huge
> quantities of mail if all you really need are the word counts?

a "good" spam-corpus is worth a lot (to me, at least), especially if
it contains the entire "diversity" of sh*t spammers come up with.  i'd
say ifile does well with tens of messages if you only have
spam/non-spam, but a few hundred of each are better.  as for trusting
the person compiling the spams, i had a look at some, and they
contained nothing but real spam.  the only thing that might matter to
you is this:  spam sent to americans differs considerably from that
sent to europeans, and you definitely need a number of asian-language
spams these days.

  clemens



