


From: Xavier DUTOIT
Subject: [Ifile-discuss] A few questions about ifile features compared to other classifiers
Date: Mon, 22 Sep 2003 17:59:54 +0200
User-agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.5b) Gecko/20030901 Thunderbird/0.2

Dear all,

I've been looking at a lot of Bayesian filtering OSS, and yours seems to be one of the few multi-purpose classifiers (along with POPFile), as opposed to "simple" antispam tools. I think ifile is the most interesting one for what I want to do (a generic mail classifier on the server), but compared to the others (mostly spam-oriented), it seems less complete. I've seen some features that look quite useful. Could you tell me what you think about them (e.g. "I don't find it useful", "too complicated to include", "working on it", "since you've asked, try to add such a feature yourself"...)?

1) Storage based on a real database (Berkeley DB, for instance) instead of your own file format?
Do you think it would improve performance?
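
To make the question concrete, here is a minimal sketch of what key-value storage of word counts could look like. It uses Python's standard `dbm` module as a stand-in for Berkeley DB. The key layout (folder, tab, word) and the helper names `add_counts` and `get_count` are assumptions for illustration only, not ifile's actual data model.

```python
import dbm


def _key(folder, word):
    # Hypothetical key layout: "<folder>\t<word>", stored as bytes.
    return f"{folder}\t{word}".encode("utf-8")


def add_counts(db_path, folder, words):
    """Increment the per-folder count of each word in a dbm database file."""
    with dbm.open(db_path, "c") as db:  # "c": create the file if it is missing
        for word in words:
            key = _key(folder, word)
            current = int(db[key]) if key in db else 0
            db[key] = str(current + 1).encode("ascii")


def get_count(db_path, folder, word):
    """Return how often a word was seen in a folder (0 if never)."""
    with dbm.open(db_path, "r") as db:
        key = _key(folder, word)
        return int(db[key]) if key in db else 0
```

With a layout like this, a lookup touches only the keys it needs instead of reading a whole file. Whether that would actually be faster for ifile is exactly the open question.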

2) Mail parsing.
Features like recognizing and decoding MIME attachments in quoted-printable and base64 encoding, ignoring HTML tags in emails, handling obfuscations like V'I'A'G'R'A (a randomly chosen example ;), scoring only the Received, Subject, To, From, and Cc headers...

Well, if I'm correct, the only thing you can do right now is either parse the header in full or ignore it. Although it is arguable where the mail parser code should live (should it be done somewhere other than in ifile?), I feel that it is important to take the specifics of mail formats into account when analysing a message's content.

Have you tried adapting the code written for bogofilter, for instance, to add such features to ifile? Do you think it's worth trying (I'm volunteering)?
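
As an illustration of the kind of preprocessing meant in point 2, here is a rough sketch using Python's standard `email` package. It is not bogofilter's or ifile's code. The function name `extract_tokens`, the crude regex-based tag stripping, and the rule for collapsing V'I'A'G'R'A-style obfuscation are all assumptions made for the example.

```python
import email
import re
from email import policy

# Only these headers contribute tokens (the list from point 2).
SCORED_HEADERS = ("Received", "Subject", "To", "From", "Cc")

# Single characters joined by punctuation, e.g. V'I'A'G'R'A (assumed rule).
OBFUSCATED = re.compile(r"\b(?:\w['.*-]){2,}\w\b")


def extract_tokens(raw_bytes):
    """Return lowercase word tokens from selected headers and text parts."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    chunks = []
    for name in SCORED_HEADERS:
        for value in msg.get_all(name, []):
            chunks.append(str(value))
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and binary attachments
        # get_content() undoes base64 / quoted-printable and the charset.
        text = part.get_content()
        if part.get_content_subtype() == "html":
            text = re.sub(r"<[^>]+>", " ", text)  # crude tag removal
        chunks.append(text)
    joined = " ".join(chunks)
    # Collapse obfuscated words by dropping the separating punctuation.
    joined = OBFUSCATED.sub(
        lambda m: re.sub(r"['.*-]", "", m.group(0)), joined
    )
    return re.findall(r"[a-z0-9]+", joined.lower())
```

The resulting token list could then be handed to the classifier in place of the raw message text. A real implementation would need a proper HTML parser and more careful handling of unknown charsets.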

3) One last thing, about sorting accuracy.
I read on one page that some of you reached 96% accurate classification. How did you calculate that? Do you all get such a high percentage?
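
For reference, the definition I would assume behind such a figure is the fraction of messages filed into the folder the user would have chosen. The sketch below shows that calculation only. It is not a claim about how the 96% was actually measured, and the function name `accuracy` is made up for the example.

```python
def accuracy(predicted, actual):
    """Fraction of messages whose predicted folder matches the real one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one message to compute accuracy")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Example: 3 of 4 messages land in the right folder.
print(accuracy(["work", "lists", "work", "family"],
               ["work", "lists", "lists", "family"]))  # 0.75
```

The number also depends on how many folders there are and on whether the test messages were kept out of the training data, which is why I'm asking how it was obtained.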



Thanks in advance,

Xavier


