bsf-devel

Re: TODO


From: Aldrin Martoq
Subject: Re: TODO
Date: Fri, 4 Jul 2003 03:28:45 -0400
User-agent: Mutt/1.3.28i

On Thu, Jul 03, 2003 at 04:57:29PM -0400, Ma?ungo wrote:
> I can't access CVS in the Savannah page, so here we go...

First, check:
http://savannah.nongnu.org/cvs/?group=bsf
http://savannah.nongnu.org/cgi-bin/viewcvs/bsf/

Next, download:
cvs -d:pserver:address@hidden:/cvsroot/bsf login
cvs -d:pserver:address@hidden:/cvsroot/bsf co testsuite


> INCOMPLETE TODO/WISHLIST/IDEAS/FOR DISCUSSION:
[..]

I'm not sure how much "improved" my version is. As stated somewhere
in this list, it's more a _testsuite_ than a full, useful, lovely,
nice program.


Instead of this TODO, I would like to throw it all away and start with a
new framework, one that would let us change specific parts of the
program AND use the program as a testsuite, as a daemon,
or as a MUA plugin. Say:

- Fetch: get a message. Different ways:
        - server (sendmail, exim, whatever plugin).
        - unix MUA (pine, mutt, you name it).
        - gnome/kde/other MUAs (evolution, ...).
        - Outlook Express or whatever Windows people are using.

        Output is a message in mbox format.

- Processing: reads an mbox-formatted message, un-{mime,pgpencrypt,code,html}.
Extracts tokens, keywords or any other useful data.
        Output is a set of "streams":
        - metadata (From header, signature..., ?)
        - message body
        - other metadata like message length, mime type, language, ...

- Core: reads the streams, calculates probabilities and uses the metadata. Output is:
        - message body guilty percent (e.g. 80%)
        - accepted metadata rules (email whitelist, guilty percent,
        accepted languages, ...)

- Output: does something with the results of the core and returns a code or
something to the caller. Some examples:
        - Drop the message
        - Add X-Spam-* headers
        - Move to spam folder
        - exit (0) or (1) or <0 in case of error :-)

- Database manager: handles all tokens (update/remove tokens). Keeps
whitelists. Keeps statistics of filter performance. May suggest some
tunings during and after training. Examples:
        - "It seems your guilty limit is so high (low) that we are
        getting false negatives (positives). Would you like to set it to 50%?"
        - "Would you like to add these emails to the whitelist?"
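Every fetcher in the Fetch stage has to end up with "a message in mbox format", so that contract can be pinned down in a few lines. This is a minimal Python sketch, assuming the caller (MTA hook, MUA filter, ...) hands over one bare RFC 822 message as text; the name `to_mbox`, the default envelope sender and the choice of the "mboxrd" quoting variant are my assumptions, not anything already in the testsuite.

```python
import re
import time

# mboxrd quoting: any body line matching ">*From " gets one more ">" so it
# cannot be mistaken for the "From " line that separates messages.
_FROM_LINE = re.compile(r"^(>*From )", re.MULTILINE)


def to_mbox(raw, envelope_sender="MAILER-DAEMON", timestamp=0.0):
    """Wrap one bare RFC 822 message in mbox framing (hypothetical helper)."""
    # The separator line uses the asctime() date format,
    # e.g. "Thu Jan  1 00:00:00 1970".
    date = time.asctime(time.gmtime(timestamp))
    body = _FROM_LINE.sub(r">\1", raw)
    if not body.endswith("\n"):
        body += "\n"
    # A trailing blank line ends the message so another one can follow.
    return "From {} {}\n{}\n".format(envelope_sender, date, body)
```

A sendmail or procmail hook could call it as `to_mbox(sys.stdin.read(), timestamp=time.time())`; each fetcher plugin then only has to produce that one string.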
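The Processing stage could look roughly like this Python sketch using only the standard `email` and `html.parser` modules. It covers MIME decoding and HTML stripping only (PGP decryption and uudecoding are left out), and the tokenizer regex `TOKEN_RE`, the function `process` and the dictionary keys for the three streams are hypothetical choices for illustration.

```python
import email
import re
from email import policy
from html.parser import HTMLParser

# Hypothetical tokenizer rule: 3-40 chars of letters, digits, $ ' ! or -.
TOKEN_RE = re.compile(r"[a-z0-9$'!-]{3,40}")


class _TextOnly(HTMLParser):
    """Keeps the text of an HTML part and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def process(raw):
    """Split one message into metadata, body-token and other-metadata streams."""
    msg = email.message_from_string(raw, policy=policy.default)
    texts, types = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only; their leaf parts are visited too
        ctype = part.get_content_type()
        types.append(ctype)
        if ctype == "text/plain":
            # get_content() undoes base64/quoted-printable and the charset
            texts.append(part.get_content())
        elif ctype == "text/html":
            parser = _TextOnly()
            parser.feed(part.get_content())
            parser.close()
            texts.append(" ".join(parser.chunks))
    return {
        "metadata": {"from": str(msg.get("From", "")),
                     "subject": str(msg.get("Subject", ""))},
        "body": TOKEN_RE.findall("\n".join(texts).lower()),
        "other": {"length": len(raw), "mime_types": types},
    }
```

Keeping the three streams separate means the core can weigh body tokens and header metadata differently without re-parsing the message.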
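For the "calculate Prob." step in the Core, one well-known option (swapped in here, not necessarily what the testsuite does) is the naive-Bayes combination from Paul Graham's "A Plan for Spam": clamp token probabilities to [0.01, 0.99], give unseen tokens 0.4, keep the 15 tokens furthest from 0.5 and combine them. A Python sketch, where `combine`, `core`, the whitelist check and the 90% default limit are all assumptions:

```python
import math


def combine(probs):
    """Naive-Bayes combination of per-token spam probabilities.

    p = prod(p_i) / (prod(p_i) + prod(1 - p_i)), computed in log space so
    long token lists do not underflow to 0.0.
    """
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    # 1 / (1 + e^(log_ham - log_spam)); clamp the exponent to avoid overflow
    diff = max(-700.0, min(700.0, log_ham - log_spam))
    return 1.0 / (1.0 + math.exp(diff))


def core(body_tokens, metadata, token_prob, whitelist, limit=90.0):
    """Turn the streams into a guilty percent plus the rules that fired."""
    if metadata.get("from") in whitelist:
        return {"guilty_percent": 0.0, "spam": False, "rules": ["whitelist"]}
    # Unknown tokens get 0.4; all values are clamped to [0.01, 0.99].
    probs = [min(0.99, max(0.01, token_prob.get(t, 0.4)))
             for t in set(body_tokens)]
    # Only the 15 most "interesting" tokens (furthest from 0.5) are used.
    probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:15]
    percent = 100.0 * (combine(probs) if probs else 0.5)
    spam = percent >= limit
    return {"guilty_percent": round(percent, 1), "spam": spam,
            "rules": ["guilty_limit"] if spam else []}
```

The whitelist test here is a plain string match on the From stream; a real rule would first extract the address from the header.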
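Two of the Output actions ("Add X-Spam-* headers" and the exit code) are sketched below in Python with the standard `email` package. The header names `X-Spam-Flag` and `X-Spam-Percent`, the 90% limit and both function names are hypothetical.

```python
import email
from email import policy

SPAM_HEADERS = ("X-Spam-Flag", "X-Spam-Percent")  # hypothetical header names


def add_spam_headers(raw, guilty_percent, limit=90.0):
    """Return the message text with X-Spam-* headers describing the verdict."""
    msg = email.message_from_string(raw, policy=policy.default)
    # Drop any X-Spam-* headers the sender may have forged before adding ours;
    # deleting a header that is not present is not an error.
    for name in SPAM_HEADERS:
        del msg[name]
    msg["X-Spam-Flag"] = "YES" if guilty_percent >= limit else "NO"
    msg["X-Spam-Percent"] = "{:.1f}".format(guilty_percent)
    return msg.as_string()


def exit_code(guilty_percent, limit=90.0, error=False):
    """0 = ham, 1 = spam, -1 = error (the convention proposed in the mail)."""
    if error:
        return -1
    return 1 if guilty_percent >= limit else 0
```

One caveat on the "<0 in case of error" idea: on Unix the exit status a caller sees is reduced to 0-255, so `sys.exit(-1)` shows up as 255, and the caller has to test for that value rather than for a negative number.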
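The Database manager's duties (update/remove tokens, keep a whitelist and performance statistics, suggest tunings) fit in a small class. This is an in-memory Python sketch only: the class `TokenDB`, the frequency-ratio token probability, the "twice as many errors of one kind" trigger and the 10-point adjustment step are assumptions picked for illustration.

```python
from collections import Counter


class TokenDB:
    """In-memory token database with whitelist and filter statistics."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()  # token -> message count
        self.nspam = self.nham = 0                   # messages trained on
        self.whitelist = set()
        self.false_pos = self.false_neg = 0          # filled from user feedback

    def train(self, tokens, is_spam):
        counts = self.spam if is_spam else self.ham
        counts.update(set(tokens))  # each token counts once per message
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def untrain(self, tokens, is_spam):
        """Undo a train() call, e.g. after the user reclassifies a message."""
        counts = self.spam if is_spam else self.ham
        counts.subtract(set(tokens))
        for token in [t for t, c in counts.items() if c <= 0]:
            del counts[token]
        if is_spam:
            self.nspam = max(0, self.nspam - 1)
        else:
            self.nham = max(0, self.nham - 1)

    def prob(self, token):
        """Spam probability of one token from its relative frequencies."""
        s = self.spam[token] / max(self.nspam, 1)
        h = self.ham[token] / max(self.nham, 1)
        if s + h == 0:
            return 0.4  # never seen: treated as slightly innocent
        return min(0.99, max(0.01, s / (s + h)))

    def suggest(self, limit):
        """Plain-language tuning hint based on the error counts, or None."""
        if self.false_neg > 2 * self.false_pos:
            return ("It seems your guilty limit ({}%) is too high: we are "
                    "getting false negatives. Would you like to set it to {}%?"
                    .format(limit, limit - 10))
        if self.false_pos > 2 * self.false_neg:
            return ("It seems your guilty limit ({}%) is too low: we are "
                    "getting false positives. Would you like to set it to {}%?"
                    .format(limit, limit + 10))
        return None
```

Counting each token once per message keeps a single repetitive spam from dominating the counts; a persistent version would swap the `Counter`s for an on-disk store.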


If you _agree_ with this:
        - Volunteers?
        - I could start this in about 1 week.



OTOH, I'm quite disappointed with Bayesian filters. While I worked on this,
I "envisioned" a full system that would simulate a secretary on your
desktop... Remember that? :-). Spam filtering would be one part of that
system.

Some outlines are in the _January_ email:
http://mail.gnu.org/archive/html/bsf-devel/2003-01/msg00000.html
Some screenshots of the remembrance-agent:
http://www.dcc.uchile.cl/~amartoq/info-agent/


But I think we should finish this, and then start a new project.


-- 
Aldrin.



