
Re: [Ifile-discuss] iFile as help-desk front-end


From: Jason Rennie
Subject: Re: [Ifile-discuss] iFile as help-desk front-end
Date: Fri, 31 Oct 2003 08:44:41 -0500

address@hidden said:
> 1. What do the numbers reported by ifile -q really mean?
>       I believe that for this system, simply giving up and routing to  a
> human would be better than guessing wrong, so I'd like to have a
> "unknown" bin that collects the stuff that isn't matched well by
> ifile. I was under the impression that the numbers reported were a
> "quality of match" metric, but in cases where nothing matches
> (feeding Jabberwocky to ifile when it's been trained on an OS X FAQ)
> returns 0 for all categories. Is this a special case, and if I get
> exactly zero, or some very negative number, I should assume the match
> is poor? 

These are log-likelihoods, one for each class model.  A good indicator of
the "quality of match" is the ratio of the first two numbers.  If you
divide the first number by the second, you'll always get a ratio less than
one.  If the ratio is very close to one (e.g. 0.99), it means that ifile
had a hard time differentiating between those two classes.  If it is
smaller (e.g. 0.9), you can be fairly confident of the prediction.

For example, ifile -q on your e-mail gives me

ifile/discuss -777.66423702
mlists/spam -945.67046261
...

The ratio is 0.822, so there's not much doubt that it got the classification 
correct.  The best thing for you to do would be to train up ifile, feed it 
test messages, and look at the resulting ratios to get a sense of which 
values correspond to confident and not-so-confident predictions.
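To make the idea concrete, here's a minimal sketch of the "unknown bin" logic from the original question, assuming `ifile -q` prints one "category log-likelihood" pair per line with the best match first. The 0.97 cutoff is an arbitrary example value, not something ifile defines -- tune it against your own test messages as described above.

```python
# Sketch: route to an "unknown" bin when ifile can't decide.
# Assumes `ifile -q` output is one "category log-likelihood" pair
# per line, best match first; the 0.97 cutoff is a made-up example.

def classify(ifile_output, cutoff=0.97):
    """Return the top category, or 'unknown' if the match is weak."""
    scores = []
    for line in ifile_output.strip().splitlines():
        category, loglik = line.split()
        scores.append((category, float(loglik)))
    if len(scores) < 2:
        return "unknown"
    best, second = scores[0][1], scores[1][1]
    # Both log-likelihoods are negative; best/second is less than 1,
    # and the closer the ratio is to 1, the less confident the call.
    ratio = best / second
    return scores[0][0] if ratio < cutoff else "unknown"

output = """ifile/discuss -777.66423702
mlists/spam -945.67046261"""
print(classify(output))  # ratio ~0.822, well under the cutoff
```

With the numbers from the example above this prints "ifile/discuss"; two nearly identical log-likelihoods (ratio 0.99 or so) would fall through to "unknown" instead.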

address@hidden said:
>       Is there any advantage to this approach, or am I better off  letting
> ifile sort things out over a large number of bins 

The tiered representation can help, especially if the decisions in the 
tree are very clear-cut.  Another thing that could improve ifile's 
performance is what's known as ECOC (error-correcting output coding) 
classification, where you build many different binary classifiers by 
randomly grouping categories together.  Here's a writeup:

  http://www.ai.mit.edu/~jrennie/papers/aimemo2001.pdf

Feel free to send me questions.

Jason
