bug-gnubg

Re: [Bug-gnubg] Some idle musings re. ratings


From: kvandoel
Subject: Re: [Bug-gnubg] Some idle musings re. ratings
Date: Thu, 11 Sep 2003 18:57:22 +0200 (CEST)

On Thu, 11 Sep 2003, Joern Thyssen wrote:

> > This is not a realistic model in that no-one ever plays with a
> > consistent MWC, but I was wondering if we could use the luck-adjusted
> > result as an indicator of MWC to model FIBS ratings and to compare them
> > to the current function which Kees van Doel generated.
>
> I think that is exactly what Kees did.
>
> He calculated relative rating estimates of gnubg 0-ply versus gnubg
> 0-ply with noise and found a relationship between error rates and the
> relative rating estimates.

That's an excellent summary. I'm glad at least one person has read my
writeup!

> Rethinking, I would have preferred to relate the error rates and MWCs,
> and then apply the FIBS rating formulae afterwards.

Actually that IS what I did; I must have forgotten to mention it. I will
correct that. It is impossible to work with the relative rating
directly, as it is quite often undefined when the luck-adjusted
probability estimates are negative.

So I do all the averaging on the luck-adjusted results, including the
negative probability estimates. (Whether it would be better to clamp
probabilities below 0 and above 1 to 0 and 1 is a question I've pondered;
any ideas?) I also get a variance and confidence interval from this. Then,
at the very end, I translate the estimated MWC and its confidence interval
into an Elo interval.
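The "average first, convert last" order described above can be sketched in Python. This is a minimal sketch, not Kees's actual code: it assumes the per-match luck-adjusted MWC estimates are already available as a list of floats, that the conversion uses the FIBS formula D = -2000/sqrt(n) * log10(1/p - 1), and that a normal-approximation interval (z = 1.96) is acceptable. The function names and the `clamp` switch are hypothetical. It also shows why per-match conversion fails: the formula has no real value for p <= 0 or p >= 1.

```python
import math
import statistics


def fibs_rating_diff(p, n):
    """FIBS rating difference D = -2000/sqrt(n) * log10(1/p - 1).

    p is a match-winning chance and n the match length. The formula has no
    real value unless 0 < p < 1, which is why individual luck-adjusted
    estimates (which can fall outside [0, 1]) cannot be converted directly.
    """
    if not 0.0 < p < 1.0:
        raise ValueError(f"rating difference undefined for p = {p}")
    return -2000.0 / math.sqrt(n) * math.log10(1.0 / p - 1.0)


def rating_interval(results, n, z=1.96, clamp=False):
    """Average the luck-adjusted MWC estimates first, then convert to ratings.

    results: per-match luck-adjusted MWC estimates (may be < 0 or > 1).
    clamp:   hypothetical option to clip each estimate to [0, 1] before
             averaging (the open question raised in the message).
    Returns (D_low, D_mean, D_high) from a normal-approximation interval.
    """
    xs = [min(max(r, 0.0), 1.0) for r in results] if clamp else list(results)
    mean = statistics.fmean(xs)
    sem = statistics.stdev(xs) / math.sqrt(len(xs))  # standard error of the mean
    low, high = mean - z * sem, mean + z * sem
    # D increases monotonically with p, so mapping the endpoints of the MWC
    # interval gives the endpoints of the rating interval. This raises
    # ValueError if an endpoint itself falls outside (0, 1).
    return fibs_rating_diff(low, n), fibs_rating_diff(mean, n), fibs_rating_diff(high, n)
```

For example, `rating_interval([-0.1, 0.6, 0.7, 0.5, 0.4], n=5)` (made-up numbers) works even though one estimate is negative. Clamping is not neutral: clipping is a contraction, so it can only shrink the sample variance, and it shifts the mean (upward for estimates below 0, downward for those above 1).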

> One of the issues raised by Douglas is if the experiment of 0-ply versus
> 0-ply noise really represents a good model of human play.

My working assumption is that it is. So far, data on real people shows
that it usually is, but there is an outlier: Mr Albert Silver. I'd like
to analyse more human matches, but I don't think anyone except Albert is
going to help me get them, so I'll probably drop the ball on this at
some point.

> Also, what relationship do we expect between error rates and lost MWC?
> My guess would be linear for some fixed match length, but it depends on
> whether you make the errors at the beginning or towards the end of a match.

> Using 0-ply with noise will roughly give constant error rates throughout
> the match, so I'd expect to see a linear relationship between error
> rates and lost MWC for a fixed match length. I'm not sure if this holds
> for human play -- I guess you would need to investigate a large number
> of human matches.

Yes, as you can see from the data at
http://www.cs.ubc.ca/~kvdoel/tmp/ratings
the rating data looks linear.

> The FIBS rating formula is:
>
> D = -2000/sqrt(n) * log10( 1/p - 1 )
>
> For values of p around 0.5 the rating difference D is close to linear in
> p, so we would get a linear relationship between D and error rates --
> exactly what Kees found.
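Why D is nearly linear near p = 0.5 can be checked numerically. Differentiating the formula gives dD/dp = 2000 / (sqrt(n) ln(10) p(1 - p)), which at p = 0.5 equals 8000 / (sqrt(n) ln(10)). The sketch below compares the exact formula with that tangent line. Because D is antisymmetric about p = 0.5, the quadratic term vanishes and the gap grows only with the cube of (p - 0.5). The match length n = 7 in the demo is an arbitrary choice, and the function names are illustrative.

```python
import math


def fibs_rating_diff(p, n):
    """FIBS formula D = -2000/sqrt(n) * log10(1/p - 1), valid for 0 < p < 1."""
    return -2000.0 / math.sqrt(n) * math.log10(1.0 / p - 1.0)


def tangent_at_half(p, n):
    """First-order Taylor expansion of D about p = 0.5.

    dD/dp = 2000 / (sqrt(n) * ln(10) * p * (1 - p)), which equals
    8000 / (sqrt(n) * ln(10)) at p = 0.5, where D itself is 0.
    """
    slope = 8000.0 / (math.sqrt(n) * math.log(10.0))
    return slope * (p - 0.5)


if __name__ == "__main__":
    n = 7  # arbitrary match length for the demo
    for p in (0.40, 0.45, 0.50, 0.55, 0.60, 0.70):
        exact = fibs_rating_diff(p, n)
        approx = tangent_at_half(p, n)
        print(f"p={p:.2f}  D={exact:8.2f}  linear={approx:8.2f}  gap={exact - approx:+.2f}")
```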

> Kees uses an a/N+b extrapolation formula for the match length. I would
> guess from the FIBS rating formula that a/sqrt(N)+b would be better.

Note that this formula also popped clearly out of the data; I didn't
arbitrarily choose that extrapolation. That power is really 1, not 1/2,
and one of the conclusions you can draw from these results is that the
FIBS formula is indeed flawed (i.e., your rating depends on the
match length you play: if you play short matches against weaker players
and long ones against stronger players, you get a higher rating than the
other way around; quite substantially higher, if you believe my data).
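Deciding between the a/N + b and a/sqrt(N) + b extrapolations amounts to a least-squares comparison: fix a power k, regress the measured values on N**(-k), and compare the residuals across candidate powers. A minimal sketch follows; the function names are hypothetical, and the numbers in the demo are synthetic (generated from 300/N + 50), not Kees's data.

```python
def fit_extrapolation(lengths, values, power):
    """Least-squares fit of values ~ a * N**(-power) + b.

    power=1 is the a/N + b model, power=0.5 the a/sqrt(N) + b model.
    Returns (a, b, sse), where b is the extrapolated limit as N grows
    and sse is the sum of squared residuals.
    """
    xs = [n ** -power for n in lengths]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(values) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    a = sxy / sxx
    b = mean_y - a * mean_x
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, values))
    return a, b, sse


def best_power(lengths, values, candidates):
    """Return the candidate power with the smallest residual sum of squares."""
    return min(candidates, key=lambda k: fit_extrapolation(lengths, values, k)[2])


if __name__ == "__main__":
    # Synthetic values generated from 300/N + 50, for illustration only.
    ns = [1, 3, 5, 7, 9, 11]
    ys = [300.0 / n + 50.0 for n in ns]
    for k in (0.5, 1.0):
        a, b, sse = fit_extrapolation(ns, ys, k)
        print(f"power {k}: a={a:.2f} b={b:.2f} sse={sse:.3g}")
    print("best power:", best_power(ns, ys, [0.5, 0.75, 1.0, 1.25]))
```

With real measurements, `values` would be whatever per-length quantity is being extrapolated; scanning a grid of powers rather than testing only 1 and 1/2 shows how sharply the data prefer one exponent.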

Regarding modeling human error by noise:

One thing I would expect is for the noise errors to be uniformly distributed
over the moves of a match, whereas human errors would tend to clump
together when a type of position arises that the human handles
incorrectly. For myself, when I get to a difficult position (in the human
sense), my errors clump in that region, because it is more difficult.

I don't see, however, how that (clumping versus uniformity) would affect
the rating.

I understand the noise is injected into the outputs of the NN. I have
always had the feeling that it would be a better model of human error to
inject the noise into the WEIGHTS of the NN. Now that I think about it,
this might also introduce clumping effects, e.g. when the position moves
into a region whose processing has been badly damaged by the partial
lobotomy.

I guess I could just externally perturb the weights file to create a
number of brain-damaged bots and experiment with those. Any pointers to
where I can find the file format for the weights files? Or is this a
stupid idea anyway?
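The difference between the two noise models can be illustrated with a toy evaluator. This is a hypothetical sketch: a single sigmoid neuron stands in for the network, and it does not use gnubg's real architecture or weights file format, which this message does not describe. Output noise draws fresh randomness on every evaluation, so errors are independent from move to move. Weight noise perturbs the weights once, so the damaged bot's error is a fixed, continuous function of the position: similar positions get similar errors, which is the clumping mechanism suggested above.

```python
import math
import random


def evaluate(weights, bias, inputs):
    """Toy evaluator: sigmoid(w . x + b). A stand-in, NOT gnubg's network."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))


def noisy_output(weights, bias, inputs, sigma, rng):
    """Output-noise model: fresh Gaussian noise added on every evaluation."""
    return evaluate(weights, bias, inputs) + rng.gauss(0.0, sigma)


def damaged_weights(weights, sigma, rng):
    """Weight-noise model: perturb every weight once, giving a fixed
    'brain-damaged' copy that is then used deterministically."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


if __name__ == "__main__":
    rng = random.Random(42)
    w, b = [0.5, -0.3, 0.8, 0.2], 0.1
    bad = damaged_weights(w, sigma=0.3, rng=rng)
    # Two similar positions (differing in one feature) and one dissimilar position.
    for x in ([1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]):
        weight_err = evaluate(bad, b, x) - evaluate(w, b, x)
        output_err = noisy_output(w, b, x, 0.05, rng) - evaluate(w, b, x)
        print(x, f"weight-noise error {weight_err:+.3f}, output-noise error {output_err:+.3f}")
```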

Kees




