Re: [Bug-gnubg] Some idle musings re. ratings


From: Jim Segrave
Subject: Re: [Bug-gnubg] Some idle musings re. ratings
Date: Thu, 11 Sep 2003 20:31:23 +0200
User-agent: Mutt/1.4i

On Thu 11 Sep 2003 (18:57 +0200), address@hidden wrote:
> On Thu, 11 Sep 2003, Joern Thyssen wrote:
> 
> > > This is not a realistic model in that no-one ever plays with a
> > > consistent MWC, but I was wondering if we could use the luck adjusted
> > > result as an indicator of MWC to model FIBS ratings and to compare them
> > > to the current function  which Kees van Doel generated.
> >
> > I think that is exactly what Kees did.
> >
> > He calculated relative rating estimates of gnubg 0-ply versus gnubg
> > 0-ply with noise and found a relationship between error rates and the
> > relative rating estimates.
> 
> That's an excellent summary. I'm glad at least one person has read my
> writeup!
> 
> > Rethinking, I would have preferred to relate the error rates and MWCs,
> > and then apply the FIBS rating formulae afterwards.
> 
> Actually that IS what I did, I must have forgotten to mention it. I will
> correct that. It is impossible to work with the relative rating
> directly as it is undefined quite often when the luck adjusted
> prob. estimates are negative.

OK - I didn't express myself well, as relating the error rates and
MWCs is what I was trying to express. I did read your writeup. I was
simply playing with a very simplistic model to see how the FIBS rating
behaves in the face of consistent MWC.
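
A minimal sketch of that kind of simplistic model, assuming the commonly
published FIBS formula for match-winning chance as a function of rating
difference and match length. The function names are hypothetical and this
is not gnubg's code. It inverts the formula to get the rating difference
implied by a consistent MWC, and shows why the relative rating is
undefined when a luck adjusted estimate falls outside (0, 1):

```python
import math

def fibs_win_probability(rating_diff, match_length):
    """Match-winning chance of the player who is rating_diff points
    higher, per the commonly published FIBS formula (assumed here)."""
    exponent = rating_diff * math.sqrt(match_length) / 2000.0
    return 1.0 - 1.0 / (10.0 ** exponent + 1.0)

def rating_diff_from_mwc(mwc, match_length):
    """Invert the formula: rating difference implied by a constant MWC.

    Returns None when mwc is not strictly between 0 and 1, which is the
    case where the relative rating is undefined (for example a negative
    luck adjusted estimate).
    """
    if not 0.0 < mwc < 1.0:
        return None
    return 2000.0 / math.sqrt(match_length) * math.log10(mwc / (1.0 - mwc))

if __name__ == "__main__":
    for mwc in (0.45, 0.50, 0.55, 0.60, -0.05):
        print(mwc, rating_diff_from_mwc(mwc, 7))
```

The inversion follows from writing the win probability as
10^x / (10^x + 1), so that the odds mwc / (1 - mwc) equal 10^x.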

> > One of the issues raised by Douglas is if the experiment of 0-ply versus
> > 0-ply noise really represents a good model of human play.
> 
> My working assumption is that it is. So far data on real people shows
> that it usually is, but there is an outlier: Mr Albert Silver. I'd like
> to analyse more human matches but I don't think anyone is going to help
> me get them except Albert, so I'll probably drop the ball on this at
> some point.

I suspect that if I played on FIBS I'd also be an outlier.

> Regarding modeling human error by noise:
> 
> One thing I would expect is the noise errors to be uniformly distributed
> over the moves of a match, whereas the human errors would tend to clump
> together when a type of position arises that is incorrectly handled by
> the human. For myself, when I get to a difficult position (in the human
> sense) my errors clump in that region, because it is more difficult.
> 
> I don't see however how that (clumping versus uniformity) would affect
> the rating.

I usually find when playing a 7 point match against gnubg that of the,
say, 6 games in the match, 4 of them will give me a consistent
low (for me) error rate and one or two games will have almost all of
the errors, not uncommonly 2 or 3 major errors within a move or two of
each other.
 
> I understand the noise is injected in the outputs of the NN. I always
> have had the feeling that it would be a better model of human error to
> inject the noise into the WEIGHTS of the NN. Now that I think about it I
> think this might also introduce clumping effects, like when the position
> moves into a region whose processing has been damaged a lot by the
> partial lobotomy.

Hmm, I sometimes think that describes my off days at bg.

> I guess I could just externally disturb the weights file to create a
> number of braindamaged bots and experiment with those. Any pointers to
> where I can find the file format for the weights files? Or is this a
> stupid idea anyways?

I have my doubts about this one. It would certainly be even harder to
give any justification for altering the weights as being a model of
human (mis)play, even more so than injecting random noise. And I
suspect it would be much harder to produce a controlled result.
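
To make the two noise models concrete, here is a toy sketch. The
evaluator is a made-up logistic of a weighted sum, not gnubg's network,
and nothing here reflects the real weights file format; all names and
numbers are assumptions for illustration. Output noise gives a fresh,
independent error on every evaluation, while disturbed weights give a
fixed error that depends on which inputs the position activates:

```python
import math
import random

def evaluate(weights, inputs):
    """Toy evaluator: logistic of a weighted sum (NOT gnubg's network)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-total))

def noisy_output_eval(weights, inputs, sigma, rng):
    """Output noise: add an independent Gaussian error to each evaluation."""
    return evaluate(weights, inputs) + rng.gauss(0.0, sigma)

def damage_weights(weights, sigma, rng):
    """Weight noise: disturb the weights once, producing a 'damaged' bot."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

rng = random.Random(0)               # fixed seed so runs are repeatable
weights = [0.8, -0.5, 0.3]           # hypothetical weights
position = [1.0, 0.0, 2.0]           # hypothetical input encoding
quiet_position = [0.0, 0.0, 0.0]     # activates none of the weights

damaged = damage_weights(weights, 0.2, rng)

if __name__ == "__main__":
    # Output noise: two evaluations of the same position disagree.
    print(noisy_output_eval(weights, position, 0.05, rng))
    print(noisy_output_eval(weights, position, 0.05, rng))
    # Weight noise: the same position always gets the same (biased) value,
    # and positions that touch no damaged weight are unaffected.
    print(evaluate(weights, position), evaluate(damaged, position))
    print(evaluate(weights, quiet_position), evaluate(damaged, quiet_position))
```

With damaged weights the error is systematic per region of input space,
which is where any clumping would come from; how large or controllable
that effect is in a real network is exactly the open question.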

I'd also speculate you'll have difficulty finding any general model
for real human errors - there are the ones caused by simply failing to
see a move, whether by not looking, miscounting points or
whatever. There are ones caused by not understanding a type of
game. There are ones caused by too careless play, steaming, being too
cautious and waiting too long, failing to see potential responses, you
name it, people will find a way to screw it up. And every person will
have their own mix of errors they make, which even for one person may
vary with time, alcohol, their assessment of their opponent, etc.


-- 
Jim Segrave           address@hidden



