bug-gnubg

Re: [Bug-gnubg] Some idle musings re. ratings


From: Joseph Heled
Subject: Re: [Bug-gnubg] Some idle musings re. ratings
Date: Fri, 12 Sep 2003 07:27:26 +1200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624


I guess you can create "braindamaged" bots by altering the weights file, but I can't see why that would model humans better. It would lead to more "random" errors, and human error is, if anything, "less" random than bot error.

And I am not sure we need to. The sources of human error, and how to model them, are a fascinating problem, but we are not trying to model them. Why do we care how or when the errors occur? When they do occur, they change our chances of winning, and wins and losses change our rating.
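The link from chances of winning to rating is the FIBS rating formula. A minimal sketch, assuming the commonly published form (the underdog wins an N-point match with probability 1/(10^(D*sqrt(N)/2000) + 1); the winner gains 4*sqrt(N) times the probability it would have lost, and the loser drops by the same amount) and ignoring the experience factor applied to new players. The function names are illustrative:

```python
import math

def fibs_win_prob(rating, opp_rating, match_length):
    """Chance that a player rated `rating` wins an N-point match against
    `opp_rating`, per the FIBS formula (experience factor ignored)."""
    d = rating - opp_rating
    return 1.0 - 1.0 / (10 ** (d * math.sqrt(match_length) / 2000) + 1)

def fibs_update(winner, loser, match_length):
    """New (winner, loser) ratings: the winner gains 4*sqrt(N) times the
    probability it would have lost; the loser drops by the same amount."""
    p_upset = 1.0 - fibs_win_prob(winner, loser, match_length)
    delta = 4 * math.sqrt(match_length) * p_upset
    return winner + delta, loser - delta

def expected_drift(p_true, rating, opp_rating, match_length):
    """Expected rating change per match when the player's real chance of
    winning is p_true: 4*sqrt(N) * (p_true - p_formula)."""
    p = fibs_win_prob(rating, opp_rating, match_length)
    return 4 * math.sqrt(match_length) * (p_true - p)

print(fibs_update(1500, 1500, 9))          # → (1506.0, 1494.0)
print(expected_drift(0.6, 1500, 1500, 1))  # 60% real chance, rated as equal
```

The drift is zero exactly when the formula's probability equals the player's real chance of winning; anything that shifts that real chance (more or fewer errors) shows up as a drift until the rating catches up.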

What can happen is that GNUbg may give a worse estimate for humans who play in a style that leads to positions GNUbg understands less well.

ParlorBot, which we hope is rated around 2050, dips to 1900 once in a while, so a 150-point "difference" on FIBS is not necessarily an anomaly.
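How far a FIBS rating can wander for a player of constant strength can be explored by simulation. A rough sketch under hypothetical assumptions: the player's true strength corresponds to 2050 in the FIBS formula's terms, opponents are drawn uniformly from 1600-1900 and are themselves accurately rated, all matches are 5-pointers, and the experience factor is ignored. This illustrates the mechanism only; it is not a model of ParlorBot's actual schedule:

```python
import math
import random

def fibs_win_prob(rating, opp_rating, match_length):
    """FIBS formula: chance the player rated `rating` wins an N-point match."""
    d = rating - opp_rating
    return 1.0 - 1.0 / (10 ** (d * math.sqrt(match_length) / 2000) + 1)

def simulate_rating(true_rating, n_matches, match_length=5, seed=0):
    """Rating trajectory of a player whose real strength never changes.
    Hypothetical assumptions: opponents uniform in 1600-1900 and accurately
    rated; experience factor ignored."""
    rng = random.Random(seed)
    rating = float(true_rating)
    history = [rating]
    k = 4 * math.sqrt(match_length)
    for _ in range(n_matches):
        opp = rng.uniform(1600, 1900)
        p_true = fibs_win_prob(true_rating, opp, match_length)  # real chance
        p_est = fibs_win_prob(rating, opp, match_length)        # what FIBS assumes
        if rng.random() < p_true:
            rating += k * (1 - p_est)   # win: gain k times the upset probability
        else:
            rating -= k * p_est         # loss: drop k times the win probability
        history.append(rating)
    return history

traj = simulate_rating(2050, 2000)
print(f"min {min(traj):.0f}, max {max(traj):.0f}")
```

Changing the seed or the number of matches shows how wide the spread is under these assumptions; no single step can move the rating by more than 4*sqrt(N) points, so large swings come from runs of results.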

-Joseph


Jim Segrave wrote:
On Thu 11 Sep 2003 (18:57 +0200), address@hidden wrote:

On Thu, 11 Sep 2003, Joern Thyssen wrote:


This is not a realistic model in that no-one ever plays with a
consistent MWC, but I was wondering if we could use the luck adjusted
result as an indicator of MWC to model FIBS ratings and to compare them
to the current function which Kees van Doel generated.

I think that is exactly what Kees did.

He calculated relative rating estimates of gnubg 0-ply versus gnubg
0-ply with noise and found a relationship between error rates and the
relative rating estimates.
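The "0-ply with noise" setup can be sketched as: add Gaussian noise to each candidate move's evaluation, play the move with the highest noisy score, and measure the error as the equity lost against the noise-free best move. The candidate equities and the noise level below are hypothetical, and this is a toy illustration rather than gnubg's own noise code:

```python
import random

def pick_move(equities, noise_sd, rng):
    """Index of the move with the highest noisy evaluation: independent
    Gaussian noise (std dev `noise_sd`) is added to every output."""
    noisy = [e + rng.gauss(0.0, noise_sd) for e in equities]
    return max(range(len(equities)), key=noisy.__getitem__)

def move_error(equities, chosen):
    """Equity lost relative to the best (noise-free) move."""
    return max(equities) - equities[chosen]

rng = random.Random(42)
candidates = [0.512, 0.498, 0.455, 0.310]  # hypothetical evaluations of legal moves
errors = [move_error(candidates, pick_move(candidates, 0.05, rng)) for _ in range(10_000)]
print(sum(errors) / len(errors))  # average equity lost per decision at this noise level
```

Raising `noise_sd` raises the average error, which is the knob that lets error rate be related to match results.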

That's an excellent summary. I'm glad at least one person has read my
writeup!


Rethinking, I would have preferred to relate the error rates and MWCs,
and then apply the FIBS rating formulae afterwards.

Actually that IS what I did; I must have forgotten to mention it. I will
correct that. It is impossible to work with the relative rating
directly, as it is undefined quite often when the luck-adjusted
prob. estimates are negative.
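Going from an MWC to a relative rating means inverting the FIBS formula: if p is the chance of winning an N-point match, then D = 2000*log10(p/(1-p))/sqrt(N). A minimal sketch (experience factor ignored; the function name is illustrative) that also shows why the direct route breaks down: the logarithm is only defined for 0 < p < 1, so a luck-adjusted estimate outside that range has no rating equivalent:

```python
import math

def implied_rating_diff(mwc, match_length):
    """Rating difference the FIBS formula assigns to a player who wins an
    N-point match with probability `mwc` (experience factor ignored).
    Inverts p = 1 - 1/(10**(D*sqrt(N)/2000) + 1)."""
    if not 0.0 < mwc < 1.0:
        # log10 of a non-positive (or infinite) odds ratio is undefined, so
        # estimates outside (0, 1) cannot be converted to a rating directly
        raise ValueError(f"MWC must lie strictly between 0 and 1, got {mwc}")
    return 2000.0 * math.log10(mwc / (1.0 - mwc)) / math.sqrt(match_length)

print(implied_rating_diff(0.6, 7))  # edge implied by a 60% MWC in a 7-point match
```

Working in MWC space first and converting only at the end avoids this problem, since averages of MWC estimates can be taken before the logarithm is applied.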


OK, I didn't express myself well, as that (relating the error rates
and MWCs) is what I was trying to say. I did read your writeup. I was
simply playing with a very simplistic model to see how the FIBS rating
behaves in the face of a consistent MWC.


One of the issues raised by Douglas is whether the experiment of 0-ply versus
0-ply with noise really represents a good model of human play.

My working assumption is that it is. So far, data on real people shows
that it usually is, but there is an outlier: Mr Albert Silver. I'd like
to analyse more human matches, but I don't think anyone is going to help
me get them except Albert, so I'll probably drop the ball on this at
some point.


I suspect that if I played on FIBS I'd also be an outlier.


Regarding modeling human error by noise:

One thing I would expect is the noise errors to be uniformly distributed
over the moves of a match, whereas the human errors would tend to clump
together when a type of position arises that is incorrectly handled by
the human. For myself, when I get to a difficult position (in the human
sense), my errors clump in that region, because it is more difficult.

I don't see, however, how that (clumping versus uniformity) would affect
the rating.
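One way to see the point: if the rating estimate is driven by an average error rate, two matches with the same total equity loss give the same figure however the errors are distributed across the moves. The numbers below are hypothetical; whether the real relationship between errors and MWC is insensitive to clumping is exactly the open question, and this sketch does not settle it:

```python
import math

def error_rate(errors_per_move):
    """Average equity lost per move: a simple error-rate measure."""
    return sum(errors_per_move) / len(errors_per_move)

n_moves = 60
# Same total loss (0.3 equity), spread two different ways (hypothetical data)
uniform = [0.005] * n_moves           # a small slip on every move
clumped = [0.0] * n_moves
clumped[30:33] = [0.1, 0.1, 0.1]      # three big blunders in a row

print(math.isclose(error_rate(uniform), error_rate(clumped)))  # → True
```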


I usually find, when playing a 7-point match against gnubg, that of the,
say, 6 games in the match, 4 will give me a consistently
low (for me) error rate and one or two games will have almost all of
the errors, not uncommonly 2 or 3 major errors within a move or two of
each other.
I understand the noise is injected into the outputs of the NN. I have
always had the feeling that it would be a better model of human error to
inject the noise into the WEIGHTS of the NN. Now that I think about it,
this might also introduce clumping effects, e.g. when the position
moves into a region whose processing has been badly damaged by the
partial lobotomy.
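The difference between the two kinds of noise can be sketched with a toy evaluator. `ToyEvaluator` is a hypothetical single linear layer, not gnubg's network or its weights file format. Output noise is drawn afresh on every evaluation; weight noise is applied once, so the damage is fixed and positions with similar features get similar errors, which is the kind of persistence that could produce clumping:

```python
import random

class ToyEvaluator:
    """Hypothetical stand-in for an NN evaluator: one linear layer."""

    def __init__(self, weights):
        self.weights = list(weights)

    def evaluate(self, features):
        return sum(w * x for w, x in zip(self.weights, features))

def with_output_noise(evaluator, noise_sd, rng):
    """Output noise: fresh, independent noise on every evaluation."""
    return lambda features: evaluator.evaluate(features) + rng.gauss(0.0, noise_sd)

def with_weight_noise(evaluator, noise_sd, rng):
    """Weight noise ("partial lobotomy"): perturb the weights once; the
    same damaged evaluator is then used for every position."""
    damaged = ToyEvaluator(w + rng.gauss(0.0, noise_sd) for w in evaluator.weights)
    return damaged.evaluate

rng = random.Random(0)
base = ToyEvaluator([0.5, -0.2, 0.1])
damaged = with_weight_noise(base, 0.1, rng)
noisy = with_output_noise(base, 0.1, rng)
for pos in ([1, 0, 1], [1, 0, 0.9], [1, 0.1, 1]):
    # weight-noise bias varies smoothly with the position's features;
    # output-noise error is independent from one call to the next
    print(pos, damaged(pos) - base.evaluate(pos), noisy(pos) - base.evaluate(pos))
```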


Hmm, I sometimes think that describes my off days at bg.


I guess I could just externally disturb the weights file to create a
number of braindamaged bots and experiment with those. Any pointers to
where I can find the file format for the weights files? Or is this a
stupid idea anyway?


I have my doubts about this one. It would be even harder to justify
altering the weights as a model of human (mis)play than injecting
random noise. And I suspect it would be much harder to produce a
controlled result.

I'd also speculate that you'll have difficulty finding any general model
for real human errors. There are the ones caused by simply failing to
see a move, whether by not looking, miscounting points or
whatever. There are ones caused by not understanding a type of
game. There are ones caused by too careless play, steaming, being too
cautious and waiting too long, failing to see potential responses; you
name it, people will find a way to screw it up. And every person will
have their own mix of errors, which even for one person may
vary with time, alcohol, their assessment of their opponent, etc.
