Re: [Bug-gnubg] Re: Strange FIBS ratings


From: Joseph Heled
Subject: Re: [Bug-gnubg] Re: Strange FIBS ratings
Date: Tue, 09 Sep 2003 11:08:20 +1200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624


I started my affair with GNUbg by trying to add cube handling to version 0.0 (it could play only 1-pointers at that time). I put a bot on FIBS to see how I was doing. I knew nothing about cube theory then (little has changed :), but I do remember one thing: I would find a bug, or finally understand a point from some email or web page, and happily code it in, expecting to see the bot's rating shoot up. Nothing of the sort happened. In the end I came to the conclusion that any reasonable cube handling would do, as long as you play your checkers right. At that point I started looking at improving the GNUbg nets :)

So, I certainly agree with Kees's conclusion that cube errors are less important than checker errors. I can't find any merit in the quote Jim (Segrave) gave from the GammonLine thread about retrievable and irretrievable errors. Those "arguments" are a simple rehashing of "Some of my mistakes don't matter since my opponent will make worse mistakes later", which gives no new insight.

Perhaps this is surprising to BG experts, but there is little doubt in my mind that it is true. What needs proving is that what is true for bots is also true for humans. And there is probably a big IF here, since Kees got his results from GNUbg play only. I think it might prove hard to show, because personally I think the FIBS formula is flawed when it comes to opponents with a large rating difference.
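
For reference, here is a minimal sketch of the FIBS rating formula as it is commonly published. It is not taken from this thread: the function names are made up for illustration, and the experience factor K is fixed at 1, which assumes both players are well past the ramp-up period. It only shows what win probability the formula assigns to the underdog as the rating difference grows.

```python
import math

def underdog_win_prob(rating_diff, match_length):
    """Win probability the formula assigns to the lower-rated player."""
    d = abs(rating_diff)
    return 1.0 / (10.0 ** (d * math.sqrt(match_length) / 2000.0) + 1.0)

def rating_change(rating_diff, match_length, favorite_won, k=1.0):
    """Points gained by the winner and lost by the loser.

    k is the experience factor; k=1.0 is an assumption made here
    for simplicity (experienced players on both sides).
    """
    p_upset = underdog_win_prob(rating_diff, match_length)
    # The winner is rewarded in proportion to how unlikely the win was.
    p = p_upset if favorite_won else 1.0 - p_upset
    return 4.0 * k * math.sqrt(match_length) * p

if __name__ == "__main__":
    # Underdog win probability for 1-point and 7-point matches.
    for diff in (0, 100, 300, 600):
        print(diff,
              round(underdog_win_prob(diff, 1), 3),
              round(underdog_win_prob(diff, 7), 3))
```

Whether these assumed probabilities match real results at large rating differences is exactly the open question.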

-Joseph


address@hidden wrote:
On Mon, 8 Sep 2003, Albert Silver wrote:


Not really. The checker error rate was around 0.013 to 0.014, which in
my book is very poor and deservedly described as only Intermediate.


I'd like to read that book :-)


This is the first big problem I'm having with this. My current average
rating according to GNU, using the new formula, is somewhere between
1850-1900. Probably closer to 1850, but this is unimportant. This is
extremely flattering, but my FIBS rating is about 150 points less. None
of this is exaggerated. It's quite possible I am incredibly unlucky
since in many, if not most, of my matches I play better than my
opponents, according to GNU, but this last item may be sour grapes so
I'll compile some results before affirming this. One thing is 100%
certain: in my last 20 matches, I have only twice had GNU give me a
rating in the 1700s, not one time below, and my rating is around 1700 at
FIBS.


I found similarly for myself that I usually get a higher rating (1900 -
2100) than I actually have (1900), but occasionally I get a really
dreadful rating like 1400 or worse. These disaster games correct the
error-based estimate.

For example  my "kvandoel" alias over  176 games gets a  GNUBG rating of
1878 +-26 (90% conf. interval). My actual rating is 1910.  Another alias
I sometimes play  under over 63 games gets a GNUBG  rating of 1935 +-28,
actual rating is  1900. Yet another alias got 1916 +-  41 over 25 games,
actual rating 1860.

Some data for other players that I've played a lot (I won't mention their
alias here, for privacy reasons):

(Average of 6  ) 2033 +- 30  actual 2070
(Average of 7  ) 1963 +- 67  actual 2000
(Average of 8  ) 1795 +- 107 actual 1870
(Average of 6  ) 1882 +- 134 actual 1960
(Average of 11 ) 1805 +- 125 actual 1900
(Average of 10 ) 1756 +- 83  actual 1900

(I set the rating offset to 2200, compared to the 2050 default, so subtract
150 from those numbers to convert to your convention).
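
The six rows above can be checked mechanically. The sketch below is only an illustration: it copies the numbers as quoted (no offset conversion applied), treats the "+-" value as a symmetric interval around the estimate, and uses made-up variable names.

```python
# (count from "Average of N", GNUBG estimate, +- interval, actual FIBS rating)
players = [
    (6, 2033, 30, 2070),
    (7, 1963, 67, 2000),
    (8, 1795, 107, 1870),
    (6, 1882, 134, 1960),
    (11, 1805, 125, 1900),
    (10, 1756, 83, 1900),
]

def compare(rows):
    """Return (estimate, actual, actual - estimate, inside interval?) per row."""
    out = []
    for _count, est, err, actual in rows:
        diff = actual - est
        out.append((est, actual, diff, abs(diff) <= err))
    return out

results = compare(players)
mean_diff = sum(r[2] for r in results) / len(results)
inside = sum(1 for r in results if r[3])

if __name__ == "__main__":
    for est, actual, diff, ok in results:
        print(f"estimate {est}  actual {actual}  diff {diff:+d}  inside: {ok}")
    print(f"mean (actual - estimate): {mean_diff:.1f}")
    print(f"actual rating inside the quoted interval: {inside} of {len(results)}")
```

With so few players this says little either way, but the same check could be rerun on a larger collection of matches.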


I'll separate all my FIBS matches, and start saving them to send to you,
plus our ratings. Just tell me what you need exactly.


I need a collection of .mat files.  Put them in a .zip file or something
like that.  I'll run my scripts and send you the results (computed GNUBG
ratings of  the players).  Make sure there  are no badly  formatted .mat
files in the collection or I won't be able to run things.


So the textual scores need to be aligned with the rating version.


I'm having some reservations on this now. I have seen a couple of huge
error rates (what I'm accustomed to thinking of as absolutely dreadful
play overall)  be rightfully  described as Casual  Player by  GNU, and
still get  a very high rating.  I'm  not sure I want  to start finding
those huge error  rates be called World Class or  Expert as this would
be way  off IMO. Right now  I'm inclined to leave  the textual ratings
untouched.


As far as I know the previous error-based classification was not based
on facts; the current one is. What reason do you have to believe in the
current relation between level and error rate, which you describe by the
word "rightfully", besides having gotten used to it?


Don't get me wrong, I am very appreciative of your efforts, but right
now there are problems that need resolving IMHO.


You'll have to be more specific, I think, to create a meaningful
argument. If you are right, there must be something wrong with the
results of the simulations I presented on
www.cs.ubc.ca/~kvdoel/tmp/ratings.

I'm open  to that possibility,  but if there  is nothing wrong  with it,
then your accustomed way of thinking is wrong.

Anyways, I think the best thing to do now is to try to validate/refute
the current rating estimator by estimating ratings of players and
comparing them with their actual ratings.


Kees


