Re: [Bug-gnubg] Re: Strange FIBS ratings

From:	Joseph Heled
Subject:	Re: [Bug-gnubg] Re: Strange FIBS ratings
Date:	Tue, 09 Sep 2003 11:08:20 +1200
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

I started my affair with GNUbg trying to add cube handling to version 0.0 (it could play only 1-pointers at that time). I put a bot on FIBS to see how I was doing. I knew nothing about cube theory then (little has changed :), but I do remember one thing: I would find a bug, or finally understand a point from some email or web page, happily code it in and expect to see the bot's rating shoot up, but nothing of the sort happened. In the end I came to the conclusion that any reasonable cube handling would do, as long as you play your checkers right. At that point I started looking at improving GNUbg nets :)

So, I certainly agree with Kees' conclusion that cube errors are less important than checker errors. I can't find any merit in Jim Segrave's quote from the GammonLine thread about non/irretrievable errors. Those "arguments" are a simple rehashing of "Some of my mistakes don't matter since my opponent will make worse mistakes later", which gives no new insight.

Perhaps this is surprising to BG experts, but there is little doubt in my mind that it is true. What needs proving is that what is true for BOTS is true for humans. And there is probably a big IF here, since Kees got his results from GNUbg play only. I think it might prove hard, because personally I think the FIBS formula is flawed when it comes to opponents with a large rating difference.
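For readers who haven't seen it, here is a minimal Python sketch of the FIBS rating formula as it is commonly documented. It is a reconstruction, not FIBS source code: it omits the experience multiplier FIBS applies to players with little experience, and the function names are hypothetical. It shows where the rating difference D and match length N enter: the underdog's chance of winning is modeled as 1/(10^(D*sqrt(N)/2000) + 1), which is the part Joseph suspects breaks down for large D.

```python
import math


def fibs_upset_probability(rating_diff: float, match_length: int) -> float:
    """Modeled probability that the lower-rated player wins an N-point match."""
    d = abs(rating_diff)
    return 1.0 / (10 ** (d * math.sqrt(match_length) / 2000.0) + 1.0)


def fibs_rating_change(winner_rating: float, loser_rating: float,
                       match_length: int) -> float:
    """Points gained by the winner (and lost by the loser).

    Sketch only: the experience multiplier for new players is ignored.
    """
    p_upset = fibs_upset_probability(winner_rating - loser_rating, match_length)
    # The winner is credited with the probability it had of LOSING this match.
    if winner_rating >= loser_rating:
        p_winner_loses = p_upset          # favorite won
    else:
        p_winner_loses = 1.0 - p_upset    # underdog won
    return 4.0 * math.sqrt(match_length) * p_winner_loses


if __name__ == "__main__":
    # 1900 vs 1500, 1-point match: favorite win vs. underdog win.
    print(round(fibs_rating_change(1900, 1500, 1), 2))
    print(round(fibs_rating_change(1500, 1900, 1), 2))
```

With these inputs the favorite gains about 1.55 points for a win, while an upset earns the underdog about 2.45. Whether real upset rates at large D match that curve is exactly what would need checking.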


-Joseph


address@hidden wrote:

On Mon, 8 Sep 2003, Albert Silver wrote:

Not really. The checker error rate was around 0.013 to 0.014, which in
my book is very poor and deservedly described as only Intermediate.



I'd like to read that book :-)

This is the first big problem I'm having with this. My current average
rating according to GNU, using the new formula, is somewhere between
1850-1900. Probably closer to 1850, but this is unimportant. This is
extremely flattering, but my FIBS rating is about 150 points less. None
of this is exaggerated. It's quite possible I am incredibly unlucky
since in many, if not most, of my matches I play better than my
opponents, according to GNU, but this last item may be sour grapes so
I'll compile some results before affirming this. One thing is 100%
certain: in my last 20 matches, I have only twice had GNU give me a
rating in the 1700s, not one time below, and my rating is around 1700 at
FIBS.



I found for myself, similarly, that I usually get a higher rating
(1900-2100) than I actually have (1900), but occasionally I get a really
dreadful rating like 1400 or worse. These disaster games correct the
error-based estimate.

For example, my "kvandoel" alias gets a GNUBG rating of 1878 +-26 (90%
conf. interval) over 176 games. My actual rating is 1910. Another alias
I sometimes play under gets a GNUBG rating of 1935 +-28 over 63 games;
actual rating is 1900. Yet another alias got 1916 +-41 over 25 games,
actual rating 1860.

Some data for other players that I've played a lot (I won't mention their
alias here, for privacy reasons):

(Average of 6  ) 2033 +- 30  actual 2070
(Average of 7  ) 1963 +- 67  actual 2000
(Average of 8  ) 1795 +- 107 actual 1870
(Average of 6  ) 1882 +- 134 actual 1960
(Average of 11 ) 1805 +- 125 actual 1900
(Average of 10 ) 1756 +- 83  actual 1900

(I set the rating offset to 2200, compared to the 2050 default, so subtract
150 from those numbers to convert them to your convention.)
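The offset conversion is plain arithmetic. Here is a minimal Python sketch applied to the rows above, assuming the offset simply adds to every estimate, as "subtract 150" implies. The names are hypothetical.

```python
# Offsets quoted in the thread: Kees used 2200, the default was 2050.
KEES_OFFSET = 2200
DEFAULT_OFFSET = 2050

# Rows from the table above: (matches averaged, GNUbg estimate at offset 2200,
# +- half-width of the 90% interval, actual FIBS rating).
ROWS = [
    (6, 2033, 30, 2070),
    (7, 1963, 67, 2000),
    (8, 1795, 107, 1870),
    (6, 1882, 134, 1960),
    (11, 1805, 125, 1900),
    (10, 1756, 83, 1900),
]


def convert_offset(estimate, from_offset=KEES_OFFSET, target_offset=DEFAULT_OFFSET):
    """Shift a GNUbg rating estimate between offset conventions.

    Assumes the offset enters the estimate additively."""
    return estimate - (from_offset - target_offset)


if __name__ == "__main__":
    for n, est, half, actual in ROWS:
        # A constant shift leaves the interval half-width unchanged.
        print(f"{n:2d} matches: {convert_offset(est)} +-{half} "
              f"(offset {DEFAULT_OFFSET}), actual FIBS {actual}")
```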

I'll separate all my FIBS matches, and start saving them to send to you,
plus our ratings. Just tell me what you need exactly.



I need a collection of .mat files. Put them in a .zip file or something
like that. I'll run my scripts and send you the results (computed GNUBG
ratings of the players). Make sure there are no badly formatted .mat
files in the collection, or I won't be able to run things.
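Since one badly formatted .mat file stops Kees' scripts, it may help to screen files before zipping them. The following is a minimal Python sketch. Its check is only a heuristic and an assumption: it accepts a file if it contains a Jellyfish-style header line such as " 7 point match". It does not parse the moves, and it would reject sessions without a match length. The function names are hypothetical.

```python
import re
import zipfile
from pathlib import Path

# Heuristic (assumption): a match export contains a header line like
# " 7 point match". This is a rough sanity check, not a .mat parser.
HEADER = re.compile(r"^\s*\d+\s+point match", re.MULTILINE)


def looks_like_mat(path: Path) -> bool:
    """True if the file contains something resembling a match header."""
    text = path.read_text(errors="replace")
    return bool(HEADER.search(text))


def bundle_mat_files(src_dir, zip_path):
    """Zip the *.mat files in src_dir that pass the check; return (good, bad)."""
    good, bad = [], []
    for path in sorted(Path(src_dir).glob("*.mat")):
        (good if looks_like_mat(path) else bad).append(path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in good:
            zf.write(path, arcname=path.name)
    return good, bad
```

Files in `bad` are worth opening by hand before sending, rather than being silently dropped.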

So the textual scores need to be aligned with the rating version.

I'm having some reservations on this now. I have seen a couple of huge
error rates (what I'm accustomed to thinking of as absolutely dreadful
play overall) be rightfully described as Casual Player by GNU, and
still get a very high rating. I'm not sure I want to start seeing
those huge error rates called World Class or Expert, as this would
be way off IMO. Right now I'm inclined to leave the textual ratings
untouched.



As far as I know, the previous error-based classification was not based
on facts; the current one is. What reason do you have to believe in the
current relation between level and error rate, which you describe with the
word "rightfully", other than that you have gotten used to it?

Don't get me wrong, I am very appreciative of your efforts, but right
now there are problems that need resolving IMHO.



I think you'll have to be more specific to make a meaningful
argument. If you are right, there must be something wrong with the
results of the simulations I presented on
www.cs.ubc.ca/~kvdoel/tmp/ratings.

I'm open to that possibility, but if there is nothing wrong with it,
then your accustomed way of thinking is wrong.

Anyway, I think the best thing to do now is to try to validate or refute
the current rating estimator by estimating the ratings of players and
comparing them with their actual ratings.
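Here is a minimal Python sketch of such a comparison. It uses only the figures already quoted in this thread: Kees' three aliases and the six averaged players, all at offset 2200. It reports how many actual ratings fall inside the quoted 90% intervals, plus the mean signed and absolute differences between estimate and actual rating. Treat it as a rough check only: a FIBS rating is itself a noisy measurement, and the intervals describe uncertainty in the estimate, not in the FIBS rating. The names are hypothetical.

```python
import statistics

# (GNUbg estimate at offset 2200, +- half-width of 90% interval, actual FIBS
# rating), taken from the figures quoted earlier in this thread.
PLAYERS = [
    (1878, 26, 1910),
    (1935, 28, 1900),
    (1916, 41, 1860),
    (2033, 30, 2070),
    (1963, 67, 2000),
    (1795, 107, 1870),
    (1882, 134, 1960),
    (1805, 125, 1900),
    (1756, 83, 1900),
]


def compare_estimates(rows):
    """Summarize how error-based estimates line up with actual ratings."""
    diffs = [est - actual for est, _, actual in rows]
    inside = sum(1 for est, half, actual in rows if abs(est - actual) <= half)
    return {
        "n": len(rows),
        "inside": inside,                      # actual rating within interval
        "mean_diff": statistics.mean(diffs),   # negative: estimate below actual
        "mean_abs_diff": statistics.mean(abs(d) for d in diffs),
    }


if __name__ == "__main__":
    s = compare_estimates(PLAYERS)
    print(f"{s['inside']} of {s['n']} actual ratings inside the interval")
    print(f"mean estimate - actual: {s['mean_diff']:+.1f}")
    print(f"mean |estimate - actual|: {s['mean_abs_diff']:.1f}")
```

More players and more matches per player would be needed before either side's view could be called confirmed.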


Kees



_______________________________________________
Bug-gnubg mailing list
address@hidden
http://mail.gnu.org/mailman/listinfo/bug-gnubg
