bug-gnubg

RE: [Bug-gnubg] Re: Strange FIBS ratings


From: kvandoel
Subject: RE: [Bug-gnubg] Re: Strange FIBS ratings
Date: Mon, 8 Sep 2003 20:50:22 +0200 (CEST)

On Mon, 8 Sep 2003, Albert Silver wrote:

> Not really. The checker error rate was around 0.013 to 0.014, which in
> my book is very poor and deservedly described as only Intermediate.

I'd like to read that book :-)

> This is the first big problem I'm having with this. My current average
> rating according to GNU, using the new formula, is somewhere between
> 1850-1900. Probably closer to 1850, but this is unimportant. This is
> extremely flattering, but my FIBS rating is about 150 points less. None
> of this is exaggerated. It's quite possible I am incredibly unlucky
> since in many, if not most, of my matches I play better than my
> opponents, according to GNU, but this last item may be sour grapes so
> I'll compile some results before affirming this. One thing is 100%
> certain: in my last 20 matches, I have only twice had GNU give me a
> rating in the 1700s, not one time below, and my rating is around 1700 at
> FIBS.

I have found similarly for myself that I usually get a higher rating
(1900 - 2100) than I actually have (1900), but occasionally I get a
really dreadful rating like 1400 or worse. These disaster games correct
the error-based estimate.

For example, my "kvandoel" alias over 176 games gets a GNUBG rating of
1878 +-26 (90% conf. interval). My actual rating is 1910. Another alias
I sometimes play under gets, over 63 games, a GNUBG rating of 1935 +-28;
actual rating is 1900. Yet another alias got 1916 +-41 over 25 games,
actual rating 1860.

Some data for other players that I've played a lot (I won't mention their
alias here, for privacy reasons):

(Average of 6  ) 2033 +- 30  actual 2070
(Average of 7  ) 1963 +- 67  actual 2000
(Average of 8  ) 1795 +- 107 actual 1870
(Average of 6  ) 1882 +- 134 actual 1960
(Average of 11 ) 1805 +- 125 actual 1900
(Average of 10 ) 1756 +- 83  actual 1900

(I set the rating offset to 2200, compared to the default of 2050, so
subtract 150 from those numbers to convert to your convention.)
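
The conversion mentioned above can be sketched in a few lines of Python.
This is only an illustration: it assumes the offset enters the rating
purely additively (which is what "subtract 150" implies), and the
function name is made up for this example, not part of GNUBG.

```python
# Offsets quoted in this message: 2200 was used for the numbers above,
# 2050 is the default.
USED_OFFSET = 2200
DEFAULT_OFFSET = 2050


def to_default_convention(rating, used_offset=USED_OFFSET,
                          default_offset=DEFAULT_OFFSET):
    """Shift a rating from one offset's scale to another's.

    Assumes the offset is a plain additive constant (hypothetical helper).
    """
    return rating - (used_offset - default_offset)


# GNUBG estimates from the table above, exactly as listed.
estimates = [2033, 1963, 1795, 1882, 1805, 1756]
converted = [to_default_convention(r) for r in estimates]
print(converted)
```

Only the GNUBG estimates are shifted here; the "actual" column is a FIBS
rating and is not affected by the offset setting.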

> I'll separate all my FIBS matches, and start saving them to send to you,
> plus our ratings. Just tell me what you need exactly.

I need a collection of .mat files. Put them in a .zip file or something
like that. I'll run my scripts and send you the results (computed GNUBG
ratings of the players). Make sure there are no badly formatted .mat
files in the collection or I won't be able to run things.

> > So the textual scores need to be aligned with the rating version.

> I'm having some reservations on this now. I have seen a couple of huge
> error rates (what I'm accustomed to thinking of as absolutely dreadful
> play overall) be rightfully described as Casual Player by GNU, and
> still get a very high rating. I'm not sure I want to start finding
> those huge error rates be called World Class or Expert as this would
> be way off IMO. Right now I'm inclined to leave the textual ratings
> untouched.

As far as I know, the previous error-based classification was not based
on facts; the current one is. What reason do you have to believe in the
relation between level and error rate that you are used to, which you
describe by the word "rightfully", besides the fact that you have
gotten used to it?

> Don't get me wrong, I am very appreciative of your efforts, but right
> now there are problems that need resolving IMHO.

I think you'll have to be more specific to create a meaningful
argument. If you are right, there must be something wrong with the
results of the simulations I presented on
www.cs.ubc.ca/~kvdoel/tmp/ratings.

I'm open to that possibility, but if there is nothing wrong with it,
then your accustomed way of thinking is wrong.

Anyway, I think the best thing to do now is to try to validate or refute
the current rating estimator by estimating the ratings of players and
comparing them with their actual ratings.
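
As a rough illustration of such a check, here is a small Python sketch
using only the six rows for other players listed earlier in this
message. It treats the "+-" figure as a symmetric half-width around the
estimate and compares the numbers exactly as listed (no offset
conversion applied); both are assumptions of the sketch, and the names
are made up for the example.

```python
from statistics import mean

# (sample count, GNUBG estimate, +- half-width, actual rating),
# copied from the table in this message.
rows = [
    (6, 2033, 30, 2070),
    (7, 1963, 67, 2000),
    (8, 1795, 107, 1870),
    (6, 1882, 134, 1960),
    (11, 1805, 125, 1900),
    (10, 1756, 83, 1900),
]


def compare(rows):
    """Return (count, actual - estimate, inside interval?) for each row."""
    results = []
    for count, estimate, half_width, actual in rows:
        diff = actual - estimate
        results.append((count, diff, abs(diff) <= half_width))
    return results


results = compare(rows)
for count, diff, inside in results:
    print(f"n={count:2d}  actual - estimate = {diff:+4d}  inside interval: {inside}")

# Average signed difference and number of rows whose interval
# contains the actual rating.
bias = mean(diff for _, diff, _ in results)
covered = sum(1 for _, _, inside in results if inside)
print(f"mean difference: {bias:.1f}, inside interval: {covered} of {len(results)}")
```

With so few rows this says little by itself; the same comparison over a
larger collection of matches is what would actually test the estimator.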


Kees




