[Bug-gnubg] Re: ratings formula, checker play vs. cube

From:	kvandoel
Subject:	[Bug-gnubg] Re: ratings formula, checker play vs. cube
Date:	Sun, 14 Sep 2003 02:46:04 +0200 (CEST)

Sat, 13 Sep 2003 18:47:05, Robert-Jan Veldhuizen:

>>I think you are still missing the point that the contributions of
>>checker play  errors and  cube play errors  are converted  to equivalent
>>rating  loss  INDEPENDENTLY.

>No, you miss my point. The above is obvious.

It's not so obvious, and is a very recent feature.

>Let me repeat. The problem is,  cube errors are input into your formula
>AFTER  being   divided  by  what  GNUBG  considers   "actual  or  close
>decisions". As I explained, that is often no good for match play and as
>such this gives inaccurate ratings. Read my other posts to see why.

One argument you brought forward for this was that, since checker
errors are divided by total unforced moves, cube errors should likewise
be divided by all cube decisions, for "consistency" or to make them
"easier to compare". This argument makes no sense, since cube errors
and checker errors contribute independently to the rating estimate. Do
we agree then that this is not a valid argument?
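
To make the independence concrete, here is a minimal sketch in Python.
The linear form and the weights K_CHECKER and K_CUBE are made-up
placeholders, not the formula GNUBG actually uses, and the match numbers
are invented. The only point is that the two terms are computed
separately, so the choice of cube denominator cannot touch the checker
term.

```python
# Hypothetical weights, for illustration only (not GNUBG's actual values).
K_CHECKER = 1000.0
K_CUBE = 500.0


def checker_term(checker_error, unforced_moves):
    """Rating loss attributed to checker play: error per unforced move."""
    if unforced_moves == 0:
        return 0.0
    return K_CHECKER * checker_error / unforced_moves


def cube_term(cube_error, cube_decisions):
    """Rating loss attributed to cube play: error per counted cube decision."""
    if cube_decisions == 0:
        return 0.0
    return K_CUBE * cube_error / cube_decisions


def rating_loss(checker_error, unforced_moves, cube_error, cube_decisions):
    """Total estimated rating loss as the sum of two independent terms."""
    return (checker_term(checker_error, unforced_moves)
            + cube_term(cube_error, cube_decisions))


# Same match, two choices of cube denominator (made-up numbers).
with_actual = rating_loss(0.60, 120, 0.10, 8)   # actual or close decisions
with_total = rating_loss(0.60, 120, 0.10, 40)   # all cube decisions

# The whole difference between the two estimates comes from the cube term.
cube_only_diff = cube_term(0.10, 8) - cube_term(0.10, 40)
```

Whatever denominator is used for the cube, the checker term stays the
same, so "consistency" with the checker normalisation buys nothing.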

>Dividing  the total cube  error by  all cube  decisions gives  a better
>estimate of someone's  cube skill. Try it if you  don't believe me, I'd
>say. I've analysed about 400 matches and I'd say GNUBG's current method
>is worse than the old method of counting all cube decisions.

I don't understand this argument. You've analysed 400 matches and looked
at   the    cube   error   numbers.    Then   you   somehow    say   the
cubeError/totalCubeDecisions is  a "better"  measure of your  cube skill
than cubeError/actualCubeDecisions. How do you make  this judgement?

I can say I've analysed over 100,000 matches in the last month and I
think it's better to use the current cubeError/actualCubeDecisions.
That doesn't prove anything.

Your other arguments make sense, but they don't smell right to me.
Let's address them one by one and see if I can poke holes in them.

1)

>One obvious, and not that rare example is when both players are 2-away.
>With winning chances for the  player on roll being anywhere between 30%
>and 70%,  it is almost  always correct to  double, but some  times it's
>optional (no market losers f.i.), and often it's a very close decision.
>If a player  doubles immediately on his first  opportunity, he's making
>no error,  but it counts  for one actual  (and close) decision  per the
>current scheme. If both players  somehow delay doubling (which can have
>its advantages), every  turn will count as a  close cube decision.  So,
>one could  have two similar matches  where 2-away both  is reached, one
>where a  player (correctly) doubles  at the first opportunity,  and one
>where f.i.  the cube (again  correctly) gets turned after 16 moves. The
>latter will  give both players  a much lower  cube error rate  with the
>current scheme, because 8 decisions are added for each player.

If both players somehow delay doubling, this will increase the
denominator and reduce the cube_error, ASSUMING no further mistakes are
made. But if a player delayed, became a gammon favourite and still
doubled (incorrectly), or became an underdog, didn't realize this and
still doubled, then the error rate would go up. So it's not as you
suggest, that by delaying you can just manipulate the cube error with no
risk.

If you delay, you need more skill to avoid error, so the effect you
object to, that the cube_error gets smaller, is correct.
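
A small worked example of this trade-off, as a Python sketch. All error
sizes are invented for illustration, and error_rate is a hypothetical
helper, not a GNUBG function. The "8 extra decisions" figure is taken
from your example.

```python
def error_rate(errors):
    """Cube error per counted decision: total error / number of decisions."""
    if not errors:
        return 0.0
    return sum(errors) / len(errors)


# Earlier in the match: 4 actual or close decisions, one of them a 0.10 error.
prior = [0.10, 0.0, 0.0, 0.0]

# Case A: double correctly at the first opportunity (1 more decision).
immediate = prior + [0.0]

# Case B: delay 8 turns and get every one of them right.
delayed_clean = prior + [0.0] * 8

# Case C: delay 8 turns, but misjudge the last one (a 0.15 error).
delayed_blunder = prior + [0.0] * 7 + [0.15]

rate_immediate = error_rate(immediate)
rate_clean = error_rate(delayed_clean)
rate_blunder = error_rate(delayed_blunder)
```

With these numbers the clean delay ends up below the immediate double,
and the delay with one misjudgement ends up above it. The lower rate is
only available to the player who gets all the extra decisions right.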

2)

>If it  somehow happens that someone  plays on for the  gammon at 2-away
>both, that  is in theory  a bad plan  (double/take and you just  need a
>single win). However  GNUBG will now "reward" you  for this strategy by
>adding (almost) all  moves as actual or close  cube decisions, and this
>can instantly  add like 25  decisions for BOTH  players. In a 5  or 7pt
>match, that is  often more than all other decisions  in the match added
>up.  This certainly does not  reflect the skill involved, and is mostly
>an anomaly IMO.

Why? I often delay doubling against weaker players, as they usually
know to take after I open with 3-1 and double on the next move, but
after reaching a somewhat more complicated position they need their
judgement and are more likely to make a mistake.

This is then correctly reflected by counting every move as a cube
decision.

3)

>Another example is  when one player gets closed out  and is very likely
>to get gammoned  when he loses; (re)doubling then is  often just a tiny
>error, and  this might repeat for  some turns as the  other side brings
>the checkers around and bears off. Again GNUBG will add relatively many
>to the  number of  actual or  close cube decisions,  while it  does not
>really  reflect any  more  skill involved.  As  a result  this kind  of
>situation will drop one's cube  error rate by the current scheme, which
>does not seem to reflect cube skill very well.

I think it does reflect cube skill, similar to the other examples. You
say "doubling then is often just a tiny error", but when it is not, it
is a big error. The fact that you don't make this big error means you
have cube skill, which is reflected in this.

All these  examples share the same  idea: by not forcing  a decision now
you  can lower  the effect  of previous  cube errors  by  increasing the
denominator, at the  risk of making additional cube  errors. That is, if
you succeed in lowering your  previous cube errors, you must have enough
skill so the lowering is actually correct.

Now why do you think the "old" way, measuring cube errors by
cubeError/totalCubeDecisions, is better? In all your examples the
effect of increasing the denominator with sequences of not doubling when
you could is still there, but I argue that's OK, you NEED this effect as
it involves skill. But in addition to this, with the old method the
denominator also increases when, for example, you get doubled early,
take and remain a huge underdog for the remainder of the game. Every one
of those no-brainer "don't redouble" decisions then dilutes your cube
errors, which is NOT correct.
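
The dilution can be sketched in Python as follows. The (error,
is_actual_or_close) pairs are invented, and how GNUBG decides whether a
decision is "actual or close" is not modelled here; the flag is simply
taken as given. Both function names are hypothetical.

```python
def rate_all_decisions(decisions):
    """Old method: total cube error / all cube decisions."""
    if not decisions:
        return 0.0
    total_error = sum(err for err, _ in decisions)
    return total_error / len(decisions)


def rate_actual_decisions(decisions):
    """Current method: total cube error / actual or close decisions only."""
    total_error = sum(err for err, _ in decisions)
    counted = sum(1 for _, is_actual in decisions if is_actual)
    if counted == 0:
        return 0.0
    return total_error / counted


# Doubled early, a slightly wrong take (0.05), then 20 turns as a huge
# underdog where "don't redouble" is a no-brainer.
game = [(0.05, True)] + [(0.0, False)] * 20

old_rate = rate_all_decisions(game)      # 0.05 spread over 21 decisions
new_rate = rate_actual_decisions(game)   # 0.05 over the 1 real decision
```

Under the old method the take error is spread over 21 decisions, 20 of
which required no skill at all. Under the current method it stays at
0.05 per real decision.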

To summarize, I  argue that your examples show the  opposite of what you
intended.


Kees
