Re: Murat's first mutant cube script experiment and preliminary results
From: MK
Subject: Re: Murat's first mutant cube script experiment and preliminary results
Date: Sun, 28 Jan 2024 22:04:50 -0700
User-agent: Mozilla Thunderbird
On 1/28/2024 8:35 PM, Mark Higgins wrote:
Hi Mark,
Thanks for taking an interest and commenting on the subject.
I'm not going to negate anything you said but will reply
by further explaining my experiments. I have a feeling
that once you understand better, you will be able to
contribute a lot more ideas.
> I think one problem with your current test is that it
> does way too much (random) doubling. So you get huge
> total scores coming out, which adds loads of noise to
> the results and makes it hard to be confident what it
> means.
As I said previously, I was first going to replicate an
experiment done in RGB a few years ago, upon my urging,
by a mathematician, in which the mutant doubled at > 50%,
took at > 0% and never dropped.
One of the many threads in RGB you may want to read on
this is:
https://groups.google.com/g/rec.games.backgammon/c/k61QtBwlsBk/m/EGa4NXdmAgAJ
In which he had said:
"To summarize: Like for the expected value of a single
"game (as shown previously), we have a Petersburg
"Paradox occuring for the lead of GNU Backgammon in a
"session of such games, so the expected value of this
"lead does not exist (base of the exponential term > 1
"for the math people, "oscillations too wild" for the
"non-math people).
I have the script for that and will do that experiment later.
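For reference, the cube rules of that earlier mutant can be written down in a few lines. This is only a minimal sketch: the function names are hypothetical, and it assumes the percentages refer to the mutant's estimated winning chances, which the description above does not spell out. It is not the script mentioned above.

```python
# Sketch of the earlier RGB mutant's cube rules (hypothetical names).
# Assumption: the thresholds apply to the mutant's estimated chance
# of winning the game, expressed as a probability in [0, 1].

def mutant_doubles(win_prob: float) -> bool:
    """Offer the cube whenever winning chances exceed 50%."""
    return win_prob > 0.5


def mutant_takes(win_prob: float) -> bool:
    """Accept any double while winning chances exceed 0%,
    which in practice means the mutant never drops."""
    return win_prob > 0.0


if __name__ == "__main__":
    for p in (0.05, 0.50, 0.51, 0.90):
        print(p, mutant_doubles(p), mutant_takes(p))
```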
But first I wanted to do an even worse mutant experiment on
purpose, causing extremely high, unlimited cube values. Yes,
in this case it will take a lot more trials to derive any
meaning out of the experiment but I think it's useful as the
lowest starting point.
> A simpler test might be to compare gnubg's best strategy
> against a "dumber" doubling strategy that, say, never
> offers the cube, and always takes.
Well, this is an idea that may be worth trying. It will
force the games to be played out but it won't be as useful
for the effort of debunking the "cube skill theory", which
claims among other things that beavers and raccoons require
even more skill than simple doubles/takes.
BTW: I'm not saying that there is no cube skill at all but
that it's way too exaggerated. I'm arguing that early in the
game, cubeful equities are so inaccurate that cube skill is
pretty much non-existent. It becomes really decisive mostly
towards the final moves of the game.
> If we assume gnubg is "perfect", then anytime the dumber
> strategy takes when it should be a pass, it'll lose
> expected value, in the amount of the equity error. ....
> so we'd expect that the dumber strategy would lose, on
> average, about 0.03 cents per game.
One of the goals of my experiments is to show that even the
"dumbest" cube mutant will win more than what would be
expected from its error rate. This is to show that equity
and error calculations are inaccurate by an unknown amount,
at least somewhat but maybe way beyond belief.
I have a less dumb mutant experiment and then one mutant cube
strategy that I have concocted, which I believe will not only
win more than expected from its error rate but actually win
more than GnuBG World Class.
I want to do these experiments in order from worst to best.
> The standard deviation of score in a regular backgammon
> money game is something like 1.3, IIRC; so the statistical
> measurement error on the average is around 1.3 / sqrt(N),
> where N is the number of games you play. If you want that
> to be, say, 0.006 (5x smaller than the 0.03 signal we're
> trying to find), then N would be about 50k games.
These are great comments but to be honest, I'm struggling to
understand how they apply to what I'm trying to do. If we
keep communicating, I may come to fully appreciate them.
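The sample-size estimate quoted above is easy to check numerically. The snippet below is only a sketch: the function names are hypothetical, the 1.3 standard deviation is the figure recalled in the quote, and it assumes game results are independent, which may not hold once cube values become very large.

```python
import math

# Assumption: per-game scores are independent with a fixed standard
# deviation, so the error on the average shrinks as sd / sqrt(N).

def standard_error(score_sd: float, n_games: int) -> float:
    """Statistical error on the average score after n_games."""
    return score_sd / math.sqrt(n_games)


def games_needed(score_sd: float, target_error: float) -> int:
    """Smallest N for which sd / sqrt(N) is at most target_error."""
    return math.ceil((score_sd / target_error) ** 2)


if __name__ == "__main__":
    n = games_needed(1.3, 0.006)  # figures taken from the quote
    print(n, standard_error(1.3, n))
```

With the quoted figures this comes out a little under 47,000 games, in line with the "about 50k" estimate. A mutant that produces much larger cube values would have a larger standard deviation and so would need correspondingly more games.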
> So you could run that and see whether the dumb strategy does,
> in fact, lose in head to head play against the standard; or
> whether it's about even, and all this fancy cube stuff is nonsense.
Whether they lose by a lot, or by a little less, or come out
even, or, BG gods forbid, come out on top, I'm hoping that all
my mutant cube strategies will poke holes of different sizes in
the current so-called "cube skill theory" (how anyone dares
call it a "theory" is another question). Measuring the size
of the holes will come later and perhaps will be done better
by others than myself. I just want to at least provide the
data.
The only "best/perfect cube strategy" I will accept will come
from training the bots through cubeful and "matchful" (a term
I coined) self-play, instead of extrapolating cubeful equities
by applying "untested" formulas to cubeless equities and
extrapolating matchful equities by applying METs to cubeless
equities...
I only want better bots. But to create a need for them, I must
first try to destroy the mediocre offspring of TD-Gammon v2.
(TD-Gammon v1 was okay. It became human-biased later).
MK