Re: Murat's first mutant cube script experiment and preliminary results
From: MK
Subject: Re: Murat's first mutant cube script experiment and preliminary results
Date: Wed, 31 Jan 2024 11:28:52 -0700
User-agent: Mozilla Thunderbird

Hello again Mark,
I already have a script for my second experiment, but I still
want to run the first one for 50,000 to 100,000 games just to
see what the results will be.
I improved my scripts to keep track of the error rates and
the number of cube decisions, so that I can make more sense
of the results and do better calculations.
For that, I would like to confirm that I understand your
previous comments below correctly, if you don't mind clarifying.
On 1/28/2024 8:35 PM, Mark Higgins wrote:
> Make that happens once every three games, and maybe the
> average error size is 0.1, so we'd expect that the dumber
> strategy would lose, on average, about 0.03 cents per game.
By "0.03 cents" above, did you mean "0.03 percent"? I don't
understand equities, points lost, etc. as well as I understand
winning chances and results in percentages. Could you reword
your comments with that in mind?
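Assuming the 0.1 error size is in points of equity (the unit bots normally use for money-game errors), the product 1/3 x 0.1 = 0.033 is also in points per game; at a stake of $1 per point that is about 3.3 cents per game. A minimal sketch of that arithmetic follows, plus a conversion to a win rate under the loudly simplifying assumption that every game is won or lost for exactly one point (no gammons, backgammons or cube), so that equity = 2p - 1. The function names are illustrative only.

```python
def expected_loss_per_game(errors_per_game: float, avg_error_size: float) -> float:
    """Equity (in points) given away per game: how often the strategy
    errs, times how large each error is on average."""
    return errors_per_game * avg_error_size


def equity_to_win_prob(equity: float) -> float:
    """Convert a per-game equity to a win probability, ASSUMING every game
    is worth exactly +1 or -1 point (no gammons, backgammons or cube),
    so that equity = p*(+1) + (1 - p)*(-1) = 2p - 1."""
    return (equity + 1.0) / 2.0


# Mark's example: one error every three games, 0.1 points each.
loss = expected_loss_per_game(1 / 3, 0.1)
win_rate = equity_to_win_prob(-loss)
print(f"{loss:.4f} points/game, win rate {win_rate:.2%}")
```

This prints a loss of 0.0333 points per game and a win rate of about 48.3%, i.e. under that assumption an equity deficit of 0.033 corresponds to roughly 1.7 percentage points of winning chances, not 0.03 percent.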
> How many games do you need to simulate such that the
> statistical measurement error on the average score is much
> less than 0.03? The standard deviation of score in a regular
> backgammon money game is something like 1.3, IIRC;
Is this the generally accepted "imperfectness" of bots? For
well over 10 years, I have asked people how much "imperfectness"
they actually mean by that word, but I never got any answers.
It must mean "nearly perfect", i.e. very close to 100%, but
what is the actual magic number? 1%, 5%, 10%, 25%, 49.9%??
At what point does the cube skill stop being nearly perfect
and become garbage?
Can you clarify what your 1.3 above means in this context?
> the statistical measurement error on the average is around
> 1.3 / sqrt(N), where N is the number of games you play. If
> you want that to be, say, 0.006 (5x smaller than the 0.03
> signal we're trying to find), when N would be about 50k games.
> So you could run that and see whether the dumb strategy
> does in fact, lose in head to head play against the standard;
> or whether it's about even, and all this fancy cube stuff is
> nonsense.
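In the usual statistical sense, the 1.3 is the standard deviation of the result of a single money game: one game's score (+/-1, +/-2 for a gammon, +/-3 for a backgammon, times the cube value) typically lands about 1.3 points away from the average. Averaging N independent games shrinks that spread to 1.3/sqrt(N), so the 50k figure comes from (1.3 / 0.006)^2, which is about 46,900. The sketch below reproduces that arithmetic and also shows how one could measure the SD from one's own per-game results instead of relying on the remembered 1.3; the function names are illustrative, not from any bot.

```python
import math
import statistics
from typing import Sequence, Tuple


def standard_error(sd_per_game: float, n_games: int) -> float:
    """Standard error of the average score over n_games independent games."""
    return sd_per_game / math.sqrt(n_games)


def games_needed(sd_per_game: float, target_se: float) -> int:
    """Smallest N for which sd_per_game / sqrt(N) <= target_se."""
    return math.ceil((sd_per_game / target_se) ** 2)


def summarize(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Mean, per-game SD and standard error of per-game results
    (points won positive, points lost negative, already times the cube)."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return mean, sd, sd / math.sqrt(len(scores))


SD = 1.3               # Mark's recollection of the per-game SD, in points
SIGNAL = 0.03          # expected loss per game we want to detect, in points
target = SIGNAL / 5    # 0.006, five times smaller than the signal

print(games_needed(SD, target))               # just under 47,000 games
print(round(standard_error(SD, 50_000), 4))   # about 0.0058 after 50k games
```

Running 100,000 games instead would shrink the standard error by a further factor of sqrt(2), to roughly 0.0041.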
In this particular experiment, the mutant strategy will most
likely lose big. My next mutant will probably still lose, but
by less, and the mutant after that may come out even. In all
of my experiments, what I will be looking at is how much the
mutant strategies lose or win compared to what is expected
from their cube error rates, rather than whether they win more
than 50%, because that will be enough to show that the bots'
equity and error calculations are wrong.
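Comparing a mutant's actual average result with the one predicted from its cube error rates can be framed as a simple z-test: the gap between the two, divided by the standard error of the observed average. A minimal sketch, assuming independent games; the numbers are hypothetical placeholders, not results, and the per-game SD would ideally be measured from the session itself, since a different cube strategy can change the variance.

```python
import math


def z_score(observed_mean: float, expected_mean: float,
            sd_per_game: float, n_games: int) -> float:
    """Number of standard errors between the mutant's observed average
    result per game and the average predicted from its cube error rate."""
    se = sd_per_game / math.sqrt(n_games)
    return (observed_mean - expected_mean) / se


# Hypothetical numbers, NOT experimental results: the error rates predict
# -0.033 points/game, and the session shows -0.010 over 50,000 games.
z = z_score(observed_mean=-0.010, expected_mean=-0.033,
            sd_per_game=1.3, n_games=50_000)
print(f"z = {z:.2f}")
# |z| above about 2 (roughly the 5% two-sided level) would suggest the gap
# is unlikely to be random noise alone.
```

With these made-up inputs the gap is roughly 4 standard errors; a gap of well under 2 would be indistinguishable from luck at this sample size.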
XG does calculate a "should have won" number (if you are
familiar with it). In order to calculate the same, I am
extracting from the "match info" the "total-cube" (i.e. the
number of actual cube actions), "error-skill" and "error-cost"
(i.e. the normalized and un-normalized cube errors per cube
decision). With those, I assume I will be able to calculate
the total amount of cube error for the entire 50,000 games(?)
At this point I don't understand the difference well enough
to know whether I need to use the normalized or un-normalized
error rate, but I hope I will figure it out while the session
is running. Any help with this will be appreciated.
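If those fields are per-decision averages, as described above, the total is simply the per-decision error times the number of decisions, summed over sessions. On normalized vs un-normalized, an assumption that should be checked against the GNU Backgammon documentation: in money play, normalized errors are usually expressed as if the cube were at 1, while un-normalized errors are multiplied by the actual cube value. If so, the un-normalized figure is the natural one to set against actual results, which are also multiplied by the cube. The record type and field names below are hypothetical, mirroring the description above rather than any documented API, and the prediction assumes both sides share the same checker play so that only cube errors differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CubeStats:
    """Hypothetical record of one player's figures from "match info";
    names mirror the email's description, not a documented gnubg API."""
    total_cube: int     # number of cube decisions
    error_skill: float  # normalized cube error per decision (points)
    error_cost: float   # un-normalized cube error per decision (points)


def total_cube_error(sessions: Iterable[CubeStats], normalized: bool) -> float:
    """Total cube error over all sessions: per-decision error x decisions."""
    return sum(s.total_cube * (s.error_skill if normalized else s.error_cost)
               for s in sessions)


def predicted_result_per_game(mutant_total: float, standard_total: float,
                              n_games: int) -> float:
    """Mutant's expected points per game against the standard strategy,
    IF the bot's error figures are accurate (the very thing being tested)."""
    return (standard_total - mutant_total) / n_games
```

For example, with made-up totals of 1,700 points of cube error for the mutant and 50 for the standard over 50,000 games, the prediction is (50 - 1700) / 50000 = -0.033 points per game for the mutant.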
Also, I have no idea how to calculate a "should have won"
number like the one in XG. Can you or someone else here help
me with that, by any chance?
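XG's exact "should have won" formula is not stated here and is not confirmed by this sketch. A closely related idea is the luck-adjusted result that GNU Backgammon, for instance, reports: each roll's luck is the equity after the best play with the roll actually thrown, minus the average of that over all 36 possible rolls, and the net luck is subtracted from the actual result. The evaluator below is hypothetical (a real one would come from the bot), and all luck values are assumed to be already scaled by the cube value in play; this illustrates the idea, not XG's or gnubg's actual implementation.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Roll = Tuple[int, int]
# Hypothetical evaluator: equity for the player to move (per unit cube)
# after that player's best play with the given roll. Not a real gnubg/XG call.
Evaluator = Callable[[object, Roll], float]

# 36 equally likely ordered rolls, so doubles count once and others twice.
ALL_ROLLS = list(product(range(1, 7), repeat=2))


def roll_luck(position: object, roll: Roll, best_equity: Evaluator) -> float:
    """Luck of one roll: equity after the roll actually thrown minus the
    average equity over all 36 rolls that could have been thrown."""
    average = sum(best_equity(position, r) for r in ALL_ROLLS) / len(ALL_ROLLS)
    return best_equity(position, roll) - average


def luck_adjusted_result(actual_points: float, my_luck: Sequence[float],
                         opponent_luck: Sequence[float]) -> float:
    """Actual result minus net luck (own luck minus opponent's luck),
    with all luck values already scaled by the cube value in play."""
    return actual_points - (sum(my_luck) - sum(opponent_luck))
```

Summed over a whole session, the luck-adjusted results estimate what each side "should" have won had the dice been average, which is the kind of number the XG figure appears to aim at.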
MK