bug-gnubg

Re: Murat's first mutant cube script experiment and preliminary results


From: Mark Higgins
Subject: Re: Murat's first mutant cube script experiment and preliminary results
Date: Sun, 28 Jan 2024 22:35:27 -0500

I think one problem with your current test is that it does way too much (random) doubling. So you get huge total scores coming out, which adds loads of noise to the results and makes it hard to be confident what they mean.

A simpler test might be to compare gnubg's best strategy against a "dumber" doubling strategy that, say, never offers the cube, and always takes. In that world, the cube is never larger than 2, so it fixes the noise from randomly humongous cube values.

If you do that, then you might wonder how many games those two strategies need to play against each other to see a difference. First, you'd need to estimate the size of the signal we're trying to find. If we assume gnubg is "perfect", then anytime the dumber strategy takes when it should be a pass, it'll lose expected value, in the amount of the equity error. Maybe that happens once every three games, and maybe the average error size is 0.1, so we'd expect that the dumber strategy would lose, on average, about 0.03 points per game.

How many games do you need to simulate such that the statistical measurement error on the average score is much less than 0.03? The standard deviation of the score in a regular backgammon money game is something like 1.3, IIRC; so the statistical measurement error on the average is around 1.3 / sqrt(N), where N is the number of games you play. If you want that to be, say, 0.006 (5x smaller than the 0.03 signal we're trying to find), then N would be about 50k games.

So that tells you how many games you need to play in your simulation to see if there's a measurable difference: around 50k. Maybe my numbers are wrong by a factor of 2 here or there, but that's the idea.
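
As a rough sketch of that arithmetic in Python (the function name games_needed is only illustrative, and the 1.3 standard deviation, the one-in-three error rate and the 0.1 average error are the guesses from the estimate above, not measured values):

```python
import math

def games_needed(score_sd, target_se):
    """Smallest N such that score_sd / sqrt(N) <= target_se."""
    return math.ceil((score_sd / target_se) ** 2)

# Assumed inputs, taken from the rough estimate above.
error_rate = 1 / 3        # one wrong take every three games (a guess)
avg_error = 0.1           # average equity lost per wrong take (a guess)
signal = error_rate * avg_error   # about 0.03 points per game

score_sd = 1.3            # per-game standard deviation, quoted from memory
target_se = 0.006         # roughly 5x smaller than the signal

n = games_needed(score_sd, target_se)
print(f"signal ~ {signal:.3f} points/game, games needed ~ {n}")
```

With these inputs N comes out a little under 47,000, which is where the "around 50k" figure comes from. Halving the target error quadruples N, so the answer is quite sensitive to how small a difference you want to resolve.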

So you could run that and see whether the dumb strategy does, in fact, lose in head to head play against the standard; or whether it's about even, and all this fancy cube stuff is nonsense.
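
A minimal sketch of how one might read the result afterwards, assuming you have collected a list of per-game scores from the standard strategy's point of view (positive when it wins points, negative when it loses). The function name summarize_scores and the use of a plain z-score are assumptions for illustration, not something gnubg provides:

```python
import math
import statistics

def summarize_scores(scores):
    """Return (mean, standard error, z-score) for a list of per-game scores.

    The z-score is the mean divided by its standard error, i.e. how many
    standard errors the average result is away from zero (an even match).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two games")
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)          # sample standard deviation
    if sd == 0:
        raise ValueError("all scores are identical; z-score is undefined")
    se = sd / math.sqrt(n)                 # standard error of the mean
    return mean, se, mean / se

# Placeholder scores purely to show the call; these are not real results.
example_scores = [1, -1, 2, -1, 1, 2]
mean, se, z = summarize_scores(example_scores)
print(f"mean={mean:.3f} se={se:.3f} z={z:.2f}")
```

If the mean is several standard errors above zero, the dumb strategy is measurably losing; if it sits within a standard error or two of zero, the run can't distinguish the two strategies at that sample size.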




On Sun, Jan 28, 2024 at 10:14 PM MK <playbg-rgb@yahoo.com> wrote:
On 1/28/2024 4:29 PM, Joseph Heled wrote:

> I hope you realize you will need hundreds of thousands
> of games, millions maybe, to get statistical significance.

Okay, well, let's try to have a rational conversation
about this. (I won't quote the entire previous post.)

How many hundreds of thousands or millions of games
have you guys run to determine that the current cube
strategy adopted by all bots and humans (except me),
is indeed the best strategy?

None. Zero. You all believe it "myth-ematically".
(Ha! Aren't I the "monster punster"..? :)

BTW: It's never too late and there is a fairly easy
way of doing this, and won't take long if done in a
spot checking manner for starters. Let me know if
you guys would want to do it and dare the outcome.

Still, I don't mind your asking from me something that
you all haven't asked from yourselves. I'll be glad to
run a hundred thousand games for starters (and more if
needed later). But before I do that, we need to agree
on certain things.

1- Will you trust my results? Probably not. Nobody in
the past openly said that they trusted the results of
my previous experiments. That's why I'm always urging
you guys to run your own tests and sharing my scripts.

2- You and preferably a "statistically significant" ;)
number of credible members of the BG community have to
commit beforehand to what kind of results will convince
you all that the current "cube skill theory" is bogus.

If such a self-destructive mutant cube strategy in this
script wins 5% against GnuBG World-Class, for example,
will it be enough to convince you all? Maybe 10%? More?
You have to name it and commit to it before I start.

I have older spare CPUs that I can run 24/7 for this.
But alternatively and preferably for more reasons than
one, several of you can run shorter sessions concurrently
and accumulate several hundreds of thousands, if not
millions, of games in a matter of a few days, with your
own results that you can trust.

How about it guys...??

BTW: this is not the only mutant cube experiment I have in
mind but let's start with this one from the very bottom.

MK


