bug-gnubg

Cubeful and matchful training a BG bot


From: Murat K
Subject: Cubeful and matchful training a BG bot
Date: Wed, 10 Apr 2024 15:48:04 -0600
User-agent: Mozilla Thunderbird

Hi Ian,

Since this subject has strayed from the value of cube
ownership, I'm starting a new thread for it and also
posting it to RGB and Bgonline. My response to you is
below the quoted posts.

> ---------------------------------------------------------
> *From:* MK <playbg-rgb@yahoo.com>
> *Sent:* Wednesday, April 3, 2024 10:01:17 PM
> *To:* Ian Shaw <Ian.Shaw@riverauto.co.uk>; GnuBg Bug <bug-gnubg@gnu.org>
> *Subject:* Re: Interesting question/experiment about value of cube ownership
> On 4/2/2024 5:13 AM, Ian Shaw wrote:
>
>> What would be your proposed structure for training a
>> cubeful bot? What gains and obstacles do you foresee.
>
> I don't know what you mean by "structure". What I propose
> is doing the same thing done training TD-Gammon v.1, i.e.
> random self-play, but this time also cubeful and matchful,
> i.e. random cube as well as checker decisions.
>
> Apparently Tesauro still works at IBM with access to huge
> CPU power. Perhaps he can be put to shame for the damage
> he caused to BG AI by what he did with TD-Gammon v.2 and
> be urged to redeem himself.
>
> In other forums, people talk about doing "XG rollouts on
> Amazon's cloud servers", etc. Doing more biased rollouts
> is plain stupid/illogical. Any such efforts would be put
> to better use in training a new bot instead. The question
> is who would volunteer to do it.
>
> People like the Alpha-Zero team, etc. don't seem to want
> to touch "gamblegammon" with a ten-foot pole, possibly
> because of the gambling nature of the game.
>
> In the past, I have suggested in RGB that a random rollout
> feature could be added to GnuBG and results from trusted
> users could be collected over time in a central database
> to gradually create a bot that won't rely on concocted,
> biased/inaccurate cube formulas and match equity tables.
>
> Unfortunately the faithful are happy with their dogmas
> and no better bots are likely in the near future... :(

> ----------------------------------------------------------

On 4/3/2024 11:44 PM, Ian Shaw wrote:
> MK: What I propose is doing the same thing done training
> TD-Gammon v.1, i.e. random self-play, but this time also
> cubeful and matchful, i.e. random cube as well as checker
> decisions.
>
> As I remember it (though it's many years since I read the
> research), the self-play wasn't accomplished by picking
> random moves. It was the initial network weights that were
> random. The move picked was the best-ranked move of all
> the evaluated moves. This is a calculation, not a random
> selection.
>
> How do you propose to rank double vs no double, and take
> vs pass?

> ----------------------------------------------------------

I didn't say the selection was random. The self-play moves
were random. There were no "calculations" either. Moves were
compared and better performing ones rose up in rank. It was
kind of a "bubble sorting" of large numbers of statistical
data. I remember that Tom Keith had used the expression
"percolating up" in describing how he trained a Hypergammon
bot through cubeless random self-play. It's the only way
(using "empirical data and scientific method") to train a
"non-human-biased" BG bot, or at least one with as little
human bias as is technically possible.
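
To make the "percolating up" idea concrete, here is a minimal
toy sketch in Python. It is not backgammon, and it is not how
TD-Gammon or Tom Keith's bot were actually implemented. The
move names, the hidden win probabilities and the function
rank_by_random_play are all made up for illustration. Moves
are picked uniformly at random, results are tallied, and the
moves are then ranked by observed win rate.

```python
import random
from collections import defaultdict

def rank_by_random_play(hidden_win_prob, trials, seed=0):
    """Pick moves at random, tally wins, rank by observed win rate.

    hidden_win_prob is a hypothetical stand-in for "what happens when
    the game is played out"; the ranking step never reads it directly.
    """
    rng = random.Random(seed)
    plays = defaultdict(int)
    wins = defaultdict(int)
    moves = list(hidden_win_prob)
    for _ in range(trials):
        move = rng.choice(moves)              # random choice, no evaluation
        plays[move] += 1
        if rng.random() < hidden_win_prob[move]:
            wins[move] += 1                   # simulated game result
    # Moves with better observed results rise to the top.
    return sorted(moves, key=lambda m: wins[m] / max(plays[m], 1), reverse=True)

# Assumed toy numbers, not real backgammon statistics.
TOY_MOVES = {"A": 0.55, "B": 0.50, "C": 0.40}
ranking = rank_by_random_play(TOY_MOVES, trials=30000)
print(ranking)
```

The ranking comes only from the accumulated win/loss counts,
so more trials are needed the closer two moves are in strength.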

To answer your last question, just like checker decisions,
cube decisions to double, take, pass, etc. would be random
also and the "correct" cube decisions would "bubble up" the
same way. It will take huge amounts of computing power and
time, but nowadays we have both.
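
A rough sketch of what random cube decisions could look like,
again as a toy in Python and under loud assumptions: a single
position with a fixed chance p_win of winning for the player
on roll, one cube action only, no redoubles, no gammons, and
50/50 random choices for both double/no double and take/pass.
The function random_cube_trials and its parameters are
hypothetical and are not part of GnuBG.

```python
import random
from collections import defaultdict

def random_cube_trials(p_win, trials, seed=0):
    """Average points per decision when every cube action is random.

    Points for "double"/"no double" are from the doubler's side;
    points for "take"/"pass" are from the responder's side.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)

    def record(decision, points):
        totals[decision] += points
        counts[decision] += 1

    for _ in range(trials):
        if rng.random() < 0.5:
            # No double: the game is played out for 1 point.
            record("no double", 1 if rng.random() < p_win else -1)
        elif rng.random() < 0.5:
            # Double and pass: doubler wins 1 point immediately.
            record("double", 1)
            record("pass", -1)
        else:
            # Double and take: the game is played out for 2 points.
            points = 2 if rng.random() < p_win else -2
            record("double", points)
            record("take", -points)
    return {d: totals[d] / counts[d] for d in counts}

strong = random_cube_trials(p_win=0.80, trials=40000)
modest = random_cube_trials(p_win=0.60, trials=40000)
print(strong)
print(modest)
```

In this toy, passing scores better than taking at p_win=0.80
and taking scores better at p_win=0.60. Note that the average
for "double" is taken against a responder who takes or passes
at random, so it depends on that assumption.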

For "matchful" play, checker and/or cube decisions based on
match score need to be random as well, even if that requires
exponentially more computing power and time. Again, we have
both. It's just a matter of whether we want to do it. We can
distribute the task and/or spread it over time to let the
empirical, statistical data trickle in and accumulate.
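
As a sketch of how distributed results could accumulate, each
contributor could report, per match score and decision, how
many trials were played and the total points won, and a
central tally could simply add them up. The report layout,
the score labels and the function merge_tallies below are
assumptions for illustration only, not an existing GnuBG
format or feature.

```python
from collections import Counter

def merge_tallies(reports):
    """Combine per-worker tallies into one average per key.

    Each report maps (match_score, decision) -> (trials, total_points).
    """
    trials = Counter()
    points = Counter()
    for report in reports:
        for key, (n, total) in report.items():
            trials[key] += n
            points[key] += total
    # Average points per trial for every key that has data.
    return {key: points[key] / trials[key] for key in trials if trials[key]}

# Made-up numbers from two hypothetical contributors.
worker_a = {(("2-away", "3-away"), "double"): (100, 60),
            (("2-away", "3-away"), "no double"): (100, 30)}
worker_b = {(("2-away", "3-away"), "double"): (300, 120)}

merged = merge_tallies([worker_a, worker_b])
print(merged)
```

Because only counts and sums are added, reports can arrive in
any order and at any time, which fits spreading the work over
many machines or over a long period.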

Perhaps other people more knowledgeable in bot training can
suggest ways to go about it in more technical detail.

MK

