bug-gnubg

Re: New backgammon engine: wildbg


From: Murat K
Subject: Re: New backgammon engine: wildbg
Date: Fri, 1 Dec 2023 15:21:46 -0700
User-agent: Mozilla Thunderbird



On 11/22/2023 5:05 AM, Carsten Wenderdel wrote:

>> As far as I know TD-Gammon was initially trained through random cubeless,
>> matchless (even backgammonless) self-play; that is, only single games with
>> 1- and 2-point wins.
>> Is this how you are training your bot?

> Only the neural net inputs are „TD“. The outputs also include backgammon
> probabilities.

Okay, real (rather than extrapolated) backgammon probabilities are an improvement.

> No TD-learning or other reinforcement learning is used. Instead I use cubeless
> money game rollouts, followed by supervised learning.
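To make "rollouts followed by supervised learning" concrete, here is a minimal Python sketch of turning the results of many rollouts of one position into training targets. It assumes each rollout ends in a signed cubeless point result for the side to move (+1/+2/+3 for a single win, gammon or backgammon, negative for losses) and a five-output layout where gammon probabilities include backgammons, similar to GNU Backgammon's convention. The function name and output layout are illustrative assumptions, not wildbg's actual format.

```python
from collections.abc import Iterable


def rollout_targets(results: Iterable[int]) -> list[float]:
    """Average rollout outcomes into cubeless probability targets.

    Each result is the points won by the side to move in one rolled-out
    game: +1/+2/+3 for a single win, gammon or backgammon, and -1/-2/-3
    for the corresponding losses (cubeless money scoring).
    Returns [P(win), P(win gammon), P(win backgammon),
             P(lose gammon), P(lose backgammon)],
    where the gammon probabilities include backgammons (assumed layout).
    """
    counts = [0, 0, 0, 0, 0]
    n = 0
    for r in results:
        if r not in (1, 2, 3, -1, -2, -3):
            raise ValueError(f"invalid cubeless result: {r}")
        n += 1
        if r > 0:
            counts[0] += 1
            counts[1] += int(r >= 2)   # gammon or backgammon win
            counts[2] += int(r == 3)   # backgammon win
        else:
            counts[3] += int(r <= -2)  # gammon or backgammon loss
            counts[4] += int(r == -3)  # backgammon loss
    if n == 0:
        raise ValueError("need at least one rollout result")
    return [c / n for c in counts]
```

For example, `rollout_targets([1, 2, -1, 3])` gives `[0.75, 0.5, 0.25, 0.0, 0.0]`; a net trained on such averages learns real, not extrapolated, gammon and backgammon rates.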

I don't understand the concept of a "cubeless money game". How is that different from a "plain single game" (which some people erroneously call a "1-pointer" even though it allows 2- and 3-point wins)?
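In engine usage, "cubeless money" equity is the expected points per game when a single win, gammon and backgammon score 1, 2 and 3 points and no doubling cube is in play, which matches the plain single game described here. A minimal sketch of computing it from a net's outcome probabilities, assuming the five-output layout [P(win), P(win gammon), P(win backgammon), P(lose gammon), P(lose backgammon)] with gammons including backgammons (an assumption modeled on GNU Backgammon, not necessarily wildbg's layout):

```python
def cubeless_equity(p: list[float]) -> float:
    """Expected points per game for the side to move, with no cube.

    p = [P(win), P(win gammon), P(win backgammon),
         P(lose gammon), P(lose backgammon)],
    where the gammon probabilities include backgammons.
    """
    win, win_g, win_bg, lose_g, lose_bg = p
    lose = 1.0 - win
    # A single game is worth 1 point; each gammon adds one more point,
    # and each backgammon adds one more on top of the gammon.
    return (win - lose) + (win_g - lose_g) + (win_bg - lose_bg)
```

For the probabilities `[0.75, 0.5, 0.25, 0.0, 0.0]` this gives 1.25, the same as the average points of the results 1, 2, -1 and 3.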

> Alpha Zero uses the same net for evaluation of positions and moves. For that
> you need thousands of outputs to encode each legal move. I don’t see how this
> is feasible in backgammon.
> I currently don’t plan to do anything other than cubeless money game rollouts. If
> someone with programming skills wants to try something different themselves,
> I’m willing to help, for example by providing the move generation code.
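Backgammon engines commonly avoid a per-move policy output by generating every legal move, evaluating each resulting position with a value net, and picking the best, so the net needs only a handful of outputs. A minimal sketch; `legal_moves` and `evaluate` are hypothetical stand-ins for a move generator and a net returning cubeless equity for the side on roll.

```python
from collections.abc import Callable, Iterable
from typing import TypeVar

P = TypeVar("P")  # position type (hypothetical)
M = TypeVar("M")  # move type (hypothetical)


def best_move(
    position: P,
    legal_moves: Callable[[P], Iterable[tuple[M, P]]],
    evaluate: Callable[[P], float],
) -> M:
    """Pick the move whose resulting position is best for the mover.

    legal_moves yields (move, resulting_position) pairs and should yield
    at least one pair (e.g. an empty move when no checker can move).
    In each resulting position the opponent is on roll, and evaluate
    returns equity for the side on roll, so the mover's equity is its
    negation.
    """
    found = False
    best, best_eq = None, float("-inf")
    for move, after in legal_moves(position):
        eq = -evaluate(after)  # convert to the mover's perspective
        if not found or eq > best_eq:
            best, best_eq, found = move, eq, True
    if not found:
        raise ValueError("legal_moves yielded no moves")
    return best
```

The net is called once per legal move, typically a few dozen times per roll, rather than needing one output per possible move.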

Perhaps my use of "Alpha-Zero" was misleading or misunderstood. I meant "learning through entirely random decisions", without human bias injected through cube equity formulas or match equity tables.

With that hopefully clarified, why not train your bot using "cubeful money game rollouts"? It should be feasible, since rollouts with random cube decisions as well as random checker decisions run very fast. This would also give real, rather than extrapolated, cubeful equities. To me, "cubeless money game" sounds nonsensical, if not an oxymoron.
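One possible way to add random cube decisions to a random rollout, sketched under stated assumptions: because random checker play ignores the cube, the checker game (its length and whether it ends in a single game, gammon or backgammon) can be simulated first, and random double/take/pass decisions layered on top. The names `CheckerGame` and `random_cube_points`, and the probabilities `p_double` and `p_take`, are hypothetical; the sketch follows money-game cube rules but ignores the Jacoby rule and beavers.

```python
import random
from dataclasses import dataclass


@dataclass
class CheckerGame:
    """Outcome of a game's random checker play, independent of the cube."""
    turns: int       # number of turns until someone bore off
    multiplier: int  # 1 single game, 2 gammon, 3 backgammon


def random_cube_points(game: CheckerGame, rng: random.Random,
                       p_double: float = 0.1, p_take: float = 0.5,
                       first: int = 0) -> int:
    """Points won by player 0 (negative if player 1 wins).

    Before each turn the player on roll may double if the cube is
    centred or theirs; the opponent then takes or passes at random.
    """
    cube, owner = 1, None  # owner None means a centred cube
    # Turn 0 is the opening roll; nobody can double before it.
    for turn in range(1, game.turns):
        on_roll = (first + turn) % 2
        if owner in (None, on_roll) and rng.random() < p_double:
            if rng.random() < p_take:
                cube *= 2
                owner = 1 - on_roll  # the taker now owns the cube
            else:
                # Pass: the doubler wins the stake before the double.
                return cube if on_roll == 0 else -cube
    winner = (first + game.turns - 1) % 2  # the last mover bore off
    points = cube * game.multiplier
    return points if winner == 0 else -points
```

Averaging `random_cube_points` over many simulated games from one position gives a cubeful money estimate under fully random play.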

I was also suggesting that, even if you don't train your bot using random cube and checker decisions now, you should later give your bot the capability of doing such random rollouts, at least one position at a time, in order to see how an "unbiased bot" would have played it.
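Whatever the decision policy, a single-position rollout reduces to averaging many game results and reporting how uncertain that average is. A minimal sketch; `play_one` is a hypothetical callable that plays one random game from the position and returns the points won by the side to move.

```python
import math
import random
import statistics
from collections.abc import Callable


def rollout(play_one: Callable[[random.Random], float], games: int,
            seed: int = 0) -> tuple[float, float]:
    """Return (mean points, standard error of the mean) over `games` games."""
    if games < 2:
        raise ValueError("need at least two games for a standard error")
    rng = random.Random(seed)  # seeded so a rollout can be reproduced
    results = [play_one(rng) for _ in range(games)]
    mean = statistics.fmean(results)
    # Sample standard deviation divided by sqrt(n) estimates the error.
    std_err = statistics.stdev(results) / math.sqrt(games)
    return mean, std_err
```

Comparing two candidate plays this way is only meaningful when their means differ by several standard errors.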

MK

