Re: [gnugo-devel] Patch arend_1_14.2 -- Missing endgame pattern


From: Gunnar Farneback
Subject: Re: [gnugo-devel] Patch arend_1_14.2 -- Missing endgame pattern
Date: Mon, 12 Nov 2001 22:25:12 +0100
User-agent: EMH/1.14.1 SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/20.7 (sparc-sun-solaris2.7) (with unibyte mode)

Arend wrote:
> Maybe you could choose what you think is likely to be better?

A rule of thumb is to guide the engine as far forward as possible (as
long as it is an unbranched sequence), which is an argument for the
defend_both construction. I think cases where moves other than the
basic cut need to be considered are rare enough that they may warrant
more specific patterns.

> I have another very general question about GnuGo's move valuation.
> 
> There are basically two ways to assign a value to a move, I think the
> technical terms are miai counting and deiri counting. The first compares
> the value of the position before the move was made with the value
> of the position after the move was made; the other is to compare the value
> of the position after a black move with the value of the position 
> after a white move played in the same area.
> If everything is gote, the second way produces values that are twice
> as high.
> 
> If I do not misunderstand the code, GnuGo is inconsistent here. 
> Moves that (OWL-)attack or defend a dragon XYZ produce a value of
> 2 * dragon[XYZ].effective_size. Unless the opponent can save/kill the
> respective dragon in sente, this is deiri counting.
> OTOH, influence_delta_territory clearly uses miai counting.
> 
> Am I missing something?

Unfortunately I'm mostly ignorant about these terms, so I'll just
describe how things are supposed to work and let others determine the
technical terms and whether things are inconsistent.

To begin with, the single purpose of the move valuation is to try to
rank the moves so that the best move gets the highest score (well,
there's the problem of determinacy too). In principle these values
could be arbitrary, but in order to make it easier to evaluate how
well the valuation performs, not to mention simplify the tuning, we
try to assign values which are consistent with the points on the
board.

This valuation basically follows the principles in e.g. "The Endgame"
by Ogawa and Davies. A move like * below is worth one point in gote

OOOX
O.*X
OOOX

and given the value 1. A move in sente or reverse sente is counted
double, e.g. the move at * here

OOOOOOOXXX
O.*XXXX.X
OOOOOOOXXX

would be one point sente for O and one point reverse sente for X and
should be given the value 2. Double sente moves should be valued much
higher than their nominal value, which we discuss later.

The territorial valuation done by the influence function works as
follows. To value an O move at (pos), we first assign territory in the
original position under the assumption that X moves first in every
local position. We then assign territory again after O has played at
(pos), still assuming that X moves first in every local position.
These two territory assignments are compared and the difference gives
the territorial value of the move. To give an example, consider this
position where we want to estimate the value of an O move at *:

OOOXXX
..OX..
..OX..
...*..
------

Before the move we assume X moves first in the local position (and
that O has to connect), which gives territory like this (lower-case
letters identify territory for each player):

OOOXXX
ooOXxx
o.OXxx
o...xx
------

Then we let O make the move at * and assume X moves first again next.
The territory then becomes (X is also assumed to have to connect):

OOOXXX
ooOXxx
ooOX.x
oo.O.x
------

We see that this makes a difference in territory of 4, which is what
influence_delta_territory() should report. Then we have the question
of sente, reverse sente and double sente. This is implemented by the
concepts of followup_value (sente) and reverse_followup_value (reverse
sente) and is a lot less exact. Basically certain patterns try to
detect whether there are more points to gain if we are allowed another
move in a row and add a followup_value. Other patterns try to decide
whether the opponent would have more points to gain by two moves in a
row in the local area and add a reverse_followup_value. These followup
values are added to the territorial value so that a sente move (high
followup_value, no or low reverse_followup_value) and reverse sente
move (no or low followup_value, high reverse_followup_value) both get
about double the value, while a double sente move (high followup_value
+ high reverse_followup_value) gets still more. The double sente value
computation is probably not very good.
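
As a rough illustration of how the followup values stack on top of the
territorial value, here is a sketch in Python. It is not GNU Go's
actual code: the function name move_value and the plain sum are
assumptions made for this example, and the real computation is more
involved. The followups are assumed to be worth as much as the move
itself.

```python
def move_value(territorial, followup=0.0, reverse_followup=0.0):
    """Hypothetical combination: plain sum of the three components."""
    return territorial + followup + reverse_followup


# A one-point move in each of the four situations. The followup sizes
# (1.0) are assumed for the example, not taken from the engine.
gote = move_value(1.0)
sente = move_value(1.0, followup=1.0)
reverse_sente = move_value(1.0, reverse_followup=1.0)
double_sente = move_value(1.0, followup=1.0, reverse_followup=1.0)

print(gote, sente, reverse_sente, double_sente)
```

With these assumed inputs the sente and reverse sente moves come out at
double the gote value, and the double sente move higher still.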

To give an example of territorial value where something is captured,
consider the O move at * here,

XXXXXXXO
X.OOOOXO
X.O..O*O
--------

As before we first let the influence function determine territory
assuming X moves first, i.e. with a captured group:

XXXXXXXO
XxyyyyXO
Xxyxxy.O
--------

Here y indicates X territory plus a captured stone, i.e. each y counts
for two points. After the O move at * we instead get

XXXXXXXO
X.OOOOXO
X.OooOOO
--------

and we see that X has 16 points of territory fewer and O has two
points more, for a total difference of 18 points.
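
The two territory comparisons above can be checked mechanically. The
following Python sketch only counts the marks in the diagrams as typed
in this mail; it does not reproduce the influence function. The helper
names count_territory and delta_for_o are made up for the example.

```python
# Points per diagram symbol: o = O territory, x = X territory,
# y = X territory plus a captured O stone (two points for X).
POINTS = {"o": ("O", 1), "x": ("X", 1), "y": ("X", 2)}


def count_territory(diagram):
    """Sum the territory marks in a diagram for each player."""
    totals = {"O": 0, "X": 0}
    for ch in diagram:
        if ch in POINTS:
            owner, pts = POINTS[ch]
            totals[owner] += pts
    return totals


def delta_for_o(before, after):
    """Territorial swing in O's favour between two diagrams."""
    b, a = count_territory(before), count_territory(after)
    return (a["O"] - b["O"]) + (b["X"] - a["X"])


edge_before = """
OOOXXX
ooOXxx
o.OXxx
o...xx
"""
edge_after = """
OOOXXX
ooOXxx
ooOX.x
oo.O.x
"""
capture_before = """
XXXXXXXO
XxyyyyXO
Xxyxxy.O
"""
capture_after = """
XXXXXXXO
X.OOOOXO
X.OooOOO
"""

print(delta_for_o(edge_before, edge_after))        # 4
print(delta_for_o(capture_before, capture_after))  # 18
```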

That the influence function counts the value of captured stones is new
as of 3.1.13. Previously this was instead done using the effective_size
heuristic. The effective size is the number of stones plus the
surrounding empty spaces which are closer to this string or dragon
than to any other stones. Here the O string would thus have effective
size 6 (number of stones) + 2 (interior eye) + 2*0.5 (the two empty
vertices to the left of the string, split half each with the
surrounding X string) + 1*0.33 (the connection point, split between
three strings) = 9.33. As noted this value was doubled, giving 18.67
which is reasonably close to the correct value of 18. The effective
size heuristic is still used in certain parts of the move valuation
where we can't easily get a more accurate value from the influence
function.
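
The effective size arithmetic for the O string can be written out as a
small Python sketch. The helper below is hypothetical and takes the
sharing counts as given; it makes no attempt to derive from the board
which empty vertices are closest to which string.

```python
from fractions import Fraction


def effective_size(stones, empty_vertices):
    """stones: number of stones in the string.
    empty_vertices: list of (count, sharers) pairs, where each of the
    `count` empty vertices is split evenly between `sharers` strings."""
    return stones + sum(Fraction(count, sharers)
                        for count, sharers in empty_vertices)


# 6 stones, 2 eye vertices owned outright, 2 vertices shared with one
# X string, and 1 connection point shared between three strings.
size = effective_size(6, [(2, 1), (2, 2), (1, 3)])

print(round(float(size), 2))      # 9.33
print(round(float(2 * size), 2))  # 18.67
```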

This account is simplified: it ignores the moyo and area concepts
(areas of influence which are not secure territory), strategic value
(weakening and strengthening of other stones on the board), various
other issues in the move valuation, and the fact that the influence
function isn't quite as well tuned as the examples above may indicate.
But it should give a fairly good idea of how the design is intended to
work. This information should probably be added to the docs somewhere,
but I would really appreciate it if someone else could edit it in
there.

/Gunnar