Re: [gnugo-devel] reading connections


From: Gunnar Farneback
Subject: Re: [gnugo-devel] reading connections
Date: Tue, 09 Oct 2001 21:31:47 +0200

Tristan wrote:
> The corrected test file is at :
> 
> http://www.ai.univ-paris8.fr/~cazenave/test_golois.tar.gz

This looks much better. A few sgf files are still missing, though, and
some test cases are broken, as discussed below.

I first made a new try with capture.tst and got these results:

  fonda 412% ../regression/regress.pike capture.tst 
     1       22     0       0    0.39 - pass 1 T7 (1 T7)
     2     8394     0       0    0.27 - pass 1 M1 (1 M1)
     3      238     0       0    0.17 - pass 0 (0)
     4      542     0       0    0.12 - FAIL 0 (1 H5)
     5     8528     0       0    0.37 - FAIL 0 (1 S3)
     6      230     0       0    0.10 - FAIL 1 N2 (1 O3)
     7        1     0       0    0.10 - FAIL 1 S2 (1 S3)
     8        1     0       0    0.16 - FAIL 1 G8 (0)
     9       23     0       0    0.16 - pass 0 (0)
    10      899     0       0    0.13 - FAIL 1 P11 (1 O11)
    11       52     0       0    0.10 - pass 0 (0)
    12        3     0       0    0.09 - pass 0 (0)
    13     4998     0       0    0.27 - FAIL 0 (1 D8)
    14       40     0       0    0.34 - pass 0 (0)
    15        3     0       0    0.24 - pass 0 (0)
    16     1632     0       0    0.36 - pass 1 A12 (1 A12)
    17        1     0       0    0.08 - FAIL 1 H5 (0)
  Total nodes: 25607 0 0
  Total time: 3.44
  Number of passed tests: 9/17
  8 unexpected failures (4, 5, 6, 7, 8, 10, 13, 17)

This is somewhat worse than I had hoped. On the other hand, some of the
failures seem to reflect only a philosophical difference between these
test cases and the ones in regression/reading.tst. In test case 6, for
example, N2 does capture the O2 string too, although O3 is a better way
of doing this because it leaves fewer weaknesses on the outside. The
tactical reading of GNU Go is not expected to necessarily generate the
best attack or defense move, just to come up with one that works. Thus
some of the test cases need to accept additional correct answers
before GNU Go can pass them.
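
As a rough sketch of what accepting alternatives could look like,
assuming the expected answer of a test is treated as a regular
expression that must match the whole engine response. The helper name
matches_expected is made up for illustration, and this is Python rather
than the Pike of regress.pike:

```python
import re

def matches_expected(response: str, pattern: str) -> bool:
    """Return True if the engine response matches the expected pattern.

    Assumption: the pattern is a regular expression that has to match
    the complete response, so "1 (N2|O3)" accepts either move.
    """
    return re.fullmatch(pattern, response.strip()) is not None

# Test case 6 with only O3 accepted as the answer.
print(matches_expected("1 N2", "1 O3"))        # False
# The same test with N2 added as an alternative.
print(matches_expected("1 N2", "1 (N2|O3)"))   # True
```

With a pattern like the second one, a test keeps its preferred move
while no longer failing an engine that finds another working capture.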

How well does Golois do on this test suite? Have you tried Golois on
GNU Go's reading.tst?


Next I tried vie.tst, which seems to be similar to owl.tst in scope.

fonda 413% ../regression/regress.pike vie.tst 
   1    17598     2       0    1.29 - pass 1 T16 (1 T16)
   2     9449    11       0    0.90 - FAIL 2 H4 (1 H4)
   3    38444    26       0    1.85 - pass 1 G2 (1 G2)
   4    20656    94       0    1.95 - pass 1 G2 (1 G2)
   5     8730    10       0    0.84 - pass 1 G2 (1 G2)
   6     5224     2       0    0.59 - pass 1 G2 (1 G2)
   7    65364     2       0    2.17 - pass 1 B17 (1 B17)
   8   217171   675       0   15.80 - FAIL 1 R7 (1 S9)
   9    44907    26       0    1.87 - FAIL 1 R14 (1 P15)
  10    16209    11       0    1.01 - FAIL 0 (1 P15)
  11    17700     2       0    1.00 - pass 1 T7 (1 T7)
  12     8448     2       0    0.73 - pass 1 T7 (1 T7)
  13    22375    15       0    1.31 - FAIL 0 (1 J7)
  14    10239    15       0    1.03 - pass 1 G7 (1 G7)
  15    24013     4       0    1.54 - pass 0 (0)
  16   233466   170       0    9.73 - FAIL 1 M3 (0)
  17   115894   392       0    6.67 - FAIL 1 E6 (1 E2)
  18    87856    44       0    3.70 - pass 1 F2 (1 F2)
  19   111063   139       0    5.45 - FAIL 0 (1 S8)
  20    21525    22       0    1.18 - FAIL 1 Q14 (1 H9)
  21    19166     2       0    1.05 - pass 1 M15 (1 M15)
  22   263830  1000       0   15.95 - FAIL 0 (1 T18)
  23    71149   194       0    5.12 - pass 1 B2 (1 B2)
  24    27505    94       0    2.37 - pass 0 (0)
  25    29416     6       0    1.42 - pass 1 B11 (1 B11)
  26    76684     4       0    2.36 - pass 1 M5 (1 M5)
  27    48418    81       0    2.56 - FAIL 1 F3 (1 H2)
  28    65699   142       0    2.94 - pass 0 (0)
  29     6360     1       0    0.61 - pass 0 (0)
  30   119178   346       0    4.71 - pass 0 (0)
  31     2192     1       0    0.49 - pass 0 (0)
  32     2767     2       0    0.64 - pass 1 F1 (1 F1)
  33     2565     2       0    0.55 - pass 1 B2 (1 B2)
  34     6126     1       0    0.61 - pass 0 (0)
  35    72645    10       0    2.86 - pass 1 T11 (1 T11)
  35        0     0       0   
  35        0     0       0   
  35        0     0       0   
  35        0     0       0   
  40    16757    10       0    0.85 - pass 1 M12 (1 M12)
  41    47644    62       0    2.18 - FAIL 1 A6 (1 B6)
  42    54156    78       0    2.38 - FAIL 1 A4 (1 B6)
  43    33720    13       0    1.47 - FAIL 1 D7 (1 C6)
  43        0     0       0   
  45    14153     4       0    0.73 - FAIL 1 A7 (1 B4)
  46     3279     8       0    0.21 - pass 1 J2 (1 J2)
  47    10480    12       0    0.38 - pass 1 J2 (1 J2)
  48   172301    14       0    4.69 - pass 1 S10 (1 S10)
  49    37821    37       0    1.97 - pass 0 (0)
  50   106612   259       0    6.05 - pass 0 (0)
  51    65473   100       0    2.99 - pass 0 (0)
  52    29950     2       0    1.31 - pass 1 R19 (1 R19)
  53    24981     0       0    1.01 - pass 0 (0)
  54   126576  1000       0   26.07 - FAIL 1 H19 (1 G14)
  55    50213   119       0    2.61 - pass 0 (0)
  56    15008     0       0    0.75 - pass 0 (0)
  57    89143     1       0    2.74 - pass 0 (0)
  58      844     1       0    0.15 - FAIL 0 (1 C11)
Total nodes: 2809142 5270 0
Total time: 163.38
Number of passed tests: 36/53
17 unexpected failures (2, 8, 9, 10, 13, 16, 17, 19, 20, 22, 27, 41, 42, 43, 
45, 54, 58)

As you can see from the rows without a result, I haven't put much
effort into error handling in this regression script. The problem with
tests 36--39 is that the files LifeDeath0001.sgf to LifeDeath0004.sgf
are missing. The problem with test case 44 is that C9 is an empty
point. It is a (false) eye of a dragon, so it could arguably be counted
as part of it, but GNU Go does not understand this. Changing C9 to C8
would solve the problem for GNU Go. As with the tactical reading, the
owl reading of GNU Go is only supposed to come up with a working move,
not necessarily the best one in a more global sense. I haven't checked
whether that's an issue here.
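
Until the script reports such cases itself, the rows without a result
can be picked out of the output afterwards. The following is only a
sketch in Python, not part of regress.pike. The function name summarize
and the two regular expressions are assumptions based on the column
layout shown above:

```python
import re

# A complete row: id, three node counters, time, verdict, answers.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+) - (pass|FAIL) (.*)$")
# A row that stops after the counters, i.e. the test never ran.
BROKEN = re.compile(r"^\s*(\d+)\s+\d+\s+\d+\s+\d+\s*$")

def summarize(lines):
    """Return (passed ids, failed ids, number of rows without a result)."""
    passed, failed, broken = [], [], 0
    for line in lines:
        m = ROW.match(line)
        if m:
            target = passed if m.group(6) == "pass" else failed
            target.append(int(m.group(1)))
        elif BROKEN.match(line):
            # The id printed on such a row is stale, so only count it.
            broken += 1
    return passed, failed, broken

sample = [
    "  35    72645    10       0    2.86 - pass 1 T11 (1 T11)",
    "  35        0     0       0   ",
    "  40    16757    10       0    0.85 - pass 1 M12 (1 M12)",
    "  41    47644    62       0    2.18 - FAIL 1 A6 (1 B6)",
]
print(summarize(sample))   # ([35, 40], [41], 1)
```

The broken rows are only counted because the number printed on them is
the id of the last test that did run, not of the test that is missing.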

global.tst, which contains whole-board move generation problems, had no
broken test cases, but the results don't look all that good.

  fonda 414% ../regression/regress.pike global.tst
     1  1281100  4402       0   77.41 - FAIL F5 (B3)
     2    66454   175       0    5.38 - pass T16 (T16)
     3   124049   631       0   15.69 - pass H4 (H4)
     4  2347945 11796       0   171.58 - FAIL E18 (Q6)
     5  1062041  4424       0   73.86 - pass O4 (O4)
     6   383704   869       0   21.34 - FAIL F5 (J6)
     7  1127211  2971       0   60.10 - pass A5 (A5)
     8  1183079  3082       0   57.83 - FAIL G11 (H9)
     9  1245774  2928       0   66.75 - FAIL F2 (F4)
    10  1115352  3105       0   53.50 - FAIL P9 (A5)
    11    19495    82       0    3.57 - FAIL E8 (D8)
    12    62202   205       0    6.02 - pass F1 (F1)
    13    39186   108       0    3.91 - pass T7 (T7)
    14   825966  5410       0   117.17 - pass S9 (S9)
    15   101882   177       0    9.28 - FAIL G15 (T18)
    16   653448  3533       0   74.31 - FAIL E18 (O10)
    17   710936  2660       0   62.56 - FAIL O13 (P9)
    18   323302  1462       0   34.54 - FAIL G15 (T15)
    19     9073    48       0    2.60 - pass F14 (F14)
    20    52151     6       0    3.08 - FAIL P4 (R15)
    21   899581  6025       0   115.21 - FAIL N12 (G11)
    22   512694   702       0   24.94 - FAIL L18 (F2)
    23   355399  1050       0   23.62 - FAIL F16 (G15)
    24    51254   119       0    6.04 - pass M15 (M15)
    25   368729  1234       0   21.88 - FAIL C12 (D12)
    26  1343611  4286       0   73.81 - FAIL H4 (D12)
    27   419499   469       0   19.88 - FAIL P13 (T18)
    28   559214  1800       0   31.66 - FAIL N10 (N11)
    29   164699  1041       0   23.62 - pass B2 (B2)
    30    63100   136       0    5.71 - pass B11 (B11)
    31  1138739  3839       0   77.86 - FAIL F15 (M8)
    32    43108   120       0    4.93 - pass G8 (G8)
    33  1855335  6106       0   122.98 - pass Q9 (Q9)
    34  2645039  9879       0   209.38 - FAIL S7 (N6)
    35   415676   133       0   13.77 - FAIL S7 (M12)
    36   271315   133       0   10.24 - FAIL S7 (B6)
    37   264214   133       0   10.08 - FAIL S7 (D8)
    38   264214   133       0   10.14 - FAIL S7 (B6)
    39    53057    80       0    3.06 - pass M12 (M12)
    40   439541  1232       0   24.82 - FAIL A6 (B6)
    41   518420   948       0   25.20 - FAIL A4 (B6)
    42   272983   480       0   14.45 - pass C6 (C6)
    43    19495    82       0    3.70 - FAIL E8 (D8)
    44   846954  4384       0   47.19 - FAIL C3 (G4)
    45    93724   185       0    4.92 - FAIL A7 (B4)
    46   147312   248       0    7.04 - pass M4 (M4)
    47    69925    74       0    3.51 - FAIL K5 (L8)
    48  1459427  5802       0   68.80 - pass D5 (D5)
    49   647230  2471       0   30.71 - pass D5 (D5)
    50    55666    84       0    1.85 - pass J2 (J2)
  Total nodes: 28993504 101482 0
  Total time: 1961.48
  Number of passed tests: 19/50
  31 unexpected failures (1, 4, 6, 8, 9, 10, 11, 15, 16, 17, 18, 20, 21, 22, 
23, 25, 26, 27, 28, 31, 34, 35, 36, 37, 38, 40, 41, 43, 44, 45, 47)


Finally, and not very surprisingly, connect.tst fails almost
everywhere (using the new readconnect.c). The biggest surprise may be
that it actually manages to pass one test case somehow. There's a lot
of potential for quick improvements here. :-)

The first step might be to revise the code to keep track of the
effective move so it doesn't just return PASS. Tristan, do you want
help with developing the code, or is it better if we don't touch it
more than necessary for the time being?

  fonda 415% ../regression/regress.pike connect.tst 
     1        0     0       0    0.31 - FAIL 1 PASS (1 Q6)
     2        0     0       0    0.09 - FAIL 1 PASS (0)
     3        0     0       0    0.09 - FAIL 1 PASS (0)
     4        0     0       0    0.09 - FAIL 1 PASS (0)
     5        0     0       0    0.08 - FAIL 1 PASS (0)
     6        0     0       0    0.08 - FAIL 1 PASS (0)
     7        0     0       0    0.09 - FAIL 1 PASS (0)
     8        0     0       0    0.09 - FAIL 1 PASS (0)
     9        0     0       0    0.08 - pass 0 (0)
    10        0     0       0    0.08 - FAIL 1 PASS (1 A6)
    11        0     0       0    0.08 - FAIL 0 (1 B7)
    12        0     0       0    0.08 - FAIL 1 PASS (0)
    13        0     0       0    0.09 - FAIL 1 PASS (1 B1)
    14        0     0       0    0.09 - FAIL 0 (1 F5)
    15        0     0       0    0.08 - FAIL 0 (1 B7)
    16        0     0       0    0.09 - FAIL 0 (1 D5)
    17        0     0       0    0.13 - FAIL 1 PASS (1 N11)
    18        0     0       0    0.15 - FAIL 1 PASS (1 N11)
    19        0     0       0    0.13 - FAIL 1 PASS (1 P13)
    20        0     0       0    0.13 - FAIL 1 PASS (1 O12)
    21        0     0       0    0.13 - FAIL 1 PASS (1 O12)
    22        0     0       0    0.13 - FAIL 0 (1 O12)
    23        0     0       0    0.12 - FAIL 1 PASS (1 M7)
    24        0     0       0    0.11 - FAIL 1 PASS (0)
    25        0     0       0    0.11 - FAIL 0 (1 G1)
    26        0     0       0    0.13 - FAIL 0 (1 F1)
    27        0     0       0    0.12 - FAIL 1 PASS (0)
    28        0     0       0    0.13 - FAIL 1 PASS (1 J7)
    29        0     0       0    0.10 - FAIL 1 PASS (0)
    30        0     0       0    0.11 - FAIL 1 PASS (0)
    31        0     0       0    0.15 - FAIL 1 PASS (1 L10)
    32        0     0       0    0.07 - FAIL 1 PASS (0)
    33        0     0       0    0.14 - FAIL 1 PASS (1 F14)
    34        0     0       0    0.11 - FAIL 1 PASS (1 R7)
    35        0     0       0    0.11 - FAIL 1 PASS (0)
    36        0     0       0    0.08 - FAIL 1 PASS (0)
    37        0     0       0    0.08 - FAIL 1 PASS (0)
    38        0     0       0    0.11 - FAIL 1 PASS (1 G11)
    39        0     0       0    0.13 - FAIL 1 PASS (1 G15)
    40        0     0       0    0.16 - FAIL 1 PASS (0)
    41        0     0       0    0.10 - FAIL 1 PASS (1 G15)
    42        0     0       0    0.16 - FAIL 1 PASS (1 M15)
    43        0     0       0    0.10 - FAIL 1 PASS (1 H2)
    44        0     0       0    0.08 - FAIL 1 PASS (1 H18)
    45        0     0       0    0.09 - FAIL 1 PASS (1 G17)
    46        0     0       0    0.13 - FAIL 1 PASS (1 B4)
    47        0     0       0    0.15 - FAIL 1 PASS (0)
    48        0     0       0    0.10 - FAIL 1 PASS (0)
    49        0     0       0    0.10 - FAIL 1 PASS (1 C13)
    49        0     0       0   
    51        0     0       0    0.17 - FAIL 1 PASS (1 O15)
    52        0     0       0    0.09 - FAIL 1 PASS (1 C13)
    53        0     0       0    0.12 - FAIL 1 PASS (0)
    54        0     0       0    0.09 - FAIL 1 PASS (0)
    55        0     0       0    0.09 - FAIL 1 PASS (0)
    56        0     0       0    0.09 - FAIL 1 PASS (0)
    56        0     0       0   
    56        0     0       0   
    59        0     0       0    0.35 - FAIL 1 PASS (1 N10)
    60        0     0       0    0.09 - FAIL 1 PASS (1 G16)
    61        0     0       0    0.09 - FAIL 1 PASS (1 P5)
    62        0     0       0    0.14 - FAIL 1 PASS (1 G8)
    63        0     0       0    0.13 - FAIL 1 PASS (1 G8)
    64        0     0       0    0.14 - FAIL 1 PASS (1 G8)
    65        0     0       0    0.14 - FAIL 1 PASS (1 G8)
    66        0     0       0    0.11 - FAIL 1 PASS (1 K1)
    67        0     0       0    0.16 - FAIL 1 PASS (0)
    68        0     0       0    0.07 - FAIL 0 (1 E11)
    69        0     0       0    0.11 - FAIL 0 (1 E11)
    70        0     0       0    0.09 - FAIL 1 PASS (0)
    71        0     0       0    0.09 - FAIL 0 (1 P4)
    72        0     0       0    0.10 - FAIL 1 PASS (1 O11)
    73        0     0       0    0.12 - FAIL 1 PASS (0)
    74        0     0       0    0.08 - FAIL 1 PASS (1 Q17)
    75        0     0       0    0.08 - FAIL 0 (1 R18)
    76        0     0       0    0.08 - FAIL 1 PASS (1 J3)
    77        0     0       0    0.08 - FAIL 1 PASS (1 J2)
    78        0     0       0    0.08 - FAIL 1 PASS (0)
    79        0     0       0    0.14 - FAIL 1 PASS (0)
  Total nodes: 0 0 0
  Total time: 8.62
  Number of passed tests: 1/76
  75 unexpected failures (1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 59, 
60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79)

The failure in test case 50 seems to be due to an erroneous test case:

loadsgf Handtalk980819-1.sgf
50 disconnect a2 a2
#? [1 a2]

There is no stone at a2 and disconnecting a2 from a2 doesn't make much
sense. Test cases 57 and 58 seem to have identical problems.
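
Test cases of this kind could be caught mechanically before running the
suite. Below is a minimal sketch in Python, assuming test lines of the
form "<id> connect|disconnect <vertex> <vertex>" as in the excerpt
above. The function name degenerate_connection_tests is made up, and
test 51 in the sample is a hypothetical line added only for contrast.
The sketch does not check whether a vertex is an empty point, since
that would require loading the sgf file.

```python
def degenerate_connection_tests(lines):
    """Return (id, command, vertex) for tests naming the same vertex twice."""
    found = []
    for line in lines:
        parts = line.split()
        if (len(parts) == 4 and parts[0].isdigit()
                and parts[1] in ("connect", "disconnect")
                and parts[2].lower() == parts[3].lower()):
            found.append((int(parts[0]), parts[1], parts[2]))
    return found

sample = [
    "loadsgf Handtalk980819-1.sgf",
    "50 disconnect a2 a2",
    "#? [1 a2]",
    "51 connect c3 e3",   # hypothetical well-formed test, for contrast
]
print(degenerate_connection_tests(sample))   # [(50, 'disconnect', 'a2')]
```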

/Gunnar


