Re: [gnugo-devel] reading connections
From: Gunnar Farneback
Subject: Re: [gnugo-devel] reading connections
Date: Tue, 09 Oct 2001 21:31:47 +0200
User-agent: EMH/1.14.1 SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/20.7 (sparc-sun-solaris2.7) (with unibyte mode)
Tristan wrote:
> The corrected test file is at :
>
> http://www.ai.univ-paris8.fr/~cazenave/test_golois.tar.gz
This looks much better. A few sgf files are still missing, though, and
some test cases are broken, as discussed below.
I first made a new try with capture.tst and got these results:
fonda 412% ../regression/regress.pike capture.tst
1 22 0 0 0.39 - pass 1 T7 (1 T7)
2 8394 0 0 0.27 - pass 1 M1 (1 M1)
3 238 0 0 0.17 - pass 0 (0)
4 542 0 0 0.12 - FAIL 0 (1 H5)
5 8528 0 0 0.37 - FAIL 0 (1 S3)
6 230 0 0 0.10 - FAIL 1 N2 (1 O3)
7 1 0 0 0.10 - FAIL 1 S2 (1 S3)
8 1 0 0 0.16 - FAIL 1 G8 (0)
9 23 0 0 0.16 - pass 0 (0)
10 899 0 0 0.13 - FAIL 1 P11 (1 O11)
11 52 0 0 0.10 - pass 0 (0)
12 3 0 0 0.09 - pass 0 (0)
13 4998 0 0 0.27 - FAIL 0 (1 D8)
14 40 0 0 0.34 - pass 0 (0)
15 3 0 0 0.24 - pass 0 (0)
16 1632 0 0 0.36 - pass 1 A12 (1 A12)
17 1 0 0 0.08 - FAIL 1 H5 (0)
Total nodes: 25607 0 0
Total time: 3.44
Number of passed tests: 9/17
8 unexpected failures (4, 5, 6, 7, 8, 10, 13, 17)
This is somewhat worse than I had hoped. On the other hand, some of the
failures seem to reflect only a philosophical difference between these
test cases and the ones in regression/reading.tst. In test case 6, for
example, N2 also captures the O2 string, although O3 is a better way of
doing it because it leaves fewer weaknesses on the outside. The
tactical reading of GNU Go is not expected to generate the best attack
or defense move, just to come up with one that works. Thus some of the
test cases need additional correct answer alternatives before GNU Go
can pass them.
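One way to allow for alternative answers is to treat the expected
result as a pattern rather than a single fixed string. The following
Python sketch illustrates the idea for test case 6. The function name
answer_matches and the "(O3|N2)" alternation syntax are assumptions
made for illustration, not a description of what regress.pike does.

```python
import re


def answer_matches(expected, actual):
    """Return True if the whole answer matches the expected pattern.

    `expected` is read as a regular expression, so "1 (O3|N2)"
    accepts either capturing move. This is an illustrative
    convention, not necessarily the one the test files use.
    """
    return re.fullmatch(expected, actual) is not None


if __name__ == "__main__":
    # Test case 6: the suite expects "1 O3", GNU Go answered "1 N2".
    print(answer_matches("1 O3", "1 N2"))
    print(answer_matches("1 (O3|N2)", "1 N2"))
```

With a single expected answer the N2 result is rejected. With both
moves listed it is accepted, while an unrelated move still fails.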
How well does Golois do on this test suite? Have you tried Golois on
GNU Go's reading.tst?
Next I tried vie.tst, which seems to be similar to owl.tst in scope.
fonda 413% ../regression/regress.pike vie.tst
1 17598 2 0 1.29 - pass 1 T16 (1 T16)
2 9449 11 0 0.90 - FAIL 2 H4 (1 H4)
3 38444 26 0 1.85 - pass 1 G2 (1 G2)
4 20656 94 0 1.95 - pass 1 G2 (1 G2)
5 8730 10 0 0.84 - pass 1 G2 (1 G2)
6 5224 2 0 0.59 - pass 1 G2 (1 G2)
7 65364 2 0 2.17 - pass 1 B17 (1 B17)
8 217171 675 0 15.80 - FAIL 1 R7 (1 S9)
9 44907 26 0 1.87 - FAIL 1 R14 (1 P15)
10 16209 11 0 1.01 - FAIL 0 (1 P15)
11 17700 2 0 1.00 - pass 1 T7 (1 T7)
12 8448 2 0 0.73 - pass 1 T7 (1 T7)
13 22375 15 0 1.31 - FAIL 0 (1 J7)
14 10239 15 0 1.03 - pass 1 G7 (1 G7)
15 24013 4 0 1.54 - pass 0 (0)
16 233466 170 0 9.73 - FAIL 1 M3 (0)
17 115894 392 0 6.67 - FAIL 1 E6 (1 E2)
18 87856 44 0 3.70 - pass 1 F2 (1 F2)
19 111063 139 0 5.45 - FAIL 0 (1 S8)
20 21525 22 0 1.18 - FAIL 1 Q14 (1 H9)
21 19166 2 0 1.05 - pass 1 M15 (1 M15)
22 263830 1000 0 15.95 - FAIL 0 (1 T18)
23 71149 194 0 5.12 - pass 1 B2 (1 B2)
24 27505 94 0 2.37 - pass 0 (0)
25 29416 6 0 1.42 - pass 1 B11 (1 B11)
26 76684 4 0 2.36 - pass 1 M5 (1 M5)
27 48418 81 0 2.56 - FAIL 1 F3 (1 H2)
28 65699 142 0 2.94 - pass 0 (0)
29 6360 1 0 0.61 - pass 0 (0)
30 119178 346 0 4.71 - pass 0 (0)
31 2192 1 0 0.49 - pass 0 (0)
32 2767 2 0 0.64 - pass 1 F1 (1 F1)
33 2565 2 0 0.55 - pass 1 B2 (1 B2)
34 6126 1 0 0.61 - pass 0 (0)
35 72645 10 0 2.86 - pass 1 T11 (1 T11)
35 0 0 0
35 0 0 0
35 0 0 0
35 0 0 0
40 16757 10 0 0.85 - pass 1 M12 (1 M12)
41 47644 62 0 2.18 - FAIL 1 A6 (1 B6)
42 54156 78 0 2.38 - FAIL 1 A4 (1 B6)
43 33720 13 0 1.47 - FAIL 1 D7 (1 C6)
43 0 0 0
45 14153 4 0 0.73 - FAIL 1 A7 (1 B4)
46 3279 8 0 0.21 - pass 1 J2 (1 J2)
47 10480 12 0 0.38 - pass 1 J2 (1 J2)
48 172301 14 0 4.69 - pass 1 S10 (1 S10)
49 37821 37 0 1.97 - pass 0 (0)
50 106612 259 0 6.05 - pass 0 (0)
51 65473 100 0 2.99 - pass 0 (0)
52 29950 2 0 1.31 - pass 1 R19 (1 R19)
53 24981 0 0 1.01 - pass 0 (0)
54 126576 1000 0 26.07 - FAIL 1 H19 (1 G14)
55 50213 119 0 2.61 - pass 0 (0)
56 15008 0 0 0.75 - pass 0 (0)
57 89143 1 0 2.74 - pass 0 (0)
58 844 1 0 0.15 - FAIL 0 (1 C11)
Total nodes: 2809142 5270 0
Total time: 163.38
Number of passed tests: 36/53
17 unexpected failures (2, 8, 9, 10, 13, 16, 17, 19, 20, 22, 27, 41, 42, 43,
45, 54, 58)
As you can see, I haven't put much effort into error handling in
this regression script. The problem with tests 36--39 is that the
files LifeDeath0001.sgf to LifeDeath0004.sgf are missing. The problem
with test case 44 is that C9 is an empty point. It is a (false) eye
of a dragon, so it could arguably be counted as part of the dragon,
but GNU Go does not understand this. Changing C9 to C8 would solve the
problem for GNU Go.
For the owl reading too, GNU Go is only supposed to come up with a
working move, not necessarily the best one in a more global sense. I
haven't checked whether that's an issue here.
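The truncated lines in the output above (for example "35 0 0 0") are
what the script prints when a test cannot be run. Note that they
repeat the number of the previous test rather than their own. A small
Python sketch of how such lines could be separated from real results
follows. The regular expression is inferred from the output shown in
this message, and summarize is a hypothetical helper, not part of
regress.pike.

```python
import re

# Layout inferred from the output in this message:
# number, three node counters, time, "-", status, answer, (expected)
RESULT = re.compile(
    r"^(\d+)\s+\d+\s+\d+\s+\d+\s+[\d.]+\s+-\s+(pass|FAIL)\s+(.*?)\s+\((.*)\)$"
)
# A line with only four integers: the test produced no result at all.
TRUNCATED = re.compile(r"^\d+(\s+\d+){3}$")


def summarize(lines):
    """Return (passed, failed, broken) for regression output lines.

    passed and failed are lists of test numbers. broken is a count,
    because truncated lines carry the previous test's number.
    """
    passed, failed, broken = [], [], 0
    for line in lines:
        line = line.strip()
        match = RESULT.match(line)
        if match:
            number = int(match.group(1))
            (passed if match.group(2) == "pass" else failed).append(number)
        elif TRUNCATED.match(line):
            broken += 1
    return passed, failed, broken
```

Counting the truncated lines separately makes it visible that vie.tst
has five tests that never ran, in addition to the 17 real failures.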
global.tst, which contains whole-board move generation problems, ran
without any broken test cases, but the results don't look all that
good.
fonda 414% ../regression/regress.pike global.tst
1 1281100 4402 0 77.41 - FAIL F5 (B3)
2 66454 175 0 5.38 - pass T16 (T16)
3 124049 631 0 15.69 - pass H4 (H4)
4 2347945 11796 0 171.58 - FAIL E18 (Q6)
5 1062041 4424 0 73.86 - pass O4 (O4)
6 383704 869 0 21.34 - FAIL F5 (J6)
7 1127211 2971 0 60.10 - pass A5 (A5)
8 1183079 3082 0 57.83 - FAIL G11 (H9)
9 1245774 2928 0 66.75 - FAIL F2 (F4)
10 1115352 3105 0 53.50 - FAIL P9 (A5)
11 19495 82 0 3.57 - FAIL E8 (D8)
12 62202 205 0 6.02 - pass F1 (F1)
13 39186 108 0 3.91 - pass T7 (T7)
14 825966 5410 0 117.17 - pass S9 (S9)
15 101882 177 0 9.28 - FAIL G15 (T18)
16 653448 3533 0 74.31 - FAIL E18 (O10)
17 710936 2660 0 62.56 - FAIL O13 (P9)
18 323302 1462 0 34.54 - FAIL G15 (T15)
19 9073 48 0 2.60 - pass F14 (F14)
20 52151 6 0 3.08 - FAIL P4 (R15)
21 899581 6025 0 115.21 - FAIL N12 (G11)
22 512694 702 0 24.94 - FAIL L18 (F2)
23 355399 1050 0 23.62 - FAIL F16 (G15)
24 51254 119 0 6.04 - pass M15 (M15)
25 368729 1234 0 21.88 - FAIL C12 (D12)
26 1343611 4286 0 73.81 - FAIL H4 (D12)
27 419499 469 0 19.88 - FAIL P13 (T18)
28 559214 1800 0 31.66 - FAIL N10 (N11)
29 164699 1041 0 23.62 - pass B2 (B2)
30 63100 136 0 5.71 - pass B11 (B11)
31 1138739 3839 0 77.86 - FAIL F15 (M8)
32 43108 120 0 4.93 - pass G8 (G8)
33 1855335 6106 0 122.98 - pass Q9 (Q9)
34 2645039 9879 0 209.38 - FAIL S7 (N6)
35 415676 133 0 13.77 - FAIL S7 (M12)
36 271315 133 0 10.24 - FAIL S7 (B6)
37 264214 133 0 10.08 - FAIL S7 (D8)
38 264214 133 0 10.14 - FAIL S7 (B6)
39 53057 80 0 3.06 - pass M12 (M12)
40 439541 1232 0 24.82 - FAIL A6 (B6)
41 518420 948 0 25.20 - FAIL A4 (B6)
42 272983 480 0 14.45 - pass C6 (C6)
43 19495 82 0 3.70 - FAIL E8 (D8)
44 846954 4384 0 47.19 - FAIL C3 (G4)
45 93724 185 0 4.92 - FAIL A7 (B4)
46 147312 248 0 7.04 - pass M4 (M4)
47 69925 74 0 3.51 - FAIL K5 (L8)
48 1459427 5802 0 68.80 - pass D5 (D5)
49 647230 2471 0 30.71 - pass D5 (D5)
50 55666 84 0 1.85 - pass J2 (J2)
Total nodes: 28993504 101482 0
Total time: 1961.48
Number of passed tests: 19/50
31 unexpected failures (1, 4, 6, 8, 9, 10, 11, 15, 16, 17, 18, 20, 21, 22,
23, 25, 26, 27, 28, 31, 34, 35, 36, 37, 38, 40, 41, 43, 44, 45, 47)
Finally, and not very surprisingly, connect.tst fails almost
everywhere (using the new readconnect.c). The biggest surprise may be
that it actually manages to pass one test case somehow. There's a lot
of potential for quick improvements here. :-)
The first step might be to revise the code to keep track of the
effective move so it doesn't just return PASS. Tristan, do you want
help with developing the code, or is it better if we don't touch it
more than necessary for the time being?
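To make the idea of keeping track of the effective move concrete,
here is a minimal Python sketch of a search that returns the move
together with the result. It works on a toy game tree and is not
based on readconnect.c. The tree encoding, the function names and the
PASS constant are all assumptions made for illustration.

```python
PASS = "PASS"


def attacker_to_move(node):
    """Return (result, move) for the side trying to reach the goal.

    node is either a bool leaf (True means the goal is reached) or a
    dict mapping a move name to the position after that move, where
    the opponent is to play.
    """
    if isinstance(node, bool):
        # Nothing left to play; PASS is reported only in this case.
        return (1 if node else 0), PASS
    for move, child in node.items():
        if defender_to_move(child) == 1:
            # Remember which move made the search succeed.
            return 1, move
    return 0, PASS


def defender_to_move(node):
    """Return 1 if the attacker still succeeds after every reply."""
    if isinstance(node, bool):
        return 1 if node else 0
    for child in node.values():
        result, _ = attacker_to_move(child)
        if result == 0:
            return 0
    return 1


if __name__ == "__main__":
    # Hypothetical position: P5 can be refuted, Q6 works against
    # both replies.
    tree = {
        "P5": {"R6": False},
        "Q6": {"R6": True, "R7": True},
    }
    print(attacker_to_move(tree))
```

The point of the sketch is only that the successful move is passed
back up with the result code, so the caller sees something like
"1 Q6" instead of "1 PASS".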
fonda 415% ../regression/regress.pike connect.tst
1 0 0 0 0.31 - FAIL 1 PASS (1 Q6)
2 0 0 0 0.09 - FAIL 1 PASS (0)
3 0 0 0 0.09 - FAIL 1 PASS (0)
4 0 0 0 0.09 - FAIL 1 PASS (0)
5 0 0 0 0.08 - FAIL 1 PASS (0)
6 0 0 0 0.08 - FAIL 1 PASS (0)
7 0 0 0 0.09 - FAIL 1 PASS (0)
8 0 0 0 0.09 - FAIL 1 PASS (0)
9 0 0 0 0.08 - pass 0 (0)
10 0 0 0 0.08 - FAIL 1 PASS (1 A6)
11 0 0 0 0.08 - FAIL 0 (1 B7)
12 0 0 0 0.08 - FAIL 1 PASS (0)
13 0 0 0 0.09 - FAIL 1 PASS (1 B1)
14 0 0 0 0.09 - FAIL 0 (1 F5)
15 0 0 0 0.08 - FAIL 0 (1 B7)
16 0 0 0 0.09 - FAIL 0 (1 D5)
17 0 0 0 0.13 - FAIL 1 PASS (1 N11)
18 0 0 0 0.15 - FAIL 1 PASS (1 N11)
19 0 0 0 0.13 - FAIL 1 PASS (1 P13)
20 0 0 0 0.13 - FAIL 1 PASS (1 O12)
21 0 0 0 0.13 - FAIL 1 PASS (1 O12)
22 0 0 0 0.13 - FAIL 0 (1 O12)
23 0 0 0 0.12 - FAIL 1 PASS (1 M7)
24 0 0 0 0.11 - FAIL 1 PASS (0)
25 0 0 0 0.11 - FAIL 0 (1 G1)
26 0 0 0 0.13 - FAIL 0 (1 F1)
27 0 0 0 0.12 - FAIL 1 PASS (0)
28 0 0 0 0.13 - FAIL 1 PASS (1 J7)
29 0 0 0 0.10 - FAIL 1 PASS (0)
30 0 0 0 0.11 - FAIL 1 PASS (0)
31 0 0 0 0.15 - FAIL 1 PASS (1 L10)
32 0 0 0 0.07 - FAIL 1 PASS (0)
33 0 0 0 0.14 - FAIL 1 PASS (1 F14)
34 0 0 0 0.11 - FAIL 1 PASS (1 R7)
35 0 0 0 0.11 - FAIL 1 PASS (0)
36 0 0 0 0.08 - FAIL 1 PASS (0)
37 0 0 0 0.08 - FAIL 1 PASS (0)
38 0 0 0 0.11 - FAIL 1 PASS (1 G11)
39 0 0 0 0.13 - FAIL 1 PASS (1 G15)
40 0 0 0 0.16 - FAIL 1 PASS (0)
41 0 0 0 0.10 - FAIL 1 PASS (1 G15)
42 0 0 0 0.16 - FAIL 1 PASS (1 M15)
43 0 0 0 0.10 - FAIL 1 PASS (1 H2)
44 0 0 0 0.08 - FAIL 1 PASS (1 H18)
45 0 0 0 0.09 - FAIL 1 PASS (1 G17)
46 0 0 0 0.13 - FAIL 1 PASS (1 B4)
47 0 0 0 0.15 - FAIL 1 PASS (0)
48 0 0 0 0.10 - FAIL 1 PASS (0)
49 0 0 0 0.10 - FAIL 1 PASS (1 C13)
49 0 0 0
51 0 0 0 0.17 - FAIL 1 PASS (1 O15)
52 0 0 0 0.09 - FAIL 1 PASS (1 C13)
53 0 0 0 0.12 - FAIL 1 PASS (0)
54 0 0 0 0.09 - FAIL 1 PASS (0)
55 0 0 0 0.09 - FAIL 1 PASS (0)
56 0 0 0 0.09 - FAIL 1 PASS (0)
56 0 0 0
56 0 0 0
59 0 0 0 0.35 - FAIL 1 PASS (1 N10)
60 0 0 0 0.09 - FAIL 1 PASS (1 G16)
61 0 0 0 0.09 - FAIL 1 PASS (1 P5)
62 0 0 0 0.14 - FAIL 1 PASS (1 G8)
63 0 0 0 0.13 - FAIL 1 PASS (1 G8)
64 0 0 0 0.14 - FAIL 1 PASS (1 G8)
65 0 0 0 0.14 - FAIL 1 PASS (1 G8)
66 0 0 0 0.11 - FAIL 1 PASS (1 K1)
67 0 0 0 0.16 - FAIL 1 PASS (0)
68 0 0 0 0.07 - FAIL 0 (1 E11)
69 0 0 0 0.11 - FAIL 0 (1 E11)
70 0 0 0 0.09 - FAIL 1 PASS (0)
71 0 0 0 0.09 - FAIL 0 (1 P4)
72 0 0 0 0.10 - FAIL 1 PASS (1 O11)
73 0 0 0 0.12 - FAIL 1 PASS (0)
74 0 0 0 0.08 - FAIL 1 PASS (1 Q17)
75 0 0 0 0.08 - FAIL 0 (1 R18)
76 0 0 0 0.08 - FAIL 1 PASS (1 J3)
77 0 0 0 0.08 - FAIL 1 PASS (1 J2)
78 0 0 0 0.08 - FAIL 1 PASS (0)
79 0 0 0 0.14 - FAIL 1 PASS (0)
Total nodes: 0 0 0
Total time: 8.62
Number of passed tests: 1/76
75 unexpected failures (1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 59,
60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79)
Test case 50, which produced no result above, seems to be erroneous:
loadsgf Handtalk980819-1.sgf
50 disconnect a2 a2
#? [1 a2]
There is no stone at a2 and disconnecting a2 from a2 doesn't make much
sense. Test cases 57 and 58 seem to have identical problems.
/Gunnar