[Bug-gnubg] Deprelibenchmark 2010
From: Frank Berger
Subject: [Bug-gnubg] Deprelibenchmark 2010
Date: Sat, 23 Jul 2011 13:59:11 +0200
Hi,
as mentioned some time ago, I got the files of the 2010 benchmark from Michael
Depreli and analyzed the data to construct an XML file that contains all
positions that were played differently, together with their rollout data. I
carefully checked the data for consistency (I now have a pretty good idea how
much work it was to collect this data manually. Kudos to Michael), added
rollouts, and together with Michael looked at the problems and fixed them.
Feel free to add some additional quality control.
The idea of having such a file is to have a benchmark that allows quick (BGB
needs just half an hour) and verifiable results. How difficult has it been to
estimate bot strength in the past? Remember the Big-Bot-Shootout? 6000 25-point
matches, months of computing time, but for statistically relevant results
(other than that JF is worse) it is still too few.
With the Depreli 2010 benchmark this should now be vastly improved.
Naturally, the selection of positions, the rollout method etc. may cause
deviations from the "truth", but it is better by far than anything we had
before.
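To give a rough feeling for why even 6000 matches say little, here is a minimal sketch of the sampling error of a match win rate. It assumes, purely for illustration, that all 6000 matches are independent games between the same two evenly matched bots; the function name and the 50% win probability are assumptions, not figures from the shootout. If the matches are split over several bot pairings, the uncertainty per pairing is larger still.

```python
import math

def win_rate_standard_error(n_matches, p=0.5):
    """Standard error of an observed win rate over n independent matches.

    Assumes each match is a Bernoulli trial with win probability p
    (an illustrative simplification, not the shootout's actual setup).
    """
    return math.sqrt(p * (1.0 - p) / n_matches)

n = 6000                      # number of matches mentioned above
se = win_rate_standard_error(n)
half_width = 1.96 * se        # half-width of an approximate 95% interval

print(f"standard error: {se:.4f}, 95% half-width: {half_width:.4f}")
```

Under these assumptions the observed win rate is only known to within roughly 1.3 percentage points either way, so two bots of nearly equal strength cannot be separated.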
My hope is to establish a file format for benchmarks, so that we get more of
this data in the future. I invite anyone to add suggestions, rollouts (or
rollout data from a different bot... Xavier?) etc., but in order to have one
"master" copy of this file I ask you to send modifications to me.
I would be glad if further benchmarks were created, or if software were
developed that simplifies the creation of such a benchmark. This should be
just a start, I hope.
You can find the files here:
http://www.bgblitz.de/Depreli2010/dep_2010_id.xml.zip
The sgf files of the matches are here: http://www.bgblitz.de/Depreli2010/sgf.zip
The Excel file is here:
http://www.bgblitz.de/Depreli2010/BOT%20SHOOTOUT500R8.xls
(I asked Michael for permission)
Just to give an idea of how the file looks, I have appended a short piece at the end.
ciao
Frank
<benchmark>
  <displayName>Depreli Benchmark 2010</displayName>
  <comment>Benchmark an AI against approx. 5000 difficult positions</comment>
  <plies>3</plies>
  <benchPositions>
    <benchPosition>
      <cubeDecision>false</cubeDecision>
      <id>010202G</id>
      <positionID>sM/gARTB28EBIg:cAkOAAAACAAE</positionID>
      <xgID>XGID=-Aaa--DBC---dC---b-ebA--A-:0:0:1:34:1:0:0:0:10</xgID>
      <responses>
        <response>
          <move>24-21,6-2</move>
          <equityDiff>0.00</equityDiff>
        </response>
        <response>
          <move>24-21,13-9</move>
          <equityDiff>-0.008</equityDiff>
        </response>
        <response>
          <move>24-21,8-4</move>
          <equityDiff>-0.017</equityDiff>
        </response>
        <response>
          <move>8-1</move>
          <equityDiff>-0.042</equityDiff>
        </response>
      </responses>
    </benchPosition>
    <benchPosition>
      <cubeDecision>true</cubeDecision>
      <id>010208S</id>
      <positionID>z88BAAzFdg8YAA:cAkAAAAACAAE</positionID>
      <xgID>XGID=-AAb-BBCD------B----c-f-d-:0:0:1:00:1:0:0:0:10</xgID>
      <responses>
        <response>
          <cubeAction>No Double</cubeAction>
          <equityDiff>0.0000</equityDiff>
        </response>
        <response>
          <cubeAction>Double</cubeAction>
          <equityDiff>-0.0100</equityDiff>
        </response>
        <response>
          <cubeAction>Take</cubeAction>
          <equityDiff>-0.2510</equityDiff>
        </response>
        <response>
          <cubeAction>Pass</cubeAction>
          <equityDiff>0.0000</equityDiff>
        </response>
      </responses>
    </benchPosition>
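
As an illustration of how such a file could be used, here is a minimal Python sketch that reads positions in the format of the excerpt and sums up the equity a bot gives away. Several things are assumptions rather than part of the benchmark: the function names `load_positions` and `total_loss` are hypothetical, `equityDiff` is read as the loss relative to the best response (0.00 for the best play), and the embedded sample is a shortened copy of the excerpt with the closing tags added so that it parses. This is not necessarily how BGB or any other program scores the benchmark.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the excerpt (closing tags added for parsing).
SAMPLE = """<benchmark>
  <displayName>Depreli Benchmark 2010</displayName>
  <plies>3</plies>
  <benchPositions>
    <benchPosition>
      <cubeDecision>false</cubeDecision>
      <id>010202G</id>
      <responses>
        <response><move>24-21,6-2</move><equityDiff>0.00</equityDiff></response>
        <response><move>24-21,13-9</move><equityDiff>-0.008</equityDiff></response>
        <response><move>8-1</move><equityDiff>-0.042</equityDiff></response>
      </responses>
    </benchPosition>
    <benchPosition>
      <cubeDecision>true</cubeDecision>
      <id>010208S</id>
      <responses>
        <response><cubeAction>No Double</cubeAction><equityDiff>0.0000</equityDiff></response>
        <response><cubeAction>Double</cubeAction><equityDiff>-0.0100</equityDiff></response>
      </responses>
    </benchPosition>
  </benchPositions>
</benchmark>"""

def load_positions(xml_text):
    """Map each position id to {response text: equityDiff}."""
    root = ET.fromstring(xml_text)
    positions = {}
    for pos in root.iter("benchPosition"):
        # Cube positions list <cubeAction> entries, checker plays list <move>.
        is_cube = pos.findtext("cubeDecision", "").strip() == "true"
        tag = "cubeAction" if is_cube else "move"
        positions[pos.findtext("id").strip()] = {
            resp.findtext(tag).strip(): float(resp.findtext("equityDiff"))
            for resp in pos.iter("response")
        }
    return positions

def total_loss(positions, choices):
    """Sum the equity given up by the chosen responses (assumed scoring rule)."""
    loss = 0.0
    for pos_id, choice in choices.items():
        # equityDiff is zero or negative, so negate it to get a positive loss.
        loss += -positions[pos_id][choice]
    return loss

positions = load_positions(SAMPLE)
# Hypothetical answers of a bot under test.
bot_choices = {"010202G": "24-21,13-9", "010208S": "Double"}
print(round(total_loss(positions, bot_choices), 4))
```

A response that is not listed for a position raises a `KeyError` in this sketch; a real evaluation would need a rule for such plays, for example rolling them out separately.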