Aya's selfplay games for training value network.

(since 2017/04/05)
These are Aya's selfplay games for training a value network.
There are about 1,440,000 games.
The selfplay program's strength is about KGS 4d to 5d.

Training on these games together with KGS 4d over, tygem 9d, and GoGoD, I made a slightly (+70 Elo) better value network
 than training on KGS 4d over, tygem 9d, and GoGoD alone.
White tends to win in KGS komi 0.5 games, so I deleted 41% of the White-win games among the komi 0.5 games.
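
The rebalancing of the komi 0.5 games could be sketched like this. It is only a minimal sketch, not the script actually used: the function name drop_white_wins, the (komi, result) tuple layout, and the fixed random seed are assumptions for illustration.

```python
import random

def drop_white_wins(games, drop_rate=0.41, komi=0.5, seed=0):
    """Randomly drop a fraction of the White-win games played at the given komi.

    `games` is a list of (komi, result) tuples such as (0.5, "W+R").
    This record layout is an assumption made for the sketch.
    """
    rng = random.Random(seed)
    kept = []
    for game_komi, result in games:
        is_target = game_komi == komi and result.startswith("W+")
        if is_target and rng.random() < drop_rate:
            continue  # drop this White win
        kept.append((game_komi, result))
    return kept
```

Black wins and games with any other komi pass through unchanged, so only the komi 0.5 White wins are thinned out.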

The value network has 64 filters and 14 layers, with batch normalization.
The Caffe prototxt is in 20170405aya_value.tar.gz.

Input features are 50 channels (49 + 1 (turn)).
[Computer-go] DCNN can solve semeai?
http://computer-go.org/pipermail/computer-go/2016-February/008606.html
[Computer-go] Value Network
http://computer-go.org/pipermail/computer-go/2016-March/008768.html
16 positions are selected from each game, and they all share that game's result.
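
Picking the training positions from one game might look like the sketch below. How the 16 positions are actually chosen is not stated here; uniform sampling without replacement, the function name sample_positions, and the (move number, result) output format are assumptions.

```python
import random

def sample_positions(num_moves, result, n=16, seed=None):
    """Pick up to n distinct move numbers from one game.

    Every sampled position is labelled with the same game result.
    Uniform sampling is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    count = min(n, num_moves)
    move_numbers = sorted(rng.sample(range(1, num_moves + 1), count))
    return [(move_number, result) for move_number in move_numbers]
```

For example, sample_positions(228, "B+3.5") returns 16 (move number, "B+3.5") pairs from a 228-move game.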


A game has 4 stages.

1. The first move is selected from the opening book and randomly rotated into one of the 8 symmetries.
2. Moves 2 to 16 are selected randomly according to the Policy network probability.
3. Selfplay at 2000 playouts/move. It ends when a player resigns.
4. Selfplay resumes at 300 playouts/move from the resign position, and all dead stones are removed.
   This 300 playouts/move selfplay is about KGS 3d. Only the root node is created by the Policy network.
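
Stages 1 and 2 can be illustrated with a small sketch. It is not Aya's code: the 0-based (x, y) coordinates, the function names, and the dict used for the policy output are assumptions for illustration.

```python
import random

SIZE = 19

def symmetries(x, y, size=SIZE):
    """Return the 8 images of the 0-based point (x, y) under board rotations and reflections."""
    m = size - 1
    return [
        (x, y), (m - y, x), (m - x, m - y), (y, m - x),   # 4 rotations
        (y, x), (m - x, y), (m - y, m - x), (x, m - y),   # 4 rotations of the mirrored point
    ]

def random_symmetry(x, y, rng, size=SIZE):
    """Stage 1: move the opening-book point into one of the 8 symmetries at random."""
    return rng.choice(symmetries(x, y, size))

def sample_move(policy, rng):
    """Stage 2: sample a move with probability proportional to its policy value.

    `policy` maps a move to its probability (a hypothetical format).
    """
    moves = list(policy)
    weights = [policy[move] for move in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

With rng = random.Random(), random_symmetry(2, 3, rng) returns one of the 8 images of the point (2, 3), which is "cd" in SGF letters.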

Komi is 7.5, Chinese rules.

Note
If Aya plays (1,1), (1,19), (19,1), (19,19) or (9,1) (= J19) within the first 16 moves,
 it is a bug. The game should be deleted.
If the winner in the original game result (e.g. "GN[B+R,") differs from the winner in the final game result
 (e.g. "RE[W+5.5]" instead of "RE[B+3.5]"), the game should be deleted. The 300 playouts/move selfplay may make mistakes.
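
Both checks can be applied to the text of one SGF file, roughly as in the sketch below. The function name should_delete and the regular expressions are my assumptions. The coordinate mapping assumes 1-based (x, y) with y counted from the top, so that (9,1) is J19, which is "ia" in SGF letters.

```python
import re

# SGF letters for (1,1), (1,19), (19,1), (19,19) and (9,1),
# assuming 1-based (x, y) with y counted from the top.
BAD_POINTS = {"aa", "as", "sa", "ss", "ia"}

MOVE_RE = re.compile(r";[BW]\[([a-t]{2})\]")   # a move such as ";B[cd]"
GN_RE = re.compile(r"GN\[([BW])\+")            # winner in the original result
RE_RE = re.compile(r"RE\[([BW])\+")            # winner in the final result

def should_delete(sgf_text, opening_moves=16):
    """Return True if the game matches either deletion rule from the note."""
    moves = MOVE_RE.findall(sgf_text)
    if any(move in BAD_POINTS for move in moves[:opening_moves]):
        return True  # suspicious move within the first 16 moves
    original = GN_RE.search(sgf_text)
    final = RE_RE.search(sgf_text)
    if original and final and original.group(1) != final.group(1):
        return True  # the 300 playouts/move continuation changed the winner
    return False
```

Only the winner's color is compared, so "GN[B+R," and "RE[B+3.5]" count as the same result.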

Difference from AlphaGo's paper
This method is different from the one in AlphaGo's paper. Their method is
 1. make a position by Policy(SL) probability from the initial position.
 2. play a move uniformly at random from the available moves.
 3. play the remaining moves by Policy(RL) to the end.

AlphaGo's RL is about 2800 (CGOS BayesElo).


Aya's selfplay games
Each file has about 20000 games.

2000 playouts/move. All nodes are created by the Policy network.
CGOS BayesElo 2969 (Aya795p1v0cn33_2k) as of 2017-04-04.

  pi7_19_0114_2k_r16_add300_1.tar.bz2
  pi7_19_0114_2k_r16_add300_2.tar.bz2
  pi7_19_0114_2k_r16_add300_3.tar.bz2
  pi7_19_0114_2k_r16_add300_4.tar.bz2
  pi7_19_0114_2k_r16_add300_5.tar.bz2
  pi7_19_0114_2k_r16_add300_6.tar.bz2
  pi7_19_0114_2k_r16_add300_7.tar.bz2
  pi7_19_0114_2k_r16_add300_8.tar.bz2


1000 playouts/move. All nodes are created by the Policy network.
CGOS BayesElo 2850?

  pi7_19_1k_r16_re_add300_1.tar.bz2
  pi7_19_1k_r16_re_add300_2.tar.bz2
  pi7_19_1k_r16_re_add300_3.tar.bz2
  pi7_19_1k_r16_re_add300_4.tar.bz2
  pi7_19_1k_r16_re_add300_5.tar.bz2
  pi7_19_1k_r16_re_add300_6.tar.bz2
  pi7_19_1k_r16_re_add300_7.tar.bz2
  pi7_19_1k_r16_re_add300_8.tar.bz2


2000 playouts/move. Only the root node is created by the Policy network.
CGOS BayesElo 2766 (Aya793d_524_ro_2k), KGS 4d (AyaBotD3)

  pw_19_0114_ro_2k_r16_add300_1.tar.bz2
  pw_19_0114_ro_2k_r16_add300_2.tar.bz2
  pw_19_0114_ro_2k_r16_add300_3.tar.bz2
  pw_19_0114_ro_2k_r16_add300_4.tar.bz2
  pw_19_0114_ro_2k_r16_add300_5.tar.bz2
  pw_19_0114_ro_2k_r16_add300_6.tar.bz2
  pw_19_0114_ro_2k_r16_add300_7.tar.bz2
  pw_19_0114_ro_2k_r16_add300_8.tar.bz2

   p203_19_0114_ro_2k_r16_add300_1.tar.bz2
   p203_19_0114_ro_2k_r16_add300_2.tar.bz2
   p203_19_0114_ro_2k_r16_add300_3.tar.bz2
   p203_19_0114_ro_2k_r16_add300_4.tar.bz2
   p203_19_0114_ro_2k_r16_add300_5.tar.bz2
   p203_19_0114_ro_2k_r16_add300_6.tar.bz2
   p203_19_0114_ro_2k_r16_add300_7.tar.bz2
   p203_19_0114_ro_2k_r16_add300_8.tar.bz2
   p203_19_0114_ro_2k_r16_add300_9.tar.bz2
  p203_19_0114_ro_2k_r16_add300_10.tar.bz2
  p203_19_0114_ro_2k_r16_add300_11.tar.bz2
  p203_19_0114_ro_2k_r16_add300_12.tar.bz2
  
   p204_19_0114_ro_2k_r16_add300_1.tar.bz2
   p204_19_0114_ro_2k_r16_add300_2.tar.bz2
   p204_19_0114_ro_2k_r16_add300_3.tar.bz2
   p204_19_0114_ro_2k_r16_add300_4.tar.bz2
   p204_19_0114_ro_2k_r16_add300_5.tar.bz2
   p204_19_0114_ro_2k_r16_add300_6.tar.bz2
   p204_19_0114_ro_2k_r16_add300_7.tar.bz2
   p204_19_0114_ro_2k_r16_add300_8.tar.bz2
   p204_19_0114_ro_2k_r16_add300_9.tar.bz2
  p204_19_0114_ro_2k_r16_add300_10.tar.bz2
  p204_19_0114_ro_2k_r16_add300_11.tar.bz2
  p204_19_0114_ro_2k_r16_add300_12.tar.bz2
  
   p205_19_0114_ro_2k_r16_add300_1.tar.bz2
   p205_19_0114_ro_2k_r16_add300_2.tar.bz2
   p205_19_0114_ro_2k_r16_add300_3.tar.bz2
   p205_19_0114_ro_2k_r16_add300_4.tar.bz2
   p205_19_0114_ro_2k_r16_add300_5.tar.bz2
   p205_19_0114_ro_2k_r16_add300_6.tar.bz2
   p205_19_0114_ro_2k_r16_add300_7.tar.bz2
   p205_19_0114_ro_2k_r16_add300_8.tar.bz2
   p205_19_0114_ro_2k_r16_add300_9.tar.bz2
  p205_19_0114_ro_2k_r16_add300_10.tar.bz2
  p205_19_0114_ro_2k_r16_add300_11.tar.bz2
  p205_19_0114_ro_2k_r16_add300_12.tar.bz2
  
   p206_19_0114_ro_2k_r16_add300_1.tar.bz2
   p206_19_0114_ro_2k_r16_add300_2.tar.bz2
   p206_19_0114_ro_2k_r16_add300_3.tar.bz2
   p206_19_0114_ro_2k_r16_add300_4.tar.bz2
   p206_19_0114_ro_2k_r16_add300_5.tar.bz2
   p206_19_0114_ro_2k_r16_add300_6.tar.bz2
   p206_19_0114_ro_2k_r16_add300_7.tar.bz2
   p206_19_0114_ro_2k_r16_add300_8.tar.bz2
   p206_19_0114_ro_2k_r16_add300_9.tar.bz2
  p206_19_0114_ro_2k_r16_add300_10.tar.bz2
  p206_19_0114_ro_2k_r16_add300_11.tar.bz2
  p206_19_0114_ro_2k_r16_add300_12.tar.bz2
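
The archives can be read without unpacking them to disk. The following is a minimal sketch using Python's standard tarfile module. The function name iter_sgf and the UTF-8 decoding are assumptions, and the archive path in the usage note assumes the file has been downloaded to the current directory.

```python
import tarfile

def iter_sgf(archive_path):
    """Yield (member name, SGF text) for every .sgf file in a .tar.bz2 archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".sgf"):
                with tar.extractfile(member) as f:
                    # Decoding as UTF-8 is an assumption; the files look like plain ASCII.
                    yield member.name, f.read().decode("utf-8", errors="replace")
```

For example, sum(1 for _ in iter_sgf("pi7_19_0114_2k_r16_add300_1.tar.bz2")) counts the games in one archive.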


Sample of sgf

pi7_19_0114_2k_r16_add300_1.tar.bz2

$ cat i7_19_0114_2k_r16_add300_1/20170308_2007_00001.sgf
(;GM[1]SZ[19]KM[7.5]
GN[B+R,228,/home/yss/aya/i7_19_0114_2k_r16_1/20170117_2336_01826.sgf]
                        ... original game result, White resigned at move 228, original file name
RE[B+3.5]               ... final game result
RU[Chinese]
;B[cd]                  ... first move is selected from the opening book and randomly rotated into one of the 8 symmetries.
;W[pd]C[ 0.500,    2]
;B[oe]C[ 1.000,    1]
;W[pe]C[ 0.000,    1]
;B[od]C[ 1.000,    1]
;W[oc]C[ 1.000,    1]
;B[pf]C[-0.000,    1]
;W[qf]C[ 0.000,    1]
;B[pc]C[-0.000,    1]
;W[qc]C[ 0.000,    1]
;B[nc]C[-0.000,    1]
;W[pb]C[ 1.000,    1]
;B[ob]C[ 1.000,    1]
;W[pc]C[ 1.000,    1]
;B[eq]C[ 1.000,    1]
;W[pg]C[ 0.000,    1]
;B[of]C[-0.000,    1]   ... moves up to the 16th are selected randomly from the Policy network.
;W[ph]C[ 0.385, 1007]   ... 0.385 is the Black win rate for "ph". "ph" was selected 1007 times at the root.
;B[cp]C[ 0.427, 1048]
;W[ed]C[ 0.489, 1294]
;B[gc]C[ 0.481,  954]
;W[cc]C[ 0.502, 1327]
;B[bc]C[ 0.490, 1637]
;W[dc]C[ 0.502, 1471]
;B[cg]C[ 0.501, 2471]   ... the selection count is sometimes over 2000, because the previous transposition table is reused
;W[gd]C[ 0.499, 2245]
;B[hd]C[ 0.489, 1647]

...

;W[fj]C[ 0.801, 1349]
;B[im]C[ 0.824,  646]
;W[ad]C[ 0.569,  163]   ... 300 playouts/move from here. At 2000 playouts/move, White resigned here.
;B[af]C[ 0.608,  301]       The playout result is the game result with a territory adjustment by a sigmoid function.
;W[lp]C[ 0.595,  143]       A +3.5 point win is about +0.60 win, not +1.0
;B[mp]C[ 0.617,  423]
;W[el]C[ 0.578,  125]

...

;B[tt]C[ 0.607,  296]
;W[gj]C[ 0.571,  876]
;B[tt]C[ 0.607,  274]
;W[tt]C[ 0.603,  300]   ... two passes, game end
)
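
The C[] comments in the sample can be read back with a regular expression. This is a minimal sketch: the pattern is written against the sample above, and the function name parse_annotated_moves and the output tuple layout are assumptions.

```python
import re

# Matches a move followed by its comment, e.g. ";W[ph]C[ 0.385, 1007]".
NODE_RE = re.compile(r";([BW])\[([a-t]{2})\]C\[\s*(-?\d+\.\d+),\s*(\d+)\]")

def parse_annotated_moves(sgf_text):
    """Return a list of (color, move, black_win_rate, selected_count) tuples.

    Moves without a C[] comment, such as the first move, are skipped.
    """
    return [
        (color, move, float(rate), int(count))
        for color, move, rate, count in NODE_RE.findall(sgf_text)
    ]
```

Applied to the line ";W[ph]C[ 0.385, 1007]", it gives ("W", "ph", 0.385, 1007).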



Pro and KGS games.

GoGoD          80000 games http://gogodonline.co.uk/ pro games. $15
BadukMovies    59439 games https://badukmovies.com/pro_games pro games. free.
tygem 9d       22477 games http://baduk.sourceforge.net/TygemAmateur.7z strong amateur games.
KGS 4d over  1450946 games http://www.u-go.net/gamerecords-4d/

Acknowledgment
I'd like to thank Kunihito Hoki for his valuable advice.