AobaZero is a user-participated Shogi AI project that replicated the AlphaZero Shogi experiment. It is currently undergoing its own unique improvements.


If you are interested, please join us. Anyone can contribute using Google Colab.

Source code and executable files are on GitHub. GitHub (Japanese top page)

2022-04-18 v32 update required. First-play urgency (FPU). dfpn tsume-solver for all nodes. w3954, 54540k games.
2022-04-07 v30 update required. Root policy temperature raised from 1.0 to 1.8; about +50 Elo stronger. Randomness up to 30 moves was reverted to the original AlphaZero format: the win rate was not well adjusted to stay within a certain range, so all moves in the initial phase had the same win rate. kldgain=0.000006; 789 playouts per move on average. w3933, 53930k games.
2022-02-25 v28 update required. Network structure changed: the NN input now includes piece-movable-area planes, and the policy output shrank from 11259 (139x9x9) to 2187 (27x9x9). Swish activation. Tsume solver up to 3 plies. Randomness up to 30 plies is based on the one-ply searched value. w3881, 52390k games.
2021-08-31 AobaKomaochi, another distributed deep reinforcement learning project for Shogi handicap games, is running.
2021-04-27 The AlphaZero replication run has finished, and AobaZero moved to a 40-block network. 39825k games; w3459 is the last 20-block weight, w3460 is the first 40-block weight. Thank you!
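The 2022-04-07 entry raises the root policy temperature from 1.0 to 1.8. In AlphaZero-style engines this usually means power-scaling the prior probabilities; a minimal sketch of that scaling (the function name and example values are illustrative, not AobaZero's actual code):

```python
import numpy as np

def apply_temperature(policy, temperature):
    """Flatten (T > 1) or sharpen (T < 1) a policy distribution.

    Raising each prior to the power 1/T and renormalizing is the
    usual AlphaZero-style temperature; T = 1 leaves it unchanged.
    """
    scaled = np.power(policy, 1.0 / temperature)
    return scaled / scaled.sum()

p = np.array([0.7, 0.2, 0.1])
print(apply_temperature(p, 1.0))  # unchanged
print(apply_temperature(p, 1.8))  # flatter: more exploration at the root
```

A higher temperature spreads root visits over more candidate moves, which is why it trades a little strength in a single position for more diverse self-play training data.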
update history


2023-09-26 01:07 JST (updated every 30 minutes)
In the past hour: 4 clients, 450 games.
In the past 24 hours: 7 clients, 8756 games.
Total 65660383 games. Latest weight: w4323. The next is due in 17.0 hours. Thank you for your contribution!
In the past 1000 games: average moves 89.0, Sente win rate 0.683, draw rate 0.033.
In the past 1,000,000 games: average moves 88.8, Sente win rate 0.669, draw rate 0.040.

Elo progress, based on self-matches against the previous weight. The left vertical axis is Elo; the right shows floodgate and matches vs Kristallweizen at 1k, 10k, 50k, 100k, 200k, and 500k. The horizontal axis is the number of trained games (1 unit = 10,000 games).
As of 2023-09-25.
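The Elo figures above come from self-match win rates against the previous weight. Assuming the standard logistic Elo model (and counting draws as half a win before converting), the win-rate-to-Elo conversion is:

```python
import math

def elo_diff(winrate):
    """Elo difference implied by a win rate under the logistic model.

    winrate must be strictly between 0 and 1; count draws as half
    a win before calling this.
    """
    return -400.0 * math.log10(1.0 / winrate - 1.0)

print(round(elo_diff(0.64)))  # 100: a 64% score is roughly +100 Elo
```

For example, the "+64 ELO stronger" note for w4177 corresponds to a self-match score of roughly 59%.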

AobaZero at 800 playouts/move vs Kristallweizen at 500k/move. 800 match games.


You can see the process of acquiring Shogi knowledge from the game records.
These are self-play games without noise; each game uses the same weight.

You can see the transition of opening moves.


Self-play games for training.
Self-play games are posted every 10,000 games. The latest game is at the top of the page; it is updated every other day.

These are the self-play games used for training. The engine often blunders in the first 30 moves, and it sometimes chooses a non-best move because noise is added at the root node.
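AlphaZero-style engines add that root noise by mixing Dirichlet noise into the root priors. A sketch assuming the AlphaZero paper's parameters for shogi (epsilon = 0.25, alpha = 0.15); AobaZero's exact values are not stated on this page:

```python
import numpy as np

def add_root_noise(priors, epsilon=0.25, alpha=0.15, rng=None):
    """Mix Dirichlet noise into root priors, AlphaZero style.

    epsilon=0.25 and alpha=0.15 (the shogi value from the AlphaZero
    paper) are assumptions; AobaZero's actual parameters may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * np.asarray(priors) + epsilon * noise

p = add_root_noise([0.5, 0.3, 0.2], rng=np.random.default_rng(0))
print(p, p.sum())  # still a distribution, but perturbed
```

Because the noise only perturbs the root, it diversifies the opening moves of self-play games without distorting evaluations deeper in the tree.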


Game records
From arch000000000000.csa.xz to arch000043310000.csa.xz.
From arch000045990000.csa.xz to arch000046400000.csa.xz.
These are updated every two weeks.
From no000000000000.csa to no000000121031.csa, games were generated by a random function, not by the neural network.
The first game generated by the neural network is no000000121032.csa.
Up to no000001017999.csa: 64x15 blocks, window of the past 100,000 games.
From no000001018000.csa: 256x20 blocks, window of the past 500,000 games.
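The training window described above can be modeled as a bounded replay buffer that drops the oldest games as new ones arrive; a minimal sketch (names and the small demo size are illustrative):

```python
import random
from collections import deque

# Bounded buffer: appending past maxlen silently evicts the oldest game.
# The real window here was 100,000 games, later 500,000 (and 1,000,000
# from w3077); a tiny maxlen is used below just to show the behavior.
buf = deque(maxlen=3)
for game_id in range(5):
    buf.append(game_id)
print(list(buf))  # [2, 3, 4]: the two oldest games fell out of the window

def sample_minibatch(buffer, batch_size):
    """Sample training games uniformly from the current window."""
    return random.sample(list(buffer), batch_size)
```

Sampling only from a recent window keeps the network training on positions produced by policies close to its current strength.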
Weights
From w0001.txt.xz to w3702.txt.xz.
From w3703 to w3705.txt.xz.
Network size is 64 x 15 block up to w448, 256 x 20 block from w449.
w001  ...  64x15b,minibatch   64, learning rate 0.01,     wd 0.00005, momentum 0.9,   120000 games
w156  ...  64x15b,minibatch   64, learning rate 0.001,    wd 0.00005, momentum 0.9,   430000 games
w449  ... 256x20b,minibatch   64, learning rate 0.01,     wd 0.0002,  momentum 0.9,  1018000 games
w465  ... 256x20b,minibatch   64, learning rate 0.001,    wd 0.0002,  momentum 0.9,  1180000 games
w775  ... 256x20b,minibatch 4096, learning rate 0.02,     wd 0.0002,  momentum 0.9,  4220000 games
w787  ... 256x20b,minibatch  128, learning rate 0.0002,   wd 0.0002,  momentum 0.9,  4340000 games
w1450 ... 256x20b,minibatch  128, learning rate 0.00002,  wd 0.0002,  momentum 0.9, 10980000 games
w2047 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.0002,  momentum 0.9, 16948000 games
w2750 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.00004, momentum 0.9, 23982000 games
w3022 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.00004, momentum 0.9, 26706447 games, weights updated every 34285 games
w3077 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.00004, momentum 0.9, 28352543 games, replay buffer 1000000 games
w3148 ... 256x20b,minibatch  128, learning rate 0.0000002, wd 0.00004, momentum 0.9, 30474874 games
w3299 ... 256x20b,minibatch  128, learning rate 0.0000002, wd 0.0002,  momentum 0.9, 34987582 games
w3460 ... 256x40b,minibatch   64, learning rate 0.000001,  wd 0.0002,  momentum 0.9, 39825686 games (AlphaZero test ends)
w3616 ... 256x40b,minibatch   64, learning rate 0.0000005, wd 0.0002,  momentum 0.9, 44461876 games
w3703 ... 256x20b,minibatch   64, learning rate 0.000002,  wd 0.0002,  momentum 0.9, 47075522 games, temperature 1.0 -> 1.3
w3770 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.0002,  momentum 0.9, 49046722 games, average of searched winrate and game result
w3806 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.0002,  momentum 0.9, 50130343 games, kldgain 0.0000013
w3881 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.0002,  momentum 0.9, 52390111 games, NN structure is dlshogi-like; randomness up to 30 plies based on the one-ply searched value
w3933 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.0002,  momentum 0.9, 53934959 games, root policy temperature 1.8; randomness up to 30 moves is AlphaZero style; kldgain 0.000006
w3954 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.0002,  momentum 0.9, 54545373 games, initial node winrate based on parent winrate (FPU); dfpn tsume-solver for all nodes
w4177 ... 256x20b,minibatch  128, learning rate 0.000002,  wd 0.0002,  momentum 0.9, 61229235 games, reduced learning positions up to 30 moves and resignations within 40 moves; replaced weight is +64 Elo stronger
Weights are updated every  2000 games ( 4000 iterations) up to w448.
Weights are updated every 10000 games (20000 iterations) from w449.
Weights are updated every 10000 games (10000 iterations) from w787.
Weights are updated every 34285 games (32000 iterations) from w3022.
The replay buffer is the past 1000000 games (increased from 500000) from w3077.
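The w3954 entry says the initial winrate of an unvisited node is taken from its parent (first-play urgency, FPU). A sketch of PUCT child selection with that idea; the constants and field names are illustrative, not AobaZero's actual implementation:

```python
import math

def puct_select(children, parent_q, parent_visits,
                c_puct=1.5, fpu_reduction=0.2):
    """Pick the child maximizing Q + U in PUCT selection.

    Unvisited children have no Q of their own yet; first-play urgency
    (FPU) fills it in from the parent's winrate minus a small
    reduction, instead of a fixed optimistic or pessimistic constant.
    children: list of dicts with 'prior', 'visits', 'value_sum'.
    """
    best, best_score = None, -float("inf")
    for ch in children:
        if ch["visits"] > 0:
            q = ch["value_sum"] / ch["visits"]
        else:
            q = parent_q - fpu_reduction  # FPU: start from parent winrate
        u = c_puct * ch["prior"] * math.sqrt(parent_visits) / (1 + ch["visits"])
        score = q + u
        if score > best_score:
            best, best_score = ch, score
    return best
```

Basing the initial estimate on the parent's winrate adapts to the position: unexplored moves look neither uniformly great nor uniformly terrible, which tends to spend playouts more sensibly than a fixed constant.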