AobaKomaochi is a distributed deep reinforcement learning project for Shogi handicap games that learns without human knowledge.


There are seven handicap settings: Lance (kyo ochi), Bishop (kaku ochi), Rook (hisha ochi), 2-Piece (ni-mai ochi), 4-Piece (yon-mai ochi), 6-Piece (roku-mai ochi) and No handicap (hirate).
The winrate is kept at 0.5 by weakening the Black (shitate or sente) player.
Can AI discover a new opening, or rediscover the Two-Pawn Sacrifice Push, Silver Tandem, etc.?
If you are interested, please join us. Anyone can contribute using Google Colab.

GitHub Source and Windows binary. GitHub(Japanese top page)

2021-12-04 v26. Includes the latest weight, w1250.
2021-12-04 Training has finished. 13 million games were generated over 6 months. The Lance handicap settles into Double Ranging Rook (both on File 7), Bishop into Feint Ranging Rook (File 7), and 2-Piece into something similar to Silver Tandem. In 4-Piece and 6-Piece, shitate does not build a castle. Paper and slides (in Japanese). Thank you!
2021-10-14 Drop the learning rate to 0.000001. (from 10044k games, w955).
2021-09-22 v24 Bug fix. Update required. Restarted from w744, 7940001 games.
2021-09-20 v23 Added kldgain option for training. Update required. w745, 7940000 games.
2021-08-05 Drop the learning rate to 0.0001. (from 3711k games, w321).
2021-06-28 v1.1 Softmax temperature > 1.0 is now adjusted even if moves <= 30. aobak version is 20. w92, 1430000 games.
2021-06-23 Windows version (v1.0) released.
2021-06-07 Fixed the ELO adjustment method.
2021-06-07 Bug fix. A 1-ply mate was sometimes missed.
2021-06-06 Web site opened. Google Colab is available. Interestingly, at present, uwate (white)'s winrate is high in 6-Piece. This is because the player with fewer pieces has more chances to capture pieces when both sides move almost randomly. AobaKomaochi uses the 27-point declaration rule. The removed pieces are counted towards uwate (white)'s total.
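
To illustrate how much the removed pieces add to uwate's total, the sketch below tallies their declaration points per handicap. It assumes the usual counting for the declaration rule (rook and bishop 5 points, every other piece except the king 1 point) and the conventional handicap setups, which this page does not list. PIECE_POINTS, REMOVED_PIECES and removed_piece_points are hypothetical names, not taken from the AobaKomaochi source.

```python
# Point values assumed for the declaration count:
# rook and bishop are worth 5, other non-king pieces are worth 1.
PIECE_POINTS = {"rook": 5, "bishop": 5, "lance": 1, "knight": 1}

# Pieces removed from uwate's side in the conventional handicap setups
# (an assumption; the page itself does not spell these out).
REMOVED_PIECES = {
    "No handicap": [],
    "Lance": ["lance"],
    "Bishop": ["bishop"],
    "Rook": ["rook"],
    "2-Piece": ["rook", "bishop"],
    "4-Piece": ["rook", "bishop", "lance", "lance"],
    "6-Piece": ["rook", "bishop", "lance", "lance", "knight", "knight"],
}


def removed_piece_points(handicap: str) -> int:
    """Sum the declaration points of the pieces removed for one handicap."""
    return sum(PIECE_POINTS[piece] for piece in REMOVED_PIECES[handicap])


for name in REMOVED_PIECES:
    print(f"{name:12s} {removed_piece_points(name):2d} points")
```

Under these assumptions, uwate would start a 6-Piece game with 14 points already counted.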


2021-12-04 16:49 JST (updated every 30 minutes)
In the past hour: 3 clients, 1873 games.
In the past 24 hours: 21 clients, 46540 games.
Total: 13002907 games. Latest weight = w1250. The next weight is due in 5.3 hours. Thank you for your contribution!
             In past 7,000 games                      In past 500,000 games
Handicap     Average moves  Sente winrate  Draw rate  Average moves  Sente winrate  Draw rate  Handicap ELO
No handicap  115.1          0.480          0.027      113.2          0.498          0.028      32
Lance        113.2          0.501          0.020      114.6          0.499          0.019      135
Bishop       124.7          0.493          0.013      124.0          0.501          0.013      387
Rook         103.8          0.492          0.009      107.8          0.500          0.008      443
2-Piece      107.7          0.484          0.002      107.5          0.500          0.003      635
4-Piece      84.3           0.527          0.000      88.8           0.500          0.001      744
6-Piece      88.0           0.496          0.000      87.9           0.500          0.001      882
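
For a feel of what the Handicap ELO values mean: on the standard Elo scale, a rating gap d corresponds to an expected score of 1 / (1 + 10^(-d/400)) for the stronger side. The sketch below applies only that textbook formula. It assumes the column is a rating gap on that scale and says nothing about how AobaKomaochi actually weakens Black; expected_score is a hypothetical helper name.

```python
def expected_score(elo_gap: float) -> float:
    """Expected score of the stronger side for a given Elo gap (standard logistic Elo)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


# Handicap ELO values copied from the table above.
HANDICAP_ELO = {
    "No handicap": 32,
    "Lance": 135,
    "Bishop": 387,
    "Rook": 443,
    "2-Piece": 635,
    "4-Piece": 744,
    "6-Piece": 882,
}

for name, gap in HANDICAP_ELO.items():
    print(f"{name:12s} {gap:4d} -> {expected_score(gap):.3f}")
```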

Elo progress: self-play matches at 1 playout/move (left vertical axis) and matches against Kristallweizen (6.00) at 20k/move (right vertical axis). The right-axis Elo is based on floodgate.
As of 2021-12-04.

AobaKomaochi at 100 playouts/move vs. Kristallweizen (6.00) at 20k/move. 400 match games.


Game samples without noise. No ELO adjustment is applied to the Black player, so Black tends to win.
Self-play games without noise. Each game uses the same weight.

You can see the transition of opening moves.


Self-play games for training.
One sample per weight, updated daily.

Because of the added randomness, it often blunders in the first 30 moves. Black's strength is also adjusted.
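
One common way to add this kind of randomness in AlphaZero-style self-play is to sample the move from the root visit counts raised to the power 1/T, where T is a temperature; T > 1 flattens the distribution and makes weak moves more likely. The sketch below shows only that generic technique. Whether AobaKomaochi samples from visit counts or from the policy output, and which temperature it uses, is not stated on this page; sample_move and the example counts are hypothetical.

```python
import random
from typing import Mapping


def sample_move(visit_counts: Mapping[str, int], temperature: float,
                rng: random.Random) -> str:
    """Sample one move with probability proportional to count ** (1 / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    moves = list(visit_counts)
    # T > 1 flattens the distribution, T < 1 sharpens it.
    weights = [visit_counts[move] ** (1.0 / temperature) for move in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


# Illustrative counts only; these are not real search results.
example_counts = {"7g7f": 60, "2g2f": 30, "9i9h": 10}
print(sample_move(example_counts, temperature=1.3, rng=random.Random(0)))
```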


Game records
0-059 060-289 290-549 550-809 810-
Records no000000000000.csa through no000000500007.csa were generated with a random function, not the neural network.
The first game generated by the neural network is no000000500008.csa.
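
The record names follow a simple pattern: "no", a 12-digit zero-padded game number, and the .csa extension. A minimal sketch of that mapping, assuming the pattern holds for every record; record_name, generated_by_network and FIRST_NETWORK_GAME are hypothetical names.

```python
# Number of the first game generated by the neural network, as stated above.
FIRST_NETWORK_GAME = 500_008


def record_name(game_number: int) -> str:
    """Build a record file name such as no000000500008.csa."""
    return f"no{game_number:012d}.csa"


def generated_by_network(game_number: int) -> bool:
    """True for games produced by the network, False for the random-function games."""
    return game_number >= FIRST_NETWORK_GAME


print(record_name(500_008), generated_by_network(500_008))
```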
The network is 256x20 blocks; the replay buffer holds the past 500,000 games.
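
A replay buffer over the most recent games behaves like a sliding window: once it is full, each new game pushes out the oldest one. The sketch below is a minimal in-memory illustration of that idea, not the project's actual storage; ReplayBuffer and its methods are hypothetical names.

```python
import random
from collections import deque


class ReplayBuffer:
    """Keeps only the most recent `max_games` games (sliding window)."""

    def __init__(self, max_games: int = 500_000) -> None:
        self._games = deque(maxlen=max_games)

    def add(self, game) -> None:
        # When the deque is full, appending silently drops the oldest game.
        self._games.append(game)

    def __len__(self) -> int:
        return len(self._games)

    def oldest(self):
        """Return the oldest game still held in the window."""
        return self._games[0]

    def sample(self, rng: random.Random):
        """Return one stored game chosen uniformly at random."""
        return self._games[rng.randrange(len(self._games))]


# Tiny window for demonstration; the page states a window of 500,000 games.
buffer = ReplayBuffer(max_games=3)
for game_number in range(5):
    buffer.add(game_number)
print(len(buffer), buffer.oldest())
```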
Weights
w001-w400 w401-w790 w790-
Network size is 256 x 20 block (ResNet). AlphaZero style.
w001  ... 256x20b,minibatch  128, learning rate 0.01,     wd 0.0002, momentum 0.9,   500000 games. failed at w009.
w001  ... 256x20b,minibatch  128, learning rate 0.001,    wd 0.0002, momentum 0.9,   500000 games. restart with smaller lr.
w321  ... 256x20b,minibatch  128, learning rate 0.0001,   wd 0.0002, momentum 0.9,  3711485 games
w524  ... 256x20b,minibatch  128, learning rate 0.00001,  wd 0.0002, momentum 0.9,  5738768 games
w745  ... 256x20b,minibatch  128, learning rate 0.00001,  wd 0.0002, momentum 0.9,  7940000 games. kldgain
w955  ... 256x20b,minibatch  128, learning rate 0.000001, wd 0.0002, momentum 0.9, 10044983 games
w1031 ... 256x20b,minibatch  128, learning rate 0.00001,  wd 0.0002, momentum 0.9, 10802622 games. value loss target changed from game result to (game result + search winrate)/2.0
w1170 ... 256x20b,minibatch  128, learning rate 0.000001, wd 0.0002, momentum 0.9, 12192627 games
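
The value-target change noted for w1031 amounts to averaging the game outcome with the search's winrate estimate. A minimal sketch, assuming both quantities are on the same scale (here 0.0 = loss, 1.0 = win, seen from the same player); the scale and the name value_target are assumptions, not taken from the training code.

```python
def value_target(game_result: float, search_winrate: float) -> float:
    """Blend the final game result with the search winrate, as in the w1031 note."""
    return (game_result + search_winrate) / 2.0


# A won game in which the search estimated a 60% winrate for this position.
print(value_target(1.0, 0.6))
```

Blending the two gives a softer target than the raw game result alone, since a single blunder late in a noisy game no longer flips the target all the way to the opposite outcome.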