An attempt to replicate the methods used in the AlphaGo paper

Mastering the game of Go with deep neural networks and tree search

https://www.nature.com/articles/nature16961

My inspiration to learn about and build AlphaGo came from the documentary AlphaGo - The Movie | Full award-winning documentary. This is the first neural network I have written and trained, and I learned the material from scratch as I implemented it.

To understand the intuition behind tree search and older methods, I followed the textbook Deep Learning and the Game of Go and reused parts of its game environment.

The following methods helped me develop an intuition around tree search:

MCTS
AlphaBeta pruning
Random Policy
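As an illustration of the pruning idea, here is a minimal sketch of alpha-beta search on a toy game tree. It is not the code in this repository: the nested-list tree, the function name alpha_beta, and the leaf scores are all assumptions made up for the example.

```python
import math

def alpha_beta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of a toy tree, skipping branches that cannot change it."""
    # Leaves are plain numbers: the score from the maximizing player's point of view.
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alpha_beta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing parent already has a better option elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alpha_beta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing parent already has a better option elsewhere
    return best

# Hypothetical two-ply tree: three moves for us, two replies each for the opponent.
toy_tree = [[3, 5], [2, 9], [0, 1]]
root_value = alpha_beta(toy_tree, maximizing=True)
```

In this toy tree the leaves 9 and 1 are never evaluated, because the first reply in each of those branches is already worse for the maximizing player than the first branch.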

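The repository includes scripts that play these bots against each other to compare performance: mcts_v_randombot.py, abprune_v_randombot.py, and abprune_v_mcts.py.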

AlphaGo uses 3 training pipelines to improve its policy and value estimation.

Supervised Learning Policy Network

I've added code to create training and test data from the KGS server games.

Steps to prepare data:

Download the tar.gz game files from the KGS server and put them in the data directory.
Run run_dataprocessor.py to generate the dataset.
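Before running the data processor, it can help to check that the archives landed in the right place. The sketch below is only a sanity check and does not reproduce what run_dataprocessor.py does. The helper name list_sgf_members is hypothetical, and it assumes the archives contain game records with an .sgf extension.

```python
import tarfile
from pathlib import Path

def list_sgf_members(data_dir="data"):
    """Map each .tar.gz archive in data_dir to the names of the .sgf files inside it."""
    index = {}
    for archive in sorted(Path(data_dir).glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            index[archive.name] = [
                member.name
                for member in tar.getmembers()
                if member.isfile() and member.name.endswith(".sgf")
            ]
    return index
```

Calling list_sgf_members("data") and printing the number of entries per archive gives a quick count of the games available for training.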

The training pipeline for supervised learning is in the notebook alphago.ipynb.
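The supervised objective is to raise the probability the policy assigns to the move the human player actually chose. The sketch below shows that idea on a toy softmax policy with four moves, in plain Python. It is an assumption-laden illustration and not the network in the notebook: sl_step, the learning rate, and the four-move "board" are invented for the example.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over moves."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sl_step(logits, expert_move, lr=0.1):
    """One gradient-descent step on the cross-entropy loss -log p(expert_move)."""
    probs = softmax(logits)
    loss = -math.log(probs[expert_move])
    # d(loss)/d(logit_i) = p_i - 1 if i is the expert move, else p_i
    new_logits = [
        x - lr * (p - (1.0 if i == expert_move else 0.0))
        for i, (x, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, loss

logits = [0.0, 0.0, 0.0, 0.0]  # toy position with four legal moves
for _ in range(50):
    logits, loss = sl_step(logits, expert_move=2)
```

After the loop, the policy puts most of its probability on move 2, the move the hypothetical expert played.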

Reinforcement Learning Policy Network

This stage uses the REINFORCE algorithm to update the policy towards winning moves. The training pipeline for reinforcement learning of the policy network is in the notebook alphago.ipynb.
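The core of REINFORCE is to scale the gradient of the log-probability of each played move by the game outcome, so moves from won games become more likely and moves from lost games less likely. Here is a minimal sketch on a toy softmax policy, assuming an outcome of +1 for a win and -1 for a loss. The name reinforce_update and the three-move setup are hypothetical, and this is not the notebook's implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over moves."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_update(logits, move, outcome, lr=0.1):
    """Gradient-ascent step on outcome * log p(move)."""
    probs = softmax(logits)
    # d(log p(move))/d(logit_i) = 1[i == move] - p_i
    return [
        x + lr * outcome * ((1.0 if i == move else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]

start = [0.0, 0.0, 0.0]  # toy position with three legal moves
after_win = reinforce_update(start, move=1, outcome=+1)
after_loss = reinforce_update(start, move=1, outcome=-1)
```

Comparing softmax(after_win)[1] and softmax(after_loss)[1] against the starting probability of 1/3 shows the update moving in opposite directions for a win and a loss.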

Reinforcement Learning of Value Network [TODO]
