A Noob's Guide to Minimax
Recently, I watched DeepMind's AlphaGo documentary on YouTube and was super impressed and inspired to continue pursuing Machine Learning research. I was watching with Dad and he said he could see the spark in my eyes (lol).
Catch it here if you haven't. Stunning videography, no doubt → Super inspiring for ML enthusiasts.
This sparked a desire to learn more about Game AI and the different search/optimisation algorithms used to find the best move in a game.
Preface: I created a Tic Tac Toe playing bot based on the Minimax algorithm. Now that I have a basic understanding of how it works, I want to explain it in simple English.
Game AI 101
The documentary spoke about different techniques used by the DeepMind team. While I'm not getting into the nitty-gritty details (no spoilers, I promise), I'll be talking about the one I started with – Minimax.
Most Game AIs work the same way → you pass in the current game state and use a search function to find the best move. This search function assigns a score to each board state that tells the agent whether it's winning or losing. Based on this score, it explores different paths until it finds a winning combination of moves, and finally takes the move that starts it all (aka, the golden move).
Enter Minimax
Minimax (also called MM and Minmax) is a concept from Game Theory that's used in finding the ideal strategy for two competing agents in a perfect information scenario (no information is hidden from either player).
It goes something like this → every game has two players, a maximiser and a minimiser; hence the name Minimax. The maximiser aims to maximise their chance of winning and the minimiser wants to minimise their chance of losing (read that again, it might take you a minute to wrap your head around it). The algorithm works on this premise. Usually, the AI is the maximiser and the opponent (human or otherwise) is the minimiser.
When playing games, humans usually visualise the game board a few timesteps into the future. Likewise, AI agents use Game Trees – a data structure that contains all the possible states of the board n timesteps into the future – to find the best move. Through a process called Rollout, all the possible game moves are evaluated till the end of the game.
Here's the Game Tree for game state B:
While finding the best move, the agent also needs to keep track of the opponent's moves, which is why you can see the alternating nature of the Game Tree, with each level being played by the maximiser or the minimiser in succession.
Scoring Nodes/Game States
A node is said to be terminal when no more moves can be made → as in, the game has terminated. This usually occurs when it's a draw or one of the sides has won (either the maximiser or minimiser).
To score a node, we run it (as in, the represented game state) through a Heuristic Function. This function returns a score that indicates whether one of the players won, lost, or drew the game. Think of it like an actual mathematical function that takes in a board state and spits out a score.
For instance, in the game Tic Tac Toe, we can use a Heuristic Function that tells us whether X won, lost, or drew using the scores +1 for a win, -1 for a loss, and 0 for a draw. Depending on the game, you'd have to change your Heuristic Function as well.
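Here's a minimal sketch of such a Heuristic Function in Python, assuming the board is stored as a flat list of 9 cells (the representation and function name are my own, not from any particular library):

```python
def heuristic(board):
    """Score a 3x3 Tic Tac Toe board from X's perspective.

    `board` is a list of 9 cells, each 'X', 'O', or None.
    Returns +1 if X has won, -1 if O has won, and 0 otherwise
    (a draw or an unfinished game).
    """
    lines = [
        (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
        (0, 4, 8), (2, 4, 6),             # diagonals
    ]
    for a, b, c in lines:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return +1 if board[a] == 'X' else -1
    return 0

print(heuristic(['X', 'X', 'X',
                 'O', 'O', None,
                 None, None, None]))  # → 1, X has won the top row
```

Swap the scores or the win conditions and the same shape works for other simple games.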
Creating the Game Tree
Minimax unravels a Game Tree until all the final nodes are in terminal states (win/loss/draw). It then traverses the tree bottom-up, from the terminal nodes to the root node. Non-terminal nodes always have children; terminal nodes do not.
Let's say we start with a game state G with 2 possible moves, a and b, that can be made. The Game Tree contains all the possible moves that can be made from the current game state, as well as the possible moves from those resulting states, and so on. They are represented by the yellow nodes in the diagram below.
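Unravelling one level of a Game Tree just means generating every state reachable in a single move. For Tic Tac Toe (using the same flat-list board representation as before; the function name is my own), that looks something like:

```python
def children(board, player):
    """All boards reachable in one move by `player` ('X' or 'O').

    `board` is a list of 9 cells, each 'X', 'O', or None.
    Each empty cell yields one child state.
    """
    states = []
    for i, cell in enumerate(board):
        if cell is None:
            child = board.copy()   # don't mutate the parent state
            child[i] = player
            states.append(child)
    return states

empty = [None] * 9
print(len(children(empty, 'X')))  # → 9 possible opening moves
```

Calling this recursively, alternating the player, is what builds the full tree.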
In a Human vs AI game, assume the maximiser to be the AI and minimiser to be the human.

Traversing the Game Tree
Let's start from the bottom – at the terminal nodes – and alternate between maximiser and minimiser as we move upwards to the root node.
It's the maximiser's turn. From all the available options, the maximiser chooses the scores that maximise its chances of winning → the highest scores. Starting from the left of the tree above, the maximiser chooses 2, 5, 6, and 4. These scores are then passed up to the parent nodes currently without a score.

Now, it's the minimiser's turn. From all the available options, the minimiser chooses the scores that minimise its chances of losing → the lowest scores. Starting from the left of the tree, the minimiser chooses 2 and 4. These scores are passed up to the parent nodes, also without a score.

Next, it's the maximiser's turn again; we've reached the root node → the game state G. The maximiser now has 2 options → a or b. In line with the maximiser's objective, we choose the higher number, 4, and pass it to the root node.

The final Game Tree looks like this. The maximiser can go ahead and choose move b, which gives it a higher chance of winning, since 4 is the greatest of all options (namely 2 and 4).
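The whole traversal above can be written as one short recursive function. Here's a sketch where the tree is just nested lists with leaf scores at the bottom (the leaf values are illustrative – chosen so the maximiser layer produces 2, 5, 6, 4 as in the walkthrough – and may differ from the actual diagram):

```python
def minimax(node, maximising):
    """Score a game tree given as nested lists.

    A leaf is an int (its heuristic score); an inner node is a
    list of child subtrees. `maximising` is True on the
    maximiser's turns and False on the minimiser's.
    """
    if isinstance(node, int):   # terminal node: return its score
        return node
    scores = [minimax(child, not maximising) for child in node]
    return max(scores) if maximising else min(scores)

# Illustrative leaf scores:
a = [[1, 2], [5, 3]]   # maximiser layer picks 2 and 5; minimiser picks 2
b = [[6, 0], [4, 2]]   # maximiser layer picks 6 and 4; minimiser picks 4
print(minimax([a, b], maximising=True))  # → 4, so the root picks move b
```

In a real bot you'd replace the nested lists with calls to a move generator and the Heuristic Function, but the recursion is exactly this.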
Minimax Doesn't Always Work
Minimax is a great algorithm. In fact, it's applicable to many 2-player, perfect information games. Yet, it has drawbacks that can make it very slow.
Branching Factor too high → Branching Factor refers to the number of branches (or children) a node has. Imagine a game like Chess with an average BF of about 35, or Go with a BF of about 250. It'd take a ridiculous amount of time to iteratively go through all the nodes, one by one, and evaluate them.
Think of the difficulty in feeding 3 children and putting them to sleep compared to feeding 100 children and getting them to sleep. Which one would take more time?
Search Depth too high → When going down the tree, the algorithm can see n steps into the future, called the Search Depth. Every set of child branches unraveled is another step into the future. The higher the search depth, the longer it takes to backtrack and traverse the tree back up from the terminal nodes, which makes Minimax inefficient.

Imagine diving into the ocean. The deeper you are below the surface, the more time you take to resurface when your oxygen tanks are running low.
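You can see how quickly these two factors blow up with a bit of arithmetic: a full tree with branching factor b and depth d has 1 + b + b² + … + b^d nodes. A quick sketch (my own helper, just for the back-of-the-envelope numbers):

```python
def node_count(b, d):
    """Total nodes in a complete game tree of branching factor b
    and depth d: 1 + b + b^2 + ... + b^d."""
    return sum(b ** k for k in range(d + 1))

print(node_count(3, 4))    # → 121: a tiny game stays tiny
print(node_count(35, 4))   # → 1,544,761: chess-like, ~1.5 million nodes
                           # at only 4 plies deep
```

Push the depth to anything realistic and the count becomes astronomically large – which is exactly why plain Minimax doesn't scale.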
To beat these inefficiencies, you can use other add-on algorithms like Alpha-Beta Pruning, which wholly ignores parts of the Game Tree to speed up the search. I'll probably write an article on it some other time.
For games with a small set of possible moves, Minimax is an amazing algorithm. For games with large Branching Factors, we'd prefer to use another algorithm like Monte-Carlo Tree Search or a Neural Network based approach like what DeepMind did with AlphaGo (Read the paper here).