The history of designing computer players for board games goes back to the 1970s, when a series of chess-playing programs were created to compete in real tournaments. The best known of these players is IBM's Deep Blue, which defeated world chess champion Garry Kasparov in 1997.
The most notable computer chess players all used brute-force algorithms: at every turn they would investigate all the possible moves and choose the best one. If you have all 16 of your pieces left and each of them has 5 possible moves, then you have 80 different options at every turn! If you were to build a game tree to model the players' decisions, then after investigating 10 consecutive turns your tree would contain about 10 000 000 000 000 000 000 nodes (roughly 10 billion billion nodes). Needless to say, the brute-force approach is not efficient at all. A surprising fact, however, is that computer players employing brute-force algorithms are still in use today and remain the strongest players in existence. The reason they do so well against human players is that their strength is directly correlated with the speed and resources of the machine they run on. As machines become more powerful (as they do every year, according to Moore's law), these computer chess players become more powerful too.
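The tree-size arithmetic above is easy to check directly. This little sketch assumes a constant branching factor, which is of course a simplification (the real number of legal moves varies from turn to turn):

```python
# Rough size of a full-width chess game tree, assuming a constant
# branching factor of 80 moves per turn and 10 turns of lookahead.
branching_factor = 16 * 5   # 16 pieces x 5 moves each = 80 options
depth = 10                  # consecutive turns investigated

leaf_nodes = branching_factor ** depth
print(f"{leaf_nodes:,}")    # about 1.07e19, i.e. ~10 billion billion
```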
In the world of artificial intelligence, designing computer opponents for classic and modern board games poses various interesting problems. As with the computer chess players, when designing computer players for board games, developers generally make use of game trees to model a player's decisions at different stages of the game. The tree is a simple directed graph in which each node represents the game at a particular point in time (a state). In games like chess, building such a tree is fairly straightforward. The levels of the tree represent alternating player decisions, and at each level you either have to maximize your gain (if the level represents your turn to play) or minimize your opponent's gain (if it is their turn to play). This results in what is commonly referred to as a Minimax tree.
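The alternating maximize/minimize recursion can be sketched in a few lines. The `CountingGame` below is a hypothetical toy game (each move adds or subtracts a point; the game ends when the score reaches ±3) used purely to make the sketch runnable; a real chess state would expose the same `is_terminal`/`score`/`moves`/`apply` interface:

```python
from dataclasses import dataclass

def minimax(state, depth, maximizing):
    """Score of `state` looking `depth` plies ahead, with the two
    players alternately maximizing and minimizing the evaluation."""
    if depth == 0 or state.is_terminal():
        return state.score()          # heuristic evaluation of the board
    scores = (minimax(state.apply(m), depth - 1, not maximizing)
              for m in state.moves())
    return max(scores) if maximizing else min(scores)

# Hypothetical toy game standing in for a real board-game state.
@dataclass(frozen=True)
class CountingGame:
    value: int = 0
    def is_terminal(self): return abs(self.value) >= 3
    def score(self): return self.value
    def moves(self): return (+1, -1)
    def apply(self, move): return CountingGame(self.value + move)

print(minimax(CountingGame(), depth=4, maximizing=True))  # prints 0
```

With best play on both sides this toy game stays balanced over four plies, hence the score of 0.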
The above approach works well for classic board games where each player performs only a single action per turn and there is no stochasticity (dice rolls) involved. Modern board game AIs require more sophisticated and complex approaches to game tree building.
Consider a modern strategy board game like Risk, where each turn consists of multiple phases. First you recruit new troops to place on your territories, then you choose one of your opponents' territories to attack (which you can do repeatedly), and finally you can move some remaining troops between two territories you own. Thus, each player's turn consists of at least 3 decisions, although a player typically makes around 8 decisions in a turn. On top of this complexity, there are dice rolls involved: the result of an attack is determined by each player rolling a set of dice. As a result of the complexity of decisions and the inherent randomness of Risk, designing a computer player for it is notoriously difficult. The decision space is also much greater than that of chess. During the recruitment phase, a player might typically own 21 territories and consequently receive 7 troops to place on them. That gives the player approximately 800 000 different combinations of troops and territories!
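That figure comes from a standard stars-and-bars count: distributing 7 indistinguishable troops over 21 territories gives "21 + 7 − 1 choose 7" placements. A quick check, using the example numbers from above:

```python
import math

troops = 7        # new troops received this turn
territories = 21  # territories the player owns

# Multiset coefficient: ways to distribute `troops` indistinguishable
# troops over `territories` territories (stars and bars).
placements = math.comb(territories + troops - 1, troops)
print(placements)  # 888030 distinct placements, on the order of 800 000
```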
There are not many existing computer players for Risk, but some research has been done on the most effective strategy. Michael Wolf analysed the theoretical complexity of Risk and built computer players that make use of temporal difference learning to learn a strategy as they play. In my research I investigate a new approach to Risk AI design, namely Monte Carlo Tree Search (MCTS). This approach has been used on various classic board games, like Chess, Backgammon and computer Go. There is a lack of research on the performance of MCTS in games that contain stochasticity, and although Backgammon involves randomness, it lacks the decision complexity of Risk. Thus, investigating MCTS for Risk poses unique and interesting challenges.
Monte Carlo Tree Search is an anytime algorithm, meaning it can run for a predetermined time and return the best result it has obtained so far. The strength of the player is directly proportional to how long the algorithm is allowed to run. Thus, designing players in the most efficient way possible (to make every second count) is of great importance.
Thus far I have used two techniques to make my MCTS player more efficient.
- Hash the game states. At different stages of building the game tree, the state of the game has to be evaluated and used to rank the decision a player makes at a node. Say you attack Territory A and then Territory B, and both attacks succeed: you end up in exactly the same state as if you had first attacked Territory B and then A. Such repeated states occur almost half the time when building the game tree for Risk, so re-evaluating the game state each time wastes a lot of effort. To eliminate this inefficiency, I use a custom hash function based on Zobrist hashing to map the board state to a data structure that can easily be accessed whenever the game state needs to be evaluated.
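My actual hash function is customized for Risk, but the Zobrist idea behind it can be sketched as follows. The state representation here (a map from territory to owner and troop count, with troop counts capped) is a hypothetical simplification for illustration:

```python
import random

random.seed(0)  # fixed seed so the key table is reproducible

NUM_TERRITORIES = 42  # the standard Risk board
NUM_PLAYERS = 6
MAX_TROOPS = 64       # illustrative cap on troops per territory

# One random 64-bit key per (territory, owner, troop-count) feature.
ZOBRIST = [[[random.getrandbits(64) for _ in range(MAX_TROOPS)]
            for _ in range(NUM_PLAYERS)]
           for _ in range(NUM_TERRITORIES)]

def zobrist_hash(state):
    """XOR together the keys of every feature present in `state`,
    where `state` maps territory -> (owner, troops)."""
    h = 0
    for territory, (owner, troops) in state.items():
        h ^= ZOBRIST[territory][owner][min(troops, MAX_TROOPS - 1)]
    return h

# Order of conquests doesn't matter: XOR is commutative, so taking
# territory 3 before territory 7 hashes the same as the reverse order.
a = zobrist_hash({3: (0, 5), 7: (0, 2)})
b = zobrist_hash({7: (0, 2), 3: (0, 5)})
assert a == b
```

A nice property of this scheme is that the hash can be updated incrementally: when a move changes one territory, you XOR out its old key and XOR in the new one instead of rehashing the whole board.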
- Don’t investigate everything! As previously mentioned, a player might have 800 000 possible decisions at a certain node. If all of these were investigated, the tree would likely be only one level deep and we would know very little about which decision is best. Instead, I enforce a dynamic ordering of nodes: I look at a certain number of possible decisions, say 100, evaluate and rank them according to the resulting state of the board, and then add only the very best one to the tree. This ensures that better decisions are made locally at each node, and it reduces the creation of irrelevant nodes.
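The ordering step above can be sketched like this. The `evaluate` function and the toy threat scores are hypothetical stand-ins for the real Risk state evaluation; only the sample-rank-pick-one structure reflects the technique described:

```python
def select_best_decision(node_state, candidate_decisions, evaluate,
                         sample_size=100):
    """Rank up to `sample_size` candidate decisions by the evaluation
    of the state they lead to, and return only the best one, which
    becomes the single new child added to the tree."""
    sampled = candidate_decisions[:sample_size]
    return max(sampled, key=lambda decision: evaluate(node_state, decision))

# Toy stand-in: decisions are territories to reinforce, and the
# (made-up) evaluation prefers reinforcing the most threatened one.
threat = {"Alaska": 9, "Peru": 4, "Ural": 7}
decisions = list(threat)
best = select_best_decision(None, decisions, lambda state, d: threat[d])
print(best)  # Alaska
```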
The MCTS player I have developed thus far expands around 200 nodes per second (considering the complexity of decisions in Risk, this is pretty good). By employing the hashing method, I could generate about 20% more nodes in the same time frame. The selective node expansion method resulted in much deeper trees: when all the possible decisions at a node are investigated, the tree might only reach a depth of one or two, but selective node expansion produces trees that reach depths of between 30 and 50.
So far no real testing against human players has been done, but the way forward looks exciting, and only time will tell whether the players are actually strong enough to beat the most experienced humans.