Monte Carlo Tree Search (MCTS)

A planning algorithm that builds a search tree incrementally by combining tree search with Monte Carlo sampling (rollouts). At each step it selects a promising node, expands it, simulates a rollout from it, and backpropagates the result up the tree.

Intuition

Instead of exhaustively exploring a game/decision tree (impossible for large state spaces), MCTS focuses search on the most promising branches by using random simulations to estimate state values. Over many iterations, it converges to better estimates for states visited more often.

The Four Phases

1. SELECTION     — From root, use a tree policy (e.g., UCB1) to traverse 
                   existing nodes until reaching a leaf or unexpanded node
2. EXPANSION     — Add one (or more) child nodes to the tree
3. SIMULATION    — From the new node, run a rollout (random or heuristic 
                   policy) until a terminal state → get reward
4. BACKPROPAGATION — Propagate the reward back up the tree, updating 
                     visit counts N(s) and value estimates Q(s)

UCB1 Selection (UCT)

The most common tree policy uses UCB adapted for trees (UCT — Upper Confidence bounds applied to Trees):

  UCT(s, a) = Q(s, a) + c · √( ln N(s) / N(s, a) )

where:

  • Q(s, a) — average return from taking action a in state s
  • N(s) — visit count for state s
  • N(s, a) — visit count for action a in state s
  • c — exploration constant (balances exploration vs. exploitation)
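Plugging in numbers makes the exploration bonus concrete. In this small sketch (the visit counts are made-up examples), a barely-tried child can outscore a well-explored child with a higher value estimate:

```python
import math

def ucb1(q, n_parent, n_child, c=1.4):
    # value estimate Q(s, a) plus a bonus that shrinks as N(s, a) grows
    return q + c * math.sqrt(math.log(n_parent) / n_child)

# parent visited 100 times:
# a good, well-explored child vs. a weak child tried only twice
well_explored = ucb1(q=0.8, n_parent=100, n_child=90)
barely_tried  = ucb1(q=0.2, n_parent=100, n_child=2)
# the barely-tried child gets the larger exploration bonus,
# so it is still selected despite its lower value estimate
```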

Key Properties

  • Anytime: can be stopped after any number of iterations and still return its current best action; more iterations yield better estimates
  • Asymptotically optimal: converges to minimax with enough iterations
  • No domain knowledge required (but benefits from good rollout policies)
  • Scales to huge state spaces: doesn’t need to explore the full tree

Connection to RL

MCTS is a form of planning at decision time — it plans from the current state using a model of the environment (the game rules). This connects to:

  • Dyna — integrating planning with learning
  • Model of the Environment — MCTS requires a generative model
  • Rollout algorithms (§8.10 in Sutton & Barto) — MCTS generalizes rollouts with the tree structure

Applications

  • AlphaGo / AlphaGo Zero: MCTS + deep neural networks for Go
  • Board games (Chess, Shogi) — AlphaZero
  • General game playing, real-time strategy games
