Rollout Algorithm

A decision-time planning method that estimates the value of each action from the current state by simulating complete trajectories (rollouts) using a rollout policy (also called base policy or default policy). The action with the highest estimated value is then executed.

How It Works

  1. From the current state $s$, for each candidate action $a$:
    • Simulate a trajectory that starts with action $a$ and then follows the rollout policy until the episode ends
    • Record the return
  2. Repeat the simulation several times per action and average the returns to estimate each action's value
  3. Execute the action with the highest average return
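
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `simulate_return`, `rollout_action`, `corridor_step`, and `random_policy` are hypothetical, the model is assumed to be a deterministic function `step(state, action)` returning `(next_state, reward, done)`, and the toy corridor environment is invented for the example.

```python
import random


def simulate_return(state, first_action, step, rollout_policy, rng,
                    gamma=1.0, max_steps=1000):
    """Simulate one trajectory: take first_action, then follow rollout_policy."""
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(max_steps):  # max_steps is a safety cap on episode length
        state, reward, done = step(state, action)
        total += discount * reward
        discount *= gamma
        if done:
            break
        action = rollout_policy(state, rng)
    return total


def rollout_action(state, actions, step, rollout_policy,
                   n_rollouts=200, rng=None):
    """Return (best_action, estimates) based on averaged simulated returns."""
    if rng is None:
        rng = random.Random()
    estimates = {}
    for action in actions:
        returns = [simulate_return(state, action, step, rollout_policy, rng)
                   for _ in range(n_rollouts)]
        estimates[action] = sum(returns) / n_rollouts
    best_action = max(estimates, key=estimates.get)
    return best_action, estimates


# Hypothetical toy model: a corridor with cells 0..6.
# Reaching cell 6 ends the episode with reward +1; reaching cell 0 ends it with 0.
GOAL, PIT = 6, 0
ACTIONS = (-1, +1)


def corridor_step(state, action):
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state in (GOAL, PIT)
    return next_state, reward, done


def random_policy(state, rng):
    """Uniformly random rollout policy."""
    return rng.choice(ACTIONS)


best, estimates = rollout_action(3, ACTIONS, corridor_step, random_policy,
                                 rng=random.Random(0))
print(best, estimates)
```

In this toy setting the random rollout policy reaches the goal with probability 4/6 after stepping right from cell 3 and 2/6 after stepping left, so the averaged returns should separate the two actions clearly once the budget is large enough. Only the selected action would then be executed in the real environment, and the procedure would be repeated from the resulting state.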

Look Before You Leap

Instead of committing to an action based on a learned value function, a rollout algorithm “imagines” what would happen if it took each action and then followed some default strategy. It picks the action that leads to the best imagined outcome. This is planning at decision time: only the current decision is planned, and the whole procedure starts over from the next state once that action has been executed.

Key Properties

  • Requires a model (simulator) of the environment to run rollouts
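  • The rollout policy can be simple (even random); Monte Carlo Tree Search (MCTS) extends the idea by incrementally building a search tree guided by a more informed tree policy
  • Acting greedily on the rollout estimates yields a policy at least as good as the rollout policy, provided the estimates are accurate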
  • Only one action is actually executed in the real environment per planning cycle
  • Quality depends on rollout budget (number of simulations) and quality of rollout policy
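
To make the dependence on the rollout budget concrete, a short worked formula may help. It assumes the $n$ simulated returns $G_1, \dots, G_n$ for a state-action pair $(s, a)$ are independent and share a finite variance $\sigma^2$; the symbols $\hat{q}_n$, $G_i$, and $\sigma$ are notation introduced here for illustration.

```latex
\hat{q}_n(s, a) = \frac{1}{n} \sum_{i=1}^{n} G_i,
\qquad
\operatorname{SE}\!\left[\hat{q}_n(s, a)\right] = \frac{\sigma}{\sqrt{n}}
```

Under these assumptions, halving the estimation error takes roughly four times as many rollouts. A larger budget reduces only the sampling noise: the average still estimates the return of following the rollout policy after the first action, so a weak rollout policy remains a limiting factor.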

Connection to Policy Improvement

Rollout as Policy Improvement

If $q_\pi(s, a)$ is the action-value function under rollout policy $\pi$, then the rollout algorithm selects:

$$a^* = \arg\max_a q_\pi(s, a)$$

This is equivalent to one step of policy improvement over $\pi$.
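
As a sketch of why greedy selection counts as policy improvement, the standard policy improvement theorem can be written as below. Here $\pi$ is the rollout policy, $q_\pi$ its action-value function, $v_\pi$ its state-value function, and $\pi'$ the greedy policy; $v_\pi$ and $\pi'$ are symbols introduced here for illustration.

```latex
\pi'(s) = \arg\max_a q_\pi(s, a)
\;\Longrightarrow\;
q_\pi\bigl(s, \pi'(s)\bigr) \ge v_\pi(s) \ \text{for all } s
\;\Longrightarrow\;
v_{\pi'}(s) \ge v_\pi(s) \ \text{for all } s
```

The first step holds because the maximum over actions is at least the average of $q_\pi(s, a)$ under $\pi$, which equals $v_\pi(s)$. A rollout algorithm works with sampled averages rather than exact values of $q_\pi$, so the guarantee holds only up to estimation error.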
