Dynamic Programming

Definition

Dynamic Programming

Dynamic Programming refers to a collection of algorithms that compute optimal policies given a perfect model of the environment (i.e., the MDP dynamics ). DP uses the Bellman Equation as an update rule to iteratively improve value estimates.

Core Idea

DP turns the Bellman equation into an assignment (update rule). Instead of solving a system of equations, it repeatedly applies the Bellman equation as an update until convergence. “Sweep” through all states, update each one, repeat.

Key Algorithms

Policy Evaluation (Prediction)

Compute for a given policy by iterative application of the Bellman equation:

Iterative Policy Evaluation Update

Repeat until (convergence threshold).

Policy Iteration

Alternates between evaluation and improvement:

  1. Policy Evaluation: Compute (iteratively until convergence)
  2. Policy Improvement: (greedy w.r.t. current value function)
  3. Repeat until policy is stable ()

Guaranteed to converge to optimal policy in finite number of iterations (for finite MDPs).

Value Iteration

Combines evaluation and improvement into a single update:

Value Iteration Update

Equivalent to: one sweep of policy evaluation + greedy policy improvement. Converges to .

Generalized Policy Iteration (GPI)

GPI

Any interaction between policy evaluation and policy improvement, regardless of granularity. Value iteration, policy iteration, and most RL algorithms are instances of GPI.

Evaluation ←→ Improvement
    ↓              ↓
  v ≈ v_π      π ≈ greedy(v)
    ↘              ↙
       v* and π*

Limitations

  • Requires full model: Must know for all transitions
  • Curse of dimensionality: Sweeps over all states — infeasible for large/continuous state spaces
  • Full-width backups: Each update considers all possible next states

Why DP Matters Despite Limitations

DP provides the theoretical foundation for all of RL. MC and TD methods are essentially doing DP-like updates but with samples instead of expectations. Understanding DP is key to understanding everything else.

Connections

Appears In