Optimality and Approximation

The Limit of Exact Solutions

In large or continuous state spaces, finding the exact optimal policy or value function is computationally infeasible. Approximation refers to using parameterized functions (like neural networks or linear models) to represent the value functions, accepting a small amount of error to gain the ability to generalize across states.

Coverage vs. Accuracy

When we have millions of states (like in Go or Chess), we cannot store a table with one entry per state. Instead, we use a compact representation (a “function approximator”). This means:

  1. Generalization: Learning about one state helps us estimate the value of similar, unvisited states.
  2. Prioritization: We focus our limited “accuracy budget” on the states the agent actually visits.

Core Concepts

  • Online Learning: By learning as the agent interacts with the world, the approximation naturally focuses on the most relevant parts of the state space (the On-Policy Distribution).
  • Functional Forms: We typically approximate the value function as , where are the weights of the model.
  • The “Deadly Triad”: A warning in RL that combining Function Approximation, Bootstrapping, and Off-policy learning can lead to instability and divergence.

Connections

Appears In