Mean Squared Value Error

Definition

Mean Squared Value Error (MSVE)

In RL with function approximation, we cannot represent the true value function exactly for all states. The Mean Squared Value Error (MSVE) is the standard objective function used to measure how well our approximate value function matches the true value function .

Mathematical Formulation

MSVE

where:

  • — Learnable weights of the function approximator
  • — True value of state under policy
  • — Approximate value (e.g., from a neural network or linear combination)
  • State distribution, usually the on-policy distribution (stationary distribution under ). It weights the error by how often the agent actually visits state .

Why We Need This

Trade-offs in Approximation

With function approximation, we have fewer parameters than states (). This means improving the accuracy in one state usually makes it worse in another. The MSVE tells us which states are more important to get “right” based on the distribution . We accept more error in rarely visited states to achieve lower error in frequently visited ones.

Key Properties

  • Objective: Algorithms like Gradient Descent minimize this error by calculating .
  • The Ideal Goal: In the tabular case, . In approximation, we seek a global (or local) minimum.
  • Challenge: In RL, we don’t actually know the true (the “target”). Algorithms like TD replace it with a bootstrapped estimate, which changes the optimization landscape.

Connections

Appears In