Expected SARSA

Expected SARSA Update

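The heading above refers to the standard one-step update (as given in Sutton and Barto). Writing it out makes the expectation explicit:

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \sum_a \pi(a \mid S_{t+1})\, Q(S_{t+1}, a) - Q(S_t, A_t) \right]
```

The only difference from SARSA is the target: the sampled next-action value Q(S_{t+1}, A_{t+1}) is replaced by the expectation of Q(S_{t+1}, ·) under the policy π.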
Between SARSA and Q-Learning

Instead of sampling a single next action (SARSA) or taking the max over next actions (Q-learning), Expected SARSA forms its target by taking the expectation of the next-state action values under the current policy. This eliminates the variance that SARSA incurs from sampling the next action, while remaining more general than Q-learning.

  • With a greedy target policy: Expected SARSA = Q-learning
  • With an ε-greedy target policy: Expected SARSA accounts for the exploration probability
  • Generally lower variance than SARSA, slightly more computation per step
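The bullets above can be made concrete with a small sketch of the Expected SARSA target under an ε-greedy policy. The function name and signature here are illustrative, not from any particular library; the weighting logic is the standard ε-greedy expectation.

```python
def expected_sarsa_target(q_next, reward, gamma, epsilon):
    """Expected SARSA target for an epsilon-greedy policy.

    q_next  : list of action values Q(s', a) at the next state
    reward  : immediate reward R
    gamma   : discount factor
    epsilon : exploration probability of the epsilon-greedy policy
    """
    n = len(q_next)
    greedy = max(range(n), key=lambda a: q_next[a])
    # Each action gets epsilon/n probability; the greedy action
    # additionally receives the remaining (1 - epsilon) mass.
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    expected_q = sum(p * q for p, q in zip(probs, q_next))
    return reward + gamma * expected_q
```

With `epsilon = 0` the probability mass collapses onto the greedy action and the target reduces to the Q-learning target `reward + gamma * max(q_next)`, matching the first bullet above.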

Appears In