Every-Visit MC

Every-Visit MC

A Monte Carlo prediction method that averages returns from every visit to a state within each episode, not just the first. If state is visited times in an episode, all returns are used.

Intuition

Where First-Visit MC only uses the return from the first time a state is encountered per episode, Every-Visit MC uses all visits. This means a single episode can contribute multiple samples for the same state — yielding more data per episode but with correlated samples.

Algorithm

Algorithm: Every-Visit MC Prediction (estimating V ≈ v_π)
─────────────────────────────────────────────────────────
Input: policy π to evaluate
Initialize:
  V(s) ∈ ℝ arbitrarily, for all s ∈ S
  Returns(s) ← empty list, for all s ∈ S
 
Loop forever (for each episode):
  Generate episode following π: S₀,A₀,R₁, ..., S_{T-1},A_{T-1},R_T
  G ← 0
  Loop t = T-1, T-2, ..., 0:
    G ← γG + R_{t+1}
    Append G to Returns(S_t)          // ← no first-visit check!
    V(S_t) ← average(Returns(S_t))

The only difference from First-Visit MC is the removal of the “unless appears in ” check.

Mathematical Properties

  • Biased for finite samples (returns within the same episode are correlated)
  • Consistent: converges to as the number of episodes → ∞
  • Lower variance per update compared to first-visit when states are revisited often
  • Converges quadratically — slightly faster in practice for many problems

Comparison with First-Visit MC

PropertyFirst-Visit MCEvery-Visit MC
Samples per episode per state1Up to (number of visits)
BiasUnbiasedBiased (finite samples)
Consistency
Data efficiencyLessMore (uses all visits)
ImplementationNeeds visit trackingSimpler

Connections

Appears In