Every-Visit MC
Every-Visit MC
A Monte Carlo prediction method that averages returns from every visit to a state within each episode, not just the first. If state is visited times in an episode, all returns are used.
Intuition
Where First-Visit MC only uses the return from the first time a state is encountered per episode, Every-Visit MC uses all visits. This means a single episode can contribute multiple samples for the same state — yielding more data per episode but with correlated samples.
Algorithm
Algorithm: Every-Visit MC Prediction (estimating V ≈ v_π)
─────────────────────────────────────────────────────────
Input: policy π to evaluate
Initialize:
V(s) ∈ ℝ arbitrarily, for all s ∈ S
Returns(s) ← empty list, for all s ∈ S
Loop forever (for each episode):
Generate episode following π: S₀,A₀,R₁, ..., S_{T-1},A_{T-1},R_T
G ← 0
Loop t = T-1, T-2, ..., 0:
G ← γG + R_{t+1}
Append G to Returns(S_t) // ← no first-visit check!
V(S_t) ← average(Returns(S_t))The only difference from First-Visit MC is the removal of the “unless appears in ” check.
Mathematical Properties
- Biased for finite samples (returns within the same episode are correlated)
- Consistent: converges to as the number of episodes → ∞
- Lower variance per update compared to first-visit when states are revisited often
- Converges quadratically — slightly faster in practice for many problems
Comparison with First-Visit MC
| Property | First-Visit MC | Every-Visit MC |
|---|---|---|
| Samples per episode per state | 1 | Up to (number of visits) |
| Bias | Unbiased | Biased (finite samples) |
| Consistency | ✓ | ✓ |
| Data efficiency | Less | More (uses all visits) |
| Implementation | Needs visit tracking | Simpler |
Connections
- Core variant of Monte Carlo Methods
- Compared with First-Visit MC — the other main MC prediction variant
- Both lead into Monte Carlo Control when combined with Generalized Policy Iteration
- Extended to action values for Monte Carlo Control with Exploring Starts or ε-greedy policies