Experience Replay
Experience replay stores the agent's transitions (state, action, reward, next state) in a buffer and trains on random mini-batches sampled from it, rather than on consecutive online samples.
Benefits:
- Breaks correlations: Consecutive samples are highly correlated, which violates the i.i.d. assumption behind SGD. Random sampling decorrelates the training data.
- Data efficiency: Each transition can be reused in many updates.
- Distribution smoothing: The training distribution averages over many past behavior policies, which smooths learning and reduces oscillation.
Limitations:
- Memory-intensive (large buffer needed)
- Off-policy by nature (old data from previous policies)
- Not suitable for on-policy methods without modifications
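The mechanism described above can be sketched as a minimal fixed-capacity buffer. This is an illustrative sketch, not any particular library's implementation; the class and method names are our own:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform-sampling experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest transitions automatically,
        # bounding memory use at `capacity` transitions.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition tuple from online interaction.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch without replacement: this is what
        # decorrelates consecutive transitions before the SGD update.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

In use, the agent pushes each new transition and, once the buffer holds enough data, samples a mini-batch per gradient step; because each transition stays in the buffer until evicted, it can be reused across many updates.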