Experience Replay
Experience replay stores the agent's transitions (state, action, reward, next state) in a buffer and trains on random mini-batches sampled from it, rather than on consecutive online samples.
Benefits:
- Breaks correlations: Consecutive samples are highly correlated, which violates the i.i.d. assumption behind SGD. Random sampling decorrelates the training data.
- Data efficiency: Each transition can be reused in many updates.
- Distribution smoothing: The training distribution averages over many past behavior policies, which smooths learning and reduces oscillation.
Limitations:
- Memory-intensive (large buffer needed)
- Off-policy by nature (old data from previous policies)
- Not suitable for on-policy methods without modifications
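The mechanism described above can be sketched as a minimal fixed-capacity buffer. This is an illustrative sketch, not any particular library's implementation; the class and method names are our own:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform-sampling experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest transitions automatically,
        # bounding memory use at `capacity` transitions.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition tuple from online interaction.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch without replacement: this is what
        # decorrelates consecutive transitions before the SGD update.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

In use, the agent pushes each new transition and, once the buffer holds enough data, samples a mini-batch per gradient step; because each transition stays in the buffer until evicted, it can be reused across many updates.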