Epsilon-Greedy Policy
ε-Greedy Policy
An ε-greedy policy selects the greedy action (the one with the highest estimated value) with probability $1 - \varepsilon$, and a uniformly random action with probability $\varepsilon$.
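The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy_action`, the list-of-floats representation of action values, and the `rng` parameter are assumptions made for the example.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return an action index given estimated action values (hypothetical helper)."""
    if rng.random() < epsilon:
        # Explore: sample uniformly over ALL actions, including the greedy one.
        return rng.randrange(len(q_values))
    # Exploit: pick the highest estimated value (ties go to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: with epsilon = 0 the choice is always greedy.
action = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.0)
```

Because the random branch samples from every action, the greedy action is chosen slightly more often than $1 - \varepsilon$ of the time.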
ε-Greedy Action Probabilities
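Written out per action, the definition gives the following probabilities. The symbols $Q(s, a)$ for the estimated action value and $\mathcal{A}(s)$ for the set of actions available in state $s$ are assumed standard notation, not taken from this note, and the formula assumes a single greedy action (no ties).

```latex
\pi(a \mid s) =
\begin{cases}
1 - \varepsilon + \dfrac{\varepsilon}{|\mathcal{A}(s)|} & \text{if } a = \arg\max_{a'} Q(s, a') \\[6pt]
\dfrac{\varepsilon}{|\mathcal{A}(s)|} & \text{otherwise}
\end{cases}
```

The extra $\varepsilon / |\mathcal{A}(s)|$ on the greedy action arises because the uniform random draw can also land on it. For example, with $\varepsilon = 0.1$ and four actions, the greedy action has probability $0.925$ and each other action $0.025$.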
- One of the simplest methods for balancing Exploration vs Exploitation
- ε-greedy is a special case of ε-soft policies (where $\pi(a \mid s) \ge \varepsilon / |\mathcal{A}(s)|$ for all states $s$ and actions $a$)
- It is common to decay ε over time: high early (explore) → low later (exploit)
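One way to decay ε is a linear schedule, sketched below. The function name `decayed_epsilon` and the default values (start at 1.0, end at 0.05, anneal over 10,000 steps) are illustrative assumptions, not values from this note; exponential or stepwise schedules are equally plausible choices.

```python
def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end (hypothetical schedule)."""
    # Clamp progress to [0, 1] so epsilon stays at eps_end after decay_steps.
    fraction = min(max(step, 0) / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


# Usage: epsilon at the start, midway, and long after the schedule ends.
schedule = [decayed_epsilon(s) for s in (0, 5_000, 50_000)]
```

Keeping a nonzero floor such as `eps_end` means the agent never stops exploring entirely, which keeps the policy ε-soft.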
Connections
- Used by: SARSA, Q-Learning, Monte Carlo Control, Deep Q-Network (DQN)
- Alternatives: Upper Confidence Bound, Boltzmann/softmax