Epsilon-Greedy Policy

ε-Greedy Policy

An ε-greedy policy selects the greedy action (highest estimated value) with probability $1 - \varepsilon$, and a uniformly random action with probability $\varepsilon$.
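
A minimal Python sketch of this selection rule follows. It assumes the action-value estimates for the current state are already available as a list `q_values` indexed by action. The function name `epsilon_greedy_action` is hypothetical. Breaking ties among equally valued greedy actions at random is an illustrative choice, not part of the definition above. A seeded `random.Random` is passed in so runs are reproducible.

```python
import random
from typing import Sequence


def epsilon_greedy_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick an action index epsilon-greedily from estimated action values."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        # Explore: uniform over ALL actions, so the greedy one can be drawn too.
        return rng.randrange(len(q_values))
    # Exploit: highest estimate; ties broken uniformly at random (illustrative choice).
    best = max(q_values)
    greedy = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(greedy)


# Usage: three actions, action 1 has the highest estimate.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[epsilon_greedy_action([0.1, 0.5, 0.2], epsilon=0.1, rng=rng)] += 1
print(counts)  # action 1 should dominate; actions 0 and 2 each near epsilon/3 of draws
```

Because the exploration draw is uniform over every action, the greedy action is chosen with frequency close to $1 - \varepsilon + \varepsilon/|\mathcal{A}|$, not just $1 - \varepsilon$.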

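ε-Greedy Action Probabilities

$$
\pi(a \mid s) =
\begin{cases}
1 - \varepsilon + \dfrac{\varepsilon}{|\mathcal{A}(s)|} & \text{if } a = \arg\max_{a'} Q(s, a') \\
\dfrac{\varepsilon}{|\mathcal{A}(s)|} & \text{otherwise}
\end{cases}
$$

where $Q(s, a)$ is the estimated value of action $a$ in state $s$ and $|\mathcal{A}(s)|$ is the number of actions available in $s$.
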
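The per-action probabilities can also be computed explicitly. The sketch below assumes a single greedy action, resolving ties toward the lowest index (an arbitrary choice made here). `epsilon_greedy_probs` is a hypothetical helper name.

```python
from typing import List, Sequence


def epsilon_greedy_probs(q_values: Sequence[float], epsilon: float) -> List[float]:
    """Return pi(a|s) for every action under an epsilon-greedy policy.

    Each action receives epsilon / |A(s)| from the uniform exploration step,
    and the greedy action additionally receives the remaining 1 - epsilon.
    """
    n = len(q_values)
    # Assumption: ties resolved toward the lowest index (max returns the first maximum).
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


probs = epsilon_greedy_probs([0.1, 0.5, 0.2], epsilon=0.3)
print(probs)  # greedy action 1 gets about 0.8, the other two about 0.1 each
```

The smallest entry is always $\varepsilon/|\mathcal{A}(s)|$. That floor is exactly what makes the policy ε-soft.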
  • One of the simplest methods for balancing Exploration vs Exploitation
  • ε-greedy is a special case of ε-soft policies (where $\pi(a \mid s) \ge \varepsilon / |\mathcal{A}(s)|$ for every state $s$ and action $a$)
  • Common to decay ε over time: high early (explore) → low later (exploit)
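
Linear annealing is one simple way to implement this decay. The sketch below uses hypothetical defaults chosen only for illustration: start at 1.0, a floor of 0.05, and 10,000 steps. Exponential decay toward a floor is a common alternative.

```python
def decayed_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
                    decay_steps: int = 10_000) -> float:
    """Linearly anneal epsilon from eps_start to eps_end, then hold it at eps_end."""
    if decay_steps <= 0 or step >= decay_steps:
        return eps_end
    frac = step / decay_steps  # fraction of the decay period elapsed, in [0, 1)
    return eps_start + frac * (eps_end - eps_start)


for step in (0, 2_500, 5_000, 10_000, 50_000):
    print(step, round(decayed_epsilon(step), 4))
```

With these defaults, ε falls from 1.0 to 0.525 at the halfway point and then stays at 0.05. The nonzero floor keeps some exploration going indefinitely.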
