Exploring Starts
Exploring starts is the assumption that every episode begins with a randomly chosen state-action pair, with every pair having non-zero probability of being selected. This guarantees that all state-action pairs are visited infinitely often in the limit.
- Used in the Monte Carlo ES (Exploring Starts) algorithm for MC control (see the sketch after this list)
- Ensures sufficient exploration for convergence to an optimal policy
- Unrealistic in practice, since the starting state often cannot be controlled (e.g., in real-world environments)
- Alternatives: epsilon-greedy policies (on-policy) and importance sampling (off-policy)
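
As a concrete illustration, below is a minimal sketch of Monte Carlo ES on a hypothetical one-dimensional corridor MDP. The environment, state count, and helper names such as `generate_episode` are illustrative assumptions, not from the source. Each episode begins from a uniformly random non-terminal state-action pair (the exploring start), and the policy is improved greedily with respect to first-visit Monte Carlo estimates of Q.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP: a 1-D corridor of states 0..4. Actions move
# left (-1) or right (+1); reaching the rightmost state ends the
# episode with reward +1, every other transition yields reward 0.
N_STATES = 5
ACTIONS = (-1, 1)
GAMMA = 0.9
MAX_STEPS = 100  # cap episode length so a bad policy cannot loop forever

def step(state, action):
    """Deterministic transition for the corridor environment."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def generate_episode(policy, start_state, start_action):
    """Roll out one episode from an exploring start (s0, a0),
    following the current greedy policy afterwards."""
    episode, state, action = [], start_state, start_action
    for _ in range(MAX_STEPS):
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        if done:
            break
        state, action = next_state, policy[next_state]
    return episode

def monte_carlo_es(num_episodes=5000):
    Q = defaultdict(float)            # action-value estimates
    visit_counts = defaultdict(int)   # first-visit counts per (s, a)
    policy = {s: random.choice(ACTIONS) for s in range(N_STATES)}
    for _ in range(num_episodes):
        # Exploring start: every non-terminal (state, action) pair has
        # non-zero probability of beginning an episode.
        s0 = random.randrange(N_STATES - 1)
        a0 = random.choice(ACTIONS)
        episode = generate_episode(policy, s0, a0)
        G = 0.0
        # Walk the episode backwards, accumulating discounted returns.
        for t in reversed(range(len(episode))):
            state, action, reward = episode[t]
            G = reward + GAMMA * G
            # First-visit check: only update at the earliest occurrence.
            if all((s, a) != (state, action) for s, a, _ in episode[:t]):
                visit_counts[(state, action)] += 1
                n = visit_counts[(state, action)]
                Q[(state, action)] += (G - Q[(state, action)]) / n
                # Greedy policy improvement at the visited state.
                policy[state] = max(ACTIONS, key=lambda a: Q[(state, a)])
    return policy, Q

policy, Q = monte_carlo_es()
print({s: policy[s] for s in range(N_STATES - 1)})  # expect +1 (move right)
```

Because the exploring starts cause every non-terminal (state, action) pair to begin episodes infinitely often in the limit, the greedy policy should converge to always moving right, which is optimal in this toy environment.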