Exploring Starts
Exploring starts is the assumption that every episode begins with a randomly chosen state-action pair, with every pair having non-zero probability of being selected. This guarantees that all state-action pairs are visited infinitely often in the limit.
- Used in the Monte Carlo ES (Exploring Starts) algorithm for MC control (see the sketch after this list)
- Ensures sufficient exploration for convergence to an optimal policy
- Unrealistic in practice, since the starting state often cannot be controlled (e.g., in real-world environments)
- Alternatives: epsilon-greedy policies (on-policy) and importance sampling (off-policy)
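
As a concrete illustration, below is a minimal sketch of Monte Carlo ES on a hypothetical one-dimensional corridor MDP. The environment, state count, and helper names such as `generate_episode` are illustrative assumptions, not from the source. Each episode begins from a uniformly random non-terminal state-action pair (the exploring start), and the policy is improved greedily with respect to first-visit Monte Carlo estimates of Q.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP: a 1-D corridor of states 0..4. Actions move
# left (-1) or right (+1); reaching the rightmost state ends the
# episode with reward +1, every other transition yields reward 0.
N_STATES = 5
ACTIONS = (-1, 1)
GAMMA = 0.9
MAX_STEPS = 100  # cap episode length so a bad policy cannot loop forever

def step(state, action):
    """Deterministic transition for the corridor environment."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def generate_episode(policy, start_state, start_action):
    """Roll out one episode from an exploring start (s0, a0),
    following the current greedy policy afterwards."""
    episode, state, action = [], start_state, start_action
    for _ in range(MAX_STEPS):
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        if done:
            break
        state, action = next_state, policy[next_state]
    return episode

def monte_carlo_es(num_episodes=5000):
    Q = defaultdict(float)            # action-value estimates
    visit_counts = defaultdict(int)   # first-visit counts per (s, a)
    policy = {s: random.choice(ACTIONS) for s in range(N_STATES)}
    for _ in range(num_episodes):
        # Exploring start: every non-terminal (state, action) pair has
        # non-zero probability of beginning an episode.
        s0 = random.randrange(N_STATES - 1)
        a0 = random.choice(ACTIONS)
        episode = generate_episode(policy, s0, a0)
        G = 0.0
        # Walk the episode backwards, accumulating discounted returns.
        for t in reversed(range(len(episode))):
            state, action, reward = episode[t]
            G = reward + GAMMA * G
            # First-visit check: only update at the earliest occurrence.
            if all((s, a) != (state, action) for s, a, _ in episode[:t]):
                visit_counts[(state, action)] += 1
                n = visit_counts[(state, action)]
                Q[(state, action)] += (G - Q[(state, action)]) / n
                # Greedy policy improvement at the visited state.
                policy[state] = max(ACTIONS, key=lambda a: Q[(state, a)])
    return policy, Q

policy, Q = monte_carlo_es()
print({s: policy[s] for s in range(N_STATES - 1)})  # expect +1 (move right)
```

Because the exploring starts cause every non-terminal (state, action) pair to begin episodes infinitely often in the limit, the greedy policy should converge to always moving right, which is optimal in this toy environment.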