Monte Carlo Control

Jun 06, 2026

tabular-methods

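Monte Carlo control finds optimal policies using Monte Carlo Methods. It learns the action-value function $q_{\pi}(s, a)$ rather than the state-value function $v_{\pi}(s)$: acting greedily with respect to $q_{\pi}$ only requires comparing action values in the current state, whereas acting greedily with respect to $v_{\pi}$ requires a model of the environment's transitions. This is what makes policy improvement model-free.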
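To make the contrast concrete, the two greedy improvement steps can be written side by side in standard notation. The dynamics $p(s', r \mid s, a)$ and the discount factor $\gamma$ are symbols assumed here; they do not appear elsewhere in this note.

$$
\pi'(s) = \arg\max_{a} q_{\pi}(s, a)
$$

$$
\pi'(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_{\pi}(s') \right]
$$

The first form needs only the learned action values. The second needs $p$, which a model-free method does not have.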

Three main variants:

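MC with Exploring Starts: Guarantees that every state-action pair is visited by starting episodes from arbitrary state-action pairs, which is unrealistic in most environments
On-policy MC (Epsilon-Greedy Policy): Practical; converges to the best policy among ε-soft policies rather than to the true optimum
Off-policy MC (Importance Sampling): Learns the optimal target policy from data generated by a separate exploratory behavior policy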
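For the off-policy variant, the key quantity is the importance sampling ratio that reweights returns collected under the behavior policy. The following is a minimal sketch, assuming tabular policies stored as per-state lists of action probabilities. The helper names (`greedy_probs`, `epsilon_greedy_probs`, `importance_ratio`) and the two-state example are hypothetical and only illustrate the idea.

```python
def greedy_probs(n_actions, best):
    """Deterministic target policy: probability 1 on the greedy action."""
    return [1.0 if a == best else 0.0 for a in range(n_actions)]


def epsilon_greedy_probs(n_actions, best, epsilon):
    """Exploratory behavior policy: every action keeps probability >= epsilon / n_actions."""
    base = epsilon / n_actions
    return [base + (1.0 - epsilon if a == best else 0.0) for a in range(n_actions)]


def importance_ratio(episode, target, behavior):
    """Product of pi(a|s) / b(a|s) over the (state, action) pairs of one episode."""
    rho = 1.0
    for state, action in episode:
        b = behavior[state][action]
        if b == 0.0:
            raise ValueError("behavior policy assigns zero probability to a taken action")
        rho *= target[state][action] / b
        if rho == 0.0:
            break  # the target policy would never have taken this action
    return rho


# Hypothetical two-state example in which action 1 is greedy in both states.
target = {"s0": greedy_probs(2, 1), "s1": greedy_probs(2, 1)}
behavior = {"s0": epsilon_greedy_probs(2, 1, 0.2), "s1": epsilon_greedy_probs(2, 1, 0.2)}

rho_match = importance_ratio([("s0", 1), ("s1", 1)], target, behavior)     # (1 / 0.9) ** 2
rho_mismatch = importance_ratio([("s0", 1), ("s1", 0)], target, behavior)  # 0.0
```

An episode that agrees with the target policy is weighted up, because the behavior policy took those actions with probability below 1. An episode containing any action the greedy target policy would not take gets weight zero.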

All follow the Generalized Policy Iteration framework.
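As one concrete instance of this evaluate-then-improve loop, here is a minimal sketch of on-policy first-visit MC control with an ε-greedy policy. The five-state corridor environment, the reward of 1 on reaching the right end, and all hyperparameters are assumptions made for illustration. Cutting episodes off at `max_steps` is a simplification.

```python
import random
from collections import defaultdict

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = left, 1 = right


def step(state, action):
    """Toy dynamics, used only to sample transitions (the learner never reads them)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(Q, state, epsilon, rng):
    """Improvement step: greedy with respect to Q, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])  # random tie-break


def mc_control(num_episodes=5000, epsilon=0.1, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)      # action-value estimates, initialised to 0
    visits = defaultdict(int)   # number of first visits per (state, action)
    for _ in range(num_episodes):
        # Generate one episode with the current epsilon-greedy policy.
        state, episode = 0, []
        for _ in range(max_steps):
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = next_state
            if done:
                break
        # Evaluation step: average the return following each first visit.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if first_visit[(s, a)] == t:
                visits[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / visits[(s, a)]
    return Q


Q = mc_control()
greedy_policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

There is no separate improvement pass: because `epsilon_greedy` always reads the latest `Q`, the policy improves implicitly after every episode's evaluation update. With these settings the greedy policy should typically point right in every state, though the exact estimates depend on the seed and the number of episodes.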

See RL-L03 - Monte Carlo Methods for full algorithms.
