Tabular RL
Definition
Tabular RL refers to reinforcement learning methods used in problems with small, discrete state and action spaces. In these settings, value functions (V(s) or Q(s, a)) can be represented as a literal table (or array) where every state or state-action pair has its own dedicated entry.
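As a minimal sketch of this idea, a Q-table is nothing more than a 2-D array with one entry per state-action pair (the gridworld sizes here are illustrative, not from the source):

```python
import numpy as np

# Hypothetical sizes: a 4x4 gridworld flattened to 16 states, 4 actions.
n_states, n_actions = 16, 4

# The "table": one dedicated entry per state-action pair, initialized to zero.
Q = np.zeros((n_states, n_actions))

# Updating one entry leaves every other entry untouched -- no generalization.
Q[5, 2] = 1.0
```

Because each cell is independent, writing to `Q[5, 2]` says nothing about state 6 or action 3, which is exactly the "No Generalization" property below.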
Characteristics
- Exact Representation: Every state is unique; updating one doesn’t affect another.
- No Generalization: Learning about one state tells you nothing about a similar state you haven’t visited yet.
- Foundation: These methods provide the theoretical basis (convergence proofs) before moving to Function Approximation.
Main Categories of Methods
- Dynamic Programming (DP): Requires a model. Includes Value Iteration and Policy Iteration.
- Monte Carlo (MC): Model-free. Learns from full returns at the end of episodes.
- Temporal Difference (TD): Model-free. Updates estimates based on other estimates (bootstrapping). Includes Q-Learning and SARSA.
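The TD family above can be illustrated with the standard tabular Q-learning update; the function name and the toy 2-state setup are my own illustration:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: nudge the single entry Q[s, a] toward
    the TD target r + gamma * max_a' Q[s_next, a'] -- an update built from
    another estimate (bootstrapping), not from a full episode return."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

# Toy usage: 2 states, 2 actions, one observed transition with reward 1.0.
Q = np.zeros((2, 2))
q_learning_update(Q, s=0, a=0, r=1.0, s_next=1)
```

SARSA differs only in the target: it uses the action actually taken next, Q[s_next, a_next], instead of the max over actions.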
Convergence
In the tabular case, most standard RL algorithms are guaranteed to converge to the optimal value function V* or Q*, given sufficient exploration and appropriately decaying learning rates.
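"Decaying learning rates" typically means step sizes satisfying the Robbins-Monro conditions (their sum diverges, the sum of their squares converges). A minimal sketch of why this matters, using alpha_t = 1/(t+1) to estimate a noisy value (the target value 5.0 and the noise model are made up for illustration):

```python
import random

random.seed(0)

estimate = 0.0
for t in range(10_000):
    sample = random.gauss(5.0, 1.0)  # noisy observations of a true value of 5.0
    alpha = 1.0 / (t + 1)            # decaying step size (Robbins-Monro)
    estimate += alpha * (sample - estimate)
```

With alpha_t = 1/(t+1) this update is exactly a running sample mean, so the estimate settles near the true value; a constant alpha would keep wobbling with the noise forever.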
The Table is the Map
Imagine the environment as a small grid. Every cell (state) in the grid has a value written in it. When you move, you just erase the old value and write a better one. There’s no “guessing” — just record-keeping.
Connections
- Contrasts with: Deep Reinforcement Learning (Function Approximation)
- Includes: Q-Learning, SARSA, Monte Carlo Methods
- Limitation: Curse of Dimensionality (doesn’t scale to large or continuous spaces)