Partial Observability
A setting where the agent cannot directly observe the true state of the environment. Instead, it receives observations that provide incomplete or noisy information about the underlying state. The standard MDP assumption of full state access is violated.
Intuition
Seeing Through a Keyhole
Full observability is like playing chess — you can see the entire board. Partial observability is like playing poker — you can only see your own cards. Your decisions must account for uncertainty about what you can’t see.
Handling Partial Observability
Three approaches (from least to most approximate):
| Approach | Idea | Requirements | Limitations |
|---|---|---|---|
| Belief State | Probability distribution over hidden states | Full model (transitions, observations) | Requires a known model; exact updates tractable only for small discrete state spaces |
| Predictive State Representation | Predictions about future observations | Core tests | Mostly limited to tabular settings |
| Approximate | Use recent observations as state | Nothing extra | Not optimal, no guarantees |
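The belief-state row of the table can be made concrete with a Bayes filter: after taking action $a$ and seeing observation $o$, the new belief is $b'(s') \propto O(o \mid s') \sum_s T(s' \mid s, a)\, b(s)$. Below is a minimal sketch for a hypothetical two-state POMDP; the matrices `T` and `O` are made-up numbers, not from any real environment.

```python
import numpy as np

# Hypothetical 2-state, 2-action, 2-observation POMDP.
# T[a, s, s'] = P(s' | s, a), O[s, o] = P(o | s).
T = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.4, 0.6]],   # action 1
])
O = np.array([[0.8, 0.2],       # state 0 mostly emits observation 0
              [0.3, 0.7]])      # state 1 mostly emits observation 1

def belief_update(b, a, o):
    """One Bayes-filter step: predict through T, correct with O, renormalize."""
    predicted = b @ T[a]                 # sum_s b(s) * T(s' | s, a)
    unnormalized = predicted * O[:, o]   # weight by observation likelihood
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])        # start maximally uncertain
b = belief_update(b, a=0, o=1)  # observing o=1 shifts belief toward state 1
```

The resulting belief is itself a Markov state, which is why a POMDP can be solved as an MDP over beliefs, at the cost of a continuous state space.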
Approximate Methods in Practice
- Single observation: use $S_t = O_t$ — simplest, often “good enough”
- Frame stacking: use $S_t = (O_{t-3}, O_{t-2}, O_{t-1}, O_t)$ — used in Atari DQN (4 frames)
- Recurrent networks: Deep Recurrent Q-Learning — LSTM maintains internal memory
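Frame stacking from the list above can be sketched as a fixed-length buffer: the “state” handed to the agent is the last $k$ observations concatenated, which restores short-term information such as motion. A minimal sketch (the `FrameStack` class and frame shape are illustrative, not the DQN codebase's actual implementation):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Maintain the last k observations and expose their stack as the state."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame dropped automatically

    def reset(self, obs):
        # At episode start, fill the buffer with copies of the first observation.
        for _ in range(self.k):
            self.frames.append(obs)
        return self._state()

    def step(self, obs):
        self.frames.append(obs)
        return self._state()

    def _state(self):
        return np.stack(self.frames, axis=0)  # shape: (k, *obs.shape)

stack = FrameStack(k=4)
s = stack.reset(np.zeros((84, 84)))  # 84x84 is the preprocessed Atari frame size
s = stack.step(np.ones((84, 84)))    # s now holds three zero frames and one new frame
```

Recurrent networks generalize this: instead of a fixed window of $k$ frames, an LSTM's hidden state summarizes the entire observation history.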
Practical Reality
In practice, many successful RL systems simply treat observations as states ($S_t = O_t$). With function approximation, there’s typically no guarantee that the features define a Markov state anyway. As long as the system is “close enough” to Markov, this can work well enough.
Connections
- Formalized by POMDP
- Generalizes Markov Decision Process (an MDP is the special case with $O_t = S_t$)
- Belief State and Predictive State Representation are exact approaches
- Deep Recurrent Q-Learning is an approximate deep learning approach