Gradient-TD Methods
Gradient-TD methods (GTD, GTD2, TDC) are true stochastic gradient descent methods that converge even with off-policy data and Linear Function Approximation. They avoid the Deadly Triad by performing gradient descent on a proper objective function, typically the mean squared projected Bellman error (MSPBE).
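Concretely, for a linear value estimate the objective can be written as (notation follows the standard textbook treatment: $\Pi$ is the projection onto the representable linear functions, $T_\pi$ the Bellman operator for the target policy, and $\mu$ the state weighting):

$$\text{MSPBE}(w) \;=\; \left\lVert \Pi\left(T_\pi v_w - v_w\right) \right\rVert_{\mu}^{2}$$

Minimizing this objective by true SGD is what distinguishes gradient-TD methods from semi-gradient TD, which does not follow the gradient of any objective.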
Why Needed
Semi-Gradient Methods can diverge under off-policy training with function approximation. Gradient-TD methods fix this by following the full gradient of the objective, including the gradient through the bootstrapped target, which semi-gradient methods ignore.
Key Algorithms
GTD2
GTD2 Update
Uses an auxiliary weight vector to estimate the expected TD error direction.
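With linear features $x_t = x(S_t)$, TD error $\delta_t = R_{t+1} + \gamma\, w_t^\top x_{t+1} - w_t^\top x_t$, main weights $w$, and auxiliary weights $v$, the on-policy form of the GTD2 update is (off-policy versions also scale each step by the importance ratio $\rho_t$):

$$w_{t+1} = w_t + \alpha\,(x_t - \gamma x_{t+1})\,(x_t^\top v_t)$$

$$v_{t+1} = v_t + \beta\,(\delta_t - v_t^\top x_t)\,x_t$$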
TDC (TD with Gradient Correction)
TDC Update
First term = semi-gradient TD. Second term = correction that makes it a true gradient method.
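With linear features $x_t = x(S_t)$, TD error $\delta_t = R_{t+1} + \gamma\, w_t^\top x_{t+1} - w_t^\top x_t$, and auxiliary weights $v$ (importance ratio omitted for the on-policy case), the TDC update is:

$$w_{t+1} = w_t + \alpha\,\delta_t\,x_t - \alpha\gamma\,x_{t+1}\,(x_t^\top v_t)$$

$$v_{t+1} = v_t + \beta\,(\delta_t - v_t^\top x_t)\,x_t$$

The first term of the $w$ update is exactly the semi-gradient TD(0) step; the second term is the correction.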
Trade-offs
- ✅ Converges off-policy with linear FA (solves the deadly triad)
- ❌ Two sets of weights to maintain (the main weights w and the auxiliary weights v)
- ❌ Two step sizes to tune (α and β)
- ❌ Slower convergence than semi-gradient TD (when semi-gradient converges)
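As an illustration of the two-weight, two-step-size structure, here is a minimal sketch of TDC on a toy two-state chain with one-hot features (the function names and the toy MRP are invented for this example; one-hot features make the linear approximation exact, so the weights should approach the true values):

```python
import numpy as np

def tdc_update(w, v, x, x_next, r, gamma, alpha, beta):
    # TD error for the linear value estimate v_hat(s) = w @ x(s)
    delta = r + gamma * (w @ x_next) - (w @ x)
    # Main weights: semi-gradient TD term plus the gradient correction
    w = w + alpha * (delta * x - gamma * (v @ x) * x_next)
    # Auxiliary weights: track the expected TD error given the features
    v = v + beta * (delta - v @ x) * x
    return w, v

# Deterministic 2-state chain: s0 -(r=0)-> s1 -(r=1)-> s0 -> ...
# With gamma = 0.9 the true values are V(s1) = 1/0.19 and V(s0) = 0.9/0.19.
features = np.eye(2)          # one-hot features: linear FA is tabular here
gamma, alpha, beta = 0.9, 0.1, 0.1
w, v = np.zeros(2), np.zeros(2)
s = 0
for _ in range(50_000):
    s_next = 1 - s
    r = 1.0 if s == 1 else 0.0
    w, v = tdc_update(w, v, features[s], features[s_next], r,
                      gamma, alpha, beta)
    s = s_next
```

Convergence analyses assume the auxiliary weights move on a faster timescale (β large relative to α); equal step sizes here just keep the toy example simple.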
Connections
- Fixes: Deadly Triad (in the linear case)
- Alternative to: Semi-Gradient Methods (off-policy)
- Related: LSTD (also solves the same fixed point, but in closed form)