
Jun 06, 2026, 1 min read

tabular-methods
key-formula

TD(0)

The simplest Temporal Difference Learning method: one-step TD prediction, which estimates the state-value function $V$ of a fixed policy.

TD(0) Update

$V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]$

where $\alpha$ is the step size and $\gamma$ is the discount factor.
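A minimal sketch of a single TD(0) update in Python, assuming the value function is stored as a dict mapping each state to its current estimate. The function name `td0_update` and its `terminal` flag are hypothetical; treating the value of a terminal state as 0 is the usual convention, assumed here rather than stated in this note.

```python
def td0_update(V, s, r, s_next, alpha, gamma, terminal=False):
    """Apply one TD(0) update to V[s] in place and return the TD error.

    Assumption: V(terminal) is taken to be 0, so s_next is ignored when terminal=True.
    """
    v_next = 0.0 if terminal else V[s_next]
    delta = r + gamma * v_next - V[s]  # TD error: R_{t+1} + gamma*V(S_{t+1}) - V(S_t)
    V[s] += alpha * delta              # move V(S_t) a fraction alpha toward the TD target
    return delta


# Worked example with made-up numbers: V(A) = 0.5, V(B) = 1.0, transition A -> B with reward 1.
V = {"A": 0.5, "B": 1.0}
delta = td0_update(V, "A", r=1.0, s_next="B", alpha=0.1, gamma=0.9)
print(delta, V["A"])  # delta = 1 + 0.9*1.0 - 0.5 ≈ 1.4, new V(A) = 0.5 + 0.1*1.4 ≈ 0.64
```

Only $V(S_t)$ changes; $V(S_{t+1})$ is read as the bootstrap target but left untouched.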

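Uses a single step of experience: $(S_t, R_{t+1}, S_{t+1})$
The TD error is $\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$, so the update can be written as $V(S_t) \leftarrow V(S_t) + \alpha \delta_t$
Bootstraps from the current estimate $V(S_{t+1})$, so it can update after every step instead of waiting for the episode to end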
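Because each update needs only one transition, TD(0) prediction can be run online over whole episodes. The sketch below assumes a hypothetical test environment, the 5-state random walk (states A to E, start in the middle, step left or right with equal probability, reward +1 only when exiting on the right, $\gamma = 1$), whose true values are 1/6 through 5/6. The function names, step size, and episode count are illustrative choices, not taken from this note.

```python
import random


def random_walk_episode(rng, n_states=5):
    """Hypothetical environment: yield (s, r, s_next, terminal) for one episode.

    States are 0..n_states-1; stepping off either end terminates the episode.
    """
    s = n_states // 2  # start in the middle state
    while True:
        s_next = s + rng.choice((-1, 1))
        if s_next < 0:               # exit on the left: reward 0
            yield s, 0.0, None, True
            return
        if s_next >= n_states:       # exit on the right: reward +1
            yield s, 1.0, None, True
            return
        yield s, 0.0, s_next, False
        s = s_next


def td0_prediction(n_episodes, alpha=0.01, gamma=1.0, n_states=5, seed=0):
    """Tabular TD(0) prediction of V for the equiprobable random-walk policy."""
    rng = random.Random(seed)
    V = [0.5] * n_states  # arbitrary initial estimate for every nonterminal state
    for _ in range(n_episodes):
        for s, r, s_next, terminal in random_walk_episode(rng, n_states):
            v_next = 0.0 if terminal else V[s_next]    # V(terminal) = 0
            V[s] += alpha * (r + gamma * v_next - V[s])  # update immediately, mid-episode
    return V


V = td0_prediction(n_episodes=20000, alpha=0.01)
print([round(v, 2) for v in V])  # should be close to [1/6, 2/6, 3/6, 4/6, 5/6]
```

With a constant step size the estimates keep fluctuating around the true values rather than settling exactly; a smaller $\alpha$ trades slower learning for less noise.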

See Temporal Difference Learning for full details and a comparison with Monte Carlo (MC) and dynamic programming (DP) methods.


Backlinks

RL-Book Ch6 - Temporal-Difference Learning
RL-L14 - Recap

Created with Quartz v4.5.2 © 2026
