TD(0)
The simplest Temporal Difference Learning method. One-step TD prediction.
TD(0) Update
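- Uses a single step of experience: $V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]$, with step size $\alpha$ and discount factor $\gamma$
- The TD Error: $\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$
- Bootstraps from the current estimate $V(S_{t+1})$; does not wait for episode end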
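A minimal sketch of the one-step update for a tabular value function may make this concrete. The function name `td0_update`, the dictionary-based table `V`, the state labels, and the `terminal` flag are illustrative assumptions, not part of the original note. The step size `alpha` and discount factor `gamma` follow the usual convention, and the value of a terminal state is assumed to be 0.

```python
from collections import defaultdict


def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99, terminal=False):
    """Apply one TD(0) update to V in place and return the TD error.

    Hypothetical helper for illustration: V maps states to value estimates.
    """
    # Bootstrapped target: reward plus the discounted estimate of the next state.
    # Assumption: a terminal next state contributes a value of 0.
    target = reward + (0.0 if terminal else gamma * V[next_state])
    td_error = target - V[state]
    # Move the estimate a fraction alpha toward the target.
    V[state] += alpha * td_error
    return td_error


# Value table with all estimates initialised to 0.
V = defaultdict(float)

# One observed transition: from state "A", receive reward 1.0, land in state "B".
delta = td0_update(V, "A", 1.0, "B", alpha=0.5, gamma=0.9)
print(delta, V["A"])
```

The update happens immediately after a single transition, so the table can be refined during an episode rather than only once the full return is known.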
See Temporal Difference Learning for full details and comparison with MC/DP.