Return

Definition

Return

The return $G_t$ is the total accumulated reward from time step $t$ onward. It is the quantity that RL agents seek to maximize (in expectation).

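Return (Discounted)

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

where:

  • $R_{t+k+1}$: reward received $k+1$ steps after time $t$
  • $\gamma \in [0, 1]$: Discount Factor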
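As a minimal sketch of the discounted sum, the snippet below evaluates it over a finite list of rewards. The function name `discounted_return` and the convention that `rewards[k]` holds the reward received $k+1$ steps after time $t$ are assumptions made for illustration, not part of the original note.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_k gamma**k * rewards[k] over a finite reward sequence.

    Assumed convention: rewards[k] is the reward received k+1 steps
    after the current time step.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is expected to lie in [0, 1]")
    return sum(gamma**k * r for k, r in enumerate(rewards))


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

With `gamma = 0.0` only the first reward counts, and with `gamma = 1.0` the result is the plain sum of the listed rewards.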

Variants

Episodic (undiscounted or discounted):

$$G_t = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}$$

where $T$ is the terminal time step. With $\gamma = 1$, this is just the sum of all remaining rewards.

Continuing (must have $\gamma < 1$):

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

The sum converges as long as $\gamma < 1$ and rewards are bounded.
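One way to see the convergence claim: if every reward has magnitude at most some $R_{\max}$, the infinite sum is dominated by a geometric series and cannot exceed $R_{\max} / (1 - \gamma)$ in magnitude. The sketch below checks this numerically for a constant reward. The helper names `truncated_return` and `return_bound` are hypothetical and chosen only for this example.

```python
def truncated_return(reward: float, gamma: float, n_terms: int) -> float:
    """Sum the first n_terms of the discounted series for a constant reward."""
    return sum(gamma**k * reward for k in range(n_terms))


def return_bound(r_max: float, gamma: float) -> float:
    """Geometric-series bound r_max / (1 - gamma), valid for 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("the bound requires 0 <= gamma < 1")
    return r_max / (1.0 - gamma)


# Partial sums grow toward the bound but never exceed it.
for n in (10, 100, 1000):
    print(n, truncated_return(1.0, 0.9, n), return_bound(1.0, 0.9))
```

With `gamma = 1.0` the bound is undefined, which matches the requirement that continuing tasks use a discount factor strictly below one.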

Recursive Property

Recursive Return

$$G_t = R_{t+1} + \gamma G_{t+1}$$

This is the key recursive relationship that enables Bootstrapping and the Bellman Equation.

Why This Matters

You don’t need to compute the entire sum from scratch. The return at time $t$ equals the immediate reward $R_{t+1}$ plus the discounted return $\gamma G_{t+1}$ from the next step. This decomposition is the foundation of Dynamic Programming and Temporal Difference Learning.
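The recursion also gives a cheap way to compute the return at every step of a finished episode: sweep backward once, reusing the return of the following step. This is a sketch under assumed conventions; the name `returns_to_go` is hypothetical, `rewards[i]` is taken to be the reward that follows step `i`, and the return after the terminal step is taken to be zero.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float) -> List[float]:
    """Compute the return for every step via g = reward + gamma * g_next.

    Assumed convention: rewards[i] is the reward received after step i,
    and the return after the final step is 0.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # return after the terminal step
    for i in reversed(range(len(rewards))):
        g = rewards[i] + gamma * g
        returns[i] = g
    return returns


print(returns_to_go([1.0, 2.0, 3.0], 0.5))  # [2.75, 3.5, 3.0]
```

The backward sweep takes one pass over the episode, whereas recomputing each sum from scratch would take time quadratic in the episode length.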

Role in RL Methods

Connections

Appears In