Episodic Semi-Gradient Control

Extension of semi-gradient methods to the control setting, using a parameterized action-value approximation q̂(s,a,w).

Semi-Gradient Sarsa Update

Algorithm: Episodic Semi-Gradient Sarsa
───────────────────────────────────────
Initialize w arbitrarily
Loop for each episode:
  S ← initial state; A ← ε-greedy(q̂(S,·,w))
  Loop for each step:
    Take A, observe R, S'
    If S' is terminal:
      w ← w + α[R - q̂(S,A,w)] ∇q̂(S,A,w)
      Go to next episode
    A' ← ε-greedy(q̂(S',·,w))
    w ← w + α[R + γq̂(S',A',w) - q̂(S,A,w)] ∇q̂(S,A,w)
    S ← S'; A ← A'
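
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the five-state corridor environment, the one-hot features, and the names `features`, `q_hat`, `epsilon_greedy`, `step`, and `semi_gradient_sarsa` are all assumptions made for the example, as are the hyperparameter values.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def features(s, a):
    """One-hot feature vector x(s, a); an illustrative choice of features."""
    x = [0.0] * (N_STATES * len(ACTIONS))
    x[s * len(ACTIONS) + a] = 1.0
    return x

def q_hat(s, a, w):
    """Linear action-value estimate: dot product of w and x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))

def epsilon_greedy(s, w, eps, rng):
    """Random action with probability eps, otherwise greedy (ties broken randomly)."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    values = [q_hat(s, a, w) for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])

def step(s, a):
    """Toy dynamics: reward -1 per step; the episode ends at the last state."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    return -1.0, s_next, s_next == N_STATES - 1

def semi_gradient_sarsa(episodes=200, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0] * (N_STATES * len(ACTIONS))       # "initialize w arbitrarily"
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(s, w, eps, rng)
        while True:
            r, s_next, done = step(s, a)
            grad = features(s, a)               # gradient of a linear q_hat is x(s, a)
            if done:
                # Terminal transition: the target is just R (no bootstrap term).
                delta = r - q_hat(s, a, w)
                w = [wi + alpha * delta * gi for wi, gi in zip(w, grad)]
                break
            a_next = epsilon_greedy(s_next, w, eps, rng)
            # Non-terminal transition: bootstrap from q_hat(S', A', w).
            delta = r + gamma * q_hat(s_next, a_next, w) - q_hat(s, a, w)
            w = [wi + alpha * delta * gi for wi, gi in zip(w, grad)]
            s, a = s_next, a_next
    return w

w = semi_gradient_sarsa()
greedy = [max(ACTIONS, key=lambda a: q_hat(s, a, w)) for s in range(N_STATES - 1)]
print(greedy)  # greedy action per non-terminal state
```

In this toy setup the only transition into the terminal state is moving right from state 3, so its update target is always the reward -1 and the corresponding weight should settle very close to -1. One-hot features make the linear approximator equivalent to a table; a coarser feature map would share weights across state-action pairs.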

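With linear function approximation: q̂(s,a,w) = wᵀx(s,a), where x(s,a) is the feature vector of the state-action pair, so ∇q̂(s,a,w) = x(s,a).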
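
As a small worked example of the linear case, the sketch below applies one Sarsa update in which the gradient is simply the feature vector. The function names `linear_q` and `sarsa_update_linear`, the convention of passing `x_next=None` for a terminal transition, and all numeric values are assumptions chosen for illustration.

```python
def linear_q(w, x):
    """Linear action value: dot product of weights w and features x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sarsa_update_linear(w, x, r, x_next, alpha, gamma):
    """One semi-gradient Sarsa step for a linear q_hat.

    x      -- features of the current pair (S, A)
    x_next -- features of (S', A'), or None if S' is terminal (assumed convention)
    """
    target = r if x_next is None else r + gamma * linear_q(w, x_next)
    delta = target - linear_q(w, x)
    # The gradient of a linear q_hat with respect to w is x, so w moves along x.
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]

# Made-up numbers for a single transition.
w = [0.5, -0.2, 0.0]
x = [1.0, 0.0, 2.0]        # x(S, A)
x_next = [0.0, 1.0, 1.0]   # x(S', A')
w_new = sarsa_update_linear(w, x, r=1.0, x_next=x_next, alpha=0.1, gamma=0.9)
print(w_new)
```

Here q̂(S,A,w) = 0.5 and q̂(S',A',w) = -0.2, so the target is 1 + 0.9 × (-0.2) = 0.82 and the TD error is 0.32. The weights move by α × 0.32 × x(S,A), giving approximately [0.532, -0.2, 0.064] up to floating-point rounding; the second weight is unchanged because its feature is zero.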
