Gaussian Policy

Definition

A Gaussian policy is a stochastic policy for continuous action spaces that models the action distribution as a multivariate Gaussian (normal) distribution:

$$\pi_\theta(a \mid s) = \mathcal{N}\big(a;\ \mu_\theta(s),\ \Sigma\big)$$

where:

  • $\mu_\theta(s)$ is the mean action, parameterized by $\theta$
  • $\Sigma$ is the covariance matrix (controls exploration magnitude)
  • $a \in \mathbb{R}^d$ is the continuous action

Intuition

For continuous control (e.g., robot joint angles, continuous force), we need a policy that:

  • Learns a preferred action (the mean)
  • Maintains uncertainty/exploration around that mean
  • Adjusts both mean and variance based on state

A Gaussian policy naturally provides all three: it is differentiable, supported on all of $\mathbb{R}^d$, and captures both exploitation (the mean) and exploration (the variance).

Mathematical Formulation

Probability Density

For a diagonal Gaussian (a common simplification, with $\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_d^2)$):

$$\pi_\theta(a \mid s) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(a_i - \mu_{\theta,i}(s))^2}{2\sigma_i^2}\right)$$
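As a concrete check, the diagonal-Gaussian density can be evaluated in a few lines of NumPy; this is a minimal sketch, and the function name and example numbers are illustrative, not from the original:

```python
import numpy as np

def diag_gaussian_pdf(a, mu, sigma):
    """Density of a diagonal Gaussian policy at action a: product over dimensions."""
    a, mu, sigma = map(np.asarray, (a, mu, sigma))
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma**2)
    expo = np.exp(-((a - mu) ** 2) / (2.0 * sigma**2))
    return float(np.prod(coef * expo))

# Example: 2-D action with hypothetical mean/std values
p = diag_gaussian_pdf(a=[0.5, -0.2], mu=[0.4, 0.0], sigma=[0.3, 0.5])
```

At $a = \mu$ the exponential term is 1, so the density reduces to $\prod_i 1/\sqrt{2\pi\sigma_i^2}$, which is a quick sanity check on any implementation.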
Log-Policy (for gradient computation)

$$\log \pi_\theta(a \mid s) = -\sum_{i=1}^{d} \left[\frac{(a_i - \mu_{\theta,i}(s))^2}{2\sigma_i^2} + \log \sigma_i\right] - \frac{d}{2}\log 2\pi$$
Gradient w.r.t. Mean

$$\nabla_{\mu_i} \log \pi_\theta(a \mid s) = \frac{a_i - \mu_{\theta,i}(s)}{\sigma_i^2}$$

Interpretation: update the mean in the direction of the action error $(a_i - \mu_i)$, scaled by the inverse variance.

Gradient w.r.t. Variance

$$\frac{\partial \log \pi_\theta(a \mid s)}{\partial \sigma_i} = \frac{(a_i - \mu_{\theta,i}(s))^2 - \sigma_i^2}{\sigma_i^3}$$

This shows the variance should increase when sampled actions land far from the mean and decrease when they land close to it.
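Both gradient formulas can be verified numerically against the log-policy using central finite differences. The helper names and the test point below are hypothetical, chosen only for the demonstration:

```python
import numpy as np

def log_pi(a, mu, sigma):
    """Log-density of a diagonal Gaussian policy at action a."""
    a, mu, sigma = map(np.asarray, (a, mu, sigma))
    return float(np.sum(-(a - mu) ** 2 / (2 * sigma**2)
                        - np.log(sigma) - 0.5 * np.log(2 * np.pi)))

def grad_mu(a, mu, sigma):
    # Analytic gradient w.r.t. the mean: (a_i - mu_i) / sigma_i^2
    return (np.asarray(a) - np.asarray(mu)) / np.asarray(sigma) ** 2

def grad_sigma(a, mu, sigma):
    # Analytic gradient w.r.t. the std: ((a_i - mu_i)^2 - sigma_i^2) / sigma_i^3
    a, mu, sigma = map(np.asarray, (a, mu, sigma))
    return ((a - mu) ** 2 - sigma**2) / sigma**3

# Central finite differences at an arbitrary point
a, mu, sigma = np.array([0.7, -0.3]), np.array([0.5, 0.0]), np.array([0.4, 0.6])
eps = 1e-6
fd_mu = np.array([(log_pi(a, mu + eps * e, sigma) - log_pi(a, mu - eps * e, sigma)) / (2 * eps)
                  for e in np.eye(2)])
fd_sigma = np.array([(log_pi(a, mu, sigma + eps * e) - log_pi(a, mu, sigma - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
```

Agreement between `fd_mu`/`fd_sigma` and the analytic functions confirms the two derivative formulas above.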

Key Properties/Variants

Mean Parameterization

Common choices:

  1. Linear: $\mu_\theta(s) = \theta^\top \phi(s)$, where $\phi(s)$ is a state feature vector

    • Simple, interpretable
    • Good for linear relationships
  2. Neural network: $\mu_\theta(s) = \mathrm{NN}_\theta(s)$

    • Highly expressive
    • Standard for deep RL
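
Both parameterizations can be sketched in plain NumPy; the feature map, layer sizes, and initialization scales below are arbitrary illustrative choices, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=3)                  # example state with 3 features
act_dim = 2

# 1. Linear mean: mu = theta^T phi(s), with the identity feature map phi(s) = s
theta = 0.1 * rng.normal(size=(3, act_dim))
mu_linear = s @ theta

# 2. Neural-network mean: one tanh hidden layer (a minimal sketch)
W1, b1 = 0.1 * rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.normal(size=(16, act_dim)), np.zeros(act_dim)
mu_nn = np.tanh(s @ W1 + b1) @ W2 + b2
```

Either way the output is one mean value per action dimension; the network variant simply replaces the linear map with a learned nonlinear one.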

Variance Parameterization

  1. Fixed variance: $\sigma^2$ is a hyperparameter, not learned

    • Simpler, faster
    • May require careful tuning
  2. Learned scalar variance: one state-independent $\sigma_i$ per action dimension

    • Adapts exploration per action dimension
    • Common in practice
  3. State-dependent variance: $\sigma_\theta(s)$ is also learned

    • Maximum flexibility
    • Needs careful initialization
  4. Log-variance: often parameterize $\log \sigma$ (recovering $\sigma = e^{\log \sigma}$) to ensure positivity
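The log-parameterization trick can be sketched as follows; by the chain rule, the gradient w.r.t. $\log \sigma_i$ is the $\sigma$-gradient from above multiplied by $\sigma_i$. All numbers here are illustrative:

```python
import numpy as np

# Optimize log_sigma as a free parameter; sigma = exp(log_sigma) is always positive
log_sigma = np.array([-0.5, 0.0])       # unconstrained parameters
sigma = np.exp(log_sigma)               # strictly positive standard deviations

# Chain rule: d log pi / d log_sigma_i = (d log pi / d sigma_i) * sigma_i
a, mu = np.array([0.3, -0.1]), np.zeros(2)
grad_sigma = ((a - mu) ** 2 - sigma**2) / sigma**3
grad_log_sigma = grad_sigma * sigma     # simplifies to ((a-mu)^2 - sigma^2) / sigma^2
```

Gradient steps on `log_sigma` can never push the standard deviation to zero or negative values, which is the practical reason this parameterization is popular.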

Diagonal vs Full Covariance

  • Diagonal (most common): $\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_d^2)$
    • Simpler gradient computation
    • Assumes action dimensions are independent
  • Full covariance: Allows correlation between actions
    • More expressive, more expensive
    • Rarely needed
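
Sampling under the two choices differs only in how the noise is shaped: a diagonal policy scales independent noise per dimension, while a full covariance requires a matrix square root such as a Cholesky factor. The numbers below are illustrative (the full covariance matrix must be positive definite):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])

# Diagonal: independent dimensions; reparameterized sample mu + sigma * eps
sigma = np.array([0.5, 0.2])
a_diag = mu + sigma * rng.standard_normal(2)

# Full covariance: correlated dimensions via the Cholesky factor L (Sigma = L L^T)
Sigma = np.array([[0.25, 0.08],
                  [0.08, 0.04]])
L = np.linalg.cholesky(Sigma)
a_full = mu + L @ rng.standard_normal(2)
```

The diagonal sample costs $O(d)$ per draw; the full-covariance sample needs an $O(d^3)$ factorization (amortizable) plus an $O(d^2)$ matrix-vector product, one reason the diagonal form dominates in practice.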

Connections

Appears In