# Reparameterization Trick


A technique for computing gradients through stochastic sampling operations. Instead of sampling $z \sim p_\theta(z)$ directly (which blocks gradient flow), express the sample as a deterministic, differentiable function of the parameters and independent noise: $z = g_\theta(\epsilon)$, where $\epsilon \sim p(\epsilon)$.

## Intuition

### Making Randomness Differentiable

Normally, you can't backpropagate through a random sampling step. The reparameterization trick moves the randomness into an input variable $\epsilon$ that doesn't depend on the parameters $\theta$. The actual sample $z = g_\theta(\epsilon)$ becomes a deterministic transformation of $\epsilon$, so gradients can flow through $g_\theta$.

### For Gaussian Policies

If $z \sim \mathcal{N}(\mu_\theta, \sigma_\theta^2)$, then:

$$z = \mu_\theta + \sigma_\theta \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

Now $\nabla_\theta z$ is well-defined:

$$\nabla_\theta z = \nabla_\theta \mu_\theta + \epsilon \odot \nabla_\theta \sigma_\theta$$
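As a concrete check, here is a small NumPy sketch of the Gaussian case. The objective $f(z) = z^2$ and the parameter values are assumptions chosen so the true gradients are known in closed form ($\mathbb{E}[z^2] = \mu^2 + \sigma^2$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (assumptions for this sketch).
mu, sigma = 1.5, 0.8

# Reparameterized sampling: z = mu + sigma * eps, with eps ~ N(0, 1)
# drawn independently of the parameters.
eps = rng.standard_normal(100_000)
z = mu + sigma * eps

# Pathwise gradients of E[f(z)] for f(z) = z**2, via the chain rule:
# df/dz = 2z, dz/dmu = 1, dz/dsigma = eps.
grad_mu = np.mean(2 * z)
grad_sigma = np.mean(2 * z * eps)

# Closed form: E[z^2] = mu^2 + sigma^2, so the true gradients are
# d/dmu = 2*mu = 3.0 and d/dsigma = 2*sigma = 1.6.
```

Because the noise `eps` is sampled once and held fixed, the gradient estimates are ordinary averages of differentiable quantities.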

## Application in SAC

In Soft Actor-Critic (SAC), actions are written as $a = f_\phi(\epsilon; s)$ with $\epsilon \sim \mathcal{N}(0, I)$, which enables computing the policy gradient as:

$$\nabla_\phi \, \mathbb{E}_{\epsilon \sim \mathcal{N}}\!\big[ Q(s, f_\phi(\epsilon; s)) - \alpha \log \pi_\phi(f_\phi(\epsilon; s) \mid s) \big] = \mathbb{E}_{\epsilon \sim \mathcal{N}}\!\big[ \nabla_\phi \big( Q(s, f_\phi(\epsilon; s)) - \alpha \log \pi_\phi(f_\phi(\epsilon; s) \mid s) \big) \big]$$

The expectation over $\epsilon$ doesn't depend on $\phi$, so the gradient moves inside the expectation.
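A minimal numerical sketch of this idea, using a made-up one-dimensional critic `Q` and a squashed-Gaussian action $a = \tanh(\mu + \sigma\epsilon)$ as in SAC (the critic, the parameter values, and the omission of the entropy term are all simplifying assumptions, not SAC's actual networks). Because the noise is fixed, the pathwise gradient matches a finite-difference check on the same samples:

```python
import numpy as np

rng = np.random.default_rng(1)

def Q(a):
    # Toy stand-in for the critic (assumption): prefers actions near 0.5.
    return -(a - 0.5) ** 2

mu, sigma = 0.2, 0.5
# Noise is drawn once, independently of the policy parameters.
eps = rng.standard_normal(200_000)

def actions(mu):
    # Squashed Gaussian sample, as in SAC: a = tanh(mu + sigma * eps).
    return np.tanh(mu + sigma * eps)

# Pathwise gradient of E[Q(a)] w.r.t. mu via the chain rule:
# dQ/da = -2*(a - 0.5), da/dmu = 1 - tanh(.)^2 = 1 - a^2.
a = actions(mu)
pathwise = np.mean(-2 * (a - 0.5) * (1 - a ** 2))

# Finite-difference check on the same fixed noise.
h = 1e-4
finite_diff = (Q(actions(mu + h)).mean() - Q(actions(mu - h)).mean()) / (2 * h)
```

In an actual SAC implementation this differentiation is done by autodiff (e.g. `rsample()` in PyTorch distributions); the hand-derived chain rule here just makes the mechanism explicit.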

## Key Properties

- Enables lower-variance gradient estimates compared to the log-derivative trick (REINFORCE)
- Works for continuous distributions that can be expressed as transformations of base distributions
- Standard technique in variational autoencoders (VAEs) and modern deep RL
- Requires the sampling distribution to be reparameterizable (Gaussian, etc.)
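The variance claim in the first bullet can be checked numerically. The sketch below (objective and values are illustrative assumptions) has both estimators target the same gradient $\nabla_\mu \mathbb{E}[z^2]$ for $z \sim \mathcal{N}(\mu, \sigma^2)$; across repeated trials the pathwise estimates cluster far more tightly than the score-function ones:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.0, 1.0
n_samples, n_trials = 1_000, 200

pathwise_ests, score_fn_ests = [], []
for _ in range(n_trials):
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps
    # Pathwise estimator: d(z^2)/dmu = 2z * dz/dmu = 2z.
    pathwise_ests.append(np.mean(2 * z))
    # Score-function (REINFORCE) estimator:
    # f(z) * d log N(z; mu, sigma^2)/dmu = z^2 * (z - mu) / sigma^2.
    score_fn_ests.append(np.mean(z ** 2 * (z - mu) / sigma ** 2))

# Both estimators are unbiased for 2*mu = 0, but their spreads differ:
pathwise_var = float(np.var(pathwise_ests))
score_fn_var = float(np.var(score_fn_ests))
```

Note the asymmetry in what each estimator needs: the score-function trick only evaluates $f$, while the pathwise estimator differentiates through $f$, which is exactly why it can exploit more structure and achieve lower variance.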
