Classifier-Free Guidance

Definition

Classifier-Free Guidance (CFG)

Classifier-free guidance is a technique for conditional sampling from diffusion models that steers the reverse (denoising) process toward samples with a desired property $y$ without training a separate classifier $p(y \mid x_k)$. A single noise-prediction network $\epsilon_\theta$ is trained jointly on conditioned and unconditioned inputs (by randomly dropping the condition during training). At sampling time, the two predictions are linearly extrapolated with a guidance weight $\omega$ to amplify the influence of $y$. In the Decision Diffuser, $y$ is a return level, a constraint, or a skill, and the sample $x_0$ is a state trajectory.

Intuition

One Network, Two Modes

A diffusion model generates data by starting from pure Gaussian noise and iteratively denoising it. To make that generation conditional (e.g. “produce a high-return trajectory”), the older approach (classifier guidance) trained a separate classifier $p(y \mid x_k)$ on noisy data and pushed samples uphill along its gradient $\nabla_{x_k} \log p(y \mid x_k)$. Training a classifier on noisy inputs is awkward, however, and adds a second model.

CFG avoids this. During training, the same network sometimes sees the condition $y$ and sometimes sees a null token $\varnothing$ (the condition is “dropped”), so it learns to denoise both conditionally and unconditionally. At inference, the difference between the conditional and unconditional noise predictions, $\epsilon_\theta(x_k, y, k) - \epsilon_\theta(x_k, \varnothing, k)$, implicitly encodes the direction $\nabla_{x_k} \log p(y \mid x_k)$, the very direction a classifier would have given (up to a negative scale factor, because a noise prediction is negatively proportional to a score). We then over-emphasize that direction by the weight $\omega$, sharpening how strongly the sample obeys $y$.

Mathematical Formulation

A diffusion model defines a forward (noising) process that gradually corrupts data into noise over $K$ steps, and learns to reverse it.

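Forward Diffusion (DDPM)

$$q(x_k \mid x_0) = \mathcal{N}\!\left(x_k;\ \sqrt{\bar\alpha_k}\,x_0,\ (1-\bar\alpha_k)\,I\right), \quad \text{equivalently} \quad x_k = \sqrt{\bar\alpha_k}\,x_0 + \sqrt{1-\bar\alpha_k}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

where:

  • $x_0$: clean sample (in Decision Diffuser, a state trajectory $\tau$)
  • $k \in \{1, \dots, K\}$: diffusion timestep (not the RL time index)
  • $\beta_k$: noise variance schedule
  • $\bar\alpha_k = \prod_{s=1}^{k} (1 - \beta_s)$: cumulative signal-retention factor
  • $\epsilon$: the noise actually injected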
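The closed-form forward step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the linear schedule and its endpoints, and the helper names `linear_beta_schedule`, `cumulative_alpha_bar`, and `q_sample`, are hypothetical choices made here and do not come from the text above.

```python
import math
import random

def linear_beta_schedule(K, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear schedule beta_1..beta_K (endpoints are an assumption)."""
    if K == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (K - 1)
    return [beta_start + i * step for i in range(K)]

def cumulative_alpha_bar(betas):
    """alpha_bar_k = prod_{s<=k} (1 - beta_s); list index 0 holds alpha_bar_1."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, k, alpha_bars, rng):
    """Draw x_k ~ q(x_k | x_0) in closed form; returns (x_k, injected noise)."""
    a_bar = alpha_bars[k - 1]  # k is 1-indexed, as in the text
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_k = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
           for x, e in zip(x0, eps)]
    return x_k, eps

betas = linear_beta_schedule(K=100)
alpha_bars = cumulative_alpha_bar(betas)
x0 = [1.0, -0.5, 0.25]  # stand-in for a flattened state trajectory
x_k, eps = q_sample(x0, k=50, alpha_bars=alpha_bars, rng=random.Random(0))
```

Because the retention factor shrinks monotonically with the step index, later steps keep less of the clean sample and more of the injected noise.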

The network is trained to predict that injected noise, with the condition $y$ randomly replaced by $\varnothing$ with probability $p$ (dropout).

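Denoising Training Loss (with condition dropout)

$$\mathcal{L}(\theta) = \mathbb{E}_{k,\ x_0,\ \epsilon \sim \mathcal{N}(0, I),\ b \sim \mathrm{Bernoulli}(p)} \left[ \left\lVert \epsilon - \epsilon_\theta\big(x_k,\ (1-b)\,y + b\,\varnothing,\ k\big) \right\rVert^2 \right]$$

where:

  • $\epsilon_\theta$: noise-prediction network (a temporal U-Net in Decision Diffuser)
  • $y$: the conditioning property (return, constraint, skill), projected to a latent via an MLP
  • $\varnothing$: null / “unconditioned” token
  • $b \sim \mathrm{Bernoulli}(p)$: drops the condition so the model also learns the unconditional prediction $\epsilon_\theta(x_k, \varnothing, k)$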
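One Monte Carlo sample of this loss can be sketched as follows. The names `cfg_loss_sample` and `zero_model`, the use of `None` as the null token, and the placeholder network that always predicts zero noise are all assumptions made for illustration; a real model would be a trained network updated by gradient descent on this quantity.

```python
import math
import random

NULL = None  # hypothetical encoding of the null token

def cfg_loss_sample(eps_model, x0, y, k, a_bar_k, p_drop, rng):
    """Return (squared error, condition actually shown to the model)."""
    # Forward-noise the clean sample in closed form.
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_k = [math.sqrt(a_bar_k) * x + math.sqrt(1.0 - a_bar_k) * e
           for x, e in zip(x0, eps)]
    # Condition dropout: with probability p_drop the model sees the null token.
    cond = NULL if rng.random() < p_drop else y
    pred = eps_model(x_k, cond, k)
    loss = sum((e - q) ** 2 for e, q in zip(eps, pred))
    return loss, cond

def zero_model(x_k, cond, k):
    """Placeholder network (assumption): always predicts zero noise."""
    return [0.0 for _ in x_k]

rng = random.Random(0)
draws = [cfg_loss_sample(zero_model, [1.0, -1.0], [0.9], k=10,
                         a_bar_k=0.5, p_drop=0.25, rng=rng)
         for _ in range(1000)]
dropped_fraction = sum(cond is NULL for _, cond in draws) / len(draws)
```

The fraction of dropped conditions should hover around the dropout probability, which is what lets a single network serve as both the conditional and the unconditional model.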

At sampling time, the conditional and unconditional predictions are combined into a single guided noise estimate:

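Classifier-Free Guided Noise Prediction

$$\hat\epsilon = \epsilon_\theta(x_k, \varnothing, k) + \omega \left( \epsilon_\theta(x_k, y, k) - \epsilon_\theta(x_k, \varnothing, k) \right)$$

where:

  • $\hat\epsilon$: guided noise used in the reverse step to compute $x_{k-1}$
  • $\epsilon_\theta(x_k, \varnothing, k)$: unconditional prediction
  • $\epsilon_\theta(x_k, y, k)$: conditional prediction
  • $\omega$: guidance weight. $\omega = 0$ gives unconditional sampling, $\omega = 1$ gives ordinary conditional sampling, and $\omega > 1$ amplifies adherence to $y$ (sharper conditioning, less diversity)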
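The combination itself is a one-line extrapolation. The sketch below uses plain Python lists and a hypothetical helper name, `cfg_combine`, to show the three regimes of the guidance weight on small hand-picked vectors.

```python
def cfg_combine(eps_cond, eps_uncond, omega):
    """Guided noise: eps_uncond + omega * (eps_cond - eps_uncond), elementwise."""
    return [u + omega * (c - u) for c, u in zip(eps_cond, eps_uncond)]

eps_cond = [1.0, 0.0]    # illustrative conditional prediction
eps_uncond = [0.0, 1.0]  # illustrative unconditional prediction

unconditional = cfg_combine(eps_cond, eps_uncond, omega=0.0)  # equals eps_uncond
conditional = cfg_combine(eps_cond, eps_uncond, omega=1.0)    # equals eps_cond
amplified = cfg_combine(eps_cond, eps_uncond, omega=2.0)      # [2.0, -1.0]
```

With a weight above one the result lies beyond the conditional prediction on the line through the two predictions, which is why the operation is an extrapolation rather than an interpolation.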

Why the Extrapolation Works

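Score-matching theory says that the noise prediction is a rescaled, negated score: $\epsilon_\theta(x_k, k) \approx -\sqrt{1-\bar\alpha_k}\,\nabla_{x_k} \log p(x_k)$. By Bayes’ rule, $\nabla_{x_k} \log p(x_k \mid y) = \nabla_{x_k} \log p(x_k) + \nabla_{x_k} \log p(y \mid x_k)$, so the gap between conditional and unconditional scores equals the classifier gradient $\nabla_{x_k} \log p(y \mid x_k)$. CFG reconstructs that gradient without a classifier and scales it by $\omega$, which yields the same guided score as classifier guidance with strength $\omega$.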
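Written out step by step, and assuming the standard DDPM relation between a noise prediction and a score, the argument runs as follows (gradients are taken with respect to the noisy sample):

```latex
\begin{align*}
\epsilon_\theta(x_k, y, k) &\approx -\sqrt{1-\bar\alpha_k}\,\nabla_{x_k} \log p(x_k \mid y), \\
\epsilon_\theta(x_k, \varnothing, k) &\approx -\sqrt{1-\bar\alpha_k}\,\nabla_{x_k} \log p(x_k), \\
\epsilon_\theta(x_k, y, k) - \epsilon_\theta(x_k, \varnothing, k)
  &\approx -\sqrt{1-\bar\alpha_k}\,\nabla_{x_k} \log p(y \mid x_k), \\
\hat\epsilon
  &= \epsilon_\theta(x_k, \varnothing, k)
     + \omega\left(\epsilon_\theta(x_k, y, k) - \epsilon_\theta(x_k, \varnothing, k)\right) \\
  &\approx -\sqrt{1-\bar\alpha_k}\,
     \left[\nabla_{x_k} \log p(x_k) + \omega\,\nabla_{x_k} \log p(y \mid x_k)\right].
\end{align*}
```

The bracketed term in the last line is the score of a distribution proportional to the unconditional density times the classifier likelihood raised to the guidance weight, which is the target that classifier guidance samples from.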

Key Properties / Variants

  • No separate classifier: avoids training a noise-robust classifier $p(y \mid x_k)$; one network handles both conditional and unconditional generation.
  • Guidance weight $\omega$ trades fidelity vs diversity: larger $\omega$ produces samples that more strongly satisfy $y$ but reduces sample diversity (and can introduce artifacts).
  • Condition dropout probability $p$: a hyperparameter; the model must see enough unconditioned examples to learn a usable $\epsilon_\theta(x_k, \varnothing, k)$.
  • Composable conditions: because conditioning is just an input to $\epsilon_\theta$, multiple guidance signals (e.g. several constraints) can be combined. The Decision Diffuser exploits this property to satisfy combinations of constraints, which classifier guidance struggles with.
  • vs classifier guidance: the original Diffuser (Janner et al., 2022) uses classifier guidance (it perturbs the reverse process by the gradient of a learned return predictor) and generates state-action pairs; Decision Diffuser switches to CFG and generates states only (with an Inverse Dynamics Model recovering actions).
  • Low-temperature sampling is typically combined with CFG at inference: scaling down the variance of the noise added at each reverse step yields more deterministic, higher-quality plans.
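One way to combine several conditions, sketched here as an assumption rather than as the only possible rule, is to sum the per-condition guidance directions on top of a single unconditional prediction. The helper name `cfg_compose` and the example vectors are hypothetical.

```python
def cfg_compose(eps_uncond, eps_conds, omega):
    """eps_uncond + omega * sum_i (eps_cond_i - eps_uncond), elementwise.

    eps_conds holds one conditional prediction per condition.
    """
    guided = list(eps_uncond)
    for eps_c in eps_conds:
        for i, (c, u) in enumerate(zip(eps_c, eps_uncond)):
            guided[i] += omega * (c - u)
    return guided

eps_uncond = [0.0, 0.0]
eps_a = [1.0, 0.0]  # prediction under a first condition (illustrative)
eps_b = [0.0, 1.0]  # prediction under a second condition (illustrative)

single = cfg_compose(eps_uncond, [eps_a], omega=2.0)           # ordinary CFG
combined = cfg_compose(eps_uncond, [eps_a, eps_b], omega=2.0)  # both directions
```

With one condition the rule reduces to the ordinary guided noise prediction, so composition costs one extra forward pass per additional condition.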

Sampling procedure with CFG inside the reverse diffusion loop:

Algorithm: Conditional Sampling with Classifier-Free Guidance
─────────────────────────────────────────────────────────────
Input: trained ε_θ, condition y, guidance weight ω, schedule {β_k}
Sample x_K ~ N(0, I)                       # start from pure noise
 
Loop for k = K, K-1, ..., 1:
    # two forward passes through the SAME network
    ε_cond   ← ε_θ(x_k, y, k)              # conditional prediction
    ε_uncond ← ε_θ(x_k, ∅, k)              # unconditioned prediction
 
    # classifier-free guided noise (extrapolate)
    ε̂ ← ε_uncond + ω * (ε_cond - ε_uncond)
 
    # one reverse (denoising) step using ε̂
    x_{k-1} ← reverse_step(x_k, ε̂, k)      # optionally low-temperature
end Loop
 
return x_0                                 # e.g. a generated state trajectory
# Decision Diffuser then applies inverse dynamics: a_t = f_φ(s_t, s_{t+1})
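The algorithm above can be exercised end to end with a toy model. Everything specific in this sketch is an assumption: `reverse_step` is taken to be the standard DDPM ancestral step with per-step variance equal to the schedule value, `temperature` scales that variance (low-temperature sampling), and the closed-form `eps_model` stands in for a trained network by treating the unconditional data as standard normal and the conditional data as a point mass at the condition.

```python
import math
import random

def alpha_bars_from(betas):
    """alpha_bar_k = prod_{s<=k} (1 - beta_s); index 0 holds alpha_bar_1."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def make_toy_eps_model(alpha_bars):
    """Closed-form noise predictor for a toy problem (not a trained network)."""
    def eps_model(x, cond, k):
        a_bar = alpha_bars[k - 1]
        if cond is None:  # null token: data assumed ~ N(0, I)
            return [math.sqrt(1.0 - a_bar) * xi for xi in x]
        # conditional: data assumed to be exactly `cond`
        return [(xi - math.sqrt(a_bar) * c) / math.sqrt(1.0 - a_bar)
                for xi, c in zip(x, cond)]
    return eps_model

def cfg_sample(eps_model, y, omega, betas, rng, temperature=1.0):
    alpha_bars = alpha_bars_from(betas)
    x = [rng.gauss(0.0, 1.0) for _ in y]  # x_K ~ N(0, I)
    for k in range(len(betas), 0, -1):
        beta, a_bar = betas[k - 1], alpha_bars[k - 1]
        eps_cond = eps_model(x, y, k)        # conditional pass
        eps_uncond = eps_model(x, None, k)   # unconditional pass, same network
        eps_hat = [u + omega * (c - u) for c, u in zip(eps_cond, eps_uncond)]
        # DDPM posterior mean computed from the guided noise
        coef = beta / math.sqrt(1.0 - a_bar)
        mean = [(xi - coef * e) / math.sqrt(1.0 - beta)
                for xi, e in zip(x, eps_hat)]
        if k > 1:
            std = math.sqrt(temperature * beta)  # temperature < 1 shrinks noise
            x = [m + std * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x

betas = [1e-4 + i * (0.02 - 1e-4) / 49 for i in range(50)]
model = make_toy_eps_model(alpha_bars_from(betas))
target = [1.0, -1.0]
plan = cfg_sample(model, target, omega=1.0, betas=betas,
                  rng=random.Random(0), temperature=0.5)
```

With a guidance weight of one this toy model returns the condition up to floating-point error, because the final step inverts the forward noising formula exactly. Larger weights overshoot that target, which is a toy-scale view of the fidelity and artifact trade-off.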

Connections

  • Conditioning mechanism used by: Decision Diffuser
  • Action recovery after generation: Inverse Dynamics Model
  • Sits inside: Offline Reinforcement Learning (generate high-return plans from a fixed dataset)
  • Conceptual sibling: classifier guidance (original Diffuser, Janner et al.) — uses an explicit return-predictor gradient
  • Related conditioning idea in RL: Decision Transformer conditions on return-to-go via the input sequence rather than diffusion guidance
  • Contrast with value-based offline RL: Conservative Q-Learning (CQL)
  • Builds on denoising diffusion probabilistic models (DDPM) from generative vision

Appears In