Classifier-Free Guidance
Definition
Classifier-Free Guidance (CFG)
Classifier-free guidance is a technique for conditional sampling from diffusion models that steers the reverse (denoising) process toward samples with a desired property $y$ without training a separate classifier $p(y \mid x_k)$. A single noise-prediction network $\epsilon_\theta$ is trained jointly on conditioned inputs and unconditioned inputs (by randomly dropping the condition during training). At sampling time, the two predictions are linearly extrapolated with a guidance weight $\omega$ to amplify the influence of $y$. In the Decision Diffuser, $y$ is a return level, a constraint, or a skill, and $x$ is a state trajectory.
Intuition
One Network, Two Modes
A diffusion model generates data by starting from pure Gaussian noise and iteratively denoising it. To make that generation conditional (e.g. “produce a high-return trajectory”), the older approach (classifier guidance) trained a separate classifier $p(y \mid x_k)$ on noisy data and pushed samples uphill along $\nabla_{x_k} \log p(y \mid x_k)$. Training a classifier on noisy inputs is awkward, however, and adds a second model.
CFG avoids this. During training, the same network sometimes sees the condition $y$ and sometimes sees a null token $\varnothing$ (the condition is “dropped”), so it learns to denoise both conditionally and unconditionally. At inference, the difference between the conditional and unconditional noise predictions, $\epsilon_\theta(x_k, y, k) - \epsilon_\theta(x_k, \varnothing, k)$, is proportional to $-\nabla_{x_k} \log p(y \mid x_k)$: the same gradient a classifier would have supplied, with a sign flip because predicted noise points opposite to the score. We then scale that difference by the weight $\omega$ (over-emphasizing it when $\omega > 1$), sharpening how strongly the sample obeys $y$.
Mathematical Formulation
A diffusion model defines a forward (noising) process that gradually corrupts data into noise over $K$ steps, and learns to reverse it.
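Forward Diffusion (DDPM)
$$q(x_k \mid x_0) = \mathcal{N}\big(x_k;\ \sqrt{\bar\alpha_k}\,x_0,\ (1-\bar\alpha_k)\,I\big), \qquad x_k = \sqrt{\bar\alpha_k}\,x_0 + \sqrt{1-\bar\alpha_k}\,\epsilon$$
where:
- $x_0$: clean sample (in Decision Diffuser, a state trajectory, i.e. a sequence of states $s_t, s_{t+1}, \dots$)
- $k \in \{1, \dots, K\}$: diffusion timestep (not the RL time index $t$)
- $\beta_k$: noise variance schedule
- $\bar\alpha_k = \prod_{i=1}^{k} (1 - \beta_i)$: cumulative signal-retention factor
- $\epsilon \sim \mathcal{N}(0, I)$: the noise actually injected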
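The closed-form noising step, $x_k = \sqrt{\bar\alpha_k}\,x_0 + \sqrt{1-\bar\alpha_k}\,\epsilon$, can be sketched in a few lines of NumPy. This is a minimal illustration, not the Decision Diffuser implementation: the linear schedule, its endpoints, the number of steps, and the helper names `make_schedule` and `q_sample` are all assumptions chosen for the example.

```python
import numpy as np

def make_schedule(K=100, beta_min=1e-4, beta_max=0.02):
    """Linear variance schedule (an assumed choice) and its cumulative products."""
    betas = np.linspace(beta_min, beta_max, K)   # beta_1 ... beta_K
    alpha_bars = np.cumprod(1.0 - betas)         # alpha_bar_k = prod_{i<=k} (1 - beta_i)
    return betas, alpha_bars

def q_sample(x0, k, alpha_bars, rng):
    """Draw x_k ~ q(x_k | x_0) in closed form; k is 1-indexed."""
    eps = rng.standard_normal(x0.shape)          # the noise actually injected
    a_bar = alpha_bars[k - 1]
    x_k = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_k, eps

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
x0 = rng.standard_normal((8, 4))   # stand-in trajectory: 8 timesteps, 4 state dims
x_k, eps = q_sample(x0, k=50, alpha_bars=alpha_bars, rng=rng)
```

Returning `eps` alongside `x_k` is deliberate: the injected noise is exactly the regression target the network is later trained to predict.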
The network $\epsilon_\theta$ is trained to predict that injected noise, with the condition $y$ randomly replaced by the null token $\varnothing$ with probability $p$ (dropout).
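Denoising Training Loss (with condition dropout)
$$\mathcal{L}(\theta) = \mathbb{E}_{k,\ x_0,\ \epsilon,\ \beta \sim \mathrm{Bern}(p)} \Big[ \big\| \epsilon - \epsilon_\theta\big(x_k,\ (1-\beta)\,y + \beta\,\varnothing,\ k\big) \big\|^2 \Big]$$
where:
- $\epsilon_\theta$: noise-prediction network (a temporal U-Net in Decision Diffuser)
- $y$: the conditioning property (return, constraint, skill), projected to a latent via an MLP
- $\varnothing$: null / “unconditioned” token
- $\beta \sim \mathrm{Bern}(p)$: drops the condition with probability $p$ so the model also learns the unconditional prediction $\epsilon_\theta(x_k, \varnothing, k)$ (this unsubscripted $\beta$ is the dropout indicator, not the schedule $\beta_k$)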
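To make the condition-dropout objective concrete, here is a hedged sketch that evaluates one Monte Carlo estimate of the loss. The one-layer `toy_eps_model` is a hypothetical stand-in for the real network (Decision Diffuser uses a temporal U-Net), the all-zeros null token is an assumed encoding, and `p_drop=0.25` and the schedule are illustrative values only. No gradient step is shown.

```python
import numpy as np

def toy_eps_model(W, x_k, cond, k_frac):
    """Hypothetical stand-in for eps_theta: one linear layer over [x_k, cond, k/K]."""
    feats = np.concatenate([x_k, cond, k_frac[:, None]], axis=1)
    return feats @ W

def cfg_training_loss(W, x0, y, null_token, alpha_bars, p_drop, rng):
    """Monte Carlo estimate of the denoising loss with condition dropout."""
    B, K = x0.shape[0], len(alpha_bars)
    k = rng.integers(1, K + 1, size=B)              # one diffusion step per sample
    a_bar = alpha_bars[k - 1][:, None]
    eps = rng.standard_normal(x0.shape)             # regression target
    x_k = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    drop = rng.random(B) < p_drop                   # Bernoulli(p) dropout indicator
    cond = np.where(drop[:, None], null_token, y)   # swap y for the null token
    pred = toy_eps_model(W, x_k, cond, k / K)
    return np.mean(np.sum((eps - pred) ** 2, axis=1))

rng = np.random.default_rng(0)
D, C, B = 4, 3, 16                                  # state dim, condition dim, batch size
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 100))
W = 0.1 * rng.standard_normal((D + C + 1, D))
x0 = rng.standard_normal((B, D))
y = rng.standard_normal((B, C))
null_token = np.zeros(C)                            # assumed encoding of the null token
loss = cfg_training_loss(W, x0, y, null_token, alpha_bars, p_drop=0.25, rng=rng)
```

Because the same weights `W` serve both branches, a single set of parameters ends up covering the conditional and the unconditional prediction, which is the point of the dropout trick.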
At sampling time, the conditional and unconditional predictions are combined into a single guided noise estimate:
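Classifier-Free Guided Noise Prediction
$$\hat\epsilon = \epsilon_\theta(x_k, \varnothing, k) + \omega\,\big(\epsilon_\theta(x_k, y, k) - \epsilon_\theta(x_k, \varnothing, k)\big)$$
where:
- $\hat\epsilon$: guided noise used in the reverse step to compute $x_{k-1}$
- $\epsilon_\theta(x_k, \varnothing, k)$: unconditional prediction
- $\epsilon_\theta(x_k, y, k)$: conditional prediction
- $\omega$: guidance weight: $\omega = 0$ gives unconditional sampling, $\omega = 1$ gives ordinary conditional sampling, $\omega > 1$ amplifies adherence to $y$ (sharper conditioning, less diversity)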
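The guided estimate is the unconditional prediction plus the guidance weight times the gap between the conditional and unconditional predictions. A small sketch with made-up prediction vectors (the function name `guided_eps` and the numbers are illustrative assumptions) shows the three regimes of the weight:

```python
import numpy as np

def guided_eps(eps_uncond, eps_cond, omega):
    """Extrapolate from the unconditional toward (and past) the conditional prediction."""
    return eps_uncond + omega * (eps_cond - eps_uncond)

eps_u = np.array([0.0, 1.0])   # made-up unconditional prediction
eps_c = np.array([1.0, 3.0])   # made-up conditional prediction

unconditional = guided_eps(eps_u, eps_c, 0.0)   # equals eps_u
conditional = guided_eps(eps_u, eps_c, 1.0)     # equals eps_c
amplified = guided_eps(eps_u, eps_c, 2.0)       # [2., 5.]: moves past eps_c
```

With a weight of 2 the result lies beyond the conditional prediction on the line through both points, which is why this is an extrapolation rather than an interpolation.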
Why the Extrapolation Works
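Score-matching theory says $\epsilon_\theta(x_k, k) \approx -\sqrt{1-\bar\alpha_k}\,\nabla_{x_k} \log p(x_k)$, where $\bar\alpha_k$ is the cumulative signal-retention factor: the predicted noise is a scaled negative score. By Bayes’ rule $\nabla_{x_k} \log p(x_k \mid y) = \nabla_{x_k} \log p(x_k) + \nabla_{x_k} \log p(y \mid x_k)$, so the gap between conditional and unconditional scores equals the classifier gradient $\nabla_{x_k} \log p(y \mid x_k)$. CFG reconstructs that gradient without a classifier and scales it by $\omega$, which corresponds to classifier guidance with strength $\omega$ applied to the implicit classifier.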
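The argument can be written out step by step. The following is a sketch under the idealized assumption that the network matches the true scores exactly; the tilde notation for the implied sampling distribution is introduced here for illustration.

```latex
\begin{align*}
\epsilon_\theta(x_k, \varnothing, k) &\approx -\sqrt{1-\bar\alpha_k}\,\nabla_{x_k} \log p(x_k), \\
\epsilon_\theta(x_k, y, k) &\approx -\sqrt{1-\bar\alpha_k}\,\nabla_{x_k} \log p(x_k \mid y)
  = -\sqrt{1-\bar\alpha_k}\,\big[\nabla_{x_k} \log p(x_k) + \nabla_{x_k} \log p(y \mid x_k)\big], \\
\hat\epsilon &= \epsilon_\theta(x_k, \varnothing, k)
  + \omega\,\big(\epsilon_\theta(x_k, y, k) - \epsilon_\theta(x_k, \varnothing, k)\big) \\
 &\approx -\sqrt{1-\bar\alpha_k}\,\big[\nabla_{x_k} \log p(x_k) + \omega\,\nabla_{x_k} \log p(y \mid x_k)\big] \\
 &= -\sqrt{1-\bar\alpha_k}\,\nabla_{x_k} \log \tilde p(x_k \mid y),
 \qquad \tilde p(x_k \mid y) \propto p(x_k)\,p(y \mid x_k)^{\omega}.
\end{align*}
```

Setting $\omega = 1$ gives back $p(x_k \mid y)$, while $\omega > 1$ raises the implicit classifier likelihood to a power greater than one, concentrating samples where the condition is most clearly satisfied.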
Key Properties / Variants
- No separate classifier: avoids training a noise-robust classifier $p(y \mid x_k)$; one network handles both conditional and unconditional generation.
- Guidance weight $\omega$ trades fidelity vs diversity: larger $\omega$ produces samples that more strongly satisfy $y$ but reduces sample diversity (and can introduce artifacts).
- Condition dropout probability $p$: a hyperparameter; the model must see enough unconditioned examples to learn a usable $\epsilon_\theta(x_k, \varnothing, k)$.
- Composable conditions: because conditioning is just an input to $\epsilon_\theta$, multiple guidance signals (e.g. several constraints) can be combined. This is the property the Decision Diffuser exploits to satisfy combinations of constraints, which classifier guidance struggles with.
- vs classifier guidance: the original Diffuser (Janner et al., 2022) uses classifier guidance (it perturbs the reverse process by the gradient of a learned return predictor) and generates state-action pairs; Decision Diffuser switches to CFG and generates states only (with an Inverse Dynamics Model recovering actions).
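- Low-temperature sampling is typically combined with CFG at inference: scaling down the variance of the Gaussian noise added at each reverse step, i.e. sampling $x_{k-1} \sim \mathcal{N}(\mu_{k-1}, \alpha\,\Sigma_{k-1})$ with a temperature $\alpha \in [0, 1)$, yields more deterministic, higher-quality plans.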
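The low-temperature point can be illustrated with a single reverse step. This sketch assumes the standard DDPM posterior mean and variance and multiplies the variance by a temperature factor; the function name `reverse_step`, the `temperature` parameter, and the linear schedule are assumptions for the example, and `eps_hat` is a random placeholder where a guided noise estimate would go.

```python
import numpy as np

def reverse_step(x_k, eps_hat, k, betas, alpha_bars, rng, temperature=1.0):
    """One DDPM reverse step x_k -> x_{k-1}; k is 1-indexed.

    temperature scales the sampling variance (1.0 is the standard DDPM step).
    """
    beta_k = betas[k - 1]
    a_bar_k = alpha_bars[k - 1]
    # Posterior mean written in terms of the (guided) noise estimate.
    mean = (x_k - beta_k / np.sqrt(1.0 - a_bar_k) * eps_hat) / np.sqrt(1.0 - beta_k)
    if k == 1:
        return mean                                   # final step adds no noise
    a_bar_prev = alpha_bars[k - 2]
    var = beta_k * (1.0 - a_bar_prev) / (1.0 - a_bar_k)   # DDPM posterior variance
    noise = rng.standard_normal(x_k.shape)
    return mean + np.sqrt(temperature * var) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)                  # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)
x_k = rng.standard_normal((8, 4))
eps_hat = rng.standard_normal((8, 4))                 # placeholder guided noise estimate
x_prev = reverse_step(x_k, eps_hat, k=50, betas=betas,
                      alpha_bars=alpha_bars, rng=rng, temperature=0.5)
```

A temperature of 0 makes every step deterministic given the noise estimate, while 1 recovers ordinary ancestral sampling; values in between trade diversity for consistency.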
Sampling procedure with CFG inside the reverse diffusion loop:
Algorithm: Conditional Sampling with Classifier-Free Guidance
─────────────────────────────────────────────────────────────
Input: trained ε_θ, condition y, guidance weight ω, schedule {β_k}
Sample x_K ~ N(0, I)                        # start from pure noise
Loop for k = K, K-1, ..., 1:
    # two forward passes through the SAME network
    ε_cond   ← ε_θ(x_k, y, k)               # conditional prediction
    ε_uncond ← ε_θ(x_k, ∅, k)               # unconditioned prediction
    # classifier-free guided noise (extrapolate)
    ε̂ ← ε_uncond + ω * (ε_cond - ε_uncond)
    # one reverse (denoising) step using ε̂
    x_{k-1} ← reverse_step(x_k, ε̂, k)       # optionally low-temperature
end Loop
return x_0                                  # e.g. a generated state trajectory
# Decision Diffuser then applies inverse dynamics: a_t = f_φ(s_t, s_{t+1})
Connections
- Conditioning mechanism used by: Decision Diffuser
- Action recovery after generation: Inverse Dynamics Model
- Sits inside: Offline Reinforcement Learning (generate high-return plans from a fixed dataset)
- Conceptual sibling: classifier guidance (original Diffuser, Janner et al.) — uses an explicit return-predictor gradient
- Related conditioning idea in RL: Decision Transformer conditions on return-to-go via the input sequence rather than diffusion guidance
- Contrast with value-based offline RL: Conservative Q-Learning (CQL)
- Builds on denoising diffusion probabilistic models (DDPM) from generative vision