Inverse Propensity Weighting
Definition
Inverse Propensity Weighting (IPW) is a general technique for obtaining unbiased estimates from selection-biased data. The core idea: weight observations inversely to their probability of being selected/observed.
In the context of Learning to Rank, IPW corrects for Position Bias by reweighting each click by the inverse of its examination probability:

$$w(k) = \frac{1}{P(E = 1 \mid k)}$$
Intuition
Simple analogy: Imagine surveying people on the street:
- Rich people are less likely to stop and answer your questions (low propensity)
- Poor people are more likely to participate (high propensity)
- Without correction, your survey is biased toward poor respondents
- Solution: Up-weight the opinions of rich people and down-weight poor people’s opinions
- Result: An unbiased view of the whole population
In ranking:
- Clicks at position 1 have high propensity (100% examined)
- Clicks at position 8 have low propensity (40% examined)
- Without correction, you overlearn from position 1 and ignore position 8
- Solution: Down-weight position 1 clicks, up-weight position 8 clicks
- Result: An unbiased estimate of relevance independent of position
Mathematical Formulation
General IPW
For a selection process where observation $i$ is seen (indicator $O_i \in \{0, 1\}$) with propensity $p_i = P(O_i = 1)$:

$$\hat{\mu}_{\text{IPW}} = \frac{1}{n} \sum_{i=1}^{n} \frac{O_i \, y_i}{p_i}$$

This is an unbiased estimator:

$$\mathbb{E}\left[\hat{\mu}_{\text{IPW}}\right] = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbb{E}[O_i] \, y_i}{p_i} = \frac{1}{n} \sum_{i=1}^{n} y_i = \mu$$
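A minimal simulation of this estimator, in the spirit of the street-survey analogy (the two-group population, its opinions, and its response propensities are all assumed values): weighting each observed $y_i$ by $1/p_i$ recovers the true population mean on average.

```python
import random

# Toy population: 500 people with opinion 1.0 who rarely respond, and
# 500 with opinion 0.0 who usually do. All numbers are illustrative.
population = [(1.0, 0.2)] * 500 + [(0.0, 0.8)] * 500  # (opinion, propensity)

def ipw_estimate(pop, rng):
    """Mean opinion over the whole population, estimated from respondents
    only, with each response weighted by 1 / propensity."""
    total = 0.0
    for opinion, propensity in pop:
        if rng.random() < propensity:      # person responds
            total += opinion / propensity  # inverse propensity weight
    return total / len(pop)

rng = random.Random(0)
runs = [ipw_estimate(population, rng) for _ in range(2000)]
print(sum(runs) / len(runs))  # centres on the true mean, 0.5
```

Without the `1 / propensity` factor, the same simulation would converge to the respondent average, which over-represents the high-propensity group.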
IPW for Position Bias
Under the Position-Based Click Model, a click requires both examination and relevance:

$$P(C = 1 \mid d, k) = P(E = 1 \mid k) \cdot P(R = 1 \mid d)$$

The unbiased estimator of relevance divides each observed click by the examination propensity of its rank:

$$\hat{r}(d) = \frac{1}{n_d} \sum_{i=1}^{n_d} \frac{c_i(d)}{P(E = 1 \mid k_i)}$$

Or for a ranking, summing the weighted clicks over documents:

$$\hat{\Delta}_{\text{IPW}} = \sum_{d \,:\, c(d) = 1} \frac{1}{P(E = 1 \mid \mathrm{rank}(d))}$$
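The per-document estimator can be sketched directly; the simulated click model below and the true relevance value are illustrative assumptions, not part of any real dataset.

```python
import random

# Examination propensities per rank (illustrative values, matching the
# standard 8-position table used in this note).
THETA = {1: 1.0, 2: 1.0, 3: 0.9, 4: 0.8, 5: 0.7, 6: 0.6, 7: 0.5, 8: 0.4}

def ipw_relevance(impressions):
    """PBM-unbiased relevance estimate: average of click / P(E=1 | rank)."""
    return sum(clicked / THETA[rank] for rank, clicked in impressions) / len(impressions)

# Simulate PBM clicks for one document with true relevance 0.6:
# a click happens iff the position is examined AND the document is relevant.
rng = random.Random(1)
true_relevance = 0.6
impressions = []
for _ in range(50_000):
    rank = rng.randint(1, 8)
    clicked = (rng.random() < THETA[rank]) and (rng.random() < true_relevance)
    impressions.append((rank, clicked))

print(ipw_relevance(impressions))  # recovers ~0.6 despite position bias
```

The raw click-through rate of the same log would underestimate relevance, because most impressions land at poorly-examined ranks.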
Inverse Propensity Weights
Concrete weights for standard examination propensities:
| Rank | Propensity | IPW Weight |
|---|---|---|
| 1 | 1.00 | 1.00x |
| 2 | 1.00 | 1.00x |
| 3 | 0.90 | 1.11x |
| 4 | 0.80 | 1.25x |
| 5 | 0.70 | 1.43x |
| 6 | 0.60 | 1.67x |
| 7 | 0.50 | 2.00x |
| 8 | 0.40 | 2.50x |
A click at rank 8 is worth 2.5× a click at rank 1 in terms of relevance evidence.
Key Properties
Unbiasedness
IPW produces an unbiased estimate under the assumed model:

$$\mathbb{E}\left[\hat{r}(d)\right] = P(R = 1 \mid d)$$
This means:
- On average, across many replicates, the estimate is correct
- It doesn’t mean any single estimate is correct
- It doesn’t mean low variance
Variance Problem
IPW suffers from high variance when propensities are low. For a single impression with click $c$, propensity $p$, and true relevance $r$ under the PBM:

$$\mathrm{Var}\!\left(\frac{c}{p}\right) = \frac{r}{p} - r^2$$

When $p$ is small, the weight $1/p$ is large, amplifying noise.

Example:
- If $p = 0.1$, the weight is $10\times$
- A single noisy click at such a rank becomes a massive signal
- The variance explodes
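A quick Monte-Carlo sketch of the blow-up (sample sizes and propensity values are arbitrary choices): every estimate stays unbiased, but the spread grows as the propensity shrinks.

```python
import random

def ipw_mean(p, n=1000, rng=None):
    """One IPW estimate of E[y] with y = 1, observed only with probability p."""
    rng = rng or random.Random()
    return sum((1.0 / p) if rng.random() < p else 0.0 for _ in range(n)) / n

rng = random.Random(2)
results = {}
for p in (0.8, 0.4, 0.1):
    estimates = [ipw_mean(p, rng=rng) for _ in range(500)]
    m = sum(estimates) / len(estimates)
    v = sum((e - m) ** 2 for e in estimates) / len(estimates)
    results[p] = (m, v)
    print(f"p={p}: mean ~ {m:.3f}, variance ~ {v:.5f}")
# All three means sit near the true value 1.0 (unbiased), but the variance
# of a single weighted observation is (1 - p) / p, so it grows as p shrinks.
```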
Variance-Bias Trade-off
Unbiasedness is a property of the expected value, not the actual estimate quality.
```
  Biased, Low Var        Unbiased, High Var       Unbiased, Low Var

       ╱╲                     _______                     ╱╲
      ╱  ╲                   ╱       ╲                   ╱  ╲
     ╱    ╲                 ╱         ╲                 ╱    ╲
  __╱______╲___|__       __╱_____|_____╲__        _____╱__|___╲_____
               ↑                 ↑                        ↑
          True Value        True Value               True Value
       (Consistent miss) (Correct on average)     (Correct & stable)
```
Practical lesson: Unbiasedness alone is not sufficient. We need both unbiasedness AND low variance.
Practical Considerations
Propensity Clipping
Prevent extreme weights by clipping propensities from below:

$$\tilde{p}_k = \max(\hat{p}_k, \tau) \quad \Rightarrow \quad w(k) = \min\!\left(\frac{1}{\hat{p}_k},\; \frac{1}{\tau}\right)$$

Common choices: a small fixed threshold $\tau$ (e.g., on the order of 0.1), tuned on validation data.

Trade-off: Introduces slight bias but dramatically reduces variance.
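A minimal sketch of a clipped weight, assuming a threshold `tau` of 0.1 (an illustrative default, not a recommendation):

```python
def clipped_weight(propensity, tau=0.1):
    """IPW weight with the propensity clipped below at tau.

    Caps the weight at 1/tau: items whose true propensity is below tau
    get a slightly biased (too-small) weight, but no single click can
    contribute more than 1/tau to the estimate."""
    return 1.0 / max(propensity, tau)

# An item examined 2% of the time would get weight 50 under plain IPW;
# with tau = 0.1 the weight is capped at 10.
print(clipped_weight(0.02))  # 10.0
print(clipped_weight(0.5))   # 2.0
```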
Non-Zero Propensities
IPW requires all items to have non-zero propensity:

$$P(E = 1 \mid \mathrm{rank}(d)) > 0 \quad \text{for every item } d$$
Problem: In top-k ranking (k=10), items at rank 11+ have zero propensity. Standard IPW fails.
Solutions:
- Enforce stochastic policies (show every item with some probability)
- Use Doubly Robust Estimation instead
Propensity Estimation Error
IPW depends on accurate propensity estimates $\hat{p}_k \approx P(E = 1 \mid k)$.
If estimates are wrong:
- Weights are wrong
- Estimates become biased
Estimation error compounds, especially for low-propensity items.
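A small simulation of this failure mode, using an assumed 3-position PBM: with the correct propensities the estimate centres on the true relevance, while weighting with wrong propensities (here, pretending every position is fully examined) biases it downward.

```python
import random

def pbm_ipw_estimate(true_theta, assumed_theta, true_r=0.5, n=40_000, seed=3):
    """Relevance estimate when clicks follow `true_theta` but the IPW
    weights use the (possibly wrong) `assumed_theta`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        rank = rng.randrange(len(true_theta))
        clicked = (rng.random() < true_theta[rank]) and (rng.random() < true_r)
        total += clicked / assumed_theta[rank]
    return total / n

true_theta = [1.0, 0.7, 0.4]  # illustrative 3-position examination curve
print(pbm_ipw_estimate(true_theta, true_theta))       # correct weights -> ~0.5
print(pbm_ipw_estimate(true_theta, [1.0, 1.0, 1.0]))  # wrong weights   -> ~0.35
```

The larger the gap between estimated and true propensities, the larger the residual bias, and the effect is worst exactly where the weights are largest.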
Assumptions
1. Correct User Model
Users must behave according to the assumed model (e.g., Position-Based Click Model).
Violation: If users follow Cascading Position Bias instead, PBM-based IPW fails.
2. Correct Propensity Estimation
Must have accurate estimates of $P(E = 1 \mid k)$.
How to ensure:
- Online randomization (gold standard)
- Intervention harvesting (good if assumptions hold)
3. Overlap / Positivity
All items must have non-zero propensity.
Origin & Connection to Importance Sampling
IPW is essentially Importance Sampling applied to observational data:
In importance sampling, to estimate $\mathbb{E}_{x \sim p}[f(x)]$ when samples are drawn from $q$:

$$\mathbb{E}_{x \sim p}[f(x)] = \mathbb{E}_{x \sim q}\!\left[\frac{p(x)}{q(x)} f(x)\right] \approx \frac{1}{n} \sum_{i=1}^{n} \frac{p(x_i)}{q(x_i)} f(x_i), \quad x_i \sim q$$
In ranking:
- $p$ = the target distribution: the true relevance signal, free of position effects
- $q$ = the observed click distribution, biased by position
- Weight $= p / q =$ inverse propensity
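A toy discrete example of the correspondence (the distributions $p$, $q$ and the payoff $f$ are assumed values): samples come from the "wrong" distribution $q$, yet reweighting by $p/q$ recovers the expectation under $p$.

```python
import random

# Importance sampling on a two-point space: estimate E_p[f(X)] using
# samples drawn from a different distribution q.
xs = [0, 1]
p = {0: 0.5, 1: 0.5}  # target distribution
q = {0: 0.8, 1: 0.2}  # sampling distribution (biased toward x = 0)
f = {0: 0.0, 1: 1.0}

rng = random.Random(4)
samples = rng.choices(xs, weights=[q[x] for x in xs], k=100_000)
estimate = sum(f[x] * p[x] / q[x] for x in samples) / len(samples)
print(estimate)  # close to E_p[f] = 0.5, despite sampling from q
```

The rarely-sampled point $x = 1$ gets weight $0.5 / 0.2 = 2.5$, exactly as a rarely-examined rank does under IPW.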
Strengths & Weaknesses
Strengths
✓ Theoretically guaranteed unbiasedness (under assumptions)
✓ Works across any user model (if you model it correctly)
✓ Simple to implement
✓ No learned parameters needed (only propensities)
Weaknesses
✗ High variance from low-propensity items
✗ Sensitive to propensity estimation errors
✗ Fails with zero propensities
✗ Variance-reduction via clipping introduces bias
✗ Can be unstable in practice
Connections
- Generalization: Doubly Robust Estimation combines IPW with learned models for lower variance
- Alternative: Click Models provide lower variance but weaker guarantees
- Causal inference: IPW is a core technique in causal inference for observational studies
- Importance sampling: IPW is the observational version of importance sampling
Appears In
- Unbiased Learning to Rank
- Counterfactual Learning to Rank
- Doubly Robust Estimation
- Position Bias estimation and correction