Inverse Propensity Weighting

Definition

Inverse Propensity Weighting (IPW) is a general technique for obtaining unbiased estimates from selection-biased data. The core idea: weight observations inversely to their probability of being selected/observed.

In the context of Learning to Rank, IPW corrects for Position Bias by reweighting clicks by the inverse of their examination probability: a click on a document at rank k counts as 1 / P(E = 1 | k) units of relevance evidence, rather than 1.

Intuition

Simple analogy: Imagine surveying people on the street:

  • Rich people are less likely to stop and answer your questions (low propensity)
  • Poor people are more likely to participate (high propensity)
  • Without correction, your survey is biased toward poor respondents
  • Solution: Up-weight the opinions of rich people and down-weight poor people’s opinions
  • Result: An unbiased view of the whole population

In ranking:

  • Clicks at position 1 have high propensity (100% examined)
  • Clicks at position 8 have low propensity (40% examined)
  • Without correction, you overlearn from position 1 and ignore position 8
  • Solution: Down-weight position 1 clicks, up-weight position 8 clicks
  • Result: An unbiased estimate of relevance independent of position

Mathematical Formulation

General IPW

For a selection process where each unit i with value y_i is observed with propensity p_i (observation indicator o_i ∈ {0, 1}, E[o_i] = p_i):

    μ̂_IPW = (1/n) · Σ_i (o_i / p_i) · y_i

This is an unbiased estimator:

    E[μ̂_IPW] = (1/n) · Σ_i (E[o_i] / p_i) · y_i = (1/n) · Σ_i y_i = μ
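The street-survey intuition above can be simulated directly. In this sketch (all values illustrative), high-value units are under-observed; averaged over many replicates, the naive mean stays biased low while the inverse-propensity-weighted (Horvitz–Thompson) mean recovers the true population mean:

```python
import random

random.seed(0)

# Toy population: unit value y, observation propensity p.
# High-value units (y >= 50) are rarely observed: p = 0.2 vs 0.9.
population = [(float(y), 0.9 if y < 50 else 0.2) for y in range(100)]
true_mean = sum(y for y, _ in population) / len(population)

def one_survey():
    observed = [(y, p) for y, p in population if random.random() < p]
    naive = sum(y for y, _ in observed) / len(observed)       # biased low
    ipw = sum(y / p for y, p in observed) / len(population)   # Horvitz-Thompson
    return naive, ipw

# Average over many replicates: IPW is correct on average, naive is not.
results = [one_survey() for _ in range(2000)]
avg_naive = sum(n for n, _ in results) / len(results)
avg_ipw = sum(i for _, i in results) / len(results)
print(f"true={true_mean:.1f}  naive={avg_naive:.1f}  ipw={avg_ipw:.1f}")
```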

IPW for Position Bias

Under the Position-Based Click Model, the click probability factorizes into examination and relevance:

    P(C = 1 | d, k) = P(E = 1 | k) · P(R = 1 | d)

The unbiased estimator of relevance divides each observed click c(d, k) by the examination propensity:

    r̂(d) = c(d, k) / P(E = 1 | k)

Or for a ranking, summing inverse-propensity-weighted clicks over all clicked documents:

    Δ̂_IPW(f | q) = Σ_{d : c(d) = 1} 1 / P(E = 1 | rank(d))
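A minimal PBM simulation illustrates the correction. Here the propensities, relevance values, and fixed ranks are assumed for illustration: two equally relevant documents are shown at ranks 1 and 8, and dividing click counts by P(E = 1 | k) recovers equal relevance estimates despite very different raw CTRs:

```python
import random

random.seed(1)

# Position-Based Click Model: P(click) = P(E=1|k) * r(d).
# Illustrative examination propensities and true relevances.
propensity = {1: 1.0, 8: 0.4}
true_relevance = {"a": 0.8, "b": 0.8}   # equally relevant documents
ranks = {"a": 1, "b": 8}                # "a" always at rank 1, "b" at rank 8

n_sessions = 50_000
clicks = {d: 0 for d in ranks}
for _ in range(n_sessions):
    for doc, k in ranks.items():
        if random.random() < propensity[k] * true_relevance[doc]:
            clicks[doc] += 1

for doc, k in ranks.items():
    naive = clicks[doc] / n_sessions                  # raw click-through rate
    ipw = clicks[doc] / (propensity[k] * n_sessions)  # clicks weighted by 1/P(E=1|k)
    print(f"{doc}: naive CTR={naive:.2f}  IPW relevance={ipw:.2f}")
```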

Inverse Propensity Weights

Concrete weights for standard examination propensities:

    Rank   P(E = 1 | k)   Weight 1/P(E = 1 | k)
    1      1.00           1.00x
    2      1.00           1.00x
    3      0.90           1.11x
    4      0.80           1.25x
    5      0.70           1.43x
    6      0.60           1.67x
    7      0.50           2.00x
    8      0.40           2.50x

A click at rank 8 is worth 2.5× a click at rank 1 in terms of relevance evidence.
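The weights are simply reciprocals of the propensities; a quick check (using the table's propensity values):

```python
# Examination propensities per rank, as in the table above.
propensities = {1: 1.00, 2: 1.00, 3: 0.90, 4: 0.80,
                5: 0.70, 6: 0.60, 7: 0.50, 8: 0.40}

# Inverse propensity weight for each rank.
weights = {rank: round(1.0 / p, 2) for rank, p in propensities.items()}
print(weights)  # a click at rank 8 carries weight 2.5 vs 1.0 at rank 1
```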

Key Properties

Unbiasedness

IPW produces an unbiased estimate under the assumed model:

    E[r̂(d)] = P(R = 1 | d) = r(d)

This means:

  • On average, across many replicates, the estimate is correct
  • It doesn’t mean any single estimate is correct
  • It doesn’t mean low variance

Variance Problem

IPW suffers from high variance when propensities are low:

    Var(r̂(d)) ∝ 1 / P(E = 1 | k)

When P(E = 1 | k) is small, the weight 1 / P(E = 1 | k) is large, amplifying noise.

Example:

  • If P(E = 1 | k) = 0.1, the weight is 10x
  • A single noisy click at rank 8 becomes a massive signal
  • Variance explodes
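The blow-up is easy to see by simulating the estimator at different propensities (a toy setup with assumed relevance 0.5 and 100 impressions per estimate): the estimator stays correct on average, but its variance grows roughly as 1/p.

```python
import random

random.seed(2)

def ipw_estimate(p, n=100, relevance=0.5):
    # IPW relevance estimate from n impressions examined with propensity p:
    # each click gets weight 1/p.
    clicks = sum(1 for _ in range(n) if random.random() < p * relevance)
    return clicks / (p * n)

def est_variance(p, replicates=2000):
    # Empirical variance of the estimator across replicates.
    ests = [ipw_estimate(p) for _ in range(replicates)]
    mean = sum(ests) / len(ests)
    return sum((e - mean) ** 2 for e in ests) / len(ests)

variances = {p: est_variance(p) for p in (1.0, 0.4, 0.1)}
for p, v in variances.items():
    print(f"propensity={p:.1f}  Var(estimate)={v:.4f}")
```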

Variance-Bias Trade-off

Unbiasedness is a property of the expected value, not the actual estimate quality.

        Biased, Low Var    Unbiased, High Var    Unbiased, Low Var
        ________________   ________________      ________________
            ╱╲                    │                     ╱╲
           ╱  ╲                   │                    ╱  ╲
          ╱    ╲                  │                   ╱    ╲
       __|______|__          _____|_____         ___|______|___
       True Value            True Value          True Value
       (Consistent           (Correct            (Correct & 
        miss)                 on average)        stable)

Practical lesson: Unbiasedness alone is not sufficient. We need both unbiasedness AND low variance.

Practical Considerations

Propensity Clipping

Prevent extreme weights by clipping propensities from below:

    p̃_k = max(P(E = 1 | k), τ)

Common choices: a fixed threshold such as τ = 0.1, which caps weights at 1/τ = 10x.

Trade-off: Introduces slight bias (low-propensity clicks are under-weighted) but dramatically reduces variance.
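A minimal clipping helper (the threshold τ = 0.1 is an illustrative choice):

```python
def clipped_weight(propensity, tau=0.1):
    # Clip the propensity from below at tau, capping weights at 1/tau (10x here).
    return 1.0 / max(propensity, tau)

print(clipped_weight(0.4))   # unchanged: 2.5
print(clipped_weight(0.02))  # capped at 10.0 instead of 50.0
```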

Non-Zero Propensities

IPW requires all items to have non-zero propensity:

    P(E = 1 | k) > 0 for every position k an item can occupy

Problem: In top-k ranking (k=10), items at rank 11+ have zero propensity. Standard IPW fails.

Solutions:

  • Randomize rankings so every item has some chance of appearing in the top k
  • Restrict estimates to items that were actually displayed

Propensity Estimation Error

IPW depends on accurate propensity estimates p̂_k ≈ P(E = 1 | k).

If estimates are wrong:

  • Weights are wrong
  • Estimates become biased

Estimation error compounds, especially for low-propensity items.

Assumptions

1. Correct User Model

Users must behave according to the assumed model (e.g., Position-Based Click Model).

Violation: If users follow Cascading Position Bias instead, PBM-based IPW fails.

2. Correct Propensity Estimation

Must have accurate estimates of P(E = 1 | k).

How to ensure:

  • Online randomization (gold standard)
  • Intervention harvesting (good if assumptions hold)

3. Overlap / Positivity

All items must have non-zero propensity.

Origin & Connection to Importance Sampling

IPW is essentially Importance Sampling applied to observational data.

In importance sampling, to estimate E_{x~p}[f(x)] when samples are drawn from a different distribution q:

    E_{x~p}[f(x)] = E_{x~q}[f(x) · p(x) / q(x)]

In ranking:

  • p = the true relevance distribution (what we want to estimate under)
  • q = the observed distribution (biased by position)
  • Weight = p(x)/q(x) = inverse propensity
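A quick numerical check of the identity (the distributions and f are illustrative; q over-samples small x, analogous to position bias over-sampling top ranks):

```python
import random

random.seed(3)

# Estimate E_p[f(X)] using samples drawn from q (importance sampling).
support = list(range(10))
p = [0.1] * 10                                   # target: uniform
q = [0.19, 0.17, 0.15, 0.13, 0.11, 0.09, 0.07, 0.05, 0.03, 0.01]

def f(x):
    return x * x

true_value = sum(pi * f(x) for x, pi in zip(support, p))

n = 200_000
samples = random.choices(support, weights=q, k=n)
naive = sum(f(x) for x in samples) / n                    # biased toward small x
weighted = sum(f(x) * p[x] / q[x] for x in samples) / n   # importance weights p/q
print(f"true={true_value:.2f}  naive={naive:.2f}  weighted={weighted:.2f}")
```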

Strengths & Weaknesses

Strengths

✓ Theoretically guaranteed unbiasedness (under assumptions)
✓ Works across any user model (if you model it correctly)
✓ Simple to implement
✓ No learned parameters needed (only propensities)

Weaknesses

✗ High variance from low-propensity items
✗ Sensitive to propensity estimation errors
✗ Fails with zero propensities
✗ Variance-reduction via clipping introduces bias
✗ Can be unstable in practice

Connections

  • Generalization: Doubly Robust Estimation combines IPW with learned models for lower variance
  • Alternative: Click Models provide lower variance but weaker guarantees
  • Causal inference: IPW is a core technique in causal inference for observational studies
  • Importance sampling: IPW is the observational version of importance sampling

Appears In