Item Selection Bias

Definition

Item Selection Bias occurs when only a subset of items (e.g., the top-k results) is displayed, so items outside the displayed set have zero probability of being examined.

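Formally, let $k_d$ be the rank of item $d$ and $E_d$ its examination indicator. Then:

$$P(E_d = 1 \mid k_d) = 0 \quad \text{for all } k_d > K$$

where $K$ is the cutoff (e.g., 10 results per page).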

Intuition

The Problem

In a top-k ranking display:

  • Users can only see and interact with items at ranks 1 through K (the cutoff)
  • Items at ranks k > K are completely hidden
  • Hidden items have zero probability of examination and zero probability of click

Consequence: We have no evidence whether hidden items are relevant or not.

Visual Example

Page 1: Displaying top 10 results
[1] Item A  ← Examined
[2] Item B  ← Examined
[3] Item C  ← Examined
...
[10] Item J ← Examined

[11] Item K ← NOT visible, zero examination
[12] Item L ← NOT visible, zero examination
...

Users would have to scroll/click "next page" to see item K.

Extreme Form of Position Bias

Item selection bias is position bias taken to the extreme:

Bias type         Exam (rank 5)   Exam (rank 10)   Exam (rank 11)
Position bias     ~70%            ~40%             ~35%
Item selection    ~70%            ~40%             0%

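With item selection bias, there's a hard cutoff at $k = K$ (here $K = 10$). Examination probability drops to exactly zero instead of decaying gradually.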

Mathematical Formulation

Click Distribution Under Item Selection Bias

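For items within the display:

$$P(C_d = 1 \mid k_d) = P(E = 1 \mid k_d) \cdot P(R_d = 1)$$

where $k_d \le K$, $C_d$ is a click on item $d$, and $R_d$ is its relevance.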

For items outside the display:

$$P(C_d = 1 \mid k_d) = 0$$

for all $k_d > K$.
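The following minimal Python simulation makes the two cases concrete. Inside the display, the examination curve is assumed to be 1/rank, which is chosen only for illustration and is not taken from any particular study. The function names are hypothetical.

```python
import random

def examination_prob(rank: int, cutoff: int = 10) -> float:
    """Assumed examination curve: 1/rank inside the display, 0 beyond the cutoff."""
    if rank > cutoff:
        return 0.0  # hidden items are never examined
    return 1.0 / rank

def click_prob(rank: int, relevance: float, cutoff: int = 10) -> float:
    """P(C=1 | d, k) = P(E=1 | k) * P(R=1 | d) under the examination hypothesis."""
    return examination_prob(rank, cutoff) * relevance

def simulate_clicks(relevances, cutoff=10, seed=0):
    """Sample one click vector; relevances[i] is P(R=1) of the item at rank i+1."""
    rng = random.Random(seed)
    return [int(rng.random() < click_prob(i + 1, r, cutoff))
            for i, r in enumerate(relevances)]
```

Even with `relevances = [1.0] * 15`, positions 11 to 15 always come back as 0. The clicks reflect the display cutoff, not the items.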

Selection Mechanism

The selection is deterministic:

  • If $k_d \le K$: item $d$ is shown
  • If $k_d > K$: item $d$ is hidden

This is NOT random; it’s a hard constraint based on the display mechanism.

Why It’s a Problem

IPS Breaks Down

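Standard Inverse Propensity Weighting (IPS) divides each logged click $c_d$ by the examination propensity of the rank where it was logged:

$$\hat{r}_d = \frac{c_d}{P(E = 1 \mid k_d)}, \qquad \mathbb{E}[\hat{r}_d] = P(R_d = 1) \ \text{ only if } P(E = 1 \mid k_d) > 0$$

For hidden items ($k_d > K$):

$$P(E = 1 \mid k_d) = 0 \ \text{ and } \ c_d = 0 \ \text{ always, so } \ \hat{r}_d = \tfrac{0}{0} \ \text{ is undefined}$$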
Even if an item is truly relevant, we can never observe a click. IPS cannot estimate its relevance.
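Below is a minimal sketch of a per-impression IPS estimate that makes this failure explicit. For a hidden item it returns `None` rather than silently returning 0, which signals that the relevance is unknown. The function name is hypothetical.

```python
from typing import Optional

def ips_relevance(click: int, propensity: float) -> Optional[float]:
    """IPS estimate c_d / P(E=1 | k_d).

    Returns None when the propensity is 0. In that case the item was hidden,
    the ratio is 0/0, and its relevance is unknown (not zero).
    """
    if propensity <= 0.0:
        return None
    return click / propensity
```

A single estimate can exceed 1. For example, `ips_relevance(1, 0.5)` gives 2.0. Only the average over many impressions is unbiased.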

Overlap Violation

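A key assumption for IPS is positivity (or overlap). Every item the new policy may rank must have non-zero examination probability at the rank $k_d$ where it was logged:

$$P(E = 1 \mid k_d) > 0 \quad \text{for every item } d \text{ the new policy may rank}$$

Item selection bias violates this assumption, because $P(E = 1 \mid k_d) = 0$ whenever $k_d > K$.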
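Before running IPS, a quick diagnostic is to list the items a candidate ranking would display that had zero logging propensity. This is a minimal sketch. It assumes logged propensities are available as a dict keyed by item, with unlogged items treated as 0. The function name is hypothetical.

```python
def overlap_violations(target_ranking, logging_propensity, cutoff=10):
    """Return items the target policy would display in its top `cutoff`
    that had zero examination propensity under the logging policy."""
    return [d for d in target_ranking[:cutoff]
            if logging_propensity.get(d, 0.0) == 0.0]
```

Suppose the logging policy showed items A through J and the target ranking is `["A", "X", "B", "C"]`. Then only `"X"` is flagged, because IPS has no data to weight for it.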

No Counterfactual Evidence

We cannot evaluate how a new policy would perform on hidden items because:

  • We have no interaction data
  • We can’t know if they would be clicked
  • They might be highly relevant but unrepresented in logs

Example: Impact on Learning

Scenario

Logging policy: Shows top 10 results (items 1-10)

Hypothetical new policy: Would rank
[A, X, B, C, ...] 

where X is an item that was hidden (rank > 10)

Question: Would X get clicks?
Answer: No data! Can't know.

Naive IPS: Assign X zero relevance (wrong!)
Correct view: X's relevance is unknown.

Practical Settings

Common in Practice

  1. Web search: First page shows 10 results, rest paginated
  2. E-commerce: Product listings show 20-50 items, pagination for more
  3. Recommendations: Feed shows 10-20 items, “load more” for rest
  4. Ads: Top 3-5 ad slots visible, rest below fold

Severity

Depends on search behavior:

  • Navigational queries (“find www.amazon.com”): Results 2-10 rarely clicked, item selection bias less harmful
  • Ambiguous queries (“books”): Many items relevant, selection bias hides good results
  • Tail queries ("obscure topic"): Few matching results, often fewer than the cutoff, so little or nothing is hidden

Solutions

Solution 1: Stochastic Policies

Show every item with some probability, avoiding hard cutoff:

Instead of: Top 10 deterministically

Use: Every item has probability p(k) > 0
  p(1) = 0.9 (show with high prob)
  p(2) = 0.9
  ...
  p(10) = 0.5
  p(11) = 0.1 (show occasionally)
  p(12) = 0.01 (show rarely)

Advantage: Every item visible with non-zero probability
Disadvantage: Harms user experience (shows worse results)
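The sketch below shows one way to realize such a policy. The schedule `show_prob` is hypothetical: it falls linearly from 0.9 at rank 1 to 0.5 at rank 10, then drops by a factor of 10 per rank with a floor of 0.001. The example above does not pin down the intermediate values. Each item is sampled independently, which is a simplification: it makes the page size vary between requests.

```python
import random

def show_prob(rank: int) -> float:
    """Hypothetical display schedule: high inside the top 10, small but > 0 beyond."""
    if rank <= 10:
        return 0.9 - 0.4 * (rank - 1) / 9            # 0.9 at rank 1 down to 0.5 at rank 10
    return max(0.1 * 0.1 ** (rank - 11), 1e-3)       # 0.1, 0.01, ... floored at 0.001

def sample_display(ranking, seed=None):
    """Show each item independently with probability show_prob(rank).
    Returns the displayed items (in ranked order) and every item's propensity."""
    rng = random.Random(seed)
    shown, propensity = [], {}
    for i, d in enumerate(ranking):
        p = show_prob(i + 1)
        propensity[d] = p          # log the propensity so IPS can use it later
        if rng.random() < p:
            shown.append(d)
    return shown, propensity
```

Logging `propensity` alongside the clicks is what makes the later IPS correction possible.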

Solution 2: Inverse Propensity Weighting with Stochastic Policy

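If every item has non-zero visibility probability:

$$p(k_d) > 0 \quad \text{for all items } d$$

Then standard IPS works, with the propensity now including the probability of being shown:

$$\hat{r}_d = \frac{c_d}{\rho_d}, \qquad \rho_d = p(k_d) \cdot P(E = 1 \mid d \text{ shown}) > 0, \qquad \mathbb{E}[\hat{r}_d] = P(R_d = 1)$$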

Advantage: Theoretically sound
Disadvantage: User experience cost
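A small Monte Carlo check illustrates the unbiasedness claim. The sketch makes two simplifying assumptions: the item's relevance is a fixed click probability given examination, and the shown-and-examined propensity $\rho_d$ is known exactly. The function name is hypothetical.

```python
import random

def ips_estimate(relevance, show_p, exam_p=1.0, n=200_000, seed=0):
    """Average of c_d / rho_d over n simulated impressions, where
    rho_d = show_p * exam_p is the logged shown-and-examined propensity."""
    rng = random.Random(seed)
    rho = show_p * exam_p
    total = 0.0
    for _ in range(n):
        shown = rng.random() < show_p
        examined = shown and rng.random() < exam_p
        clicked = examined and rng.random() < relevance
        total += int(clicked) / rho
    return total / n
```

Take an item logged at rank 11 with `show_p=0.1` and true relevance 0.6. It is clicked in only about 6% of impressions. Even so, `ips_estimate(0.6, 0.1)` comes out close to 0.6. The price is variance, since each click is weighted by 10.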

Solution 3: Doubly Robust Estimation

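Use the direct method, a learned relevance model $\hat{f}(x_d)$ on item features $x_d$, for every item. Add an IPS correction only where logged data exists:

$$\hat{V}_{\text{DR}}(\pi) = \sum_{d} \pi(d)\,\hat{f}(x_d) + \sum_{d:\,\rho_d > 0} \frac{\pi(d)}{\rho_d}\big(c_d - \rho_d\,\hat{f}(x_d)\big)$$

Here $\pi(d)$ is the weight the new policy gives item $d$, and $\rho_d$ is its logged shown-and-examined propensity. Hidden items ($\rho_d = 0$) contribute only through $\hat{f}$.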
Advantage: Handles unseen items via learned model
Disadvantage: Depends on model quality
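The formula above translates almost line for line into the sketch below. It assumes all inputs are dicts keyed by item. `target_weight` is a hypothetical stand-in for $\pi(d)$, for example the examination probability of the item's new rank. `model_pred` is assumed to hold a prediction for every item in `target_weight`.

```python
def dr_value(target_weight, model_pred, clicks, propensity):
    """Doubly robust value of a target ranking:
    V = sum_d pi(d) f(x_d) + sum_{d: rho_d > 0} pi(d)/rho_d * (c_d - rho_d f(x_d))

    target_weight: pi(d), weight the target policy puts on item d (assumed input)
    model_pred:    f(x_d), direct-method relevance prediction
    clicks:        c_d, logged click (0/1); missing means no click
    propensity:    rho_d, logged shown-and-examined probability (missing = hidden)
    """
    value = 0.0
    for d, w in target_weight.items():
        f = model_pred[d]
        value += w * f                          # direct-method term covers every item
        rho = propensity.get(d, 0.0)
        if rho > 0.0:                           # correction only where data exists
            value += w / rho * (clicks.get(d, 0) - rho * f)
    return value
```

Consider a hidden item X with `model_pred["X"] = 0.7` and weight 0.5. It adds exactly 0.35, because nothing observed can correct the model for it. This is why the approach depends on model quality.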

Solution 4: Pagination & Multi-Step Exposure

Randomize pagination:

Policy A: Top 10 on page 1
Policy B: Items 5-14 on page 1 (offset)
Policy C: Random subset of size 10 on page 1

Collect logs from all → estimate relative propensities

Advantage: Produces logs that can be reused for offline estimation across a wider range of ranks
Disadvantage: Complexity in logging
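If each request picks one of the three policies uniformly at random, the page-1 propensity of an item at logging rank r is a simple mixture. This is a sketch under that assumption. The pool of 20 candidates for Policy C is also assumed, since the text does not specify it. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def page1_propensity(rank, page_size=10, offset=4, pool=20):
    """P(item at logging rank `rank` appears on page 1) when policies A, B, C
    are each chosen with probability 1/3 (hypothetical mixture).

    A: ranks 1..page_size
    B: ranks offset+1..offset+page_size (items 5-14 with the defaults)
    C: uniform random subset of size page_size from the top `pool` (assumed pool)
    """
    p_a = 1 if rank <= page_size else 0
    p_b = 1 if offset < rank <= offset + page_size else 0
    p_c = Fraction(page_size, pool) if rank <= pool else 0
    return Fraction(1, 3) * (p_a + p_b + p_c)
```

Under these assumptions, an item at rank 12 gets propensity 1/2 and one at rank 18 gets 1/6. Both are usable in IPS. Items beyond rank 20 still have zero propensity, so the mixture only widens the window.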

Solution 5: Historical Exploration

Use past logs where different systems showed different items:

System A: Ranked [A, B, C, D, ...]
System B: Ranked [X, A, C, B, ...]

Item X appeared at position 1 in System B
Item X appeared at position > 10 in System A (not visible)

Use System B data to infer X's relevance

Advantage: Non-intrusive
Disadvantage: Requires historical diversity
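A minimal sketch of pooling such logs follows. It assumes each log entry is a (ranking, set of clicked items) pair. Impressions are counted only where an item sat inside the visible cutoff. Items never shown by any system are left out rather than assigned 0. Raw CTR still carries position bias, so a propensity correction would be needed on top of this.

```python
from collections import defaultdict

def pooled_ctr(logs, cutoff=10):
    """Per-item clicks / visible impressions, pooled across systems.

    logs: iterable of (ranking, clicked_items) pairs from any system.
    Items never inside any top `cutoff` are omitted: their CTR is unknown.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for ranking, clicks in logs:
        for d in ranking[:cutoff]:
            shown[d] += 1
            clicked[d] += int(d in clicks)
    return {d: clicked[d] / shown[d] for d in shown}
```

Suppose System A ranks X 12th and System B ranks X first, where it gets a click. X then receives a CTR from System B's data alone.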

Identification & Off-Policy Learning

Can We Infer Hidden Items?

Difficult question: Without any interaction data, can we estimate hidden items’ relevance?

Answer: Only with strong modeling assumptions.

Extrapolation via Features

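If we have item features $x_d$, we can train a relevance model on visible items. The targets should be debiased, such as the IPS estimates $\hat{r}_d$, and $\ell$ is some loss:

$$\hat{f} = \arg\min_{f} \sum_{d:\,k_d \le K} \ell\big(f(x_d), \hat{r}_d\big)$$

For hidden items:

$$\hat{r}_d = \hat{f}(x_d) \quad \text{for } k_d > K$$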
Assumption: Feature space generalizes beyond visible items.

Risk: If hidden items have unusual feature distributions, extrapolation fails.
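To show both the extrapolation and its risk, the sketch fits a one-feature least-squares line on visible items. It then flags hidden items whose feature lies outside the range seen in training. The single feature and the linear model are simplifying assumptions, and `fit_line` needs at least two distinct feature values.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (single feature, for illustration)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def extrapolate(visible, hidden):
    """visible: {item: (feature, debiased relevance estimate)}
    hidden:  {item: feature}
    Returns {item: (prediction, in_support)}, where in_support says whether the
    hidden item's feature lies inside the range observed on visible items."""
    xs = [x for x, _ in visible.values()]
    ys = [y for _, y in visible.values()]
    a, b = fit_line(xs, ys)
    lo, hi = min(xs), max(xs)
    return {d: (a + b * x, lo <= x <= hi) for d, x in hidden.items()}
```

Suppose visible items have features 1, 2, and 3. A hidden item with feature 10 gets a predicted relevance of about 2.0, which is impossible for a probability, and it is flagged as out of support. This is the failure mode described above.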

Real-World Implications

Search Engine Perspective

Web search typically shows 10 results per page:

Visible: Results 1-10
Hidden: Results 11-100+

Learning from clicks:
- Results 11+ get zero training signal
- A ranker trained on these clicks gets no evidence for promoting items from ranks 11+
- Innovation limited by what's visible

E-commerce Perspective

Product search shows 24 items per page:

Visible: First 24 products
Hidden: 25+

Click data trains on visible products only.
New ranking might prefer items 25-50,
but no evidence to learn from.

Recommendation Systems

Feed shows 10 items before scrolling:

Visible: Items 1-10 on screen
Hidden: Items 11+ below fold

Cascade effect: Users rarely scroll, so items 11+ get (almost) zero clicks.
But zero clicks ≠ zero relevance.

Mitigation Strategies

Strategy 1: Design for Exploration

Show diverse items in top-k:

Instead of: Pure ranking [A, B, C, ...]
Use: Diverse ranking [A, X (different), B, Y (different), C, ...]

Expose "hidden" good items within top-k.
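A minimal sketch of the interleaving pattern above follows. It places one exploration candidate in every `every`-th slot of the top k. It assumes the exploration candidates do not also appear in `ranked`. The function name is hypothetical.

```python
def interleave_exploration(ranked, explore, every=2, k=10):
    """Build a top-k list with an exploration item in every `every`-th slot,
    e.g. [A, X, B, Y, C, ...] for every=2. Falls back to whichever list
    still has items when the other runs out."""
    out = []
    ri = ei = 0
    while len(out) < k and (ri < len(ranked) or ei < len(explore)):
        slot = len(out) + 1
        if slot % every == 0 and ei < len(explore):
            out.append(explore[ei])
            ei += 1
        elif ri < len(ranked):
            out.append(ranked[ri])
            ri += 1
        else:
            out.append(explore[ei])
            ei += 1
    return out
```

With `ranked = ["A", "B", "C", "D", "E"]`, `explore = ["X", "Y"]`, and `k=5`, the result is `["A", "X", "B", "Y", "C"]`.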

Strategy 2: Progressive Disclosure

Allow easy access to more items:

Default: Show 10
User can "expand" to see 20, 50, 100
Track clicks at different expansion levels
→ Less selection bias in deeper pages

Strategy 3: Bandit Algorithms

Balance exploitation (show best items) with exploration (show new items).
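One common way to strike this balance is an upper confidence bound (UCB1-style) score, sketched here for picking a top-k list. The choice of UCB, the bonus scale `alpha`, and the function name are assumptions for illustration. Other bandit algorithms, such as Thompson sampling, are equally valid choices.

```python
import math

def ucb_top_k(items, clicks, shows, t, k=10, alpha=1.0):
    """Rank items by observed CTR plus an exploration bonus and return the top k.

    clicks, shows: per-item counts so far; t: total rounds so far (t >= 1).
    Items never shown get an infinite score, so they are tried first.
    """
    def score(d):
        n = shows.get(d, 0)
        if n == 0:
            return math.inf
        return clicks.get(d, 0) / n + alpha * math.sqrt(math.log(t) / n)
    return sorted(items, key=score, reverse=True)[:k]
```

An item that has never been displayed outranks everything until it has been shown at least once. This directly counters the zero-exposure problem, at some short-term cost to relevance.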

Strategy 4: Model-Based Augmentation

Combine click data with learned model:

Train relevance model on visible items
Extrapolate to hidden items via features
Use DR estimation to correct

More robust than learning from clicks alone, provided the model extrapolates well to hidden items.

Connections

Appears In

  • Unbiased Learning to Rank
  • Search ranking systems
  • E-commerce search
  • Recommendation systems
  • Production ML systems with limited display slots