Query Likelihood Model

The Query Likelihood (QL) model ranks documents by the probability that the document’s language model would generate the query. Instead of asking “is this document relevant?”, it asks “how likely is this query given this document?”

Query Likelihood

$P(q \mid d) = \prod_{t \in q} P(t \mid d)$

Log form (for ranking): $\log P(q \mid d) = \sum_{t \in q} \log P(t \mid d)$
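As a rough illustration of why the log form is preferred for ranking, the sketch below scores two documents both ways. The probability tables `doc_a` and `doc_b` are made-up values (assumed to be already smoothed, so none are zero), and the function names are hypothetical.

```python
import math

def query_likelihood(query_terms, term_prob):
    """Product form: P(q|d) = product over query terms of P(t|d)."""
    score = 1.0
    for t in query_terms:
        score *= term_prob(t)
    return score

def log_query_likelihood(query_terms, term_prob):
    """Log form: log P(q|d) = sum over query terms of log P(t|d)."""
    return sum(math.log(term_prob(t)) for t in query_terms)

# Hypothetical, already-smoothed P(t|d) values for two documents.
doc_a = {"neural": 0.05, "ranking": 0.02}
doc_b = {"neural": 0.01, "ranking": 0.03}
query = ["neural", "ranking"]

prod_a = query_likelihood(query, doc_a.__getitem__)
prod_b = query_likelihood(query, doc_b.__getitem__)
log_a = log_query_likelihood(query, doc_a.__getitem__)
log_b = log_query_likelihood(query, doc_b.__getitem__)

# Both forms order the documents the same way, because log is monotonic.
same_order = (prod_a > prod_b) == (log_a > log_b)

# With many terms the product underflows to 0.0 in floating point,
# while the sum of logs stays finite.
long_query = ["x"] * 200
underflowed = query_likelihood(long_query, lambda t: 1e-3)
log_long = log_query_likelihood(long_query, lambda t: 1e-3)
```

Since the logarithm is monotonic, the ranking is unchanged; the sum of logs simply avoids multiplying many small numbers together.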

Smoothing

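Raw maximum likelihood estimation, $P(t \mid d) = \frac{f(t, d)}{|d|}$, where $f(t, d)$ is the count of term $t$ in document $d$ and $|d|$ is the document length, assigns zero probability to terms not in the document. A single missing query term then makes the whole product $P(q \mid d)$ zero. Smoothing fixes this: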

Jelinek-Mercer (Linear Interpolation)

$P(t \mid d) = (1 - \lambda) \frac{f(t, d)}{|d|} + \lambda P(t \mid C)$
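A minimal sketch of the interpolation above, assuming documents and the collection are plain token lists and that $P(t ∣ C)$ is estimated as the term's relative frequency in the collection. The function name `jelinek_mercer_prob` and the toy data are illustrative only.

```python
from collections import Counter

def jelinek_mercer_prob(term, doc_tokens, collection_tokens, lam):
    """P(t|d) = (1 - lam) * f(t,d)/|d| + lam * P(t|C)."""
    # Maximum likelihood estimate from the document alone.
    p_doc = Counter(doc_tokens)[term] / len(doc_tokens)
    # Assumption: background model is relative frequency in the collection.
    p_coll = Counter(collection_tokens)[term] / len(collection_tokens)
    return (1 - lam) * p_doc + lam * p_coll

doc = ["cat", "sat"]
collection = ["cat", "sat", "dog", "ran"]

# lam = 0 is the unsmoothed estimate: an unseen term gets probability 0.
p_dog_unsmoothed = jelinek_mercer_prob("dog", doc, collection, lam=0.0)

# lam = 0.5 gives the unseen term half of its collection probability.
p_dog = jelinek_mercer_prob("dog", doc, collection, lam=0.5)  # 0.125
p_cat = jelinek_mercer_prob("cat", doc, collection, lam=0.5)  # 0.375
```

With this convention, $λ$ is the weight on the collection model, so a larger $λ$ means more smoothing, and the same $λ$ applies to every document regardless of its length.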

Dirichlet Prior

$P(t \mid d) = \frac{f(t, d) + \mu P(t \mid C)}{|d| + \mu}$
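The Dirichlet estimate can be sketched the same way, again assuming token lists and a relative-frequency collection model. The name `dirichlet_prob` and the tiny values of `mu` are chosen only to keep the arithmetic easy to follow; they are not recommended settings.

```python
from collections import Counter

def dirichlet_prob(term, doc_tokens, collection_tokens, mu):
    """P(t|d) = (f(t,d) + mu * P(t|C)) / (|d| + mu)."""
    f_td = Counter(doc_tokens)[term]
    # Assumption: background model is relative frequency in the collection.
    p_coll = Counter(collection_tokens)[term] / len(collection_tokens)
    return (f_td + mu * p_coll) / (len(doc_tokens) + mu)

doc = ["cat", "sat"]
collection = ["cat", "sat", "dog", "ran"]

# mu acts like mu extra pseudo-tokens drawn from the collection model.
p_dog_small_mu = dirichlet_prob("dog", doc, collection, mu=2)   # (0 + 0.5) / 4 = 0.125
p_cat_small_mu = dirichlet_prob("cat", doc, collection, mu=2)   # (1 + 0.5) / 4 = 0.375

# A larger mu pulls the estimate toward P(dog|C) = 0.25.
p_dog_large_mu = dirichlet_prob("dog", doc, collection, mu=20)  # 5 / 22
```

In this toy case the unseen term "dog" moves from 0.125 toward its collection probability of 0.25 as `mu` grows, which is the "trust the collection more" behaviour.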

Smoothing Intuition

$P(t \mid C)$ is the collection language model (background probability)

Smoothing says: “If this term isn’t in the document, fall back to how common it is in the whole collection”

Dirichlet $\mu$: a pseudo-count, as if $\mu$ extra tokens drawn from the collection model were added to every document. Large $\mu$ → more smoothing (trust the collection more)
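
The two smoothing methods are closely related. Splitting the Dirichlet fraction into its two terms shows that it is a linear interpolation whose weight depends on document length:

$\frac{f(t, d) + \mu P(t \mid C)}{|d| + \mu} = \frac{|d|}{|d| + \mu} \cdot \frac{f(t, d)}{|d|} + \frac{\mu}{|d| + \mu} \cdot P(t \mid C)$

This matches the Jelinek-Mercer form with $\lambda_d = \frac{\mu}{|d| + \mu}$ (the subscript $d$ is notation introduced here to mark the dependence on the document). Short documents get a $\lambda_d$ close to 1 and lean heavily on the collection, while long documents get a $\lambda_d$ close to 0 and rely mostly on their own term counts.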

Connections

Alternative to: BM25, TF-IDF
Uses: Smoothing
Instance of: Language Model for IR
